psma

psma is an open-source Python implementation of the probabilistic surface of molecular activity (PSMA) workflow described in the paper "A visual approach for analysis and inference of molecular activity spaces".

The package builds molecular similarity matrices, embeds compounds into a 2D reference space, estimates class-conditional activity surfaces, and scores posterior probabilities for projected test compounds.
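
To make the last step concrete, the sketch below shows how a posterior probability could be obtained from two class-conditional density values at one point of the 2D map by applying Bayes' rule. The function name `posterior_active`, its parameters, and the fallback to the prior are assumptions made for illustration; this is not taken from psma's source and may differ from how the package actually scores compounds.

```python
def posterior_active(
    density_active: float,
    density_inactive: float,
    prior_active: float = 0.5,
) -> float:
    """Two-class Bayes' rule at a single point of the 2D map (illustrative only)."""
    if not 0.0 <= prior_active <= 1.0:
        raise ValueError("prior_active must lie in [0, 1]")
    if density_active < 0.0 or density_inactive < 0.0:
        raise ValueError("densities must be non-negative")

    # Numerator: likelihood of the point under the active class times its prior.
    weighted_active = density_active * prior_active
    # Denominator: total evidence from both classes.
    evidence = weighted_active + density_inactive * (1.0 - prior_active)

    # Assumption: with no density from either class, fall back to the prior.
    if evidence == 0.0:
        return prior_active
    return weighted_active / evidence


# A point where the active surface is three times as high as the inactive one.
example_posterior = posterior_active(0.3, 0.1, prior_active=0.5)
```

With equal priors, the posterior depends only on the ratio of the two surface heights at the projected location.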

Features

RDKit Morgan Tanimoto, embedding cosine, and imported triples similarity backends
random and Butina train/test splitting
pure Python API returning typed result objects
CLI entrypoint for reproducible runs
static Matplotlib plots and optional interactive Bokeh plots
Sphinx documentation with a notebook-based plotting tutorial
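
As a rough illustration of the two similarity measures named in the feature list, the following standard-library sketch computes a Tanimoto coefficient on sets of "on" fingerprint bits and a cosine similarity on embedding vectors. The function names and the zero-vector conventions are assumptions for this example; psma's own backends rely on RDKit or user-supplied embeddings and may handle edge cases differently.

```python
import math
from collections.abc import Sequence


def tanimoto(bits_a: set[int], bits_b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    union = len(bits_a | bits_b)
    # Assumption: two empty fingerprints are treated as having similarity 0.
    if union == 0:
        return 0.0
    return len(bits_a & bits_b) / union


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Assumption: a zero vector has similarity 0 with everything.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two toy fingerprints sharing 2 of 4 distinct on-bits.
toy_similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```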

Installation

The package currently supports Python >=3.11,<3.13.

Core install:

pip install psma

Optional extras:

pip install "psma[rdkit]"
pip install "psma[plotting]"
pip install "psma[docs]"

For local development, use Pixi:

pixi install

Quickstart

Run the CLI on a CSV file that has a binary endpoint column and a SMILES column:

pixi run psma run docs/_data/solubility_NCATS-sol.csv \
  --output-dir .tmp/ncats_sol_cli \
  --y-col low_solubility \
  --label-threshold 0.5 \
  --label-direction ge \
  --similarity-method rdkit_morgan_tanimoto \
  --smiles-col canonical_smiles \
  --split-method random
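
The `--label-threshold` and `--label-direction` options suggest that the endpoint column is binarized before modelling. The sketch below shows one plausible reading, in which `ge` marks values greater than or equal to the threshold as the positive class. The helper `binarize` is hypothetical, and the `le` branch is an assumed counterpart that does not appear in this README; consult the CLI help for the options psma actually accepts.

```python
def binarize(values: list[float], threshold: float, direction: str = "ge") -> list[int]:
    """Map continuous or 0/1 endpoint values to class labels (illustrative only)."""
    if direction == "ge":
        # Assumed meaning of "ge": value >= threshold is the positive class.
        return [1 if value >= threshold else 0 for value in values]
    if direction == "le":
        # Hypothetical counterpart, not confirmed by the README.
        return [1 if value <= threshold else 0 for value in values]
    raise ValueError(f"unsupported direction: {direction!r}")


# With an already binary column, a threshold of 0.5 leaves the labels unchanged.
labels = binarize([0.0, 1.0, 1.0, 0.0], threshold=0.5, direction="ge")
```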

For Python use, call the pure computation API:

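import pandas as pd

from psma import compute_psma_surface

# Load the example dataset used in the CLI run above
# (assumes compute_psma_surface accepts a pandas DataFrame).
df = pd.read_csv("docs/_data/solubility_NCATS-sol.csv")

# Same endpoint, threshold, and similarity settings as the CLI example.
result = compute_psma_surface(
    df,
    y_col="low_solubility",
    smiles_col="canonical_smiles",
    similarity_method="rdkit_morgan_tanimoto",
    label_threshold=0.5,
    label_direction="ge",
)

# Matthews correlation coefficient from the typed result object.
print(result.metrics.mcc)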
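
For readers unfamiliar with the metric printed above, MCC is the Matthews correlation coefficient. The standard-library sketch below computes it from binary true and predicted labels. The function name `mcc_from_labels` and the convention of returning 0.0 for an undefined denominator are choices made for this example; it is not psma's implementation.

```python
import math


def mcc_from_labels(y_true: list[int], y_pred: list[int]) -> float:
    """Matthews correlation coefficient for binary 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used here: return 0.0 when any marginal count is zero.
    if denominator == 0.0:
        return 0.0
    return numerator / denominator


# One of two positives is missed; both negatives are classified correctly.
example_mcc = mcc_from_labels([1, 1, 0, 0], [1, 0, 0, 0])
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it remains informative when the classes are imbalanced.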

Documentation

Build the searchable HTML documentation locally:

pixi run docs
open docs/_build/html/index.html

The documentation includes tutorials, how-to guides, explanations, and generated API reference pages.

Development

Common tasks:

pixi run lint
pixi run typecheck
pixi run test
pixi run docs

The project uses:

ruff for formatting and linting
pyright for type checking
pytest for tests
sphinx and myst-nb for documentation

Example Dataset

The documentation tutorial uses the NCATS-sol dataset stored at docs/_data/solubility_NCATS-sol.csv.

Source repository:

https://github.com/netknowledge/ADMET#

The dataset was downloaded from the already-preprocessed NCATS-sol data published by that repository. The upstream authors describe the preprocessing as follows:

start from an original dataset containing 2,532 records
drop one compound, represented by two rows, with inconsistent outcomes
drop one duplicated row
drop 76 compounds with inconclusive outcomes
generate the low_solubility column from Analysis Comment, mapping Low phenotype to positive class 1 and Moderate/High phenotype to negative class 0
use RDKit to transform SMILES into canonical forms

The upstream NCATS-sol description states that the resulting dataset has 2,453 compounds and binary labels indicating whether each compound has low solubility.
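
The reported counts are consistent with the preprocessing steps, as the short check below shows. It assumes that each of the 76 inconclusive compounds corresponds to exactly one row, which the upstream description does not state explicitly.

```python
original_records = 2532
inconsistent_rows = 2    # one compound represented by two rows
duplicate_rows = 1       # one duplicated row
inconclusive_rows = 76   # assumption: one row per inconclusive compound

remaining = original_records - inconsistent_rows - duplicate_rows - inconclusive_rows
print(remaining)  # 2453
```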

Dataset reference:

H. Sun, P. Shah, K. Nguyen, K. R. Yu, E. Kerns, M. Kabir, Y. Wang, and X. Xu, Predictive models of aqueous solubility of organic compounds built on a large dataset of high integrity, Bioorganic & Medicinal Chemistry 27, 3110 (2019).

License

This package is distributed under the MIT License. See LICENSE.
