This workspace contains a two-stage pipeline:
- (Optional) Fine-tune CellFM and export per-cell embeddings.
- Train a hierarchical attention pooling model to predict sample phenotypes (default: age_z_score) and recover cell/sample weights.
- cellfm/: CellFM fine-tuning and embedding export.
- pooling/: Hierarchical attention pooling training + weight extraction.
- csv/: Gene mapping files required by CellFM.
- outputs/: Example outputs.
- Install dependencies

      pip install -r requirements.txt

- Optional: CellFM fine-tuning

      python -m cellfm.finetune --adata data/train.h5ad --ms-ckpt path/to.ckpt --out-dir outputs/cellfm

- Export per-cell embeddings

      python -m cellfm.embed --adata data/train.h5ad --pt-ckpt outputs/cellfm/model.pt \
        --out-adata data/with_emb.h5ad --emb-key X_cellfm

- Pooling training (predicts z-score labels)

      python -m pooling.train --adata data/with_emb.h5ad --emb_key X_cellfm --out_dir outputs/pooling

- Weight extraction from a trained checkpoint

      python -m pooling.extract --adata data/with_emb.h5ad --emb_key X_cellfm \
        --ckpt_path outputs/pooling/model.pt --out_dir outputs/pooling

CellFM input:
- h5ad with gene names in var_names; mapping uses csv/expand_gene_info.csv and csv/updated_hgcn.tsv.
- obs columns: celltype (string), batch_id (int), train (0/1). Missing columns are auto-filled.
- Use --val-frac in cellfm.finetune to create a validation split.
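
The obs requirements above can be checked before fine-tuning. The following is a minimal sketch that uses a pandas DataFrame as a stand-in for adata.obs. The helper name `fill_obs_columns` and the fill values in `ASSUMED_DEFAULTS` are hypothetical: the text above only says that missing columns are auto-filled, not which values are used.

```python
import pandas as pd

# Hypothetical defaults: the README does not state which values the
# pipeline uses when it auto-fills a missing column.
ASSUMED_DEFAULTS = {"celltype": "unknown", "batch_id": 0, "train": 1}


def fill_obs_columns(obs: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of obs with the three documented columns present and typed."""
    obs = obs.copy()
    for column, default in ASSUMED_DEFAULTS.items():
        if column not in obs.columns:
            obs[column] = default
    # Enforce the documented types: string, int, and a 0/1 flag.
    obs["celltype"] = obs["celltype"].astype(str)
    obs["batch_id"] = obs["batch_id"].astype(int)
    obs["train"] = obs["train"].astype(int)
    unexpected = set(obs["train"].unique()) - {0, 1}
    if unexpected:
        raise ValueError(f"train must be 0 or 1, found {sorted(unexpected)}")
    return obs


# Toy obs table that only carries celltype; the other columns get filled.
obs = pd.DataFrame({"celltype": ["T cell", "B cell"]}, index=["cell_0", "cell_1"])
filled = fill_obs_columns(obs)
```

With a real dataset, the same function could be applied to the obs table of the loaded h5ad file before it is written back to disk.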
Pooling input:
- obs has a sample id column (default: donor_id; use --sample_key to override).
- Label column (default: age_z_score; use --label_key to pick).
- Embeddings in obsm[emb_key] (default is X; use --emb_key to point to X_cellfm).
- Optional integer cell id column via --cell_id_key; otherwise the obs index is used.
- Leiden clustering is run per sample (set --resolution).
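
To illustrate how per-cell embeddings and per-sample clusters can feed an attention pooling step, here is a minimal NumPy sketch. It is not the model in pooling/: the two levels (cells to clusters, clusters to sample), the dot-product scoring, and the names `attention_pool`, `cell_query`, and `cluster_query` are assumptions chosen for illustration, and the random vectors stand in for learned parameters.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


def attention_pool(embeddings, query):
    """Pool rows into one vector, weighting each row by softmax(embeddings @ query)."""
    weights = softmax(embeddings @ query)
    return weights @ embeddings, weights


rng = np.random.default_rng(0)
cell_emb = rng.normal(size=(6, 4))       # 6 cells of one sample, 4-dim embeddings
clusters = np.array([0, 0, 0, 1, 1, 1])  # stand-in for per-sample Leiden labels
cell_query = rng.normal(size=4)          # hypothetical learned parameters
cluster_query = rng.normal(size=4)

# Level 1: pool the cells inside each cluster.
cluster_vecs = []
cell_weights = np.zeros(len(cell_emb))
for c in np.unique(clusters):
    mask = clusters == c
    pooled, w = attention_pool(cell_emb[mask], cell_query)
    cluster_vecs.append(pooled)
    cell_weights[mask] = w

# Level 2: pool the cluster vectors into one sample vector.
sample_vec, cluster_weights = attention_pool(np.stack(cluster_vecs), cluster_query)
```

In this sketch the attention weights sum to 1 within each cluster and across clusters, which is what makes them readable as relative importances.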
CellFM:
- outputs/cellfm/model.pt

Embedding export:
- .h5ad with obsm[emb_key] containing per-cell embeddings.

Pooling training:
- model.pt, label_scaler.csv, test_pred.csv

Weight extraction:
- weights.h5ad with obs columns: agg_weight, sample_weight, cell_weight, sample_ids
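
Because pooling training predicts z-score labels, predictions usually need to be mapped back to the original scale. The sketch below shows the standard z-score transform and its inverse. That label_scaler.csv stores a mean and a standard deviation is an assumption, and the numbers used here are made up for illustration.

```python
def to_z_score(value, mean, std):
    """Standardize a raw label."""
    return (value - mean) / std


def from_z_score(z, mean, std):
    """Map a z-score back to the original label scale."""
    return z * std + mean


# Hypothetical scaler statistics; in practice they would be read from
# label_scaler.csv, whose exact column layout is not documented here.
mean_age, std_age = 50.0, 10.0

predicted_z = [-1.0, 0.0, 1.5]
predicted_age = [from_z_score(z, mean_age, std_age) for z in predicted_z]
# predicted_age == [40.0, 50.0, 65.0]
```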

- MindSpore is only needed when loading a MindSpore checkpoint with --ms-ckpt.