
Project 30: Unpacking Single-Cell LLMs: A FAIR Framework for scalable, shareable single-cell foundation models

Abstract

Foundation models such as scGPT and the CancerFoundation model have shown promising capabilities in single-cell omics tasks such as cell annotation and perturbation response prediction. However, their broader adoption is hindered by challenges in reproducibility, high computational resource demands, and the absence of standardised metadata and deployment guidelines.

Our BioHackathon 2025 project proposes a comprehensive FAIRification framework for single-cell foundation models. We aim to integrate and package these models into reproducible, shareable workflows by using workflow management systems (e.g., Nextflow or Snakemake) and standardised packaging formats (RO-Crate), and by exploring registration in WorkflowHub. The objective is to develop template guidelines that enable researchers to run, fine-tune, and evaluate foundation models on real-world single-cell data while ensuring that model outputs are findable, accessible, interoperable, and reusable.

We will systematically capture metadata describing the datasets used for training and fine-tuning, as well as the types of output representations these models generate. We will also explore emerging standardised metrics for evaluating model performance. The project will focus on pre-trained models and evaluate their generalisation on key analytical tasks such as cell annotation, gene regulatory network reconstruction, and perturbation modelling, thereby helping to close the gap between academic prototypes and scalable, community-adopted solutions.
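To make this kind of metadata capture more concrete, the following is a minimal sketch of an RO-Crate 1.1 metadata file that links pre-trained model weights to the dataset they were trained on. It assumes only the Python standard library `json` module rather than a dedicated RO-Crate library. The file path, dataset URL, publication date, and licence are hypothetical placeholders. Using the schema.org `isBasedOn` property to record training provenance is one possible choice and not a convention this project has adopted.

```python
import json

# Hypothetical identifiers, for illustration only.
MODEL_FILE = "model/pretrained_weights.pt"            # hypothetical path inside the crate
TRAINING_DATA = "https://example.org/datasets/atlas"  # hypothetical external dataset URL


def build_ro_crate(name: str, description: str, license_url: str,
                   date_published: str = "2025-01-01") -> dict:
    """Return a minimal RO-Crate 1.1 metadata document as a Python dict.

    date_published defaults to a placeholder value, not a real release date.
    """
    graph = [
        {
            # Metadata descriptor: names this JSON-LD file and the spec it follows.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root data entity: the crate as a whole.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": MODEL_FILE}],
        },
        {
            # Data entity for the model weights, linked to its (hypothetical) training data.
            "@id": MODEL_FILE,
            "@type": "File",
            "name": "Pre-trained model weights",
            "isBasedOn": {"@id": TRAINING_DATA},
        },
        {
            # Contextual entity describing the training dataset.
            "@id": TRAINING_DATA,
            "@type": "Dataset",
            "name": "Training dataset (hypothetical)",
        },
    ]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


if __name__ == "__main__":
    crate = build_ro_crate(
        name="Example single-cell foundation model",
        description="Illustrative crate for a pre-trained model",
        license_url="https://spdx.org/licenses/MIT",
    )
    # In practice this would be saved as ro-crate-metadata.json at the crate root.
    print(json.dumps(crate, indent=2))
```

Writing the descriptor as plain JSON-LD keeps the sketch dependency-free; a real pipeline would more likely generate it with an RO-Crate library or export it from the workflow manager.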

Anchored in the ELIXIR Communities dedicated to machine learning, workflows, and single-cell omics, this interdisciplinary effort will bring together developers, ML experts, and FAIR data advocates to establish the guidelines, containers, and benchmark-ready workflows needed to advance foundation models in single-cell research.

Lead(s)

Marina Esteban Medina, George Gavriilidis, Rasool Saghaleyni