The compute side of GWAS on biobank-scale data is a solved problem: REGENIE ships as a single static binary (3 MB, Linux and macOS) and handles the dense genotype math.
Two paths:
Immediate: ingest summary stats. Our tabular ingest already handles TSV. Add column aliases for the REGENIE and SAIGE output formats. The user runs REGENIE themselves and feeds us the results.
Future: managed REGENIE integration. Download and cache the REGENIE binary automatically (same pattern as favor setup for annotation data). favor gwas then calls it as a subprocess, manages I/O, and ingests the results.
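For the immediate path, the alias layer could be as small as the sketch below. The canonical names (chrom, pos, variant_id, ...) and both function names are hypothetical, not taken from the favor codebase. The source column names are my understanding of REGENIE step 2 and recent SAIGE output headers and should be checked against real files. REGENIE output is, as far as I know, space-delimited by default, so the delimiter is left as a parameter.

```python
import csv
import io

# Hypothetical canonical names mapped to the source columns we expect to see.
# Assumed headers: REGENIE (CHROM, GENPOS, ID, BETA, SE, LOG10P) and
# SAIGE (CHR, POS, MarkerID, BETA, SE, p.value). Verify against real output.
ALIASES = {
    "chrom": {"CHROM", "CHR"},
    "pos": {"GENPOS", "POS"},
    "variant_id": {"ID", "MarkerID", "SNPID"},
    "beta": {"BETA"},
    "se": {"SE"},
    "log10p": {"LOG10P"},
    "pvalue": {"p.value"},
}


def canonicalize_header(header):
    """Rename known columns to canonical names; unknown columns pass through."""
    lookup = {alias: canon for canon, names in ALIASES.items() for alias in names}
    return [lookup.get(col, col) for col in header]


def read_sumstats(text, delimiter="\t"):
    """Parse delimited summary stats into dicts keyed by canonical column name."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = canonicalize_header(next(reader))
    return [dict(zip(header, row)) for row in reader]
```

Passing unknown columns through unchanged keeps tool-specific fields (for example REGENIE's TEST or SAIGE's N) available without having to enumerate them up front.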
favor gwas --trait pheno.tsv --genotypes data.bed
# auto-downloads regenie to ~/.cohort/bin/ on first run
# runs regenie step1 + step2 as subprocess
# ingests summary stats into .cohort/datasets/
# ready for favor interpret
No conda, no PATH, no environment. One command.
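A rough sketch of what the managed path might do internally, assuming the cache location from the comment above (~/.cohort/bin/). All function names and the output prefixes are hypothetical. The download step is left out because the issue does not say where the binary would come from. The REGENIE flags (--step, --bed, --phenoFile, --bsize, --pred, --out, --bt) are the ones documented for REGENIE; the block sizes are placeholder values, not tuned recommendations.

```python
import subprocess
from pathlib import Path

# Cache location taken from the comment in the proposal.
CACHE_DIR = Path.home() / ".cohort" / "bin"


def find_regenie(cache_dir=CACHE_DIR):
    """Return the cached binary path, or None if it has not been downloaded yet."""
    cached = Path(cache_dir) / "regenie"
    return str(cached) if cached.is_file() else None


def bed_prefix(path):
    """REGENIE's --bed takes the PLINK file prefix, so drop a trailing .bed."""
    return path[:-4] if path.endswith(".bed") else path


def build_commands(binary, genotypes, pheno_file, out_prefix, binary_trait=False):
    """Build the argv lists for REGENIE step 1 and step 2 (nothing is executed)."""
    common = [binary, "--bed", bed_prefix(genotypes), "--phenoFile", pheno_file]
    if binary_trait:
        common.append("--bt")
    step1_out = out_prefix + "_step1"
    step1 = common + ["--step", "1", "--bsize", "1000", "--out", step1_out]
    # Step 1 writes <out>_pred.list, which step 2 consumes via --pred.
    step2 = common + ["--step", "2", "--bsize", "400",
                      "--pred", step1_out + "_pred.list",
                      "--out", out_prefix + "_step2"]
    return step1, step2


def run_gwas(genotypes, pheno_file, out_prefix):
    """Run both steps in order; raises if the binary is missing or a step fails."""
    binary = find_regenie()
    if binary is None:
        raise FileNotFoundError("regenie not cached; the download step would go here")
    for cmd in build_commands(binary, genotypes, pheno_file, out_prefix):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the argv testable without a REGENIE binary on the machine.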
Related: #12, #14, #87, #95