Running STAARpipeline-Tutorial end-to-end via favor-cli needs Rdata support at both ends:
- input: the phenotype is often shipped as a data frame saved in an .Rdata file
- output: STAARpipelineSummary calls get(load(.)) on per-shard output files and expects named R objects
Current state: phenotype load is CSV/TSV only; outputs are parquet + JSON metadata.
Tutorial expectations (from STAARpipelineSummary scripts):
- individual: one data frame per shard, filename <output>_<chr>_<groupid>.Rdata, columns CHR,POS,REF,ALT,ALT_AF,MAC,N,pvalue,Score,SE,Est
- gene-centric coding/noncoding/ncRNA: list of mask data frames; columns include Gene,Chr,Category,#SNV,cMAC,MAF_cutoff,STAAR-O,ACAT-O,STAAR-S(1,25),STAAR-S(1,1),STAAR-B(1,25),STAAR-B(1,1),STAAR-A(1,25),STAAR-A(1,1), plus per-annotation sub p-values
- sliding window: same column shape keyed by chr,start_loc,end_loc
- SCANG: list with elements SCANG_O_res, SCANG_O_top1, SCANG_O_emthr, plus the same three suffixes for SCANG_S and SCANG_B
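The Rust sketch below encodes the naming and schema expectations above so the writer's tests can check them. Rust is an assumption: the serde-rdata suggestion implies a Rust code base. The constant, the function names and the numeric chromosome argument are hypothetical and do not come from favor-cli. The column list is copied from the individual-analysis bullet, and the SCANG names expand the O/S/B by _res/_top1/_emthr pattern.

```rust
/// Columns the individual-analysis summary expects, copied from the list above.
const INDIVIDUAL_COLUMNS: [&str; 11] = [
    "CHR", "POS", "REF", "ALT", "ALT_AF", "MAC", "N", "pvalue", "Score", "SE", "Est",
];

/// Per-shard file name <output>_<chr>_<groupid>.Rdata.
/// Hypothetical helper; assumes numeric chromosomes.
fn individual_shard_filename(output: &str, chr: u8, group_id: u32) -> String {
    format!("{output}_{chr}_{group_id}.Rdata")
}

/// Element names of the SCANG result list: each of O, S, B
/// combined with each of the _res, _top1, _emthr suffixes.
fn scang_element_names() -> Vec<String> {
    let mut names = Vec::with_capacity(9);
    for test in ["O", "S", "B"] {
        for part in ["res", "top1", "emthr"] {
            names.push(format!("SCANG_{test}_{part}"));
        }
    }
    names
}

/// Expected columns that are absent from `actual`.
/// An empty result means the schema lines up with what the summary script reads.
fn missing_columns<'a>(expected: &[&'a str], actual: &[&str]) -> Vec<&'a str> {
    expected
        .iter()
        .copied()
        .filter(|col| !actual.iter().any(|a| a == col))
        .collect()
}
```

Running missing_columns against each data frame before it is written would catch a renamed or dropped column before STAARpipelineSummary fails on it.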
Needs:
- Rdata reader for phenotype input (serde-rdata or equivalent)
- Rdata writer for per-shard outputs
- --output-format flag accepting parquet (default), rdata, or both
- object and column names must match the STAARpipelineSummary load sites exactly
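The sketch below shows one way to model the proposed --output-format flag in Rust, using only the standard library. The enum and method names are hypothetical. A real implementation would likely hook into whatever argument parser favor-cli already uses rather than a hand-written FromStr.

```rust
use std::str::FromStr;

/// Values accepted by the proposed --output-format flag (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum OutputFormat {
    /// Current behaviour: parquet + JSON metadata.
    #[default]
    Parquet,
    /// Per-shard .Rdata files only.
    Rdata,
    /// Both formats side by side.
    Both,
}

impl FromStr for OutputFormat {
    type Err = String;

    /// Case-insensitive parse; anything else is rejected with a usage hint.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "parquet" => Ok(Self::Parquet),
            "rdata" => Ok(Self::Rdata),
            "both" => Ok(Self::Both),
            _ => Err(format!(
                "invalid --output-format '{s}': expected parquet, rdata, or both"
            )),
        }
    }
}

impl OutputFormat {
    /// Whether the shard writer should emit parquet + JSON metadata.
    fn writes_parquet(self) -> bool {
        matches!(self, Self::Parquet | Self::Both)
    }

    /// Whether the shard writer should emit an .Rdata file.
    fn writes_rdata(self) -> bool {
        matches!(self, Self::Rdata | Self::Both)
    }
}
```

Keeping Both as its own variant lets the shard writer ask two independent questions, writes_parquet and writes_rdata, instead of branching on every combination.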