generalize ingest reader: VariantReader trait for VCF/BCF/BGEN/GDS #87

@vineetver

RecordContext handles the normalized fields and writes Parquet. Format-specific reading should sit behind a trait so that new formats plug in without touching the processing core.

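trait VariantReader {
    fn sample_names(&self) -> &[String];
    // Boxed iterator instead of `impl Iterator`: a method returning `impl Trait`
    // would make the trait unusable as `dyn VariantReader` in `ingest`.
    fn records(&mut self) -> Box<dyn Iterator<Item = Result<RawRecord, CohortError>> + '_>;
}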

Implementations: VcfVariantReader (noodles-vcf), BcfVariantReader (noodles-bcf), BgenVariantReader, GdsVariantReader (SeqArray/HDF5).

Each reader owns its parallelism strategy (BGZF threads, tabix region splits, BGEN .bgi index, HDF5 chunks). RecordContext stays format-agnostic.
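To illustrate what "owns its parallelism strategy" could look like, here is a rough sketch of a reader that decodes each region on its own thread and hands records back through channels, so the caller only ever sees a plain iterator. `RegionSplitReader`, the fields of `RawRecord`, and the shape of `CohortError` are all hypothetical stand-ins, not the project's types. The sketch boxes the iterator so the trait works as a trait object, and the loop over positions stands in for real decoding.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-ins for the types named in this issue.
#[derive(Debug, Clone, PartialEq)]
struct RawRecord {
    chrom: String,
    pos: u64,
}

#[derive(Debug)]
struct CohortError(String);

trait VariantReader {
    fn sample_names(&self) -> &[String];
    fn records(&mut self) -> Box<dyn Iterator<Item = Result<RawRecord, CohortError>> + '_>;
}

/// Hypothetical reader: one worker thread per region.
struct RegionSplitReader {
    samples: Vec<String>,
    regions: Vec<(String, u64, u64)>, // (chrom, start, end), end exclusive
}

impl VariantReader for RegionSplitReader {
    fn sample_names(&self) -> &[String] {
        &self.samples
    }

    fn records(&mut self) -> Box<dyn Iterator<Item = Result<RawRecord, CohortError>> + '_> {
        // Spawn all workers up front. Each worker gets its own channel, so
        // draining the receivers in order preserves region order.
        let receivers: Vec<mpsc::Receiver<Result<RawRecord, CohortError>>> = self
            .regions
            .iter()
            .cloned()
            .map(|(chrom, start, end)| {
                let (tx, rx) = mpsc::channel();
                thread::spawn(move || {
                    for pos in start..end {
                        // A real reader would decode a record here.
                        let rec = RawRecord { chrom: chrom.clone(), pos };
                        if tx.send(Ok(rec)).is_err() {
                            break; // the consumer dropped the iterator
                        }
                    }
                });
                rx
            })
            .collect();

        // Each receiver yields until its worker finishes and drops the sender.
        Box::new(receivers.into_iter().flat_map(|rx| rx.into_iter()))
    }
}
```

The threads and channels stay inside `records`, so the code that consumes the iterator needs no knowledge of how the reader parallelizes.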

Composition: fn ingest(reader: &mut dyn VariantReader, ctx: &mut RecordContext, output: &dyn Output)
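A minimal sketch of how the composition might fit together, assuming the iterator is boxed so that `&mut dyn VariantReader` is a valid trait object. Everything beyond the names in this issue is hypothetical: the fields of `RawRecord`, the `CohortError` variant, the `Output::write` method, `RecordContext::process`, the in-memory `MemoryReader`, and the `Result<usize, CohortError>` return type on `ingest`.

```rust
use std::cell::RefCell;

// Hypothetical stand-ins: the issue names these types but does not define them.
#[derive(Debug, Clone, PartialEq)]
struct RawRecord {
    chrom: String,
    pos: u64,
    genotypes: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum CohortError {
    Malformed(String),
}

trait VariantReader {
    fn sample_names(&self) -> &[String];
    fn records(&mut self) -> Box<dyn Iterator<Item = Result<RawRecord, CohortError>> + '_>;
}

/// Hypothetical sink; the real one would write Parquet.
trait Output {
    fn write(&self, row: &str);
}

#[derive(Default)]
struct CollectOutput {
    rows: RefCell<Vec<String>>,
}

impl Output for CollectOutput {
    fn write(&self, row: &str) {
        self.rows.borrow_mut().push(row.to_string());
    }
}

/// Format-agnostic core: it only ever sees `RawRecord`.
#[derive(Default)]
struct RecordContext {
    seen: usize,
}

impl RecordContext {
    fn process(&mut self, rec: &RawRecord, n_samples: usize) -> Result<String, CohortError> {
        if rec.genotypes.len() != n_samples {
            return Err(CohortError::Malformed(format!(
                "{}:{} has {} genotypes, expected {}",
                rec.chrom, rec.pos, rec.genotypes.len(), n_samples
            )));
        }
        self.seen += 1;
        Ok(format!("{}:{}", rec.chrom, rec.pos))
    }
}

/// Hypothetical in-memory reader, standing in for a VCF/BCF/BGEN/GDS reader.
struct MemoryReader {
    samples: Vec<String>,
    queue: Vec<Result<RawRecord, CohortError>>,
}

impl VariantReader for MemoryReader {
    fn sample_names(&self) -> &[String] {
        &self.samples
    }

    fn records(&mut self) -> Box<dyn Iterator<Item = Result<RawRecord, CohortError>> + '_> {
        Box::new(self.queue.drain(..))
    }
}

fn ingest(
    reader: &mut dyn VariantReader,
    ctx: &mut RecordContext,
    output: &dyn Output,
) -> Result<usize, CohortError> {
    // Read the sample count before taking the mutable borrow for `records`.
    let n_samples = reader.sample_names().len();
    let mut written = 0;
    for rec in reader.records() {
        let row = ctx.process(&rec?, n_samples)?; // stop at the first error
        output.write(&row);
        written += 1;
    }
    Ok(written)
}
```

Under these assumptions, adding a format means adding one `impl VariantReader`, and `ingest` and `RecordContext` stay unchanged. A generic `fn ingest<R: VariantReader>(...)` would be the alternative if `records` is to keep returning `impl Iterator`.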

Related: #69, #74, #86


Labels: enhancement (New feature or request), ingest (VCF/genotype ingest pipeline)