Reader and writer both derive batch sizes from the same undivided memory budget. They compete for RAM without coordination.
The writer dominates: `GenotypeWriter` allocates `batch_size * n_samples * 4` bytes for the `FixedSizeListBuilder`, and Arrow's `finish()` temporarily doubles that. The current workaround sizes batches from `budget / 4`, so the doubled peak only reaches `budget / 2` and half the memory sits unused.
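A minimal sketch of the sizing arithmetic, assuming 4-byte genotype slots (the `* 4` above) and a peak factor of 2 during `finish()` as described. The function names are hypothetical and not taken from `src/staar/genotype.rs`. Sizing the batch directly from the writer's share, with the doubling accounted for once, uses the whole share instead of half of it.

```rust
/// Bytes per genotype slot in the FixedSizeListBuilder (assumed from the `* 4` factor).
const BYTES_PER_SLOT: u64 = 4;
/// Assumed peak multiplier while finish() holds both the builder buffers and the finished array.
const FINISH_PEAK_FACTOR: u64 = 2;

/// Hypothetical helper: peak bytes for one writer batch, including the finish() doubling.
fn writer_peak_bytes(batch_size: u64, n_samples: u64) -> u64 {
    batch_size * n_samples * BYTES_PER_SLOT * FINISH_PEAK_FACTOR
}

/// Hypothetical helper: largest batch size whose peak fits in `writer_budget`.
/// Returns 0 when n_samples is 0 (division by zero is avoided via checked_div).
fn max_batch_size(writer_budget: u64, n_samples: u64) -> u64 {
    let per_row_peak = n_samples * BYTES_PER_SLOT * FINISH_PEAK_FACTOR;
    writer_budget.checked_div(per_row_peak).unwrap_or(0)
}
```

For a 64,000-byte budget and 100 samples, the `budget / 4` workaround yields 40 rows per batch with a 32,000-byte peak, while `max_batch_size` allows 80 rows and peaks at the full 64,000 bytes.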
Split the budget explicitly: the reader gets what it needs (BGZF buffers, line buffer), and the writer gets the rest, allowing larger batches and fewer flushes. Also, `CohortPool` claims the full budget during ingest even though DataFusion is idle.
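One way the split could look, as a sketch only: the reader's need is modeled as a fixed reservation (per-thread BGZF buffers plus one line buffer), and the writer receives the remainder. `IngestBudget`, `reader_need`, `Phase`, and `query_pool_limit` are hypothetical names, not existing items in `src/resource.rs` or `src/engine.rs`. The last function illustrates the `CohortPool` point under the assumption that the query-side pool can be given nothing while ingest runs.

```rust
/// Hypothetical split of the ingest memory budget into reader and writer shares.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct IngestBudget {
    reader: u64,
    writer: u64,
}

/// Assumed reader model: one BGZF buffer per decompression thread plus a single line buffer.
fn reader_need(bgzf_threads: u64, bgzf_buffer_bytes: u64, line_buffer_bytes: u64) -> u64 {
    bgzf_threads * bgzf_buffer_bytes + line_buffer_bytes
}

impl IngestBudget {
    /// Reader gets exactly what it needs; writer gets the remainder.
    /// Returns None if the reader alone would exceed the total budget.
    fn split(total: u64, reader: u64) -> Option<Self> {
        let writer = total.checked_sub(reader)?;
        Some(Self { reader, writer })
    }
}

/// Hypothetical engine phase.
#[derive(Debug, Clone, Copy)]
enum Phase {
    Ingest,
    Query,
}

/// Query-side pool limit: nothing is reserved while ingest runs, the full budget afterwards.
fn query_pool_limit(phase: Phase, total: u64) -> u64 {
    match phase {
        Phase::Ingest => 0,
        Phase::Query => total,
    }
}
```

BGZF blocks decompress to at most 64 KiB, so with four threads and a 1 MiB line buffer the reader reserves 1,310,720 bytes and the writer gets everything else in the budget.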
Files: `src/ingest/vcf.rs`, `src/staar/genotype.rs`, `src/resource.rs`, `src/engine.rs`
Related: #74, #82, #59, #83