split memory budget between reader and writer in VCF ingest

Reader and writer both derive batch sizes from the same undivided memory budget. They compete for RAM without coordination.

The writer dominates: `GenotypeWriter` allocates `batch_size * n_samples * 4` for the `FixedSizeListBuilder`, and Arrow's `finish()` temporarily doubles that. The current workaround is `budget / 4`, which wastes half the memory.
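A rough sketch of the arithmetic above, assuming the factor of 4 is bytes per genotype value and that `finish()` holds exactly one extra copy of the builder buffer. `writer_peak_bytes` and `max_batch_size` are hypothetical helper names, not existing functions in the codebase.

```rust
/// Assumed bytes per genotype value (the `* 4` in the formula above).
const BYTES_PER_VALUE: usize = 4;

/// Assumed peak multiplier: builder buffer plus one temporary copy in `finish()`.
const FINISH_FACTOR: usize = 2;

/// Estimated peak bytes the writer holds for one batch.
fn writer_peak_bytes(batch_size: usize, n_samples: usize) -> usize {
    batch_size
        .saturating_mul(n_samples)
        .saturating_mul(BYTES_PER_VALUE)
        .saturating_mul(FINISH_FACTOR)
}

/// Largest batch size whose estimated peak fits in `writer_budget` bytes.
/// Always returns at least 1 so ingest can make progress.
fn max_batch_size(writer_budget: usize, n_samples: usize) -> usize {
    let per_variant = n_samples
        .saturating_mul(BYTES_PER_VALUE)
        .saturating_mul(FINISH_FACTOR);
    if per_variant == 0 {
        return 1;
    }
    (writer_budget / per_variant).max(1)
}
```

Under these assumptions, a 1 GiB writer budget with 50,000 samples allows batches of 2,684 variants, since each variant costs 400,000 bytes at peak.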

Split the budget explicitly: reader gets what it needs (BGZF buffers, line buffer), writer gets the rest for larger batches and fewer flushes. Also: `CohortPool` claims the full budget during ingest even though DataFusion is idle.
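One possible shape for the explicit split, as a sketch only: the reader's share is the sum of its fixed needs, and the writer takes the remainder. `IngestBudget`, `split_budget`, and the parameter names are hypothetical; the real reader sizes would come from the BGZF and line-buffer configuration.

```rust
/// Hypothetical result of dividing the ingest memory budget (all values in bytes).
#[derive(Debug, PartialEq)]
struct IngestBudget {
    reader: usize,
    writer: usize,
}

/// Give the reader exactly what it needs and hand the rest to the writer.
/// Returns `None` if the reader's needs leave nothing for the writer.
fn split_budget(total: usize, bgzf_buffers: usize, line_buffer: usize) -> Option<IngestBudget> {
    let reader = bgzf_buffers.checked_add(line_buffer)?;
    let writer = total.checked_sub(reader)?;
    if writer == 0 {
        return None;
    }
    Some(IngestBudget { reader, writer })
}
```

Returning `None` rather than clamping makes an undersized budget a visible configuration error instead of a silent overcommit.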

Files: `src/ingest/vcf.rs`, `src/staar/genotype.rs`, `src/resource.rs`, `src/engine.rs`

Related: #74, #82, #59, #83


split memory budget between reader and writer in VCF ingest #84
