resumable VCF ingest for multi-file jobs #82

@vineetver

When ingesting many VCF blocks (e.g. 23 UKB chr22 blocks), a failure at file 18 of 23 loses all prior work. The entire ingest restarts from zero.

A simple checkpoint mechanism would let us skip already-completed files on retry:

  • After each file's worker finishes, record it in a progress.json alongside the output
  • On restart, check which part files already exist and skip those workers
  • Final metadata merge (scan_and_register) runs after all files are done
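The steps above could be sketched roughly as follows. This is a minimal, sequential illustration, not the pipeline's actual code: `ingest_all`, `ingest_one`, `finalize`, and the `{"done": [...]}` layout of `progress.json` are all hypothetical names and choices made for the example. `finalize` stands in for the `scan_and_register` merge step, whose real signature is not shown in this issue. The sketch tracks completion through `progress.json` only; checking for the part files themselves would need the real part-file naming scheme.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Set

PROGRESS_NAME = "progress.json"  # assumed file name, stored alongside the output


def load_done(out_dir: Path) -> Set[str]:
    """Return the set of input files recorded as finished (empty on first run)."""
    path = out_dir / PROGRESS_NAME
    if not path.exists():
        return set()
    return set(json.loads(path.read_text())["done"])


def save_done(out_dir: Path, done: Set[str]) -> None:
    """Write progress via a temp file + rename so a preempted job never
    leaves a half-written progress.json behind."""
    fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump({"done": sorted(done)}, fh)
    os.replace(tmp, out_dir / PROGRESS_NAME)


def ingest_all(
    vcfs: Iterable[str],
    out_dir: Path,
    ingest_one: Callable[[str, Path], None],  # hypothetical per-file worker
    finalize: Callable[[Path], None],         # stand-in for scan_and_register
) -> List[str]:
    """Ingest every file not yet recorded as done, then run the final merge.

    Returns the files actually processed in this invocation.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    done = load_done(out_dir)
    processed = []
    for vcf in vcfs:
        key = str(vcf)
        if key in done:
            continue  # finished in an earlier attempt
        ingest_one(vcf, out_dir)  # may raise; earlier progress is already saved
        done.add(key)
        save_done(out_dir, done)  # checkpoint after each file
        processed.append(key)
    finalize(out_dir)  # only reached once every file is done
    return processed
```

With this shape, a failure at file 18 of 23 leaves 17 entries in `progress.json`, and a retry with the same arguments would process only the remaining 6 before running the merge. In the parallel path, where several workers finish concurrently, updates to `progress.json` would need a lock or one marker file per worker; the sketch does not cover that.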

This matters most for the parallel ingest path where jobs run on shared HPC queues and can be preempted.

Labels: enhancement (New feature or request), ingest (VCF/genotype ingest pipeline)
