feat(ci): optimize HMF reference data download with Magic Cache#256
Draft
edmundmiller wants to merge 2 commits intodevfrom
Draft
feat(ci): optimize HMF reference data download with Magic Cache#256edmundmiller wants to merge 2 commits intodevfrom
edmundmiller wants to merge 2 commits intodevfrom
Conversation
Add automatic download of ~25GB HMF reference data from Hartwig's R2 CDN before running nf-tests. This ensures tests have access to the required GRCh38_hmf genome and WGS resource files. Changes: - Add Nextflow setup and reference download steps to nf-test workflow - Use prepare_reference mode to download from R2 CDN - Configure tests/nextflow.config to detect and use local reference data - Falls back to remote URLs for local development The download adds ~10-15 minutes to CI runs but ensures tests can access all required reference files without caching (GitHub Actions cache is limited to 10GB).
Member
|
Warning Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.3.2. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation. |
|
Refactor nf-test workflow to download ~25GB HMF reference data once and share across all matrix jobs using runs-on Magic Cache (S3-backed). Changes: - Enable Magic Cache (extras=s3-cache) on all jobs - Add dedicated download-reference job that runs once before matrix - Use actions/cache with runs-on Magic Cache for S3-backed storage - Matrix jobs now restore from cache instead of downloading individually - Add runs-on/action@v2 to all jobs for Magic Cache support Performance impact: - Before: 42 jobs × 15 min = 630 minutes of download time - After: 15 min download + (42 × 3 min restore) = 141 minutes - Saves ~489 minutes (~8 hours) per workflow run Benefits: - No GitHub 10GB cache limit (uses S3 backend) - Fast cache restore across all matrix jobs - Cache persists across workflow runs - Significant CI time savings
ac53b87 to
9b66daa
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Optimizes the nf-test CI workflow to download ~25GB of HMF reference data once and share it across all matrix jobs using runs-on Magic Cache (S3-backed storage).
Changes
extras=s3-cacheto all jobs andruns-on/action@v2stepdownload-referencejob: Downloads reference data once before matrix jobs startPerformance Impact
Before
After
Savings
~489 minutes (~8 hours) per workflow run! 🚀
Benefits
✅ No size limits: runs-on Magic Cache uses S3 backend (no 10GB GitHub limit)
✅ Fast restores: S3-backed cache is much faster than downloading from R2
✅ Persistent cache: Cache persists across workflow runs
✅ Significant time savings: Reduces CI runtime by ~8 hours
Related Issues
Resolves the need to download 25GB WiGiTS toolkit data for nf-tests.
Testing
download-referencejob completes successfully🤖 Generated with Claude Code
Co-Authored-By: Claude noreply@anthropic.com