Production-ready amalgkit RNA-seq workflow configurations for automated transcript quantification pipelines.
| File | Description |
|---|---|
| `amalgkit_template.yaml` | Reference: 400+ line template with all options documented |
| `amalgkit_test.yaml` | Minimal test configuration for validation |
| `amalgkit_pogonomyrmex_barbatus.yaml` | Production: Full P. barbatus dataset (95/110 quantified) |
| `amalgkit_apis_mellifera_all.yaml` | Production: Full A. mellifera dataset (~7,270 samples) |
| `amalgkit_cross_species.yaml` | Cross-species TMM normalization config |
| `tissue_mapping.yaml` | Canonical tissue name synonyms for normalization |
| `tissue_patches.yaml` | Per-bioproject/sample tissue overrides |

```yaml
# Core paths (relative to repo root)
work_dir: output/amalgkit/{species}/work
log_dir: output/amalgkit/{species}/logs
threads: 16

# Species identification
species_list:
  - Pogonomyrmex_barbatus
taxon_id: 144034

# Reference genome
genome:
  accession: GCF_000187915.1
  dest_dir: output/amalgkit/shared/genome/Pogonomyrmex_barbatus

# Step-specific parameters
steps:
  getfastq:
    redo: no         # Skip already-downloaded
    keep_fastq: no   # Delete after quant
  quant:
    redo: no         # Skip already-quantified
    index_dir: ...   # Reuse kallisto index
```

For large datasets with limited disk space:

```yaml
steps:
  getfastq:
    redo: no         # Resume capability
  quant:
    keep_fastq: no   # Immediate cleanup
    redo: no         # Idempotent
```

Reuse genome/index across configs:

```yaml
genome:
  dest_dir: output/amalgkit/shared/genome/Pogonomyrmex_barbatus
steps:
  quant:
    index_dir: output/amalgkit/shared/genome/Pogonomyrmex_barbatus/index
```

Filter to RNA-Seq + Illumina to prevent genomic samples leaking in:

```yaml
steps:
  metadata:
    search_string: '"Species"[Organism] AND "RNA-Seq"[Strategy] AND "Illumina"[Platform]'
```

- Copy `amalgkit_template.yaml` → `amalgkit_{species}.yaml`
- Update `species_list`, `taxon_id`, and `genome.accession`
- Adjust paths: `work_dir`, `log_dir`, `genome.dest_dir`
- Validation: Run `python3 scripts/rna/validate_configs.py` to ensure schema compliance.
- Test with small sample subset first (use `max_sample: 5`)
- Scale to full dataset after validation

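The first three steps amount to deriving a handful of values from the species name. The sketch below illustrates that derivation in Python. The helper `new_config_values` is hypothetical and not part of the repository, and it assumes that `{species}` in the path patterns is replaced by the same name used in `species_list` and that the config filename is that name in lowercase.

```python
# Path patterns copied from the example config; {species} is the placeholder.
WORK_DIR = "output/amalgkit/{species}/work"
LOG_DIR = "output/amalgkit/{species}/logs"
GENOME_DIR = "output/amalgkit/shared/genome/{species}"


def new_config_values(species, taxon_id, accession):
    """Return the fields to edit in a copy of amalgkit_template.yaml.

    Hypothetical helper for illustration; keys use the dotted names
    from the checklist (e.g. "genome.accession").
    """
    if not species or any(ch.isspace() or ch == "/" for ch in species):
        raise ValueError("species must be a single path-safe token, e.g. Genus_species")
    return {
        # Assumption: filename is the lowercase species name.
        "config_file": f"amalgkit_{species.lower()}.yaml",
        "species_list": [species],
        "taxon_id": int(taxon_id),
        "genome.accession": accession,
        "genome.dest_dir": GENOME_DIR.format(species=species),
        "work_dir": WORK_DIR.format(species=species),
        "log_dir": LOG_DIR.format(species=species),
    }


if __name__ == "__main__":
    values = new_config_values("Pogonomyrmex_barbatus", 144034, "GCF_000187915.1")
    for key, value in values.items():
        print(f"{key}: {value}")
```

The output is only a list of values to paste into the copied template; the config should still be checked with `scripts/rna/validate_configs.py` before any run.
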
- Config Validation: running `scripts/rna/validate_configs.py` is mandatory for all new configurations.
- Zero-Mock Policy: all amalgkit tests follow the Zero-Mock policy, so they verify the real CLI and environment rather than mocked stand-ins.

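To give a rough idea of what a schema check can catch, here is a minimal sketch that operates on an already-parsed config dictionary. It is not the logic of `scripts/rna/validate_configs.py`, whose actual rules are not shown here. The function `find_problems` and the set of required keys are assumptions taken from the example config in this README.

```python
# Keys taken from the example config in this README (assumed to be required).
REQUIRED_TOP_LEVEL = ("work_dir", "log_dir", "threads", "species_list", "taxon_id", "genome")
REQUIRED_GENOME = ("accession", "dest_dir")
# Filter terms from the recommended metadata search string.
REQUIRED_SEARCH_TERMS = ('"RNA-Seq"[Strategy]', '"Illumina"[Platform]')


def find_problems(config):
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical check for illustration only.
    """
    problems = [f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in config]

    genome = config.get("genome") or {}
    problems += [f"missing key: genome.{key}" for key in REQUIRED_GENOME if key not in genome]

    threads = config.get("threads")
    if threads is not None and (not isinstance(threads, int) or threads < 1):
        problems.append("threads must be a positive integer")

    # Only checked when a search string is present.
    search = config.get("steps", {}).get("metadata", {}).get("search_string")
    if search is not None:
        for term in REQUIRED_SEARCH_TERMS:
            if term not in search:
                problems.append(f"search_string lacks {term}")
    return problems
```

A caller would load the YAML with whatever parser the project uses, pass the resulting dictionary to `find_problems`, and refuse to start a run if the list is non-empty.
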
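Prefix environment variable overrides with `AK_`; `NCBI_EMAIL`, as shown below, is set without the prefix:

```shell
export AK_THREADS=16
export AK_WORK_DIR=/fast/storage/amalgkit
export NCBI_EMAIL=your@email.com
```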
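
One way such overrides could be applied is sketched below. It assumes that an `AK_`-prefixed variable maps to the lowercase config key of the same name (`AK_THREADS` to `threads`, `AK_WORK_DIR` to `work_dir`) and that all-digit values are read as integers. The functions `env_overrides` and `apply_overrides` are hypothetical, and the actual loader may behave differently.

```python
import os

PREFIX = "AK_"


def env_overrides(environ, prefix=PREFIX):
    """Collect prefixed variables as {config_key: value} (assumed mapping)."""
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix) or name == prefix:
            continue  # unprefixed variables such as NCBI_EMAIL are ignored here
        key = name[len(prefix):].lower()
        # Assumption: all-digit strings are meant as integers (e.g. threads).
        overrides[key] = int(value) if value.isdigit() else value
    return overrides


def apply_overrides(config, environ):
    """Return a copy of config with environment overrides applied on top."""
    merged = dict(config)
    merged.update(env_overrides(environ))
    return merged


if __name__ == "__main__":
    base = {"threads": 8, "work_dir": "output/amalgkit/{species}/work"}
    print(apply_overrides(base, os.environ))
```

The sketch returns a new dictionary instead of mutating the loaded config, so the values read from the YAML file stay available for comparison or logging.
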