bovreg-twas-suite is a modular Nextflow pipeline suite for performing transcriptome-wide association studies (TWAS) in cattle. It includes workflows to generate imputed genotypes and expression inputs from RNA-seq data for model training, and to run TWAS using trained models to map trait-associated genes.
It comprises two independent workflows:
| Workflow | Description |
|---|---|
bovreg-twas-preprocess |
Preprocess RNA-seq and genotype data — includes trimming, alignment, quantification, imputation, and PEER factor inference |
bovreg-twas-model |
Train elastic net gene expression prediction models and run S-PrediXcan |
Clone the repository:
git clone https://github.com/evotools/bovreg-twas-suite.git
cd bovreg-twas-suiteEnsure you have:
- Nextflow v22.10.0 or higher
- Either: Conda, Docker, or Singularity
- Reference genome files, annotation, and optionally a GWAS SNP list
Runs:
trim_galore,fastqc,star,featureCounts,kallisto- automatic
MultiQCaggregation of FastQC and STAR logs GLIMPSEphasing and imputation- PEER factor estimation
nextflow run workflows/bovreg-twas-preprocess \
--fasta cattle.fa \
--gtf annotation.gtf \
--input_sheet samples.tsv \
--panel_vcf reference_panel.vcf.gz \
--map_file genetic_map.txt \
-profile condaUse
-stub-runto validate structure.
Expected QC output:
results/multiqc/multiqc_report.html
Runs:
- VCF and SNP annotation filtering
- Training of elastic net models using nested CV
- Covariance matrix computation
- Merging results into SQLite
- Running
S-PrediXcan - Genome-wide TWAS Manhattan plots (one per trait result file)
nextflow run workflows/bovreg-twas-model \
--gwas_snps gwas_snps.txt \
--gwas_vcf gwas.vcf.gz \
--gtf annotation.gtf \
--genotype_file genotypes.txt \
--expression_file expression.txt \
--covariates_file covariates.txt \
--snp_annot_file snp_annot.txt \
--gwas_sumstats_glob "sumstats/*.tsv" \
-profile condaExpected model plotting outputs:
results_model/s_predixcan/s_predixcan_results_<trait>.tsvresults_model/twas_plots/twas_manhattan_<trait>.pngresults_model/twas_plots/twas_plot_manifest_<trait>.tsv
Each workflow includes a nextflow_schema.json so you can explore parameters with:
nextflow run workflows/bovreg-twas-preprocess --help
nextflow run workflows/bovreg-twas-model --helpYou can validate the full structure using:
nextflow run workflows/bovreg-twas-preprocess -stub-run
nextflow run workflows/bovreg-twas-model -stub-runTest templates are provided in tests/preprocess/.
Update tests/preprocess/samples.test.tsv and tests/preprocess/params.test.json
with valid cluster paths before running.
Validate environment and profile resolution:
nextflow -version
nextflow config workflows/bovreg-twas-preprocess -profile sgeStructural (no compute-heavy execution):
nextflow run workflows/bovreg-twas-preprocess \
-profile sge \
-stub-run \
-params-file tests/preprocess/params.test.jsonShort real test run:
nextflow run workflows/bovreg-twas-preprocess \
-profile sge \
-params-file tests/preprocess/params.test.json \
--queue <queue_name> \
--max_cpus 8 \
--max_memory '32 GB' \
--max_time '8.h' \
-resumeMultiQC verification checklist:
- Confirm SGE jobs are submitted and completed.
- Confirm
fastqcandstar_aligntasks both complete. - Confirm
multiqcruns and publishes:results/multiqc/multiqc_report.html
- Open the report and verify FastQC and STAR sections are present.
- Re-run with
-resumeand verify completed tasks are reused.
workflows/
├── bovreg-twas-preprocess/
│ ├── main.nf
│ ├── modules/
│ ├── nextflow.config
│ └── nextflow_schema.json
├── bovreg-twas-model/
│ ├── main.nf
│ ├── modules/
│ ├── scripts/
│ ├── nextflow.config
│ └── nextflow_schema.json
configs/ → Profiles for HPC or containerised execution
envs/ → Conda YAMLs for each module group
bin/ → Custom scripts (e.g., parse_sample_sheet.py)
Developed by Siddharth Jayaraman for the BovReg project.
Includes contributions from Roslin Institute pipelines and PredictDB modeling strategies.
MIT License. See LICENSE for full text.