Description of the bug
Hi,
I've been running the pipeline with the primary assembly FASTA and GTF reference files, and it fails during the UMI deduplication (dedup) step.
From tracing the work directories, it looks like the dedup step is preceded by splitting the BAM file by chromosome and contig. For this sample, contig KI270755 had no reads, so umi_tools received an empty BAM file for the dedup step and threw an error.
Is it possible to configure the pipeline to ignore empty contigs and carry on?
Many thanks,
Jack
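One possible way to "ignore empty contigs and carry on" at the shell level is to count the records in each per-contig BAM before calling umi_tools and pass the header-only BAM through unchanged when the count is zero. The sketch below is hypothetical: `dedup_or_skip` and its argument layout are illustrative, not part of nf-core/scnanoseq, and the output names simply mirror the failing command. Note that the skip path writes no log or stats files, so downstream steps that expect them may still need adjusting.

```bash
#!/usr/bin/env bash
# Hypothetical workaround sketch (not nf-core/scnanoseq code): skip umi_tools
# dedup when the per-contig BAM contains no alignment records.
set -euo pipefail

# Usage: dedup_or_skip <input.bam> <prefix> [extra umi_tools dedup args...]
dedup_or_skip() {
  local in_bam=$1 prefix=$2
  shift 2  # remaining arguments are forwarded to umi_tools dedup
  local n_reads
  # samtools view -c prints the number of alignment records (0 for an empty BAM)
  n_reads=$(samtools view -c "$in_bam")
  if [ "$n_reads" -eq 0 ]; then
    echo "No reads in ${in_bam}; skipping umi_tools dedup" >&2
    # Pass the header-only BAM through so later merge steps still find a file
    cp "$in_bam" "${prefix}.transcriptome.umi_dedup.bam"
    return 0
  fi
  PYTHONHASHSEED=0 umi_tools dedup \
    -I "$in_bam" \
    -S "${prefix}.transcriptome.umi_dedup.bam" \
    -L "${prefix}.transcriptome.umi_dedup.log" \
    --output-stats "${prefix}.transcriptome.umi_dedup" \
    "$@"
}
```

For example, `dedup_or_skip HF374.KI270755.sorted.bam HF374.KI270755 --paired --per-cell --random-seed=100` would reproduce the failing call for non-empty input and skip it for the empty KI270755 BAM.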
Command used and terminal output
Command run from a .sh file:
#!/bin/bash
module load Nextflow/24.04.2
#module load anaconda3/personal
#conda install -c bioconda nextflow
nextflow run nf-core/scnanoseq \
--input ./samplesheet.csv \
--outdir ./results \
--genome_fasta /rds/general/user/jw4225/projects/lms-ware-analysis/live/jack/hf_trans/ref/GRCh38.primary_assembly.genome.fa \
--transcript_fasta /rds/general/user/jw4225/projects/lms-ware-analysis/live/jack/hf_trans/ref/gencode.v48.transcripts.fa \
--gtf /rds/general/user/jw4225/projects/lms-ware-analysis/live/jack/hf_trans/ref/gencode.v48.primary_assembly.annotation.gtf \
--quantifier "isoquant,oarfish" \
--barcode_format 10X_3v3 \
-profile imperial \
-c /rds/general/user/jw4225/projects/lms-ware-analysis/live/jack/hf_trans/imperial.config \
-resume infallible_wiles
---
Error message:
ERROR ~ Error executing process > 'NFCORE_SCNANOSEQ:SCNANOSEQ:PROCESS_LONGREAD_SCRNA_TRANSCRIPT:DEDUP_UMIS:UMITOOLS_DEDUP (HF374.KI270755)'
Caused by:
Process `NFCORE_SCNANOSEQ:SCNANOSEQ:PROCESS_LONGREAD_SCRNA_TRANSCRIPT:DEDUP_UMIS:UMITOOLS_DEDUP (HF374.KI270755)` terminated with an error exit status (1)
Command executed:
PYTHONHASHSEED=0 umi_tools \
dedup \
-I HF374.KI270755.sorted.bam \
-S HF374.KI270755.transcriptome.umi_dedup.bam \
-L HF374.KI270755.transcriptome.umi_dedup.log \
--output-stats HF374.KI270755.transcriptome.umi_dedup \
--paired \
--per-cell --random-seed=100
cat <<-END_VERSIONS > versions.yml
"NFCORE_SCNANOSEQ:SCNANOSEQ:PROCESS_LONGREAD_SCRNA_TRANSCRIPT:DEDUP_UMIS:UMITOOLS_DEDUP":
umitools: $( umi_tools --version | sed '/version:/!d; s/.*: //' )
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Traceback (most recent call last):
File "/usr/local/bin/umi_tools", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/umi_tools/umi_tools.py", line 61, in main
module.main(sys.argv)
File "/usr/local/lib/python3.9/site-packages/umi_tools/dedup.py", line 310, in main
read_gn = umi_methods.random_read_generator(
File "/usr/local/lib/python3.9/site-packages/umi_tools/umi_methods.py", line 191, in __init__
self.fill()
File "/usr/local/lib/python3.9/site-packages/umi_tools/umi_methods.py", line 224, in fill
self.refill_random()
File "/usr/local/lib/python3.9/site-packages/umi_tools/umi_methods.py", line 195, in refill_random
self.random_umis = np.random.choice(
File "numpy/random/mtrand.pyx", line 951, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken
Work dir:
/rds/general/project/lms-ware-analysis/live/jack/hf_trans/work/4b/c4fe91ee74f94163954b19dbbff7fc
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`
-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
-- Check '.nextflow.log' file for details
-[nf-core/scnanoseq] Pipeline completed with errors-
WARN: Killing running tasks (27)
ERROR ~ Unexpected error [NullPointerException]
-- Check '.nextflow.log' file for details
Relevant files
For reference, these are the first few lines of the BAM file before it is split into contigs:
bash-4.4$ samtools view /rds/general/user/jw4225/projects/lms-ware-analysis/live/jack/hf_trans/work/bb/23102cf80cfdf38e79eafafc203293/HF374.tagged.bam | head
SRR32154426.10629689_CCAACTTTCAGCCTTC_AATATTCCACAT 256 ENST00000832824.1|ENSG00000290825.2|-|-|DDX11L16-260|DDX11L16|1379|lncRNA| 417 0 75S66=1I140=2D299=1D15=2I6=1X150=1X83=1X94=1X103=1066S * 0 0 * * NM:i:10 ms:i:1870 AS:i:1868 nn:i:0 tp:Z:S cm:i:146 s1:i:878 de:f:0.0083 MD:Z:206^AG299^C21T150T83T94G103 rl:i:517 CR:Z:CCAACTTTCAGCCTTC CY:Z:/132,++++1133345 UR:Z:AATATTCCACAT UY:Z:32/.--.025.. CB:Z:CCAATTTTCAGCCTTC
SRR32154426.13509812_CCACGAGGTCATCGCG_CACACATAATCC 256 ENST00000832824.1|ENSG00000290825.2|-|-|DDX11L16-260|DDX11L16|1379|lncRNA| 417 0 1206S206=2D45=1D14=1X106=2X3=1I127=1D21=1X19=2D2=2X183=1X1=1I24=1X198=1104S * 0 0 * * NM:i:16 ms:i:1828 AS:i:1826 nn:i:0 tp:Z:S cm:i:137 s1:i:841 de:f:0.0145 MD:Z:206^AG45^C14C106G0A130^C21T19^AA2T0G183A25T198 rl:i:805 CR:Z:CCACGAGGTCATCGCG CY:Z:0142.....1110.., UR:Z:CACACATAATCC UY:Z:,,./63/.-,,- CB:Z:CCACGAGGTCATCGCG
SRR32154426.43456201_TGACCCTCATAACAGA_TCCATGGTATAA 256 ENST00000832824.1|ENSG00000290825.2|-|-|DDX11L16-260|DDX11L16|1379|lncRNA| 994 0 245S6=2X144=1X3=59S * 0 0 * * NM:i:3 ms:i:294 AS:i:294 nn:i:0 tp:Z:S cm:i:21 s1:i:135 de:f:0.0192 MD:Z:6C0C144G3 rl:i:0 CR:Z:TGACCCTCATAACAGA CY:Z:3226322233544221 UR:Z:TCCATGGTATAA UY:Z:22340//..035 CB:Z:TGACCCTCATAACAGA
The BAM file after splitting is empty.
This is the code that produced the empty BAM files:
#!/usr/bin/env bash -Ceuo pipefail
samtools \
view \
--threads 5 \
\
\
\
-o HF374.KI270755.bam \
HF374.tagged.bam \
\
`cat KI270755.1.transcripts.txt`
cat <<-END_VERSIONS > versions.yml
"NFCORE_SCNANOSEQ:SCNANOSEQ:PROCESS_LONGREAD_SCRNA_TRANSCRIPT:DEDUP_UMIS:SAMTOOLS_VIEW_SPLIT":
samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
END_VERSIONS
Verbose log is attached.
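To see in advance which contigs will produce an empty split BAM, the same region lists passed to `samtools view` in the split step can be used to count records in the tagged BAM. This is a minimal diagnostic sketch under stated assumptions: the tagged BAM is coordinate-sorted and indexed (region queries need a .bai), each `<contig>.transcripts.txt` holds one reference name per line, and `report_empty_contigs` is a hypothetical helper name. It guards against an empty list, because `samtools view -c` with no regions would count the whole file instead.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic sketch: print the contigs whose transcript lists
# match zero records in the tagged BAM (these would yield empty split BAMs).
set -euo pipefail

# Usage: report_empty_contigs <tagged.bam> <contig.transcripts.txt>...
report_empty_contigs() {
  local bam=$1
  shift
  local list n
  local -a regions
  for list in "$@"; do
    # Assumption: one reference (transcript) name per line
    mapfile -t regions < "$list"
    if [ "${#regions[@]}" -eq 0 ]; then
      # No regions at all: report 0 rather than counting the whole BAM
      n=0
    else
      n=$(samtools view -c "$bam" "${regions[@]}")
    fi
    if [ "$n" -eq 0 ]; then
      # Strip the suffix to print just the contig name, e.g. KI270755.1
      echo "${list%.transcripts.txt}"
    fi
  done
}
```

Run as `report_empty_contigs HF374.tagged.bam *.transcripts.txt` from a directory containing the per-contig lists; KI270755.1 would be expected in the output for this sample.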
System information
Nextflow/24.04.2
Loaded via Apptainer and scheduled via PBS Pro