Pipeline fails at dedupe step, possibly due to contigs with no mapped reads. #89

@wufishy

Description of the bug

Hi,

I've been running the pipeline with the primary assembly FASTA and GTF reference files, and it errors out during the dedupe step.

From tracing the work directories, it seems that the dedupe step is preceded by splitting the BAM file by chromosome and contig. Contig KI270755 appears to have no reads for this sample, so umi_tools receives an empty BAM file for the dedupe step and throws an error.

Is it possible to configure the pipeline to ignore empty contigs and carry on?
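
A minimal sketch of such a guard, to make the request concrete. It assumes samtools and umi_tools are on the PATH; the function names `count_reads`, `run_dedup` and `dedup_if_nonempty` are hypothetical and not part of nf-core/scnanoseq. It only shows the idea of counting records first and skipping the dedupe call when the count is zero. Whether downstream steps tolerate a missing output file is a separate question.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not pipeline code: skip dedup when the split BAM is empty.

# Print the number of alignment records in a BAM (samtools view -c counts them).
count_reads() {
    samtools view -c "$1"
}

# Run the dedup step with a subset of the options shown in the failing command.
# $1 = input BAM, $2 = output BAM
run_dedup() {
    umi_tools dedup -I "$1" -S "$2" --per-cell --random-seed=100
}

# Call run_dedup only if the input BAM contains at least one record.
# $1 = input BAM, $2 = output BAM
dedup_if_nonempty() {
    local in_bam="$1" out_bam="$2" n
    n="$(count_reads "$in_bam")" || return 1
    if [ "$n" -eq 0 ]; then
        echo "skip: $in_bam has no reads"
        return 0
    fi
    run_dedup "$in_bam" "$out_bam"
}
```

For the failing task this would be called as `dedup_if_nonempty HF374.KI270755.sorted.bam HF374.KI270755.transcriptome.umi_dedup.bam`.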

Many thanks,
Jack

Command used and terminal output

Command run from a .sh file:

#!/bin/bash

module load Nextflow/24.04.2

#module load anaconda3/personal
#conda install -c bioconda nextflow

nextflow run nf-core/scnanoseq \
  --input ./samplesheet.csv \
  --outdir ./results \
  --genome_fasta /rds/general/user/jw4225/projects/lms-ware-analysis/live/jack/hf_trans/ref/GRCh38.primary_assembly.genome.fa \
  --transcript_fasta /rds/general/user/jw4225/projects/lms-ware-analysis/live/jack/hf_trans/ref/gencode.v48.transcripts.fa \
  --gtf /rds/general/user/jw4225/projects/lms-ware-analysis/live/jack/hf_trans/ref/gencode.v48.primary_assembly.annotation.gtf \
  --quantifier "isoquant,oarfish" \
  --barcode_format 10X_3v3 \
  -profile imperial \
  -c /rds/general/user/jw4225/projects/lms-ware-analysis/live/jack/hf_trans/imperial.config \
  -resume infallible_wiles

---

Error message:

ERROR ~ Error executing process > 'NFCORE_SCNANOSEQ:SCNANOSEQ:PROCESS_LONGREAD_SCRNA_TRANSCRIPT:DEDUP_UMIS:UMITOOLS_DEDUP (HF374.KI270755)'                                                                  

Caused by:
  Process `NFCORE_SCNANOSEQ:SCNANOSEQ:PROCESS_LONGREAD_SCRNA_TRANSCRIPT:DEDUP_UMIS:UMITOOLS_DEDUP (HF374.KI270755)` terminated with an error exit status (1)                                                 


Command executed:

  PYTHONHASHSEED=0 umi_tools \
      dedup \
      -I HF374.KI270755.sorted.bam \
      -S HF374.KI270755.transcriptome.umi_dedup.bam \
      -L HF374.KI270755.transcriptome.umi_dedup.log \
      --output-stats HF374.KI270755.transcriptome.umi_dedup \
      --paired \
      --per-cell --random-seed=100
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SCNANOSEQ:SCNANOSEQ:PROCESS_LONGREAD_SCRNA_TRANSCRIPT:DEDUP_UMIS:UMITOOLS_DEDUP":
      umitools: $( umi_tools --version | sed '/version:/!d; s/.*: //' )
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred                                                                                       
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred                                                                                                     
  Traceback (most recent call last):
    File "/usr/local/bin/umi_tools", line 11, in <module>
      sys.exit(main())
    File "/usr/local/lib/python3.9/site-packages/umi_tools/umi_tools.py", line 61, in main
      module.main(sys.argv)
    File "/usr/local/lib/python3.9/site-packages/umi_tools/dedup.py", line 310, in main
      read_gn = umi_methods.random_read_generator(
    File "/usr/local/lib/python3.9/site-packages/umi_tools/umi_methods.py", line 191, in __init__
      self.fill()
    File "/usr/local/lib/python3.9/site-packages/umi_tools/umi_methods.py", line 224, in fill
      self.refill_random()
    File "/usr/local/lib/python3.9/site-packages/umi_tools/umi_methods.py", line 195, in refill_random
      self.random_umis = np.random.choice(
    File "numpy/random/mtrand.pyx", line 951, in numpy.random.mtrand.RandomState.choice
  ValueError: 'a' cannot be empty unless no samples are taken

Work dir:
  /rds/general/project/lms-ware-analysis/live/jack/hf_trans/work/4b/c4fe91ee74f94163954b19dbbff7fc

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`                                                                              

 -- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting                                                                                                   

 -- Check '.nextflow.log' file for details
-[nf-core/scnanoseq] Pipeline completed with errors-
WARN: Killing running tasks (27)
ERROR ~ Unexpected error [NullPointerException]

 -- Check '.nextflow.log' file for details

Relevant files

For reference, these are the first few lines of the BAM file before it is split into contigs:

bash-4.4$ samtools view /rds/general/user/jw4225/projects/lms-ware-analysis/live/jack/hf_trans/work/bb/23102cf80cfdf38e79eafafc203293/HF374.tagged.bam | head
SRR32154426.10629689_CCAACTTTCAGCCTTC_AATATTCCACAT      256     ENST00000832824.1|ENSG00000290825.2|-|-|DDX11L16-260|DDX11L16|1379|lncRNA|      417     0       75S66=1I140=2D299=1D15=2I6=1X150=1X83=1X94=1X103=1066S        *       0       0       *       *       NM:i:10 ms:i:1870       AS:i:1868       nn:i:0  tp:Z:S  cm:i:146        s1:i:878        de:f:0.0083     MD:Z:206^AG299^C21T150T83T94G103        rl:i:517      CR:Z:CCAACTTTCAGCCTTC   CY:Z:/132,++++1133345   UR:Z:AATATTCCACAT       UY:Z:32/.--.025..       CB:Z:CCAATTTTCAGCCTTC
SRR32154426.13509812_CCACGAGGTCATCGCG_CACACATAATCC      256     ENST00000832824.1|ENSG00000290825.2|-|-|DDX11L16-260|DDX11L16|1379|lncRNA|      417     0       1206S206=2D45=1D14=1X106=2X3=1I127=1D21=1X19=2D2=2X183=1X1=1I24=1X198=1104S   *       0       0       *       *       NM:i:16 ms:i:1828       AS:i:1826       nn:i:0  tp:Z:S  cm:i:137        s1:i:841        de:f:0.0145     MD:Z:206^AG45^C14C106G0A130^C21T19^AA2T0G183A25T198   rl:i:805        CR:Z:CCACGAGGTCATCGCG   CY:Z:0142.....1110..,   UR:Z:CACACATAATCC       UY:Z:,,./63/.-,,-       CB:Z:CCACGAGGTCATCGCG
SRR32154426.43456201_TGACCCTCATAACAGA_TCCATGGTATAA      256     ENST00000832824.1|ENSG00000290825.2|-|-|DDX11L16-260|DDX11L16|1379|lncRNA|      994     0       245S6=2X144=1X3=59S     *       0       0    **       NM:i:3  ms:i:294        AS:i:294        nn:i:0  tp:Z:S  cm:i:21 s1:i:135        de:f:0.0192     MD:Z:6C0C144G3  rl:i:0  CR:Z:TGACCCTCATAACAGA   CY:Z:3226322233544221   UR:Z:TCCATGGTATAA       UY:Z:22340//..035     CB:Z:TGACCCTCATAACAGA

After splitting, the BAM file for contig KI270755 is empty.
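
One way to check which per-contig BAM files are empty is sketched below. It assumes samtools is on the PATH; `bam_read_count` and `list_empty_bams` are hypothetical helper names, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Illustrative helpers, not part of nf-core/scnanoseq.

# Number of alignment records in one BAM file.
bam_read_count() {
    samtools view -c "$1"
}

# Print the path of every BAM given as an argument that contains zero records.
list_empty_bams() {
    local bam
    for bam in "$@"; do
        if [ "$(bam_read_count "$bam")" -eq 0 ]; then
            echo "$bam"
        fi
    done
}
```

Run inside the work directory of the split step, for example as `list_empty_bams HF374.*.bam`, this prints one path per empty file.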

This is the code that produced the empty BAM files:

#!/usr/bin/env bash -Ceuo pipefail
samtools \
    view \
    --threads 5 \
     \
     \
     \
    -o HF374.KI270755.bam \
    HF374.tagged.bam \
     \
    `cat KI270755.1.transcripts.txt`

cat <<-END_VERSIONS > versions.yml
"NFCORE_SCNANOSEQ:SCNANOSEQ:PROCESS_LONGREAD_SCRNA_TRANSCRIPT:DEDUP_UMIS:SAMTOOLS_VIEW_SPLIT":
    samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
END_VERSIONS

Verbose log is attached.

nextflow.log

System information

Nextflow/24.04.2

Loaded via Apptainer and scheduled via PBS Pro


Labels: bug (Something isn't working)
