
ENH: refactor orient-seqs revcomp mode to use vsearch --fastx_revcomp #224

@nbokulich


Problem

The reverse-complement mode of the orient-seqs action (which reverse-complements sequences when no orientation reference is passed as input) uses scikit-bio to loop through a FASTA file and reverse-complement each sequence.

Using vsearch instead might be faster and require less maintenance.
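For reference, the per-sequence work in the current loop amounts to the following. This is a minimal pure-Python sketch, not the scikit-bio implementation: `revcomp`, `read_fasta`, and `revcomp_fasta` are hypothetical names, the code assumes uppercase IUPAC DNA codes, and it treats each FASTA record as a header plus a possibly multi-line sequence.

```python
from typing import Iterable, Iterator, Tuple

# Complement table for uppercase IUPAC DNA codes; gap characters map to themselves.
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVN-.",
    "TGCAYRSWMKVHDBN-.",
)


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase IUPAC DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def revcomp_fasta(lines: Iterable[str]) -> str:
    """Reverse-complement every record, one record at a time."""
    out = []
    for header, seq in read_fasta(lines):
        out.append(f">{header}\n{revcomp(seq)}\n")
    return "".join(out)
```

Every record passes through an interpreted Python loop, which is where a compiled tool such as vsearch could plausibly win on large inputs; whether it actually does is the open question raised under Question.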

Solution

These lines:

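```python
from q2_types.feature_data import DNAFASTAFormat, DNAIterator

# `sequences` is the action's input; reverse-complement each record with scikit-bio.
oriented = DNAFASTAFormat()
with oriented.open() as out_fasta:
    for seq in sequences.view(DNAIterator):
        seq.reverse_complement().write(out_fasta)
```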

should be replaced with a vsearch subprocess call, similar to this:

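```python
import subprocess

def _vsearch_revcomp_fastq(seqs_fp, out_fp):
    cmd = [
        'vsearch',
        '--fastx_revcomp', str(seqs_fp),
        '--fastqout', str(out_fp),
    ]
    # The original snippet was cut off here; subprocess.run is an assumed runner.
    subprocess.run(cmd, check=True)
```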

Note, however, that FASTA input needs --fastaout instead of --fastqout. The _vsearch_revcomp_fastq function could simply be refactored to make it more general, choosing the appropriate output option based on the input type.

Question

I think the main reason to do this would be speed, so it would be useful to check whether vsearch --fastx_revcomp is actually faster on large FASTA inputs.
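A rough timing harness for that check might look like this. It is a sketch under stated assumptions: the input is synthetic random ACGT data, `python_revcomp` is a pure-Python stand-in for the per-record loop (not the scikit-bio code path, and it assumes single-line sequences), and vsearch is only timed if it is found on `PATH`. All function names are hypothetical.

```python
import random
import shutil
import subprocess
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict


def write_random_fasta(path: Path, n_seqs: int, length: int, seed: int = 0) -> None:
    """Write n_seqs random single-line ACGT sequences (synthetic test data)."""
    rng = random.Random(seed)
    with open(path, "w") as fh:
        for i in range(n_seqs):
            seq = "".join(rng.choices("ACGT", k=length))
            fh.write(f">seq{i}\n{seq}\n")


def python_revcomp(in_fp: Path, out_fp: Path) -> None:
    """Pure-Python stand-in for the per-record loop; assumes one line per sequence."""
    table = str.maketrans("ACGTN", "TGCAN")
    with open(in_fp) as src, open(out_fp, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(line)
            else:
                dst.write(line.rstrip("\n").translate(table)[::-1] + "\n")


def vsearch_revcomp(in_fp: Path, out_fp: Path) -> None:
    """Reverse-complement with vsearch; its log output is captured, not printed."""
    subprocess.run(["vsearch", "--fastx_revcomp", str(in_fp),
                    "--fastaout", str(out_fp)],
                   check=True, capture_output=True)


def time_once(fn: Callable[[Path, Path], None], in_fp: Path, out_fp: Path) -> float:
    """Wall-clock seconds for a single call."""
    start = time.perf_counter()
    fn(in_fp, out_fp)
    return time.perf_counter() - start


def benchmark(n_seqs: int = 100_000, length: int = 1_500) -> Dict[str, float]:
    """Time each available implementation on the same synthetic FASTA."""
    methods = {"python": python_revcomp}
    if shutil.which("vsearch"):  # only benchmark vsearch if it is installed
        methods["vsearch"] = vsearch_revcomp
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        in_fp = Path(tmp) / "in.fasta"
        write_random_fasta(in_fp, n_seqs, length)
        for name, fn in methods.items():
            results[name] = time_once(fn, in_fp, Path(tmp) / f"{name}.fasta")
    return results


if __name__ == "__main__":
    for name, secs in benchmark().items():
        print(f"{name}: {secs:.2f} s")
```

For a comparison against the real code path, `python_revcomp` could be swapped for the existing scikit-bio loop. A single run on synthetic data is only indicative, and the vsearch timing includes process start-up, which matters more for small inputs than for large ones.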
