Description
Problem
The reverse complement mode of the orient-seqs action (which reverse complements all sequences when no orientation reference is passed as input) uses scikit-bio to loop through a FASTA file and reverse complement each sequence.
Using vsearch instead might be faster and require less maintenance.
Solution
These lines:
Lines 88 to 91 in 009611c
    oriented = DNAFASTAFormat()
    with oriented.open() as out_fasta:
        for seq in sequences.view(DNAIterator):
            seq.reverse_complement().write(out_fasta)
Should be replaced with a vsearch subprocess like this:
Lines 96 to 100 in 009611c
    def _vsearch_revcomp_fastq(seqs_fp, out_fp):
        cmd = [
            'vsearch',
            '--fastx_revcomp', str(seqs_fp),
            '--fastqout', str(out_fp),
Note, however, that --fastaout should be used instead of --fastqout for FASTA input. The _vsearch_revcomp_fastq function should simply be refactored into a more general helper that chooses the appropriate output format based on the input type.
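One possible shape for the generalized helper is sketched below. The names build_revcomp_cmd and vsearch_revcomp and the fastq keyword are hypothetical and not taken from the plugin. Only the vsearch options already mentioned in this issue (--fastx_revcomp, --fastaout, --fastqout) are used.

```python
import subprocess


def build_revcomp_cmd(seqs_fp, out_fp, fastq=False):
    """Build a vsearch command that reverse complements every input sequence.

    `fastq` is a hypothetical switch: True writes FASTQ output,
    False (the default) writes FASTA output.
    """
    out_flag = '--fastqout' if fastq else '--fastaout'
    return [
        'vsearch',
        '--fastx_revcomp', str(seqs_fp),
        out_flag, str(out_fp),
    ]


def vsearch_revcomp(seqs_fp, out_fp, fastq=False):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_revcomp_cmd(seqs_fp, out_fp, fastq), check=True)
```

Keeping command construction separate from execution means the option selection can be unit tested without vsearch being installed.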
Question
I think the main reason to do this would be speed, so it would be useful to check whether vsearch --fastx_revcomp is actually faster than the scikit-bio loop on large FASTA inputs.
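A rough way to check this is sketched below, assuming vsearch is on the PATH. The helper names (write_random_fasta, best_time, vsearch_revcomp) and the input size are illustrative choices, not part of the plugin. The existing scikit-bio loop can be timed by passing it to best_time in the same way.

```python
import random
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def write_random_fasta(path, n_seqs, seq_len, seed=0):
    """Write n_seqs random DNA records of length seq_len to a FASTA file."""
    rng = random.Random(seed)
    with open(path, 'w') as fh:
        for i in range(n_seqs):
            seq = ''.join(rng.choices('ACGT', k=seq_len))
            fh.write(f'>seq{i}\n{seq}\n')


def best_time(fn, repeats=3):
    """Call fn() `repeats` times and return the fastest wall-clock time (s)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def vsearch_revcomp(in_fp, out_fp):
    """Reverse complement a FASTA file with vsearch, writing FASTA output."""
    subprocess.run(
        ['vsearch', '--fastx_revcomp', str(in_fp), '--fastaout', str(out_fp)],
        check=True, capture_output=True)


if __name__ == '__main__':
    if shutil.which('vsearch') is None:
        print('vsearch not found on PATH; skipping benchmark')
    else:
        with tempfile.TemporaryDirectory() as tmp:
            in_fp = Path(tmp) / 'in.fasta'
            out_fp = Path(tmp) / 'out.fasta'
            # Input size is arbitrary; increase it to mimic real datasets.
            write_random_fasta(in_fp, n_seqs=50_000, seq_len=250)
            elapsed = best_time(lambda: vsearch_revcomp(in_fp, out_fp))
            print(f'vsearch --fastx_revcomp: {elapsed:.3f} s')
```

Taking the minimum over a few repeats reduces noise from disk caching and process startup, which matters when the two approaches are close.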