
Commit ad4e8d9

Colelyman, Snicker7, mbowcut2, trevormartinj7, kclem authored
Sam/tests rebase (#149) (#620)
* Bug Fix - 367 (#35)
* - Fixed references to ref_names_for_pe
* removed extra tabs
* trying to match empty line, no tabs
* - changed references to ref_names[0]
* Mckay/pd warnings (#45)
* refactor errors='ignore' to try except
* refactored integer slice to iloc[]
* moved to_numeric try except to function
* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns
  This change is slightly cleaner because it addresses the root issue that some columns are strings (and therefore cannot be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up.
* Add documentation to to_numeric_ignore_columns
---------
---------
* GitHub actions integration tests (#48)
* GitHub actions clean (#40)
* Create pytest.yml
* Create pylint.yml
* Create .pylintrc
* Create test_env.yml
* Full path
* Remove conda install
* Replace path
* Pytest tests
* pip -e
* Create integration_tests.yml
* Simplify name
* CRISPRESSO2_DIR environment variable
* Up one dir
* ls workspace
* Install CRISPResso and ydiff
* Clone repo instead of checkout
* submodule
* ls
* CRISPResso2_copy
* ls
* Update env
* Simplify
* Pull from githubactions branch
* Pull githubactions repo
* Checkout githubactions
* Mckay/pd warnings (#45)
* refactor errors='ignore' to try except
* refactored integer slice to iloc[]
* moved to_numeric try except to function
* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns
  This change is slightly cleaner because it addresses the root issue that some columns are strings (and therefore cannot be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up.
* Add documentation to to_numeric_ignore_columns
---------
* Run tests individually
* Pin plotly version
* Run all tests even if one fails
* Test on another branch
* Switch branch with token
* Update integration_tests.yml
* Introduce pandas sorting in CRISPRessoCompare (#47)
* New makefile commands
* Fix interleaved fastq input in CRISPRessoPooled and suppress CRISPRessoWGS params (#42)
* Extract out split_interleaved_fastq function to CRISPRessoShared
* Implement splitting interleaved fastq files in CRISPRessoPooled
* Suppress split_interleaved_input from CRISPRessoWGS parameters
* Suppress other parameters in CRISPRessoWGS
* Move where interleaved fastq files are split to be trimmed properly
* Bug Fix - 367 (#35)
* - Fixed references to ref_names_for_pe
* removed extra tabs
* trying to match empty line, no tabs
* - changed references to ref_names[0]
* Mckay/pd warnings (#45)
* refactor errors='ignore' to try except
* refactored integer slice to iloc[]
* moved to_numeric try except to function
* Refactor to_numeric_ignore_errors to to_numeric_ignore_columns
  This change is slightly cleaner because it addresses the root issue that some columns are strings (and therefore cannot be converted to numeric types). Now if an error does occur when converting the dfs to numeric types it won't be swallowed up.
* Add documentation to to_numeric_ignore_columns
---------
---------
* On push no branches
* On push no branches
* All in one file
* Fix yml errors
* Rename jobs
* Remove old workflow files
* Remove paths
* Run jobs in parallel
---------
* VCF Output (#128)
* vcf file writing initial concept
* splitting vcf into testable functions, adding tests
* adding unit tests, adding nullcontext to vcf writing file
* fixed first insertion bug
* Mckay/base edit plot (#119)
* Cole/plot fixes (#121)
* fix setting of 99%ile in negative direction (deletions)
* need another break
* When reading CRISPRessoPooled amplicon file, only skip lines if they start with the comment character (#)
* Implement new overwrite_crispresso_options in CRISPRessoPooled to only add non-default commands when propagating
* In CRISPRessoPooled implement multiplexing for sub-runs and create a run_name that is filename-safe and separate from a display name.
* Add arg for display_name
* Change display name name to Display Name
* CRISPRessoWGS fix unpickleable partial error
* Move complete message to after crispresso cup to allow for json parsing of status.json
* Update CRISPRessoWGS to allow for run names
* Messages to users about counting reads in input (sometimes this takes a while for large samples)
* Cast guardrail values as ints because division casts them as floats
* Remove invalid escape of _ when writing to JSON status file
* Fix documentation for `CRISPRessoShared.check_if_failed` and remove extraneous whitespace
* Point to updated test branch
* Fix Cython bug
* Read version from toml file
* Fix import error
* update version
* Update integration_tests.yml
  Remove Native Merge test temporarily
* Pin fastp version
* Run native merge test and point tests to master
---------
* Additional plot fixes (#122)
* fix setting of 99%ile in negative direction (deletions)
* need another break
* When reading CRISPRessoPooled amplicon file, only skip lines if they start with the comment character (#)
* Implement new overwrite_crispresso_options in CRISPRessoPooled to only add non-default commands when propagating
* In CRISPRessoPooled implement multiplexing for sub-runs and create a run_name that is filename-safe and separate from a display name.
* Add arg for display_name
* Change display name name to Display Name
* CRISPRessoWGS fix unpickleable partial error
* Move complete message to after crispresso cup to allow for json parsing of status.json
* Update CRISPRessoWGS to allow for run names
* Messages to users about counting reads in input (sometimes this takes a while for large samples)
* Cast guardrail values as ints because division casts them as floats
* Remove invalid escape of _ when writing to JSON status file
* Fix documentation for `CRISPRessoShared.check_if_failed` and remove extraneous whitespace
* Point to updated test branch
* Fix Cython bug
* Read version from toml file
* Move printing header info to after console log level setting in main function
* Add import for importlib.metadata to CRISPRessoShared.py
* Standardize intermediate file names for CRISPRessoPooled info and fastq files
* CRISPRessoPooled avoid gzipping nonexistent files
* Remove warning if no config file
* Allow none for custom_colors in CRISPRessoPlot
* read version from toml file
* Make fig_filename_root default to None, in which case the figure is shown interactively - e.g. in a jupyter notebook
* Print tool description after logging level has been set
* update testRelease.sh script
* Don't rerun if the 'verbosity' value has changed.
* CRISPResso core will not rerun if there are changes in the 'debug', 'n_processes', and 'verbosity' arguments
* Verbosity levels <=1 are set to 1, >= 4 are set to 4.
* update for future pd.read_json deprecation
* Fix import error
* update version
* Update integration_tests.yml
  Remove Native Merge test temporarily
* Pin fastp version
* Run native merge test and point tests to master
* Point tests to cole/plot_fixes branch
* Point tests back to master
---------
* - added Kendall's code
  - added args
  - added fig 10i to report
  - added tests for data prep
* added upsetplot to Dockerfile
* added save_png to plot args
* removed unused args
* remove unused variable
* cole's comments
* removed unused function
* comments
* comments
* updated test_be_df with UNMODIFIED row
* remove comment
* removed extra args fixed name
* Update help string for `--base_editor_consider_changes_outside_qw`
* Point tests back to master
---------
* further vcf tweaks/work
* vcf dynamic alt map and vcf line generation with unit testing
* updating tests
* Add test case for second element deletion
* Remove unnecessary imports and add whitespace
* Add mini integration tests for alt_map and vcf_line
* Update unit tests to account for bug in find_indels_substitutions
* Add failing test case for insertion and deletion occurring at the same position
  The alt map incorrectly separates the deletion and insertion to different positions.
* Cast dict_keys to be a list
* Fix test case to reflect correct position of deletion
* Add checks to the VCF file
* Add upsetplot as a dependency in setup.py (#130)
* Fix position of deletion in test case
* Fix off by one error for deletions, update tests and fix deletion at start
  It turns out that when there is a deletion at the start of the sequence, the correct way to handle it is to provide the base immediately after the deletion.
  Source: https://bioinformatics.stackexchange.com/questions/2476/how-to-represent-a-deletion-at-position-1-in-a-vcf-file
* Update tests to convince myself that insertion and deletion handling is working
* Remove duplicate writing of allele frequency table
* Refactor `get_allele_row` and make vcf_output not dependent on write_detailed_allele_table
* Extract out construction of df_alleles in unit tests
* Write test to illustrate bug when you have an insertion then a deletion
* Refactor unit tests to use create_df_alleles
* Fix test_build_alt_map_mixed test
* Refactor create_df_alleles to support number of reads
  This fixes test_build_alt_map_substitutions
* Fix test_build_alt_map_insertions test
* Allow amplicon name to be passed to create_df_alleles
* Fix test_upsert_edit_del_and_ins
* Fix deletion at start of amplicon
* Fix bug when there is a deletion starting at the second position
  This bug only happens when a deletion starts at the second position; before the fix, it would report that the deletion started at the first position. It is fixed now, so deletions at the second position are reported correctly.
* Fix test to reflect full deletion
* Add tests for find_indels_substitutions for deletions at the end
* Fix 1bp deletions at the end, and off by one error
  This ensures that when a deletion occurs at the end of a read, the entire deletion is accounted for.
* Fix for representing deletions at the end of a sequence
* Fix bug where deletions that extend to the end of a sequence fail
* Remove qwc_indexes
* Properly implement handling of deletions that start at the beginning of a sequence
* Properly account for multiple delete_start events
* Update test to reflect proper ref_positions and other attributes
* Fix QWC inference across amplicons (#137)
* Mckay/be plot improvements (#136)
* trying to get the figure to fit nicely, increased element size to 100
* custom figsize to display without cutting off; increased figsize in report template
* Allow messages to be served in CLI reports (#134) (#583)
* Fix deletion at second position (#131)
* Fix bug when there is a deletion starting at the second position
  This bug only happens when a deletion starts at the second position; before the fix, it would report that the deletion started at the first position. It is fixed now, so deletions at the second position are reported correctly.
* Update CRISPRessoCOREResources.c due to change in .pyx
* Add tests for find_indels_substitutions for deletions at the end
* Fix 1bp deletions at the end, and off by one error
  This ensures that when a deletion occurs at the end of a read, the entire deletion is accounted for.
* Update CRISPRessoCOREResources.c to reflect fixes for deletions at the end of alignments
* Add extra asserts to deletion checks
* Point to new test branch
* Refactor deletion_coordinates to go past the end of the string for deletions at the end of the sequence
* Point tests to master
* Allow messages to be served in CLI reports
* Point to cole/messages test branch
* Point tests back to master
* point to tests branch
* typo
* testing github actions
* remove test
* point tests to master
---------
* Update inferred QWC tests to reflect correct intended behavior
* Fix inferring QWC to match intended behavior
* Add more test cases and fix bug discovered in single bp QWC
* Add even more test cases testing indels outside the QWC
* Point tests to cole/fix-qwc-deletion
* update plotly.js (#138)
* Change order of amplicon inference alignment so that 1st amplicon is the reference
  This makes a difference because it changes the values of `s1inds`, and therefore the value of the inferred quantification window coordinates.
* Point integration tests back to master
* Update CHANGELOG.md
---------
* Fix a bug for `--bam_output` when there are unaligned reads (#144)
* Fix BAM output when there are unaligned reads
* Point tests to cole/unaligned-reads
* Update CHANGELOG.md
* Fix typo in tests branch name
* Point tests back to master
* Add VCF testing and verification design document
  Documents the strategy for integration testing the --vcf_output feature using golden file comparison with syn-gen synthetic data.
* Change name of output VCF file
* Point tests to vcf-parameters
* Add writers module refactor design document
  Documents plan to move VCF output code from CRISPRessoUtilities.py to a new writers/ module structure.
* Refactor VCF code into writers/ module
  Move CRISPRessoUtilities.py to writers/vcf.py to establish a pattern for organizing output writers. Update imports in CRISPRessoCORE.py and relocate tests to test_writers/test_vcf/.
* Remove the FORMAT column from VCF and add the contig length to VCF headers
* Initial VCF simplification writing each edit on its own line
* Further simplification of VCF output code
* Left align deletions for VCF output
* Left-normalize insertions for VCF output
* Update CLAUDE.md with design_docs
* Update GitHub Actions with Pooled Prime Editing and VCF basic and VCF Prime Edit Basic
* Update integration tests to run in parallel
* Add VCF path to crispresso2_info
* Point the tests back to cole/vcf-parameters
* Always save the conda env cache in GitHub Actions
* Debug: add set +e and per-target error reporting in integration tests
  setup-miniconda@v3 injects 'set -eo pipefail' into ~/.profile, which is sourced by bash -l. This causes make commands to fail silently. Adding set +e at script start and explicit per-target error annotations.
* Attempt to fix conda cache in GitHub Actions
* Update CHANGELOG.md and parameter descriptions
* Remove unused nullcontext import
* Remove extra blank lines after amplicon_coordinates validation
* Document that --vcf_output implies --write_detailed_allele_table
* Use max amplicon length for VCF contig headers and warn on collision
  When multiple amplicons map to the same chromosome, the contig header now uses the max amplicon length instead of silently overwriting with the last one seen. A warning is logged when this occurs.
* Fix VCF POS off-by-one for start-of-amplicon deletions when pos > 1
  For deletions starting at position 0 of the amplicon, the VCF POS was computed as pos-1 instead of pos. The max(1,...) clamp accidentally masked this when pos=1 (all existing tests). With pos>1, VCF POS pointed one base before the amplicon. Fix: when start==0, set left_index=pos directly instead of pos+start-1.
* Fix multi-insertion bug: use alignment index to extract inserted bases
  _edits_from_insertions treated the second element of insertion_coordinates as an alignment index, but it is actually a reference position. These coincide for single-insertion reads but diverge when prior insertions shift alignment indices. Fix: convert right_anchor_ref_pos to an alignment index via ref_positions.index(), then extract the size characters before it. Also:
  - Rename misleading variables (right_anchor_ref_pos was actually left anchor, aligned_start was actually right anchor ref pos)
  - Add comment on defensive ref_len branch explaining it is currently unreachable
  - Update LEFT_NORMALIZATION.md to correctly describe the insertion_coordinates format
* Add clarifying example to help text of --amplicon_coordinates
* Remove planning documents
* Point tests back to master
* Update PR number in CHANGELOG.md
* Cache the conda environment for pytest as well
---------
* Tests file
* Rebased tests
* Rebase of new unit tests, improve coverage
* Fix version assertion
* Cole comment fixes
* Fix tests
* Add multiprocess tests
* Add discovery of Python packages (#151)
* Fix Docker entrypoint
* Attempt to fix Circle CI pip install
* Run each CRISPResso command in the conda environment
* Update `--base_editor_output` parameter name for Circle CI
* Update tests Makefile and batch expected output
* Fix QWC inference across amplicons (#137)
* Mckay/be plot improvements (#136)
* trying to get the figure to fit nicely, increased element size to 100
* custom figsize to display without cutting off; increased figsize in report template
* Allow messages to be served in CLI reports (#134) (#583)
* Fix deletion at second position (#131)
* Fix bug when there is a deletion starting at the second position
  This bug only happens when a deletion starts at the second position; before the fix, it would report that the deletion started at the first position. It is fixed now, so deletions at the second position are reported correctly.
* Update CRISPRessoCOREResources.c due to change in .pyx
* Add tests for find_indels_substitutions for deletions at the end
* Fix 1bp deletions at the end, and off by one error
  This ensures that when a deletion occurs at the end of a read, the entire deletion is accounted for.
* Update CRISPRessoCOREResources.c to reflect fixes for deletions at the end of alignments
* Add extra asserts to deletion checks
* Point to new test branch
* Refactor deletion_coordinates to go past the end of the string for deletions at the end of the sequence
* Point tests to master
* Allow messages to be served in CLI reports
* Point to cole/messages test branch
* Point tests back to master
* point to tests branch
* typo
* testing github actions
* remove test
* point tests to master
---------
* Update inferred QWC tests to reflect correct intended behavior
* Fix inferring QWC to match intended behavior
* Add more test cases and fix bug discovered in single bp QWC
* Add even more test cases testing indels outside the QWC
* Point tests to cole/fix-qwc-deletion
* update plotly.js (#138)
* Change order of amplicon inference alignment so that 1st amplicon is the reference
  This makes a difference because it changes the values of `s1inds`, and therefore the value of the inferred quantification window coordinates.
* Point integration tests back to master
* Update CHANGELOG.md
---------
* Update setup.py and pyproject.toml to find all modules and packages
* Convert pyproject.toml to Unix line endings
---------
* Update tests to not use length
* Fix
* Pixi implementation (#148)
* Initial implementation of pixi
* Add .pixi to .dockerignore
* Fix Dockerfile
* Update Dockerfile to be multistage
* Strip down setup.py to reduce redundant metadata
* Unpin numpy, matplotlib, and pandas.
  Remove tbb and pyparsing
* fix: add locked: false to setup-pixi since pixi.lock is not committed
* fix: disable pixi cache since pixi.lock is not committed
* feat: add pixi environment caching using actions/cache keyed on pixi.toml hash
* Point the integration tests to master
* fix: checkout CRISPResso2_tests to separate path to preserve pixi workspace
* fix: checkout CRISPResso2_tests under workspace, set CRISPRESSO2_DIR
* fix: pin python <3.13 to avoid compatibility issues with newer python
* remove accidentally committed test output
* Add discovery of Python packages (#151)
* Fix Docker entrypoint
* Attempt to fix Circle CI pip install
* Run each CRISPResso command in the conda environment
* Update `--base_editor_output` parameter name for Circle CI
* Update tests Makefile and batch expected output
* Fix QWC inference across amplicons (#137)
* Mckay/be plot improvements (#136)
* trying to get the figure to fit nicely, increased element size to 100
* custom figsize to display without cutting off; increased figsize in report template
* Allow messages to be served in CLI reports (#134) (#583)
* Fix deletion at second position (#131)
* Fix bug when there is a deletion starting at the second position
  This bug only happens when a deletion starts at the second position; before the fix, it would report that the deletion started at the first position. It is fixed now, so deletions at the second position are reported correctly.
* Update CRISPRessoCOREResources.c due to change in .pyx
* Add tests for find_indels_substitutions for deletions at the end
* Fix 1bp deletions at the end, and off by one error
  This ensures that when a deletion occurs at the end of a read, the entire deletion is accounted for.
* Update CRISPRessoCOREResources.c to reflect fixes for deletions at the end of alignments
* Add extra asserts to deletion checks
* Point to new test branch
* Refactor deletion_coordinates to go past the end of the string for deletions at the end of the sequence
* Point tests to master
* Allow messages to be served in CLI reports
* Point to cole/messages test branch
* Point tests back to master
* point to tests branch
* typo
* testing github actions
* remove test
* point tests to master
---------
* Update inferred QWC tests to reflect correct intended behavior
* Fix inferring QWC to match intended behavior
* Add more test cases and fix bug discovered in single bp QWC
* Add even more test cases testing indels outside the QWC
* Point tests to cole/fix-qwc-deletion
* update plotly.js (#138)
* Change order of amplicon inference alignment so that 1st amplicon is the reference
  This makes a difference because it changes the values of `s1inds`, and therefore the value of the inferred quantification window coordinates.
* Point integration tests back to master
* Update CHANGELOG.md
---------
* Update setup.py and pyproject.toml to find all modules and packages
* Convert pyproject.toml to Unix line endings
---------
* Remove .pi TODO's
* pin matplotlib
* pin pandas
* pin matplotlib
* pin numpy
* Update CHANGELOG.md
---------
* Update `np.tostring()` to `np.tobytes()` which is compatible with numpy 1.x and 2.x
---------

Co-authored-by: Samuel Nichols <Snic9004@gmail.com>
Co-authored-by: mbowcut2 <55161542+mbowcut2@users.noreply.github.com>
Co-authored-by: Trevor Martin <trevormartinj7@gmail.com>
Co-authored-by: kclem <k.clement.dev@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: mbowcut2 <mbowcut@gmail.com>
1 parent 7f3bcde commit ad4e8d9

17 files changed: +5930 additions, -452 deletions
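Several items in the commit message concern how deletions are written to VCF: deletions are left-aligned, and a deletion at the very start of an amplicon is anchored on the base after the event (POS stays at the amplicon start) because no preceding base is available in the amplicon sequence. The sketch below illustrates those two rules on a plain reference string. It is a hypothetical helper written for illustration, not the `writers/vcf.py` code from this commit; the `pos_offset` parameter (the 1-based contig coordinate of the amplicon's first base) is an assumption.

```python
def left_align_deletion(ref: str, start: int, length: int) -> int:
    """Shift a deletion of ref[start:start + length] (0-based) as far left as possible.

    Moving the deletion one base to the left leaves the resulting sequence unchanged
    whenever the base just before it equals the last deleted base.
    """
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


def vcf_deletion(ref: str, start: int, length: int, pos_offset: int = 1):
    """Return (POS, REF, ALT) for deleting ref[start:start + length].

    pos_offset is a hypothetical parameter: the 1-based contig coordinate of ref[0].
    """
    if length < 1 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie inside the reference")
    start = left_align_deletion(ref, start, length)
    if start > 0:
        # Usual case: pad with the base immediately before the deletion.
        anchor = start - 1
        return pos_offset + anchor, ref[anchor:start + length], ref[anchor]
    if length == len(ref):
        raise ValueError("no base is left to anchor a deletion of the whole sequence")
    # Deletion at the first base: pad with the base immediately after it instead.
    return pos_offset, ref[:length + 1], ref[length]


print(vcf_deletion("ACGTTTGA", 5, 1))  # -> (3, 'GT', 'G'): one T of the TTT run, left-aligned
print(vcf_deletion("ACGT", 0, 2))      # -> (1, 'ACG', 'G'): anchored on the following base
```

Left-aligning before choosing the anchor means every equivalent placement of a deletion inside a repeat produces the same VCF record, which keeps golden-file comparisons stable.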

CRISPResso2/CRISPRessoCORE.py

Lines changed: 1 addition & 1 deletion
@@ -2562,7 +2562,7 @@ def normalize_name(name, fastq_r1, fastq_r2, bam_input):
 return '%s_%s' % (get_name_from_fasta(fastq_r1), get_name_from_fasta(fastq_r2))
 elif fastq_r1:
 return '%s' % get_name_from_fasta(fastq_r1)
-elif bam_input != '':
+elif bam_input is not None and bam_input != '':
 return '%s' % get_name_from_bam(bam_input)
 else:
 clean_name=CRISPRessoShared.slugify(name)
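The `normalize_name` change guards against `bam_input` being `None`: `None != ''` evaluates to `True`, so the old condition would take the BAM branch and call `get_name_from_bam(None)` whenever no FASTQ name was available. A minimal sketch of the branch order, simplified to one FASTQ argument; `pick_name_source` and its `guard_none` flag are hypothetical and only the conditions mirror the diff.

```python
from typing import Optional


def pick_name_source(fastq_r1: Optional[str], bam_input: Optional[str], guard_none: bool = True) -> str:
    """Report which input a run name would be derived from (hypothetical helper).

    guard_none=False reproduces the pre-commit condition `bam_input != ''`.
    """
    if fastq_r1:
        return "fastq_r1"
    if guard_none:
        bam_given = bam_input is not None and bam_input != ''
    else:
        bam_given = bam_input != ''  # True for None, which was the bug
    if bam_given:
        return "bam_input"
    return "name"


print(pick_name_source(None, None))                    # name
print(pick_name_source(None, None, guard_none=False))  # bam_input
```

For a value that is always either `None` or a string, `bool(bam_input)` expresses the same check as the new condition.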

CRISPResso2/CRISPRessoMultiProcessing.py

Lines changed: 0 additions & 3 deletions
@@ -141,7 +141,6 @@ def run_crispresso_cmds(crispresso_cmds, n_processes="1", descriptor = 'region',
 pool.terminate()
 logger.warn('Caught SIGINT. Program Terminated')
 raise Exception('CRISPResso2 Terminated')
-exit (0)
 except Exception as e:
 print('CRISPResso2 failed')
 raise e
@@ -196,7 +195,6 @@ def input_function_chunk(df):
 pool.terminate()
 logging.warn('Caught SIGINT. Program Terminated')
 raise Exception('CRISPResso2 Terminated')
-exit (0)
 except Exception as e:
 print('CRISPResso2 failed')
 raise e
@@ -278,7 +276,6 @@ def run_parallel_commands(commands_arr, n_processes=1, descriptor='CRISPResso2',
 pool.terminate()
 logging.warn('Caught SIGINT. Program Terminated')
 raise Exception('CRISPResso2 Terminated')
-exit (0)
 except Exception as e:
 print('CRISPResso2 failed')
 raise e
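Each deleted `exit (0)` sat directly after a `raise` in the same block, so it could never execute and removing it does not change behavior. The sketch below shows the same control flow under two assumptions: that the surrounding handler catches `KeyboardInterrupt` (how Python surfaces SIGINT by default), and that a `terminate` callback stands in for `pool.terminate()`. `run_with_interrupt_guard` is a hypothetical name.

```python
def run_with_interrupt_guard(task, terminate):
    """Run task(); on Ctrl-C call terminate() and re-raise as a generic Exception.

    Hypothetical helper mirroring the control flow around pool.terminate().
    """
    try:
        return task()
    except KeyboardInterrupt:
        terminate()
        raise Exception('CRISPResso2 Terminated')
        # Nothing placed after the raise above can run, which is why a trailing
        # exit(0) here would be dead code.


def interrupted_task():
    # Simulates the user pressing Ctrl-C while work is in progress.
    raise KeyboardInterrupt
```

Converting the interrupt into an ordinary `Exception` lets callers that only catch `Exception` still see that the run was terminated.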

CRISPResso2/CRISPRessoPlot.py

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ def get_nuc_color(nuc, alpha):
 charSum += thisval
 charSum = (charSum/len(nuc))/90.0
 
-return (charSum, (1-charSum), (2*charSum*(1-charSum)))
+return (charSum, (1-charSum), (2*charSum*(1-charSum)), alpha)
 
 def get_color_lookup(nucs, alpha, custom_colors=None):
 if custom_colors is None:
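In the lines shown, the old return value of `get_nuc_color` was an RGB 3-tuple that left `alpha` out; the new one appends it, giving the `(red, green, blue, alpha)` order that matplotlib accepts as an RGBA color. The sketch reproduces only the return expression from the diff and assumes `char_sum` has already been normalized as in the preceding lines and lies in [0, 1]; in that range every channel stays valid, since 2c(1 - c) never exceeds 0.5. `nuc_rgba` is a hypothetical name.

```python
def nuc_rgba(char_sum: float, alpha: float) -> tuple:
    """Map a normalized character score to an RGBA tuple (sketch of get_nuc_color's return).

    Assumes char_sum is already in [0, 1]; no clamping is done here.
    """
    red = char_sum
    green = 1 - char_sum
    blue = 2 * char_sum * (1 - char_sum)  # peaks at 0.5 when char_sum == 0.5
    return (red, green, blue, alpha)


print(nuc_rgba(0.5, 0.3))  # (0.5, 0.5, 0.5, 0.3)
```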

CRISPResso2/filterFastqs.py

Lines changed: 14 additions & 14 deletions
@@ -39,10 +39,10 @@ def filterFastqs(fastq_r1=None,fastq_r2=None,fastq_r1_out=None,fastq_r2_out=None
3939
startTime = datetime.datetime.now()
4040

4141
if not os.path.exists(fastq_r1):
42-
raise Exception("fastq_r1 file '"+fastq_r1+"' does not exit.")
42+
raise Exception("fastq_r1 file '"+fastq_r1+"' does not exist.")
4343

4444
if fastq_r2 is not None and not os.path.exists(fastq_r2):
45-
raise Exception("fastq_r2 file '"+fastq_r2+"' does not exit.")
45+
raise Exception("fastq_r2 file '"+fastq_r2+"' does not exist.")
4646

4747
##CREATION OF FILEHANDLES##
4848
if fastq_r1.endswith('.gz'):
@@ -143,7 +143,7 @@ def run_mBPN(f1_in, f1_out, min_bp_qual_in_read, min_av_read_qual, min_bp_qual_o
143143
npQualLine = numpy.frombuffer(qualLine, dtype=numpy.uint8)-33 #assume illumina 1.7
144144
npSeqLine = numpy.frombuffer(seqLine, 'c').copy()
145145
npSeqLine[npQualLine < min_bp_qual_or_N] = 'N'
146-
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tostring().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
146+
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tobytes().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
147147
idLine = f1_in.readline().rstrip().decode('utf-8')
148148

149149
def run_mRQ(f1_in, f1_out, min_bp_qual_in_read, min_av_read_qual, min_bp_qual_or_N):
@@ -195,7 +195,7 @@ def run_mBP_mBPN(f1_in, f1_out, min_bp_qual_in_read, min_av_read_qual, min_bp_qu
195195
if min >= min_bp_qual_in_read:
196196
npSeqLine = numpy.frombuffer(seqLine, 'c')
197197
npSeqLine[npQualLine < min_bp_qual_or_N] = 'N'
198-
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tostring().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
198+
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tobytes().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
199199
idLine = f1_in.readline().rstrip().decode('utf-8')
200200

201201
def run_mRQ_mBPN(f1_in, f1_out, min_bp_qual_in_read, min_av_read_qual, min_bp_qual_or_N):
@@ -209,7 +209,7 @@ def run_mRQ_mBPN(f1_in, f1_out, min_bp_qual_in_read, min_av_read_qual, min_bp_qu
209209
if mean >= min_av_read_qual:
210210
npSeqLine = numpy.frombuffer(seqLine, 'c').copy()
211211
npSeqLine[npQualLine < min_bp_qual_or_N] = 'N'
212-
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tostring().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
212+
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tobytes().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
213213
idLine = f1_in.readline().rstrip().decode('utf-8')
214214

215215
def run_mBP_mRQ_mBPN(f1_in, f1_out, min_bp_qual_in_read, min_av_read_qual, min_bp_qual_or_N):
@@ -225,7 +225,7 @@ def run_mBP_mRQ_mBPN(f1_in, f1_out, min_bp_qual_in_read, min_av_read_qual, min_b
225225
if mean >= min_av_read_qual:
226226
npSeqLine = numpy.frombuffer(seqLine, 'c').copy()
227227
npSeqLine[npQualLine < min_bp_qual_or_N] = 'N'
228-
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tostring().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
228+
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tobytes().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
229229
idLine = f1_in.readline().rstrip().decode('utf-8')
230230

231231

@@ -245,10 +245,10 @@ def run_mBPN_pair(f1_in, f1_out, f2_in, f2_out, min_bp_qual_in_read, min_av_read
245245
npQualLine2 = numpy.frombuffer(qualLine2, dtype=numpy.uint8)-33 #assume illumina 1.7
246246
npSeqLine = numpy.frombuffer(seqLine, 'c').copy()
247247
npSeqLine[npQualLine < min_bp_qual_or_N] = 'N'
248-
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tostring().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
248+
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tobytes().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
249249
npSeqLine2 = numpy.frombuffer(seqLine2, 'c').copy()
250250
npSeqLine2[npQualLine2 < min_bp_qual_or_N] = 'N'
251-
f2_out.write("%s\n%s\n%s\n%s\n"%(idLine2, npSeqLine2.tostring().decode('utf-8'), plusLine2.decode('utf-8'), qualLine2.decode('utf-8')))
251+
f2_out.write("%s\n%s\n%s\n%s\n"%(idLine2, npSeqLine2.tobytes().decode('utf-8'), plusLine2.decode('utf-8'), qualLine2.decode('utf-8')))
252252

253253
idLine = f1_in.readline().rstrip().decode('utf-8')
254254
idLine2 = f2_in.readline().rstrip().decode('utf-8')
@@ -338,10 +338,10 @@ def run_mBP_mBPN_pair(f1_in, f1_out, f2_in, f2_out, min_bp_qual_in_read, min_av_
338338
if min >= min_bp_qual_in_read and min2 >= min_bp_qual_in_read:
339339
npSeqLine = numpy.frombuffer(seqLine, 'c').copy()
340340
npSeqLine[npQualLine < min_bp_qual_or_N] = 'N'
341-
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tostring().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
341+
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tobytes().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
342342
npSeqLine2 = numpy.frombuffer(seqLine2, 'c').copy()
343343
npSeqLine2[npQualLine2 < min_bp_qual_or_N] = 'N'
344-
f2_out.write("%s\n%s\n%s\n%s\n"%(idLine2, npSeqLine2.tostring().decode('utf-8'), plusLine2.decode('utf-8'), qualLine2.decode('utf-8')))
344+
f2_out.write("%s\n%s\n%s\n%s\n"%(idLine2, npSeqLine2.tobytes().decode('utf-8'), plusLine2.decode('utf-8'), qualLine2.decode('utf-8')))
345345
idLine = f1_in.readline().rstrip().decode('utf-8')
346346
idLine2 = f2_in.readline().rstrip().decode('utf-8')
347347

@@ -363,10 +363,10 @@ def run_mRQ_mBPN_pair(f1_in, f1_out, f2_in, f2_out, min_bp_qual_in_read, min_av_
363363
if mean >= min_av_read_qual and mean2 >= min_av_read_qual:
364364
npSeqLine = numpy.frombuffer(seqLine, 'c').copy()
365365
npSeqLine[npQualLine < min_bp_qual_or_N] = 'N'
366-
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tostring().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
366+
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tobytes().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
367367
npSeqLine2 = numpy.frombuffer(seqLine2, 'c').copy()
368368
npSeqLine2[npQualLine2 < min_bp_qual_or_N] = 'N'
369-
f2_out.write("%s\n%s\n%s\n%s\n"%(idLine2, npSeqLine2.tostring().decode('utf-8'), plusLine2.decode('utf-8'), qualLine2.decode('utf-8')))
369+
f2_out.write("%s\n%s\n%s\n%s\n"%(idLine2, npSeqLine2.tobytes().decode('utf-8'), plusLine2.decode('utf-8'), qualLine2.decode('utf-8')))
370370
idLine = f1_in.readline().rstrip().decode('utf-8')
371371
idLine2 = f2_in.readline().rstrip().decode('utf-8')
372372

@@ -391,10 +391,10 @@ def run_mBP_mRQ_mBPN_pair(f1_in, f1_out, f2_in, f2_out, min_bp_qual_in_read, min
391391
if mean >= min_av_read_qual and mean2 >= min_av_read_qual:
392392
npSeqLine = numpy.frombuffer(seqLine, 'c').copy()
393393
npSeqLine[npQualLine < min_bp_qual_or_N] = 'N'
394-
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tostring().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
394+
f1_out.write("%s\n%s\n%s\n%s\n"%(idLine, npSeqLine.tobytes().decode('utf-8'), plusLine.decode('utf-8'), qualLine.decode('utf-8')))
395395
npSeqLine2 = numpy.frombuffer(seqLine2, 'c').copy()
396396
npSeqLine2[npQualLine2 < min_bp_qual_or_N] = 'N'
397-
f2_out.write("%s\n%s\n%s\n%s\n"%(idLine2, npSeqLine2.tostring().decode('utf-8'), plusLine2.decode('utf-8'), qualLine2.decode('utf-8')))
397+
f2_out.write("%s\n%s\n%s\n%s\n"%(idLine2, npSeqLine2.tobytes().decode('utf-8'), plusLine2.decode('utf-8'), qualLine2.decode('utf-8')))
398398
idLine = f1_in.readline().rstrip().decode('utf-8')
399399
idLine2 = f2_in.readline().rstrip().decode('utf-8')
400400
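The `tostring()` to `tobytes()` substitution is behavior-preserving: `ndarray.tostring` was a deprecated alias of `tobytes`, and `tobytes` is available in both NumPy 1.x and 2.x, as the commit message notes. Below is a minimal sketch of the masking step these functions share, assuming Phred+33 qualities as in the `#assume illumina 1.7` comment; `mask_low_quality` is a hypothetical name. `numpy.frombuffer` over an immutable `bytes` object returns a read-only array, which is why the sequence array is copied before bases are overwritten. The context line in `run_mBP_mBPN` omits `.copy()`, so if `seqLine` is `bytes` there, that assignment can raise `ValueError`.

```python
import numpy as np


def mask_low_quality(seq: bytes, qual: bytes, min_qual: int) -> str:
    """Replace bases whose Phred+33 quality is below min_qual with 'N'."""
    # Decode quality characters to Phred scores; int16 avoids uint8 wrap-around.
    phred = np.frombuffer(qual, dtype=np.uint8).astype(np.int16) - 33
    # frombuffer over immutable bytes is read-only, so take a writable copy.
    bases = np.frombuffer(seq, dtype='S1').copy()
    bases[phred < min_qual] = b'N'
    # tobytes() replaces the deprecated tostring(); both return the raw bytes.
    return bases.tobytes().decode('utf-8')


print(mask_low_quality(b"ACGT", b"I#I#", 20))  # ANGN ('I' is Q40, '#' is Q2)
```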

pytest.ini

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+[pytest]
+testpaths = tests/unit_tests
+python_files = test_*.py
+python_functions = test_*
+addopts = -v --tb=short
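With this `pytest.ini`, a bare `pytest` run from the repository root searches only `tests/unit_tests`, collects files matching `test_*.py` and functions matching `test_*`, and reports verbosely with short tracebacks. pytest treats `python_files` and `python_functions` as glob-style patterns; the sketch below approximates that matching with the standard-library `fnmatch` module to show which names would be picked up. It is an illustration only, not pytest's own collection logic.

```python
from fnmatch import fnmatch

PYTHON_FILES = "test_*.py"   # value from pytest.ini
PYTHON_FUNCTIONS = "test_*"  # value from pytest.ini


def would_collect(filename: str, function_name: str) -> bool:
    """Approximate whether pytest would collect function_name from filename."""
    return fnmatch(filename, PYTHON_FILES) and fnmatch(function_name, PYTHON_FUNCTIONS)


print(would_collect("test_vcf.py", "test_vcf_line"))  # True
print(would_collect("conftest.py", "temp_dir"))       # False: fixtures live in conftest.py
```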

tests/df_alleles.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+,#Reads,Aligned_Sequence,Reference_Sequence,n_inserted,n_deleted,n_mutated,Reference_Name,Read_Status,Aligned_Reference_Names,Aligned_Reference_Scores,ref_positions,%Reads
+1,100,AATACGGATGTTCCAATCAGTACGCAGAGAGTCGCCGTCTCCAAGGTGAAAGCGGAAGTAGGGCCTTCGCGCACCTCATGGAATCCCTTCTGCAGCCGCTTTTCCGAGCTTCTGGCGGTCTCAAGCACTACCTACGTCAGCACCTGGGACCCCGCCACCGTGCGCCGGGCCTTGCCGTGGGCGCGCTACCTGCGCCACATCCATCGGCGCTTTGGTCGGCATGGCCCCATTCGCACGGCTCTGGAGCGGC,CGGCCGGATGTTCCAATCAGTACGCAGAGAGTCGCCGTCTCCAAGGTGAAAGCTGAAGTAGGGCCTTCGCGCACCTCATGGAATCCCTTCTGCAGCTTTTCCGAGCTTCTGGCGGTCTCAAGCACTACCTACGTCAGCACCTGGGACCCCGCCACCGTGCGCCGGGCCTTGCAGTGGGCGCGCTACCTGCGCCACATCCATCGGCGCTTTGGTCGG,0,0,0,TEST,MODIFIED,TEST,100&100,"[0,1,2,3]",100.0
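`tests/df_alleles.txt` is a one-row allele table in CSV form with an unnamed index column. Two columns are not plain scalars: `Aligned_Reference_Scores` holds `&`-separated scores and `ref_positions` holds a quoted, Python-style list. The sketch below reads such a file with the standard library, assuming those two encodings; `load_allele_rows` is hypothetical, and the unit tests may well load the file differently (for example with pandas).

```python
import ast
import csv
import io


def load_allele_rows(text: str) -> list:
    """Parse an allele-table CSV (shaped like tests/df_alleles.txt) into dicts with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["#Reads"] = int(row["#Reads"])
        row["%Reads"] = float(row["%Reads"])
        # Assumed encoding: scores joined with '&', e.g. "100&100".
        row["Aligned_Reference_Scores"] = [float(s) for s in row["Aligned_Reference_Scores"].split("&")]
        # Assumed encoding: a Python-style list literal, e.g. "[0,1,2,3]".
        row["ref_positions"] = ast.literal_eval(row["ref_positions"])
        rows.append(row)
    return rows
```

Usage would be along the lines of `load_allele_rows(open("tests/df_alleles.txt").read())`; the unnamed first column ends up under the key `""`.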

tests/unit_tests/conftest.py

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+"""Shared pytest fixtures for CRISPResso2 unit tests."""
+
+import os
+import tempfile
+
+import pytest
+
+
+@pytest.fixture
+def temp_dir():
+    """Provide a temporary directory that's cleaned up after tests."""
+    with tempfile.TemporaryDirectory() as tmpdir:
+        yield tmpdir
+
+
+@pytest.fixture
+def sample_fastq(temp_dir):
+    """Create a sample FASTQ file for testing."""
+    filepath = os.path.join(temp_dir, "sample.fastq")
+    with open(filepath, "w") as f:
+        f.write("@read1\nATCGATCG\n+\nIIIIIIII\n")
+        f.write("@read2\nGCTAGCTA\n+\nIIIIIIII\n")
+    return filepath
+
+
+@pytest.fixture
+def sample_fastq_low_quality(temp_dir):
+    """Create a sample FASTQ file with low quality scores."""
+    filepath = os.path.join(temp_dir, "low_quality.fastq")
+    with open(filepath, "w") as f:
+        f.write("@read1\nATCGATCG\n+\n!!!!!!!!\n") # Quality 0
+        f.write("@read2\nGCTAGCTA\n+\n########\n") # Quality 2
+    return filepath
+
+
+@pytest.fixture
+def sample_fastq_mixed_quality(temp_dir):
+    """Create a sample FASTQ file with mixed quality scores."""
+    filepath = os.path.join(temp_dir, "mixed_quality.fastq")
+    with open(filepath, "w") as f:
+        f.write("@read1\nATCGATCG\n+\nIIIIIIII\n") # High quality
+        f.write("@read2\nGCTAGCTA\n+\n!!!!!!!!\n") # Low quality
+        f.write("@read3\nAAAAAAAA\n+\nIIIIIIII\n") # High quality
+    return filepath
+
+
+@pytest.fixture
+def empty_fastq(temp_dir):
+    """Create an empty FASTQ file."""
+    filepath = os.path.join(temp_dir, "empty.fastq")
+    with open(filepath, "w") as f:
+        f.close()
+    return filepath
+
+
+@pytest.fixture
+def aln_matrix():
+    """Load the EDNAFULL alignment matrix."""
+    from CRISPResso2 import CRISPResso2Align
+
+    return CRISPResso2Align.read_matrix("./CRISPResso2/EDNAFULL")
+
+
+@pytest.fixture
+def blosum62_matrix():
+    """Load the BLOSUM62 alignment matrix."""
+    from CRISPResso2 import CRISPResso2Align
+
+    return CRISPResso2Align.read_matrix("./CRISPResso2/BLOSUM62")
+
+
+def create_test_fastq(filepath, records):
+    """Helper function to create test FASTQ files.
+
+    Args:
+        filepath: Path to create the file at
+        records: List of tuples (name, sequence, quality)
+    """
+    with open(filepath, "w") as f:
+        for name, seq, qual in records:
+            f.write(f"@{name}\n{seq}\n+\n{qual}\n")
+    return filepath
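The FASTQ fixtures and `create_test_fastq` all write standard four-line records (`@name`, sequence, `+`, quality). A test that consumes them usually needs to read records back; the sketch below pairs a local copy of the writer with a hypothetical `read_fastq_records` reader and round-trips the records that `sample_fastq` writes. It uses a temporary directory directly, so it runs without pytest and without the fixtures.

```python
import os
import tempfile


def create_test_fastq(filepath, records):
    """Write (name, sequence, quality) tuples as four-line FASTQ records."""
    with open(filepath, "w") as f:
        for name, seq, qual in records:
            f.write(f"@{name}\n{seq}\n+\n{qual}\n")
    return filepath


def read_fastq_records(filepath):
    """Read four-line FASTQ records back as (name, sequence, quality) tuples (hypothetical helper)."""
    with open(filepath) as f:
        lines = [line.rstrip("\n") for line in f]
    if len(lines) % 4 != 0:
        raise ValueError("truncated FASTQ record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        records.append((header[1:], seq, qual))
    return records


with tempfile.TemporaryDirectory() as tmpdir:
    sample = [("read1", "ATCGATCG", "IIIIIIII"), ("read2", "GCTAGCTA", "IIIIIIII")]
    path = create_test_fastq(os.path.join(tmpdir, "sample.fastq"), sample)
    assert read_fastq_records(path) == sample
```

Reading positionally in blocks of four, rather than scanning for lines that start with `@`, avoids misreading quality lines that happen to begin with `@` (Phred 31).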
