
CRISPRCasTyper 1.9.0 – Python-native pipeline, bundled DB, and GFF/protein support #66

Open
pentamorfico wants to merge 10 commits into Russel88:master from pentamorfico:master

@pentamorfico (Collaborator) commented Jan 12, 2026

This PR modernizes CRISPRCasTyper with a Python-native pipeline, bundled data/models, GFF/protein input, and updated install/docs. HMMER, prodigal, and minced dependencies are replaced by pyhmmer/pyhmmsearch, pyrodigal-gv, and diced; only BLAST+ remains external. The database and XGBoost models now ship inside the wheel/sdist, and CLI scripts/README are updated to recommend pip installs and document the new workflows.
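Since BLAST+ is now the only external dependency, a pre-flight check that the needed executables are on PATH can catch a broken environment before a long run. This is a generic POSIX shell sketch, not part of the PR: `require_tools` is a hypothetical helper name, and the BLAST+ binary names in the commented example call (`blastn`, `makeblastdb`) are assumptions, since the PR does not say which BLAST+ programs cctyper invokes.

```shell
# Hypothetical helper (not part of cctyper): report every executable
# that cannot be found on PATH and return non-zero if any are missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing required executable: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call; the BLAST+ binary names here are assumptions:
# require_tools blastn makeblastdb || exit 1
```

Plain POSIX `sh` has no `local`, so `missing` and `tool` are ordinary shell variables; that is harmless for a one-off check at the top of a script.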

Key changes

  • Packaging: moved to pyproject.toml + MANIFEST.in; removed build.sh. All data (profiles, JSON/UBJ XGBoost model, tabs) live in cctyper/data and are bundled in the wheel.
  • Pipeline deps: HMMER → pyhmmer/pyhmmsearch, prodigal → pyrodigal-gv (trains in single mode when needed), minced → diced; HMM DB rebuilt with NAME=file stem and auto-decompression.
  • GFF/protein path: --gff + --prot to skip gene calling using existing CDS annotations.
  • Repeat typing: XGBoost model converted to JSON/UBJ and bundled; guards added for missing predictions.
  • Plotting: adapted to drawsvg 2.x with invert_y to preserve old coordinates.
  • CLI/docs: entry scripts refreshed; cleanup hardened; packaged DB resolver; README revised (pip recommended, BLAST requirement, GFF usage).
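The "HMM DB rebuilt with NAME=file stem and auto-decompression" item can be illustrated with a small shell analogue. This is a hypothetical sketch, not the PR's code: the PR rebuilds the database in Python with pyhmmer, and its exact rules may differ. It assumes one profile per file, profiles stored as `*.hmm` or gzipped `*.hmm.gz`, and HMMER3-format files whose name is given on a `NAME` line; `build_hmm_db` and the profile names in the comments are made up for illustration.

```shell
# Hypothetical analogue of "NAME = file stem + auto-decompression".
# Assumes one profile per file and stems without sed metacharacters (&, \).
build_hmm_db() {
    src_dir=$1   # directory holding *.hmm and/or *.hmm.gz profiles
    out_db=$2    # combined database; must NOT be inside src_dir
    : > "$out_db"                        # start from an empty output file
    for f in "$src_dir"/*.hmm "$src_dir"/*.hmm.gz; do
        [ -e "$f" ] || continue          # skip a glob that matched nothing
        stem=$(basename "$f")
        stem=${stem%.gz}                 # Cas1_0.hmm.gz -> Cas1_0.hmm
        stem=${stem%.hmm}                # Cas1_0.hmm    -> Cas1_0
        # Decompress transparently, then set the NAME line to the stem.
        if [ "${f%.gz}" != "$f" ]; then
            gzip -dc "$f"
        else
            cat "$f"
        fi | sed "s/^NAME .*/NAME  $stem/" >> "$out_db"
    done
}
```

Naming each profile after its file stem means hit tables report the same identifier that appears on disk, regardless of what the original profile's `NAME` field contained.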

To do:

  • Add type hints to functions, classes, and dataframes for easier maintenance.
  • Prepare for version 2.0: new models and classification scheme, polars, improved speed...

Version

  • Bumped to 1.9.0.

@pentamorfico (Collaborator, Author)

Hi @Russel88,

I wanted to follow up on PR #66.

We are currently planning a new release that includes these changes, since the issues addressed in this PR are creating problems for downstream tool development on our side.

We would much prefer to keep maintaining this tool here, in the original upstream repository, so that development remains centralized and credit is clearly preserved. However, if the project is no longer actively maintained, we will likely need to release a new version ourselves so that these fixes are publicly available and installable through PyPI and related channels.

Before moving in that direction, I wanted to check whether the repository is still active and whether you would be open to reviewing or merging the PR.

Happy to help with anything needed to move this forward.

Best,
Mario

@Russel88 (Owner) left a comment

Did a quick review. Looks great. Small changes requested.

Comment thread README.md
export CCTYPER_DB="/path/to/data/"
# or by using the --db argument each time you run CRISPRCasTyper:
cctyper input.fa output --db /path/to/data/
mamba create -n cctyper bioconda::blast python>=3.10
@Russel88 (Owner):

Suggested change
- mamba create -n cctyper bioconda::blast python>=3.10
+ mamba create -n cctyper bioconda::blast "python>=3.10"
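Why the quotes matter: unquoted, the shell reads the `>` in `python>=3.10` as output redirection, so the command receives only `python` (no version constraint) and a stray file named `=3.10` appears in the current directory. A minimal demonstration, with `printf` standing in for `mamba` so nothing is installed; `demo_dir` is just a throwaway name.

```shell
# Work in a throwaway directory so the stray redirect target is harmless.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Unquoted: '>=3.10' is parsed as '> =3.10', a redirection. printf
# receives only "bioconda::blast" and "python"; its output goes into
# a file literally named '=3.10'.
printf '[%s]\n' bioconda::blast python>=3.10

# Quoted: the version constraint reaches the program as one argument.
printf '[%s]\n' bioconda::blast "python>=3.10"
```

Run it in a subshell, or remove `$demo_dir` afterwards, since the script changes the working directory.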

Comment thread README.md
cctyper input.fa output --db /path/to/data/
mamba create -n cctyper bioconda::blast python>=3.10
mamba activate cctyper
pip install git+https://github.com/pentamorfico/CRISPRCasTyper.git
@Russel88 (Owner):

To be updated on merge into main (the install URL should then point to the upstream repository).

Comment thread README.md
@@ -205,10 +188,8 @@ cctyper -h
* **hmmer.log** Error messages from HMMER (only produced if any errors were encountered)

@Russel88 (Owner):

Suggested change
* **crisprs.gff** GFF with CRISPR arrays
