Commit 4929d97

Squashed commit of the following:
commit b0a6d95
Author: tornikeo <[email protected]>
Date:   Fri Jan 17 09:57:29 2025 +0000

    Bump version

commit 374c3a7
Author: tornikeo <[email protected]>
Date:   Fri Jan 17 08:44:40 2025 +0000

    Remove pooch from dependencies

commit aae53ce
Author: tornikeo <[email protected]>
Date:   Thu Jan 16 23:38:33 2025 +0100

    Tiny notebook fixup

commit 6a571a6
Author: tornikeo <[email protected]>
Date:   Thu Jan 16 22:24:43 2025 +0000

    Update pyporoject.toml to best practices

commit 9343e21
Author: tornikeo <[email protected]>
Date:   Thu Jan 16 21:59:40 2025 +0000

    Make figure visuals more consistent

commit d5c440b
Author: tornikeo <[email protected]>
Date:   Thu Jan 16 11:46:00 2025 +0100

    Add CudaFingerPrint doctest

commit c79f4b7
Merge: bd20e91 5a2cb7c
Author: tornikeo <[email protected]>
Date:   Thu Jan 16 11:41:57 2025 +0100

    Merge branch 'main' into development

commit bd20e91
Author: tornikeo <[email protected]>
Date:   Thu Jan 16 11:21:07 2025 +0100

    Rely on matchms for numba dependency

commit 2f431f8
Author: tornikeo <[email protected]>
Date:   Thu Jan 16 11:16:53 2025 +0100

    Fix tests picking up data dirs

commit ee46d71
Author: tornikeo <[email protected]>
Date:   Thu Jan 16 10:59:33 2025 +0100

    Add doctests

commit 4d6e8d0
Author: tornikeo <[email protected]>
Date:   Thu Jan 16 10:03:29 2025 +0100

    Add one doctest for cosine greedy

commit b8f194d
Author: tornikeo <[email protected]>
Date:   Wed Jan 15 23:02:32 2025 +0100

    Don't mention experimental CLI in the readme

commit b364bb1
Author: tornikeo <[email protected]>
Date:   Wed Jan 15 22:45:44 2025 +0100

    Remove non-vital dependencies

commit 53bbf03
Author: tornikeo <[email protected]>
Date:   Fri Jan 10 22:17:50 2025 +0100

    Change speed with comparisons/s

commit 4babe67
Author: tornikeo <[email protected]>
Date:   Wed Dec 25 17:26:46 2024 +0100

    Respect NUMBA SIM env var in CPU tests

commit 57dee09
Author: tornikeo <[email protected]>
Date:   Mon Dec 23 10:33:40 2024 +0100

    Fix typo in readme

commit 6438cb5
Author: tornikeo <[email protected]>
Date:   Sun Dec 15 14:53:26 2024 +0100

    Fix incorrectly set up test for FP comparsion

commit c513561
Author: tornikeo <[email protected]>
Date:   Sun Dec 15 11:11:30 2024 +0100

    Re-add HF spaces demo

commit 42c8db9
Author: tornikeo <[email protected]>
Date:   Sat Dec 14 22:58:22 2024 +0100

    Loosen numba requirements in favor of matchms

commit d8d07a5
Author: tornikeo <[email protected]>
Date:   Sat Dec 14 21:21:23 2024 +0100

    Update citation

commit 74779cd
Author: tornikeo <[email protected]>
Date:   Mon Dec 2 11:34:56 2024 +0100

    Update BLINK benchmark

commit c1c607d
Author: tornikeo <[email protected]>
Date:   Thu Oct 24 13:24:31 2024 +0200

    Include BLINK comparison, fix ReadME BLINK: Simplify plotting Include numpy version notice in colab tutorial Update README

commit d02fc5e
Author: tornikeo <[email protected]>
Date:   Fri Jul 26 13:27:34 2024 +0200

    Fix ReadME type

commit 4d5557b
Author: tornikeo <[email protected]>
Date:   Fri Jul 26 13:12:55 2024 +0200

    Add a visual guide figure for SimMS

commit 0759225
Author: tornikeo <[email protected]>
Date:   Fri Jul 26 13:10:06 2024 +0200

    Add flake8

commit 5c703b2
Author: tornikeo <[email protected]>
Date:   Fri Jul 26 13:08:26 2024 +0200

    Update the colab tutorial

commit 3399b8a
Author: tornikeo <[email protected]>
Date:   Fri Jul 26 13:00:52 2024 +0200

    Update ReadME, fix matchms removing add_losses

1 parent 5a2cb7c commit 4929d97

20 files changed: +2682 -2776 lines changed

.github/workflows/python-package.yml

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.9", "3.10", "3.11"]
+        python-version: ["3.9"] #, "3.10", "3.11"]
 
     steps:
     - uses: actions/checkout@v3

.pre-commit-config.yaml

Lines changed: 7 additions & 1 deletion
@@ -1,4 +1,9 @@
 repos:
+  - repo: https://github.com/PyCQA/autoflake
+    rev: v2.2.1
+    hooks:
+      - id: autoflake
+        args: [--remove-all-unused-imports, --in-place]
   - repo: https://github.com/nbQA-dev/nbQA
     rev: 0.11.0 # Use the latest version
     hooks:
@@ -17,4 +22,5 @@ repos:
   - repo: https://github.com/PyCQA/flake8
     rev: 7.0.0
     hooks:
-      - id: flake8
+      - id: flake8
+        args: ["--ignore=E501,W503"]

README.md

Lines changed: 12 additions & 24 deletions
@@ -26,30 +26,33 @@
 </tr>
 </table>
 
-Calculate similarity between large number of mass spectra using a GPU. SimMS aims to provide very fast replacements for commonly used similarity functions in [matchms](https://github.com/matchms/matchms/).
-`
+Calculate the similarity between a large number of mass spectra using a GPU. SimMS aims to provide very fast replacements for commonly used similarity functions in [matchms](https://github.com/matchms/matchms).
+
 <div style='text-align:center'>
 
 ![img](./assets/perf_speedup.svg)
 
 </div>
 
-
 # How SimMS works, in a nutshell
 
 ![alt text](assets/visual_guide.png)
 
-Comparing large sets of mass spectra can be done in parallel, since scores can be calculated independent of the other scores. By leveraging a large number of threads in a GPU, we created a GPU program (kernel) that calculates a 4096 x 4096 similarity matrix in a fraction of a second. By iteratively calculating similarities for batches of spectra, SimMS can quickly process datasets much larger than the GPU memory. For details, visit the [preprint](https://www.biorxiv.org/content/biorxiv/early/2024/07/25/2024.07.24.605006.full.pdf).
+Comparing large sets of mass spectra can be done in parallel since scores can be calculated independently of each other.
+By leveraging a large number of threads in a GPU, we created a GPU program (kernel) that calculates a 4096x4096
+similarity matrix in a fraction of a second.
+By iteratively calculating similarities for batches of spectra, SimMS can quickly process datasets much larger than the GPU's memory.
+For details, visit the [preprint](https://www.biorxiv.org/content/biorxiv/early/2024/07/25/2024.07.24.605006.full.pdf).
 
 # Quickstart
 
 ## Hardware
 
-Any GPU [supported](https://numba.pydata.org/numba-doc/dev/cuda/overview.html#requirements) by numba can be used. We tested a number of GPUs:
+Any GPU [supported](https://numba.pydata.org/numba-doc/dev/cuda/overview.html#requirements) by Numba can be used. We tested a number of GPUs:
 
-- RTX 2070, used on local machine
+- RTX 2070, used on a local machine
 - T4 GPU, offered for free on Colab
-- RTX4090 GPU, offered on vast.ai
+- RTX 4090 GPU, offered on vast.ai
 - Any A100 GPU, offered on vast.ai
 
 The `pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel` docker [image](https://hub.docker.com/layers/pytorch/pytorch/2.2.1-cuda12.1-cudnn8-devel/images/sha256-42204bca460bb77cbd524577618e1723ad474e5d77cc51f94037fffbc2c88c6f?context=explore) was used for development and testing.
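
Reviewer note on the "How SimMS works" paragraph in the hunk above: the batching idea it describes can be illustrated on the CPU. What follows is a minimal NumPy sketch, not SimMS's CUDA kernel. It assumes every spectrum has already been binned into a fixed-length, L2-normalized vector, so a plain dot product stands in for a cosine score and the greedy peak matching of CosineGreedy is deliberately left out. The names `batched_scores` and `l2_normalize` and the parameter `batch_size` are hypothetical and do not come from SimMS or matchms.

```python
import numpy as np


def batched_scores(references, queries, batch_size=4096):
    """Fill the full score matrix one (batch_size x batch_size) tile at a time.

    Hypothetical helper for illustration only; not part of SimMS or matchms.
    Both inputs are 2-D arrays of L2-normalized row vectors.
    """
    n_ref, n_query = references.shape[0], queries.shape[0]
    scores = np.empty((n_ref, n_query), dtype=np.float32)
    for i in range(0, n_ref, batch_size):
        ref_tile = references[i:i + batch_size]
        for j in range(0, n_query, batch_size):
            query_tile = queries[j:j + batch_size]
            # No tile depends on any other tile, so each one could be
            # handed to a GPU independently; here it is a plain matmul.
            scores[i:i + batch_size, j:j + batch_size] = ref_tile @ query_tile.T
    return scores


def l2_normalize(vectors):
    """Scale each row to unit length so dot products are cosine scores."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, 1e-12)


rng = np.random.default_rng(0)
references = l2_normalize(rng.random((10, 16), dtype=np.float32))
queries = l2_normalize(rng.random((7, 16), dtype=np.float32))

# A tiny batch size forces several tiles, including ragged edge tiles.
tiled = batched_scores(references, queries, batch_size=4)
```

Only one tile of inputs is needed at a time, so the same loop structure still applies when the whole dataset does not fit on the device, which is the point the README makes about datasets larger than GPU memory. This sketch keeps the full result matrix in host memory for simplicity.
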
@@ -84,21 +87,6 @@ scores = calculate_scores(
 scores.scores_by_query(queries[42], 'CudaCosineGreedy_score', sort=True)
 ```
 
-## Use as a CLI
-
-```sh
-pangea-simms --references library.mgf --queries queries.mgf --output_file scores.pickle \
-  --tolerance 0.01 \
-  --mz_power 1 \
-  --intensity_power 1 \
-  --batch_size 512 \
-  --n_max_peaks 512 \
-  --match_limit 1024 \
-  --array_type numpy \
-  --sparse_threshold 0.5 \
-  --method CudaCosineGreedy
-```
-
 # Supported similarity functions
 
 - `CudaModifiedCosine`, equivalent to [ModifiedCosine](https://matchms.readthedocs.io/en/latest/api/matchms.similarity.ModifiedCosine.html)
@@ -134,15 +122,15 @@ pip install git+https://github.com/PangeAI/simms
 
 The `pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel` has nearly everything you need. Once inside, do:
 
-```
+```sh
 pip install git+https://github.com/PangeAI/simms
 ```
 
 ## Run on vast.ai
 
 Use [this template](https://cloud.vast.ai/?ref_id=51575&template_id=f45f6048db515291bda978a34e908d09) as a starting point, once inside, simply do:
 
-```
+```sh
 pip install git+https://github.com/PangeAI/simms
 ```
 