refactor(jailbreak): Use onnx instead of pickle to load model #1715

erickgalinkin wants to merge 6 commits into develop
Conversation
Signed-off-by: Erick Galinkin <[email protected]>
Greptile Summary

This PR replaces the pickle-based jailbreak classifier with an ONNX model loaded via ONNX Runtime.
| Filename | Overview |
|---|---|
| nemoguardrails/library/jailbreak_detection/model_based/checks.py | Two P0 bugs: broken from model_based.models import (should be from .models) and discarded hf_hub_download return value meaning downloaded file is never found at the expected path. |
| nemoguardrails/library/jailbreak_detection/model_based/models.py | Migrates pickle-based sklearn classifier to ONNX InferenceSession; redundant res[1][:2] slice noted in prior thread but functionally harmless; CPUExecutionProvider hardcoded which is intentional for the RF model. |
| tests/test_jailbreak_model_based.py | Most mocking updated correctly (onnxruntime InferenceSession), but test_model_based_classifier_missing_deps still patches sklearn.ensemble which is no longer used, making it ineffective. |
| nemoguardrails/library/jailbreak_detection/requirements.txt | Replaces sklearn/numpy 1.23.5 with onnxruntime and updates torch/transformers; huggingface_hub added for auto-download; torchvision retained per prior thread explanation. |
| nemoguardrails/library/jailbreak_detection/Dockerfile | Updates wget target from snowflake.pkl to snowflake.onnx; straightforward change. |
| nemoguardrails/library/jailbreak_detection/Dockerfile-GPU | Same as Dockerfile — updates wget target from .pkl to .onnx. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[initialize_model called] --> B{EMBEDDING_CLASSIFIER_PATH set?}
    B -- No --> C[Log warning, return None]
    B -- Yes --> D{snowflake.onnx present at classifier_path?}
    D -- Yes --> G
    D -- No --> E[hf_hub_download snowflake.onnx]
    E --> F["⚠️ Return value discarded\nFile stored in nested cache dir\nnot at classifier_path/snowflake.onnx"]
    F --> G["from model_based.models import JailbreakClassifier\n⚠️ ModuleNotFoundError in package context\n(should be from .models)"]
    G --> H[JailbreakClassifier snowflake.onnx]
    H --> I[SnowflakeEmbed - torch/transformers]
    H --> J[InferenceSession CPUExecutionProvider]
    I --> K[embed text → numpy array]
    K --> L["classifier.run None, X: e"]
    L --> M["res[0] → classification label"]
    L --> N["res[1][0][classification] → probability"]
    M --> O[Return bool classification, float score]
    N --> O
```
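The scoring path at the bottom of the flowchart — `res[0]` for the label, `res[1][0][classification]` for the probability — can be sketched against a stand-in session. `FakeSession`, the 8-dim embedding, and the zipmap-style output layout below are illustrative assumptions, not the repository's actual code:

```python
from typing import List, Tuple


class FakeSession:
    """Stand-in for onnxruntime.InferenceSession. Mimics the output layout
    of a sklearn classifier exported via skl2onnx with zipmap enabled:
    output 0 is the label array, output 1 is a list of {label: prob} dicts."""

    def run(self, output_names, feeds):
        return [1], [{0: 0.08, 1: 0.92}]


def classify(session, embedding: List[float]) -> Tuple[bool, float]:
    res = session.run(None, {"X": [embedding]})
    classification = res[0][0]               # res[0] -> classification label
    probability = res[1][0][classification]  # res[1][0][label] -> probability
    return bool(classification), float(probability)


label, score = classify(FakeSession(), [0.0] * 8)
print(label, score)  # True 0.92
```

A real exported model may instead emit probabilities as a 2-D array rather than dicts, which is why the output layout is worth locking down in a test.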
Comments Outside Diff (2)
- nemoguardrails/library/jailbreak_detection/model_based/checks.py, line 51-53 (link)

  **Broken import path will cause `ModuleNotFoundError`**

  `from model_based.models import JailbreakClassifier` is not a valid absolute import and is not a relative import (it is missing the leading dot). When `checks.py` is imported via the `nemoguardrails` package (e.g. as `nemoguardrails.library.jailbreak_detection.model_based.checks`), Python will look for a top-level `model_based` package, which does not exist. The original code used the full absolute path; the replacement should use a relative import.

- nemoguardrails/library/jailbreak_detection/model_based/checks.py, line 44-56 (link)

  **`hf_hub_download` return value discarded; file path will be wrong**

  `hf_hub_download` does not place the file directly at `classifier_path/snowflake.onnx`. It stores files in a nested HuggingFace cache structure (e.g. `classifier_path/models--nvidia--NemoGuard-JailbreakDetect/snapshots/<hash>/snowflake.onnx`) and returns the actual resolved path. Because the return value is discarded, the subsequent `JailbreakClassifier(str(Path(classifier_path).joinpath("snowflake.onnx")))` call will still not find the file, causing a runtime error on every cold start.

  The fix is to capture the returned path and pass it directly:

  ```python
  # Check if the model is present. If not, download it.
  model_path = Path(classifier_path).joinpath("snowflake.onnx")
  if not model_path.is_file():
      from huggingface_hub import hf_hub_download

      downloaded = hf_hub_download(
          repo_id="nvidia/NemoGuard-JailbreakDetect",
          filename="snowflake.onnx",
          local_dir=classifier_path,
      )
      model_path = Path(downloaded)

  from .models import JailbreakClassifier

  jailbreak_classifier = JailbreakClassifier(str(model_path))
  ```

  Note: `local_dir` (not `cache_dir`) places the file directly into the target directory.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: nemoguardrails/library/jailbreak_detection/model_based/checks.py
Line: 51-53
Comment:
**Broken import path will cause `ModuleNotFoundError`**
`from model_based.models import JailbreakClassifier` is not a valid absolute import and is not a relative import (missing the leading dot). When `checks.py` is imported via the `nemoguardrails` package (e.g. as `nemoguardrails.library.jailbreak_detection.model_based.checks`), Python will look for a top-level `model_based` package which does not exist. The original code used the full absolute path; the replacement should use a relative import.
```suggestion
from .models import (
JailbreakClassifier,
)
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: nemoguardrails/library/jailbreak_detection/model_based/checks.py
Line: 44-56
Comment:
**`hf_hub_download` return value discarded; file path will be wrong**
`hf_hub_download` does not place the file directly at `classifier_path/snowflake.onnx`. It stores files in a nested HuggingFace cache structure (e.g. `classifier_path/models--nvidia--NemoGuard-JailbreakDetect/snapshots/<hash>/snowflake.onnx`) and returns the actual resolved path. Because the return value is discarded, the subsequent `JailbreakClassifier(str(Path(classifier_path).joinpath("snowflake.onnx")))` call will still not find the file, causing a runtime error on every cold start.
The fix is to capture the returned path and pass it directly:
```python
# check if model is present. If not, download it.
model_path = Path(classifier_path).joinpath("snowflake.onnx")
if not model_path.is_file():
from huggingface_hub import hf_hub_download
downloaded = hf_hub_download(
repo_id="nvidia/NemoGuard-JailbreakDetect",
filename="snowflake.onnx",
local_dir=classifier_path,
)
model_path = Path(downloaded)
from .models import JailbreakClassifier
jailbreak_classifier = JailbreakClassifier(str(model_path))
```
Note: `local_dir` (not `cache_dir`) places the file directly into the target directory.
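To make the `local_dir` vs. cache-layout difference concrete, the two resulting paths can be built with `pathlib` alone, no download needed. The target directory and the `abc123` hash below are placeholders:

```python
from pathlib import PurePosixPath

classifier_path = PurePosixPath("/models/jailbreak")  # hypothetical target directory

# Default/cache behavior: hf_hub_download stores the file inside a nested
# HF cache layout under the cache directory and returns the resolved path.
cached = (
    classifier_path
    / "models--nvidia--NemoGuard-JailbreakDetect"
    / "snapshots"
    / "abc123"  # placeholder commit hash
    / "snowflake.onnx"
)

# local_dir behavior: the file lands directly where the loader expects it.
local = classifier_path / "snowflake.onnx"

print(cached)  # /models/jailbreak/models--nvidia--NemoGuard-JailbreakDetect/snapshots/abc123/snowflake.onnx
print(local)   # /models/jailbreak/snowflake.onnx
```

Only the `local_dir` form matches the hard-coded `classifier_path/snowflake.onnx` lookup in `checks.py`, which is why capturing the returned path (or using `local_dir`) is the fix.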
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: tests/test_jailbreak_model_based.py
Line: 81-93
Comment:
**Stale test checks `sklearn.ensemble` which is no longer imported**
The test description says "If sklearn is missing, instantiating `JailbreakClassifier` should raise `ImportError`", but `sklearn` is no longer imported anywhere in `JailbreakClassifier.__init__` — only `onnxruntime` is. Setting `sklearn.ensemble` to `None` will therefore not trigger an `ImportError` from the new code path, making this test validate the wrong dependency. It should mock `onnxruntime` as `None` instead:
```suggestion
def test_model_based_classifier_missing_deps(monkeypatch):
"""
If onnxruntime is missing, instantiating JailbreakClassifier should raise ImportError.
"""
monkeypatch.setitem(sys.modules, "onnxruntime", None)
```
How can I resolve this? If you propose a fix, please make it concise.

Reviews (5): Last reviewed commit: "Fix file download for checks. Update Doc..."
📝 Walkthrough

This PR migrates the jailbreak detection classifier from a pickle-based model to ONNX Runtime, updating the model initialization, inference mechanism, dependencies, and test expectations accordingly.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ❌ Failed checks (2 warnings) | ✅ Passed checks (2 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nemoguardrails/library/jailbreak_detection/model_based/models.py (2)
25-35: ⚠️ Potential issue | 🟠 Major

**Pin the model revision to an immutable commit hash.**

`trust_remote_code=True` executes code from the Hugging Face repository at load time. Without pinning `revision` to a specific commit, the application will execute any code updates published to that repository in the future, creating a supply-chain vulnerability.

Add `revision="<commit-sha>"` (a full commit hash, not a branch or tag) to both the `AutoTokenizer.from_pretrained()` and `AutoModel.from_pretrained()` calls.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `nemoguardrails/library/jailbreak_detection/model_based/models.py` around lines 25-35, the `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained` calls currently use `trust_remote_code=True` without pinning a revision; update both calls to include `revision="<commit-sha>"` (a full immutable commit hash) so the tokenizer and model load a specific commit, keeping `trust_remote_code=True` and preserving existing args (`safe_serialization`, `add_pooling_layer`) while preventing future remote code changes from being executed.
25-35: ⚠️ Potential issue | 🟠 Major

**Replace `safe_serialization` with `use_safetensors` for load-time safetensors configuration.**

`safe_serialization` is a save-time flag, not a load-time parameter. On `AutoTokenizer.from_pretrained()` and `AutoModel.from_pretrained()`, use `use_safetensors=True` (or `use_safetensors=None` to prefer safetensors if available) instead. The current parameter will be ignored or may cause errors depending on the transformers version.

Suggested fix:

```python
self.tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-embed-m-long",
    trust_remote_code=True,
    use_safetensors=True,
)
self.model = AutoModel.from_pretrained(
    "Snowflake/snowflake-arctic-embed-m-long",
    trust_remote_code=True,
    add_pooling_layer=False,
    use_safetensors=True,
)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `nemoguardrails/library/jailbreak_detection/model_based/models.py` around lines 25-35, the calls to `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained` use the save-time parameter `safe_serialization`; replace it with the load-time option `use_safetensors` (e.g., `use_safetensors=True`, or `use_safetensors=None` to prefer safetensors) in both constructor calls (keep `trust_remote_code` and `add_pooling_layer` as-is) so the transformers loader actually uses safetensors at load time.
🧹 Nitpick comments (3)
nemoguardrails/library/jailbreak_detection/model_based/models.py (1)

56-57: **Build `X` as an explicit batched ndarray before calling ONNX Runtime.**

`{"X": [e]}` passes a Python list of vectors into ORT. Exported sklearn ONNX models usually declare `X` as a 2-D float tensor, so this is relying on implicit coercion in a pretty fragile place.

🧪 Suggested change

```diff
 def __call__(self, text: str) -> Tuple[bool, float]:
     e = self.embed(text)
-    res = self.classifier.run(None, {"X": [e]})
+    x = np.asarray([e], dtype=np.float32)
+    res = self.classifier.run(None, {"X": x})
```

Also add at module scope:

```python
import numpy as np
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `nemoguardrails/library/jailbreak_detection/model_based/models.py` around lines 56-57, the code passes a Python list to ONNX Runtime (`self.classifier.run(None, {"X": [e]})`), which relies on implicit coercion; instead, import numpy as np at module scope and build an explicit 2-D batched float32 ndarray for `X` (e.g., `X = np.asarray([e], dtype=np.float32)`) before calling `self.classifier.run(None, {"X": X})` so the input matches the exported sklearn ONNX model's 2-D float tensor expectation.

nemoguardrails/library/jailbreak_detection/model_based/checks.py (1)
46: **Fail fast if `snowflake.onnx` is missing.**

A bad `EMBEDDING_CLASSIFIER_PATH` will currently bubble up as a low-level ONNX Runtime error. Checking `is_file()` here would make startup failures much easier to diagnose, especially since this migration depends on a new artifact being present.

🛠 Suggested change

```diff
-    jailbreak_classifier = JailbreakClassifier(str(Path(classifier_path).joinpath("snowflake.onnx")))
+    model_path = Path(classifier_path).joinpath("snowflake.onnx")
+    if not model_path.is_file():
+        raise FileNotFoundError(f"Jailbreak classifier not found: {model_path}")
+    jailbreak_classifier = JailbreakClassifier(str(model_path))
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `nemoguardrails/library/jailbreak_detection/model_based/checks.py` at line 46, check for the presence of the ONNX file before instantiating `JailbreakClassifier`: compute the path from `classifier_path` and `"snowflake.onnx"`, use `Path(...).is_file()`, and if it returns False raise a clear, early error (e.g., `FileNotFoundError`) that includes the missing path, then only call `JailbreakClassifier` with that path; this ensures startup fails fast instead of surfacing a low-level ONNX Runtime error.

tests/test_jailbreak_model_based.py (1)
230-257: **Please add a unit test for the ONNX session output shape.**

This assertion only locks down the filename change. The fragile part of the migration is in `nemoguardrails/library/jailbreak_detection/model_based/models.py`, lines 54-62, where the `InferenceSession.run()` output is unpacked and scored; a mocked ORT test would catch regressions there.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `tests/test_jailbreak_model_based.py` around lines 230-257, add a unit test that mocks `onnxruntime.InferenceSession.run` to return the exact tuple/array shapes the model code expects and assert the classifier handles the unpacking and produces the expected output; specifically, mock `models.InferenceSession` (or the attribute used inside `nemoguardrails.library.jailbreak_detection.model_based.models`) so that `InferenceSession.run` returns outputs with the same shapes used by `JailbreakClassifier` scoring, then instantiate `JailbreakClassifier` (via `initialize_model` or directly) and assert that the returned label and score match expectations; covering `InferenceSession.run`, `JailbreakClassifier`, and the unpacking logic ensures regressions in the `run()` output shape will fail the test.
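One possible shape for such a test, using `unittest.mock`. `TinyClassifier` below is a hypothetical stand-in for the scoring path (the real class lives in `nemoguardrails/library/jailbreak_detection/model_based/models.py`), and the mocked return layout is an assumption about the exported model's outputs:

```python
from unittest import mock


class TinyClassifier:
    """Hypothetical stand-in for JailbreakClassifier's scoring path."""

    def __init__(self, session):
        self.classifier = session

    def __call__(self, embedding):
        res = self.classifier.run(None, {"X": [embedding]})
        label = res[0][0]
        return bool(label), float(res[1][0][label])


def test_run_output_shape():
    session = mock.Mock()
    # Assumed zipmap-style sklearn ONNX output: labels, then per-row prob dicts.
    session.run.return_value = ([1], [{0: 0.25, 1: 0.75}])

    clf = TinyClassifier(session)
    label, score = clf([0.0] * 8)

    assert label is True
    assert score == 0.75
    session.run.assert_called_once()


test_run_output_shape()
```

In the real test suite the mock would target the `InferenceSession` attribute inside the `models` module (e.g. via `monkeypatch`) rather than a local stand-in class.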
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemoguardrails/library/jailbreak_detection/requirements.txt`:
- Line 14: The file ending for the requirements list is missing a trailing
newline; edit the requirements.txt entry that contains "onnxruntime>=1.24.3" and
ensure the file ends with a single newline character (add a newline after the
last line) so the end-of-file fixer / linters stop failing.
- Around line 7-14: The requirements file is missing a trailing newline which
causes lint failures; open
nemoguardrails/library/jailbreak_detection/requirements.txt and add a single
newline character at the end of the file (after the last entry
onnxruntime>=1.24.3) so the file ends with a newline; save and commit the
change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 35bd8e9f-9b1d-410e-a34a-4e98c023a745
📒 Files selected for processing (4)
- nemoguardrails/library/jailbreak_detection/model_based/checks.py
- nemoguardrails/library/jailbreak_detection/model_based/models.py
- nemoguardrails/library/jailbreak_detection/requirements.txt
- tests/test_jailbreak_model_based.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Erick Galinkin <[email protected]>
Signed-off-by: Erick Galinkin <[email protected]>
Signed-off-by: Erick Galinkin <[email protected]>
Signed-off-by: Erick Galinkin <[email protected]>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
…ix requirements.txt
Description
Remove use of `pickle` by migrating to `onnxruntime`.

NB: depends on push of `onnx` model to HuggingFace.

Summary by CodeRabbit