Assistant-focused model suite for command understanding, hierarchical NLU, and runtime-safe execution
Start Here | Model Explorer | Quick Paths | Fair Benchmarks | Current Model List | Add New Model
For first-time visitors, begin with Janus:
- Featured model: JaneGPT-v2-Janus
- Full docs: JaneGPT-v2-Janus/README.md
- Try now: JaneGPT-v2-Janus/examples/demo_runtime.py
If you want a simpler intent-only baseline, compare the two tracks:
| Track | Best For | What You Get | Jump |
|---|---|---|---|
| JaneGPT-v2-Janus | Real assistant runtime flows | Hierarchical (domain, action, slots), clarifications, pending-slot fill, follow-up state | Open Janus |
| JaneGPT-v2 | Fast intent routing baseline | Lightweight 22-intent classifier with simple integration | Open v2 |
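The Janus track's hierarchical output and pending-slot behavior can be sketched roughly as follows. This is a minimal illustration only: the class and function names (`NLUResult`, `next_step`) are hypothetical, and the real runtime API lives in JaneGPT-v2-Janus/runtime/jane_nlu_runtime.py and may differ.

```python
from dataclasses import dataclass, field

# Hypothetical shape of a hierarchical NLU result (domain, action, slots);
# illustrative only, not the repo's actual API.
@dataclass
class NLUResult:
    domain: str                                         # e.g. "home_automation"
    action: str                                         # e.g. "turn_on_lights"
    slots: dict = field(default_factory=dict)           # filled slot values
    pending_slots: list = field(default_factory=list)   # slots still needed

def next_step(result: NLUResult) -> str:
    """Decide what the assistant should do with a parsed turn."""
    if result.pending_slots:
        # Ask a clarification question for the first missing slot;
        # a runtime would hold this state until the user answers.
        return f"clarify:{result.pending_slots[0]}"
    return f"execute:{result.domain}.{result.action}"

# A turn missing its "room" slot triggers a clarification...
partial = NLUResult("home_automation", "turn_on_lights", pending_slots=["room"])
print(next_step(partial))   # clarify:room

# ...and once the slot is filled, the command executes.
partial.slots["room"] = "kitchen"
partial.pending_slots.clear()
print(next_step(partial))   # execute:home_automation.turn_on_lights
```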
Janus quick navigation:
- Docs: JaneGPT-v2-Janus/README.md
- Runtime demo: JaneGPT-v2-Janus/examples/demo_runtime.py
- Inference demo: JaneGPT-v2-Janus/examples/demo_inference.py
- Runtime source: JaneGPT-v2-Janus/runtime/jane_nlu_runtime.py
v2 quick navigation:
- Docs: JaneGPT-v2/README.md
- Basic inference: JaneGPT-v2/examples/basic_inference.py
- Classifier wrapper: JaneGPT-v2/model/classifier.py
| I Want To... | Go Here |
|---|---|
| Understand Janus architecture and runtime behavior | JaneGPT-v2-Janus/README.md |
| Run assistant-style multi-turn behavior | JaneGPT-v2-Janus/examples/demo_runtime.py |
| Run pure model inference only | JaneGPT-v2-Janus/examples/demo_inference.py |
| Use a simpler classifier baseline | JaneGPT-v2/README.md |
| Benchmark classifier performance | JaneGPT-v2/examples/benchmark.py |
| View fair benchmark summary | JaneGPT-v2-Janus/reports/fair_benchmarks.md |
Only schema-aligned or schema-agnostic benchmarks are shown here.
| Fair Test | JaneGPT-v2 | JaneGPT-v2-Janus | Why It Is Fair |
|---|---|---|---|
| Latency (CUDA, batch=1) | 31.60 ms mean, 32 preds/sec | 25.31 ms mean, 34.60 ms p95 | Same local hardware and same benchmark pipeline |
| Runtime reliability suite (82 turns) | - | 67 local commands, 3 Llama routes, 12 clarifications, 0 errors | In-domain assistant behavior with strict pass/fail |
| OOD rejection on BANKING77 | OOD F1: 94.31% | OOD F1: 87.80% | Label-schema independent safety test |
| OOD rejection on CLINC OOS | OOD F1: 89.16% | OOD F1: 79.23% | Label-schema independent safety test |
Jane was trained on assistant commands such as "turn_on_lights" and "set_reminder".
MASSIVE and SNIPS use different label names ("light_on", "alarm_set", etc.), and only ~50% of their labels could be mapped, so cross-dataset accuracy scores would misrepresent Jane's real quality.
Instead, we report OOD safety tests (out-of-domain rejection), which measure how reliably the model refuses requests outside its training domain, regardless of label schema.
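A schema-agnostic OOD rejection test can be sketched as below: reject an input when the model's top-class confidence is low, then score rejection as a binary classification with F1. This is a minimal sketch under stated assumptions: the threshold value and the helper names (`is_ood`, `ood_f1`) are illustrative and are not the repo's actual benchmark code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def is_ood(logits, threshold=0.5):
    # Reject when the model's top-class confidence is below the threshold.
    return max(softmax(logits)) < threshold

def ood_f1(logits_batch, is_ood_labels, threshold=0.5):
    """F1 of OOD rejection, treating 'OOD' as the positive class."""
    preds = [is_ood(l, threshold) for l in logits_batch]
    tp = sum(p and y for p, y in zip(preds, is_ood_labels))
    fp = sum(p and not y for p, y in zip(preds, is_ood_labels))
    fn = sum((not p) and y for p, y in zip(preds, is_ood_labels))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

# Confident in-domain logits are accepted; flat "don't know" logits from
# an off-topic query (e.g. a banking question) are rejected as OOD.
in_domain = [5.0, 0.1, 0.2]     # high max probability -> accepted
off_topic = [0.4, 0.5, 0.45]    # nearly flat -> rejected as OOD
print(ood_f1([in_domain, off_topic], [False, True]))  # 1.0
```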
Understanding These Benchmarks:
| Benchmark | What It Tests | What It Means | Example |
|---|---|---|---|
| Latency | How fast Jane runs per prediction | Speed is critical for real-time assistants. Under 50ms = excellent; over 200ms = noticeable lag | User says "turn on lights" → model responds in ~25ms (Janus) or ~32ms (v2) |
| Runtime Reliability | Can Jane handle 82 multi-turn conversations without crashing? | 0 errors = production-ready; 10+ errors = unstable. Tests real assistant behavior (clarifications, slot filling, state changes) | Turn 1: "Set alarm" → Turn 45: "Change to 3pm" → Turn 82: Still perfect |
| OOD Safety (BANKING77) | Can Jane reject finance questions when trained on home automation? | Tests this model's judgment. ~90% F1 = excellent (rejects what it shouldn't handle). Under 60% = dangerous (would give wrong answers) | User asks "What's my account balance?" → Jane correctly says "I can't help with that" |
| OOD Safety (CLINC) | Can Jane reject random real-world off-topic requests? | Similar to BANKING77 but with diverse random questions. Proves this model knows its limits | User asks "What's the capital of France?" → Jane correctly rejects it |
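The latency methodology above (batch=1, mean and p95 over repeated single predictions) can be reproduced with a simple timing loop like this sketch. The `predict` function here is a stand-in placeholder, not either model's real inference call; plug in the actual model to get comparable numbers.

```python
import time
import statistics

def predict(text):
    """Placeholder for a single model inference call (batch=1)."""
    time.sleep(0.001)  # simulate ~1 ms of model work
    return "turn_on_lights"

def benchmark(texts, warmup=5, runs=100):
    for _ in range(warmup):            # warm-up iterations, excluded from stats
        predict(texts[0])
    times_ms = []
    for i in range(runs):
        start = time.perf_counter()
        predict(texts[i % len(texts)])
        times_ms.append((time.perf_counter() - start) * 1000.0)
    times_ms.sort()
    mean_ms = statistics.mean(times_ms)
    return {
        "mean_ms": mean_ms,
        "p95_ms": times_ms[int(0.95 * len(times_ms)) - 1],
        "preds_per_sec": 1000.0 / mean_ms,
    }

stats = benchmark(["turn on lights", "set alarm for 7am"])
print(stats)
```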
Bottom Line: Jane is SOLID ✅
- Fast enough for real users (25-31ms per prediction)
- Stable enough for production (0 crashes in 82 turns)
- Safe enough to deploy (87-94% OOD rejection F1)
Full detailed report: JaneGPT-v2-Janus/reports/fair_benchmarks.md
| Model | Purpose | Key Highlights | Recommended Entry |
|---|---|---|---|
| JaneGPT-v2-Janus | Hierarchical NLU with runtime state | 7.95M params, domain+action+slot output, clarification/pending-slot runtime | JaneGPT-v2-Janus/README.md |
| JaneGPT-v2 | Intent classification baseline | 7.8M params, 22 intents, fast lightweight classifier | JaneGPT-v2/README.md |
- assets - shared visual assets
- JaneGPT-v2-Janus - hierarchical NLU + runtime package
- JaneGPT-v2 - intent classifier package
Each model folder defines its own license.
Ravindu Senanayake
