Ravindu-S/JaneGPT-v2


JaneGPT Core

Assistant-focused model suite for command understanding, hierarchical NLU, and runtime-safe execution



Start Here | Model Explorer | Quick Paths | Fair Benchmarks | Current Model List | Add New Model


Model Lineup

| Model | Capability |
| --- | --- |
| JaneGPT-v2-Janus | Domain + Action + Slots + Stateful Follow-ups |
| JaneGPT-v2 | Fast 22-intent routing model |


Start Here

For first-time visitors, begin with Janus: `JaneGPT-v2-Janus/README.md`

If you want a simpler intent-only baseline: `JaneGPT-v2/README.md`


Model Explorer

| Track | Best For | What You Get | Jump |
| --- | --- | --- | --- |
| JaneGPT-v2-Janus | Real assistant runtime flows | Hierarchical (domain, action, slots), clarifications, pending-slot fill, follow-up state | Open Janus |
| JaneGPT-v2 | Fast intent routing baseline | Lightweight 22-intent classifier with simple integration | Open v2 |
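The Janus runtime behaviors listed above (clarifications, pending-slot fill, follow-up state) can be sketched as a small state machine: when a parsed command is missing a required slot, the runtime records what it is waiting for and asks a follow-up question. Everything below is a hypothetical illustration, not the actual Janus API; `REQUIRED_SLOTS`, `DialogueState`, and `handle_turn` are invented names:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical required-slot schema; the real Janus schema lives in the repo.
REQUIRED_SLOTS = {("home", "set_alarm"): ["time"]}

@dataclass
class DialogueState:
    pending: Optional[tuple] = None       # (domain, action) awaiting a slot fill
    slots: dict = field(default_factory=dict)

def handle_turn(state: DialogueState, domain: str, action: str, slots: dict) -> str:
    """Merge newly extracted slots into state; clarify if a required slot is missing."""
    state.slots.update(slots)
    missing = [s for s in REQUIRED_SLOTS.get((domain, action), [])
               if s not in state.slots]
    if missing:
        state.pending = (domain, action)  # remember what we are waiting on
        return f"clarify:{missing[0]}"    # runtime asks a follow-up question
    state.pending = None
    return f"execute:{domain}.{action}({state.slots})"
```

Turn 1 ("set an alarm", no time given) yields a clarification request; turn 2 supplies the `time` slot, and the now-complete command executes.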

Quick Paths

| I Want To... | Go Here |
| --- | --- |
| Understand Janus architecture and runtime behavior | `JaneGPT-v2-Janus/README.md` |
| Run assistant-style multi-turn behavior | `JaneGPT-v2-Janus/examples/demo_runtime.py` |
| Run pure model inference only | `JaneGPT-v2-Janus/examples/demo_inference.py` |
| Use a simpler classifier baseline | `JaneGPT-v2/README.md` |
| Benchmark classifier performance | `JaneGPT-v2/examples/benchmark.py` |
| View fair benchmark summary | `JaneGPT-v2-Janus/reports/fair_benchmarks.md` |
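"Run pure model inference only" ultimately means mapping an utterance to one of the 22 intents. The snippet below sketches just the final routing step: softmax over classifier logits, then a confidence-threshold rejection. This is a generic illustration, not the demo script's actual code; `route_intent` and the threshold rule are assumptions:

```python
import math

def route_intent(logits, labels, threshold=0.5):
    """Softmax the classifier logits and return (intent, confidence).

    Falls back to 'out_of_domain' when the top probability is below the
    confidence threshold (a hypothetical rejection rule)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "out_of_domain", probs[best]
    return labels[best], probs[best]
```

For example, `route_intent([4.0, 0.1, 0.2], ["turn_on_lights", "set_reminder", "play_music"])` routes confidently to `turn_on_lights`, while a flat logit vector is rejected as out-of-domain.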

Fair Benchmarks (Apr 2026)

**Janus Runtime:** 82 turns, 0 errors · **Janus Predict Mean:** 25.31 ms · **v2 Predict Mean:** 31.60 ms

Only schema-aligned or schema-agnostic benchmarks are shown here.

| Fair Test | JaneGPT-v2 | JaneGPT-v2-Janus | Why It Is Fair |
| --- | --- | --- | --- |
| Latency (CUDA, batch=1) | 31.60 ms mean, 32 preds/sec | 25.31 ms mean, 34.60 ms p95 | Same local hardware and same benchmark pipeline |
| Runtime reliability suite (82 turns) | - | 67 local commands, 3 Llama routes, 12 clarifications, 0 errors | In-domain assistant behavior with strict pass/fail |
| OOD rejection on BANKING77 | OOD F1: 94.31% | OOD F1: 87.80% | Label-schema-independent safety test |
| OOD rejection on CLINC OOS | OOD F1: 89.16% | OOD F1: 79.23% | Label-schema-independent safety test |

⚠️ **Why we exclude MASSIVE/SNIPS from headlines:**
Jane was trained on assistant commands like `turn_on_lights` and `set_reminder`.
MASSIVE/SNIPS use different command names (`light_on`, `alarm_set`, etc.).
Only ~50% of their labels could be mapped, so accuracy scores on those suites would misrepresent Jane's real quality.
Instead, we report OOD safety tests (out-of-domain rejection), which measure the model's ability to recognize and refuse requests outside its domain regardless of label schema.
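The OOD F1 figures above treat "reject" as the positive class: precision is the fraction of rejections that were genuinely out-of-domain, and recall is the fraction of out-of-domain inputs that got rejected. A minimal, model-independent sketch of that metric (assuming per-example boolean labels):

```python
def ood_f1(predicted_reject, actually_ood):
    """F1 score with 'reject' as the positive class."""
    tp = sum(p and a for p, a in zip(predicted_reject, actually_ood))
    fp = sum(p and not a for p, a in zip(predicted_reject, actually_ood))
    fn = sum(a and not p for p, a in zip(predicted_reject, actually_ood))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)   # rejections that were truly OOD
    recall = tp / (tp + fn)      # OOD inputs that were rejected
    return 2 * precision * recall / (precision + recall)
```

A high score means the model both refuses what it should (high recall) and rarely refuses in-domain commands by mistake (high precision).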


Understanding These Benchmarks:

| Benchmark | What It Tests | What It Means | Example |
| --- | --- | --- | --- |
| Latency | How fast Jane runs per prediction | Speed is critical for real-time assistants. Under 50 ms = excellent; over 200 ms = noticeable lag | User says "turn on lights" → model responds in ~25 ms (Janus) or ~32 ms (v2) |
| Runtime Reliability | Can Jane handle 82 multi-turn conversations without crashing? | 0 errors = production-ready; 10+ errors = unstable. Tests real assistant behavior (clarifications, slot filling, state changes) | Turn 1: "Set alarm" → Turn 45: "Change to 3pm" → Turn 82: still correct |
| OOD Safety (BANKING77) | Can Jane reject finance questions when trained on home automation? | Tests the model's judgment. ~90% F1 = excellent (rejects what it shouldn't handle). Under 60% = dangerous (would give wrong answers) | User asks "What's my account balance?" → Jane correctly says "I can't help with that" |
| OOD Safety (CLINC) | Can Jane reject diverse off-topic real-world requests? | Similar to BANKING77 but with varied random questions. Shows the model knows its limits | User asks "What's the capital of France?" → Jane correctly rejects it |
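Mean and p95 latency numbers like those in the table above come from a simple timing loop over single predictions. Below is a generic sketch of such a harness, not the repo's `benchmark.py`; `benchmark` and its warmup count are assumptions:

```python
import time
import statistics

def benchmark(predict, inputs, warmup=5):
    """Time batch=1 prediction latency; return (mean_ms, p95_ms)."""
    for x in inputs[:warmup]:
        predict(x)                        # warm up caches / lazy initialization
    times_ms = []
    for x in inputs:
        t0 = time.perf_counter()
        predict(x)
        times_ms.append((time.perf_counter() - t0) * 1000)
    times_ms.sort()
    mean = statistics.mean(times_ms)
    p95 = times_ms[int(0.95 * (len(times_ms) - 1))]
    return mean, p95
```

For CUDA models, a device synchronization (e.g. `torch.cuda.synchronize()`) is needed before reading the timer; otherwise the loop measures only kernel launch, not the actual prediction.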

Bottom Line: Jane is SOLID

  • Fast enough for real users (25-31ms per prediction)
  • Stable enough for production (0 crashes in 82 turns)
  • Safe enough to deploy (87-94% OOD rejection accuracy)

Full detailed report: `JaneGPT-v2-Janus/reports/fair_benchmarks.md`


Current Model List

| Model | Purpose | Key Highlights | Recommended Entry |
| --- | --- | --- | --- |
| JaneGPT-v2-Janus | Hierarchical NLU with runtime state | 7.95M params, domain+action+slot output, clarification/pending-slot runtime | `JaneGPT-v2-Janus/README.md` |
| JaneGPT-v2 | Intent classification baseline | 7.8M params, 22 intents, fast lightweight classifier | `JaneGPT-v2/README.md` |

Repo Layout


License

Each model folder defines its own license:


Author

Ravindu Senanayake
