| Model | Dimension | Latency | Accuracy | Memory |
|---|---|---|---|---|
| Qwen3 | 1024 | 30-50ms | Highest | 600MB |
| Gemma | 768 | 10-20ms | High | 300MB |
| Gemma | 512 | 8-15ms | Medium | 300MB |
| Gemma | 256 | 5-10ms | Lower | 300MB |
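The Memory column seems to be incorrect. The values 300 and 600 look like model sizes (presumably parameter counts in millions), not memory usage. The dtype is f32, which takes 4 bytes per parameter, so memory usage should be at least 4 times the model size: roughly 1.2 GB for Gemma and 2.4 GB for Qwen3 for the weights alone.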
The dtype is set in the code as `candle_core::DType::F32,`.
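To make the arithmetic concrete, here is a minimal sketch of a weights-only estimate. It assumes that 300 and 600 are parameter counts in millions, which the table does not state explicitly. `ParamDType`, `weight_bytes`, and `bytes_to_gb` are hypothetical helpers written for illustration and are not part of candle. Activations, tokenizer data, and allocator overhead come on top, so the result is only a lower bound.

```rust
/// Hypothetical dtype enum for illustration only (not candle's `DType`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ParamDType {
    F32,
    F16,
    BF16,
}

impl ParamDType {
    /// Number of bytes used to store one parameter of this dtype.
    pub const fn size_in_bytes(self) -> u64 {
        match self {
            ParamDType::F32 => 4,
            ParamDType::F16 | ParamDType::BF16 => 2,
        }
    }
}

/// Lower bound on memory needed to hold the weights alone.
/// Assumption: `num_params` is the total parameter count of the model.
pub fn weight_bytes(num_params: u64, dtype: ParamDType) -> u64 {
    num_params * dtype.size_in_bytes()
}

/// Convert bytes to decimal gigabytes (1 GB = 10^9 bytes).
pub fn bytes_to_gb(bytes: u64) -> f64 {
    bytes as f64 / 1e9
}
```

Under these assumptions, `weight_bytes(300_000_000, ParamDType::F32)` gives 1,200,000,000 bytes (about 1.2 GB) and `weight_bytes(600_000_000, ParamDType::F32)` gives 2,400,000,000 bytes (about 2.4 GB), which is where the "at least 4 times" figure comes from.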