Question: How much memory does the model need?

Copied from https://github.com/vllm-project/semantic-router/blob/main/website/docs/tutorials/intelligent-route/embedding-routing.md#performance-characteristics

| Model | Dimension | Latency | Accuracy | Memory |
|-------|-----------|---------|----------|--------|
| Qwen3 | 1024 | 30-50ms | Highest | 600MB |
| Gemma | 768 | 10-20ms | High | 300MB |
| Gemma | 512 | 8-15ms | Medium | 300MB |
| Gemma | 256 | 5-10ms | Lower | 300MB |


The `Memory` column seems to be incorrect. 300 and 600 look like model sizes (presumably parameter counts in millions), not memory usage. And since `dtype` is f32 (4 bytes per parameter), the actual memory should be at least 4 times the model size.

https://github.com/vllm-project/semantic-router/blob/912fe2adaebc87a615cccd584125b1b7e99da7fc/candle-binding/src/model_architectures/model_factory.rs#L180
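
To make the arithmetic concrete, here is a rough sketch of the estimate I have in mind. It assumes the 300 and 600 figures are parameter counts in millions (my reading, not something the docs state), and it only counts the weights. Activations, tokenizer data, and runtime overhead come on top, so these are lower bounds. The function name and the dictionary are made up for illustration.

```python
# Bytes needed to store one parameter for a few common dtypes.
BYTES_PER_DTYPE = {"f32": 4, "f16": 2, "bf16": 2}


def weight_memory_mb(num_params: int, dtype: str = "f32") -> float:
    """Lower bound on memory for the weights alone, in MB (10^6 bytes)."""
    return num_params * BYTES_PER_DTYPE[dtype] / 1e6


# ASSUMPTION: parameter counts inferred from the "300MB" / "600MB" table entries.
ASSUMED_PARAMS = {"Qwen3": 600_000_000, "Gemma": 300_000_000}

for name, num_params in ASSUMED_PARAMS.items():
    print(f"{name}: at least {weight_memory_mb(num_params):.0f} MB in f32")
```

Under those assumptions the f32 weights alone would need roughly 2400 MB for Qwen3 and 1200 MB for Gemma, which is why the table values look too low to me.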
