Add vision feature caching to all models#1028
Open
Blaizzy wants to merge 2 commits into pc/continous-batch from
Conversation
Every model's get_input_embeddings now supports vision_cache and
_image_key kwargs. On cache miss, vision features are computed and
stored. On cache hit, the vision tower is skipped entirely.
Benchmarks (per-request, single image):
- gemma4: 244ms → 1ms (228x speedup), 1GB memory saved
- qwen3.5: 157ms → 7ms (23x speedup)
Pattern added to each model:
```python
vision_cache = kwargs.get("vision_cache", None)
cached = kwargs.get("cached_image_features", None)
if cached is None and vision_cache is not None:
    cached = vision_cache.get(kwargs.get("_image_key"))
...
if vision_cache is not None and kwargs.get("_image_key") is not None:
    mx.eval(features)
    vision_cache.put(kwargs["_image_key"], features)
```
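The vision_cache object only needs get and put. A minimal dict-backed sketch of such a cache; the class name, LRU eviction policy, and max_entries parameter are assumptions for illustration, not the PR's actual implementation:

```python
from collections import OrderedDict

class VisionCache:
    """Minimal LRU cache for vision features, keyed by _image_key.

    Sketch only: the real cache class and eviction policy may differ.
    """

    def __init__(self, max_entries=16):
        self.max_entries = max_entries
        self._store = OrderedDict()

    def get(self, key):
        # A miss (or key=None) returns None, so callers fall
        # through to the vision tower as in the pattern above.
        if key is None or key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as recently used
        return self._store[key]

    def put(self, key, features):
        self._store[key] = features
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

With something like this in place, the first request for an image pays for the vision tower, and every later request carrying the same _image_key returns the stored features.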
44 models patched, all syntax-verified and import-tested.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Summary

Adds vision_cache kwarg support to all 44 model get_input_embeddings methods. On cache hit, the vision tower is skipped entirely, saving both time and memory on repeated images (multi-turn conversations, batch requests with shared images).

Based on: pc/continous-batch (continuous batching PR)

How it works

Each model's get_input_embeddings now checks vision_cache.get(_image_key) before calling vision_tower, and calls vision_cache.put() after the first computation. The server passes vision_cache and _image_key as kwargs; models that don't support caching simply ignore the extra kwargs via **kwargs.

Benchmarks (per-request, single image)

- gemma4: 244ms → 1ms (228x speedup), 1GB memory saved
- qwen3.5: 157ms → 7ms (23x speedup)

Models patched (42 + 2 already done)

All 44 models with cached_image_features support. Syntax-verified and import-tested.

Test plan
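One way to exercise the cache-hit path is sketched below with a stand-in model. The fake tower, the call counter, and deriving _image_key as a SHA-256 of the image bytes are all illustrative assumptions, not the PR's actual tests or key scheme:

```python
import hashlib

calls = {"tower": 0}

def fake_vision_tower(image_bytes):
    # Stand-in for the real vision tower; counts invocations.
    calls["tower"] += 1
    return [float(b) for b in image_bytes[:4]]

def get_input_embeddings(image_bytes, **kwargs):
    # Mirrors the per-model pattern: check the cache, else compute and store.
    vision_cache = kwargs.get("vision_cache", None)
    cached = kwargs.get("cached_image_features", None)
    if cached is None and vision_cache is not None:
        cached = vision_cache.get(kwargs.get("_image_key"))
    if cached is not None:
        return cached
    features = fake_vision_tower(image_bytes)
    if vision_cache is not None and kwargs.get("_image_key") is not None:
        vision_cache.put(kwargs["_image_key"], features)
    return features

class DictCache:
    """Smallest object satisfying the get/put interface."""
    def __init__(self):
        self._d = {}
    def get(self, key):
        return self._d.get(key)
    def put(self, key, features):
        self._d[key] = features

image = b"\x01\x02\x03\x04"
key = hashlib.sha256(image).hexdigest()  # one plausible _image_key derivation
vc = DictCache()

first = get_input_embeddings(image, vision_cache=vc, _image_key=key)
second = get_input_embeddings(image, vision_cache=vc, _image_key=key)
# The tower ran once; the second call was served from the cache.
```

Calling get_input_embeddings without the cache kwargs also works, since the extras are only ever read out of **kwargs; the tower then simply runs every time.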
🤖 Generated with Claude Code