Pull requests: Blaizzy/mlx-vlm
#1030  Add KV cache quantization for continuous batching · opened Apr 17, 2026 by Blaizzy (Owner) · 4 of 5 tasks
#1029  Add DFlash speculative decoding (single + batch + server) · opened Apr 16, 2026 by Blaizzy (Owner) · 7 tasks done
#1028  Add vision feature caching to all models · opened Apr 16, 2026 by Blaizzy (Owner) · 6 tasks done
#1026  Security: Remote image fetch lacks timeout and payload size limits (DoS risk) · opened Apr 16, 2026 by tomaioo
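#1026 flags unbounded remote image fetches. A minimal sketch of the usual mitigation, bounding both connection time and payload size; the helper name and the specific limits here are hypothetical, not mlx-vlm's actual code:

```python
import requests

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # hypothetical 10 MB payload cap
FETCH_TIMEOUT = (5, 15)             # hypothetical (connect, read) timeouts, seconds

def fetch_image_bytes(url: str) -> bytes:
    """Fetch a remote image with a timeout and a hard payload size cap."""
    with requests.get(url, timeout=FETCH_TIMEOUT, stream=True) as resp:
        resp.raise_for_status()
        buf = bytearray()
        # Stream in chunks so the cap is enforced before the whole body lands in memory.
        for chunk in resp.iter_content(chunk_size=64 * 1024):
            buf.extend(chunk)
            if len(buf) > MAX_IMAGE_BYTES:
                raise ValueError(f"Remote image exceeds {MAX_IMAGE_BYTES} bytes")
        return bytes(buf)
```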
#1023  Expose presence_penalty, frequency_penalty, and per-penalty context_size on the server API · opened Apr 14, 2026 by esaruoho
#1019  refactor: improve model loading and resource handling in utils.py · opened Apr 13, 2026 by SyedaAnshrahGillani
#1014  server: indicate finish reason properly when the model makes a tool call · opened Apr 12, 2026 by viktike (Contributor)
#1013  Resolve no-images crash in qwen3_vl and qwen3_vl_moe generate call · opened Apr 11, 2026 by urimem
#1012  perf: close 5.5% decode gap vs mlx_lm.server on streaming chat endpoint · opened Apr 11, 2026 by chilang
#1009  fix: use OpenAI chat-completion field names in /chat/completions usage · opened Apr 10, 2026 by chilang
#1006  fix: replace NaN from all-masked SDPA padding rows in Gemma 4 vision · opened Apr 10, 2026 by fabiopili · 4 tasks done
#996   feat: OpenAI Responses API with structured tool calling and multi-turn support · opened Apr 9, 2026 by eloe
#995   feat: prompt prefix caching with TTL eviction and TurboQuant support · opened Apr 9, 2026 by eloe
#990   fix: return finish_reason=tool_calls when tool calls are detected · opened Apr 9, 2026 by eloe
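Two entries above (#1014 and #990) touch the same OpenAI chat-completions convention: when the model emits tool calls, the response's finish_reason should be "tool_calls" rather than "stop". A minimal sketch of that mapping, assuming the caller already knows whether tool calls were detected and whether the token limit was hit (the function and parameter names are illustrative, not mlx-vlm's actual code):

```python
def resolve_finish_reason(tool_calls: list | None, hit_token_limit: bool) -> str:
    """Map a generation outcome to an OpenAI-style finish_reason string."""
    if tool_calls:        # the model produced one or more tool calls
        return "tool_calls"
    if hit_token_limit:   # generation stopped at max_tokens
        return "length"
    return "stop"         # natural end of generation
```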