
fix: use OpenAI chat-completion field names in /chat/completions usage #1009

Open
chilang wants to merge 1 commit into Blaizzy:main from chilang:fix/chat-completions-usage-field-names

Conversation

@chilang chilang commented Apr 10, 2026

Summary

UsageStats (used for /v1/chat/completions responses) inherits from OpenAIUsage, which models the OpenAI Responses API (/v1/responses) — that spec uses input_tokens / output_tokens. But /v1/chat/completions is a different spec: the usage object must contain prompt_tokens, completion_tokens, and total_tokens.

The net effect: any OpenAI-compatible client hitting mlx-vlm's /v1/chat/completions fails to read the usage payload because the field names don't match. I hit this reproducing Gemma 4 benchmarks with llama-benchy, which errors out during warmup with Warmup failed: 'prompt_tokens'.
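For illustration, a minimal sketch of the failure mode: a strict chat-completions client indexes the `usage` object with the spec'd field names, which raises `KeyError` when the server emits Responses-API names instead (token counts here are made up):

```python
# Usage payload as mlx-vlm previously emitted it for /v1/chat/completions,
# with Responses-API field names (values are illustrative).
responses_style_usage = {"input_tokens": 12, "output_tokens": 34, "total_tokens": 46}

# A chat-completions client reads the Chat Completions field names directly...
try:
    prompt_tokens = responses_style_usage["prompt_tokens"]
except KeyError as exc:
    # ...which reproduces the llama-benchy error shown above.
    print(f"Warmup failed: {exc}")  # → Warmup failed: 'prompt_tokens'
```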

Fix

  • Stop inheriting UsageStats from OpenAIUsage. Keep OpenAIUsage as-is for the Responses API (/v1/responses) where the spec is correct.
  • Declare the chat-completion fields directly on UsageStats:
    • prompt_tokens: int
    • completion_tokens: int
    • total_tokens: int
    • (existing prompt_tps / generation_tps / peak_memory extras preserved)
  • Update the two call sites in chat_completions_endpoint (streaming SSE chunk + non-streaming final response) to build UsageStats with the new field names.
  • /v1/responses is untouched — it keeps using OpenAIUsage with input_tokens / output_tokens per OpenAI's Responses API spec.
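The shape of the split can be sketched as follows. This is a plain-dataclass approximation, not the actual mlx-vlm models (which may be pydantic classes); the field names come from the lists above, and the extras' types are assumed:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OpenAIUsage:
    # Responses API (/v1/responses) spec: unchanged by this PR.
    input_tokens: int
    output_tokens: int
    total_tokens: int

@dataclass
class UsageStats:
    # Chat Completions (/v1/chat/completions) spec: fields declared
    # directly, no longer inherited from OpenAIUsage.
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    # Existing mlx-vlm extras, preserved alongside the spec'd fields.
    prompt_tps: Optional[float] = None
    generation_tps: Optional[float] = None
    peak_memory: Optional[float] = None
```

With inheritance removed, serializing a `UsageStats` can no longer leak `input_tokens` / `output_tokens` into the chat-completions response body.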

Test plan

  • Added regression test test_chat_completions_response_uses_openai_usage_field_names that mocks generate() and asserts the JSON response body contains usage.prompt_tokens, usage.completion_tokens, usage.total_tokens, and does not contain the Responses-API field names.
  • python -m pytest mlx_vlm/tests/test_server.py — 10 passed.
  • Manually verified with curl against /v1/chat/completions on a locally running server — the usage object now matches the OpenAI Chat Completions spec.
  • Verified with llama-benchy against mlx-community/gemma-4-E4B-it-4bit and mlx-community/gemma-4-26b-a4b-it-4bit — warmup now succeeds and benchmarks complete.
  • black --check and isort --profile=black --check pass on changed files.
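The regression test's assertions can be sketched like this (a standalone approximation; the helper name and payload are hypothetical, and the real test mocks `generate()` and reads the actual response body):

```python
import json

def check_usage_fields(body: str) -> None:
    """Assert a /v1/chat/completions JSON body uses chat-completion usage keys."""
    usage = json.loads(body)["usage"]
    # Spec'd Chat Completions field names must be present...
    for key in ("prompt_tokens", "completion_tokens", "total_tokens"):
        assert key in usage, f"missing {key}"
    # ...and Responses-API field names must be absent.
    for key in ("input_tokens", "output_tokens"):
        assert key not in usage, f"unexpected Responses-API key {key}"

# Example payload shaped like a fixed response (values illustrative).
check_usage_fields(json.dumps(
    {"usage": {"prompt_tokens": 5, "completion_tokens": 7, "total_tokens": 12}}
))
```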

Commit message

`UsageStats` previously inherited from `OpenAIUsage`, which models the
`/v1/responses` endpoint spec (`input_tokens` / `output_tokens`). The
`/v1/chat/completions` endpoint is a different spec and requires
`prompt_tokens` / `completion_tokens` / `total_tokens` in the `usage`
object. OpenAI-compatible clients (tested with llama-benchy) fail to
parse the response because `prompt_tokens` is missing.

Split the two: keep `OpenAIUsage` for the Responses API, and give
`UsageStats` the chat-completion field names directly. Update both
streaming and non-streaming code paths. Add a regression test.