fix: replace NaN from all-masked SDPA padding rows in Gemma 4 vision by fabiopili · Pull Request #1006 · Blaizzy/mlx-vlm

fabiopili · 2026-04-10T08:33:28Z

I've been having problems with Gemma 4 models that I quantized locally and enabled the full 1120 vision token budget when serving from oMLX and accessing from Open WebUI. The model would respond an endless loop of the <pad> token.

A code review is recommended before merging, even though the impact seems minimal. I'm not a proficient Python developer and this change was created with Claude Code assistance for the bug identification. That said, I've been using it locally without problems and tests pass.

Summary

Replace NaN values produced by softmax over all-masked padding rows in Gemma 4 vision attention with 0, preventing NaN propagation through residual connections and layer norms
The existing ensure_fused_sdpa workaround pads head_dim to force the fused SDPA kernel, but does not help at sequence lengths (e.g. L=10080 for 1120-token budget) where the non-fused fallback is used
Padding rows are zeroed out by VisionPooler before pooling anyway, so 0 is the correct replacement value

Problem

MLX's scaled_dot_product_attention computes softmax(QK^T) V. When an entire row of the attention mask is -inf (padding tokens that should attend to nothing), softmax produces 0/0 = NaN. This NaN then propagates through:

The output projection (o_proj)
Residual additions in each of the 24 transformer layers
RMSNorm (any NaN input produces NaN output)
The MLP branch
All subsequent layers

By the time the VisionPooler zeros out padding positions, the NaN has already corrupted non-padding rows via residual connections.

Fix

A single line after the SDPA call in VisionAttention.__call__:

attn_output = mx.where(mx.isnan(attn_output), 0.0, attn_output)

This seems safe because:

Only padding rows produce NaN (real content rows have valid attention targets)
VisionPooler explicitly zeros padding positions before pooling (vision.py:364-367)
The replacement runs inside @mx.compile scope, so the isnan check fuses into the existing compute graph with minimal overhead

Changed files

mlx_vlm/models/gemma4/vision.py (+5 lines)

Test plan

All 395 unit tests pass (0 failures, 1 skipped, 2 unrelated errors)
End-to-end inference with Gemma 4 on single image (fused SDPA path)
End-to-end inference with Gemma 4 on multiple different-sized images (non-fused path, L=10080)
Verify no NaN in output embeddings at both sequence lengths

MLX's scaled_dot_product_attention produces NaN for fully-masked rows (softmax over all -inf = 0/0) at most sequence lengths. The existing ensure_fused_sdpa workaround only helps when the fused kernel activates (e.g. L=5040) but fails at other sizes like L=10080 (1120-token budget). Replace NaN with 0 after SDPA — safe because the pooler already zeros out padding rows before pooling.

…Blaizzy#924) Two upstream-PR fixes applied to local copy: vision.py: replace NaN with 0 after SDPA in VisionAttention. All-masked padding rows produce NaN via softmax (0/0) at sequence lengths where the non-fused SDPA fallback runs. The pooler zeros these rows anyway, so 0 is the correct replacement. Adapted from Blaizzy#1006 by Fabio Pili. processing_gemma4.py + chat_template.jinja: bundle the official Google Gemma 4 chat template as a fallback. Many mlx-community Gemma 4 models don't ship chat_template in tokenizer_config. Without this, the prompt falls back to plain text without turn markers and the model produces "No text generated". Adapted from Blaizzy#924 by jrp2014. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: replace NaN from all-masked SDPA padding rows in Gemma 4 vision#1006

fix: replace NaN from all-masked SDPA padding rows in Gemma 4 vision#1006
fabiopili wants to merge 1 commit intoBlaizzy:mainfrom
fabiopili:gemma4-nan-fix

fabiopili commented Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

fabiopili commented Apr 10, 2026

Summary

Problem

Fix

Changed files

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant