
Add Gemma 4 (26B-A4B) LLM and VLM bridge#3148

Draft
yaoyu-33 wants to merge 1 commit into main from yuya/gemma4-vlm-bridge

Conversation

Contributor

@yaoyu-33 yaoyu-33 commented Apr 3, 2026

Summary

  • Adds full bridge, provider, and VLM model wrapper for Google's Gemma 4 MoE architecture (google/gemma-4-26B-A4B)
  • Extends the existing Gemma 3 bridge/VL infrastructure with Gemma 4-specific handling: dual local/global RoPE, MoE expert routing, fused router weights, shared expert pre-norm, and sliding window attention
  • VLM combines HF vision tower (SigLIP-based) + multimodal embedder with Megatron-Core GPT language model

New Files

| File | Description |
| --- | --- |
| `gemma/gemma4_bridge.py` | LLM weight mapping — fused router, shared expert pre-norm, QKV/GatedMLP |
| `gemma/gemma4_provider.py` | Megatron-Core GPT provider — proportional RoPE for global layers, MoE config, logit soft-capping |
| `gemma_vl/gemma4_vl_bridge.py` | VLM bridge — vision tower + embedder mappings with `model.*` prefix |
| `gemma_vl/gemma4_vl_provider.py` | VLM provider extending the Gemma 4 LLM provider |
| `gemma_vl/modeling_gemma4_vl.py` | VLM model — HF vision encoder + Megatron language decoder |
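For readers unfamiliar with this wiring, the VLM wrapper's embedding path (vision features projected into the language hidden size, then scattered into the token-embedding sequence at image-token slots) might look roughly like the sketch below. Class name, dimensions, and interface are illustrative assumptions, not the actual `modeling_gemma4_vl.py` code:

```python
import torch
import torch.nn as nn


class VisionEmbedderSketch(nn.Module):
    # Hypothetical sketch of the vision -> language path: project the
    # vision tower's hidden states into the language model's hidden size,
    # then overwrite the placeholder image-token embeddings with them.
    # vision_dim/lm_dim are illustrative, not the real Gemma 4 config.
    def __init__(self, vision_dim: int = 1152, lm_dim: int = 2560):
        super().__init__()
        self.proj = nn.Linear(vision_dim, lm_dim)

    def forward(self, inputs_embeds, image_features, image_token_mask):
        # inputs_embeds:    [batch, seq, lm_dim] text-token embeddings
        # image_features:   [num_image_tokens, vision_dim] from the tower
        # image_token_mask: [batch, seq] bool, True at image-token slots
        projected = self.proj(image_features)  # -> [num_image_tokens, lm_dim]
        # Fill image-token positions, in order, with the projected features.
        return inputs_embeds.masked_scatter(
            image_token_mask.unsqueeze(-1), projected
        )
```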

Validation Results (single-GPU, bf16)

| Metric | Value |
| --- | --- |
| Text-only cosine similarity | 0.9998 |
| VLM cosine similarity (causal) | 0.9977 |
| VLM cosine similarity (bidirectional) | 0.9966 |
| Top-1 token | Matches HF |
| Image understanding | ✅ ("Red square" correctly identified) |
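The parity metrics above compare final logits between the HF and Megatron models. A minimal version of such a check (function names are ours, not the repo's) could be:

```python
import torch
import torch.nn.functional as F


def logit_cosine_similarity(hf_logits: torch.Tensor,
                            mcore_logits: torch.Tensor) -> float:
    # Mean cosine similarity over sequence positions between two
    # [seq_len, vocab] logit tensors -- the kind of number reported
    # in the validation table.
    sims = F.cosine_similarity(hf_logits.float(), mcore_logits.float(), dim=-1)
    return sims.mean().item()


def same_top1(hf_logits: torch.Tensor, mcore_logits: torch.Tensor) -> bool:
    # True when both models pick the same argmax token at every position.
    return bool((hf_logits.argmax(dim=-1) == mcore_logits.argmax(dim=-1)).all())
```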

Key Design Decisions

  • Proportional RoPE: Gemma 4 global layers use inv_freq = 1/(base^(arange/head_dim)) with head_dim=512 (not the standard dim=128). Fixed via Gemma4RotaryEmbedding override.
  • Causal-only attention: Currently uses Megatron's default causal masking. HF applies bidirectional attention among image tokens when mm_token_type_ids is provided, which accounts for the gap between the causal (0.9977) and bidirectional (0.9966) comparisons.
  • Vision pipeline: vision_tower.forward() returns last_hidden_state (already pooled + standardized), then embed_vision projects to language hidden dim — matches HF's Gemma4Model.get_image_features.
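The proportional RoPE decision above can be sketched as follows. This is a hedged illustration of the stated formula, not the actual `Gemma4RotaryEmbedding` override; `base` is an assumed default and the real value lives in the checkpoint config:

```python
import torch


def proportional_inv_freq(base: float = 10_000.0,
                          head_dim: int = 512) -> torch.Tensor:
    # Inverse frequencies for the "proportional" RoPE described above:
    # the exponent is scaled by head_dim (512) rather than the usual
    # rotary dim (128), i.e. inv_freq = 1 / base^(arange / head_dim).
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    return 1.0 / (base ** exponents)
```

Scaling the exponent by 512 instead of 128 compresses the frequency spectrum, which is why reusing the standard embedding class produced mismatched rotations on global layers.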

Remaining Work

  • Bidirectional attention for image tokens: Implement an mm_token_type_ids-based attention mask so image tokens can attend to each other, matching HF's bidirectional behavior and closing the 0.9966 vs. 0.9977 gap
  • Unit tests: Add bridge parity tests in tests/unit_tests/models/gemma4/
  • Functional tests: Multi-GPU TP/PP validation
  • Recipes: Add pretrain/SFT recipe configs
  • Requires transformers >= 5.6.0.dev0 (Gemma4 not yet in stable release)
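The first remaining-work item (the mm_token_type_ids-based mask) could look roughly like this sketch. It assumes image tokens carry type id 1, and returns a boolean mask where True means "may attend"; Megatron's own (often inverted) mask convention would still need adapting:

```python
import torch


def build_vlm_attention_mask(mm_token_type_ids: torch.Tensor) -> torch.Tensor:
    # Causal mask everywhere, plus full bidirectional attention among
    # image tokens (type id 1). Input: [seq] token type ids.
    # Output: [seq, seq] bool, True = "query may attend to key".
    seq_len = mm_token_type_ids.shape[-1]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    is_image = mm_token_type_ids.bool()
    # Pairs where both query and key are image tokens may always attend.
    image_pairs = is_image.unsqueeze(-1) & is_image.unsqueeze(-2)
    return causal | image_pairs
```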

Test plan

  • Single-GPU logit parity test (HF vs Megatron)
  • VLM inference with image input
  • Unit tests for bridge weight mapping
  • Multi-GPU TP/PP/VPP tests
  • SFT training convergence test

🤖 Generated with Claude Code

Add bridge, provider, and VLM model wrapper for Google's Gemma 4
MoE architecture (gemma-4-26B-A4B).

Key components:
- gemma4_bridge.py: Weight mapping with fused router weights,
  shared expert pre-norm fusion, and QKV/GatedMLP mappings
- gemma4_provider.py: Megatron-Core GPT provider with dual
  local/global RoPE (proportional formula for global layers),
  MoE expert routing, sliding window attention, and logit
  soft-capping
- gemma4_vl_bridge.py: VLM bridge with vision tower + embedder
  weight mappings (model.* prefix for raw safetensors keys)
- gemma4_vl_provider.py: VLM provider extending Gemma4 LLM
- modeling_gemma4_vl.py: VLM model combining HF vision tower
  with Megatron language model

Validation (single-GPU, bf16):
- Text-only cosine similarity: 0.9998
- VLM cosine similarity: 0.9977 (causal-only, apples-to-apples)
- Same top-1 token prediction as HF
- Correct image understanding ("Red square" from test image)

Known limitations:
- Bidirectional attention for image tokens (mm_token_type_ids)
  not yet implemented — uses causal-only masking
- Requires transformers >= 5.6.0.dev0 for Gemma4 support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot bot commented Apr 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
