
fix: apply post_layernorm in CLIPVisionTransformer forward#3456

Open
gururajkosuru wants to merge 1 commit into mlc-ai:main from gururajkosuru:fix/clip-post-layernorm

Conversation

@gururajkosuru
Contributor

The post_layernorm was defined in CLIPVisionTransformer.__init__ but never applied in the forward method. This fix applies it to the final encoder hidden state, replacing that element in the returned tuple. This matches the HuggingFace CLIPVisionTransformer behavior, where last_hidden_state is post-normalized.

The [-2] indexing in CLIPVisionModel (used by LLaVA and Phi3V to get penultimate layer features) continues to work correctly since only the last element of the tuple is replaced.

Fixes #3205
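The shape of the fix can be sketched in plain NumPy. This is an illustrative stand-in, not the actual mlc_llm code: the layer_norm helper, the forward function, and the tensor shapes are all assumptions made for the example; the real model uses its nn module counterparts.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Plain NumPy stand-in for the model's post_layernorm: normalize
    # over the last (hidden) dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def forward(hidden_states):
    # hidden_states: tuple of per-layer activations from the encoder.
    # Before the fix the tuple was returned unchanged; the fix replaces
    # only the final element with its post-layernormed version.
    *earlier, last = hidden_states
    return (*earlier, layer_norm(last))

layers = tuple(np.random.rand(1, 4, 8) for _ in range(3))
out = forward(layers)

# Penultimate layer is untouched, so [-2] indexing still works.
assert np.array_equal(out[-2], layers[-2])
# The last hidden state is now normalized (zero mean per token).
assert np.allclose(out[-1].mean(axis=-1), 0.0, atol=1e-6)
```

Because only the last tuple element is rewritten, every earlier index keeps its pre-fix value, which is what preserves downstream consumers.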


Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a bug in the CLIPVisionTransformer where the post_layernorm was defined but not correctly applied in the forward method. The fix ensures that the final encoder hidden state is properly normalized, bringing the model's output in line with the expected behavior of the HuggingFace implementation. This correction addresses an inconsistency while maintaining compatibility with existing usages that access intermediate hidden states.

Highlights

  • CLIPVisionTransformer Forward Method: Applied the post_layernorm to the final encoder hidden state within the forward method of CLIPVisionTransformer. This change aligns the model's behavior with the HuggingFace CLIPVisionTransformer implementation, ensuring the last_hidden_state is correctly post-normalized.
  • Compatibility: Ensured that the modification preserves compatibility for downstream models like LLaVA and Phi3V, which rely on [-2] indexing to retrieve penultimate layer features from CLIPVisionModel.
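The compatibility point in the highlights can be made concrete with a small sketch of the LLaVA/Phi3V-style feature selection. The names here (layer_norm, the synthetic hidden-states tuple, and the shapes) are assumptions for illustration only, not the projects' actual code:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Stand-in for post_layernorm: zero-mean/unit-variance over hidden dim.
    mean = x.mean(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

# Hidden states as returned after the fix: only the last element normalized.
raw = [np.random.rand(2, 5, 16) for _ in range(4)]
hidden_states = (*raw[:-1], layer_norm(raw[-1]))

# Penultimate-layer feature selection, as LLaVA and Phi3V do via [-2]:
features = hidden_states[-2]

# The selected features are byte-for-byte the pre-fix values.
assert np.array_equal(features, raw[-2])
```

The [-2] slot was never normalized before the fix and still is not afterward, so downstream projection layers trained on unnormalized penultimate features see identical inputs.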


Changelog
  • python/mlc_llm/model/vision/clip_vision.py
    • Applied the post_layernorm to the final encoder hidden state.
    • Modified the return tuple of the forward method to include the post-normalized last hidden state.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in CLIPVisionTransformer where the post_layernorm layer was defined but not applied in the forward method. The fix applies this layer to the final hidden state from the encoder, which aligns the model's behavior with the reference implementation from HuggingFace. The change modifies the last element of the returned tuple of hidden states, ensuring that indexing for intermediate layers in downstream models remains correct. The implementation is correct and resolves the issue described.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Missing post layernorm in CLIP model

1 participant