[Bugfix] Fix assertion error in flashmla backend with fullgraph enabled#33496
Kurumi5210 wants to merge 1 commit into vllm-project:main
Conversation
Signed-off-by: Kurumi5210 <[email protected]>
Code Review
This pull request addresses an assertion error that occurs when using the `flashmla` backend with `fullgraph` enabled. The issue stemmed from an inconsistency in `_dummy_run`, where `_build_attention_metadata` was called with an unpadded token count (`num_tokens_unpadded`) while other parameters were padded when `pad_attn` was true. This mismatch in padding led to the failure.
The proposed change corrects this by conditionally passing `num_tokens_padded` when `pad_attn` is true, ensuring all arguments to `_build_attention_metadata` are consistently padded. This is a direct and effective fix for the bug. The change is correct and well-contained.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. You can ask your reviewers to trigger select CI tests. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Purpose
Fix an assertion error when using the `flashmla` backend with `fullgraph` enabled.

Previously, `_build_attention_metadata` was called with `num_tokens=num_tokens_unpadded`. When `pad_attn=True` (required by `fullgraph` + `flashmla`), this leads to an inconsistency between `num_tokens` and the padded attention-related metadata, triggering an assertion failure inside the attention backend.

This PR fixes the issue by passing `num_tokens_padded` when full cudagraph is enabled, ensuring consistency between `num_tokens` and the padded attention metadata.

Related to #33384
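A minimal sketch of the padding mismatch and the fix. The names here (`build_attention_metadata`, `dummy_run`, `pad_attn`) are simplified stand-ins modeled on the discussion above, not the actual vLLM signatures:

```python
def build_attention_metadata(num_tokens: int, metadata_len: int) -> dict:
    # The attention backend asserts that the token count it receives
    # matches the (possibly padded) length of its other metadata.
    assert num_tokens == metadata_len, (
        f"num_tokens={num_tokens} != metadata length {metadata_len}"
    )
    return {"num_tokens": num_tokens}


def dummy_run(num_tokens_unpadded: int, num_tokens_padded: int,
              pad_attn: bool) -> dict:
    # When pad_attn is set, the attention metadata is built at the
    # padded size.
    metadata_len = num_tokens_padded if pad_attn else num_tokens_unpadded
    # Fix: pick the padded token count when pad_attn is true, so
    # num_tokens stays consistent with the padded metadata. Before the
    # fix, num_tokens_unpadded was passed unconditionally, tripping the
    # assertion whenever pad_attn padded the rest of the metadata.
    num_tokens = num_tokens_padded if pad_attn else num_tokens_unpadded
    return build_attention_metadata(num_tokens, metadata_len)
```

With the unconditional `num_tokens_unpadded` behavior, any run where `pad_attn=True` and the padded size differs from the unpadded size would hit the assertion; the conditional selection above removes that mismatch.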
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.