[CodeGen]Fix Attention GPU pipeline for shapes not requiring padding #23315
Conversation
…Size, and numValueVectors to ensure tile sizes don't exceed the actual K1/N dimensions. Only set partial_reduction when needed: Skip setting the partial_reduction attribute when all tile sizes are ≤1, since padding is never needed for tile size 1. This allows small attention shapes to compile successfully on GPU without requiring the unimplemented padding-without-mask functionality. Assisted-by: Claude Signed-off-by: Keshav Vinayak Jha <[email protected]>
dimensions to avoid generating tile sizes larger than the problem. Assisted-by: Claude Signed-off-by: Keshav Vinayak Jha <[email protected]>
Add isPaddingNeeded() check before attempting to pad OnlineAttention ops. When all dimensions are multiples of their tile sizes, return early to skip unnecessary padding. This fixes compilation failures for small attention shapes that don't actually require padding. Assisted-by: Claude Signed-off-by: Keshav Vinayak Jha <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
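For readers skimming the commits, here is a minimal sketch of the multiple-of check they describe, written against plain integer vectors rather than the MLIR types the real TensorMaskingOpInterface.cpp works with; the function name matches the PR, but the exact signature and the tile-size 0/1 shortcut shown here are illustrative assumptions.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: the real interface operates on MLIR iteration ranges and
// lowering configs; here the iteration domain and pad multiples are plain
// static sizes so the logic is easy to follow.
bool isPaddingNeeded(const std::vector<int64_t> &iterationDomain,
                     const std::vector<int64_t> &padMultiples) {
  // Mismatched sizes: conservatively assume padding is needed so that later
  // validation stages emit the proper diagnostics (the behavior the PR
  // settles on, discussed further down in this review).
  if (iterationDomain.size() < padMultiples.size())
    return true;

  size_t numDims = std::min(iterationDomain.size(), padMultiples.size());
  for (size_t i = 0; i < numDims; ++i) {
    int64_t multiple = padMultiples[i];
    // A tile size of 0 (dimension not tiled) or 1 never requires padding.
    if (multiple <= 1)
      continue;
    // Any dimension that is not an exact multiple of its tile size would
    // have to be padded up to the next multiple.
    if (iterationDomain[i] % multiple != 0)
      return true;
  }
  // Every dimension is already aligned to its tile size; skip padding.
  return false;
}

With a check along these lines, a small shape such as 1×3×4 whose tile sizes are all ≤1 falls through to the aligned path and never reaches the unimplemented padding-without-mask code.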
hanhanW left a comment
@keshavvinayak01 Please always add tests unless it is intended. We don't rely on local testing to claim a fix. If a test is not added, please explain why in PR description. I'd expect a lit test somewhere.
btw, if this does fix the issue, can you enable the test? You can move the file from exclude_list to the first list and run bazel_to_cmake.py script.
iree/tests/e2e/linalg_ext_ops/BUILD.bazel
Lines 112 to 131 in 8c014e5
ROCM_HIP_SRCS = enforce_glob(
    # keep sorted
    [
        "arg_compare.mlir",
        "gather.mlir",
        "map_gather.mlir",
        "map_scatter.mlir",
        "scan.mlir",
        "scatter.mlir",
        "sort.mlir",
        "winograd_input.mlir",
        "winograd_output.mlir",
    ],
    include = ["*.mlir"],
    exclude = [
        "top-k.mlir",
        "attention.mlir",
        "attention_i1_mask.mlir",
    ],
)
compiler/src/iree/compiler/Codegen/Interfaces/TensorMaskingOpInterface.cpp
// Mismatched sizes.
if (iterationDomain.size() < padMultiples.size()) {
  return true;
}

size_t numDims = std::min(iterationDomain.size(), padMultiples.size());
Mismatched sizes look invalid to me. Do we need these checks? Or can they be an assertion?
If we return a failure here, then we won't generate the expected diagnostics for
Doesn't it mean that the preset lowering config is wrong? There are five dimensions, and you only set four of them. If you don't want to tile the last dimension, just make it zero. Am I missing something?
I don't think we need to add any other lit tests. The file already contains the cases I'm handling.
Signed-off-by: Keshav Vinayak Jha <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
// Mismatched sizes. Return true (assume padding is needed) to allow later
// validation stages to emit proper diagnostics.
if (iterationDomain.size() < padMultiples.size()) {
This seems like it should be an assert or an error. If the iteration domain size is not the same as the pad multiples list, then this analysis is tricky.
In that case, all the lit tests in https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_apply_padding_online_attention.mlir#L1C1-L136C2 will fail, since the iteration domain size in the affine_map is 5 but the padMultiples / lowering_config length is 4. That's wrong if I'm not mistaken, and it is what led me to keep this a soft failure instead of a compilation error.
Which file?
I'd ask which pass. Is the logic exercised by the corresponding test? If so, you can explicitly mention it in your PR description.
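For contrast, the assertion-style alternative suggested in this thread might look roughly like the sketch below. This is illustrative only, not code from the PR, which instead keeps the soft return true so later validation can emit diagnostics.

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stricter variant: a lowering config whose pad-multiple list
// does not cover every iteration dimension is treated as a programmer error
// rather than silently assumed to need padding.
bool isPaddingNeededStrict(const std::vector<int64_t> &iterationDomain,
                           const std::vector<int64_t> &padMultiples) {
  assert(iterationDomain.size() == padMultiples.size() &&
         "pad multiples must cover every iteration dimension");
  for (size_t i = 0; i < iterationDomain.size(); ++i) {
    // Tile size 0 (untiled) or 1 never needs padding for that dimension.
    if (padMultiples[i] > 1 && iterationDomain[i] % padMultiples[i] != 0)
      return true;
  }
  return false;
}

The catch, as noted above, is that the existing lit tests in gpu_apply_padding_online_attention.mlir pair a 4-entry lowering_config with a 5-dimensional iteration domain, so an assert (or hard error) would fire on them; whether that mismatch is itself a bug is the open question in this thread.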
Signed-off-by: Keshav Vinayak Jha <[email protected]>
Small attention shapes (e.g., 1×3×4 from tests/e2e/linalg_ext_ops/attention.mlir) failed to compile on GPU with the error: Padding OnlineAttention without existing mask is not yet supported.

Added an isPaddingNeeded() check in TensorMaskingOpInterface.cpp that verifies whether padding is actually necessary before attempting it. When all dimensions are already multiples of their respective tile sizes, the pass skips padding entirely, allowing small, aligned shapes to compile successfully.

Testing

Compile command:
iree-compile --iree-hal-target-device=hip --iree-hip-target=gfx1201 tests/e2e/linalg_ext_ops/attention.mlir -o /tmp/attention_gfx1201.vmfb

Numerics check:
Issue: #23305