Conversation

@keshavvinayak01 keshavvinayak01 commented Jan 28, 2026

Small attention shapes (e.g., 1×3×4 from tests/e2e/linalg_ext_ops/attention.mlir) failed to compile on GPU with the error:

Padding OnlineAttention without existing mask is not yet supported

Added an isPaddingNeeded() check in TensorMaskingOpInterface.cpp that verifies whether padding is actually necessary before attempting it. When all dimensions are already multiples of their respective tile sizes, the pass skips padding entirely, allowing small, aligned shapes to compile successfully.
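As a minimal sketch, the divisibility check described above boils down to the following standalone logic. Note this is illustrative only: the actual check in TensorMaskingOpInterface.cpp operates on MLIR iteration domains and lowering configs, not plain vectors, and the signature here is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch of the padding check: returns true if any dimension is not an
// exact multiple of its pad multiple (tile size). Names and types are
// hypothetical; the real implementation works on MLIR constructs.
bool isPaddingNeeded(const std::vector<int64_t> &iterationDomain,
                     const std::vector<int64_t> &padMultiples) {
  size_t numDims = std::min(iterationDomain.size(), padMultiples.size());
  for (size_t i = 0; i < numDims; ++i) {
    int64_t multiple = padMultiples[i];
    if (multiple <= 1)
      continue; // A tile size of 0 or 1 never requires padding.
    if (iterationDomain[i] % multiple != 0)
      return true; // Dimension is not divisible by its tile size.
  }
  return false; // All dimensions already aligned; padding can be skipped.
}
```

Under this sketch, a shape like 1×3×4 with tile sizes of 1 reports that no padding is needed, so the pass can return early instead of hitting the unimplemented padding-without-mask path.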

Testing

Compile command: iree-compile --iree-hal-target-device=hip --iree-hip-target=gfx1201 tests/e2e/linalg_ext_ops/attention.mlir -o /tmp/attention_gfx1201.vmfb

Numerics check:

keshavvinayak@Shark23:~/iree$ iree-check-module --device=hip --module=/tmp/attention_gfx1201.vmfb 
[==========] Running 6 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 6 tests from module
[ RUN      ] module.attention1x3x4
[       OK ] module.attention1x3x4 (1545 ms)
[ RUN      ] module.causal_attention1x3x4
[       OK ] module.causal_attention1x3x4 (92 ms)
[ RUN      ] module.attention1x4x4_i1_mask_all_ones
[       OK ] module.attention1x4x4_i1_mask_all_ones (71 ms)
[ RUN      ] module.softcap_attention1x3x4
[       OK ] module.softcap_attention1x3x4 (75 ms)
[ RUN      ] module.attention1x4x4
[       OK ] module.attention1x4x4 (72 ms)
[ RUN      ] module.attention3x3x4
[       OK ] module.attention3x3x4 (71 ms)
[----------] 6 tests from module (1928 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 1 test suite ran. (1928 ms total)
[  PASSED  ] 6 tests.

Issue: #23305

…Size, and numValueVectors to ensure tile sizes don't exceed the actual K1/N dimensions.

Only set partial_reduction when needed: Skip setting the partial_reduction attribute when all tile sizes are ≤1, since padding is never needed for tile size 1.
This allows small attention shapes to compile successfully on GPU without requiring the unimplemented padding-without-mask functionality.
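The "only set partial_reduction when needed" rule above can be sketched as a small predicate. The helper name and setting are hypothetical; the real decision lives inside IREE's attention configuration heuristics.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative sketch: the partial_reduction attribute is only worth
// setting when at least one reduction tile size exceeds 1, since tiles
// of size 0 or 1 never require padding. Hypothetical helper, not the
// actual IREE heuristic.
bool shouldSetPartialReduction(const std::vector<int64_t> &reductionTileSizes) {
  return std::any_of(reductionTileSizes.begin(), reductionTileSizes.end(),
                     [](int64_t tileSize) { return tileSize > 1; });
}
```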

Assisted-by: Claude
Signed-off-by: Keshav Vinayak Jha <[email protected]>
@keshavvinayak01 keshavvinayak01 changed the title [DRAFT : Do not review] [CodeGen]Adjust seeds to fit actual dimensions: Cap keyVectorSize, valueVector… [DRAFT : Do not review] [CodeGen] Fix Attention GPU pipeline for small shapes Jan 28, 2026
@keshavvinayak01 keshavvinayak01 changed the title [DRAFT : Do not review] [CodeGen] Fix Attention GPU pipeline for small shapes [DRAFT][CodeGen]Fix Attention GPU pipeline for small shapes Jan 28, 2026
@keshavvinayak01 keshavvinayak01 changed the title [DRAFT][CodeGen]Fix Attention GPU pipeline for small shapes [WIP][CodeGen]Fix Attention GPU pipeline for small shapes Jan 28, 2026
dimensions to avoid generating tile sizes larger than the problem.

Assisted-by: Claude
Signed-off-by: Keshav Vinayak Jha <[email protected]>
Add isPaddingNeeded() check before attempting to pad OnlineAttention ops.
When all dimensions are multiples of their tile sizes, return early to
skip unnecessary padding. This fixes compilation failures for small
attention shapes that don't actually require padding.

Assisted-by: Claude
Signed-off-by: Keshav Vinayak Jha <[email protected]>
@keshavvinayak01 keshavvinayak01 changed the title [WIP][CodeGen]Fix Attention GPU pipeline for small shapes [WIP][CodeGen]Fix Attention GPU pipeline for shapes not requiring padding Jan 29, 2026
@keshavvinayak01 keshavvinayak01 changed the title [WIP][CodeGen]Fix Attention GPU pipeline for shapes not requiring padding [CodeGen]Fix Attention GPU pipeline for shapes not requiring padding Jan 29, 2026
@keshavvinayak01 keshavvinayak01 marked this pull request as ready for review January 29, 2026 11:54
@keshavvinayak01 keshavvinayak01 marked this pull request as draft January 29, 2026 12:35
@keshavvinayak01 keshavvinayak01 marked this pull request as ready for review January 29, 2026 12:57
@hanhanW hanhanW left a comment

@keshavvinayak01 Please always add tests unless omitting them is intentional. We don't rely on local testing to claim a fix. If a test is not added, please explain why in the PR description. I'd expect a lit test somewhere.

btw, if this does fix the issue, can you enable the test? You can move the file from exclude_list to the first list and run the bazel_to_cmake.py script.

ROCM_HIP_SRCS = enforce_glob(
    # keep sorted
    [
        "arg_compare.mlir",
        "gather.mlir",
        "map_gather.mlir",
        "map_scatter.mlir",
        "scan.mlir",
        "scatter.mlir",
        "sort.mlir",
        "winograd_input.mlir",
        "winograd_output.mlir",
    ],
    include = ["*.mlir"],
    exclude = [
        "top-k.mlir",
        "attention.mlir",
        "attention_i1_mask.mlir",
    ],
)

Comment on lines 36 to 41
// Mismatched sizes.
if (iterationDomain.size() < padMultiples.size()) {
  return true;
}

size_t numDims = std::min(iterationDomain.size(), padMultiples.size());
Contributor
Mismatched sizes look invalid to me. Do we need these checks, or can they be an assertion?


@hanhanW hanhanW Jan 30, 2026
Doesn't it mean that the preset lowering config is wrong? There are five dimensions, and you only set four. If you don't want to tile the last dimension, just make it zero. Am I missing something?

@hanhanW hanhanW requested a review from Groverkss January 29, 2026 21:31
@keshavvinayak01
Contributor Author

@keshavvinayak01 Please always add tests unless it is intended. We don't rely on local testing to claim a fix. If a test is not added, please explain why in PR description. I'd expect a lit test somewhere.

btw, if this does fix the issue, can you enable the test? You can move the file from exclude_list to the first list and run bazel_to_cmake.py script.


I don't think we need to add any other lit tests. The file already contains the cases I'm handling. Is removing attention.mlir from the exclude list sufficient?

Signed-off-by: Keshav Vinayak Jha <[email protected]>

// Mismatched sizes. Return true (assume padding is needed) to allow later
// validation stages to emit proper diagnostics.
if (iterationDomain.size() < padMultiples.size()) {
Collaborator
This seems like it should be an assert or an error. If the iteration domain size is not the same as the pad multiples list, then this analysis is tricky.

Contributor Author

In that case, all the lit tests in https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_apply_padding_online_attention.mlir#L1C1-L136C2 would fail, since the iterationDomain size from the affine_map is 5 but the padMultiples / lowering_config length is 4. That's wrong if I'm not mistaken, and it is what led me to keep this a soft failure instead of a compilation error.

@hanhanW

hanhanW commented Jan 30, 2026

@keshavvinayak01 Please always add tests unless it is intended. We don't rely on local testing to claim a fix. If a test is not added, please explain why in PR description. I'd expect a lit test somewhere.
btw, if this does fix the issue, can you enable the test? You can move the file from exclude_list to the first list and run bazel_to_cmake.py script.


I don't think we need to add any other lit tests. The file already contains the cases I'm handling. Is removing attention.mlir from the exclude list sufficient?

Which file? attention.mlir? I still think we need a unit test. You are updating a heuristic, and that change is not reflected in any unit test. We should not rely on integration tests to cover such behavior; "being able to compile and run" does not mean your logic is fully tested. We could silently flip to another path in the future and it would still compile. Then your change becomes tech debt or dead code, right? No one would be able to reason about whether the code can be deleted; it will likely remain until we re-implement the thing, because people won't know when it is safe to delete.

When all dimensions are already multiples of their respective tile sizes, the pass skips padding entirely, allowing small, aligned shapes to compile successfully.

I'd ask: which pass? Is the logic exercised by the corresponding test? If so, please mention it explicitly in your PR description.
