[CodeGen]Fix Attention GPU pipeline for shapes not requiring padding #23315
Conversation
…Size, and numValueVectors to ensure tile sizes don't exceed the actual K1/N dimensions. Only set partial_reduction when needed: Skip setting the partial_reduction attribute when all tile sizes are ≤1, since padding is never needed for tile size 1. This allows small attention shapes to compile successfully on GPU without requiring the unimplemented padding-without-mask functionality. Assisted-by: Claude Signed-off-by: Keshav Vinayak Jha <[email protected]>
dimensions to avoid generating tile sizes larger than the problem. Assisted-by: Claude Signed-off-by: Keshav Vinayak Jha <[email protected]>
Add isPaddingNeeded() check before attempting to pad OnlineAttention ops. When all dimensions are multiples of their tile sizes, return early to skip unnecessary padding. This fixes compilation failures for small attention shapes that don't actually require padding. Assisted-by: Claude Signed-off-by: Keshav Vinayak Jha <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
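For readers skimming the commits, here is a minimal sketch of the multiple-of check they describe, written against plain integer vectors rather than the MLIR types the real TensorMaskingOpInterface.cpp works with; the function name matches the PR, but the exact signature and the tile-size 0/1 shortcut shown here are illustrative assumptions.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: the real interface operates on MLIR iteration ranges and
// lowering configs; here the iteration domain and pad multiples are plain
// static sizes so the logic is easy to follow.
bool isPaddingNeeded(const std::vector<int64_t> &iterationDomain,
                     const std::vector<int64_t> &padMultiples) {
  // Mismatched sizes: conservatively assume padding is needed so that later
  // validation stages emit the proper diagnostics (the behavior the PR
  // settles on, discussed further down in this review).
  if (iterationDomain.size() < padMultiples.size())
    return true;

  size_t numDims = std::min(iterationDomain.size(), padMultiples.size());
  for (size_t i = 0; i < numDims; ++i) {
    int64_t multiple = padMultiples[i];
    // A tile size of 0 (dimension not tiled) or 1 never requires padding.
    if (multiple <= 1)
      continue;
    // Any dimension that is not an exact multiple of its tile size would
    // have to be padded up to the next multiple.
    if (iterationDomain[i] % multiple != 0)
      return true;
  }
  // Every dimension is already aligned to its tile size; skip padding.
  return false;
}

With a check along these lines, a small shape such as 1×3×4 whose tile sizes are all ≤1 falls through to the aligned path and never reaches the unimplemented padding-without-mask code.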
hanhanW left a comment
@keshavvinayak01 Please always add tests unless it is intended. We don't rely on local testing to claim a fix. If a test is not added, please explain why in PR description. I'd expect a lit test somewhere.
btw, if this does fix the issue, can you enable the test? You can move the file from exclude_list to the first list and run bazel_to_cmake.py script.
iree/tests/e2e/linalg_ext_ops/BUILD.bazel
Lines 112 to 131 in 8c014e5
ROCM_HIP_SRCS = enforce_glob(
    # keep sorted
    [
        "arg_compare.mlir",
        "gather.mlir",
        "map_gather.mlir",
        "map_scatter.mlir",
        "scan.mlir",
        "scatter.mlir",
        "sort.mlir",
        "winograd_input.mlir",
        "winograd_output.mlir",
    ],
    include = ["*.mlir"],
    exclude = [
        "top-k.mlir",
        "attention.mlir",
        "attention_i1_mask.mlir",
    ],
)
compiler/src/iree/compiler/Codegen/Interfaces/TensorMaskingOpInterface.cpp
// Mismatched sizes.
if (iterationDomain.size() < padMultiples.size()) {
  return true;
}

size_t numDims = std::min(iterationDomain.size(), padMultiples.size());
Mismatched sizes look invalid to me. Do we need these checks? Or can they be an assertion?
If we return a failure here, then we won't generate the expected diagnostics for
Doesn't it mean that the preset lowering config is wrong? There are five dimensions, and you only set four of them. If you don't want to tile the last dimension, just make it zero. Am I missing something?
I don't think we need to add any other lit tests. The file already contains the cases I'm handling.
Signed-off-by: Keshav Vinayak Jha <[email protected]>
Signed-off-by: Keshav Vinayak Jha <[email protected]>
// Mismatched sizes. Return true (assume padding is needed) to allow later
// validation stages to emit proper diagnostics.
if (iterationDomain.size() < padMultiples.size()) {
This seems like it should be an assert or an error. If the iteration domain size is not the same as the pad multiples list, then this analysis is tricky.
In that case, all the lit tests in https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_apply_padding_online_attention.mlir#L1C1-L136C2 will fail, since the iteration domain size in the affine_map is 5 but the padMultiples / lowering_config length is 4. That's wrong if I'm not mistaken, and it is what led me to keep this a soft failure instead of a compilation error.
Which file?
I'd ask which pass. Is the logic exercised by the corresponding test? If so, you can explicitly mention it in your PR description.
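For contrast, the assertion-style alternative suggested in this thread might look roughly like the sketch below. This is illustrative only, not code from the PR, which instead keeps the soft return true so later validation can emit diagnostics.

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stricter variant: a lowering config whose pad-multiple list
// does not cover every iteration dimension is treated as a programmer error
// rather than silently assumed to need padding.
bool isPaddingNeededStrict(const std::vector<int64_t> &iterationDomain,
                           const std::vector<int64_t> &padMultiples) {
  assert(iterationDomain.size() == padMultiples.size() &&
         "pad multiples must cover every iteration dimension");
  for (size_t i = 0; i < iterationDomain.size(); ++i) {
    // Tile size 0 (untiled) or 1 never needs padding for that dimension.
    if (padMultiples[i] > 1 && iterationDomain[i] % padMultiples[i] != 0)
      return true;
  }
  return false;
}

The catch, as noted above, is that the existing lit tests in gpu_apply_padding_online_attention.mlir pair a 4-entry lowering_config with a 5-dimensional iteration domain, so an assert (or hard error) would fire on them; whether that mismatch is itself a bug is the open question in this thread.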
Signed-off-by: Keshav Vinayak Jha <[email protected]>
Small attention shapes (e.g., 1×3×4 from tests/e2e/linalg_ext_ops/attention.mlir) failed to compile on GPU with the error: Padding OnlineAttention without existing mask is not yet supported.

Added an isPaddingNeeded() check in TensorMaskingOpInterface.cpp that verifies whether padding is actually necessary before attempting it. When all dimensions are already multiples of their respective tile sizes, the pass skips padding entirely, allowing small, aligned shapes to compile successfully.

Testing

Compile command:
iree-compile --iree-hal-target-device=hip --iree-hip-target=gfx1201 tests/e2e/linalg_ext_ops/attention.mlir -o /tmp/attention_gfx1201.vmfb

Numerics check:
Issue: #23305