Skip to content

[LDS] Add fallback for CoalescedGatherDMA lowering.#23560

Merged
lialan merged 2 commits intomainfrom
users/lialan/dma_fallback
Feb 27, 2026
Merged

[LDS] Add fallback for CoalescedGatherDMA lowering.#23560
lialan merged 2 commits intomainfrom
users/lialan/dma_fallback

Conversation

@lialan
Copy link
Contributor

@lialan lialan commented Feb 24, 2026

  • When gather_to_lds lowering fails due to DMA size alignment, fall back to element-by-element coalesced vector.transfer_read/vector.transfer_write.
  • Each lane transfers one element per iteration across the subgroup.
  • Remainder elements guarded by scf.if for partial iterations.
  • OOB handling replicates the fast path's outermost-index-replacement trick.
  • New lit tests for 1D, 2D, gather, and OOB fallback cases.

…lowering.

* When gather_to_lds lowering fails due to DMA size alignment, fall back to
  element-by-element coalesced vector.transfer_read/vector.transfer_write.
* Each lane transfers one element per iteration across the subgroup.
* Remainder elements guarded by scf.if for partial iterations.
* OOB handling replicates the fast path's outermost-index-replacement trick.
* New lit tests for 1D, 2D, gather, and OOB fallback cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lialan lialan marked this pull request as ready for review February 25, 2026 00:50
@lialan lialan requested a review from krzysz00 February 26, 2026 01:22
Copy link
Contributor

@qedawkins qedawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some mostly minor comments

* Remove redundant fallback pattern constructor (benefit=1 is default).
* Use llvm::to_vector for linearBasis construction.
* Use !inBoundsAttr->empty() instead of size() > 0.
* Use llvm::map_to_vector for outerDimOffsets construction.
* Remove stale comment about future fallback lowering path.
* Capture SSA values and check offsets in remainder iteration tests (1D, 2D).
* Verify in_bounds=[false] (default, omitted) on OOB transfer_read.
* Add fallback_with_outer_dims test for non-contiguous strided memref
  with outer dimension iteration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lialan lialan merged commit 1cf18c7 into main Feb 27, 2026
56 of 60 checks passed
@lialan lialan deleted the users/lialan/dma_fallback branch February 27, 2026 16:43
Copy link
Contributor

@krzysz00 krzysz00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-commit review

for (Attribute attr : *inBounds) {
if (!cast<BoolAttr>(attr).getValue()) {
return rewriter.notifyMatchFailure(
dmaOp, "in_bounds with OOB dimensions requires "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... hold on, did we empirically confirm this? Do we know that if statements around amdgpu.gather_to_lds don't work?

And ... what should be happening is that OOB on non-buffers goes to the fallback path?

MLIRContext *context = &getContext();
RewritePatternSet patterns(context);
patterns.add<LowerCoalescedGatherDMAPattern>(context, dmaSizes);
patterns.add<LowerCoalescedGatherDMAFallbackPattern>(context);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use pattern benefits to run the lowering before the fallback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants