Pull requests: NVIDIA/cutlass
[Cutlass profiler] Fix SM100 FP8 nosmem epilogue shape_div “Divisibility Condition” for non‑multiple‑of‑64 N tiles
#2946
opened Jan 10, 2026 by
aidando73
Fix out-of-bounds TMA access in wgmma_tma_sm90 tutorial
#2945
opened Jan 10, 2026 by
Johnsonms
cutlass profiler - align emitted SFA/SFB kernel naming with typical convention
#2942
opened Jan 10, 2026 by
aidando73
[docs] Add additional tip for generating less kernels in blockwise
#2940
opened Jan 9, 2026 by
aidando73
Fix Warp Memory Access Arrangement in Epilogue: Upper Bound memory access width by output tile width
#2938
opened Jan 8, 2026 by
lukas-ruettgers
Refactor binary_op functions to remove unused result parameter
#2919
opened Jan 2, 2026 by
pbelevich
docs: Add FP16 GEMM documentation to sgemm_sm80.cu - Fixes #1686
#2870
opened Dec 10, 2025 by
blueberrycongee
[WIP]Unit tests for Kernels that perform BF16 x BF16 = MXFP8 and MXFP8 x MXFP8 = BF16
#2857
opened Dec 8, 2025 by
Shreya-gaur
use cp.async.bulk for per-row data; quiets synccheck
inactive-30d
#2850
opened Dec 5, 2025 by
v0i0
[DOCS] Update docs to precisely describe env stream scenario
#2824
opened Nov 29, 2025 by
tqchen
[FIX] Update nvidia-cutlass-dsl requirements version from 4.3.0 to 4.3.1
inactive-30d
#2823
opened Nov 29, 2025 by
jeromeku
Fix processing of relative imports in AST preprocessing
#2821
opened Nov 28, 2025 by
danieldk