Closed
Description
Which component has the problem?
CuTe Tutorial
Bug Report
Describe the bug
The Blackwell MMA tutorial example 01_mma_sm100.cu contains incorrect tensor layout stride values in the documentation comments. The commented stride values do not match the actual tensor layouts produced when the code executes, which can mislead users learning from the tutorial.
Steps/Code to reproduce bug
- Navigate to the Blackwell tutorial directory:
  cd examples/cute/tutorial/blackwell
- Compile the tutorial with debug prints enabled:
  nvcc -std=c++17 -arch=sm_100a \
  -I../../../../include \
  -I../../../../tools/util/include \
  01_mma_sm100.cu -o 01_mma_sm100
- Run the executable:
  ./01_mma_sm100
- Compare the actual output for tensor layouts (printed to console) with the documented values in the comments at lines 181-184 and 212-214.
Expected behavior
The comments currently document these incorrect values:
- Line 182: gB: (_256,_64,4):(_1,256,16384)
- Line 183: gC: (_128,_256):(256,_1)
- Line 184: gD: (_128,_256):(256,_1)
- Line 212: tCgB: ((_256,_16),_1,_4,4):((_1,256),_0,4096,16384)
- Line 213: tCgC: ((_128,_256),_1,_1):((256,_1),_0,_0)
- Line 214: tCgD: ((_128,_256),_1,_1):((256,_1),_0,_0)
The documentation comments should accurately reflect the actual tensor layout strides produced by the code. The correct values should be:
- Line 182: gB: (_256,_64,4):(256,_1,_64)
- Line 183: gC: (_128,_256):(1024,_1)
- Line 184: gD: (_128,_256):(1024,_1)
- Line 212: tCgB: ((_256,_16),_1,_4,4):((256,_1),_0,_16,_64)
- Line 213: tCgC: ((_128,_256),_1,_1):((1024,_1),_0,_0)
- Line 214: tCgD: ((_128,_256),_1,_1):((1024,_1),_0,_0)
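The corrected strides can be cross-checked by hand. The sketch below is plain C++ and does not use CuTe; the helper names (`row_major_stride`, `tiled_stride`) are hypothetical, and the problem sizes (N = 1024, K = 256, a 64-wide K tile, an MMA K of 16) are assumptions inferred from the shapes and strides reported above, not read from the tutorial source. It assumes B is stored K-major and C/D are stored N-major, which is what the printed strides suggest.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper types for this sketch; they are not part of CuTe.
struct Stride2 { std::size_t row; std::size_t col; };
struct TiledStride { std::size_t row; std::size_t col; std::size_t k_block; std::size_t k_tile; };

// Row-major storage: moving one row down skips `cols` elements,
// moving one column right skips a single element.
constexpr Stride2 row_major_stride(std::size_t cols) { return {cols, 1}; }

// Tiling along the contiguous (column) mode: each MMA K-block starts
// `mma_k` elements after the previous one, each K tile `tile_k` elements.
constexpr TiledStride tiled_stride(std::size_t cols, std::size_t mma_k, std::size_t tile_k) {
    return {cols, 1, mma_k, tile_k};
}

// Assumed sizes, inferred from the layouts quoted in this issue.
constexpr std::size_t kGemmN = 1024;  // columns of C and D
constexpr std::size_t kGemmK = 256;   // columns of B (stored as N x K)
constexpr std::size_t kTileK = 64;    // K extent of one tile
constexpr std::size_t kMmaK  = 16;    // K extent of one MMA instruction

// gB: (_256,_64,4):(256,_1,_64)
static_assert(row_major_stride(kGemmK).row == 256, "gB row stride");
static_assert(tiled_stride(kGemmK, kMmaK, kTileK).k_tile == 64, "gB K-tile stride");

// gC / gD: (_128,_256):(1024,_1)
static_assert(row_major_stride(kGemmN).row == 1024, "gC/gD row stride");
static_assert(row_major_stride(kGemmN).col == 1, "gC/gD column stride");

// tCgB: ((_256,_16),_1,_4,4):((256,_1),_0,_16,_64)
static_assert(tiled_stride(kGemmK, kMmaK, kTileK).k_block == 16, "tCgB MMA-K stride");
```

If the assumed sizes are right, the file compiles without tripping any `static_assert`, which matches the printed layouts rather than the values in the comments.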
Environment details (please complete the following information):
- Environment location: Bare-metal / Docker (depends on user setup)
- CUTLASS version: 4.3.4 (latest main branch)
- CUDA version: Any version supporting sm_100a (Blackwell architecture)
- GPU architecture: Blackwell (SM 100a)
- File affected: examples/cute/tutorial/blackwell/01_mma_sm100.cu