[QST] FP4 Tensor Core mma.sync Instruction Unsupported on SM_101 Architecture #2908

The current kernel uses the following FP4 Tensor Core instruction:
`mma.sync.aligned.m16n8k64.row.col.kind::mxf4nvf4.block_scale.scale_vec::4X.f32.e2m1.e2m1.f32.ue4m3`

This instruction is only supported on **SM_120+** (Blackwell; Hopper is SM_90 and does not provide it). Our target platform is **SM_101**, which does not support this `mma.sync` FP4 form, so the kernel cannot run there.

Is there a recommended method to emulate **this FP4 Tensor Core MMA** on **SM_101** while maintaining equivalent numerical results?
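
To make "equivalent numerical results" concrete, here is a minimal host-side sketch of the arithmetic we believe one output element of this instruction performs. It rests on several assumptions that are ours and not confirmed: operands are `e2m1` with bias 1, scales are `ue4m3` with bias 7 and an all-ones NaN encoding, `scale_vec::4X` means one scale per 16 consecutive K elements, and each FP4 value is stored unpacked in the low nibble of one byte. The function names and the per-block summation order are hypothetical; the hardware's accumulation order and intermediate precision are not modeled, so this is only a comparison baseline, not a bit-exact emulation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kK = 64;      // K extent of m16n8k64
constexpr int kBlock = 16;  // assumed: scale_vec::4X gives 64 / 4 = 16 elements per scale

// Decode one e2m1 value (1 sign, 2 exponent, 1 mantissa bit; bias 1 assumed).
inline float decode_e2m1(uint8_t nibble) {
    static const float kMagnitude[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};
    const float mag = kMagnitude[nibble & 0x7];
    return (nibble & 0x8) ? -mag : mag;
}

// Decode one ue4m3 scale (unsigned, 4 exponent, 3 mantissa bits; bias 7 assumed).
inline float decode_ue4m3(uint8_t bits) {
    const int exp = (bits >> 3) & 0xF;
    const int man = bits & 0x7;
    if (exp == 0xF && man == 0x7) return std::nanf("");    // assumed NaN encoding
    if (exp == 0) return std::ldexp(man / 8.0f, -6);       // subnormal range
    return std::ldexp(1.0f + man / 8.0f, exp - 7);         // normal range
}

// Hypothetical reference for a single D[m][n]:
//   D = C + sum over blocks of (scaleA * scaleB * sum_k A[k] * B[k]).
// a_row and b_col hold kK unpacked e2m1 codes; the scale arrays hold kK / kBlock bytes.
float ref_block_scaled_dot(const uint8_t* a_row, const uint8_t* b_col,
                           const uint8_t* a_scales, const uint8_t* b_scales,
                           float c) {
    assert(a_row && b_col && a_scales && b_scales);
    float acc = c;
    for (int blk = 0; blk < kK / kBlock; ++blk) {
        float partial = 0.0f;
        for (int i = 0; i < kBlock; ++i) {
            const int k = blk * kBlock + i;
            partial += decode_e2m1(a_row[k]) * decode_e2m1(b_col[k]);
        }
        acc += partial * decode_ue4m3(a_scales[blk]) * decode_ue4m3(b_scales[blk]);
    }
    return acc;
}
```

An emulation path on SM_101 could be checked against this reference with a tolerance, and against the real instruction's output captured on an SM_120 device, since the two may differ in rounding even when both are correct.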
