[Feat] Add SM103-specialized cubins for trtllm_fp4_block_scale_moe #3189

@LopezCastroRoberto

Summary

The cutlass_fused_moe backend already dispatches SM103 Ultra grouped GEMM kernels on B300 (MainloopSm103ArrayTmaUmmaWarpSpecializedBlockScaled), but trtllm_fp4_block_scale_moe doesn't seem to have any SM103 Ultra-specialized cubins: all configs use instK=64 tiles, and none use SM103's instK=96.

Would it be possible to add SM103-specialized cubins to the trtllm-gen batched GEMM artifacts? I think the runner infrastructure already supports it (Sm103a arch enum, isArchBlackwellUltra() check); just the cubins themselves are missing.

PR #2917 doesn't seem to include SM103-specific kernels for FP4 MoE either.
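
To make the request concrete, here is a minimal sketch of the selection logic I have in mind, taking the instK values above at face value. The function names are hypothetical and are not part of the FlashInfer or trtllm-gen API; the only assumption about the hardware is that SM103 (B300) reports compute capability 10.3.

```python
from typing import Tuple

# Hypothetical helpers for illustration only; these names do not exist
# in FlashInfer or trtllm-gen.

def is_blackwell_ultra(capability: Tuple[int, int]) -> bool:
    """Return True for SM103 (assumed compute capability 10.3, e.g. B300)."""
    return capability == (10, 3)


def preferred_inst_k(capability: Tuple[int, int]) -> int:
    """Pick the instK tile size described in this issue.

    SM103 would use instK=96 if specialized cubins existed;
    every other architecture keeps the current instK=64 tiles.
    """
    return 96 if is_blackwell_ultra(capability) else 64


if __name__ == "__main__":
    for cap in [(10, 0), (10, 3)]:
        print(cap, "-> instK =", preferred_inst_k(cap))
```

On a real machine the capability tuple could come from `torch.cuda.get_device_capability()`; without SM103 cubins in the artifacts, the instK=96 branch has nothing to dispatch to, which is the gap this issue is about.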
