Summary
The cutlass_fused_moe backend already dispatches SM103 (Blackwell Ultra) grouped GEMM kernels on B300 (MainloopSm103ArrayTmaUmmaWarpSpecializedBlockScaled), but trtllm_fp4_block_scale_moe doesn't seem to have any SM103-specialized cubins: all configs use instK=64 tiles, and none use SM103's instK=96.
Would it be possible to add SM103-specialized cubins to the trtllm-gen batched GEMM artifacts? I think the runner infrastructure already supports it (Sm103a arch enum, isArchBlackwellUltra() check); just the cubins themselves are missing.
PR #2917 doesn't seem to include SM103-specific kernels for FP4 MoE either.
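For anyone who wants to double-check the instK observation, here is a minimal sketch of the kind of scan that could be run over the kernel config metadata. It assumes the configs have already been loaded into a list of dicts with an `instK` field; the dict layout, the `tileN` field, the helper names, and the sample data are all hypothetical and do not reflect the actual trtllm-gen metadata format.

```python
from typing import Iterable, Mapping, Set


def inst_k_values(configs: Iterable[Mapping[str, int]]) -> Set[int]:
    """Collect the distinct instK values found across kernel configs."""
    return {cfg["instK"] for cfg in configs}


def has_sm103_tiles(configs: Iterable[Mapping[str, int]], sm103_inst_k: int = 96) -> bool:
    """Return True if at least one config uses the SM103 instK value."""
    return sm103_inst_k in inst_k_values(configs)


# Hypothetical sample data mirroring what this issue reports:
# every config uses instK=64 and none uses instK=96.
sample_configs = [
    {"tileN": 8, "instK": 64},
    {"tileN": 16, "instK": 64},
]

print(sorted(inst_k_values(sample_configs)))
print(has_sm103_tiles(sample_configs))
```

If SM103-specialized cubins were added, a scan like this would be expected to report 96 among the instK values.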