[Feat] Add SM103-specialized cubins for trtllm_fp4_block_scale_moe #3189

## Summary

The `cutlass_fused_moe` backend already dispatches SM103 Ultra grouped GEMM kernels on B300 (`MainloopSm103ArrayTmaUmmaWarpSpecializedBlockScaled`). `trtllm_fp4_block_scale_moe`, however, doesn't seem to have SM103 Ultra-specialized cubins: all of its configs use `instK=64` tiles, and none use SM103's `instK=96`.
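For anyone who wants to reproduce the observation, here is a rough sketch of the kind of check I mean. It assumes the kernel configs can be listed as records with an instruction-K field; `KernelConfig`, `inst_k`, and the placeholder entries are hypothetical names for illustration, not the actual trtllm-gen metadata format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelConfig:
    """Hypothetical record for one batched GEMM cubin config."""
    name: str
    inst_k: int  # instruction-level K tile size (e.g. 64 or 96)


def inst_k_histogram(configs):
    """Count how many configs use each instK value."""
    return Counter(c.inst_k for c in configs)


def has_sm103_tiles(configs, sm103_inst_k=96):
    """Return True if any config uses the SM103-specific instK."""
    return any(c.inst_k == sm103_inst_k for c in configs)


# Placeholder entries only; a real check would load the shipped config list.
configs = [
    KernelConfig("placeholder_a", 64),
    KernelConfig("placeholder_b", 64),
]

print(inst_k_histogram(configs))
print(has_sm103_tiles(configs))
```

If every entry reports `inst_k == 64`, `has_sm103_tiles` returns `False`, which matches what I described above.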

Would it be possible to add SM103-specialized cubins to the trtllm-gen batched GEMM artifacts? I think the runner infrastructure already supports this (`Sm103a` arch enum, `isArchBlackwellUltra()` check), so only the cubins themselves are missing.

PR #2917 doesn't seem to include SM103-specific kernels for FP4 MoE either.