Please add support for RMSNorm without normalization weights.
This is to support FlashNorm — a mathematically equivalent variant of RMSNorm that folds norm weights into the subsequent linear layer. See explainer video.
We have applied this weight folding trick to a few LLMs (Llama, Qwen, SmolLM) here:
https://huggingface.co/models?other=weightless-rmsnorm
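To illustrate why the fold is mathematically equivalent, here is a small PyTorch sketch (all tensor names are just for illustration, not from any particular codebase): scaling the normalized activations by the norm weights g before a linear layer gives the same output as folding g into the linear layer's weight columns.

```python
import torch

torch.manual_seed(0)
dim, out_dim = 8, 16
x = torch.randn(2, dim)
g = torch.randn(dim)            # RMSNorm weights
W = torch.randn(out_dim, dim)   # weights of the subsequent linear layer

# RMS-normalize without any elementwise scale.
x_norm = x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + 1e-6)

# Standard path: scale by the norm weights, then apply the linear layer.
y_ref = (x_norm * g) @ W.T

# FlashNorm path: fold g into the linear weights (each input column j
# is scaled by g[j]), so the norm itself no longer needs weights.
W_folded = W * g                # broadcasts g over the input dimension
y_fold = x_norm @ W_folded.T

print(torch.allclose(y_ref, y_fold, atol=1e-5))  # True
```

Since the fold can be done once at model-conversion time, inference only pays for the normalization itself.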
Motivation
FlashNorm's removal of norm weights reduces inference overhead at zero accuracy cost, and we'd like to share these optimized models with the broader community.
Possible Implementation
Make the norm weights optional in your RMSNorm implementation, e.g. simply skip the norm weight multiplication when no norm weights are provided. See the sketch below.
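A minimal sketch of what this could look like (this is an illustration of the idea, not your actual RMSNorm code): the weight parameter becomes optional, and the forward pass skips the multiplication when it is absent.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMSNorm with optional normalization weights.

    When weight=False, no scale parameter is allocated and the
    elementwise multiplication is skipped, which is exactly what
    FlashNorm-folded checkpoints need.
    """

    def __init__(self, dim: int, eps: float = 1e-6, weight: bool = True):
        super().__init__()
        self.eps = eps
        # Only allocate norm weights if requested.
        self.weight = nn.Parameter(torch.ones(dim)) if weight else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Standard RMS normalization: x / sqrt(mean(x^2) + eps)
        x = x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        # Skip the multiplication when no norm weights are provided.
        if self.weight is not None:
            x = x * self.weight
        return x
```

This keeps the default behavior unchanged for existing checkpoints while allowing the weightless variant to load cleanly.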