📑 μP Papers

μP (the maximal update parameterisation) is an influential, theoretically grounded prescription for scaling various neural network architectures so that the layer activations (and other quantities, such as the learning rate) remain stable during training, neither shrinking nor exploding as the model size (i.e. width and depth) grows.
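
To make the width part of this prescription concrete, below is a minimal, hand-rolled PyTorch sketch of μP for an MLP trained with Adam, following the Adam scaling rules from Tensor Programs V (Yang et al., 2022). This is not code from any of the listed papers or repos; `base_width`, `base_lr`, and the layer split are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

def mup_mlp_and_adam(width, base_width=256, base_lr=1e-3,
                     in_dim=784, out_dim=10):
    """Build an MLP with muP width scaling for Adam (illustrative sketch).

    Width-scaling rules (Tensor Programs V, Adam):
      input weights and all biases: O(1) init and O(1) learning rate
      hidden weights: init std ~ 1/sqrt(fan_in), lr ~ 1/width
      output weights: init std ~ 1/fan_in,       lr ~ 1/width
    """
    fc_in = nn.Linear(in_dim, width)  # default init is O(1) in width
    fc_hidden = nn.Linear(width, width)
    fc_out = nn.Linear(width, out_dim)

    nn.init.normal_(fc_hidden.weight, std=1.0 / math.sqrt(width))
    nn.init.normal_(fc_out.weight, std=1.0 / width)  # zero init also works

    model = nn.Sequential(fc_in, nn.ReLU(), fc_hidden, nn.ReLU(), fc_out)

    # Hidden and readout weight matrices get lr ~ 1/width, so hyperparameters
    # tuned at base_width transfer to larger widths.
    width_mult = width / base_width
    optimizer = torch.optim.Adam([
        {"params": [fc_hidden.weight, fc_out.weight],
         "lr": base_lr / width_mult},
        {"params": [fc_in.weight, fc_in.bias, fc_hidden.bias, fc_out.bias],
         "lr": base_lr},
    ])
    return model, optimizer

# Example: the same base_lr can be reused at any width.
model, opt = mup_mlp_and_adam(width=1024)
```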

Overview

Key papers (width-only μP)

Depth extensions

Understanding hyperparameter transfer

Spectral perspective

Other optimisers

Other architectures

On weight decay

The role of weight decay in hyperparameter transfer across depth is discussed in the CompleteP paper (Dey et al., 2025).

Miscellaneous

Further resources

📝 Blogs

🎙️ Talks

💻 Code

  • The original mup GitHub repo (PyTorch); a usage sketch follows this list.
  • The nanoGPT-mup repo (PyTorch).
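
As a pointer for getting started, here is a sketch following the usage pattern documented in the mup repo's README (`MuReadout`, `set_base_shapes`, and `MuAdam` come from that package; the model, widths, and dimensions are illustrative assumptions, and details may vary across versions).

```python
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam

class MLP(nn.Module):
    def __init__(self, width, in_dim=784, out_dim=10):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        # MuReadout replaces the usual nn.Linear output layer so the readout
        # is rescaled correctly as the width grows.
        self.readout = MuReadout(width, out_dim)

    def forward(self, x):
        return self.readout(self.body(x))

# A small "base" model and a "delta" model (differing in every dimension to
# be scaled) tell mup which shapes are width-like.
base_model = MLP(width=64)
delta_model = MLP(width=128)

model = MLP(width=1024)
set_base_shapes(model, base_model, delta=delta_model)
# (The repo also recommends re-initializing parameters with the replacement
# functions in mup.init after setting base shapes.)

# mup's optimizer wrappers apply the per-layer learning-rate scaling.
optimizer = MuAdam(model.parameters(), lr=1e-3)
```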

Contributing

Contributions are welcome! To add a paper or report an error, please open an issue or submit a pull request.
