Implementing AdaMuon #1575

@raghulchandramouli

Description

This is a new optimizer that combines elementwise adaptivity with orthogonal updates for large networks.

Novelty:

  1. an elementwise second-moment estimator applied to update directions
  2. a sign-stabilized update where the momentum is first sign-transformed before orthogonalization
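
To make the two points above concrete, here is a minimal NumPy sketch of what a single AdaMuon-style step for one 2-D weight matrix might look like. It is an illustration under stated assumptions, not the paper's reference implementation: the function names `newton_schulz` and `adamuon_step`, the `state` dict layout, the hyperparameter defaults, and the Adam-style bias correction are all hypothetical choices made for this example. The Newton-Schulz coefficients are the ones commonly used in Muon implementations. Any extra rescaling of the update that the paper may apply is left out.

```python
import numpy as np


def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2-D matrix with a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315  # coefficients commonly used in Muon implementations
    X = G / (np.linalg.norm(G) + eps)  # Frobenius normalization keeps singular values <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # iterate on the wide orientation so X @ X.T is the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X


def adamuon_step(w, grad, state, lr=1e-3, beta1=0.95, beta2=0.999, eps=1e-8):
    """One hypothetical AdaMuon-style update; mutates `state` and returns the new weights."""
    state["step"] += 1
    # Momentum buffer (Muon-style accumulation; an assumption of this sketch).
    state["m"] = beta1 * state["m"] + grad
    # Point 2: sign-transform the momentum, then orthogonalize it.
    o = newton_schulz(np.sign(state["m"]))
    # Point 1: elementwise second-moment estimate of the orthogonalized direction.
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * o * o
    v_hat = state["v"] / (1.0 - beta2 ** state["step"])  # bias correction (assumption)
    return w - lr * o / (np.sqrt(v_hat) + eps)


# Usage on a random 4x6 weight matrix with a random stand-in gradient.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 6))
grad = rng.standard_normal((4, 6))
state = {"step": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}
new_w = adamuon_step(w, grad, state)
```

The second-moment buffer `v` tracks the orthogonalized direction `o` rather than the raw gradient, which is the distinction the first point draws against Adam.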

Paper link: https://arxiv.org/abs/2507.11005
