arxiv:2603.20527

RMNP: Row-Momentum Normalized Preconditioning for Scalable Matrix-Based Optimization

Published on May 13

Authors:

Abstract

A new optimizer called RMNP is proposed that replaces the computationally expensive Newton-Schulz iteration in Muon with a simpler row-wise normalization operation, achieving similar optimization performance with significantly reduced computational complexity and wall-clock time.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Preconditioned adaptive methods have gained significant attention for training deep neural networks, as they capture rich curvature information about the loss landscape. The central challenge in this field lies in balancing the effectiveness of the preconditioner against the computational cost of implementing it. Among recent advances, Muon stands out by using Newton-Schulz iteration to obtain preconditioned updates without explicitly constructing the preconditioning matrix. Despite its advantages, the efficiency of Muon still leaves room for improvement. In this paper, we introduce RMNP (Row Momentum Normalized Preconditioning), an optimizer that replaces Newton-Schulz iteration with a simple row-wise $\ell_2$ normalization operation (over the input dimension $d_{\text{in}}$), motivated by the empirically observed diagonal block structure of the Transformer layerwise Hessian. We empirically verify that orthogonalization and row-wise $\ell_2$ normalization over the input dimension are asymptotically equivalent in the case of the Transformer. This substitution reduces the per-iteration computational complexity from $\mathcal{O}(mn \cdot \min(m,n))$ to $\mathcal{O}(mn)$ for an $m \times n$ weight matrix while maintaining comparable optimization performance. Theoretically, we establish convergence guarantees for RMNP in the non-convex setting that match recent results for Muon optimizers, achieving the minimax optimal complexity. Extensive experiments on large language model pretraining show that RMNP delivers optimization performance competitive with Muon while substantially reducing preconditioning wall-clock time. Our code is available at https://github.com/Dominator-Index/RMNP.
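The row-wise normalization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes the weight matrix is stored as m rows of length n, with n playing the role of the input dimension, and that momentum is accumulated as `beta * momentum + grad`. The function names `row_normalize` and `rmnp_style_step`, and the default values of `lr`, `beta`, and `eps`, are hypothetical choices made for illustration only.

```python
import math


def row_normalize(matrix, eps=1e-8):
    """Scale each row of `matrix` (a list of rows) to roughly unit l2 norm.

    Every entry is visited a constant number of times, so the cost is
    O(mn) for an m x n matrix. `eps` is an assumed guard against division
    by zero for all-zero rows.
    """
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / (norm + eps) for x in row])
    return normalized


def rmnp_style_step(weights, grads, momentum, lr=0.02, beta=0.95):
    """One hypothetical update: accumulate momentum, then row-normalize it.

    The momentum rule and hyperparameter defaults are assumptions, not
    values taken from the paper.
    """
    new_momentum = [
        [beta * m + g for m, g in zip(m_row, g_row)]
        for m_row, g_row in zip(momentum, grads)
    ]
    direction = row_normalize(new_momentum)
    new_weights = [
        [w - lr * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weights, direction)
    ]
    return new_weights, new_momentum


# Toy 2 x 3 example (2 output rows, 3 input columns).
weights = [[1.0, 2.0, 3.0], [0.5, -0.5, 0.0]]
grads = [[3.0, 0.0, 4.0], [0.0, 0.0, 2.0]]
momentum = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
new_weights, new_momentum = rmnp_style_step(weights, grads, momentum)
```

In this sketch the update direction for each row depends only on that row's momentum, which is why a single pass over the matrix suffices; Newton-Schulz iteration instead needs repeated matrix products, which is where the extra $\min(m,n)$ factor quoted in the abstract comes from.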
