fix(optimizer): resolve bug where weight decay was multiplied by wrong lr value 3356827 verified dongseokmotif committed on Aug 27, 2025
refactor(muon): change argument adam_wd to weight_decay and handle params' type 02ac540 iamwyldecat committed on Jun 23, 2025
fix(muon): delete intermediate tensors immediately to lower peak mem usage bdd2678 iamwyldecat committed on Jun 15, 2025