Fix divergence: peak_lr 1e-3 → 3e-4, max_grad_norm 1.0 → 0.5 (batch=8 too small for high LR)
9492039 verified
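A rough sketch of what the two changes do together, assuming plain SGD and global-norm gradient clipping. The commit does not show the actual optimizer or training code, so the dict keys and helper names below are hypothetical and simply mirror the names in the message. With an adaptive optimizer such as Adam the bound would not hold in this form.

```python
import math

# Hypothetical config dicts; keys mirror the names in the commit message.
OLD = {"peak_lr": 1e-3, "max_grad_norm": 1.0, "batch_size": 8}
NEW = {**OLD, "peak_lr": 3e-4, "max_grad_norm": 0.5}


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their combined L2 norm does not exceed max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total == 0.0 or total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]


def max_update_norm(cfg):
    """Upper bound on one plain-SGD step: peak LR times the clipped gradient norm."""
    return cfg["peak_lr"] * cfg["max_grad_norm"]


if __name__ == "__main__":
    print("old bound:", max_update_norm(OLD))
    print("new bound:", max_update_norm(NEW))
    print("clipped:", clip_by_global_norm([3.0, 4.0], NEW["max_grad_norm"]))
```

Under these assumptions the largest possible step at peak LR shrinks by the product of both factors, which is one plausible reading of why the change helps when small batches (batch=8) produce noisy gradients.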