Progressive Residual Warmup for Language Model Pretraining • Paper • 2603.05369 • Published 26 days ago