One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining Paper • 2606.30634 • Published 2 days ago • 18
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation Paper • 2601.22813 • Published Jan 30 • 63
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation Paper • 2511.14993 • Published Nov 19, 2025 • 234
Benchmarking Optimizers for Large Language Model Pretraining Paper • 2509.01440 • Published Sep 1, 2025 • 25