Rethinking Global Text Conditioning in Diffusion Transformers Paper • 2602.09268 • Published Feb 9 • 8
Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation Paper • 2601.22813 • Published Jan 30 • 57
Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs Paper • 2510.11288 • Published Oct 13, 2025 • 48
MADrive: Memory-Augmented Driving Scene Modeling Paper • 2506.21520 • Published Jun 26, 2025 • 36
Alchemist: Turning Public Text-to-Image Data into Generative Gold Paper • 2505.19297 • Published May 25, 2025 • 84
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 20, 2025 • 78
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8, 2025 • 110
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8, 2025 • 110
Running 3.74k The Ultra-Scale Playbook 🌌 3.74k The ultimate guide to training LLM on large GPU Clusters
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20, 2025 • 194