Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 396
Article Ulysses Sequence Parallelism: Training with Million-Token Contexts kashif, stas • Mar 9 • 30
Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +7 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 157
Article Custom Kernels for All from Codex and Claude +2 burtenshaw, sayakpaul, ariG23498, evalstate • Feb 13 • 78
Article Training Design for Text-to-Image Models: Lessons from Ablations Photoroom • Feb 3 • 73
Towards Scalable Pre-training of Visual Tokenizers for Generation Paper • 2512.13687 • Published Dec 15, 2025 • 107
Article Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek huggingface • Jan 27 • 45
Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective LinkedIn • Jan 27 • 75
Article You could have designed state of the art positional encoding FL33TW00D-HF • Nov 25, 2024 • 482
Article Make your ZeroGPU Spaces go brrr with ahead-of-time compilation +2 cbensimon, sayakpaul, linoyts, multimodalart • Sep 2, 2025 • 77
Article From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels drbh, danieldk • Aug 18, 2025 • 100
Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 777
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16, 2025 • 278