view article Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 396
view article Article Ulysses Sequence Parallelism: Training with Million-Token Contexts kashif, stas • Mar 9 • 30
view article Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +7 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 157
view article Article Custom Kernels for All from Codex and Claude +2 burtenshaw, sayakpaul, ariG23498, evalstate • Feb 13 • 78
view article Article Training Design for Text-to-Image Models: Lessons from Ablations Photoroom • Feb 3 • 73
Towards Scalable Pre-training of Visual Tokenizers for Generation Paper • 2512.13687 • Published Dec 15, 2025 • 107
view article Article Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek huggingface • Jan 27 • 45
view article Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective LinkedIn • Jan 27 • 75
view article Article You could have designed state of the art positional encoding FL33TW00D-HF • Nov 25, 2024 • 482
Running 93 Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝 93 Evaluate multilingual models using FineTasks
Running Featured 1.35k FineWeb: decanting the web for the finest text data at scale 🍷 1.35k Explore and download the FineWeb web‑scale text dataset
Running on CPU Upgrade Featured 3.19k The Smol Training Playbook 📚 3.19k The secrets to building world-class LLMs
view article Article Make your ZeroGPU Spaces go brrr with ahead-of-time compilation +2 cbensimon, sayakpaul, linoyts, multimodalart • Sep 2, 2025 • 77