view article Article Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP +3 ariG23498, ror, sergiopaniego, pcuenq, sayakpaul • 14 days ago • 45
view article Article Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler +3 ariG23498, sayakpaul, sergiopaniego, ror, pcuenq • 27 days ago • 124
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 50 items • Updated 13 days ago • 167
Running 114 Unlocking On-Policy Distillation for Any Model Family 📝 114 Explore on-policy distillation visualization for any model
view article Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 410
view article Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +7 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 164
Running 3.9k The Ultra-Scale Playbook 🌌 3.9k The ultimate guide to training LLM on large GPU Clusters
Running Featured 1.37k FineWeb: decanting the web for the finest text data at scale 🍷 1.37k Explore and download the FineWeb web‑scale text dataset