Article • Unlocking asynchronicity in continuous batching • ror, pcuenq, ariG23498 • 2 days ago • 32
Article • Mixture of Experts (MoEs) in Transformers • ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 159
The Smol Training Playbook 📚 • The secrets to building world-class LLMs • Featured • 3.17k • Running on CPU Upgrade
Unlocking On-Policy Distillation for Any Model Family 📝 • Visualize on-policy distillation for any model family • 104 • Running
Article • Efficient MultiModal Data Pipeline • ariG23498, lusxvr, andito, sergiopaniego, pcuenq • Jul 8, 2025 • 70