Article: Mixture of Experts (MoEs) in Transformers. ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 159
Article: NVIDIA Earth-2 Open Models Span the Whole Weather Stack. nvidia • Jan 26 • 36
Space: Unlocking On-Policy Distillation for Any Model Family 📝. Visualize on-policy distillation for any model family • Running • 102
Space: The Smol Training Playbook 📚. The secrets to building world-class LLMs • Running on CPU Upgrade • Featured • 3.17k