SyGra: The One-Stop Framework for Building Data for LLMs and SLMs • ServiceNow-AI • Sep 22, 2025 • 14
Mixture of Experts (MoEs) in Transformers • ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 159
TRL v1.0: Post-Training Library Built to Move with the Field • qgallouedec, stevhliu, pcuenq, sergiopaniego • Mar 31 • 51
Multimodal Embedding & Reranker Models with Sentence Transformers • tomaarsen • Apr 9 • 59
Efficient LLM Pretraining: Packed Sequences and Masked Attention • sirluk • Oct 7, 2024 • 71
You could have designed state of the art positional encoding • FL33TW00D-HF • Nov 25, 2024 • 478
The Smol Training Playbook 📚: The secrets to building world-class LLMs • 3.17k
The Ultra-Scale Playbook 🌌: The ultimate guide to training LLMs on large GPU Clusters • 3.84k