view article Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 387
view article Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 776
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge NormalUhr • Feb 7, 2025 • 293
view article Article KV Cache from scratch in nanoVLM +3 ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 119
view article Article Gotchas in Tokenizer Behavior Every Developer Should Know qgallouedec • Apr 18, 2025 • 72
view article Article Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 +4 RQlee, ArthurZ, achikundu, lwtr, rganti, mayank-mishra • Aug 21, 2024 • 41
view article Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It? Kseniase • Mar 17, 2025 • 357
view article Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model EuroBERT • Mar 10, 2025 • 147
view article Article SmolLM - blazingly fast and remarkably powerful +1 loubnabnl, anton-l, eliebak • Jul 16, 2024 • 455
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 +1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
view article Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... srinivasbilla • Jan 20, 2025 • 77
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13, 2025 • 101
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10, 2024 • 32
The Perfect Blend: Redefining RLHF with Mixture of Judges Paper • 2409.20370 • Published Sep 30, 2024 • 7