MolmoAct2-BimanualYAM Dataset • Collection • Collection of the MolmoAct2-BimanualYAM Dataset • 740 items • Updated 14 days ago • 14
Article • Continuous batching from first principles • ror, ArthurZ, mcpotato • Nov 25, 2025 • 389
Article • KV Caching Explained: Optimizing Transformer Inference Efficiency • not-lain • Jan 30, 2025 • 332
Article • You could have designed state of the art positional encoding • FL33TW00D-HF • Nov 25, 2024 • 480
Article • makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch • AviSoori1x • May 7, 2024 • 121
Article • SmolLM3: smol, multilingual, long-context reasoner • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 776
Article • Learn the Hugging Face Kernel Hub in 5 Minutes • drbh, danieldk, Narsil, pcuenq, pagezyhf, merve, reach-vb • Jun 12, 2025 • 164
Article • KV Cache from scratch in nanoVLM • ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 119
Article • No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL • toslali-ibm, mirinflim, qgallouedec, esnible, rganti, mudhakar • Jun 3, 2025 • 101
Article • 🐯 Liger GRPO meets TRL • shisahni, kashif, smohammadi, ShirinYamani, m0m0chen, liberty4321 • May 25, 2025 • 53
Paper • GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection • 2403.03507 • Published Mar 6, 2024 • 190