Article Mixture of Experts (MoEs) in Transformers +5 ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 159
Article NVIDIA Earth-2 Open Models Span the Whole Weather Stack nvidia • Jan 26 • 36
Article Smol2Operator: Post-Training GUI Agents for Computer Use +3 A-Mahla, merve, sergiopaniego, reach-vb, lewtun • Sep 23, 2025 • 138
Article Open-R1: a fully open reproduction of DeepSeek-R1 +1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes ybelkada, timdettmers • Aug 17, 2022 • 131
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents Paper • 2408.07199 • Published Aug 13, 2024 • 22
LMDX: Language Model-based Document Information Extraction and Localization Paper • 2309.10952 • Published Sep 19, 2023 • 67