Article: Mixture of Experts (MoEs) in Transformers • ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 169
NVIDIA Nemotron V2 Collection • Open, production-ready enterprise models. NVIDIA Open Model License. • 9 items • Updated 21 days ago • 106
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 517
Hugging Face Changelog: Repository total file size is now displayed • Sep 18, 2025 • 176
Article: Vision Language Model Alignment in TRL ⚡️ • sergiopaniego, merve, qgallouedec, kashif, ariG23498 • Aug 7, 2025 • 112
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025 • 161
Article: AI Policy @🤗: Response to the 2025 National AI R&D Strategic Plan • evijit • Jun 2, 2025 • 14
Article: How to generate text: using different decoding methods for language generation with Transformers • patrickvonplaten • Mar 1, 2020 • 301
BRAVE: Broadening the visual encoding of vision-language models Paper • 2404.07204 • Published Apr 10, 2024 • 20
Efficient Streaming Language Models with Attention Sinks Paper • 2309.17453 • Published Sep 29, 2023 • 15