Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 354
Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 411
Article Diffusers welcomes FLUX-2 +6 YiYiXu, dg845, sayakpaul, OzzyGT, dn6, ariG23498, linoyts, multimodalart • Nov 25, 2025 • 189
Article huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning +2 Wauplin, celinah, lysandre, julien-c • Oct 27, 2025 • 76
Article Building the Open Agent Ecosystem Together: Introducing OpenEnv +8 spisakjo, darktex, zkwentz, mortimerp9, Sanyam, Hamid-Nazeri, Pankit01, emre0, lewtun, reach-vb • Oct 23, 2025 • 164
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 517
Article How to Choose the Best Open Source LLM for Your Project in 2025 dvilasuero • Sep 9, 2025 • 78
Article Welcome EmbeddingGemma, Google's new efficient embedding model +4 tomaarsen, Xenova, alvarobartt, ariG23498, pcuenq, sergiopaniego • Sep 4, 2025 • 275
Article Make your ZeroGPU Spaces go brrr with ahead-of-time compilation +2 cbensimon, sayakpaul, linoyts, multimodalart • Sep 2, 2025 • 78
Article Learn the Hugging Face Kernel Hub in 5 Minutes +5 drbh, danieldk, Narsil, pcuenq, pagezyhf, merve, reach-vb • Jun 12, 2025 • 164