view article Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 328
view article Article Transformers v5: Simple model definitions powering the AI ecosystem +2 lysandre, ArthurZ, cyrilvallez, reach-vb • Dec 1, 2025 • 310
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 18 items • Updated 6 days ago • 289
view article Article Welcome EmbeddingGemma, Google's new efficient embedding model +4 tomaarsen, Xenova, alvarobartt, ariG23498, pcuenq, sergiopaniego • Sep 4, 2025 • 274
view article Article Say hello to `hf`: a faster, friendlier Hugging Face CLI ✨ +1 Wauplin, celinah, julien-c • Jul 25, 2025 • 84
view article Article You could have designed state of the art positional encoding FL33TW00D-HF • Nov 25, 2024 • 478
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM +2 ariG23498, merve, pcuenq, reach-vb • Mar 12, 2025 • 496
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published Dec 12, 2024 • 97