Cosmos-Predict2.5 Collection ⚠️ This collection is archived. 👉 https://huggingface.co/collections/nvidia/cosmos3 • 2 items • Updated 14 days ago • 23
State of open video generation models in Diffusers Article • +1 sayakpaul, a-r-r-o-w, dn6 • Jan 27, 2025 • 71
Small Language Models (SLM): A Comprehensive Overview Article • jjokah • Feb 22, 2025 • 164
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2604.12374 • Published Apr 14 • 37
Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers Article • tomaarsen • Apr 16 • 72
Welcome Gemma 4: Frontier multimodal intelligence on device Article • +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 909
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published Oct 16, 2025 • 129
Perception Encoder Collection OpenCLIP (PE Core image + text) and timm PE Core, Spatial, Lang (ViT only) weights. NOTE: These weights do not work with original modeling code. • 19 items • Updated Sep 19, 2025 • 8
nanoVLM: The simplest repository to train your VLM in pure PyTorch Article • +5 ariG23498, lusxvr, andito, sergiopaniego, merve, pcuenq, reach-vb • May 21, 2025 • 260
PaLI-3 Vision Language Models: Smaller, Faster, Stronger Paper • 2310.09199 • Published Oct 13, 2023 • 29
Vision Language Models (Better, faster, stronger) Article • +3 merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 614