meta-llama/Llama-3.2-11B-Vision-Instruct Image-Text-to-Text • 11B • Updated Dec 4, 2024 • 167k • 1.6k
meta-llama/Llama-3.2-90B-Vision-Instruct Image-Text-to-Text • 89B • Updated Mar 4, 2025 • 1.04k • 358
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 527k • 1.6k
meta-llama/Llama-4-Maverick-17B-128E-Instruct Image-Text-to-Text • 402B • Updated May 22, 2025 • 33.2k • 493
meta-llama/Llama-4-Scout-17B-16E-Instruct Image-Text-to-Text • 109B • Updated May 22, 2025 • 451k • 1.3k
Multimodal RAG with Granite Vision 🚀 RAG example using Granite [vision, embedding, instruct] • Running on Zero • Agents • 42
MatchAnything 🏢 Find similar images and match them across collections • Running on Zero • Agents • Featured • 260
google/siglip2-so400m-patch14-384 Zero-Shot Image Classification • 1B • Updated Feb 21, 2025 • 755k • 85