meta-llama/Llama-3.2-90B-Vision-Instruct Image-Text-to-Text • 89B • Updated Mar 4, 2025 • 23.4k • 352
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • Updated Dec 10, 2025 • 334k • 1.58k
meta-llama/Llama-4-Maverick-17B-128E-Instruct Image-Text-to-Text • 402B • Updated May 22, 2025 • 6.54k • 466
Multimodal RAG with Granite Vision 🚀 RAG example using Granite [vision, embedding, instruct] • Running on Zero • 36