meta-llama/Llama-3.2-11B-Vision-Instruct Image-Text-to-Text • 11B • Updated Dec 4, 2024 • 114k • 1.55k
lamm-mit/Cephalo-Llama-3.2-11B-Vision-Instruct-128k Image-Text-to-Text • 11B • Updated Sep 30, 2024 • 38 • 6
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated 22 days ago • 257k • 1.55k
docling-project/SmolDocling-256M-preview Image-Text-to-Text • 0.3B • Updated Sep 17, 2025 • 57.1k • 1.6k