multimodal
Image-Text-to-Text
• 2B • Updated • 2.89M
• 1.41k
Qwen/Qwen2.5-VL-7B-Instruct
Image-Text-to-Text
• 8B • Updated • 4.64M
• 1.54k
google/gemma-3-27b-it-qat-q4_0-gguf
Image-Text-to-Text
• 27B • Updated • 514
• 400
google/paligemma2-3b-mix-224
Image-Text-to-Text
• 3B • Updated • 41.7k
• 52
HuggingFaceTB/SmolVLM2-256M-Video-Instruct
Image-Text-to-Text
• 0.3B • Updated • 125k
• 102
unsloth/Qwen2.5-VL-3B-Instruct-GGUF
Image-Text-to-Text
• 3B • Updated • 9.57k
• 23
Image-Text-to-Text
• 0.9B • Updated • 122k
• 84
14B • Updated • 1.26k
• 103
FastVLM: Efficient Vision Encoding for Vision Language Models
Paper
• 2412.13303
• Published • 77
Feature Extraction
• 0.9B • Updated • 134k
• 331
Qwen/Qwen3-VL-4B-Instruct
Image-Text-to-Text
• 4B • Updated • 3.08M
• 389
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text
• 1.0B • Updated • 11.1k
• 1.61k
Image-Text-to-Text
• 3B • Updated • 7.83k
• 116
Viewer
• Updated • 1.19k • 503
• 49
nvidia/NVIDIA-Nemotron-Parse-v1.2
Image-Text-to-Text
• 0.9B • Updated • 97.1k
• 35