multimodal
updated
Image-Text-to-Text
• 2B • Updated • 4.03M
• 1.4k
Qwen/Qwen2.5-VL-7B-Instruct
Image-Text-to-Text
• 8B • Updated • 4.46M
• 1.48k
google/gemma-3-27b-it-qat-q4_0-gguf
Image-Text-to-Text
• 27B • Updated • 3.38k
• 399
google/paligemma2-3b-mix-224
Image-Text-to-Text
• 3B • Updated • 41.1k
• 48
HuggingFaceTB/SmolVLM2-256M-Video-Instruct
Image-Text-to-Text
• 0.3B • Updated • 155k
• 98
unsloth/Qwen2.5-VL-3B-Instruct-GGUF
Image-Text-to-Text
• 3B • Updated • 5.11k
• 20
Image-Text-to-Text
• 0.9B • Updated • 122k
• 79
Updated • 3.97k
• 101
FastVLM: Efficient Vision Encoding for Vision Language Models
Paper
• 2412.13303
• Published • 75
Feature Extraction
• 0.9B • Updated • 43.1k
• 328
Qwen/Qwen3-VL-4B-Instruct
Image-Text-to-Text
• 4B • Updated • 2.24M
• 364
PaddlePaddle/PaddleOCR-VL
Image-Text-to-Text
• 1.0B • Updated • 7.79k
• 1.58k
Image-Text-to-Text
• 3B • Updated • 55.2k
• 114
• Updated • 1.19k • 418
• 47
nvidia/NVIDIA-Nemotron-Parse-v1.2
0.9B • Updated • 7.95k
• 23