multimodal
Image-Text-to-Text • 2B • 3.5M • 1.37k
Qwen/Qwen2.5-VL-7B-Instruct • Image-Text-to-Text • 8B • 3.35M • 1.45k
google/gemma-3-27b-it-qat-q4_0-gguf • Image-Text-to-Text • 27B • 8.47k • 378
google/paligemma2-3b-mix-224 • Image-Text-to-Text • 3B • 31.1k • 47
HuggingFaceTB/SmolVLM2-256M-Video-Instruct • Image-Text-to-Text • 0.3B • 106k • 96
unsloth/Qwen2.5-VL-3B-Instruct-GGUF • Image-Text-to-Text • 3B • 13k • 19
Image-Text-to-Text • 0.9B • 118k • 78
14B • 414 • 101
FastVLM: Efficient Vision Encoding for Vision Language Models • Paper • 2412.13303 • 73
Feature Extraction • 0.9B • 56.1k • 313
Qwen/Qwen3-VL-4B-Instruct • Image-Text-to-Text • 4B • 711k • 323
PaddlePaddle/PaddleOCR-VL • Image-Text-to-Text • 1.0B • 15.9k • 1.54k
Image-Text-to-Text • 3B • 40.7k • 114
Viewer • 1.19k • 314 • 47