multimodal - a fnauman Collection

multimodal

Updated Mar 10

vikhyatk/moondream2

Image-Text-to-Text • 2B • Updated Sep 23, 2025 • 1.94M • 1.43k
Qwen/Qwen2.5-VL-7B-Instruct

Image-Text-to-Text • 8B • Updated Apr 6, 2025 • 10.2M • 1.62k
google/gemma-3-27b-it-qat-q4_0-gguf

Image-Text-to-Text • 27B • Updated Apr 11, 2025 • 260 • 401
google/paligemma2-3b-mix-224

Image-Text-to-Text • 3B • Updated Feb 7, 2025 • 14.7k • 56
HuggingFaceTB/SmolVLM2-256M-Video-Instruct

Image-Text-to-Text • 0.3B • Updated Apr 8, 2025 • 107k • 107
unsloth/Qwen2.5-VL-3B-Instruct-GGUF

Image-Text-to-Text • 3B • Updated May 12, 2025 • 10.9k • 25
OpenGVLab/InternVL3-1B

Image-Text-to-Text • 0.9B • Updated Sep 11, 2025 • 429k • 85
BLIP3o/BLIP3o-Model-8B

14B • Updated Jun 4, 2025 • 741 • 103
FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published Dec 17, 2024 • 77
jinaai/jina-clip-v2

Feature Extraction • 0.9B • Updated Apr 8 • 501k • 341
Qwen/Qwen3-VL-4B-Instruct

Image-Text-to-Text • 4B • Updated Oct 15, 2025 • 3.58M • 415
PaddlePaddle/PaddleOCR-VL

Image-Text-to-Text • 1.0B • Updated 16 days ago • 29.6k • 1.63k
PerceptronAI/Isaac-0.1

Image-Text-to-Text • 3B • Updated Mar 20 • 1.84k • 116
moondream/refcoco-m

Dataset • Updated Nov 17, 2025 • 1.19k • 315 • 49
nvidia/NVIDIA-Nemotron-Parse-v1.2

Image-Text-to-Text • 0.9B • Updated May 5 • 239k • 56