Vision Language Models (Images) - a ahsanmasood Collection

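Updated Jan 19, 2025

Each entry gives the model ID, then: task • parameter count • last update • downloads • likes. Some entries omit the task or parameter count.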

adept/fuyu-8b

Image-Text-to-Text • 9B • Updated Nov 4, 2023 • 79.1k • 1.02k
openbmb/MiniCPM-Llama3-V-2_5

Image-Text-to-Text • 9B • Updated Jan 15, 2025 • 43.7k • 1.41k
zai-org/cogvlm2-llama3-caption

Video-Text-to-Text • Updated May 14, 2025 • 3.05k • 116
deepseek-ai/Janus-1.3B

Any-to-Any • 2B • Updated Jan 27, 2025 • 3.94k • 595
deepseek-ai/JanusFlow-1.3B

Any-to-Any • 2B • Updated Jan 27, 2025 • 525 • 151
Qwen/QVQ-72B-Preview

Image-Text-to-Text • 73B • Updated Jan 12, 2025 • 158 • 610
Qwen/Qwen2-VL-7B-Instruct

Image-Text-to-Text • 8B • Updated Feb 6, 2025 • 1.91M • 1.27k
meta-llama/Llama-3.2-11B-Vision-Instruct

Image-Text-to-Text • 11B • Updated Dec 4, 2024 • 135k • 1.59k
allenai/Molmo-7B-D-0924

Image-Text-to-Text • 8B • Updated Dec 15, 2025 • 28.2k • 565
microsoft/Florence-2-large

Image-Text-to-Text • 0.8B • Updated Aug 4, 2025 • 994k • 1.8k
google/paligemma-3b-pt-448

Image-Text-to-Text • 3B • Updated Jul 19, 2024 • 2.37k • 32
vikhyatk/moondream2

Image-Text-to-Text • 2B • Updated Sep 23, 2025 • 2.55M • 1.41k
OpenGVLab/InternVL2-2B

Image-Text-to-Text • 2B • Updated Mar 25, 2025 • 1.14M • 80
OpenGVLab/InternVL2_5-2B

Image-Text-to-Text • 2B • Updated Mar 25, 2025 • 63.8k • 33
HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • 2B • Updated Apr 8, 2025 • 24.8k • 584
mistralai/Pixtral-12B-2409

Updated Jul 28, 2025 • 4.18k • 686
microsoft/Phi-3.5-vision-instruct

Image-Text-to-Text • Updated Dec 10, 2025 • 1.53M • 732
google/paligemma2-3b-pt-896

Image-Text-to-Text • 3B • Updated Dec 5, 2024 • 946 • 26
google/paligemma2-10b-pt-896

Image-Text-to-Text • Updated Dec 5, 2024 • 353 • 32