ahsanmasood's Collections: Vision Language Models (Images)
Image-Text-to-Text
• 9B • Updated • 79.1k
• 1.02k
openbmb/MiniCPM-Llama3-V-2_5
Image-Text-to-Text
• 9B • Updated • 43.7k
• 1.41k
zai-org/cogvlm2-llama3-caption
Video-Text-to-Text
• Updated • 3.05k
• 116
Any-to-Any
• 2B • Updated • 3.94k
• 595
deepseek-ai/JanusFlow-1.3B
Any-to-Any
• 2B • Updated • 525
• 151
Image-Text-to-Text
• 73B • Updated • 158
• 610
Qwen/Qwen2-VL-7B-Instruct
Image-Text-to-Text
• 8B • Updated • 1.91M
• 1.27k
meta-llama/Llama-3.2-11B-Vision-Instruct
Image-Text-to-Text
• 11B • Updated • 135k
• 1.59k
Image-Text-to-Text
• 8B • Updated • 28.2k
• 565
microsoft/Florence-2-large
Image-Text-to-Text
• 0.8B • Updated • 994k
• 1.8k
google/paligemma-3b-pt-448
Image-Text-to-Text
• 3B • Updated • 2.37k
• 32
Image-Text-to-Text
• 2B • Updated • 2.55M
• 1.41k
Image-Text-to-Text
• 2B • Updated • 1.14M
• 80
Image-Text-to-Text
• 2B • Updated • 63.8k
• 33
HuggingFaceTB/SmolVLM-Instruct
Image-Text-to-Text
• 2B • Updated • 24.8k
• 584
mistralai/Pixtral-12B-2409
Updated • 4.18k
• 686
microsoft/Phi-3.5-vision-instruct
Image-Text-to-Text
• Updated • 1.53M
• 732
google/paligemma2-3b-pt-896
Image-Text-to-Text
• 3B • Updated • 946
• 26
google/paligemma2-10b-pt-896
Image-Text-to-Text
• Updated • 353
• 32