blanchefort's Collections
VLMs
Updated Mar 2
Qwen/Qwen2-VL-7B-Instruct • Image-Text-to-Text • 8B • Updated Feb 6, 2025 • 1.85M • 1.28k
NVEagle/Eagle-X5-13B-Chat • Image-Text-to-Text • 15B • Updated Sep 16, 2024 • 219 • 28
internlm/internlm-xcomposer2d5-7b • Visual Question Answering • Updated Jul 22, 2024 • 585 • 210
AIRI-Institute/OmniFusion • Updated Apr 10, 2024 • 59
OpenGVLab/InternVideo2_chat_8B_HD • Video-Text-to-Text • 8B • Updated Dec 18, 2024 • 57 • 18
OpenGVLab/InternVideo2-Chat-8B • Video-Text-to-Text • 8B • Updated Oct 10, 2024 • 206 • 26
zai-org/cogvlm2-video-llama3-chat • Text Generation • 13B • Updated Jul 24, 2024 • 61 • 56
nyu-visionx/cambrian-34b • Text Generation • 35B • Updated Jun 28, 2024 • 10 • 27
zai-org/cogvlm-base-490-hf • Text Generation • 18B • Updated Nov 20, 2023 • 42 • 7
zai-org/cogvlm-chat-hf • Text Generation • 18B • Updated Dec 19, 2023 • 843 • 199
zai-org/cogvlm-grounding-generalist-hf • Text Generation • 18B • Updated Dec 11, 2023 • 76 • 16
Qwen/Qwen-VL • Text Generation • Updated Jan 25, 2024 • 80.8k • 282
liuhaotian/llava-v1.5-7b • Image-Text-to-Text • Updated May 8, 2024 • 230k • 556
LanguageBind/MoE-LLaVA-Phi2-2.7B-4e-384 • Text Generation • 6B • Updated Feb 1, 2024 • 22 • 32
LanguageBind/Video-LLaVA-7B-hf • Image-Text-to-Text • 7B • Updated May 16, 2024 • 9.11k • 50
openvla/openvla-7b-prismatic • Image-Text-to-Text • Updated Jul 9, 2024 • 62 • 8
openvla/openvla-7b-finetuned-libero-object • Image-Text-to-Text • 8B • Updated Oct 9, 2024 • 5.03k • 2
openvla/openvla-7b-finetuned-libero-10 • Image-Text-to-Text • 8B • Updated Oct 9, 2024 • 4k • 7
IntelLabs/LlavaOLMoBitnet1B • Updated Aug 30, 2024 • 11 • 30
mistral-community/pixtral-12b-240910 • Image-Text-to-Text • Updated Oct 1, 2024 • 1.05k • 381
LanguageBind/MoE-LLaVA-StableLM-1.6B-4e • Text Generation • 3B • Updated Feb 1, 2024 • 412 • 8
llava-hf/LLaVA-NeXT-Video-7B-hf • Video-Text-to-Text • 7B • Updated Nov 11, 2025 • 160k • 125
Qwen/Qwen-VL-Chat • Text Generation • Updated Jan 25, 2024 • 75k • 384
LanguageBind/Video-LLaVA-7B • Text Generation • 7B • Updated Apr 9, 2024 • 869 • 89
LanguageBind/LanguageBind_Image • Zero-Shot Image Classification • Updated Feb 1, 2024 • 20.3k • 11
LanguageBind/LanguageBind_Video • Zero-Shot Image Classification • Updated Feb 1, 2024 • 6.64k • 3
llava-hf/llava-1.5-13b-hf • Image-Text-to-Text • 13B • Updated Jan 27, 2025 • 14.2k • 34
llava-hf/llava-1.5-7b-hf • Image-Text-to-Text • 7B • Updated Jun 6, 2025 • 3.11M • 366
FreedomIntelligence/LongLLaVA-53B-A13B • Image-Text-to-Text • 52B • Updated Nov 28, 2024 • 13 • 20
meta-llama/Llama-3.2-11B-Vision • Image-Text-to-Text • 11B • Updated Sep 27, 2024 • 6.06k • 590
BAAI/Emu3-VisionTokenizer • Image Feature Extraction • 0.3B • Updated Oct 8, 2024 • 2.24k • 63
openbmb/MiniCPM-V-2_6 • Image-Text-to-Text • 8B • Updated Jun 13, 2025 • 122k • 1.05k
openbmb/MiniCPM-V • Visual Question Answering • 3B • Updated Jan 15, 2025 • 377 • 207
openbmb/MiniCPM-V-2 • Visual Question Answering • 3B • Updated Jan 15, 2025 • 13.2k • 499
openbmb/MiniCPM-Llama3-V-2_5 • Image-Text-to-Text • 9B • Updated Jan 15, 2025 • 16.2k • 1.41k
nvidia/NVLM-D-72B • Image-Text-to-Text • 79B • Updated Jan 14, 2025 • 148k • 776
vikhyatk/moondream2 • Image-Text-to-Text • 2B • Updated Sep 23, 2025 • 1.54M • 1.42k
allenai/Molmo-72B-0924 • Image-Text-to-Text • 73B • Updated Oct 9, 2025 • 3.56k • 300
allenai/MolmoE-1B-0924 • Image-Text-to-Text • Updated Apr 24, 2025 • 1.09k • 158
allenai/Molmo-7B-D-0924 • Image-Text-to-Text • 8B • Updated Dec 15, 2025 • 18.9k • 567
allenai/Molmo-7B-O-0924 • Image-Text-to-Text • 8B • Updated Oct 9, 2025 • 1.45k • 164
deepseek-ai/Janus-1.3B • Any-to-Any • 2B • Updated Jan 27, 2025 • 1.39k • 597
neulab/Pangea-7B • 8B • Updated Oct 24, 2024 • 1.3k • 133
neulab/Pangea-7B-hf • 8B • Updated Oct 28, 2025 • 767 • 13
BAAI/Aquila-VL-2B-llava-qwen • Visual Question Answering • 2B • Updated Nov 25, 2024 • 55 • 62
mistralai/Pixtral-Large-Instruct-2411 • Updated 26 days ago • 91 • 433
google/paligemma2-10b-pt-224 • Image-Text-to-Text • 10B • Updated Dec 5, 2024 • 2.15k • 10
google/paligemma2-3b-pt-224 • Image-Text-to-Text • 3B • Updated Dec 5, 2024 • 31.8k • 174
vidore/colqwen2-v1.0 • Visual Document Retrieval • Updated Jun 5, 2025 • 50.2k • 119
deepseek-ai/Janus-Pro-7B • Any-to-Any • Updated Feb 1, 2025 • 12.7k • 3.62k
deepseek-ai/Janus-Pro-1B • Any-to-Any • Updated Feb 1, 2025 • 10.9k • 478
nvidia/Eagle2-9B • Image-Text-to-Text • 9B • Updated Jan 28, 2025 • 130 • 63
openbmb/MiniCPM-o-2_6 • Any-to-Any • 9B • Updated Oct 5, 2025 • 442k • 1.29k
DAMO-NLP-SG/VideoLLaMA3-7B • Video-Text-to-Text • 8B • Updated Sep 2, 2025 • 4k • 75
DAMO-NLP-SG/VideoLLaMA3-2B • Video-Text-to-Text • 2B • Updated Sep 3, 2025 • 2.19k • 21
ATH-MaaS/Ovis2-8B • Image-Text-to-Text • 9B • Updated Aug 15, 2025 • 924 • 74
Qwen/Qwen3-VL-2B-Thinking • Image-Text-to-Text • 2B • Updated Oct 20, 2025 • 61.8k • 115
LiquidAI/LFM2-VL-3B • Image-Text-to-Text • 3B • Updated Mar 30 • 18.7k • 134
facebook/sam3 • Mask Generation • 0.9B • Updated Nov 20, 2025 • 1.69M • 2.33k
stepfun-ai/Step3-VL-10B-FP8 • Image-Text-to-Text • Updated Feb 4 • 201 • 10
nvidia/llama-nemotron-colembed-vl-3b-v2 • Visual Document Retrieval • 4B • Updated May 20 • 5.88k • 22