Hugging Face
shail-2512's Collections
VLMs
Updated Dec 2, 2024
HuggingFaceTB/SmolVLM-Instruct • Image-Text-to-Text • 2B • Updated Apr 8, 2025 • 48.1k • 579
microsoft/OmniParser • Image-Text-to-Text • Updated Dec 2, 2024 • 452 • 1.71k
vidore/colsmolvlm-v0.1 • Visual Document Retrieval • Updated Mar 14, 2025 • 163 • 55
meta-llama/Llama-3.2-11B-Vision-Instruct • Image-Text-to-Text • Updated Dec 4, 2024 • 279k • 1.57k
Qwen/Qwen2-VL-7B-Instruct • Image-Text-to-Text • Updated Feb 6, 2025 • 1.56M • 1.27k
mistral-labs/pixtral-12b • Image-Text-to-Text • 13B • Updated Jan 27, 2025 • 146k • 103
HuggingFaceM4/Idefics3-8B-Llama3 • Image-Text-to-Text • Updated Dec 2, 2024 • 157k • 302
allenai/Molmo-7B-O-0924 • Image-Text-to-Text • 8B • Updated Oct 9, 2025 • 1.92k • 163