Vision-Language Alignment Datasets 🧊 - a ChiefTheLord Collection

Updated Jan 17

A collection of public vision-language datasets, intended especially for frozen vision-language alignment.

HuggingFaceM4/the_cauldron

Updated May 6, 2024 • 1.88M rows • 288k downloads • 547 likes

5CD-AI/LLaVA-CoT-o1-Instruct

Updated Nov 27, 2024 • 58.5k rows • 24 downloads • 110 likes

kargwalaryan/SynCap-Flickr8k

Updated Oct 3, 2024 • 7.96k rows • 50 downloads • 2 likes

Note: SynthRecap
UCSC-VLAA/Recap-COCO-30K

Updated Jun 12, 2024 • 30.5k rows • 248 downloads • 26 likes

Note: SynthRecap

liuhaotian/LLaVA-Instruct-150K

Updated Jan 3, 2024 • 3.65k downloads • 616 likes

Xkev/LLaVA-CoT-100k

Updated Dec 20, 2025 • 98.6k rows • 2.65k downloads • 105 likes

jackyhate/text-to-image-2M

Updated Apr 30 • 649k rows • 9.33k downloads • 164 likes

HuggingFaceM4/Docmatix

Updated Aug 26, 2024 • 2.55M rows • 13k downloads • 305 likes

CaptionEmporium/conceptual-captions-cc12m-llavanext

Updated Jun 30, 2024 • 11M rows • 711 downloads • 23 likes

IlyaGusev/gpt_roleplay_realm

Updated Apr 7, 2024 • 435 rows • 285 downloads • 105 likes