Dataset Sources & References
This document tracks official and related visual reasoning datasets.
Primary Dataset
Zebra-CoT
- Source: Hugging Face
- Paper: arXiv:2507.16746
- Size: 182,384 samples (58.9 GB)
- License: CC BY-NC 4.0
- Modalities: Image, Text
- Use Case: Training multimodal models for Visual Chain of Thought reasoning
Related Datasets
Visual-CoT (NeurIPS'24 Spotlight)
- Source: Hugging Face
- GitHub: deepcs233/Visual-CoT
- Size: 438K question-answer pairs
- Description: Multi-turn processing pipeline for MLLMs with intermediate bounding boxes
LLaVA-CoT-100k (ICCV 2025)
- Source: Hugging Face
- GitHub: PKU-YuanGroup/LLaVA-CoT
- Description: Visual language model with spontaneous systematic reasoning
MM-CoT (Amazon Science)
- GitHub: amazon-science/mm-cot
- Description: Multimodal CoT with decoupled training framework for rationale generation
MME-CoT Benchmark
- GitHub: CaraJ7/MME-CoT
- Description: Benchmark for evaluating CoT reasoning across math, science, OCR, logic, space-time
Pre-trained Models
| Model | Base | Link |
|---|---|---|
| Anole-Zebra-CoT | Anole-7B | HuggingFace |
| Bagel-Zebra-CoT | Bagel-7B | HuggingFace |
Adoption Status
- Zebra-CoT integrated
- Visual-CoT samples pending
- Custom samples in development