# Zebra-CoT Dataset

A large-scale, diverse dataset for Visual Chain-of-Thought (Visual CoT) reasoning, containing logically coherent, interleaved text-image reasoning traces.
## Official Resources

## Dataset Statistics

| Metric | Value |
|---|---|
| Total Samples | 182,384 |
| Download Size | 58.9 GB |
| Format | Parquet |
| License | CC BY-NC 4.0 |
## Categories

| Category | Samples | Percentage |
|---|---|---|
| Visual Logic & Strategic Games | 66,854 | 36.7% |
| 2D Visual Reasoning | 51,899 | 28.5% |
| 3D Visual Reasoning | 39,610 | 21.7% |
| Scientific Reasoning | 24,021 | 13.2% |
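The per-category counts above can be cross-checked against the dataset total in a few lines:

```python
# Per-category sample counts, copied from the Categories table above.
categories = {
    "Visual Logic & Strategic Games": 66_854,
    "2D Visual Reasoning": 51_899,
    "3D Visual Reasoning": 39_610,
    "Scientific Reasoning": 24_021,
}

total = sum(categories.values())
print(total)  # 182384 -- matches the Total Samples figure
```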
## Category Details

### 1. Scientific Reasoning (`samples/scientific/`)

- Geometry: Geometric proofs and spatial reasoning
- Physics: Physical simulations and force diagrams
- Algorithms: Algorithm visualization and step-by-step execution

### 2. 2D Visual Reasoning (`samples/visual_2d/`)

- Visual Search: Finding patterns and objects
- Jigsaw Puzzles: Spatial arrangement tasks

### 3. 3D Visual Reasoning (`samples/visual_3d/`)

- 3D Multi-hop Inference: Complex spatial reasoning
- Embodied Planning: Robot navigation and manipulation

### 4. Visual Logic & Strategic Games (`samples/logic_games/`)

- Chess: Strategic game analysis
- Visual Logic: Pattern-based logical problems
## Sample Format

Each sample contains:
| Field | Description |
|---|---|
| `problem` | Textual description of the question |
| `problem_image` | Zero or more images accompanying the problem |
| `reasoning_image` | Visual aids supporting intermediate reasoning steps |
| `text_reasoning_trace` | Sequence of thoughts with image placeholders |
| `final_answer` | Solution to the problem |
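The fields above can be walked with a small helper. The field names come from the table; how image placeholders are embedded in `text_reasoning_trace` is dataset-specific, so inspect a real sample before parsing them:

```python
# Minimal sketch of reading one sample's fields (names from the table above).
def describe_sample(sample: dict) -> str:
    """Return a short text summary of a single Zebra-CoT sample."""
    return "\n".join([
        f"Problem: {sample['problem']}",
        f"Problem images: {len(sample.get('problem_image') or [])}",
        f"Reasoning images: {len(sample.get('reasoning_image') or [])}",
        f"Trace (start): {sample['text_reasoning_trace'][:80]}",
        f"Answer: {sample['final_answer']}",
    ])
```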
## Quick Start

### Load from Hugging Face

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (~59 GB).
dataset = load_dataset("multimodal-reasoning-lab/Zebra-CoT")
sample = dataset["train"][0]
print(sample["problem"])
```
### Local Validation

```shell
python scripts/validate.py --input samples/
```
## Citation

```bibtex
@misc{li2025zebracot,
  title={Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning},
  author={Ang Li and Charles Wang and Kaiyu Yue and Zikui Cai and Ollie Liu and Deqing Fu and Peng Guo and Wang Bill Zhu and Vatsal Sharan and Robin Jia and Willie Neiswanger and Furong Huang and Tom Goldstein and Micah Goldblum},
  year={2025},
  eprint={2507.16746},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.16746},
}
```
## License
This dataset is released under CC BY-NC 4.0.