
Dataset Sources & References

This document tracks the primary dataset and related visual reasoning datasets, along with associated pre-trained models.

Primary Dataset

Zebra-CoT

  • Source: Hugging Face
  • Paper: arXiv:2507.16746
  • Size: 182,384 samples (58.9 GB)
  • License: CC BY-NC 4.0
  • Modalities: Image, Text
  • Use Case: Training multimodal models for visual chain-of-thought (CoT) reasoning

Related Datasets

Visual-CoT (NeurIPS'24 Spotlight)

  • Source: Hugging Face
  • GitHub: deepcs233/Visual-CoT
  • Size: 438K question-answer pairs
  • Description: Question-answer pairs annotated with intermediate bounding boxes that highlight key image regions, plus a multi-turn processing pipeline for MLLMs
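The multi-turn idea can be sketched roughly as follows. In the first turn the model predicts a box around the region relevant to the question. In the second turn that region is cropped and used to answer. This is a minimal sketch, not code from the Visual-CoT repository. It assumes boxes are normalized (x1, y1, x2, y2) coordinates in [0, 1], and `bbox_to_crop`, `visual_cot_two_turn`, `predict_box`, and `answer` are hypothetical names.

```python
import math
from typing import Callable, Tuple

# Assumption: boxes are normalized (x1, y1, x2, y2) with values in [0, 1].
NormBox = Tuple[float, float, float, float]
PixelBox = Tuple[int, int, int, int]


def bbox_to_crop(box: NormBox, width: int, height: int, pad: float = 0.0) -> PixelBox:
    """Map a normalized box to a pixel crop box, padded by `pad` times the
    box size on each side and clamped to the image bounds."""
    x1, y1, x2, y2 = box
    if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
        raise ValueError(f"invalid normalized box: {box}")
    dx, dy = (x2 - x1) * pad, (y2 - y1) * pad
    # Floor the near edges and ceil the far edges so the crop never clips the box.
    left = math.floor(max(0.0, x1 - dx) * width)
    top = math.floor(max(0.0, y1 - dy) * height)
    right = math.ceil(min(1.0, x2 + dx) * width)
    bottom = math.ceil(min(1.0, y2 + dy) * height)
    return left, top, right, bottom


def visual_cot_two_turn(
    question: str,
    image_size: Tuple[int, int],
    predict_box: Callable[[str], NormBox],   # hypothetical turn-1 model call
    answer: Callable[[str, PixelBox], str],  # hypothetical turn-2 model call
    pad: float = 0.0,
) -> str:
    # Turn 1: the model localizes the region relevant to the question.
    box = predict_box(question)
    # Turn 2: crop that region and answer with it in view. A real pipeline would
    # pass the cropped image itself; the pixel box stands in for it here.
    crop = bbox_to_crop(box, image_size[0], image_size[1], pad)
    return answer(question, crop)
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects. This means the box can be passed to it directly when working with real images.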

LLaVA-CoT-100k (ICCV 2025)

MM-CoT (Amazon Science)

  • GitHub: amazon-science/mm-cot
  • Description: Multimodal-CoT, a two-stage framework that decouples rationale generation from answer inference
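The decoupled two-stage flow can be sketched as follows. Stage 1 generates a rationale from the question and the image features. Stage 2 appends that rationale to the input and infers the answer. This is a hedged sketch, not the mm-cot repository's API. `TwoStageCoT`, `rationale_model`, `answer_model`, and the `"Rationale:"` prompt format are hypothetical stand-ins, and the models are plain callables rather than real trained networks.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

# Hypothetical stand-in for a trained generator: (prompt, vision features) -> text.
Generator = Callable[[str, Any], str]


@dataclass
class TwoStageCoT:
    rationale_model: Generator  # stage 1: produces a rationale
    answer_model: Generator     # stage 2: trained separately, produces the answer

    def run(self, question: str, vision_features: Any) -> Tuple[str, str]:
        # Stage 1: generate a rationale from the question and the image features.
        rationale = self.rationale_model(question, vision_features)
        # Stage 2: append the rationale to the original input and infer the answer.
        # The "Rationale:" prompt format is an illustrative assumption.
        answer_prompt = f"{question}\nRationale: {rationale}"
        answer = self.answer_model(answer_prompt, vision_features)
        return rationale, answer
```

Keeping the stages as two independent callables reflects the decoupling. Each stage can be trained or swapped on its own, and the answer stage sees only the rationale text that stage 1 produced.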

MME-CoT Benchmark

  • GitHub: CaraJ7/MME-CoT
  • Description: Benchmark for evaluating CoT reasoning across domains including math, science, OCR, logic, and space-time

Pre-trained Models

| Model           | Base     | Link         |
|-----------------|----------|--------------|
| Anole-Zebra-CoT | Anole-7B | Hugging Face |
| Bagel-Zebra-CoT | Bagel-7B | Hugging Face |

Adoption Status

  • Zebra-CoT integrated
  • Visual-CoT samples pending
  • Custom samples in development