# Sheikh-Freemium
[validate-dataset.yml](https://github.com/isshiki-dev/Sheikh-Freemium/actions/workflows/validate-dataset.yml)
[ci.yml](https://github.com/isshiki-dev/Sheikh-Freemium/actions/workflows/ci.yml)
[sync-huggingface.yml](https://github.com/isshiki-dev/Sheikh-Freemium/actions/workflows/sync-huggingface.yml)
## Zebra-CoT Visual Reasoning Dataset
A repository for managing and developing Visual Chain of Thought (Visual CoT) datasets, based on the [Zebra-CoT](https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT) framework.
### Quick Links
| Resource | Link |
|----------|------|
| Dataset | [dataset/](./dataset/) |
| Architecture | [dataset/docs/ARCHITECTURE.md](./dataset/docs/ARCHITECTURE.md) |
| Sources | [dataset/SOURCES.md](./dataset/SOURCES.md) |
| HuggingFace | [Zebra-CoT Dataset](https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT) |
### Features
- **182K+ samples** across 4 reasoning categories
- **Interleaved text-image** reasoning traces
- **MoT architecture** support (Mixture-of-Transformer-Experts)
- **Automated validation** via GitHub Actions
- **HuggingFace sync** for dataset updates
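To make "interleaved text-image reasoning traces" and "automated validation" more concrete, here is a minimal sketch of how one trace could be represented and checked. Everything in it is an assumption made for illustration: the `Step` and `Trace` classes, their field names, and the `check_trace` rules are hypothetical and do not describe the actual Zebra-CoT schema or what `dataset/scripts/validate.py` does.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema for illustration only; the real dataset fields may differ.
VALID_KINDS = {"text", "image"}


@dataclass
class Step:
    kind: str     # "text" or "image"
    content: str  # reasoning text, or a path to an image file


@dataclass
class Trace:
    question: str
    answer: str
    steps: List[Step] = field(default_factory=list)


def check_trace(trace: Trace) -> List[str]:
    """Return a list of problems found; an empty list means the trace passed."""
    problems = []
    if not trace.question.strip():
        problems.append("question is empty")
    if not trace.answer.strip():
        problems.append("answer is empty")
    if not trace.steps:
        problems.append("trace has no steps")
    for i, step in enumerate(trace.steps):
        if step.kind not in VALID_KINDS:
            problems.append(f"step {i}: unknown kind {step.kind!r}")
        if not step.content.strip():
            problems.append(f"step {i}: empty content")
    return problems


# A well-formed trace alternates text and image steps.
good = Trace(
    question="Which move wins material?",
    answer="Nxe5",
    steps=[
        Step("text", "The knight on f3 attacks e5."),
        Step("image", "images/board_after_move.png"),
        Step("text", "After the capture the pawn is undefended."),
    ],
)
# A malformed trace: missing question and an unsupported step kind.
bad = Trace(question="", answer="42", steps=[Step("audio", "clip.wav")])

print(check_trace(good))  # []
print(check_trace(bad))   # ['question is empty', "step 0: unknown kind 'audio'"]
```

Returning a list of problems rather than raising on the first one lets a CI job report every issue in a sample in a single run.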
### Categories
| Category | Samples | Description |
|----------|---------|-------------|
| Visual Logic & Games | 66,854 | Chess, strategic games, visual logic |
| 2D Visual Reasoning | 51,899 | Visual search, jigsaw puzzles |
| 3D Visual Reasoning | 39,610 | Multi-hop inference, embodied planning |
| Scientific Reasoning | 24,021 | Geometry, physics, algorithms |
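The per-category counts can be tallied to check them against the "182K+" figure under Features. The short sketch below uses only the numbers from the table; the names `CATEGORY_COUNTS` and `category_shares` are illustrative and not part of the repository.

```python
# Sample counts copied from the Categories table.
CATEGORY_COUNTS = {
    "Visual Logic & Games": 66_854,
    "2D Visual Reasoning": 51_899,
    "3D Visual Reasoning": 39_610,
    "Scientific Reasoning": 24_021,
}


def category_shares(counts):
    """Return (total, {category: percentage of the total})."""
    total = sum(counts.values())
    shares = {name: 100 * n / total for name, n in counts.items()}
    return total, shares


total, shares = category_shares(CATEGORY_COUNTS)
print(f"total samples: {total:,}")  # total samples: 182,384
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

The four categories sum to 182,384 samples, which matches the headline figure.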
### Getting Started
```bash
# Clone the repository
git clone https://github.com/isshiki-dev/Sheikh-Freemium.git
cd Sheikh-Freemium
# Validate local samples
python dataset/scripts/validate.py --input dataset/samples/
```
### License
Dataset content: [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)