---
license: apache-2.0
base_model:
- ByteDance-Seed/BAGEL-7B-MoT
pipeline_tag: any-to-any
library_name: ThinkMorph-7B
---

<p align="center">
  <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/logo.png" width="40%"> <br>
</p>

## Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

<p align="center">
  <a href="https://thinkmorph.github.io/">
    <img
      src="https://img.shields.io/badge/ThinkMorph-Website-0A66C2?logo=safari&logoColor=white"
      alt="ThinkMorph Website"
    />
  </a>
  <a href="https://arxiv.org/abs/2510.27492">
    <img
      src="https://img.shields.io/badge/ThinkMorph-Paper-red?logo=arxiv&logoColor=red"
      alt="ThinkMorph Paper on arXiv"
    />
  </a>
  <a href="https://github.com/ThinkMorph/ThinkMorph">
    <img
      src="https://img.shields.io/badge/ThinkMorph-Codebase-536af5?color=536af5&logo=github"
      alt="ThinkMorph Codebase"
    />
  </a>
  <a href="https://huggingface.co/ThinkMorph">
    <img
      src="https://img.shields.io/badge/ThinkMorph-Dataset-yellow?logo=huggingface&logoColor=yellow"
      alt="ThinkMorph Dataset"
    />
  </a>
  <!-- <a href="https://demo.bagel-ai.org/">
    <img
      src="https://img.shields.io/badge/BAGEL-Demo-blue?logo=googleplay&logoColor=blue"
      alt="BAGEL Demo"
    />
  </a> -->
</p>

## 👀 About ThinkMorph

<p align="center">
  <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/thinkmorph.jpg" width="100%"> <br>
</p>

We present **ThinkMorph**, a unified model fine-tuned on ∼24K high-quality interleaved reasoning traces spanning multiple tasks. It learns to generate progressive text–image reasoning steps that concretely manipulate visual content while maintaining coherent verbal logic.

Beyond strong performance on vision benchmarks and robust out-of-domain generalization, ThinkMorph demonstrates emergent multimodal intelligence, such as novel visual manipulation skills.
These findings suggest promising directions for characterizing the emergent capabilities of unified models for multimodal reasoning.

## 📊 Benchmarks

| Model | Size | VSP | VisPuzzle | ChartQA | VStar | BLINK-J | MMVP | SAT | BLINK | CV-Bench |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| GPT-4o | – | 33.50 | 43.75 | 76.34 | 61.78 | 72.67 | 84.67 | 28.00 | 60.28 | 75.61 |
| GPT-5 | – | 57.33 | 78.00 | 80.85 | 71.73 | 77.33 | 86.33 | 73.30 | 69.86 | 85.46 |
| Gemini 2.5 Flash | – | 59.33 | 47.00 | 83.79 | 70.68 | 66.00 | 80.33 | 56.00 | 67.49 | 85.07 |
| InternVL3.5 | 8B | 8.17 | 34.75 | 76.26 | 68.59 | 71.33 | 76.33 | 45.33 | 59.60 | 81.99 |
| InternVL3.5 | 38B | 20.16 | 36.50 | 80.44 | 76.96 | 80.67 | 80.33 | 49.33 | 62.65 | 85.96 |
| Qwen2.5-VL | 7B | 2.16 | 34.75 | 78.12 | 76.44 | 59.33 | 77.33 | 51.33 | 55.92 | 75.20 |
| Qwen2.5-VL | 72B | 41.83 | 40.00 | 82.03 | 85.86 | 61.33 | 82.00 | 64.67 | 61.91 | 82.54 |
| Janus-pro | 7B | 0.00 | 33.50 | 43.08 | 38.22 | 50.67 | 63.33 | 22.00 | 38.51 | 67.83 |
| Chameleon | 7B | 0.83 | 30.50 | 5.74 | 28.27 | 0.67 | 47.67 | 10.67 | 16.52 | 36.52 |
| Bagel | 7B | 0.83* | 35.00* | 61.82 | 55.49 | 67.33 | 70.33 | 44.67 | 47.66 | 76.03 |
| **ThinkMorph** | **7B** | **75.83** | **79.00** | **78.10** | **67.02** | **72.00** | **80.33** | **52.67** | **60.07** | **80.82** |
| Δ (vs Bagel) | | +75.00 | +44.00 | +16.28 | +11.53 | +4.67 | +10.00 | +8.00 | +12.41 | +4.79 |

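
The Δ row is the ThinkMorph score minus the Bagel score on each benchmark. The short sketch below recomputes it from the two rows of the table, as a sanity check when copying or extending the numbers. The variable and function names are illustrative and not part of the ThinkMorph codebase, and the asterisks on Bagel's VSP and VisPuzzle entries are dropped because the code only needs the numeric values.

```python
# Scores copied from the Bagel and ThinkMorph rows of the benchmark table.
BENCHMARKS = ["VSP", "VisPuzzle", "ChartQA", "VStar", "BLINK-J",
              "MMVP", "SAT", "BLINK", "CV-Bench"]
BAGEL = [0.83, 35.00, 61.82, 55.49, 67.33, 70.33, 44.67, 47.66, 76.03]
THINKMORPH = [75.83, 79.00, 78.10, 67.02, 72.00, 80.33, 52.67, 60.07, 80.82]


def delta_row(base, tuned):
    """Per-benchmark gain of the fine-tuned model over its base, to 2 decimals."""
    return [round(t - b, 2) for b, t in zip(base, tuned)]


deltas = delta_row(BAGEL, THINKMORPH)

# Print the gains in the same signed format the table uses.
for name, gain in zip(BENCHMARKS, deltas):
    print(f"{name:>10}: {gain:+.2f}")

# Benchmarks with the largest and smallest gain over Bagel.
largest = max(zip(deltas, BENCHMARKS))
smallest = min(zip(deltas, BENCHMARKS))
print(f"largest gain:  {largest[1]} ({largest[0]:+.2f})")
print(f"smallest gain: {smallest[1]} ({smallest[0]:+.2f})")
```

The recomputed values match the Δ row as printed. The largest gain is on VSP and the smallest on BLINK-J.
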

## ✍️ Citation

```bibtex
@misc{gu2025thinkmorphemergentpropertiesmultimodal,
      title={ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning},
      author={Jiawei Gu and Yunzhuo Hao and Huichen Will Wang and Linjie Li and Michael Qizhe Shieh and Yejin Choi and Ranjay Krishna and Yu Cheng},
      year={2025},
      eprint={2510.27492},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.27492},
}
```