---
license: apache-2.0
base_model:
- ByteDance-Seed/BAGEL-7B-MoT
pipeline_tag: any-to-any
library_name: ThinkMorph-7B
---

<p align="center">
    <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/logo.png" width="40%"> <br>
</p>


## Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning



<p align="center">
  <a href="https://thinkmorph.github.io/">
    <img
      src="https://img.shields.io/badge/ThinkMorph-Website-0A66C2?logo=safari&logoColor=white"
      alt="ThinkMorph Website"
    />
  </a>
  <a href="https://arxiv.org/abs/2510.27492">
    <img
      src="https://img.shields.io/badge/ThinkMorph-Paper-red?logo=arxiv&logoColor=red"
      alt="ThinkMorph Paper on arXiv"
    />
  </a>
  <a href="https://github.com/ThinkMorph/ThinkMorph">
      <img
        alt="ThinkMorph Codebase"
        src="https://img.shields.io/badge/ThinkMorph-Codebase-536af5?color=536af5&logo=github"
      />
  </a>
  <a href="https://huggingface.co/ThinkMorph">
    <img 
        src="https://img.shields.io/badge/ThinkMorph-Dataset-yellow?logo=huggingface&logoColor=yellow" 
        alt="ThinkMorph Dataset"
    />
  </a>
  <!-- <a href="https://demo.bagel-ai.org/">
    <img
      src="https://img.shields.io/badge/BAGEL-Demo-blue?logo=googleplay&logoColor=blue"
      alt="BAGEL Demo"
    />
  </a> -->
</p>


## 👀 About ThinkMorph

<p align="center">
    <img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/thinkmorph.jpg" width="100%"> <br>
</p>

We present **ThinkMorph**, a unified model fine-tuned on ∼24K high-quality interleaved reasoning traces across tasks, learning to generate progressive text–image reasoning steps that
concretely manipulate visual content while maintaining coherent verbal logic.
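To make "interleaved" concrete, the sketch below shows one possible way to represent a text–image reasoning trace as an ordered list of typed steps. This is a hypothetical schema for illustration only: `TextStep`, `ImageStep`, `InterleavedTrace`, the example question, and the image file name are all assumptions, not ThinkMorph's actual data format or API.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Union


# Hypothetical: a verbal reasoning step.
@dataclass
class TextStep:
    text: str
    kind: Literal["text"] = "text"


# Hypothetical: a visual step, referencing an intermediate (e.g. edited) image.
@dataclass
class ImageStep:
    image_path: str
    kind: Literal["image"] = "image"


Step = Union[TextStep, ImageStep]


# Hypothetical container for one trace: question, ordered steps, final answer.
@dataclass
class InterleavedTrace:
    question: str
    steps: List[Step] = field(default_factory=list)
    answer: str = ""

    def modality_sequence(self) -> List[str]:
        # Order of modalities across the reasoning steps.
        return [step.kind for step in self.steps]

    def is_interleaved(self) -> bool:
        # A trace counts as interleaved here if it mixes text and image steps.
        kinds = set(self.modality_sequence())
        return {"text", "image"} <= kinds


# Usage with made-up content:
trace = InterleavedTrace(
    question="Which path reaches the goal?",
    steps=[
        TextStep("Locate the start cell and the goal."),
        ImageStep("step1_path_drawn.png"),
        TextStep("The drawn path avoids every hole."),
    ],
    answer="A",
)
print(trace.modality_sequence())  # ['text', 'image', 'text']
```

Keeping each step typed makes the alternation between verbal and visual reasoning explicit, which is the property the description above emphasizes.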

Beyond strong performance on vision-centric benchmarks and robust out-of-domain generalization, ThinkMorph exhibits emergent multimodal intelligence, including novel visual manipulation skills.
These findings suggest promising directions for characterizing the emergent capabilities of unified models for multimodal reasoning.



## 📊 Benchmarks

| Model | Size | VSP | VisPuzzle | ChartQA | VStar | BLINK-J | MMVP | SAT | BLINK | CV-Bench |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| GPT-4o | – | 33.50 | 43.75 | 76.34 | 61.78 | 72.67 | 84.67 | 28.00 | 60.28 | 75.61 |
| GPT-5 | – | 57.33 | 78.00 | 80.85 | 71.73 | 77.33 | 86.33 | 73.30 | 69.86 | 85.46 |
| Gemini 2.5 Flash | – | 59.33 | 47.00 | 83.79 | 70.68 | 66.00 | 80.33 | 56.00 | 67.49 | 85.07 |
| InternVL3.5 | 8B | 8.17 | 34.75 | 76.26 | 68.59 | 71.33 | 76.33 | 45.33 | 59.60 | 81.99 |
|  | 38B | 20.16 | 36.50 | 80.44 | 76.96 | 80.67 | 80.33 | 49.33 | 62.65 | 85.96 |
| Qwen2.5-VL | 7B | 2.16 | 34.75 | 78.12 | 76.44 | 59.33 | 77.33 | 51.33 | 55.92 | 75.20 |
|  | 72B | 41.83 | 40.00 | 82.03 | 85.86 | 61.33 | 82.00 | 64.67 | 61.91 | 82.54 |
| Janus-pro | 7B | 0.00 | 33.50 | 43.08 | 38.22 | 50.67 | 63.33 | 22.00 | 38.51 | 67.83 |
| Chameleon | 7B | 0.83 | 30.50 | 5.74 | 28.27 | 0.67 | 47.67 | 10.67 | 16.52 | 36.52 |
| Bagel | 7B | 0.83* | 35.00* | 61.82 | 55.49 | 67.33 | 70.33 | 44.67 | 47.66 | 76.03 |
| **ThinkMorph** | **7B** | **75.83** | **79.00** | **78.10** | **67.02** | **72.00** | **80.33** | **52.67** | **60.07** | **80.82** |
| Δ (vs. Bagel) |  | +75.00 | +44.00 | +16.28 | +11.53 | +4.67 | +10.00 | +8.00 | +12.41 | +4.79 |
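The Δ row is simply ThinkMorph's score minus that of its base model, Bagel, on each benchmark. A minimal sketch that reproduces it from the table values (the Bagel entries marked `*` are used as the plain numbers shown; the meaning of the asterisk is not stated here):

```python
# Scores copied from the benchmark table: ThinkMorph vs. its base model, Bagel.
benchmarks = ["VSP", "VisPuzzle", "ChartQA", "VStar", "BLINK-J",
              "MMVP", "SAT", "BLINK", "CV-Bench"]
bagel = [0.83, 35.00, 61.82, 55.49, 67.33, 70.33, 44.67, 47.66, 76.03]
thinkmorph = [75.83, 79.00, 78.10, 67.02, 72.00, 80.33, 52.67, 60.07, 80.82]

# Δ = ThinkMorph - Bagel, rounded to two decimals to match the table.
deltas = {
    name: round(tm - bg, 2)
    for name, tm, bg in zip(benchmarks, thinkmorph, bagel)
}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

Rounding to two decimals hides floating-point noise such as `16.279999...` so the printed values line up with the table.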


## ✍️ Citation

```bibtex
@misc{gu2025thinkmorphemergentpropertiesmultimodal,
      title={ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning}, 
      author={Jiawei Gu and Yunzhuo Hao and Huichen Will Wang and Linjie Li and Michael Qizhe Shieh and Yejin Choi and Ranjay Krishna and Yu Cheng},
      year={2025},
      eprint={2510.27492},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.27492}, 
}
```