luckychao committed · verified
Commit 0a33b90 · Parent: d9db579

Update README.md

Files changed (1): README.md +4 −4
README.md CHANGED

@@ -7,7 +7,7 @@ library_name: ThinkMorph-7B
 ---
 
 <p align="center">
-<img src="https://github.com/ThinkMorph/ThinkMorph/blob/main/assets/logo.png" width="40%"> <br>
+<img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/logo.png" width="40%"> <br>
 </p>
 
 
@@ -58,11 +58,11 @@ For installation, usage instructions, and further documentation, please visit ou
 Multimodal reasoning demands synergistic coordination of language and vision. However, determining what constitutes meaningful interleaved reasoning is non-trivial, and current approaches lack a generalizable recipe.
 We present **ThinkMorph**, a unified model that enables such generalization through a principled approach: treating text and images as complementary modalities that mutually advance reasoning.
 <p align="center">
-<img src="https://github.com/ThinkMorph/ThinkMorph/blob/main/assets/interleaved_design.jpg" width="100%"> <br>
+<img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/interleaved_design.jpg" width="100%"> <br>
 </p>
 Guided by this principle, we identify tasks requiring concrete, verifiable visual engagement and design a high-quality data pipeline that trains models to generate interleaved images and text as progressive reasoning traces.
 <p align="center">
-<img src="https://github.com/ThinkMorph/ThinkMorph/blob/main/assets/thinkmorph_main.jpg" width="100%"> <br>
+<img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/thinkmorph_main.jpg" width="100%"> <br>
 </p>
 
 ThinkMorph delivers substantial gains on **vision-centric** tasks, achieving an average improvement of 34.74% over the base model while consistently surpassing text-only and image-only modes.
@@ -70,7 +70,7 @@ By fine-tuning with **merely ~24K** samples, it achieves out-of-domain performan
 
 Intriguingly, ThinkMorph unlocks emergent properties that represent a *hallmark of multimodal intelligence*: the elicitation of unseen visual manipulation skills, the self-adaptive switching between reasoning modes according to task complexity, and better test-time scaling via diversified thoughts.
 <p align="center">
-<img src="https://github.com/ThinkMorph/ThinkMorph/blob/main/assets/emrging_prop.jpg" width="100%"> <br>
+<img src="https://github.com/ThinkMorph/ThinkMorph/raw/main/assets/emrging_prop.jpg" width="100%"> <br>
 </p>
 These findings suggest promising directions for future work to characterize the emergent capabilities of unified models for multimodal reasoning.