Add model card

#1
by nielsr HF Staff - opened
README.md ADDED

---
pipeline_tag: image-text-to-text
---

# SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs

[Paper](https://huggingface.co/papers/2602.06040) | [Project Page](https://accio-lab.github.io/SwimBird) | [GitHub](https://github.com/Accio-Lab/SwimBird)

SwimBird is a reasoning-switchable Multimodal Large Language Model (MLLM) that dynamically switches among three reasoning modes conditioned on the input:

1. **Text-only reasoning**: Standard textual Chain-of-Thought.
2. **Vision-only reasoning**: Utilizes continuous hidden states as "visual thoughts" for vision-intensive tasks.
3. **Interleaved vision-text reasoning**: A combination of both textual and visual thinking modalities.
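
As a rough illustration only, here is a minimal sketch of how a hybrid autoregressive decoder could interleave the modes listed above. In text mode it emits a discrete token and feeds that token's embedding back in (next-token prediction). Inside a visual-thought span it feeds its last hidden state back directly as the next input embedding (next-embedding prediction). The delimiter tokens `<vis_start>`/`<vis_end>`, the fixed span length `num_visual_steps`, and the `step_fn`/`embed` callables are hypothetical stand-ins, not SwimBird's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

Vector = List[float]

# Hypothetical special tokens; SwimBird's real vocabulary may differ.
VIS_START, VIS_END, EOS = "<vis_start>", "<vis_end>", "<eos>"


@dataclass
class Step:
    kind: str                  # "text" or "visual"
    value: Union[str, Vector]  # token string or continuous hidden state


def hybrid_decode(
    step_fn: Callable[[List[Vector]], Tuple[Vector, str]],
    embed: Callable[[str], Vector],
    prompt: List[Vector],
    num_visual_steps: int = 4,
    max_steps: int = 64,
) -> List[Step]:
    """Greedy toy decoder mixing next-token and next-embedding steps.

    step_fn(inputs) returns (last hidden state, most likely next token).
    """
    inputs = list(prompt)
    trace: List[Step] = []
    visual_left = 0  # > 0 while inside a visual-thought span
    while len(trace) < max_steps:
        hidden, token = step_fn(inputs)
        if visual_left > 0:
            # Next-embedding prediction: the hidden state itself becomes
            # the next input; the predicted token is ignored.
            trace.append(Step("visual", hidden))
            inputs.append(hidden)
            visual_left -= 1
            if visual_left == 0:
                # Assumption: spans have a fixed length and are closed explicitly.
                trace.append(Step("text", VIS_END))
                inputs.append(embed(VIS_END))
            continue
        # Next-token prediction: emit a discrete token and feed back its embedding.
        trace.append(Step("text", token))
        if token == EOS:
            break
        inputs.append(embed(token))
        if token == VIS_START:
            visual_left = num_visual_steps
    return trace
```

In this sketch, text-only reasoning is simply a trace that never emits `<vis_start>`, while vision-only and interleaved reasoning differ in how much of the trace falls inside visual spans. The mode is chosen by the model's own token predictions rather than by an external switch.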

By enabling flexible, query-adaptive mode selection, SwimBird preserves strong textual logic while substantially improving performance on vision-dense tasks.

## Method

SwimBird adopts a hybrid autoregressive formulation that unifies next-token prediction for textual thoughts with next-embedding prediction for visual thoughts. To enable this capability, the authors designed a systematic reasoning-mode curation strategy to construct **SwimBird-SFT-92K**, a diverse supervised fine-tuning dataset covering all three reasoning patterns. Experiments on a range of benchmarks covering textual reasoning and challenging visual understanding demonstrate that SwimBird achieves state-of-the-art results and robust gains over prior fixed-pattern multimodal reasoning methods.
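
The hybrid formulation above implies a training objective with two kinds of targets. Below is a minimal sketch that assumes cross-entropy on textual-thought positions and mean squared error on visual-thought positions, combined with a hypothetical weight `visual_weight`. This card does not specify the actual loss, the source of the target embeddings, or the weighting used for SwimBird.

```python
import math
from typing import List, Optional, Sequence


def hybrid_loss(
    logits: Sequence[Sequence[float]],
    target_ids: Sequence[Optional[int]],
    pred_embs: Sequence[Sequence[float]],
    target_embs: Sequence[Optional[Sequence[float]]],
    visual_weight: float = 1.0,  # hypothetical balancing weight
) -> float:
    """Toy hybrid objective over one sequence.

    Position i is a text position if target_ids[i] is not None
    (cross-entropy), otherwise a visual position if target_embs[i]
    is not None (mean squared error, an assumed choice).
    """
    text_losses: List[float] = []
    vis_losses: List[float] = []
    for row, tid, pred, tgt in zip(logits, target_ids, pred_embs, target_embs):
        if tid is not None:
            # Next-token prediction: numerically stable softmax cross-entropy.
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            text_losses.append(log_z - row[tid])
        elif tgt is not None:
            # Next-embedding prediction: regress onto the target embedding.
            vis_losses.append(sum((p - t) ** 2 for p, t in zip(pred, tgt)) / len(tgt))
    text_term = sum(text_losses) / len(text_losses) if text_losses else 0.0
    vis_term = sum(vis_losses) / len(vis_losses) if vis_losses else 0.0
    return text_term + visual_weight * vis_term
```

For example, a text position with uniform logits over two tokens contributes ln 2. A visual position whose prediction is off by (1, 2) contributes an MSE of 2.5.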

## Citation

```bibtex
@article{tong2025swimbird,
  title={SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs},
  author={Tong, Jintao and Yan, Shilin and Xue, Hongwei and Tang, Xiaojun and Shi, Kunyu and Zhang, Guannan and Li, Ruixuan and Zou, Yixiong},
  journal={arXiv preprint arXiv:2602.06040},
  year={2026}
}
```