---
library_name: transformers
tags:
- multimodal
- reasoning
- sft
- rl
datasets:
- LightChen2333/M3CoT
- ModalityDance/Omni-Bench
base_model:
- GAIR/Anole-7b-v0.1
license: mit
---

# Omni-R1-Zero

Omni-R1-Zero is trained without any multimodal annotations. It bootstraps step-wise visualizations from text-only chain-of-thought (CoT) seeds, then follows the SFT→RL recipe to learn interleaved multimodal reasoning.

<p align="center">
<a href="https://arxiv.org/abs/2601.09536"><b>Paper</b>👁️</a> ·
<a href="https://github.com/ModalityDance/Omni-R1"><b>Code</b>🐙</a> ·
<a href="https://huggingface.co/datasets/ModalityDance/Omni-Bench"><b>Omni-Bench</b>🧪</a>
</p>

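## Usage

The base model, GAIR/Anole-7b-v0.1, builds on the Chameleon architecture, so the checkpoint should load with the standard `transformers` Chameleon classes. Below is a minimal inference sketch, assuming the weights are hosted at the hypothetical repo id `ModalityDance/Omni-R1-Zero` and remain Chameleon-compatible; check the project repo for the exact id and recommended settings.

```python
# Minimal inference sketch. Assumptions: the checkpoint lives at the
# hypothetical repo id "ModalityDance/Omni-R1-Zero" and loads with the
# standard transformers Chameleon classes used by the Anole-7b base.
import requests
import torch
from PIL import Image
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

model_id = "ModalityDance/Omni-R1-Zero"  # assumption: the real repo id may differ
processor = ChameleonProcessor.from_pretrained(model_id)
model = ChameleonForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# One image plus a reasoning prompt; "<image>" marks where the image is inserted.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "Let's think step by step: how many animals are in this picture?<image>"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

Note that the stock `transformers` Chameleon integration generates text tokens only; reproducing the interleaved step-wise visualizations described above likely requires the inference code in the project repo linked above.
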
## Citation

```bibtex
@misc{cheng2026omnir1unifiedgenerativeparadigm,
      title={Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning},
      author={Dongjie Cheng and Yongqi Li and Zhixin Ma and Hongru Cai and Yupeng Hu and Wenjie Wang and Liqiang Nie and Wenjie Li},
      year={2026},
      eprint={2601.09536},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.09536},
}
```