---
library_name: transformers
tags:
- multimodal
- reasoning
- sft
- rl
datasets:
- multimodal-reasoning-lab/Zebra-CoT
- ModalityDance/Omni-Bench
base_model:
- GAIR/Anole-7b-v0.1
pipeline_tag: any-to-any
---

# Omni-R1

Omni-R1 is trained with multimodal interleaved supervision: it first applies PeSFT to stabilize functional image generation, then PeRPO for reinforcement-learning (RL) refinement on unified multimodal tasks.

<p align="center">
<a href="https://arxiv.org/abs/2601.09536"><b>Paper</b>👁️</a> ·
<a href="https://github.com/ModalityDance/Omni-R1"><b>Code</b>🐙</a> ·
<a href="https://huggingface.co/datasets/ModalityDance/Omni-Bench"><b>Omni-Bench</b>🧪</a>
</p>
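
## Usage

Below is a minimal inference sketch, not an official example. It assumes the checkpoint loads through the Chameleon classes in `transformers` (the base model, GAIR/Anole-7b-v0.1, is Chameleon-based); the repo id, image path, and prompt are placeholders.

```python
# Minimal inference sketch. Assumptions: the checkpoint is compatible with the
# Chameleon classes in transformers, and the repo id, image path, and prompt
# below are placeholders.
import torch
from PIL import Image
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

model_id = "ModalityDance/Omni-R1"  # placeholder repo id
processor = ChameleonProcessor.from_pretrained(model_id)
model = ChameleonForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chameleon-style prompts mark image positions with the <image> token.
prompt = "<image> What reasoning steps lead to the answer? Explain briefly."
image = Image.open("example.jpg")

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

Note that the stock Chameleon integration in `transformers` decodes text tokens; for Anole-style interleaved image generation (the any-to-any setting), the project code linked above is likely required.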

## Citation

```bibtex
@misc{cheng2026omnir1unifiedgenerativeparadigm,
  title={Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning},
  author={Dongjie Cheng and Yongqi Li and Zhixin Ma and Hongru Cai and Yupeng Hu and Wenjie Wang and Liqiang Nie and Wenjie Li},
  year={2026},
  eprint={2601.09536},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2601.09536},
}
```