---
library_name: transformers
tags:
  - multimodal
  - reasoning
  - sft
  - rl
  - perception
datasets:
  - multimodal-reasoning-lab/Zebra-CoT
  - ModalityDance/Omni-Bench
base_model:
  - GAIR/Anole-7b-v0.1
---

# Omni-R1

Omni-R1 is trained with multimodal interleaved supervision: a PeSFT stage for stable functional image generation, followed by a PeRPO stage of RL refinement on unified tasks.

👁️ [Paper](https://arxiv.org/abs/2601.09536) · 🐙 Code · 🧪 Omni-Bench

## Citation

```bibtex
@misc{cheng2026omnir1unifiedgenerativeparadigm,
      title={Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning},
      author={Dongjie Cheng and Yongqi Li and Zhixin Ma and Hongru Cai and Yupeng Hu and Wenjie Wang and Liqiang Nie and Wenjie Li},
      year={2026},
      eprint={2601.09536},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.09536},
}
```