---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
tags:
- transformers
- multimodal
---

## 🌟 ReVisual-R1 (7B): an Open-Source Multimodal Reasoner

> **One cold start, two RL stages, endless reasoning power.**

---

### 🔑 Highlights

* **SOTA on 9 tough benchmarks** covering visual-math and text reasoning.
* **Three-Stage SRO Training**
  1. **Text Cold-Start**: seed deep reflection
  2. **Multimodal RL**: align vision and logic
  3. **Text RL**: polish fluency and brevity
* **PAD** (Prioritized Advantage Distillation) keeps gradients alive.
* **Efficient-Length Reward** encourages concise, self-reflective CoT.

---

### 📚 Resources

* [Paper](https://arxiv.org/abs/2506.04207)
* [Code](https://github.com/CSfufu/Revisual-R1)

---

### 📌 Citation

```bibtex
@misc{chen2025advancingmultimodalreasoningoptimized,
  title         = {Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning},
  author        = {Shuang Chen and Yue Guo and Zhaochen Su and Yafu Li and Yulun Wu and Jiacheng Chen and Jiayu Chen and Weijie Wang and Xiaoye Qu and Yu Cheng},
  year          = {2025},
  eprint        = {2506.04207},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2506.04207}
}
```

Take ReVisual-R1 for a spin and let us know what you build! 🎯
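As a starting point, here is a minimal inference sketch using `transformers` in the standard Qwen2.5-VL chat style. The repo id `csfufu/Revisual-R1` is an assumption inferred from the GitHub name (check the Hub page for the actual id), and the image path and question are placeholders.

```python
from typing import Any, Dict, List


def build_messages(image_path: str, question: str) -> List[Dict[str, Any]]:
    """Build a Qwen2.5-VL-style chat turn pairing one image with one question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]


def generate_answer(image_path: str, question: str,
                    model_id: str = "csfufu/Revisual-R1") -> str:
    """Run one inference pass; needs transformers >= 4.49 and a GPU for 7B weights."""
    # Imported here so build_messages stays usable without the heavy dependency.
    from PIL import Image
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    # Render the conversation into the model's prompt format, then tokenize
    # the prompt together with the pixel inputs.
    messages = build_messages(image_path, question)
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image = Image.open(image_path)
    inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=1024)
    # Strip the prompt tokens before decoding so only the answer remains.
    answer_ids = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(answer_ids, skip_special_tokens=True)[0]
```

With long self-reflective chains of thought, you may want a larger `max_new_tokens` budget than the 1024 used above.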