---
license: mit
base_model:
- CodeGoat24/UnifiedReward-2.0-qwen3vl-2b
---

## Model Summary

`UnifiedReward-Think-qwen3vl-2b` is the first unified multimodal chain-of-thought (CoT) reward model. It performs multi-dimensional, step-by-step long-chain reasoning for reward tasks in both visual understanding and visual generation.

For further details, please refer to the following resources:
- 📰 Paper: https://arxiv.org/pdf/2505.03318
- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/think
- 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
- 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
- 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)

🚀 All inference code is available on [GitHub](https://github.com/CodeGoat24/UnifiedReward/tree/main/UnifiedReward-Think/inference_qwen/UnifiedReward-Think-qwen3vl-inference).

## Citation

```
@article{unifiedreward-think,
  title={Unified multimodal chain-of-thought reward model through reinforcement fine-tuning},
  author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2505.03318},
  year={2025}
}
```