CodeGoat24/UnifiedReward-Think-qwen3vl-32b


CodeGoat24 committed on Nov 25, 2025

Commit bd9ff42 (verified) · Parent: 36d498e

Update README.md

Files changed (1)

README.md +29 -3

README.md CHANGED

@@ -1,3 +1,29 @@
----
-license: mit
----

+---
+license: mit
+base_model:
+- CodeGoat24/UnifiedReward-2.0-qwen3vl-32b
+---
+## Model Summary
+`UnifiedReward-Think-qwen3vl-32b` is the first unified multimodal CoT reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks.
+For further details, please refer to the following resources:
+- 📰 Paper: https://arxiv.org/pdf/2505.03318
+- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/think
+- 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
+- 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
+- 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)
+🚀 All inference code is provided at [Github](https://github.com/CodeGoat24/UnifiedReward/tree/main/UnifiedReward-Think/inference_qwen/UnifiedReward-Think-qwen3vl-inference).
+## Citation
+```
+@article{unifiedreward-think,
+  title={Unified multimodal chain-of-thought reward model through reinforcement fine-tuning},
+  author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and Wang, Jiaqi},
+  journal={arXiv preprint arXiv:2505.03318},
+  year={2025}
+}
+```
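This commit replaces the bare `license: mit` front matter with a block that also declares `base_model`. The Hugging Face Hub reads the YAML block at the top of README.md as model card metadata. As a hedged illustration only, the stdlib-only sketch below extracts the two fields from the new header. `parse_front_matter` and `README_HEAD` are hypothetical names, and the parser handles only the flat `key: value` and `- item` forms used in this file, so a full YAML parser (or the `huggingface_hub` model card utilities) is the safer choice in practice.

```python
# Hypothetical sketch: parse the simple YAML front matter added in this commit.
# Handles only flat "key: value" scalars and "- item" list entries; it is not
# a general YAML parser.

README_HEAD = """---
license: mit
base_model:
- CodeGoat24/UnifiedReward-2.0-qwen3vl-32b
---
## Model Summary
"""


def parse_front_matter(text: str) -> dict:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter ends the metadata block
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            # List item: attach it to the most recent key, turning that
            # key's empty scalar value into a list on first use.
            if not isinstance(meta[current_key], list):
                meta[current_key] = []
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            meta[current_key] = value.strip()
    return meta


meta = parse_front_matter(README_HEAD)
print(meta["license"])     # mit
print(meta["base_model"])  # ['CodeGoat24/UnifiedReward-2.0-qwen3vl-32b']
```

The `base_model` entry is a list even though it holds a single repository ID, which matches how the committed README writes it (a `- ` item under the key).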