CodeGoat24
/

UnifiedReward-Think-qwen35-9b

Model card Files Files and versions

UnifiedReward-Think-qwen35-9b / README.md

CodeGoat24's picture

Create README.md

a5af81e verified 29 days ago

|

history blame contribute delete

1.76 kB

	---
	license: mit
	base_model:
	- CodeGoat24/UnifiedReward-2.0-qwen35-9b
	---

	## Model Summary

	`UnifiedReward-Think-qwen35-9b` is the first unified multimodal CoT reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks.

	For further details, please refer to the following resources:
	- 📰 Paper: https://arxiv.org/pdf/2505.03318
	- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/think
	- 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
	- 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
	- 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)

	## vLLM Server Deployment

	```
	export VLLM_DISABLE_FLASHINFER_GDN_PREFILL=1
	export TOKENIZERS_PARALLELISM=false
	vllm serve CodeGoat24/UnifiedReward-Think-qwen35-9b \
	--host localhost \
	--port 8080 \
	--trust-remote-code \
	--served-model-name UnifiedReward \
	--gpu-memory-utilization 0.95 \
	--mm-encoder-tp-mode data \
	--mm-processor-cache-type shm \
	--enable-prefix-caching \
	--tensor-parallel-size 8 \
	--default-chat-template-kwargs '{"enable_thinking": false}'
	```

	The inference code is provided [here](https://github.com/CodeGoat24/UnifiedReward/tree/main/UnifiedReward-Think/inference_qwen/UnifiedReward-Think-qwen3-inference).

	## Citation

	```
	@article{unifiedreward-think,
	title={Unified multimodal chain-of-thought reward model through reinforcement fine-tuning},
	author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and Wang, Jiaqi},
	journal={arXiv preprint arXiv:2505.03318},
	year={2025}
	}
	```