--- license: mit base_model: - CodeGoat24/UnifiedReward-2.0-qwen35-9b --- ## Model Summary `UnifiedReward-Think-qwen35-9b` is the first unified multimodal CoT reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks. For further details, please refer to the following resources: - 📰 Paper: https://arxiv.org/pdf/2505.03318 - 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/think - 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a - 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede - 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io) ## vLLM Server Deployment ``` export VLLM_DISABLE_FLASHINFER_GDN_PREFILL=1 export TOKENIZERS_PARALLELISM=false vllm serve CodeGoat24/UnifiedReward-Think-qwen35-9b \ --host localhost \ --port 8080 \ --trust-remote-code \ --served-model-name UnifiedReward \ --gpu-memory-utilization 0.95 \ --mm-encoder-tp-mode data \ --mm-processor-cache-type shm \ --enable-prefix-caching \ --tensor-parallel-size 8 \ --default-chat-template-kwargs '{"enable_thinking": false}' ``` The inference code is provided [here](https://github.com/CodeGoat24/UnifiedReward/tree/main/UnifiedReward-Think/inference_qwen/UnifiedReward-Think-qwen3-inference). ## Citation ``` @article{unifiedreward-think, title={Unified multimodal chain-of-thought reward model through reinforcement fine-tuning}, author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and Wang, Jiaqi}, journal={arXiv preprint arXiv:2505.03318}, year={2025} } ```