metadata
license: mit
base_model:
- CodeGoat24/UnifiedReward-Think-qwen35-27b
datasets:
- CodeGoat24/UnifiedReward-Flex-SFT-90K
Model Summary
UnifiedReward-Flex-qwen35-27b is a unified personalized reward model for vision generation that couples reward modeling with flexible and context-adaptive reasoning!!
For further details, please refer to the following resources:
- 📰 Paper: https://arxiv.org/abs/2602.02380
- 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/flex
- 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-flex
- 🤗 Dataset: https://huggingface.co/datasets/CodeGoat24/UnifiedReward-Flex-SFT-90K
- 👋 Point of Contact: Yibin Wang
vLLM Server Deployment
export VLLM_DISABLE_FLASHINFER_GDN_PREFILL=1
export TOKENIZERS_PARALLELISM=false
vllm serve CodeGoat24/UnifiedReward-Flex-qwen35-27b \
--host localhost \
--port 8080 \
--trust-remote-code \
--served-model-name UnifiedReward \
--gpu-memory-utilization 0.95 \
--mm-encoder-tp-mode data \
--mm-processor-cache-type shm \
--enable-prefix-caching \
--tensor-parallel-size 8 \
--default-chat-template-kwargs '{"enable_thinking": false}'
The inference code is provided here.
Citation
@article{unifiedreward-flex,
title={Unified Personalized Reward Model for Vision Generation},
author={Wang, Yibin and Zang, Yuhang and Han, Feng and Bu, Jiazi and Zhou, Yujie and Jin, Cheng and Wang, Jiaqi},
journal={arXiv preprint arXiv:2602.02380},
year={2026}
}