UnifiedReward-Flex-qwen3vl-4b is a unified personalized reward model for vision generation that couples reward modeling with flexible, context-adaptive reasoning.
The inference code is available on GitHub.
[2026/05/13] We updated the model weights and enhanced the training data to mitigate position bias. Updated weights for the other model sizes will follow soon.
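Position bias means a judge's preference depends on the order in which candidates are presented rather than on their content. A minimal sketch of a common order-swap check follows. It assumes the reward model is wrapped as a pairwise judge that returns "first" or "second"; the `Judge` interface and the toy judges are hypothetical and are not part of the released inference code.

```python
from typing import Callable, Tuple

# Hypothetical interface (an assumption, not the released API): a pairwise
# judge receives (first, second) and returns "first" or "second".
Judge = Callable[[str, str], str]


def debiased_preference(judge: Judge, a: str, b: str) -> Tuple[str, bool]:
    """Query the judge in both presentation orders.

    Returns (winner, consistent). If the two orders disagree, the verdict
    depends on position, so the pair is reported as ("tie", False).
    """
    forward = judge(a, b)   # a is shown first
    backward = judge(b, a)  # b is shown first
    forward_winner = a if forward == "first" else b
    backward_winner = b if backward == "first" else a
    if forward_winner == backward_winner:
        return forward_winner, True
    return "tie", False


# Toy judges for illustration only.
def always_first(x: str, y: str) -> str:
    # Fully position-biased: always prefers whatever is shown first.
    return "first"


def prefers_longer(x: str, y: str) -> str:
    # Content-based: prefers the longer string regardless of position.
    return "first" if len(x) >= len(y) else "second"


if __name__ == "__main__":
    print(debiased_preference(always_first, "img_a", "img_b"))
    print(debiased_preference(prefers_longer, "long_caption", "short"))
```

Running a real judge through such a check over an evaluation set gives a simple consistency rate, which is one way to observe whether an update reduced order sensitivity.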
For further details, please refer to the following resources:
@article{unifiedreward-flex,
title={Unified Personalized Reward Model for Vision Generation},
author={Wang, Yibin and Zang, Yuhang and Han, Feng and Bu, Jiazi and Zhou, Yujie and Jin, Cheng and Wang, Jiaqi},
journal={arXiv preprint arXiv:2602.02380},
year={2026}
}
Base model: Qwen/Qwen3-VL-4B-Instruct