Model Summary

UnifiedReward-Flex-qwen3vl-4b is a unified personalized reward model for vision generation that couples reward modeling with flexible, context-adaptive reasoning.

🚀 The inference code is available on GitHub.

[2026/05/13] 🔥🔥 We updated the model weights and enhanced the training data to mitigate position bias. Weights for the other model sizes will be updated soon.
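Position bias in a pairwise reward model means the preferred candidate can change when the two candidates are swapped. The following is a minimal caller-side sketch of an order-consistency check, not the method used to train or update this model. The `Judge` signature, the function names, and the toy judges are all hypothetical; candidates are plain strings here (for example, image paths), and a real judge would wrap the inference code.

```python
from typing import Callable, Optional

# Hypothetical judge signature (an assumption, not this model's API):
# takes (prompt, first, second) and returns 1 if it prefers the first
# candidate or 2 if it prefers the second.
Judge = Callable[[str, str, str], int]


def order_consistent_preference(
    judge: Judge, prompt: str, a: str, b: str
) -> Optional[str]:
    """Return "a" or "b" if the judge picks the same candidate in both
    presentation orders, otherwise None."""
    forward = judge(prompt, a, b)    # a is shown in slot 1
    backward = judge(prompt, b, a)   # b is shown in slot 1
    forward_winner = "a" if forward == 1 else "b"
    backward_winner = "b" if backward == 1 else "a"
    if forward_winner == backward_winner:
        return forward_winner
    return None  # verdict flipped with the order: a sign of position bias


def always_first(prompt: str, first: str, second: str) -> int:
    """Toy judge with maximal position bias: always prefers slot 1."""
    return 1


def prefers_longer(prompt: str, first: str, second: str) -> int:
    """Toy judge whose verdict depends on content only (longer string wins)."""
    return 1 if len(first) >= len(second) else 2
```

With these toy judges, `always_first` never yields a consistent verdict, while `prefers_longer` returns the same winner regardless of order whenever the two candidates differ in length. Discarding or re-querying inconsistent pairs is one possible design choice; averaging scores over both orders is another.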

For further details, please refer to the following resources:

Citation

@article{unifiedreward-flex,
  title={Unified Personalized Reward Model for Vision Generation},
  author={Wang, Yibin and Zang, Yuhang and Han, Feng and Bu, Jiazi and Zhou, Yujie and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2602.02380},
  year={2026}
}

Model size: 5B params (Safetensors). Tensor type: BF16.