Model Summary

This model is trained with GRPO (Group Relative Policy Optimization) on the UniGenBench training set, using UnifiedReward-Flex as the reward model.
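GRPO scores a group of samples generated for the same prompt and rewards each sample relative to its group, rather than using a learned value baseline. The sketch below shows only that group-relative advantage step in its common textbook form. It is not taken from the authors' training code. The function name `group_relative_advantages` is hypothetical, and the reward values in the usage line are made-up placeholders standing in for UnifiedReward-Flex scores.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical helper: GRPO-style advantages for one prompt's sample group.

    Each sample's advantage is its reward minus the group mean, divided by the
    group standard deviation (population std here; some implementations use the
    unbiased estimate). `eps` avoids division by zero when all rewards are equal.
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Placeholder rewards for four videos generated from the same prompt.
advantages = group_relative_advantages([0.2, 0.5, 0.8, 0.5])
print([round(a, 3) for a in advantages])
```

Samples scored above the group mean get positive advantages and are reinforced; those below it get negative ones. This is why the reward model only needs to rank samples consistently within a group.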

🚀 The inference code is available on GitHub.

For further details, please refer to the following resources:

📰 Paper: https://arxiv.org/abs/2602.02380
🪐 Project Page: https://codegoat24.github.io/UnifiedReward/flex
🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-flex
🤗 Dataset: https://huggingface.co/datasets/CodeGoat24/UnifiedReward-Flex-SFT-90K
👋 Point of Contact: Yibin Wang

Citation

@article{unifiedreward-flex,
  title={Unified Personalized Reward Model for Vision Generation},
  author={Wang, Yibin and Zang, Yuhang and Han, Feng and Bu, Jiazi and Zhou, Yujie and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2602.02380},
  year={2026}
}

Model Tree

Base model: Wan-AI/Wan2.1-T2V-14B
This model (CodeGoat24/Wan2.1-T2V-14B-UnifiedReward-Flex-lora) is fine-tuned from the base model.