---
language:
- en
license: other
license_name: cogvlm2
license_link: https://huggingface.co/THUDM/cogvlm2-video-llama3-chat/blob/main/LICENSE
pipeline_tag: feature-extraction
library_name: transformers
tags:
- chat
- cogvlm2
- cogvlm-video
inference: false
---
# VisionReward-Video
This repository contains the model described in the paper [VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation](https://huggingface.co/papers/2412.21059).
## Introduction
We present VisionReward, a general strategy for aligning visual generation models, covering both image and video generation, with human preferences through a fine-grained, multi-dimensional framework. We decompose human preferences for images and videos into multiple dimensions, each represented by a series of judgment questions that are linearly weighted and summed into an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance in video preference prediction.
Here, we present the VisionReward-Video model.
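
The preference score itself is a linear model over the answers to the judgment questions. Below is a minimal sketch of that idea in Python; the questions, answers, and weights are hypothetical placeholders, not the ones learned by VisionReward.

```python
# Illustrative only: linearly weight per-question judgments into one score.
# The actual questions and learned weights come from the VisionReward paper/repo.
answers = [1, 0, 1, 1]          # hypothetical yes/no answers to judgment questions
weights = [0.4, 0.1, 0.3, 0.2]  # hypothetical learned linear weights

score = sum(w * a for w, a in zip(weights, answers))
print(f"preference score: {score:.2f}")
```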
## Using this model
You can quickly install the Python package dependencies and run model inference by following our [GitHub repository](https://github.com/THUDM/VisionReward).
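
For a quick local test, the checkpoint can also be loaded with `transformers`. The snippet below is only a minimal loading sketch: it assumes the repository id `THUDM/VisionReward-Video`, a CUDA device, and bfloat16 weights, and it omits the video preprocessing and judgment-question prompting, which are covered by the inference scripts in the GitHub repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; check the GitHub README for the exact checkpoint name.
MODEL_PATH = "THUDM/VisionReward-Video"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,   # assumed dtype to fit on a single modern GPU
    trust_remote_code=True,       # the model ships custom modeling code
).eval().to("cuda")

# Video frame preprocessing and the per-question prompts are model-specific;
# follow the inference scripts in the GitHub repository for end-to-end scoring.
```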