| --- |
| license: apache-2.0 |
| base_model: |
| - Qwen/Qwen3-VL-8B-Instruct |
| pipeline_tag: image-text-to-text |
| tags: |
| - reward-model |
| - text-to-image |
| - human-preference |
| - rlhf |
| --- |
| |
| # HPSv3++: Scaling Reward Models Across the Full Spectrum of Diffusion Model Capabilities |
|
|
| HPSv3++ is a **capability-aware and RL-iteration-aware** text-to-image (T2I) reward model, built on the `Qwen/Qwen3-VL-8B-Instruct` backbone with a Capability Encoder, a FiLM conditioning head, and a three-layer RankNet reward head. |
|
|
| A Capability Encoder implicitly infers the generative ability of the model that produced an image, while the RL iteration step is supplied as an explicit condition. The two conditions are combined through FiLM modulation, so that a single reward model produces calibrated preference scores across generators of differing capability and at different stages of RL optimization. |
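
To make the FiLM step concrete, here is a minimal pure-Python sketch of feature-wise linear modulation, where a condition vector produces a per-feature scale (`gamma`) and shift (`beta`). It is illustrative only and not the HPSv3++ implementation: the vector sizes, the concatenation of a capability embedding with the RL-iteration scalar, and every name (`linear`, `film`, `capability_embedding`) are assumptions made for this example.

```python
def linear(x, weight, bias):
    """Dense layer on plain lists: returns weight @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def film(features, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: gamma(condition) * features + beta(condition)."""
    gamma = linear(condition, w_gamma, b_gamma)  # per-feature scale
    beta = linear(condition, w_beta, b_beta)     # per-feature shift
    return [g * f + b for g, f, b in zip(gamma, features, beta)]


# Hypothetical inputs: a stand-in for the Capability Encoder output
# and the explicit RL-iteration scalar (0.0 = pre-RL setting).
capability_embedding = [0.2, -0.1, 0.5]
rl_iteration = 0.0
condition = capability_embedding + [rl_iteration]

features = [1.0, 2.0]  # stand-in for backbone features

# Identity initialisation (assumed): zero weights, gamma bias 1, beta bias 0,
# so the modulated features equal the input features.
zero_weights = [[0.0] * len(condition) for _ in features]
modulated = film(features, condition,
                 zero_weights, [1.0, 1.0],
                 zero_weights, [0.0, 0.0])
```

With non-zero weights, the same backbone features are rescaled and shifted differently depending on the inferred capability and the supplied iteration value, which is the mechanism that lets one head serve several conditions.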
|
|
| The training/evaluation dataset, HPDv3++, is released separately: [Junjun2333/HPDv3-PlusPlus](https://huggingface.co/datasets/Junjun2333/HPDv3-PlusPlus). |
|
|
| ## Files |
|
|
| | File | Description | |
| |---|---| |
| | `hpsv3++.pth` | Final HPSv3++ reward-model weights (17.6 GB) | |
| | `config.json` | Model configuration | |
|
|
| ## Conditioning at inference |
|
|
| - **Model capability** is inferred implicitly from the image; you do not pass it in. |
| - **RL iteration** is passed explicitly as a normalized scalar in `[0, 1]`. |
|   - For general preference scoring or ranking, use `0.0` (pre-RL setting). |
|   - When used as the reward during T2I RL fine-tuning, ramp the iteration condition linearly from `0.3` to `1.0` over training (the setting used in the paper). |
| - Use the mean (`mu`) output as the scalar reward. |
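
The linear ramp described above can be sketched as a small helper. This is an assumption-laden illustration, not code from the HPSv3++ release: the function name `rl_iteration_condition`, its parameters, and the choice to normalize by `total_steps - 1` (so the final step lands exactly on `1.0`) are all hypothetical.

```python
def rl_iteration_condition(step, total_steps, start=0.3, end=1.0):
    """Linearly interpolate the iteration condition from `start` to `end`.

    `step` is the zero-based RL training step; values outside the
    training range are clamped so the result stays in [start, end].
    """
    # Assumed normalisation: the last step (total_steps - 1) maps to 1.0.
    progress = step / max(total_steps - 1, 1)
    progress = min(max(progress, 0.0), 1.0)
    # Interpolation form keeps both endpoints exact in floating point.
    return (1.0 - progress) * start + progress * end


# Example: conditions at the start, middle, and end of a 101-step run.
schedule = [rl_iteration_condition(s, 101) for s in (0, 50, 100)]

# For plain preference scoring outside RL, the card specifies a fixed 0.0.
SCORING_CONDITION = 0.0
```

The value returned for each step would be passed as the explicit iteration condition while the reward model scores that step's samples.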
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{hpsv3pp, |
| title = {HPSv3++: Scaling Reward Models Across the Full Spectrum of Diffusion Model Capabilities}, |
| author = {HPSv3++ Team}, |
| year = {2026} |
| } |
| ``` |
|
|