	---
	license: apache-2.0
	base_model:
	- Qwen/Qwen3-VL-8B-Instruct
	pipeline_tag: image-text-to-text
	tags:
	- reward-model
	- text-to-image
	- human-preference
	- rlhf
	---

	# HPSv3++: Scaling Reward Models Across the Full Spectrum of Diffusion Model Capabilities

	HPSv3++ is a capability-aware and RL-iteration-aware text-to-image (T2I) reward model, built on the `Qwen/Qwen3-VL-8B-Instruct` backbone with a Capability Encoder, a FiLM conditioning head, and a three-layer RankNet reward head.
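
For readers unfamiliar with RankNet, the pairwise objective it refers to can be sketched as below. This is the textbook RankNet formulation and not necessarily the exact loss used to train HPSv3++; the function names are hypothetical and the scores stand in for whatever scalar the reward head emits.

```python
import math


def ranknet_probability(score_a: float, score_b: float) -> float:
    """Probability that item A is preferred over item B: sigmoid(s_a - s_b)."""
    diff = score_a - score_b
    # Numerically stable logistic function for large positive or negative gaps.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def ranknet_loss(score_a: float, score_b: float, target: float = 1.0) -> float:
    """Binary cross-entropy on the pairwise probability.

    target = 1.0 means annotators preferred A, 0.0 means they preferred B.
    """
    p = ranknet_probability(score_a, score_b)
    eps = 1e-12  # guards against log(0)
    return -(target * math.log(p + eps) + (1.0 - target) * math.log(1.0 - p + eps))
```

Only the score difference matters in this formulation, which is why a reward model trained this way is typically used for ranking and relative comparison.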

	The Capability Encoder implicitly infers, from the image itself, the generative ability of the model that produced it, while the RL iteration step is supplied as an explicit condition. Both signals modulate the reward features through FiLM, so a single reward model produces calibrated preference scores across generators of differing capability and across different stages of RL optimization.
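
As an illustration of FiLM (feature-wise linear modulation), the toy sketch below scales and shifts a feature vector using parameters predicted from a conditioning vector. It assumes the capability embedding and the iteration scalar are simply concatenated into one conditioning vector; that choice, all names, and all dimensions and weights are hypothetical and do not describe the released architecture.

```python
def linear(x, weight, bias):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def film(features, condition, gamma_w, gamma_b, beta_w, beta_b):
    """FiLM: out_i = gamma_i(condition) * features_i + beta_i(condition)."""
    gamma = linear(condition, gamma_w, gamma_b)  # per-feature scale
    beta = linear(condition, beta_w, beta_b)     # per-feature shift
    return [g * h + b for g, h, b in zip(gamma, features, beta)]


# Toy inputs (all values are made up for illustration).
capability_embedding = [0.5, -1.0]  # stand-in for the Capability Encoder output
rl_iteration = 0.3                  # normalized iteration condition in [0, 1]
condition = capability_embedding + [rl_iteration]  # assumed: concatenation

features = [1.0, 2.0]               # stand-in for backbone features
gamma_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
gamma_b = [1.0, 1.0]
beta_w = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
beta_b = [0.0, 0.0]

modulated = film(features, condition, gamma_w, gamma_b, beta_w, beta_b)
```

Because the scale and shift depend on the condition, the same backbone features can map to different rewards for different generator capabilities or RL stages.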

	The training/evaluation dataset, HPDv3++, is released separately: [Junjun2333/HPDv3-PlusPlus](https://huggingface.co/datasets/Junjun2333/HPDv3-PlusPlus).

	## Files

	| File | Description |
	|---|---|
	| `hpsv3++.pth` | Final HPSv3++ reward-model weights (17.6 GB) |
	| `config.json` | Model configuration |
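
One way to fetch these files is with the `huggingface_hub` client, sketched below. The repository id `Junjun2333/HPSv3-PlusPlus` is taken from this model's Hub page; the helper name `download_model_files` is hypothetical, and how the checkpoint is then loaded into the model is not covered by this card.

```python
REPO_ID = "Junjun2333/HPSv3-PlusPlus"
FILENAMES = ("hpsv3++.pth", "config.json")


def download_model_files(local_dir=None):
    """Download each listed file and return a {filename: local_path} mapping.

    The weights file is about 17.6 GB, so make sure enough disk space is available.
    """
    # Imported lazily so the constants above can be used without the package installed.
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
        for name in FILENAMES
    }
```

Calling `download_model_files()` with no argument stores the files in the default Hub cache; passing `local_dir` places them in a directory of your choice.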

	## Conditioning at inference

	- Model capability is inferred implicitly from the image; you do not pass it in.
	- The RL iteration is passed explicitly as a normalized scalar in `[0, 1]`.
	- For general preference scoring or ranking, set the iteration condition to `0.0` (the pre-RL setting).
	- When the model serves as the reward inside T2I RL fine-tuning, ramp the iteration condition linearly from `0.3` to `1.0` over training (the setting used in the paper).
	- Use the mean (`mu`) output as the scalar reward.
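
The iteration condition described above can be computed with a small helper like the one below. This is a minimal sketch: the function name, the step-based notion of training progress, and the clamping behaviour are assumptions, while the `0.0`, `0.3`, and `1.0` values come from the list above.

```python
PRE_RL_CONDITION = 0.0  # use this for general preference scoring / ranking


def iteration_condition(step: int, total_steps: int,
                        start: float = 0.3, end: float = 1.0) -> float:
    """Linearly ramp the iteration condition from `start` to `end`.

    `step` counts RL updates from 0; the final update is `total_steps - 1`.
    Out-of-range steps are clamped to the ends of the ramp.
    """
    if total_steps <= 1:
        return start
    progress = step / (total_steps - 1)
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress
```

In an RL loop, the value returned for the current step would be passed to the reward model alongside the image and prompt, and the `mu` output would be used as the reward.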

	## Citation

	```bibtex
	@misc{hpsv3pp,
	title = {HPSv3++: Scaling Reward Models Across the Full Spectrum of Diffusion Model Capabilities},
	author = {HPSv3++ Team},
	year = {2026}
	}
	```