import torch
from diffusers import DiffusionPipeline
# Switch "cuda" to "mps" below on Apple devices.
pipe = DiffusionPipeline.from_pretrained("CodeGoat24/FLUX.1-dev-UnifiedReward-Flex", dtype=torch.bfloat16, device_map="cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]

This model was trained with GRPO, using UnifiedReward-Flex as the reward model, on the UniGenBench training dataset.
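For readers unfamiliar with GRPO, its core idea is to sample a group of outputs for the same prompt, score each one with a reward model, and weight each sample by how its reward compares with the rest of the group. The snippet below is a minimal sketch of that group-relative normalization only. It is not the training code for this model: the function name, the `eps` value, the use of the population standard deviation, and the example scores are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one prompt's group of rewards to zero mean and roughly unit scale.

    `rewards` holds one scalar score per sampled image (hypothetical values here;
    in practice they would come from a reward model such as UnifiedReward-Flex).
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps keeps the division finite when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for four images sampled from the same prompt.
scores = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(scores)
```

Samples scoring above the group mean receive a positive advantage and are reinforced, while those below the mean receive a negative one. A group in which every sample gets the same reward yields all-zero advantages and so contributes no learning signal.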
The inference code is available on GitHub.
For further details, please refer to the following resources:
@article{unifiedreward-flex,
title={Unified Personalized Reward Model for Vision Generation},
author={Wang, Yibin and Zang, Yuhang and Han, Feng and Bu, Jiazi and Zhou, Yujie and Jin, Cheng and Wang, Jiaqi},
journal={arXiv preprint arXiv:2602.02380},
year={2026}
}