---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
---

This repository contains the Latent Reward Models (LRM) based on SD1.5 and SDXL.

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization


πŸ“ News

  • [2025.03.20]: πŸ”₯ The pre-trained models are released!
  • [2025.03.20]: πŸ”₯ The source code is publicly available!

📖 Introduction

This repository contains the official PyTorch implementation of the paper "Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization".


In this work, we analyze the challenges that arise when pixel-level reward models are used for step-level preference optimization of diffusion models. We then propose the Latent Reward Model (LRM), which repurposes a diffusion model for step-level reward modeling, building on the insight that diffusion models possess inherent text-image alignment abilities and can perceive noisy latent images across different timesteps. We further introduce Latent Preference Optimization (LPO), a method that employs the LRM for step-level preference optimization and operates entirely within the latent space.
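
To make the idea concrete, below is a minimal, illustrative sketch of one step-level preference update in the spirit of LPO. It is not the authors' implementation: the stub networks, the candidate-sampling scheme, and the log-likelihood surrogate are simplifying assumptions for exposition only.

# Illustrative sketch only -- not the official LPO code. The stub networks,
# the candidate sampling, and the log-likelihood surrogate are assumptions.
import torch
import torch.nn.functional as F

def lpo_step(policy, ref, lrm, z_t, t, prompt_emb, beta=500.0):
    # Draw two candidate next latents by perturbing the policy's denoising step.
    with torch.no_grad():
        eps = policy(z_t, t, prompt_emb)
        z_a = z_t - (eps + 0.05 * torch.randn_like(z_t))  # schematic x_{t-1}
        z_b = z_t - (eps + 0.05 * torch.randn_like(z_t))
        # The LRM scores noisy latents directly at timestep t-1:
        # no VAE decoding to pixel space is needed.
        r_a = lrm(z_a, t - 1, prompt_emb)
        r_b = lrm(z_b, t - 1, prompt_emb)
    # Batch-level win/lose pick, for brevity (per-sample in practice).
    win, lose = (z_a, z_b) if (r_a >= r_b).float().mean() >= 0.5 else (z_b, z_a)

    def logp(net, z):
        # Surrogate log-likelihood: negative error of reproducing the step.
        pred = net(z_t, t, prompt_emb)
        return -((pred - (z_t - z)) ** 2).mean(dim=(1, 2, 3))

    # DPO-style objective relative to the frozen reference model.
    with torch.no_grad():
        ref_margin = logp(ref, win) - logp(ref, lose)
    margin = (logp(policy, win) - logp(policy, lose)) - ref_margin
    return -F.logsigmoid(beta * margin).mean()

# Tiny stand-ins so the sketch runs end to end.
class StubUNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Conv2d(4, 4, 1)
    def forward(self, z, t, emb):
        return self.proj(z)

class StubLRM(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.head = torch.nn.Linear(4, 1)
    def forward(self, z, t, emb):
        return self.head(z.mean(dim=(2, 3))).squeeze(-1)

policy, ref, lrm = StubUNet(), StubUNet(), StubLRM()
z_t = torch.randn(2, 4, 64, 64)   # SD1.5-style latents
emb = torch.randn(2, 77, 768)     # CLIP text embeddings
loss = lpo_step(policy, ref, lrm, z_t, torch.tensor(500), emb)
loss.backward()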

Extensive experiments demonstrate that LPO significantly improves the image quality of various diffusion models and consistently outperforms existing DPO and SPO methods on general, aesthetic, and alignment preferences. Moreover, LPO exhibits remarkable training efficiency, achieving a speedup of 10-28× over Diffusion-DPO and 2.5-3.5× over SPO.

🛠️ Usage

Clone this repository.

git clone https://github.com/Kwai-Kolors/LPO
cd LPO

1. LRM Training

1.1 Environment Setup

conda create -n lrm python=3.8
conda activate lrm
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
cd ./lrm
pip install -r requirements.txt
cd ./lrm_15
pip install -e .

1.2 Downloading Pre-trained Weights

  • Download pytorch_model.bin from the openai/clip-vit-large-patch14 Hugging Face repository, and set clip_ckpt_path in lrm_15/trainer/conf/step_sd15.yaml to its actual storage path (see the sketch after this list).
  • Download the pre-computed score file from Google Drive, which contains multiple preference scores for images in Pick-a-Pic, and place it under the LRM folder.
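
If helpful, the snippet below shows one way to fetch the CLIP weights with huggingface_hub and print the local path to paste into the config; the exact path depends on your local Hugging Face cache.

# Fetch the CLIP weights and resolve a local path for the config.
from huggingface_hub import hf_hub_download

clip_ckpt_path = hf_hub_download(
    repo_id="openai/clip-vit-large-patch14",
    filename="pytorch_model.bin",
)
# Paste this path into clip_ckpt_path in lrm_15/trainer/conf/step_sd15.yaml.
print(clip_ckpt_path)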

1.3 Training

  • LRM-1.5
cd lrm_15
bash train_lrm_15.sh
  • LRM-XL
cd lrm_xl
bash train_lrm_xl.sh

2. LPO Training

2.1 Environment Setup

conda create -n lpo python=3.9
conda activate lpo
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
pip3 install -U xformers==0.0.24 --index-url https://download.pytorch.org/whl/cu118
cd ./lpo
pip install -r requirements.txt

2.2 Downloading Pre-trained Weights

  • Download pytorch_model.bin from the openai/clip-vit-large-patch14 Hugging Face repository, and set clip_ckpt_path in lpo/lpo/preference_models/models/sd15_preference_model.py to its actual storage path.
  • Set ft_model_path in the lpo/configs files to the actual path of the reward models. Our released reward models are available on Hugging Face (see the sketch after this list).
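
As a sketch, the released reward-model weights can be pulled with huggingface_hub; the repository id below is a placeholder, so substitute the actual Hub repository that hosts the LRM checkpoints.

# Sketch: fetch the reward-model weights and resolve ft_model_path.
from huggingface_hub import snapshot_download

# Placeholder repo id -- replace with the actual LRM repository on the Hub.
lrm_dir = snapshot_download(repo_id="<org>/<lrm-repo>")
# Use this directory as ft_model_path in the corresponding lpo/configs file.
print(lrm_dir)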

2.3 Training

  • Train SD1.5 using LRM-1.5
cd lpo
accelerate launch --config_file accelerate_cfg/1m4g_fp16.yaml train_scripts/train_lpo.py --config configs/lpo_sd-v1-5_5ep_cfg75_4k_beta500_multiscale_wocfg_thresh035-05-sigma.py
  • Train SD2.1 using LRM-2.1
cd lpo
accelerate launch --config_file accelerate_cfg/1m4g_fp16.yaml train_scripts/train_lpo.py --config configs/lpo_sd-v2-1_5ep_cfg75_4k_beta500_multiscale_wocfg_thresh035-05-sigma.py
  • Train SDXL using LRM-XL
cd lpo
accelerate launch --config_file accelerate_cfg/1m4g_fp16.yaml train_scripts/train_lpo_sdxl.py --config configs/lpo_sdxl_5ep_cfg75_8k_beta500_multiscale_wocfg_thresh45-6-sigma.py

3. Pre-trained Models

  • The pre-trained Latent Reward Models (LRM) are available on Hugging Face.
  • The diffusion models optimized with the Latent Preference Optimization (LPO) method are also available on Hugging Face; a loading sketch follows.
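
Below is a minimal text-to-image sketch with diffusers, assuming the LPO-optimized SD1.5 weights are published in diffusers format; the repo id is a placeholder for the actual repository.

# Sketch: generate an image with an LPO-optimized SD1.5 checkpoint.
# The repo id is a placeholder -- substitute the actual Hub repository.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "<org>/<lpo-sd15>",   # placeholder: LPO-tuned SD1.5 weights
    torch_dtype=torch.float16,
).to("cuda")
image = pipe("a cat wearing a tiny wizard hat, studio lighting").images[0]
image.save("lpo_sample.png")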

⭐ Citation

If you find this repository helpful, please consider giving it a star ⭐ and citing:

@article{zhang2025diffusion,
  title={Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization},
  author={Zhang, Tao and Da, Cheng and Ding, Kun and Jin, Kun and Li, Yan and Gao, Tingting and Zhang, Di and Xiang, Shiming and Pan, Chunhong},
  journal={arXiv preprint arXiv:2502.01051},
  year={2025}
}

🤗 Acknowledgments

This codebase is built upon the PickScore repository and the SPO repository. Thanks for their great work!