Text-to-Image
Diffusers
Safetensors

Add pipeline tag, library name and GitHub README content

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +140 -0
README.md CHANGED
@@ -1,6 +1,146 @@
  ---
  license: mit
+ pipeline_tag: text-to-image
+ library_name: diffusers
  ---
 
  This repository contains public models of [Latent Preference Optimization (LPO)](https://github.com/Kwai-Kolors/LPO) based on SD1.5 and SDXL. The merged models contain the LoRA weights already merged into the original model weights.
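The merge mentioned above can be sketched numerically. This is a minimal illustration of the standard LoRA merge formula (W' = W + scale · B · A), not the repository's actual merge script; all names and shapes here are hypothetical.

```python
import numpy as np

def merge_lora(base_weight, lora_down, lora_up, scale=1.0):
    """Fold a low-rank LoRA update into a base weight matrix.

    base_weight: (out, in) original layer weight
    lora_down:   (rank, in) LoRA "A" matrix
    lora_up:     (out, rank) LoRA "B" matrix
    scale:       merge strength (e.g. alpha / rank)
    """
    return base_weight + scale * (lora_up @ lora_down)

# Toy example: a 4x4 layer with a rank-2 LoRA update.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
A = rng.normal(size=(2, 4))   # lora_down
B = rng.normal(size=(4, 2))   # lora_up
W_merged = merge_lora(W, A, B, scale=0.5)
```

Once merged this way, inference needs no LoRA-aware code path, which is why the merged checkpoints load like ordinary SD1.5/SDXL weights.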
 
+ <h1 align="center"> Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization </h1>
+
+ <p align="center">
+ <a href='https://arxiv.org/abs/2502.01051'>
+ <img src='https://img.shields.io/badge/Arxiv-2502.01051-A42C25?style=flat&logo=arXiv&logoColor=A42C25'></a>
+ <a href='https://huggingface.co/casiatao/LRM'>
+ <img src='https://img.shields.io/badge/%F0%9F%A4%97%20Model-LRM-yellow'></a>
+ <a href='https://huggingface.co/casiatao/LPO'>
+ <img src='https://img.shields.io/badge/%F0%9F%A4%97%20Model-LPO-yellow'></a>
+ <a href='https://visitor-badge.laobi.icu/badge?page_id=Kwai-Kolors.LPO'>
+ <img src="https://visitor-badge.laobi.icu/badge?page_id=Kwai-Kolors.LPO&left_color=gray&right_color=%2342b983"></a>
+ </p>
+
+ <p align="center">
+ <img src="imgs/vis.png" alt="vis" style="width:100%; height:auto;" />
+ </p>
+
+ ## 📝 News
+ * [2025.03.20]: 🔥 The pre-trained models are released!
+ * [2025.03.20]: 🔥 The source code is publicly available!
+
+
+ ## 📖 Introduction
+ This repository contains the official PyTorch implementation of the paper "[Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization](https://arxiv.org/abs/2502.01051)".
+
+ <p align="center">
+ <img src="imgs/intro.png" alt="intro" style="width:60%; height:auto;" />
+ </p>
+
+ In this work, we analyze the challenges that arise when pixel-level reward models are used for step-level preference optimization of diffusion models. We then propose the Latent Reward Model (LRM), which repurposes a diffusion model for step-level reward modeling, based on the insight that diffusion models possess text-image alignment abilities and can perceive noisy latent images across different timesteps. Building on LRM, we introduce Latent Preference Optimization (LPO), a method for step-level preference optimization that operates entirely in the latent space.
+
+ Extensive experiments demonstrate that LPO significantly improves the image quality of various diffusion models and consistently outperforms existing DPO and SPO methods on general, aesthetic, and alignment preferences. Moreover, LPO is remarkably training-efficient, achieving a speedup of 10-28&times; over Diffusion-DPO and 2.5-3.5&times; over SPO.
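As a rough illustration of the idea, a DPO-style preference loss applied at a single denoising step might look like the following. This is a hand-written sketch under simplifying assumptions (scalar per-step log-probabilities, one preference pair), not the loss implemented in this repository; the `beta` value is only suggested by the `beta500` tag in the config filenames.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def step_preference_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=500.0):
    """DPO-style loss for one denoising step.

    logp_*     : log-prob of the preferred / dispreferred latent under the policy
    ref_logp_* : same quantities under the frozen reference model
    beta       : regularization strength (hypothetical default)
    """
    # Reward margin: how much more the policy prefers the winner
    # than the reference model does, scaled by beta.
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    return -np.log(sigmoid(margin))
```

If the policy already favors the preferred latent more strongly than the reference does, the margin is positive and the loss is small; otherwise the loss grows, pushing the policy toward the step-level preference.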
+
+
+ ## 🛠️ Usage
+ Clone this repository.
+ ```bash
+ git clone https://github.com/Kwai-Kolors/LPO
+ cd LPO
+ ```
+
+ ### 1. LRM Training
+
+ #### 1.1 Environment Setup
+
+ ```bash
+ conda create -n lrm python=3.8
+ conda activate lrm
+ pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
+ cd ./lrm
+ pip install -r requirements.txt
+ cd ./lrm_15
+ pip install -e .
+ ```
+
+ #### 1.2 Downloading Pre-trained Weights
+ - Download `pytorch_model.bin` from the `openai/clip-vit-large-patch14` [Hugging Face repository](https://huggingface.co/openai/clip-vit-large-patch14), then set `clip_ckpt_path` in `lrm_15/trainer/conf/step_sd15.yaml` to its local path.
+ - Download the pre-computed score file from [Google Drive](https://drive.google.com/file/d/1baFGMntt6QxVqy8hzxQHfU9sCC-Eagq_/view?usp=drive_link), which contains multiple preference scores for the images in Pick-a-Pic, and place it under the LRM folder.
+
+ #### 1.3 Training
+
+ - LRM-1.5
+
+ ```bash
+ cd lrm_15
+ bash train_lrm_15.sh
+ ```
+
+ - LRM-XL
+
+ ```bash
+ cd lrm_xl
+ bash train_lrm_xl.sh
+ ```
+
+
+ ### 2. LPO Training
+
+ #### 2.1 Environment Setup
+
+ ```bash
+ conda create -n lpo python=3.9
+ conda activate lpo
+ pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
+ pip install -U xformers==0.0.24 --index-url https://download.pytorch.org/whl/cu118
+ cd ./lpo
+ pip install -r requirements.txt
+ ```
+
+ #### 2.2 Downloading Pre-trained Weights
+ - Download `pytorch_model.bin` from the `openai/clip-vit-large-patch14` [Hugging Face repository](https://huggingface.co/openai/clip-vit-large-patch14), then set `clip_ckpt_path` in `lpo/lpo/preference_models/models/sd15_preference_model.py` to its local path.
+ - Set `ft_model_path` in the config files under `lpo/configs` to the actual paths of the reward models. Our public reward models are available on [Hugging Face](https://huggingface.co/casiatao/LRM).
+
+ #### 2.3 Training
+
+ - Train SD1.5 using LRM-1.5
+
+ ```bash
+ cd lpo
+ accelerate launch --config_file accelerate_cfg/1m4g_fp16.yaml train_scripts/train_lpo.py --config configs/lpo_sd-v1-5_5ep_cfg75_4k_beta500_multiscale_wocfg_thresh035-05-sigma.py
+ ```
+
+ - Train SD2.1 using LRM-2.1
+
+ ```bash
+ cd lpo
+ accelerate launch --config_file accelerate_cfg/1m4g_fp16.yaml train_scripts/train_lpo.py --config configs/lpo_sd-v2-1_5ep_cfg75_4k_beta500_multiscale_wocfg_thresh035-05-sigma.py
+ ```
+
+ - Train SDXL using LRM-XL
+
+ ```bash
+ cd lpo
+ accelerate launch --config_file accelerate_cfg/1m4g_fp16.yaml train_scripts/train_lpo_sdxl.py --config configs/lpo_sdxl_5ep_cfg75_8k_beta500_multiscale_wocfg_thresh45-6-sigma.py
+ ```
+
+ ### 3. Pre-trained Models
+ - The pre-trained Latent Reward Models (LRM) are available on [Hugging Face](https://huggingface.co/casiatao/LRM).
+ - The diffusion models optimized with Latent Preference Optimization (LPO) are available on [Hugging Face](https://huggingface.co/casiatao/LPO).
+
+
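Since the card declares `library_name: diffusers`, the merged checkpoints should load like ordinary Stable Diffusion weights. The snippet below is an unverified sketch: the repo id and file layout of `casiatao/LPO` are assumptions, so check the repository's file listing (the merged models may live in subfolders) before using it.

```python
def load_lpo_pipeline(repo_or_path="casiatao/LPO", device="cuda"):
    """Load a merged LPO checkpoint as a regular text-to-image pipeline.

    repo_or_path is hypothetical: the merged weights may sit in a
    subfolder of the Hugging Face repo rather than at its root, so
    point this at the actual model directory. Calling this function
    downloads the weights, which is why the import is kept local.
    """
    from diffusers import StableDiffusionPipeline  # pip install diffusers
    pipe = StableDiffusionPipeline.from_pretrained(repo_or_path)
    return pipe.to(device)

# Example (not run here, since it downloads several GB of weights):
# pipe = load_lpo_pipeline()
# image = pipe("a photo of an astronaut riding a horse").images[0]
```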
+ ## ⭐ Citation
+ If you find this repository helpful, please consider giving it a star ⭐ and citing:
+ ```bibtex
+ @article{zhang2025diffusion,
+   title={Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization},
+   author={Zhang, Tao and Da, Cheng and Ding, Kun and Jin, Kun and Li, Yan and Gao, Tingting and Zhang, Di and Xiang, Shiming and Pan, Chunhong},
+   journal={arXiv preprint arXiv:2502.01051},
+   year={2025}
+ }
+ ```
+
+
+
+ ## 🤗 Acknowledgments
+
+ This codebase is built upon the [PickScore](https://github.com/yuvalkirstain/PickScore) and [SPO](https://github.com/RockeyCoss/SPO) repositories. Thanks for their great work!