Text-to-Image · Diffusers · Safetensors

casiatao committed · Commit 8ef7258 · verified · 1 Parent(s): d2506a2

Update README.md

Files changed (1): README.md (+116, -120)
README.md CHANGED
@@ -4,132 +4,134 @@ pipeline_tag: text-to-image
  library_name: diffusers
  ---

- This repository contains public models of [Latent Preference Optimization (LPO)](https://github.com/Kwai-Kolors/LPO) based on SD1.5 and SDXL. The merged models are the original models with the LoRA weights merged in.
-
- <h1 align="center"> Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization </h1>
-
- <p align="center">
- <a href='https://arxiv.org/abs/2502.01051'>
- <img src='https://img.shields.io/badge/Arxiv-2502.01051-A42C25?style=flat&logo=arXiv&logoColor=A42C25'></a>
- <a href='https://huggingface.co/casiatao/LRM'>
- <img src='https://img.shields.io/badge/%F0%9F%A4%97%20Model-LRM-yellow'></a>
- <a href='https://huggingface.co/casiatao/LPO'>
- <img src='https://img.shields.io/badge/%F0%9F%A4%97%20Model-LPO-yellow'></a>
- <a href='https://visitor-badge.laobi.icu/badge?page_id=Kwai-Kolors.LPO'>
- <img src="https://visitor-badge.laobi.icu/badge?page_id=Kwai-Kolors.LPO&left_color=gray&right_color=%2342b983"></a>
- </p>
-
- <p align="center">
- <img src="imgs/vis.png" alt="vis" style="width:100%; height:auto;" />
- </p>
-
- ## 📝 News
- * [2025.03.20]: 🔥 The pre-trained models are released!
- * [2025.03.20]: 🔥 The source code is publicly available!
-
-
- ## 📖 Introduction
- This repository contains the official PyTorch implementation of the paper “[Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization](https://arxiv.org/abs/2502.01051)”.
-
- <p align="center">
- <img src="imgs/intro.png" alt="intro" style="width:60%; height:auto;" />
- </p>
-
- In this work, we analyze the challenges that arise when pixel-level reward models are used for step-level preference optimization of diffusion models. We then propose the Latent Reward Model (LRM), which utilizes diffusion models for step-level reward modeling, based on the insight that diffusion models possess text-image alignment abilities and can perceive noisy latent images across different timesteps. We further introduce Latent Preference Optimization (LPO), a method that employs LRM for step-level preference optimization, operating entirely within the latent space.
-
- Extensive experiments demonstrate that LPO significantly improves the image quality of various diffusion models and consistently outperforms existing DPO and SPO methods across general, aesthetic, and alignment preferences. Moreover, LPO exhibits remarkable training efficiency, achieving a speedup of 10-28&times; over Diffusion-DPO and 2.5-3.5&times; over SPO.
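For intuition, the step-level objective is DPO-style: at a sampled timestep, the denoising step toward the latent preferred by the LRM is rewarded relative to a frozen reference model. Below is a minimal PyTorch sketch of such a step-level preference loss; the tensor names are illustrative rather than the repository's API, and β = 500 is inferred from the `beta500` tag in the released config names.

```python
import torch.nn.functional as F

def step_preference_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=500.0):
    # logp_w / logp_l: policy log-probabilities of the denoising step that
    # produces the LRM-preferred / dispreferred next latent at this timestep.
    # ref_logp_*: the same quantities under the frozen reference model.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # Standard DPO objective applied per denoising step, in latent space.
    return -F.logsigmoid(beta * margin).mean()
```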

  ## 🛠️ Usage
- Clone this repository.
- ```bash
- git clone https://github.com/Kwai-Kolors/LPO
- cd LPO
  ```

- ### 1. LRM Training
-
- #### 1.1 Environmental Setup
-
- ```bash
- conda create -n lrm python=3.8
- conda activate lrm
- pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
- cd ./lrm
- pip install -r requirements.txt
- cd ./lrm_15
- pip install -e .
  ```

- #### 1.2 Pre-trained Weights Downloading
- - Download `pytorch_model.bin` from the `openai/clip-vit-large-patch14` [Hugging Face repository](https://huggingface.co/openai/clip-vit-large-patch14). Change `clip_ckpt_path` in `lrm_15/trainer/conf/step_sd15.yaml` to its actual storage path.
- - Download the pre-computed score file from [Google Drive](https://drive.google.com/file/d/1baFGMntt6QxVqy8hzxQHfU9sCC-Eagq_/view?usp=drive_link), which contains multiple preference scores for the images in Pick-a-Pic, and place it under the LRM folder.
-
- #### 1.3 Training
-
- - LRM-1.5
-
- ```bash
- cd lrm_15
- bash train_lrm_15.sh
  ```

- - LRM-XL
-
- ```bash
- cd lrm_xl
- bash train_lrm_xl.sh
  ```

-
- ### 2. LPO Training
-
- #### 2.1 Environmental Setup
-
- ```bash
- conda create -n lpo python=3.9
- conda activate lpo
- pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
- pip3 install -U xformers==0.0.24 --index-url https://download.pytorch.org/whl/cu118
- cd ./lpo
- pip install -r requirements.txt
- ```
-
- #### 2.2 Pre-trained Weights Downloading
- - Download `pytorch_model.bin` from the `openai/clip-vit-large-patch14` [Hugging Face repository](https://huggingface.co/openai/clip-vit-large-patch14). Change `clip_ckpt_path` in `lpo/lpo/preference_models/models/sd15_preference_model.py` to its actual storage path.
- - Change `ft_model_path` in the `lpo/configs` files to the actual path of the reward models. Our public reward models are available on [Hugging Face](https://huggingface.co/casiatao/LRM).
-
- #### 2.3 Training
-
- - Train SD1.5 using LRM-1.5
-
- ```bash
- cd lpo
- accelerate launch --config_file accelerate_cfg/1m4g_fp16.yaml train_scripts/train_lpo.py --config configs/lpo_sd-v1-5_5ep_cfg75_4k_beta500_multiscale_wocfg_thresh035-05-sigma.py
- ```
-
- - Train SD2.1 using LRM-2.1
-
- ```bash
- cd lpo
- accelerate launch --config_file accelerate_cfg/1m4g_fp16.yaml train_scripts/train_lpo.py --config configs/lpo_sd-v2-1_5ep_cfg75_4k_beta500_multiscale_wocfg_thresh035-05-sigma.py
- ```
-
- - Train SDXL using LRM-XL
-
- ```bash
- cd lpo
- accelerate launch --config_file accelerate_cfg/1m4g_fp16.yaml train_scripts/train_lpo_sdxl.py --config configs/lpo_sdxl_5ep_cfg75_8k_beta500_multiscale_wocfg_thresh45-6-sigma.py
- ```
-
- ### 3. Pre-trained Models
- - The pre-trained Latent Reward Models (LRM) are available on [Hugging Face](https://huggingface.co/casiatao/LRM).
- - The diffusion models optimized with the Latent Preference Optimization (LPO) method are available on [Hugging Face](https://huggingface.co/casiatao/LPO); a download sketch follows this list.
-
-
- ## ⭐ Citation
- If you find this repository helpful, please consider giving it a star ⭐ and citing:
  ```bibtex
  @article{zhang2025diffusion,
  title={Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization},
@@ -138,9 +140,3 @@ If you find this repository helpful, please consider giving it a star ⭐ and ci
  year={2025}
  }
  ```
-
- ## 🤗 Acknowledgments
-
- This codebase is built upon the [PickScore](https://github.com/yuvalkirstain/PickScore) repository and the [SPO](https://github.com/RockeyCoss/SPO) repository. Thanks for their great work!
 
  library_name: diffusers
  ---

+ This repository contains public models of [Latent Preference Optimization (LPO)](https://github.com/Kwai-Kolors/LPO) based on SD1.5 and SDXL. The merged models are the original models with the LoRA weights merged in.
+ The corresponding GitHub repository is [https://github.com/Kwai-Kolors/LPO](https://github.com/Kwai-Kolors/LPO).
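For reference, "merging" here folds each low-rank update into its base weight, so the merged checkpoints need no adapter logic at inference. A sketch of the standard LoRA merge, with generic tensor names rather than this repository's code:

```python
import torch

def merge_lora_weight(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
                      alpha: float, r: int) -> torch.Tensor:
    # W: (out, in) base weight; A: (r, in) and B: (out, r) are the LoRA factors.
    # The merged weight is W + (alpha / r) * B @ A, the usual LoRA convention.
    return W + (alpha / r) * (B @ A)
```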

  ## 🛠️ Usage
+ ### SDXL
+ ```python
+ from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, AutoencoderKL
+ import torch
+
+ unet = UNet2DConditionModel.from_pretrained(
+     'casiatao/LPO',
+     subfolder="lpo_sdxl_merge/unet",
+     torch_dtype=torch.float16
+ )
+ vae = AutoencoderKL.from_pretrained(
+     'madebyollin/sdxl-vae-fp16-fix',
+     torch_dtype=torch.float16,
+ )
+ pipe = StableDiffusionXLPipeline.from_pretrained(
+     'stabilityai/stable-diffusion-xl-base-1.0',
+     unet=unet,
+     vae=vae,
+     torch_dtype=torch.float16
+ )
+ pipe = pipe.to("cuda")
+
+ prompt = "A cat holding a sign that says hello world"
+
+ generator = torch.Generator(device="cuda").manual_seed(42)
+ image = pipe(
+     prompt=prompt,
+     guidance_scale=5.0,
+     num_inference_steps=20,
+     generator=generator,
+     output_type='pil',
+ ).images[0]
+ image.save("img_sdxl.png")
  ```
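If the fp16 SDXL pipeline does not fit on your GPU, diffusers can stream submodules to the GPU on demand; replace `pipe = pipe.to("cuda")` above with:

```python
# Keep submodules on the CPU and move each to the GPU only while it runs;
# slower per image, but with a much lower peak VRAM footprint.
pipe.enable_model_cpu_offload()
```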

+ ### SDXL (LoRA)
+ ```python
+ from diffusers import StableDiffusionXLPipeline, AutoencoderKL
+ import torch
+
+ vae = AutoencoderKL.from_pretrained(
+     'madebyollin/sdxl-vae-fp16-fix',
+     torch_dtype=torch.float16,
+ )
+ pipe = StableDiffusionXLPipeline.from_pretrained(
+     'stabilityai/stable-diffusion-xl-base-1.0',
+     vae=vae,
+     torch_dtype=torch.float16
+ )
+ pipe.load_lora_weights("casiatao/LPO", weight_name="lpo_sdxl_lora/pytorch_lora_weights.safetensors")
+ pipe = pipe.to("cuda")
+
+ prompt = "A cat holding a sign that says hello world"
+
+ generator = torch.Generator(device="cuda").manual_seed(42)
+ image = pipe(
+     prompt=prompt,
+     guidance_scale=5.0,
+     num_inference_steps=20,
+     generator=generator,
+     output_type='pil',
+ ).images[0]
+ image.save("img_sdxl_lora.png")
  ```
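When the LoRA strength is fixed, the adapter can optionally be fused into the base weights after `load_lora_weights`, removing the per-call adapter overhead:

```python
# Fold the LoRA deltas into the UNet (and text encoder) weights;
# pipe.unfuse_lora() restores the original weights if needed.
pipe.fuse_lora()
```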

+ ### SD1.5
+ ```python
+ from diffusers import StableDiffusionPipeline, UNet2DConditionModel
+ import torch
+
+ unet = UNet2DConditionModel.from_pretrained(
+     'casiatao/LPO',
+     subfolder="lpo_sd15_merge/unet",
+     torch_dtype=torch.float16
+ )
+ pipe = StableDiffusionPipeline.from_pretrained(
+     'sd-legacy/stable-diffusion-v1-5',
+     unet=unet,
+     torch_dtype=torch.float16
+ )
+ pipe = pipe.to("cuda")
+
+ prompt = "a photo of a cat"
+
+ generator = torch.Generator(device="cuda").manual_seed(42)
+ image = pipe(
+     prompt=prompt,
+     guidance_scale=5.0,
+     num_inference_steps=20,
+     generator=generator,
+     output_type='pil',
+ ).images[0]
+ image.save("img_sd15.png")
  ```

+ ### SD1.5 (LoRA)
+ ```python
+ from diffusers import StableDiffusionPipeline
+ import torch
+
+ pipe = StableDiffusionPipeline.from_pretrained(
+     'sd-legacy/stable-diffusion-v1-5',
+     torch_dtype=torch.float16
+ )
+ pipe.load_lora_weights("casiatao/LPO", weight_name="lpo_sd15_lora/pytorch_lora_weights.safetensors")
+ pipe = pipe.to("cuda")
+
+ prompt = "a photo of a cat"
+
+ generator = torch.Generator(device="cuda").manual_seed(42)
+ image = pipe(
+     prompt=prompt,
+     guidance_scale=5.0,
+     num_inference_steps=20,
+     generator=generator,
+     output_type='pil',
+ ).images[0]
+ image.save("img_sd15_lora.png")
  ```
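The LoRA influence can also be reduced at call time; in recent diffusers versions the scale is passed through `cross_attention_kwargs` (a sketch; `0.8` is an arbitrary example value):

```python
# Apply the LPO LoRA at 80% strength, blending toward base SD1.5 behavior.
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    num_inference_steps=20,
    cross_attention_kwargs={"scale": 0.8},
).images[0]
```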


+ ## ❤️ Citation
+ If you find this repository helpful, please consider giving it a like ❤️ and citing:
  ```bibtex
  @article{zhang2025diffusion,
  title={Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization},
  year={2025}
  }
  ```