<div align="center">
# Improve Diffusion Image Generation Quality using Levenberg-Marquardt-Langevin
We introduce **LML**, an accelerated sampler for diffusion models that leverages second-order Hessian geometry. Our LML implementation is fully compatible with **[diffusers](https://github.com/huggingface/diffusers)**.
This repository is the official implementation of the **ICCV 2025** paper:
_"Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin"_
> **Fangyikang Wang<sup>1,2</sup>, Hubery Yin<sup>2</sup>, Lei Qian<sup>1</sup>, Yinan Li<sup>1</sup>, Shaobin Zhuang<sup>3,2</sup>, Huminhao Zhu<sup>1</sup>, Yilin Zhang<sup>1</sup>, Yanlong Tang<sup>4</sup>, Chao Zhang<sup>1</sup>, Hanbin Zhao<sup>1</sup>, Hui Qian<sup>1</sup>, Chen Li<sup>2</sup>**
>
> <sup>1</sup>Zhejiang University <sup>2</sup>WeChat Vision, Tencent Inc <sup>3</sup>Shanghai Jiao Tong University <sup>4</sup>Tencent Lightspeed Studio
[arXiv](https://www.arxiv.org/abs/2505.24222)
[Hugging Face](https://huggingface.co/zituitui/LML-diffusion-sampler)
[GitHub](https://github.com/zituitui/LML-diffusion-sampler)
[MIT License](https://opensource.org/licenses/MIT)
<img src="assets/lml-sd-visual_2_new-1.png" alt="SD Results" style="width: 100%;">
<img src="assets/lml-celeb-visual-1.png" alt="celeb Results" style="width: 70%;">
</div>
## The intuition of our LML diffusion sampler

> **Schematic comparison** between our LML method and baselines. While previous works mainly focus on designs along the annealing path to improve diffusion sampling, the updates at each specific noise level are still performed with first-order Langevin. Our approach leverages the Levenberg-Marquardt-approximated Hessian geometry to guide the Langevin updates toward greater accuracy.

> **The relation between optimization algorithms and MCMC sampling algorithms.** We initially sought to develop a diffusion sampler utilizing Hessian geometry, following the path of Newton-Langevin dynamics. However, this approach proved highly computationally expensive in the DM context. Drawing inspiration from the Levenberg-Marquardt method used in optimization, our method incorporates low-rank approximation and damping techniques, which makes the Hessian geometry affordable to obtain. We then use this approximated Hessian geometry to guide the Langevin updates.
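To make the idea concrete, here is a stdlib-only toy sketch (not the paper's implementation) contrasting a plain first-order Langevin step with an LM-style preconditioned step in the scalar case: the squared score plus a damping term `lamb` stands in for the low-rank-plus-damping Hessian approximation, and its inverse preconditions both the drift and the noise. The damping value here is chosen for this toy problem, not one of the paper's tuned values.

```python
import math
import random

def langevin_step(x, score, step, rng):
    """Plain first-order Langevin update."""
    return x + step * score + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)

def lm_langevin_step(x, score, step, lamb, rng):
    """Langevin update preconditioned by a Levenberg-Marquardt-style
    damped curvature estimate (scalar toy case): h = score^2 + lamb."""
    h = score * score + lamb           # rank-one term plus damping
    precond = 1.0 / h                  # inverse of the approximated Hessian
    drift = step * precond * score
    noise = math.sqrt(2.0 * step * precond) * rng.gauss(0.0, 1.0)
    return x + drift + noise

# Toy target: a standard Gaussian, whose score is -x.
rng = random.Random(0)
x = 2.0
for _ in range(100):
    x = lm_langevin_step(x, score=-x, step=0.1, lamb=1.0, rng=rng)
```

In the real sampler the curvature estimate is a matrix built from the score network's output; the scalar version above only illustrates how the damped inverse curvature rescales the Langevin update.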
## Run the code
### 1) Getting started
* Python 3.8.12
* CUDA 11.7
* NVIDIA A100 40GB PCIe
* Torch 2.0.0
* Torchvision 0.14.0
Please follow the **[diffusers](https://github.com/huggingface/diffusers)** documentation to install diffusers.
### 2) Sampling
First, switch to the repository root directory.
- #### CIFAR-10 sampling
For the baselines, you can run CIFAR-10 sampling as follows, choosing `--sampler_type` from {ddim, pndm, dpm, dpm++, unipc}:
```bash
python3 ./scripts/cifar10.py --test_num 1 --batch_size 1 --num_inference_steps 10 --save_dir YOUR/SAVE/DIR --model_id xx/xx/ddpm_ema_cifar10 --sampler_type ddim
```
For our LML sampler, there is an additional $\lambda$ hyperparameter:
```bash
python3 ./scripts/cifar10.py --test_num 1 --batch_size 1 --num_inference_steps 10 --save_dir YOUR/SAVE/DIR --model_id xx/xx/ddpm_ema_cifar10 --sampler_type dpm_lm --lamb 0.0008
```
The optimal choices of $\lambda$ (`--lamb`) for LML on CIFAR-10 are:
| | 5 NFEs | 6 NFEs | 7 NFEs | 8 NFEs | 9 NFEs | 10 NFEs | 12 NFEs | 15 NFEs | 20 NFEs | 30 NFEs | 50 NFEs | 100 NFEs |
|---------|---------|---------|---------|---------|---------|----------|----------|----------|----------|----------|----------|-----------|
| Optimal `--lamb` | 0.0008 | 0.0008 | 0.001 | 0.001 | 0.001 | 0.0008 | 0.001 | 0.001 | 0.0005 | 0.0003 | 0.0001 | 0.00005 |
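For scripting sweeps over NFE budgets, the table can be transcribed into a small lookup. This helper is illustrative and not part of the released scripts; the values are copied from the table above.

```python
# Optimal --lamb for the LML sampler on CIFAR-10, transcribed from the table above.
OPTIMAL_LAMB_CIFAR10 = {
    5: 0.0008, 6: 0.0008, 7: 0.001, 8: 0.001, 9: 0.001, 10: 0.0008,
    12: 0.001, 15: 0.001, 20: 0.0005, 30: 0.0003, 50: 0.0001, 100: 0.00005,
}

def optimal_lamb(nfe: int) -> float:
    """Return the tuned lambda for an NFE budget listed in the table."""
    return OPTIMAL_LAMB_CIFAR10[nfe]
```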
- #### CelebA-HQ sampling
For the baseline:
```bash
python3 ./scripts/celeba.py --test_num 1 --batch_size 1 --num_inference_steps 10 --save_dir YOUR/SAVE/DIR --model_id xx/xx/ldm-celebahq-256 --sampler_type ddim
```
For our LML:
```bash
python3 ./scripts/celeba.py --test_num 1 --batch_size 1 --num_inference_steps 10 --save_dir YOUR/SAVE/DIR --model_id xx/xx/ldm-celebahq-256 --sampler_type ddim_lm --lamb 0.005
```
- #### SD-15 and SD-2b on MS-COCO sampling
```bash
python3 ./scripts/StableDiffusion_COCO.py --test_num 30002 --num_inference_steps 10 --save_dir YOUR/SAVE/DIR --model_id xx/xx/stable-diffusion-v1-5 --sampler_type dpm_lm --lamb 0.001
```
For the optimal choice of LML on MS-COCO, we always choose $\lambda = 0.001$ for NFEs in {5, 6, 7, 8, 9, 10, 12, 15}.
- #### SD-15, SD-2b, SD-XL, and PixArt-$\alpha$ on T2I-CompBench sampling
Before running the scripts, make sure to clone the [T2I-CompBench](https://github.com/Karine-Huang/T2I-CompBench) repository. Generated images are stored in the directory `save_dir/model/dataset_category/sampler_type/samples`.
For the baselines, you can run T2I-CompBench sampling as follows, choosing `--sampler_type` from {ddim, pndm, dpm, dpm++, unipc} and `--model` from {sd15, sd2_base, sdxl, pixart}:
```bash
python3 ./scripts/StableDiffusion_PixArt_T2i_Sampling.py --dataset_category color --dataset_path PATH/TO/T2I-COMPBENCH --test_num 10 --num_inference_steps 10 --model_dir YOUR/MODEL/DIR --save_dir YOUR/SAVE/DIR --model sd15 --sampler_type ddim
```
For our LML sampler, there is an additional $\lambda$ hyperparameter:
```bash
python3 ./scripts/StableDiffusion_PixArt_T2i_Sampling.py --dataset_category color --dataset_path PATH/TO/T2I-COMPBENCH --test_num 10 --num_inference_steps 10 --model_dir YOUR/MODEL/DIR --save_dir YOUR/SAVE/DIR --model sd15 --sampler_type dpm_lm --lamb 0.006
```
- #### Use our LML diffusion sampler with ControlNet
**canny**
```bash
python3 ./scripts/control_net_canny.py --num_inference_steps 10 --original_image_path /xxx/xxx/data/input_image_vermeer.png --controlnet_dir /xxx/xxx/sd-controlnet-canny --sd_dir /xxx/xxx/stable-diffusion-v1-5 --save_dir YOUR/SAVE/DIR --sampler_type dpm_lm --lamb 0.001
```
**depth**
```bash
python3 ./scripts/control_net_depth.py --num_inference_steps 10 --controlnet_dir /xxx/xxx/control_v11f1p_sd15_depth --sd_dir /xxx/xxx/stable-diffusion-v1-5 --save_dir YOUR/SAVE/DIR --sampler_type dpm_lm --lamb 0.001
```
**pose**
```bash
python3 ./scripts/control_net_canny.py --num_inference_steps 10 --controlnet_dir /xxx/xxx/sd-controlnet-openpose --sd_dir /xxx/xxx/stable-diffusion-v1-5 --save_dir YOUR/SAVE/DIR --sampler_type dpm_lm --lamb 0.001
```
- #### LML sampling on FLUX
For the baseline:
```bash
python3 ./scripts/FLUX_T2i_Sampling.py --dataset_category color --dataset_path PATH/TO/T2I-COMPBENCH --test_num 10 --num_inference_steps 10 --model_id YOUR/MODEL/DIR --save_dir YOUR/SAVE/DIR --sampler_type fm_euler
```
For our LML:
```bash
python3 ./scripts/FLUX_T2i_Sampling.py --dataset_category color --dataset_path PATH/TO/T2I-COMPBENCH --test_num 10 --num_inference_steps 10 --model_id YOUR/MODEL/DIR --save_dir YOUR/SAVE/DIR --sampler_type lml_euler --lamb 0.01
```
### 3) Evaluation
- #### FID evaluation on CIFAR-10
[Coming Soon]
- #### FID evaluation on MS-COCO
[Coming Soon]
- #### T2I-compbench evaluation
Please refer to the [T2I-CompBench](https://github.com/Karine-Huang/T2I-CompBench) guide. Create a new environment and install the dependencies for T2I-CompBench evaluation.
To test combinations of multiple models and samplers, we also provide a convenient one-click script. Place the script file in the corresponding directory of **T2I-CompBench**, replacing the original script. For example:
```sh
# BLIP-VQA for Attribute Binding
# Instead of the original:
#   cd T2I-CompBench
#   bash BLIPvqa_eval/test.sh
# use the provided replacement script:
cp evaluations/T2I-CompBench/BLIPvqa_test.sh T2I-CompBench/BLIPvqa_eval
cd T2I-CompBench
bash BLIPvqa_eval/BLIPvqa_test.sh 'save_dir'
```
The directory structure of `save_dir` should satisfy the following format:
```
{save_dir}/model/dataset_category/sampler_type/samples/
├── a green bench and a blue bowl_000000.png
├── a green bench and a blue bowl_000001.png
└── ...
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE.txt) file for details.
## Citation
If our work assists your research, feel free to give us a star or cite us using:
```bibtex
@article{wang2025unleashing,
  title={Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin},
  author={Wang, Fangyikang and Yin, Hubery and Qian, Lei and Li, Yinan and Zhuang, Shaobin and Zhu, Huminhao and Zhang, Yilin and Tang, Yanlong and Zhang, Chao and Zhao, Hanbin and others},
  journal={arXiv preprint arXiv:2505.24222},
  year={2025}
}
```
## Contact us
Our e-mail addresses:
```
wangfangyikang@zju.edu.cn, qianlei33@zju.edu.cn, liyinan@zju.edu.cn
```