---
datasets:
- yuvalkirstain/pickapic_v2
library_name: diffusers
---

# Diffusion-KTO: Aligning Diffusion Models by Optimizing Human Utility

<p align="center">
<img src="https://github.com/jacklishufan/diffusion-kto/blob/main/assets/teaser.png?raw=true" width="60%"> <br>
</p>

This model is fine-tuned from Stable Diffusion v1-5 on the Pick-a-Pic v2 dataset (`yuvalkirstain/pickapic_v2`) using Diffusion-KTO.

### Usage

```python
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, DiffusionPipeline

vae_path = model_name = "runwayml/stable-diffusion-v1-5"
device = "cuda"
weight_dtype = torch.float16

# Load the VAE from the base Stable Diffusion v1-5 checkpoint.
vae = AutoencoderKL.from_pretrained(
    vae_path,
    subfolder="vae",
)

# Load the Diffusion-KTO fine-tuned UNet.
unet = UNet2DConditionModel.from_pretrained(
    "jacklishufan/diffusion-kto",
    subfolder="unet",
)

# Build the base pipeline with the VAE and fine-tuned UNet swapped in,
# then move it to the GPU and cast it to half precision.
pipeline = DiffusionPipeline.from_pretrained(
    model_name,
    vae=vae,
    unet=unet,
).to(device).to(weight_dtype)

result = pipeline(
    prompt="Self-portrait oil painting, a beautiful cyborg with golden hair, 8k",
    num_inference_steps=50,
    guidance_scale=7.0,
)

# The pipeline output holds a list of PIL images; take the first one.
img = result.images[0]
```

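The usage snippet hard-codes a CUDA device and half precision. On a machine without a GPU, a fallback along the following lines may help. This is a minimal sketch: `choose_device_and_dtype` is a hypothetical helper, not part of diffusers or of this repository, and falling back to float32 on CPU is an assumption (half precision tends to be slow or unsupported for some CPU operations).

```python
from typing import Tuple


def choose_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Hypothetical helper: pick a device string and a torch dtype name.

    Returns ("cuda", "float16") when a GPU is available,
    otherwise ("cpu", "float32").
    """
    if cuda_available:
        return "cuda", "float16"
    # Assumption: full precision is the safer default on CPU.
    return "cpu", "float32"
```

With PyTorch installed, `device, dtype_name = choose_device_and_dtype(torch.cuda.is_available())` followed by `weight_dtype = getattr(torch, dtype_name)` gives values that can replace the hard-coded `device` and `weight_dtype` in the usage snippet.
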
### Code

The code is available [here](https://github.com/jacklishufan/diffusion-kto).

### Citation

```
@misc{li2024aligning,
      title={Aligning Diffusion Models by Optimizing Human Utility},
      author={Shufan Li and Konstantinos Kallidromitis and Akash Gokul and Yusuke Kato and Kazuki Kozuka},
      year={2024},
      eprint={2404.04465},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```