This repository hosts the public Latent Preference Optimization (LPO) models based on SD1.5 and SDXL. Each merged model consists of the original model's weights with the LPO LoRA weights merged in. The corresponding GitHub repository is https://github.com/Kwai-Kolors/LPO.
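To illustrate what "merged" means here, the following is a minimal sketch of the generic LoRA merge for a single linear layer. It is not necessarily the exact procedure used to produce these checkpoints. The names `merge_lora`, `lora_down`, `lora_up`, and `scale` are hypothetical, and the shapes are toy values.

```python
import numpy as np

def merge_lora(w_base, lora_down, lora_up, scale=1.0):
    """Fold a low-rank update into a base weight: W' = W + scale * (up @ down)."""
    return w_base + scale * (lora_up @ lora_down)

rng = np.random.default_rng(0)
w_base = rng.normal(size=(8, 8))      # original layer weight (out x in)
lora_down = rng.normal(size=(2, 8))   # LoRA "A" matrix (rank x in)
lora_up = rng.normal(size=(8, 2))     # LoRA "B" matrix (out x rank)

w_merged = merge_lora(w_base, lora_down, lora_up)

# Running the base layer plus the adapter path gives the same output
# as running the single merged layer.
x = rng.normal(size=(8,))
y_adapter = w_base @ x + lora_up @ (lora_down @ x)
y_merged = w_merged @ x
print(np.allclose(y_adapter, y_merged))
```

Because the update is folded into the weights, a merged checkpoint loads like an ordinary UNet and needs no LoRA-specific loading step at inference time.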

🛠️ Usage

SDXL

from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, AutoencoderKL
import torch

# Load the LPO-merged SDXL UNet from this repository.
unet = UNet2DConditionModel.from_pretrained(
    "casiatao/LPO",
    subfolder="lpo_sdxl_merge/unet",
    torch_dtype=torch.float16,
)
# Load the fp16-fix SDXL VAE.
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
)
# Build the base SDXL pipeline, swapping in the LPO UNet and the VAE above.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    unet=unet,
    vae=vae,
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "A cat holding a sign that says hello world"

# Fix the seed so the result is reproducible.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=generator,
    output_type="pil",
).images[0]
image.save("img_sdxl.png")

SD1.5

from diffusers import StableDiffusionPipeline, UNet2DConditionModel
import torch

# Load the LPO-merged SD1.5 UNet from this repository.
unet = UNet2DConditionModel.from_pretrained(
    "casiatao/LPO",
    subfolder="lpo_sd15_merge/unet",
    torch_dtype=torch.float16,
)
# Build the base SD1.5 pipeline, swapping in the LPO UNet.
pipe = StableDiffusionPipeline.from_pretrained(
    "sd-legacy/stable-diffusion-v1-5",
    unet=unet,
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a photo of a cat"

# Fix the seed so the result is reproducible.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=generator,
    output_type="pil",
).images[0]
image.save("img_sd15.png")
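The two snippets above differ only in the pipeline class, the base model, the UNet subfolder, and the extra VAE for SDXL. As a convenience, they could be folded into one helper along the following lines. This is a sketch, not part of the released code: `LPO_VARIANTS` and `load_lpo_pipeline` are hypothetical names, while the repository IDs and subfolders are the ones used above.

```python
# Per-variant settings taken from the SDXL and SD1.5 snippets above.
LPO_VARIANTS = {
    "sdxl": {
        "pipeline": "StableDiffusionXLPipeline",
        "base": "stabilityai/stable-diffusion-xl-base-1.0",
        "unet_subfolder": "lpo_sdxl_merge/unet",
        "vae": "madebyollin/sdxl-vae-fp16-fix",
    },
    "sd15": {
        "pipeline": "StableDiffusionPipeline",
        "base": "sd-legacy/stable-diffusion-v1-5",
        "unet_subfolder": "lpo_sd15_merge/unet",
        "vae": None,  # the SD1.5 snippet keeps the base model's VAE
    },
}

def load_lpo_pipeline(variant, device="cuda"):
    """Hypothetical helper: build a pipeline with the LPO-merged UNet."""
    if variant not in LPO_VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(LPO_VARIANTS)}")

    # Imported lazily so the settings table can be inspected without a GPU stack.
    import torch
    import diffusers
    from diffusers import AutoencoderKL, UNet2DConditionModel

    cfg = LPO_VARIANTS[variant]
    unet = UNet2DConditionModel.from_pretrained(
        "casiatao/LPO",
        subfolder=cfg["unet_subfolder"],
        torch_dtype=torch.float16,
    )
    kwargs = {"unet": unet, "torch_dtype": torch.float16}
    if cfg["vae"] is not None:
        kwargs["vae"] = AutoencoderKL.from_pretrained(cfg["vae"], torch_dtype=torch.float16)

    pipeline_cls = getattr(diffusers, cfg["pipeline"])
    return pipeline_cls.from_pretrained(cfg["base"], **kwargs).to(device)
```

With this helper, `pipe = load_lpo_pipeline("sdxl")` would stand in for the loading steps of the SDXL example, and the generation call stays the same.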

❤️ Citation

If you find this repository helpful, please consider giving it a like ❤️ and citing:

@article{zhang2025diffusion,
  title={Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization},
  author={Zhang, Tao and Da, Cheng and Ding, Kun and Jin, Kun and Li, Yan and Gao, Tingting and Zhang, Di and Xiang, Shiming and Pan, Chunhong},
  journal={arXiv preprint arXiv:2502.01051},
  year={2025}
}

Paper: Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization (arXiv:2502.01051, published Feb 3, 2025)