Paper: Training Diffusion Models with Reinforcement Learning (arXiv:2305.13301)
```python
import torch
from diffusers import DiffusionPipeline

# Switch to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained("kvablack/ddpo-compressibility", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```

This model was finetuned from Stable Diffusion v1-4 using DDPO and a reward function encouraging images that are JPEG-compressible. See the project website for more details.
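To make the idea of a JPEG-compressibility reward concrete, here is a minimal sketch, not the training code itself. It assumes the reward is the negative size of the JPEG-encoded image in kilobytes; the function name `jpeg_compressibility_reward`, the `quality=95` setting, and the kilobyte scaling are illustrative assumptions rather than details stated in this card.

```python
import io
import random

from PIL import Image


def jpeg_compressibility_reward(image: Image.Image, quality: int = 95) -> float:
    """Return the negative JPEG file size in kB (assumed reward definition).

    Smaller encoded files give a higher (less negative) reward.
    """
    buffer = io.BytesIO()
    # Encode in memory; quality=95 is an assumed setting, not taken from the card.
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    size_kb = len(buffer.getvalue()) / 1000.0
    return -size_kb


# A flat gray image compresses far better than random noise.
width, height = 64, 64
flat_image = Image.new("RGB", (width, height), (128, 128, 128))

rng = random.Random(0)
noise_bytes = bytes(rng.getrandbits(8) for _ in range(width * height * 3))
noisy_image = Image.frombytes("RGB", (width, height), noise_bytes)

flat_reward = jpeg_compressibility_reward(flat_image)
noisy_reward = jpeg_compressibility_reward(noisy_image)
```

Under this definition the flat image receives the higher reward, so finetuning against it would push the model toward smooth, low-detail outputs.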
The model was finetuned for 60 iterations with a batch size of 256 samples per iteration. During finetuning, the prompts were the animal classes of ImageNet-1000 (the first 398 categories), but the model exhibits some generalization to other prompts.
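The numbers above imply a fairly small sample budget. The sketch below works through the arithmetic and shows one way prompts could be drawn per iteration. Uniform sampling is an assumption, `sample_prompts` is a hypothetical helper, and `animal_names` is a placeholder list standing in for the 398 ImageNet animal class names.

```python
import random

NUM_ITERATIONS = 60          # from the card
SAMPLES_PER_ITERATION = 256  # from the card
NUM_ANIMAL_CLASSES = 398     # from the card

# Total images generated and scored over the whole finetuning run.
total_samples = NUM_ITERATIONS * SAMPLES_PER_ITERATION

# Average number of samples per animal class, assuming uniform prompt sampling.
samples_per_class = total_samples / NUM_ANIMAL_CLASSES


def sample_prompts(animal_names, num_prompts, rng):
    """Draw prompts uniformly with replacement (assumed sampling scheme)."""
    return [rng.choice(animal_names) for _ in range(num_prompts)]


# Placeholder names; the real list would hold the 398 ImageNet animal classes.
animal_names = [f"animal_{i}" for i in range(NUM_ANIMAL_CLASSES)]
batch_prompts = sample_prompts(animal_names, SAMPLES_PER_ITERATION, random.Random(0))
```

With these figures the run covers 15,360 samples in total, or roughly 39 per animal class on average under the uniform-sampling assumption.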