Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
Paper • arXiv:2502.20172
import torch
from diffusers import DiffusionPipeline
# Use device_map="mps" instead of "cuda" on Apple silicon devices
pipe = DiffusionPipeline.from_pretrained("leonardPKU/DreamEngine-ObjectFusion", dtype=torch.bfloat16, device_map="cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]