How to use from the Diffusers library
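pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline

# Load the weights through the LLMGroundedDiffusionPipeline community pipeline
pipe = DiffusionPipeline.from_pretrained(
    "longlian/lmd_plus", custom_pipeline="llm_grounded_diffusion", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # switch to "mps" for Apple devices

prompt = "In an indoor scene, a blue cube directly above a red cube with a vase on the left of them"
# Layout normally produced by the LLM: one phrase per box, boxes as normalized
# [x0, y0, x1, y1]; the values below are illustrative, not real LLM output
phrases = ["a blue cube", "a red cube", "a vase"]
boxes = [[0.40, 0.10, 0.65, 0.40], [0.40, 0.45, 0.65, 0.75], [0.08, 0.35, 0.30, 0.75]]
image = pipe(prompt=prompt, phrases=phrases, boxes=boxes, gligen_scheduled_sampling_beta=0.4).images[0]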

LMD+ Model Card

Paper | Project Page | 5-minute Blog Post | Demo | Code | Citation | Related work: LLM-grounded Video Diffusion Models

LMD and LMD+ greatly improve the prompt-following ability of text-to-image generation models by introducing an LLM as a front-end prompt parser and layout planner. They improve spatial reasoning, understanding of negation, attribute binding, generative numeracy, and more in a unified manner, without explicitly targeting each of these abilities. LMD is completely training-free (i.e., it uses an off-the-shelf Stable Diffusion model). LMD+ adds adapters for better control. This is a reproduction of the LMD+ model used in our work. Our full codebase is available here.
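In the first stage, the LLM turns the prompt into a layout: object phrases with bounding boxes, plus a background prompt. The sketch below shows one way to parse such a response with only the standard library. The response format (a Python-style list of `(phrase, [x, y, w, h])` tuples followed by `Background prompt:` and `Negative prompt:` lines) and the function name `parse_layout` are assumptions made for illustration, not a specification of the project's format.

```python
import ast
from typing import List, Tuple

# Assumed (hypothetical) response format, one object list then two prompt lines:
#   [('a blue cube', [x, y, w, h]), ('a red cube', [x, y, w, h])]
#   Background prompt: An indoor scene
#   Negative prompt:
def parse_layout(response: str) -> Tuple[List[str], List[List[int]], str, str]:
    """Split an LLM layout response into phrases, pixel boxes, background and negative prompts."""
    layout_lines: List[str] = []
    bg_prompt, neg_prompt = "", ""
    for raw in response.strip().splitlines():
        line = raw.strip()
        if line.startswith("Background prompt:"):
            bg_prompt = line[len("Background prompt:"):].strip()
        elif line.startswith("Negative prompt:"):
            neg_prompt = line[len("Negative prompt:"):].strip()
        elif line:
            layout_lines.append(line)
    # literal_eval accepts only Python literals, so the LLM text is never executed as code
    objects = ast.literal_eval(" ".join(layout_lines)) if layout_lines else []
    phrases = [name for name, _ in objects]
    boxes = [list(box) for _, box in objects]
    return phrases, boxes, bg_prompt, neg_prompt
```

Using `ast.literal_eval` rather than `eval` is a deliberate choice here: the layout comes from a model, so the parser should accept data literals and nothing else.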

This LMD+ model is based on Stable Diffusion v1.4 and integrates adapters trained with GLIGEN. It can be used directly with our LLMGroundedDiffusionPipeline, a simplified version of the LMD+ pipeline that omits per-box generation.
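GLIGEN-style grounding in diffusers describes each box as `[xmin, ymin, xmax, ymax]` with every value in `[0, 1]` (this is how `StableDiffusionGLIGENPipeline` documents `gligen_boxes`); the sketch assumes LLMGroundedDiffusionPipeline's `boxes` follow the same convention. It also assumes the layout arrives as pixel `[x, y, w, h]` boxes on a 512x512 canvas, the default resolution of Stable Diffusion v1.4. The helper name `xywh_to_normalized_xyxy` is hypothetical.

```python
from typing import List

def xywh_to_normalized_xyxy(box: List[float], width: int = 512, height: int = 512) -> List[float]:
    """Convert a pixel [x, y, w, h] box to [x0, y0, x1, y1] with every value clamped to [0, 1]."""
    x, y, w, h = box
    x0, y0 = x / width, y / height
    x1, y1 = (x + w) / width, (y + h) / height
    # Clamp so boxes that spill past the canvas edge still describe a valid region
    return [min(max(v, 0.0), 1.0) for v in (x0, y0, x1, y1)]
```

For example, `xywh_to_normalized_xyxy([128, 256, 128, 128])` gives `[0.25, 0.5, 0.5, 0.75]`, and a box that extends past the right edge is clipped to `1.0`.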

See the original SD Model Card here.

Cite our work

@article{lian2023llmgrounded,
    title={LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models}, 
    author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
    journal={arXiv preprint arXiv:2305.13655},
    year={2023}
}