<!-- META HERE --><meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;local&quot;:&quot;latent-consistency-model&quot;,&quot;sections&quot;:[{&quot;local&quot;:&quot;texttoimage&quot;,&quot;title&quot;:&quot;Text-to-image&quot;},{&quot;local&quot;:&quot;imagetoimage&quot;,&quot;title&quot;:&quot;Image-to-image&quot;},{&quot;local&quot;:&quot;combine-with-style-loras&quot;,&quot;title&quot;:&quot;Combine with style LoRAs&quot;},{&quot;local&quot;:&quot;controlnett2iadapter&quot;,&quot;sections&quot;:[{&quot;local&quot;:&quot;controlnet&quot;,&quot;title&quot;:&quot;ControlNet&quot;},{&quot;local&quot;:&quot;t2iadapter&quot;,&quot;title&quot;:&quot;T2I-Adapter&quot;}],&quot;title&quot;:&quot;ControlNet/T2I-Adapter&quot;}],&quot;title&quot;:&quot;Latent Consistency Model&quot;}">
<link href="/docs/diffusers/v0.25.0/ko/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/entry/start.739fcb44.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/chunks/scheduler.182ea377.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/chunks/singletons.5abecdb2.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/chunks/index.1f6d62f6.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/chunks/paths.497209e7.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/entry/app.a301e47d.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/chunks/index.008d68e4.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/nodes/0.a130e184.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/nodes/164.794de2a1.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/chunks/Tip.4f096367.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/chunks/IconCopyLink.96bbb92b.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/chunks/CodeBlock.5ed6eb7b.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/chunks/DocNotebookDropdown.bb388256.js">
<link rel="modulepreload" href="/docs/diffusers/v0.25.0/ko/_app/immutable/chunks/globals.7f7f1b26.js"><!-- HEAD_svelte-1phssyn_START --><!-- HEAD_svelte-1phssyn_END --> <h1 class="relative group"><a id="latent-consistency-model" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#latent-consistency-model"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span data-svelte-h="svelte-oeemq0">Latent Consistency Model</span></h1> <p data-svelte-h="svelte-2fpkx0">Latent Consistency Models (LCMs) enable high-quality image generation in typically 2-4 steps, making it possible to use diffusion models in almost real-time settings.</p> <p data-svelte-h="svelte-d9zkw1">From the <a 
href="https://latent-consistency-models.github.io/" rel="nofollow">official website</a>:</p> <blockquote data-svelte-h="svelte-9ngsrn"><p>LCMs can be distilled from any pre-trained Stable Diffusion (SD) in only 4,000 training steps (~32 A100 GPU Hours) for generating high quality 768 x 768 resolution images in 2~4 steps or even one step, significantly accelerating text-to-image generation. We employ LCM to distill the Dreamshaper-V7 version of SD in just 4,000 training iterations.</p></blockquote> <p data-svelte-h="svelte-yzou0e">For a more technical overview of LCMs, refer to <a href="https://huggingface.co/papers/2310.04378" rel="nofollow">the paper</a>.</p> <p data-svelte-h="svelte-1r9tviz">LCM distilled models are available for <a href="https://huggingface.co/runwayml/stable-diffusion-v1-5" rel="nofollow">stable-diffusion-v1-5</a>, <a href="https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0" rel="nofollow">stable-diffusion-xl-base-1.0</a>, and the <a href="https://huggingface.co/segmind/SSD-1B" rel="nofollow">SSD-1B</a> model. 
All the checkpoints can be found in this <a href="https://huggingface.co/collections/latent-consistency/latent-consistency-models-weights-654ce61a95edd6dffccef6a8" rel="nofollow">collection</a>.</p> <p data-svelte-h="svelte-m0v2lt">This guide shows how to perform inference with LCMs for:</p> <ul data-svelte-h="svelte-up6zsx"><li>text-to-image</li> <li>image-to-image</li> <li>combined with style LoRAs</li> <li>ControlNet/T2I-Adapter</li></ul> <h2 class="relative group"><a id="texttoimage" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#texttoimage"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span data-svelte-h="svelte-ws6p24">Text-to-image</span></h2> <p data-svelte-h="svelte-1gkvhl6">You’ll load the LCM-distilled SDXL UNet (<code>latent-consistency/lcm-sdxl</code>) as a <code>UNet2DConditionModel</code>, pass it to the <a href="/docs/diffusers/v0.25.0/ko/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLPipeline">StableDiffusionXLPipeline</a>, and replace the default scheduler with the <a href="/docs/diffusers/v0.25.0/ko/api/schedulers/lcm#diffusers.LCMScheduler">LCMScheduler</a>. 
Together, the distilled UNet and the LCM scheduler enable a fast inference workflow that overcomes the slow iterative nature of diffusion models.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> diffusers <span class="hljs-keyword">import</span> StableDiffusionXLPipeline, UNet2DConditionModel, LCMScheduler
<span class="hljs-keyword">import</span> torch
unet = UNet2DConditionModel.from_pretrained(
<span class="hljs-string">&quot;latent-consistency/lcm-sdxl&quot;</span>,
torch_dtype=torch.float16,
variant=<span class="hljs-string">&quot;fp16&quot;</span>,
)
pipe = StableDiffusionXLPipeline.from_pretrained(
<span class="hljs-string">&quot;stabilityai/stable-diffusion-xl-base-1.0&quot;</span>, unet=unet, torch_dtype=torch.float16, variant=<span class="hljs-string">&quot;fp16&quot;</span>,
).to(<span class="hljs-string">&quot;cuda&quot;</span>)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt = <span class="hljs-string">&quot;Self-portrait oil painting, a beautiful cyborg with golden hair, 8k&quot;</span>
generator = torch.manual_seed(<span class="hljs-number">0</span>)
image = pipe(
prompt=prompt, num_inference_steps=<span class="hljs-number">4</span>, generator=generator, guidance_scale=<span class="hljs-number">8.0</span>
).images[<span class="hljs-number">0</span>]<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1x1da6k"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_full_sdxl_t2i.png"></p> <p data-svelte-h="svelte-n61tgz">Note that only 4 inference steps are used here, far fewer than the 50 steps that StableDiffusionXLPipeline runs by default for standard SDXL.</p> <p data-svelte-h="svelte-k48bzl">Some details to keep in mind:</p> <ul data-svelte-h="svelte-j6mcv0"><li>To perform classifier-free guidance, the pipeline usually doubles the batch size internally, running an unconditional and a conditional copy of each input. LCM, however, applies guidance through guidance embeddings, so the batch size does not need to be doubled. This speeds up inference, with the drawback that negative prompts have no effect on the denoising process.</li> <li>The UNet was trained with guidance scales in the range [3, 13], so that is the ideal range for <code>guidance_scale</code>. However, disabling guidance with a value of 1.0 is also effective in most cases.</li></ul> <h2 class="relative group"><a id="imagetoimage" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#imagetoimage"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 
11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span data-svelte-h="svelte-1r5u1a2">Image-to-image</span></h2> <p data-svelte-h="svelte-wsp4r1">LCMs can be applied to image-to-image tasks too. For this example, we’ll use the <a href="https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7" rel="nofollow">LCM_Dreamshaper_v7</a> model, but the same steps can be applied to other LCM models as well.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre><!-- HTML_TAG_START --><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">from</span> diffusers <span class="hljs-keyword">import</span> AutoPipelineForImage2Image, UNet2DConditionModel, LCMScheduler
<span class="hljs-keyword">from</span> diffusers.utils <span class="hljs-keyword">import</span> make_image_grid, load_image
unet = UNet2DConditionModel.from_pretrained(
<span class="hljs-string">&quot;SimianLuo/LCM_Dreamshaper_v7&quot;</span>,
subfolder=<span class="hljs-string">&quot;unet&quot;</span>,
torch_dtype=torch.float16,
)
pipe = AutoPipelineForImage2Image.from_pretrained(
<span class="hljs-string">&quot;Lykon/dreamshaper-7&quot;</span>,
unet=unet,
torch_dtype=torch.float16,
variant=<span class="hljs-string">&quot;fp16&quot;</span>,
).to(<span class="hljs-string">&quot;cuda&quot;</span>)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
<span class="hljs-comment"># prepare image</span>
url = <span class="hljs-string">&quot;https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png&quot;</span>
init_image = load_image(url)
prompt = <span class="hljs-string">&quot;Astronauts in a jungle, cold color palette, muted colors, detailed, 8k&quot;</span>
<span class="hljs-comment"># pass prompt and image to pipeline</span>
generator = torch.manual_seed(<span class="hljs-number">0</span>)
image = pipe(
prompt,
image=init_image,
num_inference_steps=<span class="hljs-number">4</span>,
guidance_scale=<span class="hljs-number">7.5</span>,
strength=<span class="hljs-number">0.5</span>,
generator=generator
).images[<span class="hljs-number">0</span>]
make_image_grid([init_image, image], rows=<span class="hljs-number">1</span>, cols=<span class="hljs-number">2</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1bxllow"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_full_sdv1-5_i2i.png"></p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-18pswq">Results vary with the prompt and the input image. For the best results, we recommend experimenting with different values for the <code>num_inference_steps</code>, <code>strength</code>, and <code>guidance_scale</code> parameters and choosing the best one.</p></div> <h2 class="relative group"><a id="combine-with-style-loras" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#combine-with-style-loras"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span data-svelte-h="svelte-1ifz62o">Combine with style LoRAs</span></h2> <p 
data-svelte-h="svelte-5kn6or">LCMs can be combined with style LoRAs to generate styled images in very few steps (4-8). In the following example, we’ll use the <a href="https://huggingface.co/TheLastBen/Papercut_SDXL" rel="nofollow">papercut LoRA</a>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> diffusers <span class="hljs-keyword">import</span> StableDiffusionXLPipeline, UNet2DConditionModel, LCMScheduler
<span class="hljs-keyword">import</span> torch
unet = UNet2DConditionModel.from_pretrained(
<span class="hljs-string">&quot;latent-consistency/lcm-sdxl&quot;</span>,
torch_dtype=torch.float16,
variant=<span class="hljs-string">&quot;fp16&quot;</span>,
)
pipe = StableDiffusionXLPipeline.from_pretrained(
<span class="hljs-string">&quot;stabilityai/stable-diffusion-xl-base-1.0&quot;</span>, unet=unet, torch_dtype=torch.float16, variant=<span class="hljs-string">&quot;fp16&quot;</span>,
).to(<span class="hljs-string">&quot;cuda&quot;</span>)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights(<span class="hljs-string">&quot;TheLastBen/Papercut_SDXL&quot;</span>, weight_name=<span class="hljs-string">&quot;papercut.safetensors&quot;</span>, adapter_name=<span class="hljs-string">&quot;papercut&quot;</span>)
prompt = <span class="hljs-string">&quot;papercut, a cute fox&quot;</span>
generator = torch.manual_seed(<span class="hljs-number">0</span>)
image = pipe(
prompt=prompt, num_inference_steps=<span class="hljs-number">4</span>, generator=generator, guidance_scale=<span class="hljs-number">8.0</span>
).images[<span class="hljs-number">0</span>]
image<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1xcyp4e"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_full_sdx_lora_mix.png"></p> <h2 class="relative group"><a id="controlnett2iadapter" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#controlnett2iadapter"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span data-svelte-h="svelte-1q34qyx">ControlNet/T2I-Adapter</span></h2> <p data-svelte-h="svelte-1373sb3">Let’s look at how to perform inference with an LCM and a ControlNet or T2I-Adapter.</p> <h3 class="relative group"><a id="controlnet" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#controlnet"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 
11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span data-svelte-h="svelte-1147sj1">ControlNet</span></h3>
<p>For this example, we’ll use the <a href="https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7" rel="nofollow">LCM_Dreamshaper_v7</a> model with a canny ControlNet, but the same steps can be applied to other LCM models as well.</p>
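<p>The ControlNet code below first turns the input photo into a Canny edge map. <code>cv2.Canny</code> returns a single-channel <code>(H, W)</code> <code>uint8</code> array in which edge pixels are 255, and the example then adds a channel axis and concatenates three copies so that <code>Image.fromarray</code> produces an RGB conditioning image. The sketch below isolates that stacking step with <code>np.repeat</code>. The helper name <code>edges_to_rgb</code> and the tiny synthetic edge map (standing in for real <code>cv2.Canny</code> output, so the snippet runs without OpenCV or a downloaded image) are illustrative assumptions, not part of diffusers.</p>

```python
import numpy as np


def edges_to_rgb(edges: np.ndarray) -> np.ndarray:
    """Turn a single-channel (H, W) edge map into an (H, W, 3) array.

    Same result as the example's `image[:, :, None]` followed by
    `np.concatenate([image, image, image], axis=2)`.
    """
    if edges.ndim != 2:
        raise ValueError(f"expected a 2-D edge map, got shape {edges.shape}")
    # Add a trailing channel axis, then repeat it three times (R = G = B).
    return np.repeat(edges[:, :, None], 3, axis=2)


# Hypothetical stand-in for cv2.Canny output: a 4x4 map with one vertical edge.
edges = np.zeros((4, 4), dtype=np.uint8)
edges[:, 2] = 255

rgb = edges_to_rgb(edges)
print(rgb.shape, rgb.dtype)  # (4, 4, 3) uint8
```

<p>Passing the resulting <code>(H, W, 3)</code> <code>uint8</code> array to <code>Image.fromarray</code> yields an RGB PIL image. The T2I-Adapter example later in this guide performs the same conversion, so a small helper like this could be shared by both preprocessing blocks.</p>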
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre><!-- HTML_TAG_START --><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> cv2
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">from</span> diffusers <span class="hljs-keyword">import</span> StableDiffusionControlNetPipeline, ControlNetModel, LCMScheduler
<span class="hljs-keyword">from</span> diffusers.utils <span class="hljs-keyword">import</span> load_image, make_image_grid
image = load_image(
<span class="hljs-string">&quot;https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png&quot;</span>
).resize((<span class="hljs-number">512</span>, <span class="hljs-number">512</span>))
image = np.array(image)
low_threshold = <span class="hljs-number">100</span>
high_threshold = <span class="hljs-number">200</span>
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, <span class="hljs-literal">None</span>]
image = np.concatenate([image, image, image], axis=<span class="hljs-number">2</span>)
canny_image = Image.fromarray(image)
controlnet = ControlNetModel.from_pretrained(<span class="hljs-string">&quot;lllyasviel/sd-controlnet-canny&quot;</span>, torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
<span class="hljs-string">&quot;SimianLuo/LCM_Dreamshaper_v7&quot;</span>,
controlnet=controlnet,
torch_dtype=torch.float16,
safety_checker=<span class="hljs-literal">None</span>,
).to(<span class="hljs-string">&quot;cuda&quot;</span>)
<span class="hljs-comment"># set scheduler</span>
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
generator = torch.manual_seed(<span class="hljs-number">0</span>)
image = pipe(
<span class="hljs-string">&quot;the mona lisa&quot;</span>,
image=canny_image,
num_inference_steps=<span class="hljs-number">4</span>,
generator=generator,
).images[<span class="hljs-number">0</span>]
make_image_grid([canny_image, image], rows=<span class="hljs-number">1</span>, cols=<span class="hljs-number">2</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-jgtjlo"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_full_sdv1-5_controlnet.png"></p> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p>The inference parameters in this example may not work well for every prompt and image, so we recommend trying different values for the <code>num_inference_steps</code>, <code>guidance_scale</code>, <code>controlnet_conditioning_scale</code>, and <code>cross_attention_kwargs</code> parameters and choosing the best one.</p></div> <h3 class="relative group"><a id="t2iadapter" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#t2iadapter"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span data-svelte-h="svelte-1np368w">T2I-Adapter</span></h3> <p data-svelte-h="svelte-15vnvid">This example shows how to use the <code>lcm-sdxl</code> UNet with the <a 
href="https://huggingface.co/TencentARC/t2i-adapter-canny-sdxl-1.0" rel="nofollow">Canny T2I-Adapter</a>.</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre><!-- HTML_TAG_START --><span class="hljs-keyword">import</span> torch
<span class="hljs-keyword">import</span> cv2
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> PIL <span class="hljs-keyword">import</span> Image
<span class="hljs-keyword">from</span> diffusers <span class="hljs-keyword">import</span> StableDiffusionXLAdapterPipeline, UNet2DConditionModel, T2IAdapter, LCMScheduler
<span class="hljs-keyword">from</span> diffusers.utils <span class="hljs-keyword">import</span> load_image, make_image_grid
<span class="hljs-comment"># Prepare image</span>
<span class="hljs-comment"># Detect the canny map in low resolution to avoid high-frequency details</span>
image = load_image(
<span class="hljs-string">&quot;https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_canny.jpg&quot;</span>
).resize((<span class="hljs-number">384</span>, <span class="hljs-number">384</span>))
image = np.array(image)
low_threshold = <span class="hljs-number">100</span>
high_threshold = <span class="hljs-number">200</span>
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, <span class="hljs-literal">None</span>]
image = np.concatenate([image, image, image], axis=<span class="hljs-number">2</span>)
canny_image = Image.fromarray(image).resize((<span class="hljs-number">1024</span>, <span class="hljs-number">1216</span>))
<span class="hljs-comment"># load adapter</span>
adapter = T2IAdapter.from_pretrained(<span class="hljs-string">&quot;TencentARC/t2i-adapter-canny-sdxl-1.0&quot;</span>, torch_dtype=torch.float16).to(<span class="hljs-string">&quot;cuda&quot;</span>)
unet = UNet2DConditionModel.from_pretrained(
<span class="hljs-string">&quot;latent-consistency/lcm-sdxl&quot;</span>,
torch_dtype=torch.float16,
variant=<span class="hljs-string">&quot;fp16&quot;</span>,
)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
<span class="hljs-string">&quot;stabilityai/stable-diffusion-xl-base-1.0&quot;</span>,
unet=unet,
adapter=adapter,
torch_dtype=torch.float16,
variant=<span class="hljs-string">&quot;fp16&quot;</span>,
).to(<span class="hljs-string">&quot;cuda&quot;</span>)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
prompt = <span class="hljs-string">&quot;Mystical fairy in real, magic, 4k picture, high quality&quot;</span>
negative_prompt = <span class="hljs-string">&quot;extra digit, fewer digits, cropped, worst quality, low quality, glitch, deformed, mutated, ugly, disfigured&quot;</span>
generator = torch.manual_seed(<span class="hljs-number">0</span>)
image = pipe(
prompt=prompt,
negative_prompt=negative_prompt,
image=canny_image,
num_inference_steps=<span class="hljs-number">4</span>,
guidance_scale=<span class="hljs-number">5</span>,
adapter_conditioning_scale=<span class="hljs-number">0.8</span>,
adapter_conditioning_factor=<span class="hljs-number">1</span>,
generator=generator,
).images[<span class="hljs-number">0</span>]
grid = make_image_grid([canny_image, image], rows=<span class="hljs-number">1</span>, cols=<span class="hljs-number">2</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-dphjgr"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/lcm/lcm_full_sdxl_t2iadapter.png"></p>
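<p>The text-to-image notes above say that LCM applies guidance through a guidance embedding instead of doubling the batch. The pure-Python sketch below contrasts the two. <code>unet_batch_size</code> encodes the rule that classifier-free guidance runs an unconditional (negative-prompt) copy next to the conditional copy of each input, and <code>guidance_scale_embedding</code> builds a sinusoidal embedding of the guidance scale, loosely modeled on the <code>get_guidance_scale_embedding</code> helper that diffusers' Stable Diffusion pipelines call when the UNet config sets <code>time_cond_proj_dim</code>. Both function names, the <code>guidance_scale - 1</code> shift, the ×1000 factor, and the 256-dimensional default are assumptions for illustration; check the pipeline source of your diffusers version for the exact values.</p>

```python
import math


def unet_batch_size(batch_size: int, guidance_scale: float, has_guidance_embedding: bool) -> int:
    """Rows the UNet processes per denoising step (illustrative sketch).

    Classifier-free guidance stacks an unconditional (negative-prompt) copy next to
    the conditional copy, doubling the batch. A UNet conditioned on a guidance
    embedding skips that, so the batch stays as-is.
    """
    do_classifier_free_guidance = guidance_scale > 1.0 and not has_guidance_embedding
    return 2 * batch_size if do_classifier_free_guidance else batch_size


def guidance_scale_embedding(guidance_scale: float, embedding_dim: int = 256) -> list[float]:
    """Sinusoidal embedding of the guidance scale (illustrative sketch)."""
    half_dim = embedding_dim // 2
    if half_dim < 2:
        raise ValueError("embedding_dim must be at least 4")
    # Assumed shift and scale: guidance_scale=1.0 (guidance off) maps to w=0.
    w = (guidance_scale - 1.0) * 1000.0
    # Geometric ladder of frequencies from 1 down to 1/10000.
    step = math.log(10000.0) / (half_dim - 1)
    freqs = [math.exp(-step * i) for i in range(half_dim)]
    emb = [math.sin(w * f) for f in freqs] + [math.cos(w * f) for f in freqs]
    if embedding_dim % 2 == 1:
        emb.append(0.0)  # zero-pad odd sizes
    return emb


print(unet_batch_size(1, 7.5, has_guidance_embedding=False))  # 2: standard CFG
print(unet_batch_size(1, 8.0, has_guidance_embedding=True))   # 1: LCM-style
print(len(guidance_scale_embedding(8.0)))                     # 256
```

<p>Because the guidance scale is folded into a single conditioning vector, there is no separate unconditional pass for a negative prompt to steer, which is the drawback mentioned in the text-to-image notes.</p>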
<script>
{
__sveltekit_160bpzs = {
assets: "/docs/diffusers/v0.25.0/ko",
base: "/docs/diffusers/v0.25.0/ko",
env: {}
};
const element = document.currentScript.parentElement;
const data = [null,null];
Promise.all([
import("/docs/diffusers/v0.25.0/ko/_app/immutable/entry/start.739fcb44.js"),
import("/docs/diffusers/v0.25.0/ko/_app/immutable/entry/app.a301e47d.js")
]).then(([kit, app]) => {
kit.start(app, element, {
node_ids: [0, 164],
data,
form: null,
error: null
});
});
}
</script>
