Buckets:

rtrm's picture
download
raw
16.2 kB
<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Control over Diffusion Models&quot;,&quot;local&quot;:&quot;control-over-diffusion-models&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;Dreambooth&quot;,&quot;local&quot;:&quot;dreambooth&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Guided Diffusion via ControlNet&quot;,&quot;local&quot;:&quot;guided-diffusion-via-controlnet&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<link href="/docs/computer-vision-course/pr_397/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/entry/start.7f209408.js">
<link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/scheduler.7bc62968.js">
<link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/singletons.b15acae1.js">
<link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/paths.11cdc4b4.js">
<link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/entry/app.32e8338e.js">
<link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/index.2f8492b0.js">
<link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/nodes/0.e37092e8.js">
<link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/nodes/63.b9a77360.js">
<link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/CodeBlock.bb61a5a9.js">
<link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/index.514d62da.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Control over Diffusion Models&quot;,&quot;local&quot;:&quot;control-over-diffusion-models&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;Dreambooth&quot;,&quot;local&quot;:&quot;dreambooth&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Guided Diffusion via ControlNet&quot;,&quot;local&quot;:&quot;guided-diffusion-via-controlnet&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="control-over-diffusion-models" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#control-over-diffusion-models"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Control over Diffusion Models</span></h1> <h2 class="relative group"><a id="dreambooth" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#dreambooth"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Dreambooth</span></h2> <p data-svelte-h="svelte-dm5ekh">Although diffusion models and GANs can generate many unique images, they can’t always generate what you need exactly. Hence, you have to fine-tune a model, which usually requires a lot of data and computation. However, some techniques can be used to personalize a model with just a few examples.</p> <p data-svelte-h="svelte-7epn70">One example is Dreambooth by Google Research, a training technique that updates the entire diffusion model by training on just a few images of a subject or style. It works by associating a special word in the prompt with the example images. The details on Dreambooth can be found in the <a href="https://dreambooth.github.io/" rel="nofollow">paper</a> and the <a href="https://huggingface.co/docs/diffusers/training/dreambooth" rel="nofollow">Hugging Face Dreambooth training documentation</a>.</p> <p data-svelte-h="svelte-3y9hjg">Below, you can see the results of Dreambooth being used to train on 4 images of a dog and some inference examples.
<img src="https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/teaser_static.jpg" alt="Dreambooth Dog Example">
You can recreate the results following the Hugging Face documentation given above.</p> <p data-svelte-h="svelte-1mlf4v3">From this example, it’s clear that the model has learned about that specific dog and can generate new images of that dog in diverse poses and backgrounds. Although computing, data, and time are improvements, others have found more efficient ways to customize a model.</p> <p data-svelte-h="svelte-1vfqo5c">This is where the current most popular method for this comes in, which is Low Rank Adaptation (LoRA). This method was initially developed to efficiently fine-tune Large Language Models by Microsoft in this <a href="https://arxiv.org/abs/2106.09685" rel="nofollow">paper</a>. The main idea is to factorise the weight update matrix into 2 low-rank matrices, which are optimized during training whilst the rest of the model is frozen. <a href="https://huggingface.co/docs/peft/conceptual_guides/lora" rel="nofollow">Hugging face documentation</a> has a good conceptual guide on how LoRA works.</p> <p data-svelte-h="svelte-18olxdv">Now, if we put those ideas together we can use LoRA to efficiently fine-tune a diffusion model on a few examples using Dreambooth. A tutorial Google Colab notebook on how to do this can be found <a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/SDXL_DreamBooth_LoRA_.ipynb" rel="nofollow">here</a>.</p> <p data-svelte-h="svelte-s85m5l">Due to the quality and efficiency of this method, many people have created their own LoRA parameters which many can be found on a website called <a href="https://civitai.com/models" rel="nofollow">Civitai</a> and <a href="https://huggingface.co/collections/multimodalart/awesome-sdxl-loras-64f9af6d5cce4f4e8f351466" rel="nofollow">Hugging Face</a>.
For Civitai you can download the LoRA weights which usually are in the range of 50-500MB and in the case of Hugging Face version you can just load the model directly from the model hub.
Below is an example of how to load the LoRA weights in both cases and then fuse them with the model.</p> <p data-svelte-h="svelte-12lbort">We can start with installing diffusers library.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->pip install diffusers<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1m8x1nt">We will initialize the <code>StableDiffusionXLPipeline</code> and load LoRA adapter weights.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> diffusers <span class="hljs-keyword">import</span> StableDiffusionXLPipeline
<span class="hljs-keyword">import</span> torch
model = <span class="hljs-string">&quot;stabilityai/stable-diffusion-xl-base-1.0&quot;</span>
pipe = StableDiffusionXLPipeline.from_pretrained(model, torch_dtype=torch.float16)
pipe.load_lora_weights(
<span class="hljs-string">&quot;lora_weights.safetensors&quot;</span>
) <span class="hljs-comment"># if you want to install from a weight file</span>
pipe.load_lora_weights(
<span class="hljs-string">&quot;ostris/crayon_style_lora_sdxl&quot;</span>
) <span class="hljs-comment"># if you wish to install a lora from a repository directly</span>
pipe.fuse_lora(lora_scale=<span class="hljs-number">0.8</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1yk0z3z">This makes it quick to load a customised diffusion model and use it for inference, especially since there are a lot of models to choose from. Then, if we want to remove the LoRA weights, we can call <code>pipe.unfuse_lora()</code> which will return the model to its original state. As for the <code>lora_scale</code> parameter, this is a hyperparameter that controls how much the LoRA weights are used during inference. A value of 1.0 means the LoRA weights are fully used and a value of 0.0 means the LoRA weights are not used at all. The best value is often between 0.7 and 1.0 but it’s worth experimenting with different values to see what works best for your use case.</p> <p data-svelte-h="svelte-ajhdq4">You can try some of the Hugging Face LoRA models in this Gradio demo:</p> <iframe src="https://multimodalart-LoraTheExplorer.hf.space" frameborder="0" width="850" height="450" data-svelte-h="svelte-11rdemp"></iframe> <h2 class="relative group"><a id="guided-diffusion-via-controlnet" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#guided-diffusion-via-controlnet"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Guided Diffusion via ControlNet</span></h2> <p data-svelte-h="svelte-1iuomxt">Diffusion models have many ways in which they can be guided to create a desired output, such as prompts, negative prompts, guidance scale, inpainting and many others. Here, we will focus on a method that has many variants and can be combined with all the other methods, called ControlNet. It was introduced in this <a href="https://arxiv.org/abs/2302.05543" rel="nofollow">paper</a> by Stanford University. This method allows us to guide the diffusion model with an image that usually holds very specific information such as depth, pose, edges, and many others. This allows for more consistency in the generated images, which is often a problem with diffusion models.</p> <p data-svelte-h="svelte-yqr741">ControlNet can be used in both text-to-image and image-to-image. Below is a text 2 image example using a ControlNet which was trained on edge detection conditioning, with the top left image being used as input.
Here we can see how all of the generated images have a very similar shape but with different colours. This is because the ControlNet is guiding the diffusion model to create images with the same shape as the input image.
<img src="https://github.com/lllyasviel/ControlNet/raw/main/github_page/p1.png" alt="bird"></p> <p data-svelte-h="svelte-1j6zjnh">For code to run ControlNet with Stable Diffusion XL refer to the official documentation <a href="https://huggingface.co/docs/diffusers/api/pipelines/controlnet_sdxl#diffusers.StableDiffusionXLControlNetPipeline" rel="nofollow">here</a> but if you just want to test out some examples take a look at this Gradio demo that lets you try different types of ControlNet:</p> <iframe src="https://hysts-ControlNet-v1-1.hf.space" frameborder="0" width="850" height="450" data-svelte-h="svelte-ppftsn"></iframe> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/computer-vision-course/blob/main/chapters/en/unit5/generative-models/diffusion-models/simple-explanation.mdx" target="_blank"><span data-svelte-h="svelte-1kd6by1">&lt;</span> <span data-svelte-h="svelte-x0xyl0">&gt;</span> <span data-svelte-h="svelte-1dajgef"><span class="underline ml-1.5">Update</span> on GitHub</span></a> <p></p>
<script>
{
__sveltekit_1p6gie1 = {
assets: "/docs/computer-vision-course/pr_397/en",
base: "/docs/computer-vision-course/pr_397/en",
env: {}
};
const element = document.currentScript.parentElement;
const data = [null,null];
Promise.all([
import("/docs/computer-vision-course/pr_397/en/_app/immutable/entry/start.7f209408.js"),
import("/docs/computer-vision-course/pr_397/en/_app/immutable/entry/app.32e8338e.js")
]).then(([kit, app]) => {
kit.start(app, element, {
node_ids: [0, 63],
data,
form: null,
error: null
});
});
}
</script>

Xet Storage Details

Size:
16.2 kB
·
Xet hash:
199ad1981f049c9c99394966f68ef0950c804b950299a8190548901e670fee2b

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.