| <h1 class="relative group"><a id="lora-support-in-diffusers" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#lora-support-in-diffusers"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> | |
| <span>LoRA Support in Diffusers | |
| </span></h1> | |
Diffusers supports LoRA for faster fine-tuning of Stable Diffusion, allowing greater memory efficiency and easier portability.
Low-Rank Adaptation of Large Language Models was first introduced by Microsoft in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) by *Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen*.
In a nutshell, LoRA adapts pretrained models by adding pairs of rank-decomposition weight matrices (called **update matrices**) to existing weights and training **only** those newly added weights. This has a couple of advantages:
- The pretrained weights are kept frozen, so the model is less prone to [catastrophic forgetting](https://www.pnas.org/doi/10.1073/pnas.1611835114).
- The rank-decomposition matrices have significantly fewer parameters than the original model, which makes trained LoRA weights easily portable.
- LoRA matrices are generally added to the attention layers of the original model, and a `scale` parameter controls the extent to which the model is adapted toward the new training images.
**Note that the use of LoRA is not limited to attention layers. In the original LoRA work, the authors found that amending only the attention layers of a language model is sufficient to obtain good downstream performance with great efficiency. This is why it's common to add the LoRA weights only to the attention layers of a model.**
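To make the mechanism concrete, here is a minimal, self-contained sketch of a LoRA update applied to a single linear layer. It is purely illustrative and not the Diffusers implementation; the class name and the `rank` default are our own choices:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper around a frozen linear layer (not the Diffusers implementation)."""

    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # the pretrained weight stays frozen
        # The rank-decomposition pair ("update matrices"): only these are trained.
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_up.weight)  # the update starts as a no-op
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the scaled low-rank update.
        return self.base(x) + self.scale * self.lora_up(self.lora_down(x))
```

Because `lora_down` and `lora_up` together have far fewer parameters than the frozen weight, checkpoints that store only these matrices stay small.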
| <p><a href="https://github.com/cloneofsimo" rel="nofollow">cloneofsimo</a> was the first to try out LoRA training for Stable Diffusion in the popular <a href="https://github.com/cloneofsimo/lora" rel="nofollow">lora</a> GitHub repository.</p> | |
| <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p>LoRA allows us to achieve greater memory efficiency since the pretrained weights are kept frozen and only the LoRA weights are trained, thereby | |
| allowing us to run fine-tuning on consumer GPUs like Tesla T4, RTX 3080 or even RTX 2080 Ti! One can get access to GPUs like T4 in the free | |
| tiers of Kaggle Kernels and Google Colab Notebooks.</p></div> | |
| <h2 class="relative group"><a id="getting-started-with-lora-for-finetuning" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#getting-started-with-lora-for-finetuning"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> | |
| <span>Getting started with LoRA for fine-tuning | |
| </span></h2> | |
Stable Diffusion can be fine-tuned in different ways:

- [Textual inversion](https://huggingface.co/docs/diffusers/main/en/training/text_inversion)
- [DreamBooth](https://huggingface.co/docs/diffusers/main/en/training/dreambooth)
- [Text2Image fine-tuning](https://huggingface.co/docs/diffusers/main/en/training/text2image)
We provide two end-to-end examples that show how to run fine-tuning with LoRA:

- [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-with-low-rank-adaptation-of-large-language-models-lora)
- [Text2Image](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora)
If you want to perform DreamBooth training with LoRA, for instance, you would run:
| <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> | |
| <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> | |
| Copied</div></button></div> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-built_in">export</span> MODEL_NAME=<span class="hljs-string">"runwayml/stable-diffusion-v1-5"</span> | |
| <span class="hljs-built_in">export</span> INSTANCE_DIR=<span class="hljs-string">"path-to-instance-images"</span> | |
| <span class="hljs-built_in">export</span> OUTPUT_DIR=<span class="hljs-string">"path-to-save-model"</span> | |
| accelerate launch train_dreambooth_lora.py \ | |
| --pretrained_model_name_or_path=<span class="hljs-variable">$MODEL_NAME</span> \ | |
| --instance_data_dir=<span class="hljs-variable">$INSTANCE_DIR</span> \ | |
| --output_dir=<span class="hljs-variable">$OUTPUT_DIR</span> \ | |
| --instance_prompt=<span class="hljs-string">"a photo of sks dog"</span> \ | |
| --resolution=512 \ | |
| --train_batch_size=1 \ | |
| --gradient_accumulation_steps=1 \ | |
| --checkpointing_steps=100 \ | |
| --learning_rate=1e-4 \ | |
| --report_to=<span class="hljs-string">"wandb"</span> \ | |
| --lr_scheduler=<span class="hljs-string">"constant"</span> \ | |
| --lr_warmup_steps=0 \ | |
| --max_train_steps=500 \ | |
| --validation_prompt=<span class="hljs-string">"A photo of sks dog in a bucket"</span> \ | |
| --validation_epochs=50 \ | |
| --seed=<span class="hljs-string">"0"</span> \ | |
| --push_to_hub<!-- HTML_TAG_END --></pre></div> | |
A similar process can be followed to fine-tune Stable Diffusion with LoRA on a custom dataset using the `examples/text_to_image/train_text_to_image_lora.py` script.
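As a rough sketch, such a run could look like the following; the exact flags are best verified against the script itself, and the dataset and output directory below are just placeholder choices:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=1e-4 \
  --max_train_steps=15000 \
  --output_dir="sd-pokemon-model-lora"
```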
Refer to the respective examples linked above to learn more.
| <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p>When using LoRA we can use a much higher learning rate (typically 1e-4 as opposed to ~1e-6) compared to non-LoRA Dreambooth fine-tuning.</p></div> | |
But there is no free lunch. For a given dataset and expected generation quality, you still need to experiment with different hyperparameters. Here are some important ones:

- Training time
  - Learning rate
  - Number of training steps
- Inference time (see the sketch after this list)
  - Number of inference steps
  - Scheduler type
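The inference-time knobs can be changed without retraining. As a minimal sketch, assuming `pipe` and `prompt` are already defined as in the inference section below:

```python
from diffusers import DPMSolverMultistepScheduler

# Swap the scheduler type while reusing the existing scheduler's config.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Fewer inference steps run faster; more steps can improve quality.
image = pipe(prompt, num_inference_steps=20, guidance_scale=7.5).images[0]
```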
Additionally, you can follow [this blog](https://huggingface.co/blog/dreambooth) that documents some of our experimental findings for performing DreamBooth training of Stable Diffusion.
When fine-tuning, the LoRA update matrices are only added to the attention layers. To enable this, we added new weight-loading functionalities. Their details are available [here](https://huggingface.co/docs/diffusers/main/en/api/loaders).
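In rough terms (see the loaders API docs linked above for the authoritative signatures), saving and restoring the LoRA attention processors looks like this, assuming `unet` is the trained `UNet2DConditionModel` and `pipe` a loaded pipeline:

```python
# After training: persist only the LoRA attention processors (a few MB),
# not the full UNet.
unet.save_attn_procs("path-to-save-lora")

# At inference: load them back into a pipeline's UNet.
pipe.unet.load_attn_procs("path-to-save-lora")
```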
| <h2 class="relative group"><a id="inference" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#inference"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> | |
| <span>Inference | |
| </span></h2> | |
Assuming you used the `examples/text_to_image/train_text_to_image_lora.py` script to fine-tune Stable Diffusion on the [Pokemon dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions), you can perform inference like so:
| <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> | |
| <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> | |
| Copied</div></button></div> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> diffusers <span class="hljs-keyword">import</span> StableDiffusionPipeline | |
| <span class="hljs-keyword">import</span> torch | |
| model_path = <span class="hljs-string">"sayakpaul/sd-model-finetuned-lora-t4"</span> | |
| pipe = StableDiffusionPipeline.from_pretrained(<span class="hljs-string">"CompVis/stable-diffusion-v1-4"</span>, torch_dtype=torch.float16) | |
| pipe.unet.load_attn_procs(model_path) | |
| pipe.to(<span class="hljs-string">"cuda"</span>) | |
| prompt = <span class="hljs-string">"A pokemon with blue eyes."</span> | |
| image = pipe(prompt, num_inference_steps=<span class="hljs-number">30</span>, guidance_scale=<span class="hljs-number">7.5</span>).images[<span class="hljs-number">0</span>] | |
| image.save(<span class="hljs-string">"pokemon.png"</span>)<!-- HTML_TAG_END --></pre></div> | |
Here are some example images you can expect:

![Example Pokemon-style images generated with a LoRA fine-tuned Stable Diffusion model](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pokemon-collage.png)
| <p><a href="https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4" rel="nofollow"><code>sayakpaul/sd-model-finetuned-lora-t4</code></a> contains <a href="https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/pytorch_lora_weights.bin" rel="nofollow">LoRA fine-tuned update matrices</a> | |
| which is only 3 MBs in size. During inference, the pre-trained Stable Diffusion checkpoints are loaded alongside these update | |
| matrices and then they are combined to run inference.</p> | |
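The `scale` parameter mentioned earlier can also be tuned at inference. A hedged sketch, assuming your Diffusers version exposes the LoRA scale through the pipeline's `cross_attention_kwargs`:

```python
# scale=0.0 ignores the LoRA weights entirely; scale=1.0 applies them fully.
image = pipe(
    prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
    cross_attention_kwargs={"scale": 0.5},  # blend base and LoRA behavior
).images[0]
```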
You can use the [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) library to retrieve the base model from [`sayakpaul/sd-model-finetuned-lora-t4`](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4) like so:
| <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> | |
| <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> | |
| Copied</div></button></div> | |
| <pre><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> huggingface_hub.repocard <span class="hljs-keyword">import</span> RepoCard | |
| card = RepoCard.load(<span class="hljs-string">"sayakpaul/sd-model-finetuned-lora-t4"</span>) | |
| base_model = card.data.to_dict()[<span class="hljs-string">"base_model"</span>] | |
| <span class="hljs-comment"># 'CompVis/stable-diffusion-v1-4'</span><!-- HTML_TAG_END --></pre></div> | |
You can then use `pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)`. This is especially useful when you don't want to hardcode the base model identifier when initializing the `StableDiffusionPipeline`.
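Putting the two snippets together, a minimal end-to-end sketch might look like:

```python
import torch
from diffusers import StableDiffusionPipeline
from huggingface_hub.repocard import RepoCard

lora_model_id = "sayakpaul/sd-model-finetuned-lora-t4"
# Read the base model identifier from the LoRA repo's model card.
base_model = RepoCard.load(lora_model_id).data.to_dict()["base_model"]

pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)
pipe.unet.load_attn_procs(lora_model_id)  # fetch the LoRA update matrices from the Hub
pipe.to("cuda")

image = pipe("A pokemon with blue eyes.", num_inference_steps=30).images[0]
```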
Inference for DreamBooth training remains the same. Check [this section](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#inference-1) for more details.
| <h2 class="relative group"><a id="known-limitations" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#known-limitations"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> | |
| <span>Known limitations | |
| </span></h2> | |
- Currently, we only support LoRA for the attention layers of [`UNet2DConditionModel`](https://huggingface.co/docs/diffusers/main/en/api/models#diffusers.UNet2DConditionModel).
| <script type="module" data-hydrate="v4slkt"> | |
| import { start } from "/docs/diffusers/v0.12.0/en/_app/start-hf-doc-builder.js"; | |
| start({ | |
| target: document.querySelector('[data-hydrate="v4slkt"]').parentNode, | |
| paths: {"base":"/docs/diffusers/v0.12.0/en","assets":"/docs/diffusers/v0.12.0/en"}, | |
| session: {}, | |
| route: false, | |
| spa: false, | |
| trailing_slash: "never", | |
| hydrate: { | |
| status: 200, | |
| error: null, | |
| nodes: [ | |
| import("/docs/diffusers/v0.12.0/en/_app/pages/__layout.svelte-hf-doc-builder.js"), | |
| import("/docs/diffusers/v0.12.0/en/_app/pages/training/lora.mdx-hf-doc-builder.js") | |
| ], | |
| params: {} | |
| } | |
| }); | |
| </script> | |
Xet Storage Details
- Size:
- 19.6 kB
- Xet hash:
- 7fe20961573be944fb23b1d6b3b329141145895255b5886cc658055889d1a427
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.