<h1 class="relative group"><a id="lora-support-in-diffusers" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#lora-support-in-diffusers"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>LoRA Support in Diffusers
</span></h1>
Diffusers supports LoRA for faster fine-tuning of Stable Diffusion, allowing greater memory efficiency and easier portability.
Low-Rank Adaptation of Large Language Models was first introduced by Microsoft in [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) by *Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen*.
In a nutshell, LoRA allows adapting pretrained models by adding pairs of rank-decomposition weight matrices (called **update matrices**) to existing weights and **only** training those newly added weights. This has a couple of advantages:
- Previous pretrained weights are kept frozen, so the model is not as prone to [catastrophic forgetting](https://www.pnas.org/doi/10.1073/pnas.1611835114).
- Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable.
- LoRA matrices are generally added to the attention layers of the original model, and they control the extent to which the model is adapted toward new training images via a `scale` parameter (see the sketch below).
**Note that the use of LoRA is not limited to attention layers. In the original LoRA work, the authors found that amending just the attention layers of a language model is sufficient to obtain good downstream performance with great efficiency. This is why it's common to add the LoRA weights only to the attention layers of a model.**
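To make the rank-decomposition idea concrete, here is a minimal, self-contained PyTorch sketch of a LoRA-augmented linear layer. This is an illustration of the technique, not diffusers' actual implementation; the class and all names in it are hypothetical:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update (illustrative only)."""

    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        # The rank-decomposition pair ("update matrices"): only these are trained.
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.normal_(self.lora_down.weight, std=1.0 / rank)
        nn.init.zeros_(self.lora_up.weight)  # the update starts as a no-op
        self.scale = scale  # how strongly the adaptation is applied

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W x + scale * B(A x): the base weights are never modified.
        return self.base(x) + self.scale * self.lora_up(self.lora_down(x))
```

With rank 4 on a 320×320 projection, the update adds only 2 × 4 × 320 = 2,560 trainable parameters instead of 102,400, which is why trained LoRA weights stay so small and portable.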
<p><a href="https://github.com/cloneofsimo" rel="nofollow">cloneofsimo</a> was the first to try out LoRA training for Stable Diffusion in the popular <a href="https://github.com/cloneofsimo/lora" rel="nofollow">lora</a> GitHub repository.</p>
<div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p>LoRA allows us to achieve greater memory efficiency since the pretrained weights are kept frozen and only the LoRA weights are trained, thereby
allowing us to run fine-tuning on consumer GPUs like Tesla T4, RTX 3080 or even RTX 2080 Ti! One can get access to GPUs like T4 in the free
tiers of Kaggle Kernels and Google Colab Notebooks.</p></div>
<h2 class="relative group"><a id="getting-started-with-lora-for-finetuning" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#getting-started-with-lora-for-finetuning"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>Getting started with LoRA for fine-tuning
</span></h2>
Stable Diffusion can be fine-tuned in different ways:
<ul><li><a href="https://huggingface.co/docs/diffusers/main/en/training/text_inversion" rel="nofollow">Textual inversion</a></li>
<li><a href="https://huggingface.co/docs/diffusers/main/en/training/dreambooth" rel="nofollow">DreamBooth</a></li>
<li><a href="https://huggingface.co/docs/diffusers/main/en/training/text2image" rel="nofollow">Text2Image fine-tuning</a></li></ul>
We provide two end-to-end examples that show how to run fine-tuning with LoRA:
<ul><li><a href="https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-with-low-rank-adaptation-of-large-language-models-lora" rel="nofollow">DreamBooth</a></li>
<li><a href="https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora" rel="nofollow">Text2Image</a></li></ul>
If you want to perform DreamBooth training with LoRA, for instance, you would run:
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-built_in">export</span> MODEL_NAME=<span class="hljs-string">&quot;runwayml/stable-diffusion-v1-5&quot;</span>
<span class="hljs-built_in">export</span> INSTANCE_DIR=<span class="hljs-string">&quot;path-to-instance-images&quot;</span>
<span class="hljs-built_in">export</span> OUTPUT_DIR=<span class="hljs-string">&quot;path-to-save-model&quot;</span>
accelerate launch train_dreambooth_lora.py \
--pretrained_model_name_or_path=<span class="hljs-variable">$MODEL_NAME</span> \
--instance_data_dir=<span class="hljs-variable">$INSTANCE_DIR</span> \
--output_dir=<span class="hljs-variable">$OUTPUT_DIR</span> \
--instance_prompt=<span class="hljs-string">&quot;a photo of sks dog&quot;</span> \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--checkpointing_steps=100 \
--learning_rate=1e-4 \
--report_to=<span class="hljs-string">&quot;wandb&quot;</span> \
--lr_scheduler=<span class="hljs-string">&quot;constant&quot;</span> \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt=<span class="hljs-string">&quot;A photo of sks dog in a bucket&quot;</span> \
--validation_epochs=50 \
--seed=<span class="hljs-string">&quot;0&quot;</span> \
--push_to_hub<!-- HTML_TAG_END --></pre></div>
A similar process can be followed to fine-tune Stable Diffusion with LoRA on a custom dataset using the `examples/text_to_image/train_text_to_image_lora.py` script.
Refer to the respective examples linked above to learn more.
<div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p>When using LoRA we can use a much higher learning rate (typically 1e-4 as opposed to ~1e-6) compared to non-LoRA Dreambooth fine-tuning.</p></div>
But there is no free lunch. For a given dataset and expected generation quality, you'd still need to experiment with different hyperparameters. Here are some important ones:
- Training time
  - Learning rate
  - Number of training steps
- Inference time (see the sketch after this list)
  - Number of inference steps
  - Scheduler type
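Both inference-time knobs can be changed on an already-loaded pipeline without retraining. Here is a minimal sketch; the scheduler choice and prompt are just examples:

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap the scheduler type without reloading the model weights.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Vary the number of inference steps and compare the outputs.
image = pipe("a photo of sks dog in a bucket", num_inference_steps=25).images[0]
```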
Additionally, you can follow [this blog](https://huggingface.co/blog/dreambooth) that documents some of our experimental findings for performing DreamBooth training of Stable Diffusion.
When fine-tuning, the LoRA update matrices are only added to the attention layers. To enable this, we added new weight-loading functionality, documented [here](https://huggingface.co/docs/diffusers/main/en/api/loaders).
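For example, `load_attn_procs()` restores trained LoRA attention weights onto the pipeline's UNet; the path below is a placeholder standing in for a LoRA training output directory:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# Load the LoRA attention weights produced by a training run;
# a Hub repo id containing the LoRA weights also works here.
pipe.unet.load_attn_procs("path-to-save-model")
```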
<h2 class="relative group"><a id="inference" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#inference"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>Inference
</span></h2>
Assuming you used the `examples/text_to_image/train_text_to_image_lora.py` script to fine-tune Stable Diffusion on the [Pokemon dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions), you can perform inference like so:
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> diffusers <span class="hljs-keyword">import</span> StableDiffusionPipeline
<span class="hljs-keyword">import</span> torch
model_path = <span class="hljs-string">&quot;sayakpaul/sd-model-finetuned-lora-t4&quot;</span>
pipe = StableDiffusionPipeline.from_pretrained(<span class="hljs-string">&quot;CompVis/stable-diffusion-v1-4&quot;</span>, torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to(<span class="hljs-string">&quot;cuda&quot;</span>)
prompt = <span class="hljs-string">&quot;A pokemon with blue eyes.&quot;</span>
image = pipe(prompt, num_inference_steps=<span class="hljs-number">30</span>, guidance_scale=<span class="hljs-number">7.5</span>).images[<span class="hljs-number">0</span>]
image.save(<span class="hljs-string">&quot;pokemon.png&quot;</span>)<!-- HTML_TAG_END --></pre></div>
Here are some example images you can expect:
![Example images generated with LoRA fine-tuned on the Pokemon dataset](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pokemon-collage.png)
<p><a href="https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4" rel="nofollow"><code>sayakpaul/sd-model-finetuned-lora-t4</code></a> contains <a href="https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/pytorch_lora_weights.bin" rel="nofollow">LoRA fine-tuned update matrices</a>
which is only 3 MBs in size. During inference, the pre-trained Stable Diffusion checkpoints are loaded alongside these update
matrices and then they are combined to run inference.</p>
You can use the [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) library to retrieve the base model from [`sayakpaul/sd-model-finetuned-lora-t4`](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4) like so:
<div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg>
<div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div>
Copied</div></button></div>
<pre><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> huggingface_hub.repocard <span class="hljs-keyword">import</span> RepoCard
card = RepoCard.load(<span class="hljs-string">&quot;sayakpaul/sd-model-finetuned-lora-t4&quot;</span>)
base_model = card.data.to_dict()[<span class="hljs-string">&quot;base_model&quot;</span>]
<span class="hljs-comment"># &#x27;CompVis/stable-diffusion-v1-4&#x27;</span><!-- HTML_TAG_END --></pre></div>
And then you can use `pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)`.
This is especially useful when you don't want to hardcode the base model identifier when initializing the `StableDiffusionPipeline`.
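Putting the two steps together, an end-to-end inference script that discovers the base model from the repo card could look like this (a sketch built from the calls shown above):

```python
import torch
from diffusers import StableDiffusionPipeline
from huggingface_hub.repocard import RepoCard

lora_model_id = "sayakpaul/sd-model-finetuned-lora-t4"

# Read the base model identifier from the LoRA repo's model card.
base_model = RepoCard.load(lora_model_id).data.to_dict()["base_model"]

pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)
pipe.unet.load_attn_procs(lora_model_id)
pipe.to("cuda")

image = pipe("A pokemon with blue eyes.", num_inference_steps=30).images[0]
```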
Inference for DreamBooth training remains the same. Check [this section](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#inference-1) for more details.
<h2 class="relative group"><a id="known-limitations" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#known-limitations"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>Known limitations
</span></h2>
- Currently, we only support LoRA for the attention layers of [`UNet2DConditionModel`](https://huggingface.co/docs/diffusers/main/en/api/models#diffusers.UNet2DConditionModel).
<script type="module" data-hydrate="v4slkt">
import { start } from "/docs/diffusers/v0.12.0/en/_app/start-hf-doc-builder.js";
start({
target: document.querySelector('[data-hydrate="v4slkt"]').parentNode,
paths: {"base":"/docs/diffusers/v0.12.0/en","assets":"/docs/diffusers/v0.12.0/en"},
session: {},
route: false,
spa: false,
trailing_slash: "never",
hydrate: {
status: 200,
error: null,
nodes: [
import("/docs/diffusers/v0.12.0/en/_app/pages/__layout.svelte-hf-doc-builder.js"),
import("/docs/diffusers/v0.12.0/en/_app/pages/training/lora.mdx-hf-doc-builder.js")
],
params: {}
}
});
</script>

Xet Storage Details

Size:
19.6 kB
·
Xet hash:
7fe20961573be944fb23b1d6b3b329141145895255b5886cc658055889d1a427

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.