| <!--Copyright 2025 The HuggingFace Team. All rights reserved. | |
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | |
| the License. You may obtain a copy of the License at | |
| http://www.apache.org/licenses/LICENSE-2.0 | |
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | |
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | |
| specific language governing permissions and limitations under the License. | |
| --> | |
| # LoRA 低秩适配 | |
| <Tip warning={true}> | |
| 当前功能处于实验阶段,API可能在未来版本中变更。 | |
| </Tip> | |
| [LoRA(大语言模型的低秩适配)](https://hf.co/papers/2106.09685) 是一种轻量级训练技术,能显著减少可训练参数量。其原理是通过向模型注入少量新权重参数,仅训练这些新增参数。这使得LoRA训练速度更快、内存效率更高,并生成更小的模型权重文件(通常仅数百MB),便于存储和分享。LoRA还可与DreamBooth等其他训练技术结合以加速训练过程。 | |
| <Tip> | |
| LoRA具有高度通用性,目前已支持以下应用场景:[DreamBooth](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py)、[Kandinsky 2.2](https://github.com/huggingface/diffusers/blob/main/examples/kandinsky2_2/text_to_image/train_text_to_image_lora_decoder.py)、[Stable Diffusion XL](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora_sdxl.py)、[文生图](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py)以及[Wuerstchen](https://github.com/huggingface/diffusers/blob/main/examples/wuerstchen/text_to_image/train_text_to_image_lora_prior.py)。 | |
| </Tip> | |
| 本指南将通过解析[train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py)脚本,帮助您深入理解其工作原理,并掌握如何针对具体需求进行定制化修改。 | |
| 运行脚本前,请确保从源码安装库: | |
| ```bash | |
| git clone https://github.com/huggingface/diffusers | |
| cd diffusers | |
| pip install . | |
| ``` | |
| 进入包含训练脚本的示例目录,并安装所需依赖: | |
| <hfoptions id="installation"> | |
| <hfoption id="PyTorch"> | |
| ```bash | |
| cd examples/text_to_image | |
| pip install -r requirements.txt | |
| ``` | |
| </hfoption> | |
| <hfoption id="Flax"> | |
| ```bash | |
| cd examples/text_to_image | |
| pip install -r requirements_flax.txt | |
| ``` | |
| </hfoption> | |
| </hfoptions> | |
| <Tip> | |
| 🤗 Accelerate是一个支持多GPU/TPU训练和混合精度计算的库,它能根据硬件环境自动配置训练方案。参阅🤗 Accelerate[快速入门](https://huggingface.co/docs/accelerate/quicktour)了解更多。 | |
| </Tip> | |
| 初始化🤗 Accelerate环境: | |
| ```bash | |
| accelerate config | |
| ``` | |
| 若要创建默认配置环境(不进行交互式设置): | |
| ```bash | |
| accelerate config default | |
| ``` | |
| 若在非交互环境(如Jupyter notebook)中使用: | |
| ```py | |
| from accelerate.utils import write_basic_config | |
| write_basic_config() | |
| ``` | |
| 如需训练自定义数据集,请参考[创建训练数据集指南](create_dataset)了解数据准备流程。 | |
| <Tip> | |
| 以下章节重点解析训练脚本中与LoRA相关的核心部分,但不会涵盖所有实现细节。如需完整理解,建议直接阅读[脚本源码](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py),如有疑问欢迎反馈。 | |
| </Tip> | |
| ## 脚本参数 | |
| 训练脚本提供众多参数用于定制训练过程。所有参数及其说明均定义在[`parse_args()`](https://github.com/huggingface/diffusers/blob/dd9a5caf61f04d11c0fa9f3947b69ab0010c9a0f/examples/text_to_image/train_text_to_image_lora.py#L85)函数中。多数参数设有默认值,您也可以通过命令行参数覆盖: | |
| 例如增加训练轮次: | |
| ```bash | |
| accelerate launch train_text_to_image_lora.py \ | |
| --num_train_epochs=150 \ | |
| ``` | |
| 基础参数说明可参考[文生图训练指南](text2image#script-parameters),此处重点介绍LoRA相关参数: | |
| - `--rank`:低秩矩阵的内部维度,数值越高可训练参数越多 | |
| - `--learning_rate`:默认学习率为1e-4,但使用LoRA时可适当提高 | |
| ## 训练脚本实现 | |
| 数据集预处理和训练循环逻辑位于[`main()`](https://github.com/huggingface/diffusers/blob/dd9a5caf61f04d11c0fa9f3947b69ab0010c9a0f/examples/text_to_image/train_text_to_image_lora.py#L371)函数,如需定制训练流程,可在此处进行修改。 | |
| 与参数说明类似,训练流程的完整解析请参考[文生图指南](text2image#training-script),下文重点介绍LoRA相关实现。 | |
| <hfoptions id="lora"> | |
| <hfoption id="UNet"> | |
| Diffusers使用[PEFT](https://hf.co/docs/peft)库的[`~peft.LoraConfig`]配置LoRA适配器参数,包括秩(rank)、alpha值以及目标模块。适配器被注入UNet后,通过`lora_layers`筛选出需要优化的LoRA层。 | |
| ```py | |
| unet_lora_config = LoraConfig( | |
| r=args.rank, | |
| lora_alpha=args.rank, | |
| init_lora_weights="gaussian", | |
| target_modules=["to_k", "to_q", "to_v", "to_out.0"], | |
| ) | |
| unet.add_adapter(unet_lora_config) | |
| lora_layers = filter(lambda p: p.requires_grad, unet.parameters()) | |
| ``` | |
| </hfoption> | |
| <hfoption id="text encoder"> | |
| 当需要微调文本编码器时(如SDXL模型),Diffusers同样支持通过[PEFT](https://hf.co/docs/peft)库实现。[`~peft.LoraConfig`]配置适配器参数后注入文本编码器,并筛选LoRA层进行训练。 | |
| ```py | |
| text_lora_config = LoraConfig( | |
| r=args.rank, | |
| lora_alpha=args.rank, | |
| init_lora_weights="gaussian", | |
| target_modules=["q_proj", "k_proj", "v_proj", "out_proj"], | |
| ) | |
| text_encoder_one.add_adapter(text_lora_config) | |
| text_encoder_two.add_adapter(text_lora_config) | |
| text_lora_parameters_one = list(filter(lambda p: p.requires_grad, text_encoder_one.parameters())) | |
| text_lora_parameters_two = list(filter(lambda p: p.requires_grad, text_encoder_two.parameters())) | |
| ``` | |
| </hfoption> | |
| </hfoptions> | |
| [优化器](https://github.com/huggingface/diffusers/blob/e4b8f173b97731686e290b2eb98e7f5df2b1b322/examples/text_to_image/train_text_to_image_lora.py#L529)仅对`lora_layers`参数进行优化: | |
| ```py | |
| optimizer = optimizer_cls( | |
| lora_layers, | |
| lr=args.learning_rate, | |
| betas=(args.adam_beta1, args.adam_beta2), | |
| weight_decay=args.adam_weight_decay, | |
| eps=args.adam_epsilon, | |
| ) | |
| ``` | |
| 除LoRA层设置外,该训练脚本与标准train_text_to_image.py基本相同! | |
| ## 启动训练 | |
| 完成所有配置后,即可启动训练脚本!🚀 | |
| 以下示例使用[Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions)训练生成火影角色。请设置环境变量`MODEL_NAME`和`DATASET_NAME`指定基础模型和数据集,`OUTPUT_DIR`设置输出目录,`HUB_MODEL_ID`指定Hub存储库名称。脚本运行后将生成以下文件: | |
| - 模型检查点 | |
| - `pytorch_lora_weights.safetensors`(训练好的LoRA权重) | |
| 多GPU训练请添加`--multi_gpu`参数。 | |
| <Tip warning={true}> | |
| 在11GB显存的2080 Ti显卡上完整训练约需5小时。 | |
| </Tip> | |
| ```bash | |
| export MODEL_NAME="stable-diffusion-v1-5/stable-diffusion-v1-5" | |
| export OUTPUT_DIR="/sddata/finetune/lora/naruto" | |
| export HUB_MODEL_ID="naruto-lora" | |
| export DATASET_NAME="lambdalabs/naruto-blip-captions" | |
| accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \ | |
| --pretrained_model_name_or_path=$MODEL_NAME \ | |
| --dataset_name=$DATASET_NAME \ | |
| --dataloader_num_workers=8 \ | |
| --resolution=512 \ | |
| --center_crop \ | |
| --random_flip \ | |
| --train_batch_size=1 \ | |
| --gradient_accumulation_steps=4 \ | |
| --max_train_steps=15000 \ | |
| --learning_rate=1e-04 \ | |
| --max_grad_norm=1 \ | |
| --lr_scheduler="cosine" \ | |
| --lr_warmup_steps=0 \ | |
| --output_dir=${OUTPUT_DIR} \ | |
| --push_to_hub \ | |
| --hub_model_id=${HUB_MODEL_ID} \ | |
| --report_to=wandb \ | |
| --checkpointing_steps=500 \ | |
| --validation_prompt="蓝色眼睛的火影忍者角色" \ | |
| --seed=1337 | |
| ``` | |
| 训练完成后,您可以通过以下方式进行推理: | |
| ```py | |
| from diffusers import AutoPipelineForText2Image | |
| import torch | |
| pipeline = AutoPipelineForText2Image.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda") | |
| pipeline.load_lora_weights("path/to/lora/model", weight_name="pytorch_lora_weights.safetensors") | |
| image = pipeline("A naruto with blue eyes").images[0] | |
| ``` | |
| ## 后续步骤 | |
| 恭喜完成LoRA模型训练!如需进一步了解模型使用方法,可参考以下指南: | |
| - 学习如何加载[不同格式的LoRA权重](../using-diffusers/loading_adapters#LoRA)(如Kohya或TheLastBen训练的模型) | |
| - 掌握使用PEFT进行[多LoRA组合推理](../tutorials/using_peft_for_inference)的技巧 |