Buckets:
| <meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"Fully Sharded Data Parallel","local":"fully-sharded-data-parallel","sections":[],"depth":1}"> | |
| <link href="/docs/peft/pr_3207/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/entry/start.dbacf4b8.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/chunks/scheduler.78382b47.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/chunks/singletons.2397427a.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/chunks/index.fadd215c.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/chunks/paths.6fea2ad6.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/entry/app.cb03e0d4.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/chunks/preload-helper.2f407600.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/chunks/index.6dd35eb6.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/nodes/0.18735116.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/chunks/each.e59479a4.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/nodes/3.4efe874e.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/chunks/MermaidChart.svelte_svelte_type_style_lang.d25d6883.js"> | |
| <link rel="modulepreload" href="/docs/peft/pr_3207/en/_app/immutable/chunks/CodeBlock.147ab5db.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Fully Sharded Data Parallel","local":"fully-sharded-data-parallel","sections":[],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <div class="items-center shrink-0 min-w-[100px] max-sm:min-w-[50px] justify-end ml-auto flex" style="float: right; margin-left: 10px; display: inline-flex; position: relative; z-index: 10;"><div class="inline-flex rounded-md max-sm:rounded-sm"><button class="inline-flex items-center gap-1 h-7 max-sm:h-7 px-2 max-sm:px-1.5 text-sm font-medium text-gray-800 border border-r-0 rounded-l-md max-sm:rounded-l-sm border-gray-200 bg-white hover:shadow-inner dark:border-gray-850 dark:bg-gray-950 dark:text-gray-200 dark:hover:bg-gray-800" aria-live="polite"><span class="inline-flex items-center justify-center rounded-md p-0.5 max-sm:p-0 hover:text-gray-800 dark:hover:text-gray-200"><svg class="sm:size-3.5 size-3" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg></span> <span>Copy page</span></button> <button class="inline-flex items-center justify-center w-6 max-sm:w-5 h-7 max-sm:h-7 disabled:pointer-events-none text-sm text-gray-500 hover:text-gray-700 dark:hover:text-white rounded-r-md max-sm:rounded-r-sm border border-l transition border-gray-200 bg-white hover:shadow-inner dark:border-gray-850 dark:bg-gray-950 dark:text-gray-200 dark:hover:bg-gray-800" aria-haspopup="menu" aria-expanded="false" aria-label="Open copy menu"><svg class="transition-transform text-gray-400 overflow-visible sm:size-3.5 size-3 rotate-0" width="1em" height="1em" viewBox="0 0 12 7" fill="none" xmlns="http://www.w3.org/2000/svg"><path d="M1 1L6 6L11 1" stroke="currentColor"></path></svg></button></div> </div> <h1 class="relative group"><a id="fully-sharded-data-parallel" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#fully-sharded-data-parallel"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Fully Sharded Data Parallel</span></h1> <p data-svelte-h="svelte-1552p1b"><a href="https://pytorch.org/docs/stable/fsdp.html" rel="nofollow">Fully sharded data parallel</a> (FSDP) is developed for distributed training of large pretrained models up to 1T parameters. FSDP achieves this by sharding the model parameters, gradients, and optimizer states across data parallel processes and it can also offload sharded model parameters to a CPU. The memory efficiency afforded by FSDP allows you to scale training to larger batch or model sizes.</p> <p data-svelte-h="svelte-1o8k4c8">Both of these features are supported in 🤗 Accelerate, and you can use them with 🤗 PEFT.</p> <h1 class="relative group"><a id="use-peft-and-fsdp" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#use-peft-and-fsdp"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Use PEFT and FSDP</span></h1> <p data-svelte-h="svelte-14llqsi">This section of guide will help you learn how to use our DeepSpeed <a href="https://github.com/huggingface/peft/blob/main/examples/sft/train.py" rel="nofollow">training script</a> for performing SFT. You’ll configure the script to do SFT (supervised fine-tuning) of Llama-70B model with LoRA and FSDP on 8xH100 80GB GPUs on a single machine. You can configure it to scale to multiple machines by changing the accelerate config.</p> <h2 class="relative group"><a id="configuration" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#configuration"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Configuration</span></h2> <p data-svelte-h="svelte-zjx7c1">Start by running the following command to <a href="https://huggingface.co/docs/accelerate/quicktour#launching-your-distributed-script" rel="nofollow">create a FSDP configuration file</a> with 🤗 Accelerate. The <code>--config_file</code> flag allows you to save the configuration file to a specific location, otherwise it is saved as a <code>default_config.yaml</code> file in the 🤗 Accelerate cache.</p> <p data-svelte-h="svelte-1oar51z">The configuration file is used to set the default options when you launch the training script.</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->accelerate config --config_file fsdp_config.yaml<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1dbl6fp">You’ll be asked a few questions about your setup, and configure the following arguments. In this example, you’ll answer the questionnaire as shown in the image below.</p> <div class="flex justify-center" data-svelte-h="svelte-j046au"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/fsdp-peft-config.png"></div> <small data-svelte-h="svelte-rynbwn">Creating Accelerate's config to use FSDP</small> <p data-svelte-h="svelte-1iebck7">Once this is done, the corresponding config should look like below and you can find it in config folder at <a href="https://github.com/huggingface/peft/blob/main/examples/sft/configs/fsdp_config.yaml" rel="nofollow">fsdp_config.yaml</a>:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-attr">compute_environment:</span> <span class="hljs-string">LOCAL_MACHINE</span> | |
| <span class="hljs-attr">debug:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">distributed_type:</span> <span class="hljs-string">FSDP</span> | |
| <span class="hljs-attr">downcast_bf16:</span> <span class="hljs-string">'no'</span> | |
| <span class="hljs-attr">fsdp_config:</span> | |
| <span class="hljs-attr">fsdp_auto_wrap_policy:</span> <span class="hljs-string">TRANSFORMER_BASED_WRAP</span> | |
| <span class="hljs-attr">fsdp_backward_prefetch:</span> <span class="hljs-string">BACKWARD_PRE</span> | |
| <span class="hljs-attr">fsdp_cpu_ram_efficient_loading:</span> <span class="hljs-literal">true</span> | |
| <span class="hljs-attr">fsdp_forward_prefetch:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">fsdp_offload_params:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">fsdp_sharding_strategy:</span> <span class="hljs-string">FULL_SHARD</span> | |
| <span class="hljs-attr">fsdp_state_dict_type:</span> <span class="hljs-string">SHARDED_STATE_DICT</span> | |
| <span class="hljs-attr">fsdp_sync_module_states:</span> <span class="hljs-literal">true</span> | |
| <span class="hljs-attr">fsdp_use_orig_params:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">machine_rank:</span> <span class="hljs-number">0</span> | |
| <span class="hljs-attr">main_training_function:</span> <span class="hljs-string">main</span> | |
| <span class="hljs-attr">mixed_precision:</span> <span class="hljs-string">bf16</span> | |
| <span class="hljs-attr">num_machines:</span> <span class="hljs-number">1</span> | |
| <span class="hljs-attr">num_processes:</span> <span class="hljs-number">8</span> | |
| <span class="hljs-attr">rdzv_backend:</span> <span class="hljs-string">static</span> | |
| <span class="hljs-attr">same_network:</span> <span class="hljs-literal">true</span> | |
| <span class="hljs-attr">tpu_env:</span> [] | |
| <span class="hljs-attr">tpu_use_cluster:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">tpu_use_sudo:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">use_cpu:</span> <span class="hljs-literal">false</span><!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="launch-command" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#launch-command"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Launch command</span></h2> <p data-svelte-h="svelte-1vkau4u">The launch command is available at <a href="https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_fsdp.sh" rel="nofollow">run_peft_fsdp.sh</a> and it is also shown below:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->accelerate launch --config_file <span class="hljs-string">"configs/fsdp_config.yaml"</span> train.py \ | |
| --seed 100 \ | |
| --model_name_or_path <span class="hljs-string">"meta-llama/Llama-2-70b-hf"</span> \ | |
| --dataset_name <span class="hljs-string">"smangrul/ultrachat-10k-chatml"</span> \ | |
| --chat_template_format <span class="hljs-string">"chatml"</span> \ | |
| --add_special_tokens False \ | |
| --append_concat_token False \ | |
| --splits <span class="hljs-string">"train,test"</span> \ | |
| --max_seq_len 2048 \ | |
| --num_train_epochs 1 \ | |
| --logging_steps 5 \ | |
| --log_level <span class="hljs-string">"info"</span> \ | |
| --logging_strategy <span class="hljs-string">"steps"</span> \ | |
| --eval_strategy <span class="hljs-string">"epoch"</span> \ | |
| --save_strategy <span class="hljs-string">"epoch"</span> \ | |
| --push_to_hub \ | |
| --hub_private_repo True \ | |
| --hub_strategy <span class="hljs-string">"every_save"</span> \ | |
| --bf16 True \ | |
| --packing True \ | |
| --learning_rate 1e-4 \ | |
| --lr_scheduler_type <span class="hljs-string">"cosine"</span> \ | |
| --weight_decay 1e-4 \ | |
| --warmup_steps 0 \ | |
| --max_grad_norm 1.0 \ | |
| --output_dir <span class="hljs-string">"llama-sft-lora-fsdp"</span> \ | |
| --per_device_train_batch_size 8 \ | |
| --per_device_eval_batch_size 8 \ | |
| --gradient_accumulation_steps 4 \ | |
| --gradient_checkpointing True \ | |
| --use_reentrant False \ | |
| --dataset_text_field <span class="hljs-string">"content"</span> \ | |
| --use_flash_attn True \ | |
| --use_peft_lora True \ | |
| --lora_r 8 \ | |
| --lora_alpha 16 \ | |
| --lora_dropout 0.1 \ | |
| --lora_target_modules <span class="hljs-string">"all-linear"</span> \ | |
| --use_4bit_quantization False<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1ymw855">Notice that we are using LoRA with rank=8, alpha=16 and targeting all linear layers. We are passing the FSDP config file and finetuning the 70B Llama model on a subset of the <a href="https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k" rel="nofollow">ultrachat dataset</a>.</p> <h2 class="relative group"><a id="the-important-parts" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#the-important-parts"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>The important parts</span></h2> <p data-svelte-h="svelte-y7jq4k">Let’s dive a little deeper into the script so you can see what’s going on, and understand how it works.</p> <p data-svelte-h="svelte-187kjdq">The first thing to know is that the script uses FSDP for distributed training as the FSDP config has been passed. The <a href="https://huggingface.co/docs/trl/main/en/sft_trainer#trl.SFTTrainer" rel="nofollow">SFTTrainer</a> class handles all the heavy lifting of creating PEFT model using the peft config that is passed. After that when you call <code>trainer.train()</code>, Trainer internally uses 🤗 Accelerate to prepare model, optimizer and trainer using the FSDP config to create FSDP wrapped model which is then trained. The main code snippet is below:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-comment"># trainer</span> | |
| trainer = SFTTrainer( | |
| model=model, | |
| processing_class=tokenizer, | |
| args=training_args, | |
| train_dataset=train_dataset, | |
| eval_dataset=eval_dataset, | |
| peft_config=peft_config, | |
| ) | |
| trainer.accelerator.<span class="hljs-built_in">print</span>(<span class="hljs-string">f"<span class="hljs-subst">{trainer.model}</span>"</span>) | |
| <span class="hljs-keyword">if</span> model_args.use_peft_lora: | |
| <span class="hljs-comment"># handle PEFT+FSDP case</span> | |
| trainer.model.print_trainable_parameters() | |
| <span class="hljs-keyword">if</span> <span class="hljs-built_in">getattr</span>(trainer.accelerator.state, <span class="hljs-string">"fsdp_plugin"</span>, <span class="hljs-literal">None</span>): | |
| <span class="hljs-keyword">from</span> peft.utils.other <span class="hljs-keyword">import</span> fsdp_auto_wrap_policy | |
| fsdp_plugin = trainer.accelerator.state.fsdp_plugin | |
| fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(trainer.model) | |
| <span class="hljs-comment"># train</span> | |
| checkpoint = <span class="hljs-literal">None</span> | |
| <span class="hljs-keyword">if</span> training_args.resume_from_checkpoint <span class="hljs-keyword">is</span> <span class="hljs-keyword">not</span> <span class="hljs-literal">None</span>: | |
| checkpoint = training_args.resume_from_checkpoint | |
| trainer.train(resume_from_checkpoint=checkpoint) | |
| <span class="hljs-comment"># saving final model</span> | |
| <span class="hljs-keyword">if</span> trainer.is_fsdp_enabled: | |
| trainer.accelerator.state.fsdp_plugin.set_state_dict_type(<span class="hljs-string">"FULL_STATE_DICT"</span>) | |
| trainer.save_model()<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-w1wqr8">Here, one main thing to note currently when using FSDP with PEFT is that <code>use_orig_params</code> needs to be <code>False</code> to realize GPU memory savings. Due to <code>use_orig_params=False</code>, the auto wrap policy for FSDP needs to change so that trainable and non-trainable parameters are wrapped separately. This is done by the code snippt below which uses the util function <code>fsdp_auto_wrap_policy</code> from PEFT:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->if getattr(trainer.accelerator.<span class="hljs-keyword">state</span>, <span class="hljs-string">"fsdp_plugin"</span>, None): | |
| <span class="hljs-keyword">from</span> peft.utils.other import fsdp_auto_wrap_policy | |
| fsdp_plugin = trainer.accelerator.<span class="hljs-keyword">state</span>.fsdp_plugin | |
| fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(trainer.model)<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="memory-usage" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#memory-usage"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Memory usage</span></h2> <p data-svelte-h="svelte-1l81qml">In the above example, the memory consumed per GPU is 72-80 GB (90-98%) as seen in the screenshot below. The slight increase in GPU memory at the end is when saving the model using <code>FULL_STATE_DICT</code> state dict type instead of the <code>SHARDED_STATE_DICT</code> so that the model has adapter weights that can be loaded normally with <code>from_pretrained</code> method during inference:</p> <div class="flex justify-center" data-svelte-h="svelte-1w1ze4b"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/peft_fsdp_mem_usage.png"></div> <small data-svelte-h="svelte-8v78s3">GPU memory usage for the training run</small> <h1 class="relative group"><a id="use-peft-qlora-and-fsdp-for-finetuning-large-models-on-multiple-gpus" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#use-peft-qlora-and-fsdp-for-finetuning-large-models-on-multiple-gpus"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Use PEFT QLoRA and FSDP for finetuning large models on multiple GPUs</span></h1> <p data-svelte-h="svelte-1idfuez">In this section, we will look at how to use QLoRA and FSDP for finetuning 70B llama model on 2X24GB GPUs. <a href="https://www.answer.ai/" rel="nofollow">Answer.AI</a> in collaboration with bitsandbytes and Hugging Face 🤗 open sourced code enabling the usage of FSDP+QLoRA and explained the whole process in their insightful blogpost <a href="https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html" rel="nofollow">You can now train a 70b language model at home</a>. This is now integrated in Hugging Face ecosystem.</p> <p data-svelte-h="svelte-nmznmj">For this, we first need <code>bitsandbytes>=0.43.3</code>, <code>accelerate>=1.0.1</code>, <code>transformers>4.44.2</code>, <code>trl>0.11.4</code> and <code>peft>0.13.0</code>. We need to set <code>fsdp_cpu_ram_efficient_loading=true</code>, <code>fsdp_use_orig_params=false</code> and <code>fsdp_offload_params=true</code>(cpu offloading) when using Accelerate config. When not using accelerate launcher, you can alternately set the environment variable <code>export FSDP_CPU_RAM_EFFICIENT_LOADING=true</code>. Here, we will be using accelerate config and below is the config which can be found at <a href="https://github.com/huggingface/peft/blob/main/examples/sft/configs/fsdp_config_qlora.yaml" rel="nofollow">fsdp_config_qlora.yaml</a>:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-attr">compute_environment:</span> <span class="hljs-string">LOCAL_MACHINE</span> | |
| <span class="hljs-attr">debug:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">distributed_type:</span> <span class="hljs-string">FSDP</span> | |
| <span class="hljs-attr">downcast_bf16:</span> <span class="hljs-string">'no'</span> | |
| <span class="hljs-attr">fsdp_config:</span> | |
| <span class="hljs-attr">fsdp_auto_wrap_policy:</span> <span class="hljs-string">TRANSFORMER_BASED_WRAP</span> | |
| <span class="hljs-attr">fsdp_backward_prefetch:</span> <span class="hljs-string">BACKWARD_PRE</span> | |
| <span class="hljs-attr">fsdp_cpu_ram_efficient_loading:</span> <span class="hljs-literal">true</span> | |
| <span class="hljs-attr">fsdp_forward_prefetch:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">fsdp_offload_params:</span> <span class="hljs-literal">true</span> | |
| <span class="hljs-attr">fsdp_sharding_strategy:</span> <span class="hljs-string">FULL_SHARD</span> | |
| <span class="hljs-attr">fsdp_state_dict_type:</span> <span class="hljs-string">SHARDED_STATE_DICT</span> | |
| <span class="hljs-attr">fsdp_sync_module_states:</span> <span class="hljs-literal">true</span> | |
| <span class="hljs-attr">fsdp_use_orig_params:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">machine_rank:</span> <span class="hljs-number">0</span> | |
| <span class="hljs-attr">main_training_function:</span> <span class="hljs-string">main</span> | |
| <span class="hljs-attr">mixed_precision:</span> <span class="hljs-string">'no'</span> | |
| <span class="hljs-attr">num_machines:</span> <span class="hljs-number">1</span> | |
| <span class="hljs-attr">num_processes:</span> <span class="hljs-number">2</span> | |
| <span class="hljs-attr">rdzv_backend:</span> <span class="hljs-string">static</span> | |
| <span class="hljs-attr">same_network:</span> <span class="hljs-literal">true</span> | |
| <span class="hljs-attr">tpu_env:</span> [] | |
| <span class="hljs-attr">tpu_use_cluster:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">tpu_use_sudo:</span> <span class="hljs-literal">false</span> | |
| <span class="hljs-attr">use_cpu:</span> <span class="hljs-literal">false</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-afuz8n">Launch command is given below which is available at <a href="https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_qlora_fsdp.sh" rel="nofollow">run_peft_qlora_fsdp.sh</a>:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->accelerate launch <span class="hljs-attr">--config_file</span> <span class="hljs-string">"configs/fsdp_config_qlora.yaml"</span> train<span class="hljs-selector-class">.py</span> \ | |
| <span class="hljs-attr">--seed</span> <span class="hljs-number">100</span> \ | |
| <span class="hljs-attr">--model_name_or_path</span> <span class="hljs-string">"meta-llama/Llama-2-70b-hf"</span> \ | |
| <span class="hljs-attr">--dataset_name</span> <span class="hljs-string">"smangrul/ultrachat-10k-chatml"</span> \ | |
| <span class="hljs-attr">--chat_template_format</span> <span class="hljs-string">"chatml"</span> \ | |
| <span class="hljs-attr">--add_special_tokens</span> False \ | |
| <span class="hljs-attr">--append_concat_token</span> False \ | |
| <span class="hljs-attr">--splits</span> <span class="hljs-string">"train,test"</span> \ | |
| <span class="hljs-attr">--max_seq_len</span> <span class="hljs-number">2048</span> \ | |
| <span class="hljs-attr">--num_train_epochs</span> <span class="hljs-number">1</span> \ | |
| <span class="hljs-attr">--logging_steps</span> <span class="hljs-number">5</span> \ | |
| <span class="hljs-attr">--log_level</span> <span class="hljs-string">"info"</span> \ | |
| <span class="hljs-attr">--logging_strategy</span> <span class="hljs-string">"steps"</span> \ | |
| <span class="hljs-attr">--eval_strategy</span> <span class="hljs-string">"epoch"</span> \ | |
| <span class="hljs-attr">--save_strategy</span> <span class="hljs-string">"epoch"</span> \ | |
| <span class="hljs-attr">--push_to_hub</span> \ | |
| <span class="hljs-attr">--hub_private_repo</span> True \ | |
| <span class="hljs-attr">--hub_strategy</span> <span class="hljs-string">"every_save"</span> \ | |
| <span class="hljs-attr">--bf16</span> True \ | |
| <span class="hljs-attr">--packing</span> True \ | |
| <span class="hljs-attr">--learning_rate</span> <span class="hljs-number">1</span>e-<span class="hljs-number">4</span> \ | |
| <span class="hljs-attr">--lr_scheduler_type</span> <span class="hljs-string">"cosine"</span> \ | |
| <span class="hljs-attr">--weight_decay</span> <span class="hljs-number">1</span>e-<span class="hljs-number">4</span> \ | |
| <span class="hljs-attr">--warmup_steps</span> <span class="hljs-number">0</span> \ | |
| <span class="hljs-attr">--max_grad_norm</span> <span class="hljs-number">1.0</span> \ | |
| <span class="hljs-attr">--output_dir</span> <span class="hljs-string">"llama-sft-qlora-fsdp"</span> \ | |
| <span class="hljs-attr">--per_device_train_batch_size</span> <span class="hljs-number">2</span> \ | |
| <span class="hljs-attr">--per_device_eval_batch_size</span> <span class="hljs-number">2</span> \ | |
| <span class="hljs-attr">--gradient_accumulation_steps</span> <span class="hljs-number">2</span> \ | |
| <span class="hljs-attr">--gradient_checkpointing</span> True \ | |
| <span class="hljs-attr">--use_reentrant</span> True \ | |
| <span class="hljs-attr">--dataset_text_field</span> <span class="hljs-string">"content"</span> \ | |
| <span class="hljs-attr">--use_flash_attn</span> True \ | |
| <span class="hljs-attr">--use_peft_lora</span> True \ | |
| <span class="hljs-attr">--lora_r</span> <span class="hljs-number">8</span> \ | |
| <span class="hljs-attr">--lora_alpha</span> <span class="hljs-number">16</span> \ | |
| <span class="hljs-attr">--lora_dropout</span> <span class="hljs-number">0.1</span> \ | |
| <span class="hljs-attr">--lora_target_modules</span> <span class="hljs-string">"all-linear"</span> \ | |
| <span class="hljs-attr">--use_4bit_quantization</span> True \ | |
| <span class="hljs-attr">--use_nested_quant</span> True \ | |
| <span class="hljs-attr">--bnb_4bit_compute_dtype</span> <span class="hljs-string">"bfloat16"</span> \ | |
| <span class="hljs-attr">--bnb_4bit_quant_storage_dtype</span> <span class="hljs-string">"bfloat16"</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1t3rorv">Notice the new argument being passed, <code>bnb_4bit_quant_storage_dtype</code>, which denotes the data type for packing the 4-bit parameters. For example, when it is set to <code>bfloat16</code>, <strong>16/4 = 4</strong> 4-bit params are packed together post quantization. When using mixed precision training with <code>bfloat16</code>, <code>bnb_4bit_quant_storage_dtype</code> can be either <code>bfloat16</code> for pure <code>bfloat16</code> finetuning, or <code>float32</code> for automatic mixed precision (this consumes more GPU memory). When using mixed precision training with <code>float16</code>, <code>bnb_4bit_quant_storage_dtype</code> should be set to <code>float32</code> for stable automatic mixed precision training.</p> <p data-svelte-h="svelte-19s9lyl">In terms of training code, the important code changes are:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->... | |
| bnb_config = BitsAndBytesConfig( | |
| load_in_4bit=args.use_4bit_quantization, | |
| bnb_4bit_quant_type=args.bnb_4bit_quant_type, | |
| bnb_4bit_compute_dtype=compute_dtype, | |
| bnb_4bit_use_double_quant=args.use_nested_quant, | |
| <span class="hljs-addition">+ bnb_4bit_quant_storage=quant_storage_dtype,</span> | |
| ) | |
| ... | |
| model = AutoModelForCausalLM.from_pretrained( | |
| args.model_name_or_path, | |
| quantization_config=bnb_config, | |
| trust_remote_code=True, | |
| attn_implementation="flash_attention_2" if args.use_flash_attn else "eager", | |
| <span class="hljs-addition">+ dtype=quant_storage_dtype or torch.float32,</span> | |
| )<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-46tfpq">Notice that <code>dtype</code> for <code>AutoModelForCausalLM</code> is same as the <code>bnb_4bit_quant_storage</code> data type. That’s it. Everything else is handled by Trainer and TRL.</p> <h2 class="relative group"><a id="memory-usage" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#memory-usage"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Memory usage</span></h2> <p data-svelte-h="svelte-1e2ovrm">In the above example, the memory consumed per GPU is <strong>19.6 GB</strong> while CPU RAM usage is around <strong>107 GB</strong>. When disabling CPU offloading, the GPU memory usage is <strong>35.6 GB/ GPU</strong>. Therefore, what took 16X80GB GPUs for full finetuning, 8X80GB GPUs with FSDP+LoRA, and a couple of 80GB GPUs with DDP+QLoRA, now requires 2X24GB GPUs. This makes finetuning of large models more accessible.</p> <h2 class="relative group"><a id="more-resources" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#more-resources"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>More resources</span></h2> <p data-svelte-h="svelte-yjfr5u">You can also refer the <a href="https://github.com/facebookresearch/llama-recipes/?tab=readme-ov-file#fine-tuning" rel="nofollow">llama-recipes</a> repo and <a href="https://llama.meta.com/get-started/#fine-tuning" rel="nofollow">Getting started with Llama</a> guide on how to finetune using FSDP and PEFT.</p> <h2 class="relative group"><a id="caveats" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#caveats"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Caveats</span></h2> <ol data-svelte-h="svelte-1tvr0x6"><li>Merging when using PEFT and FSDP is currently unsupported and will raise error.</li> <li>Passing <code>modules_to_save</code> config parameter to is untested at present.</li> <li>GPU Memory saving when using CPU Offloading is untested at present.</li> <li>When using FSDP+QLoRA, <code>paged_adamw_8bit</code> currently results in an error when saving a checkpoint.</li> <li>DoRA training with FSDP should work (albeit at lower speed than LoRA). If combined with bitsandbytes (QDoRA), 4-bit quantization should also work, but 8-bit quantization has known issues and is not recommended.</li></ol> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/peft/blob/main/docs/source/accelerate/fsdp.md" target="_blank"><svg class="mr-1" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M31,16l-7,7l-1.41-1.41L28.17,16l-5.58-5.59L24,9l7,7z"></path><path d="M1,16l7-7l1.41,1.41L3.83,16l5.58,5.59L8,23l-7-7z"></path><path d="M12.419,25.484L17.639,6.552l1.932,0.518L14.351,26.002z"></path></svg> <span data-svelte-h="svelte-zjs2n5"><span class="underline">Update</span> on GitHub</span></a> <p></p> | |
| <script> | |
| { | |
| __sveltekit_1wz1iqk = { | |
| assets: "/docs/peft/pr_3207/en", | |
| base: "/docs/peft/pr_3207/en", | |
| env: {} | |
| }; | |
| const element = document.currentScript.parentElement; | |
| const data = [null,null]; | |
| Promise.all([ | |
| import("/docs/peft/pr_3207/en/_app/immutable/entry/start.dbacf4b8.js"), | |
| import("/docs/peft/pr_3207/en/_app/immutable/entry/app.cb03e0d4.js") | |
| ]).then(([kit, app]) => { | |
| kit.start(app, element, { | |
| node_ids: [0, 3], | |
| data, | |
| form: null, | |
| error: null | |
| }); | |
| }); | |
| } | |
| </script> | |
Xet Storage Details
- Size:
- 47.6 kB
- Xet hash:
- 490500c528c82b72d3fb4e561b69ca04af022d1e1b432812f197d9c5b35ce3db
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.