<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Reducing Memory Usage&quot;,&quot;local&quot;:&quot;reducing-memory-usage&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;Truncation&quot;,&quot;local&quot;:&quot;truncation&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Packing&quot;,&quot;local&quot;:&quot;packing&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<link href="/docs/trl/pr_2506/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/entry/start.7148be0f.js">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/chunks/scheduler.d627b047.js">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/chunks/singletons.b4b5b8ec.js">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/chunks/index.a57a1c33.js">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/chunks/paths.8e429b42.js">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/entry/app.d5fbe210.js">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/chunks/index.73c51727.js">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/nodes/0.386dff60.js">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/nodes/36.1a226c9b.js">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/chunks/Tip.a82942ec.js">
<link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/chunks/CodeBlock.b1cdc5f6.js">
| <link rel="modulepreload" href="/docs/trl/pr_2506/en/_app/immutable/chunks/EditOnGithub.859b9ebc.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Reducing Memory Usage","local":"reducing-memory-usage","sections":[{"title":"Truncation","local":"truncation","sections":[],"depth":2},{"title":"Packing","local":"packing","sections":[],"depth":2}],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="reducing-memory-usage" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#reducing-memory-usage"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Reducing Memory Usage</span></h1> <div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400"><p data-svelte-h="svelte-txx63b">Section under construction. 
Feel free to contribute!</p></div> <h2 class="relative group"><a id="truncation" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#truncation"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Truncation</span></h2> <p data-svelte-h="svelte-uq0s5j">Sequence lengths in the dataset can vary widely, and by default, TRL does not modify the data. When data is batched, sequences are padded to match the longest one in the batch, which can cause high memory usage, even if most sequences are relatively short.</p> <div class="flex justify-center" data-svelte-h="svelte-4pi4g0"><img src="https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/why_you_should_truncate.png" alt="Truncation prompt completion" width="600"></div> <p data-svelte-h="svelte-kr68fk">To reduce memory usage, it’s important to truncate sequences to a reasonable length. Even discarding just a few tokens from the dataset can result in significant memory savings by minimizing unnecessary padding. Truncation is a good practice and should always be applied to ensure efficient use of resources. 
While the truncation limit doesn’t need to be overly restrictive, setting a sensible value is essential for optimal performance.</p> <div class="flex space-x-2 items-center my-1.5 mr-8 h-7 !pl-0 -mx-3 md:mx-0"><div class="flex items-center border rounded-lg px-1.5 py-1 leading-none select-none text-smd border-gray-800 bg-black dark:bg-gray-700 text-white">DPO </div><div class="flex items-center border rounded-lg px-1.5 py-1 leading-none select-none text-smd text-gray-500 cursor-pointer opacity-90 hover:text-gray-700 dark:hover:text-gray-200 hover:shadow-sm">SFT </div></div> <div class="language-select"><p data-svelte-h="svelte-cly35y">DPO truncation is applied first to the prompt and to the completion via the <code>max_prompt_length</code> and <code>max_completion_length</code> parameters. The <code>max_length</code> parameter is then used to truncate the resulting sequence.</p> <div class="flex justify-center" data-svelte-h="svelte-1f9vb58"><img src="https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/truncation_prompt_completion.png" alt="Truncation prompt completion" width="600"></div> <p data-svelte-h="svelte-1r0zv8c">To set the truncation parameters, use the following code snippet:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div 
class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> trl <span class="hljs-keyword">import</span> DPOConfig
training_args = DPOConfig(..., max_prompt_length=..., max_completion_length=..., max_length=...)<!-- HTML_TAG_END --></pre></div> </div> <h2 class="relative group"><a id="packing" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#packing"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Packing</span></h2> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-o7a208">This technique applies only to SFT.</p></div> <p data-svelte-h="svelte-6rm19q"><a href="#truncation">Truncation</a> has several drawbacks:</p> <ol data-svelte-h="svelte-17zhqal"><li><strong>Loss of information</strong>: Key data at the end of a sequence may be discarded.</li> <li><strong>Choosing truncation length</strong>: Too short loses data; too long undermines efficiency.</li></ol> <p data-svelte-h="svelte-1016bgl">Packing, introduced in <a href="https://huggingface.co/papers/1910.10683" rel="nofollow">Raffel et al., 2020</a>, addresses these issues by 
grouping sequences instead of truncating. It concatenates and splits dataset sequences into the desired lengths.</p> <div class="flex justify-center" data-svelte-h="svelte-ah8vi2"><img src="https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/packing.png" alt="Packing" width="600"></div> <p data-svelte-h="svelte-1c5au0w">Packing eliminates padding, preserves all sequence information, and allows for flexible sequence lengths, making it a more efficient alternative to truncation. To enable packing, use <code>packing=True</code> in the <a href="/docs/trl/pr_2506/en/sft_trainer#trl.SFTConfig">SFTConfig</a>:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> trl <span class="hljs-keyword">import</span> SFTConfig
training_args = SFTConfig(..., packing=<span class="hljs-literal">True</span>, max_seq_length=<span class="hljs-number">512</span>)<!-- HTML_TAG_END --></pre></div> <div class="course-tip course-tip-orange bg-gradient-to-br dark:bg-gradient-to-r before:border-orange-500 dark:before:border-orange-800 from-orange-50 dark:from-gray-900 to-white dark:to-gray-950 border border-orange-50 text-orange-700 dark:text-gray-400"><p data-svelte-h="svelte-6zyodn">Packing may cause batch contamination, where adjacent sequences influence one another. This can be problematic for some applications. For more details, see <a href="https://github.com/huggingface/trl/issues/1230" rel="nofollow">#1230</a>.</p></div> <p></p>
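<p>The difference between padding and packing can be illustrated with a small, self-contained sketch. It is not TRL's implementation: the helper names <code>padded_token_count</code> and <code>pack</code>, the toy token IDs, and the choice to keep the final short chunk are all assumptions made for illustration.</p>

```python
from itertools import chain


def padded_token_count(sequences, batch_size):
    """Count tokens processed when each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        total += max(len(seq) for seq in batch) * len(batch)
    return total


def pack(sequences, max_seq_length):
    """Concatenate all sequences, then split them into chunks of max_seq_length.

    The last chunk may be shorter; this sketch keeps it rather than dropping it.
    """
    flat = list(chain.from_iterable(sequences))
    return [flat[i:i + max_seq_length] for i in range(0, len(flat), max_seq_length)]


# Hypothetical token-id sequences of very different lengths.
sequences = [[1] * 3, [2] * 10, [3] * 2, [4] * 5]

real_tokens = sum(len(seq) for seq in sequences)
padded_tokens = padded_token_count(sequences, batch_size=2)
packed = pack(sequences, max_seq_length=4)

print("real tokens:", real_tokens)
print("tokens processed with padding:", padded_tokens)
print("packed chunks:", packed)
```

<p>In this toy example the padded batches process more tokens than the dataset actually contains, while the packed chunks contain only real tokens. The first packed chunk also mixes tokens from two different source sequences, which is the situation the contamination warning above refers to.</p>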
<script>
	{
		__sveltekit_1t0aie3 = {
			assets: "/docs/trl/pr_2506/en",
			base: "/docs/trl/pr_2506/en",
			env: {}
		};
		const element = document.currentScript.parentElement;
		const data = [null,null];
		Promise.all([
			import("/docs/trl/pr_2506/en/_app/immutable/entry/start.7148be0f.js"),
			import("/docs/trl/pr_2506/en/_app/immutable/entry/app.d5fbe210.js")
		]).then(([kit, app]) => {
			kit.start(app, element, {
				node_ids: [0, 36],
				data,
				form: null,
				error: null
			});
		});
	}
</script>