Buckets:
| <meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"Low Precision Training Methods","local":"low-precision-training-methods","sections":[{"title":"What training on FP8 means","local":"what-training-on-fp8-means","sections":[],"depth":2},{"title":"Configuring the Accelerator","local":"configuring-the-accelerator","sections":[],"depth":2},{"title":"Configuring MS-AMP","local":"configuring-ms-amp","sections":[],"depth":2},{"title":"Configuring TransformersEngine","local":"configuring-transformersengine","sections":[],"depth":2},{"title":"Configuring torchao","local":"configuring-torchao","sections":[],"depth":2},{"title":"Example Zoo","local":"example-zoo","sections":[],"depth":2},{"title":"Further Reading","local":"further-reading","sections":[],"depth":2}],"depth":1}"> | |
| <link href="/docs/accelerate/pr_3469/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/entry/start.6d95f9ad.js"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/chunks/scheduler.defa9a21.js"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/chunks/singletons.4bb598a6.js"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/chunks/index.beade68d.js"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/chunks/paths.b686be03.js"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/entry/app.7adfeecf.js"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/chunks/index.fe795e71.js"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/nodes/0.fe70ae86.js"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/chunks/each.e59479a4.js"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/nodes/47.19f03227.js"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/chunks/CodeBlock.204b6c34.js"> | |
| <link rel="modulepreload" href="/docs/accelerate/pr_3469/en/_app/immutable/chunks/index.207e1174.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Low Precision Training Methods","local":"low-precision-training-methods","sections":[{"title":"What training on FP8 means","local":"what-training-on-fp8-means","sections":[],"depth":2},{"title":"Configuring the Accelerator","local":"configuring-the-accelerator","sections":[],"depth":2},{"title":"Configuring MS-AMP","local":"configuring-ms-amp","sections":[],"depth":2},{"title":"Configuring TransformersEngine","local":"configuring-transformersengine","sections":[],"depth":2},{"title":"Configuring torchao","local":"configuring-torchao","sections":[],"depth":2},{"title":"Example Zoo","local":"example-zoo","sections":[],"depth":2},{"title":"Further Reading","local":"further-reading","sections":[],"depth":2}],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="low-precision-training-methods" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#low-precision-training-methods"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Low Precision Training Methods</span></h1> <p data-svelte-h="svelte-bx5fz6">Accelerate provides integrations to train on lower precision methods using specified supported hardware through the <code>TransformersEngine</code>, <code>MS-AMP</code>, and <code>torchao</code> packages. This documentation will help guide you through what hardware is supported, how to configure your <a href="/docs/accelerate/pr_3469/en/package_reference/accelerator#accelerate.Accelerator">Accelerator</a> to leverage the low precision methods, and what you can expect when training.</p> <h2 class="relative group"><a id="what-training-on-fp8-means" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#what-training-on-fp8-means"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>What training on FP8 means</span></h2> <p data-svelte-h="svelte-wuhpuo">To explore more of the nitty-gritty in training in FP8 with PyTorch and Accelerate, check out the <a href="../concept_guides/low_precision_training">concept_guide</a> on why this can be difficult. But essentially rather than training in BF16, some (or all) aspects of training a model can be performed using 8 bits instead of 16. The challenge is doing so without degrading final performance.</p> <p data-svelte-h="svelte-10cwb11">This is only enabled on specific NVIDIA hardware, namely:</p> <ul data-svelte-h="svelte-5d1df8"><li>Anything after the 3000 series consumer graphics cards (such as the 4090)</li> <li>Hopper-based GPU architectures (such as the <code>H100</code> and <code>H200</code>)</li></ul> <p data-svelte-h="svelte-149dy0g">What this will result in is some reduction in the memory used (as we’ve cut the needed memory in half for some parts of training) and an increase in throughput <em>should</em> be seen as well for larger models that can replace certain layers with FP8-enabled ones.</p> <h2 class="relative group"><a id="configuring-the-accelerator" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#configuring-the-accelerator"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Configuring the Accelerator</span></h2> <p data-svelte-h="svelte-1ytxd1j">Currently three different backends for FP8 are supported (<code>TransformersEngine</code>, <code>torchao</code>, and <code>MS-AMP</code>), each with different capabilities and configurations.</p> <p data-svelte-h="svelte-bpqomw">To use either, the same core API is used. Just pass <code>mixed_precision="fp8"</code> to either the <a href="/docs/accelerate/pr_3469/en/package_reference/accelerator#accelerate.Accelerator">Accelerator</a>, during <code>accelerate config</code> when prompted about mixed precision, or as part of your <code>config.yaml</code> file in the <code>mixed_precision</code> key:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->from accelerate import Accelerator | |
| <span class="hljs-attribute">accelerator</span> <span class="hljs-operator">=</span> Accelerator(mixed_precision<span class="hljs-operator">=</span><span class="hljs-string">"fp8"</span>)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-hoblda">By default, if <code>MS-AMP</code> is available in your environment, Accelerate will automatically utilize it as a backend. To specify it yourself (and customize other parts of the FP8 mixed precision setup), you can utilize one of the <code>RecipeKwargs</code> dataclasses such as <code>utils.AORecipeKwargs</code>, <code>utils.TERecipeKwargs</code>, or <code>utils.MSAMPRecipeKwargs</code>; you can also clarify it in your config <code>yaml</code>/during <code>accelerate launch</code>:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> accelerate import Accelerator | |
| <span class="hljs-keyword">from</span> accelerate.utils import MSAMPRecipeKwargs | |
| kwargs = [MSAMPRecipeKwargs()] | |
| <span class="hljs-comment"># Or to specify the backend as `TransformersEngine` even if MS-AMP is installed</span> | |
| <span class="hljs-comment"># kwargs = [TERecipeKwargs()]</span> | |
| <span class="hljs-comment"># Or to use torchao</span> | |
| <span class="hljs-comment"># kwargs = [AORecipeKwargs()]</span> | |
| accelerator = Accelerator(<span class="hljs-attribute">mixed_precision</span>=<span class="hljs-string">"fp8"</span>, <span class="hljs-attribute">kwarg_handlers</span>=kwargs)<!-- HTML_TAG_END --></pre></div> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-attr">mixed_precision:</span> <span class="hljs-string">fp8</span> | |
| <span class="hljs-attr">fp8_config:</span> | |
| <span class="hljs-attr">amax_compute_algo:</span> <span class="hljs-string">max</span> | |
| <span class="hljs-attr">amax_history_len:</span> <span class="hljs-number">1024</span> | |
| <span class="hljs-attr">backend:</span> <span class="hljs-string">TE</span> | |
| <span class="hljs-attr">fp8_format:</span> <span class="hljs-string">HYBRID</span> | |
| <span class="hljs-attr">interval:</span> <span class="hljs-number">1</span> | |
| <span class="hljs-attr">margin:</span> <span class="hljs-number">0</span> | |
| <span class="hljs-attr">override_linear_precision:</span> <span class="hljs-string">(false,</span> <span class="hljs-literal">false</span><span class="hljs-string">,</span> <span class="hljs-literal">false</span><span class="hljs-string">)</span> | |
| <span class="hljs-attr">use_autocast_during_eval:</span> <span class="hljs-literal">false</span><!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="configuring-ms-amp" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#configuring-ms-amp"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Configuring MS-AMP</span></h2> <p data-svelte-h="svelte-13jlsqr">Of the two, <code>MS-AMP</code> is traditionally the easier one to configure as there is only a single argument: the optimization level.</p> <p data-svelte-h="svelte-11bftkh">Currently two levels of optimization are supported in the Accelerate integration, <code>"O1"</code> and <code>"O2"</code> (using the letter ‘o’, not zero).</p> <ul data-svelte-h="svelte-ha185h"><li><code>"O1"</code> will cast the weight gradients and <code>all_reduce</code> communications to happen in 8-bit, while the rest are done in 16 bit. This reduces the general GPU memory usage and speeds up communication bandwidths.</li> <li><code>"O2"</code> will also cast first-order optimizer states into 8 bit, while the second order states are in FP16. (Currently just the <code>Adam</code> optimizer is supported). This tries its best to minimize final accuracy degradation and will save the highest potential memory.</li></ul> <p data-svelte-h="svelte-wx6vs8">To specify an optimization level, pass it to the <code>FP8KwargsHandler</code> by setting the <code>optimization_level</code> argument:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> accelerate import Accelerator | |
| <span class="hljs-keyword">from</span> accelerate.utils import FP8RecipeKwargs | |
| kwargs = [FP8RecipeKwargs(<span class="hljs-attribute">backend</span>=<span class="hljs-string">"msamp"</span>, <span class="hljs-attribute">optimization_level</span>=<span class="hljs-string">"O2"</span>)] | |
| accelerator = Accelerator(<span class="hljs-attribute">mixed_precision</span>=<span class="hljs-string">"fp8"</span>, <span class="hljs-attribute">kwarg_handlers</span>=kwargs)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-vo4vij">Or during <code>accelerate launch</code> via <code>--fp8_backend=msamp --fp8_opt_level=O2</code></p> <p data-svelte-h="svelte-1n0fir7">Similarly this can be set in your <code>config.yaml</code>:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-symbol">mixed_precision:</span> fp8 | |
| <span class="hljs-symbol">fp8_config:</span> | |
| <span class="hljs-symbol"> backend:</span> MSAMP | |
| <span class="hljs-symbol"> opt_level:</span> O2<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="configuring-transformersengine" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#configuring-transformersengine"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Configuring TransformersEngine</span></h2> <p data-svelte-h="svelte-1tlihxp">TransformersEngine has many options for customizing how and what FP8 calculations are performed. A full list of supported arguments and what they mean are available in <a href="https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/common.html" rel="nofollow">NVIDIA’s documentation</a>, however they are restated as part of <code>FP8KwargsHandler</code>’s docstring for your convenience.</p> <p data-svelte-h="svelte-wvn4fr">Accelerate tries to set sensible defaults, but exploring and tweaking the various parameters yourself can lead to better performance potentially.</p> <p data-svelte-h="svelte-8khoko">To use it, specify <code>backend="te"</code> and modify any of the arguments you want as part of your kwarg handler:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> accelerate import Accelerator | |
| <span class="hljs-keyword">from</span> accelerate.utils import FP8RecipeKwargs | |
| kwargs = [FP8RecipeKwargs(<span class="hljs-attribute">backend</span>=<span class="hljs-string">"te"</span>, <span class="hljs-built_in">..</span>.)] | |
| accelerator = Accelerator(<span class="hljs-attribute">mixed_precision</span>=<span class="hljs-string">"fp8"</span>, <span class="hljs-attribute">kwarg_handlers</span>=kwargs)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-99qxte">Or during <code>accelerate launch</code> via <code>--fp8_backend=te ...</code>. Use <code>accelerate launch --fp8_backend=te -h</code> to see relevent arguments.</p> <p data-svelte-h="svelte-1n0fir7">Similarly this can be set in your <code>config.yaml</code>:</p> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-attr">mixed_precision:</span> <span class="hljs-string">fp8</span> | |
| <span class="hljs-attr">fp8_config:</span> | |
| <span class="hljs-attr">amax_compute_algo:</span> <span class="hljs-string">max</span> | |
| <span class="hljs-attr">amax_history_len:</span> <span class="hljs-number">1024</span> | |
| <span class="hljs-attr">backend:</span> <span class="hljs-string">TE</span> | |
| <span class="hljs-attr">fp8_format:</span> <span class="hljs-string">HYBRID</span> | |
| <span class="hljs-attr">interval:</span> <span class="hljs-number">1</span> | |
| <span class="hljs-attr">margin:</span> <span class="hljs-number">0</span> | |
| <span class="hljs-attr">override_linear_precision:</span> <span class="hljs-string">(false,</span> <span class="hljs-literal">false</span><span class="hljs-string">,</span> <span class="hljs-literal">false</span><span class="hljs-string">)</span> | |
| <span class="hljs-attr">use_autocast_during_eval:</span> <span class="hljs-literal">false</span><!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="configuring-torchao" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#configuring-torchao"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Configuring torchao</span></h2> <p data-svelte-h="svelte-13u4x7b"><code>torchao</code> is a <a href="https://github.com/pytorch/ao/tree/main/torchao/float8" rel="nofollow">PyTorch-driven</a> hackable FP8 backend, aiming to be more approchable than the prior two engines. One of the core differences with <code>ao</code> compared to the prior two is that for numerical stability, it’s found to be generally better off keeping the first <em>and</em> last layers in the model at the regular precision (be it FP32 or BF16), and then the other layers quantized down to FP8. As a result, a config for <code>ao</code> looks a bit differently:</p> <blockquote data-svelte-h="svelte-rg03zf"><p>Note: this API is experimental and is subject to change</p></blockquote> <div class="code-block relative "><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> accelerate import Accelerator | |
| <span class="hljs-keyword">from</span> accelerate.utils import AORecipeKwargs | |
| kwargs = [AORecipeKwargs()] | |
| accelerator = Accelerator(<span class="hljs-attribute">mixed_precision</span>=<span class="hljs-string">"fp8"</span>, <span class="hljs-attribute">kwarg_handlers</span>=kwargs)<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1qqyqot">To learn more about the specific parameters to be used, please see the official <code>torchao</code> repo.</p> <h2 class="relative group"><a id="example-zoo" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#example-zoo"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Example Zoo</span></h2> <p data-svelte-h="svelte-1ay0trc">We have examples showcasing training with FP8 both with accelerate and its underlying implementation available in the accelerate repo. | |
| Currently we support scripts showcasing:</p> <ul data-svelte-h="svelte-1affbo7"><li>Single GPU</li> <li>Distributed Data Parallelism (Multi-GPU)</li> <li>Fully Sharded Data Parallelism</li> <li>DeepSpeed ZeRO 1 through 3</li></ul> <p data-svelte-h="svelte-sau342">Find out more <a href="https://github.com/huggingface/accelerate/tree/main/benchmarks/fp8" rel="nofollow">here</a></p> <h2 class="relative group"><a id="further-reading" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#further-reading"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Further Reading</span></h2> <p data-svelte-h="svelte-t5s4ol">To learn more about training in FP8 please check out the following resources:</p> <ul data-svelte-h="svelte-1gt16sy"><li><a href="../concept_guides/low_precision_training">Our concept guide</a> detailing into more about both TransformersEngine and MS-AMP</li> <li><a href="https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/api/common.html" rel="nofollow">The <code>transformers-engine</code> documentation</a></li> <li><a href="https://azure.github.io/MS-AMP/docs/" rel="nofollow">The <code>MS-AMP</code> documentation</a></li> <li><a href="https://github.com/pytorch/ao/tree/main/torchao/float8" rel="nofollow">The <code>torchao</code> documentation</a></li></ul> <a class="!text-gray-400 !no-underline text-sm flex items-center not-prose mt-4" href="https://github.com/huggingface/accelerate/blob/main/docs/source/usage_guides/low_precision_training.md" target="_blank"><span data-svelte-h="svelte-1kd6by1"><</span> <span data-svelte-h="svelte-x0xyl0">></span> <span data-svelte-h="svelte-1dajgef"><span class="underline ml-1.5">Update</span> on GitHub</span></a> <p></p> | |
| <script> | |
| { | |
| __sveltekit_1tfhs0h = { | |
| assets: "/docs/accelerate/pr_3469/en", | |
| base: "/docs/accelerate/pr_3469/en", | |
| env: {} | |
| }; | |
| const element = document.currentScript.parentElement; | |
| const data = [null,null]; | |
| Promise.all([ | |
| import("/docs/accelerate/pr_3469/en/_app/immutable/entry/start.6d95f9ad.js"), | |
| import("/docs/accelerate/pr_3469/en/_app/immutable/entry/app.7adfeecf.js") | |
| ]).then(([kit, app]) => { | |
| kit.start(app, element, { | |
| node_ids: [0, 47], | |
| data, | |
| form: null, | |
| error: null | |
| }); | |
| }); | |
| } | |
| </script> | |
Xet Storage Details
- Size:
- 35.3 kB
- Xet hash:
- 535d62d0f8caff29e7120293af17fb00412d282859424d48086bb9abaa915995
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.