<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;Intel® Extension for PyTorch&quot;,&quot;local&quot;:&quot;intel-extension-for-pytorch&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;IPEX installation:&quot;,&quot;local&quot;:&quot;ipex-installation&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;How It Works For Training optimization in CPU&quot;,&quot;local&quot;:&quot;how-it-works-for-training-optimization-in-cpu&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Related Resources&quot;,&quot;local&quot;:&quot;related-resources&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<p></p> <h1 class="relative group"><a id="intel-extension-for-pytorch" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#intel-extension-for-pytorch"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Intel® Extension for PyTorch</span></h1> <p data-svelte-h="svelte-ssu4xp"><a href="https://github.com/intel/intel-extension-for-pytorch" rel="nofollow">IPEX</a> is optimized for CPUs with AVX-512 or above, and functionally works for CPUs with only AVX2. 
As a result, Intel CPU generations with AVX-512 or above are expected to see a performance benefit, while CPUs with only AVX2 (e.g., AMD CPUs or older Intel CPUs) might see better performance under IPEX, but this is not guaranteed. IPEX provides performance optimizations for CPU training with both Float32 and BFloat16. The following sections focus on BFloat16.</p> <p data-svelte-h="svelte-8rqn29">The low-precision data type BFloat16 is natively supported on 3rd Generation Intel® Xeon® Scalable Processors (aka Cooper Lake) with the AVX-512 instruction set, and will be supported on the next generation of Intel® Xeon® Scalable Processors with the Intel® Advanced Matrix Extensions (Intel® AMX) instruction set, which boosts performance further. Auto Mixed Precision for the CPU backend has been enabled since PyTorch 1.10. At the same time, support for Auto Mixed Precision with BFloat16 on CPU and BFloat16 optimization of operators has been extensively enabled in Intel® Extension for PyTorch and partially upstreamed to the PyTorch master branch. 
Users can get better performance and user experience with IPEX Auto Mixed Precision.</p> <h2 class="relative group"><a id="ipex-installation" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#ipex-installation"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>IPEX installation:</span></h2> <p data-svelte-h="svelte-yt7jgu">IPEX releases follow PyTorch releases. Pick the IPEX version that matches your PyTorch version and install it via pip:</p> <table data-svelte-h="svelte-1f2b5tb"><thead><tr><th align="center">PyTorch Version</th> <th align="center">IPEX Version</th></tr></thead> <tbody><tr><td align="center">2.0</td> <td align="center">2.0.0</td></tr> <tr><td align="center">1.13</td> <td align="center">1.13.0</td></tr> <tr><td align="center">1.12</td> <td align="center">1.12.300</td></tr> <tr><td align="center">1.11</td> <td align="center">1.11.200</td></tr> <tr><td align="center">1.10</td> <td align="center">1.10.100</td></tr></tbody></table> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->pip install intel_extension_for_pytorch==&lt;version_name&gt; -f https://developer.intel.com/ipex-whl-stable-cpu<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-6x6sx">See the <a href="https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/installation.html" rel="nofollow">IPEX installation</a> guide for more approaches.</p> <h2 class="relative group"><a id="how-it-works-for-training-optimization-in-cpu" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#how-it-works-for-training-optimization-in-cpu"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>How It Works For Training optimization in CPU</span></h2> <p data-svelte-h="svelte-hew9g6">🤗 Accelerate has integrated <a href="https://github.com/intel/intel-extension-for-pytorch" rel="nofollow">IPEX</a>; all you need to do is enable it through the config.</p> <p data-svelte-h="svelte-1rh12ql"><strong>Scenario 1</strong>: Acceleration of non-distributed CPU training</p> <p data-svelte-h="svelte-6hcu3i">Run <u>accelerate config</u> on your machine:</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->$ accelerate config
-----------------------------------------------------------------------------------------------------------------------------------------------------------
In which compute environment are you running?
This machine
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon device is available)? [yes/NO]:yes
Do you want to use Intel PyTorch Extension (IPEX) to speed up training on CPU? [yes/NO]:yes
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
Do you want to use DeepSpeed? [yes/NO]: NO
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
bf16<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-xoeuxn">This will generate a config file that is used automatically to set the default options when running:</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->accelerate launch my_script.py --args_to_my_script<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-mfz1pv">For instance, here is how you would run the NLP example <code>examples/nlp_example.py</code> (from the root of the repo) with IPEX enabled.
The following <code>default_config.yaml</code> is generated by <code>accelerate config</code>:</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->compute_environment: LOCAL_MACHINE
distributed_type: <span class="hljs-string">'NO'</span>
downcast_bf16: <span class="hljs-string">'no'</span>
ipex_config:
  ipex: <span class="hljs-literal">true</span>
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: <span class="hljs-literal">true</span>
tpu_env: []
tpu_use_cluster: <span class="hljs-literal">false</span>
tpu_use_sudo: <span class="hljs-literal">false</span>
use_cpu: <span class="hljs-literal">true</span><!-- HTML_TAG_END --></pre></div> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->accelerate launch examples/nlp_example.py<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1sz9b1k"><strong>Scenario 2</strong>: Acceleration of distributed CPU training.
We use Intel oneCCL for communication, combined with the Intel® MPI library, to deliver flexible, efficient, and scalable cluster messaging on Intel® architecture. Refer to <a href="https://huggingface.co/docs/transformers/perf_train_cpu_many" rel="nofollow">this guide</a> for installation instructions.</p> <p data-svelte-h="svelte-1g9y35l">Run <u>accelerate config</u> on your machine (node0):</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->$ accelerate config
-----------------------------------------------------------------------------------------------------------------------------------------------------------
In which compute environment are you running?
This machine
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Which type of machine are you using?
multi-CPU
How many different machines will you use (use more than 1 for multi-node training)? [1]: 4
-----------------------------------------------------------------------------------------------------------------------------------------------------------
What is the rank of this machine?
0
What is the IP address of the machine that will host the main process? 36.112.23.24
What is the port you will use to communicate with the main process? 29500
Are all the machines on the same local network? Answer `no` if nodes are on the cloud and/or on different network hosts [YES/no]: yes
Do you want to use Intel PyTorch Extension (IPEX) to speed up training on CPU? [yes/NO]:yes
Do you want accelerate to launch mpirun? [yes/NO]: yes
Please enter the path to the hostfile to use with mpirun [~/hostfile]: ~/hostfile
Enter the number of oneCCL worker threads [1]: 1
Do you wish to optimize your script with torch dynamo?[yes/NO]:NO
How many processes should be used for distributed training? [1]:16
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
bf16<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1en3aqm">For instance, here is how you would run the NLP example <code>examples/nlp_example.py</code> (from the root of the repo) with IPEX enabled for distributed CPU training.</p> <p data-svelte-h="svelte-1q4y39h">The following <code>default_config.yaml</code> is generated by <code>accelerate config</code>:</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->compute_environment: LOCAL_MACHINE
distributed_type: MULTI_CPU
downcast_bf16: <span class="hljs-string">'no'</span>
ipex_config:
  ipex: <span class="hljs-literal">true</span>
machine_rank: 0
main_process_ip: 36.112.23.24
main_process_port: 29500
main_training_function: main
mixed_precision: bf16
mpirun_config:
  mpirun_ccl: <span class="hljs-string">'1'</span>
  mpirun_hostfile: /home/user/hostfile
num_machines: 4
num_processes: 16
rdzv_backend: static
same_network: <span class="hljs-literal">true</span>
tpu_env: []
tpu_use_cluster: <span class="hljs-literal">false</span>
tpu_use_sudo: <span class="hljs-literal">false</span>
use_cpu: <span class="hljs-literal">true</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-drika4">Set the following environment variables and use Intel MPI to launch the training.</p> <p data-svelte-h="svelte-uoehqm">On node0, create a configuration file that contains the IP address of each node (for example, <code>hostfile</code>) and pass the path of that file as an argument.
If you chose to have Accelerate launch <code>mpirun</code>, ensure that the location of your hostfile matches the path in the config.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->$ <span class="hljs-built_in">cat</span> hostfile
xxx.xxx.xxx.xxx <span class="hljs-comment">#node0 ip</span>
xxx.xxx.xxx.xxx <span class="hljs-comment">#node1 ip</span>
xxx.xxx.xxx.xxx <span class="hljs-comment">#node2 ip</span>
xxx.xxx.xxx.xxx <span class="hljs-comment">#node3 ip</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1ib1sch">If Accelerate launches <code>mpirun</code>, source the oneCCL bindings <code>setvars.sh</code> to set up your Intel MPI environment, and then
run your script using <code>accelerate launch</code>. Note that the Python script and environment need to exist on all of the
machines used for multi-CPU training.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->oneccl_bindings_for_pytorch_path=$(python -c <span class="hljs-string">"from oneccl_bindings_for_pytorch import cwd; print(cwd)"</span>)
<span class="hljs-built_in">source</span> <span class="hljs-variable">$oneccl_bindings_for_pytorch_path</span>/env/setvars.sh
accelerate launch examples/nlp_example.py<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1iirbzf">Otherwise, if you chose not to have Accelerate launch <code>mpirun</code>, run the following commands on node0; DDP with <strong>16 processes</strong> will
be enabled across node0, node1, node2, and node3 with BF16 mixed precision. With this method, the Python script, Python
environment, and Accelerate config file need to be present on all of the machines used for multi-CPU training.</p> <div class="code-block relative"> <pre class=""><!-- HTML_TAG_START -->oneccl_bindings_for_pytorch_path=$(python -c <span class="hljs-string">"from oneccl_bindings_for_pytorch import cwd; print(cwd)"</span>)
<span class="hljs-built_in">source</span> <span class="hljs-variable">$oneccl_bindings_for_pytorch_path</span>/env/setvars.sh
<span class="hljs-built_in">export</span> CCL_WORKER_COUNT=1
<span class="hljs-built_in">export</span> MASTER_ADDR=xxx.xxx.xxx.xxx <span class="hljs-comment">#node0 ip</span>
<span class="hljs-built_in">export</span> CCL_ATL_TRANSPORT=ofi
mpirun -f hostfile -n 16 -ppn 4 accelerate launch examples/nlp_example.py<!-- HTML_TAG_END --></pre></div> <h2 class="relative group"><a id="related-resources" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#related-resources"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Related Resources</span></h2> <ul data-svelte-h="svelte-pt8bza"><li><a href="https://github.com/intel/intel-extension-for-pytorch" rel="nofollow">Project’s GitHub</a></li> <li><a href="https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/api_doc.html" rel="nofollow">API docs</a></li> <li><a href="https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/performance_tuning/tuning_guide.html" rel="nofollow">Tuning guide</a></li> <li><a href="https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/blogs_publications.html" rel="nofollow">Blogs &amp; Publications</a></li></ul> <p></p>
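<p>As a rough illustration of the manual <code>mpirun</code> launch in Scenario 2, the sketch below derives the <code>-n</code> and <code>-ppn</code> values from <code>num_processes</code> and <code>num_machines</code> in the config. The helper name <code>build_mpirun_command</code> is hypothetical and not part of Accelerate, and the sketch assumes processes are spread evenly across nodes.</p>

```python
import shlex


def build_mpirun_command(hostfile, num_processes, num_machines, script):
    """Build the manual mpirun command line (hypothetical helper, not an Accelerate API)."""
    # Assumption of this sketch: every node runs the same number of processes.
    if num_processes % num_machines != 0:
        raise ValueError("num_processes must be divisible by num_machines")
    processes_per_node = num_processes // num_machines
    args = [
        "mpirun",
        "-f", hostfile,                   # file listing one node IP per line
        "-n", str(num_processes),         # total number of processes
        "-ppn", str(processes_per_node),  # processes per node
        "accelerate", "launch", script,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # Values taken from the Scenario 2 config: 16 processes on 4 machines.
    print(build_mpirun_command("hostfile", 16, 4, "examples/nlp_example.py"))
```

<p>With 16 processes on 4 machines, each node runs 4 processes, which matches the <code>-n 16 -ppn 4</code> arguments used in the command above.</p>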
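<p>Because the expected benefit of IPEX depends on whether the CPU supports AVX2, AVX-512, or AMX, it can help to check the instruction sets before enabling it. The following is a minimal sketch, assuming a Linux machine that exposes CPU flags in <code>/proc/cpuinfo</code>. The function name <code>detect_isa_support</code> and the choice of flags to inspect are assumptions of this example, not part of IPEX or Accelerate.</p>

```python
from pathlib import Path

# Linux /proc/cpuinfo flag names. Which flags are worth checking is an
# assumption of this sketch, not something IPEX or Accelerate prescribes.
ISA_FLAGS = {
    "avx2": "avx2",
    "avx512": "avx512f",
    "avx512_bf16": "avx512_bf16",
    "amx_bf16": "amx_bf16",
}


def detect_isa_support(cpuinfo_text):
    """Return a dict mapping each instruction-set name to True/False."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return {name: flag in flags for name, flag in ISA_FLAGS.items()}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for name, present in detect_isa_support(cpuinfo.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch only supports Linux")
```

<p>A machine that reports only <code>avx2</code> can still run IPEX, but as noted above, a speedup is not guaranteed there.</p>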