<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;AMD Instinct GPU connectivity&quot;,&quot;local&quot;:&quot;amd-instinct-gpu-connectivity&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;Dual-die topology&quot;,&quot;local&quot;:&quot;dual-die-topology&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;NUMA nodes&quot;,&quot;local&quot;:&quot;numa-nodes&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Infinity Fabric&quot;,&quot;local&quot;:&quot;infinity-fabric&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<link href="/docs/optimum.amd/pr_149/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/entry/start.554e2095.js">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/chunks/scheduler.7da89386.js">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/chunks/singletons.14434af1.js">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/chunks/paths.372e4583.js">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/entry/app.de596a17.js">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/chunks/index.0b7befd3.js">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/nodes/0.121538a6.js">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/nodes/3.657e82a2.js">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/chunks/Tip.1e71740f.js">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/chunks/CodeBlock.ce33a881.js">
<link rel="modulepreload" href="/docs/optimum.amd/pr_149/en/_app/immutable/chunks/Heading.8a936589.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;AMD Instinct GPU connectivity&quot;,&quot;local&quot;:&quot;amd-instinct-gpu-connectivity&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;Dual-die topology&quot;,&quot;local&quot;:&quot;dual-die-topology&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;NUMA nodes&quot;,&quot;local&quot;:&quot;numa-nodes&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Infinity Fabric&quot;,&quot;local&quot;:&quot;infinity-fabric&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="amd-instinct-gpu-connectivity" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#amd-instinct-gpu-connectivity"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>AMD Instinct GPU connectivity</span></h1> <p data-svelte-h="svelte-car2bi">When using Hugging Face libraries with AMD Instinct MI210 or MI250 GPUs in a multi-GPU 
setting where collective operations are used, training and inference performance may vary depending on which devices are used together on a node. This applies, for example, to tensor parallelism, pipeline parallelism, or data parallelism.</p> <h2 class="relative group"><a id="dual-die-topology" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#dual-die-topology"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Dual-die topology</span></h2> <div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p data-svelte-h="svelte-18h7ey9">Using several devices on an AMD Instinct machine through <code>torchrun</code> on a single node? We recommend using <code>amdrun --ngpus &lt;num_gpus&gt; &lt;script&gt; &lt;script_args&gt;</code> instead, which automatically dispatches to the best <code>num_gpus</code> devices available for maximum performance.</p></div> <p data-svelte-h="svelte-1ctraos">Take an MI250 machine as an example. 
As <code>rocm-smi</code> shows, 8 devices are available:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
0    35.0c           90.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
1    34.0c           N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
2    31.0c           95.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
3    37.0c           N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
4    35.0c           99.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
5    31.0c           N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
6    38.0c           94.0W   800Mhz  1600Mhz  0%   auto  560.0W  0%     0%
7    39.0c           N/A     800Mhz  1600Mhz  0%   auto  0.0W    0%     0%
====================================================================================<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-1noyqtk">However, as the <a href="https://www.amd.com/content/dam/amd/en/documents/instinct-business-docs/white-papers/amd-cdna2-white-paper.pdf" rel="nofollow">description of the machine architecture</a> shows, some devices have a privileged connection: two devices listed by <code>rocm-smi</code> (two GCDs, Graphics Compute Dies) actually correspond to a single MI250 (one OAM, OCP Accelerator Module).</p> <div style="text-align: center" data-svelte-h="svelte-56c05s"><img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/amd/mi250_topology.png" width="512" height="512" alt="4xMI250 machine topology"></div> <p data-svelte-h="svelte-15xtw77"><a href="https://www.amd.com/content/dam/amd/en/documents/instinct-business-docs/white-papers/amd-cdna2-white-paper.pdf" rel="nofollow">4xMI250 machine topology</a></p> <p data-svelte-h="svelte-7d3dnm">This can
be checked by running <code>rocm-smi --shownodesbw</code>: some device-to-device links have a higher maximum bandwidth than others. From the table below, for example, we can conclude that:</p> <ul data-svelte-h="svelte-ex3kul"><li>When using two devices, <code>CUDA_VISIBLE_DEVICES=&quot;0,1&quot;</code>, <code>&quot;2,3&quot;</code>, <code>&quot;4,5&quot;</code> or <code>&quot;6,7&quot;</code> should be preferred.</li> <li>When using three devices, <code>CUDA_VISIBLE_DEVICES=&quot;0,1,6&quot;</code> is a good option.</li> <li>When using four devices, <code>CUDA_VISIBLE_DEVICES=&quot;0,1,6,7&quot;</code> or <code>&quot;2,3,4,5&quot;</code> is a good option.</li></ul> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->========================= ROCm System Management Interface =========================
==================================== Bandwidth =====================================
       GPU0          GPU1          GPU2          GPU3          GPU4          GPU5          GPU6          GPU7
GPU0   N/A           50000-200000  50000-50000   0-0           0-0           0-0           50000-100000  0-0
GPU1   50000-200000  N/A           0-0           50000-50000   0-0           50000-50000   0-0           0-0
GPU2   50000-50000   0-0           N/A           50000-200000  50000-100000  0-0           0-0           0-0
GPU3   0-0           50000-50000   50000-200000  N/A           0-0           0-0           0-0           50000-50000
GPU4   0-0           0-0           50000-100000  0-0           N/A           50000-200000  50000-50000   0-0
GPU5   0-0           50000-50000   0-0           0-0           50000-200000  N/A           0-0           50000-50000
GPU6   50000-100000  0-0           0-0           0-0           50000-50000   0-0           N/A           50000-200000
GPU7   0-0           0-0           0-0           50000-50000   0-0           50000-50000   50000-200000  N/A
Format: min-max; Units: mps
&quot;0<span class="hljs-string">-0</span>&quot; min-max bandwidth indicates devices are not connected directly<!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-hi6hof">This table only gives theoretical minimum/maximum bandwidth. A good option to validate which devices to use together is to run the <a href="https://github.com/RadeonOpenCompute/rocm_bandwidth_test" rel="nofollow">rocm_bandwidth_test</a> on your device.</p> <h2 class="relative group"><a id="numa-nodes" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#numa-nodes"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>NUMA nodes</span></h2> <p data-svelte-h="svelte-qsadxk">On certain AMD machines as seen in the figure below, some devices may have a privileged connectivity with certain CPU cores.</p> <div style="text-align: center" data-svelte-h="svelte-1iavqr2"><img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/amd/mi250_topology2.png" width="512" height="512" alt="4xMI250 machine topology"></div> <p data-svelte-h="svelte-1yq23ia"><a href="https://www.supermicro.com/products/brief/product-brief-Universal-GPU.pdf" 
rel="nofollow">4xMI250 machine topology</a></p> <p data-svelte-h="svelte-1lanpdy">This can be checked with <code>rocm-smi --showtoponuma</code>, which reports the NUMA node of each device:</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->==================================== Numa Nodes ====================================
GPU[<span class="hljs-number">0</span>] : (Topology) Numa <span class="hljs-keyword">Node</span><span class="hljs-title">: 0</span>
GPU[<span class="hljs-number">0</span>] : (Topology) Numa Affinity: <span class="hljs-number">0</span>
GPU[<span class="hljs-number">1</span>] : (Topology) Numa <span class="hljs-keyword">Node</span><span class="hljs-title">: 0</span>
GPU[<span class="hljs-number">1</span>] : (Topology) Numa Affinity: <span class="hljs-number">0</span>
GPU[<span class="hljs-number">2</span>] : (Topology) Numa <span class="hljs-keyword">Node</span><span class="hljs-title">: 0</span>
GPU[<span class="hljs-number">2</span>] : (Topology) Numa Affinity: <span class="hljs-number">0</span>
GPU[<span class="hljs-number">3</span>] : (Topology) Numa <span class="hljs-keyword">Node</span><span class="hljs-title">: 0</span>
GPU[<span class="hljs-number">3</span>] : (Topology) Numa Affinity: <span class="hljs-number">0</span>
GPU[<span class="hljs-number">4</span>] : (Topology) Numa <span class="hljs-keyword">Node</span><span class="hljs-title">: 1</span>
GPU[<span class="hljs-number">4</span>] : (Topology) Numa Affinity: <span class="hljs-number">1</span>
GPU[<span class="hljs-number">5</span>] : (Topology) Numa <span class="hljs-keyword">Node</span><span class="hljs-title">: 1</span>
GPU[<span class="hljs-number">5</span>] : (Topology) Numa Affinity: <span class="hljs-number">1</span>
GPU[<span class="hljs-number">6</span>] : (Topology) Numa <span class="hljs-keyword">Node</span><span class="hljs-title">: 1</span>
GPU[<span class="hljs-number">6</span>] : (Topology) Numa Affinity: <span class="hljs-number">1</span>
GPU[<span class="hljs-number">7</span>] : (Topology) Numa <span class="hljs-keyword">Node</span><span class="hljs-title">: 1</span>
GPU[<span class="hljs-number">7</span>] : (Topology) Numa Affinity: <span class="hljs-number">1</span><!-- HTML_TAG_END --></pre></div> <p data-svelte-h="svelte-pnlzzg">The resulting difference in bandwidth can then be measured with <a href="https://github.com/RadeonOpenCompute/rocm_bandwidth_test" rel="nofollow">rocm_bandwidth_test</a> (excerpt):</p> <div class="code-block relative"><div class="absolute top-2.5 right-4"><button class="inline-flex items-center relative text-sm focus:text-green-500 cursor-pointer focus:outline-none transition duration-200 ease-in-out opacity-0 mx-0.5 text-gray-600 " title="code excerpt" type="button"><svg class="" xmlns="http://www.w3.org/2000/svg" aria-hidden="true" fill="currentColor" focusable="false" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 32 32"><path d="M28,10V28H10V10H28m0-2H10a2,2,0,0,0-2,2V28a2,2,0,0,0,2,2H28a2,2,0,0,0,2-2V10a2,2,0,0,0-2-2Z" transform="translate(0)"></path><path d="M4,18H2V4A2,2,0,0,1,4,2H18V4H4Z" transform="translate(0)"></path><rect fill="none" width="32" height="32"></rect></svg> <div class="absolute pointer-events-none transition-opacity bg-black text-white py-1 px-2 leading-tight rounded font-normal shadow left-1/2 top-full transform -translate-x-1/2 translate-y-2 opacity-0"><div class="absolute bottom-full left-1/2 transform -translate-x-1/2 w-0 h-0 border-black border-4 border-t-0" style="border-left-color: transparent; border-right-color: transparent; "></div> Copied</div></button></div> <pre class=""><!-- HTML_TAG_START -->Bidirectional copy peak bandwidth GB/s
D/D       cpu0      cpu1
cpu0      N/A       N/A
cpu1      N/A       N/A
0         47.763    38.101
1         47.796    38.101
2         47.732    36.429
3         47.709    36.330
4         36.705    47.468
5         36.725    47.396
6         35.605    47.294
7         35.377    47.233<!-- HTML_TAG_END --></pre></div> <p>GPUs 0 to 3 reach about 47.7 GB/s with <code>cpu0</code> but only 36.3 to 38.1 GB/s with <code>cpu1</code>, while GPUs 4 to 7 show the opposite pattern (about 47.2 to 47.5 GB/s with <code>cpu1</code>, 35.4 to 36.7 GB/s with <code>cpu0</code>), matching the NUMA nodes reported by <code>rocm-smi --showtoponuma</code>.</p> <p data-svelte-h="svelte-1of3tan">When benchmarking for optimal performance, we advise testing both with and without <a href="https://access.redhat.com/documentation/en-en/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-numa-auto_numa_balancing" rel="nofollow">NUMA balancing</a> (controlled by <code>/proc/sys/kernel/numa_balancing</code>), as it may impact performance. The figure below shows the difference in performance of <a href="https://github.com/huggingface/text-generation-inference" rel="nofollow">Text Generation Inference</a> in a specific case where disabling NUMA balancing greatly improved performance.</p> <div style="text-align: center" data-svelte-h="svelte-wkuigk"><img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/amd/tgi_numa_llama70b.png" alt="Text Generation Inference latency comparison without/with NUMA balancing"></div> <p data-svelte-h="svelte-7k7z7o">An alternative is to use <code>numactl</code> to bind each process that uses a GPU to the NUMA node of that GPU, with <code>--cpunodebind</code> for its CPU cores and <code>--membind</code> for its memory allocations. 
More details are available in the <a href="https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#numa-best-practices" rel="nofollow">NUMA best practices section of the CUDA C++ Best Practices Guide</a>.</p> <h2 class="relative group"><a id="infinity-fabric" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#infinity-fabric"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Infinity Fabric</span></h2> <p data-svelte-h="svelte-ym0olu">As shown in the architecture below for an MI210 machine, some GPU devices may be linked by an <a href="https://en.wikichip.org/wiki/amd/infinity_fabric" rel="nofollow">Infinity Fabric link</a>, which typically has a higher bandwidth than a PCIe switch (up to 100 GB/s per Infinity Fabric link).</p> <p data-svelte-h="svelte-knas6n">Measuring the unidirectional copy peak bandwidth confirms this: MI210 GPUs linked by Infinity Fabric can communicate about 1.7x faster than through a PCIe switch.</p> <div style="text-align: center" data-svelte-h="svelte-4cdibo"><img src="https://huggingface.co/datasets/optimum/documentation-images/resolve/main/amd/mi210_topology.png" width="512" height="512" alt="8xMI210 machine topology"></div> <p data-svelte-h="svelte-a9hwhv"><a 
href="https://www.amd.com/content/dam/amd/en/documents/instinct-business-docs/white-papers/amd-cdna2-white-paper.pdf" rel="nofollow">8xMI210 machine topology</a></p> <p></p>
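<p>The device-selection reasoning from the dual-die topology section can be scripted. The following Python code is a minimal, hypothetical sketch: it is not part of Optimum AMD and is not how <code>amdrun</code> selects devices. It parses the data rows of the <code>rocm-smi --shownodesbw</code> output for the 4xMI250 machine above, keeps the maximum bandwidth of each link, and exhaustively searches for the subset of <code>num_gpus</code> devices with the best score. The helper names <code>parse_max_bandwidth</code> and <code>pick_devices</code> and the scoring rule (sum of squared maximum link bandwidths, so that a few very fast links such as the two GCDs of one MI250 outweigh several slower ones) are assumptions made for illustration.</p>

```python
from itertools import combinations

# Data rows copied from the `rocm-smi --shownodesbw` output above (4xMI250).
# Each cell is "min-max" bandwidth; "0-0" means the devices are not linked directly.
SHOWNODESBW_ROWS = """
GPU0 N/A 50000-200000 50000-50000 0-0 0-0 0-0 50000-100000 0-0
GPU1 50000-200000 N/A 0-0 50000-50000 0-0 50000-50000 0-0 0-0
GPU2 50000-50000 0-0 N/A 50000-200000 50000-100000 0-0 0-0 0-0
GPU3 0-0 50000-50000 50000-200000 N/A 0-0 0-0 0-0 50000-50000
GPU4 0-0 0-0 50000-100000 0-0 N/A 50000-200000 50000-50000 0-0
GPU5 0-0 50000-50000 0-0 0-0 50000-200000 N/A 0-0 50000-50000
GPU6 50000-100000 0-0 0-0 0-0 50000-50000 0-0 N/A 50000-200000
GPU7 0-0 0-0 0-0 50000-50000 0-0 50000-50000 50000-200000 N/A
"""


def parse_max_bandwidth(table: str) -> list[list[int]]:
    """Hypothetical helper: square matrix of maximum link bandwidth (0 if not linked)."""
    matrix = []
    for line in table.strip().splitlines():
        fields = line.split()
        # Keep only data rows such as "GPU0 N/A 50000-200000 ...", skipping
        # "=" banners, the "GPU0 GPU1 ..." header and the "Format:" footer.
        if len(fields) < 2 or not fields[0].startswith("GPU") or fields[1].startswith("GPU"):
            continue
        row = []
        for cell in fields[1:]:
            if cell == "N/A":  # a device compared with itself
                row.append(0)
            else:
                _min_bw, max_bw = cell.split("-")
                row.append(int(max_bw))
        matrix.append(row)
    return matrix


def pick_devices(bandwidth: list[list[int]], num_gpus: int) -> tuple[int, ...]:
    """Hypothetical helper: return the num_gpus devices with the highest score.

    Assumed heuristic: sum the squared maximum bandwidth of every device pair
    in the subset. max() keeps the first best subset, so ties are broken in
    favor of the lowest device indices.
    """
    def score(devices: tuple[int, ...]) -> int:
        return sum(bandwidth[i][j] ** 2 for i, j in combinations(devices, 2))

    return max(combinations(range(len(bandwidth)), num_gpus), key=score)


if __name__ == "__main__":
    bandwidth = parse_max_bandwidth(SHOWNODESBW_ROWS)
    for n in (2, 3, 4):
        devices = pick_devices(bandwidth, n)
        print(f'{n} GPUs: CUDA_VISIBLE_DEVICES="{",".join(map(str, devices))}"')
```

<p>On the table above, this prints <code>CUDA_VISIBLE_DEVICES=&quot;0,1&quot;</code>, <code>&quot;0,1,6&quot;</code> and <code>&quot;0,1,6,7&quot;</code> for two, three and four devices, in line with the recommendations of the dual-die topology section. Because the table only lists theoretical bandwidth and several subsets can tie, treat the result as a candidate to validate with <code>rocm_bandwidth_test</code> rather than a guaranteed optimum.</p>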
<script>
{
__sveltekit_c75107 = {
assets: "/docs/optimum.amd/pr_149/en",
base: "/docs/optimum.amd/pr_149/en",
env: {}
};
const element = document.currentScript.parentElement;
const data = [null,null];
Promise.all([
import("/docs/optimum.amd/pr_149/en/_app/immutable/entry/start.554e2095.js"),
import("/docs/optimum.amd/pr_149/en/_app/immutable/entry/app.de596a17.js")
]).then(([kit, app]) => {
kit.start(app, element, {
node_ids: [0, 3],
data,
form: null,
error: null
});
});
}
</script>
