<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Accelerator selection [[accelerator-selection]]
During distributed training, you can specify the number and order of accelerators (CUDA, XPU, MPS, HPU, etc.) to use. This is useful when your accelerators differ in computing power and you want the faster one to be used first, or when you only want to use a subset of the available accelerators. The selection process works for both [DistributedDataParallel](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html) and [DataParallel](https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html). Neither Accelerate nor the [DeepSpeed integration](./main_classes/deepspeed) is required.
This guide shows how to select the number of accelerators to use and the order in which to use them.
## Number of accelerators [[number-of-accelerators]]
For example, if there are 4 accelerators and you only want to use the first 2, run the command below.
<hfoptions id="select-accelerator">
<hfoption id="torchrun">
Use `--nproc_per_node` to select how many accelerators to use.
```bash
torchrun --nproc_per_node=2 trainer-program.py ...
```
</hfoption>
<hfoption id="Accelerate">
Use `--num_processes` to select how many accelerators to use.
```bash
accelerate launch --num_processes 2 trainer-program.py ...
```
</hfoption>
<hfoption id="DeepSpeed">
Use `--num_gpus` to select how many GPUs to use.
```bash
deepspeed --num_gpus 2 trainer-program.py ...
```
</hfoption>
</hfoptions>
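To confirm how many processes a launcher actually started, a training script can read the environment variables the launcher sets. The sketch below assumes the `torchrun` convention of `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`; other launchers may set different variables. The helper name `launch_info` is hypothetical, and the defaults describe a plain single-process `python` run.

```python
import os


def launch_info(env=None):
    """Return rank, local rank, and world size from launcher-style env vars."""
    env = os.environ if env is None else env
    return {
        # Defaults correspond to a single process started without a launcher.
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


if __name__ == "__main__":
    info = launch_info()
    print(
        f"process {info['rank']} of {info['world_size']} "
        f"(local rank {info['local_rank']})"
    )
```

Placed near the top of the training script, this prints one line per process, so a run started with 2 processes should produce two lines.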
## Order of accelerators [[order-of-accelerators]]
To select specific accelerators and their order, use the environment variable appropriate for your hardware. This is often set on the command line for each run, but it can also be added to `~/.bashrc` or another startup configuration file.
For example, if there are 4 accelerators (0, 1, 2, 3) and you only want to run on accelerators 0 and 2:
<hfoptions id="accelerator-type">
<hfoption id="CUDA">
```bash
CUDA_VISIBLE_DEVICES=0,2 torchrun trainer-program.py ...
```
Only GPUs 0 and 2 are "visible" to PyTorch, and they are mapped to `cuda:0` and `cuda:1` respectively.
To reverse the order (use GPU 2 as `cuda:0` and GPU 0 as `cuda:1`):
```bash
CUDA_VISIBLE_DEVICES=2,0 torchrun trainer-program.py ...
```
To run without any GPUs:
```bash
CUDA_VISIBLE_DEVICES= python trainer-program.py ...
```
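The index remapping described above can be sketched in plain Python. This is only an illustration of the renumbering rule for a comma-separated list of integer indices. The helper `logical_devices` is hypothetical, it does not query any hardware, and it ignores other value forms such as device UUIDs.

```python
def logical_devices(visible, prefix="cuda"):
    """Map logical device names to the physical indices listed in `visible`."""
    physical = [int(item) for item in visible.split(",") if item.strip()]
    # The position in the list becomes the logical index the framework sees.
    return {f"{prefix}:{i}": idx for i, idx in enumerate(physical)}


print(logical_devices("0,2"))  # {'cuda:0': 0, 'cuda:1': 2}
print(logical_devices("2,0"))  # {'cuda:0': 2, 'cuda:1': 0}
print(logical_devices(""))     # {} (no device is visible)
```

The position in the list, not the physical index, decides the logical device name, which is why swapping the two entries swaps `cuda:0` and `cuda:1`.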
You can also control the order of CUDA devices with `CUDA_DEVICE_ORDER`:
- Order by PCIe bus ID (matches `nvidia-smi`):
```bash
export CUDA_DEVICE_ORDER=PCI_BUS_ID
```
- Order by compute capability (fastest first):
```bash
export CUDA_DEVICE_ORDER=FASTEST_FIRST
```
</hfoption>
<hfoption id="Intel XPU">
```bash
ZE_AFFINITY_MASK=0,2 torchrun trainer-program.py ...
```
Only XPUs 0 and 2 are "visible" to PyTorch, and they are mapped to `xpu:0` and `xpu:1` respectively.
To reverse the order (use XPU 2 as `xpu:0` and XPU 0 as `xpu:1`):
```bash
ZE_AFFINITY_MASK=2,0 torchrun trainer-program.py ...
```
You can also control the order of Intel XPUs with:
```bash
export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1
```
For more information about device enumeration and sorting on Intel XPU, refer to the [Level Zero](https://github.com/oneapi-src/level-zero/blob/master/README.md?plain=1#L87) documentation.
</hfoption>
</hfoptions>
> [!WARNING]
> Environment variables can be exported instead of being added to the command line. This is not recommended, because it is easy to forget how the variable was set and end up using the wrong accelerators. Instead, it is common practice to set the environment variable for a specific training run on the same command line.
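
When a run is started from a Python script rather than a shell, the same per-run scoping can be sketched with `subprocess`. The child process below is a stand-in for a real training command, and the helper name `run_with_devices` is hypothetical. The parent's environment is left untouched, so later runs are not affected.

```python
import os
import subprocess
import sys


def run_with_devices(command, devices, var="CUDA_VISIBLE_DEVICES"):
    """Run `command` with `var` set only for that child process."""
    env = dict(os.environ)  # copy, so the parent environment is not modified
    env[var] = devices
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Stand-in for a training command: the child only reports what it sees.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
]
result = run_with_devices(child, "0,2")
print(result.stdout.strip())  # 0,2
```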