<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Accelerator selection [[accelerator-selection]]

During distributed training, you can specify the number and order of accelerators (CUDA, XPU, MPS, HPU, etc.) to use. This can be useful when you have accelerators with different computing power and you want to use the faster accelerator first, or when you only want to use a subset of the available accelerators. The selection process works for both [DistributedDataParallel](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html) and [DataParallel](https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html). You don't need Accelerate or the [DeepSpeed integration](./main_classes/deepspeed).

이 κ°€μ΄λ“œλŠ” μ‚¬μš©ν•  κ°€μ†κΈ°μ˜ μˆ˜μ™€ μ‚¬μš© μˆœμ„œλ₯Ό μ„ νƒν•˜λŠ” 방법을 λ³΄μ—¬μ€λ‹ˆλ‹€.

## Number of accelerators [[number-of-accelerators]]

For example, if there are 4 accelerators and you only want to use the first 2, run the command below.

<hfoptions id="select-accelerator">
<hfoption id="torchrun">

Use `--nproc_per_node` to select how many accelerators to use.

```bash
torchrun --nproc_per_node=2 trainer-program.py ...
```

</hfoption>
<hfoption id="Accelerate">

Use `--num_processes` to select how many accelerators to use.

```bash
accelerate launch --num_processes 2 trainer-program.py ...
```

</hfoption>
<hfoption id="DeepSpeed">

Use `--num_gpus` to select how many GPUs to use.

```bash
deepspeed --num_gpus 2 trainer-program.py ...
```

</hfoption>
</hfoptions>
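
Inside the launched training script, you can check how many processes were actually started. A minimal sketch, assuming the script was started with one of the launchers above (each of which sets the standard `WORLD_SIZE` environment variable for every spawned process):

```python
import os

# Each launcher spawns one process per selected accelerator,
# so WORLD_SIZE reflects the number of accelerators in use.
print(os.environ.get("WORLD_SIZE"))  # "2" with the commands above
```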

## Order of accelerators [[order-of-accelerators]]
To select specific accelerators to use and their order, use the environment variable appropriate for your hardware. This is often set on the command line for each run, but it can also be added to `~/.bashrc` or another startup config file.

For example, if there are 4 accelerators (0, 1, 2, 3) and you only want to run accelerators 0 and 2:

<hfoptions id="accelerator-type">
<hfoption id="CUDA">

```bash
CUDA_VISIBLE_DEVICES=0,2 torchrun trainer-program.py ...
```

Only GPUs 0 and 2 are "visible" to PyTorch, and they are mapped to `cuda:0` and `cuda:1` respectively.
To reverse the order (use GPU 2 as `cuda:0` and GPU 0 as `cuda:1`):


```bash
CUDA_VISIBLE_DEVICES=2,0 torchrun trainer-program.py ...
```
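
To verify the mapping from inside your program, you can query PyTorch directly. A minimal sketch (the reported device names depend on your hardware):

```python
import torch

# With CUDA_VISIBLE_DEVICES=2,0, PyTorch sees exactly two devices:
# cuda:0 is physical GPU 2 and cuda:1 is physical GPU 0.
print(torch.cuda.device_count())  # 2
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} -> {torch.cuda.get_device_name(i)}")
```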

To run without any GPUs:

```bash
CUDA_VISIBLE_DEVICES= python trainer-program.py ...
```
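
You can confirm from inside the program that CUDA is disabled. A minimal sketch:

```python
import torch

# With CUDA_VISIBLE_DEVICES set to an empty string, no GPU is visible:
print(torch.cuda.is_available())  # False
print(torch.cuda.device_count())  # 0
```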

You can also control the order of CUDA devices with `CUDA_DEVICE_ORDER`:

- Order by PCIe bus ID (matches `nvidia-smi`):

    ```bash
    export CUDA_DEVICE_ORDER=PCI_BUS_ID
    ```

- Order by compute capability (fastest first):

    ```bash
    export CUDA_DEVICE_ORDER=FASTEST_FIRST
    ```

</hfoption>
<hfoption id="Intel XPU">

```bash
ZE_AFFINITY_MASK=0,2 torchrun trainer-program.py ...
```

Only XPUs 0 and 2 are "visible" to PyTorch, and they are mapped to `xpu:0` and `xpu:1` respectively.
To reverse the order (use XPU 2 as `xpu:0` and XPU 0 as `xpu:1`):

```bash
ZE_AFFINITY_MASK=2,0 torchrun trainer-program.py ...
```
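
As with CUDA, you can verify the mapping from inside your program. A minimal sketch, assuming a PyTorch build with XPU support (the `torch.xpu` module):

```python
import torch

# With ZE_AFFINITY_MASK=2,0, PyTorch sees exactly two devices:
# xpu:0 is physical XPU 2 and xpu:1 is physical XPU 0.
print(torch.xpu.device_count())  # 2
for i in range(torch.xpu.device_count()):
    print(f"xpu:{i} -> {torch.xpu.get_device_name(i)}")
```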


λ‹€μŒμ„ μ‚¬μš©ν•˜μ—¬ Intel XPU의 μˆœμ„œλ₯Ό μ œμ–΄ν•  μˆ˜λ„ μžˆμŠ΅λ‹ˆλ‹€:

```bash
export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1
```

Intel XPUμ—μ„œμ˜ μž₯치 μ—΄κ±° 및 정렬에 λŒ€ν•œ μžμ„Έν•œ μ •λ³΄λŠ” [Level Zero](https://github.com/oneapi-src/level-zero/blob/master/README.md?plain=1#L87) λ¬Έμ„œλ₯Ό μ°Έμ‘°ν•˜μ„Έμš”.

</hfoption>
</hfoptions>



> [!WARNING]
> Environment variables can be exported instead of being added to the command line. This is not recommended because it can be confusing if you forget how the environment variable was set up, and you could end up using the wrong accelerators. Instead, it is common practice to set the environment variable for a specific training run on the same command line.