# How to run

## 1. Install pdsh on your nodes

```shell
# Download page: https://code.google.com/archive/p/pdsh/downloads
# For example, download to /root:
cd /root
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/pdsh/pdsh-2.29.tar.bz2
tar -xvf pdsh-2.29.tar.bz2
cd pdsh-2.29
./configure --prefix=/root/pdsh-2.29 --with-ssh --without-rsh --with-exec --with-timeout=60 --with-nodeupdown --with-rcmd-rank-list=ssh
make
make install
```

Make sure the installation directory is owned by root:
```shell
chown root:root /root/pdsh-2.29
```
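
Because the build above installs into a non-standard prefix, the `pdsh` binary will usually not be on your `PATH` by default. The following is a minimal sketch for making it available and confirming that it runs. It assumes the prefix `/root/pdsh-2.29` from the step above; the `PDSH_HOME` variable name is only illustrative.

```shell
# Assumption: pdsh was installed with --prefix=/root/pdsh-2.29 as shown above.
# PDSH_HOME is an illustrative variable name, not something pdsh itself reads.
PDSH_HOME=/root/pdsh-2.29
export PATH="$PDSH_HOME/bin:$PATH"

# Print the pdsh version if the binary is reachable, otherwise report the problem.
if command -v pdsh >/dev/null 2>&1; then
    pdsh -V
else
    echo "pdsh not found in $PDSH_HOME/bin" >&2
fi
```

To keep the setting across sessions, you could add the `export` line to your shell startup file (for example `~/.bashrc`).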

## 2. Configure SSH

Open `~/.ssh/config` in an editor (for example, vim) and add:
```text
Host worker-0
    HostName your-worker-0-ip-here
    User root

Host worker-1
    HostName your-worker-1-ip-here
    User root
```
This example assumes two nodes. Make sure every other node can be reached with `ssh root@worker-x` without a password (using an SSH key).
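
Before launching training, it can help to confirm that key-based login really works for each host alias. The sketch below assumes the OpenSSH client tools and the `worker-0` / `worker-1` aliases from the config above. `check_nodes` is a hypothetical helper name, and the `SSH` variable exists only so the command can be swapped out. If you have not set up a key yet, `ssh-keygen` and `ssh-copy-id root@worker-0` are the usual way to create one and install it on a node.

```shell
# Hypothetical helper: report whether each host accepts a non-interactive login.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# and ConnectTimeout=5 keeps an unreachable host from hanging the check.
check_nodes() {
    failed=0
    for host in "$@"; do
        if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    # Non-zero exit status if any host could not be reached without a password.
    return "$failed"
}
```

Call it as `check_nodes worker-0 worker-1`. Once every host reports `ok`, a command such as `pdsh -R ssh -w worker-0,worker-1 hostname` should print one hostname per node.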

## 3. Clone the swift repo and run

```shell
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
# If you have a different number of nodes, edit examples/train/multi-node/deepspeed/host.txt
sh examples/train/multi-node/deepspeed/train.sh
```