# How to run

## Install pdsh on your nodes

```shell
# https://code.google.com/archive/p/pdsh/downloads
# For example, download to /root:
cd /root
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/pdsh/pdsh-2.29.tar.bz2
tar -xvf pdsh-2.29.tar.bz2
cd pdsh-2.29
./configure --prefix=/root/pdsh-2.29 --with-ssh --without-rsh --with-exec --with-timeout=60 --with-nodeupdown --with-rcmd-rank-list=ssh
make
make install
```

Make sure the install directory has the correct ownership:
```shell
chown root:root /root/pdsh-2.29
```
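
Because the `--prefix` above points at `/root/pdsh-2.29`, the `pdsh` binary ends up in `/root/pdsh-2.29/bin`, which is usually not on `PATH`. The sketch below adds it and checks that the shell can find it. It assumes the prefix shown above, and that the launcher used by the training script looks up `pdsh` through `PATH` (DeepSpeed's default multi-node launcher is understood to do so). The variable name `PDSH_PREFIX` is only illustrative.

```shell
# Assumption: pdsh was installed with --prefix=/root/pdsh-2.29 as shown above.
PDSH_PREFIX=/root/pdsh-2.29
export PATH="$PDSH_PREFIX/bin:$PATH"

# Report whether the shell can now find pdsh.
if command -v pdsh >/dev/null 2>&1; then
    echo "pdsh found at $(command -v pdsh)"
else
    echo "pdsh not found; check that $PDSH_PREFIX/bin/pdsh exists"
fi
```

To keep the setting across logins, the `export` line can be added to the shell profile on each node.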

## Configure SSH

Edit `~/.ssh/config` (for example with vim) and add:
```text
Host worker-0
    HostName your-worker-0-ip-here
    User root
Host worker-1
    HostName your-worker-1-ip-here
    User root
```
This example assumes two nodes. Make sure every other node can be reached with `ssh root@worker-x` without a password (using an SSH key).
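
One way to confirm the password-less login before launching is a small check like the sketch below. The function name `check_passwordless` is hypothetical. It relies on the standard OpenSSH option `BatchMode=yes`, which makes `ssh` fail instead of prompting for a password, and it assumes the host aliases from `~/.ssh/config` above.

```shell
# Hypothetical helper: report whether each host accepts key-based login as root.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_passwordless() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED (try: ssh-copy-id root@$host)"
            failed=1
        fi
    done
    # Non-zero exit status if any host could not be reached without a password.
    return "$failed"
}
```

Call it with the host names from your config, for example `check_passwordless worker-0 worker-1`. If a host fails, generating a key with `ssh-keygen` and copying it with `ssh-copy-id root@worker-x` is the usual fix.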

## Clone the ms-swift repo and run

```shell
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
# If you have a different number of nodes, edit examples/train/multi-node/deepspeed/host.txt
sh examples/train/multi-node/deepspeed/train.sh
```
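
For a different number of nodes, the host file can be regenerated rather than edited by hand. The sketch below assumes that `host.txt` follows DeepSpeed's hostfile format (one `hostname slots=N` line per node, where `N` is the number of GPUs on that node); check the existing file before overwriting it. The function name `gen_hostfile`, the slot count, and the third worker are hypothetical.

```shell
# Hypothetical helper: print one "<host> slots=<gpus>" line per node.
# Usage: gen_hostfile <gpus-per-node> <host> [<host> ...]
gen_hostfile() {
    slots=$1
    shift
    for host in "$@"; do
        echo "$host slots=$slots"
    done
}

# Example: three nodes with 8 GPUs each (illustrative values).
gen_hostfile 8 worker-0 worker-1 worker-2
```

After reviewing the output, redirect it into `examples/train/multi-node/deepspeed/host.txt`. The host names must match the `Host` entries in `~/.ssh/config`.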