# How to run
## 1. Install pdsh on your nodes
```shell
# https://code.google.com/archive/p/pdsh/downloads
# For example, download to /root:
cd /root
wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/pdsh/pdsh-2.29.tar.bz2
tar -xvf pdsh-2.29.tar.bz2
cd pdsh-2.29
./configure --prefix=/root/pdsh-2.29 --with-ssh --without-rsh --with-exec --with-timeout=60 --with-nodeupdown --with-rcmd-rank-list=ssh
make
make install
```
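The build above installs pdsh under `/root/pdsh-2.29` instead of a system directory, so the `pdsh` binary is not on `PATH` by default, and a pdsh-based launcher has to be able to find it. Below is a minimal sketch. It assumes a bash shell (which reads `~/.bashrc`) and the prefix used in the `./configure` step.

```shell
# Prefix chosen in the ./configure step above; the binary lands in $PDSH_PREFIX/bin
PDSH_PREFIX=/root/pdsh-2.29
export PATH="$PDSH_PREFIX/bin:$PATH"

# Persist the change for future shells (appends only if the exact line is not already present)
LINE="export PATH=$PDSH_PREFIX/bin:\$PATH"
grep -qxF "$LINE" ~/.bashrc 2>/dev/null || printf '%s\n' "$LINE" >> ~/.bashrc
```

In a new shell, `pdsh -V` should print the version and the available rcmd modules, and `ssh` should be among them.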
To make sure the directory ownership is correct:
```shell
chown root:root /root/pdsh-2.29
```
## 2. Configure SSH
Edit `~/.ssh/config` (for example with `vim`) and add:
```text
Host worker-0
    HostName your-worker-0-ip-here
    User root
Host worker-1
    HostName your-worker-1-ip-here
    User root
```
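Writing these entries by hand gets tedious when there are more than a few nodes. The helper below, `gen_ssh_config`, is hypothetical. It prints one entry per IP and follows the same `worker-N` naming. The IP addresses in the usage line are placeholders.

```shell
# Hypothetical helper: print an ~/.ssh/config entry for each IP, named worker-0, worker-1, ...
gen_ssh_config() {
    i=0
    for ip in "$@"; do
        printf 'Host worker-%d\n    HostName %s\n    User root\n' "$i" "$ip"
        i=$((i + 1))
    done
}

# Example (placeholder IPs); review the output before appending it to ~/.ssh/config:
# gen_ssh_config 192.168.1.10 192.168.1.11 >> ~/.ssh/config
```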
The example above assumes two nodes. Make sure the other nodes can be logged into with `ssh root@worker-x` without a password (that is, using SSH key authentication).
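One way to set up key-based login from the node you launch training on is sketched below. It assumes that OpenSSH's `ssh-keygen` and `ssh-copy-id` are installed, that the host aliases from `~/.ssh/config` are used, and that password login is still allowed for the one-time key copy. The function names are illustrative only. If the launch goes through DeepSpeed's pdsh runner, it connects to every host in the hostfile, so it is worth including the launching node itself in the host list.

```shell
# Create a key pair once (no passphrase), skipping generation if one already exists
ensure_ssh_key() {
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
    [ -f "$HOME/.ssh/id_ed25519" ] || ssh-keygen -t ed25519 -N "" -f "$HOME/.ssh/id_ed25519"
}

# Copy the public key to each host; this asks for the password one last time per host
copy_key_to() {
    for host in "$@"; do
        ssh-copy-id -i "$HOME/.ssh/id_ed25519.pub" "root@$host"
    done
}

# Verify: BatchMode=yes makes ssh fail instead of prompting,
# so any host that still needs a password is reported as FAILED
check_passwordless() {
    for host in "$@"; do
        if ssh -o BatchMode=yes "root@$host" true; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}

# Usage:
# ensure_ssh_key
# copy_key_to worker-0 worker-1
# check_passwordless worker-0 worker-1
```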
## 3. Clone the ms-swift repo and run
```shell
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
# If you have a different number of nodes, edit examples/train/multi-node/deepspeed/host.txt
sh examples/train/multi-node/deepspeed/train.sh
```
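DeepSpeed hostfiles list one host per line as `<hostname> slots=<number of GPUs>`. Assuming `host.txt` follows that format, its hostnames should be the same aliases defined in `~/.ssh/config`. The helper below, `gen_hostfile`, is hypothetical. It writes one line per node and gives every node the same GPU count.

```shell
# Hypothetical helper: print a DeepSpeed-style hostfile line for each host.
# First argument: GPUs per node; remaining arguments: host aliases from ~/.ssh/config
gen_hostfile() {
    slots=$1
    shift
    for host in "$@"; do
        echo "$host slots=$slots"
    done
}

# Example: two nodes with 8 GPUs each (adjust to your cluster)
# gen_hostfile 8 worker-0 worker-1 > examples/train/multi-node/deepspeed/host.txt
```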