File size: 5,800 Bytes

7134ce7

# ray的支持

SWIFT已经支持使用ray来进行多卡或多节点训练。已有功能中对ray的支持情况如下：

| 功能       | 支持ray | 例子                                                                             | 可分配角色           |
|----------|-------|--------------------------------------------------------------------------------|-----------------|
| pt/sft   | ✅     | https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-node/ray | default         |
| dpo      | ❎     |                                                                                |                 |
| grpo     | ❎     |                                                                                |                 |
| ppo      | ❎     |                                                                                |                 |
| megatron | ❎     |                                                                                |                 |
| sampling | ✅     | https://github.com/modelscope/ms-swift/tree/main/examples/sampler/distill      | sampler/prm/orm |
| distill  | ✅     | https://github.com/modelscope/ms-swift/tree/main/examples/sampler/sample       | sampler/prm/orm |

## 技术细节

在叙述参数设置之前，我们有必要先行讲一下技术细节。由于SWIFT的内部当前使用了大量transformers和trl的已有实现，像veRL或ROLL一样拆解为不同的ray角色是不现实的，而且拆解后会以ray为中心，对非ray的场景的支持会不良。
因此SWIFT采取了装饰器为主的技术方案，以函数级别定义了不同角色，这些角色可以在参数中被定义如何使用。看下面的例子：

```python

from swift.ray import RayHelper



@RayHelper.worker(group=['model1', 'model2'])

class MyTrainer:



    def __init__(self, args):

        self._prepare_model1()

        self._prepare_model2()

        self._prepare_datasets()



    @RayHelper.function(group='model1')

    def _prepare_model1(self):

        ...



    @RayHelper.function(group='model2')

    def _prepare_model2(self):

        ...



    @RayHelper.function(group='model1')

    def rollout(self, inputs):

        return self.model1.generate(inputs)



    @RayHelper.function(group='model2')

    def forward_model2(self, inputs):

        loss = self.model2.forward(inputs)

        loss.backward()



    def _prepare_datasets(self):

        self.dataset = ...



    def train(self):

        for batch in DataLoader(self.dataset):

            generated = self.rollout(batch)

            self.forward_model2(generated)

            ...





if __name__ == '__main__':

    ...

    MyTrainer(args).train()

```

RayHelper会将被装饰的方法分配到不同的硬件集群中，本地调用会被平滑转换到ray集群中进行远程调用。也可以以类为中心进行划分：

```python



@RayHelper.worker(group=['model1'])

class Model1:

    ...



    @RayHelper.function(group='model1')

    def rollout(self):

        ...



@RayHelper.worker(group=['model2'])

class Model2:

    ...



    @RayHelper.function(group='model2')

    def forward_and_optimize(self):

        ...





class Trainer:

    ...

```

SWIFT对ray的支持本质上是使用@worker和@function两个注解的组合使用，worker指定ray集群的角色，function指定如何分配数据。

function注解有额外的几个参数：
```python

    @staticmethod

    def function(group: str,

                 dispatch: Union[Literal['slice', 'all'], Callable] = 'all',

                 execute: Literal['first', 'all'] = 'all',

                 collect: Union[Literal['none', 'flatten'], Callable] = 'none'):

```

- dispatch: 如何分配调用入参
  - slice：对入参切分，也就是worker负载均衡执行
  - all：各个worker入参完全相同
  - 自定义切分方式，格式为：
    ```python

        def my_custom_slice(n, i, data):

            # n是worker数量，i是当前worker索引，data是原始入参

            # 返回第i个的入参

    ```

- execute: 如何执行

  - first: rank0执行，此时slice和Callable方式切分无效

  - all: 全部执行


- collect: 如何收集返回数据
  - none：原样返回，格式为各个worker返回值的列表
  - flatten: 将worker返回的结果进行拉平，支持tuple的拉平
  - Callable: 自定义collect方式，格式为：
    ```python

        def my_custom_collect(result):

            # result是各个worker返回的列表

            # 输入你想要的格式

    ```


## 参数设置

讲完技术细节后，可以将参数配置了。开发者可以根据不同的流程中的角色列表，设置不同的硬件搭配方式，例如采样功能中，共有三个角色，sampler、prm、orm，可以这样配置：

```yaml

device_groups:

  nproc_per_node: 4

  sample_group:

    device: GPU

    ranks: list(range(0, 2))

    workers:

      - sampler

  rm_group:

    device: GPU

    ranks: list(range(2, 4))

    workers:

      - prm

      - orm

```

- nproc_per_node: ray集群中需要的每个node的最小卡数。
xxx_group: 每个ray组的名称，可以随意指定

  - device: 设备类型，当前支持GPU/CPU等。

  - ranks: 当前组分配到哪些ranks上。如果是CPU，ranks只能为整数，代表共需要多少进程，如果是GPU，可以为`[0,1,2,3]`, `4`, `list(range(0, 4))`等格式。

  - workers: 哪些角色分配到当前组中。



所有可用的角色可以见本文最上面的表。



如果使用命令行，device_groups也可以以`--device_groups xxx`方式传入，xxx为jsonstring。为了配置的简便，我们强烈推荐使用yaml方式搭配ray使用。