Ascend Quickstart with SGLang Backend
=====================================

Last updated: 01/27/2026.

We have added support for Huawei Ascend devices to verl.

Supported Hardware
-----------------------------------

Atlas 200T A2 Box16

Atlas 900 A2 PODc

Atlas 800T A3


Installation
-----------------------------------
Key Supported Versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-----------+-----------------+
| software  | version         |
+===========+=================+
| Python    | == 3.11         |
+-----------+-----------------+
| HDK       | >= 25.3.RC1     |
+-----------+-----------------+
| CANN      | >= 8.3.RC1      |
+-----------+-----------------+
| torch     | >= 2.7.1        |
+-----------+-----------------+
| torch_npu | >= 2.7.1.post2  |
+-----------+-----------------+
| sglang    | v0.5.8          |
+-----------+-----------------+
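
Once an environment is set up, the Python-side entries of the table above can be checked with a short script. The following is a minimal sketch, assuming ``python3`` and ``pip`` resolve to the environment you intend to use; ``pkg_version`` is an illustrative helper and is not part of verl. It only prints versions and does not compare them against the minimums.

.. code-block:: bash

    # Hypothetical helper: print the installed version of a Python package,
    # or "not installed" if pip cannot find it in the active environment.
    pkg_version() {
        local version
        version="$(python3 -m pip show "$1" 2>/dev/null | awk '/^Version:/ {print $2}')"
        echo "${version:-not installed}"
    }

    # Report the packages listed in the version table.
    for pkg in torch torch_npu sglang; do
        printf '%-10s %s\n' "$pkg" "$(pkg_version "$pkg")"
    done

The HDK and CANN versions are not Python packages and have to be checked with the vendor tooling instead.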

Install from a Docker Image
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We provide Dockerfiles for building the image; see `dockerfile_build_guidance <https://github.com/verl-project/verl/blob/main/docs/ascend_tutorial/dockerfile_build_guidance.rst>`_ and choose the build file that matches your device.

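Install in a Custom Environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**1. Install and activate the HDK and CANN dependencies**

CANN (Compute Architecture for Neural Networks) is Ascend's heterogeneous computing architecture for AI workloads. So that the training and inference engines can make full use of the hardware, install the following `prerequisites <https://www.hiascend.com/document/detail/zh/canncommercial/83RC1/softwareinst/instg/instg_quick.html?Mode=PmIns&InstallType=netconda&OS=openEuler&Software=cannToolKit>`_: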

+-----------+-------------+
| HDK       | >= 25.3.RC1 |
+-----------+-------------+
| CANN      | >= 8.3.RC1  |
+-----------+-------------+

After installation, activate the environment:

.. code-block:: bash

    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    source /usr/local/Ascend/nnal/atb/set_env.sh
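
If CANN is installed under a different prefix, the ``source`` commands above fail with a plain "No such file" error. The sketch below wraps the same two paths in a guard that reports which script is missing; ``source_if_present`` is a hypothetical helper name, and the paths are the default locations used above.

.. code-block:: bash

    # Hypothetical helper: source an environment script only if it exists,
    # otherwise print a warning and return a non-zero status.
    source_if_present() {
        local script="$1"
        if [ -f "$script" ]; then
            source "$script"
        else
            echo "warning: $script not found; check your CANN install path" >&2
            return 1
        fi
    }

    missing=0
    for script in \
        /usr/local/Ascend/ascend-toolkit/set_env.sh \
        /usr/local/Ascend/nnal/atb/set_env.sh; do
        source_if_present "$script" || missing=1
    done
    echo "missing env scripts: $missing"

After a successful activation, ``npu-smi info`` should list the NPUs visible on the host.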

**2. Create a conda environment**

.. code-block:: bash
    
    # create conda env
    conda create -n verl-sglang python==3.11
    conda activate verl-sglang

**3. Run the install script provided in verl:** `install_sglang_mcore_npu.sh <https://github.com/verl-project/verl/blob/main/scripts/install_sglang_mcore_npu.sh>`_

If this step fails, inspect the script and run its steps manually.

.. code-block:: bash

    git clone https://github.com/volcengine/verl.git
    # Make sure the verl-sglang conda env is activated
    # Set NPU_DEVICE to A3 or A2, depending on your device
    # Set USE_MEGATRON=1 if you need the Megatron backend
    NPU_DEVICE=A3 USE_MEGATRON=1 bash verl/scripts/install_sglang_mcore_npu.sh

**4. Install verl**

.. code-block:: bash

    cd verl
    pip install --no-deps -e .
    pip install -r requirements-npu.txt 


Quick Start
-----------------------------------

**1. Available NPU SGLang scripts**

.. _Qwen3-30B: https://github.com/verl-project/verl/blob/main/examples/grpo_trainer/run_qwen3moe-30b_sglang_megatron_npu.sh
.. _Qwen2.5-32B: https://github.com/verl-project/verl/blob/main/examples/grpo_trainer/run_qwen2-32b_sglang_fsdp_npu.sh
.. _Qwen3-8B-1k: https://github.com/verl-project/verl/blob/main/examples/grpo_trainer/run_qwen3_8b_grpo_sglang_1k_spmd_npu.sh
.. _Qwen3-8B-32k: https://github.com/verl-project/verl/blob/main/examples/grpo_trainer/run_qwen3_8b_grpo_sglang_32k_spmd_npu.sh

   +-----------------+-----------------+-------+------------------------------+
   | Model           | Recommended NPU | Nodes | Inference + training backend |
   +=================+=================+=======+==============================+
   | `Qwen3-30B`_    | Atlas 800T A3   | 1     | SGLang + Megatron            |
   +-----------------+-----------------+-------+------------------------------+
   | `Qwen2.5-32B`_  | Atlas 900 A2    | 2     | SGLang + FSDP                |
   +-----------------+-----------------+-------+------------------------------+
   | `Qwen3-8B-1k`_  | Atlas A3/A2     | 1     | SGLang + FSDP                |
   +-----------------+-----------------+-------+------------------------------+
   | `Qwen3-8B-32k`_ | Atlas A3/A2     | 1     | SGLang + FSDP                |
   +-----------------+-----------------+-------+------------------------------+

**2. Best practices**

As a reference, we provide `best practices <https://github.com/verl-project/verl/blob/main/docs/ascend_tutorial/examples/ascend_sglang_best_practices.rst>`_ for verl + SGLang with `Qwen3-30B`_ and `Qwen2.5-32B`_.

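**3. Environment variables and parameters**

Running the SGLang backend on NPU currently requires the following environment variables:

.. code-block:: bash

    # Support multiple processes on a single NPU card: https://www.hiascend.com/document/detail/zh/canncommercial/850/commlib/hcclug/hcclug_000091.html
    export HCCL_HOST_SOCKET_PORT_RANGE=60000-60050
    export HCCL_NPU_SOCKET_PORT_RANGE=61000-61050
    # Work around Ray being unable to detect device availability through the is_npu_available interface when called on the device side
    export RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=1
    # Set according to your device and the number of cards you need
    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
    # Required when enabling expert parallelism (EP) for inference
    export SGLANG_DEEPEP_BF16_DISPATCH=1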
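
The device list above is written out for 16 cards. For a different card count it can be generated instead of typed by hand. The following is a minimal sketch; ``visible_devices`` is a hypothetical helper, and it assumes the cards you want are numbered contiguously from 0.

.. code-block:: bash

    # Hypothetical helper: build the comma-separated list "0,1,...,N-1" for N cards.
    visible_devices() {
        local n="$1" list="" i
        for ((i = 0; i < n; i++)); do
            # Prepend a comma only when the list already has an entry.
            list+="${list:+,}${i}"
        done
        printf '%s\n' "$list"
    }

    # Example: expose the first 8 cards on the node.
    export ASCEND_RT_VISIBLE_DEVICES="$(visible_devices 8)"
    echo "$ASCEND_RT_VISIBLE_DEVICES"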



verl already parses the common inference parameters; see the ServerArgs initialization in `async_sglang_server.py <https://github.com/verl-project/verl/blob/main/verl/workers/rollout/sglang_rollout/async_sglang_server.py>`_ for details. Any other `SGLang parameters <https://github.com/sgl-project/sglang/blob/main/docs/advanced_features/server_arguments.md>`_ can be passed through ``engine_kwargs``.

To convert a script that uses the vLLM inference backend to SGLang, add or modify the following parameters:

.. code-block:: bash

    # Required
    actor_rollout_ref.rollout.name=sglang
    +actor_rollout_ref.rollout.engine_kwargs.sglang.attention_backend="ascend"
    # Optional
    # Enable expert parallelism (EP) for inference; for detailed usage see https://github.com/sgl-project/sgl-kernel-npu/blob/main/python/deep_ep/README_CN.md
    ++actor_rollout_ref.rollout.engine_kwargs.sglang.deepep_mode="auto"
    ++actor_rollout_ref.rollout.engine_kwargs.sglang.moe_a2a_backend="deepep"
    # Must be set to True for MoE models running with multiple DP ranks
    +actor_rollout_ref.rollout.engine_kwargs.sglang.enable_dp_attention=False
    # Chunked prefill is disabled by default
    +actor_rollout_ref.rollout.engine_kwargs.sglang.chunked_prefill_size=-1
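
These parameters are Hydra-style command-line overrides: a ``+`` prefix adds a key that is not yet in the config, and ``++`` adds the key or overrides it if it already exists. One way to keep them manageable is to collect them in a bash array before passing them to the trainer. The sketch below shows only the two required overrides; the array name is illustrative, and the launch line is an assumption (left commented out), so check the example scripts listed above for the actual entry point and remaining arguments.

.. code-block:: bash

    # Required overrides from the list above; the array name is illustrative.
    SGLANG_OVERRIDES=(
        actor_rollout_ref.rollout.name=sglang
        +actor_rollout_ref.rollout.engine_kwargs.sglang.attention_backend="ascend"
    )

    # Print the overrides one per line to review them before launching.
    printf '%s\n' "${SGLANG_OVERRIDES[@]}"

    # Assumed launch command; verify against your example script before use:
    # python3 -m verl.trainer.main_ppo "${SGLANG_OVERRIDES[@]}" "$@"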