Ascend Quickstart with SGLang Backend
=====================================
Last updated: 01/27/2026.

We have added support for Huawei Ascend devices (NPUs) in verl.

Hardware Support
-----------------------------------

- Atlas 200T A2 Box16
- Atlas 900 A2 PODc
- Atlas 800T A3

Installation
-----------------------------------

Key Supported Versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-----------+-----------------+
| software | version |
+===========+=================+
| Python | == 3.11 |
+-----------+-----------------+
| HDK | >= 25.3.RC1 |
+-----------+-----------------+
| CANN | >= 8.3.RC1 |
+-----------+-----------------+
| torch | >= 2.7.1 |
+-----------+-----------------+
| torch_npu | >= 2.7.1.post2 |
+-----------+-----------------+
| sglang | v0.5.8 |
+-----------+-----------------+
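
A quick way to confirm an environment satisfies the version floors in the table above is to compare dotted version strings numerically. The sketch below is illustrative only (it is not part of verl); it strips non-numeric suffixes such as ``.post2`` before comparing, which is sufficient for the ``torch`` / ``torch_npu`` rows but not for CANN-style versions like ``8.3.RC1``.

```python
import re

def version_tuple(v: str) -> tuple:
    # Keep only the leading numeric dotted part, e.g. "2.7.1.post2" -> (2, 7, 1)
    nums = re.match(r"\d+(?:\.\d+)*", v).group(0)
    return tuple(int(x) for x in nums.split("."))

def meets_floor(installed: str, required: str) -> bool:
    # Tuple comparison gives the usual "newer or equal" semantics
    return version_tuple(installed) >= version_tuple(required)

print(meets_floor("2.7.1.post2", "2.7.1"))  # True
print(meets_floor("2.6.0", "2.7.1"))        # False
```

In practice you would feed ``torch.__version__`` and ``torch_npu.__version__`` into ``meets_floor``.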

Installation from a Docker Image
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We provide Dockerfiles for building images; see `dockerfile_build_guidance <https://github.com/verl-project/verl/blob/main/docs/ascend_tutorial/dockerfile_build_guidance.rst>`_ and choose the build file that matches your device.

Installation in a Custom Environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**1. Install and activate the HDK & CANN dependencies**

CANN (Compute Architecture for Neural Networks) is the heterogeneous computing architecture that Ascend provides for AI workloads. To let the training and inference engines take full advantage of the hardware, install the following `prerequisites <https://www.hiascend.com/document/detail/zh/canncommercial/83RC1/softwareinst/instg/instg_quick.html?Mode=PmIns&InstallType=netconda&OS=openEuler&Software=cannToolKit>`_:

+-----------+-------------+
| HDK | >= 25.3.RC1 |
+-----------+-------------+
| CANN | >= 8.3.RC1 |
+-----------+-------------+

After installation, activate the environment:

.. code-block:: bash

    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    source /usr/local/Ascend/nnal/atb/set_env.sh

**2. Create a conda environment**

.. code-block:: bash

    # create conda env
    conda create -n verl-sglang python==3.11
    conda activate verl-sglang

**3. Run the installation script we provide in verl:** `install_sglang_mcore_npu.sh <https://github.com/verl-project/verl/blob/main/scripts/install_sglang_mcore_npu.sh>`_

If you encounter errors at this step, inspect the script and follow its steps manually.

.. code-block:: bash

    git clone https://github.com/volcengine/verl.git
    # Make sure you have activated the verl-sglang conda env
    # NPU_DEVICE=A3 or A2, depending on your device
    # USE_MEGATRON=1 if you need to install the Megatron backend
    NPU_DEVICE=A3 USE_MEGATRON=1 bash verl/scripts/install_sglang_mcore_npu.sh

**4. Install verl**

.. code-block:: bash

    cd verl
    pip install --no-deps -e .
    pip install -r requirements-npu.txt

Quick Start
-----------------------------------

**1. Current NPU SGLang scripts at a glance**

.. _Qwen3-30B: https://github.com/verl-project/verl/blob/main/examples/grpo_trainer/run_qwen3moe-30b_sglang_megatron_npu.sh
.. _Qwen2.5-32B: https://github.com/verl-project/verl/blob/main/examples/grpo_trainer/run_qwen2-32b_sglang_fsdp_npu.sh
.. _Qwen3-8B-1k: https://github.com/verl-project/verl/blob/main/examples/grpo_trainer/run_qwen3_8b_grpo_sglang_1k_spmd_npu.sh
.. _Qwen3-8B-32k: https://github.com/verl-project/verl/blob/main/examples/grpo_trainer/run_qwen3_8b_grpo_sglang_32k_spmd_npu.sh

+-----------------+-----------------+-------+-------------------+
| Model           | Recommended NPU | Nodes | Backend           |
+=================+=================+=======+===================+
| `Qwen3-30B`_    | Atlas 800T A3   | 1     | SGLang + Megatron |
+-----------------+-----------------+-------+-------------------+
| `Qwen2.5-32B`_  | Atlas 900 A2    | 2     | SGLang + FSDP     |
+-----------------+-----------------+-------+-------------------+
| `Qwen3-8B-1k`_  | Atlas A3/A2     | 1     | SGLang + FSDP     |
+-----------------+-----------------+-------+-------------------+
| `Qwen3-8B-32k`_ | Atlas A3/A2     | 1     | SGLang + FSDP     |
+-----------------+-----------------+-------+-------------------+

**2. Best practices**

We provide `best practices <https://github.com/verl-project/verl/blob/main/docs/ascend_tutorial/examples/ascend_sglang_best_practices.rst>`_ for `Qwen3-30B`_ and `Qwen2.5-32B`_ on verl + SGLang for reference.

**3. Environment variables and parameters**

The following environment variables are currently required for the SGLang backend on NPU:

.. code-block:: bash

    # Enable multiple processes per NPU card, see
    # https://www.hiascend.com/document/detail/zh/canncommercial/850/commlib/hcclug/hcclug_000091.html
    export HCCL_HOST_SOCKET_PORT_RANGE=60000-60050
    export HCCL_NPU_SOCKET_PORT_RANGE=61000-61050
    # Work around Ray being unable to detect device availability on the
    # device side via the is_npu_available interface
    export RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES=1
    # Set according to your device and the number of cards you need
    export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
    # Required when enabling expert parallelism (EP) for rollout
    export SGLANG_DEEPEP_BF16_DISPATCH=1
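
Before launching, it can be worth sanity-checking that these variables parse as expected. The sketch below is illustrative only (it mimics how such values are typically consumed, not verl's actual startup code), and it fills in example values where the variables are unset.

```python
import os

# Example values matching the exports above (used only if not already set)
os.environ.setdefault("ASCEND_RT_VISIBLE_DEVICES", "0,1,2,3,4,5,6,7")
os.environ.setdefault("HCCL_HOST_SOCKET_PORT_RANGE", "60000-60050")

devices = [int(d) for d in os.environ["ASCEND_RT_VISIBLE_DEVICES"].split(",")]
lo, hi = (int(p) for p in os.environ["HCCL_HOST_SOCKET_PORT_RANGE"].split("-"))

# Device ids must be unique and the port range must be non-empty
assert len(set(devices)) == len(devices)
assert lo < hi
print(f"{len(devices)} NPUs visible, {hi - lo + 1} host ports reserved")
```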

verl already parses the common rollout parameters; see the ``ServerArgs`` initialization in `async_sglang_server.py <https://github.com/verl-project/verl/blob/main/verl/workers/rollout/sglang_rollout/async_sglang_server.py>`_. All other `sglang arguments <https://github.com/sgl-project/sglang/blob/main/docs/advanced_features/server_arguments.md>`_ can be passed through ``engine_kwargs``.

To convert a vLLM-backend rollout script to SGLang, add or modify the following parameters:

.. code-block:: bash

    # Required
    actor_rollout_ref.rollout.name=sglang
    +actor_rollout_ref.rollout.engine_kwargs.sglang.attention_backend="ascend"
    # Optional
    # Enable expert parallelism (EP) for rollout; for details see
    # https://github.com/sgl-project/sgl-kernel-npu/blob/main/python/deep_ep/README_CN.md
    ++actor_rollout_ref.rollout.engine_kwargs.sglang.deepep_mode="auto"
    ++actor_rollout_ref.rollout.engine_kwargs.sglang.moe_a2a_backend="deepep"
    # Must be set to True when an MoE model runs with multiple DP ranks
    +actor_rollout_ref.rollout.engine_kwargs.sglang.enable_dp_attention=False
    # chunked prefill is disabled by default
    +actor_rollout_ref.rollout.engine_kwargs.sglang.chunked_prefill_size=-1
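
For readers unfamiliar with these overrides: a dotted key names a path in the nested config, ``+`` adds a key absent from the base config, and ``++`` adds or overwrites it. The sketch below illustrates how such an override lands in a nested dict; it is a simplified stand-in, not Hydra's or verl's actual config machinery.

```python
def apply_override(cfg: dict, dotted_key: str, value):
    # Walk/create intermediate dicts, then set the leaf key
    *path, leaf = dotted_key.split(".")
    node = cfg
    for part in path:
        node = node.setdefault(part, {})
    node[leaf] = value
    return cfg

cfg = {"actor_rollout_ref": {"rollout": {"name": "vllm"}}}
apply_override(cfg, "actor_rollout_ref.rollout.name", "sglang")
apply_override(
    cfg,
    "actor_rollout_ref.rollout.engine_kwargs.sglang.attention_backend",
    "ascend",
)
print(cfg["actor_rollout_ref"]["rollout"]["name"])  # sglang
```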