# MindSpore Models
## Introduction
MindSpore is a high-performance AI framework optimized for Ascend NPUs. This document explains how to run MindSpore models in SGLang.
## Requirements
MindSpore models in SGLang currently run only on Ascend NPU devices. Install the Ascend CANN software packages first.
The CANN software packages can be downloaded from the [Ascend Official Website](https://www.hiascend.com). The recommended version is 8.3.RC2.
## Supported Models
Currently, the following models are supported:
- **Qwen3**: Dense and MoE models
- **DeepSeek V3/R1**
- *More models coming soon...*
## Installation
> **Note**: MindSpore models are currently provided by a separate package, `sgl-mindspore`. MindSpore support builds on SGLang's existing support for the Ascend NPU platform. Please first [install SGLang for Ascend NPU](ascend_npu.md) and then install `sgl-mindspore`:
```shell
git clone https://github.com/mindspore-lab/sgl-mindspore.git
cd sgl-mindspore
pip install -e .
```
## Run Model
SGLang-MindSpore currently supports Qwen3 and DeepSeek V3/R1 models. This document uses Qwen3-8B as an example.
### Offline inference
Use the following script for offline inference:
```python
import sglang as sgl

# Initialize the engine with MindSpore backend
llm = sgl.Engine(
    model_path="/path/to/your/model",  # Local model path
    device="npu",                      # Use NPU device
    model_impl="mindspore",            # MindSpore implementation
    attention_backend="ascend",        # Attention backend
    tp_size=1,                         # Tensor parallelism size
    dp_size=1,                         # Data parallelism size
)

# Generate text
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0, "top_p": 0.9}
outputs = llm.generate(prompts, sampling_params)

for prompt, output in zip(prompts, outputs):
    print(f"Prompt: {prompt}")
    print(f"Generated: {output['text']}")
    print("---")
```
### Start server
Launch a server with the MindSpore backend:
```bash
# Basic server startup
python3 -m sglang.launch_server \
    --model-path /path/to/your/model \
    --host 0.0.0.0 \
    --device npu \
    --model-impl mindspore \
    --attention-backend ascend \
    --tp-size 1 \
    --dp-size 1
```
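Once the server is up, it can be queried over HTTP. The following is a minimal sketch, not part of `sgl-mindspore` itself: it assumes the server is reachable at `http://localhost:30000` (SGLang's default port, since the command above does not pass `--port`) and uses SGLang's native `/generate` endpoint. The helper names `build_payload` and `generate` are illustrative.

```python
import json
import urllib.request

# Assumption: the server listens on SGLang's default port 30000 on this machine.
BASE_URL = "http://localhost:30000"


def build_payload(prompt, temperature=0.0, max_new_tokens=32):
    """Build the JSON body for the native /generate endpoint."""
    return {
        "text": prompt,
        "sampling_params": {
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
        },
    }


def generate(prompt, base_url=BASE_URL):
    """POST a single prompt to a running server and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

With the server running, `print(generate("The capital of France is"))` should print the completion. Adjust `BASE_URL` if the server was started with a different host or port.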
To launch a distributed server across multiple nodes:
```bash
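# Multi-node distributed server.
# Run this command on every node, changing --node-rank for each one
# (0 on the first node, 1 on the second). Replace 127.0.0.1 with an
# address of node 0 that all nodes can reach.
python3 -m sglang.launch_server \
    --model-path /path/to/your/model \
    --host 0.0.0.0 \
    --device npu \
    --model-impl mindspore \
    --attention-backend ascend \
    --dist-init-addr 127.0.0.1:29500 \
    --nnodes 2 \
    --node-rank 0 \
    --tp-size 4 \
    --dp-size 2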
```
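In a multi-node launch, each node runs the same command and only `--node-rank` differs. The sketch below generates the per-node command lines from the flags shown above. The helper `launch_command` and the address `192.0.2.10:29500` are hypothetical and used only for illustration; substitute an address of node 0 that is reachable from every node.

```python
import shlex


def launch_command(node_rank, nnodes, dist_init_addr, model_path,
                   tp_size, dp_size):
    """Return the launch_server command line for one node."""
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"node_rank must be in [0, {nnodes}), got {node_rank}")
    args = [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--host", "0.0.0.0",
        "--device", "npu",
        "--model-impl", "mindspore",
        "--attention-backend", "ascend",
        "--dist-init-addr", dist_init_addr,
        "--nnodes", str(nnodes),
        "--node-rank", str(node_rank),
        "--tp-size", str(tp_size),
        "--dp-size", str(dp_size),
    ]
    # shlex.join quotes arguments so the result can be pasted into a shell.
    return shlex.join(args)


# Hypothetical head-node address (documentation-only IP range).
HEAD_ADDR = "192.0.2.10:29500"

for rank in range(2):
    print(launch_command(rank, 2, HEAD_ADDR, "/path/to/your/model",
                         tp_size=4, dp_size=2))
```

Printing the commands rather than executing them keeps the sketch side-effect free; each printed line is meant to be run on the node with the matching rank.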
## Troubleshooting
#### Debug Mode
Enable SGLang debug logging with the `--log-level` argument:
```bash
python3 -m sglang.launch_server \
    --model-path /path/to/your/model \
    --host 0.0.0.0 \
    --device npu \
    --model-impl mindspore \
    --attention-backend ascend \
    --log-level DEBUG
```
Enable MindSpore INFO or DEBUG logging by setting the `GLOG_v` environment variable:
```bash
# Choose one of the following levels
export GLOG_v=1  # INFO
export GLOG_v=0  # DEBUG
```
#### Explicitly select devices
Use the following environment variable to explicitly select the devices to use.
```shell
export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7  # expose only devices 4-7 to the process
```
#### Communication environment issues
SGLang-MindSpore does not currently support the LCCL communication mode. If communication fails in your environment, disable LCCL:
```shell
export MS_ENABLE_LCCL=off  # LCCL communication mode is not currently supported in SGLang-MindSpore
```
#### Protobuf dependency issues
If the installed protobuf version causes a binary version mismatch, switch to the pure-Python protobuf implementation:
```shell
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python  # avoids the protobuf binary version mismatch
```
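For offline scripts, the environment variables from this Troubleshooting section can also be set from Python, as long as this happens before `sglang` is imported. The following is a minimal sketch under that assumption. The names `MINDSPORE_ENV` and `apply_env` are illustrative, and the values are the ones used in the shell examples; drop any entry your environment does not need.

```python
import os

# Values copied from the shell examples in this section.
MINDSPORE_ENV = {
    "GLOG_v": "1",                                       # MindSpore log level: 1 = INFO, 0 = DEBUG
    "ASCEND_RT_VISIBLE_DEVICES": "4,5,6,7",              # devices visible to the process
    "MS_ENABLE_LCCL": "off",                             # LCCL mode is not supported
    "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION": "python",  # avoid protobuf binary mismatch
}


def apply_env(overrides, environ=os.environ):
    """Set each variable unless it is already defined, so shell exports win."""
    for name, value in overrides.items():
        environ.setdefault(name, value)
    return environ


apply_env(MINDSPORE_ENV)

# Import sglang only after the environment is configured:
# import sglang as sgl
```

Using `setdefault` means a value exported in the shell takes precedence over the script's defaults, which keeps one-off overrides such as `GLOG_v=0` easy.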
## Support
For MindSpore-specific issues:
- Refer to the [MindSpore documentation](https://www.mindspore.cn/)