	# MindSpore Models

	## Introduction

	MindSpore is a high-performance AI framework optimized for Ascend NPUs. This document describes how to run MindSpore models in SGLang.

	## Requirements

	MindSpore models in SGLang currently run only on Ascend NPU devices, so the Ascend CANN software packages must be installed first.
	They can be downloaded from the [Ascend Official Website](https://www.hiascend.com). The recommended version is 8.3.RC2.

	## Supported Models

	Currently, the following models are supported:

	- Qwen3: Dense and MoE models
	- DeepSeek V3/R1
	- More models coming soon...

	## Installation

	> Note: MindSpore models are currently provided by an independent package, `sgl-mindspore`. MindSpore support builds on SGLang's existing support for the Ascend NPU platform, so first [install SGLang for Ascend NPU](ascend_npu.md) and then install `sgl-mindspore`:

	```shell
	git clone https://github.com/mindspore-lab/sgl-mindspore.git
	cd sgl-mindspore
	pip install -e .
	```
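After installation, it can be useful to confirm that both packages are visible to the Python environment. The sketch below is one way to do that with the standard library. It assumes the pip distribution names are `sglang` and `sgl-mindspore`; the helper `installed_version` is a hypothetical name introduced here for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: these are the distribution names registered by pip.
for name in ("sglang", "sgl-mindspore"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If either line reports `NOT INSTALLED`, repeat the corresponding installation step in the same environment.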


	## Run Model

	SGLang-MindSpore currently supports Qwen3 and DeepSeek V3/R1 models. This document uses Qwen3-8B as an example.

	### Offline inference

	Use the following script for offline inference:

	```python
	import sglang as sgl

	# Initialize the engine with the MindSpore backend
	llm = sgl.Engine(
	    model_path="/path/to/your/model",  # Local model path
	    device="npu",  # Use NPU device
	    model_impl="mindspore",  # MindSpore implementation
	    attention_backend="ascend",  # Attention backend
	    tp_size=1,  # Tensor parallelism size
	    dp_size=1,  # Data parallelism size
	)

	# Generate text
	prompts = [
	    "Hello, my name is",
	    "The capital of France is",
	    "The future of AI is",
	]

	sampling_params = {"temperature": 0, "top_p": 0.9}
	outputs = llm.generate(prompts, sampling_params)

	# Print each prompt alongside its generated continuation
	for prompt, output in zip(prompts, outputs):
	    print(f"Prompt: {prompt}")
	    print(f"Generated: {output['text']}")
	    print("---")
	```

	### Start server

	Launch a server with the MindSpore backend:

	```bash
	# Basic server startup
	python3 -m sglang.launch_server \
	--model-path /path/to/your/model \
	--host 0.0.0.0 \
	--device npu \
	--model-impl mindspore \
	--attention-backend ascend \
	--tp-size 1 \
	--dp-size 1
	```
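Once the server is running, it can be queried over HTTP. The following is a minimal client sketch using only the standard library. It assumes the server listens on SGLang's default port 30000 (the launch command above does not set `--port`) and that the native `/generate` endpoint is used; `build_payload` and `generate` are hypothetical helper names introduced here.

```python
import json
import urllib.request

# Assumption: the server runs on this host with SGLang's default port 30000.
DEFAULT_URL = "http://127.0.0.1:30000/generate"


def build_payload(prompt: str, temperature: float = 0.0, max_new_tokens: int = 32) -> dict:
    """Build the JSON body for the native /generate endpoint."""
    return {
        "text": prompt,
        "sampling_params": {
            "temperature": temperature,
            "max_new_tokens": max_new_tokens,
        },
    }


def generate(prompt: str, url: str = DEFAULT_URL, timeout: float = 60.0) -> str:
    """POST one prompt to a running server and return the generated text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["text"]
```

With a server up, calling `generate("The capital of France is")` should return the model's continuation as a string. Adjust `DEFAULT_URL` if the server was started with a different host or port.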

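	For a distributed server across multiple nodes, run the command below on every node, changing `--node-rank` for each one. The `127.0.0.1` address is a placeholder: in a real multi-node setup, `--dist-init-addr` should be an address of node 0 that all nodes can reach.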

	```bash
	# Multi-node distributed server
	python3 -m sglang.launch_server \
	--model-path /path/to/your/model \
	--host 0.0.0.0 \
	--device npu \
	--model-impl mindspore \
	--attention-backend ascend \
	--dist-init-addr 127.0.0.1:29500 \
	--nnodes 2 \
	--node-rank 0 \
	--tp-size 4 \
	--dp-size 2
	```
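To avoid hand-editing the command for each machine, the per-node commands can be generated. The sketch below assumes that all nodes use identical arguments except `--node-rank`, and that `--dist-init-addr` points at node 0. The function `build_launch_command` is a hypothetical helper, and `192.0.2.10` is a documentation-range placeholder address, not a real host.

```python
import shlex


def build_launch_command(node_rank: int, nnodes: int, dist_init_addr: str,
                         model_path: str, tp_size: int, dp_size: int) -> list:
    """Return the launch_server argument list for one node of the cluster."""
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"node_rank must be in [0, {nnodes}), got {node_rank}")
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--host", "0.0.0.0",
        "--device", "npu",
        "--model-impl", "mindspore",
        "--attention-backend", "ascend",
        "--dist-init-addr", dist_init_addr,
        "--nnodes", str(nnodes),
        "--node-rank", str(node_rank),
        "--tp-size", str(tp_size),
        "--dp-size", str(dp_size),
    ]


# Print one shell-quoted command per node; only --node-rank differs between them.
for rank in range(2):
    cmd = build_launch_command(rank, 2, "192.0.2.10:29500", "/path/to/your/model", 4, 2)
    print(f"# node {rank}\n{shlex.join(cmd)}")
```

Each printed command can then be run on the matching node.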

	## Troubleshooting

	#### Debug Mode

	Enable SGLang debug logging with the `--log-level` argument:

	```bash
	python3 -m sglang.launch_server \
	--model-path /path/to/your/model \
	--host 0.0.0.0 \
	--device npu \
	--model-impl mindspore \
	--attention-backend ascend \
	--log-level DEBUG
	```

	Enable MindSpore INFO or DEBUG logging by setting the `GLOG_v` environment variable:

	```bash
	# Set only one of the following; a later export overrides an earlier one.
	export GLOG_v=1 # INFO
	export GLOG_v=0 # DEBUG
	```
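For the offline inference script, the same setting can be applied from Python instead of the shell. This is a sketch under two assumptions: that MindSpore reads `GLOG_v` when it is first imported, so the variable has to be set beforehand, and that the WARNING and ERROR levels continue the same numeric scale (2 and 3). The helper `set_mindspore_log_level` is a hypothetical name.

```python
import os

# DEBUG and INFO match the values above; WARNING and ERROR are assumed
# to continue the same GLOG_v scale.
GLOG_LEVELS = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3}


def set_mindspore_log_level(level: str) -> str:
    """Set GLOG_v for the current process and return the value written."""
    key = level.upper()
    if key not in GLOG_LEVELS:
        raise ValueError(f"unknown level {level!r}; expected one of {sorted(GLOG_LEVELS)}")
    value = str(GLOG_LEVELS[key])
    os.environ["GLOG_v"] = value
    return value


# Call this before importing sglang or mindspore so the setting can take effect.
set_mindspore_log_level("INFO")
```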

	#### Explicitly select devices

	Use the following environment variable to explicitly select the devices to use.

	```shell
	export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7 # make only NPUs 4-7 visible to the process
	```
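When devices are chosen programmatically, for example in the offline inference script, the variable can be built from a list of IDs. The sketch below validates the IDs and sets the variable for the current process. It assumes the variable must be set before the engine is created; `select_npus` is a hypothetical helper name.

```python
import os


def select_npus(device_ids) -> str:
    """Set ASCEND_RT_VISIBLE_DEVICES from a list of device IDs and return the value."""
    ids = [int(d) for d in device_ids]
    if not ids:
        raise ValueError("at least one device ID is required")
    if any(d < 0 for d in ids) or len(set(ids)) != len(ids):
        raise ValueError(f"device IDs must be unique and non-negative: {ids}")
    value = ",".join(str(d) for d in ids)
    os.environ["ASCEND_RT_VISIBLE_DEVICES"] = value
    return value


# Equivalent to: export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7
select_npus([4, 5, 6, 7])
```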

	#### Communication environment issues

	In environments with a special communication setup, the following environment variable needs to be set:

	```shell
	export MS_ENABLE_LCCL=off # LCCL communication mode is not currently supported in SGLang-MindSpore
	```

	#### Protobuf version mismatch

	In environments with a non-matching protobuf version, set the following environment variable to avoid a binary version mismatch:

	```shell
	export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python # use the pure-Python protobuf implementation to avoid a binary version mismatch
	```

	## Support
	For MindSpore-specific issues:

	- Refer to the [MindSpore documentation](https://www.mindspore.cn/)