# MindSpore Models

## Introduction

MindSpore is a high-performance AI framework optimized for Ascend NPUs. This document describes how to run MindSpore models in SGLang.

## Requirements

MindSpore currently supports only Ascend NPU devices. First install the Ascend CANN software packages, which can be downloaded from the [Ascend Official Website](https://www.hiascend.com). The recommended version is 8.3.RC2.

## Supported Models

The following models are currently supported:

- **Qwen3**: Dense and MoE models
- **DeepSeek V3/R1**
- *More models coming soon...*

## Installation

> **Note**: MindSpore models are currently provided by an independent package, `sgl-mindspore`.

MindSpore support builds on SGLang's existing support for the Ascend NPU platform. First [install SGLang for Ascend NPU](ascend_npu.md), then install `sgl-mindspore`:

```shell
git clone https://github.com/mindspore-lab/sgl-mindspore.git
cd sgl-mindspore
pip install -e .
```

## Run Model

SGLang-MindSpore currently supports the Qwen3 and DeepSeek V3/R1 models. This document uses Qwen3-8B as an example.

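Before running a model, it can help to confirm that the packages from the Installation section are visible to the Python interpreter. The following is a minimal sketch, not part of SGLang or `sgl-mindspore`: `check_packages` is a hypothetical helper, and the import name `sgl_mindspore` is an assumption based on the package name.

```python
import importlib.util


def check_packages(names):
    """Map each top-level module name to True if it can be located.

    Uses importlib.util.find_spec, so the modules are not actually imported.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # NOTE: `sgl_mindspore` is an assumed import name for the `sgl-mindspore` package.
    for name, found in check_packages(["sglang", "mindspore", "sgl_mindspore"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A missing entry only means that the module cannot be located in the current environment; it does not verify that CANN or the NPU driver is installed correctly.
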
### Offline inference

Use the following script for offline inference:

```python
import sglang as sgl

# Initialize the engine with the MindSpore backend
llm = sgl.Engine(
    model_path="/path/to/your/model",  # Local model path
    device="npu",                      # Use NPU device
    model_impl="mindspore",            # MindSpore implementation
    attention_backend="ascend",        # Attention backend
    tp_size=1,                         # Tensor parallelism size
    dp_size=1,                         # Data parallelism size
)

# Generate text
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0, "top_p": 0.9}
outputs = llm.generate(prompts, sampling_params)

for prompt, output in zip(prompts, outputs):
    print(f"Prompt: {prompt}")
    print(f"Generated: {output['text']}")
    print("---")
```

### Start server

Launch a server with the MindSpore backend:

```bash
# Basic server startup
python3 -m sglang.launch_server \
    --model-path /path/to/your/model \
    --host 0.0.0.0 \
    --device npu \
    --model-impl mindspore \
    --attention-backend ascend \
    --tp-size 1 \
    --dp-size 1
```

For a distributed server across multiple nodes:

```bash
# Multi-node distributed server
python3 -m sglang.launch_server \
    --model-path /path/to/your/model \
    --host 0.0.0.0 \
    --device npu \
    --model-impl mindspore \
    --attention-backend ascend \
    --dist-init-addr 127.0.0.1:29500 \
    --nnodes 2 \
    --node-rank 0 \
    --tp-size 4 \
    --dp-size 2
```

Run the command on every node, changing `--node-rank` for each one. The address in `--dist-init-addr` is a placeholder; replace it with an address of node 0 that all nodes can reach.

## Troubleshooting

#### Debug Mode

Enable SGLang debug logging with the `--log-level` argument:

```bash
python3 -m sglang.launch_server \
    --model-path /path/to/your/model \
    --host 0.0.0.0 \
    --device npu \
    --model-impl mindspore \
    --attention-backend ascend \
    --log-level DEBUG
```

Enable MindSpore INFO or DEBUG logging by setting the `GLOG_v` environment variable to one of the following values:

```bash
export GLOG_v=1  # INFO
export GLOG_v=0  # DEBUG
```

#### Explicitly select devices

Use the following environment variable to explicitly select the devices to use.

```shell
export ASCEND_RT_VISIBLE_DEVICES=4,5,6,7  # Select devices 4, 5, 6, and 7
```

#### Some communication environment issues

In environments with a special communication setup, the following environment variable needs to be set:

```shell
export MS_ENABLE_LCCL=off  # LCCL communication mode is not currently supported in SGLang-MindSpore
```

#### Some dependencies of protobuf

In environments with a conflicting protobuf version, set the following environment variable to avoid a binary version mismatch:

```shell
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python  # Avoid protobuf binary version mismatch
```

## Support

For MindSpore-specific issues:

- Refer to the [MindSpore documentation](https://www.mindspore.cn/)