Supported Models on Ascend NPU

This section lists the models supported on the Ascend NPU, grouped into Large Language Models, Multimodal Language Models, Embedding Models, Reward Models, and Rerank Models. The mainstream DeepSeek, Qwen, and GLM series are covered. You are welcome to enable additional models to suit your business requirements.

Large Language Models

| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| DeepSeek V3/V3.1 | DeepSeek | √ | √ |
| DeepSeek-V3.2-Exp-W8A8 | DeepSeek | √ | √ |
| DeepSeek-R1-0528-W8A8 | DeepSeek | √ | √ |
| DeepSeek-V2-Lite-W8A8 | DeepSeek | √ | √ |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | Qwen | √ | √ |
| Qwen/Qwen3-32B | Qwen | √ | √ |
| Qwen/Qwen3-0.6B | Qwen | √ | √ |
| Qwen3-235B-A22B-W8A8 | Qwen | √ | √ |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Qwen | √ | √ |
| Qwen3-Coder-480B-A35B-Instruct-w8a8-QuaRot | Qwen | √ | √ |
| Qwen/Qwen2.5-7B-Instruct | Qwen | √ | √ |
| QWQ-32B-W8A8 | Qwen | √ | √ |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | Llama | √ | √ |
| AI-ModelScope/Llama-3.1-8B-Instruct | Llama | √ | √ |
| LLM-Research/llama-2-7b | Llama | √ | √ |
| LLM-Research/Llama-3.2-1B-Instruct | Llama | √ | √ |
| mistralai/Mistral-7B-Instruct-v0.2 | Mistral | √ | √ |
| google/gemma-3-4b-it | Gemma | √ | √ |
| microsoft/Phi-4-multimodal-instruct | Phi | √ | √ |
| allenai/OLMoE-1B-7B-0924 | OLMoE | √ | √ |
| stabilityai/stablelm-2-1_6b | StableLM | √ | √ |
| CohereForAI/c4ai-command-r-v01 | Command-R | √ | √ |
| huihui-ai/grok-2 | Grok | √ | √ |
| ZhipuAI/chatglm2-6b | ChatGLM | √ | √ |
| Shanghai_AI_Laboratory/internlm2-7b | InternLM 2 | √ | √ |
| LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct | ExaONE 3 | √ | √ |
| xverse/XVERSE-MoE-A36B | XVERSE | √ | √ |
| HuggingFaceTB/SmolLM-1.7B | SmolLM | √ | √ |
| ZhipuAI/glm-4-9b-chat | GLM-4 | √ | √ |
| XiaomiMiMo/MiMo-7B-RL | MiMo | √ | √ |
| arcee-ai/AFM-4.5B-Base | Arcee AFM-4.5B | √ | √ |
| Howeee/persimmon-8b-chat | Persimmon | √ | √ |
| inclusionAI/Ling-lite | Ling | √ | √ |
| ibm-granite/granite-3.1-8b-instruct | Granite | √ | √ |
| ibm-granite/granite-3.0-3b-a800m-instruct | Granite MoE | √ | √ |
| AI-ModelScope/dbrx-instruct | DBRX (Databricks) | √ | √ |
| baichuan-inc/Baichuan2-13B-Chat | Baichuan 2 (7B, 13B) | √ | √ |
| baidu/ERNIE-4.5-21B-A3B-PT | ERNIE-4.5 (4.5, 4.5MoE series) | √ | √ |
| OpenBMB/MiniCPM3-4B | MiniCPM (v3, 4B) | √ | √ |
| Kimi/Kimi-K2-Thinking | Kimi | √ | √ |
| openai/gpt-oss-120b | GPTOSS | √ | √ |
| allenai/OLMo-2-1124-7B-Instruct | OLMo | √ | √ |
| minimax/MiniMax-M2 | MiniMax-M2 | √ | √ |
| upstage/SOLAR-10.7B-Instruct-v1.0 | Solar | √ | √ |
| bigcode/starcoder2-7b | StarCoder2 | √ | √ |
| arcee-ai/Trinity-Mini | Trinity (Nano, Mini) | √ | √ |

Multimodal Language Models

| Models | Model Family (Variants) | A2 Supported | A3 Supported |
|---|---|---|---|
| Qwen/Qwen2.5-VL-3B-Instruct | Qwen-VL | √ | √ |
| Qwen/Qwen2.5-VL-72B-Instruct | Qwen-VL | √ | √ |
| Qwen/Qwen3-VL-30B-A3B-Instruct | Qwen-VL | √ | √ |
| Qwen/Qwen3-VL-8B-Instruct | Qwen-VL | √ | √ |
| Qwen/Qwen3-VL-4B-Instruct | Qwen-VL | √ | √ |
| Qwen/Qwen3-VL-235B-A22B-Instruct | Qwen-VL | √ | √ |
| deepseek-ai/deepseek-vl2 | DeepSeek-VL2 | √ | √ |
| deepseek-ai/Janus-Pro-1B | Janus-Pro (1B, 7B) | √ | √ |
| deepseek-ai/Janus-Pro-7B | Janus-Pro (1B, 7B) | √ | √ |
| openbmb/MiniCPM-V-2_6 | MiniCPM-V / MiniCPM-o | √ | √ |
| openbmb/MiniCPM-o-2_6 | MiniCPM-V / MiniCPM-o | √ | √ |
| google/gemma-3-4b-it | Gemma 3 (Multimodal) | √ | √ |
| mistralai/Mistral-Small-3.1-24B-Instruct-2503 | Mistral-Small-3.1-24B | √ | √ |
| microsoft/Phi-4-multimodal-instruct | Phi-4-multimodal-instruct | √ | √ |
| XiaomiMiMo/MiMo-VL-7B-RL | MiMo-VL (7B) | √ | √ |
| AI-ModelScope/llava-v1.6-34b | LLaVA (v1.5 & v1.6) | √ | √ |
| lmms-lab/llava-next-72b | LLaVA-NeXT (8B, 72B) | √ | √ |
| lmms-lab/llava-onevision-qwen2-7b-ov | LLaVA-OneVision | √ | √ |
| Kimi/Kimi-VL-A3B-Instruct | Kimi-VL (A3B) | √ | √ |
| ZhipuAI/GLM-4.5V | GLM-4.5V (106B) | √ | √ |
| LLM-Research/Llama-3.2-11B-Vision-Instruct | Llama 3.2 Vision (11B) | √ | √ |
| rednote-hilab/dots.ocr | DotsVLM-OCR | √ | √ |

Embedding Models

| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| intfloat/e5-mistral-7b-instruct | E5 (Llama/Mistral based) | √ | √ |
| iic/gte_Qwen2-1.5B-instruct | GTE-Qwen2 | √ | √ |
| Qwen/Qwen3-Embedding-8B | Qwen3-Embedding | √ | √ |
| Alibaba-NLP/gme-Qwen2-VL-2B-Instruct | GME (Multimodal) | √ | √ |
| AI-ModelScope/clip-vit-large-patch14-336 | CLIP | √ | √ |
| BAAI/bge-large-en-v1.5 | BGE | √ | √ |

Reward Models

| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| Skywork/Skywork-Reward-Llama-3.1-8B-v0.2 | Llama3.1 Reward | √ | √ |
| Shanghai_AI_Laboratory/internlm2-7b-reward | InternLM 2 Reward | √ | √ |
| Qwen/Qwen2.5-Math-RM-72B | Qwen2.5 Reward - Math | √ | √ |
| Howeee/Qwen2.5-1.5B-apeach | Qwen2.5 Reward - Sequence | √ | √ |
| AI-ModelScope/Skywork-Reward-Gemma-2-27B-v0.2 | Gemma 2-27B Reward | √ | √ |

Rerank Models

| Models | Model Family | A2 Supported | A3 Supported |
|---|---|---|---|
| BAAI/bge-reranker-v2-m3 | BGE-Reranker | √ | √ |