ineso22 committed on
Commit 1fe3bba · verified · 1 Parent(s): b346ad1

Delete docs/sglang_deploy_guide.md with huggingface_hub

Files changed (1)
  1. docs/sglang_deploy_guide.md +0 -112
docs/sglang_deploy_guide.md DELETED
@@ -1,112 +0,0 @@
- # MiniMax M2.1 Model SGLang Deployment Guide
-
- [English Version](./sglang_deploy_guide.md) | [Chinese Version](./sglang_deploy_guide_cn.md)
-
- We recommend using [SGLang](https://github.com/sgl-project/sglang) to deploy the [MiniMax-M2.1](https://huggingface.co/MiniMaxAI/MiniMax-M2.1) model. SGLang is a high-performance inference engine offering high serving throughput, efficient memory management, strong batch request processing, and deeply optimized low-level performance. Before deploying, review SGLang's official documentation to check hardware compatibility.
-
- ## Applicable Models
-
- This document applies to the following models. You only need to change the model name during deployment.
-
- - [MiniMaxAI/MiniMax-M2.1](https://huggingface.co/MiniMaxAI/MiniMax-M2.1)
- - [MiniMaxAI/MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2)
-
- The deployment process is illustrated below using MiniMax-M2.1 as an example.
-
- ## System Requirements
-
- - OS: Linux
-
- - Python: 3.9 - 3.12
-
- - GPU:
-
-   - compute capability 7.0 or higher
-
-   - Memory requirements: 220 GB for weights, 240 GB per 1M context tokens
-
- The following are recommended configurations; actual requirements should be adjusted based on your use case:
-
- - **96G x4** GPU: Supports a total KV Cache capacity of 400K tokens.
-
- - **144G x8** GPU: Supports a total KV Cache capacity of up to 3M tokens.
-
- > **Note**: The values above represent the total aggregate hardware KV Cache capacity. The maximum context length per individual sequence remains **196K** tokens.
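
As a rough sanity check on the figures above, the following sketch estimates aggregate KV cache capacity from the stated 220 GB for weights and 240 GB per 1M context tokens. The function name `estimate_kv_tokens` is hypothetical, and the assumption that only a fixed fraction of total GPU memory (0.85 here, mirroring the `--mem-fraction-static` value used in this guide's launch commands) is shared by weights and KV cache is a simplification, not a description of SGLang's allocator.

```python
# Rough KV cache capacity estimate: an illustrative sketch, not SGLang's allocator.
WEIGHTS_GB = 220.0          # from the guide: memory needed for model weights
KV_GB_PER_MILLION = 240.0   # from the guide: KV cache memory per 1M context tokens


def estimate_kv_tokens(gpu_count: int, gb_per_gpu: float,
                       usable_fraction: float = 0.85) -> int:
    """Estimate how many tokens of KV cache fit across all GPUs.

    Assumption: weights and KV cache share `usable_fraction` of total memory.
    """
    usable_gb = gpu_count * gb_per_gpu * usable_fraction
    kv_budget_gb = max(0.0, usable_gb - WEIGHTS_GB)
    return int(kv_budget_gb / KV_GB_PER_MILLION * 1_000_000)


if __name__ == "__main__":
    for count, size in [(4, 96), (8, 144)]:
        print(f"{count} x {size} GB: ~{estimate_kv_tokens(count, size):,} tokens")
```

Under these assumptions the estimates come out at roughly 443K and 3.16M tokens, which is in line with the 400K and "up to 3M" figures quoted above. Actual capacity depends on the SGLang version, quantization, and runtime overhead.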
-
- ## Deployment with Python
-
- Use a virtual environment (such as **venv**, **conda**, or **uv**) to avoid dependency conflicts.
-
- The following commands install SGLang from source in a fresh **uv** environment:
-
- ```bash
- uv venv
- source .venv/bin/activate
- git clone https://github.com/sgl-project/sglang
- cd sglang
- uv pip install -e "python" --prerelease=allow
- ```
-
- Run one of the following commands to start the SGLang server. SGLang will automatically download and cache the MiniMax-M2.1 model from Hugging Face.
-
- 4-GPU deployment command:
-
- ```bash
- python -m sglang.launch_server \
-     --model-path MiniMaxAI/MiniMax-M2.1 \
-     --tp-size 4 \
-     --tool-call-parser minimax-m2 \
-     --reasoning-parser minimax-append-think \
-     --host 0.0.0.0 \
-     --trust-remote-code \
-     --port 8000 \
-     --mem-fraction-static 0.85
- ```
-
- 8-GPU deployment command:
-
- ```bash
- python -m sglang.launch_server \
-     --model-path MiniMaxAI/MiniMax-M2.1 \
-     --tp-size 8 \
-     --ep-size 8 \
-     --tool-call-parser minimax-m2 \
-     --trust-remote-code \
-     --host 0.0.0.0 \
-     --reasoning-parser minimax-append-think \
-     --port 8000 \
-     --mem-fraction-static 0.85
- ```
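
The two commands above differ only in `--tp-size` and the optional `--ep-size`. A minimal sketch of assembling either one programmatically follows; the helper name `build_launch_command` and its parameters are hypothetical, and it uses only the flags that appear in the commands above.

```python
import shlex
from typing import List, Optional


def build_launch_command(tp_size: int,
                         ep_size: Optional[int] = None,
                         model_path: str = "MiniMaxAI/MiniMax-M2.1",
                         port: int = 8000,
                         mem_fraction_static: float = 0.85) -> List[str]:
    """Assemble the launch command as an argument list (hypothetical helper)."""
    cmd = ["python", "-m", "sglang.launch_server",
           "--model-path", model_path,
           "--tp-size", str(tp_size)]
    # The guide passes --ep-size only in the 8-GPU configuration.
    if ep_size is not None:
        cmd += ["--ep-size", str(ep_size)]
    cmd += ["--tool-call-parser", "minimax-m2",
            "--reasoning-parser", "minimax-append-think",
            "--trust-remote-code",
            "--host", "0.0.0.0",
            "--port", str(port),
            "--mem-fraction-static", str(mem_fraction_static)]
    return cmd


if __name__ == "__main__":
    # Print shell-ready versions of both configurations.
    print(shlex.join(build_launch_command(tp_size=4)))
    print(shlex.join(build_launch_command(tp_size=8, ep_size=8)))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues when the model path or other values come from configuration.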
-
- ## Testing Deployment
-
- After startup, you can test the SGLang OpenAI-compatible API with the following command:
-
- ```bash
- curl http://localhost:8000/v1/chat/completions \
-     -H "Content-Type: application/json" \
-     -d '{
-         "model": "MiniMaxAI/MiniMax-M2.1",
-         "messages": [
-             {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
-             {"role": "user", "content": [{"type": "text", "text": "Who won the world series in 2020?"}]}
-         ]
-     }'
- ```
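
The same request can be sent from Python using only the standard library. This is a minimal sketch: the helper names `build_chat_request` and `send_chat` are hypothetical, and the response parsing assumes the server returns the usual OpenAI-style chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://localhost:8000",
                       model: str = "MiniMaxAI/MiniMax-M2.1") -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": [{"type": "text", "text": "You are a helpful assistant."}]},
            {"role": "user",
             "content": [{"type": "text", "text": prompt}]},
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(prompt: str) -> str:
    """Send the request and return the reply text (assumes OpenAI-style response)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(send_chat("Who won the world series in 2020?"))
```

Separating request construction from sending makes the payload easy to inspect without a running server.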
-
- ## Common Issues
-
- ### MiniMax-M2 model is not currently supported
-
- If you see this error, upgrade SGLang to the latest stable version (v0.5.4.post1 or later).
-
- ## Getting Support
-
- If you encounter any issues while deploying the MiniMax model:
-
- - Contact our technical support team through official channels, such as email at [model@minimax.io](mailto:model@minimax.io)
-
- - Submit an issue on our [GitHub](https://github.com/MiniMax-AI) repository
-
- We continuously optimize the deployment experience for our models. Feedback is welcome!
-