| Clinical Knowledge | 92.83% | 85.66% | -7.17% | ⚠️ Moderate Drop |
---
## **Deployment with Python**

It is recommended to use a virtual environment (such as **venv**, **conda**, or **uv**) to avoid dependency conflicts.

We recommend installing SGLang in a fresh Python environment:

```shell
git clone -b v0.5.4.post1 https://github.com/sgl-project/sglang.git
cd sglang

# Install the Python packages
pip install --upgrade pip
pip install -e "python"
```
Run the following command to start the SGLang server. SGLang will automatically download and cache the MiniMax-M2 model from Hugging Face.

**4-GPU deployment command:**

```shell
python -m sglang.launch_server \
    --model-path MiniMaxAI/MiniMax-M2 \
    --tp-size 4 \
    --tool-call-parser minimax-m2 \
    --reasoning-parser minimax-append-think \
    --host 0.0.0.0 \
    --trust-remote-code \
    --port 8000 \
    --mem-fraction-static 0.85
```
**8-GPU deployment command:**

```shell
python -m sglang.launch_server \
    --model-path MiniMaxAI/MiniMax-M2 \
    --tp-size 8 \
    --ep-size 8 \
    --tool-call-parser minimax-m2 \
    --reasoning-parser minimax-append-think \
    --host 0.0.0.0 \
    --trust-remote-code \
    --port 8000 \
    --mem-fraction-static 0.85
```
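Loading the model weights can take several minutes, so it is worth waiting until the server actually responds before sending requests. A minimal polling sketch in Python, assuming the server exposes a `/health` endpoint on the host and port configured above (check your SGLang version if the path differs):

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url="http://localhost:8000/health", timeout=600, interval=5):
    """Poll `url` until it answers, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Any successful response means the server is up and serving.
            with urllib.request.urlopen(url, timeout=5):
                return True
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not up yet; retry after a short pause
    return False
```

For example, `wait_for_server()` returns `True` as soon as the launch command above finishes loading, and `False` if nothing answers within the timeout.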
## **Testing Deployment**

After startup, you can test the SGLang OpenAI-compatible API with the following command:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "MiniMaxAI/MiniMax-M2",
    "messages": [
      {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
      {"role": "user", "content": [{"type": "text", "text": "Who won the world series in 2020?"}]}
    ]
  }'
```
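The same request can also be issued from Python with only the standard library. A minimal sketch, assuming the server from the previous step is running on `localhost:8000` (the `build_request` helper is illustrative, not part of SGLang):

```python
import json
from urllib import request

# The same chat-completion payload as the curl example above.
payload = {
    "model": "MiniMaxAI/MiniMax-M2",
    "messages": [
        {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
        {"role": "user", "content": [{"type": "text", "text": "Who won the world series in 2020?"}]},
    ],
}

def build_request(url="http://localhost:8000/v1/chat/completions"):
    # POST the JSON payload with the proper content type.
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Uncomment to send the request (requires the server to be running):
# with request.urlopen(build_request()) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, any OpenAI client library pointed at this base URL should work the same way.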
## Benchmarks

Coming soon.