Update README.md
Browse files
README.md
CHANGED
|
@@ -106,7 +106,25 @@ This model was obtained by quantizing the weights and activations of MiniMax-M2.
|
|
| 106 |
To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), you can start the docker `lmsysorg/sglang:nightly-dev-20260313-c21ddbc7` and run the sample command below:
|
| 107 |
|
| 108 |
```sh
|
| 109 |
-
python3 -m sglang.launch_server --model nvidia/MiniMax-M2.5-NVFP4
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 110 |
```
|
| 111 |
|
| 112 |
### Evaluation
|
|
|
|
| 106 |
To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), you can start the docker `lmsysorg/sglang:nightly-dev-20260313-c21ddbc7` and run the sample command below:
|
| 107 |
|
| 108 |
```sh
|
| 109 |
+
python3 -m sglang.launch_server --model nvidia/MiniMax-M2.5-NVFP4 \
|
| 110 |
+
--tensor-parallel-size 8 \
|
| 111 |
+
--quantization modelopt_fp4 \
|
| 112 |
+
--trust-remote-code \
|
| 113 |
+
--reasoning-parser minimax-append-think \
|
| 114 |
+
--tool-call-parser minimax-m2 \
|
| 115 |
+
--moe-runner-backend flashinfer_cutlass \
|
| 116 |
+
--attention-backend flashinfer
|
| 117 |
+
```
|
| 118 |
+
|
| 119 |
+
To serve this checkpoint with [vLLM](https://github.com/vllm-project/vllm), you can launch the docker image `vllm/vllm-openai:latest` and run the sample command below:
|
| 120 |
+
|
| 121 |
+
```sh
|
| 122 |
+
vllm serve nvidia/MiniMax-M2.5-NVFP4 \
|
| 123 |
+
--tensor-parallel-size 8 \
|
| 124 |
+
--tool-call-parser minimax_m2 \
|
| 125 |
+
--reasoning-parser minimax_m2_append_think \
|
| 126 |
+
--enable-auto-tool-choice \
|
| 127 |
+
--trust-remote-code
|
| 128 |
```
|
| 129 |
|
| 130 |
### Evaluation
|