Update README.md

Files changed (1) hide show

README.md CHANGED Viewed

@@ -106,7 +106,25 @@ This model was obtained by quantizing the weights and activations of MiniMax-M2.
 To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), you can start the docker `lmsysorg/sglang:nightly-dev-20260313-c21ddbc7` and run the sample command below:
 ```sh
-python3 -m sglang.launch_server --model nvidia/MiniMax-M2.5-NVFP4 --tensor-parallel-size 8 --quantization modelopt_fp4 --trust-remote-code --reasoning-parser minimax-append-think --tool-call-parser minimax-m2 --moe-runner-backend flashinfer_cutlass --attention-backend flashinfer
 ```
 ### Evaluation

 To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), you can start the docker `lmsysorg/sglang:nightly-dev-20260313-c21ddbc7` and run the sample command below:
 ```sh
+python3 -m sglang.launch_server --model nvidia/MiniMax-M2.5-NVFP4 \
+  --tensor-parallel-size 8 \
+  --quantization modelopt_fp4 \
+  --trust-remote-code \
+  --reasoning-parser minimax-append-think \
+  --tool-call-parser minimax-m2 \
+  --moe-runner-backend flashinfer_cutlass \
+  --attention-backend flashinfer
+```
+To serve this checkpoint with [vLLM](https://github.com/vllm-project/vllm), you can launch the docker image `vllm/vllm-openai:latest` and run the sample command below:
+```sh
+vllm serve nvidia/MiniMax-M2.5-NVFP4 \
+  --tensor-parallel-size 8 \
+  --tool-call-parser minimax_m2 \
+  --reasoning-parser minimax_m2_append_think \
+  --enable-auto-tool-choice \
+  --trust-remote-code
 ```
 ### Evaluation