chenjiel commited on
Commit
2b7347e
·
verified ·
1 Parent(s): a9e2fe8

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +19 -1
README.md CHANGED
@@ -106,7 +106,25 @@ This model was obtained by quantizing the weights and activations of MiniMax-M2.
106
  To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), you can start the docker `lmsysorg/sglang:nightly-dev-20260313-c21ddbc7` and run the sample command below:
107
 
108
  ```sh
109
- python3 -m sglang.launch_server --model nvidia/MiniMax-M2.5-NVFP4 --tensor-parallel-size 8 --quantization modelopt_fp4 --trust-remote-code --reasoning-parser minimax-append-think --tool-call-parser minimax-m2 --moe-runner-backend flashinfer_cutlass --attention-backend flashinfer
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
110
  ```
111
 
112
  ### Evaluation
 
106
  To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), you can start the docker `lmsysorg/sglang:nightly-dev-20260313-c21ddbc7` and run the sample command below:
107
 
108
  ```sh
109
+ python3 -m sglang.launch_server --model nvidia/MiniMax-M2.5-NVFP4 \
110
+ --tensor-parallel-size 8 \
111
+ --quantization modelopt_fp4 \
112
+ --trust-remote-code \
113
+ --reasoning-parser minimax-append-think \
114
+ --tool-call-parser minimax-m2 \
115
+ --moe-runner-backend flashinfer_cutlass \
116
+ --attention-backend flashinfer
117
+ ```
118
+
119
+ To serve this checkpoint with [vLLM](https://github.com/vllm-project/vllm), you can launch the docker image `vllm/vllm-openai:latest` and run the sample command below:
120
+
121
+ ```sh
122
+ vllm serve nvidia/MiniMax-M2.5-NVFP4 \
123
+ --tensor-parallel-size 8 \
124
+ --tool-call-parser minimax_m2 \
125
+ --reasoning-parser minimax_m2_append_think \
126
+ --enable-auto-tool-choice \
127
+ --trust-remote-code
128
  ```
129
 
130
  ### Evaluation