Update vLLM description

#1
by jeeejeee - opened
Files changed (1)
  1. README.md +7 -0
README.md CHANGED
@@ -109,6 +109,13 @@ To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), y
 python3 -m sglang.launch_server --model nvidia/MiniMax-M2.5-NVFP4 --tensor-parallel-size 8 --quantization modelopt_fp4 --trust-remote-code --reasoning-parser minimax-append-think --tool-call-parser minimax-m2 --moe-runner-backend flashinfer_cutlass --attention-backend flashinfer
 ```
 
+To serve this checkpoint with [vLLM](https://github.com/vllm-project/vllm), install the [vLLM nightly build](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#install-the-latest-code) or start the Docker image `vllm/vllm-openai:nightly`, then run the sample command below:
+
+```sh
+vllm serve nvidia/MiniMax-M2.5-NVFP4 --tensor-parallel-size 8 --trust-remote-code --reasoning-parser minimax-append-think --tool-call-parser minimax-m2 --enable-auto-tool-choice
+```
+
+
 ### Evaluation
 The accuracy benchmark results are presented in the table below:
 <table>