Update README about vllm reproduction scripts

#3
Files changed (1)
  1. README.md +24 -1
README.md CHANGED
@@ -66,11 +66,34 @@ python3 quantize_quark.py --model_dir /amd/DeepSeek-R1-0528-BF16 \
  </td>
  <td>94.24
  </td>
- <td>93.71
+ <td>94.90
  </td>
  </tr>
  </table>
 
+ ### Reproduction
+
+ Docker image: rocm/vllm-dev:base_main_20260212
+
+ Step 1: start a vLLM server with the quantized DeepSeek-R1 checkpoint
+
+ ```bash
+ vllm serve amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 \
+ --tensor-parallel-size 8 \
+ --dtype auto \
+ --speculative-config '{"method":"mtp","num_speculative_tokens":1}' \
+ --gpu-memory-utilization 0.9 \
+ --block-size 1 \
+ --trust-remote-code \
+ --port 8000
+ ```
+ Note: CLI parameters such as `--tensor-parallel-size`, `--gpu-memory-utilization`, and `--port` can be adjusted as needed to match the target runtime environment.
+
+ Step 2: in a second terminal, run the GSM8K evaluation client against the running server.
+
+ ```bash
+ python3 tests/evals/gsm8k/gsm8k_eval.py
+ ```
 
  # License
  Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.
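
Step 2 in the added README section only works once the server from Step 1 has finished loading the checkpoint, which can take a while for a model of this size. A minimal sketch of a readiness check is shown here, assuming the server answers on the `/health` route of vLLM's OpenAI-compatible server at the port passed to `--port`. The function name `wait_for_server`, its argument order, and the default timeout and polling interval are hypothetical and not part of this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): poll the server's /health
# route until it responds successfully or the timeout expires.
wait_for_server() {
  local base_url="${1:-http://127.0.0.1:8000}"  # should match --port from Step 1
  local timeout_s="${2:-1800}"                   # assumed upper bound for model loading
  local interval_s="${3:-10}"                    # assumed polling interval in seconds
  local waited=0

  # curl -f makes HTTP errors a non-zero exit; -s silences progress output.
  until curl -sf -o /dev/null "${base_url}/health"; do
    if [ "${waited}" -ge "${timeout_s}" ]; then
      echo "server at ${base_url} not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep "${interval_s}"
    waited=$((waited + interval_s))
  done
}
```

In the second terminal this could be chained as `wait_for_server http://127.0.0.1:8000 && python3 tests/evals/gsm8k/gsm8k_eval.py`, so the evaluation client starts only after the health check succeeds. If `--port` was changed in Step 1, the URL needs the same port.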