amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4

@@ -66,11 +66,34 @@ python3 quantize_quark.py --model_dir /amd/DeepSeek-R1-0528-BF16 \
    </td>
    <td>94.24
    </td>
-   <td>93.71
    </td>
   </tr>
 </table>
 # License
 Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.

    </td>
    <td>94.24
    </td>
+   <td>94.90
    </td>
   </tr>
 </table>
+### Reproduction
+Docker image: `rocm/vllm-dev:base_main_20260212`
+
+Step 1: Start a vLLM server with the quantized DeepSeek-R1 checkpoint.
+```bash
+vllm serve amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 \
+  --tensor-parallel-size 8 \
+  --dtype auto \
+  --speculative-config '{"method":"mtp","num_speculative_tokens":1}' \
+  --gpu-memory-utilization 0.9 \
+  --block-size 1 \
+  --trust-remote-code \
+  --port 8000
+```
+Note: CLI parameters such as `--tensor-parallel-size`, `--gpu-memory-utilization`, and `--port` can be adjusted as needed to match the target runtime environment.
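One way to make those adjustments without editing the command each time is to read them from environment variables. The sketch below is illustrative only: the variable names `TP_SIZE`, `GPU_MEM_UTIL`, and `PORT` are hypothetical and not part of vLLM or this model card, and the defaults simply mirror the values in step 1. It prints the assembled command instead of launching the server.

```bash
#!/usr/bin/env bash
# Hypothetical environment variables; the defaults mirror the command in step 1.
TP_SIZE="${TP_SIZE:-8}"
GPU_MEM_UTIL="${GPU_MEM_UTIL:-0.9}"
PORT="${PORT:-8000}"

# Build the command as an array so the JSON argument stays a single word.
serve_cmd=(
  vllm serve amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4
  --tensor-parallel-size "$TP_SIZE"
  --dtype auto
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}'
  --gpu-memory-utilization "$GPU_MEM_UTIL"
  --block-size 1
  --trust-remote-code
  --port "$PORT"
)

# Print the shell-quoted command for inspection.
# To launch the server instead, run: "${serve_cmd[@]}"
printf '%q ' "${serve_cmd[@]}"
echo
```

For example, `TP_SIZE=4 PORT=8080 bash serve.sh` (with the block saved as a hypothetical `serve.sh`) would print the same command with those two values replaced.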
+Step 2: In a second terminal, run the GSM8K evaluation client against the running server. The script path is relative to the root of a vLLM source checkout.
+```bash
+python3 tests/evals/gsm8k/gsm8k_eval.py
+```
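The evaluation client in step 2 can only connect once the server from step 1 has finished loading the checkpoint, which may take a while for a model of this size. A minimal sketch of a readiness check follows. It assumes the server exposes a `GET /health` endpoint on the serving port; the function name `wait_for_server` and its timeout and interval defaults are illustrative choices, not part of vLLM.

```bash
#!/usr/bin/env bash
# Poll a URL until it answers with a successful HTTP status,
# or give up after a timeout.
# Assumption: the vLLM server answers GET /health once it is ready.
wait_for_server() {
  local url="$1"              # e.g. http://localhost:8000/health
  local timeout="${2:-1800}"  # seconds to wait before giving up
  local interval="${3:-5}"    # seconds between probes
  local waited=0

  # curl -f treats HTTP errors (>= 400) as failures; -s hides progress output.
  until curl -sf --max-time 5 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "server not ready after ${timeout}s: $url" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "server ready: $url"
}
```

With the function defined in the second terminal, the evaluation could be chained as `wait_for_server http://localhost:8000/health && python3 tests/evals/gsm8k/gsm8k_eval.py`, adjusting the port if it was changed in step 1.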
 # License
 Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.