haoyang-amd committed on
Commit 69d9b02 · verified · 1 parent: 9b118ac

Update README.md

Files changed (1)
  1. README.md +6 -8
README.md CHANGED
````diff
@@ -32,7 +32,6 @@ The model was quantized from [amd/MiniMax-M2.7-BF16](https://huggingface.co/amd/
  **Quantization scripts:**
  ```
  export exclude_layers="lm_head *block_sparse_moe.gate* *self_attn*"
- export HIP_VISIBLE_DEVICES=4,5,6,7
  python3 quantize_quark.py --model_dir /shareddata/amd/MiniMax-M2.7-bf16 \
  --quant_scheme mxfp4 \
  --num_calib_data 128 \

@@ -66,11 +65,11 @@ The model was evaluated on gsm8k benchmarks using the vllm framework.
  <tr>
  <td>gsm8k (flexible-extract)
  </td>
- <td>91.36
- </td>
  <td>91.81
  </td>
- <td>100.49%
+ <td>91.89
+ </td>
+ <td>100.09%
  </td>
  </tr>
  </table>

@@ -84,15 +83,14 @@ The GSM8K results were obtained using the lm-eval framework, based on the Docker
  ```
  vllm serve "$MODEL" \
  --tensor-parallel-size 4 \
- --trust-remote-code \
- --max-model-len 32768 \
- --port 8899
+ --enable-auto-tool-choice --tool-call-parser minimax_m2 \
+ --reasoning-parser minimax_m2_append_think
  ```


  #### Evaluating model in a new terminal
  ```
- python vllm/tests/evals/gsm8k/gsm8k_eval.py --host http://127.0.0.1 --port 8899
+ python vllm/tests/evals/gsm8k/gsm8k_eval.py
  ```
````
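The `exclude_layers` value in the quantization script is a space-separated list of name patterns for modules that should not be quantized. The sketch below shows which modules such patterns would select, assuming `quantize_quark.py` treats them as shell-style wildcards (the way Python's `fnmatch` does). That matching rule, the `is_excluded` helper and the module names are illustrative assumptions, not taken from Quark or from the model's actual layer list.

```python
from fnmatch import fnmatchcase

# Pattern string copied from the README's `export exclude_layers=...` line.
EXCLUDE_LAYERS = "lm_head *block_sparse_moe.gate* *self_attn*"


def is_excluded(module_name: str, patterns: str = EXCLUDE_LAYERS) -> bool:
    """Hypothetical helper: True if any whitespace-separated glob matches the name.

    Assumes shell-style wildcard semantics ('*' matches any run of characters,
    '.' is literal); fnmatchcase keeps the match case-sensitive on every OS.
    """
    return any(fnmatchcase(module_name, p) for p in patterns.split())


# Hypothetical module names, loosely modeled on a MoE decoder layout.
names = [
    "lm_head",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.block_sparse_moe.gate",
    "model.layers.0.block_sparse_moe.experts.0.w1",
]

for name in names:
    action = "skip (left unquantized)" if is_excluded(name) else "quantize to MXFP4"
    print(f"{name}: {action}")
```

Under these assumptions, the output head, the attention projections and the MoE router gate stay out of MXFP4 quantization, while the expert weights are quantized.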
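The percentage in the gsm8k row appears to be the quantized score divided by the baseline score. Both the old values (91.81 / 91.36) and the new values (91.89 / 91.81) reproduce the listed percentages. The following is a minimal check, assuming that the first numeric column is the BF16 baseline and the second is the MXFP4 model. The `recovery` function name is hypothetical.

```python
def recovery(baseline: float, quantized: float) -> float:
    """Quantized score expressed as a percentage of the baseline score."""
    return 100.0 * quantized / baseline


# Old row: baseline 91.36, quantized 91.81 (listed as 100.49%)
print(f"{recovery(91.36, 91.81):.2f}%")  # prints 100.49%

# New row: baseline 91.81, quantized 91.89 (listed as 100.09%)
print(f"{recovery(91.81, 91.89):.2f}%")  # prints 100.09%
```

A value above 100% only means the quantized run scored slightly higher than the baseline on this benchmark. It should not be read as an accuracy gain from quantization.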
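The updated commands drop `--port 8899` from `vllm serve` and `--host`/`--port` from the eval command, so both sides now rely on defaults. vLLM's OpenAI-compatible server listens on port 8000 by default. This sketch assumes, without confirmation, that `gsm8k_eval.py` also defaults to `http://127.0.0.1:8000`. The helpers `models_url` and `wait_for_server` are hypothetical. They poll the server's `/v1/models` endpoint so the evaluation starts only after the model has finished loading.

```python
import time
import urllib.request

# Assumptions: vLLM's default port (8000) and an eval script defaulting to localhost.
DEFAULT_HOST = "http://127.0.0.1"
DEFAULT_PORT = 8000


def models_url(host: str = DEFAULT_HOST, port: int = DEFAULT_PORT) -> str:
    """URL of the OpenAI-compatible model-listing endpoint served by vLLM."""
    return f"{host}:{port}/v1/models"


def wait_for_server(url: str, timeout_s: float = 600.0, poll_s: float = 5.0) -> bool:
    """Poll `url` until it answers HTTP 200; return False once `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=poll_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, connection refused and timeouts are all OSError subclasses;
            # the server may still be loading weights, so retry after a pause.
            pass
        time.sleep(poll_s)
    return False
```

For example, calling `wait_for_server(models_url())` in the second terminal before the evaluation command blocks until the server responds. If the old `--port 8899` setup is kept, pass `models_url(port=8899)` instead.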