---
base_model:
- QuixiAI/MiniMax-M2.1-bf16
language:
- en
library_name: transformers
license: mit
---

# Model Overview

- **Model Architecture:** Qwen3MoeForCausalLM
- **Input:** Text
- **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI300, MI350/MI355
- **ROCm:** 7.0
- **PyTorch:** 2.8.0
- **Transformers:** 4.57.6
- **Operating System(s):** Linux
- **Inference Engine:** [SGLang](https://docs.sglang.ai/)/[vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (v0.11)
- **Weight quantization:** OCP MXFP4, Static
- **Activation quantization:** OCP MXFP4, Dynamic

# Model Quantization

The model was quantized from [QuixiAI/MiniMax-M2.1-bf16](https://huggingface.co/QuixiAI/MiniMax-M2.1-bf16) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Weights are statically quantized to OCP MXFP4, and activations are dynamically quantized to MXFP4.

**Quantization script:**
```
cd Quark/examples/torch/language_modeling/llm_ptq/
export exclude_layers="lm_head *block_sparse_moe.gate* *self_attn*"
python3 quantize_quark.py --model_dir $MODEL_DIR \
                          --quant_scheme mxfp4 \
                          --num_calib_data 128 \
                          --exclude_layers $exclude_layers \
                          --skip_evaluation \
                          --multi_gpu \
                          --trust_remote_code \
                          --model_export hf_format \
                          --output_dir $output_dir
```
For further details or issues, refer to the AMD-Quark documentation or contact the respective developers.
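For intuition, OCP MXFP4 stores each 32-element block as 4-bit FP4 (E2M1) values that share one power-of-two scale. The NumPy sketch below is an illustrative fake-quantizer only, not Quark's implementation; the function name and scale heuristic are assumptions for demonstration:

```python
import numpy as np

# Representable FP4 (E2M1) magnitudes; MXFP4 pairs these with one shared
# power-of-two scale per 32-element block.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(x):
    """Fake-quantize one 32-element block to MXFP4; return (values, scale)."""
    amax = np.abs(x).max()
    # Shared power-of-two scale mapping the block max near FP4's max (6.0)
    scale = 2.0 ** np.floor(np.log2(amax / 6.0)) if amax > 0 else 1.0
    scaled = x / scale
    # Round each magnitude to the nearest representable FP4 value, keep the sign
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * FP4_GRID[idx] * scale, scale

x = np.random.default_rng(0).normal(size=32)
xq, scale = quantize_mxfp4_block(x)
```

Because only a coarse 4-bit grid plus one shared exponent is stored per block, MXFP4 roughly quarters the memory footprint of BF16 weights at the cost of the rounding error sketched above.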

# Evaluation

The model was evaluated on the GSM8K benchmark using the [vLLM](https://github.com/vllm-project/vllm/tree/v0.13.0) framework.

### Accuracy

| Benchmark | QuixiAI/MiniMax-M2.1-bf16 | amd/MiniMax-M2.1-MXFP4 (this model) | Recovery |
|-----------|---------------------------|-------------------------------------|----------|
| GSM8K     | 0.9356                    | 0.9348                              | 99.91%   |
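Recovery here is simply the quantized model's score expressed as a percentage of the BF16 baseline:

```python
# GSM8K accuracies from the table above
baseline = 0.9356   # QuixiAI/MiniMax-M2.1-bf16
quantized = 0.9348  # amd/MiniMax-M2.1-MXFP4 (this model)

# Recovery = quantized accuracy as a percentage of the BF16 baseline
recovery = quantized / baseline * 100
print(f"Recovery: {recovery:.2f}%")  # → Recovery: 99.91%
```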

### Reproduction

The GSM8K results were obtained with the vLLM framework, starting from the Docker image `rocm/vllm:rocm7.0.0_vllm_0.11.2_20251210` and installing vLLM from source inside the container.
79
+
80
+ #### Preparation in container
81
+ ```
82
+ # Reinstall vLLM
83
+ pip uninstall vllm -y
84
+ git clone https://github.com/vllm-project/vllm.git
85
+ cd vllm
86
+ git checkout v0.13.0
87
+ pip install -r requirements/rocm.txt
88
+ python setup.py develop
89
+ cd ..
90
+ ```

#### Launching the server
```
VLLM_ROCM_USE_AITER=1 \
VLLM_DISABLE_COMPILE_CACHE=1 \
vllm serve "$MODEL" \
    --tensor-parallel-size 4 \
    --trust-remote-code \
    --max-model-len 32768 \
    --port 8899
```
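Once the server is up, vLLM exposes an OpenAI-compatible API on the chosen port. A minimal smoke-test sketch is shown below; the prompt and the `model` value are illustrative (the name must match whatever `$MODEL` was passed to `vllm serve`):

```python
import json
import urllib.request

# Request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
PAYLOAD = {
    "model": "amd/MiniMax-M2.1-MXFP4",  # must match the $MODEL passed to `vllm serve`
    "messages": [{"role": "user", "content": "What is 12 * 12?"}],
    "max_tokens": 64,
}

def query_server(base_url="http://127.0.0.1:8899"):
    """Send one chat-completion request and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Call query_server() once the server reports it is ready.
```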

#### Evaluating the model in a new terminal
```
python vllm/tests/evals/gsm8k/gsm8k_eval.py --host http://127.0.0.1 --port 8899 --num-questions 1000 --save-results logs
```

# License

Modifications Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.