Xiao-AMD committed
Commit 2c0dd2d · verified · 1 Parent(s): 85d69cb

Update README.md

Files changed (1)
  1. README.md +113 -5
README.md CHANGED
@@ -1,5 +1,113 @@
- ---
- license: other
- license_name: modified-mit
- license_link: https://github.com/MiniMax-AI/MiniMax-M2.1/blob/main/LICENSE
- ---
+ ---
+ base_model:
+ - MiniMaxAI/MiniMax-M2.5
+ language:
+ - en
+ library_name: transformers
+ license: other
+ license_name: modified-mit
+ license_link: https://github.com/MiniMax-AI/MiniMax-M2.5/blob/main/LICENSE
+ ---
+
+ # Model Overview
+
+ - **Model Architecture:** MiniMaxM2ForCausalLM
+ - **Input:** Text
+ - **Output:** Text
+ - **Supported Hardware Microarchitecture:** AMD MI300/MI350/MI355
+ - **ROCm:** 7.0
+ - **PyTorch:** 2.8.0
+ - **Transformers:** 4.57.1
+ - **Operating System(s):** Linux
+ - **Inference Engine:** [SGLang](https://docs.sglang.ai/)/[vLLM](https://docs.vllm.ai/en/latest/)
+ - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (v0.11)
+ - **Weight quantization:** OCP MXFP4, Static
+ - **Activation quantization:** OCP MXFP4, Dynamic
+
+ # Model Quantization
+
+ The model was quantized from [QuixiAI/MiniMax-M2.1-bf16](https://huggingface.co/QuixiAI/MiniMax-M2.1-bf16) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Both weights and activations are quantized to OCP MXFP4.
+
+ **Quantization script:**
+ ```
+ cd Quark/examples/torch/language_modeling/llm_ptq/
+ export exclude_layers="lm_head *block_sparse_moe.gate* *self_attn*"
+ python3 quantize_quark.py --model_dir $MODEL_DIR \
+     --quant_scheme mxfp4 \
+     --num_calib_data 128 \
+     --exclude_layers $exclude_layers \
+     --skip_evaluation \
+     --multi_gpu \
+     --trust_remote_code \
+     --model_export hf_format \
+     --output_dir $output_dir
+ ```
+ For further details or issues, please refer to the AMD-Quark documentation or contact the respective developers.
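The `--exclude_layers` values above are shell-style glob patterns matched against module names, so the attention projections and MoE router gates stay in higher precision while the expert weights are quantized. A minimal sketch of that matching logic using Python's `fnmatch`, with hypothetical layer names (the exact matching rules are Quark's; this is only an illustration):

```python
from fnmatch import fnmatch

# Exclusion patterns from the quantization script above.
patterns = ["lm_head", "*block_sparse_moe.gate*", "*self_attn*"]

# Hypothetical layer names for illustration.
layers = [
    "lm_head",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.block_sparse_moe.gate",
    "model.layers.0.block_sparse_moe.experts.0.w1",
]

# A layer is excluded from quantization if any pattern matches it.
excluded = [name for name in layers if any(fnmatch(name, p) for p in patterns)]
print(excluded)
# The expert weight (experts.0.w1) matches no pattern, so it gets quantized.
```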
+
+ # Evaluation
+
+ The model was evaluated on the GSM8K benchmark using the [vLLM](https://github.com/vllm-project/vllm/tree/v0.13.0) framework.
+
+ ### Accuracy
+
+ <table>
+ <tr>
+ <td><strong>Benchmark</strong></td>
+ <td><strong>MiniMaxAI/MiniMax-M2.5</strong></td>
+ <td><strong>amd/MiniMax-M2.5-MXFP4 (this model)</strong></td>
+ <td><strong>Recovery</strong></td>
+ </tr>
+ <tr>
+ <td>gsm8k (flexible-extract)</td>
+ <td>0.9401</td>
+ <td>0.9256</td>
+ <td>98.46%</td>
+ </tr>
+ </table>
+
+ ### Reproduction
+
+ The GSM8K results were obtained with the vLLM framework, starting from the Docker image `rocm/vllm:rocm7.0.0_vllm_0.11.2_20251210`; vLLM was then reinstalled inside the container with fixes applied for model support.
+
+ #### Preparation in container
+ ```
+ # Reinstall vLLM
+ pip uninstall vllm -y
+ git clone https://github.com/vllm-project/vllm.git
+ cd vllm
+ git checkout v0.13.0
+ pip install -r requirements/rocm.txt
+ python setup.py develop
+ cd ..
+ ```
+
+ #### Launching server
+ ```
+ VLLM_ROCM_USE_AITER=1 \
+ VLLM_DISABLE_COMPILE_CACHE=1 \
+ vllm serve "$MODEL" \
+     --tensor-parallel-size 4 \
+     --trust-remote-code \
+     --max-model-len 32768 \
+     --port 8899
+ ```
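`vllm serve` exposes an OpenAI-compatible HTTP API, so the server above can be smoke-tested before running the full benchmark. A minimal sketch using only the Python standard library (the `"$MODEL"` name is a placeholder: use the same value passed to `vllm serve`):

```python
import json
from urllib.request import Request, urlopen

# Minimal smoke test against vLLM's OpenAI-compatible completions
# endpoint. "$MODEL" is a placeholder for the model path/name that
# was passed to `vllm serve`.
payload = {
    "model": "$MODEL",
    "prompt": "What is 12 * 7?",
    "max_tokens": 64,
    "temperature": 0,
}
req = Request(
    "http://127.0.0.1:8899/v1/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is up:
# with urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```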
+
+ #### Evaluating model in a new terminal
+ ```
+ python vllm/tests/evals/gsm8k/gsm8k_eval.py --host http://127.0.0.1 --port 8899 --num-questions 1000 --save-results logs
+ ```
+
+ # License
+
+ Modifications Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.