Files changed (1)
  1. README.md +110 -4
README.md CHANGED
@@ -1,5 +1,111 @@
- **Disclaimer**

- This model is provided for research and evaluation purposes only.
- Quantization may introduce accuracy or behavioral differences compared to the original model.
- Users are responsible for validating the model in their own environments and complying with the original model license.
+ ---
+ pipeline_tag: image-text-to-text
+ license: other
+ license_name: minimax-community
+ license_link: LICENSE
+ library_name: transformers
+ tags:
+ - multimodal
+ - moe
+ - agent
+ - coding
+ - video
+ ---
 
+ # Model Overview
+
+ - **Model Architecture:** MiniMaxM3SparseForConditionalGeneration
+ - **Input:** Text, Image
+ - **Output:** Text
+ - **Supported Hardware Microarchitecture:** AMD MI350/MI355
+ - **ROCm:** 7.1.1
+ - **PyTorch:** 2.10.0
+ - **Transformers:** 5.2.0
+ - **Operating System(s):** Linux
+ - **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
+ - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
+ - **Weight quantization:** OCP MXFP4, Static
+ - **Activation quantization:** OCP MXFP4, Dynamic
+
+
+ # Model Quantization
+
+ The model was quantized from [MiniMaxAI/MiniMax-M3](https://huggingface.co/MiniMaxAI/MiniMax-M3) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Both weights and activations are quantized to MXFP4: weights statically (offline) and activations dynamically (at runtime).
+
+
+ **Quantization script:**
+ ```
+ cd Quark/examples/torch/language_modeling/llm_ptq/
+ # Layers whose names match these patterns are left unquantized.
+ exclude_layers="*lm_head *vision_tower* *multi_modal_projector* *patch_merge_mlp* *block_sparse_moe.gate *self_attn* *mlp.gate_proj *mlp.up_proj *mlp.down_proj"
+ # $exclude_layers is left unquoted so the shell splits it into one argument per pattern.
+ CUDA_VISIBLE_DEVICES=0 python3 quantize_quark.py \
+     --model_dir MiniMaxAI/MiniMax-M3 \
+     --quant_scheme mxfp4 \
+     --exclude_layers $exclude_layers \
+     --output_dir /mnt/amd/MiniMax-M3-MXFP4 \
+     --file2file_quantization
+ ```
+
+ For further details or issues, please refer to the AMD-Quark documentation or contact the respective developers.
+
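The exclude patterns in the quantization script are shell-style globs. As a rough illustration of which layers they would leave unquantized, the sketch below matches a few of the patterns against layer names. It assumes Quark applies glob-style matching to fully qualified module names; the layer names and the `is_excluded` helper are hypothetical and are not taken from the model or from Quark.

```shell
# Hypothetical helper: succeeds if the layer name matches any exclude pattern.
is_excluded() {
  name=$1
  for pat in '*lm_head' '*vision_tower*' '*block_sparse_moe.gate' '*self_attn*'; do
    # An unquoted variable in a case pattern is matched as a glob.
    case "$name" in
      $pat) return 0 ;;
    esac
  done
  return 1
}

# Hypothetical layer names, for illustration only.
for layer in \
  model.layers.0.self_attn.q_proj \
  model.layers.0.block_sparse_moe.gate \
  model.layers.0.block_sparse_moe.experts.0.w1
do
  if is_excluded "$layer"; then
    echo "$layer: left unquantized"
  else
    echo "$layer: quantized to MXFP4"
  fi
done
```

Under these assumptions, the attention projections and the MoE router gate stay in their original precision, while the expert weights are quantized.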
+ # Evaluation
+ The model was evaluated on the GSM8K benchmark using vLLM.
+
+ ### Accuracy
+
+ <table>
+  <tr>
+   <td><strong>Benchmark</strong></td>
+   <td><strong>MiniMaxAI/MiniMax-M3</strong></td>
+   <td><strong>amd/MiniMax-M3-MXFP4 (this model)</strong></td>
+   <td><strong>Recovery</strong></td>
+  </tr>
+  <tr>
+   <td>gsm8k (flexible-extract)</td>
+   <td>95.30</td>
+   <td>94.19</td>
+   <td>98.84%</td>
+  </tr>
+ </table>
+
+
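The Recovery column appears to be the quantized score expressed as a percentage of the baseline score. The README does not define it explicitly, so treat that reading as an assumption. A quick check with the two scores from the table:

```shell
baseline=95.30   # MiniMaxAI/MiniMax-M3
quantized=94.19  # amd/MiniMax-M3-MXFP4

# Recovery = quantized / baseline, expressed as a percentage.
recovery=$(awk -v q="$quantized" -v b="$baseline" 'BEGIN { printf "%.2f", q / b * 100 }')
echo "Recovery: ${recovery}%"   # prints "Recovery: 98.84%"
```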
+ ### Reproduction
+
+ The GSM8K results were obtained with the lm-eval framework, running in the
+ Docker image `rocm/pytorch-private:vllm-hy-mm-06112026`. The vLLM shipped in
+ that image was used as-is, with the patch from vLLM PR [#45794](https://github.com/vllm-project/vllm/pull/45794/changes) applied on top.
+
+ #### Launching server
+ ```
+ vllm serve /mnt/amd/MiniMax-M3-MXFP4 \
+     --trust-remote-code \
+     --block-size 128 \
+     --tensor-parallel-size 8 \
+     --attention-backend TRITON_ATTN \
+     --mm-encoder-tp-mode data \
+     --mm-encoder-attn-backend ROCM_AITER_FA \
+     --tool-call-parser minimax_m3 \
+     --enable-auto-tool-choice \
+     --reasoning-parser minimax_m3 \
+     --moe-backend emulation
+ ```
+
+
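The evaluation step needs the server to be up before it starts sending requests. A minimal readiness check is sketched below, assuming the server listens on `127.0.0.1:8000` (the address used in the evaluation `base_url`) and that `curl` is installed. The `wait_for_server` helper and its default retry values are hypothetical.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
wait_for_server() {
  url=$1
  attempts=${2:-60}   # number of tries before giving up
  delay=${3:-10}      # seconds to sleep between tries
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -s silences progress output.
    if curl -sf --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_server http://127.0.0.1:8000/v1/models && echo "server is ready"` polls the model-listing endpoint of the OpenAI-compatible API. Large checkpoints can take a while to load, so the retry values may need adjusting.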
+ #### Evaluating the model (in a new terminal)
+ ```
+ lm_eval \
+     --model local-chat-completions \
+     --model_args "model=/mnt/amd/MiniMax-M3-MXFP4,base_url=http://127.0.0.1:8000/v1/chat/completions,num_concurrent=32,max_gen_toks=16384" \
+     --tasks gsm8k \
+     --num_fewshot 5 \
+     --batch_size 1 \
+     --apply_chat_template \
+     --fewshot_as_multiturn
+ ```