Files changed (1)
  1. README.md +7 -4
README.md CHANGED
@@ -15,23 +15,26 @@ license_link: https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE
  - **Output:** Text
  - **Supported Hardware Microarchitecture:** AMD MI300/MI350/MI355 (emulation)
  - **ROCm:** 7.2.2
+ - **PyTorch**: 2.10.0
+ - **Transformers**: 5.2.0
  - **Operating System(s):** Linux
  - **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
  - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (v0.12)
  - **Inference Engine:** [SGLang](https://docs.sglang.ai/)/[vLLM](https://docs.vllm.ai/en/latest/)
  - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
+ - **Quantized layers:** `experts`
  - **Weight quantization:** NVFP4, Static
  - **Activation quantization:** NVFP4, Dynamic
 
 
  # Model Quantization
 
- The model was quantized from [amd/MiniMax-M2.7-BF16](https://huggingface.co/amd/MiniMax-M2.7-BF16), originally from [MiniMax/MiniMax-M2.7](https://huggingface.co/MiniMax/MiniMax-M2.7), using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). The weights are quantized to NVFP4 and activations are quantized to NVFP4.
+ The model was quantized from [amd/MiniMax-M2.7-BF16](https://huggingface.co/amd/MiniMax-M2.7-BF16), originally from [MiniMax/MiniMax-M2.7](https://huggingface.co/MiniMax/MiniMax-M2.7), using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). The weights and activations are quantized to NVFP4.
 
 
  **Quantization scripts:**
  ```
- cd /Quark/examples/torch/language_modeling/llm_ptq
+ cd Quark/examples/torch/language_modeling/llm_ptq
  exclude_layers="lm_head *block_sparse_moe.gate* *self_attn*"
  python3 quantize_quark.py --model_dir amd/MiniMax-M2.7-BF16 \
  --quant_scheme nvfp4 \
@@ -45,8 +48,8 @@ python3 quantize_quark.py --model_dir amd/MiniMax-M2.7-BF16 \
 
 
  For further details or issues, please refer to the AMD-Quark documentation or contact the respective developers.
-
- # Evaluation
+ # Deployment
+ ## Evaluation
  The model was evaluated on gsm8k benchmarks using the vllm framework.
 
  ### Accuracy