---
pipeline_tag: image-text-to-text
license: other
license_name: minimax-community
license_link: LICENSE
library_name: transformers
tags:
- multimodal
- moe
- agent
- coding
- video
---

# Model Overview

- **Model Architecture:** MiniMaxM3SparseForConditionalGeneration
  - **Input:** Text, Image
  - **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.1.1
- **PyTorch:** 2.10.0
- **Transformers:** 5.2.0
- **Operating System(s):** Linux
- **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
  - **Weight quantization:** OCP MXFP4, Static
  - **Activation quantization:** OCP MXFP4, Dynamic

# Model Quantization

The model was quantized from [MiniMaxAI/MiniMax-M3](https://huggingface.co/MiniMaxAI/MiniMax-M3) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Both weights and activations are quantized to MXFP4.

**Quantization scripts:**

```
cd Quark/examples/torch/language_modeling/llm_ptq/
exclude_layers="*lm_head *vision_tower* *multi_modal_projector* *patch_merge_mlp* *block_sparse_moe.gate *self_attn* *mlp.gate_proj *mlp.up_proj *mlp.down_proj"

CUDA_VISIBLE_DEVICES=0 python3 quantize_quark.py \
    --model_dir MiniMaxAI/MiniMax-M3 \
    --quant_scheme mxfp4 \
    --exclude_layers $exclude_layers \
    --output_dir /mnt/amd/MiniMax-M3-MXFP4 \
    --file2file_quantization
```

For further details or issues, please refer to the AMD-Quark documentation or contact the respective developers.

# Evaluation

The model was evaluated on the GSM8K benchmark using vLLM.

### Accuracy

| Benchmark | MiniMaxAI/MiniMax-M3 | amd/MiniMax-M3-MXFP4 (this model) | Recovery |
| --- | --- | --- | --- |
| gsm8k (flexible-extract) | 95.30 | 94.19 | 98.84% |
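
The Recovery column appears to be the quantized model's score expressed as a percentage of the baseline score. The sketch below reproduces the table value under that assumption; the function name `recovery_percent` is hypothetical and is not part of AMD-Quark or vLLM.

```python
def recovery_percent(quantized_score: float, baseline_score: float) -> float:
    """Return the quantized score as a percentage of the baseline score.

    Assumes recovery = quantized / baseline * 100, which matches the
    value reported in the table above.
    """
    if baseline_score == 0:
        raise ValueError("baseline_score must be non-zero")
    return quantized_score / baseline_score * 100.0


# Scores taken from the accuracy table (gsm8k, flexible-extract).
baseline = 95.30   # MiniMaxAI/MiniMax-M3
quantized = 94.19  # amd/MiniMax-M3-MXFP4

print(f"Recovery: {recovery_percent(quantized, baseline):.2f}%")
```

With the scores from the table, this gives 98.84% after rounding to two decimal places.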