---
quantized_by: moxin-org
pipeline_tag: text-generation
license: mit
base_model:
- MiniMaxAI/MiniMax-M2.1
base_model_relation: quantized
tags:
- MiniMaxAI
- MiniMaxM2ForCausalLM
- GGUF
- llama.cpp
- moxin-org
---
## Moxin x llama.cpp Customized Quant for MiniMax-M2.1
We sincerely thank the open-source community developers and contributors at [unsloth](https://huggingface.co/unsloth) for providing the `BF16 version` and `imatrix file`.
We really appreciate the attention, and we're happy to share additional quantization variants for everyone to try out and experiment with. We hope you enjoy them!
```
- Q2_K_XL : 79.04 GiB (2.97 BPW)
- MXFP4_MOE : 115.27 GiB (4.33 BPW)
- Q4_K_XL : 129.72 GiB (4.87 BPW)
- Other Quant Versions (Coming soon)
```
### Download Guide
```bash
huggingface-cli download moxin-org/MiniMax-M2.1-GGUF --include "*Q2_K_XL*" --local-dir ./MiniMax-M2.1-GGUF
```
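To fetch a different variant, swap the include pattern for its name from the list above, e.g. for `MXFP4_MOE`:
```bash
huggingface-cli download moxin-org/MiniMax-M2.1-GGUF --include "*MXFP4_MOE*" --local-dir ./MiniMax-M2.1-GGUF
```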
```bash
# !pip install huggingface_hub hf_transfer
import os
# Uncomment to use the faster hf_transfer download backend:
# os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="moxin-org/MiniMax-M2.1-GGUF",
    local_dir="MiniMax-M2.1-GGUF",
    allow_patterns=["*Q2_K_XL*"],  # e.g. "*MXFP4_MOE*" for that variant
)
```
> Downloads work via `huggingface_hub`, `huggingface-cli`, `snapshot_download`, and Xet.
### Usage
Example of running the GGUF with a local build of llama.cpp (llama-cli/llama-server).
#### Build llama.cpp locally
```bash
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
# add -DLLAMA_CURL=OFF if the build fails with a CURL-related error
cmake -B build -DGGML_CUDA=ON -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j --clean-first
```
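Without an NVIDIA GPU, a CPU-only build is the same recipe minus the CUDA flag (a minimal sketch):
```bash
# CPU-only build: drop -DGGML_CUDA=ON
cmake -B build -DBUILD_SHARED_LIBS=OFF
cmake --build build --config Release -j --clean-first
```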
```bash
build/bin/llama-cli -m MiniMax-M2.1-GGUF/Moxin-Q2_K_XL/MiniMax-M2.1-Q2_K_XL-00001-of-00004.gguf \
-ngl 99 \
--temp 1.0 \
--top-k 40 \
--top-p 0.95 \
--min-p 0.01 \
--ctx-size 8192   # or 4096, 16384
```
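The same GGUF works with `llama-server` as well; a minimal sketch (host/port are arbitrary choices here, not requirements):
```bash
build/bin/llama-server -m MiniMax-M2.1-GGUF/Moxin-Q2_K_XL/MiniMax-M2.1-Q2_K_XL-00001-of-00004.gguf \
-ngl 99 \
--ctx-size 8192 \
--host 127.0.0.1 --port 8080
```
Once up, it serves an OpenAI-compatible API (e.g. `POST /v1/chat/completions`) and a built-in web UI on that port.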
---
### Citation
If this work is helpful, please kindly cite it as:
```bibtex
@article{chen2025collaborative,
title={Collaborative Compression for Large-Scale MoE Deployment on Edge},
author={Chen, Yixiao and Xie, Yanyue and Yang, Ruining and Jiang, Wei and Wang, Wei and He, Yong and Chen, Yue and Zhao, Pu and Wang, Yanzhi},
journal={arXiv preprint arXiv:2509.25689},
year={2025}
}
```
## Acknowledgements
This repository builds upon the outstanding work of the following open-source authors and projects:
- [MiniMaxAI/MiniMax-M2.1](https://huggingface.co/MiniMaxAI/MiniMax-M2.1)
- [ggml-org/llama.cpp](https://github.com/ggml-org/llama.cpp), [unsloth.ai](https://unsloth.ai/), [bartowski](https://github.com/bartowski1182).
- [ikawrakow/ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp), [ikawrakow](https://github.com/ikawrakow), [ubergarm](https://github.com/ubergarm).
We sincerely thank them for their excellent contributions to the open-source community.