MiniMax-M3-MLX-6bit

Built with MiniMax M3.

This is an MLX (Apple Silicon) conversion of MiniMaxAI/MiniMax-M3, quantized to 6-bit (high quality).

It is a text-only extraction of the M3 backbone (the vision tower, multimodal projector and multi-token-prediction heads are not included). The model is a ~427B-parameter Mixture-of-Experts (128 experts, top-4, + 1 shared expert; first 3 layers dense), with per-head QK-norm, partial RoPE, Gemma-style RMSNorm and the SwiGLU-OAI activation.
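As a rough illustration of the routing described above, here is a minimal NumPy sketch of one MoE layer for a single token. The router scores all 128 experts, only the top 4 run, and the shared expert always runs. Two assumptions are made: the experts are plain linear maps standing in for the real SwiGLU-OAI MLPs, and the routing weights are a softmax over the selected logits. M3's actual gating function and normalization are not described in this card.

```python
import numpy as np

def moe_layer(x, router_w, experts, shared_expert, top_k=4):
    """One MoE layer for a single token (illustrative sketch).

    x: (d,) hidden state; router_w: (n_experts, d) router weights;
    experts: list of callables (d,) -> (d,); shared_expert: callable (d,) -> (d,).
    """
    logits = router_w @ x                       # one score per routed expert
    top = np.argsort(-logits)[:top_k]           # indices of the top_k experts
    gate = np.exp(logits[top] - logits[top].max())
    gate /= gate.sum()                          # assumed: softmax over the selected logits
    routed = sum(g * experts[i](x) for g, i in zip(gate, top))
    return shared_expert(x) + routed            # the shared expert sees every token

rng = np.random.default_rng(0)
d, n_experts = 16, 128
expert_mats = rng.standard_normal((n_experts, d, d)) / np.sqrt(d)
experts = [lambda h, m=m: m @ h for m in expert_mats]   # stand-ins for expert MLPs
shared_mat = rng.standard_normal((d, d)) / np.sqrt(d)
router_w = rng.standard_normal((n_experts, d))
x = rng.standard_normal(d)

y = moe_layer(x, router_w, experts, lambda h: shared_mat @ h)
print(y.shape)  # (16,)
```

Only 4 of the 128 routed experts are evaluated per token, which is why a model of this total size is far cheaper per token than a dense model with the same parameter count.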

Quantizations

Part of the MiniMax-M3 MLX collection.

| Variant | Size | Notes |
|---|---|---|
| 8-bit | ~453 GB | near-lossless |
| 6-bit (this repo) | ~346 GB | high quality |
| 4-bit | ~240 GB | balanced default |
| 3-bit | ~186 GB | smallest |
| mixed-3_6bit | ~191 GB | experts at 3-bit; attention, embeddings and router at 6- to 8-bit; best quality per GB |
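The sizes of the uniform variants follow from a back-of-the-envelope estimate. This is a minimal sketch built on a few assumptions. It takes ~427B parameters and the group size of 64 from this card. It assumes affine quantization stores one 16-bit scale and one 16-bit bias per group of 64 weights, which adds 0.5 bits per weight. It ignores the small unquantized tensors. The mixed variant is not covered because its exact per-tensor split is not listed here.

```python
PARAMS = 427e9      # approximate total parameter count (~427B, from this card)
GROUP_SIZE = 64     # quantization group size (from the provenance config)
OVERHEAD_BITS = 32  # assumed: one 16-bit scale + one 16-bit bias per group

def estimated_size_gb(bits: int) -> float:
    """Rough on-disk size in decimal GB for a uniform `bits`-bit affine quantization."""
    bits_per_weight = bits + OVERHEAD_BITS / GROUP_SIZE  # e.g. 6 + 0.5 = 6.5
    return PARAMS * bits_per_weight / 8 / 1e9

for bits in (8, 6, 4, 3):
    print(f"{bits}-bit: ~{estimated_size_gb(bits):.0f} GB")
```

Under these assumptions, each estimate lands within about 1 GB of the table's figure for the corresponding uniform variant.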

Attention / context note

MiniMax Sparse Attention (MSA) is implemented here as full causal attention. For sequences up to 2048 tokens this is numerically exact, because MSA selects every key block at that length. Beyond 2048 tokens it computes the dense attention that MSA approximates, so quality is preserved, but you give up MSA's long-context speed and memory savings.
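To see why full causal attention reproduces a block-selecting scheme exactly at short lengths, here is a minimal NumPy sketch of a generic block-selection attention. It is not MiniMax's MSA. The block-ranking rule (query dotted with each block's mean key) and the `block_size` and `top_k` values are hypothetical. The point is structural: whenever every causally visible block is selected, the kept keys are the whole prefix and the result equals dense causal attention. Once there are more blocks than `top_k`, keys are dropped and the outputs diverge.

```python
import numpy as np

def dense_causal_attention(q, k, v):
    """Standard softmax attention over (T, d) arrays with a causal mask."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True where key is in the future
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def block_selected_attention(q, k, v, block_size, top_k):
    """Hypothetical block-sparse attention: each query keeps its top_k visible key blocks."""
    T, d = q.shape
    out = np.empty_like(v)
    for t in range(T):
        n_visible = t // block_size + 1
        spans = [(b * block_size, min((b + 1) * block_size, t + 1)) for b in range(n_visible)]
        block_keys = np.stack([k[s:e].mean(axis=0) for s, e in spans])
        chosen = np.argsort(-(block_keys @ q[t]))[:top_k]  # rank blocks for this query
        keep = np.zeros(t + 1, dtype=bool)
        for b in chosen:
            s, e = spans[b]
            keep[s:e] = True
        scores = k[: t + 1][keep] @ q[t] / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[t] = w @ v[: t + 1][keep]
    return out

rng = np.random.default_rng(0)
T, d, block_size = 16, 8, 4  # 16 tokens -> at most 4 key blocks
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
dense = dense_causal_attention(q, k, v)
all_blocks = block_selected_attention(q, k, v, block_size, top_k=4)
some_blocks = block_selected_attention(q, k, v, block_size, top_k=2)
print(np.allclose(dense, all_blocks))  # True: every block is selected
print(np.allclose(dense, some_blocks))
```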

Use with mlx-lm

pip install mlx-lm

This build requires the minimax_m3 model class. Its definition, mlx_lm/models/minimax_m3.py, is included in this repo; copy it into the mlx_lm/models/ directory of your mlx-lm installation before loading the model.
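If you are unsure where mlx-lm is installed, a small helper can locate the package and copy the file for you. This is a minimal sketch that assumes you have already downloaded minimax_m3.py from this repo. `install_model_file` is a hypothetical helper, not part of mlx-lm. It only uses the standard library to find the installed `mlx_lm` package.

```python
import shutil
from importlib.util import find_spec
from pathlib import Path

def install_model_file(src, models_dir=None):
    """Copy a model definition (e.g. minimax_m3.py) into mlx_lm/models/ (hypothetical helper)."""
    if models_dir is None:
        spec = find_spec("mlx_lm")
        if spec is None or spec.origin is None:
            raise RuntimeError("mlx-lm is not installed; run `pip install mlx-lm` first")
        # spec.origin points at mlx_lm/__init__.py; model classes live next to it in models/
        models_dir = Path(spec.origin).parent / "models"
    models_dir = Path(models_dir)
    if not models_dir.is_dir():
        raise FileNotFoundError(f"models directory not found: {models_dir}")
    dest = models_dir / Path(src).name
    shutil.copy2(src, dest)  # copy the file, preserving metadata
    return dest
```

Usage: `install_model_file("minimax_m3.py")` from the directory holding the downloaded file, then load the model with `mlx_lm.load` as usual.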

from mlx_lm import load, generate

model, tokenizer = load("pipenetwork/MiniMax-M3-MLX-6bit")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain Mixture-of-Experts in one paragraph."}],
    add_generation_prompt=True,
)
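# verbose=True streams the completion to stdout as it is generated, so the
# returned string does not need to be printed again.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)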

License

Released under the MiniMax Community License (see LICENSE). Use is non-commercial by default; commercial use requires displaying "Built with MiniMax M3" and may require prior authorization from MiniMax — see the license text for details.

Provenance

Converted from the BF16 checkpoint with mlx-lm quantization. Quantization config: {"group_size": 64, "bits": 6, "mode": "affine"}, plus a per-layer override that keeps the MoE router gate of every sparse layer at 8-bit: "model.layers.N.block_sparse_moe.gate": {"group_size": 64, "bits": 8} for N = 3 through 59 (layers 0 to 2 are dense and have no router).
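The full per-layer config can be regenerated from that summary. This is a minimal sketch using plain dict logic, not an mlx-lm API. The 60-layer count is inferred from the highest layer index (59) in the config. The attention module path in the usage lines is hypothetical and only shows the fallback to the default 6 bits.

```python
def build_quant_config(first_moe_layer: int = 3, num_layers: int = 60) -> dict:
    """Rebuild the quantization config summarized above (sketch)."""
    config = {"group_size": 64, "bits": 6, "mode": "affine"}
    for i in range(first_moe_layer, num_layers):
        # Router gates of the MoE layers are kept at 8-bit.
        config[f"model.layers.{i}.block_sparse_moe.gate"] = {"group_size": 64, "bits": 8}
    return config

def bits_for(config: dict, module_path: str) -> int:
    """Bits used for a module: its override if present, otherwise the default."""
    override = config.get(module_path)
    return override["bits"] if isinstance(override, dict) else config["bits"]

cfg = build_quant_config()
print(len(cfg))                                                # 60 (3 defaults + 57 gate overrides)
print(bits_for(cfg, "model.layers.3.block_sparse_moe.gate"))  # 8
print(bits_for(cfg, "model.layers.3.self_attn.q_proj"))       # 6 (hypothetical path, falls back)
```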
