Instructions for using majentik/MiniMax-M2.7-RotorQuant-MLX-3bit with libraries and local apps.

Libraries

How to use majentik/MiniMax-M2.7-RotorQuant-MLX-3bit with MLX:

```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("majentik/MiniMax-M2.7-RotorQuant-MLX-3bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```

Local Apps

Pi

How to use majentik/MiniMax-M2.7-RotorQuant-MLX-3bit with Pi:

Start the MLX server

```bash
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "majentik/MiniMax-M2.7-RotorQuant-MLX-3bit"
```

Configure the model in Pi

```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "majentik/MiniMax-M2.7-RotorQuant-MLX-3bit"
        }
      ]
    }
  }
}
```
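If `models.json` already lists other providers, pasting the snippet over it would drop them. A minimal sketch of merging instead, assuming the file holds one JSON object with a top-level `providers` map (the shape shown above); `add_mlx_provider` is a hypothetical helper, not part of Pi.

```python
import json
from pathlib import Path

MODEL_ID = "majentik/MiniMax-M2.7-RotorQuant-MLX-3bit"

# Provider entry copied from the models.json snippet above.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": MODEL_ID}],
}


def add_mlx_provider(config_path: Path) -> dict:
    """Add (or replace) the "mlx-lm" provider while keeping any other providers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("providers", {})["mlx-lm"] = MLX_PROVIDER
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example: `add_mlx_provider(Path.home() / ".pi" / "agent" / "models.json")`.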

Run Pi

```bash
# Start Pi in your project directory:
pi
```

Hermes Agent

How to use majentik/MiniMax-M2.7-RotorQuant-MLX-3bit with Hermes Agent:

Start the MLX server

```bash
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "majentik/MiniMax-M2.7-RotorQuant-MLX-3bit"
```

Configure Hermes

```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default majentik/MiniMax-M2.7-RotorQuant-MLX-3bit
```

Run Hermes

```bash
hermes
```

MLX LM

How to use majentik/MiniMax-M2.7-RotorQuant-MLX-3bit with MLX LM:

Start an interactive chat session

```bash
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "majentik/MiniMax-M2.7-RotorQuant-MLX-3bit"
```

Run an OpenAI-compatible server

```bash
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on http://localhost:8080 by default)
mlx_lm.server --model "majentik/MiniMax-M2.7-RotorQuant-MLX-3bit"
# Call the OpenAI-compatible server with curl (from another terminal)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "majentik/MiniMax-M2.7-RotorQuant-MLX-3bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
```
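The same request can be sent from Python with only the standard library. A minimal sketch, assuming the server listens on its default port 8080 and returns the usual OpenAI response shape; `build_chat_request` and `chat` are hypothetical helper names, split so the request can be inspected without a running server.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on its default port, 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "majentik/MiniMax-M2.7-RotorQuant-MLX-3bit"


def build_chat_request(content: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content: str, max_tokens: int = 256, timeout: float = 600.0) -> str:
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(content, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` sends the same request as the curl command. The long default timeout is an assumption meant to leave room for slow first responses from a model this large.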

Last commit to `README.md`: `fdfe53f` by majentik (docs: Tier 2 polish; variant matrix + quant trade-off).

preview code

raw

history blame contribute delete

5.91 kB

---
base_model: MiniMaxAI/MiniMax-M2.7
library_name: mlx
pipeline_tag: text-generation
license: other
license_name: minimax-model-license
license_link: https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE
tags:
- minimax
- m2.7
- moe
- quantized
- rotorquant
- kv-cache-quantization
- mlx
---

# MiniMax-M2.7-RotorQuant-MLX-3bit

MLX 3-bit quantized variant of [MiniMaxAI/MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7) with RotorQuant KV-cache compression, optimized for Apple Silicon.

## Overview

MiniMax-M2.7 is a large Mixture-of-Experts (MoE) model with 256 experts, 8 of which are active per token, and approximately 456 billion parameters in total. This variant combines 3-bit MLX weight quantization with RotorQuant KV-cache quantization for deployment on Apple Silicon.

RotorQuant applies a Hadamard rotation to keys and values before quantizing them. The rotation is orthogonal, so it can be undone after dequantization, and it spreads large outlier channels across all dimensions, narrowing the range the quantizer has to cover. At 3 bits, where naive quantization noticeably degrades output, this rotation-based smoothing is particularly valuable for preserving quality.
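To make the rotation step concrete, here is a minimal pure-Python sketch under simplifying assumptions: a fixed orthonormal Walsh-Hadamard transform, symmetric absmax quantization with one scale per vector, and a synthetic vector with a single outlier channel. The function names are hypothetical and this is not RotorQuant's actual implementation; it only illustrates why rotating before quantizing helps.

```python
import math
import random


def hadamard_transform(x: list) -> list:
    """Orthonormal fast Walsh-Hadamard transform; it is its own inverse."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b  # butterfly step
        h *= 2
    scale = 1 / math.sqrt(n)  # normalize so the transform preserves energy
    return [v * scale for v in y]


def quantize_dequantize(x: list, bits: int = 3) -> list:
    """Symmetric absmax quantization to signed integers, then back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 3 for 3-bit
    scale = max(abs(v) for v in x) / qmax or 1.0  # avoid dividing by zero
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in x]


def rotated_round_trip(x: list, bits: int = 3) -> list:
    """Rotate, quantize, dequantize, then rotate back."""
    return hadamard_transform(quantize_dequantize(hadamard_transform(x), bits))


def mse(a: list, b: list) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def compare(dim: int = 128, outlier: float = 40.0, bits: int = 3, seed: int = 0):
    """Return (plain MSE, rotated MSE) for a Gaussian vector with one outlier channel."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    x[5] = outlier  # one large channel dominates the absmax scale
    plain = mse(x, quantize_dequantize(x, bits))
    rotated = mse(x, rotated_round_trip(x, bits))
    return plain, rotated
```

`compare()` returns the reconstruction error without and with the rotation. Without it, the outlier sets a step so large that most ordinary values round to zero; after rotation the outlier's energy is shared across all 128 coordinates, so the step shrinks and the rotated error comes out lower.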

| Property | Value |
|---|---|
| Architecture | MoE (256 experts, 8 active/token) |
| Total Parameters | ~456B |
| Layers | 62 |
| Hidden Size | 3072 |
| Attention Heads | 48 |
| Weight Quantization | 3-bit (MLX) |
| KV-Cache Quantization | RotorQuant |
| Estimated Size | ~170 GB |
| Base Model | MiniMaxAI/MiniMax-M2.7 |

## Quickstart

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Download (on first use) and load the quantized weights and tokenizer
model, tokenizer = load("majentik/MiniMax-M2.7-RotorQuant-MLX-3bit")

prompt = "What is a Comprehensive Geriatric Assessment?"
messages = [{"role": "user", "content": prompt}]
# Format the conversation with the model's chat template as a plain string
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

response = generate(
    model,
    tokenizer,
    prompt=text,
    max_tokens=512,
)
print(response)
```

## RotorQuant vs TurboQuant

| Feature | RotorQuant | TurboQuant |
|---|---|---|
| Technique | Rotation-based KV quantization (Hadamard transform) | Asymmetric per-channel KV quantization |
| Throughput | Slightly lower throughput (rotation overhead) | Higher throughput, lower latency |
| Quality | Better quality preservation at low bit-widths | Good quality preservation |
| Best For | Quality-sensitive tasks, research | High-throughput serving, long contexts |

> At 3-bit quantization, RotorQuant provides meaningfully better quality than TurboQuant due to its rotation-based outlier smoothing.
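The table describes TurboQuant as asymmetric per-channel KV quantization. A minimal pure-Python sketch of that general technique, treating a key tensor as a list of token rows and giving each channel (column) its own scale and zero-point; the function names are hypothetical and this is not TurboQuant's actual code.

```python
def quantize_per_channel_asymmetric(rows: list, bits: int = 3):
    """Min/max (asymmetric) quantization with one scale and zero-point per channel."""
    levels = 2 ** bits - 1  # integer codes 0..7 for 3-bit
    params = []
    for column in zip(*rows):  # iterate over channels
        lo, hi = min(column), max(column)
        scale = (hi - lo) / levels or 1.0  # constant channel: any scale works
        params.append((scale, lo))  # zero-point is the channel minimum
    codes = [
        [round((v - lo) / scale) for v, (scale, lo) in zip(row, params)]
        for row in rows
    ]
    return codes, params


def dequantize_per_channel(codes: list, params: list) -> list:
    """Map integer codes back to floats using each channel's scale and minimum."""
    return [[q * scale + lo for q, (scale, lo) in zip(row, params)] for row in codes]
```

Because each channel gets its own range, a channel with large values (the second column in a key matrix such as `[[0.1, 10.0], [0.5, 12.0]]`) no longer widens the quantization step of the others, which is the usual motivation for quantizing keys per channel.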

## Memory Estimates (Apple Silicon)

| Variant | Estimated Size | Minimum Unified Memory |
|---|---|---|
| MLX 8-bit | ~456 GB | 512 GB (Mac Studio with M3 Ultra) |
| MLX 5-bit | ~280 GB | 384 GB |
| MLX 4-bit | ~225 GB | 256 GB |
| MLX 3-bit | ~170 GB | 192 GB |
| MLX 2-bit | ~110 GB | 128 GB |

> Note: 3-bit quantization requires Apple Silicon with at least 192 GB of unified memory, such as a Mac Studio with M2 Ultra or M3 Ultra.
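The Estimated Size column roughly matches a plain params × bits ÷ 8 calculation using the ~456B parameter count from the Overview. A minimal sketch of that arithmetic, plus a lookup over the table's minimum-memory column; the function names are hypothetical, and the optional per-group overhead is an assumption about MLX's affine quantization (a 16-bit scale and bias stored per group of weights), not a measurement of the files in this repository.

```python
from typing import Optional

# Assumption: ~456B total parameters, as stated in the Overview table.
TOTAL_PARAMS = 456e9

# Minimum unified memory (GB) per bit width, copied from the table above.
MIN_UNIFIED_MEMORY_GB = {8: 512, 5: 384, 4: 256, 3: 192, 2: 128}


def estimate_weights_gb(bits: int, params: float = TOTAL_PARAMS,
                        group_size: Optional[int] = None,
                        overhead_bits_per_group: int = 32) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes).

    With group_size=None this is the plain params * bits / 8 estimate.
    Passing a group size adds assumed per-group metadata
    (default: 16-bit scale + 16-bit bias = 32 bits per group).
    """
    effective_bits = bits
    if group_size:
        effective_bits += overhead_bits_per_group / group_size
    return params * effective_bits / 8 / 1e9


def largest_variant_that_fits(unified_memory_gb: float) -> Optional[int]:
    """Highest bit width whose listed minimum unified memory fits, else None."""
    fitting = [b for b, need in MIN_UNIFIED_MEMORY_GB.items() if need <= unified_memory_gb]
    return max(fitting) if fitting else None
```

For example, `estimate_weights_gb(3)` gives 171.0, in line with the ~170 GB above, and `largest_variant_that_fits(192)` returns 3. These figures cover weights only; the KV cache and activations need memory on top.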

## See Also

- [MiniMaxAI/MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7) -- Base model
- [majentik/MiniMax-M2.7-RotorQuant](https://huggingface.co/majentik/MiniMax-M2.7-RotorQuant) -- KV-cache only (transformers)
- [majentik/MiniMax-M2.7-TurboQuant-MLX-3bit](https://huggingface.co/majentik/MiniMax-M2.7-TurboQuant-MLX-3bit) -- TurboQuant MLX 3-bit
- [majentik/MiniMax-M2.7-RotorQuant-MLX-4bit](https://huggingface.co/majentik/MiniMax-M2.7-RotorQuant-MLX-4bit) -- MLX 4-bit
- [majentik/MiniMax-M2.7-RotorQuant-MLX-2bit](https://huggingface.co/majentik/MiniMax-M2.7-RotorQuant-MLX-2bit) -- MLX 2-bit

## Quant trade-off (MLX lane)

| Bits | Approx size | Use case | Recommendation |
|---|---|---|---|
| 2-bit | ~119 GB | Aggressive quantization | Very low-RAM Macs |
| **3-bit** | ~164 GB | Lossy but small | Low-RAM Macs |
| 4-bit | ~192 GB | Balanced default | Recommended for most Macs |
| 5-bit | ~228 GB | Higher fidelity | Quality-sensitive |
| 6-bit | ~274 GB | Approaching FP16 quality | High-fidelity |
| 8-bit | ~347 GB | Near-lossless reference | Fidelity-critical work |

(The current variant, 3-bit, is shown in bold.)

## Variants in this family

(Showing the 12 variants under `majentik/minimax-m2.7-`. The current variant, `RotorQuant-MLX-3bit`, is shown in bold.)

| Variant | Runtime | Approx size | Use case |
|---|---|---|---|
| [RotorQuant](https://huggingface.co/majentik/minimax-m2.7-rotorquant) | runtime modifier | n/a | KV-cache root (weight-agnostic) |
| [RotorQuant-MLX-2bit](https://huggingface.co/majentik/minimax-m2.7-rotorquant-mlx-2bit) | mlx-lm | ~110 GB | Apple Silicon, smallest |
| **RotorQuant-MLX-3bit** | mlx-lm | ~170 GB | Apple Silicon, small |
| [RotorQuant-MLX-4bit](https://huggingface.co/majentik/minimax-m2.7-rotorquant-mlx-4bit) | mlx-lm | ~225 GB | Apple Silicon, balanced |
| [RotorQuant-MLX-5bit](https://huggingface.co/majentik/minimax-m2.7-rotorquant-mlx-5bit) | mlx-lm | ~280 GB | Apple Silicon, higher fidelity |
| [RotorQuant-MLX-8bit](https://huggingface.co/majentik/minimax-m2.7-rotorquant-mlx-8bit) | mlx-lm | ~456 GB | Apple Silicon, reference |
| [TurboQuant](https://huggingface.co/majentik/minimax-m2.7-turboquant) | runtime modifier | n/a | KV-cache root (weight-agnostic) |
| [TurboQuant-MLX-2bit](https://huggingface.co/majentik/minimax-m2.7-turboquant-mlx-2bit) | mlx-lm | ~110 GB | Apple Silicon, smallest |
| [TurboQuant-MLX-3bit](https://huggingface.co/majentik/minimax-m2.7-turboquant-mlx-3bit) | mlx-lm | ~170 GB | Apple Silicon, small |
| [TurboQuant-MLX-4bit](https://huggingface.co/majentik/minimax-m2.7-turboquant-mlx-4bit) | mlx-lm | ~225 GB | Apple Silicon, balanced |
| [TurboQuant-MLX-5bit](https://huggingface.co/majentik/minimax-m2.7-turboquant-mlx-5bit) | mlx-lm | ~280 GB | Apple Silicon, higher fidelity |
| [TurboQuant-MLX-8bit](https://huggingface.co/majentik/minimax-m2.7-turboquant-mlx-8bit) | mlx-lm | ~456 GB | Apple Silicon, reference |
