Instructions for using majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit with libraries and local apps.

Libraries

How to use majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit with MLX:

# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit")
config = load_config("majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)

Local Apps

Pi

How to use majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit"
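Before configuring an agent, you can check that the server answers. The Python sketch below uses only the standard library to POST one chat request to the OpenAI-compatible `/v1/chat/completions` route. It assumes the server is on its default `127.0.0.1:8080` (the address the configs on this page use) and that the reply follows the usual OpenAI `choices[0].message.content` shape. The helper names `build_chat_request` and `chat` are illustrative, not part of mlx-lm.

```python
import json
import urllib.request

# Assumed default address of `mlx_lm.server`, matching the base URLs used on this page.
BASE_URL = "http://127.0.0.1:8080/v1"
MODEL_ID = "majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit"


def build_chat_request(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style /chat/completions payload for a single user turn."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 600.0) -> str:
    """POST the payload to the local server and return the assistant's reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    # OpenAI-compatible responses carry the text under choices[0].message.content.
    return reply["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Summarize MoE routing in one sentence."))` should print the model's reply. The long timeout is deliberate, since the first request to a model this large may be slow.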

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit"
        }
      ]
    }
  }
}
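If `~/.pi/agent/models.json` already lists other providers, the entry above should be merged in rather than pasted over the file. A minimal Python sketch, assuming the file is plain JSON with a top-level `providers` object as shown; `add_mlx_provider` is a hypothetical helper, not part of Pi.

```python
import json
from pathlib import Path

MODELS_JSON = Path.home() / ".pi" / "agent" / "models.json"

# Same provider entry as the JSON snippet above.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit"}],
}


def add_mlx_provider(path: Path = MODELS_JSON) -> dict:
    """Merge the "mlx-lm" provider into models.json, keeping any other providers."""
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("providers", {})["mlx-lm"] = MLX_PROVIDER
    path.parent.mkdir(parents=True, exist_ok=True)  # create ~/.pi/agent if missing
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Re-running it is harmless: the `mlx-lm` entry is simply overwritten with the same values.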

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit

Run Hermes

hermes

OpenClaw

How to use majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit with OpenClaw:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit"

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit" \
  --custom-provider-id mlx-lm \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Qwen3.5-397B-A17B-RotorQuant-MLX-2bit

2-bit MLX weight-quantized build of Qwen/Qwen3.5-397B-A17B, a multimodal sparse Mixture-of-Experts (MoE) model with 397B total and 17B active parameters per token. It was re-quantized from the 4-bit RotorQuant MLX checkpoint for maximum compression and is optimized for Apple Silicon via MLX.

An experimental extreme-compression variant: the learned rotors from RotorQuant's calibration pass preserve weight structure better than static rotations at this bit-width (roughly 95–97% versus 93–95% of the FP16 baseline, task-dependent, per the comparison table below). It is the highest-quality 2-bit build of Qwen3.5-397B-A17B in the Majentik suite.
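The "397B total / 17B active" split comes from sparse expert routing: a small router scores every expert for each token, and only the top-scoring few are actually run, so about 17/397 ≈ 4.3% of the parameters are used per token. The toy sketch below illustrates generic top-k routing in pure Python. The expert count, top-k value, and hidden size are hypothetical, and this is not Qwen3.5's actual implementation.

```python
import math
import random

# Hypothetical toy sizes; the real expert count, top-k and hidden size of
# Qwen3.5-397B-A17B are not given on this card.
NUM_EXPERTS = 8
TOP_K = 2
HIDDEN = 4


def make_matrix(rows, cols, seed):
    """Fixed pseudo-random weight matrix (rows x cols) for the toy model."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


ROUTER = make_matrix(NUM_EXPERTS, HIDDEN, seed=0)  # produces one score per expert
EXPERTS = [make_matrix(HIDDEN, HIDDEN, seed=i + 1) for i in range(NUM_EXPERTS)]


def moe_forward(x):
    """Route one token: score every expert, run only the TOP_K best, mix their outputs."""
    scores = apply(ROUTER, x)
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    gates = softmax([scores[i] for i in chosen])  # mixing weights over the chosen experts only
    out = [0.0] * HIDDEN
    for gate, i in zip(gates, chosen):
        y = apply(EXPERTS[i], x)  # the remaining NUM_EXPERTS - TOP_K experts are never evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen
```

A softmax over only the chosen experts' scores gives the same mixing weights as a softmax over all experts followed by renormalizing the chosen ones, so the sketch skips the full softmax.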

Quickstart

from mlx_lm import load, generate

model, tokenizer = load("majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Describe what a 2-bit weight means in one sentence."}],
    add_generation_prompt=True,
)
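# verbose=True already streams the reply and timing stats to stdout; generate() also returns the text
text = generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True)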

Model Specs

Property	Value
Base model	Qwen/Qwen3.5-397B-A17B
Architecture	Sparse Mixture-of-Experts (MoE)
Total parameters	397B
Active per token	17B
Modalities	Image + Text → Text (`image-text-to-text`)
Context window	256K tokens
Weight quantization	2-bit MLX (re-quantized from 4-bit RotorQuant)
Approx. disk footprint	~135 GB
License	Apache 2.0
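The ~135 GB disk figure can be sanity-checked with back-of-envelope arithmetic. The sketch assumes MLX-style affine quantization in which every weight is quantized with group size 64 and each group stores a 16-bit scale and a 16-bit bias, which adds 0.5 bits per weight. mlx-lm's default group size is 64, but this card does not state the settings actually used.

```python
# Back-of-envelope weight storage for a 397B-parameter model.
# Assumptions (not stated on this card): all weights quantized, group size 64,
# one 16-bit scale + one 16-bit bias per group.
TOTAL_PARAMS = 397e9


def effective_bits(bits: int, group_size: int = 64, meta_bits: int = 32) -> float:
    """Bits per weight including the per-group scale + bias overhead."""
    return bits + meta_bits / group_size


def weights_gb(bits: float, params: float = TOTAL_PARAMS) -> float:
    """Decimal gigabytes needed to store `params` weights at `bits` bits each."""
    return params * bits / 8 / 1e9


for label, b in [("FP16", 16.0), ("4-bit MLX", effective_bits(4)), ("2-bit MLX", effective_bits(2))]:
    print(f"{label:>9}: {b:5.2f} bits/weight -> ~{weights_gb(b):.1f} GB")
```

Under these assumptions the 2-bit weights alone come to about 124 GB, versus about 794 GB at FP16 and about 223 GB for a 4-bit build. The gap to the card's ~135 GB is plausibly tensors kept at higher precision or different group settings; the card does not say which.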

RotorQuant vs TurboQuant

Aspect	RotorQuant (this repo)	TurboQuant
Rotation	Learned orthogonal rotors (data-calibrated)	Randomized Hadamard (static)
Calibration	~512-sample calibration pass	None (zero-shot)
Accuracy @ 2-bit	~95–97% of FP16 baseline (task-dependent)	~93–95% of FP16 baseline (task-dependent)
Best for	Maximum quality when squeezing the model into limited memory	Fitting the model into small memory budgets without a calibration pass
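Why rotate before quantizing? A single outlier stretches a group's min-max range, so a 2-bit grid (only four levels) wastes its resolution on that one value. Multiplying the group by an orthogonal matrix spreads the outlier's energy across all entries, and the inverse rotation restores the original basis after dequantization. The pure-Python sketch below uses a fixed (non-randomized) normalized Walsh-Hadamard transform, a simplified form of the static approach in the TurboQuant column. It does not implement RotorQuant, whose learned rotors would replace the fixed Hadamard matrix with a calibrated orthogonal one. The weight values and the min-max quantizer are illustrative assumptions.

```python
import math


def quantize_dequantize(xs, bits=2):
    """Asymmetric min-max quantization of one group to `bits` bits, then back to floats."""
    lo, hi = min(xs), max(xs)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels or 1.0  # avoid division by zero for a constant group
    return [lo + round((x - lo) / scale) * scale for x in xs]


def hadamard(xs):
    """Orthonormal Walsh-Hadamard transform (length must be a power of two).

    The normalized Hadamard matrix is symmetric and orthogonal, so this
    function is its own inverse."""
    ys = list(xs)
    h = 1
    while h < len(ys):
        for i in range(0, len(ys), 2 * h):
            for j in range(i, i + h):
                a, b = ys[j], ys[j + h]
                ys[j], ys[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(len(ys))
    return [y / norm for y in ys]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weight group with one large outlier.
w = [0.1, -0.2, 0.15, -0.05, 0.12, -0.1, 0.08, 4.0]

plain = quantize_dequantize(w)
rotated = hadamard(quantize_dequantize(hadamard(w)))  # rotate, quantize, rotate back

err_plain, err_rotated = mse(w, plain), mse(w, rotated)
print(f"2-bit MSE without rotation:       {err_plain:.4f}")
print(f"2-bit MSE with Hadamard rotation: {err_rotated:.4f}")
```

For this toy group the rotated path gives a mean squared error of roughly 0.021 against roughly 0.053 without rotation. Because the transform is orthonormal, the error measured in the rotated space carries over unchanged to the original weights.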

Memory Estimates (2-bit MLX)

Context	Active memory (approx.)
8K	~143 GB
32K	~153 GB
128K	~183 GB
256K	~213 GB

Hardware Requirements

Minimum: Apple Silicon with 192 GB unified memory for short and medium contexts (the 8K and 32K estimates above leave headroom; 128K at ~183 GB leaves very little)
Recommended: 256 GB+ unified memory for full 256K context
Fits on top-end Mac Studio M-series configurations; does not fit on 96 GB or 128 GB Macs
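The memory table and the hardware guidance can be combined into a quick fit check. This sketch only consults the four listed context sizes (no interpolation) and reserves a share of unified memory for macOS and other apps through `usable_fraction`; the 0.9 default is a hypothetical headroom figure, not a measured macOS limit, and `max_listed_context_k` is an illustrative helper.

```python
# Approximate active memory from the "Memory Estimates (2-bit MLX)" table,
# keyed by context length in thousands of tokens (K).
MEMORY_GB_BY_CONTEXT_K = {8: 143, 32: 153, 128: 183, 256: 213}


def max_listed_context_k(unified_memory_gb, usable_fraction=0.9):
    """Largest listed context (in K tokens) whose estimate fits the usable budget.

    usable_fraction is a hypothetical headroom factor, not a measured value.
    Returns None if even the smallest listed context does not fit."""
    budget = unified_memory_gb * usable_fraction
    fitting = [k for k, gb in MEMORY_GB_BY_CONTEXT_K.items() if gb <= budget]
    return max(fitting) if fitting else None


for mem in (96, 128, 192, 256, 512):
    print(f"{mem} GB -> {max_listed_context_k(mem)}")
```

With that default, 192 GB reaches the 32K row, 256 GB covers the full 256K row, and 96 GB or 128 GB fit none of them, which lines up with the requirements above.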

Caveats

Re-quantized from the 4-bit RotorQuant MLX checkpoint (not directly from FP16)
Still the preferred 2-bit option: learned rotors outperform Hadamard rotations at extreme bit-widths (see the accuracy row in the comparison table above)
For production use, prefer the 4-bit or higher variants when your hardware allows

Model tree for majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit

Base model: Qwen/Qwen3.5-397B-A17B
Quantized (69 models, including this one)

Collection including majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit

Qwen 3.5 — quantized

Quantized GGUF and MLX packs of the Qwen 3.5 model family, including the 397B-A17B MoE (23 items).
