# Qwen3.5-397B-A17B-RotorQuant-MLX-4bit

A 4-bit MLX weight-quantized build of Qwen/Qwen3.5-397B-A17B (397B total / 17B active parameters, sparse MoE, multimodal), prepared with RotorQuant learned orthogonal rotors. Optimized for Apple Silicon via MLX.

4-bit RotorQuant is our recommended default for 256 GB Mac Studios: it delivers the highest fidelity attainable at 4 bits while preserving most of the FP16 original's long-context reasoning capability.

## Quickstart

```python
from mlx_lm import load, generate

model, tokenizer = load("majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-4bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Draft a short release note for a new MoE feature."}],
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True))
```

Multimodal via mlx-vlm:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template

model, processor = load("majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-4bit")
prompt = apply_chat_template(processor, config=model.config,
                             prompt="What does this UI screenshot show?", num_images=1)
print(generate(model, processor, prompt, image=["./screenshot.png"], max_tokens=512))
```

## Model Specs

| Property | Value |
| --- | --- |
| Base model | Qwen/Qwen3.5-397B-A17B |
| Architecture | Sparse Mixture-of-Experts (MoE) |
| Total parameters | 397B |
| Active per token | 17B |
| Modalities | Image + Text → Text (image-text-to-text) |
| Context window | 256K tokens |
| Weight quantization | 4-bit MLX (RotorQuant learned rotors) |
| Approx. disk footprint | ~220 GB |
| License | Apache 2.0 |
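The ~220 GB figure is consistent with simple arithmetic: 4-bit values plus per-group scale/bias metadata work out to roughly 4.5 effective bits per weight. The sketch below assumes MLX's default group size of 64 with fp16 scales and biases; the exact packing for this repo may differ slightly.

```python
# Back-of-envelope disk footprint for 4-bit grouped quantization.
params = 397e9
# 4-bit values + one fp16 scale and one fp16 bias per group of 64 weights
# (assumed MLX defaults) => (16 + 16) / 64 = 0.5 extra bits per weight.
bits_per_weight = 4 + 32 / 64
gb = params * bits_per_weight / 8 / 1e9
print(f"{gb:.0f} GB")  # → "223 GB", close to the ~220 GB listed
```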

## RotorQuant vs TurboQuant

| Aspect | RotorQuant (this repo) | TurboQuant |
| --- | --- | --- |
| Rotation | Learned orthogonal rotors (data-calibrated) | Randomized Hadamard (static) |
| Calibration | ~512-sample calibration pass | Zero-shot |
| Accuracy @ 4-bit | ~99.1% of FP16 baseline | ~98.6% of FP16 baseline |
| Best for | Highest fidelity at the same bit-width | Fastest turnaround |
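For contrast, a randomized-Hadamard rotation of the TurboQuant style needs no calibration at all: a fixed Hadamard matrix composed with random sign flips is already orthogonal. A minimal sketch using the Sylvester construction (the details of TurboQuant's exact transform are assumed, not taken from this repo):

```python
import numpy as np

def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # normalize so rows are orthonormal

rng = np.random.default_rng(0)
n = 128
D = np.diag(rng.choice([-1.0, 1.0], size=n))  # random sign flips
R = D @ hadamard(n)                            # randomized Hadamard rotation
print(np.allclose(R @ R.T, np.eye(n)))         # True: R is orthogonal
```

Since the transform is data-independent, it can be applied zero-shot, which is why the trade-off in the table is fidelity (learned rotors) versus turnaround (static Hadamard).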

## Memory Estimates (4-bit MLX)

| Context | Active memory (approx.) |
| --- | --- |
| 8K | ~228 GB |
| 32K | ~238 GB |
| 128K | ~268 GB |
| 256K | ~298 GB |
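For context lengths between the rows above, linear interpolation over the table is a reasonable rough guide (actual usage also depends on batch size and KV-cache dtype, so treat the result as an estimate only):

```python
# (context in K tokens, approx. memory in GB) from the table above.
TABLE = [(8, 228), (32, 238), (128, 268), (256, 298)]

def est_memory_gb(context_k):
    """Piecewise-linear estimate of active memory for a given context length."""
    xs = [x for x, _ in TABLE]
    ys = [y for _, y in TABLE]
    for (x0, y0), (x1, y1) in zip(TABLE, TABLE[1:]):
        if x0 <= context_k <= x1:
            return y0 + (y1 - y0) * (context_k - x0) / (x1 - x0)
    return ys[-1] if context_k > xs[-1] else ys[0]

print(est_memory_gb(64))  # → 248.0 (GB, estimated for a 64K context)
```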

## Hardware Requirements

- Minimum: Apple Silicon with 256 GB unified memory for short/medium contexts
- Recommended: 384 GB+ unified memory for the full 256K context
- Does not fit on 96 GB / 128 GB / 192 GB Macs; use the 2-bit variant or a smaller model
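A quick way to check whether a machine clears these bars, using only the Python standard library (works on macOS and Linux):

```python
import os

# Total physical (unified) memory in bytes, via POSIX sysconf.
total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
total_gb = total_bytes / 1e9
print(f"{total_gb:.0f} GB unified memory")
print("OK for short contexts" if total_gb >= 256 else "Below the 256 GB minimum")
```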
