# Qwen3.5-397B-A17B-RotorQuant-MLX-2bit
2-bit MLX weight-quantized build of Qwen/Qwen3.5-397B-A17B (397B total / 17B active Sparse MoE, multimodal) — re-quantized from the 4-bit RotorQuant MLX checkpoint for maximum compression. Optimized for Apple Silicon via MLX.
An experimental extreme-compression variant: the learned rotors from RotorQuant's calibration pass help preserve weight structure significantly better than static rotations at this bit-width. It is the highest-quality 2-bit build of Qwen3.5-397B-A17B in the Majentik suite.
## Quickstart
```python
from mlx_lm import load, generate

model, tokenizer = load("majentik/Qwen3.5-397B-A17B-RotorQuant-MLX-2bit")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Describe what a 2-bit weight means in one sentence."}],
    add_generation_prompt=True,
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=128, verbose=True))
```
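A 2-bit weight stores one of only four quantization levels per parameter, with a shared scale and offset per weight group. A minimal stdlib sketch of group-wise affine 2-bit quantization (illustrative only; MLX's actual scheme differs in group size, packing, and rounding):

```python
def quantize_2bit(group):
    """Affine 2-bit quantization: map each weight to one of 4 levels."""
    lo, hi = min(group), max(group)
    scale = (hi - lo) / 3 or 1.0      # 4 levels -> 3 steps; guard all-equal groups
    codes = [round((w - lo) / scale) for w in group]  # integer codes in {0, 1, 2, 3}
    return codes, scale, lo

def dequantize_2bit(codes, scale, lo):
    """Reconstruct approximate weights from codes plus per-group scale/offset."""
    return [c * scale + lo for c in codes]

weights = [0.12, -0.40, 0.31, 0.02, -0.11, 0.27, -0.35, 0.08]
codes, scale, lo = quantize_2bit(weights)
approx = dequantize_2bit(codes, scale, lo)  # each entry within scale/2 of the original
```

The coarse 4-level grid is why the rotation step discussed below matters: the fewer levels available, the more a single outlier inflates `scale` and degrades every other weight in the group.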
## Model Specs
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-397B-A17B |
| Architecture | Sparse Mixture-of-Experts (MoE) |
| Total parameters | 397B |
| Active per token | 17B |
| Modalities | Image + Text → Text (image-text-to-text) |
| Context window | 256K tokens |
| Weight quantization | 2-bit MLX (re-quantized from 4-bit RotorQuant) |
| Approx. disk footprint | ~135 GB |
| License | Apache 2.0 |
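The ~135 GB footprint is consistent with back-of-envelope arithmetic: packed 2-bit codes plus per-group metadata, with the remainder from tensors kept at higher precision. A rough estimate (the group size of 64 and the fp16 scale/bias per group are assumptions, not confirmed details of this checkpoint):

```python
params = 397e9            # total parameters (see specs table)
group = 64                # assumed quantization group size
code_bits = 2.0           # packed 2-bit codes
meta_bits = 32.0 / group  # assumed fp16 scale + fp16 bias per group

raw_gb = params * code_bits / 8 / 1e9                 # codes alone: ~99 GB
est_gb = params * (code_bits + meta_bits) / 8 / 1e9   # with metadata: ~124 GB
# Remaining gap to ~135 GB: embeddings, norms, and router
# weights typically kept at higher precision.
```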
## RotorQuant vs TurboQuant
| Aspect | RotorQuant (this repo) | TurboQuant |
|---|---|---|
| Rotation | Learned orthogonal rotors (data-calibrated) | Randomized Hadamard (static) |
| Calibration | ~512 sample calibration pass | Zero-shot |
| Accuracy @ 2-bit | ~95–97% of FP16 baseline (task-dependent) | ~93–95% of FP16 baseline (task-dependent) |
| Best for | Best quality at this footprint (worth the calibration pass) | Quick, calibration-free compression |
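The two rows differ in how the orthogonal rotation is chosen, not in what it does: rotating a weight group spreads outlier energy across coordinates, shrinking the range the 2-bit grid must cover, while preserving the norm. A toy illustration of the static (Hadamard) side, not either quantizer's actual code:

```python
def hadamard(n):
    """Normalized Hadamard matrix via the Sylvester construction (n a power of two)."""
    H = [[1.0]]
    while len(H) < n:
        H = [r + r for r in H] + [r + [-x for x in r] for r in H]
    s = n ** -0.5
    return [[s * x for x in row] for row in H]

def rotate(H, v):
    """Apply the orthogonal matrix H to vector v."""
    return [sum(H[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

v = [8.0, 0.1, -0.2, 0.1]   # one outlier dominates the quantization range
w = rotate(hadamard(4), v)  # energy spread across coordinates; norm unchanged
```

RotorQuant replaces the fixed `H` with orthogonal rotors learned on calibration data, which is where its quality edge at 2-bit comes from.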
## Memory Estimates (2-bit MLX)
| Context | Active memory (approx.) |
|---|---|
| 8K | ~143 GB |
| 32K | ~153 GB |
| 128K | ~183 GB |
| 256K | ~213 GB |
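For context lengths between the rows above, a piecewise-linear interpolation over the table gives a rough planning number (actual usage varies with batch size, image inputs, and runtime overhead):

```python
# (context in K tokens, approx. active memory in GB) from the table above
TABLE = [(8, 143), (32, 153), (128, 183), (256, 213)]

def est_memory_gb(ctx_k):
    """Piecewise-linear interpolation over TABLE; clamps outside its range."""
    if ctx_k <= TABLE[0][0]:
        return TABLE[0][1]
    for (x0, y0), (x1, y1) in zip(TABLE, TABLE[1:]):
        if ctx_k <= x1:
            return y0 + (y1 - y0) * (ctx_k - x0) / (x1 - x0)
    return TABLE[-1][1]
```

For example, `est_memory_gb(64)` lands at ~163 GB, between the 32K and 128K rows.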
## Hardware Requirements
- Minimum: Apple Silicon with 192 GB unified memory for short/medium contexts
- Recommended: 256 GB+ unified memory for full 256K context
- Fits on top-end Mac Studio M-series configurations; does not fit on 96 GB or 128 GB Macs
## Caveats
- Re-quantized from the 4-bit RotorQuant MLX checkpoint (not directly from FP16)
- Still the preferred 2-bit option — learned rotors meaningfully outperform Hadamard rotations at extreme bit-widths
- For production use, prefer the 4-bit or higher variants when your hardware allows
## See Also
- RotorQuant MLX: 8-bit · 6-bit · 5-bit · 4-bit
- TurboQuant MLX 2-bit: majentik/Qwen3.5-397B-A17B-TurboQuant-MLX-2bit
- Base model: Qwen/Qwen3.5-397B-A17B