Voxtral-Mini-3B-2507-TurboQuant-MLX-2bit

2-bit MLX weight-quantized build of mistralai/Voxtral-Mini-3B-2507. Extreme-compression variant with a TurboQuant KV-cache profile, designed for memory-constrained Apple Silicon devices.

Hardware compatibility

| Device | Unified memory | Est. model footprint | Recommendation |
|---|---|---|---|
| Apple M4 Max | 128 GB | ~1.3 GB | Recommended; headroom for long context |
| Apple M3 Max | 64 GB | ~1.3 GB | Comfortable |
| Apple M2 Max | 32 GB | ~1.2 GB | Fits |

Overview

  • Base: mistralai/Voxtral-Mini-3B-2507 (3B-parameter speech-understanding model)
  • Capabilities: transcription, speech translation, audio QA
  • Weight precision: 2-bit (group-wise)
  • KV-cache profile: TurboQuant (per-head static calibration)
  • On-disk size: ~1 GB
  • Runtime: MLX on Apple Silicon
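"Group-wise" means each small block of weights gets its own scale and offset. Here is a minimal pure-Python illustration of 2-bit group-wise affine quantization. It assumes min/max calibration per group of 32 and uses the same w ≈ scale · q + bias form as MLX's affine mode. It is not MLX's kernel and skips bit-packing. `quantize_groups` and `dequantize_groups` are hypothetical helper names.

```python
# Illustrative sketch of group-wise affine quantization; not MLX's implementation.

def quantize_groups(weights, bits=2, group_size=32):
    """Return (codes, scales, biases) with one scale and one bias per group."""
    levels = (1 << bits) - 1  # 2-bit -> integer codes 0..3
    codes, scales, biases = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels if hi > lo else 1.0  # avoid dividing by zero on flat groups
        scales.append(scale)
        biases.append(lo)
        # Round each weight to the nearest representable level, clamped to the valid range.
        codes.extend(min(levels, max(0, round((w - lo) / scale))) for w in group)
    return codes, scales, biases


def dequantize_groups(codes, scales, biases, group_size=32):
    """Reconstruct approximate weights as code * scale + bias for each group."""
    return [c * scales[i // group_size] + biases[i // group_size]
            for i, c in enumerate(codes)]


weights = [i / 63 - 0.5 for i in range(64)]  # toy data: two groups of 32
codes, scales, biases = quantize_groups(weights)
approx = dequantize_groups(codes, scales, biases)
max_err = max(abs(w - a) for w, a in zip(weights, approx))
print(max_err)  # bounded by half a step, i.e. max(scales) / 2
```

With only four levels per group, each reconstructed weight lands within half a step of the original. At 2-bit, the per-group value range therefore largely determines accuracy.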

Expect minor word error rate (WER) degradation relative to the 4-bit build. This build works best with clean, single-speaker audio.
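To measure the WER gap on your own audio, you can score transcripts from this build and the 4-bit build against a reference transcript. This is a minimal sketch of standard word-level WER (word edit distance divided by reference length). `word_error_rate` is a hypothetical helper, and it applies no text normalization (case, punctuation), which real evaluations usually do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[j] holds the edit distance between ref[:i] and hyp[:j] for the current row i.
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                          dp[j - 1] + 1,      # insertion
                                          prev_diag + cost)   # substitution or match
    return dp[len(hyp)] / max(len(ref), 1)


# One substitution over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```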

Quickstart

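```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Download (on first use) and load the 2-bit weights and tokenizer.
model, tokenizer = load("majentik/Voxtral-Mini-3B-2507-TurboQuant-MLX-2bit")

# Build a chat prompt that pairs an audio clip with a text instruction.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": [{"type": "audio", "path": "sample.wav"},
                                  {"type": "text", "text": "Transcribe this."}]}],
    add_generation_prompt=True,
)

# Generate up to 256 new tokens and print the transcription.
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```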

Model specs

| Field | Value |
|---|---|
| Parameters | 3B |
| Weight bits | 2 |
| Group size | 32 |
| Cache profile | TurboQuant |
| Size on disk | ~1 GB |
| Target hardware | Apple Silicon (M1/M2/M3/M4) |
| License | Apache 2.0 |
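With a group size of 32, per-group metadata adds a noticeable amount on top of the 2 bits per weight. This back-of-the-envelope estimate assumes one 16-bit scale and one 16-bit bias per group (MLX's affine layout for bf16/fp16 weights). It also assumes all ~3B parameters are quantized, which may not hold in practice because some layers are often kept at higher precision. `effective_bits_per_weight` and `approx_size_gb` are hypothetical helpers.

```python
def effective_bits_per_weight(bits: int, group_size: int, param_bits: int = 16) -> float:
    """Packed weight bits plus per-group scale and bias overhead, amortized per weight."""
    # Assumption: one scale and one bias per group, each stored with `param_bits` bits.
    return bits + 2 * param_bits / group_size


def approx_size_gb(num_params: float, bits: int, group_size: int) -> float:
    """Rough size in GB (1e9 bytes) if every parameter is quantized."""
    return num_params * effective_bits_per_weight(bits, group_size) / 8 / 1e9


print(effective_bits_per_weight(2, 32))       # 3.0
print(round(approx_size_gb(3e9, 2, 32), 3))   # 1.125
```

The resulting ~1.1 GB is close to the ~1 GB on-disk figure. The exact size depends on which tensors the build actually quantizes.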

RotorQuant vs TurboQuant

| | TurboQuant | RotorQuant |
|---|---|---|
| Strategy | Per-head static calibration | Rotational online re-basis |
| KV-cache memory reduction | ~3.5x | ~4x |
| Best for | Batch transcription | Streaming / code-switching |
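The memory-reduction row is easier to interpret next to an unquantized cache size. This rough sketch assumes the ratios are relative to a 16-bit KV cache. The decoder shape (30 layers, 8 KV heads, head dimension 128) and the 32k-token context are hypothetical placeholders, not values read from this model's configuration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Unquantized KV-cache size: keys plus values for every layer, KV head and token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical decoder shape, for illustration only.
baseline = kv_cache_bytes(num_layers=30, num_kv_heads=8, head_dim=128, seq_len=32_768)
print(f"16-bit baseline: {baseline / 2**30:.2f} GiB")
for name, reduction in [("TurboQuant", 3.5), ("RotorQuant", 4.0)]:
    # Apply the approximate reduction factors from the table above.
    print(f"{name}: ~{baseline / reduction / 2**30:.2f} GiB")
```

Substitute the real layer count, KV-head count and head dimension from the model's configuration to get figures for this build.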

At 2-bit, RotorQuant often preserves quality better on drifting audio; consider the RotorQuant counterpart for streaming workloads.
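Static calibration can struggle with drifting audio because scales fixed from calibration data clip any later activations that exceed the calibrated range. This is a generic per-head static calibration with symmetric integer quantization, not TurboQuant's actual algorithm. The 4-bit width, the toy two-head data, and the helper names `calibrate_scales` and `quant_error` are assumptions for illustration.

```python
# Illustrative only: generic per-head static calibration, not TurboQuant's algorithm.

def calibrate_scales(calib, bits=4):
    """One symmetric scale per head, computed once from calibration data as absmax / qmax."""
    qmax = (1 << (bits - 1)) - 1  # 7 for signed 4-bit
    return [max(abs(x) for x in head) / qmax for head in calib]


def quant_error(head_values, scale, bits=4):
    """Mean absolute error after quantize -> clamp -> dequantize with a fixed scale."""
    qmax = (1 << (bits - 1)) - 1
    err = 0.0
    for x in head_values:
        q = max(-qmax, min(qmax, round(x / scale)))  # values beyond the calibrated range clip
        err += abs(x - q * scale)
    return err / len(head_values)


calib = [[0.1 * i for i in range(-10, 11)],   # head 0: values in [-1, 1]
         [0.5 * i for i in range(-10, 11)]]   # head 1: values in [-5, 5]
scales = calibrate_scales(calib)

in_range = quant_error(calib[0], scales[0])
drifted = quant_error([3 * x for x in calib[0]], scales[0])  # same head, 3x larger activations
print(in_range, drifted)
```

An online scheme that re-estimates its scale or basis as the input changes can avoid this clipping, at the cost of extra work per step.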

