---
base_model: mistralai/Voxtral-Mini-3B-2507
library_name: mlx
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- voxtral
- audio
- speech
- speech-recognition
- transcription
- translation
- mlx
- rotorquant
- quantization
- 2-bit
---
| |
| # Voxtral-Mini-3B-2507-RotorQuant-MLX-2bit |
|
|
| 2-bit MLX weight-quantized build of [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507) with a RotorQuant KV-cache profile. An ultra-compact build in which RotorQuant provides the best available 2-bit stability for streaming audio on Apple Silicon.
|
|
| ## Overview |
|
|
| - **Base:** `mistralai/Voxtral-Mini-3B-2507` — 3B speech-understanding model |
| - **Capabilities:** transcription, speech translation, audio QA |
| - **Weight precision:** 2-bit (group-wise) |
| - **KV-cache profile:** RotorQuant (rotational online re-basis) |
| - **Approx. on-disk size:** ~1 GB |
| - **Runtime:** MLX on Apple Silicon |
|
|
| RotorQuant's rotational re-basis helps 2-bit builds remain stable under distributional drift, which is why it is preferred over TurboQuant at this precision for streaming workloads.
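
The general rotate-before-quantize idea can be sketched as follows. This is only an illustration of the broader technique (spreading per-channel outliers across dimensions with an orthogonal basis so a coarse quantizer sees a better-conditioned distribution); the actual RotorQuant update rule is not reproduced here, and the random orthogonal matrix below is a hypothetical stand-in for its online re-basis.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
keys = rng.standard_normal((8, d)).astype(np.float32)  # toy KV-cache entries

# Hypothetical re-basis: QR of a Gaussian matrix yields a random orthogonal
# basis that mixes every channel into every rotated dimension.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def quant_dequant_2bit(x):
    # Symmetric 2-bit quantizer: four uniform levels per row.
    scale = np.abs(x).max(axis=1, keepdims=True) / 2.0
    return np.clip(np.round(x / scale), -2, 1) * scale

# Quantize in the rotated basis; rotate back when the cache is read.
cached = quant_dequant_2bit(keys @ Q)
restored = cached @ Q.T
```

Because `Q` is orthogonal, the rotation is lossless on its own; all reconstruction error comes from the 2-bit quantizer, now applied to a basis without concentrated outliers.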
|
|
| ## Quickstart |
|
|
| ```bash |
| pip install mlx-lm |
| ``` |
|
|
| ```python |
| from mlx_lm import load, generate |
| |
| model, tokenizer = load("majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-2bit") |
| |
| prompt = tokenizer.apply_chat_template( |
| [{"role": "user", "content": [{"type": "audio", "path": "stream.wav"}, |
| {"type": "text", "text": "Transcribe."}]}], |
| add_generation_prompt=True, |
| ) |
| print(generate(model, tokenizer, prompt=prompt, max_tokens=256)) |
| ``` |
|
|
| ## Model specs |
|
|
| | Field | Value | |
| |---|---| |
| | Parameters | 3B | |
| | Weight bits | 2 | |
| | Group size | 32 | |
| | Cache profile | RotorQuant | |
| | Size on disk | ~1 GB | |
| | Target hardware | Apple Silicon (M1/M2/M3/M4) | |
| | License | Apache 2.0 | |
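
To make the "2-bit, group size 32" entries above concrete, here is a minimal sketch of group-wise affine quantization: each group of 32 weights shares one scale and one zero point, and every weight is stored as a 2-bit code (four levels). This is an illustrative quantizer, not the exact MLX kernel.

```python
import numpy as np

def quantize_2bit(w, group_size=32):
    """Illustrative 2-bit affine quantizer with per-group scale/zero point."""
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / 3.0, 1.0)  # 2 bits -> levels 0..3
    codes = np.clip(np.round((groups - lo) / scale), 0, 3).astype(np.uint8)
    return codes, scale, lo

def dequantize(codes, scale, lo):
    return (codes * scale + lo).reshape(-1)

w = np.random.default_rng(0).standard_normal(4096).astype(np.float32)
codes, scale, lo = quantize_2bit(w)
w_hat = dequantize(codes, scale, lo)
# Every reconstructed weight lies within half a quantization step
# (scale / 2) of the original; 4096 weights pack into 1024 bytes of codes
# plus one scale and zero point per group of 32.
```

The per-group scale/zero-point overhead is what keeps the on-disk size near the table's ~1 GB rather than the raw 2-bit lower bound.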
|
|
| ## RotorQuant vs TurboQuant |
|
|
| | | RotorQuant | TurboQuant | |
| |---|---|---| |
| | Strategy | Rotational online re-basis | Per-head static calibration | |
| | Memory reduction | ~4x on KV-cache | ~3.5x on KV-cache | |
| | Best for | Streaming, code-switching | Batch transcription | |
|
|
| ## See also |
|
|
| - [`majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-4bit`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-4bit) |
| - [`majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-8bit`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-8bit) |
| - [`majentik/Voxtral-Mini-3B-2507-TurboQuant-MLX-2bit`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-TurboQuant-MLX-2bit) |
| - [`majentik/Voxtral-Mini-3B-2507-RotorQuant`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-RotorQuant) — KV-cache-only bundle |
| - [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507) — upstream base model |
|
|