---
base_model: mistralai/Voxtral-Mini-4B-Realtime-2602
library_name: mlx
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- voxtral
- audio
- speech
- speech-recognition
- realtime
- streaming
- asr
- mlx
- rotorquant
- quantization
- 2-bit
---

# Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-2bit

2-bit MLX weight-quantized build of [`mistralai/Voxtral-Mini-4B-Realtime-2602`](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602) with the RotorQuant KV-cache profile. An ultra-compact real-time ASR build for memory-constrained Apple Silicon, with RotorQuant intended to keep 2-bit decoding stable on streaming audio.

## Overview

- **Base:** `mistralai/Voxtral-Mini-4B-Realtime-2602` (4B real-time ASR model)
- **Weight precision:** 2-bit (group-wise)
- **KV-cache profile:** RotorQuant
- **Approx. on-disk size:** 1.2 GB
- **Runtime:** MLX on Apple Silicon
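
For readers unfamiliar with the term, "group-wise" quantization can be sketched in a few lines. The code below is a generic min/max affine scheme in plain Python, not necessarily the exact recipe used to produce this build; the function names are illustrative only.

```python
def quantize_group(values, bits=2):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1              # 3 steps for 2-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo               # scale and offset are stored per group


def dequantize_group(codes, scale, lo):
    """Reconstruct approximate floats from one group's codes."""
    return [c * scale + lo for c in codes]


def quantize_groupwise(weights, group_size=32, bits=2):
    """Quantize a flat weight list independently in chunks of group_size."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


# Toy example: 64 evenly spaced weights form two groups of 32.
weights = [i / 63 for i in range(64)]
groups = quantize_groupwise(weights, group_size=32, bits=2)
restored = [x for g in groups for x in dequantize_group(*g)]
```

Because each group carries its own scale and offset, the rounding error of any weight is bounded by half of that group's step size rather than by the range of the whole tensor.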

## Quickstart

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-2bit")

# `audio_stream()` and `emit()` are placeholders: supply your own source of
# audio chunks and your own sink for the transcribed text.
for chunk in audio_stream():
    # Wrap each audio chunk in a single-turn chat prompt.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": [{"type": "audio", "path": chunk}]}],
        add_generation_prompt=True,
    )
    # Decode a short continuation per chunk and hand it to the sink.
    emit(generate(model, tokenizer, prompt=prompt, max_tokens=32))
```
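
The Quickstart leaves `audio_stream()` undefined. One possible stand-in for offline testing is sketched below: it cuts an existing WAV file into fixed-length chunk files with the standard-library `wave` module and yields their paths. The `wav_path` argument, the one-second chunk length, and the file naming are assumptions made for illustration; they are not requirements stated by this model card, and a live microphone source would look different.

```python
import os
import tempfile
import wave


def audio_stream(wav_path, chunk_seconds=1.0, out_dir=None):
    """Yield paths to consecutive fixed-length WAV chunks cut from wav_path."""
    out_dir = out_dir or tempfile.mkdtemp(prefix="chunks_")
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the input file
            chunk_path = os.path.join(out_dir, f"chunk_{index:05d}.wav")
            with wave.open(chunk_path, "wb") as dst:
                # Reuse the source format; the frame count is fixed up on close.
                dst.setparams(params)
                dst.writeframes(frames)
            yield chunk_path
            index += 1
```

With this helper the loop would read `for chunk in audio_stream("recording.wav"):`, where `recording.wav` is a hypothetical input file. The last chunk is shorter than the others when the recording length is not a multiple of `chunk_seconds`.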

## Model specs

| Field | Value |
|---|---|
| Parameters | 4B |
| Weight bits | 2 |
| Group size | 32 |
| Cache profile | RotorQuant |
| Size on disk | ~1.2 GB |
| Target hardware | Apple Silicon (M1/M2/M3/M4) |
| License | Apache 2.0 |
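
The size figure in the table can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes that all 4B parameters are quantized and that each group of 32 weights stores one 16-bit scale and one 16-bit bias; both are assumptions, since the card does not say which tensors are quantized or how group metadata is stored.

```python
def quantized_size_bytes(n_params, bits, group_size, scale_bits=16, bias_bits=16):
    """Return (packed_weight_bytes, group_metadata_bytes) for a quantized model."""
    packed = n_params * bits // 8
    n_groups = n_params // group_size
    metadata = n_groups * (scale_bits + bias_bits) // 8
    return packed, metadata


# Values taken from the specs table: 4B parameters, 2 bits, group size 32.
packed, metadata = quantized_size_bytes(4_000_000_000, bits=2, group_size=32)
print(f"packed weights only: {packed / 1e9:.2f} GB")               # 1.00 GB
print(f"with group metadata: {(packed + metadata) / 1e9:.2f} GB")  # 1.50 GB
```

The listed ~1.2 GB falls between these two bounds. The real file size depends on the exact parameter count and on which tensors are actually stored at 2-bit.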

## RotorQuant vs TurboQuant

| | RotorQuant | TurboQuant |
|---|---|---|
| Strategy | Rotational online re-basis | Per-head static calibration |
| Memory reduction | ~4x on KV-cache | ~3.5x on KV-cache |
| Best for | Noisy/multi-speaker streams | Predictable domains, lowest p50 latency |
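
To put the memory-reduction row in concrete terms, the sketch below computes the size of an unquantized 16-bit KV-cache and applies the approximate 4x and 3.5x factors from the table. The layer count, head count, head dimension, and sequence length are hypothetical placeholders, not Voxtral's actual configuration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to hold keys and values for one sequence."""
    # The leading factor of 2 accounts for storing both K and V.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical model shape, chosen only to make the arithmetic concrete.
baseline = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)
rotor = baseline / 4.0    # ~4x reduction listed for RotorQuant
turbo = baseline / 3.5    # ~3.5x reduction listed for TurboQuant

for name, size in [("fp16 baseline", baseline), ("RotorQuant", rotor), ("TurboQuant", turbo)]:
    print(f"{name}: {size / 2**20:.1f} MiB")
```

Under these placeholder dimensions the baseline cache is 512 MiB, so the two profiles differ by under 20 MiB per sequence. The strategy and best-for rows are therefore likely to matter more than the memory gap when choosing between them.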

## See also

- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-4bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-4bit)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-8bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-8bit)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-TurboQuant-MLX-2bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-TurboQuant-MLX-2bit)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant) (KV-cache-only bundle)
- [`mistralai/Voxtral-Mini-4B-Realtime-2602`](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602) (upstream base model)