---
base_model: mistralai/Voxtral-Mini-3B-2507
library_name: mlx
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- voxtral
- audio
- speech
- speech-recognition
- transcription
- translation
- mlx
- rotorquant
- quantization
- 2-bit
---

# Voxtral-Mini-3B-2507-RotorQuant-MLX-2bit

2-bit MLX weight-quantized build of [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507) with a RotorQuant KV-cache profile. Ultra-compact, best-available 2-bit stability for streaming audio on Apple Silicon.

## Overview

- **Base:** `mistralai/Voxtral-Mini-3B-2507` (3B speech-understanding model)
- **Capabilities:** transcription, speech translation, audio QA
- **Weight precision:** 2-bit (group-wise)
- **KV-cache profile:** RotorQuant (rotational online re-basis)
- **Approx. on-disk size:** ~1 GB
- **Runtime:** MLX on Apple Silicon

RotorQuant's rotational re-basis helps 2-bit builds remain stable under distributional drift, which makes it preferred over TurboQuant at this precision for streaming workloads.

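The ~1 GB figure can be sanity-checked with some quick arithmetic. The sketch below is illustrative only: `estimated_size_gb` is a hypothetical helper, not part of MLX or this repository. It assumes that every parameter is quantized, that each group of 32 weights (the group size listed in the model specs) stores one 16-bit scale and one 16-bit bias, and that "3B" means exactly 3e9 parameters. Real checkpoints may keep some layers at higher precision, so treat the result as a rough estimate.

```python
def estimated_size_gb(num_params: float, weight_bits: int, group_size: int,
                      scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Rough size in GB (10^9 bytes) of group-wise quantized weights.

    Assumption: one scale and one bias are stored per group of weights.
    """
    # The per-group scale and bias are amortized over the weights in the group.
    overhead_bits_per_weight = (scale_bits + bias_bits) / group_size
    bits_per_weight = weight_bits + overhead_bits_per_weight
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # 2-bit weights, group size 32, nominal 3e9 parameters (all assumed quantized).
    size = estimated_size_gb(num_params=3e9, weight_bits=2, group_size=32)
    print(f"{size:.3f} GB")  # prints "1.125 GB"
```

Under these assumptions the group metadata adds about one bit per weight, so the effective cost is roughly 3 bits per weight rather than 2, which is in line with the approximate size quoted above.
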
## Quickstart

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the 2-bit weights and tokenizer from the Hugging Face Hub.
model, tokenizer = load("majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-2bit")

# Build a chat prompt that pairs an audio file with a text instruction.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": [{"type": "audio", "path": "stream.wav"}, {"type": "text", "text": "Transcribe."}]}],
    add_generation_prompt=True,
)

# Generate up to 256 tokens and print the result.
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```

## Model specs

| Field | Value |
|---|---|
| Parameters | 3B |
| Weight bits | 2 |
| Group size | 32 |
| Cache profile | RotorQuant |
| Size on disk | ~1 GB |
| Target hardware | Apple Silicon (M1/M2/M3/M4) |
| License | Apache 2.0 |

## RotorQuant vs TurboQuant

| | RotorQuant | TurboQuant |
|---|---|---|
| Strategy | Rotational online re-basis | Per-head static calibration |
| Memory reduction | ~4x on KV-cache | ~3.5x on KV-cache |
| Best for | Streaming, code-switching | Batch transcription |

## See also

- [`majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-4bit`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-4bit)
- [`majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-8bit`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-8bit)
- [`majentik/Voxtral-Mini-3B-2507-TurboQuant-MLX-2bit`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-TurboQuant-MLX-2bit)
- [`majentik/Voxtral-Mini-3B-2507-RotorQuant`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-RotorQuant) (KV-cache-only bundle)
- [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507) (upstream base model)