---
base_model: mistralai/Voxtral-Mini-4B-Realtime-2602
library_name: mlx
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- voxtral
- audio
- speech
- speech-recognition
- realtime
- streaming
- asr
- mlx
- rotorquant
- quantization
- 2-bit
---

# Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-2bit

2-bit MLX weight-quantized build of [`mistralai/Voxtral-Mini-4B-Realtime-2602`](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602) with the RotorQuant KV-cache profile. An ultra-compact real-time ASR build for memory-constrained Apple Silicon, aimed at stable 2-bit decoding on streaming audio.

## Overview

- **Base:** `mistralai/Voxtral-Mini-4B-Realtime-2602`, a 4B-parameter real-time ASR model
- **Weight precision:** 2-bit (group-wise)
- **KV-cache profile:** RotorQuant
- **On-disk size:** ~1.2 GB
- **Runtime:** MLX on Apple Silicon

## Quickstart

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the 2-bit weights and tokenizer from the Hugging Face Hub.
model, tokenizer = load("majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-2bit")

# audio_stream() and emit() are placeholders you supply:
# audio_stream() yields the path of each incoming audio chunk,
# emit() consumes the partial transcript for that chunk.
for chunk in audio_stream():
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": [{"type": "audio", "path": chunk}]}],
        add_generation_prompt=True,
    )
    emit(generate(model, tokenizer, prompt=prompt, max_tokens=32))
```

## Model specs

| Field | Value |
|---|---|
| Parameters | 4B |
| Weight bits | 2 |
| Group size | 32 |
| Cache profile | RotorQuant |
| Size on disk | ~1.2 GB |
| Target hardware | Apple Silicon (M1/M2/M3/M4) |
| License | Apache 2.0 |

## RotorQuant vs TurboQuant

| | RotorQuant | TurboQuant |
|---|---|---|
| Strategy | Rotational online re-basis | Per-head static calibration |
| Memory reduction | ~4x on KV-cache | ~3.5x on KV-cache |
| Best for | Noisy/multi-speaker streams | Predictable domains, lowest p50 latency |

## See also

- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-4bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-4bit)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-8bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-8bit)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-TurboQuant-MLX-2bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-TurboQuant-MLX-2bit)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant): KV-cache-only bundle
- [`mistralai/Voxtral-Mini-4B-Realtime-2602`](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602): upstream base model
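
## Example: chunk helpers for the Quickstart loop

The Quickstart loop calls `audio_stream()` and `emit()` without defining them. The following is a minimal sketch of both, assuming the input is a WAV file on disk and that each chunk is handed to the model as a file path. The function signatures, the `chunk_seconds` parameter, and the temporary-directory layout are illustrative assumptions; they are not part of this repository or of `mlx-lm`.

```python
import os
import tempfile
import wave
from typing import Iterator


def audio_stream(wav_path: str, chunk_seconds: float = 0.5) -> Iterator[str]:
    """Yield paths to consecutive fixed-length WAV chunks cut from wav_path.

    Hypothetical helper: a live deployment would read from a microphone or
    network source instead of a file.
    """
    out_dir = tempfile.mkdtemp(prefix="asr_chunks_")
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_seconds))
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of the source file
                break
            chunk_path = os.path.join(out_dir, f"chunk_{index:05d}.wav")
            # Write the chunk with the same channel count, sample width,
            # and sample rate as the source.
            with wave.open(chunk_path, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            yield chunk_path
            index += 1


def emit(text: str) -> None:
    """Print a partial transcript as soon as it is available."""
    print(text, flush=True)
```

With these definitions, the loop would iterate over `audio_stream("recording.wav")`, where `recording.wav` is a placeholder filename. The last chunk may be shorter than `chunk_seconds` when the file length is not an exact multiple of the chunk size.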