---
base_model: mistralai/Voxtral-Mini-3B-2507
library_name: mlx
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- voxtral
- audio
- speech
- speech-recognition
- transcription
- translation
- mlx
- rotorquant
- quantization
- 2-bit
---
| |
| # Voxtral-Mini-3B-2507-RotorQuant-MLX-2bit |
|
|
| 2-bit MLX weight-quantized build of [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507) with a RotorQuant KV-cache profile. An ultra-compact build in which RotorQuant provides the best available 2-bit stability for streaming audio on Apple Silicon.
|
|
| ## Overview |
|
|
| - **Base:** `mistralai/Voxtral-Mini-3B-2507` — 3B speech-understanding model |
| - **Capabilities:** transcription, speech translation, audio QA |
| - **Weight precision:** 2-bit (group-wise) |
| - **KV-cache profile:** RotorQuant (rotational online re-basis) |
| - **Approx. on-disk size:** ~1 GB |
| - **Runtime:** MLX on Apple Silicon |
|
|
| RotorQuant's rotational re-basis helps 2-bit builds remain stable under distributional drift, which is why it is preferred over TurboQuant at this precision for streaming workloads.
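
The general rotate-before-quantize idea can be sketched as follows. This is only an illustration of the broader technique (spreading per-channel outliers across dimensions with an orthogonal basis so a coarse quantizer sees a better-conditioned distribution); the actual RotorQuant update rule is not reproduced here, and the random orthogonal matrix below is a hypothetical stand-in for its online re-basis.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
keys = rng.standard_normal((8, d)).astype(np.float32)  # toy KV-cache entries

# Hypothetical re-basis: QR of a Gaussian matrix yields a random orthogonal
# basis that mixes every channel into every rotated dimension.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def quant_dequant_2bit(x):
    # Symmetric 2-bit quantizer: four uniform levels per row.
    scale = np.abs(x).max(axis=1, keepdims=True) / 2.0
    return np.clip(np.round(x / scale), -2, 1) * scale

# Quantize in the rotated basis; rotate back when the cache is read.
cached = quant_dequant_2bit(keys @ Q)
restored = cached @ Q.T
```

Because `Q` is orthogonal, the rotation is lossless on its own; all reconstruction error comes from the 2-bit quantizer, now applied to a basis without concentrated outliers.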
|
|
| ## Quickstart |
|
|
| ```bash |
| pip install mlx-lm |
| ``` |
|
|
| ```python |
| from mlx_lm import load, generate |
| |
| model, tokenizer = load("majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-2bit") |
| |
| prompt = tokenizer.apply_chat_template( |
| [{"role": "user", "content": [{"type": "audio", "path": "stream.wav"}, |
| {"type": "text", "text": "Transcribe."}]}], |
| add_generation_prompt=True, |
| ) |
| print(generate(model, tokenizer, prompt=prompt, max_tokens=256)) |
| ``` |
|
|
| ## Model specs |
|
|
| | Field | Value | |
| |---|---| |
| | Parameters | 3B | |
| | Weight bits | 2 | |
| | Group size | 32 | |
| | Cache profile | RotorQuant | |
| | Size on disk | ~1 GB | |
| | Target hardware | Apple Silicon (M1/M2/M3/M4) | |
| | License | Apache 2.0 | |
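
To make the "2-bit, group size 32" entries above concrete, here is a minimal sketch of group-wise affine quantization: each group of 32 weights shares one scale and one zero point, and every weight is stored as a 2-bit code (four levels). This is an illustrative quantizer, not the exact MLX kernel.

```python
import numpy as np

def quantize_2bit(w, group_size=32):
    """Illustrative 2-bit affine quantizer with per-group scale/zero point."""
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / 3.0, 1.0)  # 2 bits -> levels 0..3
    codes = np.clip(np.round((groups - lo) / scale), 0, 3).astype(np.uint8)
    return codes, scale, lo

def dequantize(codes, scale, lo):
    return (codes * scale + lo).reshape(-1)

w = np.random.default_rng(0).standard_normal(4096).astype(np.float32)
codes, scale, lo = quantize_2bit(w)
w_hat = dequantize(codes, scale, lo)
# Every reconstructed weight lies within half a quantization step
# (scale / 2) of the original; 4096 weights pack into 1024 bytes of codes
# plus one scale and zero point per group of 32.
```

The per-group scale/zero-point overhead is what keeps the on-disk size near the table's ~1 GB rather than the raw 2-bit lower bound.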
|
|
| ## RotorQuant vs TurboQuant |
|
|
| | | RotorQuant | TurboQuant | |
| |---|---|---| |
| | Strategy | Rotational online re-basis | Per-head static calibration | |
| | Memory reduction | ~4x on KV-cache | ~3.5x on KV-cache | |
| | Best for | Streaming, code-switching | Batch transcription | |
|
|
| ## See also |
|
|
| - [`majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-4bit`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-4bit) |
| - [`majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-8bit`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-8bit) |
| - [`majentik/Voxtral-Mini-3B-2507-TurboQuant-MLX-2bit`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-TurboQuant-MLX-2bit) |
| - [`majentik/Voxtral-Mini-3B-2507-RotorQuant`](https://huggingface.co/majentik/Voxtral-Mini-3B-2507-RotorQuant) — KV-cache-only bundle |
| - [`mistralai/Voxtral-Mini-3B-2507`](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507) — upstream base model |
|
|