metadata
base_model: mistralai/Voxtral-Mini-3B-2507
library_name: transformers
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- voxtral
- audio
- speech
- speech-recognition
- transcription
- translation
- kv-cache
- rotorquant
- quantization
Voxtral-Mini-3B-2507-RotorQuant
RotorQuant KV-cache for mistralai/Voxtral-Mini-3B-2507. Uses a rotational online re-basis of the attention cache that is robust to distributional drift across long, code-switched, or noisy audio streams.
This artifact ships only the quantized KV-cache bundle — model weights load from the upstream repo.
Overview
- Base model: mistralai/Voxtral-Mini-3B-2507
- Capabilities: transcription, speech translation, audio understanding
- Quantization target: attention KV-cache only
- Method: RotorQuant — orthogonal rotation + low-bit quantization, refreshed per session
RotorQuant trades a tiny per-session calibration pass for better low-bit stability on streaming audio. Preferred when audio domains shift mid-stream (multi-speaker meetings, code-switching, noise bursts).
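To make the "orthogonal rotation + low-bit quantization" idea concrete, here is a minimal pure-Python sketch. It is not the RotorQuant implementation: it assumes a randomized Hadamard transform as the rotation and a single symmetric int4 scale per vector, and every function name in it is hypothetical. It only illustrates why rotating before quantizing helps when one channel carries an outlier.

```python
import math
import random

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [v / norm for v in out]

def rotate(vec, signs):
    """Apply the orthogonal map H * diag(signs) (assumed stand-in for the rotation)."""
    return fwht([v * s for v, s in zip(vec, signs)])

def unrotate(vec, signs):
    """Invert rotate(): the orthonormal Hadamard matrix is its own inverse."""
    return [v * s for v, s in zip(fwht(vec), signs)]

def quantize_int4(vec):
    """Symmetric quantization to integer codes in [-8, 7], one scale per vector."""
    scale = max(abs(v) for v in vec) / 7.0 or 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in vec]
    return codes, scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def compare(num_vectors=64, head_dim=64, seed=0):
    """Average reconstruction error with and without the rotation."""
    rng = random.Random(seed)
    # A fresh sign pattern could be drawn per session; here it is fixed by the seed.
    signs = [rng.choice((-1.0, 1.0)) for _ in range(head_dim)]
    err_plain = err_rot = 0.0
    for _ in range(num_vectors):
        key = [rng.gauss(0.0, 1.0) for _ in range(head_dim)]
        key[3] *= 20.0  # synthetic outlier channel
        err_plain += mse(dequantize(*quantize_int4(key)), key)
        rotated = rotate(key, signs)
        restored = unrotate(dequantize(*quantize_int4(rotated)), signs)
        err_rot += mse(restored, key)
    return err_plain / num_vectors, err_rot / num_vectors

err_plain, err_rot = compare()
print(f"plain int4 MSE:   {err_plain:.4f}")
print(f"rotated int4 MSE: {err_rot:.4f}")
```

The rotation spreads the outlier's energy across all coordinates, so the quantization scale shrinks and the remaining channels keep more resolution. How RotorQuant chooses and refreshes its rotation is not specified here, so treat the transform above as a placeholder.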
Quickstart
from transformers import VoxtralForConditionalGeneration, AutoProcessor
from majentik_quant import RotorQuantCache
model_id = "mistralai/Voxtral-Mini-3B-2507"
processor = AutoProcessor.from_pretrained(model_id)
model = VoxtralForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto")
cache = RotorQuantCache.from_pretrained("majentik/Voxtral-Mini-3B-2507-RotorQuant")
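inputs = processor.apply_transcription_request(language="en", audio="meeting.wav", model_id=model_id)  # language="en" is an assumption; set it to match the audio
out = model.generate(**inputs, past_key_values=cache, max_new_tokens=512)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])  # decode only the newly generated tokens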
Model specs
| Field | Value |
|---|---|
| Parameters | 3B |
| Modality | Audio-in, text-out |
| Languages | Multilingual (24+) |
| Cache quantization | RotorQuant (rotated int4) |
| License | Apache 2.0 |
RotorQuant vs TurboQuant
| | RotorQuant | TurboQuant |
|---|---|---|
| Strategy | Rotational online re-basis | Per-head static calibration |
| Memory reduction | ~4x on KV-cache | ~3.5x on KV-cache |
| Best for | Streaming, code-switching audio | Batch transcription, fixed domains |
| Calibration cost | Per-session light re-basis | One-shot, fast |
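As a rough sanity check on the memory-reduction row, the sketch below computes KV-cache size from first principles. It assumes a 16-bit baseline cache, and the layer count, KV-head count, head dimension, and group size are placeholder values, not figures read from the Voxtral config or from either quantizer. The helper name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bits,
                   group_size=None, scale_bits=16):
    """Bytes needed to store keys and values for one sequence.

    If group_size is given, one scale of scale_bits is stored per group
    of quantized values (an assumed layout, for illustration only).
    """
    values = 2 * num_layers * num_kv_heads * head_dim * seq_len  # 2 = keys + values
    total_bits = values * bits
    if group_size is not None:
        total_bits += (values // group_size) * scale_bits
    return total_bits / 8

# Placeholder shape, NOT taken from the model config.
shape = dict(num_layers=30, num_kv_heads=8, head_dim=128, seq_len=8192)

fp16 = kv_cache_bytes(**shape, bits=16)
int4 = kv_cache_bytes(**shape, bits=4)
int4_grouped = kv_cache_bytes(**shape, bits=4, group_size=64)

print(f"16-bit cache:        {fp16 / 2**20:.1f} MiB")
print(f"int4 cache:          {int4 / 2**20:.1f} MiB ({fp16 / int4:.2f}x smaller)")
print(f"int4 + group scales: {int4_grouped / 2**20:.1f} MiB ({fp16 / int4_grouped:.2f}x smaller)")
```

Under these assumptions, pure 4-bit storage is exactly 4x smaller than 16-bit, and any per-group metadata pulls the ratio below 4x. The ratio does not depend on the placeholder shape, only on the bit widths and group size.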
See also
- majentik/Voxtral-Mini-3B-2507-TurboQuant: static calibrated KV-cache variant
- majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-8bit: MLX weight-quantized 8-bit
- majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-4bit: MLX weight-quantized 4-bit
- majentik/Voxtral-Mini-3B-2507-RotorQuant-MLX-2bit: MLX weight-quantized 2-bit
- mistralai/Voxtral-Mini-3B-2507: upstream base model