Voxtral-Mini-4B-Realtime-2602-TurboQuant
TurboQuant KV-cache bundle for mistralai/Voxtral-Mini-4B-Realtime-2602, a 4B-parameter real-time ASR model optimized for low-latency streaming transcription.
This artifact ships only the quantized KV-cache; weights load from the upstream repo.
Overview
- Base model: mistralai/Voxtral-Mini-4B-Realtime-2602 (Apache 2.0, ~864K downloads)
- Capabilities: real-time ASR, streaming speech-to-text
- Quantization target: attention KV-cache only
- Method: TurboQuant (per-head, per-channel calibrated cache quantization)
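To make "per-channel calibrated" concrete, here is a minimal sketch of symmetric per-channel quantization for a single attention head, written in plain Python. It is a generic illustration and not TurboQuant's actual algorithm: the function names, the max-absolute-value calibration rule, and the sample data are all assumptions made for this example. A per-head scheme would repeat the same calibration independently for each head.

```python
from typing import List

def calibrate_scales(samples: List[List[float]], bits: int = 8) -> List[float]:
    """Derive one scale per channel from the peak |value| seen in calibration data."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    n_channels = len(samples[0])
    scales = []
    for c in range(n_channels):
        peak = max(abs(row[c]) for row in samples)
        scales.append(peak / qmax if peak > 0 else 1.0)
    return scales

def quantize(row: List[float], scales: List[float], bits: int = 8) -> List[int]:
    """Map floats to signed integers, clamping to the representable range."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(x / s))) for x, s in zip(row, scales)]

def dequantize(q: List[int], scales: List[float]) -> List[float]:
    """Recover approximate floats from the stored integers."""
    return [v * s for v, s in zip(q, scales)]

# Hypothetical calibration data for one head: 3 tokens x 4 channels.
calib = [
    [0.5, -2.0, 0.01, 8.0],
    [-1.0, 1.5, 0.02, -4.0],
    [0.25, 0.5, -0.04, 2.0],
]
scales = calibrate_scales(calib)
q = quantize(calib[0], scales)
restored = dequantize(q, scales)
```

Because each channel gets its own scale, a small-magnitude channel (the third column above) keeps roughly the same relative precision as a large-magnitude one, which a single shared scale would not provide.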
Real-time ASR keeps a rolling audio context; TurboQuant reduces cache memory and memory-bandwidth pressure, allowing longer sessions on the same device without dropping frames.
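The memory effect can be estimated with simple arithmetic. The sketch below computes KV-cache size for a given number of cached tokens and applies the ~3.5x reduction quoted in this card. The layer count, KV-head count, head dimension, and 2-byte baseline precision are placeholder values chosen for illustration, not the published architecture of the base model.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values (hence the factor of 2)."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value

# Placeholder dimensions (assumptions, not the real model configuration).
LAYERS, KV_HEADS, HEAD_DIM = 26, 8, 128
TOKENS = 10_000
REDUCTION = 3.5  # approximate KV-cache reduction quoted for TurboQuant

baseline = kv_cache_bytes(TOKENS, LAYERS, KV_HEADS, HEAD_DIM)
compressed = baseline / REDUCTION

print(f"baseline:   {baseline / 1e6:.1f} MB")
print(f"compressed: {compressed / 1e6:.1f} MB")
```

Under a fixed memory budget, the same factor applies in the other direction: a cache that is ~3.5x smaller per token can hold roughly 3.5x as many tokens before the budget is exhausted.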
Quickstart
from transformers import VoxtralForConditionalGeneration, AutoProcessor
from majentik_quant import TurboQuantCache

model_id = "mistralai/Voxtral-Mini-4B-Realtime-2602"
processor = AutoProcessor.from_pretrained(model_id)
model = VoxtralForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto")
cache = TurboQuantCache.from_pretrained("majentik/Voxtral-Mini-4B-Realtime-2602-TurboQuant")

# audio_stream() and emit() are placeholders supplied by your application.
for chunk in audio_stream():  # 20 ms PCM chunks
    inputs = processor(audio=chunk, return_tensors="pt")
    out = model.generate(**inputs, past_key_values=cache, max_new_tokens=32)
    emit(processor.batch_decode(out, skip_special_tokens=True)[0])
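The quickstart assumes a source of 20 ms PCM chunks. One possible way to produce them from a raw byte buffer is sketched below. The 16 kHz sample rate, 16-bit mono format, zero-padding of the final chunk, and the name `pcm_chunks` are assumptions for illustration; check the processor's expected input format before relying on them.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed sample rate
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
CHUNK_MS = 20            # chunk duration used in the quickstart

def pcm_chunks(pcm: bytes,
               sample_rate: int = SAMPLE_RATE_HZ,
               chunk_ms: int = CHUNK_MS) -> Iterator[bytes]:
    """Yield fixed-size chunks of raw PCM, zero-padding the last one."""
    chunk_bytes = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        if len(chunk) < chunk_bytes:
            # Pad with silence so every chunk has the same duration.
            chunk += b"\x00" * (chunk_bytes - len(chunk))
        yield chunk

# One second of silence splits into 50 chunks of 640 bytes each.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(pcm_chunks(one_second))
```

In a live setting the buffer would come from a microphone or network socket rather than an in-memory byte string, but the framing logic stays the same.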
Model specs
| Field | Value |
|---|---|
| Parameters | 4B |
| Modality | Streaming audio-in, text-out |
| Use case | Real-time ASR |
| Cache quantization | TurboQuant (int8 heads, int4 channels) |
| License | Apache 2.0 |
RotorQuant vs TurboQuant
| | TurboQuant | RotorQuant |
|---|---|---|
| Strategy | Per-head static calibration | Rotational online re-basis |
| Memory reduction | ~3.5x on KV-cache | ~4x on KV-cache |
| Best for | Predictable domains, lowest p50 latency | Noisy/multi-speaker streams |
| Calibration cost | One-shot, fast | Per-session light re-basis |
TurboQuant is the lowest-latency option. RotorQuant preserves more quality when domains drift mid-session.
Model tree for majentik/Voxtral-Mini-4B-Realtime-2602-TurboQuant
- Base model: mistralai/Ministral-3-3B-Base-2512
- Finetuned: mistralai/Voxtral-Mini-4B-Realtime-2602