# Voxtral-4B-TTS-2603-RotorQuant
RotorQuant KV-cache bundle for mistralai/Voxtral-4B-TTS-2603: rotational online re-basis of the acoustic KV-cache, recommended when switching voices, languages, or styles within a batch.

This artifact ships only the quantized KV-cache; the model weights load from upstream.
## Overview
- Base model: mistralai/Voxtral-4B-TTS-2603
- Capabilities: TTS, zero-shot voice cloning, 9 languages
- Quantization target: attention KV-cache only
- Method: RotorQuant (orthogonal rotation + low-bit quantization, refreshed per session)
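The core idea behind rotate-then-quantize caching can be sketched in a few lines: rotate the cache into an orthogonal basis that spreads outlier channels across all dimensions, quantize at low bit-width, and invert the rotation on read. The NumPy sketch below is illustrative only (it is not the shipped implementation; the shapes, the per-tensor int4 scheme, and the synthetic outlier channel are all assumptions):

```python
import numpy as np

def random_orthogonal(d, rng):
    # QR decomposition of a Gaussian matrix yields a random orthogonal basis
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def quantize_int4(x):
    # symmetric per-tensor int4: round onto levels in [-7, 7], then dequantize
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -7, 7) * scale

d, tokens = 64, 256
rng = np.random.default_rng(0)
# synthetic KV slice with one large outlier channel, as attention caches often have
kv = rng.standard_normal((tokens, d))
kv[:, 0] *= 10.0

rot = random_orthogonal(d, rng)

err_plain = np.mean((quantize_int4(kv) - kv) ** 2)
# rotate, quantize, rotate back; the rotation is norm-preserving, so errors compare directly
err_rot = np.mean((quantize_int4(kv @ rot) @ rot.T - kv) ** 2)
print(err_rot < err_plain)  # rotation spreads the outlier, shrinking quantization error
```

Because a per-tensor scale is dominated by the largest channel, mixing that channel into every dimension before quantizing lowers the scale and the round-off error for everything else.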
## Quickstart
```python
from transformers import VoxtralForConditionalGeneration, AutoProcessor
from majentik_quant import RotorQuantCache

model_id = "mistralai/Voxtral-4B-TTS-2603"
processor = AutoProcessor.from_pretrained(model_id)
model = VoxtralForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto")

# The quantized KV-cache ships separately; weights come from the upstream checkpoint
cache = RotorQuantCache.from_pretrained("majentik/Voxtral-4B-TTS-2603-RotorQuant")

# utterances: iterable of (text, reference_speaker_audio) pairs
for line, voice in utterances:
    inputs = processor(text=line, speaker_audio=voice, return_tensors="pt")
    audio = model.generate(**inputs, past_key_values=cache, max_new_tokens=2048)
    processor.save_audio(audio, f"{line[:10]}.wav")
```
## Model specs
| Field | Value |
|---|---|
| Parameters | 4B |
| Modality | Text-in, audio-out |
| Languages | 9 |
| Voice cloning | Zero-shot |
| Cache quantization | RotorQuant (rotated int4) |
| License | Apache 2.0 |
## RotorQuant vs TurboQuant
| | RotorQuant | TurboQuant |
|---|---|---|
| Strategy | Rotational online re-basis | Per-head static calibration |
| Memory reduction | ~4x on KV-cache | ~3.5x on KV-cache |
| Best for | Multi-voice / multi-language batches | Single-voice, single-language sessions |
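The ~4x figure follows from simple accounting: fp16 stores 2 bytes per cache element, while int4 stores half a byte plus one scale per quantization group. A back-of-the-envelope check (the layer count, head count, head dimension, and group size below are placeholder values, not the real Voxtral-4B shapes):

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits, group=128, scale_bits=16):
    # 2x for K and V; quantized caches also store one scale per `group` elements
    elems = 2 * layers * kv_heads * head_dim * tokens
    payload = elems * bits / 8
    scales = (elems / group) * scale_bits / 8 if bits < 16 else 0
    return payload + scales

# illustrative config, not the actual model architecture
fp16 = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, tokens=4096, bits=16)
int4 = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, tokens=4096, bits=4)
print(round(fp16 / int4, 2))  # close to the ~4x headline figure
```

The scale overhead is why the achieved reduction lands slightly under the ideal 4x; smaller quantization groups trade a little of that ratio for accuracy.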
## Model tree for majentik/Voxtral-4B-TTS-2603-RotorQuant

- Base model: mistralai/Ministral-3-3B-Base-2512
- Finetuned: mistralai/Voxtral-4B-TTS-2603