Voxtral-4B-TTS-2603-RotorQuant

RotorQuant KV-cache bundle for mistralai/Voxtral-4B-TTS-2603. RotorQuant performs a rotational online re-basis of the acoustic KV-cache; it is recommended when switching voices, languages, or styles within a batch.

This artifact ships only the quantized KV-cache; the model weights load from upstream.

Overview

  • Base model: mistralai/Voxtral-4B-TTS-2603
  • Capabilities: TTS, zero-shot voice cloning, 9 languages
  • Quantization target: attention KV-cache only
  • Method: RotorQuant (orthogonal rotation + low-bit quantization, refreshed per session)
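As a hedged illustration of the rotate-then-quantize idea (a sketch, not the shipped implementation), the mechanism can be expressed in NumPy. The shapes, the random orthogonal basis, and the symmetric int4 scheme below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(d):
    # QR decomposition of a Gaussian matrix yields an orthogonal basis
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def quantize_int4(x):
    # symmetric per-tensor int4: values mapped to the 16 levels in [-8, 7]
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

d = 64
k = rng.standard_normal((128, d)).astype(np.float32)  # toy key cache: 128 positions

R = random_orthogonal(d)            # fresh re-basis for this "session"
q, s = quantize_int4(k @ R)         # rotate, then quantize to int4
k_hat = dequantize(q, s) @ R.T      # dequantize, rotate back

err = np.abs(k - k_hat).mean()      # mean reconstruction error stays small
```

Because an orthogonal rotation preserves norms, quantization error introduced in the rotated basis maps back to a comparable error in the original basis, while the rotation spreads per-channel outliers that would otherwise inflate the int4 scale.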

Quickstart

```python
from transformers import VoxtralForConditionalGeneration, AutoProcessor
from majentik_quant import RotorQuantCache

model_id = "mistralai/Voxtral-4B-TTS-2603"
processor = AutoProcessor.from_pretrained(model_id)
model = VoxtralForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto")

# Load the pre-built RotorQuant cache bundle; weights still come from upstream.
cache = RotorQuantCache.from_pretrained("majentik/Voxtral-4B-TTS-2603-RotorQuant")

# Each utterance pairs a text line with a reference clip for zero-shot cloning.
utterances = [
    ("Hello, world.", "alice_reference.wav"),
    ("Bonjour tout le monde.", "bob_reference.wav"),
]

for line, voice in utterances:
    inputs = processor(text=line, speaker_audio=voice, return_tensors="pt")
    audio = model.generate(**inputs, past_key_values=cache, max_new_tokens=2048)
    processor.save_audio(audio, f"{line[:10]}.wav")
```

Model specs

| Field | Value |
|---|---|
| Parameters | 4B |
| Modality | Text-in, audio-out |
| Languages | 9 |
| Voice cloning | Zero-shot |
| Cache quantization | RotorQuant (rotated int4) |
| License | Apache 2.0 |

RotorQuant vs TurboQuant

| | RotorQuant | TurboQuant |
|---|---|---|
| Strategy | Rotational online re-basis | Per-head static calibration |
| Memory reduction | ~4x on KV-cache | ~3.5x on KV-cache |
| Best for | Multi-voice / multi-language batches | Single-voice, single-language sessions |
