---
base_model: mistralai/Voxtral-Mini-3B-2507
library_name: transformers
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
  - voxtral
  - audio
  - speech
  - speech-recognition
  - transcription
  - translation
  - kv-cache
  - rotorquant
  - quantization
---

# Voxtral-Mini-3B-2507-RotorQuant

A RotorQuant-quantized KV-cache for mistralai/Voxtral-Mini-3B-2507. RotorQuant applies a rotational online re-basis to the attention cache, keeping low-bit storage robust to distributional drift across long, code-switched, or noisy audio streams.

This artifact ships only the quantized KV-cache bundle — model weights load from the upstream repo.

## Overview

  • Base model: mistralai/Voxtral-Mini-3B-2507
  • Capabilities: transcription, speech translation, audio understanding
  • Quantization target: attention KV-cache only
  • Method: RotorQuant — orthogonal rotation + low-bit quantization, refreshed per session

RotorQuant trades a tiny per-session calibration pass for better low-bit stability on streaming audio. Preferred when audio domains shift mid-stream (multi-speaker meetings, code-switching, noise bursts).
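The RotorQuant implementation itself ships in the cache bundle, but the core idea — rotate activations with an orthogonal matrix before low-bit quantization so outlier channels stop dominating the quantization scale — can be illustrated in a minimal NumPy sketch. The random QR-based rotation below is a stand-in assumption for the session-calibrated rotation the method would refresh online; the head dimension is likewise illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative head dimension; the model's actual value may differ.
head_dim = 64

# Random orthogonal rotation via QR decomposition (a stand-in for the
# per-session calibrated rotation RotorQuant refreshes online).
Q, _ = np.linalg.qr(rng.standard_normal((head_dim, head_dim)))

def quantize_int4(x):
    """Symmetric per-row int4 quantization: values mapped into [-7, 7]."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Simulated key states with one heavy outlier channel -- the failure
# mode that rotation is meant to smooth out.
keys = rng.standard_normal((128, head_dim)).astype(np.float32)
keys[:, 0] *= 20.0

# Plain int4: the outlier channel inflates each row's scale,
# drowning out detail in the remaining channels.
q_plain, s_plain = quantize_int4(keys)
err_plain = np.abs(dequantize(q_plain, s_plain) - keys).mean()

# Rotated int4: rotate, quantize, dequantize, rotate back.
rotated = keys @ Q
q_rot, s_rot = quantize_int4(rotated)
recovered = dequantize(q_rot, s_rot) @ Q.T
err_rot = np.abs(recovered - keys).mean()

print(f"plain int4 mean abs error:   {err_plain:.4f}")
print(f"rotated int4 mean abs error: {err_rot:.4f}")
```

Because the rotation spreads the outlier channel's energy across all channels, the per-row quantization scale shrinks and the rotated variant reconstructs the original keys with noticeably lower error.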

## Quickstart

```python
from transformers import VoxtralForConditionalGeneration, AutoProcessor
from majentik_quant import RotorQuantCache

model_id = "mistralai/Voxtral-Mini-3B-2507"

# Model weights load from the upstream repo; this repo ships only the
# quantized KV-cache bundle.
processor = AutoProcessor.from_pretrained(model_id)
model = VoxtralForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto")

# Load the RotorQuant cache bundle from this repo.
cache = RotorQuantCache.from_pretrained("majentik/Voxtral-Mini-3B-2507-RotorQuant")

inputs = processor(audio="meeting.wav", return_tensors="pt")
out = model.generate(**inputs, past_key_values=cache, max_new_tokens=512)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```

## Model specs

| Field | Value |
| --- | --- |
| Parameters | 3B |
| Modality | Audio-in, text-out |
| Languages | Multilingual (24+) |
| Cache quantization | RotorQuant (rotated int4) |
| License | Apache 2.0 |
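The ~4x memory reduction follows directly from the bit widths (16-bit baseline vs 4-bit cache). A back-of-envelope sizing sketch — the layer, head, and dimension counts below are illustrative assumptions, not Voxtral's published config; only the 16/4 ratio matters:

```python
# Hypothetical decoder shape for a 3B-class model (assumed, not official).
num_layers = 32
num_kv_heads = 8
head_dim = 128
seq_len = 8192  # tokens of cached audio/text context

# Total cached values: keys and values per layer, head, position.
elements = 2 * num_layers * num_kv_heads * head_dim * seq_len

fp16_bytes = elements * 2   # 16-bit baseline
int4_bytes = elements // 2  # 4 bits per value; per-row scales ignored

print(f"fp16 cache: {fp16_bytes / 2**20:.0f} MiB")
print(f"int4 cache: {int4_bytes / 2**20:.0f} MiB")
print(f"reduction:  {fp16_bytes / int4_bytes:.1f}x")
```

In practice the stored quantization scales and rotation matrices add a small overhead, which is why the headline figure is "~4x" rather than exactly 4x.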

## RotorQuant vs TurboQuant

| | RotorQuant | TurboQuant |
| --- | --- | --- |
| Strategy | Rotational online re-basis | Per-head static calibration |
| Memory reduction | ~4x on KV-cache | ~3.5x on KV-cache |
| Best for | Streaming, code-switching audio | Batch transcription, fixed domains |
| Calibration cost | Per-session light re-basis | One-shot, fast |

## See also