majentik committed on · Commit aa0480e · verified · Parent(s): 371683c

Add model card

Files changed (1): README.md (+74 −0)
---
base_model: mistralai/Voxtral-Mini-4B-Realtime-2602
library_name: transformers
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- voxtral
- audio
- speech
- speech-recognition
- realtime
- streaming
- asr
- kv-cache
- rotorquant
- quantization
---
18
+
19
+ # Voxtral-Mini-4B-Realtime-2602-RotorQuant
20
+
21
+ RotorQuant KV-cache bundle for [`mistralai/Voxtral-Mini-4B-Realtime-2602`](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602). Rotational online re-basis of the attention cache — preferred for noisy, multi-speaker, or code-switching real-time streams.
22
+
23
+ This artifact ships **only the quantized KV-cache** — weights load from upstream.
24
+
## Overview

- **Base model:** `mistralai/Voxtral-Mini-4B-Realtime-2602`
- **Capabilities:** real-time ASR, streaming speech-to-text
- **Quantization target:** attention KV-cache only
- **Method:** RotorQuant (orthogonal rotation followed by low-bit quantization, refreshed per session)

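The rotate-then-quantize idea behind the method can be sketched in a few lines. This is an illustrative toy, not the actual RotorQuant implementation: it uses a single random orthogonal rotation and one per-tensor int4 scale, whereas the real method re-derives the basis online and refreshes it per session.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy KV vectors: (tokens, head_dim)
kv = rng.standard_normal((128, 64)).astype(np.float32)

# A random orthogonal rotation, obtained via QR decomposition
q, _ = np.linalg.qr(rng.standard_normal((64, 64)))

def quant_int4(x):
    # Symmetric per-tensor int4: integer levels in [-8, 7]
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -8, 7), scale

def dequant(xq, scale):
    return xq * scale

# Rotate into the new basis, quantize, then dequantize and rotate back
rotated = kv @ q
xq, s = quant_int4(rotated)
kv_hat = dequant(xq, s) @ q.T

err = np.linalg.norm(kv - kv_hat) / np.linalg.norm(kv)
print(f"relative reconstruction error: {err:.4f}")
```

Because the rotation is orthogonal, rotating back with `q.T` recovers the original basis exactly; only the int4 rounding contributes reconstruction error.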
## Quickstart

```python
from transformers import VoxtralForConditionalGeneration, AutoProcessor
from majentik_quant import RotorQuantCache

model_id = "mistralai/Voxtral-Mini-4B-Realtime-2602"

# Weights and processor load from the upstream base model
processor = AutoProcessor.from_pretrained(model_id)
model = VoxtralForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto")

# This repository supplies only the quantized KV-cache bundle
cache = RotorQuantCache.from_pretrained("majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant")

# audio_stream() and emit() are placeholders for your audio source and output sink
for chunk in audio_stream():
    inputs = processor(audio=chunk, return_tensors="pt")
    out = model.generate(**inputs, past_key_values=cache, max_new_tokens=32)
    emit(processor.batch_decode(out, skip_special_tokens=True)[0])
```

## Model specs

| Field | Value |
|---|---|
| Parameters | 4B |
| Modality | Streaming audio-in, text-out |
| Use case | Real-time ASR |
| Cache quantization | RotorQuant (rotated int4) |
| License | Apache 2.0 |

60
+ ## RotorQuant vs TurboQuant
61
+
62
+ | | RotorQuant | TurboQuant |
63
+ |---|---|---|
64
+ | Strategy | Rotational online re-basis | Per-head static calibration |
65
+ | Memory reduction | ~4x on KV-cache | ~3.5x on KV-cache |
66
+ | Best for | Noisy/multi-speaker streams | Predictable domains, lowest p50 latency |
67
+
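The ~4x figure follows directly from the bit widths: int4 storage is one quarter of fp16. A quick back-of-the-envelope check, using layer/head dimensions that are illustrative assumptions rather than Voxtral's actual configuration:

```python
# Illustrative KV-cache memory arithmetic; these dimensions are assumptions
# for the sake of the example, not the real model configuration.
layers, kv_heads, head_dim, tokens = 32, 8, 128, 4096

elements = 2 * layers * kv_heads * head_dim * tokens  # K and V tensors
fp16_mib = elements * 2 / 2**20    # 2 bytes per element
int4_mib = elements * 0.5 / 2**20  # 4 bits per element

print(f"fp16: {fp16_mib:.0f} MiB, int4: {int4_mib:.0f} MiB, "
      f"ratio: {fp16_mib / int4_mib:.1f}x")
```
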
## See also

- [`majentik/Voxtral-Mini-4B-Realtime-2602-TurboQuant`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-TurboQuant)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-8bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-8bit)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-4bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-4bit)
- [`majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-2bit`](https://huggingface.co/majentik/Voxtral-Mini-4B-Realtime-2602-RotorQuant-MLX-2bit)
- [`mistralai/Voxtral-Mini-4B-Realtime-2602`](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602) (upstream base model)