# Voxtral Mini 4B Realtime β Q4 GGUF
Q4_0 quantized GGUF weights for Voxtral Mini 4B Realtime, converted for use with voxtral-mini-realtime-rs.
## Files
| File | Size | Description |
|---|---|---|
| `voxtral-q4.gguf` | 2.51 GB | Q4_0 quantized model weights (GGUF v3) |
| `tekken.json` | 14.9 MB | Tekken tokenizer (131,072 vocab) |
## Usage
### Native CLI
```sh
cargo run --features "wgpu,cli,hub" --bin voxtral-transcribe -- \
  --audio input.wav \
  --gguf models/voxtral-q4.gguf \
  --tokenizer models/voxtral/tekken.json
```
### Browser (WASM + WebGPU)
The Q4 GGUF is designed to run entirely client-side in a browser tab via WebGPU. See the GitHub repo for the full WASM build and dev server setup.
## Quantization Details
- Method: Q4_0 (4-bit quantization, block size 32, 18 bytes per block)
- Original model: mistralai/Voxtral-Mini-4B-Realtime-2602 (~16 GB F32)
- Quantized size: ~2.5 GB (fits in browser memory)
- Inference: Custom WGSL shader for fused GPU dequantize + matmul
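The Q4_0 layout above (32 weights per 18-byte block) can be sketched in plain Rust. This is an illustrative CPU decoder assuming the standard GGUF Q4_0 format (a little-endian f16 scale followed by 16 bytes of packed 4-bit values, each offset by 8), not the crate's actual fused WGSL shader; the function names are hypothetical.

```rust
/// Number of weights per Q4_0 block.
const QK4_0: usize = 32;

/// Minimal f16 -> f32 conversion for normal values
/// (no subnormal/Inf/NaN handling, for illustration only).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = ((bits >> 15) & 1) as u32;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x3ff) as u32;
    if exp == 0 && frac == 0 {
        return if sign == 1 { -0.0 } else { 0.0 };
    }
    // Rebias exponent from f16 (bias 15) to f32 (bias 127): 127 - 15 = 112.
    f32::from_bits((sign << 31) | ((exp + 112) << 23) | (frac << 13))
}

/// Dequantize one 18-byte Q4_0 block into 32 f32 weights:
/// 2-byte f16 scale + 16 bytes holding 32 packed nibbles.
fn dequant_q4_0(block: &[u8; 18]) -> [f32; QK4_0] {
    let scale = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
    let mut out = [0.0f32; QK4_0];
    for i in 0..16 {
        let byte = block[2 + i];
        // Low nibble -> weight i, high nibble -> weight i + 16;
        // stored values are offset by 8, giving the signed range [-8, 7].
        out[i] = scale * ((byte & 0x0f) as i32 - 8) as f32;
        out[i + 16] = scale * ((byte >> 4) as i32 - 8) as f32;
    }
    out
}

fn main() {
    // Example block: scale = 1.0 (f16 0x3C00), every nibble = 8,
    // so every dequantized weight is exactly 0.0.
    let mut block = [0u8; 18];
    block[0] = 0x00;
    block[1] = 0x3c;
    for b in &mut block[2..] {
        *b = 0x88;
    }
    let w = dequant_q4_0(&block);
    assert!(w.iter().all(|&x| x == 0.0));
    println!("first weight: {}", w[0]);
}
```

The 18/32 ratio is also where the file size comes from: 18 bytes per 32 weights is 4.5 bits per weight, so ~4B parameters land in the ~2.5 GB range instead of ~16 GB at F32.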
## Links
- Code: [github.com/TrevorS/voxtral-mini-realtime-rs](https://github.com/TrevorS/voxtral-mini-realtime-rs)
- Original model: [mistralai/Voxtral-Mini-4B-Realtime-2602](https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602)
## Model tree for TrevorJS/voxtral-mini-realtime-gguf

- Base model: mistralai/Ministral-3-3B-Base-2512
- Finetuned: mistralai/Voxtral-Mini-4B-Realtime-2602