# Voxtral Mini 4B Realtime 2602 - MLX
MLX quantized weights for Voxtral-Mini-4B-Realtime-2602 on Apple Silicon.
## Architecture
Voxtral Realtime differs from the standard Voxtral Mini 3B in two ways: it fuses modalities by adding audio embeddings to token embeddings (additive fusion) rather than concatenating the two sequences, and it replaces the bidirectional Whisper-style audio encoder with a causal transformer encoder using sliding-window attention (window = 750). Both changes make it suitable for streaming, low-latency inference.
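The two ideas can be sketched in plain NumPy (a minimal illustration only; the shapes and helper names are assumptions, not the actual model code):

```python
import numpy as np

def fuse_additive(audio_emb: np.ndarray, token_emb: np.ndarray) -> np.ndarray:
    """Additive fusion: audio and token embeddings are summed position-wise,
    so the sequence length stays fixed (concatenation would instead produce
    a sequence of combined length)."""
    assert audio_emb.shape == token_emb.shape
    return audio_emb + token_emb

def sliding_window_causal_mask(seq_len: int, window: int = 750) -> np.ndarray:
    """Boolean attention mask: position i may attend to positions j with
    i - window < j <= i (causal, limited to the last `window` positions)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

# Additive fusion keeps the sequence length at 8; concatenation would give 16.
fused = fuse_additive(np.ones((8, 16)), np.ones((8, 16)))

# With window=2, position 3 attends only to positions 2 and 3.
mask = sliding_window_causal_mask(seq_len=4, window=2)
```

This is why the encoder can run in a streaming fashion: each new frame only needs the last `window` positions of state, not the whole utterance.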
## Variants
| Folder | Quantization | Description |
|---|---|---|
| `mlx-mxfp4-mixed/` | MXFP4 mixed precision | Text decoder: MXFP4 4-bit (group_size=32). Audio encoder/projector: 8-bit affine (group_size=64). Embeddings: full precision. |
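A back-of-the-envelope way to see what the mixed scheme buys: packed weights plus one scale per quantization group. The parameter counts below are made-up placeholders, not the real layer sizes, and the per-group overhead is simplified (biases/zero-points ignored):

```python
def quantized_bytes(n_params: int, bits: int, group_size: int,
                    scale_bytes: int = 2) -> float:
    """Approximate storage for group-quantized weights:
    packed values plus one scale per group of `group_size` params."""
    packed = n_params * bits / 8
    scales = (n_params / group_size) * scale_bytes
    return packed + scales

# Hypothetical split: most params in the 4-bit decoder,
# a smaller share in the 8-bit audio encoder/projector.
decoder_gb = quantized_bytes(3_500_000_000, bits=4, group_size=32) / 1e9
encoder_gb = quantized_bytes(500_000_000, bits=8, group_size=64) / 1e9
```

The point of the split is that the text decoder dominates the parameter count, so quantizing it aggressively drives the total size, while the smaller audio path keeps 8-bit precision where quantization error is more audible.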
## License
This model is distributed under the Apache 2.0 license, following the upstream Voxtral-Mini-4B-Realtime-2602 license.
## Requirements
- Apple Silicon (M1+, M5+ recommended for native MXFP4)
- MLX >= 0.30.0
- Python 3.11+
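A quick sanity check for these requirements (a sketch only; the `mlx` distribution name and `arm64` machine string are assumptions about the target environment):

```python
import sys
import platform

def parse_version(v: str) -> tuple:
    """Turn a dotted version string like '0.30.0' into (0, 30, 0)."""
    return tuple(int(part) for part in v.split("."))

def check_environment(min_mlx: str = "0.30.0") -> list:
    """Return a list of problems; an empty list means the environment looks OK."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11+ required")
    if platform.machine() != "arm64":
        problems.append("Apple Silicon (arm64) required")
    try:
        import mlx.core  # noqa: F401  # assumed import path for MLX
        from importlib.metadata import version
        if parse_version(version("mlx")) < parse_version(min_mlx):
            problems.append(f"MLX >= {min_mlx} required")
    except ImportError:
        problems.append("MLX not installed")
    return problems
```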
## Source
Converted from `mistralai/Voxtral-Mini-4B-Realtime-2602` using oriloq-mlx.
## Model tree for NeoRoth/voxtral-4b-realtime-2602-mlx

- Base model: `mistralai/Ministral-3-3B-Base-2512`
- Finetuned: `mistralai/Voxtral-Mini-4B-Realtime-2602`