Voxtral Mini 4B Realtime 2602 - MLX

MLX-quantized weights of Voxtral-Mini-4B-Realtime-2602 for running on Apple Silicon.

Architecture

Voxtral Realtime differs from the standard Voxtral Mini 3B. It uses additive fusion (audio embeddings + token embeddings) instead of sequence concatenation, and a causal transformer encoder with sliding window attention (window=750) instead of a bidirectional Whisper-style encoder. This makes it suitable for streaming / low-latency inference.
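
A minimal sketch of those two ideas in MLX, for illustration only; the function names, shapes, and the assumption that the audio embeddings are already aligned to the token positions are assumptions here, not the model's actual code:

```python
import mlx.core as mx

def sliding_window_causal_mask(seq_len: int, window: int = 750) -> mx.array:
    # Position i may attend to positions j with i - window < j <= i.
    pos = mx.arange(seq_len)
    q = pos.reshape(-1, 1)           # query positions (column)
    k = pos.reshape(1, -1)           # key positions (row)
    causal = k <= q                  # no attention to future positions
    recent = (q - k) < window        # only the most recent `window` positions
    allowed = mx.logical_and(causal, recent)
    zeros = mx.zeros((seq_len, seq_len))
    neg_inf = mx.full((seq_len, seq_len), float("-inf"))
    return mx.where(allowed, zeros, neg_inf)  # additive attention mask

def fuse(audio_embeddings: mx.array, token_embeddings: mx.array) -> mx.array:
    # Additive fusion: audio embeddings are summed with the token embeddings at
    # the same positions, so the fused sequence keeps the text length instead of
    # growing by concatenated audio frames (assumes equal sequence lengths).
    return audio_embeddings + token_embeddings
```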

Variants

  • mlx-mxfp4-mixed/ (MXFP4 mixed precision): text decoder quantized to MXFP4 4-bit (group_size=32); audio encoder and projector to 8-bit affine (group_size=64); embeddings kept in full precision.
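
For illustration only, a hypothetical sketch of how such a mixed scheme could be expressed with MLX's nn.quantize and a class predicate. The module paths, the dict-valued predicate, and the "mode" key are assumptions about recent MLX releases, not the exact recipe used to produce these weights:

```python
import mlx.nn as nn

def mixed_predicate(path: str, module: nn.Module):
    # Hypothetical module paths; real layer names depend on the model code.
    if not hasattr(module, "to_quantized"):
        return False                      # skip modules that cannot be quantized
    if "embed" in path:
        return False                      # embeddings stay in full precision
    if path.startswith(("audio_encoder", "projector")):
        return {"bits": 8, "group_size": 64}               # 8-bit affine
    return {"bits": 4, "group_size": 32, "mode": "mxfp4"}  # text decoder

# nn.quantize(model, class_predicate=mixed_predicate)
```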

License

This model is distributed under the Apache 2.0 license, following the upstream Voxtral-Mini-4B-Realtime-2602 license.

Requirements

  • Apple Silicon (M1 or later; M5 or later recommended for native MXFP4)
  • MLX >= 0.30.0
  • Python 3.11+

Source

Converted from mistralai/Voxtral-Mini-4B-Realtime-2602 using oriloq-mlx.
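
One way to fetch just the quantized variant listed above with huggingface_hub; the repo id is this repository's, and running inference additionally requires a Voxtral-realtime-capable MLX runner, which is not shown here:

```python
from huggingface_hub import snapshot_download

# Download only the MXFP4-mixed variant from this repository.
local_dir = snapshot_download(
    repo_id="NeoRoth/voxtral-4b-realtime-2602-mlx",
    allow_patterns=["mlx-mxfp4-mixed/*"],
)
print(local_dir)
```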
