Voxtral Mini 4B Realtime 2602 - MLX

MLX-quantized weights of Voxtral-Mini-4B-Realtime-2602 for running on Apple Silicon.

Architecture

Voxtral Realtime differs from the standard Voxtral Mini 3B. It uses additive fusion (audio embeddings + token embeddings) instead of sequence concatenation, and a causal transformer encoder with sliding window attention (window=750) instead of a bidirectional Whisper-style encoder. This makes it suitable for streaming / low-latency inference.
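
A minimal sketch of those two ideas in MLX, for illustration only; the function names, shapes, and the assumption that the audio embeddings are already aligned to the token positions are assumptions here, not the model's actual code:

```python
import mlx.core as mx

def sliding_window_causal_mask(seq_len: int, window: int = 750) -> mx.array:
    # Position i may attend to positions j with i - window < j <= i.
    pos = mx.arange(seq_len)
    q = pos.reshape(-1, 1)           # query positions (column)
    k = pos.reshape(1, -1)           # key positions (row)
    causal = k <= q                  # no attention to future positions
    recent = (q - k) < window        # only the most recent `window` positions
    allowed = mx.logical_and(causal, recent)
    zeros = mx.zeros((seq_len, seq_len))
    neg_inf = mx.full((seq_len, seq_len), float("-inf"))
    return mx.where(allowed, zeros, neg_inf)  # additive attention mask

def fuse(audio_embeddings: mx.array, token_embeddings: mx.array) -> mx.array:
    # Additive fusion: audio embeddings are summed with the token embeddings at
    # the same positions, so the fused sequence keeps the text length instead of
    # growing by concatenated audio frames (assumes equal sequence lengths).
    return audio_embeddings + token_embeddings
```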

Variants

  • mlx-mxfp4-mixed/ (MXFP4 mixed precision): text decoder quantized to MXFP4 4-bit (group_size=32); audio encoder and projector to 8-bit affine (group_size=64); embeddings kept in full precision.
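
For illustration only, a hypothetical sketch of how such a mixed scheme could be expressed with MLX's nn.quantize and a class predicate. The module paths, the dict-valued predicate, and the "mode" key are assumptions about recent MLX releases, not the exact recipe used to produce these weights:

```python
import mlx.nn as nn

def mixed_predicate(path: str, module: nn.Module):
    # Hypothetical module paths; real layer names depend on the model code.
    if not hasattr(module, "to_quantized"):
        return False                      # skip modules that cannot be quantized
    if "embed" in path:
        return False                      # embeddings stay in full precision
    if path.startswith(("audio_encoder", "projector")):
        return {"bits": 8, "group_size": 64}               # 8-bit affine
    return {"bits": 4, "group_size": 32, "mode": "mxfp4"}  # text decoder

# nn.quantize(model, class_predicate=mixed_predicate)
```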

License

This model is distributed under the Apache 2.0 license, following the upstream Voxtral-Mini-4B-Realtime-2602 license.

Requirements

  • Apple Silicon (M1 or later; M5 or later recommended for native MXFP4)
  • MLX >= 0.30.0
  • Python 3.11+

Source

Converted from mistralai/Voxtral-Mini-4B-Realtime-2602 using oriloq-mlx.
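
One way to fetch just the quantized variant listed above with huggingface_hub; the repo id is this repository's, and running inference additionally requires a Voxtral-realtime-capable MLX runner, which is not shown here:

```python
from huggingface_hub import snapshot_download

# Download only the MXFP4-mixed variant from this repository.
local_dir = snapshot_download(
    repo_id="NeoRoth/voxtral-4b-realtime-2602-mlx",
    allow_patterns=["mlx-mxfp4-mixed/*"],
)
print(local_dir)
```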
