How to use mlx-community/Voxtral-Mini-4B-Realtime-6bit with MLX:
```shell
# Download the model from the Hub (the extra is quoted so zsh, the macOS default shell, doesn't glob the brackets)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Voxtral-Mini-4B-Realtime-6bit mlx-community/Voxtral-Mini-4B-Realtime-6bit
```
VoxtralKit: Swift-native inference for this model on Apple Silicon
Discussion #1, opened by omidbicker
Hi!
For a project of mine I needed a Swift package that runs the model on-device, and I thought others might benefit from it as well.
VoxtralKit is a Swift package that runs this model entirely on-device via MLX Swift. It's the first Swift implementation of Voxtral, designed to make on-device speech-to-text accessible to Swift/macOS developers.
Quick example
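```swift
import Foundation   // for URL
import VoxtralKit

// modelDirectory points at the downloaded Voxtral-Mini-4B-Realtime-6bit folder
let transcriber = try VoxtralTranscriber(modelPath: modelDirectory)
let text = try transcriber.transcribe(URL(fileURLWithPath: "recording.wav"))
// "Hello, this is a test of the Voxtral speech recognition system."
```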
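The quick example assumes `modelDirectory` already points at the downloaded model. Below is a minimal sketch for building it from the folder that the `huggingface-cli download --local-dir Voxtral-Mini-4B-Realtime-6bit ...` command creates, failing early if that folder is missing. `resolveModelDirectory` and `ModelLocatorError` are hypothetical helpers for illustration, not part of VoxtralKit.

```swift
import Foundation

/// Hypothetical error type (not part of VoxtralKit).
enum ModelLocatorError: Error {
    case missingDirectory(String)
}

/// Hypothetical helper: turns a path such as "Voxtral-Mini-4B-Realtime-6bit"
/// into a file URL and throws if it does not name an existing directory.
func resolveModelDirectory(_ path: String) throws -> URL {
    let url = URL(fileURLWithPath: path, isDirectory: true).standardizedFileURL
    var isDirectory: ObjCBool = false
    guard FileManager.default.fileExists(atPath: url.path, isDirectory: &isDirectory),
          isDirectory.boolValue else {
        throw ModelLocatorError.missingDirectory(url.path)
    }
    return url
}
```

Whether you then pass the returned URL or its `.path` depends on the parameter type `VoxtralTranscriber(modelPath:)` actually declares, which isn't stated here.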
What it includes
- Full inference pipeline: mel spectrogram → CausalWhisperEncoder (32 layers) → decoder (26 layers, grouped-query attention (GQA) + AdaptiveNorm)
- Works with this 6-bit quantized model out of the box
- Decode-only Tekken tokenizer
- Supports all 13 languages the model supports
- Apache 2.0 licensed
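In grouped-query attention (GQA), as used in the decoder, several query heads share one key/value head, which shrinks the KV cache. The sketch below shows the head mapping for a single decoding step on plain Swift arrays. It is illustrative only, not VoxtralKit's MLX implementation; `groupedQueryAttention` and its array layout are hypothetical, and the model's real head counts aren't given in this post.

```swift
import Foundation

/// Hypothetical single-step GQA on plain arrays.
/// q: [numQueryHeads][headDim]; k, v: [numKVHeads][seqLen][headDim].
func groupedQueryAttention(q: [[Float]], k: [[[Float]]], v: [[[Float]]]) -> [[Float]] {
    let numQueryHeads = q.count
    let numKVHeads = k.count
    precondition(numQueryHeads > 0 && numKVHeads > 0 && numQueryHeads % numKVHeads == 0,
                 "query heads must be a non-zero multiple of KV heads")
    let groupSize = numQueryHeads / numKVHeads
    let headDim = q[0].count
    let scale = 1 / Float(headDim).squareRoot()

    return (0..<numQueryHeads).map { h -> [Float] in
        // Consecutive query heads in the same group read the same KV head.
        let kvHead = h / groupSize
        let keys = k[kvHead], values = v[kvHead]

        // Scaled dot-product score against every cached position.
        let scores: [Float] = keys.map { key in
            let dot = zip(q[h], key).reduce(Float(0)) { $0 + $1.0 * $1.1 }
            return dot * scale
        }

        // Numerically stable softmax over positions.
        let maxScore = scores.max() ?? 0
        let exps = scores.map { exp($0 - maxScore) }
        let total = exps.reduce(0, +)

        // Attention-weighted sum of the value vectors.
        var out = [Float](repeating: 0, count: headDim)
        for (weight, value) in zip(exps, values) {
            for d in 0..<headDim { out[d] += weight / total * value[d] }
        }
        return out
    }
}
```

With 4 query heads and 2 KV heads, heads 0 and 1 attend through KV head 0 and heads 2 and 3 through KV head 1.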
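A decode-only tokenizer just maps generated token IDs back to text. Tekken is a byte-level BPE tokenizer, so decoding can be sketched as a table lookup from ID to raw bytes, concatenating all bytes before UTF-8 decoding so that a character split across tokens still comes out intact. This sketch assumes a vocabulary stored as byte sequences indexed by rank and that IDs below a fixed count are reserved for special tokens; both are assumptions, and `ByteLevelDecoder` is a hypothetical name, not VoxtralKit's type.

```swift
import Foundation

/// Hypothetical byte-level decoder in the spirit of a decode-only Tekken tokenizer.
struct ByteLevelDecoder {
    /// Raw bytes for each regular token, indexed by rank (assumed layout).
    let vocab: [[UInt8]]
    /// IDs below this value are treated as special tokens (assumption).
    let specialTokenCount: Int

    func decode(_ ids: [Int]) -> String {
        var bytes: [UInt8] = []
        for id in ids {
            guard id >= specialTokenCount else { continue }   // skip special tokens
            let rank = id - specialTokenCount
            guard rank < vocab.count else { continue }        // ignore out-of-range IDs
            bytes.append(contentsOf: vocab[rank])
        }
        // Decode once at the end; invalid sequences become U+FFFD.
        return String(decoding: bytes, as: UTF8.self)
    }
}
```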
Details
- Ported from https://github.com/awni/voxmlx (Python reference)
- Depends only on https://github.com/ml-explore/mlx-swift
- Requires macOS 15+ on Apple Silicon
Repo: https://github.com/omidscn/VoxtralKit
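To pull the package into your own project with Swift Package Manager, a manifest along these lines should work. This is a sketch under assumptions: the branch name `main`, the product name `VoxtralKit`, and the app name `MyTranscriberApp` are not confirmed by this post. `.macOS(.v15)` requires swift-tools-version 6.0.

```swift
// swift-tools-version:6.0
import PackageDescription

let package = Package(
    name: "MyTranscriberApp",                 // hypothetical app name
    platforms: [.macOS(.v15)],                // VoxtralKit targets macOS 15+
    dependencies: [
        // Assumes the default branch is "main"; pin a tagged release instead if one exists.
        .package(url: "https://github.com/omidscn/VoxtralKit", branch: "main"),
    ],
    targets: [
        .executableTarget(
            name: "MyTranscriberApp",
            dependencies: [
                // Assumes the library product is named "VoxtralKit".
                .product(name: "VoxtralKit", package: "VoxtralKit"),
            ]
        ),
    ]
)
```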
Thanks to the mlx-community for the quantized conversion and to @awni for the Python reference that made this port possible.