VoxtralKit: Swift-native inference for this model on Apple Silicon
by omidbicker
Hi!
For a project of mine I needed a Swift package that runs this model on-device, and I thought others might benefit from it as well.
VoxtralKit is a Swift package that runs this model entirely on-device via MLX Swift. It's the first Swift implementation of Voxtral, designed to make on-device speech-to-text accessible to Swift/macOS developers.
Quick example

```swift
import Foundation
import VoxtralKit

let transcriber = try VoxtralTranscriber(modelPath: modelDirectory)
let text = try transcriber.transcribe(URL(fileURLWithPath: "recording.wav"))
// "Hello, this is a test of the Voxtral speech recognition system."
```
What it includes
- Full inference pipeline: mel spectrogram → CausalWhisperEncoder (32 layers) → Decoder (26 layers, GQA + AdaptiveNorm)
- Works with this 6-bit quantized model out of the box
- Decode-only Tekken tokenizer
- Supports all 13 languages
- Apache 2.0 licensed
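For app integration, a fuller call site might look like the sketch below. The `VoxtralTranscriber` initializer and `transcribe` call are taken from the quick example above; the model directory path and the error handling are illustrative assumptions, not part of the package's documented API.

```swift
import Foundation
import VoxtralKit

// Assumption: modelDirectory points at a local copy of the 6-bit
// quantized model files (the path below is a placeholder).
let modelDirectory = "/path/to/voxtral-model"

do {
    let transcriber = try VoxtralTranscriber(modelPath: modelDirectory)
    let text = try transcriber.transcribe(URL(fileURLWithPath: "recording.wav"))
    print(text)
} catch {
    // Surface model-loading or decoding failures instead of crashing.
    print("Transcription failed: \(error)")
}
```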
Details
- Ported from https://github.com/awni/voxmlx (Python reference)
- Depends only on https://github.com/ml-explore/mlx-swift
- macOS 15+, Apple Silicon
Repo: https://github.com/omidscn/VoxtralKit
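To pull the package into a project, a Package.swift manifest along these lines should work. This is a minimal sketch: the tools version, product name, and branch pin are assumptions, so check the repo's README for the recommended setup.

```swift
// swift-tools-version:6.0
import PackageDescription

let package = Package(
    name: "MyTranscriptionApp",
    platforms: [.macOS(.v15)],  // VoxtralKit targets macOS 15+ on Apple Silicon
    dependencies: [
        // Branch pin is an assumption; prefer a tagged release if available.
        .package(url: "https://github.com/omidscn/VoxtralKit", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "MyTranscriptionApp",
            dependencies: ["VoxtralKit"]
        )
    ]
)
```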
Thanks to the mlx-community for the quantized conversion and to @awni for the Python reference that made this port possible.