VoxtralKit — Swift-native inference for this model on Apple Silicon

by omidbicker - opened

Hi!
For a project of mine I needed a Swift package that runs the model on device, and I thought others might benefit from it as well.

VoxtralKit — a Swift package that runs this model entirely on-device via MLX Swift. It's the first Swift
implementation of Voxtral, designed to make on-device speech-to-text accessible to Swift/macOS developers.

Quick example

import VoxtralKit

let transcriber = try VoxtralTranscriber(modelPath: modelDirectory)
let text = try transcriber.transcribe(URL(fileURLWithPath: "recording.wav"))
// "Hello, this is a test of the Voxtral speech recognition system."
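Since both the initializer and transcribe are marked try, callers will typically wrap them in do/catch. A minimal sketch of that pattern, using a stubbed transcriber (StubTranscriber and its error type are placeholders for illustration, not part of VoxtralKit's API):

```swift
import Foundation

// Placeholder stand-in for VoxtralTranscriber, used only to
// illustrate the do/catch pattern around a throwing transcribe call.
struct StubTranscriber {
    enum TranscriptionError: Error { case fileNotFound(URL) }

    func transcribe(_ url: URL) throws -> String {
        // Fail early if the audio file does not exist on disk.
        guard FileManager.default.fileExists(atPath: url.path) else {
            throw TranscriptionError.fileNotFound(url)
        }
        return "transcribed text"
    }
}

let transcriber = StubTranscriber()
do {
    let text = try transcriber.transcribe(URL(fileURLWithPath: "recording.wav"))
    print(text)
} catch {
    // Surface the failure instead of crashing the app.
    print("Transcription failed: \(error)")
}
```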

What it includes

  • Full inference pipeline: mel spectrogram → CausalWhisperEncoder (32 layers) → Decoder (26 layers, GQA + AdaptiveNorm)
  • Works with this 6-bit quantized model out of the box
  • Decode-only Tekken tokenizer
  • Supports all 13 languages
  • Apache 2.0 licensed
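
For context on the first stage of that pipeline: a mel spectrogram maps linear frequencies onto the perceptual mel scale before the encoder sees them. A minimal sketch of the standard HTK-style conversion (illustrative only; VoxtralKit's actual feature extraction may differ):

```swift
import Foundation

// Standard HTK mel-scale conversion: mel = 2595 * log10(1 + hz / 700).
// Illustrative only; not taken from VoxtralKit's source.
func hzToMel(_ hz: Double) -> Double {
    2595.0 * log10(1.0 + hz / 700.0)
}

// Inverse mapping, used when placing mel filterbank edges back in Hz.
func melToHz(_ mel: Double) -> Double {
    700.0 * (pow(10.0, mel / 2595.0) - 1.0)
}

// By construction of the scale, 1000 Hz lands at roughly 1000 mel.
print(hzToMel(1000.0))
```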

Details

Repo: https://github.com/omidscn/VoxtralKit

Thanks to the mlx-community for the quantized conversion and to @awni for the Python reference that made this port possible.
