VoxtralKit — Swift-native inference for this model on Apple Silicon

by omidbicker - opened

Hi!
For a project of mine I needed a Swift package that runs the model on device, and I thought others might benefit from it as well.

VoxtralKit — a Swift package that runs this model entirely on-device via MLX Swift. It's the first Swift
implementation of Voxtral, designed to make on-device speech-to-text accessible to Swift/macOS developers.

Quick example

import VoxtralKit

let transcriber = try VoxtralTranscriber(modelPath: modelDirectory)
let text = try transcriber.transcribe(URL(fileURLWithPath: "recording.wav"))
// "Hello, this is a test of the Voxtral speech recognition system."
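Since both the initializer and transcribe are marked try, callers will typically wrap them in do/catch. A minimal sketch of that pattern, using a stubbed transcriber (StubTranscriber and its error type are placeholders for illustration, not part of VoxtralKit's API):

```swift
import Foundation

// Placeholder stand-in for VoxtralTranscriber, used only to
// illustrate the do/catch pattern around a throwing transcribe call.
struct StubTranscriber {
    enum TranscriptionError: Error { case fileNotFound(URL) }

    func transcribe(_ url: URL) throws -> String {
        // Fail early if the audio file does not exist on disk.
        guard FileManager.default.fileExists(atPath: url.path) else {
            throw TranscriptionError.fileNotFound(url)
        }
        return "transcribed text"
    }
}

let transcriber = StubTranscriber()
do {
    let text = try transcriber.transcribe(URL(fileURLWithPath: "recording.wav"))
    print(text)
} catch {
    // Surface the failure instead of crashing the app.
    print("Transcription failed: \(error)")
}
```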

What it includes

  • Full inference pipeline: mel spectrogram → CausalWhisperEncoder (32 layers) → Decoder (26 layers, GQA + AdaptiveNorm)
  • Works with this 6-bit quantized model out of the box
  • Decode-only Tekken tokenizer
  • Supports all 13 languages
  • Apache 2.0 licensed
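
For context on the first stage of that pipeline: a mel spectrogram maps linear frequencies onto the perceptual mel scale before the encoder sees them. A minimal sketch of the standard HTK-style conversion (illustrative only; VoxtralKit's actual feature extraction may differ):

```swift
import Foundation

// Standard HTK mel-scale conversion: mel = 2595 * log10(1 + hz / 700).
// Illustrative only; not taken from VoxtralKit's source.
func hzToMel(_ hz: Double) -> Double {
    2595.0 * log10(1.0 + hz / 700.0)
}

// Inverse mapping, used when placing mel filterbank edges back in Hz.
func melToHz(_ mel: Double) -> Double {
    700.0 * (pow(10.0, mel / 2595.0) - 1.0)
}

// By construction of the scale, 1000 Hz lands at roughly 1000 mel.
print(hzToMel(1000.0))
```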

Details

Repo: https://github.com/omidscn/VoxtralKit

Thanks to the mlx-community for the quantized conversion and to @awni for the Python reference that made this port possible.
