CrisperWhisper Unsloth (MLX, FP16)
TL;DR: Please consider using my FP8 finetune instead. In my preliminary tests and benchmarks it performs similarly to this FP16 variant, but it is twice as fast and uses half the memory!
This repo provides the CrisperWhisper model converted to MLX for fast on-device ASR on Apple Silicon.
This model works exceptionally well when word-level precision is desired. Instead of producing grammatically smoothed sentences, it is fine-tuned for verbatim, word-by-word transcriptions - exactly what you want for interviews or Alexa-like home-automation applications.
Huge credit to Laurin from nyra.health and Daniel + Michael from Unsloth for the heavy lifting. Free for non-commercial use only!
See Laurin's original paper for more details.
Base model: unsloth/CrisperWhisper (Torch) → converted via mlx-examples/whisper/convert.py.
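For reference, the conversion roughly follows the steps below. This is a sketch: the flag names are assumptions based on the current mlx-examples/whisper/convert.py interface and may change upstream.

```shell
# Clone mlx-examples and convert the Torch checkpoint to MLX FP16.
# Flag names are assumptions based on mlx-examples/whisper/convert.py.
git clone https://github.com/ml-explore/mlx-examples.git
cd mlx-examples/whisper
pip install -r requirements.txt
python convert.py \
  --torch-name-or-path unsloth/CrisperWhisper \
  --mlx-path ./mlx_models/crisperwhisper-unsloth-mlx \
  --dtype float16
```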
What’s inside
- `weights.safetensors` — MLX FP16 weights
- `config.json` — MLX Whisper config
Usage (recommended: auto-download from Hugging Face)
mlx_whisper supports Hugging Face repo IDs in path_or_hf_repo and will download the model automatically.
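If you don't have the package yet, install it from PyPI first:

```shell
pip install mlx-whisper
```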
```python
from mlx_whisper import transcribe

out = transcribe(
    "audio.wav",
    path_or_hf_repo="kyr0/crisperwhisper-unsloth-mlx",
)
print(out["text"])
```
Usage (local path)
If you already have a local MLX folder, point path_or_hf_repo to it:
```python
from mlx_whisper import transcribe

out = transcribe(
    "audio.wav",
    path_or_hf_repo="./mlx_models/crisperwhisper-unsloth-mlx",
)
print(out["text"])
```
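Since CrisperWhisper is tuned for word-level precision, you will often want per-word timestamps too. A sketch: the `word_timestamps=True` flag is part of mlx_whisper's `transcribe`, while the `format_word_timings` helper below is my own illustration of how to flatten the result dict.

```python
def format_word_timings(result):
    """Flatten the word-level timing info in a transcribe() result
    into a list of (word, start, end) tuples."""
    words = []
    for seg in result.get("segments", []):
        for w in seg.get("words", []):
            words.append((w["word"], w["start"], w["end"]))
    return words


if __name__ == "__main__":
    from mlx_whisper import transcribe

    out = transcribe(
        "audio.wav",
        path_or_hf_repo="kyr0/crisperwhisper-unsloth-mlx",
        word_timestamps=True,  # request per-word start/end times
    )
    for word, start, end in format_word_timings(out):
        print(f"{start:7.2f} {end:7.2f}  {word}")
```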
Live Transcription!
Please follow me on GitHub - I'm working on a local, private live-transcription system for Apple Silicon: kyr0's crispr-live-mlx
Model tree for kyr0/crisperwhisper-unsloth-mlx: base model openai/whisper-large-v3