Whisper Large V3 Turbo - MLX FP16

This is the OpenAI Whisper Large V3 Turbo model converted to MLX format with FP16 precision, optimized for Apple Silicon inference.

Whisper Large V3 Turbo is a distilled version of Whisper Large V3 that uses only 4 decoder layers instead of 32, making it significantly faster while maintaining high accuracy.
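Where the speed-up comes from can be seen with a back-of-the-envelope parameter count. Using the hidden size from the table below (1280) and Whisper's standard decoder layer layout (self-attention, cross-attention, and a feed-forward block with 4x expansion), dropping 28 of the 32 decoder layers removes roughly 730M parameters. This is a rough sketch, not an exact count (it ignores layer norms and biases):

```python
# Rough estimate of the decoder parameters removed by the Turbo distillation.
# Hidden size and layer counts are from this model card; the per-layer layout
# is the standard Whisper decoder block. Layer norms and biases are ignored.
d = 1280                  # hidden size
ffn = 4 * d               # feed-forward width (4x expansion)
per_layer = (
    4 * d * d             # self-attention: Q, K, V, output projections
    + 4 * d * d           # cross-attention: Q, K, V, output projections
    + 2 * d * ffn         # two feed-forward projections
)
saved = (32 - 4) * per_layer
print(f"~{saved / 1e6:.0f}M decoder parameters removed")  # → ~734M
```

This lines up with the gap between full Large V3 (~1.55B parameters) and Turbo's ~809M.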

Model Details

Property         Value
---------------  -----------------------------
Base Model       openai/whisper-large-v3-turbo
Parameters       ~809M
Format           MLX SafeTensors (FP16)
Model Size       1,539.20 MB
Sample Rate      16,000 Hz
Mel Bins         128
Audio Layers     32
Text Layers      4
Hidden Size      1280
Attention Heads  20
Vocabulary Size  51,866
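The table's 16,000 Hz sample rate and 128 mel bins describe the model's input features. Whisper processes audio in fixed 30-second windows; with the standard Whisper STFT hop of 160 samples (10 ms, an assumption carried over from the reference implementation rather than stated in this card), each window becomes a 3000-frame by 128-bin log-mel spectrogram:

```python
# Shape of one Whisper input window, using the sample rate and mel-bin
# count from the table. The 160-sample hop is the standard Whisper value.
sample_rate = 16_000       # Hz (from the table)
hop_length = 160           # samples per STFT hop (10 ms), standard Whisper
window_seconds = 30        # Whisper's fixed analysis window
mel_bins = 128             # from the table

frames = window_seconds * sample_rate // hop_length
print(f"mel spectrogram per window: {frames} frames x {mel_bins} bins")  # 3000 x 128
```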

Intended Use

This model is optimized for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It is designed for use with the WhisperKit or MLX frameworks.

The Turbo variant offers the best speed/accuracy trade-off for real-time transcription on device.

Files

  • config.json - Model configuration
  • model.safetensors - Model weights in SafeTensors format (FP16)
  • multilingual.tiktoken - Tokenizer

Usage

# Requires the mlx-whisper package: pip install mlx-whisper
import mlx_whisper

result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="aitytech/Whisper-Large-V3-Turbo-MLX-FP16",
)
print(result["text"])

Original Model

openai/whisper-large-v3-turbo