Paper: Robust Speech Recognition via Large-Scale Weak Supervision (arXiv:2212.04356)
This is the OpenAI Whisper Large V3 Turbo model converted to MLX format with FP16 precision, optimized for Apple Silicon inference.
Whisper Large V3 Turbo is a distilled version of Whisper Large V3 that uses only 4 decoder layers instead of 32, making it significantly faster while maintaining high accuracy.
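As a rough sanity check of the parameter count, here is a back-of-the-envelope estimate (my own arithmetic, not from the model card): with hidden size 1280, each Whisper decoder layer holds roughly 16·d² weights (self-attention, cross-attention, and a 4× MLP), so dropping 28 of the 32 decoder layers removes most of Large V3's ~1.55B parameters.

```python
# Back-of-the-envelope estimate of the parameter savings from distillation.
# Assumptions (not from the model card): Whisper Large V3 has ~1.55B params,
# and each decoder layer costs self-attn (4*d^2) + cross-attn (4*d^2) + MLP (8*d^2).
d = 1280                        # hidden size, from the table below
per_layer = 16 * d * d          # ~26.2M weights per decoder layer
removed = 28 * per_layer        # 32 -> 4 decoder layers
turbo_est = 1_550_000_000 - removed
print(f"~{turbo_est / 1e6:.0f}M parameters")  # ~816M, close to the ~809M listed
```

The estimate ignores biases and layer norms, which is why it lands slightly above the ~809M figure.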
| Property | Value |
|---|---|
| Base Model | openai/whisper-large-v3-turbo |
| Parameters | ~809M |
| Format | MLX SafeTensors (FP16) |
| Model Size | 1,539.20 MB |
| Sample Rate | 16,000 Hz |
| Mel Bins | 128 |
| Audio Layers | 32 |
| Text Layers | 4 |
| Hidden Size | 1280 |
| Attention Heads | 20 |
| Vocabulary Size | 51,866 |
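The table values map onto the dimension fields of the repo's `config.json`. A partial sketch, using the field names from OpenAI's Whisper `ModelDimensions` (the exact key set in this particular repo is an assumption):

```json
{
  "n_mels": 128,
  "n_audio_state": 1280,
  "n_audio_head": 20,
  "n_audio_layer": 32,
  "n_text_layer": 4,
  "n_vocab": 51866
}
```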
This model is optimized for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It is designed for use with the WhisperKit or MLX frameworks.
The Turbo variant offers the best speed/accuracy trade-off for real-time transcription on device.
Files included in this repository:

- `config.json`: model configuration
- `model.safetensors`: model weights in SafeTensors format (FP16)
- `multilingual.tiktoken`: tokenizer

Usage with `mlx-whisper`:

```python
import mlx_whisper

result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="aitytech/Whisper-Large-V3-Turbo-MLX-FP16",
)
print(result["text"])
```