Whisper Large V3 Turbo - MLX FP16

This is the OpenAI Whisper Large V3 Turbo model converted to MLX format with FP16 precision, optimized for Apple Silicon inference.

Whisper Large V3 Turbo is a distilled version of Whisper Large V3 that uses only 4 decoder layers instead of 32, making it significantly faster while maintaining high accuracy.
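Where the speed-up comes from can be seen with a back-of-the-envelope parameter count. Using the hidden size from the table below (1280) and Whisper's standard decoder layer layout (self-attention, cross-attention, and a feed-forward block with 4x expansion), dropping 28 of the 32 decoder layers removes roughly 730M parameters. This is a rough sketch, not an exact count (it ignores layer norms and biases):

```python
# Rough estimate of the decoder parameters removed by the Turbo distillation.
# Hidden size and layer counts are from this model card; the per-layer layout
# is the standard Whisper decoder block. Layer norms and biases are ignored.
d = 1280                  # hidden size
ffn = 4 * d               # feed-forward width (4x expansion)
per_layer = (
    4 * d * d             # self-attention: Q, K, V, output projections
    + 4 * d * d           # cross-attention: Q, K, V, output projections
    + 2 * d * ffn         # two feed-forward projections
)
saved = (32 - 4) * per_layer
print(f"~{saved / 1e6:.0f}M decoder parameters removed")  # → ~734M
```

This lines up with the gap between full Large V3 (~1.55B parameters) and Turbo's ~809M.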

Model Details

Property         Value
---------------  -----------------------------
Base Model       openai/whisper-large-v3-turbo
Parameters       ~809M
Format           MLX SafeTensors (FP16)
Model Size       1,539.20 MB
Sample Rate      16,000 Hz
Mel Bins         128
Audio Layers     32
Text Layers      4
Hidden Size      1280
Attention Heads  20
Vocabulary Size  51,866
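The table's 16,000 Hz sample rate and 128 mel bins describe the model's input features. Whisper processes audio in fixed 30-second windows; with the standard Whisper STFT hop of 160 samples (10 ms, an assumption carried over from the reference implementation rather than stated in this card), each window becomes a 3000-frame by 128-bin log-mel spectrogram:

```python
# Shape of one Whisper input window, using the sample rate and mel-bin
# count from the table. The 160-sample hop is the standard Whisper value.
sample_rate = 16_000       # Hz (from the table)
hop_length = 160           # samples per STFT hop (10 ms), standard Whisper
window_seconds = 30        # Whisper's fixed analysis window
mel_bins = 128             # from the table

frames = window_seconds * sample_rate // hop_length
print(f"mel spectrogram per window: {frames} frames x {mel_bins} bins")  # 3000 x 128
```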

Intended Use

This model is optimized for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It is designed for use with the WhisperKit or MLX frameworks.

The Turbo variant offers the best speed/accuracy trade-off for real-time transcription on device.

Files

  • config.json - Model configuration
  • model.safetensors - Model weights in SafeTensors format (FP16)
  • multilingual.tiktoken - Tokenizer

Usage

# Requires the mlx-whisper package: pip install mlx-whisper
import mlx_whisper

result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="aitytech/Whisper-Large-V3-Turbo-MLX-FP16",
)
print(result["text"])

Original Model

openai/whisper-large-v3-turbo