Whisper Medium - MLX FP16

This is the OpenAI Whisper Medium model converted to MLX format with FP16 precision, optimized for Apple Silicon inference.

Model Details

Property	Value
Base Model	openai/whisper-medium
Parameters	~769M
Format	MLX SafeTensors (FP16)
Model Size	1,454.10 MB
Sample Rate	16,000 Hz
Audio Layers	24
Text Layers	24
Hidden Size	1024
Attention Heads	16
Vocabulary Size	51,865
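
As a rough sanity check on the table, FP16 stores each parameter in 2 bytes, so the file size should be close to twice the parameter count. The sketch below makes two assumptions that this card does not state: that "MB" means mebibytes (2^20 bytes), and that the SafeTensors header is small enough to ignore. The function names are illustrative only.

```python
# Back-of-the-envelope checks on the Model Details table.
# Assumption: "MB" means MiB (2**20 bytes); the SafeTensors header is ignored.

BYTES_PER_FP16 = 2
MIB = 2**20


def fp16_size_mib(num_params: float) -> float:
    """Approximate size in MiB of num_params weights stored as FP16."""
    return num_params * BYTES_PER_FP16 / MIB


def params_from_size_mib(size_mib: float) -> float:
    """Approximate parameter count implied by an FP16 file of size_mib MiB."""
    return size_mib * MIB / BYTES_PER_FP16


hidden_size = 1024
attention_heads = 16
head_dim = hidden_size // attention_heads  # width of each attention head

sample_rate = 16_000
window_samples = sample_rate * 30  # Whisper processes audio in 30-second windows

print(f"estimated size for 769M params: {fp16_size_mib(769e6):.1f} MiB")
print(f"params implied by 1,454.10 MiB: {params_from_size_mib(1454.10) / 1e6:.1f}M")
print(f"head dimension: {head_dim}")
print(f"samples per 30 s window: {window_samples}")
```

Under these assumptions the estimated and listed sizes differ by less than 1%, which is consistent with "~769M" being a rounded figure.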

Intended Use

This model is optimized for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It is designed for use with the WhisperKit or MLX frameworks.

Files

config.json - Model configuration
model.safetensors - Model weights in SafeTensors format (FP16)
multilingual.tiktoken - Tokenizer vocabulary (tiktoken format)
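
To inspect the weights file without loading it into memory, you can read only its header. A SafeTensors file begins with an 8-byte little-endian length followed by a JSON header describing each tensor. The sketch below relies on that layout and uses only the standard library. The helper names `read_safetensors_header` and `summarize` are hypothetical and are not part of any package mentioned on this card.

```python
import json
import struct
from math import prod


def read_safetensors_header(path: str) -> dict:
    """Return the JSON header of a SafeTensors file without loading any weights."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(header: dict) -> tuple[int, set[str]]:
    """Return (total parameter count, set of dtypes) for the tensors in a header."""
    total = sum(prod(entry["shape"]) for entry in header.values())
    dtypes = {entry["dtype"] for entry in header.values()}
    return total, dtypes
```

For this repository, `summarize(read_safetensors_header("model.safetensors"))` would be expected to report a single dtype, `F16`. That expectation comes from the card, not from running the code.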

Usage

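# Requires the mlx-whisper package: pip install mlx-whisper
import mlx_whisper

# Transcribe a local audio file; the weights are fetched from the Hugging Face Hub on first use.
result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="aitytech/Whisper-Medium-MLX-FP16",
)
# The result is a dict; "text" holds the full transcript.
print(result["text"])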
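
Beyond the full transcript, it is often useful to print one timestamped line per segment. The sketch below assumes the result dict also carries a "segments" list whose entries have "start", "end", and "text" keys, as in the upstream openai-whisper output format. The helper names are hypothetical, and `mlx_whisper` is imported lazily so the formatting helpers can be used and tested without MLX installed.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segment_lines(result: dict) -> list[str]:
    """Turn a transcription result into one timestamped line per segment."""
    # Assumption: each segment has "start", "end" (seconds) and "text" keys.
    return [
        f"[{format_timestamp(seg['start'])} -> {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in result.get("segments", [])
    ]


def transcribe_with_timestamps(audio_path: str) -> list[str]:
    """Transcribe audio_path with this model and return timestamped lines."""
    import mlx_whisper  # lazy import: only needed when actually transcribing

    result = mlx_whisper.transcribe(
        audio_path,
        path_or_hf_repo="aitytech/Whisper-Medium-MLX-FP16",
    )
    return segment_lines(result)
```

Calling `transcribe_with_timestamps("audio.mp3")` on an Apple Silicon machine would then return a list of lines ready to print or write to a file.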

Original Model

Paper: Robust Speech Recognition via Large-Scale Weak Supervision
Authors: OpenAI
License: Apache-2.0

arXiv: 2212.04356 (published Dec 6, 2022)