metadata
license: apache-2.0
library_name: mlx
tags:
  - mlx
  - whisper
  - speech-recognition
  - automatic-speech-recognition
  - fp16
  - apple-silicon
  - ios
  - coreml
language:
  - en
  - zh
  - de
  - es
  - ru
  - ko
  - fr
  - ja
  - pt
  - tr
  - pl
  - ca
  - nl
  - ar
  - sv
  - it
  - id
  - hi
  - fi
  - vi
  - he
  - uk
  - el
  - ms
  - cs
  - ro
  - da
  - hu
  - ta
  - 'no'
  - th
  - ur
  - hr
  - bg
  - lt
  - la
  - mi
  - ml
  - cy
  - sk
  - te
  - fa
  - lv
  - bn
  - sr
  - az
  - sl
  - kn
  - et
  - mk
  - br
  - eu
  - is
  - hy
  - ne
  - mn
  - bs
  - kk
  - sq
  - sw
  - gl
  - mr
  - pa
  - si
  - km
  - sn
  - yo
  - so
  - af
  - oc
  - ka
  - be
  - tg
  - sd
  - gu
  - am
  - yi
  - lo
  - uz
  - fo
  - ht
  - ps
  - tk
  - nn
  - mt
  - sa
  - lb
  - my
  - bo
  - tl
  - mg
  - as
  - tt
  - haw
  - ln
  - ha
  - ba
  - jw
  - su
pipeline_tag: automatic-speech-recognition
base_model: openai/whisper-medium

Whisper Medium - MLX FP16

This is the OpenAI Whisper Medium model converted to MLX format with FP16 precision, optimized for Apple Silicon inference.

Model Details

| Property | Value |
|---|---|
| Base Model | openai/whisper-medium |
| Parameters | ~769M |
| Format | MLX SafeTensors (FP16) |
| Model Size | 1,454.10 MB |
| Sample Rate | 16,000 Hz |
| Audio Encoder Layers | 24 |
| Text Decoder Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Vocabulary Size | 51,865 |
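As a sanity check on the Model Details figures, the sketch below recomputes Whisper Medium's weight count from the listed dimensions and converts it to an FP16 size. It assumes the layer layout of OpenAI's reference Whisper implementation (bias-free key projection, 4x MLP expansion, token embeddings tied to the output layer) and three values that are not in the table: 80 mel bins, a 448-token decoder context, and a 160-sample hop length. The function names are illustrative only.

```python
# Rough parameter-count and FP16-size estimate for Whisper Medium.
# Assumption: layer layout of OpenAI's reference Whisper implementation.

N_MELS = 80         # assumption: mel bins used by Whisper Medium (not in the table)
N_TEXT_CTX = 448    # assumption: decoder context length (not in the table)
D = 1024            # Hidden Size
LAYERS = 24         # Audio Layers and Text Layers (both 24)
VOCAB = 51_865      # Vocabulary Size


def attention(d: int) -> int:
    # query, value and out projections have a bias; key has none
    return 4 * d * d + 3 * d


def layer_norm(d: int) -> int:
    return 2 * d  # weight + bias


def mlp(d: int) -> int:
    # Linear(d, 4d) followed by Linear(4d, d), both with bias
    return 8 * d * d + 5 * d


def whisper_param_count(n_mels: int, d: int, n_layer: int, n_vocab: int, n_text_ctx: int) -> int:
    conv = (n_mels * d * 3 + d) + (d * d * 3 + d)  # conv1 + conv2, kernel size 3
    enc_block = attention(d) + layer_norm(d) + mlp(d) + layer_norm(d)
    # Decoder blocks add cross-attention (and its layer norm) to each block.
    dec_block = 2 * (attention(d) + layer_norm(d)) + mlp(d) + layer_norm(d)
    # The encoder's sinusoidal positional embedding is a fixed buffer, so it is not counted.
    encoder = conv + n_layer * enc_block + layer_norm(d)
    # Token embeddings are shared with the output projection, so they count once.
    decoder = n_vocab * d + n_text_ctx * d + n_layer * dec_block + layer_norm(d)
    return encoder + decoder


def encoder_positions(seconds: int = 30, sample_rate: int = 16_000,
                      hop_length: int = 160, conv_stride: int = 2) -> int:
    frames = seconds * sample_rate // hop_length  # mel frames per 30 s window
    return frames // conv_stride                  # the stride-2 conv halves the time axis


params = whisper_param_count(N_MELS, D, LAYERS, VOCAB, N_TEXT_CTX)
fp16_mib = params * 2 / 2**20  # 2 bytes per FP16 weight, reported in MiB
print(f"{params:,} parameters, {fp16_mib:,.2f} MiB in FP16")
# → 762,321,920 parameters, 1,454.01 MiB in FP16
print(encoder_positions())
# → 1500
```

Under these assumptions the count comes to about 762M stored weights, or roughly 1,454 MiB at 2 bytes each. That is close to the listed 1,454.10 MB if that figure is in binary megabytes. The ~769M in the table is the rounded figure OpenAI publishes for whisper-medium, and this sketch leaves out the encoder's fixed positional embedding, so the two are not expected to match exactly. The second function shows how the 16,000 Hz sample rate sets the encoder length: a 30-second window is 480,000 samples, 3,000 mel frames at a 160-sample hop, and 1,500 positions after the stride-2 convolution.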

Intended Use

This model is optimized for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It is designed for use with the WhisperKit or MLX frameworks.

Files

  • config.json - Model configuration
  • model.safetensors - Model weights in SafeTensors format (FP16)
  • multilingual.tiktoken - Multilingual tokenizer vocabulary (tiktoken format)
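The Files list states that model.safetensors holds FP16 weights. A SafeTensors file begins with an 8-byte little-endian header length, followed by a JSON header that records each tensor's dtype and shape. You can therefore check that claim without loading any weights. The stdlib-only sketch below does this. The helper names are hypothetical and are not part of the `safetensors` or MLX APIs.

```python
import json
import struct
from collections import Counter
from math import prod


def header_from_bytes(blob: bytes) -> dict:
    """Parse the JSON header from the leading bytes of a .safetensors file."""
    # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8 : 8 + header_len])
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def summarize_header(header: dict) -> dict:
    """Count tensors, dtypes and total elements described by a parsed header."""
    dtypes = Counter(info["dtype"] for info in header.values())
    # prod([]) == 1, so scalar tensors (empty shape) count as one element.
    n_params = sum(prod(info["shape"]) for info in header.values())
    return {"tensors": len(header), "dtypes": dict(dtypes), "params": n_params}


def summarize_safetensors(path: str) -> dict:
    """Read only the header of a .safetensors file; tensor data is never loaded."""
    with open(path, "rb") as f:
        length_bytes = f.read(8)
        (header_len,) = struct.unpack("<Q", length_bytes)
        blob = length_bytes + f.read(header_len)
    return summarize_header(header_from_bytes(blob))
```

After downloading the repository, call `summarize_safetensors("model.safetensors")`. For an all-FP16 checkpoint, the `dtypes` entry should contain only `"F16"`, and `params` gives the number of stored weights.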

Usage

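```python
# Install with: pip install mlx-whisper
import mlx_whisper

# Transcribe a local file. On first use, the model is downloaded from the
# Hugging Face Hub and cached locally.
result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="aitytech/Whisper-Medium-MLX-FP16",
)
print(result["text"])  # full transcript as a single string
```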
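The Usage example passes a file path, which mlx_whisper decodes itself. As far as we know, this relies on ffmpeg, as in openai-whisper. If the audio is already in memory, `transcribe` should also accept a NumPy array of mono float32 samples at the 16,000 Hz listed under Sample Rate. The sketch below builds such an array from a 16-bit PCM WAV file using only `wave` and NumPy. It rejects other sample rates rather than resampling. The helper name `load_wav_16k_mono` is hypothetical.

```python
import wave

import numpy as np

WHISPER_SAMPLE_RATE = 16_000  # matches the Sample Rate row in Model Details


def load_wav_16k_mono(source) -> np.ndarray:
    """Read a 16-bit PCM WAV (path or binary file object) as mono float32 in [-1, 1)."""
    with wave.open(source, "rb") as w:
        if w.getframerate() != WHISPER_SAMPLE_RATE:
            raise ValueError(f"expected {WHISPER_SAMPLE_RATE} Hz, got {w.getframerate()} Hz")
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        channels = w.getnchannels()
        frames = w.readframes(w.getnframes())
    # WAV PCM data is little-endian signed 16-bit; scale it to float32.
    audio = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        # Samples are interleaved per frame, so average the channels into one signal.
        audio = audio.reshape(-1, channels).mean(axis=1)
    return audio
```

For example: `mlx_whisper.transcribe(load_wav_16k_mono("audio.wav"), path_or_hf_repo="aitytech/Whisper-Medium-MLX-FP16")`. Audio at other rates must be resampled to 16 kHz first.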

Original Model