license: apache-2.0
library_name: mlx
tags:
  - mlx
  - whisper
  - speech-recognition
  - automatic-speech-recognition
  - fp16
  - apple-silicon
  - ios
  - coreml
language:
  - en
  - zh
  - de
  - es
  - ru
  - ko
  - fr
  - ja
  - pt
  - tr
  - pl
  - ca
  - nl
  - ar
  - sv
  - it
  - id
  - hi
  - fi
  - vi
  - he
  - uk
  - el
  - ms
  - cs
  - ro
  - da
  - hu
  - ta
  - 'no'
  - th
  - ur
  - hr
  - bg
  - lt
  - la
  - mi
  - ml
  - cy
  - sk
  - te
  - fa
  - lv
  - bn
  - sr
  - az
  - sl
  - kn
  - et
  - mk
  - br
  - eu
  - is
  - hy
  - ne
  - mn
  - bs
  - kk
  - sq
  - sw
  - gl
  - mr
  - pa
  - si
  - km
  - sn
  - yo
  - so
  - af
  - oc
  - ka
  - be
  - tg
  - sd
  - gu
  - am
  - yi
  - lo
  - uz
  - fo
  - ht
  - ps
  - tk
  - nn
  - mt
  - sa
  - lb
  - my
  - bo
  - tl
  - mg
  - as
  - tt
  - haw
  - ln
  - ha
  - ba
  - jw
  - su
  - yue
pipeline_tag: automatic-speech-recognition
base_model: openai/whisper-medium

Whisper Medium - MLX FP16

This is the OpenAI Whisper Medium model converted to MLX format with FP16 precision, optimized for Apple Silicon inference.

Model Details

Property	Value
Base Model	openai/whisper-medium
Parameters	~769M
Format	MLX SafeTensors (FP16)
Model Size	1,454.10 MB
Sample Rate	16,000 Hz
Audio Layers	24
Text Layers	24
Hidden Size	1024
Attention Heads	16
Vocabulary Size	51,865
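
As a rough sanity check on the table, the FP16 file size can be related to the parameter count, since each FP16 weight takes 2 bytes. The sketch below assumes the "MB" in the table means MiB (2^20 bytes) and ignores the small SafeTensors header; both are assumptions, not statements from the model card. The helper names are illustrative only.

```python
# Back-of-the-envelope relation between parameter count and FP16 file size.
# Assumption: "MB" in the table means MiB (1024 * 1024 bytes).
BYTES_PER_FP16 = 2
MIB = 1024 * 1024


def fp16_size_mib(num_params: int) -> float:
    """Approximate size in MiB of num_params weights stored as FP16."""
    return num_params * BYTES_PER_FP16 / MIB


def params_from_size_mib(size_mib: float) -> float:
    """Approximate number of FP16 weights that fit in size_mib MiB."""
    return size_mib * MIB / BYTES_PER_FP16


if __name__ == "__main__":
    # Nominal parameter count from the table (~769M).
    print(f"{fp16_size_mib(769_000_000):.1f} MiB for 769M parameters")
    # Reported model size from the table (1,454.10 MB).
    print(f"{params_from_size_mib(1454.10) / 1e6:.1f}M parameters in 1,454.10 MiB")
```

Under these assumptions the two figures agree to within about 1%, which is the level of agreement one would expect from a rounded "~769M" parameter count.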

Intended Use

This model is intended for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It is designed for use with WhisperKit or MLX.

Files

config.json - Model configuration
model.safetensors - Model weights in SafeTensors format (FP16)
multilingual.tiktoken - Multilingual tokenizer vocabulary (tiktoken format)
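
If the repository has been downloaded to a local directory, a quick check that the three files listed above are present can catch an incomplete download early. This is a minimal sketch using only the standard library; `missing_model_files` and the example directory name are hypothetical, not part of any library.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = ("config.json", "model.safetensors", "multilingual.tiktoken")


def missing_model_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical local path; replace with wherever the files were saved.
    missing = missing_model_files("Whisper-Medium-MLX-FP16")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```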

Usage

import mlx_whisper

result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="aitytech/Whisper-Medium-MLX-FP16",
)
print(result["text"])
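
Beyond the full transcript in `result["text"]`, it is often useful to print per-segment timestamps. The sketch below assumes the result also carries a `"segments"` list whose entries have `"start"`, `"end"`, and `"text"` keys, following the convention of the original OpenAI Whisper package; verify this against the installed `mlx_whisper` version. `format_segments` and `transcribe_with_timestamps` are illustrative helpers, not library functions.

```python
def format_segments(result):
    """Turn a transcription result into '[start -> end] text' lines.

    Assumes result["segments"] is a list of dicts with "start", "end"
    (seconds) and "text" keys; returns an empty list if it is absent.
    """
    lines = []
    for seg in result.get("segments", []):
        lines.append(
            f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text'].strip()}"
        )
    return lines


def transcribe_with_timestamps(audio_path, repo="aitytech/Whisper-Medium-MLX-FP16"):
    """Transcribe audio_path and return timestamped lines."""
    # Imported lazily so format_segments can be used without MLX installed.
    import mlx_whisper

    result = mlx_whisper.transcribe(audio_path, path_or_hf_repo=repo)
    return format_segments(result)


if __name__ == "__main__":
    for line in transcribe_with_timestamps("audio.mp3"):
        print(line)
```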

Original Model

Paper: Robust Speech Recognition via Large-Scale Weak Supervision
Authors: OpenAI
License: Apache-2.0