feat: add model files

6ab43d7 verified about 9 hours ago

2.3 kB

license: apache-2.0
library_name: mlx
tags:
  - mlx
  - whisper
  - speech-recognition
  - automatic-speech-recognition
  - fp16
  - apple-silicon
  - ios
  - coreml
language:
  - en
  - zh
  - de
  - es
  - ru
  - ko
  - fr
  - ja
  - pt
  - tr
  - pl
  - ca
  - nl
  - ar
  - sv
  - it
  - id
  - hi
  - fi
  - vi
  - he
  - uk
  - el
  - ms
  - cs
  - ro
  - da
  - hu
  - ta
  - 'no'
  - th
  - ur
  - hr
  - bg
  - lt
  - la
  - mi
  - ml
  - cy
  - sk
  - te
  - fa
  - lv
  - bn
  - sr
  - az
  - sl
  - kn
  - et
  - mk
  - br
  - eu
  - is
  - hy
  - ne
  - mn
  - bs
  - kk
  - sq
  - sw
  - gl
  - mr
  - pa
  - si
  - km
  - sn
  - yo
  - so
  - af
  - oc
  - ka
  - be
  - tg
  - sd
  - gu
  - am
  - yi
  - lo
  - uz
  - fo
  - ht
  - ps
  - tk
  - nn
  - mt
  - sa
  - lb
  - my
  - bo
  - tl
  - mg
  - as
  - tt
  - haw
  - ln
  - ha
  - ba
  - jw
  - su
  - yue
pipeline_tag: automatic-speech-recognition
base_model: openai/whisper-small

Whisper Small - MLX FP16

This is the OpenAI Whisper Small model converted to MLX format with FP16 precision, optimized for Apple Silicon inference.

Model Details

Property	Value
Base Model	openai/whisper-small
Parameters	~244M
Format	MLX SafeTensors (FP16)
Model Size	458.92 MB
Sample Rate	16,000 Hz
Audio Layers	12
Text Layers	12
Hidden Size	768
Attention Heads	12
Vocabulary Size	51,865

Intended Use

This model is optimized for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It is designed for use with the WhisperKit or MLX frameworks.

Files

config.json - Model configuration
model.safetensors - Model weights in SafeTensors format (FP16)
multilingual.tiktoken - Tokenizer

Usage

import mlx_whisper

result = mlx_whisper.transcribe(
    "audio.mp3",
    path_or_hf_repo="aitytech/Whisper-Small-MLX-FP16",
)
print(result["text"])

Original Model

Paper: Robust Speech Recognition via Large-Scale Weak Supervision
Authors: OpenAI
License: Apache-2.0