feat: add model files

Files changed (4)

README.md +168 -3
config.json +13 -0
model.safetensors +3 -0
multilingual.tiktoken +0 -0

README.md CHANGED

@@ -1,3 +1,168 @@
----
-license: apache-2.0
----

+---
+license: mit
+library_name: mlx
+tags:
+  - mlx
+  - whisper
+  - speech-recognition
+  - automatic-speech-recognition
+  - fp16
+  - apple-silicon
+  - ios
+  - coreml
+language:
+  - en
+  - zh
+  - de
+  - es
+  - ru
+  - ko
+  - fr
+  - ja
+  - pt
+  - tr
+  - pl
+  - ca
+  - nl
+  - ar
+  - sv
+  - it
+  - id
+  - hi
+  - fi
+  - vi
+  - he
+  - uk
+  - el
+  - ms
+  - cs
+  - ro
+  - da
+  - hu
+  - ta
+  - "no"
+  - th
+  - ur
+  - hr
+  - bg
+  - lt
+  - la
+  - mi
+  - ml
+  - cy
+  - sk
+  - te
+  - fa
+  - lv
+  - bn
+  - sr
+  - az
+  - sl
+  - kn
+  - et
+  - mk
+  - br
+  - eu
+  - is
+  - hy
+  - ne
+  - mn
+  - bs
+  - kk
+  - sq
+  - sw
+  - gl
+  - mr
+  - pa
+  - si
+  - km
+  - sn
+  - yo
+  - so
+  - af
+  - oc
+  - ka
+  - be
+  - tg
+  - sd
+  - gu
+  - am
+  - yi
+  - lo
+  - uz
+  - fo
+  - ht
+  - ps
+  - tk
+  - nn
+  - mt
+  - sa
+  - lb
+  - my
+  - bo
+  - tl
+  - mg
+  - as
+  - tt
+  - haw
+  - ln
+  - ha
+  - ba
+  - jw
+  - su
+  - yue
+pipeline_tag: automatic-speech-recognition
+base_model: openai/whisper-large-v3-turbo
+---
+# Whisper Large V3 Turbo - MLX FP16
+This is the [OpenAI Whisper Large V3 Turbo](https://huggingface.co/openai/whisper-large-v3-turbo) model converted to [MLX](https://github.com/ml-explore/mlx) format with FP16 precision, optimized for Apple Silicon inference.
+Whisper Large V3 Turbo is a pruned and fine-tuned version of Whisper Large V3 that uses only 4 decoder layers instead of 32, making it significantly faster with only a minor loss in accuracy.
+## Model Details
+| Property | Value |
+|---|---|
+| Base Model | openai/whisper-large-v3-turbo |
+| Parameters | ~809M |
+| Format | MLX SafeTensors (FP16) |
+| Model Size | 1,539.21 MiB (1,613,977,612 bytes) |
+| Sample Rate | 16,000 Hz |
+| Mel Bins | 128 |
+| Audio Layers | 32 |
+| Text Layers | 4 |
+| Hidden Size | 1280 |
+| Attention Heads | 20 |
+| Vocabulary Size | 51,866 |
+## Intended Use
+This model is optimized for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It is designed for use with the [WhisperKit](https://github.com/argmaxinc/WhisperKit) or [MLX](https://github.com/ml-explore/mlx) frameworks.
+The Turbo variant trades a small amount of accuracy for much faster decoding, which makes it a practical choice for real-time on-device transcription.
+## Files
+- `config.json` - Model configuration
+- `model.safetensors` - Model weights in SafeTensors format (FP16)
+- `multilingual.tiktoken` - Tokenizer
+## Usage
+```python
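+import mlx_whisper
+
+# Fetches the model from the Hub on first use, then transcribes the file.
+result = mlx_whisper.transcribe(
+    "audio.mp3",  # path to a local audio file
+    path_or_hf_repo="aitytech/Whisper-Large-V3-Turbo-MLX-FP16",
+)
+print(result["text"])  # full transcript as a single string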
+```
+## Original Model
+- **Paper:** [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356)
+- **Authors:** OpenAI
+- **License:** MIT
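
Review note: the Model Details table lists a 16,000 Hz sample rate, 1280 hidden size, and 20 attention heads, and config.json sets `n_audio_ctx` to 1500. The sketch below shows how those numbers would fit together for one input window. Only `SAMPLE_RATE`, `HIDDEN_SIZE`, and `ATTENTION_HEADS` come from this commit; the 30-second window, the hop length of 160 samples, and the stride-2 convolution are assumptions taken from the upstream Whisper design.

```python
# Shape arithmetic for a single Whisper input window (illustrative only).
SAMPLE_RATE = 16_000     # Hz, from the Model Details table
HIDDEN_SIZE = 1280       # from the Model Details table
ATTENTION_HEADS = 20     # from the Model Details table

WINDOW_SECONDS = 30      # assumption: audio is padded or trimmed to 30 s
HOP_LENGTH = 160         # assumption: 10 ms STFT hop at 16 kHz
CONV_STRIDE = 2          # assumption: the encoder's second conv halves the length

samples = SAMPLE_RATE * WINDOW_SECONDS        # raw samples per window
mel_frames = samples // HOP_LENGTH            # log-mel frames per window
encoder_positions = mel_frames // CONV_STRIDE # sequence length seen by the encoder
head_dim = HIDDEN_SIZE // ATTENTION_HEADS     # width of each attention head

print(samples, mel_frames, encoder_positions, head_dim)
```

Under these assumptions the encoder sequence length works out to the `n_audio_ctx` value in config.json.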

config.json ADDED

@@ -0,0 +1,13 @@
+{
+    "n_mels": 128,
+    "n_audio_ctx": 1500,
+    "n_audio_state": 1280,
+    "n_audio_head": 20,
+    "n_audio_layer": 32,
+    "n_vocab": 51866,
+    "n_text_ctx": 448,
+    "n_text_state": 1280,
+    "n_text_head": 20,
+    "n_text_layer": 4,
+    "model_type": "whisper"
+}
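
Review note: the config above is enough to estimate the parameter count as a sanity check against the README's "~809M" figure. The sketch below is a rough count under stated assumptions, not the converter's own logic: the layer shapes (two kernel-size-3 convolutions, attention with no bias on the key projection, a 4x MLP, learned decoder positions, tied output projection) are assumed from the upstream Whisper architecture, and the helper names are hypothetical.

```python
import json

CONFIG = json.loads("""
{
    "n_mels": 128,
    "n_audio_ctx": 1500,
    "n_audio_state": 1280,
    "n_audio_head": 20,
    "n_audio_layer": 32,
    "n_vocab": 51866,
    "n_text_ctx": 448,
    "n_text_state": 1280,
    "n_text_head": 20,
    "n_text_layer": 4,
    "model_type": "whisper"
}
""")


def linear(n_in, n_out, bias=True):
    """Parameters in a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


def layer_norm(n):
    """Scale and shift vectors."""
    return 2 * n


def attention(n):
    # Assumption: query, value, and output projections have a bias; key does not.
    return 3 * linear(n, n) + linear(n, n, bias=False)


def mlp(n):
    # Assumption: hidden width is 4x the model width.
    return linear(n, 4 * n) + linear(4 * n, n)


def count_parameters(cfg):
    a, t = cfg["n_audio_state"], cfg["n_text_state"]
    # Two Conv1d layers with kernel size 3 (assumed), each with a bias.
    convs = (cfg["n_mels"] * a * 3 + a) + (a * a * 3 + a)
    enc_block = attention(a) + layer_norm(a) + mlp(a) + layer_norm(a)
    encoder = convs + cfg["n_audio_layer"] * enc_block + layer_norm(a)
    # Decoder blocks add a cross-attention layer with its own layer norm.
    dec_block = 2 * (attention(t) + layer_norm(t)) + mlp(t) + layer_norm(t)
    embeddings = cfg["n_vocab"] * t + cfg["n_text_ctx"] * t
    decoder = embeddings + cfg["n_text_layer"] * dec_block + layer_norm(t)
    return encoder + decoder


learned = count_parameters(CONFIG)
# The encoder's sinusoidal position table is fixed, not learned (assumed).
sinusoid_table = CONFIG["n_audio_ctx"] * CONFIG["n_audio_state"]
fp16_bytes = learned * 2

print(f"learned parameters:   {learned:,}")
print(f"with sinusoid table:  {learned + sinusoid_table:,}")
print(f"FP16 weight bytes:    {fp16_bytes:,}")
```

If these assumptions hold, the FP16 weights alone come to 61,452 bytes less than the `size` recorded in the model.safetensors pointer, which would be consistent with a small file header. Counting the fixed position table brings the total close to the README's rounded figure.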

model.safetensors ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:951ed3fc1203e6a62467abb2144a96ce7eafca8fa77e3704fdb8635ff3e7f8a6
+size 1613977612
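
Review note: the three lines above are a Git LFS pointer, not the weights themselves. A minimal sketch of how a downloaded `model.safetensors` could be checked against it follows. The function names `parse_lfs_pointer` and `verify_file` are hypothetical helpers written for this note, and the local file path is whatever the reader downloaded.

```python
import hashlib
from pathlib import Path

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:951ed3fc1203e6a62467abb2144a96ce7eafca8fa77e3704fdb8635ff3e7f8a6
size 1613977612
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines into a dict with a typed size and split oid."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_file(path, pointer, chunk_size=1 << 20):
    """Return True if the file's size and hash both match the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated file
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Stream in chunks so a multi-gigabyte file is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
size_mib = pointer["size"] / 2**20
print(f"{pointer['algo']} {pointer['digest'][:12]}... {size_mib:.2f} MiB")
```

Calling `verify_file("model.safetensors", pointer)` on a local copy would then confirm that the download is complete and unmodified.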

multilingual.tiktoken ADDED

The diff for this file is too large to render.