leduclinh committed on
Commit fdf33e4 · verified · 1 parent: 3be9166

feat: add model files

Files changed (4)
  1. README.md +168 -3
  2. config.json +13 -0
  3. model.safetensors +3 -0
  4. multilingual.tiktoken +0 -0
README.md CHANGED
@@ -1,3 +1,168 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: mit
+ library_name: mlx
+ tags:
+ - mlx
+ - whisper
+ - speech-recognition
+ - automatic-speech-recognition
+ - fp16
+ - apple-silicon
+ - ios
+ - coreml
+ language:
+ - en
+ - zh
+ - de
+ - es
+ - ru
+ - ko
+ - fr
+ - ja
+ - pt
+ - tr
+ - pl
+ - ca
+ - nl
+ - ar
+ - sv
+ - it
+ - id
+ - hi
+ - fi
+ - vi
+ - he
+ - uk
+ - el
+ - ms
+ - cs
+ - ro
+ - da
+ - hu
+ - ta
+ - "no"
+ - th
+ - ur
+ - hr
+ - bg
+ - lt
+ - la
+ - mi
+ - ml
+ - cy
+ - sk
+ - te
+ - fa
+ - lv
+ - bn
+ - sr
+ - az
+ - sl
+ - kn
+ - et
+ - mk
+ - br
+ - eu
+ - is
+ - hy
+ - ne
+ - mn
+ - bs
+ - kk
+ - sq
+ - sw
+ - gl
+ - mr
+ - pa
+ - si
+ - km
+ - sn
+ - yo
+ - so
+ - af
+ - oc
+ - ka
+ - be
+ - tg
+ - sd
+ - gu
+ - am
+ - yi
+ - lo
+ - uz
+ - fo
+ - ht
+ - ps
+ - tk
+ - nn
+ - mt
+ - sa
+ - lb
+ - my
+ - bo
+ - tl
+ - mg
+ - as
+ - tt
+ - haw
+ - ln
+ - ha
+ - ba
+ - jw
+ - su
+ - yue
+ pipeline_tag: automatic-speech-recognition
+ base_model: openai/whisper-large-v3-turbo
+ ---
+
+ # Whisper Large V3 Turbo - MLX FP16
+
+ This is the [OpenAI Whisper Large V3 Turbo](https://huggingface.co/openai/whisper-large-v3-turbo) model converted to [MLX](https://github.com/ml-explore/mlx) format with FP16 precision, optimized for Apple Silicon inference.
+
+ Whisper Large V3 Turbo is a fine-tuned, pruned version of Whisper Large V3. The encoder is unchanged, but the decoder has only 4 layers instead of 32. This makes decoding substantially faster at the cost of a minor loss in accuracy.
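A back-of-the-envelope count shows how much the 4-layer decoder saves. This is a sketch under stated assumptions:

- Each decoder block is assumed to follow the openai-whisper layout: self-attention and cross-attention with a bias-free key projection, a 4x-wide MLP, and three LayerNorms.
- The hidden size is 1280, taken from `n_text_state` in config.json.
- Token embeddings and the final LayerNorm are not counted.

```python
# Rough parameter count for one Whisper decoder block (sketch).
# Assumes the openai-whisper block layout described above; embeddings are ignored.

D_MODEL = 1280  # n_text_state from config.json


def attention_params(d: int) -> int:
    # query, value and out projections have weight + bias; key has weight only
    return 4 * d * d + 3 * d


def mlp_params(d: int) -> int:
    # Linear(d, 4d) followed by Linear(4d, d), both with bias
    return (d * 4 * d + 4 * d) + (4 * d * d + d)


def layernorm_params(d: int) -> int:
    return 2 * d  # scale and bias


def decoder_block_params(d: int) -> int:
    # self-attention + cross-attention + MLP + three LayerNorms
    return 2 * attention_params(d) + mlp_params(d) + 3 * layernorm_params(d)


per_block = decoder_block_params(D_MODEL)
removed_blocks = 32 - 4  # Large V3 decoder layers minus Turbo decoder layers

print(f"per decoder block: {per_block:,}")                          # 26,236,160
print(f"Turbo decoder (4 blocks): {4 * per_block:,}")               # 104,944,640
print(f"removed ({removed_blocks} blocks): {removed_blocks * per_block:,}")  # 734,612,480
```

Roughly 735M parameters disappear with the 28 removed blocks. That is consistent with Turbo's ~809M total versus roughly 1.55B for Large V3.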
+
+ ## Model Details
+
+ | Property | Value |
+ |---|---|
+ | Base Model | openai/whisper-large-v3-turbo |
+ | Parameters | ~809M |
+ | Format | MLX SafeTensors (FP16) |
+ | Model Size | ~1,539 MiB (1,613,977,612 bytes) |
+ | Sample Rate | 16,000 Hz |
+ | Mel Bins | 128 |
+ | Encoder (Audio) Layers | 32 |
+ | Decoder (Text) Layers | 4 |
+ | Hidden Size | 1280 |
+ | Attention Heads (per layer) | 20 |
+ | Vocabulary Size | 51,866 |
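The Model Size row corresponds to the `size` field of the Git LFS pointer for `model.safetensors`, expressed in binary megabytes (MiB). The sketch below checks that figure and estimates the stored parameter count. It assumes every tensor is FP16 (2 bytes each) and ignores the small SafeTensors JSON header, so the count is an upper bound.

```python
# Sanity check of the Model Size row (sketch).
# Assumption: all tensors are FP16 and the SafeTensors header is negligible.

FILE_SIZE_BYTES = 1_613_977_612  # "size" field of the model.safetensors LFS pointer
BYTES_PER_FP16 = 2

size_mib = FILE_SIZE_BYTES / 2**20  # binary megabytes
size_gb = FILE_SIZE_BYTES / 10**9   # decimal gigabytes
approx_params = FILE_SIZE_BYTES // BYTES_PER_FP16

print(f"{size_mib:,.2f} MiB ({size_gb:.2f} GB)")          # 1,539.21 MiB (1.61 GB)
print(f"~{approx_params / 1e6:.0f}M parameters stored")  # ~807M parameters stored
```

The stored count comes out slightly below the ~809M quoted for the original model. One plausible reason is that some fixed tensors, such as sinusoidal positional encodings, are recomputed at load time rather than stored. This sketch does not verify that.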
+
+ ## Intended Use
+
+ This model is optimized for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It is designed for use with the [WhisperKit](https://github.com/argmaxinc/WhisperKit) or [MLX](https://github.com/ml-explore/mlx) frameworks.
+
+ Because its decoder has only 4 layers, the Turbo variant transcribes much faster than Large V3 with only a small loss in accuracy. This makes it well suited to real-time transcription on device.
+
+ ## Files
+
+ - `config.json` - Model dimensions (layer counts, widths, context lengths)
+ - `model.safetensors` - Model weights in SafeTensors format (FP16)
+ - `multilingual.tiktoken` - Tokenizer vocabulary (tiktoken BPE ranks)
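The tokenizer file is too large to diff on this page. This sketch assumes it uses the same plain-text format as the tokenizer assets shipped with openai-whisper: one base64-encoded token and its integer rank per line. Under that assumption it can be read with the standard library alone. `load_bpe_ranks` is a hypothetical helper, not part of any package.

```python
import base64
from pathlib import Path


def load_bpe_ranks(path: str) -> dict[bytes, int]:
    """Parse a tiktoken-style rank file into {token_bytes: rank}.

    Assumes each non-empty line has the form '<base64 token> <rank>'.
    """
    ranks: dict[bytes, int] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks
```

For example, `load_bpe_ranks("multilingual.tiktoken")` returns the regular token table. In openai-whisper, special tokens are appended on top of these ranks rather than stored in the file. These include the language, task and timestamp tokens.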
+
+ ## Usage
+
+ ```python
+ import mlx_whisper
+
+ # Transcribe a local file (audio decoding requires ffmpeg on PATH).
+ # path_or_hf_repo accepts a local directory or a Hugging Face Hub repo id.
+ result = mlx_whisper.transcribe(
+     "audio.mp3",
+     path_or_hf_repo="aitytech/Whisper-Large-V3-Turbo-MLX-FP16",
+ )
+ print(result["text"])
+ ```
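The result also carries per-segment timestamps in addition to the full `text`. This sketch assumes the openai-whisper result layout that `mlx_whisper` mirrors: a `segments` list whose items have float `start`/`end` values (in seconds) and a `text` string. `format_segments` and `format_timestamp` are hypothetical helpers, not part of `mlx_whisper`.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as MM:SS.mmm."""
    millis = round(seconds * 1000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


def format_segments(result: dict) -> list[str]:
    """Turn a transcribe() result into '[start -> end] text' lines.

    Assumes result["segments"] is a list of dicts with "start", "end"
    (seconds) and "text", as in openai-whisper's output format.
    """
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return lines
```

After the call above, `print("\n".join(format_segments(result)))` prints one timestamped line per segment.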
+
+ ## Original Model
+
+ - **Paper:** [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356)
+ - **Authors:** OpenAI
+ - **License:** MIT
config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "n_mels": 128,
+   "n_audio_ctx": 1500,
+   "n_audio_state": 1280,
+   "n_audio_head": 20,
+   "n_audio_layer": 32,
+   "n_vocab": 51866,
+   "n_text_ctx": 448,
+   "n_text_state": 1280,
+   "n_text_head": 20,
+   "n_text_layer": 4,
+   "model_type": "whisper"
+ }
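These fields map onto the README's Model Details table. The sketch below shows where `n_audio_ctx = 1500` and the per-head width come from. It assumes Whisper's standard front end:

- 16 kHz audio processed in 30-second windows.
- A mel hop of 160 samples (10 ms).
- A stride-2 convolution in the encoder stem.

These constants are Whisper conventions, not fields of this file.

```python
import json

CONFIG_JSON = """
{
  "n_mels": 128, "n_audio_ctx": 1500, "n_audio_state": 1280,
  "n_audio_head": 20, "n_audio_layer": 32, "n_vocab": 51866,
  "n_text_ctx": 448, "n_text_state": 1280, "n_text_head": 20,
  "n_text_layer": 4, "model_type": "whisper"
}
"""

# Whisper front-end conventions (assumed; not stored in config.json)
SAMPLE_RATE = 16_000  # Hz
CHUNK_SECONDS = 30    # audio window seen by the encoder
HOP_LENGTH = 160      # samples between mel frames (10 ms)
CONV_STRIDE = 2       # the encoder's second conv layer halves the frame rate

cfg = json.loads(CONFIG_JSON)

mel_frames = CHUNK_SECONDS * SAMPLE_RATE // HOP_LENGTH  # 3000 mel frames
encoder_positions = mel_frames // CONV_STRIDE           # 1500 encoder positions
assert encoder_positions == cfg["n_audio_ctx"]

# Each attention head operates on n_state / n_head dimensions.
for side in ("audio", "text"):
    state, heads = cfg[f"n_{side}_state"], cfg[f"n_{side}_head"]
    assert state % heads == 0
    print(f"{side}: {heads} heads x {state // heads} dims")  # 20 heads x 64 dims

ms_per_position = CHUNK_SECONDS / encoder_positions * 1000
print(f"one encoder position covers ~{ms_per_position:.0f} ms of audio")  # ~20 ms
```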
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:951ed3fc1203e6a62467abb2144a96ce7eafca8fa77e3704fdb8635ff3e7f8a6
+ size 1613977612
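The commit stores only this Git LFS pointer; the weights themselves live in LFS storage. A downloaded copy can be checked against the pointer's `oid` (the SHA-256 of the file content) and `size`. Below is a minimal standard-library sketch; `parse_lfs_pointer` and `matches_pointer` are illustrative names.

```python
import hashlib
import os

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:951ed3fc1203e6a62467abb2144a96ce7eafca8fa77e3704fdb8635ff3e7f8a6
size 1613977612
"""


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Return (sha256 hex digest, size in bytes) from a Git LFS pointer."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Check a downloaded file's size and SHA-256 against an LFS pointer."""
    expected_digest, expected_size = parse_lfs_pointer(pointer_text)
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first; skips hashing a truncated download
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so a 1.6 GB file never sits fully in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == expected_digest
```

For an intact download, `matches_pointer("model.safetensors", POINTER)` should return `True`.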
multilingual.tiktoken ADDED
The diff for this file is too large to render.