leduclinh committed on
Commit 1e120d8 · verified · 1 Parent(s): 1df802c

feat: add model files

Files changed (4)
  1. README.md +163 -3
  2. config.json +13 -0
  3. model.safetensors +3 -0
  4. multilingual.tiktoken +0 -0
README.md CHANGED
@@ -1,3 +1,163 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ library_name: mlx
+ tags:
+ - mlx
+ - whisper
+ - speech-recognition
+ - automatic-speech-recognition
+ - fp16
+ - apple-silicon
+ - ios
+ - coreml
+ language:
+ - en
+ - zh
+ - de
+ - es
+ - ru
+ - ko
+ - fr
+ - ja
+ - pt
+ - tr
+ - pl
+ - ca
+ - nl
+ - ar
+ - sv
+ - it
+ - id
+ - hi
+ - fi
+ - vi
+ - he
+ - uk
+ - el
+ - ms
+ - cs
+ - ro
+ - da
+ - hu
+ - ta
+ - "no"
+ - th
+ - ur
+ - hr
+ - bg
+ - lt
+ - la
+ - mi
+ - ml
+ - cy
+ - sk
+ - te
+ - fa
+ - lv
+ - bn
+ - sr
+ - az
+ - sl
+ - kn
+ - et
+ - mk
+ - br
+ - eu
+ - is
+ - hy
+ - ne
+ - mn
+ - bs
+ - kk
+ - sq
+ - sw
+ - gl
+ - mr
+ - pa
+ - si
+ - km
+ - sn
+ - yo
+ - so
+ - af
+ - oc
+ - ka
+ - be
+ - tg
+ - sd
+ - gu
+ - am
+ - yi
+ - lo
+ - uz
+ - fo
+ - ht
+ - ps
+ - tk
+ - nn
+ - mt
+ - sa
+ - lb
+ - my
+ - bo
+ - tl
+ - mg
+ - as
+ - tt
+ - haw
+ - ln
+ - ha
+ - ba
+ - jw
+ - su
+ - yue
+ pipeline_tag: automatic-speech-recognition
+ base_model: openai/whisper-base
+ ---
+
+ # Whisper Base - MLX FP16
+
+ This is the [OpenAI Whisper Base](https://huggingface.co/openai/whisper-base) model converted to [MLX](https://github.com/ml-explore/mlx) format with FP16 precision, optimized for Apple Silicon inference.
+
+ ## Model Details
+
+ | Property | Value |
+ |---|---|
+ | Base Model | openai/whisper-base |
+ | Parameters | ~74M |
+ | Format | MLX SafeTensors (FP16) |
+ | Model Size | 137.02 MB |
+ | Sample Rate | 16,000 Hz |
+ | Audio Layers | 6 |
+ | Text Layers | 6 |
+ | Hidden Size | 512 |
+ | Attention Heads | 8 |
+ | Vocabulary Size | 51,865 |
+
+ ## Intended Use
+
+ This model is optimized for on-device automatic speech recognition (ASR) on Apple Silicon devices (Mac, iPhone, iPad). It is designed for use with the [WhisperKit](https://github.com/argmaxinc/WhisperKit) or [MLX](https://github.com/ml-explore/mlx) frameworks.
+
+ ## Files
+
+ - `config.json` - Model configuration
+ - `model.safetensors` - Model weights in SafeTensors format (FP16)
+ - `multilingual.tiktoken` - Tokenizer
+
+ ## Usage
+
+ ```python
+ import mlx_whisper
+
+ result = mlx_whisper.transcribe(
+     "audio.mp3",
+     path_or_hf_repo="aitytech/Whisper-Base-MLX-FP16",
+ )
+ print(result["text"])
+ ```
+
+ ## Original Model
+
+ - **Paper:** [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356)
+ - **Authors:** OpenAI
+ - **License:** Apache-2.0
config.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "n_mels": 80,
+   "n_audio_ctx": 1500,
+   "n_audio_state": 512,
+   "n_audio_head": 8,
+   "n_audio_layer": 6,
+   "n_vocab": 51865,
+   "n_text_ctx": 448,
+   "n_text_state": 512,
+   "n_text_head": 8,
+   "n_text_layer": 6,
+   "model_type": "whisper"
+ }
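As a quick sanity check on the configuration added above, the following minimal sketch parses the same values and derives two quantities the file implies but does not state: the per-head width and the audio window covered by `n_audio_ctx`. The hop length of 160 samples and the stride-2 encoder convolution are assumptions taken from the standard Whisper front end, not from this commit; the 16,000 Hz sample rate comes from the README table. The variable names are illustrative only.

```python
import json

# Values copied from the config.json added in this commit.
CONFIG_TEXT = """
{
  "n_mels": 80,
  "n_audio_ctx": 1500,
  "n_audio_state": 512,
  "n_audio_head": 8,
  "n_audio_layer": 6,
  "n_vocab": 51865,
  "n_text_ctx": 448,
  "n_text_state": 512,
  "n_text_head": 8,
  "n_text_layer": 6,
  "model_type": "whisper"
}
"""

config = json.loads(CONFIG_TEXT)

# The hidden size must split evenly across the attention heads.
assert config["n_audio_state"] % config["n_audio_head"] == 0
assert config["n_text_state"] % config["n_text_head"] == 0
audio_head_dim = config["n_audio_state"] // config["n_audio_head"]
text_head_dim = config["n_text_state"] // config["n_text_head"]

# Assumptions (standard Whisper front end, not stated in this commit):
SAMPLE_RATE_HZ = 16_000  # from the README table
HOP_LENGTH = 160         # samples per mel frame (assumed)
CONV_STRIDE = 2          # encoder downsampling factor (assumed)

# Seconds of audio represented by one full encoder context.
window_seconds = config["n_audio_ctx"] * CONV_STRIDE * HOP_LENGTH / SAMPLE_RATE_HZ

print(f"audio head dim: {audio_head_dim}, text head dim: {text_head_dim}")
print(f"audio window: {window_seconds:.1f} s")
```

Under those assumptions, both head widths come out to 64 and the encoder context corresponds to a 30-second window.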
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8dfd359ff6f512d943abe01743a134784b93506bbe7df6b4f23e43791409a4d7
+ size 143676871
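The three lines above are a Git LFS pointer, not the weights themselves. The sketch below parses the pointer and converts its `size` field to mebibytes, which suggests the README's "137.02 MB" figure is measured in units of 1024² bytes. The helper names `parse_lfs_pointer` and `file_matches_pointer` are hypothetical, and the check function assumes a locally downloaded copy of `model.safetensors`; it is defined here but not called.

```python
import hashlib

# Pointer contents copied from the model.safetensors diff in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8dfd359ff6f512d943abe01743a134784b93506bbe7df6b4f23e43791409a4d7
size 143676871
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def file_matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256 digest."""
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size_bytes"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
size_mib = pointer["size_bytes"] / (1024 ** 2)

# Rough upper bound on the number of FP16 values: 2 bytes each,
# ignoring the SafeTensors header that also occupies part of the file.
approx_fp16_values = pointer["size_bytes"] // 2

print(f"{size_mib:.2f} MiB, at most ~{approx_fp16_values / 1e6:.1f}M FP16 values")
```

The resulting bound of roughly 72M two-byte values is in the same range as the README's "~74M" parameter figure, which is itself approximate.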
multilingual.tiktoken ADDED
The diff for this file is too large to render. See raw diff