mudler committed · Commit ed36312 · verified · 1 Parent(s): 6d0b393

Update combined parakeet.cpp model card

Files changed (1)
  1. README.md +179 -0
README.md ADDED
@@ -0,0 +1,179 @@
+ ---
+ license: cc-by-4.0
+ library_name: parakeet.cpp
+ tags:
+ - automatic-speech-recognition
+ - asr
+ - parakeet
+ - gguf
+ - ggml
+ - cpp-inference
+ - nemo
+ pipeline_tag: automatic-speech-recognition
+ base_model:
+ - nvidia/parakeet-tdt_ctc-110m
+ - nvidia/parakeet_realtime_eou_120m-v1
+ - nvidia/parakeet-ctc-0.6b
+ - nvidia/parakeet-rnnt-0.6b
+ - nvidia/parakeet-tdt-0.6b-v2
+ - nvidia/parakeet-tdt-0.6b-v3
+ - nvidia/parakeet-ctc-1.1b
+ - nvidia/parakeet-rnnt-1.1b
+ - nvidia/parakeet-tdt-1.1b
+ - nvidia/parakeet-tdt_ctc-1.1b
+ ---
+
+ # Parakeet GGUF: models for parakeet.cpp
+
+ GGUF-format weights for [parakeet.cpp](https://github.com/mudler/parakeet.cpp), a C++/ggml port of NVIDIA NeMo Parakeet that matches the upstream PyTorch models on CPU. This single repo collects **every supported model × quantization** as a flat set of `.gguf` files, so you can download just the one you need.
+
+ **F16 is the recommended default**: same accuracy as F32, about 1.7× smaller, and typically the fastest on modern CPUs via ggml's F32×F16 matmul fast path.
+
+ ## Models
+
+ ### tdt_ctc-110m
+
+ Source: [nvidia/parakeet-tdt_ctc-110m](https://huggingface.co/nvidia/parakeet-tdt_ctc-110m) · Hybrid TDT+CTC (FastConformer) · heads: TDT + CTC
+
+ | File | Variant | Size | WER vs NeMo |
+ |---|---|---:|---:|
+ | `tdt_ctc-110m-f16.gguf` ← **recommended** | F16 | 267.5 MB | 0.0000 |
+ | `tdt_ctc-110m-q8_0.gguf` | Q8_0 | 177.8 MB | 0.0000 |
+ | `tdt_ctc-110m-q6_k.gguf` | Q6_K | 155.9 MB | not measured |
+ | `tdt_ctc-110m-q5_k.gguf` | Q5_K | 143.3 MB | not measured |
+ | `tdt_ctc-110m-q4_k.gguf` | Q4_K | 131.4 MB | 0.0000 |
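
To make the size trade-off in the table above concrete, the following sketch expresses each quantized file as a percentage of the F16 file. The sizes are copied from the tdt_ctc-110m table; the helper name `size_ratio` is made up for this illustration and is not part of parakeet.cpp.

```bash
#!/usr/bin/env bash
# size_ratio QUANT_MB F16_MB: print the quantized size as a percentage of F16.
# (Hypothetical helper, for illustration only.)
size_ratio() {
  awk -v q="$1" -v f="$2" 'BEGIN { printf "%.1f%%\n", 100 * q / f }'
}

f16=267.5   # MB, tdt_ctc-110m-f16.gguf
for entry in Q8_0:177.8 Q6_K:155.9 Q5_K:143.3 Q4_K:131.4; do
  name=${entry%%:*}   # text before the colon
  size=${entry##*:}   # text after the colon
  printf '%s is %s of the F16 file\n' "$name" "$(size_ratio "$size" "$f16")"
done
```

For this model, Q4_K comes out at roughly half the F16 size. The saving is smaller than the nominal bit widths suggest because part of each file stays unquantized.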
+
+ ### realtime_eou_120m-v1
+
+ Source: [nvidia/parakeet_realtime_eou_120m-v1](https://huggingface.co/nvidia/parakeet_realtime_eou_120m-v1) · Cache-aware streaming RNNT (FastConformer, EOU/EOB) · heads: RNNT (streaming)
+
+ | File | Variant | Size | WER vs NeMo |
+ |---|---|---:|---:|
+ | `realtime_eou_120m-v1-f16.gguf` ← **recommended** | F16 | 266.5 MB | not measured |
+ | `realtime_eou_120m-v1-q8_0.gguf` | Q8_0 | 176.0 MB | not measured |
+ | `realtime_eou_120m-v1-q6_k.gguf` | Q6_K | 153.9 MB | not measured |
+ | `realtime_eou_120m-v1-q5_k.gguf` | Q5_K | 141.2 MB | not measured |
+ | `realtime_eou_120m-v1-q4_k.gguf` | Q4_K | 129.1 MB | not measured |
+
+ ### ctc-0.6b
+
+ Source: [nvidia/parakeet-ctc-0.6b](https://huggingface.co/nvidia/parakeet-ctc-0.6b) · CTC (FastConformer) · heads: CTC
+
+ | File | Variant | Size | WER vs NeMo |
+ |---|---|---:|---:|
+ | `ctc-0.6b-f16.gguf` ← **recommended** | F16 | 1373.4 MB | 0.0000 |
+ | `ctc-0.6b-q8_0.gguf` | Q8_0 | 875.4 MB | 0.0000 |
+ | `ctc-0.6b-q6_k.gguf` | Q6_K | 746.8 MB | not measured |
+ | `ctc-0.6b-q5_k.gguf` | Q5_K | 676.3 MB | not measured |
+ | `ctc-0.6b-q4_k.gguf` | Q4_K | 609.9 MB | not measured |
+
+ ### rnnt-0.6b
+
+ Source: [nvidia/parakeet-rnnt-0.6b](https://huggingface.co/nvidia/parakeet-rnnt-0.6b) · RNNT transducer (FastConformer) · heads: RNNT
+
+ | File | Variant | Size | WER vs NeMo |
+ |---|---|---:|---:|
+ | `rnnt-0.6b-f16.gguf` ← **recommended** | F16 | 1402.8 MB | 0.0000 |
+ | `rnnt-0.6b-q8_0.gguf` | Q8_0 | 903.9 MB | 0.0000 |
+ | `rnnt-0.6b-q6_k.gguf` | Q6_K | 776.3 MB | not measured |
+ | `rnnt-0.6b-q5_k.gguf` | Q5_K | 705.7 MB | not measured |
+ | `rnnt-0.6b-q4_k.gguf` | Q4_K | 639.2 MB | not measured |
+
+ ### tdt-0.6b-v2
+
+ Source: [nvidia/parakeet-tdt-0.6b-v2](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) · TDT transducer (FastConformer) · heads: TDT
+
+ | File | Variant | Size | WER vs NeMo |
+ |---|---|---:|---:|
+ | `tdt-0.6b-v2-f16.gguf` ← **recommended** | F16 | 1404.2 MB | 0.0000 |
+ | `tdt-0.6b-v2-q8_0.gguf` | Q8_0 | 903.8 MB | 0.0000 |
+ | `tdt-0.6b-v2-q6_k.gguf` | Q6_K | 775.9 MB | not measured |
+ | `tdt-0.6b-v2-q5_k.gguf` | Q5_K | 705.0 MB | not measured |
+ | `tdt-0.6b-v2-q4_k.gguf` | Q4_K | 638.4 MB | not measured |
+
+ ### tdt-0.6b-v3
+
+ Source: [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) · TDT transducer (FastConformer) · heads: TDT
+
+ | File | Variant | Size | WER vs NeMo |
+ |---|---|---:|---:|
+ | `tdt-0.6b-v3-f16.gguf` ← **recommended** | F16 | 1441.0 MB | 0.0000 |
+ | `tdt-0.6b-v3-q8_0.gguf` | Q8_0 | 940.7 MB | 0.0000 |
+ | `tdt-0.6b-v3-q6_k.gguf` | Q6_K | 812.7 MB | not measured |
+ | `tdt-0.6b-v3-q5_k.gguf` | Q5_K | 741.9 MB | not measured |
+ | `tdt-0.6b-v3-q4_k.gguf` | Q4_K | 675.2 MB | not measured |
+
+ ### ctc-1.1b
+
+ Source: [nvidia/parakeet-ctc-1.1b](https://huggingface.co/nvidia/parakeet-ctc-1.1b) · CTC (FastConformer) · heads: CTC
+
+ | File | Variant | Size | WER vs NeMo |
+ |---|---|---:|---:|
+ | `ctc-1.1b-f16.gguf` ← **recommended** | F16 | 2395.8 MB | 0.0000 |
+ | `ctc-1.1b-q8_0.gguf` | Q8_0 | 1526.3 MB | 0.0000 |
+ | `ctc-1.1b-q6_k.gguf` | Q6_K | 1301.7 MB | not measured |
+ | `ctc-1.1b-q5_k.gguf` | Q5_K | 1178.5 MB | not measured |
+ | `ctc-1.1b-q4_k.gguf` | Q4_K | 1062.6 MB | not measured |
+
+ ### rnnt-1.1b
+
+ Source: [nvidia/parakeet-rnnt-1.1b](https://huggingface.co/nvidia/parakeet-rnnt-1.1b) · RNNT transducer (FastConformer) · heads: RNNT
+
+ | File | Variant | Size | WER vs NeMo |
+ |---|---|---:|---:|
+ | `rnnt-1.1b-f16.gguf` ← **recommended** | F16 | 2425.2 MB | 0.0000 |
+ | `rnnt-1.1b-q8_0.gguf` | Q8_0 | 1554.7 MB | 0.0000 |
+ | `rnnt-1.1b-q6_k.gguf` | Q6_K | 1331.2 MB | not measured |
+ | `rnnt-1.1b-q5_k.gguf` | Q5_K | 1207.9 MB | not measured |
+ | `rnnt-1.1b-q4_k.gguf` | Q4_K | 1091.9 MB | not measured |
+
+ ### tdt-1.1b
+
+ Source: [nvidia/parakeet-tdt-1.1b](https://huggingface.co/nvidia/parakeet-tdt-1.1b) · TDT transducer (FastConformer) · heads: TDT
+
+ | File | Variant | Size | WER vs NeMo |
+ |---|---|---:|---:|
+ | `tdt-1.1b-f16.gguf` ← **recommended** | F16 | 2425.3 MB | 0.0000 |
+ | `tdt-1.1b-q8_0.gguf` | Q8_0 | 1554.8 MB | 0.0000 |
+ | `tdt-1.1b-q6_k.gguf` | Q6_K | 1331.2 MB | not measured |
+ | `tdt-1.1b-q5_k.gguf` | Q5_K | 1207.9 MB | not measured |
+ | `tdt-1.1b-q4_k.gguf` | Q4_K | 1091.9 MB | not measured |
+
+ ### tdt_ctc-1.1b
+
+ Source: [nvidia/parakeet-tdt_ctc-1.1b](https://huggingface.co/nvidia/parakeet-tdt_ctc-1.1b) · Hybrid TDT+CTC (FastConformer) · heads: TDT + CTC
+
+ | File | Variant | Size | WER vs NeMo |
+ |---|---|---:|---:|
+ | `tdt_ctc-1.1b-f16.gguf` ← **recommended** | F16 | 2429.5 MB | 0.0000 |
+ | `tdt_ctc-1.1b-q8_0.gguf` | Q8_0 | 1559.0 MB | 0.0000 |
+ | `tdt_ctc-1.1b-q6_k.gguf` | Q6_K | 1335.4 MB | not measured |
+ | `tdt_ctc-1.1b-q5_k.gguf` | Q5_K | 1212.1 MB | not measured |
+ | `tdt_ctc-1.1b-q4_k.gguf` | Q4_K | 1096.1 MB | not measured |
+
+ > WER (word error rate) is computed against the upstream NeMo reference transcript on a single utterance, `tests/fixtures/speech.wav` (LibriSpeech `2086-149220-0033`, ~7.4 s, English). A value of 0.0000 means the transcript is byte-for-byte identical to NeMo's. See [parity.md](https://github.com/mudler/parakeet.cpp/blob/main/docs/parity.md) and [quantization.md](https://github.com/mudler/parakeet.cpp/blob/main/docs/quantization.md).
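
For readers unfamiliar with the metric, here is a minimal sketch of word error rate as it is usually defined: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The `wer` function is hypothetical and is not the script parakeet.cpp uses; the project's parity tooling may normalize text differently, and the sample sentences below are invented rather than taken from the test fixture.

```bash
#!/usr/bin/env bash
# wer REF HYP: word-level Levenshtein distance divided by the reference length.
# (Hypothetical helper; illustrates the standard definition only.)
wer() {
  awk -v ref="$1" -v hyp="$2" 'BEGIN {
    n = split(ref, r, " ")                 # reference words
    m = split(hyp, h, " ")                 # hypothesis words
    for (j = 0; j <= m; j++) prev[j] = j   # row 0: j insertions
    for (i = 1; i <= n; i++) {
      cur[0] = i                           # column 0: i deletions
      for (j = 1; j <= m; j++) {
        cost = ((r[i] "") == (h[j] "")) ? 0 : 1             # forced string compare
        best = prev[j - 1] + cost                           # match or substitution
        if (prev[j] + 1 < best) best = prev[j] + 1          # deletion
        if (cur[j - 1] + 1 < best) best = cur[j - 1] + 1    # insertion
        cur[j] = best
      }
      for (j = 0; j <= m; j++) prev[j] = cur[j]
    }
    printf "%.4f\n", (n > 0) ? prev[m] / n : 0
  }'
}

wer "the quick brown fox" "the quick brown fox"   # identical word sequence
wer "the quick brown fox" "the quick brown box"   # one substitution in four words
```

Note that a WER of zero under this definition only requires the same word sequence, so the byte-for-byte identity stated in the note is the stricter condition.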
+
+ ## Quantization notes
+
+ Quantization is applied **only** to the large linear weights fed directly into `ggml_mul_mat` (encoder FFN + attention projections, subsampling output projection, joint enc/pred projections). All other tensors (mel filterbank, LSTM prediction net, conv kernels, batch_norm stats, norms, biases, embeddings) stay F32.
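
Because only part of each model is quantized, file sizes do not scale directly with the nominal bit width. The sketch below is a rough back-of-the-envelope model, not something taken from the parakeet.cpp docs. It assumes that the F16 and Q8_0 files differ only in the quantizable weights, and it uses ggml's standard block layouts (F16 at 16 bits per weight, Q8_0 at 8.5, Q6_K at 6.5625, Q5_K at 5.5, Q4_K at 4.5). From the tdt_ctc-110m sizes listed under Models it infers how much of the file is quantizable and then predicts the K-quant sizes. The helper name `estimate_size` is hypothetical.

```bash
#!/usr/bin/env bash
# estimate_size F16_MB Q8_0_MB BITS_PER_WEIGHT
# Split a file into a quantizable part and a fixed part using the F16 and
# Q8_0 sizes, then predict the size at another bit width.
# (Hypothetical helper; the split is an assumption, not a documented fact.)
estimate_size() {
  awk -v f16="$1" -v q8="$2" -v bpw="$3" 'BEGIN {
    quantizable = (f16 - q8) / ((16 - 8.5) / 8)   # millions of weights that change type
    fixed = f16 - quantizable * 16 / 8            # MB that is the same in every variant
    printf "%.1f\n", fixed + quantizable * bpw / 8
  }'
}

estimate_size 267.5 177.8 6.5625   # Q6_K estimate, listed size is 155.9 MB
estimate_size 267.5 177.8 5.5      # Q5_K estimate, listed size is 143.3 MB
estimate_size 267.5 177.8 4.5      # Q4_K estimate, listed size is 131.4 MB
```

Under these assumptions the estimates land within about 1.5 MB of the listed sizes, which is consistent with a large fixed portion remaining unquantized. The model card does not explain the remaining gap.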
+
+ ## Usage
+
+ ```bash
+ # 1. Clone + build parakeet.cpp
+ git clone https://github.com/mudler/parakeet.cpp
+ cd parakeet.cpp
+ cmake -B build -DPARAKEET_BUILD_CLI=ON && cmake --build build -j
+
+ # 2. Download one quant (F16 recommended)
+ huggingface-cli download mudler/parakeet-cpp-gguf tdt_ctc-110m-f16.gguf --local-dir models/
+
+ # 3. Transcribe
+ build/examples/cli/parakeet-cli transcribe \
+   --model models/tdt_ctc-110m-f16.gguf \
+   --input audio.wav
+ ```
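
Between steps 2 and 3 it can be worth confirming that the download really is a GGUF file rather than, say, an error page or a pointer file. GGUF files begin with the four ASCII bytes `GGUF`, so a quick check is possible with standard tools. The `is_gguf` helper below is hypothetical and only inspects the magic bytes; it does not validate the rest of the file.

```bash
#!/usr/bin/env bash
# is_gguf FILE: succeed if FILE exists and starts with the 4-byte GGUF magic.
# (Hypothetical helper, not part of parakeet.cpp.)
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

model=models/tdt_ctc-110m-f16.gguf   # path used in the Usage steps
if is_gguf "$model"; then
  echo "$model looks like a GGUF file"
else
  echo "$model is missing or not a GGUF file" >&2
fi
```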
+
+ ## License
+
+ The GGUF weights are derived from the NVIDIA NeMo Parakeet checkpoints, which are released under the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license. The parakeet.cpp runtime is MIT-licensed.