<div align="center">
<img src="https://raw.githubusercontent.com/k2-fsa/icefall/master/docs/source/_static/logo.png" width=168>
</div>

# Introduction

The icefall project contains speech-related recipes for various datasets
using [k2-fsa](https://github.com/k2-fsa/k2) and [lhotse](https://github.com/lhotse-speech/lhotse).

You can use [sherpa](https://github.com/k2-fsa/sherpa), [sherpa-ncnn](https://github.com/k2-fsa/sherpa-ncnn), or [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) to deploy models
trained in icefall. These frameworks also support models not included in icefall; please refer to their respective documentation for details.

You can try pre-trained models from within your browser, without downloading
or installing anything, by visiting this [Hugging Face space](https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition).
Please refer to the [documentation](https://k2-fsa.github.io/icefall/huggingface/spaces.html) for more details.

# Installation

Please refer to the [installation documentation](https://k2-fsa.github.io/icefall/installation/index.html)
for installation instructions.

# Recipes

Please refer to the [recipes documentation](https://k2-fsa.github.io/icefall/recipes/index.html)
for more details.

## ASR: Automatic Speech Recognition

### Supported Datasets

- [yesno][yesno]

- [Aidatatang_200zh][aidatatang_200zh]
- [Aishell][aishell]
- [Aishell2][aishell2]
- [Aishell4][aishell4]
- [Alimeeting][alimeeting]
- [AMI][ami]
- [CommonVoice][commonvoice]
- [Corpus of Spontaneous Japanese][csj]
- [GigaSpeech][gigaspeech]
- [LibriCSS][libricss]
- [LibriSpeech][librispeech]
- [Libriheavy][libriheavy]
- [Multi-Dialect Broadcast News Arabic Speech Recognition][mgb2]
- [SPGISpeech][spgispeech]
- [Switchboard][swbd]
- [TIMIT][timit]
- [TED-LIUM3][tedlium3]
- [TAL_CSASR][tal_csasr]
- [Voxpopuli][voxpopuli]
- [XBMU-AMDO31][xbmu-amdo31]
- [WenetSpeech][wenetspeech]

More datasets will be added in the future.

### Supported Models

The [LibriSpeech][librispeech] recipe supports the most comprehensive set of models; you are welcome to try them out.

#### CTC
- TDNN LSTM CTC
- Conformer CTC
- Zipformer CTC

#### MMI
- Conformer MMI
- Zipformer MMI

#### Transducer
- Conformer-based Encoder
- LSTM-based Encoder
- Zipformer-based Encoder
- LSTM-based Predictor
- [Stateless Predictor](https://research.google/pubs/rnn-transducer-with-stateless-prediction-network/)

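The result tables below frequently reference a `greedy_search` decoding method. As a rough illustration of how a transducer with a stateless predictor decodes greedily, here is a minimal sketch; `toy_predictor` and `toy_joiner` are hypothetical stand-ins for illustration only, not icefall's actual modules.

```python
BLANK = 0  # token id reserved for the blank symbol


def greedy_search(encoder_out, predictor, joiner, max_sym_per_frame=1):
    """Decode a sequence of encoder frames one frame at a time."""
    hyp = []
    for frame in encoder_out:
        for _ in range(max_sym_per_frame):
            pred = predictor(hyp)        # stateless: a function of emitted symbols only
            token = joiner(frame, pred)  # best token for (frame, predictor state)
            if token == BLANK:
                break                    # advance to the next frame
            hyp.append(token)
    return hyp


def toy_predictor(hyp):
    # A stateless predictor conditions on a short history; here, the last symbol.
    return hyp[-1] if hyp else BLANK


def toy_joiner(frame, pred):
    # Emit the frame's label unless it is blank or repeats the predictor state.
    return BLANK if frame in (BLANK, pred) else frame


greedy_search([1, 1, 2, 0, 3], toy_predictor, toy_joiner)  # -> [1, 2, 3]
```

The key property of the stateless predictor is visible here: it depends only on the emitted symbols, not on a recurrent hidden state, which makes it cheap and easy to export.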
#### Whisper
- [OpenAI Whisper](https://arxiv.org/abs/2212.04356) (we support fine-tuning on Aishell-1)

If you would like to contribute to icefall, please refer to the [contributing guide](https://k2-fsa.github.io/icefall/contributing/index.html) for more details.

We would like to highlight the performance of some of the recipes here.

### [yesno][yesno]

This is the simplest ASR recipe in `icefall` and can be run on CPU.
Training takes less than 30 seconds and gives you the following WER:

```
[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
```
We provide a Colab notebook for this recipe: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)

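The WER line above breaks the score into insertions, deletions, and substitutions from a minimum-edit-distance alignment: WER = (ins + del + sub) / reference length. A rough, self-contained sketch of that computation (not icefall's actual scorer):

```python
def word_error_rate(ref, hyp):
    """Align two word lists by edit distance; return (wer, ins, dels, subs)."""
    R, H = len(ref), len(hyp)
    # dp[i][j] = (errors, ins, dels, subs) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (H + 1) for _ in range(R + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, R + 1):
        dp[i][0] = (i, 0, i, 0)   # only deletions remain
    for j in range(1, H + 1):
        dp[0][j] = (j, j, 0, 0)   # only insertions remain
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            e, ins, dels, subs = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                match = (e, ins, dels, subs)               # exact match
            else:
                match = (e + 1, ins, dels, subs + 1)       # substitution
            d = dp[i - 1][j]
            delete = (d[0] + 1, d[1], d[2] + 1, d[3])      # deletion
            s = dp[i][j - 1]
            insert = (s[0] + 1, s[1] + 1, s[2], s[3])      # insertion
            dp[i][j] = min(match, delete, insert)
    errors, ins, dels, subs = dp[R][H]
    return errors / R, ins, dels, subs


word_error_rate("yes no yes".split(), "yes yes".split())  # -> (0.333..., 0, 1, 0)
```

The same routine over characters instead of words gives the CER reported for the Chinese recipes below.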

### [LibriSpeech][librispeech]

Please see [RESULTS.md](https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md)
for the **latest** results.

#### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conformer_ctc)

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing)

#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc)

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59       | 17.69      |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing)

#### [Transducer (Conformer Encoder + LSTM Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)

|               | test-clean | test-other |
|---------------|------------|------------|
| greedy_search | 3.07       | 7.51       |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1_u6yK9jDkPwG_NLrZMN2XK7Aeq4suMO2?usp=sharing)

#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)

|                                      | test-clean | test-other |
|--------------------------------------|------------|------------|
| modified_beam_search (`beam_size=4`) | 2.56       | 6.27       |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1CO1bXJ-2khDckZIW8zjOPHGSKLHpTDlp?usp=sharing)

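Many of the tables report `modified_beam_search` with `beam_size=4`. The essence of beam search, keeping only the best `beam_size` partial hypotheses per frame, can be sketched as follows; this toy version works on per-frame token log-probabilities and deliberately ignores RNN-T details such as blank handling and prefix merging.

```python
import math


def beam_search(frames, beam_size=4):
    """frames: list of {token: log_prob} dicts, one per time step."""
    beams = {(): 0.0}  # partial hypothesis -> accumulated log-probability
    for logprobs in frames:
        nxt = {}
        for hyp, score in beams.items():
            for tok, lp in logprobs.items():
                cand = hyp + (tok,)
                nxt[cand] = max(nxt.get(cand, -math.inf), score + lp)
        # keep only the best `beam_size` expansions for the next frame
        beams = dict(sorted(nxt.items(), key=lambda kv: -kv[1])[:beam_size])
    return max(beams, key=beams.get)


frames = [{"a": -0.1, "b": -2.3}, {"a": -1.6, "b": -0.2}]
beam_search(frames)  # -> ("a", "b")
```

Greedy search is the `beam_size=1` special case; a larger beam trades decoding speed for the lower WERs seen in the `modified_beam_search` rows.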

#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/zipformer)

WER (modified_beam_search with `beam_size=4` unless otherwise stated)

1. LibriSpeech-960hr

| Encoder         | Params | test-clean | test-other | epochs | devices    |
|-----------------|--------|------------|------------|--------|------------|
| Zipformer       | 65.5M  | 2.21       | 4.79       | 50     | 4 32G-V100 |
| Zipformer-small | 23.2M  | 2.42       | 5.73       | 50     | 2 32G-V100 |
| Zipformer-large | 148.4M | 2.06       | 4.63       | 50     | 4 32G-V100 |
| Zipformer-large | 148.4M | 2.00       | 4.38       | 174    | 8 80G-A100 |

2. LibriSpeech-960hr + GigaSpeech

| Encoder   | Params | test-clean | test-other |
|-----------|--------|------------|------------|
| Zipformer | 65.5M  | 1.78       | 4.08       |

3. LibriSpeech-960hr + GigaSpeech + CommonVoice

| Encoder   | Params | test-clean | test-other |
|-----------|--------|------------|------------|
| Zipformer | 65.5M  | 1.90       | 3.98       |


### [GigaSpeech][gigaspeech]

#### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/conformer_ctc)

|     | Dev   | Test  |
|-----|-------|-------|
| WER | 10.47 | 10.58 |

#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/pruned_transducer_stateless2)

Conformer Encoder + Stateless Predictor + k2 Pruned RNN-T Loss

|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy_search        | 10.51 | 10.73 |
| fast_beam_search     | 10.50 | 10.69 |
| modified_beam_search | 10.40 | 10.51 |

#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/zipformer)

|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy_search        | 10.31 | 10.50 |
| fast_beam_search     | 10.26 | 10.48 |
| modified_beam_search | 10.25 | 10.38 |


### [Aishell][aishell]

#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/tdnn_lstm_ctc)

|     | test  |
|-----|-------|
| CER | 10.16 |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing)

#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/transducer_stateless)

|     | test |
|-----|------|
| CER | 4.38 |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing)

#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/zipformer)

CER (modified_beam_search `beam_size=4`)

| Encoder         | Params | dev  | test | epochs |
|-----------------|--------|------|------|--------|
| Zipformer       | 73.4M  | 4.13 | 4.40 | 55     |
| Zipformer-small | 30.2M  | 4.40 | 4.67 | 55     |
| Zipformer-large | 157.3M | 4.03 | 4.28 | 56     |


### [Aishell4][aishell4]

#### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell4/ASR/pruned_transducer_stateless5)

Trained with all subsets:

|     | test  |
|-----|-------|
| CER | 29.08 |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)


### [TIMIT][timit]

#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_lstm_ctc)

|     | TEST   |
|-----|--------|
| PER | 19.71% |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Hs9DA4V96uapw_30uNp32OMJgkuR5VVd?usp=sharing)

#### [TDNN LiGRU CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_ligru_ctc)

|     | TEST   |
|-----|--------|
| PER | 17.66% |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)


### [TED-LIUM3][tedlium3]

#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/transducer_stateless)

|                                      | dev  | test |
|--------------------------------------|------|------|
| modified_beam_search (`beam_size=4`) | 6.91 | 6.33 |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MmY5bBxwvKLNT4A2DJnwiqRXhdchUqPN?usp=sharing)

#### [Transducer (pruned_transducer_stateless)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/pruned_transducer_stateless)

|                                      | dev  | test |
|--------------------------------------|------|------|
| modified_beam_search (`beam_size=4`) | 6.77 | 6.14 |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing)


### [Aidatatang_200zh][aidatatang_200zh]

#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/aidatatang_200zh/ASR/pruned_transducer_stateless2)

|                      | Dev  | Test |
|----------------------|------|------|
| greedy_search        | 5.53 | 6.59 |
| fast_beam_search     | 5.30 | 6.34 |
| modified_beam_search | 5.27 | 6.33 |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)


### [WenetSpeech][wenetspeech]

#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless2)

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy_search        | 7.80 | 8.75     | 13.49        |
| fast_beam_search     | 7.94 | 8.74     | 13.80        |
| modified_beam_search | 7.76 | 8.71     | 13.41        |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing)

#### [Transducer **Streaming** (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless5)

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy_search        | 8.78 | 10.12    | 16.16        |
| fast_beam_search     | 9.01 | 10.47    | 16.28        |
| modified_beam_search | 8.53 | 9.95     | 15.81        |


### [Alimeeting][alimeeting]

#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/alimeeting/ASR/pruned_transducer_stateless2)

|                      | Eval  | Test-Net |
|----------------------|-------|----------|
| greedy_search        | 31.77 | 34.66    |
| fast_beam_search     | 31.39 | 33.02    |
| modified_beam_search | 30.38 | 34.25    |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing)


### [TAL_CSASR][tal_csasr]

#### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/tal_csasr/ASR/pruned_transducer_stateless5)

The best results for Chinese CER (%) and English WER (%), respectively (zh: Chinese, en: English):

| decoding-method      | dev  | dev_zh | dev_en | test | test_zh | test_en |
|----------------------|------|--------|--------|------|---------|---------|
| greedy_search        | 7.30 | 6.48   | 19.19  | 7.39 | 6.66    | 19.13   |
| fast_beam_search     | 7.18 | 6.39   | 18.90  | 7.27 | 6.55    | 18.77   |
| modified_beam_search | 7.15 | 6.35   | 18.95  | 7.22 | 6.50    | 18.70   |

We provide a Colab notebook to test the pre-trained model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DmIx-NloI1CMU5GdZrlse7TRu4y3Dpf8?usp=sharing)

## TTS: Text-to-Speech

### Supported Datasets

- [LJSpeech][ljspeech]
- [VCTK][vctk]
- [LibriTTS][libritts_tts]

### Supported Models

- [VITS](https://arxiv.org/abs/2106.06103)

# Deployment with C++

Once you have trained a model in icefall, you may want to deploy it with C++ without Python dependencies.

Please refer to

- https://k2-fsa.github.io/icefall/model-export/export-with-torch-jit-script.html
- https://k2-fsa.github.io/icefall/model-export/export-onnx.html
- https://k2-fsa.github.io/icefall/model-export/export-ncnn.html

for how to do this.

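As a minimal illustration of the TorchScript route (assuming PyTorch is installed; `TinyModel` is a toy module for demonstration, not an icefall model), scripting and saving a model produces a file that C++ can load without Python:

```python
import torch


class TinyModel(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)


# Script the model and save it; the resulting file can be loaded from C++
# via torch::jit::load() without any Python dependency.
scripted = torch.jit.script(TinyModel())
scripted.save("tiny_model.pt")
```

The ONNX and ncnn export paths linked above follow the same idea with different runtimes; see the documents for the exact per-recipe export scripts.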
We also provide a Colab notebook showing how to run a torch-scripted model in [k2][k2] with C++.
Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1BIGLWzS36isskMXHKcqC9ysN6pspYXs_?usp=sharing)

[yesno]: egs/yesno/ASR
[librispeech]: egs/librispeech/ASR
[aishell]: egs/aishell/ASR
[aishell2]: egs/aishell2/ASR
[aishell4]: egs/aishell4/ASR
[timit]: egs/timit/ASR
[tedlium3]: egs/tedlium3/ASR
[gigaspeech]: egs/gigaspeech/ASR
[aidatatang_200zh]: egs/aidatatang_200zh/ASR
[wenetspeech]: egs/wenetspeech/ASR
[alimeeting]: egs/alimeeting/ASR
[tal_csasr]: egs/tal_csasr/ASR
[ami]: egs/ami
[swbd]: egs/swbd/ASR
[k2]: https://github.com/k2-fsa/k2
[commonvoice]: egs/commonvoice/ASR
[csj]: egs/csj/ASR
[libricss]: egs/libricss/SURT
[libritts_asr]: egs/libritts/ASR
[libriheavy]: egs/libriheavy/ASR
[mgb2]: egs/mgb2/ASR
[spgispeech]: egs/spgispeech/ASR
[voxpopuli]: egs/voxpopuli/ASR
[xbmu-amdo31]: egs/xbmu-amdo31/ASR

[vctk]: egs/vctk/TTS
[ljspeech]: egs/ljspeech/TTS
[libritts_tts]: egs/libritts/TTS

## Acknowledgements

Some contributors to this project were supported by Xiaomi Corporation. Others were supported by National Science Foundation CCRI award 2120435. This is not an exhaustive list of sources of support.