<div align="center">
<img src="https://raw.githubusercontent.com/k2-fsa/icefall/master/docs/source/_static/logo.png" width=168>
</div>

# Introduction

The icefall project contains speech-related recipes for various datasets
using [k2-fsa](https://github.com/k2-fsa/k2) and [lhotse](https://github.com/lhotse-speech/lhotse).

You can use [sherpa](https://github.com/k2-fsa/sherpa), [sherpa-ncnn](https://github.com/k2-fsa/sherpa-ncnn) or [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) to deploy models
trained in icefall. These frameworks also support models not included in icefall; please refer to their respective documentation for more details.

You can try pre-trained models from within your browser, without downloading or installing anything, by visiting this [Hugging Face space](https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition).
Please refer to the [documentation](https://k2-fsa.github.io/icefall/huggingface/spaces.html) for more details.

# Installation

Please refer to the [documentation](https://k2-fsa.github.io/icefall/installation/index.html)
for installation instructions.

# Recipes

Please refer to the [documentation](https://k2-fsa.github.io/icefall/recipes/index.html)
for more details.

## ASR: Automatic Speech Recognition

### Supported Datasets

- [yesno][yesno]

- [Aidatatang_200zh][aidatatang_200zh]
- [Aishell][aishell]
- [Aishell2][aishell2]
- [Aishell4][aishell4]
- [Alimeeting][alimeeting]
- [AMI][ami]
- [CommonVoice][commonvoice]
- [Corpus of Spontaneous Japanese][csj]
- [GigaSpeech][gigaspeech]
- [LibriCSS][libricss]
- [LibriSpeech][librispeech]
- [Libriheavy][libriheavy]
- [Multi-Dialect Broadcast News Arabic Speech Recognition][mgb2]
- [SPGISpeech][spgispeech]
- [Switchboard][swbd]
- [TIMIT][timit]
- [TED-LIUM3][tedlium3]
- [TAL_CSASR][tal_csasr]
- [Voxpopuli][voxpopuli]
- [XBMU-AMDO31][xbmu-amdo31]
- [WenetSpeech][wenetspeech]

More datasets will be added in the future.

### Supported Models

The [LibriSpeech][librispeech] recipe supports the most comprehensive set of models; you are welcome to try them out.

#### CTC
- TDNN LSTM CTC
- Conformer CTC
- Zipformer CTC

#### MMI
- Conformer MMI
- Zipformer MMI

#### Transducer
- Conformer-based Encoder
- LSTM-based Encoder
- Zipformer-based Encoder
- LSTM-based Predictor
- [Stateless Predictor](https://research.google/pubs/rnn-transducer-with-stateless-prediction-network/)
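
A stateless predictor replaces the usual LSTM prediction network with an embedding of a small, fixed left context. A toy sketch of the idea, with hypothetical sizes and a context of 2 tokens (the actual icefall implementation differs in detail):

```python
# Toy sketch of a "stateless" RNN-T prediction network: its output depends
# only on the last few emitted tokens, not on a recurrent hidden state.
# VOCAB, DIM, CONTEXT, and BLANK are illustrative choices, not icefall's.
VOCAB, DIM, CONTEXT, BLANK = 10, 4, 2, 0

# Deterministic toy embedding table (VOCAB x DIM).
embedding = [[(t * DIM + d) * 0.01 for d in range(DIM)] for t in range(VOCAB)]

def predictor(prev_tokens):
    """Embed and concatenate the last CONTEXT tokens (blank-padded)."""
    ctx = ([BLANK] * CONTEXT + list(prev_tokens))[-CONTEXT:]
    return [x for t in ctx for x in embedding[t]]

out = predictor([3, 5, 7])
assert len(out) == CONTEXT * DIM
# Histories sharing the same last two tokens map to the same output:
assert predictor([1, 5, 7]) == predictor([9, 5, 7])
```

Because the output depends only on a bounded history, the predictor is cheap at inference time and easy to batch, while giving accuracy close to an LSTM predictor.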

#### Whisper
- [OpenAI Whisper](https://arxiv.org/abs/2212.04356) (We support fine-tuning on Aishell-1.)

If you would like to contribute to icefall, please refer to [contributing](https://k2-fsa.github.io/icefall/contributing/index.html) for more details.

We would like to highlight the performance of some of the recipes here.

### [yesno][yesno]

This is the simplest ASR recipe in `icefall` and can be run on a CPU.
Training takes less than 30 seconds and gives you the following WER:

```
[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
```
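
For reference, %WER is the word-level edit distance (insertions + deletions + substitutions) divided by the number of reference words. A toy sketch of the computation (not icefall's scoring code):

```python
# Toy WER computation via Levenshtein distance over word sequences.
def wer(ref, hyp):
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[-1][-1] / len(ref)

# One deletion against a 240-word reference gives 1/240 = 0.42% (rounded),
# matching the yesno result above.
assert round(wer(["yes"] * 240, ["yes"] * 239), 2) == 0.42
```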

We provide a Colab notebook for this recipe: [](https://colab.research.google.com/drive/1tIjjzaJc3IvGyKiMCDWO-TSnBgkcuN3B?usp=sharing)

### [LibriSpeech][librispeech]

Please see [RESULTS.md](https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md)
for the **latest** results.

#### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/conformer_ctc)

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing)

#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/tdnn_lstm_ctc)

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59       | 17.69      |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1-iSfQMp2So-We_Uu49N4AAcMInB72u9z?usp=sharing)

#### [Transducer (Conformer Encoder + LSTM Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)

|               | test-clean | test-other |
|---------------|------------|------------|
| greedy_search | 3.07       | 7.51       |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1_u6yK9jDkPwG_NLrZMN2XK7Aeq4suMO2?usp=sharing)

#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/transducer)

|                                      | test-clean | test-other |
|--------------------------------------|------------|------------|
| modified_beam_search (`beam_size=4`) | 2.56       | 6.27       |
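
Beam search with `beam_size=4` keeps the four best partial hypotheses at each step instead of only the single best. A toy sketch of plain beam search over per-step token log-probabilities (icefall's modified_beam_search differs in detail, e.g. in its handling of blanks and hypothesis merging):

```python
import heapq
import math

# Toy beam search: keep the beam_size highest-scoring partial hypotheses
# after expanding every beam with every token at each step.
def beam_search(step_logprobs, beam_size=4):
    beams = [((), 0.0)]  # (token sequence, cumulative log-prob)
    for logprobs in step_logprobs:
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in enumerate(logprobs)
        ]
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[1])
    return beams

steps = [[math.log(p) for p in dist]
         for dist in ([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])]
best_seq, _ = beam_search(steps)[0]
assert best_seq == (0, 1)
```

With a larger beam, lower-probability prefixes survive long enough to be rescued by later steps, which is why modified_beam_search typically edges out greedy_search in the tables here.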

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1CO1bXJ-2khDckZIW8zjOPHGSKLHpTDlp?usp=sharing)

#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/zipformer)

WER (modified_beam_search with `beam_size=4` unless otherwise stated)

1. LibriSpeech-960hr

| Encoder         | Params | test-clean | test-other | epochs | devices    |
|-----------------|--------|------------|------------|--------|------------|
| Zipformer       | 65.5M  | 2.21       | 4.79       | 50     | 4 32G-V100 |
| Zipformer-small | 23.2M  | 2.42       | 5.73       | 50     | 2 32G-V100 |
| Zipformer-large | 148.4M | 2.06       | 4.63       | 50     | 4 32G-V100 |
| Zipformer-large | 148.4M | 2.00       | 4.38       | 174    | 8 80G-A100 |

2. LibriSpeech-960hr + GigaSpeech

| Encoder   | Params | test-clean | test-other |
|-----------|--------|------------|------------|
| Zipformer | 65.5M  | 1.78       | 4.08       |

3. LibriSpeech-960hr + GigaSpeech + CommonVoice

| Encoder   | Params | test-clean | test-other |
|-----------|--------|------------|------------|
| Zipformer | 65.5M  | 1.90       | 3.98       |

### [GigaSpeech][gigaspeech]

#### [Conformer CTC](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/conformer_ctc)

|     | Dev   | Test  |
|-----|-------|-------|
| WER | 10.47 | 10.58 |

#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/pruned_transducer_stateless2)

Conformer Encoder + Stateless Predictor + k2 Pruned RNN-T Loss

|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy_search        | 10.51 | 10.73 |
| fast_beam_search     | 10.50 | 10.69 |
| modified_beam_search | 10.40 | 10.51 |

#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/gigaspeech/ASR/zipformer)

|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy_search        | 10.31 | 10.50 |
| fast_beam_search     | 10.26 | 10.48 |
| modified_beam_search | 10.25 | 10.38 |

### [Aishell][aishell]

#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/tdnn_lstm_ctc)

|     | test  |
|-----|-------|
| CER | 10.16 |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1jbyzYq3ytm6j2nlEt-diQm-6QVWyDDEa?usp=sharing)

#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/transducer_stateless)

|     | test |
|-----|------|
| CER | 4.38 |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/14XaT2MhnBkK-3_RqqWq3K90Xlbin-GZC?usp=sharing)

#### [Transducer (Zipformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell/ASR/zipformer)

CER (modified_beam_search `beam_size=4`)

| Encoder         | Params | dev  | test | epochs |
|-----------------|--------|------|------|--------|
| Zipformer       | 73.4M  | 4.13 | 4.40 | 55     |
| Zipformer-small | 30.2M  | 4.40 | 4.67 | 55     |
| Zipformer-large | 157.3M | 4.03 | 4.28 | 56     |

### [Aishell4][aishell4]

#### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/aishell4/ASR/pruned_transducer_stateless5)

Trained with all subsets:

|     | test  |
|-----|-------|
| CER | 29.08 |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)

### [TIMIT][timit]

#### [TDNN LSTM CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_lstm_ctc)

|     | TEST   |
|-----|--------|
| PER | 19.71% |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1Hs9DA4V96uapw_30uNp32OMJgkuR5VVd?usp=sharing)

#### [TDNN LiGRU CTC](https://github.com/k2-fsa/icefall/tree/master/egs/timit/ASR/tdnn_ligru_ctc)

|     | TEST   |
|-----|--------|
| PER | 17.66% |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1z3lkURVv9M7uTiIgf3Np9IntMHEknaks?usp=sharing)

### [TED-LIUM3][tedlium3]

#### [Transducer (Conformer Encoder + Stateless Predictor)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/transducer_stateless)

|                                      | dev  | test |
|--------------------------------------|------|------|
| modified_beam_search (`beam_size=4`) | 6.91 | 6.33 |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1MmY5bBxwvKLNT4A2DJnwiqRXhdchUqPN?usp=sharing)

#### [Transducer (pruned_transducer_stateless)](https://github.com/k2-fsa/icefall/tree/master/egs/tedlium3/ASR/pruned_transducer_stateless)

|                                      | dev  | test |
|--------------------------------------|------|------|
| modified_beam_search (`beam_size=4`) | 6.77 | 6.14 |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1je_1zGrOkGVVd4WLzgkXRHxl-I27yWtz?usp=sharing)

### [Aidatatang_200zh][aidatatang_200zh]

#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/aidatatang_200zh/ASR/pruned_transducer_stateless2)

|                      | Dev  | Test |
|----------------------|------|------|
| greedy_search        | 5.53 | 6.59 |
| fast_beam_search     | 5.30 | 6.34 |
| modified_beam_search | 5.27 | 6.33 |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)

### [WenetSpeech][wenetspeech]

#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless2)

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy_search        | 7.80 | 8.75     | 13.49        |
| fast_beam_search     | 7.94 | 8.74     | 13.80        |
| modified_beam_search | 7.76 | 8.71     | 13.41        |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing)

#### [Transducer **Streaming** (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/wenetspeech/ASR/pruned_transducer_stateless5)

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy_search        | 8.78 | 10.12    | 16.16        |
| fast_beam_search     | 9.01 | 10.47    | 16.28        |
| modified_beam_search | 8.53 | 9.95     | 15.81        |

### [Alimeeting][alimeeting]

#### [Transducer (pruned_transducer_stateless2)](https://github.com/k2-fsa/icefall/tree/master/egs/alimeeting/ASR/pruned_transducer_stateless2)

|                      | Eval  | Test-Net |
|----------------------|-------|----------|
| greedy_search        | 31.77 | 34.66    |
| fast_beam_search     | 31.39 | 33.02    |
| modified_beam_search | 30.38 | 34.25    |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1tKr3f0mL17uO_ljdHGKtR7HOmthYHwJG?usp=sharing)

### [TAL_CSASR][tal_csasr]

#### [Transducer (pruned_transducer_stateless5)](https://github.com/k2-fsa/icefall/tree/master/egs/tal_csasr/ASR/pruned_transducer_stateless5)

The best results, reported as Chinese CER (%) and English WER (%) respectively (zh: Chinese, en: English):

| decoding-method      | dev  | dev_zh | dev_en | test | test_zh | test_en |
|----------------------|------|--------|--------|------|---------|---------|
| greedy_search        | 7.30 | 6.48   | 19.19  | 7.39 | 6.66    | 19.13   |
| fast_beam_search     | 7.18 | 6.39   | 18.90  | 7.27 | 6.55    | 18.77   |
| modified_beam_search | 7.15 | 6.35   | 18.95  | 7.22 | 6.50    | 18.70   |

We provide a Colab notebook to test the pre-trained model: [](https://colab.research.google.com/drive/1DmIx-NloI1CMU5GdZrlse7TRu4y3Dpf8?usp=sharing)

## TTS: Text-to-Speech

### Supported Datasets

- [LJSpeech][ljspeech]
- [VCTK][vctk]
- [LibriTTS][libritts_tts]

### Supported Models

- [VITS](https://arxiv.org/abs/2106.06103)

# Deployment with C++

Once you have trained a model in icefall, you may want to deploy it with C++, without Python dependencies.

Please refer to

- https://k2-fsa.github.io/icefall/model-export/export-with-torch-jit-script.html
- https://k2-fsa.github.io/icefall/model-export/export-onnx.html
- https://k2-fsa.github.io/icefall/model-export/export-ncnn.html

for how to do this.

We also provide a Colab notebook showing you how to run a torch-scripted model in [k2][k2] with C++.
Please see: [](https://colab.research.google.com/drive/1BIGLWzS36isskMXHKcqC9ysN6pspYXs_?usp=sharing)

[yesno]: egs/yesno/ASR
[librispeech]: egs/librispeech/ASR
[aishell]: egs/aishell/ASR
[aishell2]: egs/aishell2/ASR
[aishell4]: egs/aishell4/ASR
[timit]: egs/timit/ASR
[tedlium3]: egs/tedlium3/ASR
[gigaspeech]: egs/gigaspeech/ASR
[aidatatang_200zh]: egs/aidatatang_200zh/ASR
[wenetspeech]: egs/wenetspeech/ASR
[alimeeting]: egs/alimeeting/ASR
[tal_csasr]: egs/tal_csasr/ASR
[ami]: egs/ami
[swbd]: egs/swbd/ASR
[k2]: https://github.com/k2-fsa/k2
[commonvoice]: egs/commonvoice/ASR
[csj]: egs/csj/ASR
[libricss]: egs/libricss/SURT
[libritts_asr]: egs/libritts/ASR
[libriheavy]: egs/libriheavy/ASR
[mgb2]: egs/mgb2/ASR
[spgispeech]: egs/spgispeech/ASR
[voxpopuli]: egs/voxpopuli/ASR
[xbmu-amdo31]: egs/xbmu-amdo31/ASR

[vctk]: egs/vctk/TTS
[ljspeech]: egs/ljspeech/TTS
[libritts_tts]: egs/libritts/TTS

## Acknowledgements

Some contributors to this project were supported by Xiaomi Corporation. Others were supported by National Science Foundation CCRI award 2120435. This is not an exhaustive list of sources of support.