| --- |
| license: apache-2.0 |
| language: |
| - zh |
| - en |
| metrics: |
| - wer |
| tags: |
| - ASR |
| - onnx |
| --- |
| |
| ## Introduction |
|
|
|
|
| This is a medium-sized [zipformer](https://arxiv.org/pdf/2310.11230) model developed by the Xiaomi AI Lab Next-gen-Kaldi team. It was trained on around 200,000 hours of open-source Chinese and English datasets. The model has around 68M parameters with the CTC head and around 73M with the transducer head. |
|
|
| Error rates (%) on some popular test sets, reported as CER for the Chinese sets and WER for the English sets: |
|
|
| | Head | aishell-1 / aishell-2 test | wenetspeech test-net / test-meeting | Common Voice zh | kespeech test | librispeech test-clean / test-other | gigaspeech test | Common Voice en | TED-LIUM test | |
| | -- | -- | -- | -- | -- | -- | -- | -- | -- | |
| | CTC | 3.08 / 3.98 | 7.08 / 7.62 | 9.2 | 11.23 | 3.01 / 6.06 | 11.22 | 15.28 | 10.38 | |
| | Transducer | 2.67 / 3.67 | 6.79 / 7.33 | 8.97 | 10.67 | 2.61 / 5.36 | 10.56 | 12.94 | 10.06 | |
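For readers unfamiliar with the two metrics, here is a minimal sketch of how CER and WER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, in percent. The function names `edit_distance` and `error_rate` are illustrative, and the text normalization (whitespace splitting for words, dropping spaces for characters) is an assumption; the scoring scripts actually used for the table above are not described here and may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit):
    """Corpus-level error rate in percent.

    unit="word" gives WER (tokens are whitespace-separated words);
    unit="char" gives CER (tokens are characters, spaces removed).
    Both tokenization rules are assumptions made for this sketch.
    """
    total_errors = 0
    total_tokens = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "word":
            r, h = ref.split(), hyp.split()
        elif unit == "char":
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        else:
            raise ValueError("unit must be 'word' or 'char'")
        total_errors += edit_distance(r, h)
        total_tokens += len(r)
    return 100.0 * total_errors / total_tokens


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(round(error_rate(["the cat sat"], ["the cat sit"], "word"), 2))
    # One substituted character out of six reference characters.
    print(round(error_rate(["今天天气很好"], ["今天天汽很好"], "char"), 2))
```

Errors are summed over the whole test set before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.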
|
|
| Please refer to the [zipformer repository on GitHub](https://github.com/pkufool/zipformer) for model details. |
|
|
| > Training sets: LibriSpeech, GigaSpeech, Common Voice 2022 (zh + en), Libriheavy, Emilia (zh + en), AISHELL-2, WenetSpeech, WenetSpeech4TTS, KeSpeech, AISHELL, aidatatang, AISHELL-4, AliMeeting, MagicData, Primewords, ST-CMDS, THCHS-30. |
|
|
|
|
| ## Documentation |
|
|
| Please refer to [https://pkufool.github.io/zipformer/en/models/](https://pkufool.github.io/zipformer/en/models/) |
|
|
|
|
| ## Citation |
|
|
| ```bibtex |
| @inproceedings{yao2024zipformer, |
| title={Zipformer: A faster and better encoder for automatic speech recognition}, |
| author={Yao, Zengwei and Guo, Liyong and Yang, Xiaoyu and Kang, Wei and Kuang, Fangjun and Yang, Yifan and Jin, Zengrui and Lin, Long and Povey, Daniel}, |
| booktitle={International Conference on Learning Representations}, |
| volume={2024}, |
| pages={44440--44455}, |
| year={2024} |
| } |
| @inproceedings{yao2025cr, |
| title={Cr-ctc: Consistency regularization on ctc for improved speech recognition}, |
| author={Yao, Zengwei and Kang, Wei and Yang, Xiaoyu and Kuang, Fangjun and Guo, Liyong and Zhu, Han and Jin, Zengrui and Li, Zhaoqing and Lin, Long and Povey, Daniel}, |
| booktitle={International Conference on Learning Representations}, |
| volume={2025}, |
| pages={26850--26868}, |
| year={2025} |
| } |
| ``` |