Update README.md
README.md
CHANGED
|
@@ -1,3 +1,54 @@
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
| 3 |
---
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
+
language:
|
| 4 |
+
- zh
|
| 5 |
+
- en
|
| 6 |
+
metrics:
|
| 7 |
+
- wer
|
| 8 |
+
tags:
|
| 9 |
+
- ASR
|
| 10 |
+
- onnx
|
| 11 |
---
|
| 12 |
+
|
| 13 |
+
## Introduction
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
This is a medium-sized [Zipformer](https://arxiv.org/pdf/2310.11230) model developed by the Next-gen Kaldi team at Xiaomi AI Lab. It was trained on around 200,000 hours of open-source Chinese and English data. The model has around 68M parameters with the CTC head and around 73M with the transducer head.
|
| 17 |
+
|
| 18 |
+
The table below reports performance on several popular test sets (CER in % for Chinese, WER in % for English; lower is better).
|
| 19 |
+
|
| 20 |
+
| Head | aishell test 1 / 2 | wenetspeech test-net / test-meeting | Common Voice zh | kespeech test | librispeech test-clean / test-other | gigaspeech test | Common Voice en | tedlium test |
|
| 21 |
+
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
|
| 22 |
+
| CTC | 3.08 / 3.98 | 7.08 / 7.62 | 9.2 | 11.23 | 3.01 / 6.06 | 11.22 | 15.28 | 10.38 |
|
| 23 |
+
| Transducer | 2.67 / 3.67 | 6.79 / 7.33 | 8.97 | 10.67 | 2.61 / 5.36 | 10.56 | 12.94 | 10.06 |
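
For readers unfamiliar with the metrics in the table above, the following is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis tokens, divided by the number of reference tokens. The README does not say which scoring tool or text normalization produced the reported numbers, so the function names here (`edit_distance`, `error_rate`, `wer`, `cer`) are hypothetical and this is an illustration only, not the authors' evaluation script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Error rate in percent: edits / number of reference tokens."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref, hyp):
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(ref.split(), hyp.split())


def cer(ref, hyp):
    """Character error rate: tokens are characters, spaces ignored (an assumption)."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


if __name__ == "__main__":
    # One substitution and one deletion against a six-word reference.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    # One substituted character against a six-character reference.
    print(cer("今天天气很好", "今天天汽很好"))
```

Real evaluations usually normalize text first (case, punctuation, number formatting), and that choice can shift the reported numbers noticeably.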
|
| 24 |
+
|
| 25 |
+
For model details, see the [zipformer repository on GitHub](https://github.com/pkufool/zipformer).
|
| 26 |
+
|
| 27 |
+
> Training sets: LibriSpeech, GigaSpeech, Common Voice 2022 (zh + en), Libriheavy, Emilia (zh + en), AISHELL-2, WenetSpeech, WenetSpeech4TTS, KeSpeech, AISHELL, aidatatang, AISHELL-4, AliMeeting, MagicData, Primewords, ST-CMDS, THCHS-30.
|
| 28 |
+
|
| 29 |
+
|
| 30 |
+
## Documentation
|
| 31 |
+
|
| 32 |
+
Please refer to [https://pkufool.github.io/zipformer/en/models/](https://pkufool.github.io/zipformer/en/models/)
|
| 33 |
+
|
| 34 |
+
|
| 35 |
+
## Citation
|
| 36 |
+
|
| 37 |
+
```
|
| 38 |
+
@inproceedings{yao2024zipformer,
|
| 39 |
+
title={Zipformer: A faster and better encoder for automatic speech recognition},
|
| 40 |
+
author={Yao, Zengwei and Guo, Liyong and Yang, Xiaoyu and Kang, Wei and Kuang, Fangjun and Yang, Yifan and Jin, Zengrui and Lin, Long and Povey, Daniel},
|
| 41 |
+
booktitle={International Conference on Learning Representations},
|
| 42 |
+
volume={2024},
|
| 43 |
+
pages={44440--44455},
|
| 44 |
+
year={2024}
|
| 45 |
+
}
|
| 46 |
+
@inproceedings{yao2025cr,
|
| 47 |
+
title={{CR-CTC}: Consistency regularization on {CTC} for improved speech recognition},
|
| 48 |
+
author={Yao, Zengwei and Kang, Wei and Yang, Xiaoyu and Kuang, Fangjun and Guo, Liyong and Zhu, Han and Jin, Zengrui and Li, Zhaoqing and Lin, Long and Povey, Daniel},
|
| 49 |
+
booktitle={International Conference on Learning Representations},
|
| 50 |
+
volume={2025},
|
| 51 |
+
pages={26850--26868},
|
| 52 |
+
year={2025}
|
| 53 |
+
}
|
| 54 |
+
```
|