pkufool/zipformer-medium

	---
	license: apache-2.0
	language:
	- zh
	- en
	metrics:
	- wer
	tags:
	- ASR
	- onnx
	---

	## Introduction


	This is a medium-sized [Zipformer](https://arxiv.org/pdf/2310.11230) model developed by the Xiaomi AI Lab Next-gen Kaldi team. It was trained on around 200,000 hours of open-source Chinese and English speech data. The model has around 68M parameters with the CTC head and around 73M with the transducer head.

	Performance on some popular test sets (CER for Chinese, WER for English, both in %):

	| Head | AISHELL-1 / AISHELL-2 test | WenetSpeech test-net / test-meeting | Common Voice zh | KeSpeech test | LibriSpeech test-clean / test-other | GigaSpeech test | Common Voice en | TED-LIUM test |
	| --- | --- | --- | --- | --- | --- | --- | --- | --- |
	| CTC | 3.08 / 3.98 | 7.08 / 7.62 | 9.2 | 11.23 | 3.01 / 6.06 | 11.22 | 15.28 | 10.38 |
	| Transducer | 2.67 / 3.67 | 6.79 / 7.33 | 8.97 | 10.67 | 2.61 / 5.36 | 10.56 | 12.94 | 10.06 |
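
For readers unfamiliar with the two metrics in the table, the sketch below shows how CER and WER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, taken over characters for Chinese and over whitespace-separated words for English. This is an illustration only. The function names are hypothetical, and this card does not state which scoring tool or text normalization produced the numbers above.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (ref token missing in hyp)
                    curr[j - 1] + 1,     # insertion (extra token in hyp)
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def error_rate(ref_tokens: Sequence[str], hyp_tokens: Sequence[str]) -> float:
    """Error rate in percent: 100 * edit distance / reference length."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(ref.split(), hyp.split())


def cer(ref: str, hyp: str) -> float:
    """Character error rate: tokens are characters, with spaces ignored."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


if __name__ == "__main__":
    print(f"WER: {wer('the cat sat on the mat', 'the cat sit on mat'):.2f}%")
    print(f"CER: {cer('今天天气很好', '今天天气真好'):.2f}%")
```

These functions score a single utterance. Test-set figures are usually obtained by summing the edit distances over all utterances and dividing by the total number of reference tokens, rather than by averaging per-utterance rates.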

	Please refer to the [zipformer repository on GitHub](https://github.com/pkufool/zipformer) for model details.

	> Training sets: LibriSpeech, GigaSpeech, Common Voice 2022 (zh + en), Libriheavy, Emilia (zh + en), AISHELL-2, WenetSpeech, WenetSpeech4TTS, KeSpeech, AISHELL, aidatatang, AISHELL-4, AliMeeting, MagicData, Primewords, ST-CMDS, THCHS-30.


	## Documentation

	Please refer to [https://pkufool.github.io/zipformer/en/models/](https://pkufool.github.io/zipformer/en/models/)


	## Citation

	```
	@inproceedings{yao2024zipformer,
	title={Zipformer: A faster and better encoder for automatic speech recognition},
	author={Yao, Zengwei and Guo, Liyong and Yang, Xiaoyu and Kang, Wei and Kuang, Fangjun and Yang, Yifan and Jin, Zengrui and Lin, Long and Povey, Daniel},
	booktitle={International Conference on Learning Representations},
	volume={2024},
	pages={44440--44455},
	year={2024}
	}
	@inproceedings{yao2025cr,
	title={CR-CTC: Consistency regularization on CTC for improved speech recognition},
	author={Yao, Zengwei and Kang, Wei and Yang, Xiaoyu and Kuang, Fangjun and Guo, Liyong and Zhu, Han and Jin, Zengrui and Li, Zhaoqing and Lin, Long and Povey, Daniel},
	booktitle={International Conference on Learning Representations},
	volume={2025},
	pages={26850--26868},
	year={2025}
	}
	```