espnet/yoshiki_chime4_whisper_medium_finetuning
This model was trained by Yoshiki using the chime4 recipe in ESPnet.
Follow the ESPnet installation instructions if you haven't done that already.
cd espnet
git checkout fe00740b80cd26fad7c550cd9e975609deb664db
pip install -e .
cd egs2/chime4/asr1
./run.sh --skip_data_prep false --skip_train true --download_model espnet/yoshiki_chime4_whisper_medium_finetuning
Environments:

- date: Fri Jul 21 19:08:31 JST 2023
- python version: 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0]
- espnet version: espnet 202304
- pytorch version: pytorch 1.13.1
- Git hash: d7172fcb7181ffdcca9c0061400254b63e37bf21
- Commit date: Sat Jul 15 15:01:30 2023 +0900

WER:

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track | 1640 | 24791 | 97.7 | 1.9 | 0.5 | 0.7 | 3.0 | 25.7 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track | 1640 | 24792 | 95.9 | 3.3 | 0.8 | 0.8 | 4.9 | 37.0 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_real_isolated_1ch_track | 1320 | 19341 | 96.3 | 3.2 | 0.5 | 0.8 | 4.5 | 33.6 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track | 1320 | 19344 | 93.1 | 5.8 | 1.1 | 1.2 | 8.1 | 43.3 |

CER:

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_real_isolated_1ch_track | 1640 | 141889 | 99.2 | 0.4 | 0.4 | 0.7 | 1.5 | 25.7 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/dt05_simu_isolated_1ch_track | 1640 | 141900 | 98.2 | 0.9 | 0.9 | 0.8 | 2.6 | 37.0 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_real_isolated_1ch_track | 1320 | 110558 | 98.6 | 0.8 | 0.6 | 0.7 | 2.1 | 33.6 |
| decode_asr_whisper_noctc_greedy_asr_model_valid.acc.ave/et05_simu_isolated_1ch_track | 1320 | 110572 | 96.5 | 1.9 | 1.5 | 1.2 | 4.7 | 43.3 |
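
The columns in the tables above follow the usual edit-distance convention: Corr, Sub, Del, and Ins are percentages of the reference token count (Wrd), Err is Sub + Del + Ins (up to rounding, e.g. 3.3 + 0.8 + 0.8 = 4.9 in the second row), and S.Err is the share of sentences with at least one error. The following is a minimal Python sketch of that computation on toy data. It is not the scoring tool used by the recipe; the function names `align_counts` and `score` and the example sentences are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int, int]:
    """Return (correct, substitutions, deletions, insertions) for one sentence."""
    n, m = len(ref), len(hyp)
    # table[i][j] = (distance, cor, sub, del, ins) for ref[:i] vs hyp[:j]
    table = [[(0, 0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = (i, 0, 0, i, 0)  # all reference tokens deleted
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, 0, j)  # all hypothesis tokens inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d, c, s, dl, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (d, c + 1, s, dl, ins)      # match
            else:
                diag = (d + 1, c, s + 1, dl, ins)  # substitution
            d, c, s, dl, ins = table[i - 1][j]
            delete = (d + 1, c, s, dl + 1, ins)
            d, c, s, dl, ins = table[i][j - 1]
            insert = (d + 1, c, s, dl, ins + 1)
            # Keep the cheapest path; ties prefer the diagonal move.
            table[i][j] = min(diag, delete, insert, key=lambda t: t[0])
    _, cor, sub, dele, ins = table[n][m]
    return cor, sub, dele, ins


def score(refs: List[str], hyps: List[str]) -> Dict[str, float]:
    """Aggregate word-level counts over a test set into table-style columns."""
    cor = sub = dele = ins = n_words = bad_sents = 0
    for ref, hyp in zip(refs, hyps):
        r, h = ref.split(), hyp.split()
        c, s, d, i = align_counts(r, h)
        cor, sub, dele, ins = cor + c, sub + s, dele + d, ins + i
        n_words += len(r)
        bad_sents += int(s + d + i > 0)
    pct = lambda x: 100.0 * x / n_words
    return {
        "Snt": len(refs), "Wrd": n_words,
        "Corr": pct(cor), "Sub": pct(sub), "Del": pct(dele), "Ins": pct(ins),
        "Err": pct(sub + dele + ins),
        "S.Err": 100.0 * bad_sents / len(refs),
    }


# Toy data (hypothetical): one substitution and one deletion in the first pair.
result = score(
    ["the cat sat on the mat", "hello world"],
    ["the cat sit on mat", "hello world"],
)
print(result)
```

Splitting each string into characters instead of words (for example with `list(ref)`) gives the character-level variant of the same counts.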

Citing ESPnet:

@inproceedings{watanabe2018espnet,
author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
title={{ESPnet}: End-to-End Speech Processing Toolkit},
year={2018},
booktitle={Proceedings of Interspeech},
pages={2207--2211},
doi={10.21437/Interspeech.2018-1456},
url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
}
or arXiv:
@misc{watanabe2018espnet,
title={ESPnet: End-to-End Speech Processing Toolkit},
author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
year={2018},
eprint={1804.00015},
archivePrefix={arXiv},
primaryClass={cs.CL}
}