This is openai/whisper-large-v2 converted to the eole format with `eole convert --model_dir openai/whisper-large-v2`.
No weights were modified; this is a format conversion only.
| Property | Value |
|---|---|
| Original model | openai/whisper-large-v2 |
| Parameters | 1.55B |
| Encoder layers | 32 |
| Decoder layers | 32 |
| Hidden size | 1280 |
| Attention heads | 20 |
| Mel bins | 80 |
| Vocab size | 51,865 |
| License | Apache 2.0 |
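The 80 mel bins in the table refer to the log-mel spectrogram front end that Whisper-style models consume (16 kHz audio, 400-sample FFT window, 160-sample hop). A minimal NumPy sketch of that feature extraction, for illustration only — this is not eole's actual preprocessing code:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=80, n_fft=400, sr=16000):
    # Triangular filters centered at points evenly spaced on the mel scale.
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0, sr / 2, n_bins)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        up = (freqs - left) / (center - left)
        down = (right - freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(up, down))
    return fb

def log_mel_spectrogram(audio, sr=16000, n_fft=400, hop=160, n_mels=80):
    # Windowed power spectrum per frame, then mel projection and log.
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(audio) - n_fft + 1, hop):
        frame = audio[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    power = np.array(frames).T                    # (n_bins, n_frames)
    mel = mel_filterbank(n_mels, n_fft, sr) @ power
    return np.log10(np.maximum(mel, 1e-10))

# One second of a 440 Hz tone -> one 80-bin feature vector per frame.
audio = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
features = log_mel_spectrogram(audio)
print(features.shape)  # (80, 98)
```

The exact filterbank and normalization in the real pipeline may differ; the shape (80 mel bins per 10 ms frame) is the point.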
```shell
pip install eole[wer]
```
```shell
eole predict \
    -config eval_config.yaml \
    -model_path whisper-large-v2-eole \
    -src audio_files.txt \
    -output transcriptions.txt \
    -language en \
    -task transcribe \
    -gpu_ranks 0
```
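Decoding is autoregressive beam search: at each step the `beam_size` highest-scoring partial hypotheses are expanded and the rest discarded. A minimal language-agnostic sketch of the algorithm — the toy scoring function and names here are illustrative, not eole internals:

```python
import math

def beam_search(step_scores, beam_size=5, eos=0, max_len=10):
    """Keep the beam_size best partial hypotheses at each decoding step.

    step_scores(prefix) -> {token: log_probability} (toy interface).
    """
    beams = [([], 0.0)]            # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_scores(seq).items():
                candidates.append((seq + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            # Hypotheses ending in EOS are complete; others stay on the beam.
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])

# Toy model: token 1 is likely for two steps, then EOS (token 0).
def toy_scores(prefix):
    if len(prefix) < 2:
        return {1: math.log(0.7), 2: math.log(0.3)}
    return {0: math.log(0.9), 1: math.log(0.1)}

best_seq, best_score = beam_search(toy_scores, beam_size=5)
print(best_seq)  # [1, 1, 0]
```

A larger beam explores more hypotheses per step at proportionally higher decoding cost.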
All evaluations use beam size 5.
| Benchmark | WER |
|---|---|
| LibriSpeech test-clean | 2.44% |
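WER is word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. The `eole[wer]` extra presumably installs a scoring dependency; this standalone sketch just shows the metric itself:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[-1][-1] / len(ref)

# One dropped word out of six reference words -> WER of 1/6.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Published WER figures also depend on text normalization (casing, punctuation, number formatting), so scores are only comparable under the same normalizer.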
```shell
eole convert --model_dir openai/whisper-large-v2 --output whisper-large-v2-eole
```
```bibtex
@misc{radford2023robust,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year={2023},
  eprint={2212.04356},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```