---
language:
- en
- zh
- de
- es
- ru
- ko
- fr
- ja
- pt
- tr
- pl
- ca
- nl
- ar
- sv
- it
- id
- hi
- fi
- vi
- he
- uk
- el
- ms
- cs
- ro
- da
- hu
- ta
- "no"
- th
- ur
- hr
- bg
- lt
- la
- mi
- ml
- cy
- sk
- te
- fa
- lv
- bn
- sr
- az
- sl
- kn
- et
- mk
- br
- eu
- is
- hy
- ne
- mn
- bs
- kk
- sq
- sw
- gl
- mr
- pa
- si
- km
- sn
- yo
- so
- af
- oc
- ka
- be
- tg
- sd
- gu
- am
- yi
- lo
- uz
- fo
- ht
- ps
- tk
- nn
- mt
- sa
- lb
- my
- bo
- tl
- mg
- as
- tt
- haw
- ln
- ha
- ba
- jw
- su
tags:
- audio
- automatic-speech-recognition
- eole
- whisper
license: apache-2.0
base_model: openai/whisper-small
pipeline_tag: automatic-speech-recognition
---

# Whisper Small (eole)

This is [openai/whisper-small](https://huggingface.co/openai/whisper-small) converted to [eole](https://github.com/eole-nlp/eole) format using `eole convert --model_dir openai/whisper-small`.

No weights were modified; this is a format conversion only.

## Model details

| | |
|---|---|
| **Original model** | [openai/whisper-small](https://huggingface.co/openai/whisper-small) |
| **Parameters** | 244M |
| **Encoder layers** | 12 |
| **Decoder layers** | 12 |
| **Hidden size** | 768 |
| **Attention heads** | 12 |
| **Mel bins** | 80 |
| **Vocab size** | 51,865 |
| **License** | Apache 2.0 |

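As a rough sanity check on the table above, the parameter count can be approximated from the listed dimensions. The sketch below assumes a standard Transformer layout with a feed-forward width of four times the hidden size and a token embedding shared with the output projection. Neither detail is stated in this card, and biases, layer norms, the convolutional front end, and positional embeddings are ignored, so the result is only an estimate.

```python
# Values taken from the model details table.
HIDDEN = 768
ENCODER_LAYERS = 12
DECODER_LAYERS = 12
VOCAB = 51_865

# Assumption (not stated in the table): feed-forward width is 4x the hidden size.
FFN = 4 * HIDDEN


def attention_weights(hidden: int) -> int:
    """Weights of the query, key, value and output projections."""
    return 4 * hidden * hidden


def ffn_weights(hidden: int, ffn: int) -> int:
    """Weights of the two feed-forward projections."""
    return 2 * hidden * ffn


def estimate_parameters() -> int:
    # Encoder layer: self-attention + feed-forward.
    encoder_layer = attention_weights(HIDDEN) + ffn_weights(HIDDEN, FFN)
    # Decoder layer: self-attention + cross-attention + feed-forward.
    decoder_layer = 2 * attention_weights(HIDDEN) + ffn_weights(HIDDEN, FFN)
    # Assumption: the token embedding is shared with the output projection.
    embeddings = VOCAB * HIDDEN
    return (
        ENCODER_LAYERS * encoder_layer
        + DECODER_LAYERS * decoder_layer
        + embeddings
    )


print(f"{estimate_parameters() / 1e6:.0f}M")  # prints "238M"
```

The estimate lands within a few percent of the 244M listed above; the remainder is plausibly the terms the sketch leaves out.
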
## Usage

```bash
# Quote the requirement so shells such as zsh do not expand the brackets.
pip install "eole[wer]"
```

### Transcribe

```bash
eole predict \
  -config eval_config.yaml \
  -model_path whisper-small-eole \
  -src audio_files.txt \
  -output transcriptions.txt \
  -language en \
  -task transcribe \
  -gpu_ranks 0
```

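The command above reads its inputs from `audio_files.txt`. This card does not describe that file's format; the sketch below assumes it is a plain list with one audio path per line, and the set of extensions is likewise only an illustrative guess. The helper name `write_audio_list` is hypothetical and not part of eole.

```python
from pathlib import Path

# Assumption: these extensions are illustrative, not a list of formats eole supports.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}


def write_audio_list(audio_dir: str, list_path: str = "audio_files.txt") -> int:
    """Write one absolute audio path per line and return the number of files found."""
    paths = sorted(
        p.resolve()
        for p in Path(audio_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return len(paths)


# Example (hypothetical directory name):
# write_audio_list("recordings", "audio_files.txt")
```

Sorting the paths keeps the order of `transcriptions.txt` reproducible across runs, assuming the output follows the input order.
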
## Evaluation

All evaluations use beam size 5.

| Benchmark | WER |
|---|---|
| LibriSpeech test-clean | 3.30% |

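For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between reference and hypothesis, divided by the number of reference words and accumulated over the whole test set. The sketch below is a minimal standalone illustration, not the scoring code used for the table; it only lowercases and splits on whitespace, whereas the text normalization behind the reported figure is not stated in this card.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total edit errors over total reference words for (reference, hypothesis) pairs."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        errors += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


pairs = [
    ("the cat sat on the mat", "the cat sit on mat"),  # 1 substitution + 1 deletion
    ("hello world", "hello world"),                    # exact match
]
print(f"{corpus_wer(pairs):.2%}")  # prints "25.00%"
```

Summing errors and reference words before dividing weights long utterances more than short ones, which is the usual convention for corpus-level WER.
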
## Conversion

```bash
eole convert --model_dir openai/whisper-small --output whisper-small-eole
```

## Citation

```bibtex
@misc{radford2023robust,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year={2023},
  eprint={2212.04356},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```