---
license: apache-2.0
tags:
- multimodel
---

# Available models and languages
There are six model sizes, four of which also have English-only versions, offering speed and accuracy tradeoffs.
Below are the names of the available models with their approximate memory requirements and inference speed relative to the large model.
The relative speeds below were measured by transcribing English speech on an A100; real-world speed may vary significantly depending on many factors, including the language, the speaking speed, and the available hardware.

|  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
|  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~10x      |
|  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~7x       |
| small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~4x       |
| medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
| large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
| turbo  |   809 M    |        N/A         |      `turbo`       |     ~6 GB     |      ~8x       |

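
As a rough illustration of how the table can guide model choice, the sketch below picks the multilingual model with the most parameters that fits a given VRAM budget and a minimum relative speed. The figures are copied from the table above. The `ModelSpec` type, the `MODELS` list, and the `pick_model` helper are hypothetical names introduced for this example and are not part of any library, and using parameter count as a stand-in for accuracy is an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    name: str              # multilingual model name from the table
    params_m: int          # parameter count, in millions
    vram_gb: int           # approximate required VRAM, in GB
    relative_speed: float  # approximate speed relative to `large`


# Values copied from the table above (all approximate).
MODELS = [
    ModelSpec("tiny", 39, 1, 10),
    ModelSpec("base", 74, 1, 7),
    ModelSpec("small", 244, 2, 4),
    ModelSpec("medium", 769, 5, 2),
    ModelSpec("large", 1550, 10, 1),
    ModelSpec("turbo", 809, 6, 8),
]


def pick_model(vram_gb: float, min_speed: float = 1.0) -> Optional[ModelSpec]:
    """Return the largest model (by parameter count) that fits the budget.

    Assumption: more parameters is treated as a crude proxy for accuracy.
    Returns None when no model satisfies both constraints.
    """
    candidates = [
        m for m in MODELS
        if m.vram_gb <= vram_gb and m.relative_speed >= min_speed
    ]
    return max(candidates, key=lambda m: m.params_m, default=None)
```

For example, `pick_model(4)` returns the `small` entry and `pick_model(8)` returns `turbo`, while `pick_model(0.5)` returns `None` because no model fits. Since the VRAM and speed figures are approximate, it is worth leaving some headroom rather than treating the thresholds as exact.
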
## Supported Languages

- English
- Chinese
- German
- Spanish
- Russian
- Korean
- French
- Japanese
- Portuguese
- Turkish
- Polish
- Catalan
- Dutch
- Arabic
- Swedish
- Italian
- Indonesian
- Hindi
- Finnish
- Vietnamese
- Hebrew
- Ukrainian
- Greek
- Malay
- Czech
- Romanian
- Danish
- Hungarian
- Tamil
- Norwegian
- Thai
- Urdu
- Croatian
- Bulgarian
- Lithuanian
- Latin
- Māori
- Malayalam
- Welsh
- Slovak
- Telugu
- Persian
- Latvian
- Bengali
- Serbian
- Azerbaijani
- Slovenian
- Kannada
- Estonian
- Macedonian
- Breton
- Basque
- Icelandic
- Armenian
- Nepali
- Mongolian
- Bosnian
- Kazakh
- Albanian
- Swahili
- Galician
- Marathi
- Panjabi
- Sinhala
- Khmer
- Shona
- Yoruba
- Somali
- Afrikaans
- Occitan
- Georgian
- Belarusian
- Tajik
- Sindhi
- Gujarati
- Amharic
- Yiddish
- Lao
- Uzbek
- Faroese
- Haitian
- Pashto
- Turkmen
- Norwegian Nynorsk
- Maltese
- Sanskrit
- Luxembourgish
- Burmese
- Tibetan
- Tagalog
- Malagasy
- Assamese
- Tatar
- Hawaiian
- Lingala
- Hausa
- Bashkir
- Javanese
- Sundanese

Whisper's performance varies widely by language. The figure below breaks down the performance of the `large-v3` and `large-v2` models by language, using WER (word error rate) or CER (character error rate, shown in *italic*) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics for the other models and datasets can be found in Appendices D.1, D.2, and D.4 of [the paper](https://arxiv.org/abs/2212.04356), and BLEU (Bilingual Evaluation Understudy) scores for translation are in Appendix D.3.

![WER breakdown by language](https://github.com/openai/whisper/assets/266841/f4619d66-1058-4005-8f67-a9d811b77c62)
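
For readers unfamiliar with the metrics in the figure, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words; CER is the same calculation over characters. The sketch below is a minimal illustration of that definition, not the evaluation code used to produce the figure. The function names are hypothetical, and real evaluations typically normalize the text (casing, punctuation) before scoring.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences, unit cost per edit."""
    # prev[j] is the distance between the reference prefix handled so far
    # and the first j items of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For instance, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving 2/6. Because insertions are counted, both metrics can exceed 1.0 when the hypothesis is much longer than the reference.
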