File size: 2,841 Bytes
bfe9b1d 694b20e bfe9b1d b4c7f72 694b20e b4c7f72 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 |
---
license: apache-2.0
tags:
- multimodel
---
# Available models and languages
There are six model sizes, four with English-only versions, offering speed and accuracy tradeoffs.
Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model.
The relative speeds below are measured by transcribing English speech on a A100, and the real-world speed may vary significantly depending on many factors including the language, the speaking speed, and the available hardware.
| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
| tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~10x |
| base | 74 M | `base.en` | `base` | ~1 GB | ~7x |
| small | 244 M | `small.en` | `small` | ~2 GB | ~4x |
| medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
| large | 1550 M | N/A | `large` | ~10 GB | 1x |
| turbo | 809 M | N/A | `turbo` | ~6 GB | ~8x |
Supported Languages
---
English
Chinese
German
Spanish
Russian
Korean
French
Japanese
Portuguese
Turkish
Polish
Catalan
Dutch
Arabic
Swedish
Italian
Indonesian
Hindi
Finnish
Vietnamese
Hebrew
Ukrainian
Greek
Malay
Czech
Romanian
Danish
Hungarian
Tamil
Norwegian
Thai
Urdu
Croatian
Bulgarian
Lithuanian
Latin
Māori
Malayalam
Welsh
Slovak
Telugu
Persian
Latvian
Bengali
Serbian
Azerbaijani
Slovenian
Kannada
Estonian
Macedonian
Breton
Basque
Icelandic
Armenian
Nepali
Mongolian
Bosnian
Kazakh
Albanian
Swahili
Galician
Marathi
Panjabi
Sinhala
Khmer
Shona
Yoruba
Somali
Afrikaans
Occitan
Georgian
Belarusian
Tajik
Sindhi
Gujarati
Amharic
Yiddish
Lao
Uzbek
Faroese
Haitian
Pashto
Turkmen
Norwegian Nynorsk
Maltese
Sanskrit
Luxembourgish
Burmese
Tibetan
Tagalog
Malagasy
Assamese
Tatar
Hawaiian
Lingala
Hausa
Bashkir
jw
Sundanese
===
Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of `large-v3` and `large-v2` models by language, using WERs (word error rates) or CER (character error rates, shown in *Italic*) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of [the paper](https://arxiv.org/abs/2212.04356), as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.

|