	---
	license: apache-2.0
	tags:
	- multimodal
	---




	# Available models and languages

	There are six model sizes, four with English-only versions, offering speed and accuracy tradeoffs.
	Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model.
	The relative speeds below were measured by transcribing English speech on an A100; real-world speed may vary significantly depending on many factors, including the language, the speaking speed, and the available hardware.

	| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
	|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
	| tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~10x |
	| base | 74 M | `base.en` | `base` | ~1 GB | ~7x |
	| small | 244 M | `small.en` | `small` | ~2 GB | ~4x |
	| medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
	| large | 1550 M | N/A | `large` | ~10 GB | 1x |
	| turbo | 809 M | N/A | `turbo` | ~6 GB | ~8x |
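As a rough illustration of how the table can guide model choice, the sketch below picks the model with the most parameters that fits a given VRAM budget. The `MODEL_TABLE` rows are copied from the table above. The `pick_model` helper and its selection policy (largest parameter count that fits) are assumptions made for this example, not part of any library.

```python
from typing import Optional

# Rows copied from the table above:
# (size, parameters in millions, approximate VRAM in GB, has an English-only variant)
MODEL_TABLE = [
    ("tiny", 39, 1, True),
    ("base", 74, 1, True),
    ("small", 244, 2, True),
    ("medium", 769, 5, True),
    ("large", 1550, 10, False),
    ("turbo", 809, 6, False),
]


def pick_model(vram_gb: float, english_only: bool = False) -> Optional[str]:
    """Hypothetical helper: return the name of the largest model that fits.

    "Largest" means most parameters, which is an assumed policy. If
    english_only is True, only sizes with an English-only variant are
    considered and the ".en" suffix is appended. Returns None if nothing fits.
    """
    candidates = [
        row for row in MODEL_TABLE
        if row[2] <= vram_gb and (row[3] or not english_only)
    ]
    if not candidates:
        return None
    size = max(candidates, key=lambda row: row[1])[0]
    return f"{size}.en" if english_only else size


if __name__ == "__main__":
    for budget in (1, 4, 6, 12):
        print(budget, pick_model(budget), pick_model(budget, english_only=True))
```

The VRAM figures in the table are approximate, so leave some headroom when applying a rule like this. If throughput matters more than accuracy, the relative speed column may be a better selection key than parameter count.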

	## Supported languages
	English
	Chinese
	German
	Spanish
	Russian
	Korean
	French
	Japanese
	Portuguese
	Turkish
	Polish
	Catalan
	Dutch
	Arabic
	Swedish
	Italian
	Indonesian
	Hindi
	Finnish
	Vietnamese
	Hebrew
	Ukrainian
	Greek
	Malay
	Czech
	Romanian
	Danish
	Hungarian
	Tamil
	Norwegian
	Thai
	Urdu
	Croatian
	Bulgarian
	Lithuanian
	Latin
	Māori
	Malayalam
	Welsh
	Slovak
	Telugu
	Persian
	Latvian
	Bengali
	Serbian
	Azerbaijani
	Slovenian
	Kannada
	Estonian
	Macedonian
	Breton
	Basque
	Icelandic
	Armenian
	Nepali
	Mongolian
	Bosnian
	Kazakh
	Albanian
	Swahili
	Galician
	Marathi
	Panjabi
	Sinhala
	Khmer
	Shona
	Yoruba
	Somali
	Afrikaans
	Occitan
	Georgian
	Belarusian
	Tajik
	Sindhi
	Gujarati
	Amharic
	Yiddish
	Lao
	Uzbek
	Faroese
	Haitian
	Pashto
	Turkmen
	Norwegian Nynorsk
	Maltese
	Sanskrit
	Luxembourgish
	Burmese
	Tibetan
	Tagalog
	Malagasy
	Assamese
	Tatar
	Hawaiian
	Lingala
	Hausa
	Bashkir
	Javanese
	Sundanese

	Whisper's performance varies widely by language. The figure below shows a performance breakdown of the `large-v3` and `large-v2` models by language, using WER (word error rate) or CER (character error rate, shown in italics) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics for the other models and datasets can be found in Appendices D.1, D.2, and D.4 of [the paper](https://arxiv.org/abs/2212.04356), and BLEU (Bilingual Evaluation Understudy) scores for translation are in Appendix D.3.

	![WER breakdown by language](https://github.com/openai/whisper/assets/266841/f4619d66-1058-4005-8f67-a9d811b77c62)
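For readers unfamiliar with the metrics, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. CER is the same ratio computed over characters. The following is a minimal sketch of both, not the evaluation code behind the figure. Published numbers are typically computed after text normalization, which this sketch omits. The function names are illustrative, and keeping whitespace in the CER character sequence is an assumption.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; whitespace is kept as a character (an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("abc", "abd"))
```

Because insertions count as errors, both metrics can exceed 1.0 when the output is much longer than the reference.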