Update README.md
README.md
@@ -9,4 +9,18 @@ pipeline_tag: text-to-image
 tags:
 - medical
 - free tags
----
+---
+
+# Whisper
+
+Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours
+of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need
+for fine-tuning.
+
+Whisper was proposed in the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356)
+by Alec Radford et al. from OpenAI. The original code repository can be found [here](https://github.com/openai/whisper).
+
+Whisper `large-v3` has the same architecture as the previous large models except for the following minor differences:
+
+1. The input uses 128 Mel frequency bins instead of 80
+2. A new language token for Cantonese
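The two `large-v3` differences listed above can be made concrete with a small, standard-library-only Python sketch. It assumes Whisper's usual front-end settings from the openai/whisper repository (16 kHz audio, a hop of 160 samples, 30-second chunks), which this diff does not state. It also swaps in the HTK mel formula for simplicity, whereas Whisper's real filterbank is built with librosa's Slaney-style mel scale. The Cantonese language code `yue` and the decoder prompt layout are likewise assumptions modelled on the openai/whisper tokenizer, and all helper names (`mel_band_centers`, `feature_shape`, `decoder_prefix`) are hypothetical.

```python
import math

# Assumed Whisper front-end constants (taken from openai/whisper, not from this diff).
SAMPLE_RATE = 16_000  # Hz
HOP_LENGTH = 160      # samples between successive frames (10 ms)
CHUNK_SECONDS = 30    # audio is padded or trimmed to 30 s before encoding


def hz_to_mel(f: float) -> float:
    """HTK-style mel scale (a simplification; Whisper uses librosa's Slaney scale)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_max: float = SAMPLE_RATE / 2) -> list[float]:
    """Center frequencies (Hz) of n_mels triangular filters spaced evenly in mel from 0 Hz to f_max."""
    mel_max = hz_to_mel(f_max)
    # n_mels + 2 evenly spaced mel points; the first and last are filter edges, not centers.
    points = [mel_max * i / (n_mels + 1) for i in range(n_mels + 2)]
    return [mel_to_hz(m) for m in points[1:-1]]


def feature_shape(n_mels: int) -> tuple[int, int]:
    """Shape (n_mels, n_frames) of the log-Mel input for one padded 30 s chunk."""
    n_frames = CHUNK_SECONDS * SAMPLE_RATE // HOP_LENGTH
    return (n_mels, n_frames)


def decoder_prefix(language: str, task: str = "transcribe", timestamps: bool = False) -> str:
    """Hypothetical helper: build a Whisper-style decoder prompt from special tokens."""
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return "".join(tokens)


if __name__ == "__main__":
    for n in (80, 128):
        centers = mel_band_centers(n)
        # Print the input shape and the width (Hz) between the two lowest band centers.
        print(n, feature_shape(n), round(centers[1] - centers[0], 1))
    # "yue" is assumed to be the Cantonese code added in large-v3.
    print(decoder_prefix("yue"))
```

Moving from 80 to 128 bins leaves the number of time frames per chunk unchanged and only narrows each frequency band, so the encoder input grows along the frequency axis alone.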