## Open Whisper-style Speech Model (OWSM)

OWSM aims to develop fully open speech foundation models using publicly available data and open-source toolkits, including [ESPnet](https://github.com/espnet/espnet).
Inference examples can be found on our [project page](https://www.wavlab.org/activities/2024/owsm/).
The Gradio demo is [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).

[OWSM v4]() is the latest version in the OWSM series; it significantly outperforms OWSM v3.1 in language identification (LID) and multilingual ASR.
Additionally, OWSM v4 applies 8x subsampling (instead of 4x in OWSM v3.1) to the log-Mel features, leading to a final time resolution of 80 ms in the encoder.
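
As a quick sanity check, the 80 ms figure follows directly from the subsampling factor. This sketch assumes the standard 10 ms log-Mel frame shift (the usual ESPnet default), which is not stated explicitly above:

```python
# Back-of-the-envelope check of the encoder's time resolution.
# Assumes a 10 ms log-Mel frame shift (the common ESPnet default).
FRAME_SHIFT_MS = 10

def encoder_resolution_ms(subsampling_factor: int) -> int:
    """Time span covered by one encoder frame after subsampling."""
    return FRAME_SHIFT_MS * subsampling_factor

print(encoder_resolution_ms(4))  # OWSM v3.1 -> 40
print(encoder_resolution_ms(8))  # OWSM v4   -> 80
```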
When running inference, we recommend setting `maxlenratio=1.0` (the default) instead of smaller values.
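
One way to see why this matters: in ESPnet-style beam search, a positive `maxlenratio` caps the decoded length at roughly `maxlenratio` times the number of encoder frames, and the 8x subsampling halves the frame count relative to v3.1, so a small `maxlenratio` can truncate long outputs. A minimal sketch under that assumption (the helper function and the 10 ms frame shift are illustrative, not part of the OWSM API):

```python
# Hypothetical sketch: why a small maxlenratio is risky with 8x subsampling.
# Assumes ESPnet-style beam search, where the decoded length is capped at
# roughly maxlenratio * (number of encoder frames).

def max_decode_len(audio_seconds: float, subsampling: int, maxlenratio: float,
                   frame_shift_ms: int = 10) -> int:
    """Upper bound on output tokens for one utterance."""
    encoder_frames = int(audio_seconds * 1000 / frame_shift_ms) // subsampling
    return max(1, int(maxlenratio * encoder_frames))

# For a 30 s utterance, 8x subsampling emits half as many encoder frames
# as 4x, so the same small maxlenratio allows half as many output tokens.
print(max_decode_len(30, subsampling=4, maxlenratio=0.5))  # 375
print(max_decode_len(30, subsampling=8, maxlenratio=0.5))  # 187
print(max_decode_len(30, subsampling=8, maxlenratio=1.0))  # 375
```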

This repo contains a base-sized model with 102M parameters, developed by [Yifan Peng](https://pyf98.github.io/) (CMU).
It is trained on 320k hours of public speech data.
The newly curated data will be publicly released. Please stay tuned!

It supports the following speech-to-text tasks:
- Language identification
- Speech recognition
- Speech translation
- Utterance-level timestamp prediction
- Long-form recognition or translation

### OWSM series