pyf98 committed on
Commit 2422ee9 · verified · 1 Parent(s): 0f96e13

Update README.md

Files changed (1)
  1. README.md +9 -5
README.md CHANGED
@@ -17,11 +17,15 @@ tags:
  pipeline_tag: automatic-speech-recognition
  ---

- [OWSM-CTC](https://aclanthology.org/2024.acl-long.549/) (Peng et al., ACL 2024) is an encoder-only speech foundation model based on hierarchical multi-task self-conditioned CTC.
- It follows the design of the project, [Open Whisper-style Speech Model (OWSM)](https://www.wavlab.org/activities/2024/owsm/).

- [OWSM-CTC v4](https://huggingface.co/papers/2506.00338) is trained for three epochs on 320k hours of public audio data covering multilingual speech recognition, any-to-any speech translation, and language identification.
- The newly curated data will be publicly released. Please stay tuned!

  To use the pre-trained model, please install `espnet` and `espnet_model_zoo`. The requirements are:
  ```
@@ -31,7 +35,7 @@ espnet
  espnet_model_zoo
  ```

- **The recipe can be found in ESPnet:** https://github.com/espnet/espnet/tree/master/egs2/owsm_ctc_v3.1/s2t1

  ### Example script for batched inference

  pipeline_tag: automatic-speech-recognition
  ---

+ [Open Whisper-style Speech Model (OWSM)](https://www.wavlab.org/activities/2024/owsm/) is the first **fully open** Whisper-style speech foundation model.
+ It reproduces and advances OpenAI's Whisper-style training using publicly available data and open-source toolkits.
+ The code, pre-trained model weights, and training logs are publicly released to promote open science in speech foundation models.

+ [OWSM-CTC](https://aclanthology.org/2024.acl-long.549/) (Peng et al., ACL 2024) is a novel encoder-only speech foundation model based on hierarchical multi-task self-conditioned CTC.
+ It supports multilingual speech recognition, speech translation, and language identification within a single non-autoregressive model.
+
+ [OWSM-CTC v4](https://www.isca-archive.org/interspeech_2025/peng25c_interspeech.html) is trained for three epochs on 320k hours of public audio data covering multilingual speech recognition, any-to-any speech translation, and language identification.
+ The newly curated data are publicly released: https://huggingface.co/datasets/espnet/yodas_owsmv4

  To use the pre-trained model, please install `espnet` and `espnet_model_zoo`. The requirements are:
  ```

  espnet_model_zoo
  ```
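
After installing the two packages named in the README, a quick way to confirm they are importable is a check like the one below. This is a minimal sketch and not part of the commit: the helper name `missing_packages` is hypothetical, and it assumes the import names match the package names `espnet` and `espnet_model_zoo`.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the entries of `names` that cannot be located as top-level modules.

    Hypothetical helper for illustration; it only checks that a module can be
    found, not that its version satisfies the README's requirements.
    """
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Package names taken from the README text; assumed to equal the import names.
    required = ["espnet", "espnet_model_zoo"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check uses `find_spec` rather than a real import so that it stays fast and does not load the toolkit itself.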

+ **The recipe can be found in ESPnet:** https://github.com/espnet/espnet/tree/master/egs2/owsm_ctc_v4/s2t1

  ### Example script for batched inference