espnet/owsm_v4_small_370M

Automatic Speech Recognition

speech-translation

language-identification

pyf98 committed on Aug 30, 2025

Commit 01db7d2 (verified) · 1 Parent(s): 10a9010

Update README.md

Files changed (1)

README.md +1 -1

@@ -27,7 +27,7 @@ The Gradio demo is [here](https://huggingface.co/spaces/pyf98/OWSM_v3_demo).
 Additionally, OWSM v4 applies 8 times subsampling (instead of 4 times in OWSM v3.1) to the log Mel features, leading to a final resolution of 80 ms in the encoder.
 When running inference, we recommend setting `maxlenratio=1.0` (default) instead of smaller values.
-This repo contains a base-sized model with 370M parameters, developed by [Yifan Peng](https://pyf98.github.io/) (CMU).
+This repo contains a small-sized model with 370M parameters, developed by [Yifan Peng](https://pyf98.github.io/) (CMU).
 It is trained on 320k hours of public speech data.
 The newly curated data are publicly released: https://huggingface.co/datasets/espnet/yodas_owsmv4
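
The subsampling figures in the README context lines can be checked with a little arithmetic. The sketch below assumes a 10 ms frame shift for the log Mel features; the README does not state this value, but it is what 8 times subsampling and an 80 ms resolution imply. The function names and the 30-second clip length are hypothetical, chosen only for illustration, and the frame count is approximate because it ignores convolution edge effects.

```python
# Illustrative arithmetic only; not part of ESPnet or the OWSM code.

def encoder_resolution_ms(frame_shift_ms: float, subsampling: int) -> float:
    """Time span covered by one encoder frame after subsampling."""
    return frame_shift_ms * subsampling


def approx_encoder_frames(duration_s: float, frame_shift_ms: float, subsampling: int) -> int:
    """Approximate encoder sequence length for a clip (edge effects ignored)."""
    feature_frames = int(duration_s * 1000 // frame_shift_ms)
    return feature_frames // subsampling


if __name__ == "__main__":
    FRAME_SHIFT_MS = 10.0  # assumption: inferred from 80 ms / 8, not stated in the README
    CLIP_SECONDS = 30.0    # hypothetical clip length for the example

    for name, factor in [("OWSM v3.1", 4), ("OWSM v4", 8)]:
        res = encoder_resolution_ms(FRAME_SHIFT_MS, factor)
        frames = approx_encoder_frames(CLIP_SECONDS, FRAME_SHIFT_MS, factor)
        print(f"{name}: {res:.0f} ms per encoder frame, about {frames} frames")
```

Under that assumption, doubling the subsampling factor halves the encoder sequence length for the same audio.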