Update README.md
README.md
CHANGED
@@ -32,9 +32,12 @@ pipeline_tag: automatic-speech-recognition
 POWSM-CTC is a variant of [POWSM](https://huggingface.co/espnet/powsm), the first phonetic foundation model that can perform four phone-related tasks.
 Its multi-task encoder-CTC structure is based on [OWSM-CTC](https://aclanthology.org/2024.acl-long.549/), and trained on [IPAPack++](https://huggingface.co/anyspeech), the same dataset as POWSM.
 
-
+This model is proposed together with our paper [PRiSM](https://arxiv.org/abs/2601.14046), the first open-source benchmark for phone recognition systems.
 Its decoding is much faster than encoder-decoder models, with similar or enhanced PR performance on unseen domain.
 
+> [!TIP]
+> Check out POWSM-CTC's predecessor: [🐁POWSM](https://huggingface.co/espnet/powsm)
+
 To use the pre-trained model, please install `espnet` and `espnet_model_zoo`. The requirements are:
 ```
 torch