espnet/powsm_ctc

@@ -32,9 +32,12 @@ pipeline_tag: automatic-speech-recognition
 POWSM-CTC is a variant of [POWSM](https://huggingface.co/espnet/powsm),  the first phonetic foundation model that can perform four phone-related tasks.
 Its multi-task encoder-CTC structure is based on [OWSM-CTC](https://aclanthology.org/2024.acl-long.549/), and trained on [IPAPack++](https://huggingface.co/anyspeech), the same dataset as POWSM.
-POWSM-CTC is proposed together with our paper [PRiSM](https://arxiv.org/abs/2601.14046), the first open-source benchmark for phone recognition systems.
+This model is proposed together with our paper [PRiSM](https://arxiv.org/abs/2601.14046), the first open-source benchmark for phone recognition systems.
 Its decoding is much faster than encoder-decoder models, with similar or enhanced PR performance on unseen domain.
+> [!TIP]
+> Check out POWSM-CTC's predecessor: [🐁POWSM](https://huggingface.co/espnet/powsm)
 To use the pre-trained model, please install `espnet` and `espnet_model_zoo`. The requirements are:
 ```
 torch