nielsr (HF Staff) committed fc22527 (verified) · Parent(s): 6a8a991

Improve model card: Add metadata (pipeline, language, library, tags, dataset) and paper link, and usage example


This PR significantly improves the model card for the YoruTTS-0.5 model by:
- Adding essential metadata for better discoverability and categorization:
  - `pipeline_tag: text-to-speech` to ensure the model appears under the correct task.
  - `language: yo` for accurate language filtering.
  - `library_name: coqui_tts` to indicate its compatibility with the Coqui-TTS library, as inferred from the paper and config file.
  - `tags: [tts, yoruba]` for enhanced search capabilities.
  - `datasets: [aspmirlab/BENYO-S2ST-Corpus-1]` to link it to its foundational corpus.
- Linking the model to its associated paper: [BENYO-S2ST-Corpus-1: A Bilingual English-to-Yoruba Direct Speech-to-Speech Translation Corpus](https://huggingface.co/papers/2507.09342).
- Improving the content's markdown formatting by converting non-standard HTML tags (`<b>`, `<link>`, `<br>`) to standard Markdown.
- Adding a practical `Usage` section with a code example to facilitate easy inference and promote model adoption.

Files changed (1): README.md (+50, -8)
README.md CHANGED
@@ -1,14 +1,56 @@
  ---
  license: apache-2.0
  ---
- <b>YoruTTS-0.5 Model</b>
- =====================
- The use of TTS-based augmentation to generate large-scale synthetic English audio is well reported in the literature (Li et al., 2025; Moslem 2024; Robinson et al., 2022) due to the high resourcefulness of English. Conversely, this is not the case with the Yorùbá Language, which is extremely low resourced in terms of audio datasets and speech based models. The critical place of robust TTS models for synthesizing target audio files for direct S2ST models from a high resource source language to a low resource target language is also well reported (Jia et al. 2022; Conneau et al. 2023; IWSLT, 2023).
- We reviewed Eighteen (18) state-of-the-art TTS models across five (5) architectural categories (i.e. autoregressive, flow-based, diffusion-based, parallel feedforward, and prompt-based) reveal the predominance of the English language. Among these, only Facebook MMS supports Yorùbá TTS in its pre-trained version but with no Yorùbá specific Grapheme2Phoneme(G2P) tool. Furthermore, Variational Inference Text-to-Speech(VITS) was not pretrained with Yorùbá but can only be finetuned for it, but it also lacks G2P tool for Yorùbá and other low resourced African languages.<br>

- Given the foregoing, a Yorùbá TTS model named <b>YoruTTS-0.5</b>, based on our newly released <b>BENYO-S2ST-Corpus-1</b>(<link>https://huggingface.co/datasets/aspmirlab/BENYO-S2ST-Corpus-1</link> was developed. Developing a Yorùbá TTS model with the augmented Yorùbá audio and transcript pairs, which is a subset of the <b>BENYO-S2ST-Corpus-1</b> presents several potential benefits. The major one is that the model can be utilised to carry out TTS-based augmentation, which would boost the size of the Yorùbá audio samples for upgrading the <b>BENYO-S2ST-Corpus-1</b> towards building robust direct S2ST model for English and Yorùbá language pair.

- This work is funded through the 2024 Google Academic Research Award (GARA) for Society Centered Artificial Intelligence (SCAI) to Emmanuel Adetiba on the research project titled - A Direct Speech-to-Speech Model for English-to-Yoruba Translation Towards Bridging Language Barriers in Public Health Education Outreaches<link>(https://bit.ly/3PQj7fq)</link>.<br>

- <b>CONTACT:</b>
- <b><link>emmanueladetiba@gmail.com, emmanuel.adetiba@covenantuniversity.edu.ng
  ---
  license: apache-2.0
+ pipeline_tag: text-to-speech
+ language: yo
+ library_name: coqui_tts
+ tags:
+ - tts
+ - yoruba
+ datasets:
+ - aspmirlab/BENYO-S2ST-Corpus-1
  ---
+
+ # YoruTTS-0.5 Model
+
+ This model is a Yoruba Text-to-Speech (TTS) model presented in the paper [BENYO-S2ST-Corpus-1: A Bilingual English-to-Yoruba Direct Speech-to-Speech Translation Corpus](https://huggingface.co/papers/2507.09342).
+
+ The use of TTS-based augmentation to generate large-scale synthetic English audio is well reported in the literature (Li et al., 2025; Moslem, 2024; Robinson et al., 2022) due to the high resourcefulness of English. Conversely, this is not the case for the Yorùbá language, which is extremely low-resourced in terms of audio datasets and speech-based models. The critical role of robust TTS models in synthesizing target audio files for direct S2ST models from a high-resource source language to a low-resource target language is also well reported (Jia et al., 2022; Conneau et al., 2023; IWSLT, 2023).
+
+ Our review of eighteen (18) state-of-the-art TTS models across five (5) architectural categories (i.e., autoregressive, flow-based, diffusion-based, parallel feed-forward, and prompt-based) reveals the predominance of the English language. Among these, only Facebook MMS supports Yorùbá TTS in its pre-trained version, but with no Yorùbá-specific grapheme-to-phoneme (G2P) tool. Furthermore, Variational Inference Text-to-Speech (VITS) was not pretrained on Yorùbá and can only be fine-tuned for it; it also lacks a G2P tool for Yorùbá and other low-resourced African languages.
+
+ Given the foregoing, a Yorùbá TTS model named **YoruTTS-0.5**, based on our newly released **BENYO-S2ST-Corpus-1** ([aspmirlab/BENYO-S2ST-Corpus-1](https://huggingface.co/datasets/aspmirlab/BENYO-S2ST-Corpus-1)), was developed. Developing a Yorùbá TTS model with the augmented Yorùbá audio and transcript pairs, a subset of the **BENYO-S2ST-Corpus-1**, presents several potential benefits. The major one is that the model can be utilised to carry out TTS-based augmentation, which would boost the size of the Yorùbá audio samples for upgrading the **BENYO-S2ST-Corpus-1** towards building a robust direct S2ST model for the English and Yorùbá language pair.
+
+ This work is funded through the 2024 Google Academic Research Award (GARA) for Society Centered Artificial Intelligence (SCAI) to Emmanuel Adetiba for the research project titled "A Direct Speech-to-Speech Model for English-to-Yoruba Translation Towards Bridging Language Barriers in Public Health Education Outreaches" ([https://bit.ly/3PQj7fq](https://bit.ly/3PQj7fq)).
+
+ ## Usage
+
+ You can use this model with the `TTS` (Coqui-TTS) library. First, ensure the `TTS` library is installed:
+
+ ```bash
+ pip install TTS
+ ```
+
+ Then, load the model for text-to-speech generation:
+
+ ```python
+ from TTS.api import TTS
+
+ # YoruTTS-0.5 is a VITS-based model. Load it from a locally downloaded
+ # checkpoint and its config file (download them from the model repo first):
+ tts = TTS(model_path="your_model_path/model.pth", config_path="your_model_path/YoruTTS-0p5-Config.json")
+
+ # Text to synthesize (example Yoruba text: "I love learning Yoruba")
+ text = "Mo nifẹ si kikọ Yorùbá"
+
+ # Generate speech and save it to a WAV file
+ output_filepath = "output_yoruba_speech.wav"
+ tts.tts_to_file(text=text, file_path=output_filepath)
+
+ print(f"Generated speech saved to {output_filepath}")
+ ```
+
+ **CONTACT:**
+ emmanueladetiba@gmail.com, emmanuel.adetiba@covenantuniversity.edu.ng
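The usage example in the added README assumes `model.pth` and `YoruTTS-0p5-Config.json` are already on disk. If the checkpoint is hosted on the Hub, the files could be fetched with `huggingface_hub` first. A minimal sketch, where the repo id `aspmirlab/YoruTTS-0.5` is an assumption (adjust it to the actual repository); the file names match the local-path example above:

```python
from typing import Any


def load_yorutts(repo_id: str = "aspmirlab/YoruTTS-0.5") -> Any:
    """Download the YoruTTS checkpoint and config from the Hub, then load them.

    NOTE: the default repo id is an assumption for illustration; the file
    names mirror the local-path example in the model card.
    """
    # Imports are deferred so the function can be defined without
    # huggingface_hub or TTS installed.
    from huggingface_hub import hf_hub_download
    from TTS.api import TTS

    model_path = hf_hub_download(repo_id=repo_id, filename="model.pth")
    config_path = hf_hub_download(repo_id=repo_id, filename="YoruTTS-0p5-Config.json")
    return TTS(model_path=model_path, config_path=config_path)


# Example (requires network access and the TTS library):
# tts = load_yorutts()
# tts.tts_to_file(text="Mo nifẹ si kikọ Yorùbá", file_path="output.wav")
```

Deferring the imports keeps the helper importable in environments where the heavy dependencies are not yet installed; the download calls themselves only run when the function is invoked.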