nielsr (HF Staff) committed fc22527 (verified) · Parent(s): 6a8a991

Improve model card: Add metadata (pipeline, language, library, tags, dataset) and paper link, and usage example


This PR significantly improves the model card for the YoruTTS-0.5 model by:
- Adding essential metadata for better discoverability and categorization:
  - `pipeline_tag: text-to-speech` to ensure the model appears under the correct task.
  - `language: yo` for accurate language filtering.
  - `library_name: coqui_tts` to indicate its compatibility with the Coqui-TTS library, as inferred from the paper and config file.
  - `tags: [tts, yoruba]` for enhanced search capabilities.
  - `datasets: [aspmirlab/BENYO-S2ST-Corpus-1]` to link it to its foundational corpus.
- Linking the model to its associated paper: [BENYO-S2ST-Corpus-1: A Bilingual English-to-Yoruba Direct Speech-to-Speech Translation Corpus](https://huggingface.co/papers/2507.09342).
- Improving the content's markdown formatting by converting non-standard HTML tags (`<b>`, `<link>`, `<br>`) to standard Markdown.
- Adding a practical `Usage` section with a code example to facilitate easy inference and promote model adoption.

Files changed (1): README.md (+50, -8)
README.md CHANGED
@@ -1,14 +1,56 @@
  ---
  license: apache-2.0
  ---
- <b>YoruTTS-0.5 Model</b>
- =====================
- The use of TTS-based augmentation to generate large-scale synthetic English audio is well reported in the literature (Li et al., 2025; Moslem 2024; Robinson et al., 2022) due to the high resourcefulness of English. Conversely, this is not the case with the Yorùbá Language, which is extremely low resourced in terms of audio datasets and speech based models. The critical place of robust TTS models for synthesizing target audio files for direct S2ST models from a high resource source language to a low resource target language is also well reported (Jia et al. 2022; Conneau et al. 2023; IWSLT, 2023).
- We reviewed Eighteen (18) state-of-the-art TTS models across five (5) architectural categories (i.e. autoregressive, flow-based, diffusion-based, parallel feedforward, and prompt-based) reveal the predominance of the English language. Among these, only Facebook MMS supports Yorùbá TTS in its pre-trained version but with no Yorùbá specific Grapheme2Phoneme(G2P) tool. Furthermore, Variational Inference Text-to-Speech(VITS) was not pretrained with Yorùbá but can only be finetuned for it, but it also lacks G2P tool for Yorùbá and other low resourced African languages.<br>

- Given the foregoing, a Yorùbá TTS model named <b>YoruTTS-0.5</b>, based on our newly released <b>BENYO-S2ST-Corpus-1</b>(<link>https://huggingface.co/datasets/aspmirlab/BENYO-S2ST-Corpus-1</link> was developed. Developing a Yorùbá TTS model with the augmented Yorùbá audio and transcript pairs, which is a subset of the <b>BENYO-S2ST-Corpus-1</b> presents several potential benefits. The major one is that the model can be utilised to carry out TTS-based augmentation, which would boost the size of the Yorùbá audio samples for upgrading the <b>BENYO-S2ST-Corpus-1</b> towards building robust direct S2ST model for English and Yorùbá language pair.

- This work is funded through the 2024 Google Academic Research Award (GARA) for Society Centered Artificial Intelligence (SCAI) to Emmanuel Adetiba on the research project titled - A Direct Speech-to-Speech Model for English-to-Yoruba Translation Towards Bridging Language Barriers in Public Health Education Outreaches<link>(https://bit.ly/3PQj7fq)</link>.<br>

- <b>CONTACT:</b>
- <b><link>emmanueladetiba@gmail.com, emmanuel.adetiba@covenantuniversity.edu.ng
  ---
  license: apache-2.0
+ pipeline_tag: text-to-speech
+ language: yo
+ library_name: coqui_tts
+ tags:
+ - tts
+ - yoruba
+ datasets:
+ - aspmirlab/BENYO-S2ST-Corpus-1
  ---
+
+ # YoruTTS-0.5 Model
+
+ This model is a Yoruba Text-to-Speech (TTS) model presented in the paper [BENYO-S2ST-Corpus-1: A Bilingual English-to-Yoruba Direct Speech-to-Speech Translation Corpus](https://huggingface.co/papers/2507.09342).
+
+ The use of TTS-based augmentation to generate large-scale synthetic English audio is well reported in the literature (Li et al., 2025; Moslem, 2024; Robinson et al., 2022) due to the high resourcefulness of English. Conversely, this is not the case for the Yorùbá language, which is extremely low-resourced in terms of audio datasets and speech-based models. The critical role of robust TTS models in synthesizing target audio files for direct S2ST models from a high-resource source language to a low-resource target language is also well reported (Jia et al., 2022; Conneau et al., 2023; IWSLT, 2023).
+
+ Our review of eighteen (18) state-of-the-art TTS models across five (5) architectural categories (i.e., autoregressive, flow-based, diffusion-based, parallel feed-forward, and prompt-based) reveals the predominance of the English language. Among these, only Facebook MMS supports Yorùbá TTS in its pre-trained version, but with no Yorùbá-specific grapheme-to-phoneme (G2P) tool. Furthermore, Variational Inference Text-to-Speech (VITS) was not pretrained on Yorùbá and can only be fine-tuned for it; it also lacks a G2P tool for Yorùbá and other low-resourced African languages.
+
+ Given the foregoing, a Yorùbá TTS model named **YoruTTS-0.5**, based on our newly released **BENYO-S2ST-Corpus-1** ([aspmirlab/BENYO-S2ST-Corpus-1](https://huggingface.co/datasets/aspmirlab/BENYO-S2ST-Corpus-1)), was developed. Developing a Yorùbá TTS model with the augmented Yorùbá audio and transcript pairs, a subset of the **BENYO-S2ST-Corpus-1**, presents several potential benefits. The major one is that the model can be utilised to carry out TTS-based augmentation, which would boost the size of the Yorùbá audio samples for upgrading the **BENYO-S2ST-Corpus-1** towards building a robust direct S2ST model for the English and Yorùbá language pair.
+
+ This work is funded through the 2024 Google Academic Research Award (GARA) for Society Centered Artificial Intelligence (SCAI) to Emmanuel Adetiba for the research project titled "A Direct Speech-to-Speech Model for English-to-Yoruba Translation Towards Bridging Language Barriers in Public Health Education Outreaches" ([https://bit.ly/3PQj7fq](https://bit.ly/3PQj7fq)).
+
+ ## Usage
+
+ You can use this model with the `TTS` (Coqui-TTS) library. First, ensure the `TTS` library is installed:
+
+ ```bash
+ pip install TTS
+ ```
+
+ Then, load the model for text-to-speech generation:
+
+ ```python
+ from TTS.api import TTS
+
+ # YoruTTS-0.5 is a VITS-based model. Load it from a locally downloaded
+ # checkpoint and its config file (download them from the model repo first):
+ tts = TTS(model_path="your_model_path/model.pth", config_path="your_model_path/YoruTTS-0p5-Config.json")
+
+ # Text to synthesize (example Yoruba text: "I love learning Yoruba")
+ text = "Mo nifẹ si kikọ Yorùbá"
+
+ # Generate speech and save it to a WAV file
+ output_filepath = "output_yoruba_speech.wav"
+ tts.tts_to_file(text=text, file_path=output_filepath)
+
+ print(f"Generated speech saved to {output_filepath}")
+ ```
+
+ **CONTACT:**
+ emmanueladetiba@gmail.com, emmanuel.adetiba@covenantuniversity.edu.ng
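The usage example in the added README assumes `model.pth` and `YoruTTS-0p5-Config.json` are already on disk. If the checkpoint is hosted on the Hub, the files could be fetched with `huggingface_hub` first. A minimal sketch, where the repo id `aspmirlab/YoruTTS-0.5` is an assumption (adjust it to the actual repository); the file names match the local-path example above:

```python
from typing import Any


def load_yorutts(repo_id: str = "aspmirlab/YoruTTS-0.5") -> Any:
    """Download the YoruTTS checkpoint and config from the Hub, then load them.

    NOTE: the default repo id is an assumption for illustration; the file
    names mirror the local-path example in the model card.
    """
    # Imports are deferred so the function can be defined without
    # huggingface_hub or TTS installed.
    from huggingface_hub import hf_hub_download
    from TTS.api import TTS

    model_path = hf_hub_download(repo_id=repo_id, filename="model.pth")
    config_path = hf_hub_download(repo_id=repo_id, filename="YoruTTS-0p5-Config.json")
    return TTS(model_path=model_path, config_path=config_path)


# Example (requires network access and the TTS library):
# tts = load_yorutts()
# tts.tts_to_file(text="Mo nifẹ si kikọ Yorùbá", file_path="output.wav")
```

Deferring the imports keeps the helper importable in environments where the heavy dependencies are not yet installed; the download calls themselves only run when the function is invoked.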