Update README.md
README.md
CHANGED
|
@@ -1,10 +1,238 @@
|
| 1 |
---
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
---
|
| 9 |
|
| 10 |
-
|
| 1 |
+
# Uganda Text-to-Speech (TTS) Models
|
| 2 |
+
|
| 3 |
+
**🔗 [Access Models on HuggingFace](https://huggingface.co/Uganda-lang)**
|
| 4 |
+
|
| 5 |
+
A comprehensive suite of Text-to-Speech (TTS) models for Ugandan languages, supporting **English, Luganda, Runyankole, Teso, and Acholi**, with multiple speaker voices for each language.
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## Table of Contents
|
| 10 |
+
|
| 11 |
+
- [Introduction](#introduction)
|
| 12 |
+
- [Model Architecture](#model-architecture)
|
| 13 |
+
- [Supported Languages & Voices](#supported-languages--voices)
|
| 14 |
+
- [Audio Examples](#audio-examples)
|
| 15 |
+
- [Access & Usage](#access--usage)
|
| 16 |
+
- [Limitations](#limitations)
|
| 17 |
+
- [Future Work](#future-work)
|
| 18 |
+
- [Citation](#citation)
|
| 19 |
+
|
| 20 |
+
---
|
| 21 |
+
|
| 22 |
+
## Introduction
|
| 23 |
+
|
| 24 |
+
This project introduces a groundbreaking collection of Text-to-Speech models specifically designed for Ugandan languages. These models represent a significant advancement in African language technology, addressing the critical gap in speech synthesis capabilities for Uganda's diverse linguistic landscape.
|
| 25 |
+
|
| 26 |
+
The Uganda TTS model family consists of fine-tuned versions of the **Orpheus 3B model**, each specialized for different Ugandan languages while maintaining quality speech synthesis capabilities. Each language model supports multiple distinct speaker voices, enabling versatile applications in education, accessibility, content creation, and digital preservation of Ugandan languages.
|
| 27 |
+
|
| 28 |
+
These models are designed to serve researchers, developers, educators, and content creators who need high-quality speech synthesis in Ugandan languages, contributing to the broader goal of digital language preservation and accessibility in Africa.
|
| 29 |
+
|
| 30 |
+
## Model Architecture
|
| 31 |
+
|
| 32 |
+
The Uganda TTS models utilize a sophisticated two-stage architecture:
|
| 33 |
+
|
| 34 |
+
1. **Audio Token Generation**: The models generate audio tokens based on the SNAC (Multi-Scale Neural Audio Codec) framework
|
| 35 |
+
2. **Fine-tuned Processing**: These audio tokens are then processed through specialized fine-tuned versions of the Orpheus 3B model, each optimized for specific Ugandan languages
|
| 36 |
+
|
| 37 |
+
This architecture enables high-quality speech synthesis while remaining computationally efficient enough for a range of deployment scenarios. More about the Orpheus models is available [here](https://canopylabs.ai/releases/orpheus_can_speak_any_language).
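
As a rough illustration of how a flat stream of audio tokens can be mapped onto a multi-scale codec, the sketch below splits tokens into three codebook layers. The 7-tokens-per-frame layout, the 4096-entry codebook size, and the per-position offsets are assumptions taken from publicly shared Orpheus inference examples; this README does not confirm them. The function name `split_frames` is hypothetical.

```python
from typing import List, Tuple

# Assumed values, not confirmed by this README.
CODEBOOK_SIZE = 4096   # entries per codebook position
TOKENS_PER_FRAME = 7   # 1 coarse + 2 mid + 4 fine codes per frame


def split_frames(codes: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split a flat token stream into coarse, mid, and fine codebook layers.

    Each token is assumed to carry an offset of position * CODEBOOK_SIZE,
    where position is its index (0..6) inside the frame.
    """
    coarse: List[int] = []
    mid: List[int] = []
    fine: List[int] = []
    n_frames = len(codes) // TOKENS_PER_FRAME  # ignore an incomplete trailing frame
    for i in range(n_frames):
        frame = codes[i * TOKENS_PER_FRAME:(i + 1) * TOKENS_PER_FRAME]
        # Remove the per-position offset so every code lies in [0, CODEBOOK_SIZE).
        c = [tok - pos * CODEBOOK_SIZE for pos, tok in enumerate(frame)]
        coarse.append(c[0])
        mid.extend([c[1], c[4]])
        fine.extend([c[2], c[3], c[5], c[6]])
    return coarse, mid, fine


if __name__ == "__main__":
    example = [5, 4096 + 6, 2 * 4096 + 7, 3 * 4096 + 8,
               4 * 4096 + 9, 5 * 4096 + 10, 6 * 4096 + 11]
    print(split_frames(example))
```

The three lists would then be passed to a SNAC decoder to reconstruct the waveform; that step is omitted here because it depends on the codec checkpoint in use.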
|
| 38 |
+
|
| 39 |
+
---
|
| 40 |
+
|
| 41 |
+
## Supported Languages & Voices
|
| 42 |
+
|
| 43 |
+
### English
|
| 44 |
+
**Supported Voices**: Barbara, Mary, Jennifer, Jessica, Susan, James, Linda, Patricia, Elizabeth, Christopher
|
| 45 |
+
|
| 46 |
+
### Luganda
|
| 47 |
+
**Supported Voices**: Charles, Sandra, Christopher, Mark, Barbara, Michelle, Karen, James, Margaret, Daniel
|
| 48 |
+
|
| 49 |
+
### Runyankole
|
| 50 |
+
**Supported Voices**: Michelle, James, Patricia, Mark, Elizabeth, Charles, Daniel, Barbara, Christopher, Linda
|
| 51 |
+
|
| 52 |
+
### Teso
|
| 53 |
+
**Supported Voices**: Michelle, Barbara, Jessica, Christopher, James, Daniel, Charles, Mark
|
| 54 |
+
|
| 55 |
+
### Acholi
|
| 56 |
+
**Supported Voices**: James, Barbara, Michelle, Mark, Christopher
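
The voice lists above can be captured as a lookup table for validating a request before synthesis. This is a minimal sketch: `SUPPORTED_VOICES` mirrors the lists in this section (with language keys spelled as in the example audio URLs), while the `build_prompt` helper and the `"<voice>: <text>"` prompt layout are assumptions for illustration and are not confirmed by this README.

```python
from typing import Dict, List

# Voices per language, copied from the lists in this section.
SUPPORTED_VOICES: Dict[str, List[str]] = {
    "English": ["Barbara", "Mary", "Jennifer", "Jessica", "Susan",
                "James", "Linda", "Patricia", "Elizabeth", "Christopher"],
    "Luganda": ["Charles", "Sandra", "Christopher", "Mark", "Barbara",
                "Michelle", "Karen", "James", "Margaret", "Daniel"],
    "Runyankole": ["Michelle", "James", "Patricia", "Mark", "Elizabeth",
                   "Charles", "Daniel", "Barbara", "Christopher", "Linda"],
    "Teso": ["Michelle", "Barbara", "Jessica", "Christopher",
             "James", "Daniel", "Charles", "Mark"],
    "Acholi": ["James", "Barbara", "Michelle", "Mark", "Christopher"],
}


def build_prompt(language: str, voice: str, text: str) -> str:
    """Return a text prompt for the given voice, or raise ValueError.

    The "<voice>: <text>" layout is an assumed convention, not a documented one.
    """
    voices = SUPPORTED_VOICES.get(language)
    if voices is None:
        raise ValueError(f"Unsupported language: {language!r}")
    if voice not in voices:
        raise ValueError(f"Voice {voice!r} is not available for {language}")
    return f"{voice}: {text}"


if __name__ == "__main__":
    print(build_prompt("Acholi", "Mark", "Uganda tye ka keme ki lok me pur."))
```

Checking the voice up front avoids sending a request for a voice-language pair that was never trained, such as Sandra in Acholi.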
|
| 57 |
+
|
| 58 |
+
---
|
| 59 |
+
|
| 60 |
+
## Audio Examples
|
| 61 |
+
|
| 62 |
+
### English
|
| 63 |
+
|
| 64 |
+
**Christopher**
|
| 65 |
+
*Prompt*: "Hello I can speak in English as christopher, one of the voices I can speak."
|
| 66 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/christopher.wav"></audio>
|
| 67 |
+
|
| 68 |
+
**Barbara**
|
| 69 |
+
*Prompt*: "Or as barbra, this is one of my female voices. Pretty cool right?."
|
| 70 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/barbara.wav"></audio>
|
| 71 |
+
|
| 72 |
+
**Mary**
|
| 73 |
+
*Prompt*: "I can also speak as Mary as well."
|
| 74 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/mary.wav"></audio>
|
| 75 |
+
|
| 76 |
+
**James**
|
| 77 |
+
*Prompt*: "Or I can speak as james as you can see."
|
| 78 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/james.wav"></audio>
|
| 79 |
+
|
| 80 |
+
**Jessica**
|
| 81 |
+
*Prompt*: "This is my other voice called jessica, I have more voices of jennifer, suzan, linda, patricia and elizabeth. But I will be sharing these voices once I am fully done from baking."
|
| 82 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/jessica.wav"></audio>
|
| 83 |
+
|
| 84 |
+
### Luganda
|
| 85 |
+
|
| 86 |
+
**Christopher**
|
| 87 |
+
*Prompt*: "Nsobolla okwo'geranga Christopher nga wowulila kati."
|
| 88 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/christopher.wav"></audio>
|
| 89 |
+
|
| 90 |
+
**Charles**
|
| 91 |
+
*Prompt*: "Oba neenjogela nga charles wenti."
|
| 92 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/charles.wav"></audio>
|
| 93 |
+
|
| 94 |
+
**Sandra**
|
| 95 |
+
*Prompt*: "Nina neddoboozi lya Sandra bweliti."
|
| 96 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/sandra.wav"></audio>
|
| 97 |
+
|
| 98 |
+
**Michelle**
|
| 99 |
+
*Prompt*: "Nsobolla ogwogella bwenti mulino eddoboozi."
|
| 100 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/michelle.wav"></audio>
|
| 101 |
+
|
| 102 |
+
**Daniel**
|
| 103 |
+
*Prompt*: "Oba nemulino elye'kisajja nga woowulira."
|
| 104 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/daniel.wav"></audio>
|
| 105 |
+
|
| 106 |
+
**Margaret**
|
| 107 |
+
*Prompt*: "Charlissi yimilila awo."
|
| 108 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/margaret.wav"></audio>
|
| 109 |
+
|
| 110 |
+
**Mark**
|
| 111 |
+
*Prompt*: "Ninna amaloboozi amalala naye nja kugalaga nga mazze oku tureyininga."
|
| 112 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/mark.wav"></audio>
|
| 113 |
+
|
| 114 |
+
### Runyankole
|
| 115 |
+
|
| 116 |
+
**Christopher**
|
| 117 |
+
*Prompt*: "Nimbasa kugamba nka Christopher omwiraka eri."
|
| 118 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/christopher.wav"></audio>
|
| 119 |
+
|
| 120 |
+
**Michelle**
|
| 121 |
+
*Prompt*: "Nimbasa kugamba nka Michelle omwiraka eri."
|
| 122 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/michelle.wav"></audio>
|
| 123 |
+
|
| 124 |
+
**James**
|
| 125 |
+
*Prompt*: "Uganda eteire amaani aha buhingi n'oburiisa."
|
| 126 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/james.wav"></audio>
|
| 127 |
+
|
| 128 |
+
**Patricia**
|
| 129 |
+
*Prompt*: "Bimwe ebirikugambwa aha reediyo nibihwera abantu kumanya obutare burungi bw'amasharuura gaabo."
|
| 130 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/patricia.wav"></audio>
|
| 131 |
+
|
| 132 |
+
**Charles**
|
| 133 |
+
*Prompt*: "Okukyerererwa kufuuhirira nikyo kirikutokooza ebyokurya ebitwine ebiro ebi."
|
| 134 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/charles.wav"></audio>
|
| 135 |
+
|
| 136 |
+
**Elizabeth**
|
| 137 |
+
*Prompt*: "Omu disiturikiti ya Kayunga emisiri erikukira obwngi ekashangwa erimu ebicoori ebiine oburwaire."
|
| 138 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/elizabeth.wav"></audio>
|
| 139 |
+
|
| 140 |
+
### Teso
|
| 141 |
+
|
| 142 |
+
**Christopher**
|
| 143 |
+
*Prompt*: "Epedorete akoriok aimedaun ejok kanejaas aicoreta nu itikitikere adeka."
|
| 144 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/christopher.wav"></audio>
|
| 145 |
+
|
| 146 |
+
**Jessica**
|
| 147 |
+
*Prompt*: "Akoru ikorion luegelegela nes ingarakini itunganan."
|
| 148 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/jessica.wav"></audio>
|
| 149 |
+
|
| 150 |
+
**James**
|
| 151 |
+
*Prompt*: "Iraasit yen emunaara aticepak ikur enyamitos."
|
| 152 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/james.wav"></audio>
|
| 153 |
+
|
| 154 |
+
**Daniel**
|
| 155 |
+
*Prompt*: "Aipagisanar nes ewai ecie lo ibwaikinet iboro toma aswam."
|
| 156 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/daniel.wav"></audio>
|
| 157 |
+
|
| 158 |
+
**Barbara**
|
| 159 |
+
*Prompt*: "Isisianakinete isomeroi kwana asiomak eipone lo isubusaere."
|
| 160 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/barbara.wav"></audio>
|
| 161 |
+
|
| 162 |
+
### Acholi
|
| 163 |
+
|
| 164 |
+
**Mark**
|
| 165 |
+
*Prompt*: "Uganda tye ka keme ki lok me pur."
|
| 166 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/mark.wav"></audio>
|
| 167 |
+
|
| 168 |
+
**Barbara**
|
| 169 |
+
*Prompt*: "Lupur twero nongo kony ma dit ka gunongo ngec me gengo onyo cango two ma balo jami ma i poto."
|
| 170 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/barbara.wav"></audio>
|
| 171 |
+
|
| 172 |
+
**James**
|
| 173 |
+
*Prompt*: "Ler ma pe gidodo ma woto ka yenyo cam i dye poto obalo cam weng ma tye i poto."
|
| 174 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/james.wav"></audio>
|
| 175 |
+
|
| 176 |
+
**Michelle**
|
| 177 |
+
*Prompt*: "Gum madwong me timo biacara tye i te yub ma pe jenge i kom gamente."
|
| 178 |
+
<audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/mitchelle.wav"></audio>
|
| 179 |
+
|
| 180 |
+
---
|
| 181 |
+
|
| 182 |
+
## Access & Usage
|
| 183 |
+
|
| 184 |
+
The models are openly accessible and available for research and development purposes:
|
| 185 |
+
|
| 186 |
+
**🔗 [HuggingFace Model Hub](https://huggingface.co/uganda-tts-models)**
|
| 187 |
+
|
| 188 |
+
All models are provided under an open-source license, encouraging collaboration and further development in African language technologies.
|
| 189 |
+
|
| 190 |
---
|
| 191 |
+
|
| 192 |
+
## Limitations
|
| 193 |
+
|
| 194 |
+
While these models represent significant progress in Ugandan language TTS, there are some current limitations:
|
| 195 |
+
|
| 196 |
+
- **Non-English Language Quality**: The non-English models may occasionally produce lower quality outputs compared to the English model. This is primarily due to the SNAC audio codec not being pre-trained on these languages, which affects the initial audio token generation quality.
|
| 197 |
+
|
| 198 |
+
- **Speaker Consistency**: Non-English voices may sometimes generate speech that does not perfectly match the specified speaker characteristics due to limited training data for certain voice-language combinations.
|
| 199 |
+
|
| 200 |
+
- **Language Coverage**: Current models focus on five major Ugandan languages, with plans to expand to additional languages based on data availability and community needs.
|
| 201 |
+
|
| 202 |
+
**Note**: We are actively working on an improved version that addresses these limitations, including training SNAC on a more diverse set of languages and expanding the training datasets for better speaker fidelity.
|
| 203 |
+
|
| 204 |
---
|
| 205 |
|
| 206 |
+
## Future Work
|
| 207 |
+
|
| 208 |
+
### Completed ✅
|
| 209 |
+
- [x] Train the models for each of the languages
|
| 210 |
+
- [x] Open source the models
|
| 211 |
+
|
| 212 |
+
### In Progress 🔄
|
| 213 |
+
- [ ] Develop a Python package to act as an API for the models
|
| 214 |
+
- [ ] Write a comprehensive white paper detailing the training process and methodology
|
| 215 |
+
- [ ] Improve SNAC training for better non-English language support
|
| 216 |
+
- [ ] Expand training datasets for enhanced speaker consistency
|
| 217 |
+
|
| 218 |
+
---
|
| 219 |
+
|
| 220 |
+
## Citation
|
| 221 |
+
|
| 222 |
+
If you use these models in your research or applications, please cite:
|
| 223 |
+
|
| 224 |
+
```bibtex
|
| 225 |
+
@misc{uganda_tts_2024,
|
| 226 |
+
author = {Kisejjere Rashid and Magala Reuben},
|
| 227 |
+
title = {Uganda Text-to-Speech (TTS) Models},
|
| 228 |
+
year = {2024},
|
| 229 |
+
howpublished = {\url{https://huggingface.co/Uganda-lang}},
|
| 230 |
+
note = {Fine-tuned versions of Orpheus 3B for Ugandan languages}
|
| 231 |
+
}
|
| 232 |
+
```
|
| 233 |
+
|
| 234 |
+
---
|
| 235 |
+
|
| 236 |
+
**Contributing**: We welcome contributions, feedback, and collaboration from the community. Please feel free to open issues or submit pull requests to help improve these models.
|
| 237 |
+
|
| 238 |
+
**Contact**: For questions, collaborations, or support, please reach out through the HuggingFace model repository or create an issue in our GitHub repository.
|