| language: | |
| - xho # Xhosa | |
| license: cc-by-4.0 | |
| tags: | |
| - automatic-speech-recognition | |
| - audio | |
| - speech | |
| - african-languages | |
| - multilingual | |
| - simba | |
| - low-resource | |
| - speech-recognition | |
| - asr | |
| datasets: | |
| - UBC-NLP/SimbaBench | |
| metrics: | |
| - wer | |
| - cer | |
| library_name: transformers | |
| pipeline_tag: automatic-speech-recognition | |
| <div align="center"> | |
| <img src="https://africa.dlnlp.ai/simba/images/VoC_logo.png" alt="VoC Logo"> | |
| [Paper](https://aclanthology.org/2025.emnlp-main.559/) | |
| [Project page](https://africa.dlnlp.ai/simba/) | |
| [SimbaBench Space](https://huggingface.co/spaces/UBC-NLP/SimbaBench) | |
| [Simba speech series collection](https://huggingface.co/collections/UBC-NLP/simba-speech-series) | |
| </div> | |
| ## *Bridging the Digital Divide for African AI* | |
| **Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we aim to make speech technology inclusive, representative, and accessible to more than a billion people. | |
| ## Best-in-Class Multilingual Models | |
| <img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo"> | |
| Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state of the art for African speech AI. | |
| - **Unified Suite:** Models optimized for African languages. | |
| - **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets. | |
| - **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech). | |
| - **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages. | |
| The **Simba** family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships. | |
| ### Simba-TTS (Text-to-Speech) | |
| * **🎯 Task:** `Text-to-Speech`: natural voice synthesis. | |
| **Language Coverage (7 African languages)** | |
| > **Afrikaans** (`afr`), **Asante Twi** (`asanti`), **Akuapem Twi** (`akuapem`), **Lingala** (`lin`), **Southern Sotho** (`sot`), **Tswana** (`tsn`), **Xhosa** (`xho`) | |
| | **TTS Model** | **Architecture** | **Hugging Face Card** | **Status** | | |
| | :--- | :--- | :---: | :---: | | |
| **Simba-TTS-afr** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-afr](https://huggingface.co/UBC-NLP/Simba-TTS-afr) | Released | |
| **Simba-TTS-twi-asanti** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-twi-asanti](https://huggingface.co/UBC-NLP/Simba-TTS-twi-asanti) | Released | |
| **Simba-TTS-twi-akuapem** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-twi-akuapem](https://huggingface.co/UBC-NLP/Simba-TTS-twi-akuapem) | Released | |
| **Simba-TTS-lin** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-lin](https://huggingface.co/UBC-NLP/Simba-TTS-lin) | Released | |
| **Simba-TTS-sot** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-sot](https://huggingface.co/UBC-NLP/Simba-TTS-sot) | Released | |
| **Simba-TTS-tsn** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-tsn](https://huggingface.co/UBC-NLP/Simba-TTS-tsn) | Released | |
| **Simba-TTS-xho** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-xho](https://huggingface.co/UBC-NLP/Simba-TTS-xho) | Released | |
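Because every released model follows the same `UBC-NLP/Simba-TTS-<name>` pattern, the repository ID can be looked up from a language code. The following is a minimal sketch built only from the table above; `SIMBA_TTS_REPOS` and `repo_for` are hypothetical helper names for illustration, not part of any Simba or `transformers` API.

```python
# Hypothetical lookup table: language codes from the coverage list mapped to
# the Hugging Face repository IDs shown in the table above.
SIMBA_TTS_REPOS = {
    "afr": "UBC-NLP/Simba-TTS-afr",
    "asanti": "UBC-NLP/Simba-TTS-twi-asanti",
    "akuapem": "UBC-NLP/Simba-TTS-twi-akuapem",
    "lin": "UBC-NLP/Simba-TTS-lin",
    "sot": "UBC-NLP/Simba-TTS-sot",
    "tsn": "UBC-NLP/Simba-TTS-tsn",
    "xho": "UBC-NLP/Simba-TTS-xho",
}


def repo_for(lang_code: str) -> str:
    """Return the repository ID for a language code, or raise ValueError."""
    key = lang_code.strip().lower()
    try:
        return SIMBA_TTS_REPOS[key]
    except KeyError:
        supported = ", ".join(sorted(SIMBA_TTS_REPOS))
        raise ValueError(
            f"Unsupported language code {lang_code!r}; expected one of: {supported}"
        ) from None
```

The returned string is what would be passed to `from_pretrained`, for example `repo_for("xho")` for the Xhosa model.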
| **🧩 Usage Example** | |
| You can easily run inference using the Hugging Face `transformers` library. | |
| ```python | |
| from transformers import VitsModel, AutoTokenizer | |
| import torch | |
| # Repository ID on the Hugging Face Hub. Other options: UBC-NLP/Simba-TTS-twi-asanti, UBC-NLP/Simba-TTS-twi-akuapem, UBC-NLP/Simba-TTS-lin, UBC-NLP/Simba-TTS-sot, UBC-NLP/Simba-TTS-tsn, UBC-NLP/Simba-TTS-xho | |
| model_name = "UBC-NLP/Simba-TTS-afr" | |
| model = VitsModel.from_pretrained(model_name) | |
| tokenizer = AutoTokenizer.from_pretrained(model_name) | |
| text = "Ons noem hierdie deeltjies sub-atomiese deeltjies"  # example sentence in Afrikaans (afr) | |
| inputs = tokenizer(text, return_tensors="pt") | |
| # Inference only, so gradient tracking is disabled. | |
| with torch.no_grad(): | |
|     output = model(**inputs).waveform  # shape: (batch_size, num_samples) | |
| ``` | |
| The resulting waveform can be saved as a .wav file: | |
| ```python | |
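| import scipy.io.wavfile | |
| scipy.io.wavfile.write("outputfile.wav", rate=model.config.sampling_rate, data=output.squeeze().float().numpy())  # squeeze() drops the batch dimension so a mono signal is written | |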
| ``` | |
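If SciPy is not installed, a 16-bit PCM file can also be written with the Python standard library. The sketch below assumes the waveform has already been flattened to a plain sequence of floats in the range [-1.0, 1.0]; `write_pcm16_wav` is a hypothetical helper name, not part of `transformers` or the Simba release.

```python
import io
import struct
import wave


def write_pcm16_wav(samples, sample_rate, target):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV.

    `target` may be a file path (str) or a writable binary file-like object.
    Returns the number of audio frames written.
    """
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range so out-of-range values cannot overflow int16.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2


# Example with a tiny in-memory signal; 16000 is only an illustrative rate.
buffer = io.BytesIO()
frame_count = write_pcm16_wav([0.0, 0.25, -0.25, 1.0], 16000, buffer)
```

With the model output from the usage example, a call could look like `write_pcm16_wav(output.squeeze().tolist(), model.config.sampling_rate, "outputfile.wav")`.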
| ## Citation | |
| If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find the resources on this website useful, please cite our paper. | |
| ```bibtex | |
| @inproceedings{elmadany-etal-2025-voice, | |
| title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier", | |
| author = "Elmadany, AbdelRahim A. and | |
| Kwon, Sang Yun and | |
| Toyin, Hawau Olamide and | |
| Alcoba Inciarte, Alcides and | |
| Aldarmaki, Hanan and | |
| Abdul-Mageed, Muhammad", | |
| editor = "Christodoulopoulos, Christos and | |
| Chakraborty, Tanmoy and | |
| Rose, Carolyn and | |
| Peng, Violet", | |
| booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing", | |
| month = nov, | |
| year = "2025", | |
| address = "Suzhou, China", | |
| publisher = "Association for Computational Linguistics", | |
| url = "https://aclanthology.org/2025.emnlp-main.559/", | |
| doi = "10.18653/v1/2025.emnlp-main.559", | |
| pages = "11039--11061", | |
| ISBN = "979-8-89176-332-6", | |
| } | |
| ``` | |