Bridging the Digital Divide for African AI
Voice of a Continent is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.
Best-in-Class Multilingual Models
Introduced in our EMNLP 2025 paper Voice of a Continent, the Simba Series represents the current state-of-the-art for African speech AI.
- Unified Suite: Models optimized for African languages.
- Superior Accuracy: Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- Multitask Capability: Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- Inclusion-First: Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.
The Simba family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.
🔊 Simba-TTS (Text-to-Speech)
- 🎯 Task: Text-to-Speech (natural voice synthesis)
- 🌍 Language Coverage (7 African languages): Afrikaans (afr), Asante Twi (asanti), Akuapem Twi (akuapem), Lingala (lin), Southern Sotho (sot), Tswana (tsn), Xhosa (xho)
| TTS Model | Architecture | Hugging Face Card | Status |
|---|---|---|---|
| Simba-TTS-afr | MMS-TTS | 🤗 https://huggingface.co/UBC-NLP/Simba-TTS-afr | ✅ Released |
| Simba-TTS-twi-asanti | MMS-TTS | 🤗 https://huggingface.co/UBC-NLP/Simba-TTS-twi-asanti | ✅ Released |
| Simba-TTS-twi-akuapem | MMS-TTS | 🤗 https://huggingface.co/UBC-NLP/Simba-TTS-twi-akuapem | ✅ Released |
| Simba-TTS-lin | MMS-TTS | 🤗 https://huggingface.co/UBC-NLP/Simba-TTS-lin | ✅ Released |
| Simba-TTS-sot | MMS-TTS | 🤗 https://huggingface.co/UBC-NLP/Simba-TTS-sot | ✅ Released |
| Simba-TTS-tsn | MMS-TTS | 🤗 https://huggingface.co/UBC-NLP/Simba-TTS-tsn | ✅ Released |
| Simba-TTS-xho | MMS-TTS | 🤗 https://huggingface.co/UBC-NLP/Simba-TTS-xho | ✅ Released |
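To make the table above easy to use programmatically, here is a small helper that maps a language code to its checkpoint. The dictionary reproduces the repo ids from the Hugging Face cards listed above; the helper function name (`checkpoint_for`) is illustrative, not part of any released API.

```python
# Map language codes to the released Simba-TTS checkpoints on Hugging Face.
SIMBA_TTS_CHECKPOINTS = {
    "afr": "UBC-NLP/Simba-TTS-afr",           # Afrikaans
    "asanti": "UBC-NLP/Simba-TTS-twi-asanti", # Asante Twi
    "akuapem": "UBC-NLP/Simba-TTS-twi-akuapem",  # Akuapem Twi
    "lin": "UBC-NLP/Simba-TTS-lin",           # Lingala
    "sot": "UBC-NLP/Simba-TTS-sot",           # Southern Sotho
    "tsn": "UBC-NLP/Simba-TTS-tsn",           # Tswana
    "xho": "UBC-NLP/Simba-TTS-xho",           # Xhosa
}

def checkpoint_for(lang_code: str) -> str:
    """Return the checkpoint repo id for a language code (KeyError if unsupported)."""
    return SIMBA_TTS_CHECKPOINTS[lang_code]
```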
🧩 Usage Example
You can run inference with the Hugging Face `transformers` library.
```python
from transformers import VitsModel, AutoTokenizer
import torch

# Pick any released checkpoint, e.g.:
# UBC-NLP/Simba-TTS-twi-asanti, UBC-NLP/Simba-TTS-twi-akuapem,
# UBC-NLP/Simba-TTS-lin, UBC-NLP/Simba-TTS-sot,
# UBC-NLP/Simba-TTS-tsn, UBC-NLP/Simba-TTS-xho
model_name = "UBC-NLP/Simba-TTS-afr"
model = VitsModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

text = "Ons noem hierdie deeltjies sub-atomiese deeltjies"  # Afrikaans (afr) example
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    output = model(**inputs).waveform
```
The resulting waveform can be saved as a `.wav` file:
```python
import scipy.io.wavfile

scipy.io.wavfile.write(
    "outputfile.wav",
    rate=model.config.sampling_rate,
    data=output.squeeze().float().numpy(),
)
```
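If SciPy is not available, the same audio can be written with only the Python standard library. The sketch below assumes a mono waveform with float samples in [-1, 1] (the shape produced by `output.squeeze()` above, after converting to a list); the function name `write_wav_pcm16` is illustrative.

```python
import struct
import wave

def write_wav_pcm16(path: str, samples, sampling_rate: int) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV, stdlib only."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)       # mono
        f.setsampwidth(2)       # 16-bit PCM
        f.setframerate(sampling_rate)
        # Clip each sample to [-1, 1] and scale to signed 16-bit range.
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        f.writeframes(frames)
```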
Citation
If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find the resources on this website useful, please cite our paper.
@inproceedings{elmadany-etal-2025-voice,
title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
author = "Elmadany, AbdelRahim A. and
Kwon, Sang Yun and
Toyin, Hawau Olamide and
Alcoba Inciarte, Alcides and
Aldarmaki, Hanan and
Abdul-Mageed, Muhammad",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.emnlp-main.559/",
doi = "10.18653/v1/2025.emnlp-main.559",
pages = "11039--11061",
ISBN = "979-8-89176-332-6",
}