	---
	language:
	- lin # Lingala
	license: cc-by-4.0
	tags:
	- text-to-speech
	- audio
	- speech
	- african-languages
	- multilingual
	- simba
	- low-resource
	- speech-synthesis
	- tts
	datasets:
	- UBC-NLP/SimbaBench
	metrics:
	- wer
	- cer
	library_name: transformers
	pipeline_tag: text-to-speech
	---
	<div align="center">

	<img src="https://africa.dlnlp.ai/simba/images/VoC_logo.png" alt="VoC Logo">

	[![EMNLP 2025 Paper](https://img.shields.io/badge/EMNLP_2025-Paper-B31B1B?style=for-the-badge&logo=arxiv&logoColor=B31B1B&labelColor=FFCDD2)](https://aclanthology.org/2025.emnlp-main.559/)
	[![Official Website](https://img.shields.io/badge/Official-Website-2EA44F?style=for-the-badge&logo=googlechrome&logoColor=2EA44F&labelColor=C8E6C9)](https://africa.dlnlp.ai/simba/)
	[![SimbaBench](https://img.shields.io/badge/SimbaBench-Benchmark-8A2BE2?style=for-the-badge&logo=googlecharts&logoColor=8A2BE2&labelColor=E1BEE7)](https://huggingface.co/spaces/UBC-NLP/SimbaBench)
	[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-FFD21E?style=for-the-badge&logoColor=black&labelColor=FFF9C4)](https://huggingface.co/collections/UBC-NLP/simba-speech-series)

	</div>

	## Bridging the Digital Divide for African AI

	Voice of a Continent is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.

	## Best-in-Class Multilingual Models

	<img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">

	Introduced in our EMNLP 2025 paper [Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/), the Simba Series represents the current state-of-the-art for African speech AI.

	- Unified Suite: Models optimized for African languages.
	- Superior Accuracy: Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
	- Multitask Capability: Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
	- Inclusion-First: Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.

	The Simba family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.


	### 🔊 Simba-TTS (Text-to-Speech)
	* 🎯 Task: `Text-to-Speech` (natural voice synthesis).
	* 🌍 Language Coverage: 7 African languages
	> Afrikaans (`afr`), Asante Twi (`asanti`), Akuapem Twi (`akuapem`), Lingala (`lin`), Southern Sotho (`sot`), Tswana (`tsn`), Xhosa (`xho`)

	| TTS Model | Architecture | Hugging Face Card | Status |
	| :--- | :--- | :---: | :---: |
	| Simba-TTS-afr 🔊 | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-afr](https://huggingface.co/UBC-NLP/Simba-TTS-afr) | ✅ Released |
	| Simba-TTS-twi-asanti 🔊 | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-twi-asanti](https://huggingface.co/UBC-NLP/Simba-TTS-twi-asanti) | ✅ Released |
	| Simba-TTS-twi-akuapem 🔊 | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-twi-akuapem](https://huggingface.co/UBC-NLP/Simba-TTS-twi-akuapem) | ✅ Released |
	| Simba-TTS-lin 🔊 | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-lin](https://huggingface.co/UBC-NLP/Simba-TTS-lin) | ✅ Released |
	| Simba-TTS-sot 🔊 | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-sot](https://huggingface.co/UBC-NLP/Simba-TTS-sot) | ✅ Released |
	| Simba-TTS-tsn 🔊 | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-tsn](https://huggingface.co/UBC-NLP/Simba-TTS-tsn) | ✅ Released |
	| Simba-TTS-xho 🔊 | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-xho](https://huggingface.co/UBC-NLP/Simba-TTS-xho) | ✅ Released |
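
When switching between languages programmatically, it can help to resolve a language code to its repository ID in one place. The sketch below copies the codes and repository names from the table above; the names `SIMBA_TTS_REPOS` and `simba_tts_repo` are illustrative and not part of any library.

```python
# Language codes and repository IDs copied from the Simba-TTS table.
# SIMBA_TTS_REPOS and simba_tts_repo are hypothetical helper names.
SIMBA_TTS_REPOS = {
    "afr": "UBC-NLP/Simba-TTS-afr",
    "asanti": "UBC-NLP/Simba-TTS-twi-asanti",
    "akuapem": "UBC-NLP/Simba-TTS-twi-akuapem",
    "lin": "UBC-NLP/Simba-TTS-lin",
    "sot": "UBC-NLP/Simba-TTS-sot",
    "tsn": "UBC-NLP/Simba-TTS-tsn",
    "xho": "UBC-NLP/Simba-TTS-xho",
}


def simba_tts_repo(lang_code: str) -> str:
    """Return the Hugging Face repository ID for a supported language code."""
    key = lang_code.strip().lower()
    if key not in SIMBA_TTS_REPOS:
        supported = ", ".join(sorted(SIMBA_TTS_REPOS))
        raise ValueError(
            f"Unsupported language code {lang_code!r}; expected one of: {supported}"
        )
    return SIMBA_TTS_REPOS[key]


print(simba_tts_repo("lin"))
```

The returned string can be passed wherever a model name is expected, for example to `VitsModel.from_pretrained`.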

	#### 🧩 Usage Example

	You can run inference with the Hugging Face `transformers` library, which loads these MMS-TTS checkpoints through `VitsModel`.

	```python
	from transformers import VitsModel, AutoTokenizer
	import torch

	# Repository ID on the Hugging Face Hub. Other options: UBC-NLP/Simba-TTS-twi-asanti,
	# UBC-NLP/Simba-TTS-twi-akuapem, UBC-NLP/Simba-TTS-lin, UBC-NLP/Simba-TTS-sot,
	# UBC-NLP/Simba-TTS-tsn, UBC-NLP/Simba-TTS-xho
	model_name = "UBC-NLP/Simba-TTS-afr"
	model = VitsModel.from_pretrained(model_name)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	text = "Ons noem hierdie deeltjies sub-atomiese deeltjies"  # Afrikaans (afr) example
	inputs = tokenizer(text, return_tensors="pt")

	# Inference only, so gradient tracking is disabled
	with torch.no_grad():
	    output = model(**inputs).waveform  # shape: (batch, samples)
	```
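	The resulting waveform can be saved as a `.wav` file:
	```python
	import scipy.io.wavfile

	# Drop the batch dimension so the file is written as mono audio
	scipy.io.wavfile.write("outputfile.wav", rate=model.config.sampling_rate, data=output.squeeze().float().numpy())
	```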
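
Some audio players and downstream tools expect 16-bit PCM rather than floating-point WAV data. As a minimal sketch, the waveform can be clipped, scaled to 16-bit integers, and written with Python's standard `wave` module. The helper name `write_wav_pcm16` is hypothetical, and a synthetic 440 Hz tone at an assumed 16 kHz sampling rate stands in for real model output so the example runs without downloading a checkpoint.

```python
import math
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        # Clip to the valid range before scaling to avoid integer overflow
        clipped = max(-1.0, min(1.0, float(s)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(samples)


# Stand-in for model output: one second of a 440 Hz tone (assumed 16 kHz rate)
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
n_written = write_wav_pcm16("tone_pcm16.wav", tone, sample_rate)
```

With real model output, one option is to pass `output.squeeze().tolist()` as `samples` and `model.config.sampling_rate` as `sample_rate`.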