---
language:
- lin
license: cc-by-4.0
tags:
- automatic-speech-recognition
- audio
- speech
- african-languages
- multilingual
- simba
- low-resource
- speech-recognition
- asr
datasets:
- UBC-NLP/SimbaBench
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

<div align="center">

<img src="https://africa.dlnlp.ai/simba/images/VoC_logo.png" alt="VoC Logo">

[Paper (EMNLP 2025)](https://aclanthology.org/2025.emnlp-main.559/)
[Project Website](https://africa.dlnlp.ai/simba/)
[SimbaBench Space](https://huggingface.co/spaces/UBC-NLP/SimbaBench)
[Simba Speech Series Collection](https://huggingface.co/collections/UBC-NLP/simba-speech-series)

</div>

## *Bridging the Digital Divide for African AI*

**Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.

## Best-in-Class Multilingual Models

<img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">

Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state of the art for African speech AI.

- **Unified Suite:** Models optimized for African languages.
- **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.

The **Simba** family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.

### Simba-TTS (Text-to-Speech)

* **🎯 Task:** `Text-to-Speech` (natural voice synthesis).

**Language Coverage (7 African languages)**

> **Afrikaans** (`afr`), **Asante Twi** (`asanti`), **Akuapem Twi** (`akuapem`), **Lingala** (`lin`), **Southern Sotho** (`sot`), **Tswana** (`tsn`), **Xhosa** (`xho`)

| **TTS Model** | **Architecture** | **Hugging Face Card** | **Status** |
| :--- | :--- | :---: | :---: |
| **Simba-TTS-afr** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-afr](https://huggingface.co/UBC-NLP/Simba-TTS-afr) | ✅ Released |
| **Simba-TTS-twi-asanti** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-twi-asanti](https://huggingface.co/UBC-NLP/Simba-TTS-twi-asanti) | ✅ Released |
| **Simba-TTS-twi-akuapem** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-twi-akuapem](https://huggingface.co/UBC-NLP/Simba-TTS-twi-akuapem) | ✅ Released |
| **Simba-TTS-lin** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-lin](https://huggingface.co/UBC-NLP/Simba-TTS-lin) | ✅ Released |
| **Simba-TTS-sot** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-sot](https://huggingface.co/UBC-NLP/Simba-TTS-sot) | ✅ Released |
| **Simba-TTS-tsn** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-tsn](https://huggingface.co/UBC-NLP/Simba-TTS-tsn) | ✅ Released |
| **Simba-TTS-xho** | MMS-TTS | 🤗 [https://huggingface.co/UBC-NLP/Simba-TTS-xho](https://huggingface.co/UBC-NLP/Simba-TTS-xho) | ✅ Released |

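The table above can be turned into a small lookup so that a language key selects the matching checkpoint. The following is a minimal sketch: the repository identifiers are taken from the table, while the names `SIMBA_TTS_REPOS` and `simba_tts_repo` are hypothetical helpers introduced here for illustration and are not part of any library.

```python
# Language keys follow the coverage list above; repository ids follow the table.
# SIMBA_TTS_REPOS and simba_tts_repo are illustrative names, not a library API.
SIMBA_TTS_REPOS = {
    "afr": "UBC-NLP/Simba-TTS-afr",
    "asanti": "UBC-NLP/Simba-TTS-twi-asanti",
    "akuapem": "UBC-NLP/Simba-TTS-twi-akuapem",
    "lin": "UBC-NLP/Simba-TTS-lin",
    "sot": "UBC-NLP/Simba-TTS-sot",
    "tsn": "UBC-NLP/Simba-TTS-tsn",
    "xho": "UBC-NLP/Simba-TTS-xho",
}


def simba_tts_repo(lang: str) -> str:
    """Return the Hugging Face repository id for a language key such as 'lin'."""
    key = lang.strip().lower()
    if key not in SIMBA_TTS_REPOS:
        supported = ", ".join(sorted(SIMBA_TTS_REPOS))
        raise ValueError(f"Unsupported language key {lang!r}; expected one of: {supported}")
    return SIMBA_TTS_REPOS[key]
```

The returned string can be passed wherever a model name is expected, for example `VitsModel.from_pretrained(simba_tts_repo("lin"))`.
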
**🧩 Usage Example**

You can run inference with the Hugging Face `transformers` library.

```python
import torch
from transformers import AutoTokenizer, VitsModel

# Other checkpoints: UBC-NLP/Simba-TTS-twi-asanti, UBC-NLP/Simba-TTS-twi-akuapem,
# UBC-NLP/Simba-TTS-lin, UBC-NLP/Simba-TTS-sot, UBC-NLP/Simba-TTS-tsn, UBC-NLP/Simba-TTS-xho
model_name = "UBC-NLP/Simba-TTS-afr"
model = VitsModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

text = "Ons noem hierdie deeltjies sub-atomiese deeltjies"  # example sentence in Afrikaans (afr)
inputs = tokenizer(text, return_tensors="pt")

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    output = model(**inputs).waveform  # shape: (batch_size, num_samples)
```

The resulting waveform can be saved as a `.wav` file:

```python
import scipy.io.wavfile

# Drop the batch dimension: `output` has shape (1, num_samples).
scipy.io.wavfile.write("outputfile.wav", rate=model.config.sampling_rate, data=output.squeeze(0).float().numpy())
```
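
When SciPy is not available, or when a 16-bit PCM file is preferred over the floating-point samples produced by the model, the same step can be sketched with the Python standard library alone. This is an illustrative alternative, not the method used by the authors; `write_wav_int16` is a hypothetical helper, and it assumes mono samples in the range [-1.0, 1.0].

```python
import struct
import wave
from typing import Iterable


def write_wav_int16(path: str, samples: Iterable[float], sampling_rate: int) -> int:
    """Write mono float samples as a 16-bit PCM WAV file; return the frame count."""
    frames = bytearray()
    for sample in samples:
        # Clip to [-1.0, 1.0] so the scaled value fits in a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2
```

With the variables from the usage example, a call might look like `write_wav_int16("outputfile.wav", output.squeeze(0).tolist(), model.config.sampling_rate)`.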