<div align="center">

<img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">

[Paper](https://aclanthology.org/2025.emnlp-main.559/)
[Website](https://africa.dlnlp.ai/simba/)
[SimbaBench](#simbabench)
[Hugging Face](https://huggingface.co/collections/UBC-NLP/simba-speech-series)
[Demo](#demo)

</div>

## *Bridging the Digital Divide for African AI*

**Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we work to ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.

## Best-in-Class Multilingual Models

Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state of the art for African speech AI.

- **Unified Suite:** A single family of models optimized for African languages.
- **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- **Inclusion-First:** Specifically built to narrow the "digital divide" by empowering speakers of underrepresented languages.

The **Simba** family consists of state-of-the-art models fine-tuned on SimbaBench. These models achieve their performance by leveraging dataset quality, domain diversity, and relationships between language families.

### 🗣️✍️ Simba-ASR

> **The New Standard for African Speech-to-Text**

**🎯 Task:** `Automatic Speech Recognition`, powering high-accuracy transcription across the continent.

**🌍 Language Coverage (43 African languages)**

> **Amharic** (`amh`), **Arabic** (`ara`), **Asante Twi** (`asanti`), **Bambara** (`bam`), **Baoulé** (`bau`), **Bemba** (`bem`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **French** (`fra`), **Ganda** (`lug`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabiye** (`kab`), **Kinyarwanda** (`kin`), **Kongo** (`kon`), **Lingala** (`lin`), **Luba-Katanga** (`lub`), **Luo** (`luo`), **Malagasy** (`mlg`), **Mossi** (`mos`), **Northern Sotho** (`nso`), **Nyanja** (`nya`), **Oromo** (`orm`), **Portuguese** (`por`), **Shona** (`sna`), **Somali** (`som`), **Southern Sotho** (`sot`), **Swahili** (`swa`), **Swati** (`ssw`), **Tigrinya** (`tir`), **Tsonga** (`tso`), **Tswana** (`tsn`), **Twi** (`twi`), **Umbundu** (`umb`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Zulu** (`zul`), **Tamazight** (`tzm`), **Sango** (`sag`), **Dinka** (`din`).

**🏗️ Base Architectures**

- **Simba-S** (SeamlessM4T-v2-MT): *Top Performer*
- **Simba-W** (Whisper-v3-large)
- **Simba-X** (Wav2Vec2-XLS-R-2b)
- **Simba-M** (MMS-1b-all)
- **Simba-H** (AfriHuBERT)

| **ASR Models** | **Architecture** | **🤗 Hugging Face Model Card** | **Status** |
|---------|:------------------:| :------------------:| :------------------:|
| **Simba-S** | SeamlessM4T-v2 | 🤗 [https://huggingface.co/UBC-NLP/Simba-S](https://huggingface.co/UBC-NLP/Simba-S) | ✅ Released |
| **Simba-W** | Whisper | 🤗 [https://huggingface.co/UBC-NLP/Simba-W](https://huggingface.co/UBC-NLP/Simba-W) | ✅ Released |
| **Simba-X** | Wav2Vec2 | 🤗 [https://huggingface.co/UBC-NLP/Simba-X](https://huggingface.co/UBC-NLP/Simba-X) | ✅ Released |
| **Simba-M** | MMS | 🤗 [https://huggingface.co/UBC-NLP/Simba-M](https://huggingface.co/UBC-NLP/Simba-M) | ✅ Released |
| **Simba-H** | HuBERT | 🤗 [https://huggingface.co/UBC-NLP/Simba-H](https://huggingface.co/UBC-NLP/Simba-H) | ✅ Released |
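
The table and the usage example below imply one branching rule: only the MMS-based Simba-M needs an adapter loaded after the pipeline is built. If you switch between variants in code, a small registry can keep that rule next to the model IDs. This is a hypothetical sketch transcribed from the table, not part of any Simba package; `SimbaASRModel`, `SIMBA_ASR_MODELS`, and `resolve` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimbaASRModel:
    hub_id: str          # Hugging Face model ID from the table
    base: str            # base architecture family
    needs_adapter: bool  # True only for the MMS-based Simba-M


# Hypothetical registry built from the README table
SIMBA_ASR_MODELS = {
    "S": SimbaASRModel("UBC-NLP/Simba-S", "SeamlessM4T-v2", False),
    "W": SimbaASRModel("UBC-NLP/Simba-W", "Whisper", False),
    "X": SimbaASRModel("UBC-NLP/Simba-X", "Wav2Vec2", False),
    "M": SimbaASRModel("UBC-NLP/Simba-M", "MMS", True),
    "H": SimbaASRModel("UBC-NLP/Simba-H", "HuBERT", False),
}


def resolve(variant: str) -> SimbaASRModel:
    """Look up a model by suffix, accepting "S", "s", or "Simba-S"."""
    key = variant.strip().upper().removeprefix("SIMBA-")
    try:
        return SIMBA_ASR_MODELS[key]
    except KeyError:
        raise ValueError(
            f"Unknown Simba ASR variant {variant!r}; "
            f"expected one of {sorted(SIMBA_ASR_MODELS)}"
        ) from None
```

In practice you would pass `resolve("M").hub_id` as the `model` argument of `pipeline(...)` and call `load_adapter("multilingual_african")` only when `needs_adapter` is true.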

- **Simba-S** (based on SeamlessM4T-v2-MT) emerged as the best-performing ASR model overall.
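
This README does not name the metric behind that ranking. ASR systems are conventionally compared by word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words, i.e. WER = (S + D + I) / N for substitutions, deletions, insertions, and N reference words. Assuming that is the yardstick, here is a minimal pure-Python sketch; `word_error_rate` is a hypothetical helper, and it applies no text normalization (casing, punctuation), which real evaluations usually do first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the ref words seen so far
    # (one row back) and the first j hypothesis words; row 0 is j insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (free when words match)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat", "the dog sat")` counts one substitution over three reference words, giving 1/3. Lower is better, so "best-performing" under this assumption means lowest WER.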

**🧩 Usage Example**

You can run inference with the Hugging Face `transformers` `pipeline` API.

```python
import librosa  # assumption: librosa is installed; any loader returning a float array works
from transformers import pipeline

# Simba ASR models: "UBC-NLP/Simba-S", "UBC-NLP/Simba-W", "UBC-NLP/Simba-X",
# "UBC-NLP/Simba-H", "UBC-NLP/Simba-M"
model_id = "UBC-NLP/Simba-S"

# Load the chosen Simba model into an ASR pipeline
asr_pipeline = pipeline(
    "automatic-speech-recognition",
    model=model_id,
)

# Only `UBC-NLP/Simba-M` (MMS-based) needs its adapter loaded
if model_id == "UBC-NLP/Simba-M":
    asr_pipeline.model.load_adapter("multilingual_african")

# Transcribe audio from a file path or URL
result = asr_pipeline("https://africa.dlnlp.ai/simba/audio/afr_Lwazi_afr_test_idx3889.wav")
print(result["text"])

# Transcribe audio from an in-memory array.
# "path/to/audio.wav" is a placeholder; librosa resamples it to 16 kHz.
audio_array, sampling_rate = librosa.load("path/to/audio.wav", sr=16_000)
result = asr_pipeline({
    "array": audio_array,
    "sampling_rate": sampling_rate,
})
print(result["text"])
```

Get started with Simba models in minutes using our interactive Colab notebook: [simba_models.ipynb](https://github.com/UBC-NLP/simba/blob/main/simba_models.ipynb)
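
The array example above passes audio at 16 kHz. If you would rather not add an audio library, a minimal standard-library sketch can read a WAV file that is already 16-bit PCM, mono, and 16 kHz; it deliberately raises an error instead of resampling or downmixing. `read_wav_16k_mono` is a hypothetical helper, not part of Simba or `transformers`.

```python
import array
import sys
import wave


def read_wav_16k_mono(path: str) -> list[float]:
    """Read a 16-bit PCM mono 16 kHz WAV into floats in [-1.0, 1.0).

    No resampling or downmixing is done: other formats raise ValueError,
    so convert them with a dedicated tool first.
    """
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16_000:
            raise ValueError(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono audio, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":          # WAV sample data is little-endian
        samples.byteswap()
    return [s / 32768.0 for s in samples]  # scale to [-1.0, 1.0)
```

The `transformers` pipeline expects a NumPy array for array input, so convert first, e.g. `np.asarray(read_wav_16k_mono("clip.wav"), dtype=np.float32)` (where `clip.wav` is a placeholder), and pass it as `"array"` with `"sampling_rate": 16_000`.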

## Citation

If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find the resources in this repository useful, please cite our paper:

```bibtex
@inproceedings{elmadany-etal-2025-voice,
    title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
    author = "Elmadany, AbdelRahim A. and
      Kwon, Sang Yun and
      Toyin, Hawau Olamide and
      Alcoba Inciarte, Alcides and
      Aldarmaki, Hanan and
      Abdul-Mageed, Muhammad",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.559/",
    doi = "10.18653/v1/2025.emnlp-main.559",
    pages = "11039--11061",
    ISBN = "979-8-89176-332-6",
}
```