---
language:
  - am  # Amharic
  - ar  # Arabic
  - tw  # Asante Twi
  - bm  # Bambara
  - fr  # French
  - lg  # Ganda
  - ha  # Hausa
  - ig  # Igbo
  - rw  # Kinyarwanda
  - kg  # Kongo
  - ln  # Lingala
  - lu  # Luba-Katanga
  - mg  # Malagasy
  - nso # Northern Sotho
  - ny  # Nyanja
  - om  # Oromo
  - pt  # Portuguese
  - sn  # Shona
  - so  # Somali
  - st  # Southern Sotho
  - sw  # Swahili
  - ss  # Swati
  - ti  # Tigrinya
  - ts  # Tsonga
  - tn  # Tswana
  - ak  # Twi
  - ve  # Venda
  - wo  # Wolof
  - xh  # Xhosa
  - yo  # Yoruba
  - zu  # Zulu
  - tzm # Tamazight
  - sg  # Sango
  - din # Dinka
  - ee  # Ewe
  - fon # Fon
  - luo # Luo
  - mos # Mossi
  - umb # Umbundu
license: cc-by-4.0
tags:
  - automatic-speech-recognition
  - audio
  - speech
  - african-languages
  - multilingual
  - simba
  - low-resource
  - speech-recognition
  - asr
datasets:
  - UBC-NLP/SimbaBench
metrics:
  - wer
  - cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
<div align="center">

<img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">


[![EMNLP 2025 Paper](https://img.shields.io/badge/EMNLP_2025-Paper-B31B1B?style=for-the-badge&logo=arxiv&logoColor=B31B1B&labelColor=FFCDD2)](https://aclanthology.org/2025.emnlp-main.559/)
[![Official Website](https://img.shields.io/badge/Official-Website-2EA44F?style=for-the-badge&logo=googlechrome&logoColor=2EA44F&labelColor=C8E6C9)](https://africa.dlnlp.ai/simba/)
[![SimbaBench](https://img.shields.io/badge/SimbaBench-Benchmark-8A2BE2?style=for-the-badge&logo=googlecharts&logoColor=8A2BE2&labelColor=E1BEE7)](https://huggingface.co/spaces/UBC-NLP/SimbaBench)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-FFD21E?style=for-the-badge&logoColor=black&labelColor=FFF9C4)](https://huggingface.co/collections/UBC-NLP/simba-speech-series)

</div>

## *Bridging the Digital Divide for African AI*

**Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.

## Best-in-Class Multilingual Models

Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state-of-the-art for African speech AI.

- **Unified Suite:** Models optimized for African languages.
- **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.

The **Simba** family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.

### 🗣️✍️ Simba-ASR
> **The New Standard for African Speech-to-Text**

**🎯 Task** `Automatic Speech Recognition` β€” Powering high-accuracy transcription across the continent.

**🌍 Language Coverage (43 African languages)**
>  **Amharic** (`amh`), **Arabic** (`ara`), **Asante Twi** (`asanti`), **Bambara** (`bam`), **BaoulΓ©** (`bau`), **Bemba** (`bem`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **French** (`fra`), **Ganda** (`lug`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabiye** (`kab`), **Kinyarwanda** (`kin`), **Kongo** (`kon`), **Lingala** (`lin`), **Luba-Katanga** (`lub`), **Luo** (`luo`), **Malagasy** (`mlg`), **Mossi** (`mos`), **Northern Sotho** (`nso`), **Nyanja** (`nya`), **Oromo** (`orm`), **Portuguese** (`por`), **Shona** (`sna`), **Somali** (`som`), **Southern Sotho** (`sot`), **Swahili** (`swa`), **Swati** (`ssw`), **Tigrinya** (`tir`), **Tsonga** (`tso`), **Tswana** (`tsn`), **Twi** (`twi`), **Umbundu** (`umb`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Zulu** (`zul`), **Tamazight** (`tzm`), **Sango** (`sag`), **Dinka** (`din`).

**🏗️ Base Architectures**

  - **Simba-S** (SeamlessM4T-v2-MT): *Top Performer*
  - **Simba-W** (Whisper-v3-large)
  - **Simba-X** (Wav2Vec2-XLS-R-2b)
  - **Simba-M** (MMS-1b-all)
  - **Simba-H** (AfriHuBERT)
      
🌐 Explore the Frontier

| **ASR Models**   | **Architecture**  | **#Parameters** | **πŸ€— Hugging Face Model Card** | **Status** |
|---------|:------------------:| :------------------:| :------------------:|:------------------:|    
| πŸ”₯**Simba-S**πŸ”₯|    SeamlessM4T-v2  |  2.3B | πŸ€— [https://huggingface.co/UBC-NLP/Simba-S](https://huggingface.co/UBC-NLP/Simba-S) | βœ… Released |
| πŸ”₯**Simba-W**πŸ”₯|    Whisper         |  1.5B | πŸ€— [https://huggingface.co/UBC-NLP/Simba-W](https://huggingface.co/UBC-NLP/Simba-W) | βœ… Released | 
| πŸ”₯**Simba-X**πŸ”₯|    Wav2Vec2        |  1B | πŸ€— [https://huggingface.co/UBC-NLP/Simba-X](https://huggingface.co/UBC-NLP/Simba-X) | βœ… Released |   
| πŸ”₯**Simba-M**πŸ”₯|    MMS             |  1B | πŸ€— [https://huggingface.co/UBC-NLP/Simba-M](https://huggingface.co/UBC-NLP/Simba-M) | βœ… Released |   
| πŸ”₯**Simba-H**πŸ”₯|    HuBERT          |  94M | πŸ€— [https://huggingface.co/UBC-NLP/Simba-H](https://huggingface.co/UBC-NLP/Simba-H) | βœ… Released |   

* **Simba-S** emerged as the best-performing ASR model overall.


**🧩 Usage Example**

You can easily run inference using the Hugging Face `transformers` library.

```python
import numpy as np
from transformers import pipeline

# Any Simba ASR checkpoint works here: `UBC-NLP/Simba-S`, `UBC-NLP/Simba-W`,
# `UBC-NLP/Simba-X`, `UBC-NLP/Simba-M`, or `UBC-NLP/Simba-H`.
model_id = "UBC-NLP/Simba-S"

# Load the chosen Simba model for ASR
asr_pipeline = pipeline("automatic-speech-recognition", model=model_id)

# Simba-M only: load the multilingual African adapter
if model_id == "UBC-NLP/Simba-M":
    asr_pipeline.model.load_adapter("multilingual_african")

# Transcribe audio from a file path or URL
result = asr_pipeline("https://africa.dlnlp.ai/simba/audio/afr_Lwazi_afr_test_idx3889.wav")
print(result["text"])

# Transcribe audio from an in-memory array (1-D float waveform).
# Placeholder: one second of silence; replace it with your own 16 kHz audio.
audio_array = np.zeros(16_000, dtype=np.float32)
result = asr_pipeline({
    "array": audio_array,
    "sampling_rate": 16_000
})
print(result["text"])
```
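The array-based call needs a waveform and its sampling rate. As a minimal sketch, assuming your audio is a local 16-bit PCM WAV file, the snippet below reads it into a mono float array using only the standard `wave` module and NumPy. The helper name `load_wav_mono` is hypothetical and not part of the Simba release; for other formats or bit depths, a dedicated audio library is the safer choice.

```python
import wave

import numpy as np


def load_wav_mono(path: str) -> tuple[np.ndarray, int]:
    """Read a 16-bit PCM WAV file into a float32 mono array in [-1, 1].

    Hypothetical helper for illustration; it rejects any other sample width.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        sampling_rate = wav.getframerate()
        n_channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    # Interpret the raw bytes as little-endian int16 and scale to [-1, 1].
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0

    # Average interleaved channels down to a single mono channel.
    if n_channels > 1:
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    return samples, sampling_rate
```

The returned pair can then be passed to the pipeline as `{"array": samples, "sampling_rate": sampling_rate}`, in the same way as the array call above.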

#### Example Outputs

Using the same audio file with different Simba models:

```python
# Simba-S
{'text': 'watter verontwaardiging sou daar, in ons binneste gewees het.'}
```

```python
# Simba-W
{'text': 'watter veronwaardigingsel daar, in ons binneste gewees het.'}
```

```python
# Simba-X
{'text': 'fator fr on ar taamsodr is'}
```

```python
# Simba-M
{'text': 'watter veronwaardiging sodaar in ons binniste gewees het'}
```

```python
# Simba-H
{'text': 'watter vironwaardiging so daar in ons binneste geweeshet'}
```
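The transcripts above differ by only a few words or characters, which is what word error rate (WER) and character error rate (CER) quantify. The sketch below is a rough illustration, not the SimbaBench evaluation code: it implements both metrics with a plain Levenshtein distance and, because no gold transcript is given here, it treats the Simba-S output as a stand-in reference. That choice, the simple normalization, and the function names `normalize`, `edit_distance`, `wer`, and `cer` are all assumptions made for the example.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, collapse whitespace (assumed scheme)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = normalize(reference).split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, normalize(hypothesis).split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by the reference length."""
    ref_chars = normalize(reference)
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, normalize(hypothesis)) / len(ref_chars)


# Model outputs copied from the examples above.
outputs = {
    "Simba-S": "watter verontwaardiging sou daar, in ons binneste gewees het.",
    "Simba-W": "watter veronwaardigingsel daar, in ons binneste gewees het.",
    "Simba-X": "fator fr on ar taamsodr is",
    "Simba-M": "watter veronwaardiging sodaar in ons binniste gewees het",
    "Simba-H": "watter vironwaardiging so daar in ons binneste geweeshet",
}

# ASSUMPTION: Simba-S stands in for the gold transcript, for illustration only.
reference = outputs["Simba-S"]
for name, text in outputs.items():
    print(f"{name}: WER={wer(reference, text):.2f}  CER={cer(reference, text):.2f}")
```

For a real evaluation, replace `reference` with the human transcript of the audio and apply whatever text normalization your benchmark specifies.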

Get started with Simba models in minutes using our interactive Colab notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/UBC-NLP/simba/blob/main/simba_models.ipynb)


## Citation

If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find the resources on this website useful, please cite our paper.

```bibtex

@inproceedings{elmadany-etal-2025-voice,
    title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
    author = "Elmadany, AbdelRahim A.  and
      Kwon, Sang Yun  and
      Toyin, Hawau Olamide  and
      Alcoba Inciarte, Alcides  and
      Aldarmaki, Hanan  and
      Abdul-Mageed, Muhammad",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.559/",
    doi = "10.18653/v1/2025.emnlp-main.559",
    pages = "11039--11061",
    ISBN = "979-8-89176-332-6",
}

```