elmadany committed
Commit bd1c657 · verified · 1 Parent(s): 4a57f1a

Initial model upload

Files changed (2)
  1. .ipynb_checkpoints/README-checkpoint.md +211 -0
  2. README.md +206 -92
README.md CHANGED
@@ -1,97 +1,211 @@
  ---
- license: apache-2.0
- base_model: facebook/wav2vec2-xls-r-1b
  tags:
- - automatic-speech-recognition
- - /mnt/home/elmadany/elmadany_workspace/African_speechT5/jasmine-raid/elmadany_work/HF_format/SimbaBench_tasks_ft
- - generated_from_trainer
  metrics:
- - wer
- model-index:
- - name: wav2vec2-xls-r-1b
- - results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # wav2vec2-xls-r-1b
-
- This model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) on the /MNT/HOME/ELMADANY/ELMADANY_WORKSPACE/AFRICAN_SPEECHT5/JASMINE-RAID/ELMADANY_WORK/HF_FORMAT/SIMBABENCH_TASKS_FT - ASR_FT_DATA_BATCH_2 dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.3260
- - Wer: 0.8212
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 256
- - total_eval_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 1000
- - num_epochs: 30.0
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Wer |
- |:-------------:|:-----:|:-----:|:---------------:|:------:|
- | 4.5214 | 1.0 | 610 | 2.5037 | 0.9721 |
- | 0.5588 | 2.0 | 1221 | 2.3758 | 0.9199 |
- | 0.4547 | 3.0 | 1831 | 2.2466 | 0.9419 |
- | 0.4025 | 4.0 | 2442 | 2.2313 | 0.8876 |
- | 0.3789 | 5.0 | 3053 | 2.2444 | 0.8845 |
- | 0.3542 | 6.0 | 3663 | 1.9893 | 0.8914 |
- | 0.3383 | 7.0 | 4274 | 1.3485 | 0.8357 |
- | 0.3239 | 8.0 | 4885 | 1.9888 | 0.8776 |
- | 0.3074 | 9.0 | 5495 | 1.9462 | 0.8689 |
- | 0.2883 | 10.0 | 6106 | 1.5767 | 0.8871 |
- | 0.2749 | 11.0 | 6716 | 1.3260 | 0.8212 |
- | 0.2563 | 12.0 | 7327 | 1.8270 | 0.8440 |
- | 0.2474 | 13.0 | 7938 | 1.8219 | 0.8677 |
- | 0.2347 | 14.0 | 8548 | 1.5346 | 0.8635 |
- | 0.2211 | 15.0 | 9159 | 1.7185 | 0.8636 |
- | 0.2117 | 16.0 | 9770 | 1.8663 | 0.8697 |
- | 0.1987 | 17.0 | 10380 | 1.4298 | 0.8687 |
- | 0.1814 | 18.0 | 10991 | 1.5630 | 0.8680 |
- | 0.1694 | 19.0 | 11601 | 1.3627 | 0.8573 |
- | 0.1597 | 20.0 | 12212 | 1.7108 | 0.8642 |
- | 0.1517 | 21.0 | 12823 | 1.8344 | 0.8794 |
- | 0.1405 | 22.0 | 13433 | 1.4838 | 0.8508 |
- | 0.1262 | 23.0 | 14044 | 1.5322 | 0.8415 |
- | 0.1171 | 24.0 | 14655 | 1.7095 | 0.8682 |
- | 0.1079 | 25.0 | 15265 | 1.7445 | 0.8719 |
- | 0.0996 | 26.0 | 15876 | 1.7322 | 0.8502 |
- | 0.0922 | 27.0 | 16486 | 1.8349 | 0.8625 |
- | 0.0855 | 28.0 | 17097 | 1.8259 | 0.8646 |
- | 0.081 | 29.0 | 17708 | 1.8187 | 0.8651 |
- | 0.0771 | 29.97 | 18300 | 1.8427 | 0.8624 |
-
-
- ### Framework versions
-
- - Transformers 4.33.2
- - Pytorch 2.0.1+cu117
- - Datasets 3.5.0
- - Tokenizers 0.13.3
 
  ---
+ language:
+ - am # Amharic
+ - ar # Arabic
+ - tw # Asante Twi
+ - bm # Bambara
+ - fr # French
+ - lg # Ganda
+ - ha # Hausa
+ - ig # Igbo
+ - rw # Kinyarwanda
+ - kg # Kongo
+ - ln # Lingala
+ - lu # Luba-Katanga
+ - mg # Malagasy
+ - nso # Northern Sotho
+ - ny # Nyanja
+ - om # Oromo
+ - pt # Portuguese
+ - sn # Shona
+ - so # Somali
+ - st # Southern Sotho
+ - sw # Swahili
+ - ss # Swati
+ - ti # Tigrinya
+ - ts # Tsonga
+ - tn # Tswana
+ - ak # Twi
+ - ve # Venda
+ - wo # Wolof
+ - xh # Xhosa
+ - yo # Yoruba
+ - zu # Zulu
+ - tzm # Tamazight
+ - sg # Sango
+ - din # Dinka
+ - ee # Ewe
+ - fon # Fon
+ - luo # Luo
+ - mos # Mossi
+ - umb # Umbundu
+ license: cc-by-4.0
  tags:
+ - automatic-speech-recognition
+ - audio
+ - speech
+ - african-languages
+ - multilingual
+ - simba
+ - low-resource
+ - speech-recognition
+ - asr
+ datasets:
+ - UBC-NLP/SimbaBench
  metrics:
+ - wer
+ - cer
+ library_name: transformers
+ pipeline_tag: automatic-speech-recognition
  ---
+ <div align="center">
+
+ <img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">
+
+ [![EMNLP 2025 Paper](https://img.shields.io/badge/EMNLP_2025-Paper-B31B1B?style=for-the-badge&logo=arxiv&logoColor=B31B1B&labelColor=FFCDD2)](https://aclanthology.org/2025.emnlp-main.559/)
+ [![Official Website](https://img.shields.io/badge/Official-Website-2EA44F?style=for-the-badge&logo=googlechrome&logoColor=2EA44F&labelColor=C8E6C9)](https://africa.dlnlp.ai/simba/)
+ [![SimbaBench](https://img.shields.io/badge/SimbaBench-Benchmark-8A2BE2?style=for-the-badge&logo=googlecharts&logoColor=8A2BE2&labelColor=E1BEE7)](#simbabench)
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-FFD21E?style=for-the-badge&logoColor=black&labelColor=FFF9C4)](https://huggingface.co/collections/UBC-NLP/simba-speech-series)
+ [![YouTube Video](https://img.shields.io/badge/YouTube-Video-FF0000?style=for-the-badge&logo=youtube&logoColor=FF0000&labelColor=FFCCBC)](#demo)
+
+ </div>
+
+ ## *Bridging the Digital Divide for African AI*
+
+ **Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.
+
+ ## Best-in-Class Multilingual Models
+
+ Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state of the art for African speech AI.
+
+ - **Unified Suite:** Models optimized for African languages.
+ - **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
+ - **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
+ - **Inclusion-First:** Built specifically to narrow the digital divide by empowering speakers of underrepresented languages.
+
+ The **Simba** family consists of state-of-the-art models fine-tuned on SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language-family relationships.
+
+ ### 🗣️✍️ Simba-ASR
+ > **The New Standard for African Speech-to-Text**
+
+ **🎯 Task** `Automatic Speech Recognition` — powering high-accuracy transcription across the continent.
+
+ **🌍 Language Coverage (43 African languages)**
+ > **Amharic** (`amh`), **Arabic** (`ara`), **Asante Twi** (`asanti`), **Bambara** (`bam`), **Baoulé** (`bau`), **Bemba** (`bem`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **French** (`fra`), **Ganda** (`lug`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabiye** (`kab`), **Kinyarwanda** (`kin`), **Kongo** (`kon`), **Lingala** (`lin`), **Luba-Katanga** (`lub`), **Luo** (`luo`), **Malagasy** (`mlg`), **Mossi** (`mos`), **Northern Sotho** (`nso`), **Nyanja** (`nya`), **Oromo** (`orm`), **Portuguese** (`por`), **Shona** (`sna`), **Somali** (`som`), **Southern Sotho** (`sot`), **Swahili** (`swa`), **Swati** (`ssw`), **Tigrinya** (`tir`), **Tsonga** (`tso`), **Tswana** (`tsn`), **Twi** (`twi`), **Umbundu** (`umb`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Zulu** (`zul`), **Tamazight** (`tzm`), **Sango** (`sag`), **Dinka** (`din`).
+
+ **🏗️ Base Architectures**
+
+ - **Simba-S** (SeamlessM4T-v2-MT) — *Top Performer*
+ - **Simba-W** (Whisper-v3-large)
+ - **Simba-X** (Wav2Vec2-XLS-R-2b)
+ - **Simba-M** (MMS-1b-all)
+ - **Simba-H** (AfriHuBERT)
+
+ **🌐 Explore the Frontier**
+
+ | **ASR Models** | **Architecture** | **#Parameters** | **🤗 Hugging Face Model Card** | **Status** |
+ |---------|:------------------:|:------------------:|:------------------:|:------------------:|
+ | 🔥**Simba-S**🔥 | SeamlessM4T-v2 | 2.3B | 🤗 [https://huggingface.co/UBC-NLP/Simba-S](https://huggingface.co/UBC-NLP/Simba-S) | ✅ Released |
+ | 🔥**Simba-W**🔥 | Whisper | 1.5B | 🤗 [https://huggingface.co/UBC-NLP/Simba-W](https://huggingface.co/UBC-NLP/Simba-W) | ✅ Released |
+ | 🔥**Simba-X**🔥 | Wav2Vec2 | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-X](https://huggingface.co/UBC-NLP/Simba-X) | ✅ Released |
+ | 🔥**Simba-M**🔥 | MMS | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-M](https://huggingface.co/UBC-NLP/Simba-M) | ✅ Released |
+ | 🔥**Simba-H**🔥 | HuBERT | 94M | 🤗 [https://huggingface.co/UBC-NLP/Simba-H](https://huggingface.co/UBC-NLP/Simba-H) | ✅ Released |
+
+ * **Simba-S** emerged as the best-performing ASR model overall.
+
+
+ **🧩 Usage Example**
+
+ You can easily run inference using the Hugging Face `transformers` library.
+
+ ```python
+ from transformers import pipeline
+
+ # Load a Simba model for ASR. Available models: `UBC-NLP/Simba-S`, `UBC-NLP/Simba-W`,
+ # `UBC-NLP/Simba-X`, `UBC-NLP/Simba-M`, `UBC-NLP/Simba-H`
+ asr_pipeline = pipeline(
+     "automatic-speech-recognition",
+     model="UBC-NLP/Simba-S",
+ )
+
+ # Load the multilingual African adapter (only needed for `UBC-NLP/Simba-M`)
+ # asr_pipeline.model.load_adapter("multilingual_african")
+
+ # Transcribe audio from a file path or URL
+ result = asr_pipeline("https://africa.dlnlp.ai/simba/audio/afr_Lwazi_afr_test_idx3889.wav")
+ print(result["text"])
+
+ # Transcribe audio from an in-memory array (mono, 16 kHz)
+ result = asr_pipeline({
+     "array": audio_array,
+     "sampling_rate": 16_000,
+ })
+ print(result["text"])
+ ```
+
+ #### Example Outputs
+
+ Using the same audio file with different Simba models:
+
+ ```python
+ # Simba-S
+ {'text': 'watter verontwaardiging sou daar, in ons binneste gewees het.'}
+ ```
+
+ ```python
+ # Simba-W
+ {'text': 'watter veronwaardigingsel daar, in ons binneste gewees het.'}
+ ```
+
+ ```python
+ # Simba-X
+ {'text': 'fator fr on ar taamsodr is'}
+ ```
+
+ ```python
+ # Simba-M
+ {'text': 'watter veronwaardiging sodaar in ons binniste gewees het'}
+ ```
+
+ ```python
+ # Simba-H
+ {'text': 'watter vironwaardiging so daar in ons binneste geweeshet'}
+ ```
+
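
These models are evaluated with WER and CER. As a reference for reading those numbers (and not part of the original card), word error rate is the word-level Levenshtein distance between reference and hypothesis divided by the number of reference words; the minimal pure-Python sketch below illustrates this on two of the example transcripts. Production evaluation typically uses a library such as `jiwer` instead.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance over token sequences (substitution/insertion/deletion, cost 1 each)."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    return edit_distance(reference.split(), hypothesis.split()) / len(reference.split())

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

reference = "watter verontwaardiging sou daar in ons binneste gewees het"
hypothesis = "watter veronwaardiging sodaar in ons binniste gewees het"
print(round(wer(reference, hypothesis), 3))  # 0.444 (4 word errors over 9 reference words)
```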
+
+ Get started with Simba models in minutes using our interactive Colab notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/UBC-NLP/simba/blob/main/simba_models.ipynb)
+
+
+ ## Citation
+
+ If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find these resources useful, please cite our paper.
+
+ ```bibtex
+ @inproceedings{elmadany-etal-2025-voice,
+     title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
+     author = "Elmadany, AbdelRahim A. and
+       Kwon, Sang Yun and
+       Toyin, Hawau Olamide and
+       Alcoba Inciarte, Alcides and
+       Aldarmaki, Hanan and
+       Abdul-Mageed, Muhammad",
+     editor = "Christodoulopoulos, Christos and
+       Chakraborty, Tanmoy and
+       Rose, Carolyn and
+       Peng, Violet",
+     booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
+     month = nov,
+     year = "2025",
+     address = "Suzhou, China",
+     publisher = "Association for Computational Linguistics",
+     url = "https://aclanthology.org/2025.emnlp-main.559/",
+     doi = "10.18653/v1/2025.emnlp-main.559",
+     pages = "11039--11061",
+     ISBN = "979-8-89176-332-6",
+ }
+ ```