---
language:
- ak # Akuapim Twi
- tw # Asante Twi
- aeb # Tunisian Arabic
- af # Afrikaans
- am # Amharic
- ar # Arabic
- bas # Basaa
- bem # Bemba
- dav # Taita
- dyu # Dyula
- en # English
- pcm # Nigerian Pidgin
- ee # Ewe
- fat # Fanti
- fon # Fon
- fuc # Pulaar
- ff # Pular
- gaa # Ga
- ha # Hausa
- ig # Igbo
- kab # Kabyle
- rw # Kinyarwanda
- kln # Kalenjin
- ln # Lingala
- loz # Lozi
- lg # Luganda
- luo # Luo
- mlq # Western Maninkakan
- nr # South Ndebele
- nso # Northern Sotho
- ny # Chichewa
- st # Southern Sotho
- srr # Serer
- ss # Swati
- sus # Susu
- sw # Kiswahili/Swahili
- tig # Tigre
- ti # Tigrinya
- toi # Tonga
- tn # Tswana
- ts # Tsonga
- tw # Twi
- ve # Venda
- wo # Wolof
- xh # Xhosa
- yo # Yoruba
- zgh # Standard Moroccan Tamazight
- zu # Zulu
license: cc-by-4.0
tags:
- automatic-speech-recognition
- audio
- speech
- african-languages
- multilingual
- simba
- low-resource
- speech-recognition
- asr
- spoken-language-identification
- language-identification
datasets:
- UBC-NLP/SimbaBench
metrics:
- wer
- cer
- accuracy
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
[![EMNLP 2025 Paper](https://img.shields.io/badge/EMNLP_2025-Paper-B31B1B?style=for-the-badge&logo=arxiv&logoColor=B31B1B&labelColor=FFCDD2)](https://aclanthology.org/2025.emnlp-main.559/)
[![Official Website](https://img.shields.io/badge/Official-Website-2EA44F?style=for-the-badge&logo=googlechrome&logoColor=2EA44F&labelColor=C8E6C9)](https://africa.dlnlp.ai/simba/)
[![SimbaBench](https://img.shields.io/badge/SimbaBench-Benchmark-8A2BE2?style=for-the-badge&logo=googlecharts&logoColor=8A2BE2&labelColor=E1BEE7)](https://huggingface.co/spaces/UBC-NLP/SimbaBench)
[![GitHub Repository](https://img.shields.io/badge/GitHub-Repository-181717?style=for-the-badge&logo=github&logoColor=181717&labelColor=E0E0E0)](https://github.com/UBC-NLP/simba)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-FFD21E?style=for-the-badge&logoColor=181717&labelColor=FFF9C4)](https://huggingface.co/collections/UBC-NLP/simba-speech-series)
[![Hugging Face Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-FFD21E?style=for-the-badge&logoColor=181717&labelColor=FFF9C4)](https://huggingface.co/datasets/UBC-NLP/SimbaBench_dataset)
## *Bridging the Digital Divide for African AI*

**Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.

## Best-in-Class Multilingual Models

Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state of the art for African speech AI.

- **Unified Suite:** Models optimized for African languages.
- **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.

The **Simba** family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.

### 🔍 Simba-SLID (Spoken Language Identification)

* **🎯 Task:** `Spoken Language Identification`, used for intelligent input routing.
* **🌍 Language Coverage (49 African languages)**

> **Akuapim Twi** (`Akuapim-twi`), **Asante Twi** (`Asante-twi`), **Tunisian Arabic** (`aeb`), **Afrikaans** (`afr`), **Amharic** (`amh`), **Arabic** (`ara`), **Basaa** (`bas`), **Bemba** (`bem`), **Taita** (`dav`), **Dyula** (`dyu`), **English** (`eng`), **Nigerian Pidgin** (`eng-zul`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **Pulaar** (`fuc`), **Pular** (`fuf`), **Ga** (`gaa`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabyle** (`kab`), **Kinyarwanda** (`kin`), **Kalenjin** (`kln`), **Lingala** (`lin`), **Lozi** (`loz`), **Luganda** (`lug`), **Luo** (`luo`), **Western Maninkakan** (`mlq`), **South Ndebele** (`nbl`), **Northern Sotho** (`nso`), **Chichewa** (`nya`), **Southern Sotho** (`sot`), **Serer** (`srr`), **Swati** (`ssw`), **Susu** (`sus`), **Kiswahili** (`swa`), **Swahili** (`swh`), **Tigre** (`tig`), **Tigrinya** (`tir`), **Tonga** (`toi`), **Tswana** (`tsn`), **Tsonga** (`tso`), **Twi** (`twi`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Standard Moroccan Tamazight** (`zgh`), **Zulu** (`zul`)

| **SLID Model** | **Architecture** | **Hugging Face Card** | **Status** |
| :--- | :--- | :---: | :---: |
| **Simba-SLID-49** 🔍 | HuBERT | 🤗 [https://huggingface.co/UBC-NLP/Simba-SLIS-49](https://huggingface.co/UBC-NLP/Simba-SLIS-49) | ✅ Released |

**🧩 Usage Example**

You can run inference using the Hugging Face `transformers` library.
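The usage example below expects `audio_arrays` to hold waveforms sampled at 16 kHz. As a minimal sketch of that preparation step, the hypothetical helper `to_mono_16k` (not part of the Simba release) downmixes a `(channels, samples)` array to mono and resamples it with plain linear interpolation. The array layout is an assumption, and linear interpolation applies no anti-aliasing filter, so a dedicated resampler such as `librosa.resample` or `torchaudio.functional.resample` is likely the better choice for real recordings.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate used in the usage example


def to_mono_16k(waveform, orig_sr, target_sr=TARGET_SR):
    """Hypothetical helper: downmix to mono and resample by linear interpolation."""
    wav = np.asarray(waveform, dtype=np.float32)
    if wav.ndim == 2:
        # Assumed layout is (channels, samples); average the channels.
        wav = wav.mean(axis=0)
    if orig_sr == target_sr or wav.size == 0:
        return wav
    # Number of output samples that covers the same duration.
    n_target = int(round(wav.shape[0] / orig_sr * target_sr))
    # Positions of the output samples, measured in input-sample units.
    src_positions = np.arange(n_target) * (orig_sr / target_sr)
    resampled = np.interp(src_positions, np.arange(wav.shape[0]), wav)
    return resampled.astype(np.float32)


# One second of synthetic stereo audio at 44.1 kHz stands in for a real clip.
stereo = np.random.default_rng(0).standard_normal((2, 44_100))
audio_arrays = [to_mono_16k(stereo, orig_sr=44_100)]
print(audio_arrays[0].shape)  # (16000,)
```

The resulting `audio_arrays` list can be passed to the processor call in the example that follows.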
```python
from transformers import (
    HubertForSequenceClassification,
    AutoFeatureExtractor,
    AutoProcessor,
)
import torch

model_id = "UBC-NLP/Simba-SLIS_49"

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = HubertForSequenceClassification.from_pretrained(model_id).to(device)

# HuBERT models can use either a processor or a feature extractor, depending on the specific model
try:
    processor = AutoProcessor.from_pretrained(model_id)
    print("Loaded Simba-SLIS_49 model with AutoProcessor")
except Exception:
    processor = AutoFeatureExtractor.from_pretrained(model_id)
    print("Loaded Simba-SLIS_49 model with AutoFeatureExtractor")

# Switch to evaluation mode for inference (disables dropout)
model.eval()

audio_arrays = []  # add your audio arrays here (waveforms sampled at 16 kHz)
sample_rate = 16000

inputs = processor(
    audio_arrays, sampling_rate=sample_rate, return_tensors="pt", padding=True
).to(device)

# Different models might have slightly different input formats
with torch.no_grad():
    try:
        logits = model(**inputs).logits
    except Exception:
        # Try the alternative input format if the first attempt fails
        if "input_values" in inputs:
            logits = model(input_values=inputs.input_values).logits
        else:
            raise

# Calculate softmax probabilities
probs = torch.nn.functional.softmax(logits, dim=-1)

# Get the maximum probability (confidence) for each prediction
confidence_values, pred_ids = torch.max(probs, dim=-1)

# Convert to Python lists
pred_ids = pred_ids.tolist()
confidence_values = confidence_values.cpu().tolist()

# Get labels from IDs
pred_labels = [model.config.id2label[i] for i in pred_ids]
print(pred_labels, confidence_values)
```

## Citation

If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find the resources on this website useful, please cite our paper.

```bibtex
@inproceedings{elmadany-etal-2025-voice,
    title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
    author = "Elmadany, AbdelRahim A. and
      Kwon, Sang Yun and
      Toyin, Hawau Olamide and
      Alcoba Inciarte, Alcides and
      Aldarmaki, Hanan and
      Abdul-Mageed, Muhammad",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.559/",
    doi = "10.18653/v1/2025.emnlp-main.559",
    pages = "11039--11061",
    ISBN = "979-8-89176-332-6",
}
```