---
language:
  - ak  # Akuapim Twi
  - tw  # Asante Twi
  - aeb # Tunisian Arabic
  - af  # Afrikaans
  - am  # Amharic
  - ar  # Arabic
  - bas # Basaa
  - bem # Bemba
  - dav # Taita
  - dyu # Dyula
  - en  # English
  - pcm # Nigerian Pidgin
  - ee  # Ewe
  - fat # Fanti
  - fon # Fon
  - fuc # Pulaar
  - ff  # Pular
  - gaa # Ga
  - ha  # Hausa
  - ig  # Igbo
  - kab # Kabyle
  - rw  # Kinyarwanda
  - kln # Kalenjin
  - ln  # Lingala
  - loz # Lozi
  - lg  # Luganda
  - luo # Luo
  - mlq # Western Maninkakan
  - nr  # South Ndebele
  - nso # Northern Sotho
  - ny  # Chichewa
  - st  # Southern Sotho
  - srr # Serer
  - ss  # Swati
  - sus # Susu
  - sw  # Kiswahili/Swahili
  - tig # Tigre
  - ti  # Tigrinya
  - toi # Tonga
  - tn  # Tswana
  - ts  # Tsonga
  - tw  # Twi
  - ve  # Venda
  - wo  # Wolof
  - xh  # Xhosa
  - yo  # Yoruba
  - zgh # Standard Moroccan Tamazight
  - zu  # Zulu

license: cc-by-4.0
tags:
  - automatic-speech-recognition
  - audio
  - speech
  - african-languages
  - multilingual
  - simba
  - low-resource
  - speech-recognition
  - asr
  - spoken-language-identification
  - language-identification
datasets:
  - UBC-NLP/SimbaBench
metrics:
  - wer
  - cer
  - accuracy
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

<div align="center">

<img src="https://africa.dlnlp.ai/simba/images/VoC_logo.png" alt="VoC Logo">

[![EMNLP 2025 Paper](https://img.shields.io/badge/EMNLP_2025-Paper-B31B1B?style=for-the-badge&logo=arxiv&logoColor=B31B1B&labelColor=FFCDD2)](https://aclanthology.org/2025.emnlp-main.559/)
[![Official Website](https://img.shields.io/badge/Official-Website-2EA44F?style=for-the-badge&logo=googlechrome&logoColor=2EA44F&labelColor=C8E6C9)](https://africa.dlnlp.ai/simba/)
[![SimbaBench](https://img.shields.io/badge/SimbaBench-Benchmark-8A2BE2?style=for-the-badge&logo=googlecharts&logoColor=8A2BE2&labelColor=E1BEE7)](https://huggingface.co/spaces/UBC-NLP/SimbaBench)
[![GitHub Repository](https://img.shields.io/badge/GitHub-Repository-181717?style=for-the-badge&logo=github&logoColor=181717&labelColor=E0E0E0)](https://github.com/UBC-NLP/simba)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-FFD21E?style=for-the-badge&logoColor=181717&labelColor=FFF9C4)](https://huggingface.co/collections/UBC-NLP/simba-speech-series)
[![Hugging Face Dataset](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-FFD21E?style=for-the-badge&logoColor=181717&labelColor=FFF9C4)](https://huggingface.co/datasets/UBC-NLP/SimbaBench_dataset)

</div>

## *Bridging the Digital Divide for African AI*

**Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people.

## Best-in-Class Multilingual Models

<img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">

Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state-of-the-art for African speech AI.

- **Unified Suite:** Models optimized for African languages.
- **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages.

The **Simba** family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.


### 🔍 Simba-SLID (Spoken Language Identification)
* **🎯 Task:** `Spoken Language Identification` — Intelligent input routing.
* **🌍 Language Coverage (49 African languages)**
  > **Akuapim Twi** (`Akuapim-twi`), **Asante Twi** (`Asante-twi`), **Tunisian Arabic** (`aeb`), **Afrikaans** (`afr`), **Amharic** (`amh`), **Arabic** (`ara`), **Basaa** (`bas`), **Bemba** (`bem`), **Taita** (`dav`), **Dyula** (`dyu`), **English** (`eng`), **Nigerian Pidgin** (`eng-zul`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **Pulaar** (`fuc`), **Pular** (`fuf`), **Ga** (`gaa`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabyle** (`kab`), **Kinyarwanda** (`kin`), **Kalenjin** (`kln`), **Lingala** (`lin`), **Lozi** (`loz`), **Luganda** (`lug`), **Luo** (`luo`), **Western Maninkakan** (`mlq`), **South Ndebele** (`nbl`), **Northern Sotho** (`nso`), **Chichewa** (`nya`), **Southern Sotho** (`sot`), **Serer** (`srr`), **Swati** (`ssw`), **Susu** (`sus`), **Kiswahili** (`swa`), **Swahili** (`swh`), **Tigre** (`tig`), **Tigrinya** (`tir`), **Tonga** (`toi`), **Tswana** (`tsn`), **Tsonga** (`tso`), **Twi** (`twi`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Standard Moroccan Tamazight** (`zgh`), **Zulu** (`zul`)

| **SLID Model** | **Architecture** | **Hugging Face Card** | **Status** |
| :--- | :--- | :---: | :---: |
| **Simba-SLID-49** 🔍 | HuBERT | 🤗 [https://huggingface.co/UBC-NLP/Simba-SLIS-49](https://huggingface.co/UBC-NLP/Simba-SLIS-49) | ✅ Released |


**🧩 Usage Example**

You can easily run inference using the Hugging Face `transformers` library.

```python
from transformers import (
    HubertForSequenceClassification,
    AutoFeatureExtractor,
    AutoProcessor
)
import torch

model_id = "UBC-NLP/Simba-SLIS_49"
# Use a GPU when one is available; fall back to CPU otherwise
device = "cuda" if torch.cuda.is_available() else "cpu"
model = HubertForSequenceClassification.from_pretrained(model_id).to(device)
# HuBERT models can use either processor or feature extractor depending on the specific model
try:
    processor = AutoProcessor.from_pretrained(model_id)
    print("Loaded Simba-SLIS_49 model with AutoProcessor")
except Exception:
    processor = AutoFeatureExtractor.from_pretrained(model_id)
    print("Loaded Simba-SLIS_49 model with AutoFeatureExtractor")

# Switch to evaluation mode (disables dropout)
model.eval()
audio_arrays = []  # add your audio arrays here (waveforms sampled at 16 kHz)
sample_rate = 16000

inputs = processor(audio_arrays, sampling_rate=sample_rate, return_tensors="pt", padding=True).to(device)

# Different models might have slightly different input formats
with torch.no_grad():  # no gradients are needed for inference
    try:
        logits = model(**inputs).logits
    except Exception:
        # Try alternative input format if the first attempt fails
        if "input_values" in inputs:
            logits = model(input_values=inputs.input_values).logits
        else:
            raise

# Calculate softmax probabilities
probs = torch.nn.functional.softmax(logits, dim=-1)

# Get the maximum probability (confidence) for each prediction
confidence_values, pred_ids = torch.max(probs, dim=-1)

# Convert to Python lists
pred_ids = pred_ids.tolist()
confidence_values = confidence_values.cpu().tolist()
# Get labels from IDs
pred_labels = [model.config.id2label[i] for i in pred_ids]


print(pred_labels, confidence_values)
```
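
The "input routing" role mentioned above can be sketched as a small post-processing step on `pred_labels` and `confidence_values`. This is only an illustration, not part of the released code: the function name `route_by_language`, the threshold value, the fallback name, and the downstream model names are all hypothetical, and the label strings are assumed to match the language codes listed in this card.

```python
def route_by_language(labels, confidences, asr_models, threshold=0.5,
                      fallback="multilingual-asr"):
    """Pick a downstream model name for each clip from its predicted language.

    A clip goes to a language-specific model only when the prediction is
    confident enough AND a model is registered for that label; otherwise
    it goes to the fallback.
    """
    routes = []
    for label, confidence in zip(labels, confidences):
        if confidence >= threshold and label in asr_models:
            routes.append(asr_models[label])
        else:
            routes.append(fallback)
    return routes


# Hypothetical values in the same shape as `pred_labels` / `confidence_values`
example_labels = ["swh", "yor", "wol"]
example_confidences = [0.97, 0.41, 0.88]

# Hypothetical mapping from language label to a downstream ASR model name
example_asr_models = {"swh": "asr-swahili", "yor": "asr-yoruba"}

print(route_by_language(example_labels, example_confidences, example_asr_models))
# → ['asr-swahili', 'multilingual-asr', 'multilingual-asr']
```

The second clip falls back because its confidence is below the threshold, and the third because no model is registered for its label. A suitable threshold depends on your data and should be tuned on held-out audio.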

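The usage example leaves `audio_arrays` empty. One way to fill it, assuming your clips are already 16-bit PCM WAV files recorded at 16 kHz, is the standard-library sketch below. The helper name `load_wav_mono_16k` is hypothetical. Other formats or sample rates would need a dedicated audio library for decoding and resampling, which this sketch does not attempt.

```python
import array
import sys
import wave


def load_wav_mono_16k(path, expected_rate=16000):
    """Read a 16-bit PCM WAV file and return mono samples as floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz audio, got {wav.getframerate()} Hz"
            )
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Average the channels of each frame to obtain a mono signal
    mono = [
        sum(samples[i:i + channels]) / channels
        for i in range(0, len(samples), channels)
    ]
    # Scale integers to floating point in [-1.0, 1.0)
    return [value / 32768.0 for value in mono]
```

With a helper like this, the placeholder could become `audio_arrays = [load_wav_mono_16k("clip.wav")]`, where `clip.wav` is a stand-in for your own file.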

## Citation

If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find the resources on this website useful, please cite our paper.

```bibtex

@inproceedings{elmadany-etal-2025-voice,
    title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
    author = "Elmadany, AbdelRahim A.  and
      Kwon, Sang Yun  and
      Toyin, Hawau Olamide  and
      Alcoba Inciarte, Alcides  and
      Aldarmaki, Hanan  and
      Abdul-Mageed, Muhammad",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.559/",
    doi = "10.18653/v1/2025.emnlp-main.559",
    pages = "11039--11061",
    ISBN = "979-8-89176-332-6",
}

```