|
|
--- |
|
|
language: |
|
|
- ak |
|
|
- tw |
|
|
- aeb |
|
|
- af |
|
|
- am |
|
|
- ar |
|
|
- bas |
|
|
- bem |
|
|
- dav |
|
|
- dyu |
|
|
- en |
|
|
- pcm |
|
|
- ee |
|
|
- fat |
|
|
- fon |
|
|
- fuc |
|
|
- ff |
|
|
- gaa |
|
|
- ha |
|
|
- ig |
|
|
- kab |
|
|
- rw |
|
|
- kln |
|
|
- ln |
|
|
- loz |
|
|
- lg |
|
|
- luo |
|
|
- mlq |
|
|
- nr |
|
|
- nso |
|
|
- ny |
|
|
- st |
|
|
- srr |
|
|
- ss |
|
|
- sus |
|
|
- sw |
|
|
- tig |
|
|
- ti |
|
|
- toi |
|
|
- tn |
|
|
- ts |
|
|
|
|
- ve |
|
|
- wo |
|
|
- xh |
|
|
- yo |
|
|
- zgh |
|
|
- zu |
|
|
|
|
|
license: cc-by-4.0 |
|
|
tags: |
|
|
- automatic-speech-recognition |
|
|
- audio |
|
|
- speech |
|
|
- african-languages |
|
|
- multilingual |
|
|
- simba |
|
|
- low-resource |
|
|
- speech-recognition |
|
|
- asr |
|
|
- spoken-language-identification |
|
|
- language-identification |
|
|
datasets: |
|
|
- UBC-NLP/SimbaBench |
|
|
metrics: |
|
|
- wer |
|
|
- cer |
|
|
- accuracy |
|
|
library_name: transformers |
|
|
pipeline_tag: automatic-speech-recognition |
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
<img src="https://africa.dlnlp.ai/simba/images/VoC_logo.png" alt="VoC Logo"> |
|
|
|
|
|
[](https://aclanthology.org/2025.emnlp-main.559/) |
|
|
[](https://africa.dlnlp.ai/simba/) |
|
|
[](https://huggingface.co/spaces/UBC-NLP/SimbaBench) |
|
|
[](https://github.com/UBC-NLP/simba) |
|
|
[](https://huggingface.co/collections/UBC-NLP/simba-speech-series) |
|
|
[](https://huggingface.co/datasets/UBC-NLP/SimbaBench_dataset) |
|
|
|
|
|
</div> |
|
|
|
|
|
## *Bridging the Digital Divide for African AI* |
|
|
|
|
|
**Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we ensure that the future of speech technology is inclusive, representative, and accessible to over a billion people. |
|
|
|
|
|
## Best-in-Class Multilingual Models |
|
|
|
|
|
<img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo"> |
|
|
|
|
|
Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state-of-the-art for African speech AI. |
|
|
|
|
|
- **Unified Suite:** Models optimized for African languages. |
|
|
- **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets. |
|
|
- **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech). |
|
|
- **Inclusion-First:** Specifically built to mitigate the "digital divide" by empowering speakers of underrepresented languages. |
|
|
|
|
|
The **Simba** family consists of state-of-the-art models fine-tuned using SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships. |
|
|
|
|
|
|
|
|
### Simba-SLID (Spoken Language Identification)
|
|
* **Task:** `Spoken Language Identification` – intelligent input routing.
|
|
* **Language Coverage (49 African languages)**
|
|
> **Akuapim Twi** (`Akuapim-twi`), **Asante Twi** (`Asante-twi`), **Tunisian Arabic** (`aeb`), **Afrikaans** (`afr`), **Amharic** (`amh`), **Arabic** (`ara`), **Basaa** (`bas`), **Bemba** (`bem`), **Taita** (`dav`), **Dyula** (`dyu`), **English** (`eng`), **Nigerian Pidgin** (`pcm`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **Pulaar** (`fuc`), **Pular** (`fuf`), **Ga** (`gaa`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabyle** (`kab`), **Kinyarwanda** (`kin`), **Kalenjin** (`kln`), **Lingala** (`lin`), **Lozi** (`loz`), **Luganda** (`lug`), **Luo** (`luo`), **Western Maninkakan** (`mlq`), **South Ndebele** (`nbl`), **Northern Sotho** (`nso`), **Chichewa** (`nya`), **Southern Sotho** (`sot`), **Serer** (`srr`), **Swati** (`ssw`), **Susu** (`sus`), **Kiswahili** (`swa`), **Swahili** (`swh`), **Tigre** (`tig`), **Tigrinya** (`tir`), **Tonga** (`toi`), **Tswana** (`tsn`), **Tsonga** (`tso`), **Twi** (`twi`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Standard Moroccan Tamazight** (`zgh`), **Zulu** (`zul`)
|
|
|
|
|
| **SLID Model** | **Architecture** | **Hugging Face Card** | **Status** | |
|
|
| :--- | :--- | :---: | :---: | |
|
|
| **Simba-SLID-49** | HuBERT | 🤗 [https://huggingface.co/UBC-NLP/Simba-SLIS-49](https://huggingface.co/UBC-NLP/Simba-SLIS-49) | ✅ Released |
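The "input routing" role described above can be sketched as a plain lookup that maps a predicted SLID label to a downstream ASR checkpoint. This is only an illustrative sketch: the `ASR_BY_LANG` mapping and all checkpoint names below are hypothetical placeholders, not released models.

```python
# Hypothetical mapping from SLID labels (ISO 639-3 codes) to ASR checkpoints.
# All checkpoint names here are placeholders for illustration only.
ASR_BY_LANG = {
    "swa": "UBC-NLP/asr-swahili-placeholder",
    "yor": "UBC-NLP/asr-yoruba-placeholder",
}

def route(slid_label, fallback="UBC-NLP/asr-multilingual-placeholder"):
    """Pick an ASR checkpoint based on the identified language."""
    return ASR_BY_LANG.get(slid_label, fallback)

print(route("yor"))  # language-specific checkpoint
print(route("zul"))  # no dedicated checkpoint: falls back to the multilingual one
```

In a real pipeline you would also gate on the SLID confidence score, falling back to the multilingual checkpoint when the prediction is uncertain.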
|
|
|
|
|
|
|
|
**Usage Example**
|
|
|
|
|
You can easily run inference using the Hugging Face `transformers` library. |
|
|
|
|
|
```python
import torch
from transformers import (
    AutoFeatureExtractor,
    AutoProcessor,
    HubertForSequenceClassification,
)

model_id = "UBC-NLP/Simba-SLIS_49"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = HubertForSequenceClassification.from_pretrained(model_id).to(device)

# HuBERT checkpoints ship with either a processor or a feature extractor.
try:
    processor = AutoProcessor.from_pretrained(model_id)
    print("Loaded Simba-SLIS_49 with AutoProcessor")
except Exception:
    processor = AutoFeatureExtractor.from_pretrained(model_id)
    print("Loaded Simba-SLIS_49 with AutoFeatureExtractor")

# Put the model in inference mode.
model.eval()

audio_arrays = []  # add your audio arrays (1-D float arrays sampled at 16 kHz)
sample_rate = 16000

inputs = processor(
    audio_arrays, sampling_rate=sample_rate, return_tensors="pt", padding=True
).to(device)

# Some checkpoints expect `input_values` explicitly rather than keyword expansion.
with torch.no_grad():
    try:
        logits = model(**inputs).logits
    except Exception as e:
        if "input_values" in inputs:
            logits = model(input_values=inputs.input_values).logits
        else:
            raise e

# Softmax probabilities over the 49 language labels.
probs = torch.nn.functional.softmax(logits, dim=-1)

# Highest probability (confidence) and its label ID for each clip.
confidence_values, pred_ids = torch.max(probs, dim=-1)

# Convert to Python lists and map label IDs to language labels.
pred_ids = pred_ids.tolist()
confidence_values = confidence_values.cpu().tolist()
pred_labels = [model.config.id2label[i] for i in pred_ids]

print(pred_labels, confidence_values)
```
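The snippet above leaves `audio_arrays` empty. As a minimal, dependency-free sketch of how to fill it, a mono 16-bit PCM WAV file can be decoded into floats in [-1, 1] with the standard-library `wave` module (the file name `sample.wav` is a placeholder; the synthetic clip is only there to make the example self-contained). In practice you would typically use `librosa` or `soundfile` instead, which also handle resampling to 16 kHz:

```python
import struct
import wave

def load_wav_as_floats(path):
    """Decode a mono 16-bit PCM WAV file into floats in [-1.0, 1.0]."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expects 16-bit PCM"
        raw = wf.readframes(wf.getnframes())
        samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
        return [s / 32768.0 for s in samples], wf.getframerate()

# Write a tiny synthetic 4-sample clip at 16 kHz, then read it back.
with wave.open("sample.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(struct.pack("<4h", 0, 16384, -16384, 32767))

audio, sr = load_wav_as_floats("sample.wav")
print(sr, [round(a, 2) for a in audio])  # prints: 16000 [0.0, 0.5, -0.5, 1.0]
```

The resulting float list (or an equivalent NumPy array) can be appended to `audio_arrays`, with `sr` passed as `sampling_rate` to the processor.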
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find these resources useful, please cite our paper.
|
|
|
|
|
```bibtex |
|
|
|
|
|
@inproceedings{elmadany-etal-2025-voice, |
|
|
title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier", |
|
|
author = "Elmadany, AbdelRahim A. and |
|
|
Kwon, Sang Yun and |
|
|
Toyin, Hawau Olamide and |
|
|
Alcoba Inciarte, Alcides and |
|
|
Aldarmaki, Hanan and |
|
|
Abdul-Mageed, Muhammad", |
|
|
editor = "Christodoulopoulos, Christos and |
|
|
Chakraborty, Tanmoy and |
|
|
Rose, Carolyn and |
|
|
Peng, Violet", |
|
|
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing", |
|
|
month = nov, |
|
|
year = "2025", |
|
|
address = "Suzhou, China", |
|
|
publisher = "Association for Computational Linguistics", |
|
|
url = "https://aclanthology.org/2025.emnlp-main.559/", |
|
|
doi = "10.18653/v1/2025.emnlp-main.559", |
|
|
pages = "11039--11061", |
|
|
ISBN = "979-8-89176-332-6", |
|
|
} |
|
|
|
|
|
``` |
|
|
|
|
|
|
|
|
|