---
language:
- am # Amharic
- ar # Arabic
- tw # Asante Twi
- bm # Bambara
- fr # French
- lg # Ganda
- ha # Hausa
- ig # Igbo
- rw # Kinyarwanda
- kg # Kongo
- ln # Lingala
- lu # Luba-Katanga
- mg # Malagasy
- nso # Northern Sotho
- ny # Nyanja
- om # Oromo
- pt # Portuguese
- sn # Shona
- so # Somali
- st # Southern Sotho
- sw # Swahili
- ss # Swati
- ti # Tigrinya
- ts # Tsonga
- tn # Tswana
- ak # Twi
- ve # Venda
- wo # Wolof
- xh # Xhosa
- yo # Yoruba
- zu # Zulu
- tzm # Tamazight
- sg # Sango
- din # Dinka
- ee # Ewe
- fon # Fon
- luo # Luo
- mos # Mossi
- umb # Umbundu
license: cc-by-4.0
tags:
- automatic-speech-recognition
- audio
- speech
- african-languages
- multilingual
- simba
- low-resource
- speech-recognition
- asr
datasets:
- UBC-NLP/SimbaBench
metrics:
- wer
- cer
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
<div align="center">

<img src="https://africa.dlnlp.ai/simba/images/VoC_simba" alt="VoC Simba Models Logo">

[Paper](https://aclanthology.org/2025.emnlp-main.559/) ·
[Project website](https://africa.dlnlp.ai/simba/) ·
[SimbaBench Space](https://huggingface.co/spaces/UBC-NLP/SimbaBench) ·
[GitHub](https://github.com/UBC-NLP/simba) ·
[Model collection](https://huggingface.co/collections/UBC-NLP/simba-speech-series) ·
[Dataset](https://huggingface.co/datasets/UBC-NLP/SimbaBench_dataset)

</div>

## *Bridging the Digital Divide for African AI*

**Voice of a Continent** is a comprehensive open-source ecosystem designed to bring African languages to the forefront of artificial intelligence. By providing a unified suite of benchmarking tools and state-of-the-art models, we aim to make speech technology inclusive, representative, and accessible to over a billion people.

## Best-in-Class Multilingual Models

Introduced in our EMNLP 2025 paper *[Voice of a Continent](https://aclanthology.org/2025.emnlp-main.559/)*, the **Simba Series** represents the current state of the art for African speech AI.

- **Unified Suite:** Models optimized for African languages.
- **Superior Accuracy:** Outperforms generic multilingual models by leveraging SimbaBench's high-quality, domain-diverse datasets.
- **Multitask Capability:** Designed for high performance in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech).
- **Inclusion-First:** Built specifically to narrow the "digital divide" by empowering speakers of underrepresented languages.

The **Simba** family consists of state-of-the-art models fine-tuned on SimbaBench. These models achieve superior performance by leveraging dataset quality, domain diversity, and language family relationships.

### 🗣️✍️ Simba-ASR

> **The New Standard for African Speech-to-Text**

**🎯 Task** `Automatic Speech Recognition`: powering high-accuracy transcription across the continent.

**Language Coverage (43 African languages)**

> **Amharic** (`amh`), **Arabic** (`ara`), **Asante Twi** (`asanti`), **Bambara** (`bam`), **Baoulé** (`bau`), **Bemba** (`bem`), **Ewe** (`ewe`), **Fanti** (`fat`), **Fon** (`fon`), **French** (`fra`), **Ganda** (`lug`), **Hausa** (`hau`), **Igbo** (`ibo`), **Kabiye** (`kab`), **Kinyarwanda** (`kin`), **Kongo** (`kon`), **Lingala** (`lin`), **Luba-Katanga** (`lub`), **Luo** (`luo`), **Malagasy** (`mlg`), **Mossi** (`mos`), **Northern Sotho** (`nso`), **Nyanja** (`nya`), **Oromo** (`orm`), **Portuguese** (`por`), **Shona** (`sna`), **Somali** (`som`), **Southern Sotho** (`sot`), **Swahili** (`swa`), **Swati** (`ssw`), **Tigrinya** (`tir`), **Tsonga** (`tso`), **Tswana** (`tsn`), **Twi** (`twi`), **Umbundu** (`umb`), **Venda** (`ven`), **Wolof** (`wol`), **Xhosa** (`xho`), **Yoruba** (`yor`), **Zulu** (`zul`), **Tamazight** (`tzm`), **Sango** (`sag`), **Dinka** (`din`).

**🏗️ Base Architectures**

- **Simba-S** (SeamlessM4T-v2-MT): *Top Performer*
- **Simba-W** (Whisper-v3-large)
- **Simba-X** (Wav2Vec2-XLS-R-2b)
- **Simba-M** (MMS-1b-all)
- **Simba-H** (AfriHuBERT)

Explore the Frontier

| **ASR Models** | **Architecture** | **#Parameters** | **🤗 Hugging Face Model Card** | **Status** |
|---------|:------------------:|:------------------:|:------------------:|:------------------:|
| 🔥**Simba-S**🔥 | SeamlessM4T-v2 | 2.3B | 🤗 [https://huggingface.co/UBC-NLP/Simba-S](https://huggingface.co/UBC-NLP/Simba-S) | ✅ Released |
| 🔥**Simba-W**🔥 | Whisper | 1.5B | 🤗 [https://huggingface.co/UBC-NLP/Simba-W](https://huggingface.co/UBC-NLP/Simba-W) | ✅ Released |
| 🔥**Simba-X**🔥 | Wav2Vec2 | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-X](https://huggingface.co/UBC-NLP/Simba-X) | ✅ Released |
| 🔥**Simba-M**🔥 | MMS | 1B | 🤗 [https://huggingface.co/UBC-NLP/Simba-M](https://huggingface.co/UBC-NLP/Simba-M) | ✅ Released |
| 🔥**Simba-H**🔥 | HuBERT | 94M | 🤗 [https://huggingface.co/UBC-NLP/Simba-H](https://huggingface.co/UBC-NLP/Simba-H) | ✅ Released |

* **Simba-S** emerged as the best-performing ASR model overall.
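
For scripts that switch between checkpoints, the table above can be captured in a small lookup. The following is a minimal sketch rather than part of the Simba release: `SimbaCheckpoint`, `SIMBA_ASR`, and `load_asr` are hypothetical names, the sizes are copied from the table, and the adapter name is the one the usage example on this card loads for Simba-M.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SimbaCheckpoint:
    """One row of the model table (hypothetical helper, not a Simba API)."""
    repo_id: str                   # Hugging Face repository listed in the table
    architecture: str
    parameters: str                # size as reported in the table
    adapter: Optional[str] = None  # adapter to load after building the pipeline


# Values copied from the table; only Simba-M is documented as needing an adapter.
SIMBA_ASR = {
    "Simba-S": SimbaCheckpoint("UBC-NLP/Simba-S", "SeamlessM4T-v2", "2.3B"),
    "Simba-W": SimbaCheckpoint("UBC-NLP/Simba-W", "Whisper", "1.5B"),
    "Simba-X": SimbaCheckpoint("UBC-NLP/Simba-X", "Wav2Vec2", "1B"),
    "Simba-M": SimbaCheckpoint("UBC-NLP/Simba-M", "MMS", "1B",
                               adapter="multilingual_african"),
    "Simba-H": SimbaCheckpoint("UBC-NLP/Simba-H", "HuBERT", "94M"),
}


def load_asr(name: str):
    """Build an ASR pipeline for a checkpoint, loading its adapter if it has one."""
    # Imported lazily so the lookup itself works without transformers installed.
    from transformers import pipeline

    checkpoint = SIMBA_ASR[name]
    asr = pipeline("automatic-speech-recognition", model=checkpoint.repo_id)
    if checkpoint.adapter is not None:
        asr.model.load_adapter(checkpoint.adapter)
    return asr
```

Calling `load_asr("Simba-M")` would then apply the adapter step automatically, while the other four checkpoints skip it.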

**🧩 Usage Example**

You can run inference with the Hugging Face `transformers` library.

```python
import librosa
from transformers import pipeline

# Choose one Simba checkpoint: `UBC-NLP/Simba-S`, `UBC-NLP/Simba-W`,
# `UBC-NLP/Simba-X`, `UBC-NLP/Simba-H`, or `UBC-NLP/Simba-M`.
model_id = "UBC-NLP/Simba-S"

# Load the selected model for ASR
asr_pipeline = pipeline("automatic-speech-recognition", model=model_id)

# Load the multilingual African adapter (only for `UBC-NLP/Simba-M`)
if model_id == "UBC-NLP/Simba-M":
    asr_pipeline.model.load_adapter("multilingual_african")

# Transcribe audio from a file path or URL
result = asr_pipeline("https://africa.dlnlp.ai/simba/audio/afr_Lwazi_afr_test_idx3889.wav")
print(result["text"])

# Transcribe audio from an audio array.
# `audio_array` is a 1-D float NumPy array of mono samples at 16 kHz. librosa is
# one way to load it (an assumption, any equivalent loader works), and
# "audio.wav" is a placeholder path.
audio_array, _ = librosa.load("audio.wav", sr=16_000)
result = asr_pipeline({
    "array": audio_array,
    "sampling_rate": 16_000
})
print(result["text"])
```
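
The array form of the call expects mono samples together with their sampling rate. As a rough illustration of what that input looks like, the sketch below reads a 16-bit PCM WAV file with only the Python standard library and averages its channels into one mono stream. `read_wav_mono` is a hypothetical helper, not part of `transformers` or the Simba release, and it deliberately handles only 16-bit PCM files.

```python
import sys
import wave
from array import array


def read_wav_mono(path):
    """Read a 16-bit PCM WAV file and return (mono float samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        pcm = array("h")  # signed 16-bit integers
        pcm.frombytes(wav.readframes(wav.getnframes()))

    # WAV samples are little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()

    # Samples are interleaved per frame: average each frame's channels
    # and scale the result into the range [-1, 1).
    mono = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    return mono, sample_rate
```

The returned list can be wrapped with `numpy.asarray(samples, dtype="float32")` and passed as `"array"`. Pass the rate returned by the helper as `"sampling_rate"` rather than hard-coding `16_000`, so the value always matches the audio.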

#### Example Outputs

Using the same audio file with different Simba models:

```python
# Simba-S
{'text': 'watter verontwaardiging sou daar, in ons binneste gewees het.'}
```

```python
# Simba-W
{'text': 'watter veronwaardigingsel daar, in ons binneste gewees het.'}
```

```python
# Simba-X
{'text': 'fator fr on ar taamsodr is'}
```

```python
# Simba-M
{'text': 'watter veronwaardiging sodaar in ons binniste gewees het'}
```

```python
# Simba-H
{'text': 'watter vironwaardiging so daar in ons binneste geweeshet'}
```
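
This card lists `wer` and `cer` as its metrics. To make the differences between the outputs above concrete, the sketch below computes both with a plain Levenshtein distance. Two assumptions are worth flagging: the reference transcript for this clip is not given here, so the Simba-S output serves as a stand-in reference and the numbers measure disagreement with Simba-S rather than true error rates, and the `normalize` step (lowercasing and stripping ASCII punctuation) is an illustrative choice that may differ from the normalization used in the paper's evaluation.

```python
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Outputs copied from the examples above.
outputs = {
    "Simba-S": "watter verontwaardiging sou daar, in ons binneste gewees het.",
    "Simba-W": "watter veronwaardigingsel daar, in ons binneste gewees het.",
    "Simba-X": "fator fr on ar taamsodr is",
    "Simba-M": "watter veronwaardiging sodaar in ons binniste gewees het",
    "Simba-H": "watter vironwaardiging so daar in ons binneste geweeshet",
}

# Stand-in reference (assumption): the Simba-S output, not a gold transcript.
stand_in = normalize(outputs["Simba-S"])
for name, text in outputs.items():
    hyp = normalize(text)
    print(f"{name}: WER={wer(stand_in, hyp):.3f} CER={cer(stand_in, hyp):.3f}")
```

For a real evaluation, replace `stand_in` with the ground-truth transcript of each clip and aggregate the edit distances over the whole test set before dividing.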

Get started with Simba models in minutes using our interactive Colab notebook: [simba_models.ipynb](https://github.com/UBC-NLP/simba/blob/main/simba_models.ipynb)

## Citation

If you use the Simba models or the SimbaBench benchmark in a scientific publication, or if you find the resources on this website useful, please cite our paper.

```bibtex
@inproceedings{elmadany-etal-2025-voice,
    title = "Voice of a Continent: Mapping {A}frica{'}s Speech Technology Frontier",
    author = "Elmadany, AbdelRahim A. and
      Kwon, Sang Yun and
      Toyin, Hawau Olamide and
      Alcoba Inciarte, Alcides and
      Aldarmaki, Hanan and
      Abdul-Mageed, Muhammad",
    editor = "Christodoulopoulos, Christos and
      Chakraborty, Tanmoy and
      Rose, Carolyn and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.559/",
    doi = "10.18653/v1/2025.emnlp-main.559",
    pages = "11039--11061",
    ISBN = "979-8-89176-332-6",
}
```