metadata
library_name: transformers
license: cc-by-nc-4.0
datasets:
- google/WaxalNLP
metrics:
- wer
- cer
language:
- am
- ti
- om
base_model:
- facebook/mms-300m
pipeline_tag: automatic-speech-recognition
tags:
- multilingual
arXiv: [preprint]
Model Description
Ethio-ASR is a suite of multilingual Automatic Speech Recognition (ASR) models that support five Ethiopian languages: Amharic, Tigrinya, Afaan Oromo, Sidama, and Wolaytta. The ASR model in this repo was obtained by fine-tuning the pre-trained mms-300m model on the WAXAL Speech Dataset.
- Developed by: Ethio-ASR Team
- Task: Speech Recognition (ASR) and Language Identification (LID)
- Languages: Amharic, Tigrinya, Afaan Oromo, Sidama, and Wolaytta
- License: CC-BY-NC 4.0
- Finetuned from: facebook/mms-300m
Evaluation on WAXAL Test Set
Bold: ASR model in this HF repo
| Model | # Params | Amharic | Tigrinya | Oromo | Wolaytta | Sidama | Avg. |
|---|---|---|---|---|---|---|---|
| Ethio-ASR (afrihubert) | 92M | 30.95 | 42.42 | 27.57 | 40.44 | 34.02 | 35.08 |
| **Ethio-ASR (mms-300)** | 300M | 30.19 | 41.62 | 26.41 | 39.10 | 32.66 | 33.99 |
| Ethio-ASR (mms-1b) | 1B | 26.14 | 37.63 | 23.69 | 37.51 | 31.02 | 31.20 |
| Ethio-ASR (w2v-bert-2.0) | 600M | 22.92 | 35.22 | 24.44 | 38.19 | 31.65 | 30.48 |
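The card's metadata lists WER and CER as metrics, although the table does not say which of the two it reports. As a rough illustration of how such scores are usually computed, the sketch below implements both with a plain Levenshtein distance. The function names and the example strings are hypothetical, and the Ethio-ASR evaluation may apply text normalization that is not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent: word-level edits divided by reference words."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent: character-level edits divided by reference characters."""
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical reference/hypothesis pair, for illustration only
reference = "the cat sat down"
hypothesis = "the cat sit down"
print(f"WER: {wer(reference, hypothesis):.2f}")  # 1 wrong word out of 4
print(f"CER: {cer(reference, hypothesis):.2f}")  # 1 wrong character out of 16
```

Both functions assume a non-empty reference. Scores over a whole test set are normally computed from summed edit counts rather than by averaging per-utterance rates.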
Direct Use
from transformers import AutoModelForCTC, AutoProcessor
import torch
import torchaudio
model_id = "badrex/Ethio-ASR-multilingual-300M"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)
# Load the audio, downmix to mono, and resample to 16 kHz (the rate MMS models expect)
audio, sr = torchaudio.load("audio.wav")
audio = audio.mean(dim=0)
if sr != 16_000:
    audio = torchaudio.functional.resample(audio, sr, 16_000)
inputs = processor(audio.numpy(), sampling_rate=16_000, return_tensors="pt")
# Greedy CTC decoding: pick the most likely token at every frame
with torch.no_grad():
    logits = model(**inputs).logits
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
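The `argmax` plus `batch_decode` step above is greedy CTC decoding. A minimal pure-Python sketch of what that decoding does is shown below, assuming the usual CTC convention that repeated frame labels are collapsed and a blank symbol is dropped. The function name, the four-symbol vocabulary, and `blank_id=0` are hypothetical; the real model ships its own vocabulary, and its processor handles further details such as word delimiters.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated ids, drop CTC blanks, and map the remaining ids to tokens."""
    tokens = []
    prev = None
    for idx in frame_ids:
        # Emit a token only when the label changes and is not the blank
        if idx != prev and idx != blank_id:
            tokens.append(id_to_token[idx])
        prev = idx
    return "".join(tokens)


# Hypothetical vocabulary and per-frame argmax ids, for illustration only
vocab = {0: "<pad>", 1: "a", 2: "b", 3: " "}
frames = [1, 1, 0, 1, 2, 2, 0, 3, 3, 2]
print(ctc_greedy_decode(frames, vocab))  # prints "aab b"
```

The blank between the two runs of `1` is what lets the decoder output a doubled "a" instead of collapsing both runs into one.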
Downstream Use
- Voice assistants
- Accessibility tools
- Research baselines
Out-of-Scope Use
- Languages outside Amharic, Tigrinya, Afaan Oromo, Sidama, and Wolaytta
- High-stakes deployments without human review
- Noisy audio without speech enhancement
Risks & Limitations
Performance may vary across dialects, speaker gender and age, and recording quality.
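One way to check for such variation on your own data is to break error rates down by speaker or recording metadata. The sketch below aggregates corpus-level WER per group from per-utterance error counts. The function name, the group labels, and the counts are all made up for illustration and are not results from this model.

```python
from collections import defaultdict


def wer_by_group(records):
    """Corpus-level WER (%) per group from (group, word_errors, reference_words) tuples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, n_errors, n_ref_words in records:
        errors[group] += n_errors
        words[group] += n_ref_words
    # Divide summed errors by summed reference words within each group
    return {g: 100.0 * errors[g] / words[g] for g in words}


# Made-up per-utterance counts, for illustration only
records = [
    ("dialect_a", 3, 20),
    ("dialect_a", 1, 20),
    ("dialect_b", 6, 30),
]
for group, score in sorted(wer_by_group(records).items()):
    print(f"{group}: {score:.1f}")
```

Summing counts before dividing keeps long utterances from being underweighted, which a plain average of per-utterance rates would do.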
Citation
@misc{ethio_asr_2026,
author = {
Abdullah, Badr M. and
Azime, Israel Abebe and
Tonja, Atnafu Lambebo and
Alabi, Jesujoba O. and
Alemu, Abel Mulat and
Hagos, Eyob G. and
Balcha, Bontu Fufa and
Nerea, Mulubrhan A. and
Yadeta, Debela Desalegn and
Marilign, Dagnachew Mekonnen and
Fentahun, Amanuel Temesgen and
Kebede, Tadesse and
Gebru, Israel D. and
Woldeyohannis, Michael Melese and
Sewunetie, Walelign Tewabe and
Möbius, Bernd and
Klakow, Dietrich
},
title = {Ethio-ASR: Joint Multilingual Speech Recognition and Language Identification for Ethiopian Languages},
year = {2026},
howpublished = {\url{https://huggingface.co/badrex/Ethio-ASR-multilingual-300M}}
}