---
library_name: transformers
license: cc-by-nc-4.0
datasets:
  - google/WaxalNLP
metrics:
  - wer
  - cer
language:
  - am
  - ti
  - om
  - sid
  - wal
base_model:
  - facebook/mms-300m
pipeline_tag: automatic-speech-recognition
tags:
  - multilingual
---

📖 arXiv preprint

⚒️ Model Description

Ethio-ASR is a suite of multilingual automatic speech recognition (ASR) models supporting five Ethiopian languages: Amharic, Tigrinya, Afaan Oromo, Sidama, and Wolaytta. The model in this repository was obtained by fine-tuning the pre-trained facebook/mms-300m model on the WAXAL speech dataset.

Developed by: Ethio-ASR Team
Task: Speech Recognition (ASR) and Language Identification (LID)
Languages: Amharic, Tigrinya, Afaan Oromo, Sidama, and Wolaytta
License: CC-BY-NC 4.0
Finetuned from: facebook/mms-300m

📈 Evaluation on WAXAL Test Set

📌 marks the ASR model in this Hugging Face repository.

| Model | # Params | Amharic | Tigrinya | Oromo | Wolaytta | Sidama | Avg. |
|---|---|---|---|---|---|---|---|
| Ethio-ASR (afrihubert) | 92M | 30.95 | 42.42 | 27.57 | 40.44 | 34.02 | 35.08 |
| Ethio-ASR (mms-300) 📌 | 300M | 30.19 | 41.62 | 26.41 | 39.10 | 32.66 | 33.99 |
| Ethio-ASR (mms-1b) | 1B | 26.14 | 37.63 | 23.69 | 37.51 | 31.02 | 31.20 |
| Ethio-ASR (w2v-bert-2.0) | 600M | 22.92 | 35.22 | 24.44 | 38.19 | 31.65 | 30.48 |
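The card does not say how the Avg. column is derived. The reported values are consistent with an unweighted mean of the five per-language scores. The sketch below shows one common way to compute the WER and CER listed in the model metadata: word-level or character-level Levenshtein distance, pooled over the test set. It then reproduces the mms-300 average. The helper names are illustrative. The text normalization behind the published numbers is not documented here, so scores from this sketch may not match the table exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def corpus_error_rate(refs, hyps, unit="word"):
    """Corpus-level WER (unit='word') or CER (unit='char'), in percent.

    Assumption: no text normalization is applied, and for CER spaces count as characters.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors, total = 0, 0
    for ref, hyp in zip(refs, hyps):
        r, h = (ref.split(), hyp.split()) if unit == "word" else (list(ref), list(hyp))
        errors += edit_distance(r, h)
        total += len(r)
    return 100.0 * errors / total


# Per-language scores of the mms-300 row, copied from the table above
mms_300 = {"Amharic": 30.19, "Tigrinya": 41.62, "Oromo": 26.41, "Wolaytta": 39.10, "Sidama": 32.66}
avg = sum(mms_300.values()) / len(mms_300)  # unweighted (macro) mean over languages
print(f"Avg. (mms-300): {avg:.2f}")
```

The recomputed average agrees with the reported 33.99 to within rounding. Pooling errors over the whole test set, rather than averaging per-utterance rates, prevents short utterances from dominating the score.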

🎧 Direct Use

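```python
import torch
import torchaudio
from transformers import AutoModelForCTC, AutoProcessor

model_id = "badrex/Ethio-ASR-multilingual-300M"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

# torchaudio returns a (channels, samples) tensor and the file's sampling rate
audio, sr = torchaudio.load("audio.wav")
audio = audio.mean(dim=0)  # downmix to mono

# MMS-based models expect 16 kHz input; resample if the file uses another rate
target_sr = processor.feature_extractor.sampling_rate
if sr != target_sr:
    audio = torchaudio.functional.resample(audio, orig_freq=sr, new_freq=target_sr)

inputs = processor(audio.numpy(), sampling_rate=target_sr, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: take the most likely token per frame, then let the
# tokenizer collapse repeats and remove blanks
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]

print(transcription)
```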
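The `argmax` + `batch_decode` step above is greedy CTC decoding. The sketch below shows roughly what the tokenizer does with the per-frame IDs. It assumes a Wav2Vec2-style CTC vocabulary in which the padding token acts as the blank and `|` marks word boundaries. The function name and the toy vocabulary are hypothetical, and `processor.batch_decode` remains the authoritative decoder for this model.

```python
def ctc_greedy_collapse(frame_ids, blank_id, id_to_token, word_delimiter="|"):
    """Turn a per-frame argmax path into text, CTC style.

    1. merge consecutive repeats of the same id,
    2. drop the blank id,
    3. map ids to tokens and replace the word delimiter with a space.
    """
    tokens = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank_id:
            tokens.append(id_to_token[i])
        prev = i
    text = "".join(tokens).replace(word_delimiter, " ")
    return " ".join(text.split())  # normalize whitespace


# Hypothetical toy vocabulary (not the model's real one)
vocab = {0: "<pad>", 1: "ሰ", 2: "ላ", 3: "ም", 4: "|"}
print(ctc_greedy_collapse([0, 1, 1, 0, 2, 2, 2, 3, 0], blank_id=0, id_to_token=vocab))
```

Because repeats are merged before blanks are dropped, a blank between two identical IDs (for example `[1, 0, 1]`) is what allows a doubled character to survive decoding.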

🔧 Downstream Use

- Voice assistants
- Accessibility tools
- Research baselines

🚫 Out‑of‑Scope Use

- Languages other than Amharic, Tigrinya, Afaan Oromo, Sidama, and Wolaytta
- High‑stakes deployments without human review
- Noisy audio without prior speech enhancement

⚠️ Risks & Limitations

Performance may vary across dialects, speaker genders and ages, and recording conditions.
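To check how much performance varies across such factors on your own data, one option is a disaggregated evaluation that pools error counts separately for each speaker group. The following is a minimal sketch. The record format and group labels are hypothetical, the per-utterance error counts are assumed to come from any WER/CER tool, and this card does not describe which dialect, gender, or age metadata are available.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Aggregate per-utterance error counts into a corpus-level error rate per group.

    records: iterable of (group, num_errors, num_reference_units) tuples
             (hypothetical format; counts come from any WER/CER tool).
    Returns {group: error rate in percent}.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for group, n_err, n_ref in records:
        errors[group] += n_err
        totals[group] += n_ref
    # Skip groups with no reference units to avoid division by zero
    return {g: 100.0 * errors[g] / totals[g] for g in totals if totals[g] > 0}


# Hypothetical example: two utterances from one group, one from another
records = [("female", 3, 20), ("female", 1, 10), ("male", 6, 30)]
rates = error_rate_by_group(records)
print(rates)
```

Large gaps between groups on a reasonably sized sample would indicate where the model needs more data or human review.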

📌 Citation

```bibtex
@misc{ethio_asr_2026,
  author = {
    Abdullah, Badr M. and
    Azime, Israel Abebe and
    Tonja, Atnafu Lambebo and
    Alabi, Jesujoba O. and
    Alemu, Abel Mulat and
    Hagos, Eyob G. and
    Balcha, Bontu Fufa and
    Nerea, Mulubrhan A. and
    Yadeta, Debela Desalegn and
    Marilign, Dagnachew Mekonnen and
    Fentahun, Amanuel Temesgen and
    Kebede, Tadesse and
    Gebru, Israel D. and
    Woldeyohannis, Michael Melese and
    Sewunetie, Walelign Tewabe and
    Möbius, Bernd and
    Klakow, Dietrich
  },
  title = {Ethio-ASR: Joint Multilingual Speech Recognition and Language Identification for Ethiopian Languages},
  year = {2026},
  howpublished = {\url{https://huggingface.co/badrex/Ethio-ASR-multilingual-300M}}
}
```