Leyu on Hugging Face
Welcome to the official Leyu by gheero organization on Hugging Face!
Featured Datasets (Leyu Amharic Dialects)
- Leyu-amharic-shewa-dialect
- Leyu-amharic-wello-dialect
- Leyu-amharic-gonder-dialect
- Leyu-amharic-gojjam-dialect
About the Datasets
Our datasets are a specialized collection of speech audio focused on low-resource African languages, currently emphasizing dialects of Ethiopian local languages. Designed primarily for Speech-to-Text (STT) research, the corpus captures the unique phonetic nuances and rhythmic patterns of different dialects.
The audio was recorded in real-world environments by contributors using mobile devices, providing diverse acoustic conditions that help train robust models. Every recording undergoes manual review, in which designated reviewers verify that the transcript matches the audio and that the audio is clear.
To support inclusive and representative AI systems, we prioritized demographic diversity across the collection:
- Gender Balance: balanced representation of male and female voices
- Age Distribution: 18–35 years
- Regional Diversity: native speakers from the specific regional zones of each dialect
- Technical Environment: mobile-recorded in real-world conditions (background noise, varied microphones)
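As a rough illustration of how the demographic criteria above might be checked, the following is a minimal sketch in Python. The record layout and the field names (`dialect`, `gender`, `age`) are assumptions made for this example and may not match the actual schema of the datasets; the sample records are invented placeholders, not real data.

```python
from collections import Counter

# Hypothetical metadata records. The field names and values are
# assumptions for illustration only, not the datasets' real schema.
records = [
    {"dialect": "shewa", "gender": "female", "age": 24},
    {"dialect": "shewa", "gender": "male", "age": 31},
    {"dialect": "wello", "gender": "female", "age": 19},
    {"dialect": "wello", "gender": "male", "age": 35},
]


def summarize(records, min_age=18, max_age=35):
    """Count speakers by gender and dialect, and collect out-of-range ages."""
    gender_counts = Counter(r["gender"] for r in records)
    dialect_counts = Counter(r["dialect"] for r in records)
    # Records whose age falls outside the stated 18-35 range.
    out_of_range = [r for r in records if not (min_age <= r["age"] <= max_age)]
    return {
        "gender": dict(gender_counts),
        "dialect": dict(dialect_counts),
        "out_of_range": out_of_range,
    }


summary = summarize(records)
print(summary)
```

A check like this could be run on whatever speaker metadata accompanies a dataset, after mapping its actual column names onto the hypothetical ones used here.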
gheero Blogs
Explore more about our work on low-resource languages, dialect research, and inclusive AI development:
- Leyu: Crowdsourcing Datasets for Ethiopian Languages
- Dialects & Socioeconomics: Shaping Inclusive Language Models
- Progress of Natural Language Processing (NLP) for Ethiopian Languages – Part One
- Progress of Natural Language Processing (NLP) for Ethiopian Languages – Part Two
- Data Collection with Purpose: The Leyu Approach