# English to Wolaytta Translation Model

A machine translation model for translating English → Wolaytta (an Ethiopian language spoken by 2-7 million people).
This is the first publicly available English-to-Wolaytta neural machine translation model.
## Model Details
| Property | Value |
|---|---|
| Base Model | Helsinki-NLP/opus-mt-en-mul |
| Architecture | MarianMT (Transformer) |
| Parameters | 77M |
| Training Data | 120,608 sentence pairs |
| Final Validation Loss | 0.3485 |
| License | Apache 2.0 |
## Usage

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "WellDunDun/opus-mt-en-wal"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=128, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # Output: "Halo, neeni waanidee?"
```
## Example Translations
| English | Wolaytta |
|---|---|
| Hello, how are you? | Halo, neeni waanidee? |
| Thank you very much | Keehippe galatays |
| What is your name? | Ne sunttay aybee? |
## Training Data
This model was fine-tuned on the michsethowusu/english-wolaytta_sentence-pairs_mt560 dataset, which contains 120,608 English-Wolaytta parallel sentences derived from OPUS MT560.
The training data primarily comes from:
- Bible translations
- JW.org publications
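The card does not state how the 120,608 pairs were divided between training and validation. A back-of-the-envelope sketch, assuming a typical ~5% held-out validation split (the 5% figure is an assumption, not from the dataset card):

```python
import math

total_pairs = 120_608  # size of the mt560 parallel corpus (from the card)
val_fraction = 0.05    # ASSUMPTION: a common ~5% held-out validation split

n_val = math.floor(total_pairs * val_fraction)
n_train = total_pairs - n_val

print(f"train: {n_train:,}  validation: {n_val:,}")
# -> train: 114,578  validation: 6,030
```

A split of roughly this size is consistent with the training log below, where 1,000 steps correspond to about 0.14 epoch at batch size 16 (≈114k training examples per epoch).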
## Intended Uses
- Communication with Wolaytta speakers
- Language learning and education
- Research on low-resource language translation
- Building applications for the Wolaytta-speaking community
## Limitations
- Domain bias: the training data is heavily religious/biblical, so quality may drop on other domains
- Casual speech: may struggle with informal expressions and slang
- Modern vocabulary: limited coverage of technology and contemporary topics
- Low-resource language: Wolaytta has limited digital resources; verify important translations with native speakers
## Training Procedure

### Training Hyperparameters
- Learning rate: 2e-05
- Train batch size: 16
- Eval batch size: 16
- Seed: 42
- Optimizer: AdamW (fused) with betas=(0.9,0.999) and epsilon=1e-08
- LR scheduler: Linear
- Epochs: 3
- Mixed precision: Native AMP
- Hardware: Google Colab (T4 GPU)
- Training time: ~3 hours
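The linear scheduler listed above decays the learning rate from its initial value to zero over training. A minimal pure-Python sketch of that schedule; the zero warmup and the ~21,429 total steps are assumptions inferred from the results table (3 epochs at roughly 7,143 steps per epoch), not stated on the card:

```python
def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Linear schedule: ramp up over warmup_steps, then decay to zero at total_steps.

    warmup_steps=0 is an assumption; the card does not mention warmup.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining

total_steps = 21_429  # rough estimate: 3 epochs x ~7,143 steps/epoch

print(linear_lr(0, total_steps))            # base LR at the start
print(linear_lr(total_steps, total_steps))  # decays to 0.0 at the end
```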
### Training Results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.6944 | 0.14 | 1000 | 0.6297 |
| 0.5968 | 0.28 | 2000 | 0.5214 |
| 0.5329 | 0.42 | 3000 | 0.4742 |
| 0.5116 | 0.56 | 4000 | 0.4459 |
| 0.4747 | 0.70 | 5000 | 0.4255 |
| 0.4483 | 0.84 | 6000 | 0.4120 |
| 0.4501 | 0.98 | 7000 | 0.4021 |
| 0.4275 | 1.12 | 8000 | 0.3899 |
| 0.4174 | 1.26 | 9000 | 0.3833 |
| 0.4060 | 1.40 | 10000 | 0.3768 |
| 0.4145 | 1.54 | 11000 | 0.3727 |
| 0.3968 | 1.68 | 12000 | 0.3675 |
| 0.3930 | 1.82 | 13000 | 0.3635 |
| 0.4027 | 1.95 | 14000 | 0.3595 |
| 0.3778 | 2.09 | 15000 | 0.3573 |
| 0.3732 | 2.23 | 16000 | 0.3556 |
| 0.3695 | 2.37 | 17000 | 0.3535 |
| 0.3611 | 2.51 | 18000 | 0.3518 |
| 0.3605 | 2.65 | 19000 | 0.3504 |
| 0.3639 | 2.79 | 20000 | 0.3491 |
| 0.3680 | 2.93 | 21000 | 0.3485 |
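A quick check on the log above: validation loss improves at every logged checkpoint, with most of the gain coming in the first epoch. Reading the table's last column into a list makes this easy to verify:

```python
# Validation losses at each 1,000-step checkpoint, copied from the table above.
val_losses = [0.6297, 0.5214, 0.4742, 0.4459, 0.4255, 0.4120, 0.4021,
              0.3899, 0.3833, 0.3768, 0.3727, 0.3675, 0.3635, 0.3595,
              0.3573, 0.3556, 0.3535, 0.3518, 0.3504, 0.3491, 0.3485]

# Strictly decreasing at every checkpoint -- no sign of overfitting yet.
assert all(later < earlier for earlier, later in zip(val_losses, val_losses[1:]))

first_epoch_gain = val_losses[0] - val_losses[6]   # checkpoints 1k-7k (~epoch 1)
total_gain = val_losses[0] - val_losses[-1]
print(f"{first_epoch_gain / total_gain:.0%} of the improvement came in epoch 1")
```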
## Framework Versions
- Transformers 4.57.3
- PyTorch 2.9.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1
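To reproduce this environment, the listed versions can be pinned at install time (a sketch; the exact CUDA build of PyTorch, `cu126`, may require the matching PyTorch index URL for your platform):

```shell
pip install "transformers==4.57.3" "datasets==4.0.0" "tokenizers==0.22.1" "torch==2.9.0"
```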
## Related Models

- Helsinki-NLP/opus-mt-wal-en - Wolaytta → English (reverse direction)
## About Wolaytta
Wolaytta (also spelled Wolayta, Wolaitta, Welayta) is a North Omotic language spoken in the Wolaita Zone of Ethiopia's Southern Nations, Nationalities, and Peoples' Region by approximately 2-7 million people.
## Citation

```bibtex
@misc{opus_mt_en_wal_2026,
  title={English to Wolaytta Translation Model},
  author={WellDunDun},
  year={2026},
  url={https://huggingface.co/WellDunDun/opus-mt-en-wal},
  note={Fine-tuned on michsethowusu/english-wolaytta_sentence-pairs_mt560 dataset, derived from OPUS MT560}
}
```
## Acknowledgments
- Helsinki-NLP for the base multilingual model
- michsethowusu for curating the parallel corpus
- OPUS MT560 for the original training data
- The Wolaytta language community