---
library_name: transformers
license: apache-2.0
language:
- en
- wal
base_model: Helsinki-NLP/opus-mt-en-mul
tags:
- translation
- en-wal
- wolaytta
- ethiopian-languages
- low-resource
- marian
- opus-mt
- generated_from_trainer
datasets:
- michsethowusu/english-wolaytta_sentence-pairs_mt560
pipeline_tag: translation
model-index:
- name: opus-mt-en-wal
results: []
---
# English to Wolaytta Translation Model
A machine translation model for translating **English → Wolaytta** (an Ethiopian language spoken by 2-7 million people).
This is the first publicly available English-to-Wolaytta neural machine translation model.
## Model Details
| Property | Value |
|----------|-------|
| **Base Model** | [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul) |
| **Architecture** | MarianMT (Transformer) |
| **Parameters** | 77M |
| **Training Data** | 120,608 sentence pairs |
| **Final Validation Loss** | 0.3485 |
| **License** | Apache 2.0 |
## Usage
```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "WellDunDun/opus-mt-en-wal"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt", padding=True)

# Beam search (num_beams=4) usually gives better translations than greedy decoding.
outputs = model.generate(**inputs, max_length=128, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # Output: "Halo, neeni waanidee?"
```
## Example Translations
| English | Wolaytta |
|---------|----------|
| Hello, how are you? | Halo, neeni waanidee? |
| Thank you very much | Keehippe galatays |
| What is your name? | Ne sunttay aybee? |
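For translating several sentences at once, the usage snippet above generalizes to a batched helper. This is a minimal sketch: `translate_batch` and its defaults are illustrative names, and in practice you would pass in the `MarianTokenizer` and `MarianMTModel` loaded as shown in the Usage section.

```python
def translate_batch(texts, tokenizer, model, num_beams=4, max_length=128):
    """Translate a list of English sentences to Wolaytta in one generate() call.

    `tokenizer` and `model` are expected to be the MarianTokenizer and
    MarianMTModel loaded from WellDunDun/opus-mt-en-wal (see Usage above).
    """
    # Padding lets sentences of different lengths share one batch.
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    outputs = model.generate(**inputs, max_length=max_length, num_beams=num_beams)
    # batch_decode strips special tokens and returns one string per input.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Example call (requires the model to be downloaded first):
# translate_batch(["Hello, how are you?", "Thank you very much"], tokenizer, model)
```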
## Training Data
This model was fine-tuned on the [michsethowusu/english-wolaytta_sentence-pairs_mt560](https://huggingface.co/datasets/michsethowusu/english-wolaytta_sentence-pairs_mt560) dataset, which contains 120,608 English-Wolaytta parallel sentences derived from [OPUS MT560](https://opus.nlpl.eu/MT560).
The training data primarily comes from:
- Bible translations
- JW.org publications
## Intended Uses
- Communication with Wolaytta speakers
- Language learning and education
- Research on low-resource language translation
- Building applications for the Wolaytta-speaking community
## Limitations
- **Domain bias**: Heavy religious/biblical content in training data
- **Casual speech**: May struggle with informal expressions or slang
- **Modern vocabulary**: Limited coverage of technology, contemporary topics
- **Low-resource language**: Wolaytta has limited digital resources; verify important translations with native speakers
## Training Procedure
### Training Hyperparameters
- **Learning rate**: 2e-05
- **Train batch size**: 16
- **Eval batch size**: 16
- **Seed**: 42
- **Optimizer**: AdamW (fused) with betas=(0.9,0.999) and epsilon=1e-08
- **LR scheduler**: Linear
- **Epochs**: 3
- **Mixed precision**: Native AMP
- **Hardware**: Google Colab (T4 GPU)
- **Training time**: ~3 hours
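The hyperparameters above roughly correspond to the following `Seq2SeqTrainingArguments` configuration. This is a hypothetical reconstruction, not the actual training script: `output_dir` and the eval/logging cadence are assumptions (the cadence is inferred from the 1,000-step rows in the results table below).

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the listed hyperparameters; output_dir,
# eval_strategy, and eval_steps are assumptions, not taken from the card.
args = Seq2SeqTrainingArguments(
    output_dir="opus-mt-en-wal",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch_fused",     # AdamW (fused), default betas/epsilon
    lr_scheduler_type="linear",
    num_train_epochs=3,
    fp16=True,                     # native AMP on the T4
    eval_strategy="steps",
    eval_steps=1000,
    predict_with_generate=True,
)
```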
### Training Results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.6944 | 0.14 | 1000 | 0.6297 |
| 0.5968 | 0.28 | 2000 | 0.5214 |
| 0.5329 | 0.42 | 3000 | 0.4742 |
| 0.5116 | 0.56 | 4000 | 0.4459 |
| 0.4747 | 0.70 | 5000 | 0.4255 |
| 0.4483 | 0.84 | 6000 | 0.4120 |
| 0.4501 | 0.98 | 7000 | 0.4021 |
| 0.4275 | 1.12 | 8000 | 0.3899 |
| 0.4174 | 1.26 | 9000 | 0.3833 |
| 0.4060 | 1.40 | 10000 | 0.3768 |
| 0.4145 | 1.54 | 11000 | 0.3727 |
| 0.3968 | 1.68 | 12000 | 0.3675 |
| 0.3930 | 1.82 | 13000 | 0.3635 |
| 0.4027 | 1.95 | 14000 | 0.3595 |
| 0.3778 | 2.09 | 15000 | 0.3573 |
| 0.3732 | 2.23 | 16000 | 0.3556 |
| 0.3695 | 2.37 | 17000 | 0.3535 |
| 0.3611 | 2.51 | 18000 | 0.3518 |
| 0.3605 | 2.65 | 19000 | 0.3504 |
| 0.3639 | 2.79 | 20000 | 0.3491 |
| 0.3680 | 2.93 | 21000 | 0.3485 |
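A quick arithmetic sanity check on this schedule (assuming the Epoch column is step divided by steps per epoch, with no gradient accumulation): the final row implies roughly 7,167 optimizer steps per epoch, i.e. about 114.7k training examples at batch size 16. That is somewhat below the 120,608 total pairs, consistent with a small held-out validation split, though the exact split is not stated in this card.

```python
# Back-of-the-envelope check of the schedule, using the last results row.
# Assumption: logged Epoch = step / steps_per_epoch, no gradient accumulation.
total_pairs = 120_608                 # full dataset size from the card
batch_size = 16                       # per-device train batch size
last_step, last_epoch = 21_000, 2.93  # final row of the results table

steps_per_epoch = last_step / last_epoch               # about 7,167
implied_train_examples = steps_per_epoch * batch_size  # about 114,700

print(round(steps_per_epoch), round(implied_train_examples))
```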
### Framework Versions
- Transformers 4.57.3
- PyTorch 2.9.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1
## Related Models
- [Helsinki-NLP/opus-mt-wal-en](https://huggingface.co/Helsinki-NLP/opus-mt-wal-en) - Wolaytta → English (reverse direction)
## About Wolaytta
Wolaytta (also spelled Wolayta, Wolaitta, Welayta) is a North Omotic language spoken in the Wolaita Zone of Ethiopia's Southern Nations, Nationalities, and Peoples' Region by approximately 2-7 million people.
## Citation
```bibtex
@misc{opus_mt_en_wal_2026,
  title  = {English to Wolaytta Translation Model},
  author = {WellDunDun},
  year   = {2026},
  url    = {https://huggingface.co/WellDunDun/opus-mt-en-wal},
  note   = {Fine-tuned on michsethowusu/english-wolaytta_sentence-pairs_mt560 dataset, derived from OPUS MT560}
}
```
## Acknowledgments
- [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) for the base multilingual model
- [michsethowusu](https://huggingface.co/datasets/michsethowusu/english-wolaytta_sentence-pairs_mt560) for curating the parallel corpus
- [OPUS MT560](https://opus.nlpl.eu/MT560) for the original training data
- The Wolaytta language community