---
library_name: transformers
license: apache-2.0
language:
- en
- wal
base_model: Helsinki-NLP/opus-mt-en-mul
tags:
- translation
- en-wal
- wolaytta
- ethiopian-languages
- low-resource
- marian
- opus-mt
- generated_from_trainer
datasets:
- michsethowusu/english-wolaytta_sentence-pairs_mt560
pipeline_tag: translation
model-index:
- name: opus-mt-en-wal
  results: []
---

# English to Wolaytta Translation Model

A machine translation model that translates **English → Wolaytta**, an Omotic language of southern Ethiopia with an estimated 2-7 million speakers.

To the author's knowledge, this is the first publicly available neural machine translation model dedicated to the English-to-Wolaytta direction.

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul) |
| **Architecture** | MarianMT (Transformer) |
| **Parameters** | 77M |
| **Training Data** | 120,608 sentence pairs |
| **Final Validation Loss** | 0.3485 |
| **License** | Apache 2.0 |

## Usage

```python
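from transformers import MarianMTModel, MarianTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model_name = "WellDunDun/opus-mt-en-wal"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = "Hello, how are you?"
# Tokenize the English source sentence into PyTorch tensors
inputs = tokenizer(text, return_tensors="pt", padding=True)
# Generate the translation with beam search (4 beams, at most 128 tokens)
outputs = model.generate(**inputs, max_length=128, num_beams=4)
# Convert the generated token IDs back to text, dropping special tokens
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # Output: "Halo, neeni waanidee?"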
```
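
To translate several sentences at once, the same calls can be wrapped in a small batching helper. This is a minimal sketch, not part of the released model: `chunked` and `translate_batch` are hypothetical helper names, and the default `batch_size=8` is an arbitrary choice. The function expects the `tokenizer` and `model` objects loaded in the snippet above.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(
    texts: Sequence[str],
    tokenizer,
    model,
    batch_size: int = 8,
    max_length: int = 128,
    num_beams: int = 4,
) -> List[str]:
    """Translate `texts` in batches, preserving the input order."""
    translations: List[str] = []
    for batch in chunked(texts, batch_size):
        # Pad to the longest sentence in the batch; truncate overlong inputs
        inputs = tokenizer(
            list(batch), return_tensors="pt", padding=True, truncation=True
        )
        outputs = model.generate(
            **inputs, max_length=max_length, num_beams=num_beams
        )
        # batch_decode returns one string per generated sequence
        translations.extend(
            tokenizer.batch_decode(outputs, skip_special_tokens=True)
        )
    return translations
```

For example, `translate_batch(["Thank you very much", "What is your name?"], tokenizer, model)` returns a list with one Wolaytta string per input sentence.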

## Example Translations

| English | Wolaytta |
|---------|----------|
| Hello, how are you? | Halo, neeni waanidee? |
| Thank you very much | Keehippe galatays |
| What is your name? | Ne sunttay aybee? |

## Training Data

This model was fine-tuned on the [michsethowusu/english-wolaytta_sentence-pairs_mt560](https://huggingface.co/datasets/michsethowusu/english-wolaytta_sentence-pairs_mt560) dataset, which contains 120,608 English-Wolaytta parallel sentences derived from [OPUS MT560](https://opus.nlpl.eu/MT560).

The training data primarily comes from:
- Bible translations
- JW.org publications

## Intended Uses

- Communication with Wolaytta speakers
- Language learning and education
- Research on low-resource language translation
- Building applications for the Wolaytta-speaking community

## Limitations

- **Domain bias**: The training data is dominated by religious text (Bible translations and JW.org publications), so output may skew toward religious vocabulary and register
- **Casual speech**: May struggle with informal expressions and slang
- **Modern vocabulary**: Limited coverage of technology and other contemporary topics
- **Low-resource language**: Wolaytta has few digital resources, so verify important translations with native speakers

## Training Procedure

### Training Hyperparameters

- **Learning rate**: 2e-05
- **Train batch size**: 16
- **Eval batch size**: 16
- **Seed**: 42
- **Optimizer**: AdamW (fused) with betas=(0.9,0.999) and epsilon=1e-08
- **LR scheduler**: Linear
- **Epochs**: 3
- **Mixed precision**: Native AMP
- **Hardware**: Google Colab (T4 GPU)
- **Training time**: ~3 hours
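
The hyperparameters above map roughly onto Hugging Face `Seq2SeqTrainingArguments` as sketched below. This is a reconstruction under stated assumptions, not the author's actual training script: `output_dir` is a hypothetical path, `fp16=True` is assumed to be what "Native AMP" refers to on a T4, and the evaluation interval of 1000 steps is inferred from the results table.

```python
# Keyword arguments mirroring the hyperparameters listed in this section.
TRAINING_KWARGS = {
    "output_dir": "opus-mt-en-wal",      # hypothetical output path
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "optim": "adamw_torch_fused",        # AdamW (fused)
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
    "fp16": True,                        # assumption: Native AMP in fp16; needs a CUDA GPU
    "eval_strategy": "steps",            # assumption: inferred from the results table
    "eval_steps": 1000,                  # assumption: inferred from the results table
}


def build_training_args():
    """Build the arguments object; transformers is imported lazily."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**TRAINING_KWARGS)
```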

### Training Results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.6944        | 0.14   | 1000  | 0.6297          |
| 0.5968        | 0.28   | 2000  | 0.5214          |
| 0.5329        | 0.42   | 3000  | 0.4742          |
| 0.5116        | 0.56   | 4000  | 0.4459          |
| 0.4747        | 0.70   | 5000  | 0.4255          |
| 0.4483        | 0.84   | 6000  | 0.4120          |
| 0.4501        | 0.98   | 7000  | 0.4021          |
| 0.4275        | 1.12   | 8000  | 0.3899          |
| 0.4174        | 1.26   | 9000  | 0.3833          |
| 0.4060        | 1.40   | 10000 | 0.3768          |
| 0.4145        | 1.54   | 11000 | 0.3727          |
| 0.3968        | 1.68   | 12000 | 0.3675          |
| 0.3930        | 1.82   | 13000 | 0.3635          |
| 0.4027        | 1.95   | 14000 | 0.3595          |
| 0.3778        | 2.09   | 15000 | 0.3573          |
| 0.3732        | 2.23   | 16000 | 0.3556          |
| 0.3695        | 2.37   | 17000 | 0.3535          |
| 0.3611        | 2.51   | 18000 | 0.3518          |
| 0.3605        | 2.65   | 19000 | 0.3504          |
| 0.3639        | 2.79   | 20000 | 0.3491          |
| 0.3680        | 2.93   | 21000 | 0.3485          |
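
Two quantities can be read off the table with simple arithmetic. The sketch below assumes a single GPU with no gradient accumulation (so one step consumes 16 examples) and that the reported loss is the mean per-token cross-entropy in nats; neither is stated explicitly in this card. The epoch column is rounded to two decimals, so the results are approximate.

```python
import math

# Values taken from the last table row and the sections above
final_step, final_epoch = 21_000, 2.93
train_batch_size = 16
total_pairs = 120_608
final_val_loss = 0.3485

# Optimizer steps per epoch, and the training examples that implies
steps_per_epoch = final_step / final_epoch
examples_per_epoch = steps_per_epoch * train_batch_size

# Share of the full dataset seen per epoch (the rest was presumably held out)
train_fraction = examples_per_epoch / total_pairs

# Token-level perplexity corresponding to the final validation loss
perplexity = math.exp(final_val_loss)

print(f"steps per epoch    ~ {steps_per_epoch:.0f}")
print(f"examples per epoch ~ {examples_per_epoch:.0f}")
print(f"train fraction     ~ {train_fraction:.2%}")
print(f"perplexity         ~ {perplexity:.3f}")
```

Under these assumptions, roughly 95% of the 120,608 pairs were used for training, and the final validation loss corresponds to a perplexity of about 1.42.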

### Framework Versions

- Transformers 4.57.3
- PyTorch 2.9.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1

## Related Models

- [Helsinki-NLP/opus-mt-wal-en](https://huggingface.co/Helsinki-NLP/opus-mt-wal-en) - Wolaytta → English (reverse direction)

## About Wolaytta

Wolaytta (also spelled Wolayta, Wolaitta, Welayta) is a North Omotic language spoken mainly in the Wolaita Zone of southern Ethiopia, now part of the South Ethiopia Regional State (formerly within the Southern Nations, Nationalities, and Peoples' Region). Speaker estimates range from roughly 2 to 7 million.

## Citation

```bibtex
@misc{opus_mt_en_wal_2026,
  title={English to Wolaytta Translation Model},
  author={WellDunDun},
  year={2026},
  url={https://huggingface.co/WellDunDun/opus-mt-en-wal},
  note={Fine-tuned on michsethowusu/english-wolaytta_sentence-pairs_mt560 dataset, derived from OPUS MT560}
}
```

## Acknowledgments

- [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) for the base multilingual model
- [michsethowusu](https://huggingface.co/datasets/michsethowusu/english-wolaytta_sentence-pairs_mt560) for curating the parallel corpus
- [OPUS MT560](https://opus.nlpl.eu/MT560) for the original training data
- The Wolaytta language community