# English to Wolaytta Translation Model

A machine translation model for translating English → Wolaytta (an Ethiopian language spoken by 2-7 million people).
This is the first publicly available English-to-Wolaytta neural machine translation model.
## Model Details
| Property | Value |
|---|---|
| Base Model | Helsinki-NLP/opus-mt-en-mul |
| Architecture | MarianMT (Transformer) |
| Parameters | 77M |
| Training Data | 120,608 sentence pairs |
| Final Validation Loss | 0.3485 |
| License | Apache 2.0 |
## Usage

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "WellDunDun/opus-mt-en-wal"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=128, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # Output: "Halo, neeni waanidee?"
```
## Example Translations
| English | Wolaytta |
|---|---|
| Hello, how are you? | Halo, neeni waanidee? |
| Thank you very much | Keehippe galatays |
| What is your name? | Ne sunttay aybee? |
## Training Data
This model was fine-tuned on the michsethowusu/english-wolaytta_sentence-pairs_mt560 dataset, which contains 120,608 English-Wolaytta parallel sentences derived from OPUS MT560.
The training data primarily comes from:
- Bible translations
- JW.org publications
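The card does not state how the 120,608 pairs were divided between training and validation. A back-of-the-envelope sketch, assuming a typical ~5% held-out validation split (the 5% figure is an assumption, not from the dataset card):

```python
import math

total_pairs = 120_608  # size of the mt560 parallel corpus (from the card)
val_fraction = 0.05    # ASSUMPTION: a common ~5% held-out validation split

n_val = math.floor(total_pairs * val_fraction)
n_train = total_pairs - n_val

print(f"train: {n_train:,}  validation: {n_val:,}")
# -> train: 114,578  validation: 6,030
```

A split of roughly this size is consistent with the training log below, where 1,000 steps correspond to about 0.14 epoch at batch size 16 (≈114k training examples per epoch).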
## Intended Uses
- Communication with Wolaytta speakers
- Language learning and education
- Research on low-resource language translation
- Building applications for the Wolaytta-speaking community
## Limitations
- Domain bias: the training data is heavily religious/biblical, so quality may drop on other domains
- Casual speech: may struggle with informal expressions and slang
- Modern vocabulary: limited coverage of technology and contemporary topics
- Low-resource language: Wolaytta has limited digital resources; verify important translations with native speakers
## Training Procedure

### Training Hyperparameters
- Learning rate: 2e-05
- Train batch size: 16
- Eval batch size: 16
- Seed: 42
- Optimizer: AdamW (fused) with betas=(0.9,0.999) and epsilon=1e-08
- LR scheduler: Linear
- Epochs: 3
- Mixed precision: Native AMP
- Hardware: Google Colab (T4 GPU)
- Training time: ~3 hours
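The linear scheduler listed above decays the learning rate from its initial value to zero over training. A minimal pure-Python sketch of that schedule; the zero warmup and the ~21,429 total steps are assumptions inferred from the results table (3 epochs at roughly 7,143 steps per epoch), not stated on the card:

```python
def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=0):
    """Linear schedule: ramp up over warmup_steps, then decay to zero at total_steps.

    warmup_steps=0 is an assumption; the card does not mention warmup.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining

total_steps = 21_429  # rough estimate: 3 epochs x ~7,143 steps/epoch

print(linear_lr(0, total_steps))            # base LR at the start
print(linear_lr(total_steps, total_steps))  # decays to 0.0 at the end
```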
### Training Results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.6944 | 0.14 | 1000 | 0.6297 |
| 0.5968 | 0.28 | 2000 | 0.5214 |
| 0.5329 | 0.42 | 3000 | 0.4742 |
| 0.5116 | 0.56 | 4000 | 0.4459 |
| 0.4747 | 0.70 | 5000 | 0.4255 |
| 0.4483 | 0.84 | 6000 | 0.4120 |
| 0.4501 | 0.98 | 7000 | 0.4021 |
| 0.4275 | 1.12 | 8000 | 0.3899 |
| 0.4174 | 1.26 | 9000 | 0.3833 |
| 0.4060 | 1.40 | 10000 | 0.3768 |
| 0.4145 | 1.54 | 11000 | 0.3727 |
| 0.3968 | 1.68 | 12000 | 0.3675 |
| 0.3930 | 1.82 | 13000 | 0.3635 |
| 0.4027 | 1.95 | 14000 | 0.3595 |
| 0.3778 | 2.09 | 15000 | 0.3573 |
| 0.3732 | 2.23 | 16000 | 0.3556 |
| 0.3695 | 2.37 | 17000 | 0.3535 |
| 0.3611 | 2.51 | 18000 | 0.3518 |
| 0.3605 | 2.65 | 19000 | 0.3504 |
| 0.3639 | 2.79 | 20000 | 0.3491 |
| 0.3680 | 2.93 | 21000 | 0.3485 |
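A quick check on the log above: validation loss improves at every logged checkpoint, with most of the gain coming in the first epoch. Reading the table's last column into a list makes this easy to verify:

```python
# Validation losses at each 1,000-step checkpoint, copied from the table above.
val_losses = [0.6297, 0.5214, 0.4742, 0.4459, 0.4255, 0.4120, 0.4021,
              0.3899, 0.3833, 0.3768, 0.3727, 0.3675, 0.3635, 0.3595,
              0.3573, 0.3556, 0.3535, 0.3518, 0.3504, 0.3491, 0.3485]

# Strictly decreasing at every checkpoint -- no sign of overfitting yet.
assert all(later < earlier for earlier, later in zip(val_losses, val_losses[1:]))

first_epoch_gain = val_losses[0] - val_losses[6]   # checkpoints 1k-7k (~epoch 1)
total_gain = val_losses[0] - val_losses[-1]
print(f"{first_epoch_gain / total_gain:.0%} of the improvement came in epoch 1")
```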
## Framework Versions
- Transformers 4.57.3
- PyTorch 2.9.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1
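To reproduce this environment, the listed versions can be pinned at install time (a sketch; the exact CUDA build of PyTorch, `cu126`, may require the matching PyTorch index URL for your platform):

```shell
pip install "transformers==4.57.3" "datasets==4.0.0" "tokenizers==0.22.1" "torch==2.9.0"
```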
## Related Models

- Helsinki-NLP/opus-mt-wal-en - Wolaytta → English (reverse direction)
## About Wolaytta
Wolaytta (also spelled Wolayta, Wolaitta, Welayta) is a North Omotic language spoken in the Wolaita Zone of Ethiopia's Southern Nations, Nationalities, and Peoples' Region by approximately 2-7 million people.
## Citation

```bibtex
@misc{opus_mt_en_wal_2026,
  title={English to Wolaytta Translation Model},
  author={WellDunDun},
  year={2026},
  url={https://huggingface.co/WellDunDun/opus-mt-en-wal},
  note={Fine-tuned on michsethowusu/english-wolaytta_sentence-pairs_mt560 dataset, derived from OPUS MT560}
}
```
## Acknowledgments
- Helsinki-NLP for the base multilingual model
- michsethowusu for curating the parallel corpus
- OPUS MT560 for the original training data
- The Wolaytta language community