English to Wolaytta Translation Model

A machine translation model for English → Wolaytta, an Ethiopian language spoken by roughly 2-7 million people.

This is the first publicly available English-to-Wolaytta neural machine translation model.

Model Details

Property                Value
---------------------   ---------------------------
Base Model              Helsinki-NLP/opus-mt-en-mul
Architecture            MarianMT (Transformer)
Parameters              77M
Training Data           120,608 sentence pairs
Final Validation Loss   0.3485
License                 Apache 2.0

Usage

from transformers import MarianMTModel, MarianTokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = "WellDunDun/opus-mt-en-wal"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Translate a single sentence using beam search
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=128, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # Output: "Halo, neeni waanidee?"
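
For quick experiments, the same checkpoint can also be used through the Transformers pipeline API, which bundles the loading, tokenization, generation, and decoding steps shown above into one call (a minimal sketch):

from transformers import pipeline

# The translation pipeline wraps the same MarianMT model and tokenizer
translator = pipeline("translation", model="WellDunDun/opus-mt-en-wal")
print(translator("Thank you very much", max_length=128)[0]["translation_text"])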

Example Translations

English               Wolaytta
-------------------   ---------------------
Hello, how are you?   Halo, neeni waanidee?
Thank you very much   Keehippe galatays
What is your name?    Ne sunttay aybee?

Training Data

This model was fine-tuned on the michsethowusu/english-wolaytta_sentence-pairs_mt560 dataset, which contains 120,608 English-Wolaytta parallel sentences derived from OPUS MT560.

The training data primarily comes from:

  • Bible translations
  • JW.org publications
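
The dataset can be inspected directly with the datasets library (a minimal sketch; the split and column names are not documented here, so check the dataset card for the actual schema):

from datasets import load_dataset

# Download the parallel corpus from the Hugging Face Hub
ds = load_dataset("michsethowusu/english-wolaytta_sentence-pairs_mt560")
print(ds)  # inspect the available splits and column names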

Intended Uses

  • Communication with Wolaytta speakers
  • Language learning and education
  • Research on low-resource language translation
  • Building applications for the Wolaytta-speaking community

Limitations

  • Domain bias: the training data is dominated by religious/biblical text, so output may skew toward that register
  • Casual speech: may struggle with informal expressions or slang
  • Modern vocabulary: limited coverage of technology and other contemporary topics
  • Low-resource language: Wolaytta has limited digital resources; verify important translations with native speakers

Training Procedure

Training Hyperparameters

  • Learning rate: 2e-05
  • Train batch size: 16
  • Eval batch size: 16
  • Seed: 42
  • Optimizer: AdamW (fused) with betas=(0.9,0.999) and epsilon=1e-08
  • LR scheduler: Linear
  • Epochs: 3
  • Mixed precision: Native AMP
  • Hardware: Google Colab (T4 GPU)
  • Training time: ~3 hours
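
For reference, here is a minimal sketch of how these settings map onto Seq2SeqTrainingArguments from Transformers (the output_dir is a hypothetical placeholder; the actual training script is not published):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="opus-mt-en-wal",   # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch_fused",     # AdamW (fused)
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    fp16=True,                     # native AMP (T4 GPU)
)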

Training Results

Training Loss   Epoch   Step    Validation Loss
0.6944          0.14    1000    0.6297
0.5968          0.28    2000    0.5214
0.5329          0.42    3000    0.4742
0.5116          0.56    4000    0.4459
0.4747          0.70    5000    0.4255
0.4483          0.84    6000    0.4120
0.4501          0.98    7000    0.4021
0.4275          1.12    8000    0.3899
0.4174          1.26    9000    0.3833
0.4060          1.40    10000   0.3768
0.4145          1.54    11000   0.3727
0.3968          1.68    12000   0.3675
0.3930          1.82    13000   0.3635
0.4027          1.95    14000   0.3595
0.3778          2.09    15000   0.3573
0.3732          2.23    16000   0.3556
0.3695          2.37    17000   0.3535
0.3611          2.51    18000   0.3518
0.3605          2.65    19000   0.3504
0.3639          2.79    20000   0.3491
0.3680          2.93    21000   0.3485
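
For context, the final validation loss of 0.3485 corresponds to a per-token perplexity of exp(0.3485) ≈ 1.42, assuming the reported value is the standard token-level cross-entropy computed by the Transformers trainer.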

Framework Versions

  • Transformers 4.57.3
  • PyTorch 2.9.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.1

About Wolaytta

Wolaytta (also spelled Wolayta, Wolaitta, Welayta) is a North Omotic language spoken in the Wolaita Zone of Ethiopia's Southern Nations, Nationalities, and Peoples' Region by approximately 2-7 million people.

Citation

@misc{opus_mt_en_wal_2026,
  title={English to Wolaytta Translation Model},
  author={WellDunDun},
  year={2026},
  url={https://huggingface.co/WellDunDun/opus-mt-en-wal},
  note={Fine-tuned on michsethowusu/english-wolaytta_sentence-pairs_mt560 dataset, derived from OPUS MT560}
}
