---
library_name: transformers
license: apache-2.0
language:
- en
- wal
base_model: Helsinki-NLP/opus-mt-en-mul
tags:
- translation
- en-wal
- wolaytta
- ethiopian-languages
- low-resource
- marian
- opus-mt
- generated_from_trainer
datasets:
- michsethowusu/english-wolaytta_sentence-pairs_mt560
pipeline_tag: translation
model-index:
- name: opus-mt-en-wal
  results: []
---

# English to Wolaytta Translation Model

A machine translation model for translating **English → Wolaytta** (an Ethiopian language spoken by 2-7 million people).

To our knowledge, this is the first publicly available neural machine translation model dedicated to the English-to-Wolaytta direction.

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul) |
| **Architecture** | MarianMT (Transformer) |
| **Parameters** | 77M |
| **Training Data** | 120,608 sentence pairs |
| **Final Validation Loss** | 0.3485 |
| **License** | Apache 2.0 |

## Usage

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "WellDunDun/opus-mt-en-wal"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=128, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)  # Output: "Halo, neeni waanidee?"
```
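
For quick experiments, the same checkpoint can also be loaded through the `pipeline` API, which wraps the tokenizer and model shown above and accepts single strings or batches (a convenience sketch, not the only supported entry point):

```python
from transformers import pipeline

# Load the checkpoint as a translation pipeline (downloads the model on first use).
translator = pipeline("translation", model="WellDunDun/opus-mt-en-wal")

# The pipeline accepts a single string or a batch of strings.
results = translator(["Hello, how are you?", "Thank you very much"], max_length=128)
for r in results:
    print(r["translation_text"])
```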

## Example Translations

| English | Wolaytta |
|---------|----------|
| Hello, how are you? | Halo, neeni waanidee? |
| Thank you very much | Keehippe galatays |
| What is your name? | Ne sunttay aybee? |

## Training Data

This model was fine-tuned on the [michsethowusu/english-wolaytta_sentence-pairs_mt560](https://huggingface.co/datasets/michsethowusu/english-wolaytta_sentence-pairs_mt560) dataset, which contains 120,608 English-Wolaytta parallel sentences derived from [OPUS MT560](https://opus.nlpl.eu/MT560).

The training data primarily comes from:
- Bible translations
- JW.org publications

## Intended Uses

- Communication with Wolaytta speakers
- Language learning and education
- Research on low-resource language translation
- Building applications for the Wolaytta-speaking community

## Limitations

- **Domain bias**: Heavy religious/biblical content in training data
- **Casual speech**: May struggle with informal expressions or slang
- **Modern vocabulary**: Limited coverage of technology, contemporary topics
- **Low-resource language**: Wolaytta has limited digital resources; verify important translations with native speakers

## Training Procedure

### Training Hyperparameters

- **Learning rate**: 2e-05
- **Train batch size**: 16
- **Eval batch size**: 16
- **Seed**: 42
- **Optimizer**: AdamW (fused) with betas=(0.9,0.999) and epsilon=1e-08
- **LR scheduler**: Linear
- **Epochs**: 3
- **Mixed precision**: Native AMP
- **Hardware**: Google Colab (T4 GPU)
- **Training time**: ~3 hours

### Training Results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.6944 | 0.14 | 1000 | 0.6297 |
| 0.5968 | 0.28 | 2000 | 0.5214 |
| 0.5329 | 0.42 | 3000 | 0.4742 |
| 0.5116 | 0.56 | 4000 | 0.4459 |
| 0.4747 | 0.70 | 5000 | 0.4255 |
| 0.4483 | 0.84 | 6000 | 0.4120 |
| 0.4501 | 0.98 | 7000 | 0.4021 |
| 0.4275 | 1.12 | 8000 | 0.3899 |
| 0.4174 | 1.26 | 9000 | 0.3833 |
| 0.4060 | 1.40 | 10000 | 0.3768 |
| 0.4145 | 1.54 | 11000 | 0.3727 |
| 0.3968 | 1.68 | 12000 | 0.3675 |
| 0.3930 | 1.82 | 13000 | 0.3635 |
| 0.4027 | 1.95 | 14000 | 0.3595 |
| 0.3778 | 2.09 | 15000 | 0.3573 |
| 0.3732 | 2.23 | 16000 | 0.3556 |
| 0.3695 | 2.37 | 17000 | 0.3535 |
| 0.3611 | 2.51 | 18000 | 0.3518 |
| 0.3605 | 2.65 | 19000 | 0.3504 |
| 0.3639 | 2.79 | 20000 | 0.3491 |
| 0.3680 | 2.93 | 21000 | 0.3485 |

### Framework Versions

- Transformers 4.57.3
- PyTorch 2.9.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1

## Related Models

- [Helsinki-NLP/opus-mt-wal-en](https://huggingface.co/Helsinki-NLP/opus-mt-wal-en) - Wolaytta → English (reverse direction)

## About Wolaytta

Wolaytta (also spelled Wolayta, Wolaitta, Welayta) is a North Omotic language spoken in the Wolaita Zone of Ethiopia's Southern Nations, Nationalities, and Peoples' Region by approximately 2-7 million people.

## Citation

```bibtex
@misc{opus_mt_en_wal_2026,
  title={English to Wolaytta Translation Model},
  author={WellDunDun},
  year={2026},
  url={https://huggingface.co/WellDunDun/opus-mt-en-wal},
  note={Fine-tuned on michsethowusu/english-wolaytta_sentence-pairs_mt560 dataset, derived from OPUS MT560}
}
```

## Acknowledgments

- [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) for the base multilingual model
- [michsethowusu](https://huggingface.co/datasets/michsethowusu/english-wolaytta_sentence-pairs_mt560) for curating the parallel corpus
- [OPUS MT560](https://opus.nlpl.eu/MT560) for the original training data
- The Wolaytta language community