---
library_name: transformers
license: apache-2.0
language:
- en
- wal
base_model: Helsinki-NLP/opus-mt-en-mul
tags:
- translation
- en-wal
- wolaytta
- ethiopian-languages
- low-resource
- marian
- opus-mt
- generated_from_trainer
datasets:
- michsethowusu/english-wolaytta_sentence-pairs_mt560
pipeline_tag: translation
model-index:
- name: opus-mt-en-wal
  results: []
---

# English to Wolaytta Translation Model

A machine translation model for **English → Wolaytta**, an Ethiopian language spoken by an estimated 2-7 million people. This is the first publicly available English-to-Wolaytta neural machine translation model.

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul) |
| **Architecture** | MarianMT (Transformer) |
| **Parameters** | 77M |
| **Training Data** | 120,608 sentence pairs |
| **Final Validation Loss** | 0.3485 |
| **License** | Apache 2.0 |

## Usage

```python
from transformers import MarianMTModel, MarianTokenizer

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
model_name = "WellDunDun/opus-mt-en-wal"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the English source sentence
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt", padding=True)

# Generate the translation with beam search
outputs = model.generate(**inputs, max_length=128, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(translation)
# Output: "Halo, neeni waanidee?"
```

## Example Translations

| English | Wolaytta |
|---------|----------|
| Hello, how are you? | Halo, neeni waanidee? |
| Thank you very much | Keehippe galatays |
| What is your name? | Ne sunttay aybee? |

## Training Data

This model was fine-tuned on the [michsethowusu/english-wolaytta_sentence-pairs_mt560](https://huggingface.co/datasets/michsethowusu/english-wolaytta_sentence-pairs_mt560) dataset, which contains 120,608 English-Wolaytta parallel sentences derived from [OPUS MT560](https://opus.nlpl.eu/MT560).

The training data primarily comes from:

- Bible translations
- JW.org publications

## Intended Uses

- Communication with Wolaytta speakers
- Language learning and education
- Research on low-resource language translation
- Building applications for the Wolaytta-speaking community

## Limitations

- **Domain bias**: Heavy religious/biblical content in training data
- **Casual speech**: May struggle with informal expressions or slang
- **Modern vocabulary**: Limited coverage of technology and contemporary topics
- **Low-resource language**: Wolaytta has limited digital resources; verify important translations with native speakers

## Training Procedure

### Training Hyperparameters

- **Learning rate**: 2e-05
- **Train batch size**: 16
- **Eval batch size**: 16
- **Seed**: 42
- **Optimizer**: AdamW (fused) with betas=(0.9,0.999) and epsilon=1e-08
- **LR scheduler**: Linear
- **Epochs**: 3
- **Mixed precision**: Native AMP
- **Hardware**: Google Colab (T4 GPU)
- **Training time**: ~3 hours

### Training Results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 0.6944 | 0.14 | 1000 | 0.6297 |
| 0.5968 | 0.28 | 2000 | 0.5214 |
| 0.5329 | 0.42 | 3000 | 0.4742 |
| 0.5116 | 0.56 | 4000 | 0.4459 |
| 0.4747 | 0.70 | 5000 | 0.4255 |
| 0.4483 | 0.84 | 6000 | 0.4120 |
| 0.4501 | 0.98 | 7000 | 0.4021 |
| 0.4275 | 1.12 | 8000 | 0.3899 |
| 0.4174 | 1.26 | 9000 | 0.3833 |
| 0.4060 | 1.40 | 10000 | 0.3768 |
| 0.4145 | 1.54 | 11000 | 0.3727 |
| 0.3968 | 1.68 | 12000 | 0.3675 |
| 0.3930 | 1.82 | 13000 | 0.3635 |
| 0.4027 | 1.95 | 14000 | 0.3595 |
| 0.3778 | 2.09 | 15000 | 0.3573 |
| 0.3732 | 2.23 | 16000 | 0.3556 |
| 0.3695 | 2.37 | 17000 | 0.3535 |
| 0.3611 | 2.51 | 18000 | 0.3518 |
| 0.3605 | 2.65 | 19000 | 0.3504 |
| 0.3639 | 2.79 | 20000 | 0.3491 |
| 0.3680 | 2.93 | 21000 | 0.3485 |

### Framework Versions

- Transformers 4.57.3
- PyTorch 2.9.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1

## Related Models

- [Helsinki-NLP/opus-mt-wal-en](https://huggingface.co/Helsinki-NLP/opus-mt-wal-en) - Wolaytta → English (reverse direction)

## About Wolaytta

Wolaytta (also spelled Wolayta, Wolaitta, Welayta) is a North Omotic language spoken in the Wolaita Zone of southern Ethiopia (formerly part of the Southern Nations, Nationalities, and Peoples' Region) by approximately 2-7 million people.

## Citation

```bibtex
@misc{opus_mt_en_wal_2026,
  title={English to Wolaytta Translation Model},
  author={WellDunDun},
  year={2026},
  url={https://huggingface.co/WellDunDun/opus-mt-en-wal},
  note={Fine-tuned on michsethowusu/english-wolaytta_sentence-pairs_mt560 dataset, derived from OPUS MT560}
}
```

## Acknowledgments

- [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) for the base multilingual model
- [michsethowusu](https://huggingface.co/datasets/michsethowusu/english-wolaytta_sentence-pairs_mt560) for curating the parallel corpus
- [OPUS MT560](https://opus.nlpl.eu/MT560) for the original training data
- The Wolaytta language community
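
## Notes on the Training Log

As a rough consistency check on the training results table, the step and epoch columns can be combined with the train batch size to estimate how many sentence pairs were seen per epoch. The sketch below uses only figures reported in this card. It assumes a single device and no gradient accumulation, neither of which is stated here, and the function name `estimate_examples_per_epoch` is purely illustrative.

```python
# Figures taken from this card's hyperparameters and training results table.
TRAIN_BATCH_SIZE = 16
FINAL_STEP = 21_000
FINAL_EPOCH = 2.93      # epoch value logged at the final step (rounded in the log)
TOTAL_PAIRS = 120_608   # size of the full parallel corpus


def estimate_examples_per_epoch(step: int, epoch: float, batch_size: int) -> float:
    """Estimate training examples per epoch from a logged (step, epoch) pair.

    Assumes each optimizer step consumes exactly `batch_size` examples,
    i.e. one device and no gradient accumulation (an assumption, not
    something this card states).
    """
    steps_per_epoch = step / epoch
    return steps_per_epoch * batch_size


train_examples = estimate_examples_per_epoch(FINAL_STEP, FINAL_EPOCH, TRAIN_BATCH_SIZE)
not_trained_on = TOTAL_PAIRS - train_examples

print(f"Estimated training pairs per epoch: {train_examples:,.0f}")
print(f"Estimated pairs not used for training: {not_trained_on:,.0f}")
print(f"Estimated training share: {train_examples / TOTAL_PAIRS:.1%}")
```

Under these assumptions the estimate comes out at roughly 115,000 training pairs per epoch, or about 95% of the 120,608 pairs, which would be consistent with a small held-out validation split. Because the epoch column is rounded to two decimals, the figure is approximate.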