---
library_name: transformers
language:
- mt
- en
license: cc-by-nc-sa-4.0
base_model: google/mt5-small
datasets:
- MLRS/OPUS-MT-EN-Fixed
model-index:
- name: mt5-small_opus100-eng-mlt
  results:
  - task:
      type: machine-translation
      name: Machine Translation
    dataset:
      type: opus100-eng-mlt
      name: MLRS/OPUS-MT-EN-Fixed
    metrics:
    - type: bleu
      value: 51.00
      name: BLEU
    - type: chrf
      value: 75.40
      name: ChrF
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance with the license and for non-commercial use ONLY: checkbox
---

# mT5-Small (OPUS-100 English→Maltese)

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the [MLRS/OPUS-MT-EN-Fixed](https://huggingface.co/datasets/MLRS/OPUS-MT-EN-Fixed) dataset.
It achieves the following results on the test set:
- Loss: 0.5395
- BLEU: 0.5175
  - Brevity Penalty: 0.9870
  - Length Ratio: 0.9871
  - Translation Length: 41331
  - Reference Length: 41873
- ChrF: 75.9261
  - Char Order: 6
  - Word Order: 0
  - Beta: 2
- Gen Len: 51.54

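The brevity penalty and length ratio above follow the standard corpus-level BLEU definitions, so they can be re-derived from the reported translation and reference lengths alone. A quick sanity check (pure Python, no external libraries):

```python
import math

# Corpus-level lengths reported in the test-set results above.
translation_length = 41331
reference_length = 41873

# Length ratio: hypothesis length divided by reference length.
length_ratio = translation_length / reference_length  # ~0.9871, as reported

# Brevity penalty (standard BLEU definition): 1 if the hypothesis is
# longer than the reference, otherwise exp(1 - ref_len / hyp_len).
if translation_length > reference_length:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_length / translation_length)
# ~0.98697, matching the reported 0.9870

print(length_ratio, brevity_penalty)
```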
## Intended uses & limitations

The model is fine-tuned on a specific task, so it should only be used for the same or a similar task.
Any limitations present in the base model are inherited.

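A minimal inference sketch with 🤗 Transformers is shown below. The repository id is an assumption inferred from the model name and the MLRS organisation, and no source-side task prefix is assumed; since the model is gated, you must accept the terms on the model page and authenticate before downloading.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical repository id, inferred from the model name above.
model_id = "MLRS/mt5-small_opus100-eng-mlt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate one English sentence into Maltese.
inputs = tokenizer("The weather is beautiful today.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```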
## Training procedure

The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_seq2seq.py).

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 10.0

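As an illustrative sketch, these settings map onto 🤗 Transformers `Seq2SeqTrainingArguments` roughly as follows. Argument names follow recent Transformers releases; `output_dir` is a placeholder, and the actual fine-tuning script may configure things differently.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the hyperparameters listed above.
args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_opus100-eng-mlt",  # placeholder
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,  # 32 x 4 = 128 total train batch size
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=10.0,
    predict_with_generate=True,
)
```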
### Training results

| Training Loss | Epoch | Step | Validation Loss | BLEU | Brevity Penalty | Length Ratio | Translation Length | Reference Length | ChrF | Char Order | Word Order | Beta | Gen Len |
|:-------------:|:------:|:-----:|:---------------:|:------:|:---------------:|:------------:|:------------------:|:----------------:|:-------:|:----------:|:----------:|:----:|:-------:|
| 0.7973 | 1.0 | 7813 | 0.7208 | 0.4470 | 0.9888 | 0.9889 | 43489 | 43979 | 71.3941 | 6 | 0 | 2 | 54.3435 |
| 0.6534 | 2.0 | 15626 | 0.6406 | 0.4712 | 0.9931 | 0.9931 | 43675 | 43979 | 72.7865 | 6 | 0 | 2 | 54.1575 |
| 0.5785 | 3.0 | 23439 | 0.6027 | 0.4804 | 0.9937 | 0.9937 | 43703 | 43979 | 73.6939 | 6 | 0 | 2 | 54.501 |
| 0.5336 | 4.0 | 31252 | 0.5779 | 0.4900 | 0.9937 | 0.9937 | 43704 | 43979 | 74.1543 | 6 | 0 | 2 | 54.535 |
| 0.5034 | 5.0 | 39065 | 0.5617 | 0.4995 | 1.0 | 1.0004 | 43998 | 43979 | 74.7266 | 6 | 0 | 2 | 54.694 |
| 0.4797 | 6.0 | 46878 | 0.5501 | 0.4985 | 0.9897 | 0.9898 | 43530 | 43979 | 74.7707 | 6 | 0 | 2 | 54.3215 |
| 0.4576 | 7.0 | 54691 | 0.5458 | 0.5050 | 0.9921 | 0.9921 | 43632 | 43979 | 75.0066 | 6 | 0 | 2 | 54.259 |
| 0.4424 | 8.0 | 62504 | 0.5369 | 0.5062 | 0.9914 | 0.9915 | 43604 | 43979 | 75.0734 | 6 | 0 | 2 | 54.286 |
| 0.4287 | 9.0 | 70317 | 0.5358 | 0.5107 | 0.9875 | 0.9876 | 43434 | 43979 | 75.3841 | 6 | 0 | 2 | 54.1655 |
| 0.417 | 9.9988 | 78120 | 0.5350 | 0.5100 | 0.9868 | 0.9869 | 43404 | 43979 | 75.4033 | 6 | 0 | 2 | 54.058 |

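The ChrF configuration reported here (char order 6, word order 0, beta 2) corresponds to sacreBLEU's default chrF metric. Assuming `sacrebleu` is installed, the metric can be reproduced as in this sketch; the example sentences are placeholders, not outputs of this model.

```python
from sacrebleu.metrics import CHRF

# chrF with the reported configuration: character n-grams up to order 6,
# no word n-grams, beta = 2 (sacreBLEU's defaults).
chrf = CHRF(char_order=6, word_order=0, beta=2)

hypotheses = ["a system translation"]        # placeholder system output
references = [["a reference translation"]]   # placeholder reference(s)
print(chrf.corpus_score(hypotheses, references))
```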

### Framework versions

- Transformers 4.48.1
- PyTorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0

## License

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:

```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt and
      Borg, Claudia",
    editor = "Che, Wanxiang and
      Nabende, Joyce and
      Shutova, Ekaterina and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```