---
library_name: transformers
language:
- mt
- en
license: cc-by-nc-sa-4.0
base_model: google/mt5-small
datasets:
- MLRS/OPUS-MT-EN-Fixed
model-index:
- name: mt5-small_opus100-eng-mlt
  results:
  - task:
      type: machine-translation
      name: Machine Translation
    dataset:
      type: opus100-eng-mlt
      name: MLRS/OPUS-MT-EN-Fixed
    metrics:
    - type: bleu
      value: 51.00
      name: BLEU
    - type: chrf
      value: 75.40
      name: ChrF
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance to the license and for non-commercial use ONLY: checkbox
---
# mT5-Small (OPUS-100 English→Maltese)
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the [MLRS/OPUS-MT-EN-Fixed](https://huggingface.co/datasets/MLRS/OPUS-MT-EN-Fixed) dataset.
It achieves the following results on the test set:
- Loss: 0.5395
- BLEU:
  - BLEU: 0.5175
  - Brevity Penalty: 0.9870
  - Length Ratio: 0.9871
  - Translation Length: 41331
  - Reference Length: 41873
- ChrF:
  - Score: 75.9261
  - Char Order: 6
  - Word Order: 0
  - Beta: 2
- Gen Len: 51.54
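The brevity penalty and length ratio above follow directly from the reported lengths: BLEU multiplies the n-gram precision score by exp(1 − reference_length / translation_length) whenever the candidate output is shorter than the references. A quick check against the figures above:

```python
import math

translation_length = 41331  # tokens produced by the model on the test set
reference_length = 41873    # tokens in the reference translations

# BLEU's brevity penalty: 1.0 if the candidate is at least as long as
# the reference, otherwise exp(1 - r/c) penalises overly short output.
if translation_length >= reference_length:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_length / translation_length)

length_ratio = translation_length / reference_length

print(round(brevity_penalty, 4))  # 0.987
print(round(length_ratio, 4))     # 0.9871
```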
## Intended uses & limitations
The model is fine-tuned on a specific task, so it should only be used for the same or a similar task.
Any limitations present in the base model are inherited.
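As a sketch of how the model might be called for English→Maltese translation, assuming the standard `transformers` translation pipeline. The repository ID below is an assumption inferred from the model name in this card, and the generation settings are illustrative:

```python
# Hypothetical repository ID, inferred from the model name in this card.
MODEL_ID = "MLRS/mt5-small_opus100-eng-mlt"


def translate_en_to_mt(texts, model_id=MODEL_ID, max_length=128):
    """Translate a list of English sentences into Maltese.

    The transformers import is kept inside the function so the sketch
    can be read and imported without the library installed.
    """
    from transformers import pipeline

    translator = pipeline("translation", model=model_id, max_length=max_length)
    return [out["translation_text"] for out in translator(texts)]
```

Calling `translate_en_to_mt(["The weather is nice today."])` downloads the checkpoint on first use; check the repository ID before relying on it.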
## Training procedure
The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_seq2seq.py).
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 10.0
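The effective batch size follows from the per-device batch size and gradient accumulation (assuming a single device). As an illustrative mapping, these settings correspond roughly to the following `Seq2SeqTrainingArguments` field names from the `transformers` API:

```python
# Hyperparameters from this card, keyed by the corresponding
# Seq2SeqTrainingArguments field names (illustrative mapping only).
training_args = {
    "learning_rate": 1e-3,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "optim": "adafactor",
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10.0,
}

# Gradients are accumulated over 4 steps of 32 examples each,
# so each optimizer update sees an effective batch of 128.
total_train_batch_size = (
    training_args["per_device_train_batch_size"]
    * training_args["gradient_accumulation_steps"]
)
print(total_train_batch_size)  # 128
```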
### Training results
| Training Loss | Epoch | Step | Validation Loss | Bleu Bleu | Bleu Brevity Penalty | Bleu Length Ratio | Bleu Translation Length | Bleu Reference Length | Chrf Score | Chrf Char Order | Chrf Word Order | Chrf Beta | Gen Len |
|:-------------:|:------:|:-----:|:---------------:|:---------:|:--------------------:|:-----------------:|:-----------------------:|:---------------------:|:----------:|:---------------:|:---------------:|:---------:|:-------:|
| 0.7973 | 1.0 | 7813 | 0.7208 | 0.4470 | 0.9888 | 0.9889 | 43489 | 43979 | 71.3941 | 6 | 0 | 2 | 54.3435 |
| 0.6534 | 2.0 | 15626 | 0.6406 | 0.4712 | 0.9931 | 0.9931 | 43675 | 43979 | 72.7865 | 6 | 0 | 2 | 54.1575 |
| 0.5785 | 3.0 | 23439 | 0.6027 | 0.4804 | 0.9937 | 0.9937 | 43703 | 43979 | 73.6939 | 6 | 0 | 2 | 54.501 |
| 0.5336 | 4.0 | 31252 | 0.5779 | 0.4900 | 0.9937 | 0.9937 | 43704 | 43979 | 74.1543 | 6 | 0 | 2 | 54.535 |
| 0.5034 | 5.0 | 39065 | 0.5617 | 0.4995 | 1.0 | 1.0004 | 43998 | 43979 | 74.7266 | 6 | 0 | 2 | 54.694 |
| 0.4797 | 6.0 | 46878 | 0.5501 | 0.4985 | 0.9897 | 0.9898 | 43530 | 43979 | 74.7707 | 6 | 0 | 2 | 54.3215 |
| 0.4576 | 7.0 | 54691 | 0.5458 | 0.5050 | 0.9921 | 0.9921 | 43632 | 43979 | 75.0066 | 6 | 0 | 2 | 54.259 |
| 0.4424 | 8.0 | 62504 | 0.5369 | 0.5062 | 0.9914 | 0.9915 | 43604 | 43979 | 75.0734 | 6 | 0 | 2 | 54.286 |
| 0.4287 | 9.0 | 70317 | 0.5358 | 0.5107 | 0.9875 | 0.9876 | 43434 | 43979 | 75.3841 | 6 | 0 | 2 | 54.1655 |
| 0.417 | 9.9988 | 78120 | 0.5350 | 0.5100 | 0.9868 | 0.9869 | 43404 | 43979 | 75.4033 | 6 | 0 | 2 | 54.058 |
### Framework versions
- Transformers 4.48.1
- PyTorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
## License
This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).
[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]
[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
## Citation
This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:
```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
author = "Micallef, Kurt and
Borg, Claudia",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.1053/",
doi = "10.18653/v1/2025.findings-acl.1053",
pages = "20505--20527",
ISBN = "979-8-89176-256-5",
}
```