---
library_name: transformers
language:
- mt
license: cc-by-nc-sa-4.0
base_model: google/mt5-small
datasets:
- dennlinger/eur-lex-sum
model-index:
- name: mt5-small_eurlexsum-mlt
  results:
  - task:
      type: summarization
      name: Text Summarization
    dataset:
      type: eurlexsum-mlt
      name: dennlinger/eur-lex-sum maltese
      config: maltese
    metrics:
    - type: chrf
      value: 52.14
      name: ChrF
    - type: rougel
      value: 0.44
      name: Rouge-L
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance to the license and for non-commercial use ONLY: checkbox
---

# mT5-Small (Eur-Lex-Sum Maltese)
|
|
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the [dennlinger/eur-lex-sum maltese](https://huggingface.co/datasets/dennlinger/eur-lex-sum) dataset.
It achieves the following results on the test set:
- Loss: 1.4531
- Chrf:
  - Score: 51.5481
  - Char Order: 6
  - Word Order: 0
  - Beta: 2
- Rouge:
  - Rouge1: 0.5176
  - Rouge2: 0.3497
  - Rougel: 0.4249
  - Rougelsum: 0.4247
- Gen Len: 254.8511
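ChrF is the character n-gram F-score (char order 6, word order 0, beta 2 above, which are sacreBLEU's defaults), and Rouge-L is an F-measure over the longest common subsequence of tokens. As an illustration of what these metrics compute, a minimal pure-Python sketch follows; the reported numbers come from the standard library implementations, which handle tokenisation, stemming, and edge cases this sketch omits:

```python
from collections import Counter


def chrf(hyp: str, ref: str, char_order: int = 6, beta: float = 2.0) -> float:
    """Character n-gram F-score (chrF) with word_order=0, on a 0-100 scale."""
    # Whitespace is removed before extracting character n-grams.
    hyp, ref = hyp.replace(" ", ""), ref.replace(" ", "")
    precisions, recalls = [], []
    for n in range(1, char_order + 1):
        hyp_ngrams = Counter(hyp[i:i + n] for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(ref[i:i + n] for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(hyp_ngrams.values()), 1))
        recalls.append(overlap / max(sum(ref_ngrams.values()), 1))
    p, r = sum(precisions) / char_order, sum(recalls) / char_order
    if p + r == 0.0:
        return 0.0
    # F-beta with beta=2 weights recall twice as heavily as precision.
    return 100.0 * (1 + beta**2) * p * r / (beta**2 * p + r)


def rouge_l(hyp: str, ref: str) -> float:
    """ROUGE-L F1: F-measure over the longest common subsequence of tokens."""
    h, r = hyp.split(), ref.split()
    # Dynamic-programming LCS length.
    dp = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i, tok_h in enumerate(h):
        for j, tok_r in enumerate(r):
            dp[i + 1][j + 1] = dp[i][j] + 1 if tok_h == tok_r else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(h)][len(r)]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(h), lcs / len(r)
    return 2 * prec * rec / (prec + rec)
```

Note that chrF is reported on a 0-100 scale while the Rouge scores are on a 0-1 scale, which is why the two sets of numbers above differ in magnitude.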
|
|
## Intended uses & limitations
|
|
The model is fine-tuned for a specific task (summarisation of Maltese legal text) and should only be used for that task or a closely related one.
Any limitations present in the base model are inherited.
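For summarisation, the model can be loaded with the standard `transformers` sequence-to-sequence classes. A minimal sketch, assuming a Hub repository id matching the model name above (adjust `model_id` to the actual repository) and illustrative generation settings:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical repository id matching the model name; replace with the actual Hub id.
model_id = "MLRS/mt5-small_eurlexsum-mlt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

document = "..."  # a Maltese legal document to summarise

inputs = tokenizer(document, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=256, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```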
|
|
## Training procedure


The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_seq2seq.py).
|
|
### Training hyperparameters


The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20
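The hyperparameters above roughly correspond to a `Seq2SeqTrainingArguments` configuration like the following sketch; the linked `run_seq2seq.py` script is the authoritative source, and the evaluation/saving strategy and best-model metric here are assumptions not stated above:

```python
from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters as standard `transformers` arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_eurlexsum-mlt",
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=200.0,
    eval_strategy="epoch",         # early stopping requires periodic evaluation
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="loss",  # assumption: selection criterion not stated above
)

# Early stopping is passed to the Trainer as a callback:
# callbacks=[EarlyStoppingCallback(early_stopping_patience=20)]
```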
|
|
### Training results


| Training Loss | Epoch | Step | Validation Loss | Chrf Score | Chrf Char Order | Chrf Word Order | Chrf Beta | Rouge Rouge1 | Rouge Rouge2 | Rouge Rougel | Rouge Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:----------:|:---------------:|:---------------:|:---------:|:------------:|:------------:|:------------:|:---------------:|:--------:|
| No log | 1.0 | 30 | 2.2500 | 18.4506 | 6 | 0 | 2 | 0.1901 | 0.0868 | 0.1728 | 0.1729 | 255.0 |
| No log | 2.0 | 60 | 1.9908 | 40.0789 | 6 | 0 | 2 | 0.3872 | 0.2330 | 0.3379 | 0.3376 | 255.0 |
| No log | 3.0 | 90 | 1.7490 | 44.1723 | 6 | 0 | 2 | 0.4406 | 0.2759 | 0.3760 | 0.3758 | 255.0 |
| No log | 4.0 | 120 | 1.7205 | 49.4429 | 6 | 0 | 2 | 0.4885 | 0.3313 | 0.4081 | 0.4079 | 255.0 |
| No log | 5.0 | 150 | 1.5647 | 46.3055 | 6 | 0 | 2 | 0.4626 | 0.3068 | 0.3886 | 0.3886 | 255.0 |
| No log | 6.0 | 180 | 1.5374 | 46.3856 | 6 | 0 | 2 | 0.4756 | 0.3169 | 0.3986 | 0.3989 | 254.4439 |
| No log | 7.0 | 210 | 1.5262 | 47.2806 | 6 | 0 | 2 | 0.4706 | 0.3154 | 0.3959 | 0.3962 | 254.7807 |
| No log | 8.0 | 240 | 1.5142 | 48.5214 | 6 | 0 | 2 | 0.4916 | 0.3255 | 0.4121 | 0.4119 | 254.8449 |
| No log | 9.0 | 270 | 1.5271 | 49.4788 | 6 | 0 | 2 | 0.4982 | 0.3350 | 0.4211 | 0.4210 | 253.9893 |
| No log | 10.0 | 300 | 1.4995 | 48.3063 | 6 | 0 | 2 | 0.4832 | 0.3224 | 0.4127 | 0.4126 | 254.6684 |
| No log | 11.0 | 330 | 1.4947 | 52.1382 | 6 | 0 | 2 | 0.5213 | 0.3593 | 0.4416 | 0.4418 | 254.7914 |
| No log | 12.0 | 360 | 1.4704 | 49.9226 | 6 | 0 | 2 | 0.5004 | 0.3363 | 0.4236 | 0.4235 | 254.6203 |
| No log | 13.0 | 390 | 1.4933 | 51.6030 | 6 | 0 | 2 | 0.5199 | 0.3514 | 0.4317 | 0.4318 | 253.6257 |
| No log | 14.0 | 420 | 1.4640 | 47.8714 | 6 | 0 | 2 | 0.4840 | 0.3242 | 0.4094 | 0.4091 | 254.6952 |
| No log | 15.0 | 450 | 1.4726 | 51.2718 | 6 | 0 | 2 | 0.5188 | 0.3488 | 0.4354 | 0.4356 | 254.7166 |
| No log | 16.0 | 480 | 1.4667 | 49.9968 | 6 | 0 | 2 | 0.4989 | 0.3400 | 0.4287 | 0.4281 | 254.6203 |
| 1.7931 | 17.0 | 510 | 1.4624 | 50.7874 | 6 | 0 | 2 | 0.5123 | 0.3436 | 0.4345 | 0.4345 | 254.5508 |
| 1.7931 | 18.0 | 540 | 1.4775 | 50.5126 | 6 | 0 | 2 | 0.5121 | 0.3448 | 0.4273 | 0.4274 | 253.4439 |
| 1.7931 | 19.0 | 570 | 1.4762 | 50.7875 | 6 | 0 | 2 | 0.5194 | 0.3458 | 0.4311 | 0.4315 | 252.6631 |
| 1.7931 | 20.0 | 600 | 1.5157 | 52.2624 | 6 | 0 | 2 | 0.5187 | 0.3446 | 0.4324 | 0.4323 | 253.8289 |
| 1.7931 | 21.0 | 630 | 1.4982 | 51.8279 | 6 | 0 | 2 | 0.5161 | 0.3478 | 0.4368 | 0.4369 | 254.3529 |
| 1.7931 | 22.0 | 660 | 1.5087 | 51.9486 | 6 | 0 | 2 | 0.5174 | 0.3438 | 0.4315 | 0.4310 | 254.7807 |
| 1.7931 | 23.0 | 690 | 1.5355 | 51.9191 | 6 | 0 | 2 | 0.5224 | 0.3500 | 0.4301 | 0.4298 | 254.4439 |
| 1.7931 | 24.0 | 720 | 1.5061 | 50.0702 | 6 | 0 | 2 | 0.5002 | 0.3307 | 0.4152 | 0.4153 | 254.1765 |
| 1.7931 | 25.0 | 750 | 1.5271 | 50.3567 | 6 | 0 | 2 | 0.5046 | 0.3349 | 0.4216 | 0.4222 | 253.3102 |
| 1.7931 | 26.0 | 780 | 1.5378 | 50.8240 | 6 | 0 | 2 | 0.5089 | 0.3401 | 0.4210 | 0.4202 | 253.6471 |
| 1.7931 | 27.0 | 810 | 1.5414 | 50.8294 | 6 | 0 | 2 | 0.5118 | 0.3447 | 0.4282 | 0.4280 | 254.1176 |
| 1.7931 | 28.0 | 840 | 1.5774 | 52.6591 | 6 | 0 | 2 | 0.5283 | 0.3537 | 0.4390 | 0.4387 | 253.6684 |
| 1.7931 | 29.0 | 870 | 1.5661 | 52.3420 | 6 | 0 | 2 | 0.5292 | 0.3525 | 0.4376 | 0.4376 | 253.3262 |
| 1.7931 | 30.0 | 900 | 1.6079 | 51.8227 | 6 | 0 | 2 | 0.5212 | 0.3448 | 0.4313 | 0.4315 | 253.9626 |
| 1.7931 | 31.0 | 930 | 1.5900 | 51.9129 | 6 | 0 | 2 | 0.5245 | 0.3479 | 0.4327 | 0.4327 | 253.7380 |
|
|
|
|
### Framework versions


- Transformers 4.48.2
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
|
|
## License


This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).


[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]


[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
|
|
## Citation


This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:


```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```
|
|