---
library_name: transformers
language:
- mt
license: cc-by-nc-sa-4.0
base_model: google/mt5-small
datasets:
- dennlinger/eur-lex-sum
model-index:
- name: mt5-small_eurlexsum-mlt
  results:
  - task:
      type: summarization
      name: Text Summarization
    dataset:
      type: eurlexsum-mlt
      name: dennlinger/eur-lex-sum maltese
      config: maltese
    metrics:
    - type: chrf
      value: 52.14
      name: ChrF
    - type: rougel
      value: 0.44
      name: Rouge-L
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance to the license and for non-commercial use ONLY: checkbox
---

# mT5-Small (Eur-Lex-Sum Maltese)
|
|
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the [dennlinger/eur-lex-sum maltese](https://huggingface.co/datasets/dennlinger/eur-lex-sum) dataset.
It achieves the following results on the test set:
- Loss: 1.4531
- Chrf:
  - Score: 51.5481
  - Char Order: 6
  - Word Order: 0
  - Beta: 2
- Rouge:
  - Rouge1: 0.5176
  - Rouge2: 0.3497
  - Rougel: 0.4249
  - Rougelsum: 0.4247
- Gen Len: 254.8511
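ChrF is the character n-gram F-score (char order 6, word order 0, beta 2 above, which are sacreBLEU's defaults), and Rouge-L is an F-measure over the longest common subsequence of tokens. As an illustration of what these metrics compute, a minimal pure-Python sketch follows; the reported numbers come from the standard library implementations, which handle tokenisation, stemming, and edge cases this sketch omits:

```python
from collections import Counter


def chrf(hyp: str, ref: str, char_order: int = 6, beta: float = 2.0) -> float:
    """Character n-gram F-score (chrF) with word_order=0, on a 0-100 scale."""
    # Whitespace is removed before extracting character n-grams.
    hyp, ref = hyp.replace(" ", ""), ref.replace(" ", "")
    precisions, recalls = [], []
    for n in range(1, char_order + 1):
        hyp_ngrams = Counter(hyp[i:i + n] for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(ref[i:i + n] for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(hyp_ngrams.values()), 1))
        recalls.append(overlap / max(sum(ref_ngrams.values()), 1))
    p, r = sum(precisions) / char_order, sum(recalls) / char_order
    if p + r == 0.0:
        return 0.0
    # F-beta with beta=2 weights recall twice as heavily as precision.
    return 100.0 * (1 + beta**2) * p * r / (beta**2 * p + r)


def rouge_l(hyp: str, ref: str) -> float:
    """ROUGE-L F1: F-measure over the longest common subsequence of tokens."""
    h, r = hyp.split(), ref.split()
    # Dynamic-programming LCS length.
    dp = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i, tok_h in enumerate(h):
        for j, tok_r in enumerate(r):
            dp[i + 1][j + 1] = dp[i][j] + 1 if tok_h == tok_r else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(h)][len(r)]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(h), lcs / len(r)
    return 2 * prec * rec / (prec + rec)
```

Note that chrF is reported on a 0-100 scale while the Rouge scores are on a 0-1 scale, which is why the two sets of numbers above differ in magnitude.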
|
|
## Intended uses & limitations
|
|
The model is fine-tuned for a specific task (summarisation of Maltese legal text) and should only be used for that task or a closely related one.
Any limitations present in the base model are inherited.
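For summarisation, the model can be loaded with the standard `transformers` sequence-to-sequence classes. A minimal sketch, assuming a Hub repository id matching the model name above (adjust `model_id` to the actual repository) and illustrative generation settings:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical repository id matching the model name; replace with the actual Hub id.
model_id = "MLRS/mt5-small_eurlexsum-mlt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

document = "..."  # a Maltese legal document to summarise

inputs = tokenizer(document, return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=256, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```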
|
|
## Training procedure


The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_seq2seq.py).
|
|
### Training hyperparameters


The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20
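The hyperparameters above roughly correspond to a `Seq2SeqTrainingArguments` configuration like the following sketch; the linked `run_seq2seq.py` script is the authoritative source, and the evaluation/saving strategy and best-model metric here are assumptions not stated above:

```python
from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters as standard `transformers` arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_eurlexsum-mlt",
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=200.0,
    eval_strategy="epoch",         # early stopping requires periodic evaluation
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="loss",  # assumption: selection criterion not stated above
)

# Early stopping is passed to the Trainer as a callback:
# callbacks=[EarlyStoppingCallback(early_stopping_patience=20)]
```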
|
|
### Training results


| Training Loss | Epoch | Step | Validation Loss | Chrf Score | Chrf Char Order | Chrf Word Order | Chrf Beta | Rouge Rouge1 | Rouge Rouge2 | Rouge Rougel | Rouge Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:----------:|:---------------:|:---------------:|:---------:|:------------:|:------------:|:------------:|:---------------:|:--------:|
| No log | 1.0 | 30 | 2.2500 | 18.4506 | 6 | 0 | 2 | 0.1901 | 0.0868 | 0.1728 | 0.1729 | 255.0 |
| No log | 2.0 | 60 | 1.9908 | 40.0789 | 6 | 0 | 2 | 0.3872 | 0.2330 | 0.3379 | 0.3376 | 255.0 |
| No log | 3.0 | 90 | 1.7490 | 44.1723 | 6 | 0 | 2 | 0.4406 | 0.2759 | 0.3760 | 0.3758 | 255.0 |
| No log | 4.0 | 120 | 1.7205 | 49.4429 | 6 | 0 | 2 | 0.4885 | 0.3313 | 0.4081 | 0.4079 | 255.0 |
| No log | 5.0 | 150 | 1.5647 | 46.3055 | 6 | 0 | 2 | 0.4626 | 0.3068 | 0.3886 | 0.3886 | 255.0 |
| No log | 6.0 | 180 | 1.5374 | 46.3856 | 6 | 0 | 2 | 0.4756 | 0.3169 | 0.3986 | 0.3989 | 254.4439 |
| No log | 7.0 | 210 | 1.5262 | 47.2806 | 6 | 0 | 2 | 0.4706 | 0.3154 | 0.3959 | 0.3962 | 254.7807 |
| No log | 8.0 | 240 | 1.5142 | 48.5214 | 6 | 0 | 2 | 0.4916 | 0.3255 | 0.4121 | 0.4119 | 254.8449 |
| No log | 9.0 | 270 | 1.5271 | 49.4788 | 6 | 0 | 2 | 0.4982 | 0.3350 | 0.4211 | 0.4210 | 253.9893 |
| No log | 10.0 | 300 | 1.4995 | 48.3063 | 6 | 0 | 2 | 0.4832 | 0.3224 | 0.4127 | 0.4126 | 254.6684 |
| No log | 11.0 | 330 | 1.4947 | 52.1382 | 6 | 0 | 2 | 0.5213 | 0.3593 | 0.4416 | 0.4418 | 254.7914 |
| No log | 12.0 | 360 | 1.4704 | 49.9226 | 6 | 0 | 2 | 0.5004 | 0.3363 | 0.4236 | 0.4235 | 254.6203 |
| No log | 13.0 | 390 | 1.4933 | 51.6030 | 6 | 0 | 2 | 0.5199 | 0.3514 | 0.4317 | 0.4318 | 253.6257 |
| No log | 14.0 | 420 | 1.4640 | 47.8714 | 6 | 0 | 2 | 0.4840 | 0.3242 | 0.4094 | 0.4091 | 254.6952 |
| No log | 15.0 | 450 | 1.4726 | 51.2718 | 6 | 0 | 2 | 0.5188 | 0.3488 | 0.4354 | 0.4356 | 254.7166 |
| No log | 16.0 | 480 | 1.4667 | 49.9968 | 6 | 0 | 2 | 0.4989 | 0.3400 | 0.4287 | 0.4281 | 254.6203 |
| 1.7931 | 17.0 | 510 | 1.4624 | 50.7874 | 6 | 0 | 2 | 0.5123 | 0.3436 | 0.4345 | 0.4345 | 254.5508 |
| 1.7931 | 18.0 | 540 | 1.4775 | 50.5126 | 6 | 0 | 2 | 0.5121 | 0.3448 | 0.4273 | 0.4274 | 253.4439 |
| 1.7931 | 19.0 | 570 | 1.4762 | 50.7875 | 6 | 0 | 2 | 0.5194 | 0.3458 | 0.4311 | 0.4315 | 252.6631 |
| 1.7931 | 20.0 | 600 | 1.5157 | 52.2624 | 6 | 0 | 2 | 0.5187 | 0.3446 | 0.4324 | 0.4323 | 253.8289 |
| 1.7931 | 21.0 | 630 | 1.4982 | 51.8279 | 6 | 0 | 2 | 0.5161 | 0.3478 | 0.4368 | 0.4369 | 254.3529 |
| 1.7931 | 22.0 | 660 | 1.5087 | 51.9486 | 6 | 0 | 2 | 0.5174 | 0.3438 | 0.4315 | 0.4310 | 254.7807 |
| 1.7931 | 23.0 | 690 | 1.5355 | 51.9191 | 6 | 0 | 2 | 0.5224 | 0.3500 | 0.4301 | 0.4298 | 254.4439 |
| 1.7931 | 24.0 | 720 | 1.5061 | 50.0702 | 6 | 0 | 2 | 0.5002 | 0.3307 | 0.4152 | 0.4153 | 254.1765 |
| 1.7931 | 25.0 | 750 | 1.5271 | 50.3567 | 6 | 0 | 2 | 0.5046 | 0.3349 | 0.4216 | 0.4222 | 253.3102 |
| 1.7931 | 26.0 | 780 | 1.5378 | 50.8240 | 6 | 0 | 2 | 0.5089 | 0.3401 | 0.4210 | 0.4202 | 253.6471 |
| 1.7931 | 27.0 | 810 | 1.5414 | 50.8294 | 6 | 0 | 2 | 0.5118 | 0.3447 | 0.4282 | 0.4280 | 254.1176 |
| 1.7931 | 28.0 | 840 | 1.5774 | 52.6591 | 6 | 0 | 2 | 0.5283 | 0.3537 | 0.4390 | 0.4387 | 253.6684 |
| 1.7931 | 29.0 | 870 | 1.5661 | 52.3420 | 6 | 0 | 2 | 0.5292 | 0.3525 | 0.4376 | 0.4376 | 253.3262 |
| 1.7931 | 30.0 | 900 | 1.6079 | 51.8227 | 6 | 0 | 2 | 0.5212 | 0.3448 | 0.4313 | 0.4315 | 253.9626 |
| 1.7931 | 31.0 | 930 | 1.5900 | 51.9129 | 6 | 0 | 2 | 0.5245 | 0.3479 | 0.4327 | 0.4327 | 253.7380 |
|
|
|
|
### Framework versions


- Transformers 4.48.2
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
|
|
## License


This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).


[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]


[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
|
|
## Citation


This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:


```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```
|
|