---
library_name: transformers
language:
- mt
license: cc-by-nc-sa-4.0
base_model: google/mt5-small
datasets:
- nlpaueb/multi_eurlex
model-index:
- name: mt5-small_multieurlex-mlt
  results:
  - task:
      type: text-classification
      name: Topic Classification
    dataset:
      type: multieurlex-mt
      name: nlpaueb/multi_eurlex
      config: mt
    metrics:
    - type: f1
      args: macro
      value: 30.10
      name: Macro-averaged F1
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance to the license and for non-commercial use ONLY: checkbox
---

# mT5-Small (MultiEURLEX Maltese)

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the Maltese (`mt`) configuration of the [nlpaueb/multi_eurlex](https://huggingface.co/datasets/nlpaueb/multi_eurlex) dataset.
It achieves the following results on the test set:

- Loss: 0.3648
- F1: 0.3125
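
Since this is a sequence-to-sequence classifier, predictions are generated as text. The following is a minimal inference sketch: the repository id `MLRS/mt5-small_multieurlex-mlt` and the textual label output format are assumptions, so check the fine-tuning script linked below for the exact target format.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed repository id; adjust to wherever this model is actually hosted.
model_id = "MLRS/mt5-small_multieurlex-mlt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# A Maltese legal document, as in MultiEURLEX.
text = "Regolament tal-Kummissjoni dwar l-agrikoltura ..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
output_ids = model.generate(**inputs, max_new_tokens=32)

# The model emits its predicted topic labels as text.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```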

## Intended uses & limitations

The model is fine-tuned on a specific task, so it should only be used for the same or a similar task.
Any limitations present in the base model are inherited.

## Training procedure

The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_seq2seq_classification.py).

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20
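
Expressed in standard `transformers` API terms, these settings correspond roughly to the configuration below. This is only a sketch, not the actual MELABench script; `output_dir`, the evaluation/save strategies, and `metric_for_best_model` are assumptions implied by the use of early stopping on validation F1.

```python
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_multieurlex-mlt",  # arbitrary
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adafactor",            # Adafactor, no additional optimizer arguments
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",        # evaluate each epoch so early stopping can trigger
    save_strategy="epoch",
    load_best_model_at_end=True,  # required by EarlyStoppingCallback
    metric_for_best_model="f1",
)

# Stops training after 20 epochs without improvement in validation F1.
early_stopping = EarlyStoppingCallback(early_stopping_patience=20)
```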

### Training results

| Training Loss | Epoch | Step  | Validation Loss | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 1.5559        | 1.0   | 548   | 0.4136          | 0.2994 |
| 0.424         | 2.0   | 1096  | 0.3933          | 0.2995 |
| 0.4078        | 3.0   | 1644  | 0.3755          | 0.3007 |
| 0.3848        | 4.0   | 2192  | 0.3663          | 0.2990 |
| 0.3714        | 5.0   | 2740  | 0.3571          | 0.2987 |
| 0.3599        | 6.0   | 3288  | 0.3452          | 0.3010 |
| 0.3436        | 7.0   | 3836  | 0.3237          | 0.3010 |
| 0.3358        | 8.0   | 4384  | 0.3232          | 0.3009 |
| 0.3292        | 9.0   | 4932  | 0.3145          | 0.2989 |
| 0.3196        | 10.0  | 5480  | 0.3101          | 0.2983 |
| 0.3045        | 11.0  | 6028  | 0.3111          | 0.2985 |
| 0.301         | 12.0  | 6576  | 0.3009          | 0.2941 |
| 0.3017        | 13.0  | 7124  | 0.3081          | 0.2911 |
| 0.3008        | 14.0  | 7672  | 0.3077          | 0.2952 |
| 0.2945        | 15.0  | 8220  | 0.3013          | 0.2982 |
| 0.2933        | 16.0  | 8768  | 0.2941          | 0.2940 |
| 0.2858        | 17.0  | 9316  | 0.3019          | 0.2918 |
| 0.2849        | 18.0  | 9864  | 0.2933          | 0.2965 |
| 0.2804        | 19.0  | 10412 | 0.2937          | 0.2918 |
| 0.2814        | 20.0  | 10960 | 0.2969          | 0.2960 |
| 0.2735        | 21.0  | 11508 | 0.2983          | 0.2925 |
| 0.2735        | 22.0  | 12056 | 0.3021          | 0.2986 |
| 0.2713        | 23.0  | 12604 | 0.2953          | 0.2956 |
| 0.2704        | 24.0  | 13152 | 0.3007          | 0.2959 |
| 0.2634        | 25.0  | 13700 | 0.3044          | 0.2986 |
| 0.2678        | 26.0  | 14248 | 0.2996          | 0.3005 |
| 0.2611        | 27.0  | 14796 | 0.2942          | 0.2961 |
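
The F1 values above are macro-averaged over the labels of this multi-label task. As an illustration of the metric only (not the exact MELABench evaluation code, and with hypothetical label names):

```python
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

gold = [["agriculture", "trade"], ["energy"]]  # gold label sets per document
pred = [["agriculture"], ["energy", "trade"]]  # predicted label sets

# Binarise the label sets, then average the per-label F1 scores equally.
mlb = MultiLabelBinarizer().fit(gold + pred)
print(f1_score(mlb.transform(gold), mlb.transform(pred), average="macro"))
```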

### Framework versions

- Transformers 4.51.1
- PyTorch 2.7.0+cu126
- Datasets 3.2.0
- Tokenizers 0.21.1

## License

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:

```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt and
      Borg, Claudia",
    editor = "Che, Wanxiang and
      Nabende, Joyce and
      Shutova, Ekaterina and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```