---
library_name: transformers
language:
- mt
license: cc-by-nc-sa-4.0
base_model: MLRS/BERTu
datasets:
- nlpaueb/multi_eurlex
model-index:
- name: BERTu_multieurlex-mlt
  results:
  - task:
      type: text-classification
      name: Topic Classification
    dataset:
      type: multieurlex-mt
      name: nlpaueb/multi_eurlex
      config: mt
    metrics:
    - type: f1
      args: macro
      value: 30.10
      name: Macro-averaged F1
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance with the license and for non-commercial use ONLY: checkbox
---

# BERTu (MultiEURLEX Maltese)

<img src="https://raw.githubusercontent.com/MLRS/BERTu/master/logo.png" width="200" style="margin-right: 1em;" align="left" />

This model is a fine-tuned version of [MLRS/BERTu](https://huggingface.co/MLRS/BERTu) on the Maltese (`mt`) configuration of the [nlpaueb/multi_eurlex](https://huggingface.co/datasets/nlpaueb/multi_eurlex) dataset.
It achieves the following results on the test set:
- Loss: 0.2734
- F1: 0.6723
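
Below is a minimal inference sketch with Transformers. The repository id `MLRS/BERTu_multieurlex-mlt` is an assumption inferred from the model-index name in this card, and since MultiEURLEX is a multi-label task the sketch scores each label with a sigmoid and keeps those above an assumed 0.5 threshold:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical repository id, inferred from the model-index name in this card.
model_id = "MLRS/BERTu_multieurlex-mlt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Example Maltese legal-style text ("Regulation on the protection of personal data").
text = "Regolament dwar il-protezzjoni tad-data personali."

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# MultiEURLEX is multi-label: score each label independently with a sigmoid
# and keep those above an assumed 0.5 threshold.
probs = torch.sigmoid(logits).squeeze(0)
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)
```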

## Intended uses & limitations

The model is fine-tuned on a specific task, so it should only be used for that task or a closely related one.
Any limitations present in the base model are inherited.

## Training procedure

The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_classification.py).
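
For reference, a minimal sketch of loading the training data with Datasets; it assumes the Hub copy of `nlpaueb/multi_eurlex` exposes the `mt` (Maltese) configuration used here:

```python
from datasets import load_dataset

# Assumes the `mt` (Maltese) configuration of the dataset named in this card;
# older script-based revisions of the dataset may also need trust_remote_code=True.
dataset = load_dataset("nlpaueb/multi_eurlex", "mt")

print(dataset)               # train / validation / test splits
print(dataset["train"][0])   # one document with its EUROVOC label ids
```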

### Training hyperparameters

The following hyperparameters were used during training (mirrored in the sketch after this list):
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 3
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.005
- num_epochs: 200.0
- early_stopping_patience: 20
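
These settings map onto Transformers `TrainingArguments` roughly as sketched below; `output_dir` and `metric_for_best_model` are assumptions not stated in this card, and the actual run used the customised script linked above:

```python
from transformers import EarlyStoppingCallback, TrainingArguments

# Sketch of the hyperparameters above expressed as TrainingArguments;
# output_dir and metric_for_best_model are assumed, not taken from the card.
args = TrainingArguments(
    output_dir="bertu-multieurlex-mlt",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=3,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.005,
    num_train_epochs=200.0,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,  # required for early stopping
    metric_for_best_model="f1",
)

# Early stopping with the patience listed above:
# stop after 20 evaluations without improvement.
callbacks = [EarlyStoppingCallback(early_stopping_patience=20)]
# trainer = Trainer(model=model, args=args, callbacks=callbacks, ...)
```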

### Training results

| Training Loss | Epoch | Step  | Validation Loss | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 0.3962        | 1.0   | 548   | 0.2352          | 0.4398 |
| 0.2143        | 2.0   | 1096  | 0.1898          | 0.5998 |
| 0.1753        | 3.0   | 1644  | 0.1780          | 0.6361 |
| 0.1547        | 4.0   | 2192  | 0.1744          | 0.6610 |
| 0.1401        | 5.0   | 2740  | 0.1725          | 0.6687 |
| 0.1284        | 6.0   | 3288  | 0.1723          | 0.6814 |
| 0.1187        | 7.0   | 3836  | 0.1717          | 0.6882 |
| 0.1119        | 8.0   | 4384  | 0.1725          | 0.6951 |
| 0.1031        | 9.0   | 4932  | 0.1757          | 0.6997 |
| 0.0977        | 10.0  | 5480  | 0.1766          | 0.7012 |
| 0.0861        | 11.0  | 6028  | 0.1767          | 0.7089 |
| 0.0811        | 12.0  | 6576  | 0.1826          | 0.7060 |
| 0.0769        | 13.0  | 7124  | 0.1817          | 0.7074 |
| 0.0733        | 14.0  | 7672  | 0.1865          | 0.7071 |
| 0.0697        | 15.0  | 8220  | 0.1879          | 0.7090 |
| 0.0656        | 16.0  | 8768  | 0.1906          | 0.7065 |
| 0.0633        | 17.0  | 9316  | 0.1921          | 0.7123 |
| 0.0594        | 18.0  | 9864  | 0.1946          | 0.7152 |
| 0.0574        | 19.0  | 10412 | 0.1964          | 0.7178 |
| 0.0545        | 20.0  | 10960 | 0.1988          | 0.7153 |
| 0.0503        | 21.0  | 11508 | 0.2003          | 0.7149 |
| 0.0479        | 22.0  | 12056 | 0.2018          | 0.7179 |
| 0.0459        | 23.0  | 12604 | 0.2041          | 0.7194 |
| 0.0438        | 24.0  | 13152 | 0.2051          | 0.7197 |
| 0.0424        | 25.0  | 13700 | 0.2076          | 0.7182 |
| 0.0404        | 26.0  | 14248 | 0.2089          | 0.7182 |
| 0.0393        | 27.0  | 14796 | 0.2111          | 0.7167 |
| 0.0373        | 28.0  | 15344 | 0.2138          | 0.7181 |
| 0.036         | 29.0  | 15892 | 0.2148          | 0.7228 |
| 0.0346        | 30.0  | 16440 | 0.2186          | 0.7176 |
| 0.0334        | 31.0  | 16988 | 0.2190          | 0.7179 |
| 0.0305        | 32.0  | 17536 | 0.2213          | 0.7191 |
| 0.0301        | 33.0  | 18084 | 0.2214          | 0.7207 |
| 0.0281        | 34.0  | 18632 | 0.2242          | 0.7192 |
| 0.0275        | 35.0  | 19180 | 0.2233          | 0.7214 |
| 0.0266        | 36.0  | 19728 | 0.2258          | 0.7206 |
| 0.0255        | 37.0  | 20276 | 0.2290          | 0.7176 |
| 0.0247        | 38.0  | 20824 | 0.2307          | 0.7204 |
| 0.0238        | 39.0  | 21372 | 0.2321          | 0.7160 |
| 0.0231        | 40.0  | 21920 | 0.2350          | 0.7235 |
| 0.0225        | 41.0  | 22468 | 0.2343          | 0.7170 |
| 0.0208        | 42.0  | 23016 | 0.2369          | 0.7210 |
| 0.0199        | 43.0  | 23564 | 0.2390          | 0.7205 |
| 0.0193        | 44.0  | 24112 | 0.2396          | 0.7225 |
| 0.0188        | 45.0  | 24660 | 0.2414          | 0.7192 |
| 0.0184        | 46.0  | 25208 | 0.2441          | 0.7185 |
| 0.0176        | 47.0  | 25756 | 0.2445          | 0.7224 |
| 0.0172        | 48.0  | 26304 | 0.2468          | 0.7185 |
| 0.0167        | 49.0  | 26852 | 0.2476          | 0.7187 |
| 0.0161        | 50.0  | 27400 | 0.2472          | 0.7212 |
| 0.0158        | 51.0  | 27948 | 0.2511          | 0.7200 |
| 0.0151        | 52.0  | 28496 | 0.2507          | 0.7201 |
| 0.0142        | 53.0  | 29044 | 0.2533          | 0.7173 |
| 0.0137        | 54.0  | 29592 | 0.2550          | 0.7210 |
| 0.0133        | 55.0  | 30140 | 0.2553          | 0.7191 |
| 0.013         | 56.0  | 30688 | 0.2581          | 0.7213 |
| 0.0127        | 57.0  | 31236 | 0.2597          | 0.7209 |
| 0.0121        | 58.0  | 31784 | 0.2616          | 0.7175 |
| 0.012         | 59.0  | 32332 | 0.2605          | 0.7198 |
| 0.0115        | 60.0  | 32880 | 0.2641          | 0.7207 |

### Framework versions

- Transformers 4.51.1
- Pytorch 2.7.0+cu126
- Datasets 3.2.0
- Tokenizers 0.21.1

## License

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:

```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```