---
library_name: transformers
language:
- mt
license: cc-by-nc-sa-4.0
base_model: MLRS/BERTu
datasets:
- Davlan/sib200
model-index:
- name: BERTu_sib200-mlt
  results:
  - task:
      type: text-classification
      name: Topic Classification
    dataset:
      type: sib200-mlt_Latn
      name: Davlan/sib200
      config: mlt_Latn
    metrics:
    - type: f1
      args: macro
      value: 86.21
      name: Macro-averaged F1
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance with the license and for non-commercial use ONLY: checkbox
---
# BERTu (SIB-200 Maltese)
<img src="https://raw.githubusercontent.com/MLRS/BERTu/master/logo.png" width="200" style="margin-right: 1em;" align="left" />
This model is a fine-tuned version of [MLRS/BERTu](https://huggingface.co/MLRS/BERTu) on the [Davlan/sib200 mlt_Latn](https://huggingface.co/datasets/Davlan/sib200/viewer/mlt_Latn) dataset.
It achieves the following results on the test set:
- Loss: 0.5018
- F1: 0.8621
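
The model can be tried out with the `transformers` pipeline API. This is a minimal sketch: the repository id below is an assumption inferred from this model card, and the example sentence is illustrative; the seven topic categories are those defined by SIB-200.

```python
# Minimal usage sketch. MODEL_ID is an assumption inferred from this model
# card; adjust it to the actual Hub id if it differs.
MODEL_ID = "MLRS/BERTu_sib200-mlt"

# The seven SIB-200 topic categories.
SIB200_TOPICS = [
    "science/technology", "travel", "politics",
    "sports", "health", "entertainment", "geography",
]

def classify_topic(texts):
    """Classify Maltese text into one of the SIB-200 topics."""
    from transformers import pipeline  # deferred: heavy optional dependency
    classifier = pipeline("text-classification", model=MODEL_ID)
    return classifier(texts)

# Example call (downloads the model on first use):
# classify_topic(["Il-gvern ħabbar il-baġit għas-sena d-dieħla."])
```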
## Intended uses & limitations
The model is fine-tuned for a specific task and should only be used for the same or a similar task.
Any limitations present in the base model are inherited.
## Training procedure
The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_classification.py).
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 32
- seed: 3
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.005
- num_epochs: 200.0
- early_stopping_patience: 20
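
Assuming the standard Hugging Face `Trainer` was used (the fine-tuning script linked above is based on it), these settings map onto `TrainingArguments` keyword arguments roughly as sketched below; the argument names are the stock `TrainingArguments`/`EarlyStoppingCallback` ones, not taken verbatim from the script.

```python
# Sketch of the hyperparameters above as transformers.TrainingArguments
# keyword arguments (an assumption that the standard HF Trainer was used).
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "seed": 3,
    "optim": "adamw_torch",          # AdamW; betas=(0.9, 0.999), eps=1e-8 are defaults
    "lr_scheduler_type": "inverse_sqrt",
    "warmup_ratio": 0.005,
    "num_train_epochs": 200.0,
    "load_best_model_at_end": True,  # required for early stopping
    "metric_for_best_model": "f1",
}
# Early stopping is a callback, not a TrainingArguments field:
# trainer.add_callback(EarlyStoppingCallback(early_stopping_patience=20))
```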
### Training results
| Training Loss | Epoch | Step | Validation Loss | F1 |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| No log | 1.0 | 44 | 1.5054 | 0.4062 |
| No log | 2.0 | 88 | 0.8147 | 0.8010 |
| No log | 3.0 | 132 | 0.5343 | 0.8243 |
| No log | 4.0 | 176 | 0.4906 | 0.8290 |
| No log | 5.0 | 220 | 0.4502 | 0.8505 |
| No log | 6.0 | 264 | 0.4615 | 0.8450 |
| No log | 7.0 | 308 | 0.5045 | 0.8552 |
| No log | 8.0 | 352 | 0.5117 | 0.8525 |
| No log | 9.0 | 396 | 0.5132 | 0.8684 |
| No log | 10.0 | 440 | 0.5334 | 0.8607 |
| No log | 11.0 | 484 | 0.5530 | 0.8592 |
| 0.3355 | 12.0 | 528 | 0.5476 | 0.8607 |
| 0.3355 | 13.0 | 572 | 0.5605 | 0.8684 |
| 0.3355 | 14.0 | 616 | 0.5683 | 0.8607 |
| 0.3355 | 15.0 | 660 | 0.5689 | 0.8607 |
| 0.3355 | 16.0 | 704 | 0.5729 | 0.8607 |
| 0.3355 | 17.0 | 748 | 0.5831 | 0.8607 |
| 0.3355 | 18.0 | 792 | 0.5860 | 0.8607 |
| 0.3355 | 19.0 | 836 | 0.5919 | 0.8607 |
| 0.3355 | 20.0 | 880 | 0.5971 | 0.8684 |
| 0.3355 | 21.0 | 924 | 0.6006 | 0.8607 |
| 0.3355 | 22.0 | 968 | 0.6053 | 0.8607 |
| 0.0037 | 23.0 | 1012 | 0.6094 | 0.8607 |
| 0.0037 | 24.0 | 1056 | 0.6141 | 0.8607 |
| 0.0037 | 25.0 | 1100 | 0.6177 | 0.8684 |
| 0.0037 | 26.0 | 1144 | 0.6202 | 0.8607 |
| 0.0037 | 27.0 | 1188 | 0.6241 | 0.8684 |
| 0.0037 | 28.0 | 1232 | 0.6291 | 0.8684 |
| 0.0037 | 29.0 | 1276 | 0.6328 | 0.8684 |
### Framework versions
- Transformers 4.51.1
- PyTorch 2.7.0+cu126
- Datasets 3.2.0
- Tokenizers 0.21.1
## License
This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).
[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]
[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
## Citation
This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:
```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
author = "Micallef, Kurt and
Borg, Claudia",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.1053/",
doi = "10.18653/v1/2025.findings-acl.1053",
pages = "20505--20527",
ISBN = "979-8-89176-256-5",
}
```