---
library_name: transformers
language:
- mt
license: cc-by-nc-sa-4.0
base_model: MLRS/BERTu
datasets:
- nlpaueb/multi_eurlex
model-index:
- name: BERTu_multieurlex-mlt
  results:
  - task:
      type: text-classification
      name: Topic Classification
    dataset:
      type: multieurlex-mt
      name: nlpaueb/multi_eurlex
      config: mt
    metrics:
    - type: f1
      args: macro
      value: 30.10
      name: Macro-averaged F1
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance with the license and for non-commercial use ONLY: checkbox
---

# BERTu (MultiEURLEX Maltese)

<img src="https://raw.githubusercontent.com/MLRS/BERTu/master/logo.png" width="200" style="margin-right: 1em;" align="left" />

This model is a fine-tuned version of [MLRS/BERTu](https://huggingface.co/MLRS/BERTu) on the Maltese (`mt`) configuration of the [nlpaueb/multi_eurlex](https://huggingface.co/datasets/nlpaueb/multi_eurlex) dataset.
It achieves the following results on the test set:
- Loss: 0.2734
- F1: 0.6723
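
Below is a minimal inference sketch with Transformers. The repository id `MLRS/BERTu_multieurlex-mlt` is an assumption inferred from the model-index name in this card, and since MultiEURLEX is a multi-label task the sketch scores each label with a sigmoid and keeps those above an assumed 0.5 threshold:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical repository id, inferred from the model-index name in this card.
model_id = "MLRS/BERTu_multieurlex-mlt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Example Maltese legal-style text ("Regulation on the protection of personal data").
text = "Regolament dwar il-protezzjoni tad-data personali."

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# MultiEURLEX is multi-label: score each label independently with a sigmoid
# and keep those above an assumed 0.5 threshold.
probs = torch.sigmoid(logits).squeeze(0)
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)
```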

## Intended uses & limitations

The model is fine-tuned on a specific task, so it should only be used for that task or a closely related one.
Any limitations present in the base model are inherited.

## Training procedure

The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_classification.py).
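
For reference, a minimal sketch of loading the training data with Datasets; it assumes the Hub copy of `nlpaueb/multi_eurlex` exposes the `mt` (Maltese) configuration used here:

```python
from datasets import load_dataset

# Assumes the `mt` (Maltese) configuration of the dataset named in this card;
# older script-based revisions of the dataset may also need trust_remote_code=True.
dataset = load_dataset("nlpaueb/multi_eurlex", "mt")

print(dataset)               # train / validation / test splits
print(dataset["train"][0])   # one document with its EUROVOC label ids
```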

### Training hyperparameters

The following hyperparameters were used during training (mirrored in the sketch after this list):
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 3
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: inverse_sqrt
- lr_scheduler_warmup_ratio: 0.005
- num_epochs: 200.0
- early_stopping_patience: 20
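
These settings map onto Transformers `TrainingArguments` roughly as sketched below; `output_dir` and `metric_for_best_model` are assumptions not stated in this card, and the actual run used the customised script linked above:

```python
from transformers import EarlyStoppingCallback, TrainingArguments

# Sketch of the hyperparameters above expressed as TrainingArguments;
# output_dir and metric_for_best_model are assumed, not taken from the card.
args = TrainingArguments(
    output_dir="bertu-multieurlex-mlt",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=3,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.005,
    num_train_epochs=200.0,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,  # required for early stopping
    metric_for_best_model="f1",
)

# Early stopping with the patience listed above:
# stop after 20 evaluations without improvement.
callbacks = [EarlyStoppingCallback(early_stopping_patience=20)]
# trainer = Trainer(model=model, args=args, callbacks=callbacks, ...)
```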

### Training results

| Training Loss | Epoch | Step  | Validation Loss | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 0.3962        | 1.0   | 548   | 0.2352          | 0.4398 |
| 0.2143        | 2.0   | 1096  | 0.1898          | 0.5998 |
| 0.1753        | 3.0   | 1644  | 0.1780          | 0.6361 |
| 0.1547        | 4.0   | 2192  | 0.1744          | 0.6610 |
| 0.1401        | 5.0   | 2740  | 0.1725          | 0.6687 |
| 0.1284        | 6.0   | 3288  | 0.1723          | 0.6814 |
| 0.1187        | 7.0   | 3836  | 0.1717          | 0.6882 |
| 0.1119        | 8.0   | 4384  | 0.1725          | 0.6951 |
| 0.1031        | 9.0   | 4932  | 0.1757          | 0.6997 |
| 0.0977        | 10.0  | 5480  | 0.1766          | 0.7012 |
| 0.0861        | 11.0  | 6028  | 0.1767          | 0.7089 |
| 0.0811        | 12.0  | 6576  | 0.1826          | 0.7060 |
| 0.0769        | 13.0  | 7124  | 0.1817          | 0.7074 |
| 0.0733        | 14.0  | 7672  | 0.1865          | 0.7071 |
| 0.0697        | 15.0  | 8220  | 0.1879          | 0.7090 |
| 0.0656        | 16.0  | 8768  | 0.1906          | 0.7065 |
| 0.0633        | 17.0  | 9316  | 0.1921          | 0.7123 |
| 0.0594        | 18.0  | 9864  | 0.1946          | 0.7152 |
| 0.0574        | 19.0  | 10412 | 0.1964          | 0.7178 |
| 0.0545        | 20.0  | 10960 | 0.1988          | 0.7153 |
| 0.0503        | 21.0  | 11508 | 0.2003          | 0.7149 |
| 0.0479        | 22.0  | 12056 | 0.2018          | 0.7179 |
| 0.0459        | 23.0  | 12604 | 0.2041          | 0.7194 |
| 0.0438        | 24.0  | 13152 | 0.2051          | 0.7197 |
| 0.0424        | 25.0  | 13700 | 0.2076          | 0.7182 |
| 0.0404        | 26.0  | 14248 | 0.2089          | 0.7182 |
| 0.0393        | 27.0  | 14796 | 0.2111          | 0.7167 |
| 0.0373        | 28.0  | 15344 | 0.2138          | 0.7181 |
| 0.036         | 29.0  | 15892 | 0.2148          | 0.7228 |
| 0.0346        | 30.0  | 16440 | 0.2186          | 0.7176 |
| 0.0334        | 31.0  | 16988 | 0.2190          | 0.7179 |
| 0.0305        | 32.0  | 17536 | 0.2213          | 0.7191 |
| 0.0301        | 33.0  | 18084 | 0.2214          | 0.7207 |
| 0.0281        | 34.0  | 18632 | 0.2242          | 0.7192 |
| 0.0275        | 35.0  | 19180 | 0.2233          | 0.7214 |
| 0.0266        | 36.0  | 19728 | 0.2258          | 0.7206 |
| 0.0255        | 37.0  | 20276 | 0.2290          | 0.7176 |
| 0.0247        | 38.0  | 20824 | 0.2307          | 0.7204 |
| 0.0238        | 39.0  | 21372 | 0.2321          | 0.7160 |
| 0.0231        | 40.0  | 21920 | 0.2350          | 0.7235 |
| 0.0225        | 41.0  | 22468 | 0.2343          | 0.7170 |
| 0.0208        | 42.0  | 23016 | 0.2369          | 0.7210 |
| 0.0199        | 43.0  | 23564 | 0.2390          | 0.7205 |
| 0.0193        | 44.0  | 24112 | 0.2396          | 0.7225 |
| 0.0188        | 45.0  | 24660 | 0.2414          | 0.7192 |
| 0.0184        | 46.0  | 25208 | 0.2441          | 0.7185 |
| 0.0176        | 47.0  | 25756 | 0.2445          | 0.7224 |
| 0.0172        | 48.0  | 26304 | 0.2468          | 0.7185 |
| 0.0167        | 49.0  | 26852 | 0.2476          | 0.7187 |
| 0.0161        | 50.0  | 27400 | 0.2472          | 0.7212 |
| 0.0158        | 51.0  | 27948 | 0.2511          | 0.7200 |
| 0.0151        | 52.0  | 28496 | 0.2507          | 0.7201 |
| 0.0142        | 53.0  | 29044 | 0.2533          | 0.7173 |
| 0.0137        | 54.0  | 29592 | 0.2550          | 0.7210 |
| 0.0133        | 55.0  | 30140 | 0.2553          | 0.7191 |
| 0.013         | 56.0  | 30688 | 0.2581          | 0.7213 |
| 0.0127        | 57.0  | 31236 | 0.2597          | 0.7209 |
| 0.0121        | 58.0  | 31784 | 0.2616          | 0.7175 |
| 0.012         | 59.0  | 32332 | 0.2605          | 0.7198 |
| 0.0115        | 60.0  | 32880 | 0.2641          | 0.7207 |

### Framework versions

- Transformers 4.51.1
- Pytorch 2.7.0+cu126
- Datasets 3.2.0
- Tokenizers 0.21.1

## License

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:

```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```