---
library_name: transformers
language:
  - mt
license: cc-by-nc-sa-4.0
base_model: MLRS/BERTu
datasets:
  - Davlan/sib200
model-index:
  - name: BERTu_sib200-mlt
    results:
      - task:
          type: text-classification
          name: Topic Classification
        dataset:
          type: sib200-mlt_Latn
          name: Davlan/sib200
          config: mlt_Latn
        metrics:
          - type: f1
            args: macro
            value: 86.21
            name: Macro-averaged F1
        source:
          name: MELABench Leaderboard
          url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance with the license and for non-commercial use ONLY: checkbox
---

BERTu (SIB-200 Maltese)

This model is a fine-tuned version of MLRS/BERTu on the mlt_Latn configuration of the Davlan/sib200 dataset. It achieves the following results on the test set:

  • Loss: 0.5018
  • F1: 0.8621
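
The model can be used with the standard text-classification pipeline. The snippet below is a minimal usage sketch: it assumes this repository's Hub id is MLRS/BERTu_sib200-mlt (adjust if it differs), and the label strings it returns depend on how the classification head's config was saved.

```python
from transformers import pipeline

# "MLRS/BERTu_sib200-mlt" is assumed to be this repository's Hub id.
classifier = pipeline("text-classification", model="MLRS/BERTu_sib200-mlt")

# Example Maltese sentence; the prediction should be one of the SIB-200 topic
# categories (the exact label string depends on the saved config).
print(classifier("Il-gvern ħabbar baġit ġdid għas-sena d-dieħla."))
```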

Intended uses & limitations

The model is fine-tuned for a specific task, so it should only be used for the same or a similar task. Any limitations present in the base model are inherited.

Training procedure

The model was fine-tuned using a customised script. The hyperparameters are listed below, followed by a rough sketch of an equivalent Trainer configuration.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 32
  • seed: 3
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_ratio: 0.005
  • num_epochs: 200.0
  • early_stopping_patience: 20
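
The exact script is not reproduced here, but the hyperparameters above map fairly directly onto the Hugging Face Trainer API. The following is an illustrative reconstruction under that assumption; the dataset column names, label mapping, and metric computation are simplified guesses rather than the authors' code.

```python
# Illustrative reconstruction of the fine-tuning setup from the listed
# hyperparameters; the actual customised script may differ in its details.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

ds = load_dataset("Davlan/sib200", "mlt_Latn")
tokenizer = AutoTokenizer.from_pretrained("MLRS/BERTu")

# Build a label mapping from the topic labels (assumed to live in a "category"
# column); SIB-200 defines seven topic categories.
labels = sorted(set(ds["train"]["category"]))
label2id = {label: i for i, label in enumerate(labels)}

def preprocess(batch):
    enc = tokenizer(batch["text"], truncation=True)
    enc["labels"] = [label2id[c] for c in batch["category"]]
    return enc

ds = ds.map(preprocess, batched=True, remove_columns=ds["train"].column_names)

model = AutoModelForSequenceClassification.from_pretrained(
    "MLRS/BERTu",
    num_labels=len(labels),
    id2label={i: label for label, i in label2id.items()},
    label2id=label2id,
)

def compute_metrics(eval_pred):
    logits, gold = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"f1": f1_score(gold, preds, average="macro")}

args = TrainingArguments(
    output_dir="BERTu_sib200-mlt",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    seed=3,
    optim="adamw_torch",              # AdamW defaults: betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.005,
    num_train_epochs=200,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
    processing_class=tokenizer,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=20)],
)
trainer.train()
```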

Training results

| Training Loss | Epoch | Step | Validation Loss | F1     |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| No log        | 1.0   | 44   | 1.5054          | 0.4062 |
| No log        | 2.0   | 88   | 0.8147          | 0.8010 |
| No log        | 3.0   | 132  | 0.5343          | 0.8243 |
| No log        | 4.0   | 176  | 0.4906          | 0.8290 |
| No log        | 5.0   | 220  | 0.4502          | 0.8505 |
| No log        | 6.0   | 264  | 0.4615          | 0.8450 |
| No log        | 7.0   | 308  | 0.5045          | 0.8552 |
| No log        | 8.0   | 352  | 0.5117          | 0.8525 |
| No log        | 9.0   | 396  | 0.5132          | 0.8684 |
| No log        | 10.0  | 440  | 0.5334          | 0.8607 |
| No log        | 11.0  | 484  | 0.5530          | 0.8592 |
| 0.3355        | 12.0  | 528  | 0.5476          | 0.8607 |
| 0.3355        | 13.0  | 572  | 0.5605          | 0.8684 |
| 0.3355        | 14.0  | 616  | 0.5683          | 0.8607 |
| 0.3355        | 15.0  | 660  | 0.5689          | 0.8607 |
| 0.3355        | 16.0  | 704  | 0.5729          | 0.8607 |
| 0.3355        | 17.0  | 748  | 0.5831          | 0.8607 |
| 0.3355        | 18.0  | 792  | 0.5860          | 0.8607 |
| 0.3355        | 19.0  | 836  | 0.5919          | 0.8607 |
| 0.3355        | 20.0  | 880  | 0.5971          | 0.8684 |
| 0.3355        | 21.0  | 924  | 0.6006          | 0.8607 |
| 0.3355        | 22.0  | 968  | 0.6053          | 0.8607 |
| 0.0037        | 23.0  | 1012 | 0.6094          | 0.8607 |
| 0.0037        | 24.0  | 1056 | 0.6141          | 0.8607 |
| 0.0037        | 25.0  | 1100 | 0.6177          | 0.8684 |
| 0.0037        | 26.0  | 1144 | 0.6202          | 0.8607 |
| 0.0037        | 27.0  | 1188 | 0.6241          | 0.8684 |
| 0.0037        | 28.0  | 1232 | 0.6291          | 0.8684 |
| 0.0037        | 29.0  | 1276 | 0.6328          | 0.8684 |
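
For comparison with the validation scores above and the reported test-set macro-F1 of 0.8621, a test-set evaluation can be sketched as follows. This is an illustration only, not the evaluation code used for the MELABench leaderboard; it assumes the Hub id MLRS/BERTu_sib200-mlt and that the saved config maps the SIB-200 category names to label ids.

```python
# Illustrative test-set evaluation; not the official MELABench evaluation code.
import torch
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "MLRS/BERTu_sib200-mlt"  # assumed Hub id of this repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

test = load_dataset("Davlan/sib200", "mlt_Latn", split="test")

preds, refs = [], []
for example in test:
    inputs = tokenizer(example["text"], truncation=True, return_tensors="pt")
    with torch.no_grad():
        preds.append(model(**inputs).logits.argmax(dim=-1).item())
    # Assumes config.label2id holds the category names; if it only contains
    # generic LABEL_i entries, rebuild the mapping as in the training sketch.
    refs.append(model.config.label2id[example["category"]])

print("macro F1:", f1_score(refs, preds, average="macro"))
```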

Framework versions

  • Transformers 4.51.1
  • Pytorch 2.7.0+cu126
  • Datasets 3.2.0
  • Tokenizers 0.21.1

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.

CC BY-NC-SA 4.0

Citation

This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}