---
library_name: transformers
language:
  - mt
license: cc-by-nc-sa-4.0
base_model: google/mt5-small
datasets:
  - webnlg/challenge-2023
model-index:
  - name: mt5-small_webnlg-mlt
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: webnlg_mt
          name: webnlg/challenge-2023
          config: mt
        metrics:
          - type: chrf
            value: 47.86
            name: ChrF
          - type: rougel
            value: 48.35
            name: Rouge-L
        source:
          name: MELABench Leaderboard
          url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance to the license and for non-commercial use ONLY: checkbox
---

# mT5-Small (WebNLG Maltese)

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the [webnlg/challenge-2023](https://huggingface.co/datasets/webnlg/challenge-2023) `mt` dataset. It achieves the following results on the test set:

- Loss: 4.0028
- Chrf:
  - Score: 31.6417
  - Char Order: 6
  - Word Order: 0
  - Beta: 2
- Rouge:
  - Rouge1: 0.3464
  - Rouge2: 0.1552
  - Rougel: 0.2797
  - Rougelsum: 0.2797
- Gen Len: 41.3142
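
The two headline metrics are chrF (a character n-gram F-score, here with character order 6, word order 0, and β = 2, so recall is weighted twice as heavily as precision) and Rouge-L (an F-measure over the longest common subsequence of tokens). The following is a simplified sketch of both, ignoring sacreBLEU's whitespace and effective-order handling, so values can differ slightly from the official implementations:

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (as chrF does)."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis, reference, char_order=6, beta=2):
    """Average n-gram precision/recall over orders 1..char_order, then F-beta."""
    precisions, recalls = [], []
    for n in range(1, char_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / max(sum(hyp.values()), 1))
        recalls.append(overlap / max(sum(ref.values()), 1))
    p = sum(precisions) / char_order
    r = sum(recalls) / char_order
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


def lcs_len(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l(hypothesis, reference):
    """LCS-based F1 over whitespace tokens."""
    hyp, ref = hypothesis.split(), reference.split()
    lcs = lcs_len(hyp, ref)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(hyp), lcs / len(ref)
    return 2 * p * r / (p + r)
```

Note that the table above reports Rouge scores in the 0–1 range while chrF is on a 0–100 scale; the MELABench leaderboard values in the metadata report both on 0–100.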

## Intended uses & limitations

The model is fine-tuned on a specific task and should be used for the same or a similar task. Any limitations present in the base model are inherited.
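
For WebNLG-style data-to-text generation, the model takes a linearised set of RDF triples as input and generates Maltese text. The exact input format used by the authors' fine-tuning script is not documented in this card, so the flat `subject | predicate | object` linearisation below (and the repository id in the comments) are assumptions for illustration only:

```python
def linearize_triples(triples):
    """Join (subject, predicate, object) triples into one source string.

    Hypothetical format: fields separated by " | ", triples by " && ".
    """
    return " && ".join(" | ".join(triple) for triple in triples)


source = linearize_triples([
    ("Valletta", "capitalOf", "Malta"),
    ("Malta", "language", "Maltese"),
])

# Generation with the fine-tuned checkpoint (repository id assumed;
# requires the gated-access form on the model page to be completed):
#
# from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# tokenizer = AutoTokenizer.from_pretrained("MLRS/mt5-small_webnlg-mlt")
# model = AutoModelForSeq2SeqLM.from_pretrained("MLRS/mt5-small_webnlg-mlt")
# inputs = tokenizer(source, return_tensors="pt")
# outputs = model.generate(**inputs, max_new_tokens=64)
# print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```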

## Training procedure

The model was fine-tuned using a customised script.

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20
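
Since the customised training script is not reproduced here, the following is a hedged reconstruction of how these hyperparameters map onto the Hugging Face `Trainer` API; anything not listed in the card (output directory, evaluation and save strategies, best-model selection) is an assumption:

```python
from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_webnlg-mlt",   # assumed
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",               # assumed: required for early stopping
    save_strategy="epoch",               # assumed
    load_best_model_at_end=True,         # assumed: pairs with early stopping
    predict_with_generate=True,
)

# Passed to Seq2SeqTrainer via callbacks=[...] to stop after 20 epochs
# without improvement (matching early_stopping_patience above):
early_stopping = EarlyStoppingCallback(early_stopping_patience=20)
```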

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Chrf Score | Chrf Char Order | Chrf Word Order | Chrf Beta | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:----------:|:---------------:|:---------------:|:---------:|:------:|:------:|:------:|:---------:|:-------:|
| No log        | 1.0   | 413   | 1.9425          | 36.6472    | 6               | 0               | 2         | 0.4422 | 0.2384 | 0.3786 | 0.3786    | 41.7718 |
| 2.9022        | 2.0   | 826   | 1.7744          | 39.3892    | 6               | 0               | 2         | 0.4914 | 0.2853 | 0.4246 | 0.4246    | 32.6222 |
| 1.1948        | 3.0   | 1239  | 1.7101          | 40.9010    | 6               | 0               | 2         | 0.5116 | 0.3069 | 0.4444 | 0.4445    | 30.5916 |
| 0.9411        | 4.0   | 1652  | 1.6656          | 41.7312    | 6               | 0               | 2         | 0.5228 | 0.3138 | 0.4521 | 0.4522    | 29.4324 |
| 0.8059        | 5.0   | 2065  | 1.7050          | 43.6392    | 6               | 0               | 2         | 0.5394 | 0.3266 | 0.4638 | 0.4638    | 31.0360 |
| 0.8059        | 6.0   | 2478  | 1.7013          | 45.7818    | 6               | 0               | 2         | 0.5490 | 0.3303 | 0.4685 | 0.4689    | 34.6811 |
| 0.7092        | 7.0   | 2891  | 1.7480          | 45.4992    | 6               | 0               | 2         | 0.5507 | 0.3378 | 0.4716 | 0.4716    | 32.6366 |
| 0.6343        | 8.0   | 3304  | 1.7694          | 46.6990    | 6               | 0               | 2         | 0.5574 | 0.3406 | 0.4767 | 0.4769    | 32.9538 |
| 0.5849        | 9.0   | 3717  | 1.8058          | 46.1749    | 6               | 0               | 2         | 0.5548 | 0.3394 | 0.4747 | 0.4751    | 32.9459 |
| 0.5417        | 10.0  | 4130  | 1.8047          | 45.7135    | 6               | 0               | 2         | 0.5525 | 0.3340 | 0.4731 | 0.4734    | 32.3598 |
| 0.506         | 11.0  | 4543  | 1.8555          | 45.2631    | 6               | 0               | 2         | 0.5511 | 0.3357 | 0.4740 | 0.4745    | 30.5940 |
| 0.506         | 12.0  | 4956  | 1.9072          | 48.1670    | 6               | 0               | 2         | 0.5647 | 0.3436 | 0.4779 | 0.4779    | 35.5598 |
| 0.4679        | 13.0  | 5369  | 1.8842          | 46.5682    | 6               | 0               | 2         | 0.5601 | 0.3440 | 0.4786 | 0.4786    | 32.7610 |
| 0.4355        | 14.0  | 5782  | 1.9549          | 45.8614    | 6               | 0               | 2         | 0.5570 | 0.3418 | 0.4765 | 0.4766    | 31.9219 |
| 0.4132        | 15.0  | 6195  | 2.0120          | 46.3608    | 6               | 0               | 2         | 0.5589 | 0.3433 | 0.4785 | 0.4785    | 31.5231 |
| 0.3921        | 16.0  | 6608  | 1.9967          | 47.3205    | 6               | 0               | 2         | 0.5629 | 0.3460 | 0.4799 | 0.4800    | 33.4625 |
| 0.3702        | 17.0  | 7021  | 2.0298          | 46.2312    | 6               | 0               | 2         | 0.5558 | 0.3375 | 0.4715 | 0.4717    | 32.0348 |
| 0.3702        | 18.0  | 7434  | 2.0882          | 47.4461    | 6               | 0               | 2         | 0.5645 | 0.3450 | 0.4780 | 0.4780    | 33.7477 |
| 0.3447        | 19.0  | 7847  | 2.0836          | 48.3709    | 6               | 0               | 2         | 0.5683 | 0.3471 | 0.4774 | 0.4774    | 34.9514 |
| 0.3259        | 20.0  | 8260  | 2.1483          | 47.2591    | 6               | 0               | 2         | 0.5662 | 0.3468 | 0.4788 | 0.4790    | 32.8258 |
| 0.314         | 21.0  | 8673  | 2.1717          | 47.1720    | 6               | 0               | 2         | 0.5619 | 0.3424 | 0.4774 | 0.4775    | 32.9495 |
| 0.296         | 22.0  | 9086  | 2.1921          | 47.8603    | 6               | 0               | 2         | 0.5706 | 0.3494 | 0.4835 | 0.4838    | 33.9309 |
| 0.296         | 23.0  | 9499  | 2.2782          | 47.4664    | 6               | 0               | 2         | 0.5647 | 0.3449 | 0.4774 | 0.4776    | 33.2060 |
| 0.2845        | 24.0  | 9912  | 2.2365          | 47.7147    | 6               | 0               | 2         | 0.5633 | 0.3448 | 0.4767 | 0.4767    | 33.8763 |
| 0.264         | 25.0  | 10325 | 2.3044          | 46.6542    | 6               | 0               | 2         | 0.5577 | 0.3387 | 0.4706 | 0.4706    | 32.8595 |
| 0.2523        | 26.0  | 10738 | 2.2961          | 48.6373    | 6               | 0               | 2         | 0.5696 | 0.3476 | 0.4796 | 0.4797    | 34.8505 |
| 0.2432        | 27.0  | 11151 | 2.3465          | 48.0798    | 6               | 0               | 2         | 0.5639 | 0.3417 | 0.4765 | 0.4767    | 34.2979 |
| 0.2342        | 28.0  | 11564 | 2.3723          | 46.5735    | 6               | 0               | 2         | 0.5581 | 0.3394 | 0.4755 | 0.4755    | 32.2901 |
| 0.2342        | 29.0  | 11977 | 2.4377          | 47.8037    | 6               | 0               | 2         | 0.5661 | 0.3445 | 0.4767 | 0.4770    | 33.9459 |
| 0.2213        | 30.0  | 12390 | 2.4408          | 47.6035    | 6               | 0               | 2         | 0.5604 | 0.3390 | 0.4738 | 0.4735    | 33.9045 |
| 0.209         | 31.0  | 12803 | 2.4824          | 47.9566    | 6               | 0               | 2         | 0.5636 | 0.3438 | 0.4752 | 0.4753    | 33.9045 |
| 0.2009        | 32.0  | 13216 | 2.5603          | 48.2374    | 6               | 0               | 2         | 0.5661 | 0.3438 | 0.4750 | 0.4750    | 34.2378 |
| 0.1928        | 33.0  | 13629 | 2.5011          | 47.6750    | 6               | 0               | 2         | 0.5630 | 0.3417 | 0.4749 | 0.4753    | 34.1279 |
| 0.1876        | 34.0  | 14042 | 2.5800          | 48.1924    | 6               | 0               | 2         | 0.5617 | 0.3373 | 0.4712 | 0.4710    | 34.8667 |
| 0.1876        | 35.0  | 14455 | 2.6025          | 49.7077    | 6               | 0               | 2         | 0.5739 | 0.3489 | 0.4783 | 0.4786    | 36.3231 |
| 0.1756        | 36.0  | 14868 | 2.6041          | 48.9179    | 6               | 0               | 2         | 0.5656 | 0.3397 | 0.4726 | 0.4726    | 35.8432 |
| 0.1683        | 37.0  | 15281 | 2.6548          | 48.8265    | 6               | 0               | 2         | 0.5680 | 0.3416 | 0.4776 | 0.4777    | 34.9946 |
| 0.1622        | 38.0  | 15694 | 2.6819          | 49.3948    | 6               | 0               | 2         | 0.5709 | 0.3458 | 0.4795 | 0.4794    | 36.3520 |
| 0.1573        | 39.0  | 16107 | 2.7615          | 48.7379    | 6               | 0               | 2         | 0.5662 | 0.3400 | 0.4721 | 0.4723    | 35.6745 |
| 0.1516        | 40.0  | 16520 | 2.7286          | 49.0554    | 6               | 0               | 2         | 0.5679 | 0.3446 | 0.4757 | 0.4758    | 36.1453 |
| 0.1516        | 41.0  | 16933 | 2.7290          | 49.3973    | 6               | 0               | 2         | 0.5677 | 0.3424 | 0.4740 | 0.4739    | 37.0631 |
| 0.1437        | 42.0  | 17346 | 2.8045          | 47.3914    | 6               | 0               | 2         | 0.5601 | 0.3371 | 0.4692 | 0.4690    | 33.9021 |

### Framework versions

- Transformers 4.48.2
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0

## License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.

[CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)

## Citation

This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://aclanthology.org/2025.findings-acl.1053/). Cite it as follows:

```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```