library_name: transformers
language:
  - mt
  - en
license: cc-by-nc-sa-4.0
base_model: google/mt5-small
datasets:
  - MLRS/OPUS-MT-EN-Fixed
model-index:
  - name: mt5-small_opus100-eng-mlt
    results:
      - task:
          type: machine-translation
          name: Machine Translation
        dataset:
          type: opus100-eng-mlt
          name: MLRS/OPUS-MT-EN-Fixed
        metrics:
          - type: bleu
            value: 51
            name: BLEU
          - type: chrf
            value: 75.4
            name: ChrF
        source:
          name: MELABench Leaderboard
          url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance with the license and for non-commercial use ONLY: checkbox

mT5-Small (OPUS-100 English→Maltese)

This model is a fine-tuned version of google/mt5-small on the MLRS/OPUS-MT-EN-Fixed dataset. It achieves the following results on the test set:

  • Loss: 0.5395
  • BLEU: 0.5175
    • Brevity Penalty: 0.9870
    • Length Ratio: 0.9871
    • Translation Length: 41331
    • Reference Length: 41873
  • ChrF: 75.9261
    • Char Order: 6
    • Word Order: 0
    • Beta: 2
  • Gen Len: 51.54
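The brevity penalty and length ratio reported above follow directly from the translation and reference lengths, via the standard BLEU brevity-penalty formula (BP = exp(1 − ref/hyp) when the hypothesis is shorter than the reference). A quick sanity check:

```python
import math

# Totals from the test-set results above
translation_len = 41331  # generated tokens
reference_len = 41873    # reference tokens

length_ratio = translation_len / reference_len

# Standard BLEU brevity penalty: 1 if hyp >= ref, else exp(1 - ref/hyp)
if translation_len >= reference_len:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_len / translation_len)

print(round(length_ratio, 4))     # matches the reported 0.9871
print(round(brevity_penalty, 4))  # matches the reported 0.9870
```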

Intended uses & limitations

The model is fine-tuned for a specific task and should only be used for that task or a closely related one. Any limitations present in the base model are inherited.
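A minimal inference sketch with the Transformers library. The Hub id `MLRS/mt5-small_opus100-eng-mlt` is assumed from the model name above (adjust if different), no task prefix is assumed, and the generation settings (`num_beams`, `max_new_tokens`) are illustrative rather than the values used for the reported scores:

```python
import os
from typing import Iterable, List

MODEL_ID = "MLRS/mt5-small_opus100-eng-mlt"  # assumed Hub id; adjust if different

def batched(texts: List[str], batch_size: int) -> Iterable[List[str]]:
    """Yield fixed-size batches so long input lists fit in memory."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

# Set RUN_TRANSLATION=1 to actually download the checkpoint and translate.
if os.environ.get("RUN_TRANSLATION") == "1":
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    english = ["The weather is beautiful today."]
    for batch in batched(english, batch_size=8):
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
        print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```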

Training procedure

The model was fine-tuned using a customised script.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adafactor (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 10.0

Training results

| Training Loss | Epoch | Step | Validation Loss | BLEU | Brevity Penalty | Length Ratio | Translation Length | Reference Length | ChrF | Char Order | Word Order | Beta | Gen Len |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7973 | 1.0 | 7813 | 0.7208 | 0.4470 | 0.9888 | 0.9889 | 43489 | 43979 | 71.3941 | 6 | 0 | 2 | 54.3435 |
| 0.6534 | 2.0 | 15626 | 0.6406 | 0.4712 | 0.9931 | 0.9931 | 43675 | 43979 | 72.7865 | 6 | 0 | 2 | 54.1575 |
| 0.5785 | 3.0 | 23439 | 0.6027 | 0.4804 | 0.9937 | 0.9937 | 43703 | 43979 | 73.6939 | 6 | 0 | 2 | 54.501 |
| 0.5336 | 4.0 | 31252 | 0.5779 | 0.4900 | 0.9937 | 0.9937 | 43704 | 43979 | 74.1543 | 6 | 0 | 2 | 54.535 |
| 0.5034 | 5.0 | 39065 | 0.5617 | 0.4995 | 1.0 | 1.0004 | 43998 | 43979 | 74.7266 | 6 | 0 | 2 | 54.694 |
| 0.4797 | 6.0 | 46878 | 0.5501 | 0.4985 | 0.9897 | 0.9898 | 43530 | 43979 | 74.7707 | 6 | 0 | 2 | 54.3215 |
| 0.4576 | 7.0 | 54691 | 0.5458 | 0.5050 | 0.9921 | 0.9921 | 43632 | 43979 | 75.0066 | 6 | 0 | 2 | 54.259 |
| 0.4424 | 8.0 | 62504 | 0.5369 | 0.5062 | 0.9914 | 0.9915 | 43604 | 43979 | 75.0734 | 6 | 0 | 2 | 54.286 |
| 0.4287 | 9.0 | 70317 | 0.5358 | 0.5107 | 0.9875 | 0.9876 | 43434 | 43979 | 75.3841 | 6 | 0 | 2 | 54.1655 |
| 0.417 | 9.9988 | 78120 | 0.5350 | 0.5100 | 0.9868 | 0.9869 | 43404 | 43979 | 75.4033 | 6 | 0 | 2 | 54.058 |
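One detail worth noting in these results: validation loss improves every epoch, but validation BLEU peaks at epoch 9 (0.5107) rather than at the final checkpoint, so loss alone is not a reliable checkpoint-selection criterion here. A short sketch extracting both trends from the numbers above:

```python
# (validation_loss, bleu) per epoch, copied from the training results
history = [
    (0.7208, 0.4470), (0.6406, 0.4712), (0.6027, 0.4804), (0.5779, 0.4900),
    (0.5617, 0.4995), (0.5501, 0.4985), (0.5458, 0.5050), (0.5369, 0.5062),
    (0.5358, 0.5107), (0.5350, 0.5100),
]
losses = [loss for loss, _ in history]
bleus = [bleu for _, bleu in history]

# Validation loss decreases monotonically across all ten epochs...
assert all(a > b for a, b in zip(losses, losses[1:]))

# ...but the best BLEU comes one epoch before the end.
best_epoch = max(range(len(bleus)), key=bleus.__getitem__) + 1
print(best_epoch)  # 9
```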

Framework versions

  • Transformers 4.48.1
  • PyTorch 2.4.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.


Citation

This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}