---
library_name: transformers
language:
- mt
- en
license: cc-by-nc-sa-4.0
base_model: google/mt5-small
datasets:
- MLRS/OPUS-MT-EN-Fixed
model-index:
- name: mt5-small_opus100-eng-mlt
  results:
  - task:
      type: machine-translation
      name: Machine Translation
    dataset:
      type: opus100-eng-mlt
      name: MLRS/OPUS-MT-EN-Fixed
    metrics:
    - type: bleu
      value: 51.00
      name: BLEU
    - type: chrf
      value: 75.40
      name: ChrF
    source:
      name: MELABench Leaderboard
      url: https://huggingface.co/spaces/MLRS/MELABench
extra_gated_fields:
  Name: text
  Surname: text
  Date of Birth: date_picker
  Organisation: text
  Country: country
  I agree to use this model in accordance with the license and for non-commercial use ONLY: checkbox
---

# mT5-Small (OPUS-100 English→Maltese)

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the [MLRS/OPUS-MT-EN-Fixed](https://huggingface.co/datasets/MLRS/OPUS-MT-EN-Fixed) dataset.
It achieves the following results on the test set:
- Loss: 0.5395
- BLEU: 0.5175
  - Brevity Penalty: 0.9870
  - Length Ratio: 0.9871
  - Translation Length: 41331
  - Reference Length: 41873
- ChrF: 75.9261
  - Char Order: 6
  - Word Order: 0
  - Beta: 2
- Gen Len: 51.54

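The brevity penalty and length ratio above follow the standard corpus-level BLEU definitions, so they can be re-derived from the reported translation and reference lengths alone. A quick sanity check (pure Python, no external libraries):

```python
import math

# Corpus-level lengths reported in the test-set results above.
translation_length = 41331
reference_length = 41873

# Length ratio: hypothesis length divided by reference length.
length_ratio = translation_length / reference_length  # ~0.9871, as reported

# Brevity penalty (standard BLEU definition): 1 if the hypothesis is
# longer than the reference, otherwise exp(1 - ref_len / hyp_len).
if translation_length > reference_length:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_length / translation_length)
# ~0.98697, matching the reported 0.9870

print(length_ratio, brevity_penalty)
```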
## Intended uses & limitations

The model is fine-tuned on a specific task, so it should only be used for the same or a similar task.
Any limitations present in the base model are inherited.

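A minimal inference sketch with 🤗 Transformers is shown below. The repository id is an assumption inferred from the model name and the MLRS organisation, and no source-side task prefix is assumed; since the model is gated, you must accept the terms on the model page and authenticate before downloading.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical repository id, inferred from the model name above.
model_id = "MLRS/mt5-small_opus100-eng-mlt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate one English sentence into Maltese.
inputs = tokenizer("The weather is beautiful today.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```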
## Training procedure

The model was fine-tuned using a customised [script](https://github.com/MLRS/MELABench/blob/main/finetuning/run_seq2seq.py).

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 10.0

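As an illustrative sketch, these settings map onto 🤗 Transformers `Seq2SeqTrainingArguments` roughly as follows. Argument names follow recent Transformers releases; `output_dir` is a placeholder, and the actual fine-tuning script may configure things differently.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the hyperparameters listed above.
args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_opus100-eng-mlt",  # placeholder
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,  # 32 x 4 = 128 total train batch size
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=10.0,
    predict_with_generate=True,
)
```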
### Training results

| Training Loss | Epoch | Step | Validation Loss | BLEU | Brevity Penalty | Length Ratio | Translation Length | Reference Length | ChrF | Char Order | Word Order | Beta | Gen Len |
|:-------------:|:------:|:-----:|:---------------:|:------:|:---------------:|:------------:|:------------------:|:----------------:|:-------:|:----------:|:----------:|:----:|:-------:|
| 0.7973 | 1.0 | 7813 | 0.7208 | 0.4470 | 0.9888 | 0.9889 | 43489 | 43979 | 71.3941 | 6 | 0 | 2 | 54.3435 |
| 0.6534 | 2.0 | 15626 | 0.6406 | 0.4712 | 0.9931 | 0.9931 | 43675 | 43979 | 72.7865 | 6 | 0 | 2 | 54.1575 |
| 0.5785 | 3.0 | 23439 | 0.6027 | 0.4804 | 0.9937 | 0.9937 | 43703 | 43979 | 73.6939 | 6 | 0 | 2 | 54.501 |
| 0.5336 | 4.0 | 31252 | 0.5779 | 0.4900 | 0.9937 | 0.9937 | 43704 | 43979 | 74.1543 | 6 | 0 | 2 | 54.535 |
| 0.5034 | 5.0 | 39065 | 0.5617 | 0.4995 | 1.0 | 1.0004 | 43998 | 43979 | 74.7266 | 6 | 0 | 2 | 54.694 |
| 0.4797 | 6.0 | 46878 | 0.5501 | 0.4985 | 0.9897 | 0.9898 | 43530 | 43979 | 74.7707 | 6 | 0 | 2 | 54.3215 |
| 0.4576 | 7.0 | 54691 | 0.5458 | 0.5050 | 0.9921 | 0.9921 | 43632 | 43979 | 75.0066 | 6 | 0 | 2 | 54.259 |
| 0.4424 | 8.0 | 62504 | 0.5369 | 0.5062 | 0.9914 | 0.9915 | 43604 | 43979 | 75.0734 | 6 | 0 | 2 | 54.286 |
| 0.4287 | 9.0 | 70317 | 0.5358 | 0.5107 | 0.9875 | 0.9876 | 43434 | 43979 | 75.3841 | 6 | 0 | 2 | 54.1655 |
| 0.417 | 9.9988 | 78120 | 0.5350 | 0.5100 | 0.9868 | 0.9869 | 43404 | 43979 | 75.4033 | 6 | 0 | 2 | 54.058 |

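The ChrF configuration reported here (char order 6, word order 0, beta 2) corresponds to sacreBLEU's default chrF metric. Assuming `sacrebleu` is installed, the metric can be reproduced as in this sketch; the example sentences are placeholders, not outputs of this model.

```python
from sacrebleu.metrics import CHRF

# chrF with the reported configuration: character n-grams up to order 6,
# no word n-grams, beta = 2 (sacreBLEU's defaults).
chrf = CHRF(char_order=6, word_order=0, beta=2)

hypotheses = ["a system translation"]        # placeholder system output
references = [["a reference translation"]]   # placeholder reference(s)
print(chrf.corpus_score(hypotheses, references))
```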

### Framework versions

- Transformers 4.48.1
- PyTorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0

## License

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://arxiv.org/abs/2506.04385).
Cite it as follows:

```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt and
      Borg, Claudia",
    editor = "Che, Wanxiang and
      Nabende, Joyce and
      Shutova, Ekaterina and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```