d7f1049a2fa34b32d9e1ee4cc7938efb

This model is a fine-tuned version of google/mt5-large on the Helsinki-NLP/opus_books [en-sv] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5929
  • Data Size: 1.0
  • Epoch Runtime: 39.9771
  • Bleu: 11.4174
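
Since the card does not yet include a usage snippet, below is a minimal inference sketch with the transformers library. The repo id is taken from this page; whether the model expects a task prefix on the English input depends on how the training inputs were formatted, which is not documented here.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repo id of this fine-tuned checkpoint (from this model card).
model_id = "contemmcm/d7f1049a2fa34b32d9e1ee4cc7938efb"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# English source sentence; a Swedish translation is generated.
text = "The sun was setting over the sea."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```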

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
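
Pending fuller documentation, the dataset named at the top of this card can be loaded as sketched below. This assumes the standard en-sv configuration; opus_books ships only a train split, so the exact train/eval split used for this model is unknown.

```python
from datasets import load_dataset

# en-sv configuration of the OPUS Books corpus named in this card.
# opus_books provides only a "train" split, so the train/eval split
# used for this model is left open here.
dataset = load_dataset("Helsinki-NLP/opus_books", "en-sv")
print(dataset["train"][0]["translation"])  # {'en': '...', 'sv': '...'}
```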

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
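
For reference, these values map onto Seq2SeqTrainingArguments roughly as in the sketch below. This is not the exact training script: the output_dir is hypothetical, and the per-device batch size of 8 across 4 GPUs yields the total batch size of 32 reported above.

```python
from transformers import Seq2SeqTrainingArguments

# A sketch of the reported hyperparameters; output_dir is hypothetical.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-opus-books-en-sv",
    learning_rate=5e-05,
    per_device_train_batch_size=8,  # x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",            # betas/epsilon are the defaults listed above
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,     # assumption: needed to compute BLEU at eval time
)
```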

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:-------:|
| No log        | 0     | 0    | 23.1069         | 0         | 3.4459        | 0.0073  |
| No log        | 1     | 77   | 21.9427         | 0.0078    | 3.5749        | 0.0077  |
| No log        | 2     | 154  | 18.2021         | 0.0156    | 7.0180        | 0.0091  |
| No log        | 3     | 231  | 15.9658         | 0.0312    | 9.9569        | 0.0103  |
| No log        | 4     | 308  | 13.2615         | 0.0625    | 12.5620       | 0.0135  |
| No log        | 5     | 385  | 6.2569          | 0.125     | 15.3582       | 0.0250  |
| 1.3066        | 6     | 462  | 3.6604          | 0.25      | 17.7817       | 0.2901  |
| 1.9396        | 7     | 539  | 2.5102          | 0.5       | 26.2846       | 0.8960  |
| 3.0753        | 8     | 616  | 2.0290          | 1.0       | 44.3548       | 1.2903  |
| 2.6545        | 9     | 693  | 1.8167          | 1.0       | 39.0566       | 12.6798 |
| 2.2231        | 10    | 770  | 1.7058          | 1.0       | 40.3304       | 8.1665  |
| 2.1147        | 11    | 847  | 1.6617          | 1.0       | 39.6470       | 9.2358  |
| 1.9042        | 12    | 924  | 1.6281          | 1.0       | 39.8972       | 10.1124 |
| 1.7797        | 13    | 1001 | 1.6121          | 1.0       | 39.0526       | 10.5262 |
| 1.706         | 14    | 1078 | 1.5885          | 1.0       | 40.1050       | 10.9272 |
| 1.6053        | 15    | 1155 | 1.5839          | 1.0       | 39.1444       | 11.0332 |
| 1.5345        | 16    | 1232 | 1.5768          | 1.0       | 40.3075       | 11.1244 |
| 1.4472        | 17    | 1309 | 1.5833          | 1.0       | 39.0605       | 11.3432 |
| 1.3965        | 18    | 1386 | 1.5863          | 1.0       | 39.8835       | 11.3031 |
| 1.3086        | 19    | 1463 | 1.5860          | 1.0       | 40.2195       | 11.4865 |
| 1.2804        | 20    | 1540 | 1.5929          | 1.0       | 39.9771       | 11.4174 |
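
The Bleu column is presumably a corpus-level BLEU score on the evaluation set. A minimal sketch of computing such a score with the evaluate library (sacrebleu backend), using illustrative strings rather than the actual evaluation data:

```python
import evaluate

sacrebleu = evaluate.load("sacrebleu")

# Illustrative strings only; the actual evaluation set is not shown in this card.
predictions = ["Solen gick ner över havet."]
references = [["Solen höll på att gå ner över havet."]]

result = sacrebleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # corpus BLEU, same scale as the table above
```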

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1