---
library_name: transformers
license: apache-2.0
base_model: google/umt5-small
tags:
  - generated_from_trainer
metrics:
  - bleu
model-index:
  - name: 1bd9cceed2dcd10f2ece1070a2e20a3c
    results: []
---

1bd9cceed2dcd10f2ece1070a2e20a3c

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [es-it] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.8072
  • Data Size: 1.0
  • Epoch Runtime: 112.5041
  • Bleu: 4.3082
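
As a quick usage sketch, the checkpoint can be loaded with the standard `transformers` Seq2Seq API. The Hub repo id below is an assumption (the card only states the model name), so substitute the actual path before running:

```python
# Assumed Hub path -- replace with the real repo id of this checkpoint.
MODEL_ID = "contemmcm/1bd9cceed2dcd10f2ece1070a2e20a3c"

def translate(text: str, model_id: str = MODEL_ID, max_new_tokens: int = 64) -> str:
    """Greedy Spanish -> Italian translation with the fine-tuned umt5-small."""
    # Deferred import: loading the model requires network access and weights.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(translate("El libro está sobre la mesa."))
```

Note that at BLEU ≈ 4.3 the translations are rough; treat the model as a fine-tuning baseline rather than a production translator.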

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
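
The per-device and total batch sizes above are consistent with each other and with the step counts in the results table below; a minimal sanity check (the 721 steps per epoch is read off the table):

```python
# Effective batch size: per-device batch size times number of GPUs.
per_device_train_batch = 8
num_devices = 4
total_train_batch = per_device_train_batch * num_devices  # 32, as listed above

# At full data size, each epoch adds 721 optimizer steps (e.g. 5768 -> 6489).
steps_per_epoch = 721
examples_per_epoch = steps_per_epoch * total_train_batch

print(total_train_batch)    # 32
print(examples_per_epoch)   # 23072 examples seen per epoch (last batch may be smaller)
```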

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 16.6425         | 0         | 10.0867       | 0.2928 |
| No log        | 1     | 721   | 16.3540         | 0.0078    | 10.9510       | 0.2893 |
| No log        | 2     | 1442  | 14.6128         | 0.0156    | 11.9953       | 0.3034 |
| 0.3458        | 3     | 2163  | 12.9897         | 0.0312    | 14.2539       | 0.3222 |
| 1.0187        | 4     | 2884  | 9.4280          | 0.0625    | 17.0721       | 0.3791 |
| 9.7765        | 5     | 3605  | 6.0200          | 0.125     | 23.5481       | 0.4979 |
| 6.4556        | 6     | 4326  | 4.5495          | 0.25      | 35.6383       | 1.6521 |
| 5.2523        | 7     | 5047  | 3.9789          | 0.5       | 61.2487       | 1.6179 |
| 4.508         | 8     | 5768  | 3.5314          | 1.0       | 112.2478      | 2.0289 |
| 4.1967        | 9     | 6489  | 3.3986          | 1.0       | 112.5871      | 2.3105 |
| 4.052         | 10    | 7210  | 3.3132          | 1.0       | 113.8064      | 2.5055 |
| 3.8898        | 11    | 7931  | 3.2623          | 1.0       | 112.9548      | 2.6756 |
| 3.8333        | 12    | 8652  | 3.2131          | 1.0       | 113.5531      | 2.7829 |
| 3.7702        | 13    | 9373  | 3.1821          | 1.0       | 113.0508      | 2.8880 |
| 3.6635        | 14    | 10094 | 3.1422          | 1.0       | 113.4677      | 3.0043 |
| 3.6578        | 15    | 10815 | 3.1133          | 1.0       | 113.1681      | 3.0899 |
| 3.5582        | 16    | 11536 | 3.0999          | 1.0       | 113.1700      | 3.1533 |
| 3.5449        | 17    | 12257 | 3.0735          | 1.0       | 114.2252      | 3.2124 |
| 3.5093        | 18    | 12978 | 3.0548          | 1.0       | 112.8411      | 3.2856 |
| 3.4384        | 19    | 13699 | 3.0419          | 1.0       | 113.2164      | 3.3314 |
| 3.4229        | 20    | 14420 | 3.0157          | 1.0       | 113.5167      | 3.3987 |
| 3.4119        | 21    | 15141 | 3.0014          | 1.0       | 113.0884      | 3.4310 |
| 3.3609        | 22    | 15862 | 2.9874          | 1.0       | 113.3006      | 3.5151 |
| 3.2723        | 23    | 16583 | 2.9811          | 1.0       | 114.8710      | 3.5543 |
| 3.2748        | 24    | 17304 | 2.9645          | 1.0       | 114.0400      | 3.6138 |
| 3.2806        | 25    | 18025 | 2.9625          | 1.0       | 113.1700      | 3.6308 |
| 3.2696        | 26    | 18746 | 2.9382          | 1.0       | 113.3355      | 3.6929 |
| 3.2254        | 27    | 19467 | 2.9330          | 1.0       | 112.4022      | 3.6982 |
| 3.2108        | 28    | 20188 | 2.9252          | 1.0       | 113.1494      | 3.7675 |
| 3.1536        | 29    | 20909 | 2.9150          | 1.0       | 113.0551      | 3.8057 |
| 3.1271        | 30    | 21630 | 2.9039          | 1.0       | 113.0676      | 3.8281 |
| 3.1324        | 31    | 22351 | 2.9001          | 1.0       | 113.4059      | 3.8688 |
| 3.1245        | 32    | 23072 | 2.8917          | 1.0       | 114.1657      | 3.9119 |
| 3.0853        | 33    | 23793 | 2.8821          | 1.0       | 113.8688      | 3.9384 |
| 3.025         | 34    | 24514 | 2.8809          | 1.0       | 113.4756      | 3.9585 |
| 3.0303        | 35    | 25235 | 2.8723          | 1.0       | 112.4681      | 3.9852 |
| 3.0046        | 36    | 25956 | 2.8594          | 1.0       | 113.2854      | 3.9970 |
| 2.9943        | 37    | 26677 | 2.8579          | 1.0       | 113.8893      | 4.0160 |
| 2.9874        | 38    | 27398 | 2.8528          | 1.0       | 112.7151      | 4.0269 |
| 2.9358        | 39    | 28119 | 2.8503          | 1.0       | 113.6051      | 4.0450 |
| 2.9332        | 40    | 28840 | 2.8432          | 1.0       | 112.3515      | 4.0958 |
| 2.9513        | 41    | 29561 | 2.8370          | 1.0       | 113.1157      | 4.1324 |
| 2.9465        | 42    | 30282 | 2.8293          | 1.0       | 112.8311      | 4.1777 |
| 2.8816        | 43    | 31003 | 2.8295          | 1.0       | 114.5790      | 4.1632 |
| 2.8867        | 44    | 31724 | 2.8162          | 1.0       | 114.1613      | 4.1918 |
| 2.8684        | 45    | 32445 | 2.8202          | 1.0       | 112.4990      | 4.2068 |
| 2.8588        | 46    | 33166 | 2.8130          | 1.0       | 113.9478      | 4.2499 |
| 2.817         | 47    | 33887 | 2.8068          | 1.0       | 113.8720      | 4.2553 |
| 2.8057        | 48    | 34608 | 2.8122          | 1.0       | 112.9857      | 4.2949 |
| 2.8197        | 49    | 35329 | 2.8053          | 1.0       | 112.6193      | 4.3000 |
| 2.8217        | 50    | 36050 | 2.8072          | 1.0       | 112.5041      | 4.3082 |
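
A short summary of the run, computed from the first and last rows of the table above:

```python
# End-to-end change over the 50 epochs, taken from the results table.
first_val_loss, last_val_loss = 16.6425, 2.8072
first_bleu, last_bleu = 0.2928, 4.3082

loss_reduction = (first_val_loss - last_val_loss) / first_val_loss
bleu_gain = last_bleu - first_bleu

print(f"validation loss fell by {loss_reduction:.1%}")  # roughly 83%
print(f"BLEU improved by {bleu_gain:.4f} points")       # 4.0154
```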

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1