ac46bfb05ca97f2c3c0d5d441dc1f320

This model is a fine-tuned version of google/umt5-small on the fi-no configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 3.3014
  • Data Size: 1.0 (fraction of the training data used)
  • Epoch Runtime: 15.5650 s
  • BLEU: 3.2782
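
A minimal inference sketch, assuming the checkpoint is published under the repo id shown on this page (contemmcm/ac46bfb05ca97f2c3c0d5d441dc1f320), that the translation direction is Finnish → Norwegian, and that no task prefix is needed; the example sentence and generation settings are illustrative:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id as shown on this model page (assumption: the checkpoint is public).
model_id = "contemmcm/ac46bfb05ca97f2c3c0d5d441dc1f320"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Finnish input; the fi -> no direction is assumed from the dataset pair.
inputs = tokenizer("Hyvää huomenta!", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```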

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
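
The summary above names the training corpus; below is a minimal sketch for loading it with the datasets library. The split name and the translation field layout follow the standard opus_books format and are assumptions, since the train/eval split used here is not documented:

```python
from datasets import load_dataset

# fi-no pair of OPUS Books, as named in the model summary above.
books = load_dataset("Helsinki-NLP/opus_books", "fi-no")
example = books["train"][0]
# Expected layout (assumption): {"id": "0", "translation": {"fi": "...", "no": "..."}}
print(example["translation"])
```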

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
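
A sketch of how the list above maps onto Seq2SeqTrainingArguments; output_dir, predict_with_generate, and any setting not listed are assumptions. Note that a per-device batch size of 8 across 4 GPUs yields the reported total batch size of 32:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-fi-no",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # x4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumed; needed to report BLEU during eval
)
```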

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | BLEU |
|---------------|-------|------|-----------------|-----------|-------------------|--------|
| No log        | 0     | 0    | 20.4846         | 0         | 1.8456            | 0.0471 |
| No log        | 1     | 85   | 20.0629         | 0.0078    | 2.0750            | 0.0410 |
| No log        | 2     | 170  | 19.3595         | 0.0156    | 2.6262            | 0.0541 |
| No log        | 3     | 255  | 18.7857         | 0.0312    | 3.0738            | 0.0694 |
| No log        | 4     | 340  | 17.2868         | 0.0625    | 3.7254            | 0.0387 |
| 1.2802        | 5     | 425  | 15.4411         | 0.125     | 4.6881            | 0.0419 |
| 1.2802        | 6     | 510  | 13.1572         | 0.25      | 6.2857            | 0.0545 |
| 4.8141        | 7     | 595  | 9.9638          | 0.5       | 9.6160            | 0.0748 |
| 10.6633       | 8     | 680  | 6.9914          | 1.0       | 16.4476           | 0.1602 |
| 7.6292        | 9     | 765  | 5.3567          | 1.0       | 14.2130           | 0.4702 |
| 6.5098        | 10    | 850  | 4.8971          | 1.0       | 14.5216           | 1.1747 |
| 6.2063        | 11    | 935  | 4.6364          | 1.0       | 15.0752           | 1.0439 |
| 5.7514        | 12    | 1020 | 4.4694          | 1.0       | 14.9287           | 0.9951 |
| 5.4377        | 13    | 1105 | 4.2971          | 1.0       | 15.0105           | 1.2044 |
| 5.3090        | 14    | 1190 | 4.1205          | 1.0       | 15.5370           | 2.0124 |
| 5.0263        | 15    | 1275 | 3.9475          | 1.0       | 14.3123           | 1.3760 |
| 4.8406        | 16    | 1360 | 3.8006          | 1.0       | 13.9997           | 1.6711 |
| 4.7169        | 17    | 1445 | 3.7074          | 1.0       | 14.0292           | 1.9271 |
| 4.5842        | 18    | 1530 | 3.6490          | 1.0       | 14.2387           | 1.9551 |
| 4.4437        | 19    | 1615 | 3.6177          | 1.0       | 15.4691           | 2.1789 |
| 4.3329        | 20    | 1700 | 3.5798          | 1.0       | 15.1585           | 2.2025 |
| 4.2839        | 21    | 1785 | 3.5639          | 1.0       | 15.2358           | 2.2920 |
| 4.2330        | 22    | 1870 | 3.5289          | 1.0       | 14.3394           | 2.3262 |
| 4.1427        | 23    | 1955 | 3.5152          | 1.0       | 14.6519           | 2.3887 |
| 4.1117        | 24    | 2040 | 3.4927          | 1.0       | 14.9714           | 2.5290 |
| 4.0378        | 25    | 2125 | 3.4744          | 1.0       | 15.1210           | 2.6254 |
| 4.0233        | 26    | 2210 | 3.4548          | 1.0       | 15.5285           | 2.6719 |
| 3.9641        | 27    | 2295 | 3.4447          | 1.0       | 15.9118           | 2.6654 |
| 3.9449        | 28    | 2380 | 3.4381          | 1.0       | 15.5406           | 2.6565 |
| 3.8673        | 29    | 2465 | 3.4221          | 1.0       | 15.5655           | 2.7155 |
| 3.8012        | 30    | 2550 | 3.4152          | 1.0       | 14.2643           | 2.7333 |
| 3.8407        | 31    | 2635 | 3.4047          | 1.0       | 14.5685           | 2.7547 |
| 3.8055        | 32    | 2720 | 3.3925          | 1.0       | 14.4025           | 2.8800 |
| 3.7316        | 33    | 2805 | 3.3804          | 1.0       | 14.7314           | 2.8752 |
| 3.7355        | 34    | 2890 | 3.3729          | 1.0       | 14.5378           | 2.9105 |
| 3.6632        | 35    | 2975 | 3.3657          | 1.0       | 15.2909           | 2.9530 |
| 3.6315        | 36    | 3060 | 3.3596          | 1.0       | 15.1494           | 2.9893 |
| 3.6343        | 37    | 3145 | 3.3544          | 1.0       | 15.5905           | 3.1070 |
| 3.5968        | 38    | 3230 | 3.3444          | 1.0       | 14.3635           | 2.9994 |
| 3.6004        | 39    | 3315 | 3.3377          | 1.0       | 14.2375           | 3.0890 |
| 3.5587        | 40    | 3400 | 3.3384          | 1.0       | 14.3461           | 3.0214 |
| 3.5362        | 41    | 3485 | 3.3334          | 1.0       | 14.8068           | 3.0894 |
| 3.4851        | 42    | 3570 | 3.3233          | 1.0       | 15.2558           | 3.1220 |
| 3.4934        | 43    | 3655 | 3.3145          | 1.0       | 14.9720           | 3.1507 |
| 3.4576        | 44    | 3740 | 3.3182          | 1.0       | 15.4616           | 3.1191 |
| 3.4556        | 45    | 3825 | 3.3177          | 1.0       | 15.9163           | 3.1289 |
| 3.4143        | 46    | 3910 | 3.3114          | 1.0       | 14.9332           | 3.1962 |
| 3.3592        | 47    | 3995 | 3.3054          | 1.0       | 14.9497           | 3.2189 |
| 3.3725        | 48    | 4080 | 3.2991          | 1.0       | 14.9902           | 3.2019 |
| 3.3191        | 49    | 4165 | 3.2964          | 1.0       | 15.7867           | 3.1985 |
| 3.3174        | 50    | 4250 | 3.3014          | 1.0       | 15.5650           | 3.2782 |
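
The card does not state how BLEU was computed. A common choice for Trainer-based translation fine-tuning is the sacrebleu metric from the evaluate library, sketched below with placeholder strings:

```python
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Dette er en setning."]   # decoded model outputs (placeholders)
references = [["Dette er en setning."]]  # one or more references per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus-level BLEU on a 0-100 scale
```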

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1