83ca2b931d222347c8424b603646260c

This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [de-ru] dataset. It achieves the following results on the evaluation set:

Loss: 1.5587
Data Size: 1.0
Epoch Runtime: 94.9043
Bleu: 8.9133
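
The card gives no usage instructions, so the following is only a minimal sketch of how a checkpoint like this is commonly loaded with the transformers library. The repository id is taken from the model tree on this card. The German-to-Russian direction, the absence of a task prefix, and the `translate` helper with its `max_new_tokens` default are assumptions for illustration, not details confirmed by the card.

```python
# Repository id as it appears in this card's model tree.
MODEL_ID = "contemmcm/83ca2b931d222347c8424b603646260c"


def translate(text: str, max_new_tokens: int = 128) -> str:
    """Hypothetical helper: translate one sentence (German to Russian is assumed)."""
    # Imported lazily so this module loads even where transformers is not installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Assumption: the raw source sentence is fed in without any task prefix;
    # the card does not document how inputs were formatted during fine-tuning.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `translate("Das ist ein Buch.")` downloads the weights on first use. Given the reported BLEU of 8.9133, output quality should be expected to be modest.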

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: constant
num_epochs: 50
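
The hyperparameters above can be written as keyword arguments and checked for internal consistency. This is a sketch, not the author's training script: the keys follow the parameter names of transformers' `Seq2SeqTrainingArguments`, the mapping of `train_batch_size` to a per-device value is an assumption, and so is the absence of gradient accumulation (the card lists none). The 434 steps per epoch is read off the step column of the training results table.

```python
# Hyperparameters from the card, keyed by Seq2SeqTrainingArguments parameter names.
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,  # assumed to correspond to train_batch_size
    "per_device_eval_batch_size": 8,   # assumed to correspond to eval_batch_size
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

NUM_DEVICES = 4        # num_devices from the card (multi-GPU)
STEPS_PER_EPOCH = 434  # step increment between consecutive rows of the results table

# Effective batch sizes across all devices (assuming no gradient accumulation).
total_train_batch_size = training_kwargs["per_device_train_batch_size"] * NUM_DEVICES
total_eval_batch_size = training_kwargs["per_device_eval_batch_size"] * NUM_DEVICES

# Optimizer steps at the last logged epoch versus the full configured schedule.
steps_at_last_logged_epoch = STEPS_PER_EPOCH * 34
steps_for_full_schedule = STEPS_PER_EPOCH * training_kwargs["num_train_epochs"]

print(total_train_batch_size, total_eval_batch_size)
print(steps_at_last_logged_epoch, steps_for_full_schedule)
```

The derived totals match the card's `total_train_batch_size` and `total_eval_batch_size` of 32, and 434 × 34 reproduces the final logged step of 14756. The configured 50 epochs would correspond to 21700 steps, so the logged run covers only part of that schedule. To actually build the arguments, the dictionary could be passed as `Seq2SeqTrainingArguments(output_dir="...", **training_kwargs)`, where the output directory is a placeholder.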

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Bleu
No log	0	0	17.9214	0	8.1126	0.0074
No log	1	434	19.1442	0.0078	9.9087	0.0056
No log	2	868	15.3624	0.0156	9.7936	0.0099
No log	3	1302	9.9556	0.0312	11.7583	0.0137
No log	4	1736	7.2914	0.0625	14.6846	0.0149
0.467	5	2170	3.8665	0.125	20.3708	0.0821
3.9708	6	2604	2.4223	0.25	30.1851	2.6894
2.9952	7	3038	2.1053	0.5	51.0852	3.9262
2.596	8.0	3472	1.9132	1.0	93.1141	4.8514
2.3711	9.0	3906	1.8187	1.0	93.5745	5.4804
2.23	10.0	4340	1.7574	1.0	93.0811	6.0059
2.1331	11.0	4774	1.7146	1.0	93.0832	6.5665
2.0062	12.0	5208	1.6797	1.0	93.8144	6.7421
1.9482	13.0	5642	1.6533	1.0	93.8417	7.1525
1.873	14.0	6076	1.6290	1.0	93.1289	7.1184
1.8153	15.0	6510	1.6088	1.0	93.7660	7.4296
1.7606	16.0	6944	1.5983	1.0	93.3560	7.7978
1.7148	17.0	7378	1.5834	1.0	93.1155	7.9990
1.6777	18.0	7812	1.5761	1.0	94.9056	8.1409
1.5887	19.0	8246	1.5651	1.0	93.7158	8.1161
1.5872	20.0	8680	1.5490	1.0	94.3458	8.2935
1.534	21.0	9114	1.5488	1.0	95.2056	8.4408
1.4803	22.0	9548	1.5472	1.0	93.1390	8.4442
1.4652	23.0	9982	1.5434	1.0	93.4683	8.5657
1.4289	24.0	10416	1.5374	1.0	93.5374	8.6401
1.3991	25.0	10850	1.5361	1.0	94.5638	8.6876
1.3638	26.0	11284	1.5435	1.0	93.9030	8.7042
1.3452	27.0	11718	1.5347	1.0	92.9642	8.7730
1.2729	28.0	12152	1.5374	1.0	93.2383	8.7727
1.2714	29.0	12586	1.5336	1.0	93.6202	8.8632
1.2404	30.0	13020	1.5326	1.0	93.4600	8.8624
1.214	31.0	13454	1.5427	1.0	95.5668	8.8654
1.1871	32.0	13888	1.5489	1.0	94.9191	8.9337
1.1825	33.0	14322	1.5495	1.0	92.5991	8.9077
1.1405	34.0	14756	1.5587	1.0	94.9043	8.9133
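
A short script can make it easier to read the table for the best checkpoint. The sketch below copies only the last eight rows (epochs 27 to 34), since every earlier epoch has a higher validation loss and a lower BLEU than the best values in this window. The variable names are illustrative.

```python
# (epoch, validation_loss, bleu) copied from the last eight rows of the table.
rows = [
    (27, 1.5347, 8.7730),
    (28, 1.5374, 8.7727),
    (29, 1.5336, 8.8632),
    (30, 1.5326, 8.8624),
    (31, 1.5427, 8.8654),
    (32, 1.5489, 8.9337),
    (33, 1.5495, 8.9077),
    (34, 1.5587, 8.9133),
]

best_loss_row = min(rows, key=lambda r: r[1])  # lowest validation loss
best_bleu_row = max(rows, key=lambda r: r[2])  # highest BLEU
final_row = rows[-1]                           # last logged epoch

print("best validation loss:", best_loss_row)
print("best BLEU:", best_bleu_row)
print("final epoch:", final_row)
```

Validation loss is lowest at epoch 30 (1.5326) and BLEU peaks at epoch 32 (8.9337), while the headline results on this card (loss 1.5587, BLEU 8.9133) correspond to the final logged epoch, 34. Training loss keeps falling after epoch 30 while validation loss rises, which is consistent with the onset of overfitting.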

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1

Model size (Safetensors): 1.0B params
Tensor type: F32

Model tree for contemmcm/83ca2b931d222347c8424b603646260c

Base model: google/mt5-base (this model is a fine-tune of it)