5b409f4438fef7ba7ab2132dbaaa0d91

This model is a fine-tuned version of google/mt5-large on the Helsinki-NLP/opus_books [de-ru] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3111
  • Data Size: 1.0
  • Epoch Runtime: 180.7120
  • Bleu: 10.3155

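As a minimal usage sketch, the checkpoint can be loaded through the standard Transformers seq2seq API for de→ru translation. The repository id below is taken from this card; the example sentence and generation settings are illustrative, and the card does not state whether a task prefix was used during fine-tuning.

```python
# Minimal usage sketch (assumptions: no task prefix; the example sentence and
# generation settings are illustrative, not taken from the training setup).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/5b409f4438fef7ba7ab2132dbaaa0d91"  # fine-tuned mt5-large checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# German source sentence; the model was fine-tuned on opus_books [de-ru].
text = "Der kleine Prinz setzte sich auf einen Stein."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
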
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
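
The hyperparameters above map roughly onto the following Seq2SeqTrainingArguments. This is a hedged sketch rather than the actual training script; output_dir is a placeholder, and predict_with_generate is an assumption made so that BLEU can be reported at evaluation time.

```python
# Rough Seq2SeqTrainingArguments equivalent of the hyperparameters listed above
# (illustrative sketch only; not the actual training script for this model).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-opus-books-de-ru",  # placeholder name
    learning_rate=5e-05,
    per_device_train_batch_size=8,            # x 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,             # x 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",                      # AdamW; betas=(0.9, 0.999), eps=1e-08 are its defaults
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,               # assumption: generation needed to compute BLEU
)
```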

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:-------:|
| No log        | 0     | 0    | 23.3626         | 0         | 13.8304       | 0.0125  |
| No log        | 1     | 434  | 23.9957         | 0.0078    | 15.3958       | 0.0093  |
| No log        | 2     | 868  | 20.1921         | 0.0156    | 17.7159       | 0.0108  |
| No log        | 3     | 1302 | 10.4964         | 0.0312    | 21.5673       | 0.0223  |
| No log        | 4     | 1736 | 3.9784          | 0.0625    | 28.8893       | 0.1610  |
| 0.5362        | 5     | 2170 | 2.5155          | 0.125     | 39.9764       | 0.5591  |
| 2.8305        | 6     | 2604 | 1.8329          | 0.25      | 59.9116       | 6.1797  |
| 2.1372        | 7     | 3038 | 1.5705          | 0.5       | 101.8626      | 6.1786  |
| 1.8663        | 8     | 3472 | 1.4538          | 1.0       | 190.6350      | 7.8930  |
| 1.6803        | 9     | 3906 | 1.3960          | 1.0       | 186.8617      | 9.0823  |
| 1.5558        | 10    | 4340 | 1.3584          | 1.0       | 187.2073      | 9.5962  |
| 1.4758        | 11    | 4774 | 1.3342          | 1.0       | 189.5848      | 9.9947  |
| 1.364         | 12    | 5208 | 1.3198          | 1.0       | 187.0171      | 10.0214 |
| 1.3104        | 13    | 5642 | 1.3096          | 1.0       | 190.2128      | 9.9382  |
| 1.2567        | 14    | 6076 | 1.3096          | 1.0       | 187.6537      | 9.9939  |
| 1.1979        | 15    | 6510 | 1.2991          | 1.0       | 189.2613      | 10.2176 |
| 1.1351        | 16    | 6944 | 1.2906          | 1.0       | 181.2897      | 10.2235 |
| 1.1014        | 17    | 7378 | 1.3017          | 1.0       | 179.3653      | 10.2882 |
| 1.0424        | 18    | 7812 | 1.3031          | 1.0       | 179.7579      | 10.2756 |
| 0.9774        | 19    | 8246 | 1.3075          | 1.0       | 180.9290      | 10.2699 |
| 0.9627        | 20    | 8680 | 1.3111          | 1.0       | 180.7120      | 10.3155 |
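
For context, BLEU scores like those in the table are typically computed with sacrebleu. A minimal sketch using the evaluate library follows; the prediction and reference strings are purely illustrative and not drawn from the evaluation set.

```python
# Hedged sketch of a BLEU computation with the `evaluate` library's sacrebleu metric.
# The prediction/reference strings are illustrative examples only.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Маленький принц сел на камень."]        # model outputs
references = [["Маленький принц присел на камень."]]    # one list of references per prediction
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # sacrebleu reports BLEU on a 0-100 scale
```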

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
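
A quick, illustrative way to check that a local environment matches the versions listed above:

```python
# Environment check against the framework versions listed above (illustrative).
import datasets
import tokenizers
import torch
import transformers

print(transformers.__version__)  # expected: 4.57.0
print(torch.__version__)         # expected: 2.8.0+cu128
print(datasets.__version__)      # expected: 4.2.0
print(tokenizers.__version__)    # expected: 0.22.1
```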