4f6b0c53057ae1c793ce15d7bc7b775f

This model is a fine-tuned version of google/mt5-large on the Helsinki-NLP/opus_books [en-ru] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3503
  • Data Size: 1.0 (full training set)
  • Epoch Runtime: 181.0558 s
  • BLEU: 10.6815

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
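The total batch sizes above follow directly from the per-device settings: with no gradient accumulation listed, the effective batch size is the per-device batch size multiplied by the number of GPUs. A minimal sketch of that arithmetic:

```python
# Effective batch size on multi-GPU: per-device batch size times num_devices
# (no gradient_accumulation_steps is listed, so the multiplier is just the GPU count).
train_batch_size = 8
eval_batch_size = 8
num_devices = 4

total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 32, matching total_train_batch_size above
print(total_eval_batch_size)   # 32, matching total_eval_batch_size above
```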

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log        | 0     | 0    | 23.0086         | 0         | 13.8254       | 0.0177 |
| No log        | 1     | 437  | 24.2370         | 0.0078    | 15.5508       | 0.0181 |
| No log        | 2     | 874  | 22.4692         | 0.0156    | 18.1560       | 0.0179 |
| No log        | 3     | 1311 | 20.1321         | 0.0312    | 21.8500       | 0.0203 |
| No log        | 4     | 1748 | 12.3472         | 0.0625    | 27.5813       | 0.0235 |
| 6.758         | 5     | 2185 | 2.8360          | 0.125     | 38.6283       | 0.4308 |
| 3.1699        | 6     | 2622 | 1.9156          | 0.25      | 59.6038       | 2.9574 |
| 2.229         | 7     | 3059 | 1.6252          | 0.5       | 101.4152      | 6.5201 |
| 1.8745        | 8     | 3496 | 1.4850          | 1.0       | 184.1214      | 8.2572 |
| 1.7347        | 9     | 3933 | 1.4291          | 1.0       | 184.4869      | 9.5415 |
| 1.587         | 10    | 4370 | 1.3908          | 1.0       | 186.9955      | 9.8106 |
| 1.5171        | 11    | 4807 | 1.3615          | 1.0       | 183.2168      | 10.2186 |
| 1.424         | 12    | 5244 | 1.3433          | 1.0       | 182.3575      | 10.3990 |
| 1.3361        | 13    | 5681 | 1.3335          | 1.0       | 184.1796      | 10.3927 |
| 1.2673        | 14    | 6118 | 1.3274          | 1.0       | 185.5688      | 10.3714 |
| 1.2173        | 15    | 6555 | 1.3282          | 1.0       | 180.7421      | 10.6344 |
| 1.1354        | 16    | 6992 | 1.3216          | 1.0       | 180.5052      | 10.6264 |
| 1.1011        | 17    | 7429 | 1.3290          | 1.0       | 181.7048      | 10.6168 |
| 1.0552        | 18    | 7866 | 1.3304          | 1.0       | 184.0366      | 10.7247 |
| 0.9884        | 19    | 8303 | 1.3401          | 1.0       | 183.3214      | 10.6767 |
| 0.965         | 20    | 8740 | 1.3503          | 1.0       | 181.0558      | 10.6815 |

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1