8ca2e63766dbde8a7aa3097817fd2f3c

This model is a fine-tuned version of google-t5/t5-large on the Helsinki-NLP/opus_books [en-ru] dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9198
  • Data Size: 1.0
  • Epoch Runtime: 178.2104
  • Bleu: 14.4192
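
The card does not include a usage snippet, so here is a minimal, hedged sketch of loading the checkpoint with the `transformers` pipeline API. The repo id is taken from this page's model tree; the `"translate English to Russian: "` task prefix is an assumption, mirroring the common opus_books en-ru fine-tuning recipe (the exact prefix used for this run is not documented):

```python
def with_task_prefix(text: str) -> str:
    # T5 models condition on a task prefix; this one is an assumption,
    # following the standard opus_books en-ru fine-tuning recipe.
    return "translate English to Russian: " + text

def translate(text: str,
              model_id: str = "contemmcm/8ca2e63766dbde8a7aa3097817fd2f3c") -> str:
    # Imported lazily so the prefix helper works even without transformers installed.
    from transformers import pipeline
    translator = pipeline("translation_en_to_ru", model=model_id)
    return translator(with_task_prefix(text))[0]["translation_text"]

# Example (downloads the F32 weights, ~3 GB, on first use):
# print(translate("The book lay open on the table."))
```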

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
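
Note that the two batch-size totals above are derived values rather than independent settings; a one-line arithmetic check:

```python
# Consistency check for the batch-size settings above: the logged totals
# are the per-device values multiplied by the number of GPUs.
train_batch_size = 8   # per device
eval_batch_size = 8    # per device
num_devices = 4

total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 32 32
```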

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:-------:|
| No log        | 0     | 0    | 2.1967          | 0         | 13.1722       | 0.4158  |
| No log        | 1     | 437  | 2.0449          | 0.0078    | 14.8412       | 0.9544  |
| No log        | 2     | 874  | 1.9380          | 0.0156    | 15.9749       | 1.3087  |
| No log        | 3     | 1311 | 1.8222          | 0.0312    | 20.7062       | 1.7295  |
| No log        | 4     | 1748 | 1.7119          | 0.0625    | 26.9948       | 2.8520  |
| 1.81          | 5     | 2185 | 1.6065          | 0.125     | 38.1287       | 3.4831  |
| 1.6817        | 6     | 2622 | 1.4706          | 0.25      | 59.5105       | 4.3908  |
| 1.5213        | 7     | 3059 | 1.3347          | 0.5       | 100.7658      | 6.1842  |
| 1.3251        | 8     | 3496 | 1.1834          | 1.0       | 180.4133      | 8.4047  |
| 1.2087        | 9     | 3933 | 1.0970          | 1.0       | 182.8740      | 9.7600  |
| 1.091         | 10    | 4370 | 1.0375          | 1.0       | 177.3929      | 10.6758 |
| 1.0247        | 11    | 4807 | 1.0022          | 1.0       | 176.7033      | 11.4031 |
| 0.9658        | 12    | 5244 | 0.9670          | 1.0       | 177.8799      | 12.2733 |
| 0.9057        | 13    | 5681 | 0.9476          | 1.0       | 178.1959      | 12.5709 |
| 0.8377        | 14    | 6118 | 0.9361          | 1.0       | 176.9459      | 13.1417 |
| 0.8037        | 15    | 6555 | 0.9284          | 1.0       | 180.4766      | 13.0320 |
| 0.7443        | 16    | 6992 | 0.9108          | 1.0       | 179.0464      | 13.6398 |
| 0.7211        | 17    | 7429 | 0.9076          | 1.0       | 178.3307      | 13.3907 |
| 0.6807        | 18    | 7866 | 0.9184          | 1.0       | 181.1875      | 13.8668 |
| 0.6415        | 19    | 8303 | 0.9113          | 1.0       | 178.9721      | 13.9984 |
| 0.6186        | 20    | 8740 | 0.9104          | 1.0       | 180.3212      | 13.9662 |
| 0.592         | 21    | 9177 | 0.9198          | 1.0       | 178.2104      | 14.4192 |
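
The Data Size column suggests a data-warm-up curriculum: the fraction of the training set doubles each epoch, starting at 1/128 (≈ 0.0078), until the full dataset is used from epoch 8 onward. A sketch of that schedule, inferred from the logged values (the actual training script is not shown, so treat this as a reconstruction):

```python
def data_size_schedule(num_epochs: int, start: float = 1 / 128) -> list:
    """Fraction of the training set used per epoch: doubles each epoch,
    capped at 1.0 (the full dataset). Inferred from the Data Size column."""
    sizes = []
    frac = start
    for _ in range(num_epochs):
        sizes.append(min(frac, 1.0))
        frac *= 2
    return sizes

print([round(f, 4) for f in data_size_schedule(9)])
# [0.0078, 0.0156, 0.0312, 0.0625, 0.125, 0.25, 0.5, 1.0, 1.0]
```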

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
Model size: 0.8B params (F32, Safetensors)

Model tree for contemmcm/8ca2e63766dbde8a7aa3097817fd2f3c

Base model: google-t5/t5-large