e9793706acad19a2f1b2de93b4bd05ad

This model is a fine-tuned version of google-t5/t5-small on the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2333
  • Data Size: 1.0
  • Epoch Runtime: 62.4257
  • Bleu: 6.7287
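The reported Bleu value is a corpus-level BLEU score on the evaluation set. For reference, here is a minimal, self-contained sketch of corpus BLEU (uniform weights up to 4-grams, single reference, no smoothing); this is an illustration of the metric, not the exact scorer used during training, which is typically sacrebleu via the evaluation loop:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_n=4):
    """Simplified corpus BLEU: clipped n-gram precision (orders 1..max_n),
    geometric mean, brevity penalty. Returns a score in [0, 100]."""
    clipped = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            clipped[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(clipped) == 0:   # no smoothing: any empty order gives BLEU 0
        return 0.0
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

A perfect hypothesis scores 100; any mismatch lowers the clipped precisions and, with it, the score.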

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
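The total batch sizes follow directly from the per-device settings: 8 examples per device across 4 devices, with no gradient accumulation, gives 32. A sketch of that arithmetic (the helper name is hypothetical):

```python
def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Effective global batch size in multi-GPU training: each optimizer
    step consumes per_device examples on each of num_devices devices,
    accumulated over grad_accum_steps forward/backward passes."""
    return per_device * num_devices * grad_accum_steps

# train_batch_size=8, num_devices=4 -> total_train_batch_size=32
assert effective_batch_size(8, 4) == 32
```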

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|---------------|-------|-------|-----------------|-----------|---------------|--------|
| No log        | 0     | 0     | 3.5153          | 0         | 6.0879        | 0.1499 |
| No log        | 1     | 447   | 3.3034          | 0.0078    | 6.7053        | 0.1761 |
| 0.0534        | 2     | 894   | 2.9168          | 0.0156    | 6.3933        | 0.3345 |
| 0.0695        | 3     | 1341  | 2.6587          | 0.0312    | 7.3390        | 0.2405 |
| 0.1077        | 4     | 1788  | 2.5219          | 0.0625    | 9.7589        | 0.5060 |
| 0.1872        | 5     | 2235  | 2.3968          | 0.125     | 12.3938       | 0.4165 |
| 2.5361        | 6     | 2682  | 2.2921          | 0.25      | 20.1647       | 0.4993 |
| 2.3725        | 7     | 3129  | 2.1755          | 0.5       | 33.7213       | 0.7969 |
| 2.2278        | 8.0   | 3576  | 2.0506          | 1.0       | 65.3310       | 1.2631 |
| 2.1392        | 9.0   | 4023  | 1.9710          | 1.0       | 59.8108       | 1.8236 |
| 2.0917        | 10.0  | 4470  | 1.9135          | 1.0       | 63.5706       | 2.0827 |
| 2.0302        | 11.0  | 4917  | 1.8624          | 1.0       | 60.9909       | 2.1980 |
| 2.0009        | 12.0  | 5364  | 1.8201          | 1.0       | 58.3167       | 2.3737 |
| 1.9478        | 13.0  | 5811  | 1.7800          | 1.0       | 59.5104       | 2.6749 |
| 1.9062        | 14.0  | 6258  | 1.7443          | 1.0       | 62.0392       | 2.8530 |
| 1.8946        | 15.0  | 6705  | 1.7109          | 1.0       | 62.7633       | 2.9931 |
| 1.8473        | 16.0  | 7152  | 1.6837          | 1.0       | 67.1382       | 3.2108 |
| 1.8293        | 17.0  | 7599  | 1.6539          | 1.0       | 63.2569       | 3.2521 |
| 1.8043        | 18.0  | 8046  | 1.6283          | 1.0       | 64.8657       | 3.5135 |
| 1.7746        | 19.0  | 8493  | 1.6055          | 1.0       | 65.3130       | 3.6802 |
| 1.7476        | 20.0  | 8940  | 1.5824          | 1.0       | 63.0923       | 3.8136 |
| 1.7504        | 21.0  | 9387  | 1.5615          | 1.0       | 63.1112       | 3.9703 |
| 1.6967        | 22.0  | 9834  | 1.5403          | 1.0       | 62.8212       | 4.1350 |
| 1.675         | 23.0  | 10281 | 1.5224          | 1.0       | 61.3662       | 4.2455 |
| 1.6563        | 24.0  | 10728 | 1.5053          | 1.0       | 60.7117       | 4.4245 |
| 1.6528        | 25.0  | 11175 | 1.4848          | 1.0       | 65.6816       | 4.5583 |
| 1.6143        | 26.0  | 11622 | 1.4692          | 1.0       | 59.1256       | 4.6835 |
| 1.6241        | 27.0  | 12069 | 1.4557          | 1.0       | 61.9651       | 4.8235 |
| 1.5818        | 28.0  | 12516 | 1.4404          | 1.0       | 65.2822       | 4.9559 |
| 1.5555        | 29.0  | 12963 | 1.4248          | 1.0       | 62.0607       | 5.0545 |
| 1.5487        | 30.0  | 13410 | 1.4118          | 1.0       | 60.2353       | 5.2027 |
| 1.5477        | 31.0  | 13857 | 1.4000          | 1.0       | 60.2108       | 5.2571 |
| 1.5129        | 32.0  | 14304 | 1.3893          | 1.0       | 60.5813       | 5.3978 |
| 1.5345        | 33.0  | 14751 | 1.3763          | 1.0       | 61.8097       | 5.5671 |
| 1.4995        | 34.0  | 15198 | 1.3655          | 1.0       | 59.2048       | 5.5855 |
| 1.4859        | 35.0  | 15645 | 1.3556          | 1.0       | 60.9427       | 5.6993 |
| 1.4635        | 36.0  | 16092 | 1.3441          | 1.0       | 58.0727       | 5.7860 |
| 1.4432        | 37.0  | 16539 | 1.3368          | 1.0       | 58.5465       | 5.9151 |
| 1.4568        | 38.0  | 16986 | 1.3253          | 1.0       | 64.0433       | 5.9920 |
| 1.4298        | 39.0  | 17433 | 1.3160          | 1.0       | 59.5569       | 6.0769 |
| 1.3959        | 40.0  | 17880 | 1.3069          | 1.0       | 62.9847       | 6.1072 |
| 1.393         | 41.0  | 18327 | 1.2998          | 1.0       | 59.5837       | 6.2188 |
| 1.3913        | 42.0  | 18774 | 1.2923          | 1.0       | 60.4817       | 6.2844 |
| 1.4034        | 43.0  | 19221 | 1.2855          | 1.0       | 61.9362       | 6.3534 |
| 1.37          | 44.0  | 19668 | 1.2743          | 1.0       | 60.6214       | 6.4056 |
| 1.3536        | 45.0  | 20115 | 1.2665          | 1.0       | 63.8938       | 6.4811 |
| 1.3514        | 46.0  | 20562 | 1.2609          | 1.0       | 61.7827       | 6.5167 |
| 1.359         | 47.0  | 21009 | 1.2558          | 1.0       | 53.8058       | 6.5776 |
| 1.3249        | 48.0  | 21456 | 1.2452          | 1.0       | 58.4070       | 6.6900 |
| 1.3388        | 49.0  | 21903 | 1.2413          | 1.0       | 61.2955       | 6.7241 |
| 1.3113        | 50.0  | 22350 | 1.2333          | 1.0       | 62.4257       | 6.7287 |
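The Data Size column appears to follow a doubling schedule: the fraction of the training set starts near 1/128 at epoch 1, doubles each epoch, and reaches the full dataset at epoch 8. A sketch of that apparent schedule (this is an inference from the table, not a confirmed training detail):

```python
def data_fraction(epoch, full_at=8):
    """Fraction of the training set used at a given epoch, assuming the
    doubling schedule min(1, 2**(epoch - full_at)) suggested by the
    Data Size column, with 0 at the initial evaluation-only epoch 0."""
    if epoch == 0:
        return 0.0
    return min(1.0, 2.0 ** (epoch - full_at))

# epoch 1 -> 1/128 = 0.0078125, matching the table's 0.0078
assert data_fraction(1) == 0.0078125
```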

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
Model size: 0.1B parameters (F32 tensors, Safetensors format)