a513e980b2cf367ab8eea6f4d1b6864e

This model is a fine-tuned version of google-t5/t5-small on the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4829
  • Data Size: 1.0 (fraction of the training set used in the final epoch)
  • Epoch Runtime: 12.8301
  • Bleu: 1.1859

Model description

More information needed

Intended uses & limitations

More information needed
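
In the absence of documented usage, here is a minimal inference sketch. The task prefix and target language below are assumptions (T5 checkpoints are conventionally prompted with a prefix such as "translate English to French: "); the language pair used for this fine-tune is not documented in this card.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/a513e980b2cf367ab8eea6f4d1b6864e"  # repo id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# The task prefix and language pair are assumptions; the card does not
# document which pair this model was trained on.
text = "translate English to French: The cat sat on the mat."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```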

Training and evaluation data

More information needed
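
For reference, Helsinki-NLP/opus_books is distributed as one configuration per language pair; a minimal loading sketch follows (the "en-fr" pair is an assumption, since the pair used for this run is not documented):

```python
from datasets import load_dataset

# opus_books requires a language-pair configuration; "en-fr" is an assumption.
books = load_dataset("Helsinki-NLP/opus_books", "en-fr")
print(books["train"][0])
# {'id': '0', 'translation': {'en': '...', 'fr': '...'}}
```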

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto training arguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
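
Below is a minimal sketch mapping these values onto transformers' Seq2SeqTrainingArguments. The output path is hypothetical, and the progressive data-size schedule visible in the results table is not expressible through standard arguments, so it is omitted.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters as Seq2SeqTrainingArguments.
# With 4 devices, a per-device batch size of 8 gives the reported total of 32.
args = Seq2SeqTrainingArguments(
    output_dir="t5-small-opus-books",   # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,         # needed to compute BLEU at eval time
)
```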

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---------------|-------|------|-----------------|-----------|---------------|--------|
| No log | 0 | 0 | 4.9196 | 0 | 1.7581 | 0.0452 |
| No log | 1 | 86 | 4.9036 | 0.0078 | 2.7906 | 0.0456 |
| No log | 2 | 172 | 4.7683 | 0.0156 | 2.1189 | 0.0489 |
| No log | 3 | 258 | 4.6167 | 0.0312 | 2.2456 | 0.0575 |
| No log | 4 | 344 | 4.4223 | 0.0625 | 2.6220 | 0.0855 |
| 0.2495 | 5 | 430 | 4.2693 | 0.125 | 3.2169 | 0.0880 |
| 1.1137 | 6 | 516 | 4.0653 | 0.25 | 4.8851 | 0.1141 |
| 1.4913 | 7 | 602 | 3.8116 | 0.5 | 7.7852 | 0.1640 |
| 2.244 | 8 | 688 | 3.5580 | 1.0 | 14.7651 | 0.1932 |
| 3.6859 | 9 | 774 | 3.4243 | 1.0 | 13.0133 | 0.3036 |
| 3.5677 | 10 | 860 | 3.3250 | 1.0 | 12.7559 | 0.3887 |
| 3.5001 | 11 | 946 | 3.2470 | 1.0 | 13.7381 | 0.4316 |
| 3.4039 | 12 | 1032 | 3.1842 | 1.0 | 12.9902 | 0.4563 |
| 3.351 | 13 | 1118 | 3.1301 | 1.0 | 13.4499 | 0.5009 |
| 3.2783 | 14 | 1204 | 3.0783 | 1.0 | 13.0028 | 0.5354 |
| 3.2463 | 15 | 1290 | 3.0364 | 1.0 | 12.6125 | 0.5338 |
| 3.1979 | 16 | 1376 | 3.0006 | 1.0 | 12.5225 | 0.5516 |
| 3.1577 | 17 | 1462 | 2.9654 | 1.0 | 12.9970 | 0.5683 |
| 3.1142 | 18 | 1548 | 2.9309 | 1.0 | 12.5788 | 0.6017 |
| 3.075 | 19 | 1634 | 2.9053 | 1.0 | 12.3158 | 0.6179 |
| 3.0465 | 20 | 1720 | 2.8779 | 1.0 | 12.5898 | 0.6423 |
| 3.0153 | 21 | 1806 | 2.8523 | 1.0 | 11.9801 | 0.6640 |
| 3.0035 | 22 | 1892 | 2.8295 | 1.0 | 11.9979 | 0.7202 |
| 2.9598 | 23 | 1978 | 2.8069 | 1.0 | 12.1205 | 0.7813 |
| 2.9328 | 24 | 2064 | 2.7873 | 1.0 | 12.3217 | 0.7962 |
| 2.893 | 25 | 2150 | 2.7698 | 1.0 | 12.6392 | 0.8028 |
| 2.8921 | 26 | 2236 | 2.7514 | 1.0 | 12.7615 | 0.8282 |
| 2.8411 | 27 | 2322 | 2.7332 | 1.0 | 13.4124 | 0.8155 |
| 2.8286 | 28 | 2408 | 2.7148 | 1.0 | 12.4894 | 0.8278 |
| 2.8242 | 29 | 2494 | 2.7044 | 1.0 | 12.9070 | 0.8499 |
| 2.7917 | 30 | 2580 | 2.6899 | 1.0 | 12.9505 | 0.8816 |
| 2.7827 | 31 | 2666 | 2.6755 | 1.0 | 12.6046 | 0.8929 |
| 2.7398 | 32 | 2752 | 2.6666 | 1.0 | 11.6782 | 0.8981 |
| 2.7315 | 33 | 2838 | 2.6461 | 1.0 | 12.5770 | 0.9176 |
| 2.7199 | 34 | 2924 | 2.6410 | 1.0 | 13.1229 | 0.9124 |
| 2.7127 | 35 | 3010 | 2.6209 | 1.0 | 14.5130 | 0.9174 |
| 2.6797 | 36 | 3096 | 2.6066 | 1.0 | 14.3944 | 0.9410 |
| 2.6753 | 37 | 3182 | 2.6019 | 1.0 | 14.1127 | 0.9272 |
| 2.646 | 38 | 3268 | 2.5858 | 1.0 | 13.5025 | 0.9313 |
| 2.625 | 39 | 3354 | 2.5758 | 1.0 | 13.6073 | 0.9706 |
| 2.6172 | 40 | 3440 | 2.5639 | 1.0 | 13.0175 | 1.0059 |
| 2.6094 | 41 | 3526 | 2.5551 | 1.0 | 12.3109 | 1.0225 |
| 2.5961 | 42 | 3612 | 2.5475 | 1.0 | 12.4898 | 1.0111 |
| 2.5635 | 43 | 3698 | 2.5383 | 1.0 | 11.6926 | 1.0706 |
| 2.5724 | 44 | 3784 | 2.5275 | 1.0 | 11.6896 | 1.1004 |
| 2.5536 | 45 | 3870 | 2.5211 | 1.0 | 13.1903 | 1.1308 |
| 2.518 | 46 | 3956 | 2.5143 | 1.0 | 12.5735 | 1.1436 |
| 2.5136 | 47 | 4042 | 2.5037 | 1.0 | 13.7180 | 1.1571 |
| 2.4721 | 48 | 4128 | 2.4929 | 1.0 | 12.4402 | 1.1575 |
| 2.489 | 49 | 4214 | 2.4898 | 1.0 | 12.5055 | 1.2060 |
| 2.4664 | 50 | 4300 | 2.4829 | 1.0 | 12.8301 | 1.1859 |
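
The Bleu column is on the same scale as the final reported score (1.1859). Below is a minimal sketch of computing BLEU with the sacrebleu metric through the evaluate library; this pairing is an assumption, as the exact metric setup used for this run is not documented.

```python
import evaluate

bleu = evaluate.load("sacrebleu")

predictions = ["Le chat est assis sur le tapis."]      # model outputs
references = [["Le chat s'est assis sur le tapis."]]   # list of references per prediction

result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus-level BLEU on a 0-100 scale
```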

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1