decafd18d2dee1b3cd3d3a28a6635305

This model is a fine-tuned version of google-t5/t5-base on the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9814
  • Data Size: 1.0
  • Epoch Runtime: 98.1355
  • Bleu: 13.3563
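The checkpoint can be loaded for inference with the transformers pipeline API. Below is a minimal sketch, assuming the model was trained on the English-to-French pair of opus_books (the card does not state which language pair was used) and that the repository id matches this card:

```python
# Minimal inference sketch. Assumptions: the opus_books language pair is
# en->fr (not stated in the card) and the repo id below matches this card.
MODEL_ID = "contemmcm/decafd18d2dee1b3cd3d3a28a6635305"

def t5_prompt(text: str, src: str = "English", tgt: str = "French") -> str:
    # T5 checkpoints are trained with a natural-language task prefix; this
    # helper shows the convention (the translation pipeline applies it
    # automatically for T5-derived configs).
    return f"translate {src} to {tgt}: {text}"

def translate(text: str, model_id: str = MODEL_ID) -> str:
    # Deferred import so the prefix helper works without transformers installed.
    from transformers import pipeline
    translator = pipeline("translation_en_to_fr", model=model_id)
    return translator(text)[0]["translation_text"]
```

Calling `translate("The cat sat on the mat.")` downloads the weights on first use and returns the model's French translation.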

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
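The total batch sizes listed above follow from the per-device settings and the device count. A quick arithmetic check (not part of the training code):

```python
# Derive the effective (total) train batch size from the per-device
# settings listed above.
train_batch_size = 8   # per device
num_devices = 4
total_train_batch_size = train_batch_size * num_devices  # 32, as listed

# The results table below advances 437 optimizer steps per epoch, which at
# the effective batch size corresponds to roughly this many examples per pass:
steps_per_epoch = 437
examples_per_epoch = steps_per_epoch * total_train_batch_size
print(total_train_batch_size, examples_per_epoch)
```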

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|---------------|-------|-------|-----------------|-----------|---------------|---------|
| No log        | 0     | 0     | 2.5607          | 0         | 14.5991       | 0.1842  |
| No log        | 1     | 437   | 2.4747          | 0.0078    | 16.6316       | 0.2266  |
| No log        | 2     | 874   | 2.3841          | 0.0156    | 12.4275       | 0.4988  |
| No log        | 3     | 1311  | 2.2488          | 0.0312    | 17.1922       | 0.5118  |
| No log        | 4     | 1748  | 2.1029          | 0.0625    | 19.2542       | 0.7667  |
| 2.2109        | 5     | 2185  | 1.9879          | 0.125     | 26.4880       | 1.2838  |
| 2.1006        | 6     | 2622  | 1.8642          | 0.25      | 42.0241       | 2.6669  |
| 1.9291        | 7     | 3059  | 1.7206          | 0.5       | 70.2491       | 4.0708  |
| 1.7381        | 8     | 3496  | 1.5636          | 1.0       | 149.8675      | 5.5590  |
| 1.6181        | 9     | 3933  | 1.4499          | 1.0       | 130.7932      | 6.6572  |
| 1.5056        | 10    | 4370  | 1.3723          | 1.0       | 145.4794      | 7.4242  |
| 1.447         | 11    | 4807  | 1.3099          | 1.0       | 145.7462      | 8.0778  |
| 1.383         | 12    | 5244  | 1.2594          | 1.0       | 108.3039      | 8.8326  |
| 1.3299        | 13    | 5681  | 1.2202          | 1.0       | 115.9984      | 9.1728  |
| 1.2613        | 14    | 6118  | 1.1907          | 1.0       | 104.9783      | 9.7938  |
| 1.2273        | 15    | 6555  | 1.1621          | 1.0       | 102.6652      | 9.7652  |
| 1.1678        | 16    | 6992  | 1.1362          | 1.0       | 100.6030      | 10.2138 |
| 1.1441        | 17    | 7429  | 1.1101          | 1.0       | 105.9486      | 10.3005 |
| 1.1065        | 18    | 7866  | 1.0993          | 1.0       | 106.0890      | 11.0051 |
| 1.0658        | 19    | 8303  | 1.0737          | 1.0       | 104.2994      | 10.9763 |
| 1.0488        | 20    | 8740  | 1.0597          | 1.0       | 101.6413      | 11.4335 |
| 1.0181        | 21    | 9177  | 1.0459          | 1.0       | 101.3497      | 11.7183 |
| 0.9851        | 22    | 9614  | 1.0368          | 1.0       | 101.2914      | 11.7005 |
| 0.963         | 23    | 10051 | 1.0302          | 1.0       | 111.7042      | 11.8514 |
| 0.9359        | 24    | 10488 | 1.0192          | 1.0       | 111.2087      | 12.1365 |
| 0.9233        | 25    | 10925 | 1.0095          | 1.0       | 105.1604      | 12.3197 |
| 0.9022        | 26    | 11362 | 1.0089          | 1.0       | 104.0405      | 12.3111 |
| 0.8816        | 27    | 11799 | 1.0057          | 1.0       | 104.8092      | 12.5789 |
| 0.8525        | 28    | 12236 | 0.9977          | 1.0       | 106.6144      | 12.4490 |
| 0.8293        | 29    | 12673 | 0.9884          | 1.0       | 98.7457       | 12.7247 |
| 0.8088        | 30    | 13110 | 0.9838          | 1.0       | 100.4866      | 12.8502 |
| 0.8072        | 31    | 13547 | 0.9825          | 1.0       | 98.6900       | 12.8563 |
| 0.7987        | 32    | 13984 | 0.9789          | 1.0       | 99.3213       | 12.8370 |
| 0.7723        | 33    | 14421 | 0.9761          | 1.0       | 98.1782       | 12.9031 |
| 0.7595        | 34    | 14858 | 0.9773          | 1.0       | 97.5901       | 13.0862 |
| 0.7414        | 35    | 15295 | 0.9782          | 1.0       | 102.2905      | 13.1639 |
| 0.7138        | 36    | 15732 | 0.9729          | 1.0       | 96.9078       | 13.5037 |
| 0.7123        | 37    | 16169 | 0.9727          | 1.0       | 97.9281       | 13.1368 |
| 0.6947        | 38    | 16606 | 0.9816          | 1.0       | 101.6442      | 13.5031 |
| 0.6734        | 39    | 17043 | 0.9750          | 1.0       | 99.8666       | 13.1697 |
| 0.664         | 40    | 17480 | 0.9799          | 1.0       | 98.9188       | 13.2457 |
| 0.6697        | 41    | 17917 | 0.9814          | 1.0       | 98.1355       | 13.3563 |
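The Bleu column above is on a 0-100 scale. The card does not say which implementation produced it (Trainer-based setups typically use sacreBLEU via the evaluate library). A simplified from-scratch sketch of corpus-level BLEU-4, without sacreBLEU's tokenization and smoothing, shows what the metric measures:

```python
# Simplified corpus-level BLEU-4 (sketch). Real sacreBLEU adds standardized
# tokenization and smoothing, so scores will differ slightly.
import math
from collections import Counter

def ngrams(tokens, n):
    # Count n-gram occurrences in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # candidate n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngrams(h, n), ngrams(r, n)
            matches[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return 100 * bp * math.exp(log_prec)
```

For example, a hypothesis identical to its reference scores 100.0, while a hypothesis sharing no words with the reference scores 0.0.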

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1