c5b12a681d1784a5d2bd19f15b6fa055

This model is a fine-tuned version of google-t5/t5-base on the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2239
  • Data Size: 1.0
  • Epoch Runtime: 551.7313
  • Bleu: 13.8107

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
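The effective batch size follows directly from these settings: 4 devices with a per-device batch of 8 give 32 sequences per optimizer step. If the roughly 2,336 steps per epoch visible in the results table below correspond to one pass over the full training split, that implies about 74.7k training pairs. A minimal sketch of the arithmetic (the dataset size is inferred from the step counts, not stated in this card):

```python
# Back-of-the-envelope check of the hyperparameters above.
# Assumption: the ~2,336 steps per epoch in the results table
# correspond to one pass over the full training split.

per_device_batch = 8   # train_batch_size
num_devices = 4        # multi-GPU, num_devices

# Effective batch size per optimizer step (matches total_train_batch_size).
total_train_batch_size = per_device_batch * num_devices

steps_per_epoch = 2336  # read off the Step column of the results table
approx_train_examples = steps_per_epoch * total_train_batch_size

print(total_train_batch_size)  # 32
print(approx_train_examples)   # 74752
```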

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:-------:|
| No log | 0 | 0 | 2.9614 | 0 | 43.3896 | 1.2616 |
| No log | 1 | 2336 | 2.6315 | 0.0078 | 49.1768 | 2.5799 |
| 0.0426 | 2 | 4672 | 2.4835 | 0.0156 | 55.1410 | 3.3543 |
| 0.0605 | 3 | 7008 | 2.3631 | 0.0312 | 59.2835 | 4.1974 |
| 2.517 | 4 | 9344 | 2.2438 | 0.0625 | 77.6258 | 5.2345 |
| 2.3771 | 5 | 11680 | 2.1099 | 0.125 | 109.8965 | 6.5155 |
| 2.1849 | 6 | 14016 | 1.9473 | 0.25 | 168.9280 | 8.3202 |
| 2.0232 | 7 | 16352 | 1.7793 | 0.5 | 305.9969 | 10.0454 |
| 1.8339 | 8.0 | 18688 | 1.6112 | 1.0 | 556.5045 | 11.5823 |
| 1.6857 | 9.0 | 21024 | 1.5153 | 1.0 | 654.5415 | 12.1292 |
| 1.5826 | 10.0 | 23360 | 1.4566 | 1.0 | 607.7700 | 12.7234 |
| 1.5364 | 11.0 | 25696 | 1.4112 | 1.0 | 604.4563 | 13.2133 |
| 1.47 | 12.0 | 28032 | 1.3787 | 1.0 | 569.7755 | 13.1228 |
| 1.4091 | 13.0 | 30368 | 1.3494 | 1.0 | 580.4711 | 13.3372 |
| 1.3823 | 14.0 | 32704 | 1.3264 | 1.0 | 584.1167 | 13.7721 |
| 1.3276 | 15.0 | 35040 | 1.3081 | 1.0 | 574.8867 | 13.9498 |
| 1.345 | 16.0 | 37376 | 1.2958 | 1.0 | 628.8890 | 13.4865 |
| 1.2603 | 17.0 | 39712 | 1.2765 | 1.0 | 583.2858 | 13.7269 |
| 1.2638 | 18.0 | 42048 | 1.2682 | 1.0 | 623.2504 | 13.8755 |
| 1.2323 | 19.0 | 44384 | 1.2627 | 1.0 | 562.4616 | 13.7426 |
| 1.1931 | 20.0 | 46720 | 1.2478 | 1.0 | 590.6083 | 13.8796 |
| 1.1645 | 21.0 | 49056 | 1.2396 | 1.0 | 578.0513 | 13.6221 |
| 1.1487 | 22.0 | 51392 | 1.2348 | 1.0 | 578.9635 | 13.7758 |
| 1.1352 | 23.0 | 53728 | 1.2331 | 1.0 | 595.5290 | 13.5993 |
| 1.0869 | 24.0 | 56064 | 1.2301 | 1.0 | 580.7034 | 13.9527 |
| 1.0943 | 25.0 | 58400 | 1.2242 | 1.0 | 557.8745 | 13.9567 |
| 1.0709 | 26.0 | 60736 | 1.2177 | 1.0 | 607.4727 | 14.3138 |
| 1.0636 | 27.0 | 63072 | 1.2183 | 1.0 | 577.9959 | 13.7990 |
| 1.0399 | 28.0 | 65408 | 1.2145 | 1.0 | 616.5945 | 14.1473 |
| 1.0205 | 29.0 | 67744 | 1.2165 | 1.0 | 631.6346 | 13.9411 |
| 1.0111 | 30.0 | 70080 | 1.2168 | 1.0 | 555.7352 | 13.7206 |
| 0.9924 | 31.0 | 72416 | 1.2138 | 1.0 | 584.9051 | 13.9443 |
| 0.9949 | 32.0 | 74752 | 1.2126 | 1.0 | 662.6398 | 13.8611 |
| 0.9613 | 33.0 | 77088 | 1.2123 | 1.0 | 591.4603 | 13.8049 |
| 0.9642 | 34.0 | 79424 | 1.2153 | 1.0 | 665.3925 | 13.8260 |
| 0.9437 | 35.0 | 81760 | 1.2113 | 1.0 | 565.2026 | 13.8388 |
| 0.9413 | 36.0 | 84096 | 1.2106 | 1.0 | 537.2378 | 13.7324 |
| 0.9205 | 37.0 | 86432 | 1.2100 | 1.0 | 580.5891 | 13.7002 |
| 0.9219 | 38.0 | 88768 | 1.2152 | 1.0 | 561.8277 | 13.9455 |
| 0.9163 | 39.0 | 91104 | 1.2191 | 1.0 | 528.5714 | 13.9284 |
| 0.8856 | 40.0 | 93440 | 1.2203 | 1.0 | 562.3911 | 13.6180 |
| 0.8833 | 41.0 | 95776 | 1.2239 | 1.0 | 551.7313 | 13.8107 |
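The Data Size column in the table above follows a doubling curriculum: epoch 1 trains on roughly 1/128 of the data and the fraction doubles each epoch until the full set is reached at epoch 8. The card does not state the schedule explicitly; this sketch simply reproduces the fractions that appear in the table (epoch 0 is treated as an initial evaluation pass with no training data):

```python
# Data-size schedule read off the results table (an assumption,
# not an implementation detail confirmed by the card):
# fraction = min(1, 2**(epoch - 8)), with epoch 0 using no data.

def data_fraction(epoch: int) -> float:
    if epoch == 0:
        return 0.0
    return min(1.0, 2.0 ** (epoch - 8))

schedule = [round(data_fraction(e), 4) for e in range(9)]
print(schedule)  # [0.0, 0.0078, 0.0156, 0.0312, 0.0625, 0.125, 0.25, 0.5, 1.0]
```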

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1