b7537e0fe33c614891986f3d5c825c91

This model is a fine-tuned version of google/long-t5-local-large on the fr-sv configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 3.2105
  • Data Size: 1.0
  • Epoch Runtime: 38.8575
  • Bleu: 0.2691
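
For reference, a minimal inference sketch with Transformers, loading the checkpoint from the repository id shown on this page. Whether a task prefix (e.g. a T5-style "translate French to Swedish:" prompt) was used during fine-tuning is not documented, so the plain-input call below is an assumption:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/b7537e0fe33c614891986f3d5c825c91"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Plain French input; a task prefix may be needed if one was used in training.
text = "Je pense, donc je suis."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```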

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
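
Per the summary above, training used the fr-sv pair of Helsinki-NLP/opus_books. A minimal sketch of loading it with the datasets library; note that opus_books publishes only a train split, so the evaluation set was presumably held out from it (an assumption, since the split procedure is not documented):

```python
from datasets import load_dataset

# Load the French-Swedish pair of OPUS Books.
ds = load_dataset("Helsinki-NLP/opus_books", "fr-sv")

# Each example holds one parallel sentence pair:
# {"id": "...", "translation": {"fr": "...", "sv": "..."}}
print(ds["train"][0])
```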

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
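
The training script itself is not published; the following is a hedged reconstruction of the settings above as transformers Seq2SeqTrainingArguments. The output_dir name is hypothetical, and the total batch size of 32 follows from launching on 4 devices (e.g. via torchrun):

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the reported hyperparameters; the actual
# training script is not part of this card.
args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-fr-sv",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # x 4 GPUs = total_train_batch_size 32
    per_device_eval_batch_size=8,   # x 4 GPUs = total_eval_batch_size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```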

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 219.7699        | 0         | 3.0637        | 0.0033 |
| No log        | 1     | 75   | 206.1623        | 0.0078    | 3.4862        | 0.0033 |
| No log        | 2     | 150  | 184.6765        | 0.0156    | 4.8569        | 0.0040 |
| No log        | 3     | 225  | 158.5659        | 0.0312    | 7.0352        | 0.0042 |
| No log        | 4     | 300  | 126.6633        | 0.0625    | 8.7852        | 0.0036 |
| No log        | 5     | 375  | 78.1125         | 0.125     | 12.4607       | 0.0053 |
| No log        | 6     | 450  | 34.4594         | 0.25      | 16.1818       | 0.0034 |
| No log        | 7     | 525  | 15.1997         | 0.5       | 24.0986       | 0.0065 |
| 18.7003       | 8.0   | 600  | 11.0175         | 1.0       | 39.3052       | 0.0035 |
| 16.1992       | 9.0   | 675  | 9.5453          | 1.0       | 37.8482       | 0.0065 |
| 13.5206       | 10.0  | 750  | 8.3748          | 1.0       | 37.9171       | 0.0125 |
| 12.5957       | 11.0  | 825  | 7.6047          | 1.0       | 38.3496       | 0.0064 |
| 11.1559       | 12.0  | 900  | 7.1902          | 1.0       | 38.1695       | 0.0136 |
| 10.5851       | 13.0  | 975  | 7.0050          | 1.0       | 37.7150       | 0.0138 |
| 9.7612        | 14.0  | 1050 | 6.4365          | 1.0       | 38.2679       | 0.0163 |
| 9.2522        | 15.0  | 1125 | 6.2579          | 1.0       | 37.6155       | 0.0262 |
| 8.6555        | 16.0  | 1200 | 5.8325          | 1.0       | 37.5047       | 0.0388 |
| 8.3139        | 17.0  | 1275 | 5.6114          | 1.0       | 37.9400       | 0.0429 |
| 7.8543        | 18.0  | 1350 | 5.4419          | 1.0       | 37.7108       | 0.0428 |
| 7.6153        | 19.0  | 1425 | 5.1149          | 1.0       | 37.4297       | 0.0597 |
| 7.2648        | 20.0  | 1500 | 5.0269          | 1.0       | 37.1780       | 0.0605 |
| 7.1111        | 21.0  | 1575 | 4.9814          | 1.0       | 37.1947       | 0.0376 |
| 6.8276        | 22.0  | 1650 | 5.0170          | 1.0       | 37.3842       | 0.0462 |
| 6.6184        | 23.0  | 1725 | 4.6441          | 1.0       | 36.7989       | 0.1019 |
| 6.3262        | 24.0  | 1800 | 4.5687          | 1.0       | 37.3828       | 0.0698 |
| 6.234         | 25.0  | 1875 | 4.4098          | 1.0       | 37.0953       | 0.0428 |
| 5.9817        | 26.0  | 1950 | 4.3941          | 1.0       | 37.4955       | 0.0887 |
| 5.8618        | 27.0  | 2025 | 4.1422          | 1.0       | 37.4159       | 0.0867 |
| 5.6413        | 28.0  | 2100 | 4.0678          | 1.0       | 37.4700       | 0.1030 |
| 5.5014        | 29.0  | 2175 | 4.0464          | 1.0       | 37.5988       | 0.0937 |
| 5.3375        | 30.0  | 2250 | 3.9598          | 1.0       | 37.3789       | 0.1076 |
| 5.2614        | 31.0  | 2325 | 4.0291          | 1.0       | 37.4278       | 0.0897 |
| 5.1003        | 32.0  | 2400 | 3.9221          | 1.0       | 37.2692       | 0.1297 |
| 4.985         | 33.0  | 2475 | 3.7562          | 1.0       | 37.7929       | 0.1092 |
| 4.853         | 34.0  | 2550 | 3.8291          | 1.0       | 37.4791       | 0.0810 |
| 4.7849        | 35.0  | 2625 | 3.6264          | 1.0       | 37.1947       | 0.1982 |
| 4.6432        | 36.0  | 2700 | 3.6636          | 1.0       | 37.4072       | 0.1725 |
| 4.5749        | 37.0  | 2775 | 3.6744          | 1.0       | 37.4433       | 0.2292 |
| 4.4215        | 38.0  | 2850 | 3.5357          | 1.0       | 37.0327       | 0.2356 |
| 4.3749        | 39.0  | 2925 | 3.3974          | 1.0       | 38.0901       | 0.2529 |
| 4.2708        | 40.0  | 3000 | 3.5127          | 1.0       | 37.3004       | 0.2200 |
| 4.5798        | 41.0  | 3075 | 3.3963          | 1.0       | 36.7234       | 0.1987 |
| 4.1807        | 42.0  | 3150 | 3.3445          | 1.0       | 38.2956       | 0.2380 |
| 4.1043        | 43.0  | 3225 | 3.3404          | 1.0       | 37.2292       | 0.2919 |
| 4.0446        | 44.0  | 3300 | 3.3719          | 1.0       | 37.0030       | 0.2774 |
| 3.9573        | 45.0  | 3375 | 3.2986          | 1.0       | 37.8242       | 0.2756 |
| 3.8895        | 46.0  | 3450 | 3.3025          | 1.0       | 36.8436       | 0.2323 |
| 3.8454        | 47.0  | 3525 | 3.2694          | 1.0       | 37.9007       | 0.2297 |
| 3.7597        | 48.0  | 3600 | 3.2325          | 1.0       | 37.4543       | 0.3330 |
| 3.7171        | 49.0  | 3675 | 3.2183          | 1.0       | 37.8201       | 0.2300 |
| 3.6733        | 50.0  | 3750 | 3.2105          | 1.0       | 38.8575       | 0.2691 |
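
The Bleu column appears to be on a 0-1 scale, which matches the evaluate library's bleu metric (sacrebleu would report 0-100); under that assumption, a minimal sketch of scoring decoded predictions, with placeholder strings:

```python
import evaluate

bleu = evaluate.load("bleu")

# Placeholder strings; in practice these are decoded model outputs and the
# held-out Swedish references.
predictions = ["Jag tänker, alltså finns jag."]
references = [["Jag tänker, alltså finns jag."]]

result = bleu.compute(predictions=predictions, references=references)
print(result["bleu"])  # 0.0-1.0, as in the table above
```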

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1