b075eb4a9e29a1e6571d7047473604c5

This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [fr-nl] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6085
  • Data Size: 1.0
  • Epoch Runtime: 209.5152 s
  • Bleu: 9.7936
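The Bleu figure above is an n-gram overlap score. As an illustration only, a minimal sentence-level BLEU (uniform weights over 1–4-grams plus a brevity penalty) can be sketched in plain Python; the reported score would normally come from a corpus-level implementation such as sacrebleu, so treat this as a sketch of the metric, not the evaluation code used here:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Minimal sentence-level BLEU on whitespace tokens, scaled to 0-100.

    Illustrative only: uniform 1/max_n weights, clipped n-gram precision,
    and the standard brevity penalty for short hypotheses.
    """
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # real implementations smooth instead of zeroing out
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return 100 * bp * math.exp(log_avg)
```

A perfect match scores 100; a hypothesis sharing no unigrams with the reference scores 0.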

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
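The hyperparameters above map roughly onto transformers' Seq2SeqTrainingArguments as in the following sketch. The output_dir name and predict_with_generate flag are assumptions, not taken from this card; the per-device batch size of 8 across 4 GPUs yields the reported total batch size of 32:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters; output_dir is a hypothetical
# name and predict_with_generate is assumed (needed to compute BLEU
# during evaluation). Total batch size 32 = 8 per device x 4 GPUs.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-opus-books-fr-nl",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumption
)
```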

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime (s) | Bleu   |
|--------------:|------:|------:|----------------:|----------:|------------------:|-------:|
| No log        | 0     | 0     | 15.4205         | 0         | 17.2439           | 0.0154 |
| No log        | 1     | 1000  | 13.9017         | 0.0078    | 19.7549           | 0.0194 |
| No log        | 2     | 2000  | 11.4029         | 0.0156    | 20.5286           | 0.0212 |
| No log        | 3     | 3000  | 10.3222         | 0.0312    | 24.3867           | 0.0179 |
| 0.4118        | 4     | 4000  | 5.8715          | 0.0625    | 31.3563           | 0.0230 |
| 4.4572        | 5     | 5000  | 2.9395          | 0.125     | 42.7468           | 2.0196 |
| 0.2117        | 6     | 6000  | 2.5269          | 0.25      | 66.7118           | 3.8018 |
| 0.2683        | 7     | 7000  | 2.3041          | 0.5       | 116.0491          | 4.8542 |
| 2.6311        | 8     | 8000  | 2.1122          | 1.0       | 210.4427          | 5.7843 |
| 2.4335        | 9     | 9000  | 2.0011          | 1.0       | 207.4353          | 6.4811 |
| 2.2967        | 10    | 10000 | 1.9233          | 1.0       | 207.8650          | 6.9229 |
| 2.2179        | 11    | 11000 | 1.8646          | 1.0       | 207.1371          | 7.3574 |
| 2.1122        | 12    | 12000 | 1.8194          | 1.0       | 206.8493          | 7.7190 |
| 2.0336        | 13    | 13000 | 1.7927          | 1.0       | 207.7914          | 7.9566 |
| 1.9738        | 14    | 14000 | 1.7589          | 1.0       | 210.8171          | 8.2123 |
| 1.8977        | 15    | 15000 | 1.7363          | 1.0       | 209.6291          | 8.3925 |
| 1.8554        | 16    | 16000 | 1.7115          | 1.0       | 210.7359          | 8.5549 |
| 1.7706        | 17    | 17000 | 1.6998          | 1.0       | 211.4426          | 8.7277 |
| 1.7873        | 18    | 18000 | 1.6766          | 1.0       | 211.3672          | 8.8573 |
| 1.7265        | 19    | 19000 | 1.6663          | 1.0       | 209.6792          | 8.9481 |
| 1.672         | 20    | 20000 | 1.6533          | 1.0       | 210.9265          | 9.1015 |
| 1.6525        | 21    | 21000 | 1.6385          | 1.0       | 210.8974          | 9.1918 |
| 1.6174        | 22    | 22000 | 1.6344          | 1.0       | 211.5922          | 9.3196 |
| 1.5776        | 23    | 23000 | 1.6296          | 1.0       | 212.8005          | 9.3744 |
| 1.5264        | 24    | 24000 | 1.6211          | 1.0       | 209.8647          | 9.4105 |
| 1.5552        | 25    | 25000 | 1.6154          | 1.0       | 210.6224          | 9.5693 |
| 1.4744        | 26    | 26000 | 1.6045          | 1.0       | 209.0745          | 9.5901 |
| 1.4567        | 27    | 27000 | 1.6080          | 1.0       | 209.5370          | 9.6895 |
| 1.4102        | 28    | 28000 | 1.6087          | 1.0       | 209.9979          | 9.6693 |
| 1.4109        | 29    | 29000 | 1.6057          | 1.0       | 209.9585          | 9.7511 |
| 1.3673        | 30    | 30000 | 1.6085          | 1.0       | 209.5152          | 9.7936 |
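The Data Size column appears to follow a doubling curriculum: after a zero-data baseline evaluation at epoch 0, the training fraction starts near 1/128 (≈ 0.0078) and doubles each epoch until the full dataset is used from epoch 8 onward. A small sketch, assuming that schedule, reproduces the column:

```python
def data_size_schedule(num_epochs, start_fraction=1 / 128):
    """Fraction of training data used in epochs 1..num_epochs.

    Assumed schedule: start at `start_fraction` and double each
    epoch, capped at 1.0 (the full dataset).
    """
    sizes = []
    frac = start_fraction
    for _ in range(num_epochs):
        sizes.append(min(frac, 1.0))
        frac *= 2
    return sizes
```

With the defaults, epochs 1-8 yield 0.0078125, 0.015625, 0.03125, 0.0625, 0.125, 0.25, 0.5, 1.0, matching the table to the printed precision.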

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
