67f78e5609d10f79d93334947eb4ea81

This model is a fine-tuned version of google/mt5-small for French-to-Dutch translation, trained on the Helsinki-NLP/opus_books [fr-nl] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9072
  • Data Size: 1.0
  • Epoch Runtime: 146.8319
  • Bleu: 7.6933

Model description

More information needed

Intended uses & limitations

More information needed
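
Although the card provides no further details, the checkpoint can presumably be loaded with the standard transformers seq2seq API for French-to-Dutch translation. A minimal sketch, assuming default generation settings (whether a task prefix was used during fine-tuning is not stated on this card):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id as listed on this card; everything else here is illustrative.
model_id = "contemmcm/67f78e5609d10f79d93334947eb4ea81"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Le chat dort sur le canapé."  # French source sentence
inputs = tokenizer(text, return_tensors="pt")
# max_new_tokens and num_beams are assumptions, not taken from the card.
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```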

Training and evaluation data

More information needed
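
For reference, the dataset named above can be loaded with the datasets library. A minimal sketch, assuming the standard opus_books schema (a single train split of {"id", "translation"} records):

```python
from datasets import load_dataset

# Dataset repo and language-pair config as listed on this card.
ds = load_dataset("Helsinki-NLP/opus_books", "fr-nl")

# opus_books ships a single "train" split; evaluation splits are
# typically carved out by the user (how this card did so is not stated).
example = ds["train"][0]
print(example["translation"]["fr"], "->", example["translation"]["nl"])
```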

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
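
These values map directly onto transformers training arguments. A configuration sketch, assuming the usual Seq2SeqTrainer setup; output_dir and predict_with_generate are illustrative assumptions, and the totals of 32 follow from 8 per device across 4 GPUs:

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-opus-books-fr-nl",  # illustrative name, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # x 4 devices = total 32
    per_device_eval_batch_size=8,   # x 4 devices = total 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumed, so BLEU can be computed during evaluation
)
```

The multi-GPU totals imply a distributed launch (e.g. torchrun or accelerate launch across 4 devices); the exact launcher is not stated on the card.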

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 25.0059         | 0         | 12.7692       | 0.0048 |
| No log        | 1     | 1000  | 21.2792         | 0.0078    | 13.7547       | 0.0046 |
| No log        | 2     | 2000  | 15.9712         | 0.0156    | 15.1095       | 0.0047 |
| No log        | 3     | 3000  | 12.4554         | 0.0312    | 17.7115       | 0.0063 |
| 0.5906        | 4     | 4000  | 7.8840          | 0.0625    | 22.7222       | 0.0151 |
| 7.1888        | 5     | 5000  | 4.8394          | 0.125     | 31.0653       | 0.0323 |
| 0.2984        | 6     | 6000  | 3.5904          | 0.25      | 47.6058       | 0.8191 |
| 0.3735        | 7     | 7000  | 3.2006          | 0.5       | 81.2562       | 1.8199 |
| 3.6568        | 8     | 8000  | 2.9083          | 1.0       | 146.8896      | 2.8294 |
| 3.4029        | 9     | 9000  | 2.7560          | 1.0       | 146.8793      | 3.3878 |
| 3.2491        | 10    | 10000 | 2.6562          | 1.0       | 144.8004      | 3.7127 |
| 3.1471        | 11    | 11000 | 2.5820          | 1.0       | 143.9108      | 4.0387 |
| 3.0497        | 12    | 12000 | 2.5156          | 1.0       | 143.9953      | 4.2739 |
| 2.9515        | 13    | 13000 | 2.4650          | 1.0       | 143.0167      | 4.5165 |
| 2.8894        | 14    | 14000 | 2.4179          | 1.0       | 143.5409      | 4.6757 |
| 2.8164        | 15    | 15000 | 2.3775          | 1.0       | 146.5068      | 4.8487 |
| 2.7565        | 16    | 16000 | 2.3439          | 1.0       | 142.7527      | 5.0430 |
| 2.668         | 17    | 17000 | 2.3097          | 1.0       | 143.3021      | 5.2109 |
| 2.6807        | 18    | 18000 | 2.2790          | 1.0       | 141.4281      | 5.3746 |
| 2.6175        | 19    | 19000 | 2.2513          | 1.0       | 140.5433      | 5.5266 |
| 2.5615        | 20    | 20000 | 2.2301          | 1.0       | 142.2420      | 5.6386 |
| 2.5557        | 21    | 21000 | 2.2066          | 1.0       | 141.5709      | 5.7679 |
| 2.4982        | 22    | 22000 | 2.1906          | 1.0       | 141.4199      | 5.8798 |
| 2.4646        | 23    | 23000 | 2.1690          | 1.0       | 142.7218      | 5.9994 |
| 2.4235        | 24    | 24000 | 2.1503          | 1.0       | 144.2434      | 6.0590 |
| 2.448         | 25    | 25000 | 2.1291          | 1.0       | 142.9854      | 6.1774 |
| 2.3732        | 26    | 26000 | 2.1164          | 1.0       | 142.8238      | 6.2716 |
| 2.3499        | 27    | 27000 | 2.1017          | 1.0       | 142.3916      | 6.3534 |
| 2.3031        | 28    | 28000 | 2.0889          | 1.0       | 142.5359      | 6.4245 |
| 2.3186        | 29    | 29000 | 2.0757          | 1.0       | 142.3031      | 6.5032 |
| 2.2609        | 30    | 30000 | 2.0675          | 1.0       | 143.6074      | 6.5912 |
| 2.2772        | 31    | 31000 | 2.0480          | 1.0       | 143.4460      | 6.6842 |
| 2.2382        | 32    | 32000 | 2.0414          | 1.0       | 142.8258      | 6.7297 |
| 2.1964        | 33    | 33000 | 2.0333          | 1.0       | 142.5789      | 6.8041 |
| 2.1787        | 34    | 34000 | 2.0263          | 1.0       | 141.3301      | 6.8929 |
| 2.1484        | 35    | 35000 | 2.0118          | 1.0       | 140.3920      | 6.9442 |
| 2.1415        | 36    | 36000 | 1.9954          | 1.0       | 141.8709      | 7.0288 |
| 2.1151        | 37    | 37000 | 1.9950          | 1.0       | 142.0263      | 7.0444 |
| 2.1257        | 38    | 38000 | 1.9811          | 1.0       | 141.7408      | 7.1191 |
| 2.0678        | 39    | 39000 | 1.9765          | 1.0       | 141.5414      | 7.1762 |
| 2.0599        | 40    | 40000 | 1.9726          | 1.0       | 142.8618      | 7.2139 |
| 2.0554        | 41    | 41000 | 1.9567          | 1.0       | 142.9126      | 7.2727 |
| 2.0399        | 42    | 42000 | 1.9563          | 1.0       | 142.8270      | 7.3510 |
| 2.0209        | 43    | 43000 | 1.9488          | 1.0       | 142.8891      | 7.3960 |
| 2.0283        | 44    | 44000 | 1.9399          | 1.0       | 140.6617      | 7.4480 |
| 1.9838        | 45    | 45000 | 1.9342          | 1.0       | 145.7035      | 7.4922 |
| 1.9968        | 46    | 46000 | 1.9319          | 1.0       | 142.4968      | 7.5003 |
| 1.9656        | 47    | 47000 | 1.9235          | 1.0       | 141.3154      | 7.5467 |
| 1.9808        | 48    | 48000 | 1.9177          | 1.0       | 142.4049      | 7.5940 |
| 1.9234        | 49    | 49000 | 1.9177          | 1.0       | 142.8916      | 7.6179 |
| 1.9233        | 50    | 50000 | 1.9072          | 1.0       | 146.8319      | 7.6933 |
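
The card does not say which BLEU implementation produced the scores above; a common choice in this setup is sacreBLEU via the evaluate library. A minimal sketch with illustrative inputs:

```python
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["De kat slaapt op de bank."]   # model outputs (illustrative)
references = [["De kat slaapt op de bank."]]  # one list of references per prediction
print(bleu.compute(predictions=predictions, references=references)["score"])
```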

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1