a3bdaa46b3e502d2a06cd428b5d3ae1e

This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [fr-ru] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0601
  • Data Size: 1.0
  • Epoch Runtime: 46.9453
  • Bleu: 5.7971

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
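The total batch sizes above follow from the per-device batch size and the device count. A minimal sketch of the arithmetic (the gradient-accumulation value is an assumption, since it is not listed and is presumed to be 1):

```python
# Illustrative check of the reported totals: per-device batches of 8
# replicated across 4 GPUs give an effective batch of 32.
train_batch_size = 8             # per device, from the hyperparameter list
eval_batch_size = 8              # per device
num_devices = 4
gradient_accumulation_steps = 1  # assumption: not listed, presumed 1

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 32 32
```

These match the `total_train_batch_size: 32` and `total_eval_batch_size: 32` values reported above.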

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 18.6095         | 0         | 4.5211        | 0.0136 |
| No log        | 1     | 204  | 17.3576         | 0.0078    | 4.7122        | 0.0142 |
| No log        | 2     | 408  | 16.3045         | 0.0156    | 5.5679        | 0.0169 |
| No log        | 3     | 612  | 14.8927         | 0.0312    | 6.2537        | 0.0159 |
| No log        | 4     | 816  | 11.1767         | 0.0625    | 8.6712        | 0.0145 |
| No log        | 5     | 1020 | 7.9354          | 0.125     | 11.9308       | 0.0252 |
| 0.9848        | 6     | 1224 | 5.3314          | 0.25      | 15.4465       | 0.0287 |
| 5.14          | 7     | 1428 | 2.9558          | 0.5       | 26.3808       | 0.6859 |
| 3.6072        | 8     | 1632 | 2.5230          | 1.0       | 47.0342       | 2.3668 |
| 3.2278        | 9     | 1836 | 2.3959          | 1.0       | 44.3959       | 2.7395 |
| 3.0851        | 10    | 2040 | 2.3240          | 1.0       | 44.8476       | 3.0738 |
| 2.9153        | 11    | 2244 | 2.2805          | 1.0       | 45.1252       | 3.2457 |
| 2.8097        | 12    | 2448 | 2.2381          | 1.0       | 44.5890       | 3.4658 |
| 2.6992        | 13    | 2652 | 2.1911          | 1.0       | 44.6900       | 3.7137 |
| 2.5957        | 14    | 2856 | 2.1635          | 1.0       | 45.4663       | 3.8334 |
| 2.5173        | 15    | 3060 | 2.1350          | 1.0       | 45.7136       | 3.9924 |
| 2.4215        | 16    | 3264 | 2.1216          | 1.0       | 46.7592       | 4.1316 |
| 2.3519        | 17    | 3468 | 2.1107          | 1.0       | 45.0083       | 4.2839 |
| 2.3378        | 18    | 3672 | 2.0925          | 1.0       | 46.0805       | 4.4017 |
| 2.2692        | 19    | 3876 | 2.0784          | 1.0       | 44.8357       | 4.4737 |
| 2.2006        | 20    | 4080 | 2.0757          | 1.0       | 46.2836       | 4.8063 |
| 2.1543        | 21    | 4284 | 2.0690          | 1.0       | 47.1755       | 4.8514 |
| 2.0956        | 22    | 4488 | 2.0558          | 1.0       | 47.1702       | 5.0655 |
| 2.0687        | 23    | 4692 | 2.0561          | 1.0       | 46.9332       | 4.9966 |
| 2.0073        | 24    | 4896 | 2.0529          | 1.0       | 45.2441       | 4.9802 |
| 1.9752        | 25    | 5100 | 2.0567          | 1.0       | 46.7553       | 5.0820 |
| 1.927         | 26    | 5304 | 2.0518          | 1.0       | 46.3650       | 5.2447 |
| 1.8755        | 27    | 5508 | 2.0517          | 1.0       | 46.3734       | 5.1145 |
| 1.8388        | 28    | 5712 | 2.0515          | 1.0       | 46.9446       | 5.4780 |
| 1.779         | 29    | 5916 | 2.0525          | 1.0       | 45.3471       | 5.4312 |
| 1.7595        | 30    | 6120 | 2.0545          | 1.0       | 44.8351       | 5.8170 |
| 1.7471        | 31    | 6324 | 2.0574          | 1.0       | 46.3363       | 5.6903 |
| 1.7126        | 32    | 6528 | 2.0601          | 1.0       | 46.9453       | 5.7971 |
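The Data Size column appears to follow a doubling curriculum: training starts on roughly 1/128 of the data at epoch 1 and the fraction doubles each epoch until the full set is reached at epoch 8. A hypothetical reconstruction of that schedule (an inference from the table, not a documented training detail):

```python
# Assumption: "Data Size" doubles per epoch from 1/128, capped at 1.0.
def data_fraction(epoch: int) -> float:
    """Hypothetical reconstruction of the Data Size schedule in the table."""
    if epoch == 0:
        return 0.0
    return min(1.0, 2 ** (epoch - 1) / 128)

schedule = [round(data_fraction(e), 4) for e in range(9)]
print(schedule)  # [0.0, 0.0078, 0.0156, 0.0312, 0.0625, 0.125, 0.25, 0.5, 1.0]
```

The printed values match the Data Size column for epochs 0 through 8; from epoch 8 onward the fraction stays at 1.0, consistent with the remaining rows.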

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
Model size: 1.0B params (F32, Safetensors)
