70271a268c3e123382014df2a1971831

This model is a fine-tuned version of facebook/mbart-large-cc25 on the Helsinki-NLP/opus_books [en-pl] dataset. It achieves the following results on the evaluation set:

  • Loss: 5.8639
  • Data Size: 1.0
  • Epoch Runtime: 22.1241
  • Bleu: 0.3164

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
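The total batch sizes listed above follow from the per-device values. A quick arithmetic check (variable names are illustrative, and the absence of gradient accumulation is an assumption since no accumulation steps are listed):

```python
# Per-device settings from the hyperparameter list above.
train_batch_size = 8   # per-device train batch size
num_devices = 4        # multi-GPU, 4 devices
grad_accum_steps = 1   # assumption: no gradient accumulation was listed

# Effective global batch size seen by the optimizer per step.
total_train_batch_size = train_batch_size * num_devices * grad_accum_steps
print(total_train_batch_size)  # 32, matching total_train_batch_size above
```

The same calculation with eval_batch_size=8 gives total_eval_batch_size=32.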

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 13.6352         | 0         | 2.4635        | 0.1990 |
| No log        | 1     | 70   | 11.3085         | 0.0078    | 2.9932        | 0.2009 |
| No log        | 2     | 140  | 11.0147         | 0.0156    | 4.5462        | 0.2207 |
| No log        | 3     | 210  | 9.2760          | 0.0312    | 5.5653        | 0.0571 |
| No log        | 4     | 280  | 8.3625          | 0.0625    | 7.5766        | 0.1924 |
| No log        | 5     | 350  | 7.5944          | 0.125     | 9.6046        | 0.1633 |
| No log        | 6     | 420  | 6.6443          | 0.25      | 11.2748       | 0.2749 |
| 1.126         | 7     | 490  | 5.7002          | 0.5       | 14.6275       | 0.6001 |
| 5.2593        | 8     | 560  | 5.0453          | 1.0       | 23.8047       | 0.7805 |
| 4.7225        | 9     | 630  | 4.7522          | 1.0       | 23.1385      | 0.9869 |
| 4.0016        | 10    | 700  | 4.5100          | 1.0       | 23.6459       | 1.0757 |
| 4.2758        | 11    | 770  | 11.2109         | 1.0       | 21.8179       | 0.0    |
| 11.9106       | 12    | 840  | 6.7697          | 1.0       | 22.6605       | 0.0077 |
| 6.5717        | 13    | 910  | 6.4409          | 1.0       | 23.5267       | 0.1066 |
| 6.162         | 14    | 980  | 5.8639          | 1.0       | 22.1241       | 0.3164 |

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1