# mt5-amharic-antonym

This model is a fine-tuned version of google/mt5-small on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.5822
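The card does not document how inputs were formatted during fine-tuning. A minimal usage sketch, assuming the checkpoint id `Beck90/mt5-amharic-antonym` and that the model maps a bare word directly to its antonym (both assumptions, not documented here):

```python
def generate_antonym(word: str, model_name: str = "Beck90/mt5-amharic-antonym") -> str:
    """Generate an antonym with the fine-tuned mT5 checkpoint.

    Downloads the weights on first call. The bare-word input format is an
    assumption; the card does not document how training pairs were formatted.
    """
    # Lazy import: requires the `transformers` and `torch` packages.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(word, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=16)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

If generations look degenerate, the training data likely used a task prefix or template around the input word; adjust the input string accordingly.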

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 100
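With `lr_scheduler_type: linear` and no warmup steps listed, the learning rate decays linearly from 2e-05 to zero over the planned run (100 epochs × 60 optimizer steps per epoch, per the training-results table, i.e. 6,000 scheduled steps). A minimal sketch of that schedule:

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 6000) -> float:
    """Linear decay from base_lr to 0 over total_steps, with no warmup.

    total_steps assumes num_epochs (100) x steps_per_epoch (60, taken from
    the step column of the training-results table).
    """
    remaining = max(0.0, 1.0 - step / total_steps)  # fraction of schedule left
    return base_lr * remaining
```

So at step 3,000 (epoch 50) the learning rate is half its initial value, which partly explains why the loss curve flattens in the later epochs.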

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 48.1602 | 1.0 | 60 | 39.0911 |
| 39.8397 | 2.0 | 120 | 28.2010 |
| 33.5322 | 3.0 | 180 | 22.2332 |
| 29.7329 | 4.0 | 240 | 18.5160 |
| 26.4591 | 5.0 | 300 | 16.6270 |
| 23.7199 | 6.0 | 360 | 14.9033 |
| 21.4338 | 7.0 | 420 | 13.3966 |
| 19.4494 | 8.0 | 480 | 12.1507 |
| 17.5813 | 9.0 | 540 | 11.0042 |
| 15.385 | 10.0 | 600 | 9.7896 |
| 14.1214 | 11.0 | 660 | 8.7964 |
| 12.824 | 12.0 | 720 | 7.9328 |
| 11.4696 | 13.0 | 780 | 7.1627 |
| 10.2074 | 14.0 | 840 | 6.4906 |
| 9.0975 | 15.0 | 900 | 5.9028 |
| 8.3801 | 16.0 | 960 | 5.3529 |
| 7.7496 | 17.0 | 1020 | 4.9146 |
| 6.7216 | 18.0 | 1080 | 4.4308 |
| 6.2098 | 19.0 | 1140 | 3.9412 |
| 5.4955 | 20.0 | 1200 | 3.4992 |
| 5.0123 | 21.0 | 1260 | 3.1167 |
| 4.3249 | 22.0 | 1320 | 2.7577 |
| 3.7474 | 23.0 | 1380 | 2.3593 |
| 3.3065 | 24.0 | 1440 | 2.0139 |
| 2.901 | 25.0 | 1500 | 1.6806 |
| 2.4941 | 26.0 | 1560 | 1.5083 |
| 2.2495 | 27.0 | 1620 | 1.3519 |
| 2.1328 | 28.0 | 1680 | 1.2387 |
| 1.9432 | 29.0 | 1740 | 1.1626 |
| 1.7885 | 30.0 | 1800 | 1.0925 |
| 1.6632 | 31.0 | 1860 | 1.0438 |
| 1.5964 | 32.0 | 1920 | 1.0213 |
| 1.4927 | 33.0 | 1980 | 0.9974 |
| 1.443 | 34.0 | 2040 | 0.9755 |
| 1.4459 | 35.0 | 2100 | 0.9626 |
| 1.4127 | 36.0 | 2160 | 0.9419 |
| 1.3008 | 37.0 | 2220 | 0.9232 |
| 1.3198 | 38.0 | 2280 | 0.9001 |
| 1.2208 | 39.0 | 2340 | 0.8826 |
| 1.2165 | 40.0 | 2400 | 0.8694 |
| 1.2188 | 41.0 | 2460 | 0.8589 |
| 1.1627 | 42.0 | 2520 | 0.8427 |
| 1.155 | 43.0 | 2580 | 0.8290 |
| 1.069 | 44.0 | 2640 | 0.8145 |
| 1.0762 | 45.0 | 2700 | 0.8038 |
| 1.0239 | 46.0 | 2760 | 0.7905 |
| 1.0317 | 47.0 | 2820 | 0.7829 |
| 1.0047 | 48.0 | 2880 | 0.7727 |
| 0.9471 | 49.0 | 2940 | 0.7634 |
| 0.9366 | 50.0 | 3000 | 0.7542 |
| 0.9635 | 51.0 | 3060 | 0.7464 |
| 0.8958 | 52.0 | 3120 | 0.7378 |
| 0.9107 | 53.0 | 3180 | 0.7269 |
| 0.8582 | 54.0 | 3240 | 0.7168 |
| 0.8749 | 55.0 | 3300 | 0.7082 |
| 0.8661 | 56.0 | 3360 | 0.6979 |
| 0.838 | 57.0 | 3420 | 0.6865 |
| 0.8453 | 58.0 | 3480 | 0.6786 |
| 0.8125 | 59.0 | 3540 | 0.6671 |
| 0.8392 | 60.0 | 3600 | 0.6605 |
| 0.8039 | 61.0 | 3660 | 0.6539 |
| 0.7836 | 62.0 | 3720 | 0.6474 |
| 0.8213 | 63.0 | 3780 | 0.6412 |
| 0.8254 | 64.0 | 3840 | 0.6374 |
| 0.817 | 65.0 | 3900 | 0.6325 |
| 0.8193 | 66.0 | 3960 | 0.6288 |
| 0.7962 | 67.0 | 4020 | 0.6231 |
| 0.7844 | 68.0 | 4080 | 0.6178 |
| 0.7597 | 69.0 | 4140 | 0.6146 |
| 0.7924 | 70.0 | 4200 | 0.6108 |
| 0.7812 | 71.0 | 4260 | 0.6064 |
| 0.7714 | 72.0 | 4320 | 0.6042 |
| 0.8165 | 73.0 | 4380 | 0.6029 |
| 0.7469 | 74.0 | 4440 | 0.6007 |
| 0.7636 | 75.0 | 4500 | 0.5988 |
| 0.7597 | 76.0 | 4560 | 0.5976 |
| 0.7227 | 77.0 | 4620 | 0.5944 |
| 0.7816 | 78.0 | 4680 | 0.5924 |
| 0.7509 | 79.0 | 4740 | 0.5914 |
| 0.7563 | 80.0 | 4800 | 0.5903 |
| 0.7658 | 81.0 | 4860 | 0.5879 |
| 0.7438 | 82.0 | 4920 | 0.5867 |
| 0.7357 | 83.0 | 4980 | 0.5858 |
| 0.7325 | 84.0 | 5040 | 0.5846 |
| 0.741 | 85.0 | 5100 | 0.5838 |
| 0.7294 | 86.0 | 5160 | 0.5833 |
| 0.7199 | 87.0 | 5220 | 0.5826 |
| 0.7642 | 88.0 | 5280 | 0.5820 |
| 0.7459 | 89.0 | 5340 | 0.5821 |
| 0.7152 | 90.0 | 5400 | 0.5822 |
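The step column implies the training-set size: 60 optimizer steps per epoch at batch size 32 means roughly 1,889-1,920 training examples (the last batch may be partial), assuming no gradient accumulation. Note also that the log stops at epoch 90 of the configured 100, consistent with early stopping once the validation loss plateaued near 0.582. A small sanity check:

```python
def training_set_size_bounds(steps_per_epoch: int, batch_size: int) -> tuple[int, int]:
    """Inclusive bounds on dataset size, given that every batch is full
    except possibly the last one in each epoch."""
    low = (steps_per_epoch - 1) * batch_size + 1   # 59 full batches + 1 example
    high = steps_per_epoch * batch_size            # 60 full batches
    return low, high
```

With the values from this card, `training_set_size_bounds(60, 32)` gives bounds of 1,889 and 1,920 examples.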

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1
Model size: 0.3B parameters (F32 tensors, safetensors format)
