# mt5-amharic-antonym

This model is a fine-tuned version of google/mt5-small on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.5822
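The card does not document how inputs were formatted during fine-tuning. A minimal usage sketch, assuming the checkpoint id `Beck90/mt5-amharic-antonym` and that the model maps a bare word directly to its antonym (both assumptions, not documented here):

```python
def generate_antonym(word: str, model_name: str = "Beck90/mt5-amharic-antonym") -> str:
    """Generate an antonym with the fine-tuned mT5 checkpoint.

    Downloads the weights on first call. The bare-word input format is an
    assumption; the card does not document how training pairs were formatted.
    """
    # Lazy import: requires the `transformers` and `torch` packages.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(word, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=16)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

If generations look degenerate, the training data likely used a task prefix or template around the input word; adjust the input string accordingly.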

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 100
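With `lr_scheduler_type: linear` and no warmup steps listed, the learning rate decays linearly from 2e-05 to zero over the planned run (100 epochs × 60 optimizer steps per epoch, per the training-results table, i.e. 6,000 scheduled steps). A minimal sketch of that schedule:

```python
def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = 6000) -> float:
    """Linear decay from base_lr to 0 over total_steps, with no warmup.

    total_steps assumes num_epochs (100) x steps_per_epoch (60, taken from
    the step column of the training-results table).
    """
    remaining = max(0.0, 1.0 - step / total_steps)  # fraction of schedule left
    return base_lr * remaining
```

So at step 3,000 (epoch 50) the learning rate is half its initial value, which partly explains why the loss curve flattens in the later epochs.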

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 48.1602 | 1.0 | 60 | 39.0911 |
| 39.8397 | 2.0 | 120 | 28.2010 |
| 33.5322 | 3.0 | 180 | 22.2332 |
| 29.7329 | 4.0 | 240 | 18.5160 |
| 26.4591 | 5.0 | 300 | 16.6270 |
| 23.7199 | 6.0 | 360 | 14.9033 |
| 21.4338 | 7.0 | 420 | 13.3966 |
| 19.4494 | 8.0 | 480 | 12.1507 |
| 17.5813 | 9.0 | 540 | 11.0042 |
| 15.385 | 10.0 | 600 | 9.7896 |
| 14.1214 | 11.0 | 660 | 8.7964 |
| 12.824 | 12.0 | 720 | 7.9328 |
| 11.4696 | 13.0 | 780 | 7.1627 |
| 10.2074 | 14.0 | 840 | 6.4906 |
| 9.0975 | 15.0 | 900 | 5.9028 |
| 8.3801 | 16.0 | 960 | 5.3529 |
| 7.7496 | 17.0 | 1020 | 4.9146 |
| 6.7216 | 18.0 | 1080 | 4.4308 |
| 6.2098 | 19.0 | 1140 | 3.9412 |
| 5.4955 | 20.0 | 1200 | 3.4992 |
| 5.0123 | 21.0 | 1260 | 3.1167 |
| 4.3249 | 22.0 | 1320 | 2.7577 |
| 3.7474 | 23.0 | 1380 | 2.3593 |
| 3.3065 | 24.0 | 1440 | 2.0139 |
| 2.901 | 25.0 | 1500 | 1.6806 |
| 2.4941 | 26.0 | 1560 | 1.5083 |
| 2.2495 | 27.0 | 1620 | 1.3519 |
| 2.1328 | 28.0 | 1680 | 1.2387 |
| 1.9432 | 29.0 | 1740 | 1.1626 |
| 1.7885 | 30.0 | 1800 | 1.0925 |
| 1.6632 | 31.0 | 1860 | 1.0438 |
| 1.5964 | 32.0 | 1920 | 1.0213 |
| 1.4927 | 33.0 | 1980 | 0.9974 |
| 1.443 | 34.0 | 2040 | 0.9755 |
| 1.4459 | 35.0 | 2100 | 0.9626 |
| 1.4127 | 36.0 | 2160 | 0.9419 |
| 1.3008 | 37.0 | 2220 | 0.9232 |
| 1.3198 | 38.0 | 2280 | 0.9001 |
| 1.2208 | 39.0 | 2340 | 0.8826 |
| 1.2165 | 40.0 | 2400 | 0.8694 |
| 1.2188 | 41.0 | 2460 | 0.8589 |
| 1.1627 | 42.0 | 2520 | 0.8427 |
| 1.155 | 43.0 | 2580 | 0.8290 |
| 1.069 | 44.0 | 2640 | 0.8145 |
| 1.0762 | 45.0 | 2700 | 0.8038 |
| 1.0239 | 46.0 | 2760 | 0.7905 |
| 1.0317 | 47.0 | 2820 | 0.7829 |
| 1.0047 | 48.0 | 2880 | 0.7727 |
| 0.9471 | 49.0 | 2940 | 0.7634 |
| 0.9366 | 50.0 | 3000 | 0.7542 |
| 0.9635 | 51.0 | 3060 | 0.7464 |
| 0.8958 | 52.0 | 3120 | 0.7378 |
| 0.9107 | 53.0 | 3180 | 0.7269 |
| 0.8582 | 54.0 | 3240 | 0.7168 |
| 0.8749 | 55.0 | 3300 | 0.7082 |
| 0.8661 | 56.0 | 3360 | 0.6979 |
| 0.838 | 57.0 | 3420 | 0.6865 |
| 0.8453 | 58.0 | 3480 | 0.6786 |
| 0.8125 | 59.0 | 3540 | 0.6671 |
| 0.8392 | 60.0 | 3600 | 0.6605 |
| 0.8039 | 61.0 | 3660 | 0.6539 |
| 0.7836 | 62.0 | 3720 | 0.6474 |
| 0.8213 | 63.0 | 3780 | 0.6412 |
| 0.8254 | 64.0 | 3840 | 0.6374 |
| 0.817 | 65.0 | 3900 | 0.6325 |
| 0.8193 | 66.0 | 3960 | 0.6288 |
| 0.7962 | 67.0 | 4020 | 0.6231 |
| 0.7844 | 68.0 | 4080 | 0.6178 |
| 0.7597 | 69.0 | 4140 | 0.6146 |
| 0.7924 | 70.0 | 4200 | 0.6108 |
| 0.7812 | 71.0 | 4260 | 0.6064 |
| 0.7714 | 72.0 | 4320 | 0.6042 |
| 0.8165 | 73.0 | 4380 | 0.6029 |
| 0.7469 | 74.0 | 4440 | 0.6007 |
| 0.7636 | 75.0 | 4500 | 0.5988 |
| 0.7597 | 76.0 | 4560 | 0.5976 |
| 0.7227 | 77.0 | 4620 | 0.5944 |
| 0.7816 | 78.0 | 4680 | 0.5924 |
| 0.7509 | 79.0 | 4740 | 0.5914 |
| 0.7563 | 80.0 | 4800 | 0.5903 |
| 0.7658 | 81.0 | 4860 | 0.5879 |
| 0.7438 | 82.0 | 4920 | 0.5867 |
| 0.7357 | 83.0 | 4980 | 0.5858 |
| 0.7325 | 84.0 | 5040 | 0.5846 |
| 0.741 | 85.0 | 5100 | 0.5838 |
| 0.7294 | 86.0 | 5160 | 0.5833 |
| 0.7199 | 87.0 | 5220 | 0.5826 |
| 0.7642 | 88.0 | 5280 | 0.5820 |
| 0.7459 | 89.0 | 5340 | 0.5821 |
| 0.7152 | 90.0 | 5400 | 0.5822 |
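The step column implies the training-set size: 60 optimizer steps per epoch at batch size 32 means roughly 1,889-1,920 training examples (the last batch may be partial), assuming no gradient accumulation. Note also that the log stops at epoch 90 of the configured 100, consistent with early stopping once the validation loss plateaued near 0.582. A small sanity check:

```python
def training_set_size_bounds(steps_per_epoch: int, batch_size: int) -> tuple[int, int]:
    """Inclusive bounds on dataset size, given that every batch is full
    except possibly the last one in each epoch."""
    low = (steps_per_epoch - 1) * batch_size + 1   # 59 full batches + 1 example
    high = steps_per_epoch * batch_size            # 60 full batches
    return low, high
```

With the values from this card, `training_set_size_bounds(60, 32)` gives bounds of 1,889 and 1,920 examples.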

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1
Model size: 0.3B parameters (F32 tensors, safetensors format)
