en_wiki_mlm_42

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.2073
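For a masked-language-modeling objective, the evaluation loss above corresponds to a perplexity of exp(loss) ≈ 24.7. This is a back-of-the-envelope conversion, not a metric reported by the training run:

```python
import math

eval_loss = 3.2073  # evaluation loss reported above
perplexity = math.exp(eval_loss)  # perplexity = e^(cross-entropy loss)
print(f"{perplexity:.1f}")
```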

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
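The scheduler settings above imply a linear warmup over the first 40,000 steps followed by linear decay to zero at step 100,000, and the effective batch size of 32 comes from the per-device batch size times the gradient accumulation steps. A minimal sketch of that schedule shape (an illustration of the standard linear-with-warmup curve, not the exact Transformers implementation):

```python
BASE_LR = 1e-4    # learning_rate
WARMUP = 40_000   # lr_scheduler_warmup_steps
TOTAL = 100_000   # training_steps

def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then linear decay to 0 at TOTAL."""
    if step < WARMUP:
        return BASE_LR * step / WARMUP
    return BASE_LR * max(0.0, (TOTAL - step) / (TOTAL - WARMUP))

# Effective batch size: per-device batch * gradient accumulation steps
effective_batch = 16 * 2  # matches total_train_batch_size: 32
```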

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.1319  | 2000   | 7.9220          |
| 7.9468        | 2.2637  | 4000   | 7.1138          |
| 7.9468        | 3.3956  | 6000   | 7.0179          |
| 7.0274        | 4.5274  | 8000   | 6.9387          |
| 7.0274        | 5.6593  | 10000  | 6.8684          |
| 6.8824        | 6.7912  | 12000  | 6.8074          |
| 6.8824        | 7.9230  | 14000  | 6.7360          |
| 6.7613        | 9.0549  | 16000  | 6.6897          |
| 6.7613        | 10.1868 | 18000  | 6.6394          |
| 6.6553        | 11.3186 | 20000  | 6.5982          |
| 6.6553        | 12.4505 | 22000  | 6.5549          |
| 6.5571        | 13.5823 | 24000  | 6.4910          |
| 6.5571        | 14.7142 | 26000  | 6.3365          |
| 6.3693        | 15.8461 | 28000  | 6.1672          |
| 6.3693        | 16.9779 | 30000  | 6.0045          |
| 6.0899        | 18.1098 | 32000  | 5.7855          |
| 6.0899        | 19.2417 | 34000  | 5.4393          |
| 5.5439        | 20.3735 | 36000  | 4.9515          |
| 5.5439        | 21.5054 | 38000  | 4.7547          |
| 4.8683        | 22.6372 | 40000  | 4.5845          |
| 4.8683        | 23.7691 | 42000  | 4.4155          |
| 4.5176        | 24.9010 | 44000  | 4.2623          |
| 4.5176        | 26.0328 | 46000  | 4.1626          |
| 4.2542        | 27.1647 | 48000  | 4.0574          |
| 4.2542        | 28.2965 | 50000  | 3.9692          |
| 4.0419        | 29.4284 | 52000  | 3.8587          |
| 4.0419        | 30.5603 | 54000  | 3.7976          |
| 3.886         | 31.6921 | 56000  | 3.7284          |
| 3.886         | 32.8240 | 58000  | 3.6753          |
| 3.7574        | 33.9559 | 60000  | 3.6361          |
| 3.7574        | 35.0877 | 62000  | 3.5934          |
| 3.6518        | 36.2196 | 64000  | 3.5501          |
| 3.6518        | 37.3514 | 66000  | 3.5198          |
| 3.5686        | 38.4833 | 68000  | 3.4513          |
| 3.5686        | 39.6152 | 70000  | 3.4401          |
| 3.4978        | 40.7470 | 72000  | 3.4219          |
| 3.4978        | 41.8789 | 74000  | 3.3757          |
| 3.4364        | 43.0108 | 76000  | 3.3725          |
| 3.4364        | 44.1426 | 78000  | 3.3441          |
| 3.3897        | 45.2745 | 80000  | 3.3154          |
| 3.3897        | 46.4063 | 82000  | 3.3061          |
| 3.3414        | 47.5382 | 84000  | 3.2805          |
| 3.3414        | 48.6701 | 86000  | 3.2789          |
| 3.3082        | 49.8019 | 88000  | 3.2435          |
| 3.3082        | 50.9338 | 90000  | 3.2386          |
| 3.2764        | 52.0656 | 92000  | 3.2367          |
| 3.2764        | 53.1975 | 94000  | 3.2309          |
| 3.261         | 54.3294 | 96000  | 3.2278          |
| 3.261         | 55.4612 | 98000  | 3.2301          |
| 3.2384        | 56.5931 | 100000 | 3.2073          |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
Model details

  • Format: Safetensors
  • Model size: 14.9M params
  • Tensor type: F32