en_wiki_mlm_13

This model is a fine-tuned version of an unnamed base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1892
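Assuming the reported loss is the standard cross-entropy used for masked-language-model training (as the model name suggests), it can be converted to a perplexity by exponentiation. A minimal sketch:

```python
import math

# Evaluation loss reported above; perplexity = exp(cross-entropy loss)
eval_loss = 3.1892
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.2f}")  # roughly 24.3
```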

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 13
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
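The hyperparameters above imply a few derived quantities: the effective batch size is `train_batch_size * gradient_accumulation_steps`, and the linear scheduler ramps the learning rate up over the first 40,000 steps and then decays it to zero at step 100,000. A minimal sketch of that relationship (the function name and structure here are illustrative, not taken from the training script):

```python
# Hyperparameters as listed in this card
train_batch_size = 16
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 32

learning_rate = 1e-4
warmup_steps = 40_000
training_steps = 100_000

def lr_at(step: int) -> float:
    """Linear schedule with warmup: ramp to the peak LR, then decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    return learning_rate * (training_steps - step) / (training_steps - warmup_steps)
```

For example, `lr_at(40_000)` returns the peak rate of 1e-4, and `lr_at(100_000)` returns 0.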

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.1319  | 2000   | 7.8860          |
| 7.9349        | 2.2637  | 4000   | 7.1166          |
| 7.9349        | 3.3956  | 6000   | 7.0106          |
| 7.0278        | 4.5274  | 8000   | 6.9329          |
| 7.0278        | 5.6593  | 10000  | 6.8746          |
| 6.8808        | 6.7912  | 12000  | 6.7941          |
| 6.8808        | 7.9230  | 14000  | 6.7404          |
| 6.7556        | 9.0549  | 16000  | 6.6972          |
| 6.7556        | 10.1868 | 18000  | 6.6456          |
| 6.6549        | 11.3186 | 20000  | 6.6062          |
| 6.6549        | 12.4505 | 22000  | 6.5677          |
| 6.5603        | 13.5823 | 24000  | 6.4841          |
| 6.5603        | 14.7142 | 26000  | 6.3446          |
| 6.3833        | 15.8461 | 28000  | 6.1835          |
| 6.3833        | 16.9779 | 30000  | 5.9980          |
| 6.0808        | 18.1098 | 32000  | 5.7140          |
| 6.0808        | 19.2417 | 34000  | 5.3122          |
| 5.4546        | 20.3735 | 36000  | 4.8991          |
| 5.4546        | 21.5054 | 38000  | 4.6956          |
| 4.8294        | 22.6372 | 40000  | 4.5697          |
| 4.8294        | 23.7691 | 42000  | 4.3905          |
| 4.4991        | 24.9010 | 44000  | 4.2690          |
| 4.4991        | 26.0328 | 46000  | 4.1406          |
| 4.2339        | 27.1647 | 48000  | 4.0296          |
| 4.2339        | 28.2965 | 50000  | 3.9278          |
| 4.0315        | 29.4284 | 52000  | 3.8567          |
| 4.0315        | 30.5603 | 54000  | 3.7756          |
| 3.8738        | 31.6921 | 56000  | 3.7191          |
| 3.8738        | 32.8240 | 58000  | 3.6721          |
| 3.744         | 33.9559 | 60000  | 3.6069          |
| 3.744         | 35.0877 | 62000  | 3.5736          |
| 3.6408        | 36.2196 | 64000  | 3.5199          |
| 3.6408        | 37.3514 | 66000  | 3.4748          |
| 3.5553        | 38.4833 | 68000  | 3.4648          |
| 3.5553        | 39.6152 | 70000  | 3.4312          |
| 3.4864        | 40.7470 | 72000  | 3.4074          |
| 3.4864        | 41.8789 | 74000  | 3.3510          |
| 3.4224        | 43.0108 | 76000  | 3.3420          |
| 3.4224        | 44.1426 | 78000  | 3.3249          |
| 3.3729        | 45.2745 | 80000  | 3.3256          |
| 3.3729        | 46.4063 | 82000  | 3.2926          |
| 3.3305        | 47.5382 | 84000  | 3.2637          |
| 3.3305        | 48.6701 | 86000  | 3.2796          |
| 3.2928        | 49.8019 | 88000  | 3.2453          |
| 3.2928        | 50.9338 | 90000  | 3.2232          |
| 3.2636        | 52.0656 | 92000  | 3.2075          |
| 3.2636        | 53.1975 | 94000  | 3.2181          |
| 3.2402        | 54.3294 | 96000  | 3.2066          |
| 3.2402        | 55.4612 | 98000  | 3.1755          |
| 3.2256        | 56.5931 | 100000 | 3.1892          |

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1