gpt2_small_wiki_100M_32768_53

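This model was trained from scratch on an unknown dataset. At the last evaluation logged in the training results table (step 100000, epoch 10.83), it reaches a validation loss of 3.9882 and an accuracy of 0.3378 on the evaluation set.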

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

The following hyperparameters were used during training:

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
7.1917	0.22	2000	7.1253	0.1375
6.2932	0.43	4000	6.4682	0.1656
5.7724	0.65	6000	6.0204	0.1842
5.3495	0.87	8000	5.6941	0.1986
5.0008	1.08	10000	5.4465	0.2123
4.6744	1.3	12000	5.2044	0.2326
4.3954	1.52	14000	5.0246	0.2466
4.186	1.73	16000	4.8652	0.2584
4.0297	1.95	18000	4.7515	0.2660
3.8857	2.17	20000	4.6631	0.2734
3.81	2.38	22000	4.5863	0.2789
3.7289	2.6	24000	4.5271	0.2832
3.677	2.82	26000	4.4742	0.2878
3.6032	3.03	28000	4.4318	0.2922
3.5399	3.25	30000	4.3990	0.2955
3.5102	3.47	32000	4.3566	0.2981
3.484	3.68	34000	4.3380	0.3012
3.4487	3.9	36000	4.3095	0.3034
3.3679	4.12	38000	4.2899	0.3053
3.3619	4.33	40000	4.2583	0.3080
3.3495	4.55	42000	4.2440	0.3095
3.3216	4.77	44000	4.2131	0.3115
3.3056	4.98	46000	4.1926	0.3145
3.2263	5.2	48000	4.1758	0.3160
3.219	5.42	50000	4.1600	0.3178
3.2041	5.63	52000	4.1466	0.3192
3.1942	5.85	54000	4.1256	0.3210
3.1384	6.07	56000	4.1273	0.3214
3.1184	6.28	58000	4.1083	0.3233
3.1166	6.5	60000	4.0978	0.3241
3.1126	6.72	62000	4.0857	0.3253
3.106	6.93	64000	4.0709	0.3269
3.0349	7.15	66000	4.0753	0.3267
3.0382	7.36	68000	4.0661	0.3277
3.0407	7.58	70000	4.0545	0.3293
3.0384	7.8	72000	4.0445	0.3303
3.0227	8.01	74000	4.0465	0.3312
2.9699	8.23	76000	4.0399	0.3313
2.976	8.45	78000	4.0330	0.3322
2.9766	8.66	80000	4.0177	0.3337
2.9713	8.88	82000	4.0155	0.3338
2.9172	9.1	84000	4.0189	0.3341
2.917	9.31	86000	4.0145	0.3347
2.9207	9.53	88000	4.0109	0.3352
2.9169	9.75	90000	3.9986	0.3364
2.9102	9.96	92000	3.9945	0.3366
2.8663	10.18	94000	3.9954	0.3371
2.8715	10.4	96000	3.9949	0.3371
2.8695	10.61	98000	3.9913	0.3376
2.8658	10.83	100000	3.9882	0.3378
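The table reports raw losses only. The minimal sketch below turns a few of its rows into perplexities, the train/validation gap, and an approximate steps-per-epoch figure. It assumes, without confirmation from this card, that both loss columns are mean token-level cross-entropy in nats (the usual Hugging Face Trainer convention for causal language models), so that perplexity is exp(loss). The names `TABLE` and `summarize` are illustrative only. Because the Epoch column is rounded to two decimals, the steps-per-epoch estimate is approximate.

```python
import math

# Rows copied verbatim from the training results table above, in its column
# order: Training Loss, Epoch, Step, Validation Loss, Accuracy (tab-separated).
# More rows can be pasted in the same format.
TABLE = """\
7.1917\t0.22\t2000\t7.1253\t0.1375
3.219\t5.42\t50000\t4.1600\t0.3178
2.8658\t10.83\t100000\t3.9882\t0.3378
"""


def summarize(table: str) -> list[dict]:
    """Parse tab-separated rows and derive perplexity-style metrics.

    Assumption: losses are mean cross-entropy in nats, so perplexity = exp(loss).
    """
    rows = []
    for line in table.strip().splitlines():
        train_loss, epoch, step, val_loss, acc = line.split("\t")
        train_loss, val_loss = float(train_loss), float(val_loss)
        rows.append({
            "step": int(step),
            "epoch": float(epoch),
            "train_ppl": math.exp(train_loss),
            "val_ppl": math.exp(val_loss),
            # Validation minus training loss, in nats.
            "gap": val_loss - train_loss,
            "accuracy": float(acc),
            # Approximate, since the epoch values are rounded.
            "steps_per_epoch": int(step) / float(epoch),
        })
    return rows


for row in summarize(TABLE):
    print(f"step {row['step']:>6}: val ppl {row['val_ppl']:8.1f}, "
          f"gap {row['gap']:.3f} nats, ~{row['steps_per_epoch']:.0f} steps/epoch")
```

Under that assumption, the final validation loss of 3.9882 corresponds to a perplexity of roughly 54. The train/validation gap also widens from about 0.8 nats at step 50000 to about 1.1 nats at step 100000.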
