GPT2-705M

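This model is a fine-tuned version of an unspecified base model on an unknown dataset. Its results on the evaluation set are given in the training results table below; the validation loss after the final epoch (epoch 20, step 60) is 3.6046.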

Model description

More information needed

The hyperparameters used during training are not listed. Training results by epoch:

Training Loss	Epoch	Step	Validation Loss
8.0336	1.0	3	7.3770
6.2535	2.0	6	6.3128
5.6213	3.0	9	5.6716
4.8242	4.0	12	5.1521
4.6266	5.0	15	4.9789
4.4097	6.0	18	4.7306
4.0358	7.0	21	4.5332
4.0027	8.0	24	4.4014
3.8638	9.0	27	4.1175
3.5414	10.0	30	4.0355
3.4701	11.0	33	3.8834
3.4822	12.0	36	3.8336
3.0602	13.0	39	3.7213
3.1109	14.0	42	3.7379
2.9087	15.0	45	3.7389
2.7124	16.0	48	3.6220
2.5867	17.0	51	3.7192
2.4577	18.0	54	3.5953
2.2790	19.0	57	3.7648
2.3218	20.0	60	3.6046
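
As a reading aid for the table above, the following sketch finds the epoch with the lowest validation loss and measures how far training and validation loss have drifted apart by the final epoch. The numbers are copied from the table; the helper names (`best_epoch`, `perplexity`) are illustrative and not part of any library. The perplexity conversion assumes the reported loss is a mean per-token cross-entropy in nats, which this card does not confirm.

```python
import math

# (epoch, training_loss, validation_loss), copied from the table above
LOG = [
    (1, 8.0336, 7.3770), (2, 6.2535, 6.3128), (3, 5.6213, 5.6716),
    (4, 4.8242, 5.1521), (5, 4.6266, 4.9789), (6, 4.4097, 4.7306),
    (7, 4.0358, 4.5332), (8, 4.0027, 4.4014), (9, 3.8638, 4.1175),
    (10, 3.5414, 4.0355), (11, 3.4701, 3.8834), (12, 3.4822, 3.8336),
    (13, 3.0602, 3.7213), (14, 3.1109, 3.7379), (15, 2.9087, 3.7389),
    (16, 2.7124, 3.6220), (17, 2.5867, 3.7192), (18, 2.4577, 3.5953),
    (19, 2.2790, 3.7648), (20, 2.3218, 3.6046),
]


def best_epoch(log):
    """Return the (epoch, train, val) row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


def perplexity(loss):
    """Convert a loss to perplexity (assumes mean per-token cross-entropy in nats)."""
    return math.exp(loss)


best = best_epoch(LOG)
final = LOG[-1]
gap = final[2] - final[1]  # validation loss minus training loss at the last epoch

print(f"lowest validation loss: {best[2]:.4f} at epoch {best[0]}")
print(f"perplexity at that epoch (under the stated assumption): {perplexity(best[2]):.1f}")
print(f"train/validation gap at epoch {final[0]}: {gap:.4f}")
```

On these numbers the lowest validation loss is at epoch 18 rather than at the final epoch, and from roughly epoch 13 onward the validation loss fluctuates between about 3.6 and 3.8 while the training loss keeps falling. That pattern may indicate the start of overfitting, though with only 3 optimizer steps per epoch the validation figures are likely noisy.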

Safetensors

Model size: 0.7B params

Tensor type: F32
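
The model size and tensor type above give a rough idea of how much memory the weights alone need. The sketch below does that arithmetic. It assumes a parameter count of 705 million, read off the model name "GPT2-705M" rather than from an exact count, and it ignores activations, optimizer state, and file-format overhead. The function name `weight_bytes` is illustrative.

```python
# Bytes per parameter for common tensor types
BYTES_PER_DTYPE = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params, dtype="F32"):
    """Approximate size of the raw weights in bytes for a given tensor type."""
    return num_params * BYTES_PER_DTYPE[dtype]


# Assumption: 705M parameters, inferred from the model name, not an exact count
NUM_PARAMS = 705_000_000

for dtype in ("F32", "F16"):
    size_gb = weight_bytes(NUM_PARAMS, dtype) / 1e9
    print(f"{dtype}: about {size_gb:.2f} GB of weights")
```

Under that assumption the F32 weights come to roughly 2.8 GB, and casting to a 16-bit type would halve that.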