BMU_Finetuned_GPT2_model_MedQUAD

This model is a fine-tuned version of gpt2 on the MedQUAD medical question-answering dataset. It achieves the following results on the evaluation set:

  • Loss: 5.7820

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 50
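With a linear scheduler and the step counts reported in the results table (2461 steps per epoch × 50 epochs = 123,050 total steps), the learning rate decays from 0.001 to 0 over the run. A minimal sketch of that schedule, assuming zero warmup steps (the card does not report a warmup setting):

```python
def linear_lr(step, base_lr=0.001, total_steps=50 * 2461):
    """Learning rate after `step` optimizer steps under linear decay, no warmup.

    base_lr and total_steps are taken from this card's hyperparameters
    and results table; the zero-warmup assumption is ours.
    """
    return base_lr * max(0.0, (total_steps - step) / total_steps)

print(linear_lr(0))        # start of training: the full base rate
print(linear_lr(61525))    # halfway through: half the base rate
print(linear_lr(123050))   # end of training: decayed to zero
```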

Training results

| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 3.0006        | 1.0   | 2461   | 2.5653          |
| 2.531         | 2.0   | 4922   | 2.3356          |
| 1.9099        | 3.0   | 7383   | 2.2514          |
| 1.3875        | 4.0   | 9844   | 2.2217          |
| 1.4393        | 5.0   | 12305  | 2.2787          |
| 1.2323        | 6.0   | 14766  | 2.3574          |
| 1.0674        | 7.0   | 17227  | 2.4938          |
| 0.9805        | 8.0   | 19688  | 2.6373          |
| 0.6874        | 9.0   | 22149  | 2.8057          |
| 0.5196        | 10.0  | 24610  | 3.0072          |
| 0.5449        | 11.0  | 27071  | 3.1151          |
| 0.4589        | 12.0  | 29532  | 3.2684          |
| 0.4114        | 13.0  | 31993  | 3.3635          |
| 0.3701        | 14.0  | 34454  | 3.4848          |
| 0.3241        | 15.0  | 36915  | 3.5529          |
| 0.3004        | 16.0  | 39376  | 3.6071          |
| 0.2966        | 17.0  | 41837  | 3.6471          |
| 0.2905        | 18.0  | 44298  | 3.7240          |
| 0.2399        | 19.0  | 46759  | 3.7850          |
| 0.2007        | 20.0  | 49220  | 3.8772          |
| 0.1967        | 21.0  | 51681  | 3.9415          |
| 0.197         | 22.0  | 54142  | 3.9142          |
| 0.1821        | 23.0  | 56603  | 3.9967          |
| 0.184         | 24.0  | 59064  | 4.0487          |
| 0.1638        | 25.0  | 61525  | 4.1227          |
| 0.1489        | 26.0  | 63986  | 4.1623          |
| 0.1399        | 27.0  | 66447  | 4.2030          |
| 0.1306        | 28.0  | 68908  | 4.4161          |
| 0.1311        | 29.0  | 71369  | 4.3014          |
| 0.1244        | 30.0  | 73830  | 4.3804          |
| 0.1224        | 31.0  | 76291  | 4.4235          |
| 0.1098        | 32.0  | 78752  | 4.4575          |
| 0.1091        | 33.0  | 81213  | 4.5210          |
| 0.1072        | 34.0  | 83674  | 4.6605          |
| 0.1181        | 35.0  | 86135  | 4.6586          |
| 0.1115        | 36.0  | 88596  | 4.7863          |
| 0.0899        | 37.0  | 91057  | 4.8294          |
| 0.0962        | 38.0  | 93518  | 4.8182          |
| 0.0976        | 39.0  | 95979  | 4.8918          |
| 0.0907        | 40.0  | 98440  | 5.1227          |
| 0.0805        | 41.0  | 100901 | 5.1013          |
| 0.0711        | 42.0  | 103362 | 5.1835          |
| 0.0765        | 43.0  | 105823 | 5.2443          |
| 0.0795        | 44.0  | 108284 | 5.3505          |
| 0.0781        | 45.0  | 110745 | 5.4632          |
| 0.0674        | 46.0  | 113206 | 5.5556          |
| 0.0736        | 47.0  | 115667 | 5.6530          |
| 0.0637        | 48.0  | 118128 | 5.7094          |
| 0.065         | 49.0  | 120589 | 5.7493          |
| 0.0731        | 50.0  | 123050 | 5.7820          |

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu121
  • Datasets 3.3.1
  • Tokenizers 0.21.0
Model tree for Deepanshu7284/BMU_Finetuned_GPT2_model_MedQUAD

  • Base model: gpt2 → this model