---
library_name: peft
license: mit
base_model: gpt2
tags:
  - generated_from_trainer
model-index:
  - name: Se124M100KInfPrompt_endtoken
    results: []
---

# Se124M100KInfPrompt_endtoken

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.6695

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0005
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- num_epochs: 50
- mixed_precision_training: Native AMP
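The cosine schedule with 200 warmup steps can be sketched in plain Python. This is not the training script, just a minimal reimplementation of the shape that `transformers.get_cosine_schedule_with_warmup` produces, assuming the total step count is the 285850 steps reached at epoch 50 in the results table below:

```python
import math

def lr_at_step(step, base_lr=0.0005, warmup_steps=200, total_steps=285850):
    """Linear warmup to base_lr, then cosine decay to zero.

    Mirrors the shape of transformers.get_cosine_schedule_with_warmup;
    total_steps=285850 assumes 50 epochs x 5717 steps per epoch,
    matching the training-results table.
    """
    if step < warmup_steps:
        # Linear warmup: 0 -> base_lr over the first 200 steps.
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay: base_lr -> 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(0))       # start of warmup
print(lr_at_step(200))     # peak learning rate
print(lr_at_step(285850))  # end of training
```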

### Training results

| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 0.7209        | 1.0   | 5717   | 0.7060          |
| 0.7027        | 2.0   | 11434  | 0.6916          |
| 0.7005        | 3.0   | 17151  | 0.6865          |
| 0.7009        | 4.0   | 22868  | 0.6858          |
| 0.6933        | 5.0   | 28585  | 0.6854          |
| 0.6922        | 6.0   | 34302  | 0.6825          |
| 0.6859        | 7.0   | 40019  | 0.6810          |
| 0.6923        | 8.0   | 45736  | 0.6812          |
| 0.6919        | 9.0   | 51453  | 0.6809          |
| 0.6871        | 10.0  | 57170  | 0.6795          |
| 0.6844        | 11.0  | 62887  | 0.6776          |
| 0.6923        | 12.0  | 68604  | 0.6780          |
| 0.6878        | 13.0  | 74321  | 0.6785          |
| 0.6765        | 14.0  | 80038  | 0.6775          |
| 0.6864        | 15.0  | 85755  | 0.6769          |
| 0.6776        | 16.0  | 91472  | 0.6761          |
| 0.6823        | 17.0  | 97189  | 0.6768          |
| 0.6743        | 18.0  | 102906 | 0.6751          |
| 0.682         | 19.0  | 108623 | 0.6776          |
| 0.6902        | 20.0  | 114340 | 0.6762          |
| 0.6774        | 21.0  | 120057 | 0.6751          |
| 0.6748        | 22.0  | 125774 | 0.6747          |
| 0.6864        | 23.0  | 131491 | 0.6745          |
| 0.6819        | 24.0  | 137208 | 0.6756          |
| 0.6818        | 25.0  | 142925 | 0.6745          |
| 0.6757        | 26.0  | 148642 | 0.6737          |
| 0.6801        | 27.0  | 154359 | 0.6734          |
| 0.6717        | 28.0  | 160076 | 0.6724          |
| 0.6717        | 29.0  | 165793 | 0.6722          |
| 0.6802        | 30.0  | 171510 | 0.6723          |
| 0.677         | 31.0  | 177227 | 0.6725          |
| 0.6764        | 32.0  | 182944 | 0.6712          |
| 0.6767        | 33.0  | 188661 | 0.6712          |
| 0.6758        | 34.0  | 194378 | 0.6716          |
| 0.6772        | 35.0  | 200095 | 0.6715          |
| 0.679         | 36.0  | 205812 | 0.6717          |
| 0.6744        | 37.0  | 211529 | 0.6702          |
| 0.6654        | 38.0  | 217246 | 0.6707          |
| 0.6723        | 39.0  | 222963 | 0.6704          |
| 0.6758        | 40.0  | 228680 | 0.6701          |
| 0.6795        | 41.0  | 234397 | 0.6701          |
| 0.6681        | 42.0  | 240114 | 0.6698          |
| 0.6761        | 43.0  | 245831 | 0.6700          |
| 0.673         | 44.0  | 251548 | 0.6697          |
| 0.6736        | 45.0  | 257265 | 0.6698          |
| 0.673         | 46.0  | 262982 | 0.6695          |
| 0.6686        | 47.0  | 268699 | 0.6695          |
| 0.666         | 48.0  | 274416 | 0.6696          |
| 0.663         | 49.0  | 280133 | 0.6695          |
| 0.6667        | 50.0  | 285850 | 0.6695          |
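A quick way to read these validation losses: assuming they are mean token-level cross-entropy in nats (the Hugging Face Trainer default for causal LMs), `exp(loss)` gives perplexity, so the drop from 0.7060 at epoch 1 to 0.6695 at epoch 50 is a modest perplexity improvement:

```python
import math

# Validation losses from the table above.
first_loss = 0.7060  # epoch 1
final_loss = 0.6695  # epoch 50

# Perplexity = exp(mean cross-entropy); assumes loss is in nats per token.
first_ppl = math.exp(first_loss)
final_ppl = math.exp(final_loss)

print(round(first_ppl, 3))  # ~2.026
print(round(final_ppl, 3))  # ~1.953
```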

### Framework versions

- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu118
- Datasets 3.5.0
- Tokenizers 0.21.1