Se124M10KInfPrompt_endtoken

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6872

Model description

More information needed

Intended uses & limitations

More information needed
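
The author provides no usage guidance. As a minimal sketch only: the framework versions below list PEFT 0.15.1, so the repository presumably hosts a parameter-efficient adapter for gpt2, which could be loaded roughly as follows. The expected prompt format is undocumented, so the prompt string here is a placeholder.

```python
# Minimal sketch, assuming this repo is a PEFT adapter trained on top of gpt2.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained("gpt2")
model = PeftModel.from_pretrained(base_model, "augustocsc/Se124M10KInfPrompt_endtoken")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# The expected prompt format is undocumented; "Your prompt here" is a placeholder.
inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    pad_token_id=tokenizer.eos_token_id,  # gpt2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```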

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a matching configuration sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 200
  • num_epochs: 50
  • mixed_precision_training: Native AMP
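
For reference, the list above maps onto a Hugging Face TrainingArguments configuration roughly as follows. This is a sketch only: the output path and the per-epoch evaluation/logging cadence are assumptions, and the AdamW betas and epsilon match the library defaults.

```python
# Sketch of a TrainingArguments setup matching the hyperparameters above.
# Only the listed values come from this card; output_dir and the
# evaluation/logging cadence are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Se124M10KInfPrompt_endtoken",  # assumed output path
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",         # AdamW; betas=(0.9, 0.999), eps=1e-08 are the defaults
    lr_scheduler_type="cosine",
    warmup_steps=200,
    num_train_epochs=50,
    fp16=True,                   # native AMP mixed precision
    eval_strategy="epoch",       # assumed: the results table logs once per epoch
    logging_strategy="epoch",
)
```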

Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 0.8085        | 1.0   | 610   | 0.7760          |
| 0.7801        | 2.0   | 1220  | 0.7436          |
| 0.7608        | 3.0   | 1830  | 0.7269          |
| 0.7438        | 4.0   | 2440  | 0.7199          |
| 0.7413        | 5.0   | 3050  | 0.7118          |
| 0.7343        | 6.0   | 3660  | 0.7121          |
| 0.7332        | 7.0   | 4270  | 0.7089          |
| 0.7319        | 8.0   | 4880  | 0.7025          |
| 0.7289        | 9.0   | 5490  | 0.7001          |
| 0.7236        | 10.0  | 6100  | 0.6965          |
| 0.7147        | 11.0  | 6710  | 0.6970          |
| 0.7126        | 12.0  | 7320  | 0.6973          |
| 0.7167        | 13.0  | 7930  | 0.6935          |
| 0.711         | 14.0  | 8540  | 0.6927          |
| 0.7057        | 15.0  | 9150  | 0.6940          |
| 0.7109        | 16.0  | 9760  | 0.6924          |
| 0.7117        | 17.0  | 10370 | 0.6928          |
| 0.7086        | 18.0  | 10980 | 0.6882          |
| 0.7004        | 19.0  | 11590 | 0.6872          |
| 0.7016        | 20.0  | 12200 | 0.6895          |
| 0.7027        | 21.0  | 12810 | 0.6884          |
| 0.6928        | 22.0  | 13420 | 0.6885          |
| 0.7059        | 23.0  | 14030 | 0.6894          |
| 0.6916        | 24.0  | 14640 | 0.6875          |

The best validation loss (0.6872, at epoch 19, step 11590) matches the evaluation loss reported at the top of this card. Although num_epochs was set to 50, results are logged only through epoch 24.

Framework versions

  • PEFT 0.15.1
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu118
  • Datasets 3.5.0
  • Tokenizers 0.21.1