Se124M500KInfPrompt_endtoken

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 50
mixed_precision_training: Native AMP

Training Loss	Epoch	Step	Validation Loss
0.1898	1.0	5427	0.7433
0.1857	2.0	10854	0.7238
0.1843	3.0	16281	0.7118
0.1813	4.0	21708	0.7045
0.1802	5.0	27135	0.6990
0.1785	6.0	32562	0.6944
0.1769	7.0	37989	0.6918
0.1743	8.0	43416	0.6875
0.1752	9.0	48843	0.6854
0.1756	10.0	54270	0.6854
0.1736	11.0	59697	0.6837
0.1756	12.0	65124	0.6812
0.173	13.0	70551	0.6798
0.1737	14.0	75978	0.6791
0.1741	15.0	81405	0.6783
0.177	16.0	86832	0.6771
0.1734	17.0	92259	0.6765
0.1719	18.0	97686	0.6760
0.1737	19.0	103113	0.6763
0.1716	20.0	108540	0.6747
0.1713	21.0	113967	0.6741
0.1739	22.0	119394	0.6738
0.1694	23.0	124821	0.6737
0.1703	24.0	130248	0.6743
0.1697	25.0	135675	0.6730
0.172	26.0	141102	0.6731
0.1711	27.0	146529	0.6720
0.1726	28.0	151956	0.6720
0.1703	29.0	157383	0.6716
0.1732	30.0	162810	0.6716
0.171	31.0	168237	0.6719

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Base model

Adapter

this model