Se124M10KInfKeyValue

This model is a fine-tuned version of gpt2 on an unknown dataset. After 50 epochs of training it reaches a validation loss of 0.5416 on the evaluation set (the final row of the training results table below).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 50
mixed_precision_training: Native AMP
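
To make the linear scheduler setting concrete, here is a minimal sketch of the learning rate it would produce over the run. It assumes zero warmup steps (the card lists no warmup setting) and takes the total of 11250 steps from the last row of the results table; the names PEAK_LR, TOTAL_STEPS, WARMUP_STEPS and linear_lr are illustrative and not part of the original training script.

```python
# Values taken from the card: peak learning rate and the final step count.
PEAK_LR = 5e-05
TOTAL_STEPS = 11250   # 50 epochs x 225 steps per epoch
WARMUP_STEPS = 0      # assumption: the card does not list a warmup setting


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# Print the scheduled learning rate at the start of a few epochs.
for epoch in (0, 10, 25, 50):
    print(f"epoch {epoch:2d}: lr = {linear_lr(epoch * 225):.2e}")
```

Under these assumptions the rate starts at 5e-05, is halved by the midpoint of training (step 5625), and reaches zero at step 11250, which may help explain why the validation loss barely moves over the last ten epochs.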

Training Loss	Epoch	Step	Validation Loss
0.3613	1.0	225	0.9669
0.2223	2.0	450	0.7171
0.1879	3.0	675	0.6545
0.1715	4.0	900	0.6227
0.1658	5.0	1125	0.6118
0.1606	6.0	1350	0.6041
0.1571	7.0	1575	0.5922
0.1547	8.0	1800	0.5869
0.1541	9.0	2025	0.5817
0.1499	10.0	2250	0.5814
0.1511	11.0	2475	0.5738
0.1479	12.0	2700	0.5730
0.1487	13.0	2925	0.5697
0.1449	14.0	3150	0.5665
0.1448	15.0	3375	0.5653
0.1435	16.0	3600	0.5645
0.1455	17.0	3825	0.5612
0.1437	18.0	4050	0.5586
0.1417	19.0	4275	0.5565
0.1434	20.0	4500	0.5580
0.1439	21.0	4725	0.5561
0.1421	22.0	4950	0.5552
0.1414	23.0	5175	0.5532
0.1396	24.0	5400	0.5505
0.1392	25.0	5625	0.5521
0.1413	26.0	5850	0.5517
0.1385	27.0	6075	0.5478
0.1413	28.0	6300	0.5485
0.1411	29.0	6525	0.5500
0.139	30.0	6750	0.5482
0.1402	31.0	6975	0.5480
0.1397	32.0	7200	0.5462
0.1402	33.0	7425	0.5448
0.1405	34.0	7650	0.5472
0.1374	35.0	7875	0.5437
0.1373	36.0	8100	0.5446
0.1386	37.0	8325	0.5439
0.1372	38.0	8550	0.5438
0.1383	39.0	8775	0.5431
0.1375	40.0	9000	0.5428
0.1398	41.0	9225	0.5431
0.1394	42.0	9450	0.5418
0.1395	43.0	9675	0.5423
0.1377	44.0	9900	0.5423
0.1366	45.0	10125	0.5422
0.138	46.0	10350	0.5419
0.136	47.0	10575	0.5416
0.138	48.0	10800	0.5418
0.1373	49.0	11025	0.5417
0.1365	50.0	11250	0.5416
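
Two quantities can be read off the table with a little arithmetic, sketched below. The perplexity conversion assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models but is not stated in the card. The training-set size estimate assumes a single device, no gradient accumulation, and a possibly partial final batch. The helper names are illustrative.

```python
import math

# (epoch, validation loss) pairs copied from the results table.
checkpoints = [(1, 0.9669), (10, 0.5814), (25, 0.5521), (50, 0.5416)]


def perplexity(loss: float) -> float:
    """Perplexity implied by a mean per-token cross-entropy loss in nats."""
    return math.exp(loss)


for epoch, val_loss in checkpoints:
    print(f"epoch {epoch:2d}: val loss {val_loss:.4f} -> perplexity {perplexity(val_loss):.3f}")

# Bounds on the number of training examples per epoch, assuming
# one device and no gradient accumulation (neither is stated in the card).
STEPS_PER_EPOCH = 225
TRAIN_BATCH_SIZE = 32
max_examples = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE            # every batch full
min_examples = (STEPS_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1  # last batch holds one example
print(f"training examples per epoch: between {min_examples} and {max_examples}")
```

With these assumptions the final validation loss of 0.5416 corresponds to a perplexity of roughly 1.72, and the training set holds between 7169 and 7200 examples.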
