# pretrained_gpt2_1.5M
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5401
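
Assuming this is the mean token-level cross-entropy (the standard Trainer evaluation loss for causal language models), it corresponds to an evaluation perplexity of exp(0.5401) ≈ 1.72.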
## Model description
More information needed
## Intended uses & limitations
More information needed
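
As a minimal usage sketch, assuming the checkpoint is published under this repository name and uses the standard GPT-2 architecture and tokenizer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The repository id is an assumption; replace it with the actual hub path.
model_id = "pretrained_gpt2_1.5M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a short continuation from the model.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```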
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 256
- eval_batch_size: 256
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 625000
- num_epochs: 15
- mixed_precision_training: Native AMP
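
As a sketch of how these hyperparameters map onto `TrainingArguments` in Transformers 4.51 (the actual training script is not included in this card; `model`, `train_dataset`, and `eval_dataset` are assumed to exist):

```python
from transformers import Trainer, TrainingArguments

# Values are taken directly from the hyperparameter list above.
args = TrainingArguments(
    output_dir="pretrained_gpt2_1.5M",
    learning_rate=5e-5,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=625000,
    num_train_epochs=15,
    fp16=True,  # "Native AMP" mixed-precision training
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```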
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.4776 | 0.1888 | 1000 | 1.8068 |
| 0.3864 | 0.3776 | 2000 | 1.4641 |
| 0.3546 | 0.5664 | 3000 | 1.3386 |
| 0.3287 | 0.7551 | 4000 | 1.2600 |
| 0.3188 | 0.9439 | 5000 | 1.1931 |
| 0.3032 | 1.1327 | 6000 | 1.1311 |
| 0.2907 | 1.3215 | 7000 | 1.0674 |
| 0.2771 | 1.5103 | 8000 | 1.0217 |
| 0.2654 | 1.6991 | 9000 | 0.9736 |
| 0.2521 | 1.8879 | 10000 | 0.9296 |
| 0.2483 | 2.0766 | 11000 | 0.8892 |
| 0.2337 | 2.2654 | 12000 | 0.8559 |
| 0.2331 | 2.4542 | 13000 | 0.8287 |
| 0.2243 | 2.6430 | 14000 | 0.8064 |
| 0.2142 | 2.8318 | 15000 | 0.7833 |
| 0.2071 | 3.0206 | 16000 | 0.7687 |
| 0.2102 | 3.2094 | 17000 | 0.7541 |
| 0.2017 | 3.3981 | 18000 | 0.7455 |
| 0.195 | 3.5869 | 19000 | 0.7298 |
| 0.1953 | 3.7757 | 20000 | 0.7176 |
| 0.193 | 3.9645 | 21000 | 0.7071 |
| 0.1932 | 4.1533 | 22000 | 0.6946 |
| 0.1854 | 4.3421 | 23000 | 0.6880 |
| 0.1854 | 4.5309 | 24000 | 0.6905 |
| 0.182 | 4.7197 | 25000 | 0.6740 |
| 0.1811 | 4.9084 | 26000 | 0.6641 |
| 0.1791 | 5.0972 | 27000 | 0.6600 |
| 0.1738 | 5.2860 | 28000 | 0.6544 |
| 0.1733 | 5.4748 | 29000 | 0.6459 |
| 0.1697 | 5.6636 | 30000 | 0.6436 |
| 0.1731 | 5.8524 | 31000 | 0.6400 |
| 0.1712 | 6.0412 | 32000 | 0.6424 |
| 0.1705 | 6.2299 | 33000 | 0.6275 |
| 0.1664 | 6.4187 | 34000 | 0.6249 |
| 0.1679 | 6.6075 | 35000 | 0.6197 |
| 0.1668 | 6.7963 | 36000 | 0.6191 |
| 0.1663 | 6.9851 | 37000 | 0.6158 |
| 0.1635 | 7.1739 | 38000 | 0.6103 |
| 0.1621 | 7.3627 | 39000 | 0.6061 |
| 0.1628 | 7.5514 | 40000 | 0.6050 |
| 0.163 | 7.7402 | 41000 | 0.6000 |
| 0.1596 | 7.9290 | 42000 | 0.5975 |
| 0.162 | 8.1178 | 43000 | 0.5946 |
| 0.1607 | 8.3066 | 44000 | 0.5939 |
| 0.1582 | 8.4954 | 45000 | 0.5916 |
| 0.1593 | 8.6842 | 46000 | 0.5864 |
| 0.1593 | 8.8729 | 47000 | 0.5865 |
| 0.156 | 9.0617 | 48000 | 0.5836 |
| 0.1544 | 9.2505 | 49000 | 0.5819 |
| 0.1546 | 9.4393 | 50000 | 0.5787 |
| 0.1533 | 9.6281 | 51000 | 0.5751 |
| 0.1529 | 9.8169 | 52000 | 0.5758 |
| 0.1542 | 10.0057 | 53000 | 0.5728 |
| 0.1528 | 10.1944 | 54000 | 0.5714 |
| 0.1502 | 10.3832 | 55000 | 0.5709 |
| 0.1524 | 10.5720 | 56000 | 0.5691 |
| 0.1505 | 10.7608 | 57000 | 0.5691 |
| 0.1514 | 10.9496 | 58000 | 0.5652 |
| 0.147 | 11.1384 | 59000 | 0.5657 |
| 0.149 | 11.3272 | 60000 | 0.5618 |
| 0.1456 | 11.5160 | 61000 | 0.5601 |
| 0.146 | 11.7047 | 62000 | 0.5588 |
| 0.147 | 11.8935 | 63000 | 0.5599 |
| 0.1493 | 12.0823 | 64000 | 0.5557 |
| 0.1481 | 12.2711 | 65000 | 0.5562 |
| 0.1463 | 12.4599 | 66000 | 0.5536 |
| 0.1479 | 12.6487 | 67000 | 0.5532 |
| 0.1458 | 12.8375 | 68000 | 0.5504 |
| 0.1418 | 13.0262 | 69000 | 0.5511 |
| 0.146 | 13.2150 | 70000 | 0.5496 |
| 0.1452 | 13.4038 | 71000 | 0.5477 |
| 0.144 | 13.5926 | 72000 | 0.5486 |
| 0.1423 | 13.7814 | 73000 | 0.5457 |
| 0.1438 | 13.9702 | 74000 | 0.5451 |
| 0.1461 | 14.1590 | 75000 | 0.5443 |
| 0.1442 | 14.3477 | 76000 | 0.5439 |
| 0.1409 | 14.5365 | 77000 | 0.5410 |
| 0.1443 | 14.7253 | 78000 | 0.5401 |
| 0.1404 | 14.9141 | 79000 | 0.5404 |
### Framework versions
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1