---
license: mit
base_model: gpt2
tags:
- generated_from_trainer
model-index:
- name: GPT2WaP
  results: []
---

# GPT2WaP

This model is a [gpt2](https://huggingface.co/gpt2)-architecture model trained from scratch on the text of *War and Peace*.
It achieves the following results on the evaluation set:
- Loss: 9.0987
- Perplexity: 8943.6289

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 512
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 40
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Perplexity |
|:-------------:|:-------:|:----:|:---------------:|:----------:|
| 10.157        | 0.6897  | 10   | 9.2336          | 10235.7480 |
| 9.2581        | 1.3793  | 20   | 8.9452          | 7671.1870  |
| 8.8166        | 2.0690  | 30   | 9.4917          | 13248.7207 |
| 8.5094        | 2.7586  | 40   | 9.5417          | 13928.9434 |
| 8.0914        | 3.4483  | 50   | 9.5507          | 14054.4785 |
| 7.663         | 4.1379  | 60   | 9.4760          | 13043.2441 |
| 7.3275        | 4.8276  | 70   | 9.3510          | 11510.8203 |
| 6.9788        | 5.5172  | 80   | 9.0822          | 8797.7188  |
| 6.6639        | 6.2069  | 90   | 8.9803          | 7945.4014  |
| 6.3749        | 6.8966  | 100  | 8.6494          | 5706.8130  |
| 6.0702        | 7.5862  | 110  | 8.5696          | 5268.9268  |
| 5.9107        | 8.2759  | 120  | 8.3612          | 4277.6265  |
| 5.6724        | 8.9655  | 130  | 8.4294          | 4579.6484  |
| 5.5949        | 9.6552  | 140  | 8.4934          | 4882.4316  |
| 5.4904        | 10.3448 | 150  | 8.4683          | 4761.3862  |
| 5.3792        | 11.0345 | 160  | 8.4647          | 4744.5381  |
| 5.3091        | 11.7241 | 170  | 8.5767          | 5306.3535  |
| 5.233         | 12.4138 | 180  | 8.5257          | 5042.5068  |
| 5.2252        | 13.1034 | 190  | 8.5328          | 5078.8433  |
| 5.1445        | 13.7931 | 200  | 8.5871          | 5361.9390  |
| 5.0824        | 14.4828 | 210  | 8.5784          | 5315.4043  |
| 5.0272        | 15.1724 | 220  | 8.6434          | 5672.6934  |
| 4.979         | 15.8621 | 230  | 8.6836          | 5905.4277  |
| 4.924         | 16.5517 | 240  | 8.7112          | 6070.2261  |
| 4.9394        | 17.2414 | 250  | 8.7233          | 6144.3931  |
| 4.8663        | 17.9310 | 260  | 8.7411          | 6254.5234  |
| 4.8599        | 18.6207 | 270  | 8.7824          | 6518.7896  |
| 4.8572        | 19.3103 | 280  | 8.8338          | 6862.5586  |
| 4.8064        | 20.0    | 290  | 8.7774          | 6485.7441  |
| 4.746         | 20.6897 | 300  | 8.8458          | 6944.8892  |
| 4.7569        | 21.3793 | 310  | 8.8436          | 6930.1416  |
| 4.6954        | 22.0690 | 320  | 8.8618          | 7057.1084  |
| 4.7277        | 22.7586 | 330  | 8.8706          | 7119.4478  |
| 4.6432        | 23.4483 | 340  | 8.9084          | 7393.6138  |
| 4.6032        | 24.1379 | 350  | 8.9111          | 7413.5176  |
| 4.6198        | 24.8276 | 360  | 8.9526          | 7728.0210  |
| 4.5874        | 25.5172 | 370  | 8.9740          | 7895.1641  |
| 4.5455        | 26.2069 | 380  | 8.9365          | 7604.7129  |
| 4.5313        | 26.8966 | 390  | 8.9738          | 7893.2969  |
| 4.5297        | 27.5862 | 400  | 8.9659          | 7831.8110  |
| 4.5279        | 28.2759 | 410  | 8.9914          | 8034.0391  |
| 4.4974        | 28.9655 | 420  | 9.0293          | 8344.2529  |
| 4.4554        | 29.6552 | 430  | 9.0191          | 8259.1533  |
| 4.4651        | 30.3448 | 440  | 9.0236          | 8296.4531  |
| 4.4647        | 31.0345 | 450  | 9.0349          | 8391.1279  |
| 4.4668        | 31.7241 | 460  | 9.0530          | 8543.8340  |
| 4.4264        | 32.4138 | 470  | 9.0722          | 8709.4141  |
| 4.4008        | 33.1034 | 480  | 9.0876          | 8844.6104  |
| 4.3982        | 33.7931 | 490  | 9.0711          | 8700.4893  |
| 4.3846        | 34.4828 | 500  | 9.0894          | 8860.7441  |
| 4.3971        | 35.1724 | 510  | 9.0879          | 8847.6973  |
| 4.379         | 35.8621 | 520  | 9.0949          | 8909.6025  |
| 4.3696        | 36.5517 | 530  | 9.1097          | 9042.2295  |
| 4.3447        | 37.2414 | 540  | 9.1007          | 8961.6953  |
| 4.3796        | 37.9310 | 550  | 9.0869          | 8839.0781  |
| 4.364         | 38.6207 | 560  | 9.0987          | 8943.6289  |

### Framework versions

- Transformers 4.40.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
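
### Note on the perplexity column

The perplexity values reported above are derived from the validation loss: perplexity is the exponential of the mean cross-entropy loss. A minimal sketch (plain Python, no Transformers dependency) reproducing the final row of the table:

```python
import math

# Perplexity is exp(cross-entropy loss). Checking the final checkpoint's
# reported values (loss 9.0987, perplexity 8943.6289); the tiny discrepancy
# comes from the loss being rounded to four decimals in the card.
eval_loss = 9.0987
perplexity = math.exp(eval_loss)
print(f"{perplexity:.1f}")  # close to the reported 8943.6289
```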