Kasdeja23 committed
Commit 84a52d3 · verified · 1 Parent(s): 9376cd9

End of training

Files changed (3)
  1. README.md +61 -32
  2. model.safetensors +1 -1
  3. training_args.bin +1 -1
README.md CHANGED
@@ -15,8 +15,8 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 8.6577
- - Perplexity: 5754.3535
+ - Loss: 9.0987
+ - Perplexity: 8943.6289
 
  ## Model description
 
@@ -35,7 +35,7 @@ More information needed
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
- - learning_rate: 0.0001
+ - learning_rate: 5e-05
  - train_batch_size: 64
  - eval_batch_size: 64
  - seed: 42
@@ -46,41 +46,70 @@ The following hyperparameters were used during training:
  - total_eval_batch_size: 128
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
- - num_epochs: 20
+ - lr_scheduler_warmup_steps: 100
+ - num_epochs: 40
  - mixed_precision_training: Native AMP
 
  ### Training results
 
  | Training Loss | Epoch | Step | Validation Loss | Perplexity |
  |:-------------:|:-------:|:----:|:---------------:|:----------:|
- | 8.4382 | 0.6897 | 10 | 9.2702 | 10617.0752 |
- | 7.4961 | 1.3793 | 20 | 8.6698 | 5824.3774 |
- | 6.7565 | 2.0690 | 30 | 8.0646 | 3179.9265 |
- | 6.3202 | 2.7586 | 40 | 8.0297 | 3070.9265 |
- | 6.0296 | 3.4483 | 50 | 8.0547 | 3148.4561 |
- | 5.8302 | 4.1379 | 60 | 8.0955 | 3279.8328 |
- | 5.6731 | 4.8276 | 70 | 8.0292 | 3069.4158 |
- | 5.5112 | 5.5172 | 80 | 8.0928 | 3270.8430 |
- | 5.3804 | 6.2069 | 90 | 8.1594 | 3496.0083 |
- | 5.2854 | 6.8966 | 100 | 8.1695 | 3531.4038 |
- | 5.1756 | 7.5862 | 110 | 8.2787 | 3938.9458 |
- | 5.1088 | 8.2759 | 120 | 8.3184 | 4098.6143 |
- | 5.0795 | 8.9655 | 130 | 8.3422 | 4197.5151 |
- | 4.9467 | 9.6552 | 140 | 8.4200 | 4537.0161 |
- | 4.9345 | 10.3448 | 150 | 8.4348 | 4604.7515 |
- | 4.8858 | 11.0345 | 160 | 8.5120 | 4973.9927 |
- | 4.8203 | 11.7241 | 170 | 8.5191 | 5009.6968 |
- | 4.7969 | 12.4138 | 180 | 8.5414 | 5122.4717 |
- | 4.7913 | 13.1034 | 190 | 8.5665 | 5252.6514 |
- | 4.7259 | 13.7931 | 200 | 8.6067 | 5467.9600 |
- | 4.7399 | 14.4828 | 210 | 8.5804 | 5326.0552 |
- | 4.6635 | 15.1724 | 220 | 8.6029 | 5447.3901 |
- | 4.6997 | 15.8621 | 230 | 8.6199 | 5540.8413 |
- | 4.609 | 16.5517 | 240 | 8.6413 | 5660.4258 |
- | 4.6155 | 17.2414 | 250 | 8.6510 | 5715.6362 |
- | 4.5782 | 17.9310 | 260 | 8.6577 | 5754.1724 |
- | 4.6361 | 18.6207 | 270 | 8.6577 | 5754.1450 |
- | 4.5793 | 19.3103 | 280 | 8.6577 | 5754.3535 |
+ | 10.157 | 0.6897 | 10 | 9.2336 | 10235.7480 |
+ | 9.2581 | 1.3793 | 20 | 8.9452 | 7671.1870 |
+ | 8.8166 | 2.0690 | 30 | 9.4917 | 13248.7207 |
+ | 8.5094 | 2.7586 | 40 | 9.5417 | 13928.9434 |
+ | 8.0914 | 3.4483 | 50 | 9.5507 | 14054.4785 |
+ | 7.663 | 4.1379 | 60 | 9.4760 | 13043.2441 |
+ | 7.3275 | 4.8276 | 70 | 9.3510 | 11510.8203 |
+ | 6.9788 | 5.5172 | 80 | 9.0822 | 8797.7188 |
+ | 6.6639 | 6.2069 | 90 | 8.9803 | 7945.4014 |
+ | 6.3749 | 6.8966 | 100 | 8.6494 | 5706.8130 |
+ | 6.0702 | 7.5862 | 110 | 8.5696 | 5268.9268 |
+ | 5.9107 | 8.2759 | 120 | 8.3612 | 4277.6265 |
+ | 5.6724 | 8.9655 | 130 | 8.4294 | 4579.6484 |
+ | 5.5949 | 9.6552 | 140 | 8.4934 | 4882.4316 |
+ | 5.4904 | 10.3448 | 150 | 8.4683 | 4761.3862 |
+ | 5.3792 | 11.0345 | 160 | 8.4647 | 4744.5381 |
+ | 5.3091 | 11.7241 | 170 | 8.5767 | 5306.3535 |
+ | 5.233 | 12.4138 | 180 | 8.5257 | 5042.5068 |
+ | 5.2252 | 13.1034 | 190 | 8.5328 | 5078.8433 |
+ | 5.1445 | 13.7931 | 200 | 8.5871 | 5361.9390 |
+ | 5.0824 | 14.4828 | 210 | 8.5784 | 5315.4043 |
+ | 5.0272 | 15.1724 | 220 | 8.6434 | 5672.6934 |
+ | 4.979 | 15.8621 | 230 | 8.6836 | 5905.4277 |
+ | 4.924 | 16.5517 | 240 | 8.7112 | 6070.2261 |
+ | 4.9394 | 17.2414 | 250 | 8.7233 | 6144.3931 |
+ | 4.8663 | 17.9310 | 260 | 8.7411 | 6254.5234 |
+ | 4.8599 | 18.6207 | 270 | 8.7824 | 6518.7896 |
+ | 4.8572 | 19.3103 | 280 | 8.8338 | 6862.5586 |
+ | 4.8064 | 20.0 | 290 | 8.7774 | 6485.7441 |
+ | 4.746 | 20.6897 | 300 | 8.8458 | 6944.8892 |
+ | 4.7569 | 21.3793 | 310 | 8.8436 | 6930.1416 |
+ | 4.6954 | 22.0690 | 320 | 8.8618 | 7057.1084 |
+ | 4.7277 | 22.7586 | 330 | 8.8706 | 7119.4478 |
+ | 4.6432 | 23.4483 | 340 | 8.9084 | 7393.6138 |
+ | 4.6032 | 24.1379 | 350 | 8.9111 | 7413.5176 |
+ | 4.6198 | 24.8276 | 360 | 8.9526 | 7728.0210 |
+ | 4.5874 | 25.5172 | 370 | 8.9740 | 7895.1641 |
+ | 4.5455 | 26.2069 | 380 | 8.9365 | 7604.7129 |
+ | 4.5313 | 26.8966 | 390 | 8.9738 | 7893.2969 |
+ | 4.5297 | 27.5862 | 400 | 8.9659 | 7831.8110 |
+ | 4.5279 | 28.2759 | 410 | 8.9914 | 8034.0391 |
+ | 4.4974 | 28.9655 | 420 | 9.0293 | 8344.2529 |
+ | 4.4554 | 29.6552 | 430 | 9.0191 | 8259.1533 |
+ | 4.4651 | 30.3448 | 440 | 9.0236 | 8296.4531 |
+ | 4.4647 | 31.0345 | 450 | 9.0349 | 8391.1279 |
+ | 4.4668 | 31.7241 | 460 | 9.0530 | 8543.8340 |
+ | 4.4264 | 32.4138 | 470 | 9.0722 | 8709.4141 |
+ | 4.4008 | 33.1034 | 480 | 9.0876 | 8844.6104 |
+ | 4.3982 | 33.7931 | 490 | 9.0711 | 8700.4893 |
+ | 4.3846 | 34.4828 | 500 | 9.0894 | 8860.7441 |
+ | 4.3971 | 35.1724 | 510 | 9.0879 | 8847.6973 |
+ | 4.379 | 35.8621 | 520 | 9.0949 | 8909.6025 |
+ | 4.3696 | 36.5517 | 530 | 9.1097 | 9042.2295 |
+ | 4.3447 | 37.2414 | 540 | 9.1007 | 8961.6953 |
+ | 4.3796 | 37.9310 | 550 | 9.0869 | 8839.0781 |
+ | 4.364 | 38.6207 | 560 | 9.0987 | 8943.6289 |
 
 
  ### Framework versions
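As a sanity check on the updated card's metrics: the reported perplexity is simply the exponential of the cross-entropy evaluation loss. A minimal Python sketch, using the final eval numbers from the README above:

```python
import math

# Perplexity is exp(cross-entropy loss).
# Final eval numbers from the updated card: loss 9.0987, perplexity 8943.6289.
loss = 9.0987
perplexity = math.exp(loss)
print(round(perplexity, 1))  # matches the card's 8943.6289 to within rounding
```

The same relation holds for every row of the results table, so the Perplexity column carries no extra information beyond the Validation Loss column.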
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1d66aadc2b1d6720acf202c1c565626b39a26223a37f2b48d55f776974819977
+ oid sha256:b22d807f2179d13b91cec151f754dda8bf44f84c7af760b8d721e93be1ba638d
  size 497774208
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4700b5a618a2afe7e8645620d6bb973f32fb9f924755c937b8df7cc424625120
+ oid sha256:7ea17b4bf4c67485ce1e7ffd761a16992c72e5f6b8abf27a0f17f344a18182b7
  size 4920
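The README changes in this commit switch the run to a linear scheduler with 100 warmup steps. A minimal sketch of that schedule shape, assuming roughly 580 total optimizer steps (40 epochs at the ~14.5 steps/epoch implied by the results table, where epoch 20.0 corresponds to step 290; the function name is mine, not from the training code):

```python
def linear_lr_with_warmup(step, base_lr=5e-05, warmup_steps=100, total_steps=580):
    """Linear warmup from 0 to base_lr, then linear decay back to 0.

    Mirrors the `linear` lr_scheduler_type named in the card; total_steps
    is an assumption read off the results table, not a logged value.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# The learning rate peaks at the end of warmup and decays to zero afterwards.
print(linear_lr_with_warmup(100))  # 5e-05
```

Under this shape, early logged steps (10-90) train at well below the nominal 5e-05, which may explain the much higher initial training losses in the new results table compared with the previous run.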