End of training
Browse files:
- README.md +61 -32
- model.safetensors +1 -1
- training_args.bin +1 -1
README.md
CHANGED
This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the None dataset.
It achieves the following results on the evaluation set:
- Loss: 9.0987
- Perplexity: 8943.6289
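The reported perplexity is just the exponential of the evaluation cross-entropy loss. A quick sanity check of the two numbers above (small rounding differences are expected, since the card computes perplexity from the unrounded loss):

```python
import math

# Perplexity is exp(cross-entropy loss).
eval_loss = 9.0987
perplexity = math.exp(eval_loss)
print(perplexity)  # close to the reported 8943.6289
```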
## Model description

More information needed
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 40
- mixed_precision_training: Native AMP
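With a linear scheduler and 100 warmup steps, the learning rate ramps from 0 to 5e-05 over the first 100 optimizer steps, then decays linearly to 0 at the last step. A minimal sketch of that schedule, assuming a total of 580 steps (an inference from the results table, where epoch 20.0 falls at step 290, so epoch 40 would be step 580):

```python
BASE_LR = 5e-05
WARMUP_STEPS = 100
TOTAL_STEPS = 580  # assumption inferred from the results table, not stated in the card

def linear_schedule_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    # Decay from BASE_LR at the end of warmup down to 0 at TOTAL_STEPS.
    return BASE_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))
```

This mirrors the shape of the warmup-plus-linear-decay schedule used by common training frameworks; the actual implementation used for this run is not shown in the card.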
### Training results

| Training Loss | Epoch | Step | Validation Loss | Perplexity |
|:-------------:|:-------:|:----:|:---------------:|:----------:|
| 10.157 | 0.6897 | 10 | 9.2336 | 10235.7480 |
| 9.2581 | 1.3793 | 20 | 8.9452 | 7671.1870 |
| 8.8166 | 2.0690 | 30 | 9.4917 | 13248.7207 |
| 8.5094 | 2.7586 | 40 | 9.5417 | 13928.9434 |
| 8.0914 | 3.4483 | 50 | 9.5507 | 14054.4785 |
| 7.663 | 4.1379 | 60 | 9.4760 | 13043.2441 |
| 7.3275 | 4.8276 | 70 | 9.3510 | 11510.8203 |
| 6.9788 | 5.5172 | 80 | 9.0822 | 8797.7188 |
| 6.6639 | 6.2069 | 90 | 8.9803 | 7945.4014 |
| 6.3749 | 6.8966 | 100 | 8.6494 | 5706.8130 |
| 6.0702 | 7.5862 | 110 | 8.5696 | 5268.9268 |
| 5.9107 | 8.2759 | 120 | 8.3612 | 4277.6265 |
| 5.6724 | 8.9655 | 130 | 8.4294 | 4579.6484 |
| 5.5949 | 9.6552 | 140 | 8.4934 | 4882.4316 |
| 5.4904 | 10.3448 | 150 | 8.4683 | 4761.3862 |
| 5.3792 | 11.0345 | 160 | 8.4647 | 4744.5381 |
| 5.3091 | 11.7241 | 170 | 8.5767 | 5306.3535 |
| 5.233 | 12.4138 | 180 | 8.5257 | 5042.5068 |
| 5.2252 | 13.1034 | 190 | 8.5328 | 5078.8433 |
| 5.1445 | 13.7931 | 200 | 8.5871 | 5361.9390 |
| 5.0824 | 14.4828 | 210 | 8.5784 | 5315.4043 |
| 5.0272 | 15.1724 | 220 | 8.6434 | 5672.6934 |
| 4.979 | 15.8621 | 230 | 8.6836 | 5905.4277 |
| 4.924 | 16.5517 | 240 | 8.7112 | 6070.2261 |
| 4.9394 | 17.2414 | 250 | 8.7233 | 6144.3931 |
| 4.8663 | 17.9310 | 260 | 8.7411 | 6254.5234 |
| 4.8599 | 18.6207 | 270 | 8.7824 | 6518.7896 |
| 4.8572 | 19.3103 | 280 | 8.8338 | 6862.5586 |
| 4.8064 | 20.0 | 290 | 8.7774 | 6485.7441 |
| 4.746 | 20.6897 | 300 | 8.8458 | 6944.8892 |
| 4.7569 | 21.3793 | 310 | 8.8436 | 6930.1416 |
| 4.6954 | 22.0690 | 320 | 8.8618 | 7057.1084 |
| 4.7277 | 22.7586 | 330 | 8.8706 | 7119.4478 |
| 4.6432 | 23.4483 | 340 | 8.9084 | 7393.6138 |
| 4.6032 | 24.1379 | 350 | 8.9111 | 7413.5176 |
| 4.6198 | 24.8276 | 360 | 8.9526 | 7728.0210 |
| 4.5874 | 25.5172 | 370 | 8.9740 | 7895.1641 |
| 4.5455 | 26.2069 | 380 | 8.9365 | 7604.7129 |
| 4.5313 | 26.8966 | 390 | 8.9738 | 7893.2969 |
| 4.5297 | 27.5862 | 400 | 8.9659 | 7831.8110 |
| 4.5279 | 28.2759 | 410 | 8.9914 | 8034.0391 |
| 4.4974 | 28.9655 | 420 | 9.0293 | 8344.2529 |
| 4.4554 | 29.6552 | 430 | 9.0191 | 8259.1533 |
| 4.4651 | 30.3448 | 440 | 9.0236 | 8296.4531 |
| 4.4647 | 31.0345 | 450 | 9.0349 | 8391.1279 |
| 4.4668 | 31.7241 | 460 | 9.0530 | 8543.8340 |
| 4.4264 | 32.4138 | 470 | 9.0722 | 8709.4141 |
| 4.4008 | 33.1034 | 480 | 9.0876 | 8844.6104 |
| 4.3982 | 33.7931 | 490 | 9.0711 | 8700.4893 |
| 4.3846 | 34.4828 | 500 | 9.0894 | 8860.7441 |
| 4.3971 | 35.1724 | 510 | 9.0879 | 8847.6973 |
| 4.379 | 35.8621 | 520 | 9.0949 | 8909.6025 |
| 4.3696 | 36.5517 | 530 | 9.1097 | 9042.2295 |
| 4.3447 | 37.2414 | 540 | 9.1007 | 8961.6953 |
| 4.3796 | 37.9310 | 550 | 9.0869 | 8839.0781 |
| 4.364 | 38.6207 | 560 | 9.0987 | 8943.6289 |
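Training loss falls steadily while validation loss bottoms out early (8.3612 at step 120) and then climbs, a classic sign of overfitting; if intermediate checkpoints were kept, the natural choice is the one with the lowest validation loss. A small sketch over (step, validation loss) pairs copied from the first rows of the table above:

```python
# (step, validation_loss) pairs from the training-results table (first 150 steps;
# all later rows are above 8.46, so the minimum lies in this window)
eval_history = [
    (10, 9.2336), (20, 8.9452), (30, 9.4917), (40, 9.5417), (50, 9.5507),
    (60, 9.4760), (70, 9.3510), (80, 9.0822), (90, 8.9803), (100, 8.6494),
    (110, 8.5696), (120, 8.3612), (130, 8.4294), (140, 8.4934), (150, 8.4683),
]

# Best checkpoint = lowest validation loss.
best_step, best_loss = min(eval_history, key=lambda pair: pair[1])
print(best_step, best_loss)  # 120 8.3612
```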
### Framework versions

model.safetensors
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b22d807f2179d13b91cec151f754dda8bf44f84c7af760b8d721e93be1ba638d
 size 497774208
training_args.bin
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7ea17b4bf4c67485ce1e7ffd761a16992c72e5f6b8abf27a0f17f344a18182b7
 size 4920
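Both binary files are stored as Git LFS pointers with the `version`, `oid`, and `size` fields shown above. A minimal sketch of a parser for such pointer files (each line is a key, a space, and a value):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into a key -> value dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer contents for training_args.bin, as shown in the diff above
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:7ea17b4bf4c67485ce1e7ffd761a16992c72e5f6b8abf27a0f17f344a18182b7\n"
    "size 4920\n"
)
print(pointer["size"])  # "4920"
```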