---
license: mit
base_model: gpt2
tags:
- generated_from_trainer
model-index:
- name: GPT2WaP
  results: []
---


# GPT2WaP

This model uses the [gpt2](https://huggingface.co/gpt2) architecture and was trained from scratch on the text of Leo Tolstoy's *War and Peace*.
It achieves the following results on the evaluation set:
- Loss: 9.0987
- Perplexity: 8943.6289
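
The reported perplexity is simply the exponential of the evaluation loss, which is easy to verify:

```python
import math

# Perplexity is exp(cross-entropy loss).
print(math.exp(9.0987))  # ≈ 8943.63, matching the value above
```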

## Model description

GPT2WaP is a causal language model with the GPT-2 architecture, initialized with random weights (no pretrained checkpoint) and trained from scratch on the text of *War and Peace*.

## Intended uses & limitations

This model is best treated as a training exercise or teaching example. Because the training corpus is a single novel and the final evaluation perplexity is high (~8944), it is not suited to general-purpose text generation; at best its output loosely echoes the novel's vocabulary and style. A usage sketch follows.
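
A minimal loading-and-sampling sketch, assuming the model has been pushed to the Hub under a hypothetical repo id `your-username/GPT2WaP` (substitute the real Hub path or a local checkpoint directory):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/GPT2WaP"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a short continuation from a prompt.
inputs = tokenizer("Prince Andrew said", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```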

## Training and evaluation data

Training and evaluation both use the text of *War and Peace*; the exact preprocessing, tokenization settings, and train/validation split are not documented here.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 512
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 40
- mixed_precision_training: Native AMP
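
For reference, a `TrainingArguments` configuration matching the settings above might look like the following sketch (the author's actual training script, dataset handling, and output paths are not documented, so names here are illustrative):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="GPT2WaP",            # illustrative output path
    learning_rate=5e-5,
    per_device_train_batch_size=64,  # x 2 GPUs x 4 accumulation steps = 512 total
    per_device_eval_batch_size=64,   # x 2 GPUs = 128 total
    gradient_accumulation_steps=4,
    num_train_epochs=40,
    lr_scheduler_type="linear",
    warmup_steps=100,
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
)
```

The listed optimizer (Adam with betas=(0.9, 0.999) and epsilon=1e-08) corresponds to the Trainer's default AdamW settings, so it needs no explicit configuration here.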

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Perplexity |
|:-------------:|:-------:|:----:|:---------------:|:----------:|
| 10.157        | 0.6897  | 10   | 9.2336          | 10235.7480 |
| 9.2581        | 1.3793  | 20   | 8.9452          | 7671.1870  |
| 8.8166        | 2.0690  | 30   | 9.4917          | 13248.7207 |
| 8.5094        | 2.7586  | 40   | 9.5417          | 13928.9434 |
| 8.0914        | 3.4483  | 50   | 9.5507          | 14054.4785 |
| 7.663         | 4.1379  | 60   | 9.4760          | 13043.2441 |
| 7.3275        | 4.8276  | 70   | 9.3510          | 11510.8203 |
| 6.9788        | 5.5172  | 80   | 9.0822          | 8797.7188  |
| 6.6639        | 6.2069  | 90   | 8.9803          | 7945.4014  |
| 6.3749        | 6.8966  | 100  | 8.6494          | 5706.8130  |
| 6.0702        | 7.5862  | 110  | 8.5696          | 5268.9268  |
| 5.9107        | 8.2759  | 120  | 8.3612          | 4277.6265  |
| 5.6724        | 8.9655  | 130  | 8.4294          | 4579.6484  |
| 5.5949        | 9.6552  | 140  | 8.4934          | 4882.4316  |
| 5.4904        | 10.3448 | 150  | 8.4683          | 4761.3862  |
| 5.3792        | 11.0345 | 160  | 8.4647          | 4744.5381  |
| 5.3091        | 11.7241 | 170  | 8.5767          | 5306.3535  |
| 5.233         | 12.4138 | 180  | 8.5257          | 5042.5068  |
| 5.2252        | 13.1034 | 190  | 8.5328          | 5078.8433  |
| 5.1445        | 13.7931 | 200  | 8.5871          | 5361.9390  |
| 5.0824        | 14.4828 | 210  | 8.5784          | 5315.4043  |
| 5.0272        | 15.1724 | 220  | 8.6434          | 5672.6934  |
| 4.979         | 15.8621 | 230  | 8.6836          | 5905.4277  |
| 4.924         | 16.5517 | 240  | 8.7112          | 6070.2261  |
| 4.9394        | 17.2414 | 250  | 8.7233          | 6144.3931  |
| 4.8663        | 17.9310 | 260  | 8.7411          | 6254.5234  |
| 4.8599        | 18.6207 | 270  | 8.7824          | 6518.7896  |
| 4.8572        | 19.3103 | 280  | 8.8338          | 6862.5586  |
| 4.8064        | 20.0    | 290  | 8.7774          | 6485.7441  |
| 4.746         | 20.6897 | 300  | 8.8458          | 6944.8892  |
| 4.7569        | 21.3793 | 310  | 8.8436          | 6930.1416  |
| 4.6954        | 22.0690 | 320  | 8.8618          | 7057.1084  |
| 4.7277        | 22.7586 | 330  | 8.8706          | 7119.4478  |
| 4.6432        | 23.4483 | 340  | 8.9084          | 7393.6138  |
| 4.6032        | 24.1379 | 350  | 8.9111          | 7413.5176  |
| 4.6198        | 24.8276 | 360  | 8.9526          | 7728.0210  |
| 4.5874        | 25.5172 | 370  | 8.9740          | 7895.1641  |
| 4.5455        | 26.2069 | 380  | 8.9365          | 7604.7129  |
| 4.5313        | 26.8966 | 390  | 8.9738          | 7893.2969  |
| 4.5297        | 27.5862 | 400  | 8.9659          | 7831.8110  |
| 4.5279        | 28.2759 | 410  | 8.9914          | 8034.0391  |
| 4.4974        | 28.9655 | 420  | 9.0293          | 8344.2529  |
| 4.4554        | 29.6552 | 430  | 9.0191          | 8259.1533  |
| 4.4651        | 30.3448 | 440  | 9.0236          | 8296.4531  |
| 4.4647        | 31.0345 | 450  | 9.0349          | 8391.1279  |
| 4.4668        | 31.7241 | 460  | 9.0530          | 8543.8340  |
| 4.4264        | 32.4138 | 470  | 9.0722          | 8709.4141  |
| 4.4008        | 33.1034 | 480  | 9.0876          | 8844.6104  |
| 4.3982        | 33.7931 | 490  | 9.0711          | 8700.4893  |
| 4.3846        | 34.4828 | 500  | 9.0894          | 8860.7441  |
| 4.3971        | 35.1724 | 510  | 9.0879          | 8847.6973  |
| 4.379         | 35.8621 | 520  | 9.0949          | 8909.6025  |
| 4.3696        | 36.5517 | 530  | 9.1097          | 9042.2295  |
| 4.3447        | 37.2414 | 540  | 9.1007          | 8961.6953  |
| 4.3796        | 37.9310 | 550  | 9.0869          | 8839.0781  |
| 4.364         | 38.6207 | 560  | 9.0987          | 8943.6289  |
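
Note that validation loss bottoms out at 8.3612 around step 120 (epoch ≈ 8.3) and climbs steadily afterwards while training loss keeps falling, the classic signature of overfitting on such a small corpus; the checkpoint from around step 120 would likely generalize best.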


### Framework versions

- Transformers 4.40.1
- Pytorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1