temporary0-0name committed
Commit 4d6d3f5 · 1 Parent(s): d618529

End of training

Files changed (1)
  1. README.md +10 -22
README.md CHANGED
@@ -17,7 +17,7 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the wikitext dataset.
  It achieves the following results on the evaluation set:
- - Loss: 7.1789
+ - Loss: 6.0287

  ## Model description

@@ -36,37 +36,25 @@ More information needed
  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.005
- - train_batch_size: 8
- - eval_batch_size: 8
+ - learning_rate: 0.0001
+ - train_batch_size: 128
+ - eval_batch_size: 128
  - seed: 42
  - gradient_accumulation_steps: 8
- - total_train_batch_size: 64
+ - total_train_batch_size: 1024
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 100
- - num_epochs: 10
+ - num_epochs: 5

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | 7.8933 | 0.07 | 50 | 7.4310 |
- | 7.3323 | 0.14 | 100 | 7.3016 |
- | 7.2569 | 0.21 | 150 | 7.2390 |
- | 7.2126 | 0.27 | 200 | 7.2150 |
- | 7.1929 | 0.34 | 250 | 7.2001 |
- | 7.1759 | 0.41 | 300 | 7.1929 |
- | 7.2085 | 0.48 | 350 | 7.1945 |
- | 7.1944 | 0.55 | 400 | 7.1941 |
- | 7.1889 | 0.62 | 450 | 7.1847 |
- | 7.1626 | 0.69 | 500 | 7.1791 |
- | 7.1635 | 0.75 | 550 | 7.1777 |
- | 7.1868 | 0.82 | 600 | 7.1842 |
- | 7.1766 | 0.89 | 650 | 7.1725 |
- | 7.1767 | 0.96 | 700 | 7.1743 |
- | 7.1778 | 1.03 | 750 | 7.1728 |
- | 7.1497 | 1.1 | 800 | 7.1789 |
+ | 9.0662 | 1.1 | 50 | 7.8866 |
+ | 7.1297 | 2.19 | 100 | 6.6448 |
+ | 6.4229 | 3.29 | 150 | 6.2367 |
+ | 6.0864 | 4.38 | 200 | 6.0287 |


  ### Framework versions
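
The updated hyperparameters are internally consistent and can be sanity-checked in a few lines: the effective batch size equals train_batch_size × gradient_accumulation_steps (128 × 8 = 1024, matching total_train_batch_size), and the cosine schedule ramps linearly over the 100 warmup steps before decaying toward zero. A minimal stdlib sketch, assuming a schedule of the usual warmup-then-cosine shape; the function name `lr_at_step` and the total-step count are illustrative, not taken from the training code:

```python
import math

def lr_at_step(step, base_lr=1e-4, warmup_steps=100, total_steps=200):
    """Cosine learning-rate schedule with linear warmup.

    base_lr and warmup_steps mirror the card's updated hyperparameters;
    total_steps is an assumed value for illustration only.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to base_lr over the first warmup_steps steps
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr toward 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: per-device batch x gradient accumulation steps
effective_batch = 128 * 8  # = 1024, matching total_train_batch_size
```

The same check against the old hyperparameters also holds (8 × 8 = 64), so the diff is self-consistent on both sides.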