CNR-ILC committed on
Commit
39d4534
·
verified ·
1 Parent(s): 7ee588d

ILC-CNR/gs-greBERTa

Files changed (6)
  1. README.md +22 -25
  2. all_results.json +16 -14
  3. eval_results.json +10 -8
  4. model.safetensors +1 -1
  5. train_results.json +6 -6
  6. trainer_state.json +170 -148
README.md CHANGED
@@ -7,11 +7,6 @@ tags:
7
  model-index:
8
  - name: gs-greBERTa
9
  results: []
10
- datasets:
11
- - CNR-ILC/gs-maat-corpus
12
- metrics:
13
- - accuracy
14
- pipeline_tag: fill-mask
15
  ---
16
 
17
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -19,13 +14,15 @@ should probably proofread and complete it, then remove this comment. -->
19
 
20
  # gs-greBERTa
21
 
22
- This model is a fine-tuned version of [bowphs/GreBerta](https://huggingface.co/bowphs/GreBerta) on the [gs-maat-corpus](https://huggingface.co/datasets/CNR-ILC/gs-maat-corpus) dataset.
23
- It achieves the following results on the held out test set:
24
- - Loss: 0.5786
25
- - Top1 Acc: 0.8589
26
- - Top5 Acc: 0.9202
27
- - Top10 Acc: 0.9448
28
- - Top20 Acc: 0.9632
 
 
29
 
30
  ## Model description
31
 
@@ -55,18 +52,18 @@ The following hyperparameters were used during training:
55
 
56
  ### Training results
57
 
- | Training Loss | Epoch | Step | Validation Loss | Top1 Acc | Top5 Acc | Top10 Acc | Top20 Acc |
- |:-------------:|:-----:|:-----:|:---------------:|:--------:|:--------:|:---------:|:---------:|
- | 1.2561 | 1.0 | 5634 | 0.9441 | 0.7965 | 0.8837 | 0.9302 | 0.9593 |
- | 0.969 | 2.0 | 11268 | 0.8007 | 0.8028 | 0.9296 | 0.9366 | 0.9648 |
- | 0.8407 | 3.0 | 16902 | 0.7505 | 0.8092 | 0.9249 | 0.9480 | 0.9769 |
- | 0.7791 | 4.0 | 22536 | 0.6900 | 0.825 | 0.9313 | 0.95 | 0.975 |
- | 0.7264 | 5.0 | 28170 | 0.6541 | 0.8824 | 0.9471 | 0.9706 | 0.9765 |
- | 0.6872 | 6.0 | 33804 | 0.6343 | 0.8344 | 0.9264 | 0.9571 | 0.9877 |
- | 0.6553 | 7.0 | 39438 | 0.6069 | 0.8705 | 0.9568 | 0.9712 | 0.9784 |
- | 0.6479 | 8.0 | 45072 | 0.5924 | 0.8905 | 0.9708 | 0.9854 | 0.9927 |
- | 0.6181 | 9.0 | 50706 | 0.5827 | 0.8834 | 0.9571 | 0.9693 | 0.9816 |
- | 0.6051 | 10.0 | 56340 | 0.5851 | 0.8922 | 0.9701 | 1.0 | 1.0 |
70
 
71
 
72
  ### Framework versions
@@ -74,4 +71,4 @@ The following hyperparameters were used during training:
74
  - Transformers 4.51.3
75
  - Pytorch 2.7.0+cu126
76
  - Datasets 3.5.1
77
- - Tokenizers 0.21.1
 
7
  model-index:
8
  - name: gs-greBERTa
9
  results: []
 
 
 
 
 
10
  ---
11
 
12
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
14
 
15
  # gs-greBERTa
16
 
17
+ This model is a fine-tuned version of [bowphs/GreBerta](https://huggingface.co/bowphs/GreBerta) on an unknown dataset.
18
+ It achieves the following results on the evaluation set:
19
+ - Loss: 2.8902
20
+ - Top1 Acc: 0.2938
21
+ - Top5 Acc: 0.5563
22
+ - Top10 Acc: 0.725
23
+ - Top15 Acc: 0.8
24
+ - Top20 Acc: 0.8562
25
+ - Top25 Acc: 0.8875
26
 
27
  ## Model description
28
 
 
52
 
53
  ### Training results
54
 
+ | Training Loss | Epoch | Step | Validation Loss | Top1 Acc | Top5 Acc | Top10 Acc | Top15 Acc | Top20 Acc | Top25 Acc |
+ |:-------------:|:-----:|:-----:|:---------------:|:--------:|:--------:|:---------:|:---------:|:---------:|:---------:|
+ | 1.2596 | 1.0 | 5710 | 0.9249 | 0.8129 | 0.8889 | 0.9123 | 0.9357 | 0.9415 | 0.9591 |
+ | 0.9346 | 2.0 | 11420 | 0.8081 | 0.8 | 0.9214 | 0.9643 | 0.9714 | 0.9786 | 0.9929 |
+ | 0.8283 | 3.0 | 17130 | 0.7369 | 0.8313 | 0.95 | 0.9563 | 0.975 | 0.9812 | 0.9812 |
+ | 0.7704 | 4.0 | 22840 | 0.6792 | 0.7812 | 0.9062 | 0.9375 | 0.9375 | 0.95 | 0.9625 |
+ | 0.7199 | 5.0 | 28550 | 0.6544 | 0.8158 | 0.9342 | 0.9539 | 0.9605 | 0.9671 | 0.9737 |
+ | 0.6929 | 6.0 | 34260 | 0.6316 | 0.8235 | 0.9265 | 0.9412 | 0.9485 | 0.9485 | 0.9632 |
+ | 1.0611 | 7.0 | 39970 | 2.5215 | 0.4110 | 0.6164 | 0.7397 | 0.8151 | 0.8699 | 0.9041 |
+ | 2.6146 | 8.0 | 45680 | 2.8091 | 0.2865 | 0.4663 | 0.6292 | 0.7191 | 0.7640 | 0.7865 |
+ | 2.8789 | 9.0 | 51390 | 2.9949 | 0.3469 | 0.5850 | 0.7211 | 0.8095 | 0.8435 | 0.8844 |
+ | 2.9492 | 10.0 | 57100 | 2.8886 | 0.2733 | 0.5267 | 0.72 | 0.8 | 0.8467 | 0.8733 |
67
 
68
 
69
  ### Framework versions
 
71
  - Transformers 4.51.3
72
  - Pytorch 2.7.0+cu126
73
  - Datasets 3.5.1
74
+ - Tokenizers 0.21.1
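
In the updated training results table in this README diff, validation loss falls through epoch 6 and then jumps from 0.6316 to 2.5215 at epoch 7, so the epoch-10 weights are not the lowest-loss ones from this run. A minimal sketch of how the lowest-loss epoch could be read off such a table, assuming the rows are available as `(epoch, step, validation_loss)` tuples; the helper name `best_epoch` is hypothetical and is not part of the commit:

```python
from typing import List, Tuple

# (epoch, step, validation_loss), copied from the updated README table.
ROWS: List[Tuple[float, int, float]] = [
    (1.0, 5710, 0.9249),
    (2.0, 11420, 0.8081),
    (3.0, 17130, 0.7369),
    (4.0, 22840, 0.6792),
    (5.0, 28550, 0.6544),
    (6.0, 34260, 0.6316),
    (7.0, 39970, 2.5215),
    (8.0, 45680, 2.8091),
    (9.0, 51390, 2.9949),
    (10.0, 57100, 2.8886),
]


def best_epoch(rows: List[Tuple[float, int, float]]) -> Tuple[float, int, float]:
    """Return the row with the lowest validation loss (hypothetical helper)."""
    if not rows:
        raise ValueError("no evaluation rows given")
    return min(rows, key=lambda row: row[2])


epoch, step, val_loss = best_epoch(ROWS)
print(f"lowest validation loss {val_loss} at epoch {epoch} (step {step})")
```

For this table the helper selects epoch 6 (step 34260). Whether a checkpoint from that step was kept is not stated in the diff.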
all_results.json CHANGED
@@ -1,17 +1,19 @@
1
  {
2
  "epoch": 10.0,
3
- "eval_loss": 0.5786494612693787,
4
- "eval_runtime": 736.6563,
5
- "eval_samples_per_second": 3.851,
6
- "eval_steps_per_second": 0.482,
7
- "eval_top10_acc": 0.9447852760736196,
8
- "eval_top1_acc": 0.8588957055214724,
9
- "eval_top20_acc": 0.9631901840490797,
10
- "eval_top5_acc": 0.9202453987730062,
11
- "step": 56340,
12
- "total_flos": 5.933070595915776e+16,
13
- "train_loss": 0.7784920386062855,
14
- "train_runtime": 14293.974,
15
- "train_samples_per_second": 63.066,
16
- "train_steps_per_second": 3.942
 
 
17
  }
 
1
  {
2
  "epoch": 10.0,
3
+ "eval_loss": 2.8902111053466797,
4
+ "eval_runtime": 4655.9501,
5
+ "eval_samples_per_second": 2.196,
6
+ "eval_steps_per_second": 0.274,
7
+ "eval_top10_acc": 0.725,
8
+ "eval_top15_acc": 0.8,
9
+ "eval_top1_acc": 0.29375,
10
+ "eval_top20_acc": 0.85625,
11
+ "eval_top25_acc": 0.8875,
12
+ "eval_top5_acc": 0.55625,
13
+ "step": 57100,
14
+ "total_flos": 6.01310491705344e+16,
15
+ "train_loss": 1.4709516054277036,
16
+ "train_runtime": 50482.2285,
17
+ "train_samples_per_second": 18.099,
18
+ "train_steps_per_second": 1.131
19
  }
eval_results.json CHANGED
@@ -1,11 +1,13 @@
1
  {
2
  "epoch": 10.0,
3
- "eval_loss": 0.5786494612693787,
4
- "eval_runtime": 736.6563,
5
- "eval_samples_per_second": 3.851,
6
- "eval_steps_per_second": 0.482,
7
- "eval_top10_acc": 0.9447852760736196,
8
- "eval_top1_acc": 0.8588957055214724,
9
- "eval_top20_acc": 0.9631901840490797,
10
- "eval_top5_acc": 0.9202453987730062
 
 
11
  }
 
1
  {
2
  "epoch": 10.0,
3
+ "eval_loss": 2.8902111053466797,
4
+ "eval_runtime": 4655.9501,
5
+ "eval_samples_per_second": 2.196,
6
+ "eval_steps_per_second": 0.274,
7
+ "eval_top10_acc": 0.725,
8
+ "eval_top15_acc": 0.8,
9
+ "eval_top1_acc": 0.29375,
10
+ "eval_top20_acc": 0.85625,
11
+ "eval_top25_acc": 0.8875,
12
+ "eval_top5_acc": 0.55625
13
  }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:dbc282e85cce75ef4bb3e2e8e5e0fe234ef492beebc040cd69cd3f3746b1457c
3
  size 504150808
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bf585d817002ecdff72009d19f5d4b18258b1b20cd6eec4beb507b78b6b9897c
3
  size 504150808
train_results.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
  "epoch": 10.0,
3
- "step": 56340,
4
- "total_flos": 5.933070595915776e+16,
5
- "train_loss": 0.7784920386062855,
6
- "train_runtime": 14293.974,
7
- "train_samples_per_second": 63.066,
8
- "train_steps_per_second": 3.942
9
  }
 
1
  {
2
  "epoch": 10.0,
3
+ "step": 57100,
4
+ "total_flos": 6.01310491705344e+16,
5
+ "train_loss": 1.4709516054277036,
6
+ "train_runtime": 50482.2285,
7
+ "train_samples_per_second": 18.099,
8
+ "train_steps_per_second": 1.131
9
  }
trainer_state.json CHANGED
@@ -4,225 +4,247 @@
4
  "best_model_checkpoint": null,
5
  "epoch": 10.0,
6
  "eval_steps": 500,
7
- "global_step": 56340,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
11
  "log_history": [
12
  {
13
  "epoch": 1.0,
14
- "grad_norm": 5.6999335289001465,
15
- "learning_rate": 4.5005324813631524e-05,
16
- "loss": 1.2561,
17
- "step": 5634
18
  },
19
  {
20
  "epoch": 1.0,
21
- "eval_loss": 0.9440904259681702,
22
- "eval_runtime": 727.0892,
23
- "eval_samples_per_second": 3.902,
24
- "eval_steps_per_second": 0.488,
25
- "eval_top10_acc": 0.9302325581395349,
26
- "eval_top1_acc": 0.7965116279069767,
27
- "eval_top20_acc": 0.9593023255813954,
28
- "eval_top5_acc": 0.8837209302325582,
29
- "step": 5634
 
 
30
  },
31
  {
32
  "epoch": 2.0,
33
- "grad_norm": 4.9532036781311035,
34
- "learning_rate": 4.000887468938587e-05,
35
- "loss": 0.969,
36
- "step": 11268
37
  },
38
  {
39
  "epoch": 2.0,
40
- "eval_loss": 0.8007155656814575,
41
- "eval_runtime": 732.2211,
42
- "eval_samples_per_second": 3.875,
43
- "eval_steps_per_second": 0.485,
44
- "eval_top10_acc": 0.9366197183098591,
45
- "eval_top1_acc": 0.8028169014084507,
46
- "eval_top20_acc": 0.9647887323943662,
47
- "eval_top5_acc": 0.9295774647887324,
48
- "step": 11268
 
 
49
  },
50
  {
51
  "epoch": 3.0,
52
- "grad_norm": 4.767992973327637,
53
- "learning_rate": 3.5011537096201635e-05,
54
- "loss": 0.8407,
55
- "step": 16902
56
  },
57
  {
58
  "epoch": 3.0,
59
- "eval_loss": 0.750490665435791,
60
- "eval_runtime": 746.6021,
61
- "eval_samples_per_second": 3.8,
62
- "eval_steps_per_second": 0.475,
63
- "eval_top10_acc": 0.9479768786127167,
64
- "eval_top1_acc": 0.8092485549132948,
65
- "eval_top20_acc": 0.976878612716763,
66
- "eval_top5_acc": 0.9248554913294798,
67
- "step": 16902
 
 
68
  },
69
  {
70
  "epoch": 4.0,
71
- "grad_norm": 4.196132659912109,
72
- "learning_rate": 3.0011537096201635e-05,
73
- "loss": 0.7791,
74
- "step": 22536
75
  },
76
  {
77
  "epoch": 4.0,
78
- "eval_loss": 0.6900169849395752,
79
- "eval_runtime": 744.1442,
80
- "eval_samples_per_second": 3.812,
81
- "eval_steps_per_second": 0.477,
82
- "eval_top10_acc": 0.95,
83
- "eval_top1_acc": 0.825,
84
- "eval_top20_acc": 0.975,
85
- "eval_top5_acc": 0.93125,
86
- "step": 22536
 
 
87
  },
88
  {
89
  "epoch": 5.0,
90
- "grad_norm": 4.124821186065674,
91
- "learning_rate": 2.5014199503017393e-05,
92
- "loss": 0.7264,
93
- "step": 28170
94
  },
95
  {
96
  "epoch": 5.0,
97
- "eval_loss": 0.6541450619697571,
98
- "eval_runtime": 732.9508,
99
- "eval_samples_per_second": 3.871,
100
- "eval_steps_per_second": 0.484,
101
- "eval_top10_acc": 0.9705882352941176,
102
- "eval_top1_acc": 0.8823529411764706,
103
- "eval_top20_acc": 0.9764705882352941,
104
- "eval_top5_acc": 0.9470588235294117,
105
- "step": 28170
 
 
106
  },
107
  {
108
  "epoch": 6.0,
109
- "grad_norm": 4.150660991668701,
110
- "learning_rate": 2.001508697195598e-05,
111
- "loss": 0.6872,
112
- "step": 33804
113
  },
114
  {
115
  "epoch": 6.0,
116
- "eval_loss": 0.6342827081680298,
117
- "eval_runtime": 735.093,
118
- "eval_samples_per_second": 3.859,
119
- "eval_steps_per_second": 0.483,
120
- "eval_top10_acc": 0.9570552147239264,
121
- "eval_top1_acc": 0.8343558282208589,
122
- "eval_top20_acc": 0.9877300613496932,
123
- "eval_top5_acc": 0.9263803680981595,
124
- "step": 33804
 
 
125
  },
126
  {
127
  "epoch": 7.0,
128
- "grad_norm": 4.235877990722656,
129
- "learning_rate": 1.501597444089457e-05,
130
- "loss": 0.6553,
131
- "step": 39438
132
  },
133
  {
134
  "epoch": 7.0,
135
- "eval_loss": 0.6069093942642212,
136
- "eval_runtime": 729.8603,
137
- "eval_samples_per_second": 3.887,
138
- "eval_steps_per_second": 0.486,
139
- "eval_top10_acc": 0.9712230215827338,
140
- "eval_top1_acc": 0.8705035971223022,
141
- "eval_top20_acc": 0.9784172661870504,
142
- "eval_top5_acc": 0.9568345323741008,
143
- "step": 39438
 
 
144
  },
145
  {
146
  "epoch": 8.0,
147
- "grad_norm": 4.277960777282715,
148
- "learning_rate": 1.0021299254526093e-05,
149
- "loss": 0.6479,
150
- "step": 45072
151
  },
152
  {
153
  "epoch": 8.0,
154
- "eval_loss": 0.5924458503723145,
155
- "eval_runtime": 735.4415,
156
- "eval_samples_per_second": 3.858,
157
- "eval_steps_per_second": 0.483,
158
- "eval_top10_acc": 0.9854014598540146,
159
- "eval_top1_acc": 0.8905109489051095,
160
- "eval_top20_acc": 0.9927007299270073,
161
- "eval_top5_acc": 0.9708029197080292,
162
- "step": 45072
 
 
163
  },
164
  {
165
  "epoch": 9.0,
166
- "grad_norm": 3.431784152984619,
167
- "learning_rate": 5.023074192403266e-06,
168
- "loss": 0.6181,
169
- "step": 50706
170
  },
171
  {
172
  "epoch": 9.0,
173
- "eval_loss": 0.5826597213745117,
174
- "eval_runtime": 740.5141,
175
- "eval_samples_per_second": 3.831,
176
- "eval_steps_per_second": 0.479,
177
- "eval_top10_acc": 0.9693251533742331,
178
- "eval_top1_acc": 0.8834355828220859,
179
- "eval_top20_acc": 0.9815950920245399,
180
- "eval_top5_acc": 0.9570552147239264,
181
- "step": 50706
 
 
182
  },
183
  {
184
  "epoch": 10.0,
185
- "grad_norm": 4.740328788757324,
186
- "learning_rate": 2.5736599219027333e-08,
187
- "loss": 0.6051,
188
- "step": 56340
189
  },
190
  {
191
  "epoch": 10.0,
192
- "eval_loss": 0.5851157307624817,
193
- "eval_runtime": 739.7027,
194
- "eval_samples_per_second": 3.835,
195
- "eval_steps_per_second": 0.48,
196
- "eval_top10_acc": 1.0,
197
- "eval_top1_acc": 0.8922155688622755,
198
- "eval_top20_acc": 1.0,
199
- "eval_top5_acc": 0.9700598802395209,
200
- "step": 56340
 
 
201
  },
202
  {
203
  "epoch": 10.0,
204
- "step": 56340,
205
- "total_flos": 5.933070595915776e+16,
206
- "train_loss": 0.7784920386062855,
207
- "train_runtime": 14293.974,
208
- "train_samples_per_second": 63.066,
209
- "train_steps_per_second": 3.942
210
  },
211
  {
212
  "epoch": 10.0,
213
- "eval_loss": 0.5786494612693787,
214
- "eval_runtime": 736.6563,
215
- "eval_samples_per_second": 3.851,
216
- "eval_steps_per_second": 0.482,
217
- "eval_top10_acc": 0.9447852760736196,
218
- "eval_top1_acc": 0.8588957055214724,
219
- "eval_top20_acc": 0.9631901840490797,
220
- "eval_top5_acc": 0.9202453987730062,
221
- "step": 56340
 
 
222
  }
223
  ],
224
  "logging_steps": 500,
225
- "max_steps": 56340,
226
  "num_input_tokens_seen": 0,
227
  "num_train_epochs": 10,
228
  "save_steps": 500,
@@ -238,7 +260,7 @@
238
  "attributes": {}
239
  }
240
  },
241
- "total_flos": 5.933070595915776e+16,
242
  "train_batch_size": 16,
243
  "trial_name": null,
244
  "trial_params": null
 
4
  "best_model_checkpoint": null,
5
  "epoch": 10.0,
6
  "eval_steps": 500,
7
+ "global_step": 57100,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
11
  "log_history": [
12
  {
13
  "epoch": 1.0,
14
+ "grad_norm": 4.83211088180542,
15
+ "learning_rate": 4.50061295971979e-05,
16
+ "loss": 1.2596,
17
+ "step": 5710
18
  },
19
  {
20
  "epoch": 1.0,
21
+ "eval_loss": 0.9248688817024231,
22
+ "eval_runtime": 4196.9447,
23
+ "eval_samples_per_second": 2.436,
24
+ "eval_steps_per_second": 0.305,
25
+ "eval_top10_acc": 0.9122807017543859,
26
+ "eval_top15_acc": 0.935672514619883,
27
+ "eval_top1_acc": 0.8128654970760234,
28
+ "eval_top20_acc": 0.9415204678362573,
29
+ "eval_top25_acc": 0.9590643274853801,
30
+ "eval_top5_acc": 0.8888888888888888,
31
+ "step": 5710
32
  },
33
  {
34
  "epoch": 2.0,
35
+ "grad_norm": 5.2830610275268555,
36
+ "learning_rate": 4.0007005253940456e-05,
37
+ "loss": 0.9346,
38
+ "step": 11420
39
  },
40
  {
41
  "epoch": 2.0,
42
+ "eval_loss": 0.8081104159355164,
43
+ "eval_runtime": 4239.8896,
44
+ "eval_samples_per_second": 2.411,
45
+ "eval_steps_per_second": 0.301,
46
+ "eval_top10_acc": 0.9642857142857143,
47
+ "eval_top15_acc": 0.9714285714285714,
48
+ "eval_top1_acc": 0.8,
49
+ "eval_top20_acc": 0.9785714285714285,
50
+ "eval_top25_acc": 0.9928571428571429,
51
+ "eval_top5_acc": 0.9214285714285714,
52
+ "step": 11420
53
  },
54
  {
55
  "epoch": 3.0,
56
+ "grad_norm": 4.63004207611084,
57
+ "learning_rate": 3.500875656742557e-05,
58
+ "loss": 0.8283,
59
+ "step": 17130
60
  },
61
  {
62
  "epoch": 3.0,
63
+ "eval_loss": 0.7369009852409363,
64
+ "eval_runtime": 4264.7537,
65
+ "eval_samples_per_second": 2.397,
66
+ "eval_steps_per_second": 0.3,
67
+ "eval_top10_acc": 0.95625,
68
+ "eval_top15_acc": 0.975,
69
+ "eval_top1_acc": 0.83125,
70
+ "eval_top20_acc": 0.98125,
71
+ "eval_top25_acc": 0.98125,
72
+ "eval_top5_acc": 0.95,
73
+ "step": 17130
74
  },
75
  {
76
  "epoch": 4.0,
77
+ "grad_norm": 4.364180564880371,
78
+ "learning_rate": 3.0010507880910683e-05,
79
+ "loss": 0.7704,
80
+ "step": 22840
81
  },
82
  {
83
  "epoch": 4.0,
84
+ "eval_loss": 0.6792041063308716,
85
+ "eval_runtime": 4245.71,
86
+ "eval_samples_per_second": 2.408,
87
+ "eval_steps_per_second": 0.301,
88
+ "eval_top10_acc": 0.9375,
89
+ "eval_top15_acc": 0.9375,
90
+ "eval_top1_acc": 0.78125,
91
+ "eval_top20_acc": 0.95,
92
+ "eval_top25_acc": 0.9625,
93
+ "eval_top5_acc": 0.90625,
94
+ "step": 22840
95
  },
96
  {
97
  "epoch": 5.0,
98
+ "grad_norm": 4.872766971588135,
99
+ "learning_rate": 2.50122591943958e-05,
100
+ "loss": 0.7199,
101
+ "step": 28550
102
  },
103
  {
104
  "epoch": 5.0,
105
+ "eval_loss": 0.6544201970100403,
106
+ "eval_runtime": 4216.6247,
107
+ "eval_samples_per_second": 2.425,
108
+ "eval_steps_per_second": 0.303,
109
+ "eval_top10_acc": 0.9539473684210527,
110
+ "eval_top15_acc": 0.9605263157894737,
111
+ "eval_top1_acc": 0.8157894736842105,
112
+ "eval_top20_acc": 0.9671052631578947,
113
+ "eval_top25_acc": 0.9736842105263158,
114
+ "eval_top5_acc": 0.9342105263157895,
115
+ "step": 28550
116
  },
117
  {
118
  "epoch": 6.0,
119
+ "grad_norm": 4.480776309967041,
120
+ "learning_rate": 2.001488616462347e-05,
121
+ "loss": 0.6929,
122
+ "step": 34260
123
  },
124
  {
125
  "epoch": 6.0,
126
+ "eval_loss": 0.63157719373703,
127
+ "eval_runtime": 4227.1237,
128
+ "eval_samples_per_second": 2.419,
129
+ "eval_steps_per_second": 0.302,
130
+ "eval_top10_acc": 0.9411764705882353,
131
+ "eval_top15_acc": 0.9485294117647058,
132
+ "eval_top1_acc": 0.8235294117647058,
133
+ "eval_top20_acc": 0.9485294117647058,
134
+ "eval_top25_acc": 0.9632352941176471,
135
+ "eval_top5_acc": 0.9264705882352942,
136
+ "step": 34260
137
  },
138
  {
139
  "epoch": 7.0,
140
+ "grad_norm": 5.865991592407227,
141
+ "learning_rate": 1.5018388791593696e-05,
142
+ "loss": 1.0611,
143
+ "step": 39970
144
  },
145
  {
146
  "epoch": 7.0,
147
+ "eval_loss": 2.5214645862579346,
148
+ "eval_runtime": 4231.1251,
149
+ "eval_samples_per_second": 2.416,
150
+ "eval_steps_per_second": 0.302,
151
+ "eval_top10_acc": 0.7397260273972602,
152
+ "eval_top15_acc": 0.815068493150685,
153
+ "eval_top1_acc": 0.410958904109589,
154
+ "eval_top20_acc": 0.8698630136986302,
155
+ "eval_top25_acc": 0.9041095890410958,
156
+ "eval_top5_acc": 0.6164383561643836,
157
+ "step": 39970
158
  },
159
  {
160
  "epoch": 8.0,
161
+ "grad_norm": 5.6015849113464355,
162
+ "learning_rate": 1.0021015761821365e-05,
163
+ "loss": 2.6146,
164
+ "step": 45680
165
  },
166
  {
167
  "epoch": 8.0,
168
+ "eval_loss": 2.8090970516204834,
169
+ "eval_runtime": 4501.1732,
170
+ "eval_samples_per_second": 2.271,
171
+ "eval_steps_per_second": 0.284,
172
+ "eval_top10_acc": 0.6292134831460674,
173
+ "eval_top15_acc": 0.7191011235955056,
174
+ "eval_top1_acc": 0.28651685393258425,
175
+ "eval_top20_acc": 0.7640449438202247,
176
+ "eval_top25_acc": 0.7865168539325843,
177
+ "eval_top5_acc": 0.46629213483146065,
178
+ "step": 45680
179
  },
180
  {
181
  "epoch": 9.0,
182
+ "grad_norm": 5.040839672088623,
183
+ "learning_rate": 5.023642732049037e-06,
184
+ "loss": 2.8789,
185
+ "step": 51390
186
  },
187
  {
188
  "epoch": 9.0,
189
+ "eval_loss": 2.994880199432373,
190
+ "eval_runtime": 4685.973,
191
+ "eval_samples_per_second": 2.182,
192
+ "eval_steps_per_second": 0.273,
193
+ "eval_top10_acc": 0.7210884353741497,
194
+ "eval_top15_acc": 0.8095238095238095,
195
+ "eval_top1_acc": 0.3469387755102041,
196
+ "eval_top20_acc": 0.8435374149659864,
197
+ "eval_top25_acc": 0.8843537414965986,
198
+ "eval_top5_acc": 0.5850340136054422,
199
+ "step": 51390
200
  },
201
  {
202
  "epoch": 10.0,
203
+ "grad_norm": 5.243908405303955,
204
+ "learning_rate": 2.4518388791593697e-08,
205
+ "loss": 2.9492,
206
+ "step": 57100
207
  },
208
  {
209
  "epoch": 10.0,
210
+ "eval_loss": 2.8885974884033203,
211
+ "eval_runtime": 4645.6271,
212
+ "eval_samples_per_second": 2.201,
213
+ "eval_steps_per_second": 0.275,
214
+ "eval_top10_acc": 0.72,
215
+ "eval_top15_acc": 0.8,
216
+ "eval_top1_acc": 0.2733333333333333,
217
+ "eval_top20_acc": 0.8466666666666667,
218
+ "eval_top25_acc": 0.8733333333333333,
219
+ "eval_top5_acc": 0.5266666666666666,
220
+ "step": 57100
221
  },
222
  {
223
  "epoch": 10.0,
224
+ "step": 57100,
225
+ "total_flos": 6.01310491705344e+16,
226
+ "train_loss": 1.4709516054277036,
227
+ "train_runtime": 50482.2285,
228
+ "train_samples_per_second": 18.099,
229
+ "train_steps_per_second": 1.131
230
  },
231
  {
232
  "epoch": 10.0,
233
+ "eval_loss": 2.8902111053466797,
234
+ "eval_runtime": 4655.9501,
235
+ "eval_samples_per_second": 2.196,
236
+ "eval_steps_per_second": 0.274,
237
+ "eval_top10_acc": 0.725,
238
+ "eval_top15_acc": 0.8,
239
+ "eval_top1_acc": 0.29375,
240
+ "eval_top20_acc": 0.85625,
241
+ "eval_top25_acc": 0.8875,
242
+ "eval_top5_acc": 0.55625,
243
+ "step": 57100
244
  }
245
  ],
246
  "logging_steps": 500,
247
+ "max_steps": 57100,
248
  "num_input_tokens_seen": 0,
249
  "num_train_epochs": 10,
250
  "save_steps": 500,
 
260
  "attributes": {}
261
  }
262
  },
263
+ "total_flos": 6.01310491705344e+16,
264
  "train_batch_size": 16,
265
  "trial_name": null,
266
  "trial_params": null