Shawon16 committed on
Commit 118cc9a · verified · 1 Parent(s): d198ed9

End of training

Files changed (4)
  1. README.md +2 -2
  2. all_results.json +8 -0
  3. test_results.json +8 -0
  4. trainer_state.json +1502 -0
README.md CHANGED
@@ -18,8 +18,8 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [MCG-NJU/videomae-base](https://huggingface.co/MCG-NJU/videomae-base) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0583
-- Accuracy: 0.9929
+- Loss: 0.5588
+- Accuracy: 0.8973
 
 ## Model description
 
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 19.040078125,
+   "eval_accuracy": 0.8973354231974922,
+   "eval_loss": 0.5587517619132996,
+   "eval_runtime": 352.6874,
+   "eval_samples_per_second": 3.618,
+   "eval_steps_per_second": 1.809
+ }
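The throughput fields in `all_results.json` allow a rough sanity check on the size of the evaluation split. The following is a minimal sketch, assuming that `eval_samples_per_second` is the sample count divided by `eval_runtime` and that evaluation ran in a single process; the sample count itself is not recorded in the file, so every derived number here is an estimate rather than a reported value.

```python
import json

# Values copied from the all_results.json hunk of this commit.
results = json.loads("""
{
  "epoch": 19.040078125,
  "eval_accuracy": 0.8973354231974922,
  "eval_loss": 0.5587517619132996,
  "eval_runtime": 352.6874,
  "eval_samples_per_second": 3.618,
  "eval_steps_per_second": 1.809
}
""")

# Estimated number of evaluated samples: runtime * throughput.
est_samples = round(results["eval_runtime"] * results["eval_samples_per_second"])

# Implied evaluation batch size (assumption: one evaluation process).
est_batch_size = round(
    results["eval_samples_per_second"] / results["eval_steps_per_second"]
)

# Number of correct predictions implied by the reported accuracy.
est_correct = round(results["eval_accuracy"] * est_samples)

print(est_samples, est_batch_size, est_correct)
```

If the estimated counts reproduce the reported accuracy as an exact fraction, that supports the estimate; if they do not, the assumptions above probably do not hold for this run.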
test_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 19.040078125,
+   "eval_accuracy": 0.8973354231974922,
+   "eval_loss": 0.5587517619132996,
+   "eval_runtime": 352.6874,
+   "eval_samples_per_second": 3.618,
+   "eval_steps_per_second": 1.809
+ }
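The `learning_rate` values recorded in the `log_history` of `trainer_state.json` are consistent with a linear warmup followed by a linear decay. The following is a minimal sketch, assuming a peak learning rate of 5e-5, 2240 warmup steps, and 22400 scheduled steps in total. These three numbers are inferred from the logged values and are not stated anywhere in the commit; `linear_schedule` is a hypothetical helper, not a library function.

```python
import math

# Assumed hyperparameters, inferred from the logged values (not stated in the commit).
PEAK_LR = 5e-5
WARMUP_STEPS = 2240
TOTAL_STEPS = 22400


def linear_schedule(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# (step, learning_rate) pairs copied from log_history in trainer_state.json.
logged = [
    (100, 2.2321428571428573e-06),
    (700, 1.5625e-05),
    (2200, 4.910714285714286e-05),
    (2300, 4.985119047619048e-05),
    (3500, 4.6875e-05),
    (13400, 2.2321428571428575e-05),
]

# Each logged value should match the assumed schedule up to float rounding.
for step, lr in logged:
    assert math.isclose(linear_schedule(step), lr, rel_tol=1e-9), step
```

Under these assumptions the run stopped at `global_step` 17955, before the schedule reached zero; the commit does not say why.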
trainer_state.json ADDED
@@ -0,0 +1,1502 @@
1
+ {
2
+ "best_metric": 0.9929411764705882,
3
+ "best_model_checkpoint": "/media/cse/HDD/Shawon/shawon/MY DATA/VideoMAE_BdSLW60_FrameRateCorrected_withAug_100/checkpoint-13466",
4
+ "epoch": 19.040078125,
5
+ "eval_steps": 500,
6
+ "global_step": 17955,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.004464285714285714,
13
+ "grad_norm": 11.12140941619873,
14
+ "learning_rate": 2.2321428571428573e-06,
15
+ "loss": 4.1557,
16
+ "step": 100
17
+ },
18
+ {
19
+ "epoch": 0.008928571428571428,
20
+ "grad_norm": 10.578296661376953,
21
+ "learning_rate": 4.464285714285715e-06,
22
+ "loss": 4.1159,
23
+ "step": 200
24
+ },
25
+ {
26
+ "epoch": 0.013392857142857142,
27
+ "grad_norm": 9.035299301147461,
28
+ "learning_rate": 6.696428571428572e-06,
29
+ "loss": 4.0848,
30
+ "step": 300
31
+ },
32
+ {
33
+ "epoch": 0.017857142857142856,
34
+ "grad_norm": 9.214325904846191,
35
+ "learning_rate": 8.92857142857143e-06,
36
+ "loss": 4.0703,
37
+ "step": 400
38
+ },
39
+ {
40
+ "epoch": 0.022321428571428572,
41
+ "grad_norm": 8.834626197814941,
42
+ "learning_rate": 1.1160714285714287e-05,
43
+ "loss": 4.0688,
44
+ "step": 500
45
+ },
46
+ {
47
+ "epoch": 0.026785714285714284,
48
+ "grad_norm": 10.655806541442871,
49
+ "learning_rate": 1.3392857142857144e-05,
50
+ "loss": 3.8577,
51
+ "step": 600
52
+ },
53
+ {
54
+ "epoch": 0.03125,
55
+ "grad_norm": 11.894658088684082,
56
+ "learning_rate": 1.5625e-05,
57
+ "loss": 3.4927,
58
+ "step": 700
59
+ },
60
+ {
61
+ "epoch": 0.03571428571428571,
62
+ "grad_norm": 13.875555992126465,
63
+ "learning_rate": 1.785714285714286e-05,
64
+ "loss": 3.0699,
65
+ "step": 800
66
+ },
67
+ {
68
+ "epoch": 0.040044642857142855,
69
+ "eval_accuracy": 0.4752941176470588,
70
+ "eval_loss": 2.454066514968872,
71
+ "eval_runtime": 290.1198,
72
+ "eval_samples_per_second": 2.93,
73
+ "eval_steps_per_second": 1.465,
74
+ "step": 897
75
+ },
76
+ {
77
+ "epoch": 1.0001004464285714,
78
+ "grad_norm": 11.632246017456055,
79
+ "learning_rate": 2.0089285714285717e-05,
80
+ "loss": 2.5881,
81
+ "step": 900
82
+ },
83
+ {
84
+ "epoch": 1.0045647321428572,
85
+ "grad_norm": 15.39003849029541,
86
+ "learning_rate": 2.2321428571428575e-05,
87
+ "loss": 2.2052,
88
+ "step": 1000
89
+ },
90
+ {
91
+ "epoch": 1.0090290178571428,
92
+ "grad_norm": 17.561227798461914,
93
+ "learning_rate": 2.455357142857143e-05,
94
+ "loss": 1.8017,
95
+ "step": 1100
96
+ },
97
+ {
98
+ "epoch": 1.0134933035714286,
99
+ "grad_norm": 16.368633270263672,
100
+ "learning_rate": 2.6785714285714288e-05,
101
+ "loss": 1.5213,
102
+ "step": 1200
103
+ },
104
+ {
105
+ "epoch": 1.0179575892857142,
106
+ "grad_norm": 18.419261932373047,
107
+ "learning_rate": 2.9017857142857146e-05,
108
+ "loss": 1.1462,
109
+ "step": 1300
110
+ },
111
+ {
112
+ "epoch": 1.022421875,
113
+ "grad_norm": 14.493526458740234,
114
+ "learning_rate": 3.125e-05,
115
+ "loss": 1.0545,
116
+ "step": 1400
117
+ },
118
+ {
119
+ "epoch": 1.0268861607142856,
120
+ "grad_norm": 15.404373168945312,
121
+ "learning_rate": 3.348214285714286e-05,
122
+ "loss": 0.7972,
123
+ "step": 1500
124
+ },
125
+ {
126
+ "epoch": 1.0313504464285714,
127
+ "grad_norm": 7.37654972076416,
128
+ "learning_rate": 3.571428571428572e-05,
129
+ "loss": 0.6743,
130
+ "step": 1600
131
+ },
132
+ {
133
+ "epoch": 1.0358147321428572,
134
+ "grad_norm": 17.836456298828125,
135
+ "learning_rate": 3.794642857142857e-05,
136
+ "loss": 0.6366,
137
+ "step": 1700
138
+ },
139
+ {
140
+ "epoch": 1.0400558035714287,
141
+ "eval_accuracy": 0.84,
142
+ "eval_loss": 0.6831679344177246,
143
+ "eval_runtime": 295.5183,
144
+ "eval_samples_per_second": 2.876,
145
+ "eval_steps_per_second": 1.438,
146
+ "step": 1795
147
+ },
148
+ {
149
+ "epoch": 2.000200892857143,
150
+ "grad_norm": 27.33871078491211,
151
+ "learning_rate": 4.017857142857143e-05,
152
+ "loss": 0.6165,
153
+ "step": 1800
154
+ },
155
+ {
156
+ "epoch": 2.0046651785714285,
157
+ "grad_norm": 1.4543864727020264,
158
+ "learning_rate": 4.2410714285714285e-05,
159
+ "loss": 0.4179,
160
+ "step": 1900
161
+ },
162
+ {
163
+ "epoch": 2.0091294642857145,
164
+ "grad_norm": 7.2733659744262695,
165
+ "learning_rate": 4.464285714285715e-05,
166
+ "loss": 0.4156,
167
+ "step": 2000
168
+ },
169
+ {
170
+ "epoch": 2.01359375,
171
+ "grad_norm": 21.995115280151367,
172
+ "learning_rate": 4.6875e-05,
173
+ "loss": 0.3666,
174
+ "step": 2100
175
+ },
176
+ {
177
+ "epoch": 2.0180580357142857,
178
+ "grad_norm": 19.265806198120117,
179
+ "learning_rate": 4.910714285714286e-05,
180
+ "loss": 0.3751,
181
+ "step": 2200
182
+ },
183
+ {
184
+ "epoch": 2.0225223214285712,
185
+ "grad_norm": 26.048490524291992,
186
+ "learning_rate": 4.985119047619048e-05,
187
+ "loss": 0.3401,
188
+ "step": 2300
189
+ },
190
+ {
191
+ "epoch": 2.0269866071428573,
192
+ "grad_norm": 26.414731979370117,
193
+ "learning_rate": 4.960317460317461e-05,
194
+ "loss": 0.2955,
195
+ "step": 2400
196
+ },
197
+ {
198
+ "epoch": 2.031450892857143,
199
+ "grad_norm": 17.34372901916504,
200
+ "learning_rate": 4.9355158730158735e-05,
201
+ "loss": 0.2859,
202
+ "step": 2500
203
+ },
204
+ {
205
+ "epoch": 2.0359151785714285,
206
+ "grad_norm": 3.029252767562866,
207
+ "learning_rate": 4.910714285714286e-05,
208
+ "loss": 0.2253,
209
+ "step": 2600
210
+ },
211
+ {
212
+ "epoch": 2.0400669642857143,
213
+ "eval_accuracy": 0.9023529411764706,
214
+ "eval_loss": 0.3464316725730896,
215
+ "eval_runtime": 282.6757,
216
+ "eval_samples_per_second": 3.007,
217
+ "eval_steps_per_second": 1.503,
218
+ "step": 2693
219
+ },
220
+ {
221
+ "epoch": 3.000301339285714,
222
+ "grad_norm": 11.130131721496582,
223
+ "learning_rate": 4.8859126984126984e-05,
224
+ "loss": 0.232,
225
+ "step": 2700
226
+ },
227
+ {
228
+ "epoch": 3.004765625,
229
+ "grad_norm": 3.47011661529541,
230
+ "learning_rate": 4.8611111111111115e-05,
231
+ "loss": 0.1247,
232
+ "step": 2800
233
+ },
234
+ {
235
+ "epoch": 3.0092299107142857,
236
+ "grad_norm": 18.701496124267578,
237
+ "learning_rate": 4.836309523809524e-05,
238
+ "loss": 0.1293,
239
+ "step": 2900
240
+ },
241
+ {
242
+ "epoch": 3.0136941964285713,
243
+ "grad_norm": 0.7256734371185303,
244
+ "learning_rate": 4.811507936507937e-05,
245
+ "loss": 0.1291,
246
+ "step": 3000
247
+ },
248
+ {
249
+ "epoch": 3.0181584821428573,
250
+ "grad_norm": 24.983957290649414,
251
+ "learning_rate": 4.7867063492063496e-05,
252
+ "loss": 0.195,
253
+ "step": 3100
254
+ },
255
+ {
256
+ "epoch": 3.022622767857143,
257
+ "grad_norm": 0.1959875524044037,
258
+ "learning_rate": 4.761904761904762e-05,
259
+ "loss": 0.0969,
260
+ "step": 3200
261
+ },
262
+ {
263
+ "epoch": 3.0270870535714285,
264
+ "grad_norm": 1.1051886081695557,
265
+ "learning_rate": 4.7371031746031745e-05,
266
+ "loss": 0.1691,
267
+ "step": 3300
268
+ },
269
+ {
270
+ "epoch": 3.031551339285714,
271
+ "grad_norm": 0.48205551505088806,
272
+ "learning_rate": 4.7123015873015876e-05,
273
+ "loss": 0.1297,
274
+ "step": 3400
275
+ },
276
+ {
277
+ "epoch": 3.036015625,
278
+ "grad_norm": 0.8840370774269104,
279
+ "learning_rate": 4.6875e-05,
280
+ "loss": 0.1229,
281
+ "step": 3500
282
+ },
283
+ {
284
+ "epoch": 3.040078125,
285
+ "eval_accuracy": 0.9647058823529412,
286
+ "eval_loss": 0.14670781791210175,
287
+ "eval_runtime": 285.475,
288
+ "eval_samples_per_second": 2.977,
289
+ "eval_steps_per_second": 1.489,
290
+ "step": 3591
291
+ },
292
+ {
293
+ "epoch": 4.000401785714286,
294
+ "grad_norm": 0.21204273402690887,
295
+ "learning_rate": 4.662698412698413e-05,
296
+ "loss": 0.1337,
297
+ "step": 3600
298
+ },
299
+ {
300
+ "epoch": 4.004866071428571,
301
+ "grad_norm": 2.2111618518829346,
302
+ "learning_rate": 4.637896825396826e-05,
303
+ "loss": 0.0821,
304
+ "step": 3700
305
+ },
306
+ {
307
+ "epoch": 4.009330357142857,
308
+ "grad_norm": 2.208402395248413,
309
+ "learning_rate": 4.613095238095239e-05,
310
+ "loss": 0.098,
311
+ "step": 3800
312
+ },
313
+ {
314
+ "epoch": 4.0137946428571425,
315
+ "grad_norm": 3.035139560699463,
316
+ "learning_rate": 4.5882936507936506e-05,
317
+ "loss": 0.0828,
318
+ "step": 3900
319
+ },
320
+ {
321
+ "epoch": 4.018258928571429,
322
+ "grad_norm": 0.06664509326219559,
323
+ "learning_rate": 4.563492063492064e-05,
324
+ "loss": 0.0705,
325
+ "step": 4000
326
+ },
327
+ {
328
+ "epoch": 4.0227232142857146,
329
+ "grad_norm": 0.049911659210920334,
330
+ "learning_rate": 4.538690476190476e-05,
331
+ "loss": 0.0506,
332
+ "step": 4100
333
+ },
334
+ {
335
+ "epoch": 4.0271875,
336
+ "grad_norm": 6.9254374504089355,
337
+ "learning_rate": 4.5138888888888894e-05,
338
+ "loss": 0.0895,
339
+ "step": 4200
340
+ },
341
+ {
342
+ "epoch": 4.031651785714286,
343
+ "grad_norm": 0.6636308431625366,
344
+ "learning_rate": 4.489087301587302e-05,
345
+ "loss": 0.0762,
346
+ "step": 4300
347
+ },
348
+ {
349
+ "epoch": 4.036116071428571,
350
+ "grad_norm": 0.07036083936691284,
351
+ "learning_rate": 4.464285714285715e-05,
352
+ "loss": 0.1045,
353
+ "step": 4400
354
+ },
355
+ {
356
+ "epoch": 4.040044642857143,
357
+ "eval_accuracy": 0.9635294117647059,
358
+ "eval_loss": 0.1458999365568161,
359
+ "eval_runtime": 292.1403,
360
+ "eval_samples_per_second": 2.91,
361
+ "eval_steps_per_second": 1.455,
362
+ "step": 4488
363
+ },
364
+ {
365
+ "epoch": 5.000502232142857,
366
+ "grad_norm": 25.948030471801758,
367
+ "learning_rate": 4.439484126984127e-05,
368
+ "loss": 0.1201,
369
+ "step": 4500
370
+ },
371
+ {
372
+ "epoch": 5.0049665178571425,
373
+ "grad_norm": 4.851236343383789,
374
+ "learning_rate": 4.41468253968254e-05,
375
+ "loss": 0.0751,
376
+ "step": 4600
377
+ },
378
+ {
379
+ "epoch": 5.009430803571429,
380
+ "grad_norm": 2.069117307662964,
381
+ "learning_rate": 4.3898809523809523e-05,
382
+ "loss": 0.06,
383
+ "step": 4700
384
+ },
385
+ {
386
+ "epoch": 5.013895089285715,
387
+ "grad_norm": 0.02893979474902153,
388
+ "learning_rate": 4.3650793650793655e-05,
389
+ "loss": 0.0583,
390
+ "step": 4800
391
+ },
392
+ {
393
+ "epoch": 5.018359375,
394
+ "grad_norm": 38.84079360961914,
395
+ "learning_rate": 4.340277777777778e-05,
396
+ "loss": 0.0854,
397
+ "step": 4900
398
+ },
399
+ {
400
+ "epoch": 5.022823660714286,
401
+ "grad_norm": 0.01713498868048191,
402
+ "learning_rate": 4.315476190476191e-05,
403
+ "loss": 0.1064,
404
+ "step": 5000
405
+ },
406
+ {
407
+ "epoch": 5.027287946428571,
408
+ "grad_norm": 2.2113935947418213,
409
+ "learning_rate": 4.290674603174603e-05,
410
+ "loss": 0.0534,
411
+ "step": 5100
412
+ },
413
+ {
414
+ "epoch": 5.031752232142857,
415
+ "grad_norm": 0.030846355482935905,
416
+ "learning_rate": 4.265873015873016e-05,
417
+ "loss": 0.0812,
418
+ "step": 5200
419
+ },
420
+ {
421
+ "epoch": 5.0362165178571425,
422
+ "grad_norm": 63.66303253173828,
423
+ "learning_rate": 4.2410714285714285e-05,
424
+ "loss": 0.0631,
425
+ "step": 5300
426
+ },
427
+ {
428
+ "epoch": 5.040055803571429,
429
+ "eval_accuracy": 0.971764705882353,
430
+ "eval_loss": 0.13126207888126373,
431
+ "eval_runtime": 282.9661,
432
+ "eval_samples_per_second": 3.004,
433
+ "eval_steps_per_second": 1.502,
434
+ "step": 5386
435
+ },
436
+ {
437
+ "epoch": 6.000602678571428,
438
+ "grad_norm": 0.01721133291721344,
439
+ "learning_rate": 4.2162698412698416e-05,
440
+ "loss": 0.1066,
441
+ "step": 5400
442
+ },
443
+ {
444
+ "epoch": 6.005066964285715,
445
+ "grad_norm": 0.06797400861978531,
446
+ "learning_rate": 4.191468253968254e-05,
447
+ "loss": 0.0751,
448
+ "step": 5500
449
+ },
450
+ {
451
+ "epoch": 6.00953125,
452
+ "grad_norm": 0.22653132677078247,
453
+ "learning_rate": 4.166666666666667e-05,
454
+ "loss": 0.0417,
455
+ "step": 5600
456
+ },
457
+ {
458
+ "epoch": 6.013995535714286,
459
+ "grad_norm": 0.07131924480199814,
460
+ "learning_rate": 4.14186507936508e-05,
461
+ "loss": 0.0158,
462
+ "step": 5700
463
+ },
464
+ {
465
+ "epoch": 6.018459821428571,
466
+ "grad_norm": 40.63113784790039,
467
+ "learning_rate": 4.117063492063492e-05,
468
+ "loss": 0.0522,
469
+ "step": 5800
470
+ },
471
+ {
472
+ "epoch": 6.022924107142857,
473
+ "grad_norm": 0.09443258494138718,
474
+ "learning_rate": 4.0922619047619046e-05,
475
+ "loss": 0.072,
476
+ "step": 5900
477
+ },
478
+ {
479
+ "epoch": 6.027388392857143,
480
+ "grad_norm": 0.5265907049179077,
481
+ "learning_rate": 4.067460317460318e-05,
482
+ "loss": 0.0318,
483
+ "step": 6000
484
+ },
485
+ {
486
+ "epoch": 6.031852678571428,
487
+ "grad_norm": 0.03210202232003212,
488
+ "learning_rate": 4.04265873015873e-05,
489
+ "loss": 0.0877,
490
+ "step": 6100
491
+ },
492
+ {
493
+ "epoch": 6.036316964285715,
494
+ "grad_norm": 0.34825244545936584,
495
+ "learning_rate": 4.017857142857143e-05,
496
+ "loss": 0.0736,
497
+ "step": 6200
498
+ },
499
+ {
500
+ "epoch": 6.040066964285714,
501
+ "eval_accuracy": 0.9635294117647059,
502
+ "eval_loss": 0.18067213892936707,
503
+ "eval_runtime": 285.3373,
504
+ "eval_samples_per_second": 2.979,
505
+ "eval_steps_per_second": 1.489,
506
+ "step": 6284
507
+ },
508
+ {
509
+ "epoch": 7.000703125,
510
+ "grad_norm": 0.006914912257343531,
511
+ "learning_rate": 3.993055555555556e-05,
512
+ "loss": 0.0283,
513
+ "step": 6300
514
+ },
515
+ {
516
+ "epoch": 7.005167410714286,
517
+ "grad_norm": 0.0338265560567379,
518
+ "learning_rate": 3.968253968253968e-05,
519
+ "loss": 0.0499,
520
+ "step": 6400
521
+ },
522
+ {
523
+ "epoch": 7.009631696428571,
524
+ "grad_norm": 10.877938270568848,
525
+ "learning_rate": 3.943452380952381e-05,
526
+ "loss": 0.0082,
527
+ "step": 6500
528
+ },
529
+ {
530
+ "epoch": 7.014095982142857,
531
+ "grad_norm": 0.10941223055124283,
532
+ "learning_rate": 3.918650793650794e-05,
533
+ "loss": 0.0657,
534
+ "step": 6600
535
+ },
536
+ {
537
+ "epoch": 7.018560267857143,
538
+ "grad_norm": 12.054357528686523,
539
+ "learning_rate": 3.893849206349206e-05,
540
+ "loss": 0.0609,
541
+ "step": 6700
542
+ },
543
+ {
544
+ "epoch": 7.023024553571428,
545
+ "grad_norm": 0.006210957653820515,
546
+ "learning_rate": 3.8690476190476195e-05,
547
+ "loss": 0.0486,
548
+ "step": 6800
549
+ },
550
+ {
551
+ "epoch": 7.027488839285715,
552
+ "grad_norm": 0.013958507217466831,
553
+ "learning_rate": 3.844246031746032e-05,
554
+ "loss": 0.0747,
555
+ "step": 6900
556
+ },
557
+ {
558
+ "epoch": 7.031953125,
559
+ "grad_norm": 14.515870094299316,
560
+ "learning_rate": 3.8194444444444444e-05,
561
+ "loss": 0.0343,
562
+ "step": 7000
563
+ },
564
+ {
565
+ "epoch": 7.036417410714286,
566
+ "grad_norm": 0.007723964750766754,
567
+ "learning_rate": 3.794642857142857e-05,
568
+ "loss": 0.0673,
569
+ "step": 7100
570
+ },
571
+ {
572
+ "epoch": 7.040078125,
573
+ "eval_accuracy": 0.9694117647058823,
574
+ "eval_loss": 0.14643678069114685,
575
+ "eval_runtime": 288.72,
576
+ "eval_samples_per_second": 2.944,
577
+ "eval_steps_per_second": 1.472,
578
+ "step": 7182
579
+ },
580
+ {
581
+ "epoch": 8.000803571428571,
582
+ "grad_norm": 45.418617248535156,
583
+ "learning_rate": 3.76984126984127e-05,
584
+ "loss": 0.0476,
585
+ "step": 7200
586
+ },
587
+ {
588
+ "epoch": 8.005267857142858,
589
+ "grad_norm": 0.008381331339478493,
590
+ "learning_rate": 3.7450396825396824e-05,
591
+ "loss": 0.0421,
592
+ "step": 7300
593
+ },
594
+ {
595
+ "epoch": 8.009732142857143,
596
+ "grad_norm": 0.7666055560112,
597
+ "learning_rate": 3.7202380952380956e-05,
598
+ "loss": 0.0832,
599
+ "step": 7400
600
+ },
601
+ {
602
+ "epoch": 8.014196428571429,
603
+ "grad_norm": 0.09307380765676498,
604
+ "learning_rate": 3.695436507936508e-05,
605
+ "loss": 0.0875,
606
+ "step": 7500
607
+ },
608
+ {
609
+ "epoch": 8.018660714285714,
610
+ "grad_norm": 0.012713397853076458,
611
+ "learning_rate": 3.6706349206349205e-05,
612
+ "loss": 0.0441,
613
+ "step": 7600
614
+ },
615
+ {
616
+ "epoch": 8.023125,
617
+ "grad_norm": 0.021006299182772636,
618
+ "learning_rate": 3.6458333333333336e-05,
619
+ "loss": 0.054,
620
+ "step": 7700
621
+ },
622
+ {
623
+ "epoch": 8.027589285714285,
624
+ "grad_norm": 0.1419028341770172,
625
+ "learning_rate": 3.621031746031746e-05,
626
+ "loss": 0.0608,
627
+ "step": 7800
628
+ },
629
+ {
630
+ "epoch": 8.032053571428571,
631
+ "grad_norm": 0.025018220767378807,
632
+ "learning_rate": 3.5962301587301586e-05,
633
+ "loss": 0.0479,
634
+ "step": 7900
635
+ },
636
+ {
637
+ "epoch": 8.036517857142858,
638
+ "grad_norm": 0.5912023186683655,
639
+ "learning_rate": 3.571428571428572e-05,
640
+ "loss": 0.0239,
641
+ "step": 8000
642
+ },
643
+ {
644
+ "epoch": 8.040044642857143,
645
+ "eval_accuracy": 0.9576470588235294,
646
+ "eval_loss": 0.193200945854187,
647
+ "eval_runtime": 279.9813,
648
+ "eval_samples_per_second": 3.036,
649
+ "eval_steps_per_second": 1.518,
650
+ "step": 8079
651
+ },
652
+ {
653
+ "epoch": 9.000904017857144,
654
+ "grad_norm": 0.0350213348865509,
655
+ "learning_rate": 3.546626984126984e-05,
656
+ "loss": 0.067,
657
+ "step": 8100
658
+ },
659
+ {
660
+ "epoch": 9.005368303571428,
661
+ "grad_norm": 2.537632465362549,
662
+ "learning_rate": 3.521825396825397e-05,
663
+ "loss": 0.0245,
664
+ "step": 8200
665
+ },
666
+ {
667
+ "epoch": 9.009832589285715,
668
+ "grad_norm": 2.564781665802002,
669
+ "learning_rate": 3.49702380952381e-05,
670
+ "loss": 0.0262,
671
+ "step": 8300
672
+ },
673
+ {
674
+ "epoch": 9.014296875,
675
+ "grad_norm": 0.00803827028721571,
676
+ "learning_rate": 3.472222222222222e-05,
677
+ "loss": 0.0559,
678
+ "step": 8400
679
+ },
680
+ {
681
+ "epoch": 9.018761160714286,
682
+ "grad_norm": 0.005816516932100058,
683
+ "learning_rate": 3.4474206349206354e-05,
684
+ "loss": 0.0519,
685
+ "step": 8500
686
+ },
687
+ {
688
+ "epoch": 9.02322544642857,
689
+ "grad_norm": 0.021420830860733986,
690
+ "learning_rate": 3.422619047619048e-05,
691
+ "loss": 0.032,
692
+ "step": 8600
693
+ },
694
+ {
695
+ "epoch": 9.027689732142857,
696
+ "grad_norm": 0.028336547315120697,
697
+ "learning_rate": 3.397817460317461e-05,
698
+ "loss": 0.0227,
699
+ "step": 8700
700
+ },
701
+ {
702
+ "epoch": 9.032154017857144,
703
+ "grad_norm": 0.02300655096769333,
704
+ "learning_rate": 3.3730158730158734e-05,
705
+ "loss": 0.0392,
706
+ "step": 8800
707
+ },
708
+ {
709
+ "epoch": 9.036618303571428,
710
+ "grad_norm": 0.05427232384681702,
711
+ "learning_rate": 3.348214285714286e-05,
712
+ "loss": 0.0868,
713
+ "step": 8900
714
+ },
715
+ {
716
+ "epoch": 9.040055803571429,
717
+ "eval_accuracy": 0.9882352941176471,
718
+ "eval_loss": 0.05633905157446861,
719
+ "eval_runtime": 285.433,
720
+ "eval_samples_per_second": 2.978,
721
+ "eval_steps_per_second": 1.489,
722
+ "step": 8977
723
+ },
724
+ {
725
+ "epoch": 10.001004464285714,
726
+ "grad_norm": 0.0491323284804821,
727
+ "learning_rate": 3.3234126984126983e-05,
728
+ "loss": 0.0618,
729
+ "step": 9000
730
+ },
731
+ {
732
+ "epoch": 10.00546875,
733
+ "grad_norm": 1.0003972053527832,
734
+ "learning_rate": 3.2986111111111115e-05,
735
+ "loss": 0.0202,
736
+ "step": 9100
737
+ },
738
+ {
739
+ "epoch": 10.009933035714285,
740
+ "grad_norm": 0.00252954987809062,
741
+ "learning_rate": 3.273809523809524e-05,
742
+ "loss": 0.0531,
743
+ "step": 9200
744
+ },
745
+ {
746
+ "epoch": 10.014397321428572,
747
+ "grad_norm": 9.270633697509766,
748
+ "learning_rate": 3.249007936507937e-05,
749
+ "loss": 0.035,
750
+ "step": 9300
751
+ },
752
+ {
753
+ "epoch": 10.018861607142858,
754
+ "grad_norm": 0.014138671569526196,
755
+ "learning_rate": 3.2242063492063495e-05,
756
+ "loss": 0.0392,
757
+ "step": 9400
758
+ },
759
+ {
760
+ "epoch": 10.023325892857143,
761
+ "grad_norm": 0.01277222577482462,
762
+ "learning_rate": 3.199404761904762e-05,
763
+ "loss": 0.059,
764
+ "step": 9500
765
+ },
766
+ {
767
+ "epoch": 10.02779017857143,
768
+ "grad_norm": 0.0034905134234577417,
769
+ "learning_rate": 3.1746031746031745e-05,
770
+ "loss": 0.0664,
771
+ "step": 9600
772
+ },
773
+ {
774
+ "epoch": 10.032254464285714,
775
+ "grad_norm": 0.0024051007349044085,
776
+ "learning_rate": 3.1498015873015876e-05,
777
+ "loss": 0.0286,
778
+ "step": 9700
779
+ },
780
+ {
781
+ "epoch": 10.03671875,
782
+ "grad_norm": 0.002095526549965143,
783
+ "learning_rate": 3.125e-05,
784
+ "loss": 0.0016,
785
+ "step": 9800
786
+ },
787
+ {
788
+ "epoch": 10.040066964285714,
789
+ "eval_accuracy": 0.9776470588235294,
790
+ "eval_loss": 0.08437661826610565,
791
+ "eval_runtime": 280.7764,
792
+ "eval_samples_per_second": 3.027,
793
+ "eval_steps_per_second": 1.514,
794
+ "step": 9875
795
+ },
796
+ {
797
+ "epoch": 11.001104910714286,
798
+ "grad_norm": 0.0019545548129826784,
799
+ "learning_rate": 3.100198412698413e-05,
800
+ "loss": 0.0109,
801
+ "step": 9900
802
+ },
803
+ {
804
+ "epoch": 11.00556919642857,
805
+ "grad_norm": 0.005866718012839556,
806
+ "learning_rate": 3.075396825396826e-05,
807
+ "loss": 0.0479,
808
+ "step": 10000
809
+ },
810
+ {
811
+ "epoch": 11.010033482142857,
812
+ "grad_norm": 0.012244959361851215,
813
+ "learning_rate": 3.0505952380952385e-05,
814
+ "loss": 0.0116,
815
+ "step": 10100
816
+ },
817
+ {
818
+ "epoch": 11.014497767857144,
819
+ "grad_norm": 0.004522031173110008,
820
+ "learning_rate": 3.0257936507936506e-05,
821
+ "loss": 0.025,
822
+ "step": 10200
823
+ },
824
+ {
825
+ "epoch": 11.018962053571428,
826
+ "grad_norm": 0.010159791447222233,
827
+ "learning_rate": 3.0009920634920634e-05,
828
+ "loss": 0.0036,
829
+ "step": 10300
830
+ },
831
+ {
832
+ "epoch": 11.023426339285715,
833
+ "grad_norm": 0.40824609994888306,
834
+ "learning_rate": 2.9761904761904762e-05,
835
+ "loss": 0.0933,
836
+ "step": 10400
837
+ },
838
+ {
839
+ "epoch": 11.027890625,
840
+ "grad_norm": 0.11058317124843597,
841
+ "learning_rate": 2.951388888888889e-05,
842
+ "loss": 0.0161,
843
+ "step": 10500
844
+ },
845
+ {
846
+ "epoch": 11.032354910714286,
847
+ "grad_norm": 1.2187433242797852,
848
+ "learning_rate": 2.9265873015873018e-05,
849
+ "loss": 0.0329,
850
+ "step": 10600
851
+ },
852
+ {
853
+ "epoch": 11.03681919642857,
854
+ "grad_norm": 0.020026879385113716,
855
+ "learning_rate": 2.9017857142857146e-05,
856
+ "loss": 0.0318,
857
+ "step": 10700
858
+ },
859
+ {
860
+ "epoch": 11.040078125,
861
+ "eval_accuracy": 0.9752941176470589,
862
+ "eval_loss": 0.11233757436275482,
863
+ "eval_runtime": 279.6949,
864
+ "eval_samples_per_second": 3.039,
865
+ "eval_steps_per_second": 1.52,
866
+ "step": 10773
867
+ },
868
+ {
869
+ "epoch": 12.001205357142856,
870
+ "grad_norm": 0.004233605694025755,
871
+ "learning_rate": 2.876984126984127e-05,
872
+ "loss": 0.0145,
873
+ "step": 10800
874
+ },
875
+ {
876
+ "epoch": 12.005669642857143,
877
+ "grad_norm": 0.0020020680967718363,
878
+ "learning_rate": 2.8521825396825395e-05,
879
+ "loss": 0.0022,
880
+ "step": 10900
881
+ },
882
+ {
883
+ "epoch": 12.01013392857143,
884
+ "grad_norm": 0.0010592287871986628,
885
+ "learning_rate": 2.8273809523809523e-05,
886
+ "loss": 0.0029,
887
+ "step": 11000
888
+ },
889
+ {
890
+ "epoch": 12.014598214285714,
891
+ "grad_norm": 0.01872986927628517,
892
+ "learning_rate": 2.802579365079365e-05,
893
+ "loss": 0.0352,
894
+ "step": 11100
895
+ },
896
+ {
897
+ "epoch": 12.0190625,
898
+ "grad_norm": 0.05156349390745163,
899
+ "learning_rate": 2.777777777777778e-05,
900
+ "loss": 0.0047,
901
+ "step": 11200
902
+ },
903
+ {
904
+ "epoch": 12.023526785714285,
905
+ "grad_norm": 0.00894691701978445,
906
+ "learning_rate": 2.7529761904761907e-05,
907
+ "loss": 0.0303,
908
+ "step": 11300
909
+ },
910
+ {
911
+ "epoch": 12.027991071428572,
912
+ "grad_norm": 0.004200028255581856,
913
+ "learning_rate": 2.7281746031746032e-05,
914
+ "loss": 0.0782,
915
+ "step": 11400
916
+ },
917
+ {
918
+ "epoch": 12.032455357142856,
919
+ "grad_norm": 0.008372528478503227,
920
+ "learning_rate": 2.703373015873016e-05,
921
+ "loss": 0.0154,
922
+ "step": 11500
923
+ },
924
+ {
925
+ "epoch": 12.036919642857143,
926
+ "grad_norm": 0.010021534748375416,
927
+ "learning_rate": 2.6785714285714288e-05,
928
+ "loss": 0.0144,
929
+ "step": 11600
930
+ },
931
+ {
932
+ "epoch": 12.040044642857143,
933
+ "eval_accuracy": 0.9894117647058823,
934
+ "eval_loss": 0.04987098649144173,
935
+ "eval_runtime": 331.781,
936
+ "eval_samples_per_second": 2.562,
937
+ "eval_steps_per_second": 1.281,
938
+ "step": 11670
939
+ },
940
+ {
941
+ "epoch": 13.001305803571428,
942
+ "grad_norm": 0.3831511437892914,
943
+ "learning_rate": 2.6537698412698416e-05,
944
+ "loss": 0.0175,
945
+ "step": 11700
946
+ },
947
+ {
948
+ "epoch": 13.005770089285715,
949
+ "grad_norm": 0.0010712681105360389,
950
+ "learning_rate": 2.628968253968254e-05,
951
+ "loss": 0.0281,
952
+ "step": 11800
953
+ },
954
+ {
955
+ "epoch": 13.010234375,
956
+ "grad_norm": 0.004961916245520115,
957
+ "learning_rate": 2.604166666666667e-05,
958
+ "loss": 0.0162,
959
+ "step": 11900
960
+ },
961
+ {
962
+ "epoch": 13.014698660714286,
963
+ "grad_norm": 0.3577312231063843,
964
+ "learning_rate": 2.5793650793650796e-05,
965
+ "loss": 0.0133,
966
+ "step": 12000
967
+ },
968
+ {
969
+ "epoch": 13.01916294642857,
970
+ "grad_norm": 0.0016846248181536794,
971
+ "learning_rate": 2.554563492063492e-05,
972
+ "loss": 0.0456,
973
+ "step": 12100
974
+ },
975
+ {
976
+ "epoch": 13.023627232142857,
977
+ "grad_norm": 0.005252454895526171,
978
+ "learning_rate": 2.529761904761905e-05,
979
+ "loss": 0.0043,
980
+ "step": 12200
981
+ },
982
+ {
983
+ "epoch": 13.028091517857144,
984
+ "grad_norm": 65.35294342041016,
985
+ "learning_rate": 2.5049603174603177e-05,
986
+ "loss": 0.0248,
987
+ "step": 12300
988
+ },
989
+ {
990
+ "epoch": 13.032555803571428,
991
+ "grad_norm": 0.0010413563577458262,
992
+ "learning_rate": 2.4801587301587305e-05,
993
+ "loss": 0.033,
994
+ "step": 12400
995
+ },
996
+ {
997
+ "epoch": 13.037020089285715,
998
+ "grad_norm": 28.086708068847656,
999
+ "learning_rate": 2.455357142857143e-05,
1000
+ "loss": 0.0028,
1001
+ "step": 12500
1002
+ },
1003
+ {
1004
+ "epoch": 13.040055803571429,
1005
+ "eval_accuracy": 0.9870588235294118,
1006
+ "eval_loss": 0.08093971014022827,
1007
+ "eval_runtime": 287.2538,
1008
+ "eval_samples_per_second": 2.959,
1009
+ "eval_steps_per_second": 1.48,
1010
+ "step": 12568
1011
+ },
1012
+ {
+ "epoch": 14.00140625,
+ "grad_norm": 0.011327456682920456,
+ "learning_rate": 2.4305555555555558e-05,
+ "loss": 0.0203,
+ "step": 12600
+ },
+ {
+ "epoch": 14.005870535714285,
+ "grad_norm": 0.006360394414514303,
+ "learning_rate": 2.4057539682539686e-05,
+ "loss": 0.0009,
+ "step": 12700
+ },
+ {
+ "epoch": 14.010334821428572,
+ "grad_norm": 1.3321506977081299,
+ "learning_rate": 2.380952380952381e-05,
+ "loss": 0.0186,
+ "step": 12800
+ },
+ {
+ "epoch": 14.014799107142856,
+ "grad_norm": 0.0009386364254169166,
+ "learning_rate": 2.3561507936507938e-05,
+ "loss": 0.0048,
+ "step": 12900
+ },
+ {
+ "epoch": 14.019263392857143,
+ "grad_norm": 0.0016534485621377826,
+ "learning_rate": 2.3313492063492066e-05,
+ "loss": 0.037,
+ "step": 13000
+ },
+ {
+ "epoch": 14.02372767857143,
+ "grad_norm": 0.001421699533239007,
+ "learning_rate": 2.3065476190476194e-05,
+ "loss": 0.0111,
+ "step": 13100
+ },
+ {
+ "epoch": 14.028191964285714,
+ "grad_norm": 0.0014466221909970045,
+ "learning_rate": 2.281746031746032e-05,
+ "loss": 0.0169,
+ "step": 13200
+ },
+ {
+ "epoch": 14.03265625,
+ "grad_norm": 0.0036468463949859142,
+ "learning_rate": 2.2569444444444447e-05,
+ "loss": 0.019,
+ "step": 13300
+ },
+ {
+ "epoch": 14.037120535714285,
+ "grad_norm": 0.0012320175301283598,
+ "learning_rate": 2.2321428571428575e-05,
+ "loss": 0.0074,
+ "step": 13400
+ },
+ {
+ "epoch": 14.040066964285714,
+ "eval_accuracy": 0.9929411764705882,
+ "eval_loss": 0.045501772314310074,
+ "eval_runtime": 285.3107,
+ "eval_samples_per_second": 2.979,
+ "eval_steps_per_second": 1.49,
+ "step": 13466
+ },
+ {
+ "epoch": 15.00150669642857,
+ "grad_norm": 0.0006422046571969986,
+ "learning_rate": 2.20734126984127e-05,
+ "loss": 0.0202,
+ "step": 13500
+ },
+ {
+ "epoch": 15.005970982142857,
+ "grad_norm": 0.0008420124650001526,
+ "learning_rate": 2.1825396825396827e-05,
+ "loss": 0.0116,
+ "step": 13600
+ },
+ {
+ "epoch": 15.010435267857142,
+ "grad_norm": 0.018089979887008667,
+ "learning_rate": 2.1577380952380955e-05,
+ "loss": 0.0099,
+ "step": 13700
+ },
+ {
+ "epoch": 15.014899553571428,
+ "grad_norm": 0.0031337908003479242,
+ "learning_rate": 2.132936507936508e-05,
+ "loss": 0.0566,
+ "step": 13800
+ },
+ {
+ "epoch": 15.019363839285715,
+ "grad_norm": 0.0016157528152689338,
+ "learning_rate": 2.1081349206349208e-05,
+ "loss": 0.0212,
+ "step": 13900
+ },
+ {
+ "epoch": 15.023828125,
+ "grad_norm": 0.01456926204264164,
+ "learning_rate": 2.0833333333333336e-05,
+ "loss": 0.0003,
+ "step": 14000
+ },
+ {
+ "epoch": 15.028292410714286,
+ "grad_norm": 0.001924099400639534,
+ "learning_rate": 2.058531746031746e-05,
+ "loss": 0.0149,
+ "step": 14100
+ },
+ {
+ "epoch": 15.03275669642857,
+ "grad_norm": 0.0008741599158383906,
+ "learning_rate": 2.033730158730159e-05,
+ "loss": 0.0168,
+ "step": 14200
+ },
+ {
+ "epoch": 15.037220982142857,
+ "grad_norm": 0.06954433768987656,
+ "learning_rate": 2.0089285714285717e-05,
+ "loss": 0.0002,
+ "step": 14300
+ },
+ {
+ "epoch": 15.040078125,
+ "eval_accuracy": 0.9905882352941177,
+ "eval_loss": 0.058066971600055695,
+ "eval_runtime": 289.1743,
+ "eval_samples_per_second": 2.939,
+ "eval_steps_per_second": 1.47,
+ "step": 14364
+ },
+ {
+ "epoch": 16.001607142857143,
+ "grad_norm": 0.0014486366417258978,
+ "learning_rate": 1.984126984126984e-05,
+ "loss": 0.0063,
+ "step": 14400
+ },
+ {
+ "epoch": 16.006071428571428,
+ "grad_norm": 0.0007301854784600437,
+ "learning_rate": 1.959325396825397e-05,
+ "loss": 0.0186,
+ "step": 14500
+ },
+ {
+ "epoch": 16.010535714285716,
+ "grad_norm": 0.003457231679931283,
+ "learning_rate": 1.9345238095238097e-05,
+ "loss": 0.0236,
+ "step": 14600
+ },
+ {
+ "epoch": 16.015,
+ "grad_norm": 0.005807195790112019,
+ "learning_rate": 1.9097222222222222e-05,
+ "loss": 0.0183,
+ "step": 14700
+ },
+ {
+ "epoch": 16.019464285714285,
+ "grad_norm": 0.002843959955498576,
+ "learning_rate": 1.884920634920635e-05,
+ "loss": 0.0242,
+ "step": 14800
+ },
+ {
+ "epoch": 16.02392857142857,
+ "grad_norm": 0.37613585591316223,
+ "learning_rate": 1.8601190476190478e-05,
+ "loss": 0.0101,
+ "step": 14900
+ },
+ {
+ "epoch": 16.028392857142858,
+ "grad_norm": 0.0005575509858317673,
+ "learning_rate": 1.8353174603174602e-05,
+ "loss": 0.0109,
+ "step": 15000
+ },
+ {
+ "epoch": 16.032857142857143,
+ "grad_norm": 0.0006386680179275572,
+ "learning_rate": 1.810515873015873e-05,
+ "loss": 0.0013,
+ "step": 15100
+ },
+ {
+ "epoch": 16.037321428571428,
+ "grad_norm": 0.0010088573908433318,
+ "learning_rate": 1.785714285714286e-05,
+ "loss": 0.0077,
+ "step": 15200
+ },
+ {
+ "epoch": 16.040044642857143,
+ "eval_accuracy": 0.9894117647058823,
+ "eval_loss": 0.05021252483129501,
+ "eval_runtime": 284.0094,
+ "eval_samples_per_second": 2.993,
+ "eval_steps_per_second": 1.496,
+ "step": 15261
+ },
+ {
+ "epoch": 17.001707589285715,
+ "grad_norm": 0.0006336846854537725,
+ "learning_rate": 1.7609126984126986e-05,
+ "loss": 0.0212,
+ "step": 15300
+ },
+ {
+ "epoch": 17.006171875,
+ "grad_norm": 0.0005883209523744881,
+ "learning_rate": 1.736111111111111e-05,
+ "loss": 0.0095,
+ "step": 15400
+ },
+ {
+ "epoch": 17.010636160714284,
+ "grad_norm": 0.0021267228294163942,
+ "learning_rate": 1.711309523809524e-05,
+ "loss": 0.0113,
+ "step": 15500
+ },
+ {
+ "epoch": 17.015100446428573,
+ "grad_norm": 0.0009908992797136307,
+ "learning_rate": 1.6865079365079367e-05,
+ "loss": 0.0118,
+ "step": 15600
+ },
+ {
+ "epoch": 17.019564732142857,
+ "grad_norm": 0.000644190120510757,
+ "learning_rate": 1.6617063492063492e-05,
+ "loss": 0.0007,
+ "step": 15700
+ },
+ {
+ "epoch": 17.024029017857142,
+ "grad_norm": 0.0005113797378726304,
+ "learning_rate": 1.636904761904762e-05,
+ "loss": 0.01,
+ "step": 15800
+ },
+ {
+ "epoch": 17.02849330357143,
+ "grad_norm": 0.0008760132477618754,
+ "learning_rate": 1.6121031746031748e-05,
+ "loss": 0.0026,
+ "step": 15900
+ },
+ {
+ "epoch": 17.032957589285715,
+ "grad_norm": 0.00030510194483213127,
+ "learning_rate": 1.5873015873015872e-05,
+ "loss": 0.0015,
+ "step": 16000
+ },
+ {
+ "epoch": 17.037421875,
+ "grad_norm": 0.0004963899846188724,
+ "learning_rate": 1.5625e-05,
+ "loss": 0.0005,
+ "step": 16100
+ },
+ {
+ "epoch": 17.040055803571427,
+ "eval_accuracy": 0.9929411764705882,
+ "eval_loss": 0.04069099575281143,
+ "eval_runtime": 285.9614,
+ "eval_samples_per_second": 2.972,
+ "eval_steps_per_second": 1.486,
+ "step": 16159
+ },
+ {
+ "epoch": 18.001808035714287,
+ "grad_norm": 0.0015891814837232232,
+ "learning_rate": 1.537698412698413e-05,
+ "loss": 0.0376,
+ "step": 16200
+ },
+ {
+ "epoch": 18.006272321428572,
+ "grad_norm": 0.008500500582158566,
+ "learning_rate": 1.5128968253968253e-05,
+ "loss": 0.0203,
+ "step": 16300
+ },
+ {
+ "epoch": 18.010736607142857,
+ "grad_norm": 0.0030595629941672087,
+ "learning_rate": 1.4880952380952381e-05,
+ "loss": 0.0042,
+ "step": 16400
+ },
+ {
+ "epoch": 18.01520089285714,
+ "grad_norm": 1.0810060501098633,
+ "learning_rate": 1.4632936507936509e-05,
+ "loss": 0.017,
+ "step": 16500
+ },
+ {
+ "epoch": 18.01966517857143,
+ "grad_norm": 0.0005325720412656665,
+ "learning_rate": 1.4384920634920635e-05,
+ "loss": 0.0036,
+ "step": 16600
+ },
+ {
+ "epoch": 18.024129464285714,
+ "grad_norm": 0.0014920306857675314,
+ "learning_rate": 1.4136904761904762e-05,
+ "loss": 0.0236,
+ "step": 16700
+ },
+ {
+ "epoch": 18.02859375,
+ "grad_norm": 0.00048302882350981236,
+ "learning_rate": 1.388888888888889e-05,
+ "loss": 0.0127,
+ "step": 16800
+ },
+ {
+ "epoch": 18.033058035714287,
+ "grad_norm": 0.002715888200327754,
+ "learning_rate": 1.3640873015873016e-05,
+ "loss": 0.0146,
+ "step": 16900
+ },
+ {
+ "epoch": 18.037522321428572,
+ "grad_norm": 0.0004213691863697022,
+ "learning_rate": 1.3392857142857144e-05,
+ "loss": 0.0004,
+ "step": 17000
+ },
+ {
+ "epoch": 18.040066964285714,
+ "eval_accuracy": 0.9905882352941177,
+ "eval_loss": 0.05496314540505409,
+ "eval_runtime": 318.6642,
+ "eval_samples_per_second": 2.667,
+ "eval_steps_per_second": 1.334,
+ "step": 17057
+ },
+ {
+ "epoch": 19.001908482142856,
+ "grad_norm": 0.00044045469257980585,
+ "learning_rate": 1.314484126984127e-05,
+ "loss": 0.0143,
+ "step": 17100
+ },
+ {
+ "epoch": 19.006372767857144,
+ "grad_norm": 0.0004946400295011699,
+ "learning_rate": 1.2896825396825398e-05,
+ "loss": 0.0002,
+ "step": 17200
+ },
+ {
+ "epoch": 19.01083705357143,
+ "grad_norm": 0.014897634275257587,
+ "learning_rate": 1.2648809523809524e-05,
+ "loss": 0.0011,
+ "step": 17300
+ },
+ {
+ "epoch": 19.015301339285713,
+ "grad_norm": 0.015875551849603653,
+ "learning_rate": 1.2400793650793652e-05,
+ "loss": 0.0007,
+ "step": 17400
+ },
+ {
+ "epoch": 19.019765625,
+ "grad_norm": 0.0004391854163259268,
+ "learning_rate": 1.2152777777777779e-05,
+ "loss": 0.0068,
+ "step": 17500
+ },
+ {
+ "epoch": 19.024229910714286,
+ "grad_norm": 0.00046034177648834884,
+ "learning_rate": 1.1904761904761905e-05,
+ "loss": 0.0001,
+ "step": 17600
+ },
+ {
+ "epoch": 19.02869419642857,
+ "grad_norm": 0.0017288514645770192,
+ "learning_rate": 1.1656746031746033e-05,
+ "loss": 0.0001,
+ "step": 17700
+ },
+ {
+ "epoch": 19.033158482142856,
+ "grad_norm": 0.0026627290062606335,
+ "learning_rate": 1.140873015873016e-05,
+ "loss": 0.0001,
+ "step": 17800
+ },
+ {
+ "epoch": 19.037622767857144,
+ "grad_norm": 0.0004681396530941129,
+ "learning_rate": 1.1160714285714287e-05,
+ "loss": 0.0001,
+ "step": 17900
+ },
+ {
+ "epoch": 19.040078125,
+ "eval_accuracy": 0.9929411764705882,
+ "eval_loss": 0.05834496021270752,
+ "eval_runtime": 239.7594,
+ "eval_samples_per_second": 3.545,
+ "eval_steps_per_second": 1.773,
+ "step": 17955
+ },
+ {
+ "epoch": 19.040078125,
+ "step": 17955,
+ "total_flos": 1.7905236367909847e+20,
+ "train_loss": 0.295465733557037,
+ "train_runtime": 68646.4816,
+ "train_samples_per_second": 2.61,
+ "train_steps_per_second": 0.326
+ },
+ {
+ "epoch": 19.040078125,
+ "eval_accuracy": 0.9929411764705882,
+ "eval_loss": 0.04550177976489067,
+ "eval_runtime": 230.0122,
+ "eval_samples_per_second": 3.695,
+ "eval_steps_per_second": 1.848,
+ "step": 17955
+ },
+ {
+ "epoch": 19.040078125,
+ "eval_accuracy": 0.8973354231974922,
+ "eval_loss": 0.5587517619132996,
+ "eval_runtime": 352.6874,
+ "eval_samples_per_second": 3.618,
+ "eval_steps_per_second": 1.809,
+ "step": 17955
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 22400,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 9223372036854775807,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "EarlyStoppingCallback": {
+ "args": {
+ "early_stopping_patience": 5,
+ "early_stopping_threshold": 0.0
+ },
+ "attributes": {
+ "early_stopping_patience_counter": 0
+ }
+ },
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.7905236367909847e+20,
+ "train_batch_size": 2,
+ "trial_name": null,
+ "trial_params": null
+ }