gsmyrnis committed
Commit 45320bf · verified · 1 Parent(s): d32534d

End of training

Files changed (5):
  1. README.md +2 -1
  2. all_results.json +8 -0
  3. train_results.json +8 -0
  4. trainer_state.json +1260 -0
  5. training_loss.png +0 -0
README.md CHANGED
@@ -4,6 +4,7 @@ license: apache-2.0
 base_model: Qwen/Qwen2.5-7B-Instruct
 tags:
 - llama-factory
+- full
 - generated_from_trainer
 model-index:
 - name: llama3-1_8b_multiple_samples_majority_consensus_numina_aime
@@ -15,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 # llama3-1_8b_multiple_samples_majority_consensus_numina_aime
 
-This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on an unknown dataset.
+This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the mlfoundations-dev/multiple_samples_majority_consensus_numina_aime dataset.
 
 ## Model description
 
all_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 2.965909090909091,
+    "total_flos": 156886855483392.0,
+    "train_loss": 0.7380151625337272,
+    "train_runtime": 2697.8392,
+    "train_samples_per_second": 6.228,
+    "train_steps_per_second": 0.064
+}
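The reported throughput fields are all derived from the same run, so they can be cross-checked against each other. A quick sketch (not part of the commit; variable names are just for illustration) showing how the aggregate numbers relate:

```python
# Sanity-check of the figures reported in all_results.json above:
# throughput * runtime should recover the totals the trainer actually ran.
train_runtime = 2697.8392       # seconds
samples_per_second = 6.228
steps_per_second = 0.064

total_samples = samples_per_second * train_runtime
total_steps = steps_per_second * train_runtime

# ~16802 samples seen over ~2.97 epochs, and ~173 optimizer steps,
# consistent with the global_step of 174 recorded in trainer_state.json.
print(round(total_samples), round(total_steps))
```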
train_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 2.965909090909091,
+    "total_flos": 156886855483392.0,
+    "train_loss": 0.7380151625337272,
+    "train_runtime": 2697.8392,
+    "train_samples_per_second": 6.228,
+    "train_steps_per_second": 0.064
+}
trainer_state.json ADDED
@@ -0,0 +1,1260 @@
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 2.965909090909091,
  "eval_steps": 500,
  "global_step": 174,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {"epoch": 0.017045454545454544, "grad_norm": 7.22884205408755, "learning_rate": 5.555555555555555e-07, "loss": 1.1159, "step": 1},
    {"epoch": 0.03409090909090909, "grad_norm": 6.86377349527473, "learning_rate": 1.111111111111111e-06, "loss": 1.0455, "step": 2},
    {"epoch": 0.05113636363636364, "grad_norm": 7.199961151138557, "learning_rate": 1.6666666666666667e-06, "loss": 1.1275, "step": 3},
    {"epoch": 0.06818181818181818, "grad_norm": 7.268274547750973, "learning_rate": 2.222222222222222e-06, "loss": 1.1096, "step": 4},
    {"epoch": 0.08522727272727272, "grad_norm": 6.311407588180345, "learning_rate": 2.7777777777777783e-06, "loss": 1.0481, "step": 5},
    {"epoch": 0.10227272727272728, "grad_norm": 5.325136507507456, "learning_rate": 3.3333333333333333e-06, "loss": 1.0745, "step": 6},
    {"epoch": 0.11931818181818182, "grad_norm": 3.204819984360326, "learning_rate": 3.88888888888889e-06, "loss": 0.9787, "step": 7},
    {"epoch": 0.13636363636363635, "grad_norm": 2.6398476108217075, "learning_rate": 4.444444444444444e-06, "loss": 0.9367, "step": 8},
    {"epoch": 0.1534090909090909, "grad_norm": 2.5518759893438645, "learning_rate": 5e-06, "loss": 0.9116, "step": 9},
    {"epoch": 0.17045454545454544, "grad_norm": 5.223827384885404, "learning_rate": 5.555555555555557e-06, "loss": 0.9835, "step": 10},
    {"epoch": 0.1875, "grad_norm": 4.47133774615716, "learning_rate": 6.111111111111112e-06, "loss": 0.9244, "step": 11},
    {"epoch": 0.20454545454545456, "grad_norm": 4.277259377499211, "learning_rate": 6.666666666666667e-06, "loss": 0.9496, "step": 12},
    {"epoch": 0.2215909090909091, "grad_norm": 3.526152904409924, "learning_rate": 7.222222222222223e-06, "loss": 0.8649, "step": 13},
    {"epoch": 0.23863636363636365, "grad_norm": 3.259182755810135, "learning_rate": 7.77777777777778e-06, "loss": 0.882, "step": 14},
    {"epoch": 0.2556818181818182, "grad_norm": 2.657169823270014, "learning_rate": 8.333333333333334e-06, "loss": 0.9532, "step": 15},
    {"epoch": 0.2727272727272727, "grad_norm": 2.2469055281791768, "learning_rate": 8.888888888888888e-06, "loss": 0.8686, "step": 16},
    {"epoch": 0.2897727272727273, "grad_norm": 2.2961013306684492, "learning_rate": 9.444444444444445e-06, "loss": 0.9136, "step": 17},
    {"epoch": 0.3068181818181818, "grad_norm": 2.0321977080614575, "learning_rate": 1e-05, "loss": 0.8197, "step": 18},
    {"epoch": 0.32386363636363635, "grad_norm": 1.5442837234976816, "learning_rate": 9.998986144924253e-06, "loss": 0.7929, "step": 19},
    {"epoch": 0.3409090909090909, "grad_norm": 1.3959797656829187, "learning_rate": 9.995944990857848e-06, "loss": 0.746, "step": 20},
    {"epoch": 0.35795454545454547, "grad_norm": 1.6413702216042307, "learning_rate": 9.990877771116588e-06, "loss": 0.8195, "step": 21},
    {"epoch": 0.375, "grad_norm": 1.1455825686481986, "learning_rate": 9.983786540671052e-06, "loss": 0.7925, "step": 22},
    {"epoch": 0.39204545454545453, "grad_norm": 1.1413214191419678, "learning_rate": 9.974674175313228e-06, "loss": 0.8086, "step": 23},
    {"epoch": 0.4090909090909091, "grad_norm": 1.1533147121145144, "learning_rate": 9.96354437049027e-06, "loss": 0.8001, "step": 24},
    {"epoch": 0.42613636363636365, "grad_norm": 0.9882676326329094, "learning_rate": 9.950401639805822e-06, "loss": 0.868, "step": 25},
    {"epoch": 0.4431818181818182, "grad_norm": 1.028373201431852, "learning_rate": 9.935251313189564e-06, "loss": 0.7855, "step": 26},
    {"epoch": 0.4602272727272727, "grad_norm": 1.049228042355448, "learning_rate": 9.91809953473572e-06, "loss": 0.7744, "step": 27},
    {"epoch": 0.4772727272727273, "grad_norm": 0.8404995559488506, "learning_rate": 9.89895326021134e-06, "loss": 0.7396, "step": 28},
    {"epoch": 0.4943181818181818, "grad_norm": 0.9078468037297426, "learning_rate": 9.87782025423547e-06, "loss": 0.8565, "step": 29},
    {"epoch": 0.5113636363636364, "grad_norm": 0.8508053415926495, "learning_rate": 9.854709087130261e-06, "loss": 0.729, "step": 30},
    {"epoch": 0.5284090909090909, "grad_norm": 0.944676094285407, "learning_rate": 9.829629131445342e-06, "loss": 0.7635, "step": 31},
    {"epoch": 0.5454545454545454, "grad_norm": 0.7058532649010517, "learning_rate": 9.802590558156863e-06, "loss": 0.7878, "step": 32},
    {"epoch": 0.5625, "grad_norm": 0.8621390038765169, "learning_rate": 9.77360433254273e-06, "loss": 0.8099, "step": 33},
    {"epoch": 0.5795454545454546, "grad_norm": 0.8073345495598823, "learning_rate": 9.742682209735727e-06, "loss": 0.7863, "step": 34},
    {"epoch": 0.5965909090909091, "grad_norm": 0.6669987952697678, "learning_rate": 9.709836729956326e-06, "loss": 0.7973, "step": 35},
    {"epoch": 0.6136363636363636, "grad_norm": 0.8486755092214402, "learning_rate": 9.675081213427076e-06, "loss": 0.7459, "step": 36},
    {"epoch": 0.6306818181818182, "grad_norm": 0.7143187741160664, "learning_rate": 9.638429754970715e-06, "loss": 0.7543, "step": 37},
    {"epoch": 0.6477272727272727, "grad_norm": 0.6848079020913995, "learning_rate": 9.599897218294122e-06, "loss": 0.8018, "step": 38},
    {"epoch": 0.6647727272727273, "grad_norm": 0.853749579569145, "learning_rate": 9.55949922996045e-06, "loss": 0.7145, "step": 39},
    {"epoch": 0.6818181818181818, "grad_norm": 0.68372338772216, "learning_rate": 9.517252173051912e-06, "loss": 0.764, "step": 40},
    {"epoch": 0.6988636363636364, "grad_norm": 0.6422340925591769, "learning_rate": 9.473173180525737e-06, "loss": 0.7332, "step": 41},
    {"epoch": 0.7159090909090909, "grad_norm": 0.7261628443441004, "learning_rate": 9.427280128266049e-06, "loss": 0.7428, "step": 42},
    {"epoch": 0.7329545454545454, "grad_norm": 0.7702313145051908, "learning_rate": 9.37959162783444e-06, "loss": 0.7859, "step": 43},
    {"epoch": 0.75, "grad_norm": 0.6747045160311267, "learning_rate": 9.330127018922195e-06, "loss": 0.824, "step": 44},
    {"epoch": 0.7670454545454546, "grad_norm": 0.7223692043597745, "learning_rate": 9.278906361507238e-06, "loss": 0.7605, "step": 45},
    {"epoch": 0.7840909090909091, "grad_norm": 0.578610691292999, "learning_rate": 9.225950427718974e-06, "loss": 0.7546, "step": 46},
    {"epoch": 0.8011363636363636, "grad_norm": 0.7399134643017758, "learning_rate": 9.171280693414307e-06, "loss": 0.7068, "step": 47},
    {"epoch": 0.8181818181818182, "grad_norm": 0.6651749348569863, "learning_rate": 9.114919329468283e-06, "loss": 0.7707, "step": 48},
    {"epoch": 0.8352272727272727, "grad_norm": 0.6681727962907063, "learning_rate": 9.056889192782865e-06, "loss": 0.7423, "step": 49},
    {"epoch": 0.8522727272727273, "grad_norm": 0.5989175339145305, "learning_rate": 8.997213817017508e-06, "loss": 0.7367, "step": 50},
    {"epoch": 0.8693181818181818, "grad_norm": 0.6415631156829461, "learning_rate": 8.935917403045251e-06, "loss": 0.7742, "step": 51},
    {"epoch": 0.8863636363636364, "grad_norm": 0.7053287962999208, "learning_rate": 8.873024809138272e-06, "loss": 0.8004, "step": 52},
    {"epoch": 0.9034090909090909, "grad_norm": 0.6616877102090415, "learning_rate": 8.808561540886796e-06, "loss": 0.8331, "step": 53},
    {"epoch": 0.9204545454545454, "grad_norm": 0.6686807030220209, "learning_rate": 8.742553740855507e-06, "loss": 0.7622, "step": 54},
    {"epoch": 0.9375, "grad_norm": 0.6207828741528636, "learning_rate": 8.675028177981643e-06, "loss": 0.7329, "step": 55},
    {"epoch": 0.9545454545454546, "grad_norm": 0.5650546394483587, "learning_rate": 8.606012236719073e-06, "loss": 0.7638, "step": 56},
    {"epoch": 0.9715909090909091, "grad_norm": 0.6124784723851092, "learning_rate": 8.535533905932739e-06, "loss": 0.7341, "step": 57},
    {"epoch": 0.9886363636363636, "grad_norm": 0.6052340709182291, "learning_rate": 8.463621767547998e-06, "loss": 0.7457, "step": 58},
    {"epoch": 1.0056818181818181, "grad_norm": 0.8052906258453199, "learning_rate": 8.390304984959455e-06, "loss": 0.939, "step": 59},
    {"epoch": 1.0227272727272727, "grad_norm": 0.6376926058729759, "learning_rate": 8.315613291203977e-06, "loss": 0.7367, "step": 60},
    {"epoch": 1.0397727272727273, "grad_norm": 0.6369503379591582, "learning_rate": 8.239576976902694e-06, "loss": 0.7449, "step": 61},
    {"epoch": 1.0568181818181819, "grad_norm": 0.6969242917426808, "learning_rate": 8.162226877976886e-06, "loss": 0.7259, "step": 62},
    {"epoch": 1.0738636363636365, "grad_norm": 0.6530544217263711, "learning_rate": 8.083594363142717e-06, "loss": 0.646, "step": 63},
    {"epoch": 1.0909090909090908, "grad_norm": 0.5900438510967748, "learning_rate": 8.003711321189895e-06, "loss": 0.6842, "step": 64},
    {"epoch": 1.1079545454545454, "grad_norm": 0.6284392353104462, "learning_rate": 7.922610148049445e-06, "loss": 0.8002, "step": 65},
    {"epoch": 1.125, "grad_norm": 0.6494425439689134, "learning_rate": 7.84032373365578e-06, "loss": 0.638, "step": 66},
    {"epoch": 1.1420454545454546, "grad_norm": 0.5499044253465919, "learning_rate": 7.75688544860846e-06, "loss": 0.6289, "step": 67},
    {"epoch": 1.1590909090909092, "grad_norm": 0.6392668714240907, "learning_rate": 7.672329130639007e-06, "loss": 0.7572, "step": 68},
    {"epoch": 1.1761363636363638, "grad_norm": 0.673210138953056, "learning_rate": 7.586689070888284e-06, "loss": 0.6524, "step": 69},
    {"epoch": 1.1931818181818181, "grad_norm": 0.6410447658297229, "learning_rate": 7.500000000000001e-06, "loss": 0.7362, "step": 70},
    {"epoch": 1.2102272727272727, "grad_norm": 0.6785817657314511, "learning_rate": 7.412297074035968e-06, "loss": 0.7131, "step": 71},
    {"epoch": 1.2272727272727273, "grad_norm": 0.861453272486042, "learning_rate": 7.323615860218844e-06, "loss": 0.7169, "step": 72},
    {"epoch": 1.2443181818181819, "grad_norm": 0.611947189690942, "learning_rate": 7.2339923225081296e-06, "loss": 0.7185, "step": 73},
    {"epoch": 1.2613636363636362, "grad_norm": 0.6501253363478586, "learning_rate": 7.143462807015271e-06, "loss": 0.6616, "step": 74},
    {"epoch": 1.2784090909090908, "grad_norm": 0.715695504058172, "learning_rate": 7.052064027263785e-06, "loss": 0.7005, "step": 75},
    {"epoch": 1.2954545454545454, "grad_norm": 0.7073752700474762, "learning_rate": 6.959833049300376e-06, "loss": 0.7879, "step": 76},
    {"epoch": 1.3125, "grad_norm": 0.49068564720807656, "learning_rate": 6.8668072766631054e-06, "loss": 0.6314, "step": 77},
    {"epoch": 1.3295454545454546, "grad_norm": 0.7803765124238339, "learning_rate": 6.773024435212678e-06, "loss": 0.7851, "step": 78},
    {"epoch": 1.3465909090909092, "grad_norm": 0.5942335579737438, "learning_rate": 6.678522557833025e-06, "loss": 0.689, "step": 79},
    {"epoch": 1.3636363636363638, "grad_norm": 0.5176233208142944, "learning_rate": 6.583339969007364e-06, "loss": 0.6542, "step": 80},
    {"epoch": 1.3806818181818181, "grad_norm": 0.607753727654389, "learning_rate": 6.487515269276015e-06, "loss": 0.7534, "step": 81},
    {"epoch": 1.3977272727272727, "grad_norm": 0.5409054512847689, "learning_rate": 6.391087319582264e-06, "loss": 0.6305, "step": 82},
    {"epoch": 1.4147727272727273, "grad_norm": 0.5071385453642557, "learning_rate": 6.294095225512604e-06, "loss": 0.7589, "step": 83},
    {"epoch": 1.4318181818181819, "grad_norm": 0.5143547906358554, "learning_rate": 6.1965783214377895e-06, "loss": 0.7914, "step": 84},
    {"epoch": 1.4488636363636362, "grad_norm": 0.5071795878844763, "learning_rate": 6.0985761545610865e-06, "loss": 0.6643, "step": 85},
    {"epoch": 1.4659090909090908, "grad_norm": 0.5505402808974502, "learning_rate": 6.000128468880223e-06, "loss": 0.7187, "step": 86},
    {"epoch": 1.4829545454545454, "grad_norm": 0.5577932417601431, "learning_rate": 5.90127518906953e-06, "loss": 0.6295, "step": 87},
    {"epoch": 1.5, "grad_norm": 0.6475715595827016, "learning_rate": 5.8020564042888015e-06, "loss": 0.7056, "step": 88},
    {"epoch": 1.5170454545454546, "grad_norm": 0.5835532211565827, "learning_rate": 5.7025123519254644e-06, "loss": 0.7256, "step": 89},
    {"epoch": 1.5340909090909092, "grad_norm": 0.5415921745872375, "learning_rate": 5.6026834012766155e-06, "loss": 0.7471, "step": 90},
    {"epoch": 1.5511363636363638, "grad_norm": 0.5833270165675741, "learning_rate": 5.502610037177586e-06, "loss": 0.7633, "step": 91},
    {"epoch": 1.5681818181818183, "grad_norm": 0.560199988553917, "learning_rate": 5.402332843583631e-06, "loss": 0.6595, "step": 92},
    {"epoch": 1.5852272727272727, "grad_norm": 0.4939855321494504, "learning_rate": 5.301892487111431e-06, "loss": 0.6958, "step": 93},
    {"epoch": 1.6022727272727273, "grad_norm": 0.5378643220876148, "learning_rate": 5.201329700547077e-06, "loss": 0.6934, "step": 94},
    {"epoch": 1.6193181818181817, "grad_norm": 0.5420525607884883, "learning_rate": 5.100685266327202e-06, "loss": 0.6365, "step": 95},
    {"epoch": 1.6363636363636362, "grad_norm": 0.4520550820837634, "learning_rate": 5e-06, "loss": 0.6338, "step": 96},
    {"epoch": 1.6534090909090908, "grad_norm": 0.5117553753106799, "learning_rate": 4.8993147336728e-06, "loss": 0.7532, "step": 97},
    {"epoch": 1.6704545454545454, "grad_norm": 0.5106162671673076, "learning_rate": 4.798670299452926e-06, "loss": 0.7011, "step": 98},
    {"epoch": 1.6875, "grad_norm": 0.47075334035417565, "learning_rate": 4.69810751288857e-06, "loss": 0.6379, "step": 99},
    {"epoch": 1.7045454545454546, "grad_norm": 0.5343852849626032, "learning_rate": 4.597667156416371e-06, "loss": 0.7058, "step": 100},
    {"epoch": 1.7215909090909092, "grad_norm": 0.5025252513851014, "learning_rate": 4.497389962822416e-06, "loss": 0.7518, "step": 101},
    {"epoch": 1.7386363636363638, "grad_norm": 0.4973905372002931, "learning_rate": 4.397316598723385e-06, "loss": 0.7153, "step": 102},
    {"epoch": 1.7556818181818183, "grad_norm": 0.46247569289594587, "learning_rate": 4.297487648074538e-06, "loss": 0.6349, "step": 103},
    {"epoch": 1.7727272727272727, "grad_norm": 0.48340291588569545, "learning_rate": 4.1979435957111984e-06, "loss": 0.7174, "step": 104},
    {"epoch": 1.7897727272727273, "grad_norm": 0.45562405643253756, "learning_rate": 4.098724810930472e-06, "loss": 0.6847, "step": 105},
    {"epoch": 1.8068181818181817, "grad_norm": 0.47781922446221436, "learning_rate": 3.999871531119779e-06, "loss": 0.7162, "step": 106},
    {"epoch": 1.8238636363636362, "grad_norm": 0.4315069745006211, "learning_rate": 3.901423845438916e-06, "loss": 0.6691, "step": 107},
    {"epoch": 1.8409090909090908, "grad_norm": 0.47883228124204524, "learning_rate": 3.803421678562213e-06, "loss": 0.7257, "step": 108},
    {"epoch": 1.8579545454545454, "grad_norm": 0.5320184023237785, "learning_rate": 3.705904774487396e-06, "loss": 0.7532, "step": 109},
    {"epoch": 1.875, "grad_norm": 0.4269725257899147, "learning_rate": 3.6089126804177373e-06, "loss": 0.6863, "step": 110},
    {"epoch": 1.8920454545454546, "grad_norm": 0.4583005565763506, "learning_rate": 3.5124847307239863e-06, "loss": 0.7502, "step": 111},
    {"epoch": 1.9090909090909092, "grad_norm": 0.4486844352174395, "learning_rate": 3.416660030992639e-06, "loss": 0.6897, "step": 112},
    {"epoch": 1.9261363636363638, "grad_norm": 0.49053197217486366, "learning_rate": 3.3214774421669777e-06, "loss": 0.698, "step": 113},
    {"epoch": 1.9431818181818183, "grad_norm": 0.5128637973645126, "learning_rate": 3.226975564787322e-06, "loss": 0.6907, "step": 114},
    {"epoch": 1.9602272727272727, "grad_norm": 0.478047879056811, "learning_rate": 3.1331927233368954e-06, "loss": 0.7661, "step": 115},
    {"epoch": 1.9772727272727273, "grad_norm": 0.4387651831852447, "learning_rate": 3.040166950699626e-06, "loss": 0.6683, "step": 116},
    {"epoch": 1.9943181818181817, "grad_norm": 0.5998562328424022, "learning_rate": 2.947935972736217e-06, "loss": 0.8983, "step": 117},
    {"epoch": 2.0113636363636362, "grad_norm": 0.5630847402650452, "learning_rate": 2.8565371929847286e-06, "loss": 0.6758, "step": 118},
    {"epoch": 2.028409090909091, "grad_norm": 0.4489743674974812, "learning_rate": 2.766007677491871e-06, "loss": 0.6843, "step": 119},
    {"epoch": 2.0454545454545454, "grad_norm": 0.4457573392562645, "learning_rate": 2.6763841397811576e-06, "loss": 0.6327, "step": 120},
    {"epoch": 2.0625, "grad_norm": 0.5634916203600439, "learning_rate": 2.587702925964034e-06, "loss": 0.6627, "step": 121},
    {"epoch": 2.0795454545454546, "grad_norm": 0.4707269296382247, "learning_rate": 2.5000000000000015e-06, "loss": 0.7272, "step": 122},
    {"epoch": 2.096590909090909, "grad_norm": 0.443373403354557, "learning_rate": 2.4133109291117156e-06, "loss": 0.6195, "step": 123},
    {"epoch": 2.1136363636363638, "grad_norm": 0.46908613730040055, "learning_rate": 2.3276708693609947e-06, "loss": 0.5704, "step": 124},
    {"epoch": 2.1306818181818183, "grad_norm": 0.4105495278728908, "learning_rate": 2.243114551391542e-06, "loss": 0.711, "step": 125},
    {"epoch": 2.147727272727273, "grad_norm": 0.537311901769593, "learning_rate": 2.159676266344222e-06, "loss": 0.7541, "step": 126},
    {"epoch": 2.164772727272727, "grad_norm": 0.5468018420551715, "learning_rate": 2.077389851950557e-06, "loss": 0.5964, "step": 127},
    {"epoch": 2.1818181818181817, "grad_norm": 0.5124295355078099, "learning_rate": 1.996288678810105e-06, "loss": 0.6619, "step": 128},
    {"epoch": 2.1988636363636362, "grad_norm": 0.4715818069175043, "learning_rate": 1.9164056368572847e-06, "loss": 0.6833, "step": 129},
    {"epoch": 2.215909090909091, "grad_norm": 0.48951497017695567, "learning_rate": 1.8377731220231144e-06, "loss": 0.6855, "step": 130},
    {"epoch": 2.2329545454545454, "grad_norm": 0.42265291988632153, "learning_rate": 1.7604230230973068e-06, "loss": 0.6424, "step": 131},
    {"epoch": 2.25, "grad_norm": 0.4650605098758252, "learning_rate": 1.6843867087960252e-06, "loss": 0.6231, "step": 132},
    {"epoch": 2.2670454545454546, "grad_norm": 0.4143527777268036, "learning_rate": 1.6096950150405454e-06, "loss": 0.5598, "step": 133},
    {"epoch": 2.284090909090909, "grad_norm": 0.44107050291230937, "learning_rate": 1.5363782324520033e-06, "loss": 0.7221, "step": 134},
    {"epoch": 2.3011363636363638, "grad_norm": 0.445778553889928, "learning_rate": 1.4644660940672628e-06, "loss": 0.6413, "step": 135},
    {"epoch": 2.3181818181818183, "grad_norm": 0.4187405257211714, "learning_rate": 1.3939877632809279e-06, "loss": 0.6992, "step": 136},
    {"epoch": 2.3352272727272725, "grad_norm": 0.4350630455961759, "learning_rate": 1.3249718220183583e-06, "loss": 0.6878, "step": 137},
    {"epoch": 2.3522727272727275, "grad_norm": 0.393861708986206, "learning_rate": 1.257446259144494e-06, "loss": 0.6385, "step": 138},
    {"epoch": 2.3693181818181817, "grad_norm": 0.42310655150402887, "learning_rate": 1.1914384591132045e-06, "loss": 0.7421, "step": 139},
    {"epoch": 2.3863636363636362, "grad_norm": 0.40172644758715986, "learning_rate": 1.1269751908617277e-06, "loss": 0.647, "step": 140},
    {"epoch": 2.403409090909091, "grad_norm": 0.4418275610345151, "learning_rate": 1.0640825969547498e-06, "loss": 0.6702, "step": 141},
    {"epoch": 2.4204545454545454, "grad_norm": 0.3802094194842585, "learning_rate": 1.0027861829824953e-06, "loss": 0.5888, "step": 142},
    {"epoch": 2.4375, "grad_norm": 0.39698478453376557, "learning_rate": 9.431108072171346e-07, "loss": 0.7405, "step": 143},
    {"epoch": 2.4545454545454546, "grad_norm": 0.4302235176734781, "learning_rate": 8.850806705317183e-07, "loss": 0.6819, "step": 144},
    {"epoch": 2.471590909090909, "grad_norm": 0.4055826304794628, "learning_rate": 8.287193065856936e-07, "loss": 0.6332, "step": 145},
    {"epoch": 2.4886363636363638, "grad_norm": 0.4307108144646676, "learning_rate": 7.740495722810271e-07, "loss": 0.676, "step": 146},
    {"epoch": 2.5056818181818183, "grad_norm": 0.4295524075776934, "learning_rate": 7.210936384927631e-07, "loss": 0.7018, "step": 147},
    {"epoch": 2.5227272727272725, "grad_norm": 0.3860805358590195, "learning_rate": 6.698729810778065e-07, "loss": 0.6894, "step": 148},
    {"epoch": 2.5397727272727275, "grad_norm": 0.4334053053827782, "learning_rate": 6.204083721655607e-07, "loss": 0.645, "step": 149},
    {"epoch": 2.5568181818181817, "grad_norm": 0.4287547792874842, "learning_rate": 5.727198717339511e-07, "loss": 0.7111, "step": 150},
    {"epoch": 2.5738636363636362, "grad_norm": 0.3887625011714418, "learning_rate": 5.268268194742638e-07, "loss": 0.7216, "step": 151},
    {"epoch": 2.590909090909091, "grad_norm": 0.3798899338475208, "learning_rate": 4.827478269480895e-07, "loss": 0.621, "step": 152},
    {"epoch": 2.6079545454545454, "grad_norm": 0.45053111754260017, "learning_rate": 4.405007700395497e-07, "loss": 0.7521, "step": 153},
    {"epoch": 2.625, "grad_norm": 0.377645018921979, "learning_rate": 4.001027817058789e-07, "loss": 0.6145, "step": 154},
    {"epoch": 2.6420454545454546, "grad_norm": 0.3776664317185743, "learning_rate": 3.615702450292857e-07, "loss": 0.6278, "step": 155},
    {"epoch": 2.659090909090909, "grad_norm": 0.4305890271008056, "learning_rate": 3.2491878657292643e-07, "loss": 0.7472, "step": 156},
    {"epoch": 2.6761363636363638, "grad_norm": 0.40843066342237466, "learning_rate": 2.901632700436757e-07, "loss": 0.6258, "step": 157},
    {"epoch": 2.6931818181818183, "grad_norm": 0.3956427977171968, "learning_rate": 2.573177902642726e-07, "loss": 0.6839, "step": 158},
    {"epoch": 2.7102272727272725, "grad_norm": 0.41431945926782104, "learning_rate": 2.2639566745727203e-07, "loss": 0.6734, "step": 159},
    {"epoch": 2.7272727272727275, "grad_norm": 0.3818344903770187, "learning_rate": 1.9740944184313882e-07, "loss": 0.6367, "step": 160},
    {"epoch": 2.7443181818181817, "grad_norm": 0.38077075132026156, "learning_rate": 1.7037086855465902e-07, "loss": 0.632, "step": 161},
    {"epoch": 2.7613636363636362, "grad_norm": 0.39006091928915276, "learning_rate": 1.4529091286973994e-07, "loss": 0.6924, "step": 162},
    {"epoch": 2.778409090909091, "grad_norm": 0.38840487543638486, "learning_rate": 1.2217974576453072e-07, "loss": 0.7466, "step": 163},
    {"epoch": 2.7954545454545454, "grad_norm": 0.45834263911645595, "learning_rate": 1.0104673978866164e-07, "loss": 0.6057, "step": 164},
    {"epoch": 2.8125, "grad_norm": 0.38210612410412104, "learning_rate": 8.190046526428241e-08, "loss": 0.663, "step": 165},
    {"epoch": 2.8295454545454546, "grad_norm": 0.38478210491106063, "learning_rate": 6.474868681043578e-08, "loss": 0.6762, "step": 166},
    {"epoch": 2.846590909090909, "grad_norm": 0.3720363705810689, "learning_rate": 4.959836019417963e-08,
1177
+ "loss": 0.6105,
1178
+ "step": 167
1179
+ },
1180
+ {
1181
+ "epoch": 2.8636363636363638,
1182
+ "grad_norm": 0.4440736370045247,
1183
+ "learning_rate": 3.645562950973014e-08,
1184
+ "loss": 0.7567,
1185
+ "step": 168
1186
+ },
1187
+ {
1188
+ "epoch": 2.8806818181818183,
1189
+ "grad_norm": 0.381304638800026,
1190
+ "learning_rate": 2.5325824686772138e-08,
1191
+ "loss": 0.6174,
1192
+ "step": 169
1193
+ },
1194
+ {
1195
+ "epoch": 2.8977272727272725,
1196
+ "grad_norm": 0.4156474533014312,
1197
+ "learning_rate": 1.6213459328950355e-08,
1198
+ "loss": 0.5918,
1199
+ "step": 170
1200
+ },
1201
+ {
1202
+ "epoch": 2.9147727272727275,
1203
+ "grad_norm": 0.4459243063651079,
1204
+ "learning_rate": 9.12222888341252e-09,
1205
+ "loss": 0.693,
1206
+ "step": 171
1207
+ },
1208
+ {
1209
+ "epoch": 2.9318181818181817,
1210
+ "grad_norm": 0.3705961074102094,
1211
+ "learning_rate": 4.055009142152066e-09,
1212
+ "loss": 0.6132,
1213
+ "step": 172
1214
+ },
1215
+ {
1216
+ "epoch": 2.9488636363636362,
1217
+ "grad_norm": 0.3768879535202281,
1218
+ "learning_rate": 1.0138550757493592e-09,
1219
+ "loss": 0.6515,
1220
+ "step": 173
1221
+ },
1222
+ {
1223
+ "epoch": 2.965909090909091,
1224
+ "grad_norm": 0.39814357318394317,
1225
+ "learning_rate": 0.0,
1226
+ "loss": 0.7362,
1227
+ "step": 174
1228
+ },
1229
+ {
1230
+ "epoch": 2.965909090909091,
1231
+ "step": 174,
1232
+ "total_flos": 156886855483392.0,
1233
+ "train_loss": 0.7380151625337272,
1234
+ "train_runtime": 2697.8392,
1235
+ "train_samples_per_second": 6.228,
1236
+ "train_steps_per_second": 0.064
1237
+ }
1238
+ ],
1239
+ "logging_steps": 1,
1240
+ "max_steps": 174,
1241
+ "num_input_tokens_seen": 0,
1242
+ "num_train_epochs": 3,
1243
+ "save_steps": 500,
1244
+ "stateful_callbacks": {
1245
+ "TrainerControl": {
1246
+ "args": {
1247
+ "should_epoch_stop": false,
1248
+ "should_evaluate": false,
1249
+ "should_log": false,
1250
+ "should_save": true,
1251
+ "should_training_stop": true
1252
+ },
1253
+ "attributes": {}
1254
+ }
1255
+ },
1256
+ "total_flos": 156886855483392.0,
1257
+ "train_batch_size": 1,
1258
+ "trial_name": null,
1259
+ "trial_params": null
1260
+ }
training_loss.png ADDED