zitrone44 committed
Commit 4beaf66 · 1 Parent(s): c2c97a8

feat: add tnn

README.md CHANGED
@@ -23,7 +23,7 @@ model-index:
  metrics:
  - name: Accuracy
  type: accuracy
- value: 0.8883208808493905
+ value: 0.8962788114412663
  ---
  
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -33,8 +33,8 @@ should probably proofread and complete it, then remove this comment. -->
  
  This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the imagefolder dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.3039
- - Accuracy: 0.8883
+ - Loss: 0.2805
+ - Accuracy: 0.8963
  
  ## Model description
  
@@ -54,7 +54,7 @@ More information needed
  
  The following hyperparameters were used during training:
  - learning_rate: 0.0002
- - train_batch_size: 128
+ - train_batch_size: 256
  - eval_batch_size: 8
  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
@@ -65,41 +65,30 @@ The following hyperparameters were used during training:
  
  | Training Loss | Epoch | Step | Validation Loss | Accuracy |
  |:-------------:|:-----:|:----:|:---------------:|:--------:|
- | 0.8337 | 0.11 | 100 | 0.7774 | 0.6475 |
- | 0.6868 | 0.23 | 200 | 0.6481 | 0.7239 |
- | 0.6141 | 0.34 | 300 | 0.6004 | 0.7459 |
- | 0.6257 | 0.46 | 400 | 0.5776 | 0.7549 |
- | 0.5603 | 0.57 | 500 | 0.5395 | 0.7766 |
- | 0.5281 | 0.69 | 600 | 0.5066 | 0.7876 |
- | 0.4781 | 0.8 | 700 | 0.4940 | 0.7918 |
- | 0.4794 | 0.91 | 800 | 0.4649 | 0.8064 |
- | 0.3345 | 1.03 | 900 | 0.4549 | 0.8167 |
- | 0.3827 | 1.14 | 1000 | 0.4284 | 0.8231 |
- | 0.3415 | 1.26 | 1100 | 0.4137 | 0.8310 |
- | 0.3633 | 1.37 | 1200 | 0.3927 | 0.8384 |
- | 0.3414 | 1.49 | 1300 | 0.3922 | 0.8390 |
- | 0.3441 | 1.6 | 1400 | 0.3774 | 0.8476 |
- | 0.316 | 1.71 | 1500 | 0.3788 | 0.8475 |
- | 0.3218 | 1.83 | 1600 | 0.3580 | 0.8546 |
- | 0.2656 | 1.94 | 1700 | 0.3584 | 0.8597 |
- | 0.2005 | 2.06 | 1800 | 0.3576 | 0.8671 |
- | 0.181 | 2.17 | 1900 | 0.3426 | 0.8699 |
- | 0.2094 | 2.29 | 2000 | 0.3427 | 0.8696 |
- | 0.1831 | 2.4 | 2100 | 0.3355 | 0.8755 |
- | 0.1774 | 2.51 | 2200 | 0.3325 | 0.8793 |
- | 0.2002 | 2.63 | 2300 | 0.3211 | 0.8786 |
- | 0.1508 | 2.74 | 2400 | 0.3312 | 0.8818 |
- | 0.1669 | 2.86 | 2500 | 0.3132 | 0.8854 |
- | 0.1461 | 2.97 | 2600 | 0.3039 | 0.8883 |
- | 0.07 | 3.09 | 2700 | 0.3402 | 0.8921 |
- | 0.0637 | 3.2 | 2800 | 0.3446 | 0.8944 |
- | 0.0807 | 3.31 | 2900 | 0.3425 | 0.8947 |
- | 0.0637 | 3.43 | 3000 | 0.3396 | 0.8964 |
- | 0.0535 | 3.54 | 3100 | 0.3407 | 0.8971 |
- | 0.064 | 3.66 | 3200 | 0.3420 | 0.9002 |
- | 0.0707 | 3.77 | 3300 | 0.3314 | 0.8995 |
- | 0.058 | 3.89 | 3400 | 0.3286 | 0.9002 |
- | 0.048 | 4.0 | 3500 | 0.3263 | 0.9013 |
+ | 0.7945 | 0.16 | 100 | 0.7726 | 0.6916 |
+ | 0.6502 | 0.32 | 200 | 0.6581 | 0.7327 |
+ | 0.5671 | 0.49 | 300 | 0.5892 | 0.7589 |
+ | 0.5625 | 0.65 | 400 | 0.5424 | 0.7754 |
+ | 0.5115 | 0.81 | 500 | 0.4990 | 0.7931 |
+ | 0.4643 | 0.97 | 600 | 0.4710 | 0.8040 |
+ | 0.3586 | 1.13 | 700 | 0.4337 | 0.8221 |
+ | 0.3421 | 1.29 | 800 | 0.4097 | 0.8337 |
+ | 0.3478 | 1.46 | 900 | 0.3817 | 0.8446 |
+ | 0.2965 | 1.62 | 1000 | 0.3754 | 0.8457 |
+ | 0.2986 | 1.78 | 1100 | 0.3548 | 0.8550 |
+ | 0.2932 | 1.94 | 1200 | 0.3387 | 0.8633 |
+ | 0.1701 | 2.1 | 1300 | 0.3415 | 0.8677 |
+ | 0.1891 | 2.27 | 1400 | 0.3260 | 0.8766 |
+ | 0.1741 | 2.43 | 1500 | 0.3103 | 0.8818 |
+ | 0.1542 | 2.59 | 1600 | 0.3061 | 0.8869 |
+ | 0.172 | 2.75 | 1700 | 0.2925 | 0.8888 |
+ | 0.1575 | 2.91 | 1800 | 0.2805 | 0.8963 |
+ | 0.0698 | 3.07 | 1900 | 0.3031 | 0.8984 |
+ | 0.0671 | 3.24 | 2000 | 0.3075 | 0.9009 |
+ | 0.0576 | 3.4 | 2100 | 0.3051 | 0.9029 |
+ | 0.0519 | 3.56 | 2200 | 0.2982 | 0.9066 |
+ | 0.0527 | 3.72 | 2300 | 0.2974 | 0.9072 |
+ | 0.0561 | 3.88 | 2400 | 0.2912 | 0.9091 |
  
  
  ### Framework versions
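
The evaluation results reported in the new README (loss 0.2805, accuracy 0.8963) match the step-1800 row of the new training table, not the final row. The sketch below illustrates one plausible reading: the reported checkpoint is the one with the lowest validation loss. This is an assumption based on the old trainer_state.json, where `best_metric` equals the old eval loss. The names `rows`, `best_by_loss`, and `best_by_accuracy` are hypothetical, and only the last eight rows of the new table are copied.

```python
# (step, validation_loss, accuracy) copied from the last rows of the new table.
rows = [
    (1700, 0.2925, 0.8888),
    (1800, 0.2805, 0.8963),
    (1900, 0.3031, 0.8984),
    (2000, 0.3075, 0.9009),
    (2100, 0.3051, 0.9029),
    (2200, 0.2982, 0.9066),
    (2300, 0.2974, 0.9072),
    (2400, 0.2912, 0.9091),
]


def best_by_loss(table):
    """Return the row with the lowest validation loss (assumed selection rule)."""
    return min(table, key=lambda row: row[1])


def best_by_accuracy(table):
    """Return the row with the highest accuracy, for comparison."""
    return max(table, key=lambda row: row[2])


if __name__ == "__main__":
    print("lowest loss:", best_by_loss(rows))
    print("highest accuracy:", best_by_accuracy(rows))
```

Under that assumption the selected row is step 1800, even though accuracy keeps rising through step 2400 while validation loss stays above its minimum.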
all_results.json CHANGED
@@ -1,13 +1,13 @@
  {
  "epoch": 4.0,
- "eval_accuracy": 0.8883208808493905,
- "eval_loss": 0.3038625717163086,
- "eval_runtime": 113.9287,
- "eval_samples_per_second": 133.926,
- "eval_steps_per_second": 16.747,
- "total_flos": 3.4689445074730156e+19,
- "train_loss": 0.2980680786711829,
- "train_runtime": 7553.3952,
- "train_samples_per_second": 59.264,
- "train_steps_per_second": 0.463
+ "eval_accuracy": 0.8962788114412663,
+ "eval_loss": 0.2805336117744446,
+ "eval_runtime": 164.7032,
+ "eval_samples_per_second": 131.181,
+ "eval_steps_per_second": 16.399,
+ "total_flos": 4.902492055502632e+19,
+ "train_loss": 0.2963483939957006,
+ "train_runtime": 9311.4957,
+ "train_samples_per_second": 67.941,
+ "train_steps_per_second": 0.265
  }
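
The throughput figures in all_results.json can be cross-checked against the step counts elsewhere in this commit. The sketch below estimates optimizer steps per epoch from `train_runtime`, `train_samples_per_second`, and `train_batch_size`. It assumes a single device, no gradient accumulation, and a final partial batch that is kept. None of these is stated in the diff, and `estimate_steps_per_epoch` is a hypothetical helper name.

```python
import math


def estimate_steps_per_epoch(runtime_s, samples_per_s, epochs, batch_size):
    """Estimate optimizer steps per epoch from Trainer throughput metrics.

    Assumes batch_size is the effective batch size (single device, no
    gradient accumulation) and that the last partial batch is not dropped.
    """
    samples_per_epoch = runtime_s * samples_per_s / epochs
    return math.ceil(samples_per_epoch / batch_size)


# Values copied from the old and new all_results.json / README hyperparameters.
old_steps = estimate_steps_per_epoch(7553.3952, 59.264, 4.0, 128)
new_steps = estimate_steps_per_epoch(9311.4957, 67.941, 4.0, 256)

if __name__ == "__main__":
    print(old_steps, new_steps)
```

With these inputs the estimates are 875 and 618 steps per epoch. The old figure times 4 epochs gives 3500, which agrees with `"global_step": 3500` in the old trainer_state.json, and 100 / 618 is about 0.16, which agrees with the first row of the new training table.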
config.json CHANGED
@@ -9,17 +9,21 @@
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "id2label": {
- "0": "up",
- "1": "up-left",
- "2": "up-right"
+ "0": "left",
+ "1": "right",
+ "2": "up",
+ "3": "up-left",
+ "4": "up-right"
  },
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
- "up": "0",
- "up-left": "1",
- "up-right": "2"
+ "left": "0",
+ "right": "1",
+ "up": "2",
+ "up-left": "3",
+ "up-right": "4"
  },
  "layer_norm_eps": 1e-12,
  "model_type": "vit",
eval_results.json CHANGED
@@ -1,8 +1,8 @@
  {
  "epoch": 4.0,
- "eval_accuracy": 0.8883208808493905,
- "eval_loss": 0.3038625717163086,
- "eval_runtime": 113.9287,
- "eval_samples_per_second": 133.926,
- "eval_steps_per_second": 16.747
+ "eval_accuracy": 0.8962788114412663,
+ "eval_loss": 0.2805336117744446,
+ "eval_runtime": 164.7032,
+ "eval_samples_per_second": 131.181,
+ "eval_steps_per_second": 16.399
  }
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:33deac17c3b39587f3c296ed5cb235c6c8aee5a4536ac509977f5a7369d4fc6f
- size 343271789
+ oid sha256:b5dc773afc8ca6694e1f7ef11e739981e312e5348256fdc8ab652e1a9b7b44f8
+ size 343277933
train_results.json CHANGED
@@ -1,8 +1,8 @@
  {
  "epoch": 4.0,
- "total_flos": 3.4689445074730156e+19,
- "train_loss": 0.2980680786711829,
- "train_runtime": 7553.3952,
- "train_samples_per_second": 59.264,
- "train_steps_per_second": 0.463
+ "total_flos": 4.902492055502632e+19,
+ "train_loss": 0.2963483939957006,
+ "train_runtime": 9311.4957,
+ "train_samples_per_second": 67.941,
+ "train_steps_per_second": 0.265
  }
trainer_state.json CHANGED
@@ -1,2443 +1,1726 @@
1
  {
2
- "best_metric": 0.3038625717163086,
3
- "best_model_checkpoint": "./vit-base-tm/checkpoint-2600",
4
  "epoch": 4.0,
5
  "eval_steps": 100,
6
- "global_step": 3500,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
11
  {
12
- "epoch": 0.01,
13
- "learning_rate": 0.00019942857142857143,
14
- "loss": 1.0787,
15
  "step": 10
16
  },
17
  {
18
- "epoch": 0.02,
19
- "learning_rate": 0.00019885714285714287,
20
- "loss": 1.0282,
21
  "step": 20
22
  },
23
  {
24
- "epoch": 0.03,
25
- "learning_rate": 0.0001982857142857143,
26
- "loss": 1.0362,
27
  "step": 30
28
  },
29
  {
30
- "epoch": 0.05,
31
- "learning_rate": 0.0001977142857142857,
32
- "loss": 0.9613,
33
  "step": 40
34
  },
35
  {
36
- "epoch": 0.06,
37
- "learning_rate": 0.00019714285714285716,
38
- "loss": 0.9225,
39
  "step": 50
40
  },
41
  {
42
- "epoch": 0.07,
43
- "learning_rate": 0.00019657142857142858,
44
- "loss": 0.8672,
45
  "step": 60
46
  },
47
  {
48
- "epoch": 0.08,
49
- "learning_rate": 0.000196,
50
- "loss": 0.8526,
51
  "step": 70
52
  },
53
  {
54
- "epoch": 0.09,
55
- "learning_rate": 0.00019542857142857144,
56
- "loss": 0.818,
57
  "step": 80
58
  },
59
  {
60
- "epoch": 0.1,
61
- "learning_rate": 0.00019485714285714286,
62
- "loss": 0.8688,
63
  "step": 90
64
  },
65
  {
66
- "epoch": 0.11,
67
- "learning_rate": 0.0001942857142857143,
68
- "loss": 0.8337,
69
  "step": 100
70
  },
71
  {
72
- "epoch": 0.11,
73
- "eval_accuracy": 0.6475291650281819,
74
- "eval_loss": 0.7773517966270447,
75
- "eval_runtime": 113.9929,
76
- "eval_samples_per_second": 133.85,
77
- "eval_steps_per_second": 16.738,
78
  "step": 100
79
  },
80
  {
81
- "epoch": 0.13,
82
- "learning_rate": 0.00019371428571428572,
83
- "loss": 0.8004,
84
  "step": 110
85
  },
86
  {
87
- "epoch": 0.14,
88
- "learning_rate": 0.00019314285714285717,
89
- "loss": 0.7706,
90
  "step": 120
91
  },
92
  {
93
- "epoch": 0.15,
94
- "learning_rate": 0.00019257142857142859,
95
- "loss": 0.7266,
96
  "step": 130
97
  },
98
  {
99
- "epoch": 0.16,
100
- "learning_rate": 0.000192,
101
- "loss": 0.7239,
102
  "step": 140
103
  },
104
  {
105
- "epoch": 0.17,
106
- "learning_rate": 0.00019142857142857145,
107
- "loss": 0.7477,
108
  "step": 150
109
  },
110
  {
111
- "epoch": 0.18,
112
- "learning_rate": 0.00019085714285714287,
113
- "loss": 0.7548,
114
  "step": 160
115
  },
116
  {
117
- "epoch": 0.19,
118
- "learning_rate": 0.0001902857142857143,
119
- "loss": 0.7122,
120
  "step": 170
121
  },
122
  {
123
- "epoch": 0.21,
124
- "learning_rate": 0.00018971428571428573,
125
- "loss": 0.699,
126
  "step": 180
127
  },
128
  {
129
- "epoch": 0.22,
130
- "learning_rate": 0.00018914285714285715,
131
- "loss": 0.6659,
132
  "step": 190
133
  },
134
  {
135
- "epoch": 0.23,
136
- "learning_rate": 0.00018857142857142857,
137
- "loss": 0.6868,
138
  "step": 200
139
  },
140
  {
141
- "epoch": 0.23,
142
- "eval_accuracy": 0.723948092803775,
143
- "eval_loss": 0.6481478214263916,
144
- "eval_runtime": 117.1527,
145
- "eval_samples_per_second": 130.24,
146
- "eval_steps_per_second": 16.286,
147
  "step": 200
148
  },
149
  {
150
- "epoch": 0.24,
151
- "learning_rate": 0.000188,
152
- "loss": 0.6739,
153
  "step": 210
154
  },
155
  {
156
- "epoch": 0.25,
157
- "learning_rate": 0.00018742857142857143,
158
- "loss": 0.673,
159
  "step": 220
160
  },
161
  {
162
- "epoch": 0.26,
163
- "learning_rate": 0.00018685714285714285,
164
- "loss": 0.6483,
165
  "step": 230
166
  },
167
  {
168
- "epoch": 0.27,
169
- "learning_rate": 0.0001862857142857143,
170
- "loss": 0.6628,
171
  "step": 240
172
  },
173
  {
174
- "epoch": 0.29,
175
- "learning_rate": 0.00018571428571428572,
176
- "loss": 0.6368,
177
  "step": 250
178
  },
179
  {
180
- "epoch": 0.3,
181
- "learning_rate": 0.00018514285714285716,
182
- "loss": 0.6278,
183
  "step": 260
184
  },
185
  {
186
- "epoch": 0.31,
187
- "learning_rate": 0.00018457142857142858,
188
- "loss": 0.6291,
189
  "step": 270
190
  },
191
  {
192
- "epoch": 0.32,
193
- "learning_rate": 0.00018400000000000003,
194
- "loss": 0.6443,
195
  "step": 280
196
  },
197
  {
198
- "epoch": 0.33,
199
- "learning_rate": 0.00018342857142857145,
200
- "loss": 0.6066,
201
  "step": 290
202
  },
203
  {
204
- "epoch": 0.34,
205
- "learning_rate": 0.00018285714285714286,
206
- "loss": 0.6141,
207
  "step": 300
208
  },
209
  {
210
- "epoch": 0.34,
211
- "eval_accuracy": 0.7459037881766942,
212
- "eval_loss": 0.6003548502922058,
213
- "eval_runtime": 102.9154,
214
- "eval_samples_per_second": 148.258,
215
- "eval_steps_per_second": 18.54,
216
  "step": 300
217
  },
218
  {
219
- "epoch": 0.35,
220
- "learning_rate": 0.0001822857142857143,
221
- "loss": 0.6083,
222
  "step": 310
223
  },
224
  {
225
- "epoch": 0.37,
226
- "learning_rate": 0.00018171428571428573,
227
- "loss": 0.5894,
228
  "step": 320
229
  },
230
  {
231
- "epoch": 0.38,
232
- "learning_rate": 0.00018114285714285715,
233
- "loss": 0.601,
234
  "step": 330
235
  },
236
  {
237
- "epoch": 0.39,
238
- "learning_rate": 0.00018057142857142857,
239
- "loss": 0.5704,
240
  "step": 340
241
  },
242
  {
243
- "epoch": 0.4,
244
- "learning_rate": 0.00018,
245
- "loss": 0.5864,
246
  "step": 350
247
  },
248
  {
249
- "epoch": 0.41,
250
- "learning_rate": 0.00017942857142857143,
251
- "loss": 0.5937,
252
  "step": 360
253
  },
254
  {
255
- "epoch": 0.42,
256
- "learning_rate": 0.00017885714285714285,
257
- "loss": 0.5715,
258
  "step": 370
259
  },
260
  {
261
- "epoch": 0.43,
262
- "learning_rate": 0.0001782857142857143,
263
- "loss": 0.5766,
264
  "step": 380
265
  },
266
  {
267
- "epoch": 0.45,
268
- "learning_rate": 0.0001777142857142857,
269
- "loss": 0.5911,
270
  "step": 390
271
  },
272
  {
273
- "epoch": 0.46,
274
- "learning_rate": 0.00017714285714285713,
275
- "loss": 0.6257,
276
  "step": 400
277
  },
278
  {
279
- "epoch": 0.46,
280
- "eval_accuracy": 0.7548826844933805,
281
- "eval_loss": 0.5776283740997314,
282
- "eval_runtime": 102.3115,
283
- "eval_samples_per_second": 149.133,
284
- "eval_steps_per_second": 18.649,
285
  "step": 400
286
  },
287
  {
288
- "epoch": 0.47,
289
- "learning_rate": 0.00017657142857142858,
290
- "loss": 0.5817,
291
  "step": 410
292
  },
293
  {
294
- "epoch": 0.48,
295
- "learning_rate": 0.00017600000000000002,
296
- "loss": 0.6011,
297
  "step": 420
298
  },
299
  {
300
- "epoch": 0.49,
301
- "learning_rate": 0.00017542857142857144,
302
- "loss": 0.5619,
303
  "step": 430
304
  },
305
  {
306
- "epoch": 0.5,
307
- "learning_rate": 0.0001748571428571429,
308
- "loss": 0.5443,
309
  "step": 440
310
  },
311
  {
312
- "epoch": 0.51,
313
- "learning_rate": 0.0001742857142857143,
314
- "loss": 0.5686,
315
  "step": 450
316
  },
317
  {
318
- "epoch": 0.53,
319
- "learning_rate": 0.00017371428571428572,
320
- "loss": 0.6073,
321
  "step": 460
322
  },
323
  {
324
- "epoch": 0.54,
325
- "learning_rate": 0.00017314285714285717,
326
- "loss": 0.564,
327
  "step": 470
328
  },
329
  {
330
- "epoch": 0.55,
331
- "learning_rate": 0.0001725714285714286,
332
- "loss": 0.5632,
333
  "step": 480
334
  },
335
  {
336
- "epoch": 0.56,
337
- "learning_rate": 0.000172,
338
- "loss": 0.5287,
339
  "step": 490
340
  },
341
  {
342
- "epoch": 0.57,
343
- "learning_rate": 0.00017142857142857143,
344
- "loss": 0.5603,
345
  "step": 500
346
  },
347
  {
348
- "epoch": 0.57,
349
- "eval_accuracy": 0.7765762223096081,
350
- "eval_loss": 0.5395107865333557,
351
- "eval_runtime": 103.0077,
352
- "eval_samples_per_second": 148.125,
353
- "eval_steps_per_second": 18.523,
354
  "step": 500
355
  },
356
  {
357
- "epoch": 0.58,
358
- "learning_rate": 0.00017085714285714287,
359
- "loss": 0.5569,
360
  "step": 510
361
  },
362
  {
363
- "epoch": 0.59,
364
- "learning_rate": 0.0001702857142857143,
365
- "loss": 0.5473,
366
  "step": 520
367
  },
368
  {
369
- "epoch": 0.61,
370
- "learning_rate": 0.0001697142857142857,
371
- "loss": 0.5143,
372
  "step": 530
373
  },
374
  {
375
- "epoch": 0.62,
376
- "learning_rate": 0.00016914285714285715,
377
- "loss": 0.5266,
378
  "step": 540
379
  },
380
  {
381
- "epoch": 0.63,
382
- "learning_rate": 0.00016857142857142857,
383
- "loss": 0.5212,
384
  "step": 550
385
  },
386
  {
387
- "epoch": 0.64,
388
- "learning_rate": 0.000168,
389
- "loss": 0.5306,
390
  "step": 560
391
  },
392
  {
393
- "epoch": 0.65,
394
- "learning_rate": 0.00016742857142857144,
395
- "loss": 0.5439,
396
  "step": 570
397
  },
398
  {
399
- "epoch": 0.66,
400
- "learning_rate": 0.00016685714285714285,
401
- "loss": 0.4931,
402
  "step": 580
403
  },
404
  {
405
- "epoch": 0.67,
406
- "learning_rate": 0.0001662857142857143,
407
- "loss": 0.5461,
408
  "step": 590
409
  },
410
  {
411
- "epoch": 0.69,
412
- "learning_rate": 0.00016571428571428575,
413
- "loss": 0.5281,
414
  "step": 600
415
  },
416
  {
417
- "epoch": 0.69,
418
- "eval_accuracy": 0.7875868396906541,
419
- "eval_loss": 0.5065582394599915,
420
- "eval_runtime": 103.2693,
421
- "eval_samples_per_second": 147.75,
422
- "eval_steps_per_second": 18.476,
423
  "step": 600
424
  },
425
  {
426
- "epoch": 0.7,
427
- "learning_rate": 0.00016514285714285716,
428
- "loss": 0.529,
429
  "step": 610
430
  },
431
  {
432
- "epoch": 0.71,
433
- "learning_rate": 0.00016457142857142858,
434
- "loss": 0.4968,
435
  "step": 620
436
  },
437
  {
438
- "epoch": 0.72,
439
- "learning_rate": 0.000164,
440
- "loss": 0.5311,
441
  "step": 630
442
  },
443
  {
444
- "epoch": 0.73,
445
- "learning_rate": 0.00016342857142857145,
446
- "loss": 0.5113,
447
  "step": 640
448
  },
449
  {
450
- "epoch": 0.74,
451
- "learning_rate": 0.00016285714285714287,
452
- "loss": 0.5082,
453
  "step": 650
454
  },
455
  {
456
- "epoch": 0.75,
457
- "learning_rate": 0.00016228571428571428,
458
- "loss": 0.4817,
459
  "step": 660
460
  },
461
  {
462
- "epoch": 0.77,
463
- "learning_rate": 0.00016171428571428573,
464
- "loss": 0.4749,
465
  "step": 670
466
  },
467
  {
468
- "epoch": 0.78,
469
- "learning_rate": 0.00016114285714285715,
470
- "loss": 0.4792,
471
  "step": 680
472
  },
473
  {
474
- "epoch": 0.79,
475
- "learning_rate": 0.00016057142857142857,
476
- "loss": 0.4822,
477
  "step": 690
478
  },
479
  {
480
- "epoch": 0.8,
481
- "learning_rate": 0.00016,
482
- "loss": 0.4781,
483
  "step": 700
484
  },
485
  {
486
- "epoch": 0.8,
487
- "eval_accuracy": 0.7918468999868922,
488
- "eval_loss": 0.49400392174720764,
489
- "eval_runtime": 103.2184,
490
- "eval_samples_per_second": 147.822,
491
- "eval_steps_per_second": 18.485,
492
  "step": 700
493
  },
494
  {
495
- "epoch": 0.81,
496
- "learning_rate": 0.00015942857142857143,
497
- "loss": 0.5182,
498
  "step": 710
499
  },
500
  {
501
- "epoch": 0.82,
502
- "learning_rate": 0.00015885714285714285,
503
- "loss": 0.4686,
504
  "step": 720
505
  },
506
  {
507
- "epoch": 0.83,
508
- "learning_rate": 0.0001582857142857143,
509
- "loss": 0.4879,
510
  "step": 730
511
  },
512
  {
513
- "epoch": 0.85,
514
- "learning_rate": 0.00015771428571428571,
515
- "loss": 0.5082,
516
  "step": 740
517
  },
518
  {
519
- "epoch": 0.86,
520
- "learning_rate": 0.00015714285714285716,
521
- "loss": 0.5016,
522
  "step": 750
523
  },
524
  {
525
- "epoch": 0.87,
526
- "learning_rate": 0.00015657142857142858,
527
- "loss": 0.488,
528
  "step": 760
529
  },
530
  {
531
- "epoch": 0.88,
532
- "learning_rate": 0.00015600000000000002,
533
- "loss": 0.4707,
534
  "step": 770
535
  },
536
  {
537
- "epoch": 0.89,
538
- "learning_rate": 0.00015542857142857144,
539
- "loss": 0.4671,
540
  "step": 780
541
  },
542
  {
543
- "epoch": 0.9,
544
- "learning_rate": 0.00015485714285714286,
545
- "loss": 0.4609,
546
  "step": 790
547
  },
548
  {
549
- "epoch": 0.91,
550
- "learning_rate": 0.0001542857142857143,
551
- "loss": 0.4794,
552
  "step": 800
553
  },
554
  {
555
- "epoch": 0.91,
556
- "eval_accuracy": 0.8063966443832743,
557
- "eval_loss": 0.4648732841014862,
558
- "eval_runtime": 100.6756,
559
- "eval_samples_per_second": 151.556,
560
- "eval_steps_per_second": 18.952,
561
  "step": 800
562
  },
563
  {
564
- "epoch": 0.93,
565
- "learning_rate": 0.00015371428571428573,
566
- "loss": 0.5067,
567
  "step": 810
568
  },
569
  {
570
- "epoch": 0.94,
571
- "learning_rate": 0.00015314285714285714,
572
- "loss": 0.4436,
573
  "step": 820
574
  },
575
  {
576
- "epoch": 0.95,
577
- "learning_rate": 0.0001525714285714286,
578
- "loss": 0.4588,
579
  "step": 830
580
  },
581
  {
582
- "epoch": 0.96,
583
- "learning_rate": 0.000152,
584
- "loss": 0.4583,
585
  "step": 840
586
  },
587
  {
588
- "epoch": 0.97,
589
- "learning_rate": 0.00015142857142857143,
590
- "loss": 0.4518,
591
  "step": 850
592
  },
593
  {
594
- "epoch": 0.98,
595
- "learning_rate": 0.00015085714285714287,
596
- "loss": 0.4735,
597
  "step": 860
598
  },
599
  {
600
- "epoch": 0.99,
601
- "learning_rate": 0.0001502857142857143,
602
- "loss": 0.4597,
603
  "step": 870
604
  },
605
  {
606
- "epoch": 1.01,
607
- "learning_rate": 0.0001497142857142857,
608
- "loss": 0.4767,
609
  "step": 880
610
  },
611
  {
612
- "epoch": 1.02,
613
- "learning_rate": 0.00014914285714285713,
614
- "loss": 0.3818,
615
  "step": 890
616
  },
617
  {
618
- "epoch": 1.03,
619
- "learning_rate": 0.00014857142857142857,
620
- "loss": 0.3345,
621
  "step": 900
622
  },
623
  {
624
- "epoch": 1.03,
625
- "eval_accuracy": 0.8166863284834185,
626
- "eval_loss": 0.4548985958099365,
627
- "eval_runtime": 107.5234,
628
- "eval_samples_per_second": 141.904,
629
- "eval_steps_per_second": 17.745,
630
  "step": 900
631
  },
632
  {
633
- "epoch": 1.04,
634
- "learning_rate": 0.000148,
635
- "loss": 0.367,
636
  "step": 910
637
  },
638
  {
639
- "epoch": 1.05,
640
- "learning_rate": 0.00014742857142857144,
641
- "loss": 0.4015,
642
  "step": 920
643
  },
644
  {
645
- "epoch": 1.06,
646
- "learning_rate": 0.00014685714285714288,
647
- "loss": 0.3476,
648
  "step": 930
649
  },
650
  {
651
- "epoch": 1.07,
652
- "learning_rate": 0.0001462857142857143,
653
- "loss": 0.3624,
654
  "step": 940
655
  },
656
  {
657
- "epoch": 1.09,
658
- "learning_rate": 0.00014571428571428572,
659
- "loss": 0.3739,
660
  "step": 950
661
  },
662
  {
663
- "epoch": 1.1,
664
- "learning_rate": 0.00014514285714285717,
665
- "loss": 0.3419,
666
  "step": 960
667
  },
668
  {
669
- "epoch": 1.11,
670
- "learning_rate": 0.00014457142857142859,
671
- "loss": 0.3556,
672
  "step": 970
673
  },
674
  {
675
- "epoch": 1.12,
676
- "learning_rate": 0.000144,
677
- "loss": 0.3361,
678
  "step": 980
679
  },
680
  {
681
- "epoch": 1.13,
682
- "learning_rate": 0.00014342857142857145,
683
- "loss": 0.3346,
684
  "step": 990
685
  },
686
  {
687
- "epoch": 1.14,
688
- "learning_rate": 0.00014285714285714287,
689
- "loss": 0.3827,
690
  "step": 1000
691
  },
692
  {
693
- "epoch": 1.14,
694
- "eval_accuracy": 0.823109188622362,
695
- "eval_loss": 0.4283953905105591,
696
- "eval_runtime": 109.8077,
697
- "eval_samples_per_second": 138.952,
698
- "eval_steps_per_second": 17.376,
699
  "step": 1000
700
  },
701
  {
702
- "epoch": 1.15,
703
- "learning_rate": 0.00014228571428571429,
704
- "loss": 0.3653,
705
  "step": 1010
706
  },
707
  {
708
- "epoch": 1.17,
709
- "learning_rate": 0.0001417142857142857,
710
- "loss": 0.3798,
711
  "step": 1020
712
  },
713
  {
714
- "epoch": 1.18,
715
- "learning_rate": 0.00014114285714285715,
716
- "loss": 0.3895,
717
  "step": 1030
718
  },
719
  {
720
- "epoch": 1.19,
721
- "learning_rate": 0.00014057142857142857,
722
- "loss": 0.3631,
723
  "step": 1040
724
  },
725
  {
726
- "epoch": 1.2,
727
- "learning_rate": 0.00014,
728
- "loss": 0.4126,
729
  "step": 1050
730
  },
731
  {
732
- "epoch": 1.21,
733
- "learning_rate": 0.00013942857142857143,
734
- "loss": 0.3626,
735
  "step": 1060
736
  },
737
  {
738
- "epoch": 1.22,
739
- "learning_rate": 0.00013885714285714285,
740
- "loss": 0.3848,
741
  "step": 1070
742
  },
743
  {
744
- "epoch": 1.23,
745
- "learning_rate": 0.0001382857142857143,
746
- "loss": 0.364,
747
  "step": 1080
748
  },
749
  {
750
- "epoch": 1.25,
751
- "learning_rate": 0.00013771428571428572,
752
- "loss": 0.3745,
753
  "step": 1090
754
  },
755
  {
756
- "epoch": 1.26,
757
- "learning_rate": 0.00013714285714285716,
758
- "loss": 0.3415,
759
  "step": 1100
760
  },
761
  {
762
- "epoch": 1.26,
763
- "eval_accuracy": 0.8310394547122821,
764
- "eval_loss": 0.4137427806854248,
765
- "eval_runtime": 111.3131,
766
- "eval_samples_per_second": 137.073,
767
- "eval_steps_per_second": 17.141,
768
  "step": 1100
769
  },
770
  {
771
- "epoch": 1.27,
772
- "learning_rate": 0.00013657142857142858,
773
- "loss": 0.3618,
774
  "step": 1110
775
  },
776
  {
777
- "epoch": 1.28,
778
- "learning_rate": 0.00013600000000000003,
779
- "loss": 0.3962,
780
  "step": 1120
781
  },
782
  {
783
- "epoch": 1.29,
784
- "learning_rate": 0.00013542857142857144,
785
- "loss": 0.3811,
786
  "step": 1130
787
  },
788
  {
789
- "epoch": 1.3,
790
- "learning_rate": 0.00013485714285714286,
791
- "loss": 0.346,
792
  "step": 1140
793
  },
794
  {
795
- "epoch": 1.31,
796
- "learning_rate": 0.00013428571428571428,
797
- "loss": 0.3497,
798
  "step": 1150
799
  },
800
  {
801
- "epoch": 1.33,
802
- "learning_rate": 0.00013371428571428573,
803
- "loss": 0.3583,
804
  "step": 1160
805
  },
806
  {
807
- "epoch": 1.34,
808
- "learning_rate": 0.00013314285714285715,
809
- "loss": 0.321,
810
  "step": 1170
811
  },
812
  {
813
- "epoch": 1.35,
814
- "learning_rate": 0.00013257142857142856,
815
- "loss": 0.3493,
816
  "step": 1180
817
  },
818
  {
819
- "epoch": 1.36,
820
- "learning_rate": 0.000132,
821
- "loss": 0.3748,
822
  "step": 1190
823
  },
824
  {
825
- "epoch": 1.37,
826
- "learning_rate": 0.00013142857142857143,
827
- "loss": 0.3633,
828
  "step": 1200
829
  },
830
  {
831
- "epoch": 1.37,
832
- "eval_accuracy": 0.838445405688819,
833
- "eval_loss": 0.3926539719104767,
834
- "eval_runtime": 111.5423,
835
- "eval_samples_per_second": 136.791,
836
- "eval_steps_per_second": 17.106,
837
  "step": 1200
838
  },
839
  {
840
- "epoch": 1.38,
841
- "learning_rate": 0.00013085714285714285,
842
- "loss": 0.3464,
843
  "step": 1210
844
  },
845
  {
846
- "epoch": 1.39,
847
- "learning_rate": 0.0001302857142857143,
848
- "loss": 0.2921,
849
  "step": 1220
850
  },
851
  {
852
- "epoch": 1.41,
853
- "learning_rate": 0.0001297142857142857,
854
- "loss": 0.3399,
855
  "step": 1230
856
  },
857
  {
858
- "epoch": 1.42,
859
- "learning_rate": 0.00012914285714285713,
860
- "loss": 0.3452,
861
  "step": 1240
862
  },
863
  {
864
- "epoch": 1.43,
865
- "learning_rate": 0.00012857142857142858,
866
- "loss": 0.3681,
867
  "step": 1250
868
  },
869
  {
870
- "epoch": 1.44,
871
- "learning_rate": 0.00012800000000000002,
872
- "loss": 0.3567,
873
  "step": 1260
874
  },
875
  {
876
- "epoch": 1.45,
877
- "learning_rate": 0.00012742857142857144,
878
- "loss": 0.3145,
879
  "step": 1270
880
  },
881
  {
882
- "epoch": 1.46,
883
- "learning_rate": 0.00012685714285714286,
884
- "loss": 0.3283,
885
  "step": 1280
886
  },
887
  {
888
- "epoch": 1.47,
889
- "learning_rate": 0.0001262857142857143,
890
- "loss": 0.3575,
891
  "step": 1290
892
  },
893
  {
894
- "epoch": 1.49,
895
- "learning_rate": 0.00012571428571428572,
896
- "loss": 0.3414,
897
  "step": 1300
898
  },
899
  {
900
- "epoch": 1.49,
901
- "eval_accuracy": 0.8389697208022021,
902
- "eval_loss": 0.3922114372253418,
903
- "eval_runtime": 109.5467,
904
- "eval_samples_per_second": 139.283,
905
- "eval_steps_per_second": 17.417,
906
  "step": 1300
907
  },
908
  {
909
- "epoch": 1.5,
910
- "learning_rate": 0.00012514285714285714,
911
- "loss": 0.3507,
912
  "step": 1310
913
  },
914
  {
915
- "epoch": 1.51,
916
- "learning_rate": 0.0001245714285714286,
917
- "loss": 0.348,
918
  "step": 1320
919
  },
920
  {
921
- "epoch": 1.52,
922
- "learning_rate": 0.000124,
923
- "loss": 0.3571,
924
  "step": 1330
925
  },
926
  {
927
- "epoch": 1.53,
928
- "learning_rate": 0.00012342857142857142,
929
- "loss": 0.3203,
930
  "step": 1340
931
  },
932
  {
933
- "epoch": 1.54,
934
- "learning_rate": 0.00012285714285714287,
935
- "loss": 0.3186,
936
  "step": 1350
937
  },
938
  {
939
- "epoch": 1.55,
940
- "learning_rate": 0.0001222857142857143,
941
- "loss": 0.3308,
942
  "step": 1360
943
  },
944
  {
945
- "epoch": 1.57,
946
- "learning_rate": 0.00012171428571428572,
947
- "loss": 0.3396,
948
  "step": 1370
949
  },
950
  {
951
- "epoch": 1.58,
952
- "learning_rate": 0.00012114285714285715,
953
- "loss": 0.3362,
954
  "step": 1380
955
  },
956
  {
957
- "epoch": 1.59,
958
- "learning_rate": 0.00012057142857142858,
959
- "loss": 0.3762,
960
  "step": 1390
961
  },
962
  {
963
- "epoch": 1.6,
964
- "learning_rate": 0.00012,
965
- "loss": 0.3441,
966
  "step": 1400
967
  },
968
  {
969
- "epoch": 1.6,
970
- "eval_accuracy": 0.8475553807838511,
971
- "eval_loss": 0.3774389624595642,
972
- "eval_runtime": 103.1232,
973
- "eval_samples_per_second": 147.959,
974
- "eval_steps_per_second": 18.502,
975
  "step": 1400
976
  },
977
  {
978
- "epoch": 1.61,
979
- "learning_rate": 0.00011942857142857145,
980
- "loss": 0.3553,
981
  "step": 1410
982
  },
983
  {
984
- "epoch": 1.62,
985
- "learning_rate": 0.00011885714285714287,
986
- "loss": 0.3274,
987
  "step": 1420
988
  },
989
  {
990
- "epoch": 1.63,
991
- "learning_rate": 0.00011828571428571429,
992
- "loss": 0.3206,
993
  "step": 1430
994
  },
995
  {
996
- "epoch": 1.65,
997
- "learning_rate": 0.0001177142857142857,
998
- "loss": 0.3132,
999
  "step": 1440
1000
  },
1001
  {
1002
- "epoch": 1.66,
1003
- "learning_rate": 0.00011714285714285715,
1004
- "loss": 0.3229,
1005
  "step": 1450
1006
  },
1007
  {
1008
- "epoch": 1.67,
1009
- "learning_rate": 0.00011657142857142858,
1010
- "loss": 0.3466,
1011
  "step": 1460
1012
  },
1013
  {
1014
- "epoch": 1.68,
1015
- "learning_rate": 0.000116,
1016
- "loss": 0.315,
1017
  "step": 1470
1018
  },
1019
  {
1020
- "epoch": 1.69,
1021
- "learning_rate": 0.00011542857142857145,
1022
- "loss": 0.3097,
1023
  "step": 1480
1024
  },
1025
  {
1026
- "epoch": 1.7,
1027
- "learning_rate": 0.00011485714285714286,
1028
- "loss": 0.3281,
1029
  "step": 1490
1030
  },
1031
  {
1032
- "epoch": 1.71,
1033
- "learning_rate": 0.00011428571428571428,
1034
- "loss": 0.316,
1035
  "step": 1500
1036
  },
1037
  {
1038
- "epoch": 1.71,
1039
- "eval_accuracy": 0.8474898413946782,
1040
- "eval_loss": 0.37882503867149353,
1041
- "eval_runtime": 108.3361,
1042
- "eval_samples_per_second": 140.839,
1043
- "eval_steps_per_second": 17.612,
1044
  "step": 1500
1045
  },
1046
  {
1047
- "epoch": 1.73,
1048
- "learning_rate": 0.00011371428571428573,
1049
- "loss": 0.341,
1050
  "step": 1510
1051
  },
1052
  {
1053
- "epoch": 1.74,
1054
- "learning_rate": 0.00011314285714285715,
1055
- "loss": 0.3395,
1056
  "step": 1520
1057
  },
1058
  {
1059
- "epoch": 1.75,
1060
- "learning_rate": 0.00011257142857142857,
1061
- "loss": 0.3301,
1062
  "step": 1530
1063
  },
1064
  {
1065
- "epoch": 1.76,
1066
- "learning_rate": 0.00011200000000000001,
1067
- "loss": 0.2956,
1068
  "step": 1540
1069
  },
1070
  {
1071
- "epoch": 1.77,
1072
- "learning_rate": 0.00011142857142857144,
1073
- "loss": 0.3095,
1074
  "step": 1550
1075
  },
1076
  {
1077
- "epoch": 1.78,
1078
- "learning_rate": 0.00011085714285714286,
1079
- "loss": 0.3101,
1080
  "step": 1560
1081
  },
1082
  {
1083
- "epoch": 1.79,
1084
- "learning_rate": 0.00011028571428571428,
1085
- "loss": 0.3059,
1086
  "step": 1570
1087
  },
1088
  {
1089
- "epoch": 1.81,
1090
- "learning_rate": 0.00010971428571428573,
1091
- "loss": 0.3056,
1092
  "step": 1580
1093
  },
1094
  {
1095
- "epoch": 1.82,
1096
- "learning_rate": 0.00010914285714285715,
1097
- "loss": 0.2812,
1098
  "step": 1590
1099
  },
1100
  {
1101
- "epoch": 1.83,
1102
- "learning_rate": 0.00010857142857142856,
1103
- "loss": 0.3218,
1104
  "step": 1600
1105
  },
1106
  {
1107
- "epoch": 1.83,
1108
- "eval_accuracy": 0.8546336348145235,
1109
- "eval_loss": 0.35803067684173584,
1110
- "eval_runtime": 108.5802,
1111
- "eval_samples_per_second": 140.523,
1112
- "eval_steps_per_second": 17.572,
1113
  "step": 1600
1114
  },
1115
  {
1116
- "epoch": 1.84,
1117
- "learning_rate": 0.00010800000000000001,
1118
- "loss": 0.3049,
1119
  "step": 1610
1120
  },
1121
  {
1122
- "epoch": 1.85,
1123
- "learning_rate": 0.00010742857142857143,
1124
- "loss": 0.3463,
1125
  "step": 1620
1126
  },
1127
  {
1128
- "epoch": 1.86,
1129
- "learning_rate": 0.00010685714285714286,
1130
- "loss": 0.2694,
1131
  "step": 1630
1132
  },
1133
  {
1134
- "epoch": 1.87,
1135
- "learning_rate": 0.0001062857142857143,
1136
- "loss": 0.3018,
1137
  "step": 1640
1138
  },
1139
  {
1140
- "epoch": 1.89,
1141
- "learning_rate": 0.00010571428571428572,
1142
- "loss": 0.3346,
1143
  "step": 1650
1144
  },
1145
  {
1146
- "epoch": 1.9,
1147
- "learning_rate": 0.00010514285714285714,
1148
- "loss": 0.3274,
1149
  "step": 1660
1150
  },
1151
  {
1152
- "epoch": 1.91,
1153
- "learning_rate": 0.00010457142857142859,
1154
- "loss": 0.3277,
1155
  "step": 1670
1156
  },
1157
  {
1158
- "epoch": 1.92,
1159
- "learning_rate": 0.00010400000000000001,
1160
- "loss": 0.3092,
1161
  "step": 1680
1162
  },
1163
  {
1164
- "epoch": 1.93,
1165
- "learning_rate": 0.00010342857142857143,
1166
- "loss": 0.3119,
1167
  "step": 1690
1168
  },
1169
  {
1170
- "epoch": 1.94,
1171
- "learning_rate": 0.00010285714285714286,
1172
- "loss": 0.2656,
1173
  "step": 1700
1174
  },
1175
  {
1176
- "epoch": 1.94,
1177
- "eval_accuracy": 0.8596801677808363,
1178
- "eval_loss": 0.3583575189113617,
1179
- "eval_runtime": 112.4135,
1180
- "eval_samples_per_second": 135.731,
1181
- "eval_steps_per_second": 16.973,
1182
  "step": 1700
1183
  },
1184
  {
1185
- "epoch": 1.95,
1186
- "learning_rate": 0.00010228571428571429,
1187
- "loss": 0.2957,
1188
  "step": 1710
1189
  },
1190
  {
1191
- "epoch": 1.97,
1192
- "learning_rate": 0.00010171428571428572,
1193
- "loss": 0.2941,
1194
  "step": 1720
1195
  },
1196
  {
1197
- "epoch": 1.98,
1198
- "learning_rate": 0.00010114285714285714,
1199
- "loss": 0.3133,
1200
  "step": 1730
1201
  },
1202
  {
1203
- "epoch": 1.99,
1204
- "learning_rate": 0.00010057142857142859,
1205
- "loss": 0.2963,
1206
  "step": 1740
1207
  },
1208
  {
1209
- "epoch": 2.0,
1210
- "learning_rate": 0.0001,
1211
- "loss": 0.2892,
1212
  "step": 1750
1213
  },
1214
  {
1215
- "epoch": 2.01,
1216
- "learning_rate": 9.942857142857144e-05,
1217
- "loss": 0.2107,
1218
  "step": 1760
1219
  },
1220
  {
1221
- "epoch": 2.02,
1222
- "learning_rate": 9.885714285714286e-05,
1223
- "loss": 0.2141,
1224
  "step": 1770
1225
  },
1226
  {
1227
- "epoch": 2.03,
1228
- "learning_rate": 9.828571428571429e-05,
1229
- "loss": 0.2031,
1230
  "step": 1780
1231
  },
1232
  {
1233
- "epoch": 2.05,
1234
- "learning_rate": 9.771428571428572e-05,
1235
- "loss": 0.1953,
1236
  "step": 1790
1237
  },
1238
  {
1239
- "epoch": 2.06,
1240
- "learning_rate": 9.714285714285715e-05,
1241
- "loss": 0.2005,
1242
  "step": 1800
1243
  },
1244
  {
1245
- "epoch": 2.06,
1246
- "eval_accuracy": 0.8670861187573732,
1247
- "eval_loss": 0.357607901096344,
1248
- "eval_runtime": 111.227,
1249
- "eval_samples_per_second": 137.179,
1250
- "eval_steps_per_second": 17.154,
1251
  "step": 1800
1252
  },
1253
  {
1254
- "epoch": 2.07,
1255
- "learning_rate": 9.657142857142858e-05,
1256
- "loss": 0.2209,
1257
  "step": 1810
1258
  },
1259
  {
1260
- "epoch": 2.08,
1261
- "learning_rate": 9.6e-05,
1262
- "loss": 0.2035,
1263
  "step": 1820
1264
  },
1265
  {
1266
- "epoch": 2.09,
1267
- "learning_rate": 9.542857142857143e-05,
1268
- "loss": 0.1859,
1269
  "step": 1830
1270
  },
1271
  {
1272
- "epoch": 2.1,
1273
- "learning_rate": 9.485714285714287e-05,
1274
- "loss": 0.1761,
1275
  "step": 1840
1276
  },
1277
  {
1278
- "epoch": 2.11,
1279
- "learning_rate": 9.428571428571429e-05,
1280
- "loss": 0.1789,
1281
  "step": 1850
1282
  },
1283
  {
1284
- "epoch": 2.13,
1285
- "learning_rate": 9.371428571428572e-05,
1286
- "loss": 0.1896,
1287
  "step": 1860
1288
  },
1289
  {
1290
- "epoch": 2.14,
1291
- "learning_rate": 9.314285714285715e-05,
1292
- "loss": 0.1787,
1293
  "step": 1870
1294
  },
1295
  {
1296
- "epoch": 2.15,
1297
- "learning_rate": 9.257142857142858e-05,
1298
- "loss": 0.1793,
1299
  "step": 1880
1300
  },
1301
  {
1302
- "epoch": 2.16,
1303
- "learning_rate": 9.200000000000001e-05,
1304
- "loss": 0.2135,
1305
  "step": 1890
1306
  },
1307
  {
1308
- "epoch": 2.17,
1309
- "learning_rate": 9.142857142857143e-05,
1310
- "loss": 0.181,
1311
  "step": 1900
1312
  },
1313
  {
1314
- "epoch": 2.17,
1315
- "eval_accuracy": 0.8699043124918076,
1316
- "eval_loss": 0.3425835371017456,
1317
- "eval_runtime": 107.3192,
1318
- "eval_samples_per_second": 142.174,
1319
- "eval_steps_per_second": 17.779,
1320
  "step": 1900
1321
  },
1322
  {
1323
- "epoch": 2.18,
1324
- "learning_rate": 9.085714285714286e-05,
1325
- "loss": 0.1827,
1326
  "step": 1910
1327
  },
1328
  {
1329
- "epoch": 2.19,
1330
- "learning_rate": 9.028571428571428e-05,
1331
- "loss": 0.1782,
1332
  "step": 1920
1333
  },
1334
  {
1335
- "epoch": 2.21,
1336
- "learning_rate": 8.971428571428571e-05,
1337
- "loss": 0.1748,
1338
  "step": 1930
1339
  },
1340
  {
1341
- "epoch": 2.22,
1342
- "learning_rate": 8.914285714285715e-05,
1343
- "loss": 0.2041,
1344
  "step": 1940
1345
  },
1346
  {
1347
- "epoch": 2.23,
1348
- "learning_rate": 8.857142857142857e-05,
1349
- "loss": 0.193,
1350
  "step": 1950
1351
  },
1352
  {
1353
- "epoch": 2.24,
1354
- "learning_rate": 8.800000000000001e-05,
1355
- "loss": 0.1665,
1356
  "step": 1960
1357
  },
1358
  {
1359
- "epoch": 2.25,
1360
- "learning_rate": 8.742857142857144e-05,
1361
- "loss": 0.1671,
1362
  "step": 1970
1363
  },
1364
  {
1365
- "epoch": 2.26,
1366
- "learning_rate": 8.685714285714286e-05,
1367
- "loss": 0.1549,
1368
  "step": 1980
1369
  },
1370
  {
1371
- "epoch": 2.27,
1372
- "learning_rate": 8.62857142857143e-05,
1373
- "loss": 0.175,
1374
  "step": 1990
1375
  },
1376
  {
1377
- "epoch": 2.29,
1378
- "learning_rate": 8.571428571428571e-05,
1379
- "loss": 0.2094,
1380
  "step": 2000
1381
  },
1382
  {
1383
- "epoch": 2.29,
1384
- "eval_accuracy": 0.869642154935116,
1385
- "eval_loss": 0.3426852524280548,
1386
- "eval_runtime": 116.2829,
1387
- "eval_samples_per_second": 131.215,
1388
- "eval_steps_per_second": 16.408,
1389
  "step": 2000
1390
  },
1391
  {
1392
- "epoch": 2.3,
1393
- "learning_rate": 8.514285714285714e-05,
1394
- "loss": 0.1989,
1395
  "step": 2010
1396
  },
1397
  {
1398
- "epoch": 2.31,
1399
- "learning_rate": 8.457142857142858e-05,
1400
- "loss": 0.2089,
1401
  "step": 2020
1402
  },
1403
  {
1404
- "epoch": 2.32,
1405
- "learning_rate": 8.4e-05,
1406
- "loss": 0.1664,
1407
  "step": 2030
1408
  },
1409
  {
1410
- "epoch": 2.33,
1411
- "learning_rate": 8.342857142857143e-05,
1412
- "loss": 0.2071,
1413
  "step": 2040
1414
  },
1415
  {
1416
- "epoch": 2.34,
1417
- "learning_rate": 8.285714285714287e-05,
1418
- "loss": 0.1739,
1419
  "step": 2050
1420
  },
1421
  {
1422
- "epoch": 2.35,
1423
- "learning_rate": 8.228571428571429e-05,
1424
- "loss": 0.1854,
1425
  "step": 2060
1426
  },
1427
  {
1428
- "epoch": 2.37,
1429
- "learning_rate": 8.171428571428572e-05,
1430
- "loss": 0.1897,
1431
  "step": 2070
1432
  },
1433
  {
1434
- "epoch": 2.38,
1435
- "learning_rate": 8.114285714285714e-05,
1436
- "loss": 0.1681,
1437
  "step": 2080
1438
  },
1439
  {
1440
- "epoch": 2.39,
1441
- "learning_rate": 8.057142857142857e-05,
1442
- "loss": 0.1761,
1443
  "step": 2090
1444
  },
1445
  {
1446
- "epoch": 2.4,
1447
- "learning_rate": 8e-05,
1448
- "loss": 0.1831,
1449
  "step": 2100
1450
  },
1451
  {
1452
- "epoch": 2.4,
1453
- "eval_accuracy": 0.8754751605715034,
1454
- "eval_loss": 0.33553817868232727,
1455
- "eval_runtime": 102.6822,
1456
- "eval_samples_per_second": 148.594,
1457
- "eval_steps_per_second": 18.582,
1458
  "step": 2100
1459
  },
1460
  {
1461
- "epoch": 2.41,
1462
- "learning_rate": 7.942857142857143e-05,
1463
- "loss": 0.1707,
1464
  "step": 2110
1465
  },
1466
  {
1467
- "epoch": 2.42,
1468
- "learning_rate": 7.885714285714286e-05,
1469
- "loss": 0.1926,
1470
  "step": 2120
1471
  },
1472
  {
1473
- "epoch": 2.43,
1474
- "learning_rate": 7.828571428571429e-05,
1475
- "loss": 0.1573,
1476
  "step": 2130
1477
  },
1478
  {
1479
- "epoch": 2.45,
1480
- "learning_rate": 7.771428571428572e-05,
1481
- "loss": 0.1884,
1482
  "step": 2140
1483
  },
1484
  {
1485
- "epoch": 2.46,
1486
- "learning_rate": 7.714285714285715e-05,
1487
- "loss": 0.19,
1488
  "step": 2150
1489
  },
1490
  {
1491
- "epoch": 2.47,
1492
- "learning_rate": 7.657142857142857e-05,
1493
- "loss": 0.1812,
1494
  "step": 2160
1495
  },
1496
  {
1497
- "epoch": 2.48,
1498
- "learning_rate": 7.6e-05,
1499
- "loss": 0.1701,
1500
  "step": 2170
1501
  },
1502
  {
1503
- "epoch": 2.49,
1504
- "learning_rate": 7.542857142857144e-05,
1505
- "loss": 0.1842,
1506
  "step": 2180
1507
  },
1508
  {
1509
- "epoch": 2.5,
1510
- "learning_rate": 7.485714285714285e-05,
1511
- "loss": 0.1924,
1512
  "step": 2190
1513
  },
1514
  {
1515
- "epoch": 2.51,
1516
- "learning_rate": 7.428571428571429e-05,
1517
- "loss": 0.1774,
1518
  "step": 2200
1519
  },
1520
  {
1521
- "epoch": 2.51,
1522
- "eval_accuracy": 0.8793419845327042,
1523
- "eval_loss": 0.3325485289096832,
1524
- "eval_runtime": 113.8154,
1525
- "eval_samples_per_second": 134.059,
1526
- "eval_steps_per_second": 16.764,
1527
  "step": 2200
1528
  },
1529
  {
1530
- "epoch": 2.53,
1531
- "learning_rate": 7.371428571428572e-05,
1532
- "loss": 0.1912,
1533
  "step": 2210
1534
  },
1535
  {
1536
- "epoch": 2.54,
1537
- "learning_rate": 7.314285714285715e-05,
1538
- "loss": 0.1695,
1539
  "step": 2220
1540
  },
1541
  {
1542
- "epoch": 2.55,
1543
- "learning_rate": 7.257142857142858e-05,
1544
- "loss": 0.1797,
1545
  "step": 2230
1546
  },
1547
  {
1548
- "epoch": 2.56,
1549
- "learning_rate": 7.2e-05,
1550
- "loss": 0.1441,
1551
  "step": 2240
1552
  },
1553
  {
1554
- "epoch": 2.57,
1555
- "learning_rate": 7.142857142857143e-05,
1556
- "loss": 0.1746,
1557
  "step": 2250
1558
  },
1559
  {
1560
- "epoch": 2.58,
1561
- "learning_rate": 7.085714285714285e-05,
1562
- "loss": 0.196,
1563
  "step": 2260
1564
  },
1565
  {
1566
- "epoch": 2.59,
1567
- "learning_rate": 7.028571428571428e-05,
1568
- "loss": 0.1671,
1569
  "step": 2270
1570
  },
1571
  {
1572
- "epoch": 2.61,
1573
- "learning_rate": 6.971428571428572e-05,
1574
- "loss": 0.1702,
1575
  "step": 2280
1576
  },
1577
  {
1578
- "epoch": 2.62,
1579
- "learning_rate": 6.914285714285715e-05,
1580
- "loss": 0.1897,
1581
  "step": 2290
1582
  },
1583
  {
1584
- "epoch": 2.63,
1585
- "learning_rate": 6.857142857142858e-05,
1586
- "loss": 0.2002,
1587
  "step": 2300
1588
  },
1589
  {
1590
- "epoch": 2.63,
1591
- "eval_accuracy": 0.8785555118626295,
1592
- "eval_loss": 0.32107195258140564,
1593
- "eval_runtime": 107.3933,
1594
- "eval_samples_per_second": 142.076,
1595
- "eval_steps_per_second": 17.766,
1596
  "step": 2300
1597
  },
1598
  {
1599
- "epoch": 2.64,
1600
- "learning_rate": 6.800000000000001e-05,
1601
- "loss": 0.1792,
1602
  "step": 2310
1603
  },
1604
  {
1605
- "epoch": 2.65,
1606
- "learning_rate": 6.742857142857143e-05,
1607
- "loss": 0.1693,
1608
  "step": 2320
1609
  },
1610
  {
1611
- "epoch": 2.66,
1612
- "learning_rate": 6.685714285714286e-05,
1613
- "loss": 0.2175,
1614
  "step": 2330
1615
  },
1616
  {
1617
- "epoch": 2.67,
1618
- "learning_rate": 6.628571428571428e-05,
1619
- "loss": 0.2042,
1620
  "step": 2340
1621
  },
1622
  {
1623
- "epoch": 2.69,
1624
- "learning_rate": 6.571428571428571e-05,
1625
- "loss": 0.186,
1626
  "step": 2350
1627
  },
1628
  {
1629
- "epoch": 2.7,
1630
- "learning_rate": 6.514285714285715e-05,
1631
- "loss": 0.1862,
1632
  "step": 2360
1633
  },
1634
  {
1635
- "epoch": 2.71,
1636
- "learning_rate": 6.457142857142856e-05,
1637
- "loss": 0.1873,
1638
  "step": 2370
1639
  },
1640
  {
1641
- "epoch": 2.72,
1642
- "learning_rate": 6.400000000000001e-05,
1643
- "loss": 0.1616,
1644
  "step": 2380
1645
  },
1646
  {
1647
- "epoch": 2.73,
1648
- "learning_rate": 6.342857142857143e-05,
1649
- "loss": 0.1485,
1650
  "step": 2390
1651
  },
1652
  {
1653
- "epoch": 2.74,
1654
- "learning_rate": 6.285714285714286e-05,
1655
- "loss": 0.1508,
1656
  "step": 2400
1657
  },
1658
  {
1659
- "epoch": 2.74,
1660
- "eval_accuracy": 0.8818324813212741,
1661
- "eval_loss": 0.33117443323135376,
1662
- "eval_runtime": 109.2375,
1663
- "eval_samples_per_second": 139.677,
1664
- "eval_steps_per_second": 17.467,
1665
  "step": 2400
1666
  },
1667
  {
1668
- "epoch": 2.75,
1669
- "learning_rate": 6.22857142857143e-05,
1670
- "loss": 0.1671,
1671
  "step": 2410
1672
  },
1673
  {
1674
- "epoch": 2.77,
1675
- "learning_rate": 6.171428571428571e-05,
1676
- "loss": 0.1592,
1677
  "step": 2420
1678
  },
1679
  {
1680
- "epoch": 2.78,
1681
- "learning_rate": 6.114285714285714e-05,
1682
- "loss": 0.194,
1683
  "step": 2430
1684
  },
1685
  {
1686
- "epoch": 2.79,
1687
- "learning_rate": 6.0571428571428576e-05,
1688
- "loss": 0.1807,
1689
  "step": 2440
1690
  },
1691
  {
1692
- "epoch": 2.8,
1693
- "learning_rate": 6e-05,
1694
- "loss": 0.1739,
1695
  "step": 2450
1696
  },
1697
- {
1698
- "epoch": 2.81,
1699
- "learning_rate": 5.9428571428571434e-05,
1700
- "loss": 0.1636,
1701
- "step": 2460
1702
- },
1703
- {
1704
- "epoch": 2.82,
1705
- "learning_rate": 5.885714285714285e-05,
1706
- "loss": 0.1435,
1707
- "step": 2470
1708
- },
1709
- {
1710
- "epoch": 2.83,
1711
- "learning_rate": 5.828571428571429e-05,
1712
- "loss": 0.1587,
1713
- "step": 2480
1714
- },
1715
- {
1716
- "epoch": 2.85,
1717
- "learning_rate": 5.771428571428572e-05,
1718
- "loss": 0.1543,
1719
- "step": 2490
1720
- },
1721
- {
1722
- "epoch": 2.86,
1723
- "learning_rate": 5.714285714285714e-05,
1724
- "loss": 0.1669,
1725
- "step": 2500
1726
- },
1727
- {
1728
- "epoch": 2.86,
1729
- "eval_accuracy": 0.8854371477257832,
1730
- "eval_loss": 0.3132110834121704,
1731
- "eval_runtime": 114.59,
1732
- "eval_samples_per_second": 133.153,
1733
- "eval_steps_per_second": 16.651,
1734
- "step": 2500
1735
- },
1736
- {
1737
- "epoch": 2.87,
1738
- "learning_rate": 5.6571428571428574e-05,
1739
- "loss": 0.1679,
1740
- "step": 2510
1741
- },
1742
- {
1743
- "epoch": 2.88,
1744
- "learning_rate": 5.6000000000000006e-05,
1745
- "loss": 0.1467,
1746
- "step": 2520
1747
- },
1748
- {
1749
- "epoch": 2.89,
1750
- "learning_rate": 5.542857142857143e-05,
1751
- "loss": 0.1495,
1752
- "step": 2530
1753
- },
1754
- {
1755
- "epoch": 2.9,
1756
- "learning_rate": 5.485714285714286e-05,
1757
- "loss": 0.184,
1758
- "step": 2540
1759
- },
1760
- {
1761
- "epoch": 2.91,
1762
- "learning_rate": 5.428571428571428e-05,
1763
- "loss": 0.1741,
1764
- "step": 2550
1765
- },
1766
- {
1767
- "epoch": 2.93,
1768
- "learning_rate": 5.3714285714285714e-05,
1769
- "loss": 0.1766,
1770
- "step": 2560
1771
- },
1772
- {
1773
- "epoch": 2.94,
1774
- "learning_rate": 5.314285714285715e-05,
1775
- "loss": 0.1745,
1776
- "step": 2570
1777
- },
1778
- {
1779
- "epoch": 2.95,
1780
- "learning_rate": 5.257142857142857e-05,
1781
- "loss": 0.156,
1782
- "step": 2580
1783
- },
1784
- {
1785
- "epoch": 2.96,
1786
- "learning_rate": 5.2000000000000004e-05,
1787
- "loss": 0.1478,
1788
- "step": 2590
1789
- },
1790
- {
1791
- "epoch": 2.97,
1792
- "learning_rate": 5.142857142857143e-05,
1793
- "loss": 0.1461,
1794
- "step": 2600
1795
- },
1796
- {
1797
- "epoch": 2.97,
1798
- "eval_accuracy": 0.8883208808493905,
1799
- "eval_loss": 0.3038625717163086,
1800
- "eval_runtime": 109.6426,
1801
- "eval_samples_per_second": 139.161,
1802
- "eval_steps_per_second": 17.402,
1803
- "step": 2600
1804
- },
1805
- {
1806
- "epoch": 2.98,
1807
- "learning_rate": 5.085714285714286e-05,
1808
- "loss": 0.1587,
1809
- "step": 2610
1810
- },
1811
- {
1812
- "epoch": 2.99,
1813
- "learning_rate": 5.028571428571429e-05,
1814
- "loss": 0.1789,
1815
- "step": 2620
1816
- },
1817
- {
1818
- "epoch": 3.01,
1819
- "learning_rate": 4.971428571428572e-05,
1820
- "loss": 0.1253,
1821
- "step": 2630
1822
- },
1823
- {
1824
- "epoch": 3.02,
1825
- "learning_rate": 4.9142857142857144e-05,
1826
- "loss": 0.0804,
1827
- "step": 2640
1828
- },
1829
- {
1830
- "epoch": 3.03,
1831
- "learning_rate": 4.8571428571428576e-05,
1832
- "loss": 0.0721,
1833
- "step": 2650
1834
- },
1835
- {
1836
- "epoch": 3.04,
1837
- "learning_rate": 4.8e-05,
1838
- "loss": 0.0641,
1839
- "step": 2660
1840
- },
1841
- {
1842
- "epoch": 3.05,
1843
- "learning_rate": 4.742857142857143e-05,
1844
- "loss": 0.0762,
1845
- "step": 2670
1846
- },
1847
- {
1848
- "epoch": 3.06,
1849
- "learning_rate": 4.685714285714286e-05,
1850
- "loss": 0.0926,
1851
- "step": 2680
1852
- },
1853
- {
1854
- "epoch": 3.07,
1855
- "learning_rate": 4.628571428571429e-05,
1856
- "loss": 0.0677,
1857
- "step": 2690
1858
- },
1859
- {
1860
- "epoch": 3.09,
1861
- "learning_rate": 4.5714285714285716e-05,
1862
- "loss": 0.07,
1863
- "step": 2700
1864
- },
1865
- {
1866
- "epoch": 3.09,
1867
- "eval_accuracy": 0.8921221654214183,
1868
- "eval_loss": 0.3402355909347534,
1869
- "eval_runtime": 178.6248,
1870
- "eval_samples_per_second": 85.419,
1871
- "eval_steps_per_second": 10.682,
1872
- "step": 2700
1873
- },
1874
- {
1875
- "epoch": 3.1,
1876
- "learning_rate": 4.514285714285714e-05,
1877
- "loss": 0.0897,
1878
- "step": 2710
1879
- },
1880
- {
1881
- "epoch": 3.11,
1882
- "learning_rate": 4.4571428571428574e-05,
1883
- "loss": 0.0644,
1884
- "step": 2720
1885
- },
1886
- {
1887
- "epoch": 3.12,
1888
- "learning_rate": 4.4000000000000006e-05,
1889
- "loss": 0.0591,
1890
- "step": 2730
1891
- },
1892
- {
1893
- "epoch": 3.13,
1894
- "learning_rate": 4.342857142857143e-05,
1895
- "loss": 0.067,
1896
- "step": 2740
1897
- },
1898
- {
1899
- "epoch": 3.14,
1900
- "learning_rate": 4.2857142857142856e-05,
1901
- "loss": 0.0915,
1902
- "step": 2750
1903
- },
1904
- {
1905
- "epoch": 3.15,
1906
- "learning_rate": 4.228571428571429e-05,
1907
- "loss": 0.0737,
1908
- "step": 2760
1909
- },
1910
- {
1911
- "epoch": 3.17,
1912
- "learning_rate": 4.1714285714285714e-05,
1913
- "loss": 0.0655,
1914
- "step": 2770
1915
- },
1916
- {
1917
- "epoch": 3.18,
1918
- "learning_rate": 4.1142857142857146e-05,
1919
- "loss": 0.0657,
1920
- "step": 2780
1921
- },
1922
- {
1923
- "epoch": 3.19,
1924
- "learning_rate": 4.057142857142857e-05,
1925
- "loss": 0.0651,
1926
- "step": 2790
1927
- },
1928
- {
1929
- "epoch": 3.2,
1930
- "learning_rate": 4e-05,
1931
- "loss": 0.0637,
1932
- "step": 2800
1933
- },
1934
- {
1935
- "epoch": 3.2,
1936
- "eval_accuracy": 0.8944160440424695,
1937
- "eval_loss": 0.34459471702575684,
1938
- "eval_runtime": 102.3895,
1939
- "eval_samples_per_second": 149.019,
1940
- "eval_steps_per_second": 18.635,
1941
- "step": 2800
1942
- },
1943
- {
1944
- "epoch": 3.21,
1945
- "learning_rate": 3.942857142857143e-05,
1946
- "loss": 0.0629,
1947
- "step": 2810
1948
- },
1949
- {
1950
- "epoch": 3.22,
1951
- "learning_rate": 3.885714285714286e-05,
1952
- "loss": 0.0505,
1953
- "step": 2820
1954
- },
1955
- {
1956
- "epoch": 3.23,
1957
- "learning_rate": 3.8285714285714286e-05,
1958
- "loss": 0.0721,
1959
- "step": 2830
1960
- },
1961
- {
1962
- "epoch": 3.25,
1963
- "learning_rate": 3.771428571428572e-05,
1964
- "loss": 0.0686,
1965
- "step": 2840
1966
- },
1967
- {
1968
- "epoch": 3.26,
1969
- "learning_rate": 3.7142857142857143e-05,
1970
- "loss": 0.0703,
1971
- "step": 2850
1972
- },
1973
- {
1974
- "epoch": 3.27,
1975
- "learning_rate": 3.6571428571428576e-05,
1976
- "loss": 0.0768,
1977
- "step": 2860
1978
- },
1979
- {
1980
- "epoch": 3.28,
1981
- "learning_rate": 3.6e-05,
1982
- "loss": 0.0895,
1983
- "step": 2870
1984
- },
1985
- {
1986
- "epoch": 3.29,
1987
- "learning_rate": 3.5428571428571426e-05,
1988
- "loss": 0.0854,
1989
- "step": 2880
1990
- },
1991
- {
1992
- "epoch": 3.3,
1993
- "learning_rate": 3.485714285714286e-05,
1994
- "loss": 0.0843,
1995
- "step": 2890
1996
- },
1997
- {
1998
- "epoch": 3.31,
1999
- "learning_rate": 3.428571428571429e-05,
2000
- "loss": 0.0807,
2001
- "step": 2900
2002
- },
2003
- {
2004
- "epoch": 3.31,
2005
- "eval_accuracy": 0.8946782015991611,
2006
- "eval_loss": 0.3424582779407501,
2007
- "eval_runtime": 112.9833,
2008
- "eval_samples_per_second": 135.047,
2009
- "eval_steps_per_second": 16.887,
2010
- "step": 2900
2011
- },
2012
- {
2013
- "epoch": 3.33,
2014
- "learning_rate": 3.3714285714285716e-05,
2015
- "loss": 0.0713,
2016
- "step": 2910
2017
- },
2018
- {
2019
- "epoch": 3.34,
2020
- "learning_rate": 3.314285714285714e-05,
2021
- "loss": 0.0551,
2022
- "step": 2920
2023
- },
2024
- {
2025
- "epoch": 3.35,
2026
- "learning_rate": 3.257142857142857e-05,
2027
- "loss": 0.0642,
2028
- "step": 2930
2029
- },
2030
- {
2031
- "epoch": 3.36,
2032
- "learning_rate": 3.2000000000000005e-05,
2033
- "loss": 0.0803,
2034
- "step": 2940
2035
- },
2036
- {
2037
- "epoch": 3.37,
2038
- "learning_rate": 3.142857142857143e-05,
2039
- "loss": 0.0849,
2040
- "step": 2950
2041
- },
2042
- {
2043
- "epoch": 3.38,
2044
- "learning_rate": 3.0857142857142856e-05,
2045
- "loss": 0.0545,
2046
- "step": 2960
2047
- },
2048
- {
2049
- "epoch": 3.39,
2050
- "learning_rate": 3.0285714285714288e-05,
2051
- "loss": 0.0546,
2052
- "step": 2970
2053
- },
2054
- {
2055
- "epoch": 3.41,
2056
- "learning_rate": 2.9714285714285717e-05,
2057
- "loss": 0.0732,
2058
- "step": 2980
2059
- },
2060
- {
2061
- "epoch": 3.42,
2062
- "learning_rate": 2.9142857142857146e-05,
2063
- "loss": 0.0676,
2064
- "step": 2990
2065
- },
2066
- {
2067
- "epoch": 3.43,
2068
- "learning_rate": 2.857142857142857e-05,
2069
- "loss": 0.0637,
2070
- "step": 3000
2071
- },
2072
- {
2073
- "epoch": 3.43,
2074
- "eval_accuracy": 0.8964477651068292,
2075
- "eval_loss": 0.33960404992103577,
2076
- "eval_runtime": 107.6669,
2077
- "eval_samples_per_second": 141.715,
2078
- "eval_steps_per_second": 17.721,
2079
- "step": 3000
2080
- },
2081
- {
2082
- "epoch": 3.44,
2083
- "learning_rate": 2.8000000000000003e-05,
2084
- "loss": 0.0732,
2085
- "step": 3010
2086
- },
2087
- {
2088
- "epoch": 3.45,
2089
- "learning_rate": 2.742857142857143e-05,
2090
- "loss": 0.0704,
2091
- "step": 3020
2092
- },
2093
- {
2094
- "epoch": 3.46,
2095
- "learning_rate": 2.6857142857142857e-05,
2096
- "loss": 0.0801,
2097
- "step": 3030
2098
- },
2099
- {
2100
- "epoch": 3.47,
2101
- "learning_rate": 2.6285714285714286e-05,
2102
- "loss": 0.0606,
2103
- "step": 3040
2104
- },
2105
- {
2106
- "epoch": 3.49,
2107
- "learning_rate": 2.5714285714285714e-05,
2108
- "loss": 0.0553,
2109
- "step": 3050
2110
- },
2111
- {
2112
- "epoch": 3.5,
2113
- "learning_rate": 2.5142857142857147e-05,
2114
- "loss": 0.0411,
2115
- "step": 3060
2116
- },
2117
- {
2118
- "epoch": 3.51,
2119
- "learning_rate": 2.4571428571428572e-05,
2120
- "loss": 0.0535,
2121
- "step": 3070
2122
- },
2123
- {
2124
- "epoch": 3.52,
2125
- "learning_rate": 2.4e-05,
2126
- "loss": 0.0713,
2127
- "step": 3080
2128
- },
2129
- {
2130
- "epoch": 3.53,
2131
- "learning_rate": 2.342857142857143e-05,
2132
- "loss": 0.0562,
2133
- "step": 3090
2134
- },
2135
- {
2136
- "epoch": 3.54,
2137
- "learning_rate": 2.2857142857142858e-05,
2138
- "loss": 0.0535,
2139
- "step": 3100
2140
- },
2141
- {
2142
- "epoch": 3.54,
2143
- "eval_accuracy": 0.8971031589985581,
2144
- "eval_loss": 0.3407392203807831,
2145
- "eval_runtime": 101.8347,
2146
- "eval_samples_per_second": 149.831,
2147
- "eval_steps_per_second": 18.736,
2148
- "step": 3100
2149
- },
2150
- {
2151
- "epoch": 3.55,
2152
- "learning_rate": 2.2285714285714287e-05,
2153
- "loss": 0.0685,
2154
- "step": 3110
2155
- },
2156
- {
2157
- "epoch": 3.57,
2158
- "learning_rate": 2.1714285714285715e-05,
2159
- "loss": 0.0666,
2160
- "step": 3120
2161
- },
2162
- {
2163
- "epoch": 3.58,
2164
- "learning_rate": 2.1142857142857144e-05,
2165
- "loss": 0.0745,
2166
- "step": 3130
2167
- },
2168
- {
2169
- "epoch": 3.59,
2170
- "learning_rate": 2.0571428571428573e-05,
2171
- "loss": 0.0507,
2172
- "step": 3140
2173
- },
2174
- {
2175
- "epoch": 3.6,
2176
- "learning_rate": 2e-05,
2177
- "loss": 0.0728,
2178
- "step": 3150
2179
- },
2180
- {
2181
- "epoch": 3.61,
2182
- "learning_rate": 1.942857142857143e-05,
2183
- "loss": 0.0476,
2184
- "step": 3160
2185
- },
2186
- {
2187
- "epoch": 3.62,
2188
- "learning_rate": 1.885714285714286e-05,
2189
- "loss": 0.0601,
2190
- "step": 3170
2191
- },
2192
- {
2193
- "epoch": 3.63,
2194
- "learning_rate": 1.8285714285714288e-05,
2195
- "loss": 0.0531,
2196
- "step": 3180
2197
- },
2198
- {
2199
- "epoch": 3.65,
2200
- "learning_rate": 1.7714285714285713e-05,
2201
- "loss": 0.0543,
2202
- "step": 3190
2203
- },
2204
- {
2205
- "epoch": 3.66,
2206
- "learning_rate": 1.7142857142857145e-05,
2207
- "loss": 0.064,
2208
- "step": 3200
2209
- },
2210
- {
2211
- "epoch": 3.66,
2212
- "eval_accuracy": 0.9001835102896841,
2213
- "eval_loss": 0.3420109748840332,
2214
- "eval_runtime": 106.6302,
2215
- "eval_samples_per_second": 143.093,
2216
- "eval_steps_per_second": 17.894,
2217
- "step": 3200
2218
- },
2219
- {
2220
- "epoch": 3.67,
2221
- "learning_rate": 1.657142857142857e-05,
2222
- "loss": 0.0644,
2223
- "step": 3210
2224
- },
2225
- {
2226
- "epoch": 3.68,
2227
- "learning_rate": 1.6000000000000003e-05,
2228
- "loss": 0.0744,
2229
- "step": 3220
2230
- },
2231
- {
2232
- "epoch": 3.69,
2233
- "learning_rate": 1.5428571428571428e-05,
2234
- "loss": 0.0573,
2235
- "step": 3230
2236
- },
2237
- {
2238
- "epoch": 3.7,
2239
- "learning_rate": 1.4857142857142858e-05,
2240
- "loss": 0.0719,
2241
- "step": 3240
2242
- },
2243
- {
2244
- "epoch": 3.71,
2245
- "learning_rate": 1.4285714285714285e-05,
2246
- "loss": 0.0615,
2247
- "step": 3250
2248
- },
2249
- {
2250
- "epoch": 3.73,
2251
- "learning_rate": 1.3714285714285716e-05,
2252
- "loss": 0.0646,
2253
- "step": 3260
2254
- },
2255
- {
2256
- "epoch": 3.74,
2257
- "learning_rate": 1.3142857142857143e-05,
2258
- "loss": 0.0504,
2259
- "step": 3270
2260
- },
2261
- {
2262
- "epoch": 3.75,
2263
- "learning_rate": 1.2571428571428573e-05,
2264
- "loss": 0.0632,
2265
- "step": 3280
2266
- },
2267
- {
2268
- "epoch": 3.76,
2269
- "learning_rate": 1.2e-05,
2270
- "loss": 0.0698,
2271
- "step": 3290
2272
- },
2273
- {
2274
- "epoch": 3.77,
2275
- "learning_rate": 1.1428571428571429e-05,
2276
- "loss": 0.0707,
2277
- "step": 3300
2278
- },
2279
- {
2280
- "epoch": 3.77,
2281
- "eval_accuracy": 0.8994625770087823,
2282
- "eval_loss": 0.33135712146759033,
2283
- "eval_runtime": 104.0422,
2284
- "eval_samples_per_second": 146.652,
2285
- "eval_steps_per_second": 18.339,
2286
- "step": 3300
2287
- },
2288
- {
2289
- "epoch": 3.78,
2290
- "learning_rate": 1.0857142857142858e-05,
2291
- "loss": 0.0645,
2292
- "step": 3310
2293
- },
2294
- {
2295
- "epoch": 3.79,
2296
- "learning_rate": 1.0285714285714286e-05,
2297
- "loss": 0.051,
2298
- "step": 3320
2299
- },
2300
- {
2301
- "epoch": 3.81,
2302
- "learning_rate": 9.714285714285715e-06,
2303
- "loss": 0.063,
2304
- "step": 3330
2305
- },
2306
- {
2307
- "epoch": 3.82,
2308
- "learning_rate": 9.142857142857144e-06,
2309
- "loss": 0.0667,
2310
- "step": 3340
2311
- },
2312
- {
2313
- "epoch": 3.83,
2314
- "learning_rate": 8.571428571428573e-06,
2315
- "loss": 0.0611,
2316
- "step": 3350
2317
- },
2318
- {
2319
- "epoch": 3.84,
2320
- "learning_rate": 8.000000000000001e-06,
2321
- "loss": 0.0633,
2322
- "step": 3360
2323
- },
2324
- {
2325
- "epoch": 3.85,
2326
- "learning_rate": 7.428571428571429e-06,
2327
- "loss": 0.054,
2328
- "step": 3370
2329
- },
2330
- {
2331
- "epoch": 3.86,
2332
- "learning_rate": 6.857142857142858e-06,
2333
- "loss": 0.0645,
2334
- "step": 3380
2335
- },
2336
- {
2337
- "epoch": 3.87,
2338
- "learning_rate": 6.285714285714287e-06,
2339
- "loss": 0.0632,
2340
- "step": 3390
2341
- },
2342
- {
2343
- "epoch": 3.89,
2344
- "learning_rate": 5.7142857142857145e-06,
2345
- "loss": 0.058,
2346
- "step": 3400
2347
- },
2348
- {
2349
- "epoch": 3.89,
2350
- "eval_accuracy": 0.900249049678857,
2351
- "eval_loss": 0.32864585518836975,
2352
- "eval_runtime": 110.4035,
2353
- "eval_samples_per_second": 138.202,
2354
- "eval_steps_per_second": 17.282,
2355
- "step": 3400
2356
- },
2357
- {
2358
- "epoch": 3.9,
2359
- "learning_rate": 5.142857142857143e-06,
2360
- "loss": 0.0583,
2361
- "step": 3410
2362
- },
2363
- {
2364
- "epoch": 3.91,
2365
- "learning_rate": 4.571428571428572e-06,
2366
- "loss": 0.0833,
2367
- "step": 3420
2368
- },
2369
- {
2370
- "epoch": 3.92,
2371
- "learning_rate": 4.000000000000001e-06,
2372
- "loss": 0.0534,
2373
- "step": 3430
2374
- },
2375
- {
2376
- "epoch": 3.93,
2377
- "learning_rate": 3.428571428571429e-06,
2378
- "loss": 0.0789,
2379
- "step": 3440
2380
- },
2381
- {
2382
- "epoch": 3.94,
2383
- "learning_rate": 2.8571428571428573e-06,
2384
- "loss": 0.0608,
2385
- "step": 3450
2386
- },
2387
- {
2388
- "epoch": 3.95,
2389
- "learning_rate": 2.285714285714286e-06,
2390
- "loss": 0.0605,
2391
- "step": 3460
2392
- },
2393
- {
2394
- "epoch": 3.97,
2395
- "learning_rate": 1.7142857142857145e-06,
2396
- "loss": 0.0478,
2397
- "step": 3470
2398
- },
2399
  {
2400
  "epoch": 3.98,
2401
- "learning_rate": 1.142857142857143e-06,
2402
- "loss": 0.0661,
2403
- "step": 3480
2404
- },
2405
- {
2406
- "epoch": 3.99,
2407
- "learning_rate": 5.714285714285715e-07,
2408
- "loss": 0.0429,
2409
- "step": 3490
2410
- },
2411
- {
2412
- "epoch": 4.0,
2413
- "learning_rate": 0.0,
2414
- "loss": 0.048,
2415
- "step": 3500
2416
  },
2417
  {
2418
  "epoch": 4.0,
2419
- "eval_accuracy": 0.9012976799056233,
2420
- "eval_loss": 0.3263189494609833,
2421
- "eval_runtime": 103.7636,
2422
- "eval_samples_per_second": 147.046,
2423
- "eval_steps_per_second": 18.388,
2424
- "step": 3500
2425
  },
2426
  {
2427
  "epoch": 4.0,
2428
- "step": 3500,
2429
- "total_flos": 3.4689445074730156e+19,
2430
- "train_loss": 0.2980680786711829,
2431
- "train_runtime": 7553.3952,
2432
- "train_samples_per_second": 59.264,
2433
- "train_steps_per_second": 0.463
2434
  }
2435
  ],
2436
  "logging_steps": 10,
2437
- "max_steps": 3500,
2438
  "num_train_epochs": 4,
2439
  "save_steps": 100,
2440
- "total_flos": 3.4689445074730156e+19,
2441
  "trial_name": null,
2442
  "trial_params": null
2443
  }
 
1
  {
2
+ "best_metric": 0.2805336117744446,
3
+ "best_model_checkpoint": "./vit-base-tm/checkpoint-1800",
4
  "epoch": 4.0,
5
  "eval_steps": 100,
6
+ "global_step": 2472,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
10
  "log_history": [
  {
+ "epoch": 0.02,
+ "learning_rate": 0.00019919093851132686,
+ "loss": 1.2237,
  "step": 10
  },
  {
+ "epoch": 0.03,
+ "learning_rate": 0.00019838187702265374,
+ "loss": 1.1003,
  "step": 20
  },
  {
+ "epoch": 0.05,
+ "learning_rate": 0.0001975728155339806,
+ "loss": 1.0309,
  "step": 30
  },
  {
+ "epoch": 0.06,
+ "learning_rate": 0.00019676375404530745,
+ "loss": 0.9636,
  "step": 40
  },
  {
+ "epoch": 0.08,
+ "learning_rate": 0.00019595469255663433,
+ "loss": 0.8934,
  "step": 50
  },
  {
+ "epoch": 0.1,
+ "learning_rate": 0.00019514563106796118,
+ "loss": 0.8528,
  "step": 60
  },
  {
+ "epoch": 0.11,
+ "learning_rate": 0.00019433656957928804,
+ "loss": 0.8753,
  "step": 70
  },
  {
+ "epoch": 0.13,
+ "learning_rate": 0.00019352750809061492,
+ "loss": 0.8577,
  "step": 80
  },
  {
+ "epoch": 0.15,
+ "learning_rate": 0.00019271844660194177,
+ "loss": 0.821,
  "step": 90
  },
  {
+ "epoch": 0.16,
+ "learning_rate": 0.00019190938511326862,
+ "loss": 0.7945,
  "step": 100
  },
  {
+ "epoch": 0.16,
+ "eval_accuracy": 0.6915671572711284,
+ "eval_loss": 0.7725738286972046,
+ "eval_runtime": 149.5556,
+ "eval_samples_per_second": 144.468,
+ "eval_steps_per_second": 18.06,
  "step": 100
  },
  {
+ "epoch": 0.18,
+ "learning_rate": 0.00019110032362459548,
+ "loss": 0.775,
  "step": 110
  },
  {
+ "epoch": 0.19,
+ "learning_rate": 0.00019029126213592236,
+ "loss": 0.7727,
  "step": 120
  },
  {
+ "epoch": 0.21,
+ "learning_rate": 0.0001894822006472492,
+ "loss": 0.7341,
  "step": 130
  },
  {
+ "epoch": 0.23,
+ "learning_rate": 0.00018867313915857606,
+ "loss": 0.7121,
  "step": 140
  },
  {
+ "epoch": 0.24,
+ "learning_rate": 0.00018786407766990291,
+ "loss": 0.7187,
  "step": 150
  },
  {
+ "epoch": 0.26,
+ "learning_rate": 0.00018705501618122977,
+ "loss": 0.7346,
  "step": 160
  },
  {
+ "epoch": 0.28,
+ "learning_rate": 0.00018624595469255665,
+ "loss": 0.6781,
  "step": 170
  },
  {
+ "epoch": 0.29,
+ "learning_rate": 0.0001854368932038835,
+ "loss": 0.7102,
  "step": 180
  },
  {
+ "epoch": 0.31,
+ "learning_rate": 0.00018462783171521035,
+ "loss": 0.6582,
  "step": 190
  },
  {
+ "epoch": 0.32,
+ "learning_rate": 0.0001838187702265372,
+ "loss": 0.6502,
  "step": 200
  },
  {
+ "epoch": 0.32,
+ "eval_accuracy": 0.7326668518004258,
+ "eval_loss": 0.6581294536590576,
+ "eval_runtime": 156.2922,
+ "eval_samples_per_second": 138.241,
+ "eval_steps_per_second": 17.282,
  "step": 200
  },
  {
+ "epoch": 0.34,
+ "learning_rate": 0.0001830097087378641,
+ "loss": 0.6807,
  "step": 210
  },
  {
+ "epoch": 0.36,
+ "learning_rate": 0.00018220064724919094,
+ "loss": 0.683,
  "step": 220
  },
  {
+ "epoch": 0.37,
+ "learning_rate": 0.0001813915857605178,
+ "loss": 0.6362,
  "step": 230
  },
  {
+ "epoch": 0.39,
+ "learning_rate": 0.00018058252427184467,
+ "loss": 0.6389,
  "step": 240
  },
  {
+ "epoch": 0.4,
+ "learning_rate": 0.00017977346278317153,
+ "loss": 0.6333,
  "step": 250
  },
  {
+ "epoch": 0.42,
+ "learning_rate": 0.00017896440129449838,
+ "loss": 0.6116,
  "step": 260
  },
  {
+ "epoch": 0.44,
+ "learning_rate": 0.00017815533980582526,
+ "loss": 0.6021,
  "step": 270
  },
  {
+ "epoch": 0.45,
+ "learning_rate": 0.0001773462783171521,
+ "loss": 0.5889,
  "step": 280
  },
  {
+ "epoch": 0.47,
+ "learning_rate": 0.00017653721682847897,
+ "loss": 0.5977,
  "step": 290
  },
  {
+ "epoch": 0.49,
+ "learning_rate": 0.00017572815533980585,
+ "loss": 0.5671,
  "step": 300
  },
  {
+ "epoch": 0.49,
+ "eval_accuracy": 0.7589095621586597,
+ "eval_loss": 0.5891677737236023,
+ "eval_runtime": 152.5595,
+ "eval_samples_per_second": 141.623,
+ "eval_steps_per_second": 17.705,
  "step": 300
  },
  {
+ "epoch": 0.5,
+ "learning_rate": 0.0001749190938511327,
+ "loss": 0.6115,
  "step": 310
  },
  {
+ "epoch": 0.52,
+ "learning_rate": 0.00017411003236245955,
+ "loss": 0.5955,
  "step": 320
  },
  {
+ "epoch": 0.53,
+ "learning_rate": 0.0001733009708737864,
+ "loss": 0.573,
  "step": 330
  },
  {
+ "epoch": 0.55,
+ "learning_rate": 0.00017249190938511329,
+ "loss": 0.5681,
  "step": 340
  },
  {
+ "epoch": 0.57,
+ "learning_rate": 0.00017168284789644014,
+ "loss": 0.5624,
  "step": 350
  },
  {
+ "epoch": 0.58,
+ "learning_rate": 0.000170873786407767,
+ "loss": 0.5591,
  "step": 360
  },
  {
+ "epoch": 0.6,
+ "learning_rate": 0.00017006472491909387,
+ "loss": 0.5836,
  "step": 370
  },
  {
+ "epoch": 0.61,
+ "learning_rate": 0.00016925566343042073,
+ "loss": 0.5589,
  "step": 380
  },
  {
+ "epoch": 0.63,
+ "learning_rate": 0.00016844660194174758,
+ "loss": 0.5433,
  "step": 390
  },
  {
+ "epoch": 0.65,
+ "learning_rate": 0.00016763754045307446,
+ "loss": 0.5625,
  "step": 400
  },
  {
+ "epoch": 0.65,
+ "eval_accuracy": 0.7753864667222068,
+ "eval_loss": 0.5423863530158997,
+ "eval_runtime": 153.1943,
+ "eval_samples_per_second": 141.037,
+ "eval_steps_per_second": 17.631,
  "step": 400
  },
  {
+ "epoch": 0.66,
+ "learning_rate": 0.0001668284789644013,
+ "loss": 0.5507,
  "step": 410
  },
  {
+ "epoch": 0.68,
+ "learning_rate": 0.00016601941747572817,
+ "loss": 0.5439,
  "step": 420
  },
  {
+ "epoch": 0.7,
+ "learning_rate": 0.00016521035598705505,
+ "loss": 0.5362,
  "step": 430
  },
  {
+ "epoch": 0.71,
+ "learning_rate": 0.0001644012944983819,
+ "loss": 0.5375,
  "step": 440
  },
  {
+ "epoch": 0.73,
+ "learning_rate": 0.00016359223300970875,
+ "loss": 0.5379,
  "step": 450
  },
  {
+ "epoch": 0.74,
+ "learning_rate": 0.0001627831715210356,
+ "loss": 0.5066,
  "step": 460
  },
  {
+ "epoch": 0.76,
+ "learning_rate": 0.00016197411003236246,
+ "loss": 0.5419,
  "step": 470
  },
  {
+ "epoch": 0.78,
+ "learning_rate": 0.0001611650485436893,
+ "loss": 0.476,
  "step": 480
  },
  {
+ "epoch": 0.79,
+ "learning_rate": 0.0001603559870550162,
+ "loss": 0.522,
  "step": 490
  },
  {
+ "epoch": 0.81,
+ "learning_rate": 0.00015954692556634304,
+ "loss": 0.5115,
  "step": 500
  },
  {
+ "epoch": 0.81,
+ "eval_accuracy": 0.7930667407201704,
+ "eval_loss": 0.4989684820175171,
+ "eval_runtime": 148.2277,
+ "eval_samples_per_second": 145.762,
+ "eval_steps_per_second": 18.222,
  "step": 500
  },
  {
+ "epoch": 0.83,
+ "learning_rate": 0.0001587378640776699,
+ "loss": 0.4866,
  "step": 510
  },
  {
+ "epoch": 0.84,
+ "learning_rate": 0.00015792880258899675,
+ "loss": 0.495,
  "step": 520
  },
  {
+ "epoch": 0.86,
+ "learning_rate": 0.00015711974110032363,
+ "loss": 0.5037,
  "step": 530
  },
  {
+ "epoch": 0.87,
+ "learning_rate": 0.00015631067961165048,
+ "loss": 0.4769,
  "step": 540
  },
  {
+ "epoch": 0.89,
+ "learning_rate": 0.00015550161812297734,
+ "loss": 0.4818,
  "step": 550
  },
  {
+ "epoch": 0.91,
+ "learning_rate": 0.00015469255663430422,
+ "loss": 0.4968,
  "step": 560
  },
  {
+ "epoch": 0.92,
+ "learning_rate": 0.00015388349514563107,
+ "loss": 0.5027,
  "step": 570
  },
  {
+ "epoch": 0.94,
+ "learning_rate": 0.00015307443365695792,
+ "loss": 0.4727,
  "step": 580
  },
  {
+ "epoch": 0.95,
+ "learning_rate": 0.0001522653721682848,
+ "loss": 0.4959,
  "step": 590
  },
  {
+ "epoch": 0.97,
+ "learning_rate": 0.00015145631067961166,
+ "loss": 0.4643,
  "step": 600
  },
  {
+ "epoch": 0.97,
+ "eval_accuracy": 0.8040359159492734,
+ "eval_loss": 0.4709508717060089,
+ "eval_runtime": 149.3766,
+ "eval_samples_per_second": 144.641,
+ "eval_steps_per_second": 18.082,
  "step": 600
  },
  {
+ "epoch": 0.99,
+ "learning_rate": 0.0001506472491909385,
+ "loss": 0.4721,
  "step": 610
  },
  {
+ "epoch": 1.0,
+ "learning_rate": 0.0001498381877022654,
+ "loss": 0.4329,
  "step": 620
  },
  {
+ "epoch": 1.02,
+ "learning_rate": 0.00014902912621359224,
+ "loss": 0.4058,
  "step": 630
  },
  {
+ "epoch": 1.04,
+ "learning_rate": 0.0001482200647249191,
+ "loss": 0.3745,
  "step": 640
  },
  {
+ "epoch": 1.05,
+ "learning_rate": 0.00014741100323624598,
+ "loss": 0.371,
  "step": 650
  },
  {
+ "epoch": 1.07,
+ "learning_rate": 0.00014660194174757283,
+ "loss": 0.335,
  "step": 660
  },
  {
+ "epoch": 1.08,
+ "learning_rate": 0.00014579288025889968,
+ "loss": 0.3992,
  "step": 670
  },
  {
+ "epoch": 1.1,
+ "learning_rate": 0.00014498381877022656,
+ "loss": 0.3706,
  "step": 680
  },
  {
+ "epoch": 1.12,
+ "learning_rate": 0.00014417475728155342,
+ "loss": 0.3747,
  "step": 690
  },
  {
+ "epoch": 1.13,
+ "learning_rate": 0.00014336569579288027,
+ "loss": 0.3586,
  "step": 700
  },
  {
+ "epoch": 1.13,
+ "eval_accuracy": 0.8220864574655188,
+ "eval_loss": 0.4336797595024109,
+ "eval_runtime": 152.9157,
+ "eval_samples_per_second": 141.293,
+ "eval_steps_per_second": 17.663,
  "step": 700
  },
  {
+ "epoch": 1.15,
+ "learning_rate": 0.00014255663430420715,
+ "loss": 0.3842,
  "step": 710
  },
  {
+ "epoch": 1.17,
+ "learning_rate": 0.000141747572815534,
+ "loss": 0.3386,
  "step": 720
  },
  {
+ "epoch": 1.18,
+ "learning_rate": 0.00014093851132686086,
+ "loss": 0.3737,
  "step": 730
  },
  {
+ "epoch": 1.2,
+ "learning_rate": 0.0001401294498381877,
+ "loss": 0.3695,
  "step": 740
  },
  {
+ "epoch": 1.21,
+ "learning_rate": 0.00013932038834951456,
+ "loss": 0.3766,
  "step": 750
  },
  {
+ "epoch": 1.23,
+ "learning_rate": 0.00013851132686084141,
+ "loss": 0.3636,
  "step": 760
  },
  {
+ "epoch": 1.25,
+ "learning_rate": 0.0001377022653721683,
+ "loss": 0.3419,
  "step": 770
  },
  {
+ "epoch": 1.26,
+ "learning_rate": 0.00013689320388349515,
+ "loss": 0.382,
  "step": 780
  },
  {
+ "epoch": 1.28,
+ "learning_rate": 0.000136084142394822,
+ "loss": 0.355,
  "step": 790
  },
  {
+ "epoch": 1.29,
+ "learning_rate": 0.00013527508090614885,
+ "loss": 0.3421,
  "step": 800
  },
  {
+ "epoch": 1.29,
+ "eval_accuracy": 0.8337498842914005,
+ "eval_loss": 0.4096640944480896,
+ "eval_runtime": 162.2619,
+ "eval_samples_per_second": 133.155,
+ "eval_steps_per_second": 16.646,
  "step": 800
  },
  {
+ "epoch": 1.31,
+ "learning_rate": 0.00013446601941747573,
+ "loss": 0.3532,
  "step": 810
  },
  {
+ "epoch": 1.33,
+ "learning_rate": 0.0001336569579288026,
+ "loss": 0.3251,
  "step": 820
  },
  {
+ "epoch": 1.34,
+ "learning_rate": 0.00013284789644012944,
+ "loss": 0.3406,
  "step": 830
  },
  {
+ "epoch": 1.36,
+ "learning_rate": 0.00013203883495145632,
+ "loss": 0.356,
  "step": 840
  },
  {
+ "epoch": 1.38,
+ "learning_rate": 0.00013122977346278317,
+ "loss": 0.3554,
  "step": 850
  },
  {
+ "epoch": 1.39,
+ "learning_rate": 0.00013042071197411003,
+ "loss": 0.3502,
  "step": 860
  },
  {
+ "epoch": 1.41,
+ "learning_rate": 0.0001296116504854369,
+ "loss": 0.3147,
  "step": 870
  },
  {
+ "epoch": 1.42,
+ "learning_rate": 0.00012880258899676376,
+ "loss": 0.3399,
  "step": 880
  },
  {
+ "epoch": 1.44,
+ "learning_rate": 0.00012799352750809061,
+ "loss": 0.3108,
  "step": 890
  },
  {
+ "epoch": 1.46,
+ "learning_rate": 0.0001271844660194175,
+ "loss": 0.3478,
  "step": 900
  },
  {
+ "epoch": 1.46,
+ "eval_accuracy": 0.844626492640933,
+ "eval_loss": 0.38169920444488525,
+ "eval_runtime": 156.4108,
+ "eval_samples_per_second": 138.136,
+ "eval_steps_per_second": 17.269,
  "step": 900
  },
  {
+ "epoch": 1.47,
+ "learning_rate": 0.00012637540453074435,
+ "loss": 0.3344,
  "step": 910
  },
  {
+ "epoch": 1.49,
+ "learning_rate": 0.0001255663430420712,
+ "loss": 0.323,
  "step": 920
  },
  {
+ "epoch": 1.5,
+ "learning_rate": 0.00012475728155339805,
+ "loss": 0.3386,
  "step": 930
  },
  {
+ "epoch": 1.52,
+ "learning_rate": 0.00012394822006472493,
+ "loss": 0.3236,
  "step": 940
  },
  {
+ "epoch": 1.54,
+ "learning_rate": 0.00012313915857605179,
+ "loss": 0.332,
  "step": 950
  },
  {
+ "epoch": 1.55,
+ "learning_rate": 0.00012233009708737864,
+ "loss": 0.3382,
  "step": 960
  },
  {
+ "epoch": 1.57,
+ "learning_rate": 0.0001215210355987055,
+ "loss": 0.3033,
  "step": 970
  },
  {
+ "epoch": 1.59,
+ "learning_rate": 0.00012071197411003237,
+ "loss": 0.3358,
  "step": 980
  },
  {
+ "epoch": 1.6,
+ "learning_rate": 0.00011990291262135923,
+ "loss": 0.3098,
  "step": 990
  },
  {
+ "epoch": 1.62,
+ "learning_rate": 0.00011909385113268609,
+ "loss": 0.2965,
  "step": 1000
  },
  {
+ "epoch": 1.62,
+ "eval_accuracy": 0.8456910117559937,
+ "eval_loss": 0.37541016936302185,
+ "eval_runtime": 154.9131,
+ "eval_samples_per_second": 139.472,
+ "eval_steps_per_second": 17.436,
  "step": 1000
  },
  {
+ "epoch": 1.63,
+ "learning_rate": 0.00011828478964401295,
+ "loss": 0.2885,
  "step": 1010
  },
  {
+ "epoch": 1.65,
+ "learning_rate": 0.0001174757281553398,
+ "loss": 0.3348,
  "step": 1020
  },
  {
+ "epoch": 1.67,
+ "learning_rate": 0.00011666666666666668,
+ "loss": 0.3234,
  "step": 1030
  },
  {
+ "epoch": 1.68,
+ "learning_rate": 0.00011585760517799353,
+ "loss": 0.2942,
  "step": 1040
  },
  {
+ "epoch": 1.7,
+ "learning_rate": 0.00011504854368932039,
+ "loss": 0.3038,
  "step": 1050
  },
  {
+ "epoch": 1.72,
+ "learning_rate": 0.00011423948220064727,
+ "loss": 0.2935,
  "step": 1060
  },
  {
+ "epoch": 1.73,
+ "learning_rate": 0.00011343042071197412,
+ "loss": 0.3229,
  "step": 1070
  },
  {
+ "epoch": 1.75,
+ "learning_rate": 0.00011262135922330097,
+ "loss": 0.2824,
  "step": 1080
  },
  {
+ "epoch": 1.76,
+ "learning_rate": 0.00011181229773462785,
+ "loss": 0.3171,
  "step": 1090
  },
  {
+ "epoch": 1.78,
+ "learning_rate": 0.0001110032362459547,
+ "loss": 0.2986,
  "step": 1100
  },
  {
+ "epoch": 1.78,
+ "eval_accuracy": 0.8550402665926131,
+ "eval_loss": 0.3548347055912018,
+ "eval_runtime": 147.1647,
+ "eval_samples_per_second": 146.815,
+ "eval_steps_per_second": 18.354,
  "step": 1100
  },
  {
+ "epoch": 1.8,
+ "learning_rate": 0.00011019417475728156,
+ "loss": 0.3117,
  "step": 1110
  },
  {
+ "epoch": 1.81,
+ "learning_rate": 0.00010938511326860842,
+ "loss": 0.3094,
  "step": 1120
  },
  {
+ "epoch": 1.83,
+ "learning_rate": 0.00010857605177993528,
+ "loss": 0.3089,
  "step": 1130
  },
  {
+ "epoch": 1.84,
+ "learning_rate": 0.00010776699029126213,
+ "loss": 0.2854,
  "step": 1140
  },
  {
+ "epoch": 1.86,
+ "learning_rate": 0.000106957928802589,
+ "loss": 0.2747,
  "step": 1150
  },
  {
+ "epoch": 1.88,
+ "learning_rate": 0.00010614886731391586,
+ "loss": 0.2944,
  "step": 1160
  },
  {
+ "epoch": 1.89,
+ "learning_rate": 0.00010533980582524272,
+ "loss": 0.307,
  "step": 1170
  },
  {
+ "epoch": 1.91,
+ "learning_rate": 0.00010453074433656957,
+ "loss": 0.2829,
  "step": 1180
  },
  {
+ "epoch": 1.93,
+ "learning_rate": 0.00010372168284789645,
+ "loss": 0.3038,
  "step": 1190
  },
  {
+ "epoch": 1.94,
+ "learning_rate": 0.0001029126213592233,
+ "loss": 0.2932,
  "step": 1200
  },
  {
+ "epoch": 1.94,
+ "eval_accuracy": 0.863325002314172,
+ "eval_loss": 0.3386863172054291,
+ "eval_runtime": 149.1575,
+ "eval_samples_per_second": 144.854,
+ "eval_steps_per_second": 18.108,
  "step": 1200
  },
  {
+ "epoch": 1.96,
+ "learning_rate": 0.00010210355987055016,
+ "loss": 0.2873,
  "step": 1210
  },
  {
+ "epoch": 1.97,
+ "learning_rate": 0.00010129449838187704,
+ "loss": 0.2857,
  "step": 1220
  },
  {
+ "epoch": 1.99,
+ "learning_rate": 0.00010048543689320389,
+ "loss": 0.2621,
  "step": 1230
  },
  {
+ "epoch": 2.01,
+ "learning_rate": 9.967637540453076e-05,
+ "loss": 0.2392,
  "step": 1240
  },
  {
+ "epoch": 2.02,
+ "learning_rate": 9.886731391585761e-05,
+ "loss": 0.1795,
  "step": 1250
  },
  {
+ "epoch": 2.04,
+ "learning_rate": 9.805825242718448e-05,
+ "loss": 0.1613,
  "step": 1260
  },
  {
+ "epoch": 2.06,
+ "learning_rate": 9.724919093851133e-05,
+ "loss": 0.1699,
  "step": 1270
  },
  {
+ "epoch": 2.07,
+ "learning_rate": 9.64401294498382e-05,
+ "loss": 0.1787,
  "step": 1280
  },
  {
+ "epoch": 2.09,
+ "learning_rate": 9.563106796116505e-05,
+ "loss": 0.1731,
  "step": 1290
  },
  {
+ "epoch": 2.1,
+ "learning_rate": 9.48220064724919e-05,
+ "loss": 0.1701,
  "step": 1300
  },
  {
+ "epoch": 2.1,
+ "eval_accuracy": 0.8677219290937702,
+ "eval_loss": 0.3415055274963379,
+ "eval_runtime": 160.755,
+ "eval_samples_per_second": 134.403,
+ "eval_steps_per_second": 16.802,
  "step": 1300
  },
  {
+ "epoch": 2.12,
+ "learning_rate": 9.401294498381877e-05,
+ "loss": 0.1629,
  "step": 1310
  },
  {
+ "epoch": 2.14,
+ "learning_rate": 9.320388349514564e-05,
+ "loss": 0.1694,
  "step": 1320
  },
  {
+ "epoch": 2.15,
+ "learning_rate": 9.239482200647249e-05,
+ "loss": 0.1644,
  "step": 1330
  },
  {
+ "epoch": 2.17,
+ "learning_rate": 9.158576051779936e-05,
+ "loss": 0.1631,
  "step": 1340
  },
  {
+ "epoch": 2.18,
+ "learning_rate": 9.077669902912622e-05,
+ "loss": 0.1786,
  "step": 1350
  },
  {
+ "epoch": 2.2,
+ "learning_rate": 8.996763754045308e-05,
+ "loss": 0.1792,
  "step": 1360
  },
  {
+ "epoch": 2.22,
+ "learning_rate": 8.915857605177994e-05,
+ "loss": 0.1913,
  "step": 1370
  },
  {
+ "epoch": 2.23,
+ "learning_rate": 8.834951456310681e-05,
+ "loss": 0.1751,
  "step": 1380
  },
  {
+ "epoch": 2.25,
+ "learning_rate": 8.754045307443366e-05,
+ "loss": 0.165,
  "step": 1390
  },
  {
+ "epoch": 2.27,
+ "learning_rate": 8.673139158576053e-05,
+ "loss": 0.1891,
  "step": 1400
  },
  {
+ "epoch": 2.27,
+ "eval_accuracy": 0.876562066092752,
+ "eval_loss": 0.3259858787059784,
+ "eval_runtime": 179.0501,
+ "eval_samples_per_second": 120.67,
+ "eval_steps_per_second": 15.085,
  "step": 1400
  },
  {
+ "epoch": 2.28,
+ "learning_rate": 8.592233009708738e-05,
+ "loss": 0.1645,
  "step": 1410
  },
  {
+ "epoch": 2.3,
+ "learning_rate": 8.511326860841425e-05,
+ "loss": 0.1749,
  "step": 1420
  },
  {
+ "epoch": 2.31,
+ "learning_rate": 8.43042071197411e-05,
+ "loss": 0.1635,
  "step": 1430
  },
  {
+ "epoch": 2.33,
+ "learning_rate": 8.349514563106797e-05,
+ "loss": 0.1526,
  "step": 1440
  },
  {
+ "epoch": 2.35,
+ "learning_rate": 8.268608414239482e-05,
+ "loss": 0.1786,
  "step": 1450
  },
  {
+ "epoch": 2.36,
+ "learning_rate": 8.187702265372169e-05,
+ "loss": 0.1726,
  "step": 1460
  },
  {
+ "epoch": 2.38,
+ "learning_rate": 8.106796116504854e-05,
+ "loss": 0.1625,
  "step": 1470
  },
  {
+ "epoch": 2.39,
+ "learning_rate": 8.025889967637541e-05,
+ "loss": 0.156,
  "step": 1480
  },
  {
+ "epoch": 2.41,
+ "learning_rate": 7.944983818770227e-05,
+ "loss": 0.1755,
  "step": 1490
  },
  {
+ "epoch": 2.43,
+ "learning_rate": 7.864077669902913e-05,
+ "loss": 0.1741,
  "step": 1500
  },
  {
+ "epoch": 2.43,
+ "eval_accuracy": 0.8817920947884846,
+ "eval_loss": 0.31030356884002686,
+ "eval_runtime": 146.2047,
+ "eval_samples_per_second": 147.779,
+ "eval_steps_per_second": 18.474,
  "step": 1500
  },
  {
+ "epoch": 2.44,
+ "learning_rate": 7.7831715210356e-05,
+ "loss": 0.1709,
  "step": 1510
  },
  {
+ "epoch": 2.46,
+ "learning_rate": 7.702265372168285e-05,
+ "loss": 0.1689,
  "step": 1520
  },
  {
+ "epoch": 2.48,
+ "learning_rate": 7.621359223300971e-05,
+ "loss": 0.1458,
  "step": 1530
  },
  {
+ "epoch": 2.49,
+ "learning_rate": 7.540453074433658e-05,
+ "loss": 0.1569,
  "step": 1540
  },
  {
+ "epoch": 2.51,
+ "learning_rate": 7.459546925566343e-05,
+ "loss": 0.1689,
  "step": 1550
  },
  {
+ "epoch": 2.52,
+ "learning_rate": 7.37864077669903e-05,
+ "loss": 0.1571,
  "step": 1560
  },
  {
+ "epoch": 2.54,
+ "learning_rate": 7.297734627831717e-05,
+ "loss": 0.153,
  "step": 1570
  },
  {
+ "epoch": 2.56,
+ "learning_rate": 7.216828478964402e-05,
+ "loss": 0.1585,
  "step": 1580
  },
  {
+ "epoch": 2.57,
+ "learning_rate": 7.135922330097087e-05,
+ "loss": 0.1511,
  "step": 1590
  },
  {
+ "epoch": 2.59,
+ "learning_rate": 7.055016181229773e-05,
+ "loss": 0.1542,
  "step": 1600
  },
  {
+ "epoch": 2.59,
+ "eval_accuracy": 0.8868832731648616,
+ "eval_loss": 0.30611881613731384,
+ "eval_runtime": 148.9187,
+ "eval_samples_per_second": 145.086,
+ "eval_steps_per_second": 18.137,
  "step": 1600
  },
  {
+ "epoch": 2.61,
+ "learning_rate": 6.974110032362459e-05,
+ "loss": 0.1684,
  "step": 1610
  },
  {
+ "epoch": 2.62,
+ "learning_rate": 6.893203883495146e-05,
+ "loss": 0.1593,
  "step": 1620
  },
  {
+ "epoch": 2.64,
+ "learning_rate": 6.812297734627831e-05,
+ "loss": 0.1601,
  "step": 1630
  },
  {
+ "epoch": 2.65,
+ "learning_rate": 6.731391585760518e-05,
+ "loss": 0.1408,
  "step": 1640
  },
  {
+ "epoch": 2.67,
+ "learning_rate": 6.650485436893205e-05,
+ "loss": 0.1542,
  "step": 1650
  },
  {
+ "epoch": 2.69,
+ "learning_rate": 6.56957928802589e-05,
+ "loss": 0.1609,
  "step": 1660
  },
  {
+ "epoch": 2.7,
+ "learning_rate": 6.488673139158577e-05,
+ "loss": 0.1528,
  "step": 1670
  },
  {
+ "epoch": 2.72,
+ "learning_rate": 6.407766990291263e-05,
+ "loss": 0.1447,
  "step": 1680
  },
  {
+ "epoch": 2.73,
+ "learning_rate": 6.326860841423949e-05,
+ "loss": 0.1522,
  "step": 1690
  },
  {
+ "epoch": 2.75,
+ "learning_rate": 6.245954692556635e-05,
+ "loss": 0.172,
  "step": 1700
  },
  {
+ "epoch": 2.75,
+ "eval_accuracy": 0.8887808941960567,
+ "eval_loss": 0.2924608886241913,
+ "eval_runtime": 152.8021,
+ "eval_samples_per_second": 141.399,
+ "eval_steps_per_second": 17.676,
  "step": 1700
  },
  {
+ "epoch": 2.77,
+ "learning_rate": 6.16504854368932e-05,
+ "loss": 0.1467,
  "step": 1710
  },
  {
+ "epoch": 2.78,
+ "learning_rate": 6.0841423948220065e-05,
+ "loss": 0.1678,
  "step": 1720
  },
  {
+ "epoch": 2.8,
+ "learning_rate": 6.003236245954693e-05,
+ "loss": 0.1538,
  "step": 1730
  },
  {
+ "epoch": 2.82,
+ "learning_rate": 5.9223300970873785e-05,
+ "loss": 0.1609,
  "step": 1740
  },
  {
+ "epoch": 2.83,
+ "learning_rate": 5.841423948220065e-05,
+ "loss": 0.1454,
  "step": 1750
  },
  {
+ "epoch": 2.85,
+ "learning_rate": 5.760517799352752e-05,
+ "loss": 0.1505,
  "step": 1760
  },
  {
+ "epoch": 2.86,
+ "learning_rate": 5.679611650485437e-05,
+ "loss": 0.1344,
  "step": 1770
  },
  {
+ "epoch": 2.88,
+ "learning_rate": 5.598705501618123e-05,
+ "loss": 0.1425,
  "step": 1780
  },
  {
+ "epoch": 2.9,
+ "learning_rate": 5.51779935275081e-05,
+ "loss": 0.143,
  "step": 1790
  },
  {
+ "epoch": 2.91,
+ "learning_rate": 5.436893203883495e-05,
+ "loss": 0.1575,
  "step": 1800
  },
  {
+ "epoch": 2.91,
+ "eval_accuracy": 0.8962788114412663,
+ "eval_loss": 0.2805336117744446,
+ "eval_runtime": 148.8673,
+ "eval_samples_per_second": 145.136,
+ "eval_steps_per_second": 18.144,
  "step": 1800
  },
  {
+ "epoch": 2.93,
+ "learning_rate": 5.355987055016182e-05,
+ "loss": 0.1469,
  "step": 1810
  },
  {
+ "epoch": 2.94,
+ "learning_rate": 5.275080906148867e-05,
+ "loss": 0.1529,
  "step": 1820
  },
  {
+ "epoch": 2.96,
+ "learning_rate": 5.194174757281554e-05,
+ "loss": 0.1346,
  "step": 1830
  },
  {
+ "epoch": 2.98,
+ "learning_rate": 5.11326860841424e-05,
+ "loss": 0.1406,
  "step": 1840
  },
  {
+ "epoch": 2.99,
+ "learning_rate": 5.032362459546926e-05,
+ "loss": 0.1352,
  "step": 1850
  },
  {
+ "epoch": 3.01,
+ "learning_rate": 4.951456310679612e-05,
+ "loss": 0.1039,
  "step": 1860
  },
  {
+ "epoch": 3.03,
+ "learning_rate": 4.870550161812298e-05,
+ "loss": 0.0652,
  "step": 1870
  },
  {
+ "epoch": 3.04,
+ "learning_rate": 4.789644012944984e-05,
+ "loss": 0.0643,
  "step": 1880
  },
  {
+ "epoch": 3.06,
+ "learning_rate": 4.7087378640776703e-05,
+ "loss": 0.0767,
  "step": 1890
  },
  {
+ "epoch": 3.07,
+ "learning_rate": 4.627831715210356e-05,
+ "loss": 0.0698,
  "step": 1900
  },
  {
+ "epoch": 3.07,
+ "eval_accuracy": 0.8984078496713875,
+ "eval_loss": 0.3030557930469513,
+ "eval_runtime": 151.4685,
+ "eval_samples_per_second": 142.643,
+ "eval_steps_per_second": 17.832,
  "step": 1900
  },
  {
+ "epoch": 3.09,
+ "learning_rate": 4.546925566343042e-05,
+ "loss": 0.0714,
  "step": 1910
  },
  {
+ "epoch": 3.11,
+ "learning_rate": 4.466019417475728e-05,
+ "loss": 0.0615,
  "step": 1920
  },
  {
+ "epoch": 3.12,
+ "learning_rate": 4.385113268608414e-05,
+ "loss": 0.0606,
  "step": 1930
  },
  {
+ "epoch": 3.14,
+ "learning_rate": 4.3042071197411e-05,
+ "loss": 0.067,
  "step": 1940
  },
  {
+ "epoch": 3.16,
+ "learning_rate": 4.223300970873786e-05,
+ "loss": 0.0671,
  "step": 1950
  },
  {
+ "epoch": 3.17,
+ "learning_rate": 4.142394822006473e-05,
+ "loss": 0.059,
  "step": 1960
  },
  {
+ "epoch": 3.19,
+ "learning_rate": 4.061488673139159e-05,
+ "loss": 0.0617,
  "step": 1970
  },
  {
+ "epoch": 3.2,
+ "learning_rate": 3.980582524271845e-05,
+ "loss": 0.0702,
  "step": 1980
  },
  {
+ "epoch": 3.22,
+ "learning_rate": 3.899676375404531e-05,
+ "loss": 0.0596,
  "step": 1990
  },
  {
+ "epoch": 3.24,
+ "learning_rate": 3.818770226537217e-05,
+ "loss": 0.0671,
  "step": 2000
  },
  {
+ "epoch": 3.24,
+ "eval_accuracy": 0.9008608719800055,
+ "eval_loss": 0.3075368106365204,
+ "eval_runtime": 167.8743,
+ "eval_samples_per_second": 128.703,
+ "eval_steps_per_second": 16.089,
  "step": 2000
  },
  {
+ "epoch": 3.25,
+ "learning_rate": 3.737864077669903e-05,
+ "loss": 0.0588,
  "step": 2010
  },
  {
+ "epoch": 3.27,
+ "learning_rate": 3.656957928802589e-05,
+ "loss": 0.0551,
  "step": 2020
  },
  {
+ "epoch": 3.28,
+ "learning_rate": 3.5760517799352755e-05,
+ "loss": 0.0497,
  "step": 2030
  },
  {
+ "epoch": 3.3,
+ "learning_rate": 3.4951456310679615e-05,
+ "loss": 0.0644,
  "step": 2040
  },
  {
+ "epoch": 3.32,
+ "learning_rate": 3.4142394822006475e-05,
+ "loss": 0.0622,
  "step": 2050
  },
  {
+ "epoch": 3.33,
+ "learning_rate": 3.3333333333333335e-05,
+ "loss": 0.0637,
  "step": 2060
  },
  {
+ "epoch": 3.35,
+ "learning_rate": 3.2524271844660195e-05,
+ "loss": 0.0633,
  "step": 2070
  },
  {
+ "epoch": 3.37,
+ "learning_rate": 3.1715210355987055e-05,
+ "loss": 0.0633,
  "step": 2080
  },
  {
+ "epoch": 3.38,
+ "learning_rate": 3.0906148867313915e-05,
+ "loss": 0.0576,
  "step": 2090
  },
  {
+ "epoch": 3.4,
+ "learning_rate": 3.0097087378640774e-05,
+ "loss": 0.0576,
  "step": 2100
  },
  {
+ "epoch": 3.4,
+ "eval_accuracy": 0.9028510598907711,
+ "eval_loss": 0.30513760447502136,
+ "eval_runtime": 172.0167,
+ "eval_samples_per_second": 125.604,
+ "eval_steps_per_second": 15.702,
  "step": 2100
  },
  {
+ "epoch": 3.41,
+ "learning_rate": 2.928802588996764e-05,
+ "loss": 0.0624,
  "step": 2110
  },
  {
+ "epoch": 3.43,
+ "learning_rate": 2.84789644012945e-05,
+ "loss": 0.0518,
  "step": 2120
  },
  {
+ "epoch": 3.45,
+ "learning_rate": 2.766990291262136e-05,
+ "loss": 0.0581,
  "step": 2130
  },
  {
+ "epoch": 3.46,
+ "learning_rate": 2.6860841423948217e-05,
+ "loss": 0.0516,
  "step": 2140
  },
  {
+ "epoch": 3.48,
+ "learning_rate": 2.6051779935275084e-05,
+ "loss": 0.0595,
  "step": 2150
  },
  {
+ "epoch": 3.5,
+ "learning_rate": 2.5242718446601944e-05,
+ "loss": 0.0541,
  "step": 2160
  },
  {
+ "epoch": 3.51,
+ "learning_rate": 2.4433656957928804e-05,
+ "loss": 0.0447,
  "step": 2170
  },
  {
+ "epoch": 3.53,
+ "learning_rate": 2.3624595469255664e-05,
+ "loss": 0.0621,
  "step": 2180
  },
  {
+ "epoch": 3.54,
+ "learning_rate": 2.2815533980582527e-05,
+ "loss": 0.0577,
  "step": 2190
  },
  {
+ "epoch": 3.56,
+ "learning_rate": 2.2006472491909387e-05,
+ "loss": 0.0519,
  "step": 2200
  },
  {
+ "epoch": 3.56,
+ "eval_accuracy": 0.9066000185133759,
+ "eval_loss": 0.298173189163208,
+ "eval_runtime": 170.0796,
+ "eval_samples_per_second": 127.035,
+ "eval_steps_per_second": 15.881,
  "step": 2200
  },
1529
  {
1530
+ "epoch": 3.58,
1531
+ "learning_rate": 2.1197411003236247e-05,
1532
+ "loss": 0.057,
1533
  "step": 2210
1534
  },
1535
  {
1536
+ "epoch": 3.59,
1537
+ "learning_rate": 2.0388349514563107e-05,
1538
+ "loss": 0.0521,
1539
  "step": 2220
1540
  },
1541
  {
1542
+ "epoch": 3.61,
1543
+ "learning_rate": 1.957928802588997e-05,
1544
+ "loss": 0.0525,
1545
  "step": 2230
1546
  },
1547
  {
1548
+ "epoch": 3.62,
1549
+ "learning_rate": 1.877022653721683e-05,
1550
+ "loss": 0.0499,
1551
  "step": 2240
1552
  },
1553
  {
1554
+ "epoch": 3.64,
1555
+ "learning_rate": 1.796116504854369e-05,
1556
+ "loss": 0.0566,
1557
  "step": 2250
1558
  },
1559
  {
1560
+ "epoch": 3.66,
1561
+ "learning_rate": 1.715210355987055e-05,
1562
+ "loss": 0.0569,
1563
  "step": 2260
1564
  },
1565
  {
1566
+ "epoch": 3.67,
1567
+ "learning_rate": 1.6343042071197413e-05,
1568
+ "loss": 0.0453,
1569
  "step": 2270
1570
  },
1571
  {
1572
+ "epoch": 3.69,
1573
+ "learning_rate": 1.5533980582524273e-05,
1574
+ "loss": 0.0509,
1575
  "step": 2280
1576
  },
1577
  {
1578
+ "epoch": 3.71,
1579
+ "learning_rate": 1.4724919093851133e-05,
1580
+ "loss": 0.0482,
1581
  "step": 2290
1582
  },
1583
  {
1584
+ "epoch": 3.72,
1585
+ "learning_rate": 1.3915857605177996e-05,
1586
+ "loss": 0.0527,
1587
  "step": 2300
1588
  },
1589
  {
1590
+ "epoch": 3.72,
1591
+ "eval_accuracy": 0.9072479866703693,
1592
+ "eval_loss": 0.29742419719696045,
1593
+ "eval_runtime": 156.9396,
1594
+ "eval_samples_per_second": 137.671,
1595
+ "eval_steps_per_second": 17.21,
1596
  "step": 2300
1597
  },
1598
  {
1599
+ "epoch": 3.74,
1600
+ "learning_rate": 1.3106796116504854e-05,
1601
+ "loss": 0.0513,
1602
  "step": 2310
1603
  },
1604
  {
1605
+ "epoch": 3.75,
1606
+ "learning_rate": 1.2297734627831716e-05,
1607
+ "loss": 0.0619,
1608
  "step": 2320
1609
  },
1610
  {
1611
+ "epoch": 3.77,
1612
+ "learning_rate": 1.1488673139158575e-05,
1613
+ "loss": 0.0509,
1614
  "step": 2330
1615
  },
1616
  {
1617
+ "epoch": 3.79,
1618
+ "learning_rate": 1.0679611650485437e-05,
1619
+ "loss": 0.0491,
1620
  "step": 2340
1621
  },
1622
  {
1623
+ "epoch": 3.8,
1624
+ "learning_rate": 9.870550161812297e-06,
1625
+ "loss": 0.0511,
1626
  "step": 2350
1627
  },
1628
  {
1629
+ "epoch": 3.82,
1630
+ "learning_rate": 9.06148867313916e-06,
1631
+ "loss": 0.0475,
1632
  "step": 2360
1633
  },
1634
  {
1635
+ "epoch": 3.83,
1636
+ "learning_rate": 8.25242718446602e-06,
1637
+ "loss": 0.0542,
1638
  "step": 2370
1639
  },
1640
  {
1641
+ "epoch": 3.85,
1642
+ "learning_rate": 7.443365695792881e-06,
1643
+ "loss": 0.047,
1644
  "step": 2380
1645
  },
1646
  {
1647
+ "epoch": 3.87,
1648
+ "learning_rate": 6.6343042071197415e-06,
1649
+ "loss": 0.0461,
1650
  "step": 2390
1651
  },
1652
  {
1653
+ "epoch": 3.88,
1654
+ "learning_rate": 5.825242718446602e-06,
1655
+ "loss": 0.0561,
1656
  "step": 2400
1657
  },
1658
  {
1659
+ "epoch": 3.88,
1660
+ "eval_accuracy": 0.9091456077015644,
1661
+ "eval_loss": 0.2911698520183563,
1662
+ "eval_runtime": 166.2697,
1663
+ "eval_samples_per_second": 129.945,
1664
+ "eval_steps_per_second": 16.245,
1665
  "step": 2400
1666
  },
1667
  {
1668
+ "epoch": 3.9,
1669
+ "learning_rate": 5.016181229773463e-06,
1670
+ "loss": 0.0572,
1671
  "step": 2410
1672
  },
1673
  {
1674
+ "epoch": 3.92,
1675
+ "learning_rate": 4.207119741100324e-06,
1676
+ "loss": 0.0374,
1677
  "step": 2420
1678
  },
1679
  {
1680
+ "epoch": 3.93,
1681
+ "learning_rate": 3.3980582524271844e-06,
1682
+ "loss": 0.0472,
1683
  "step": 2430
1684
  },
1685
  {
1686
+ "epoch": 3.95,
1687
+ "learning_rate": 2.5889967637540456e-06,
1688
+ "loss": 0.0384,
1689
  "step": 2440
1690
  },
1691
  {
1692
+ "epoch": 3.96,
1693
+ "learning_rate": 1.7799352750809063e-06,
1694
+ "loss": 0.0439,
1695
  "step": 2450
1696
  },
1697
  {
1698
  "epoch": 3.98,
1699
+ "learning_rate": 9.70873786407767e-07,
1700
+ "loss": 0.0432,
1701
+ "step": 2460
1702
  },
1703
  {
1704
  "epoch": 4.0,
1705
+ "learning_rate": 1.6181229773462785e-07,
1706
+ "loss": 0.0533,
1707
+ "step": 2470
1708
  },
1709
  {
1710
  "epoch": 4.0,
1711
+ "step": 2472,
1712
+ "total_flos": 4.902492055502632e+19,
1713
+ "train_loss": 0.2963483939957006,
1714
+ "train_runtime": 9311.4957,
1715
+ "train_samples_per_second": 67.941,
1716
+ "train_steps_per_second": 0.265
1717
  }
1718
  ],
1719
  "logging_steps": 10,
1720
+ "max_steps": 2472,
1721
  "num_train_epochs": 4,
1722
  "save_steps": 100,
1723
+ "total_flos": 4.902492055502632e+19,
1724
  "trial_name": null,
1725
  "trial_params": null
1726
  }
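The `learning_rate` values logged above appear consistent with a linear decay from the README's `learning_rate: 0.0002` down to zero at `max_steps` (2472). The sketch below is a sanity check on the logged numbers, not the Trainer's own code: the helper `linear_decay_lr` is hypothetical, and the absence of warmup is an assumption, since no warmup setting is visible in this part of the diff.

```python
import math

BASE_LR = 2e-4    # learning_rate from the README hyperparameters
MAX_STEPS = 2472  # "max_steps" in trainer_state.json


def linear_decay_lr(step, base_lr=BASE_LR, max_steps=MAX_STEPS):
    """Hypothetical helper: linear decay to zero, assuming no warmup."""
    return base_lr * max(0, max_steps - step) / max_steps


# A few (step, learning_rate) pairs copied from the log history above.
logged = {
    2100: 3.0097087378640774e-05,
    2200: 2.2006472491909387e-05,
    2300: 1.3915857605177996e-05,
    2400: 5.825242718446602e-06,
    2470: 1.6181229773462785e-07,
}

for step, lr in logged.items():
    # Compare with a relative tolerance to absorb floating-point rounding.
    assert math.isclose(linear_decay_lr(step), lr, rel_tol=1e-9), step
```

If the assertions pass, the logged schedule matches a plain linear decay at these steps; a mismatch would suggest warmup or a different scheduler was configured.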
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:7cef443c88853da73ec699795c25b90a198ccd472a8a42286e40534c796fd70c
3
  size 4027
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:95191dbaeaed65d25b211b1e6f98c929203ec44d65736eda326a894d9c7bfbe5
3
  size 4027