Commit 33cc0f0 (verified), committed by anderloh · 1 parent: aacb4c9

Training in progress, epoch 1

README.md ADDED
@@ -0,0 +1,387 @@
+ ---
+ base_model: anderloh/Hugginhface-master-wav2vec-pretreined-5-class-train-test
+ tags:
+ - audio-classification
+ - generated_from_trainer
+ metrics:
+ - accuracy
+ model-index:
+ - name: HuggingfaceTest
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # HuggingfaceTest
+
+ This model is a fine-tuned version of [anderloh/Hugginhface-master-wav2vec-pretreined-5-class-train-test](https://huggingface.co/anderloh/Hugginhface-master-wav2vec-pretreined-5-class-train-test) on the anderloh/Master5Class dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.8156
+ - Accuracy: 0.7028
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 3e-05
+ - train_batch_size: 128
+ - eval_batch_size: 128
+ - seed: 0
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 512
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 350.0
+ - mixed_precision_training: Native AMP
+
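The batch-size and epoch figures in the hyperparameter list are related by simple arithmetic. The sketch below reproduces `total_train_batch_size` and the fractional epoch values that appear in the training results table. Two inputs are assumptions rather than facts from the model card: a single training device, and 13 batches per epoch (inferred from the table, where step 13 corresponds to epoch 4.0). The function names are hypothetical.

```python
import math

TRAIN_BATCH_SIZE = 128    # from the model card
GRAD_ACCUM_STEPS = 4      # from the model card
NUM_EPOCHS = 350          # from the model card
NUM_DEVICES = 1           # assumption: not stated in the card
BATCHES_PER_EPOCH = 13    # assumption: inferred from the step/epoch ratio in the table


def total_train_batch_size():
    """Samples consumed per optimizer update."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def updates_per_epoch():
    """Whole optimizer updates that fit into one pass over the batches."""
    return BATCHES_PER_EPOCH // GRAD_ACCUM_STEPS


def epoch_at_step(step):
    """Fractional epoch reached after `step` optimizer updates."""
    return step * GRAD_ACCUM_STEPS / BATCHES_PER_EPOCH


# Step budget if it is derived from whole updates per epoch.
planned_steps = math.ceil(NUM_EPOCHS * updates_per_epoch())

print(total_train_batch_size())              # 512
print(round(epoch_at_step(3), 2))            # 0.92
print(planned_steps)                         # 1050
print(round(epoch_at_step(planned_steps), 2))  # 323.08
```

Under these assumptions the step budget works out to 1050, which is one plausible reading of why the log ends at epoch 323.08 rather than 350.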
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |:-------------:|:------:|:----:|:---------------:|:--------:|
+ | No log | 0.92 | 3 | 1.5989 | 0.3427 |
+ | No log | 1.85 | 6 | 1.5988 | 0.3427 |
+ | No log | 2.77 | 9 | 1.5986 | 0.3427 |
+ | No log | 4.0 | 13 | 1.5981 | 0.3427 |
+ | No log | 4.92 | 16 | 1.5976 | 0.3357 |
+ | No log | 5.85 | 19 | 1.5970 | 0.3427 |
+ | No log | 6.77 | 22 | 1.5963 | 0.3392 |
+ | No log | 8.0 | 26 | 1.5953 | 0.3357 |
+ | No log | 8.92 | 29 | 1.5943 | 0.3287 |
+ | No log | 9.85 | 32 | 1.5933 | 0.3287 |
+ | No log | 10.77 | 35 | 1.5922 | 0.3217 |
+ | No log | 12.0 | 39 | 1.5906 | 0.3182 |
+ | No log | 12.92 | 42 | 1.5892 | 0.3147 |
+ | No log | 13.85 | 45 | 1.5877 | 0.3007 |
+ | No log | 14.77 | 48 | 1.5862 | 0.2937 |
+ | 1.5907 | 16.0 | 52 | 1.5841 | 0.2972 |
+ | 1.5907 | 16.92 | 55 | 1.5824 | 0.2832 |
+ | 1.5907 | 17.85 | 58 | 1.5806 | 0.2797 |
+ | 1.5907 | 18.77 | 61 | 1.5788 | 0.2692 |
+ | 1.5907 | 20.0 | 65 | 1.5762 | 0.2692 |
+ | 1.5907 | 20.92 | 68 | 1.5740 | 0.2657 |
+ | 1.5907 | 21.85 | 71 | 1.5717 | 0.2552 |
+ | 1.5907 | 22.77 | 74 | 1.5694 | 0.2517 |
+ | 1.5907 | 24.0 | 78 | 1.5661 | 0.2378 |
+ | 1.5907 | 24.92 | 81 | 1.5635 | 0.2343 |
+ | 1.5907 | 25.85 | 84 | 1.5608 | 0.2238 |
+ | 1.5907 | 26.77 | 87 | 1.5581 | 0.2238 |
+ | 1.5907 | 28.0 | 91 | 1.5542 | 0.2273 |
+ | 1.5907 | 28.92 | 94 | 1.5511 | 0.2273 |
+ | 1.5907 | 29.85 | 97 | 1.5479 | 0.2273 |
+ | 1.5431 | 30.77 | 100 | 1.5448 | 0.2273 |
+ | 1.5431 | 32.0 | 104 | 1.5408 | 0.2273 |
+ | 1.5431 | 32.92 | 107 | 1.5380 | 0.2273 |
+ | 1.5431 | 33.85 | 110 | 1.5359 | 0.2273 |
+ | 1.5431 | 34.77 | 113 | 1.5345 | 0.2273 |
+ | 1.5431 | 36.0 | 117 | 1.5335 | 0.2273 |
+ | 1.5431 | 36.92 | 120 | 1.5341 | 0.2273 |
+ | 1.5431 | 37.85 | 123 | 1.5361 | 0.2273 |
+ | 1.5431 | 38.77 | 126 | 1.5397 | 0.2273 |
+ | 1.5431 | 40.0 | 130 | 1.5479 | 0.2273 |
+ | 1.5431 | 40.92 | 133 | 1.5564 | 0.2273 |
+ | 1.5431 | 41.85 | 136 | 1.5679 | 0.2273 |
+ | 1.5431 | 42.77 | 139 | 1.5822 | 0.2273 |
+ | 1.5431 | 44.0 | 143 | 1.6002 | 0.2273 |
+ | 1.5431 | 44.92 | 146 | 1.6109 | 0.2273 |
+ | 1.5431 | 45.85 | 149 | 1.6146 | 0.2273 |
+ | 1.4033 | 46.77 | 152 | 1.6131 | 0.2273 |
+ | 1.4033 | 48.0 | 156 | 1.6008 | 0.2273 |
+ | 1.4033 | 48.92 | 159 | 1.5862 | 0.2413 |
+ | 1.4033 | 49.85 | 162 | 1.5726 | 0.2692 |
+ | 1.4033 | 50.77 | 165 | 1.5599 | 0.2692 |
+ | 1.4033 | 52.0 | 169 | 1.5459 | 0.2867 |
+ | 1.4033 | 52.92 | 172 | 1.5383 | 0.2937 |
+ | 1.4033 | 53.85 | 175 | 1.5311 | 0.3147 |
+ | 1.4033 | 54.77 | 178 | 1.5242 | 0.3252 |
+ | 1.4033 | 56.0 | 182 | 1.5169 | 0.3357 |
+ | 1.4033 | 56.92 | 185 | 1.5103 | 0.3427 |
+ | 1.4033 | 57.85 | 188 | 1.5056 | 0.3462 |
+ | 1.4033 | 58.77 | 191 | 1.4995 | 0.3462 |
+ | 1.4033 | 60.0 | 195 | 1.4939 | 0.3497 |
+ | 1.4033 | 60.92 | 198 | 1.4870 | 0.3601 |
+ | 1.2485 | 61.85 | 201 | 1.4829 | 0.3671 |
+ | 1.2485 | 62.77 | 204 | 1.4735 | 0.3741 |
+ | 1.2485 | 64.0 | 208 | 1.4612 | 0.3811 |
+ | 1.2485 | 64.92 | 211 | 1.4492 | 0.3986 |
+ | 1.2485 | 65.85 | 214 | 1.4365 | 0.4126 |
+ | 1.2485 | 66.77 | 217 | 1.4227 | 0.4231 |
+ | 1.2485 | 68.0 | 221 | 1.4096 | 0.4336 |
+ | 1.2485 | 68.92 | 224 | 1.4010 | 0.4371 |
+ | 1.2485 | 69.85 | 227 | 1.3950 | 0.4406 |
+ | 1.2485 | 70.77 | 230 | 1.3920 | 0.4371 |
+ | 1.2485 | 72.0 | 234 | 1.3799 | 0.4406 |
+ | 1.2485 | 72.92 | 237 | 1.3669 | 0.4476 |
+ | 1.2485 | 73.85 | 240 | 1.3515 | 0.4545 |
+ | 1.2485 | 74.77 | 243 | 1.3401 | 0.4720 |
+ | 1.2485 | 76.0 | 247 | 1.3286 | 0.4825 |
+ | 1.1198 | 76.92 | 250 | 1.3175 | 0.4860 |
+ | 1.1198 | 77.85 | 253 | 1.3067 | 0.4895 |
+ | 1.1198 | 78.77 | 256 | 1.3013 | 0.4825 |
+ | 1.1198 | 80.0 | 260 | 1.2954 | 0.4790 |
+ | 1.1198 | 80.92 | 263 | 1.2897 | 0.4860 |
+ | 1.1198 | 81.85 | 266 | 1.2832 | 0.4860 |
+ | 1.1198 | 82.77 | 269 | 1.2712 | 0.4825 |
+ | 1.1198 | 84.0 | 273 | 1.2584 | 0.4930 |
+ | 1.1198 | 84.92 | 276 | 1.2516 | 0.4965 |
+ | 1.1198 | 85.85 | 279 | 1.2456 | 0.5 |
+ | 1.1198 | 86.77 | 282 | 1.2444 | 0.5105 |
+ | 1.1198 | 88.0 | 286 | 1.2373 | 0.5105 |
+ | 1.1198 | 88.92 | 289 | 1.2309 | 0.5140 |
+ | 1.1198 | 89.85 | 292 | 1.2219 | 0.5210 |
+ | 1.1198 | 90.77 | 295 | 1.2145 | 0.5210 |
+ | 1.1198 | 92.0 | 299 | 1.2054 | 0.5280 |
+ | 0.9915 | 92.92 | 302 | 1.1982 | 0.5350 |
+ | 0.9915 | 93.85 | 305 | 1.1913 | 0.5385 |
+ | 0.9915 | 94.77 | 308 | 1.1859 | 0.5455 |
+ | 0.9915 | 96.0 | 312 | 1.1794 | 0.5490 |
+ | 0.9915 | 96.92 | 315 | 1.1734 | 0.5455 |
+ | 0.9915 | 97.85 | 318 | 1.1638 | 0.5524 |
+ | 0.9915 | 98.77 | 321 | 1.1550 | 0.5524 |
+ | 0.9915 | 100.0 | 325 | 1.1465 | 0.5490 |
+ | 0.9915 | 100.92 | 328 | 1.1444 | 0.5594 |
+ | 0.9915 | 101.85 | 331 | 1.1359 | 0.5629 |
+ | 0.9915 | 102.77 | 334 | 1.1271 | 0.5664 |
+ | 0.9915 | 104.0 | 338 | 1.1090 | 0.5769 |
+ | 0.9915 | 104.92 | 341 | 1.0972 | 0.5944 |
+ | 0.9915 | 105.85 | 344 | 1.0901 | 0.6014 |
+ | 0.9915 | 106.77 | 347 | 1.0809 | 0.6084 |
+ | 0.8834 | 108.0 | 351 | 1.0683 | 0.6119 |
+ | 0.8834 | 108.92 | 354 | 1.0605 | 0.6224 |
+ | 0.8834 | 109.85 | 357 | 1.0563 | 0.6259 |
+ | 0.8834 | 110.77 | 360 | 1.0538 | 0.6224 |
+ | 0.8834 | 112.0 | 364 | 1.0491 | 0.6154 |
+ | 0.8834 | 112.92 | 367 | 1.0441 | 0.6119 |
+ | 0.8834 | 113.85 | 370 | 1.0358 | 0.6119 |
+ | 0.8834 | 114.77 | 373 | 1.0194 | 0.6224 |
+ | 0.8834 | 116.0 | 377 | 1.0034 | 0.6294 |
+ | 0.8834 | 116.92 | 380 | 0.9991 | 0.6259 |
+ | 0.8834 | 117.85 | 383 | 0.9960 | 0.6259 |
+ | 0.8834 | 118.77 | 386 | 0.9911 | 0.6294 |
+ | 0.8834 | 120.0 | 390 | 0.9834 | 0.6434 |
+ | 0.8834 | 120.92 | 393 | 0.9776 | 0.6434 |
+ | 0.8834 | 121.85 | 396 | 0.9773 | 0.6434 |
+ | 0.8834 | 122.77 | 399 | 0.9735 | 0.6434 |
+ | 0.7786 | 124.0 | 403 | 0.9731 | 0.6399 |
+ | 0.7786 | 124.92 | 406 | 0.9728 | 0.6434 |
+ | 0.7786 | 125.85 | 409 | 0.9657 | 0.6573 |
+ | 0.7786 | 126.77 | 412 | 0.9548 | 0.6573 |
+ | 0.7786 | 128.0 | 416 | 0.9424 | 0.6643 |
+ | 0.7786 | 128.92 | 419 | 0.9391 | 0.6678 |
+ | 0.7786 | 129.85 | 422 | 0.9418 | 0.6678 |
+ | 0.7786 | 130.77 | 425 | 0.9476 | 0.6608 |
+ | 0.7786 | 132.0 | 429 | 0.9457 | 0.6643 |
+ | 0.7786 | 132.92 | 432 | 0.9413 | 0.6643 |
+ | 0.7786 | 133.85 | 435 | 0.9334 | 0.6678 |
+ | 0.7786 | 134.77 | 438 | 0.9329 | 0.6678 |
+ | 0.7786 | 136.0 | 442 | 0.9334 | 0.6713 |
+ | 0.7786 | 136.92 | 445 | 0.9265 | 0.6713 |
+ | 0.7786 | 137.85 | 448 | 0.9187 | 0.6713 |
+ | 0.7133 | 138.77 | 451 | 0.9169 | 0.6678 |
+ | 0.7133 | 140.0 | 455 | 0.9142 | 0.6713 |
+ | 0.7133 | 140.92 | 458 | 0.9131 | 0.6713 |
+ | 0.7133 | 141.85 | 461 | 0.9161 | 0.6783 |
+ | 0.7133 | 142.77 | 464 | 0.9224 | 0.6678 |
+ | 0.7133 | 144.0 | 468 | 0.9139 | 0.6748 |
+ | 0.7133 | 144.92 | 471 | 0.9090 | 0.6748 |
+ | 0.7133 | 145.85 | 474 | 0.9073 | 0.6713 |
+ | 0.7133 | 146.77 | 477 | 0.9110 | 0.6608 |
+ | 0.7133 | 148.0 | 481 | 0.9167 | 0.6573 |
+ | 0.7133 | 148.92 | 484 | 0.9118 | 0.6643 |
+ | 0.7133 | 149.85 | 487 | 0.8996 | 0.6713 |
+ | 0.7133 | 150.77 | 490 | 0.8904 | 0.6748 |
+ | 0.7133 | 152.0 | 494 | 0.8889 | 0.6748 |
+ | 0.7133 | 152.92 | 497 | 0.8899 | 0.6713 |
+ | 0.6674 | 153.85 | 500 | 0.8874 | 0.6748 |
+ | 0.6674 | 154.77 | 503 | 0.8874 | 0.6748 |
+ | 0.6674 | 156.0 | 507 | 0.8905 | 0.6748 |
+ | 0.6674 | 156.92 | 510 | 0.8881 | 0.6783 |
+ | 0.6674 | 157.85 | 513 | 0.8829 | 0.6748 |
+ | 0.6674 | 158.77 | 516 | 0.8809 | 0.6783 |
+ | 0.6674 | 160.0 | 520 | 0.8781 | 0.6783 |
+ | 0.6674 | 160.92 | 523 | 0.8776 | 0.6818 |
+ | 0.6674 | 161.85 | 526 | 0.8796 | 0.6783 |
+ | 0.6674 | 162.77 | 529 | 0.8795 | 0.6818 |
+ | 0.6674 | 164.0 | 533 | 0.8797 | 0.6783 |
+ | 0.6674 | 164.92 | 536 | 0.8707 | 0.6783 |
+ | 0.6674 | 165.85 | 539 | 0.8697 | 0.6783 |
+ | 0.6674 | 166.77 | 542 | 0.8724 | 0.6783 |
+ | 0.6674 | 168.0 | 546 | 0.8704 | 0.6748 |
+ | 0.6674 | 168.92 | 549 | 0.8694 | 0.6748 |
+ | 0.6305 | 169.85 | 552 | 0.8740 | 0.6748 |
+ | 0.6305 | 170.77 | 555 | 0.8713 | 0.6748 |
+ | 0.6305 | 172.0 | 559 | 0.8682 | 0.6783 |
+ | 0.6305 | 172.92 | 562 | 0.8688 | 0.6783 |
+ | 0.6305 | 173.85 | 565 | 0.8693 | 0.6818 |
+ | 0.6305 | 174.77 | 568 | 0.8744 | 0.6783 |
+ | 0.6305 | 176.0 | 572 | 0.8760 | 0.6783 |
+ | 0.6305 | 176.92 | 575 | 0.8696 | 0.6853 |
+ | 0.6305 | 177.85 | 578 | 0.8669 | 0.6853 |
+ | 0.6305 | 178.77 | 581 | 0.8641 | 0.6853 |
+ | 0.6305 | 180.0 | 585 | 0.8697 | 0.6713 |
+ | 0.6305 | 180.92 | 588 | 0.8678 | 0.6748 |
+ | 0.6305 | 181.85 | 591 | 0.8621 | 0.6818 |
+ | 0.6305 | 182.77 | 594 | 0.8557 | 0.6888 |
+ | 0.6305 | 184.0 | 598 | 0.8481 | 0.6888 |
+ | 0.6095 | 184.92 | 601 | 0.8429 | 0.6888 |
+ | 0.6095 | 185.85 | 604 | 0.8413 | 0.6888 |
+ | 0.6095 | 186.77 | 607 | 0.8402 | 0.6923 |
+ | 0.6095 | 188.0 | 611 | 0.8415 | 0.6888 |
+ | 0.6095 | 188.92 | 614 | 0.8410 | 0.6923 |
+ | 0.6095 | 189.85 | 617 | 0.8389 | 0.6853 |
+ | 0.6095 | 190.77 | 620 | 0.8354 | 0.6853 |
+ | 0.6095 | 192.0 | 624 | 0.8357 | 0.6888 |
+ | 0.6095 | 192.92 | 627 | 0.8401 | 0.6958 |
+ | 0.6095 | 193.85 | 630 | 0.8449 | 0.6958 |
+ | 0.6095 | 194.77 | 633 | 0.8479 | 0.6958 |
+ | 0.6095 | 196.0 | 637 | 0.8455 | 0.6923 |
+ | 0.6095 | 196.92 | 640 | 0.8422 | 0.6923 |
+ | 0.6095 | 197.85 | 643 | 0.8425 | 0.6923 |
+ | 0.6095 | 198.77 | 646 | 0.8437 | 0.6923 |
+ | 0.5908 | 200.0 | 650 | 0.8367 | 0.6958 |
+ | 0.5908 | 200.92 | 653 | 0.8347 | 0.6993 |
+ | 0.5908 | 201.85 | 656 | 0.8287 | 0.6958 |
+ | 0.5908 | 202.77 | 659 | 0.8260 | 0.6923 |
+ | 0.5908 | 204.0 | 663 | 0.8264 | 0.6958 |
+ | 0.5908 | 204.92 | 666 | 0.8295 | 0.6958 |
+ | 0.5908 | 205.85 | 669 | 0.8302 | 0.6923 |
+ | 0.5908 | 206.77 | 672 | 0.8285 | 0.6923 |
+ | 0.5908 | 208.0 | 676 | 0.8311 | 0.6923 |
+ | 0.5908 | 208.92 | 679 | 0.8321 | 0.6923 |
+ | 0.5908 | 209.85 | 682 | 0.8306 | 0.6923 |
+ | 0.5908 | 210.77 | 685 | 0.8303 | 0.6923 |
+ | 0.5908 | 212.0 | 689 | 0.8256 | 0.6993 |
+ | 0.5908 | 212.92 | 692 | 0.8230 | 0.6958 |
+ | 0.5908 | 213.85 | 695 | 0.8194 | 0.6958 |
+ | 0.5908 | 214.77 | 698 | 0.8183 | 0.6958 |
+ | 0.5763 | 216.0 | 702 | 0.8232 | 0.6958 |
+ | 0.5763 | 216.92 | 705 | 0.8237 | 0.6888 |
+ | 0.5763 | 217.85 | 708 | 0.8196 | 0.6993 |
+ | 0.5763 | 218.77 | 711 | 0.8142 | 0.6993 |
+ | 0.5763 | 220.0 | 715 | 0.8115 | 0.6993 |
+ | 0.5763 | 220.92 | 718 | 0.8130 | 0.6993 |
+ | 0.5763 | 221.85 | 721 | 0.8156 | 0.7028 |
+ | 0.5763 | 222.77 | 724 | 0.8201 | 0.6958 |
+ | 0.5763 | 224.0 | 728 | 0.8227 | 0.6958 |
+ | 0.5763 | 224.92 | 731 | 0.8232 | 0.6958 |
+ | 0.5763 | 225.85 | 734 | 0.8198 | 0.6923 |
+ | 0.5763 | 226.77 | 737 | 0.8151 | 0.6923 |
+ | 0.5763 | 228.0 | 741 | 0.8136 | 0.6923 |
+ | 0.5763 | 228.92 | 744 | 0.8134 | 0.6923 |
+ | 0.5763 | 229.85 | 747 | 0.8123 | 0.6958 |
+ | 0.57 | 230.77 | 750 | 0.8095 | 0.6958 |
+ | 0.57 | 232.0 | 754 | 0.8082 | 0.6958 |
+ | 0.57 | 232.92 | 757 | 0.8084 | 0.6958 |
+ | 0.57 | 233.85 | 760 | 0.8114 | 0.6923 |
+ | 0.57 | 234.77 | 763 | 0.8130 | 0.6923 |
+ | 0.57 | 236.0 | 767 | 0.8154 | 0.6923 |
+ | 0.57 | 236.92 | 770 | 0.8160 | 0.6923 |
+ | 0.57 | 237.85 | 773 | 0.8126 | 0.6888 |
+ | 0.57 | 238.77 | 776 | 0.8114 | 0.6888 |
+ | 0.57 | 240.0 | 780 | 0.8041 | 0.6923 |
+ | 0.57 | 240.92 | 783 | 0.8006 | 0.6923 |
+ | 0.57 | 241.85 | 786 | 0.7987 | 0.6958 |
+ | 0.57 | 242.77 | 789 | 0.7977 | 0.6993 |
+ | 0.57 | 244.0 | 793 | 0.8001 | 0.6993 |
+ | 0.57 | 244.92 | 796 | 0.8044 | 0.6958 |
+ | 0.57 | 245.85 | 799 | 0.8082 | 0.6958 |
+ | 0.5456 | 246.77 | 802 | 0.8121 | 0.6888 |
+ | 0.5456 | 248.0 | 806 | 0.8107 | 0.6888 |
+ | 0.5456 | 248.92 | 809 | 0.8064 | 0.6958 |
+ | 0.5456 | 249.85 | 812 | 0.8042 | 0.6958 |
+ | 0.5456 | 250.77 | 815 | 0.8006 | 0.6958 |
+ | 0.5456 | 252.0 | 819 | 0.7969 | 0.6958 |
+ | 0.5456 | 252.92 | 822 | 0.7955 | 0.6993 |
+ | 0.5456 | 253.85 | 825 | 0.7973 | 0.6958 |
+ | 0.5456 | 254.77 | 828 | 0.8001 | 0.6958 |
+ | 0.5456 | 256.0 | 832 | 0.8035 | 0.6888 |
+ | 0.5456 | 256.92 | 835 | 0.8035 | 0.6853 |
+ | 0.5456 | 257.85 | 838 | 0.8012 | 0.6923 |
+ | 0.5456 | 258.77 | 841 | 0.8000 | 0.6923 |
+ | 0.5456 | 260.0 | 845 | 0.7963 | 0.6888 |
+ | 0.5456 | 260.92 | 848 | 0.7928 | 0.6958 |
+ | 0.5369 | 261.85 | 851 | 0.7919 | 0.6923 |
+ | 0.5369 | 262.77 | 854 | 0.7913 | 0.6888 |
+ | 0.5369 | 264.0 | 858 | 0.7929 | 0.6888 |
+ | 0.5369 | 264.92 | 861 | 0.7955 | 0.6818 |
+ | 0.5369 | 265.85 | 864 | 0.7963 | 0.6853 |
+ | 0.5369 | 266.77 | 867 | 0.7952 | 0.6888 |
+ | 0.5369 | 268.0 | 871 | 0.7936 | 0.6888 |
+ | 0.5369 | 268.92 | 874 | 0.7929 | 0.6853 |
+ | 0.5369 | 269.85 | 877 | 0.7933 | 0.6853 |
+ | 0.5369 | 270.77 | 880 | 0.7941 | 0.6853 |
+ | 0.5369 | 272.0 | 884 | 0.7940 | 0.6853 |
+ | 0.5369 | 272.92 | 887 | 0.7929 | 0.6853 |
+ | 0.5369 | 273.85 | 890 | 0.7930 | 0.6853 |
+ | 0.5369 | 274.77 | 893 | 0.7943 | 0.6853 |
+ | 0.5369 | 276.0 | 897 | 0.7944 | 0.6853 |
+ | 0.5388 | 276.92 | 900 | 0.7933 | 0.6853 |
+ | 0.5388 | 277.85 | 903 | 0.7914 | 0.6853 |
+ | 0.5388 | 278.77 | 906 | 0.7904 | 0.6853 |
+ | 0.5388 | 280.0 | 910 | 0.7888 | 0.6853 |
+ | 0.5388 | 280.92 | 913 | 0.7900 | 0.6853 |
+ | 0.5388 | 281.85 | 916 | 0.7906 | 0.6853 |
+ | 0.5388 | 282.77 | 919 | 0.7911 | 0.6853 |
+ | 0.5388 | 284.0 | 923 | 0.7907 | 0.6853 |
+ | 0.5388 | 284.92 | 926 | 0.7907 | 0.6853 |
+ | 0.5388 | 285.85 | 929 | 0.7905 | 0.6818 |
+ | 0.5388 | 286.77 | 932 | 0.7900 | 0.6818 |
+ | 0.5388 | 288.0 | 936 | 0.7901 | 0.6853 |
+ | 0.5388 | 288.92 | 939 | 0.7902 | 0.6853 |
+ | 0.5388 | 289.85 | 942 | 0.7910 | 0.6853 |
+ | 0.5388 | 290.77 | 945 | 0.7914 | 0.6888 |
+ | 0.5388 | 292.0 | 949 | 0.7920 | 0.6888 |
+ | 0.5261 | 292.92 | 952 | 0.7928 | 0.6853 |
+ | 0.5261 | 293.85 | 955 | 0.7932 | 0.6888 |
+ | 0.5261 | 294.77 | 958 | 0.7925 | 0.6888 |
+ | 0.5261 | 296.0 | 962 | 0.7922 | 0.6888 |
+ | 0.5261 | 296.92 | 965 | 0.7919 | 0.6888 |
+ | 0.5261 | 297.85 | 968 | 0.7922 | 0.6888 |
+ | 0.5261 | 298.77 | 971 | 0.7921 | 0.6888 |
+ | 0.5261 | 300.0 | 975 | 0.7912 | 0.6853 |
+ | 0.5261 | 300.92 | 978 | 0.7907 | 0.6853 |
+ | 0.5261 | 301.85 | 981 | 0.7896 | 0.6853 |
+ | 0.5261 | 302.77 | 984 | 0.7885 | 0.6888 |
+ | 0.5261 | 304.0 | 988 | 0.7877 | 0.6888 |
+ | 0.5261 | 304.92 | 991 | 0.7874 | 0.6888 |
+ | 0.5261 | 305.85 | 994 | 0.7876 | 0.6888 |
+ | 0.5261 | 306.77 | 997 | 0.7879 | 0.6888 |
+ | 0.5188 | 308.0 | 1001 | 0.7884 | 0.6888 |
+ | 0.5188 | 308.92 | 1004 | 0.7887 | 0.6888 |
+ | 0.5188 | 309.85 | 1007 | 0.7890 | 0.6888 |
+ | 0.5188 | 310.77 | 1010 | 0.7894 | 0.6888 |
+ | 0.5188 | 312.0 | 1014 | 0.7899 | 0.6888 |
+ | 0.5188 | 312.92 | 1017 | 0.7904 | 0.6888 |
+ | 0.5188 | 313.85 | 1020 | 0.7907 | 0.6923 |
+ | 0.5188 | 314.77 | 1023 | 0.7910 | 0.6923 |
+ | 0.5188 | 316.0 | 1027 | 0.7912 | 0.6923 |
+ | 0.5188 | 316.92 | 1030 | 0.7912 | 0.6923 |
+ | 0.5188 | 317.85 | 1033 | 0.7912 | 0.6923 |
+ | 0.5188 | 318.77 | 1036 | 0.7913 | 0.6923 |
+ | 0.5188 | 320.0 | 1040 | 0.7913 | 0.6923 |
+ | 0.5188 | 320.92 | 1043 | 0.7912 | 0.6923 |
+ | 0.5188 | 321.85 | 1046 | 0.7912 | 0.6923 |
+ | 0.5188 | 322.77 | 1049 | 0.7911 | 0.6923 |
+ | 0.5194 | 323.08 | 1050 | 0.7911 | 0.6923 |
+
+
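The headline metrics in the model card (Loss 0.8156, Accuracy 0.7028) match the step 721 row of the results table rather than the final row, and trainer_state.json in this commit records `checkpoint-721` as the best checkpoint. The sketch below illustrates how selecting by accuracy versus by validation loss picks different rows. It uses only five rows copied from the table, so the results hold for this excerpt and not necessarily for the full log; the variable names are hypothetical.

```python
# (epoch, step, validation_loss, accuracy) rows copied from the results table.
ROWS = [
    (220.92, 718, 0.8130, 0.6993),
    (221.85, 721, 0.8156, 0.7028),
    (222.77, 724, 0.8201, 0.6958),
    (304.92, 991, 0.7874, 0.6888),
    (323.08, 1050, 0.7911, 0.6923),
]

# Highest accuracy: the criterion that matches the reported headline numbers.
best_by_accuracy = max(ROWS, key=lambda row: row[3])

# Lowest validation loss: a different, equally common criterion.
best_by_loss = min(ROWS, key=lambda row: row[2])

print("best by accuracy -> step", best_by_accuracy[1])
print("best by loss     -> step", best_by_loss[1])
```

Within this excerpt the two criteria disagree (step 721 versus step 991), which is worth keeping in mind when comparing the card's loss figure with the last rows of the table.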
+ ### Framework versions
+
+ - Transformers 4.39.0.dev0
+ - Pytorch 2.2.1+cu121
+ - Datasets 2.17.1
+ - Tokenizers 0.15.2
all_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+     "epoch": 323.08,
+     "eval_accuracy": 0.7027972027972028,
+     "eval_loss": 0.8156144022941589,
+     "eval_runtime": 5.0661,
+     "eval_samples_per_second": 56.453,
+     "eval_steps_per_second": 0.592,
+     "train_loss": 0.8143934268043155,
+     "train_runtime": 4784.9132,
+     "train_samples_per_second": 113.231,
+     "train_steps_per_second": 0.219
+ }
config.json ADDED
@@ -0,0 +1,124 @@
+ {
+   "_name_or_path": "anderloh/Hugginhface-master-wav2vec-pretreined-5-class-train-test",
+   "activation_dropout": 0.0,
+   "adapter_attn_dim": null,
+   "adapter_kernel_size": 3,
+   "adapter_stride": 2,
+   "add_adapter": false,
+   "apply_spec_augment": true,
+   "architectures": [
+     "Wav2Vec2ForSequenceClassification"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "classifier_proj_size": 128,
+   "codevector_dim": 128,
+   "contrastive_logits_temperature": 0.1,
+   "conv_bias": true,
+   "conv_dim": [
+     256,
+     256,
+     256,
+     256,
+     256,
+     256,
+     256
+   ],
+   "conv_kernel": [
+     10,
+     3,
+     3,
+     3,
+     3,
+     2,
+     2
+   ],
+   "conv_stride": [
+     5,
+     2,
+     2,
+     2,
+     2,
+     2,
+     2
+   ],
+   "ctc_loss_reduction": "sum",
+   "ctc_zero_infinity": false,
+   "diversity_loss_weight": 0.1,
+   "do_stable_layer_norm": true,
+   "eos_token_id": 2,
+   "feat_extract_activation": "gelu",
+   "feat_extract_dropout": 0.0,
+   "feat_extract_norm": "layer",
+   "feat_proj_dropout": 0.0,
+   "feat_quantizer_dropout": 0.0,
+   "final_dropout": 0.0,
+   "finetuning_task": "audio-classification",
+   "hidden_act": "gelu",
+   "hidden_dropout": 0.0,
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 384,
+   "id2label": {
+     "0": "Helicopter",
+     "1": "Jet",
+     "2": "Racecar",
+     "3": "Rail",
+     "4": "Truck"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "label2id": {
+     "Helicopter": "0",
+     "Jet": "1",
+     "Racecar": "2",
+     "Rail": "3",
+     "Truck": "4"
+   },
+   "layer_norm_eps": 1e-05,
+   "layerdrop": 0.0,
+   "mask_feature_length": 10,
+   "mask_feature_min_masks": 0,
+   "mask_feature_prob": 0.0,
+   "mask_time_length": 10,
+   "mask_time_min_masks": 2,
+   "mask_time_prob": 0.65,
+   "model_type": "wav2vec2",
+   "num_adapter_layers": 3,
+   "num_attention_heads": 6,
+   "num_codevector_groups": 2,
+   "num_codevectors_per_group": 320,
+   "num_conv_pos_embedding_groups": 16,
+   "num_conv_pos_embeddings": 128,
+   "num_feat_extract_layers": 7,
+   "num_hidden_layers": 6,
+   "num_negatives": 100,
+   "output_hidden_size": 384,
+   "pad_token_id": 0,
+   "proj_codevector_dim": 128,
+   "tdnn_dilation": [
+     1,
+     2,
+     3,
+     1,
+     1
+   ],
+   "tdnn_dim": [
+     512,
+     512,
+     512,
+     512,
+     1500
+   ],
+   "tdnn_kernel": [
+     5,
+     3,
+     3,
+     1,
+     1
+   ],
+   "torch_dtype": "float32",
+   "transformers_version": "4.39.0.dev0",
+   "use_weighted_layer_sum": false,
+   "vocab_size": 32,
+   "xvector_output_dim": 512
+ }
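The `conv_kernel` and `conv_stride` lists in config.json determine how many feature frames the convolutional front end produces per second of audio. The sketch below applies the standard output-length formula for an unpadded 1-D convolution, layer by layer. It assumes no padding or dilation in these layers, and takes the 16000 Hz sampling rate from preprocessor_config.json in this commit; the function names are hypothetical.

```python
import math

CONV_KERNEL = [10, 3, 3, 3, 3, 2, 2]   # from config.json
CONV_STRIDE = [5, 2, 2, 2, 2, 2, 2]    # from config.json
SAMPLING_RATE = 16000                  # from preprocessor_config.json


def num_output_frames(num_samples):
    """Frames left after all conv layers, assuming no padding or dilation."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        length = (length - kernel) // stride + 1
    return length


def frame_hop_ms():
    """Time between consecutive frames, from the product of all strides."""
    return 1000 * math.prod(CONV_STRIDE) / SAMPLING_RATE


print(num_output_frames(SAMPLING_RATE))  # frames for one second of audio
print(frame_hop_ms())                    # hop between frames in milliseconds
```

With these values the total stride is 320 samples, so one second of 16 kHz audio maps to 49 frames spaced 20 ms apart.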
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 323.08,
+     "eval_accuracy": 0.7027972027972028,
+     "eval_loss": 0.8156144022941589,
+     "eval_runtime": 5.0661,
+     "eval_samples_per_second": 56.453,
+     "eval_steps_per_second": 0.592
+ }
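The throughput fields in eval_results.json can be combined to estimate the size of the evaluation set, which the model card does not state. The sketch below is a back-of-the-envelope check, assuming the throughput values are simply counts divided by `eval_runtime` and rounded to three decimals; the recovered numbers are therefore estimates, not values taken from the source.

```python
EVAL_RUNTIME = 5.0661                # seconds, from eval_results.json
EVAL_SAMPLES_PER_SECOND = 56.453     # from eval_results.json
EVAL_STEPS_PER_SECOND = 0.592        # from eval_results.json
EVAL_ACCURACY = 0.7027972027972028   # from eval_results.json

# Throughput times runtime gives back the approximate counts.
num_samples = round(EVAL_SAMPLES_PER_SECOND * EVAL_RUNTIME)
num_batches = round(EVAL_STEPS_PER_SECOND * EVAL_RUNTIME)

# Accuracy times sample count gives the number of correct predictions.
num_correct = round(EVAL_ACCURACY * num_samples)

print(num_samples, num_batches, num_correct)
print(num_correct / num_samples)
```

The estimate of 286 samples in 3 batches is consistent with the `eval_batch_size` of 128 listed in the model card, and 201 correct predictions out of 286 reproduces the reported accuracy.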
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:429195c8c73df7eab4c2c253119a3c88a67a1708c472bece1c0b063064fa1fdb
+ size 52151348
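The `size` field of the Git LFS pointer gives a quick upper bound on the parameter count. The sketch below divides the file size by four bytes per weight, assuming every tensor is stored as float32 (config.json in this commit lists `torch_dtype` as `float32`). A safetensors file also carries a small metadata header, so the true count is slightly lower; the function name is hypothetical.

```python
SAFETENSORS_SIZE_BYTES = 52151348   # "size" field of the model.safetensors pointer
BYTES_PER_FLOAT32 = 4               # assumption: all tensors stored as float32


def max_parameter_count(size_bytes, bytes_per_param):
    """Upper bound on parameters: ignores the file's metadata header."""
    return size_bytes // bytes_per_param


upper_bound = max_parameter_count(SAFETENSORS_SIZE_BYTES, BYTES_PER_FLOAT32)
print(f"at most {upper_bound:,} parameters (~{upper_bound / 1e6:.1f}M)")
```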
preprocessor_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "do_normalize": true,
+   "feature_extractor_type": "Wav2Vec2FeatureExtractor",
+   "feature_size": 1,
+   "padding_side": "right",
+   "padding_value": 0.0,
+   "return_attention_mask": true,
+   "sampling_rate": 16000
+ }
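The preprocessor settings describe three operations on a raw waveform: normalization (`do_normalize`), right-padding with `padding_value`, and an attention mask marking real samples. The sketch below illustrates these steps in plain Python. It is not the library's implementation: the epsilon of 1e-7 and the function names are assumptions, and the real feature extractor may order padding and normalization differently.

```python
import math


def zero_mean_unit_var(samples, eps=1e-7):
    """Scale a waveform to zero mean and (approximately) unit variance."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    scale = math.sqrt(var + eps)  # eps is an assumed guard against division by zero
    return [(x - mean) / scale for x in samples]


def pad_right(samples, target_length, padding_value=0.0):
    """Pad on the right and return (padded, attention_mask)."""
    missing = max(0, target_length - len(samples))
    padded = list(samples) + [padding_value] * missing
    attention_mask = [1] * len(samples) + [0] * missing
    return padded, attention_mask


waveform = [0.1, -0.2, 0.4, 0.0]
normalized = zero_mean_unit_var(waveform)
padded, mask = pad_right(normalized, target_length=6)
print(padded)
print(mask)
```

The attention mask lets the model ignore the padded tail, which matters when clips of different lengths share a batch.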
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:23572f8f936e2e679c8b495d2b62b1adb7f3d2ee9b9638c29ec61030ff9d884a
+ size 52182770
runs/Jun19_18-32-08_ml6.hpc.uio.no/events.out.tfevents.1718814757.ml6.hpc.uio.no.3740755.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a4eaf4ed221f11881caa78c440f22411ca4ec634ac8edd09827ae647c95b5f95
+ size 84305
runs/Jun19_19-59-24_ml6.hpc.uio.no/events.out.tfevents.1718819982.ml6.hpc.uio.no.3789351.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b02820f5aa4bdb4a46fdea41d3adc8f8e495a717251e477df090fd5096ac4d73
+ size 115612
runs/Jun19_19-59-24_ml6.hpc.uio.no/events.out.tfevents.1718824779.ml6.hpc.uio.no.3789351.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd383e2a0a632cc9427727b4a52ff49d1d73ffca3a4750f2874b621fb4e917fa
+ size 411
runs/Jun19_21-46-15_ml6.hpc.uio.no/events.out.tfevents.1718826395.ml6.hpc.uio.no.3855193.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ccf19d9bc22e78a4faa8500752bb884b9dfc9fdf0ae94d1f7409dd78e4f9d1a7
+ size 7043
runs/May04_11-28-58_ml6.hpc.uio.no/events.out.tfevents.1714815008.ml6.hpc.uio.no.725046.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:90dc34e3ee0c8274e55cc4fa0cdb9655a770377cde6f2c95e6ab6ed3ad180461
+ size 7663
runs/May04_11-33-59_ml6.hpc.uio.no/events.out.tfevents.1714815251.ml6.hpc.uio.no.727136.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:35083425e167ed4543d937a329a15acf3e50b1e20dae60efaaaba021ef86a223
+ size 101497
train_results.json ADDED
@@ -0,0 +1,7 @@
+ {
+     "epoch": 323.08,
+     "train_loss": 0.8143934268043155,
+     "train_runtime": 4784.9132,
+     "train_samples_per_second": 113.231,
+     "train_steps_per_second": 0.219
+ }
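The training throughput fields can be used to estimate the training set size, which is not stated anywhere in this commit. The sketch below assumes that `train_samples_per_second` was computed as dataset size times the configured 350 epochs, divided by `train_runtime`. That is an assumption about how the metric was produced, so the recovered size is an estimate only. The step count of 1050 comes from the last row of the training results table.

```python
import math

TRAIN_RUNTIME = 4784.9132             # seconds, from train_results.json
TRAIN_SAMPLES_PER_SECOND = 113.231    # from train_results.json
TRAIN_STEPS_PER_SECOND = 0.219        # from train_results.json
NUM_EPOCHS = 350                      # configured epochs, from the model card
TRAIN_BATCH_SIZE = 128                # from the model card
TOTAL_STEPS = 1050                    # final step in the results table

# Assumed relation: samples/s = dataset_size * NUM_EPOCHS / runtime.
estimated_dataset_size = round(TRAIN_SAMPLES_PER_SECOND * TRAIN_RUNTIME / NUM_EPOCHS)

# Batches needed for one pass over the estimated dataset.
batches_per_epoch = math.ceil(estimated_dataset_size / TRAIN_BATCH_SIZE)

# Cross-check: steps divided by runtime should match the reported steps/s.
steps_per_second = round(TOTAL_STEPS / TRAIN_RUNTIME, 3)

print(estimated_dataset_size, batches_per_epoch, steps_per_second)
```

Under this assumption the estimate is about 1548 training samples, or 13 batches per epoch at batch size 128, which agrees with the step-to-epoch ratio in the results table (13 steps per 4.0 epochs with 4 accumulation steps).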
trainer_state.json ADDED
@@ -0,0 +1,3093 @@
+ {
+   "best_metric": 0.7027972027972028,
+   "best_model_checkpoint": "wav2vec2-5Class-train-test-finetune/checkpoint-721",
+   "epoch": 323.0769230769231,
+   "eval_steps": 500,
+   "global_step": 1050,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.92,
+       "eval_accuracy": 0.34265734265734266,
+       "eval_loss": 1.59893798828125,
+       "eval_runtime": 4.2802,
+       "eval_samples_per_second": 66.819,
+       "eval_steps_per_second": 0.701,
+       "step": 3
+     },
+     {
+       "epoch": 1.85,
+       "eval_accuracy": 0.34265734265734266,
+       "eval_loss": 1.5987956523895264,
+       "eval_runtime": 4.8166,
+       "eval_samples_per_second": 59.378,
+       "eval_steps_per_second": 0.623,
+       "step": 6
+     },
+     {
+       "epoch": 2.77,
+       "eval_accuracy": 0.34265734265734266,
+       "eval_loss": 1.598555326461792,
+       "eval_runtime": 3.989,
+       "eval_samples_per_second": 71.697,
+       "eval_steps_per_second": 0.752,
+       "step": 9
+     },
+     {
+       "epoch": 4.0,
+       "eval_accuracy": 0.34265734265734266,
+       "eval_loss": 1.598075270652771,
+       "eval_runtime": 4.3871,
+       "eval_samples_per_second": 65.191,
+       "eval_steps_per_second": 0.684,
+       "step": 13
+     },
+     {
+       "epoch": 4.92,
+       "eval_accuracy": 0.3356643356643357,
+       "eval_loss": 1.5975924730300903,
+       "eval_runtime": 4.7955,
+       "eval_samples_per_second": 59.639,
+       "eval_steps_per_second": 0.626,
+       "step": 16
+     },
+     {
+       "epoch": 5.85,
+       "eval_accuracy": 0.34265734265734266,
+       "eval_loss": 1.5970256328582764,
+       "eval_runtime": 4.4665,
+       "eval_samples_per_second": 64.032,
+       "eval_steps_per_second": 0.672,
+       "step": 19
+     },
+     {
+       "epoch": 6.77,
+       "eval_accuracy": 0.33916083916083917,
+       "eval_loss": 1.5963499546051025,
+       "eval_runtime": 4.3016,
+       "eval_samples_per_second": 66.488,
+       "eval_steps_per_second": 0.697,
+       "step": 22
+     },
+     {
+       "epoch": 8.0,
+       "eval_accuracy": 0.3356643356643357,
+       "eval_loss": 1.5952636003494263,
+       "eval_runtime": 3.9531,
+       "eval_samples_per_second": 72.347,
+       "eval_steps_per_second": 0.759,
+       "step": 26
+     },
+     {
+       "epoch": 8.92,
+       "eval_accuracy": 0.32867132867132864,
+       "eval_loss": 1.594333291053772,
+       "eval_runtime": 5.6915,
+       "eval_samples_per_second": 50.25,
+       "eval_steps_per_second": 0.527,
+       "step": 29
+     },
+     {
+       "epoch": 9.85,
+       "eval_accuracy": 0.32867132867132864,
+       "eval_loss": 1.5933252573013306,
+       "eval_runtime": 4.4236,
+       "eval_samples_per_second": 64.653,
+       "eval_steps_per_second": 0.678,
+       "step": 32
+     },
+     {
+       "epoch": 10.77,
+       "eval_accuracy": 0.32167832167832167,
+       "eval_loss": 1.592211365699768,
+       "eval_runtime": 4.9541,
106
+ "eval_samples_per_second": 57.73,
107
+ "eval_steps_per_second": 0.606,
108
+ "step": 35
109
+ },
110
+ {
111
+ "epoch": 12.0,
112
+ "eval_accuracy": 0.3181818181818182,
113
+ "eval_loss": 1.5905568599700928,
114
+ "eval_runtime": 5.1955,
115
+ "eval_samples_per_second": 55.047,
116
+ "eval_steps_per_second": 0.577,
117
+ "step": 39
118
+ },
119
+ {
120
+ "epoch": 12.92,
121
+ "eval_accuracy": 0.3146853146853147,
122
+ "eval_loss": 1.58920156955719,
123
+ "eval_runtime": 3.6236,
124
+ "eval_samples_per_second": 78.926,
125
+ "eval_steps_per_second": 0.828,
126
+ "step": 42
127
+ },
128
+ {
129
+ "epoch": 13.85,
130
+ "eval_accuracy": 0.3006993006993007,
131
+ "eval_loss": 1.5877453088760376,
132
+ "eval_runtime": 4.348,
133
+ "eval_samples_per_second": 65.778,
134
+ "eval_steps_per_second": 0.69,
135
+ "step": 45
136
+ },
137
+ {
138
+ "epoch": 14.77,
139
+ "eval_accuracy": 0.2937062937062937,
140
+ "eval_loss": 1.5862104892730713,
141
+ "eval_runtime": 4.6902,
142
+ "eval_samples_per_second": 60.978,
143
+ "eval_steps_per_second": 0.64,
144
+ "step": 48
145
+ },
146
+ {
147
+ "epoch": 15.38,
148
+ "grad_norm": 65952.0234375,
149
+ "learning_rate": 1.4285714285714285e-05,
150
+ "loss": 1.5907,
151
+ "step": 50
152
+ },
153
+ {
154
+ "epoch": 16.0,
155
+ "eval_accuracy": 0.2972027972027972,
156
+ "eval_loss": 1.5840750932693481,
157
+ "eval_runtime": 4.547,
158
+ "eval_samples_per_second": 62.899,
159
+ "eval_steps_per_second": 0.66,
160
+ "step": 52
161
+ },
162
+ {
163
+ "epoch": 16.92,
164
+ "eval_accuracy": 0.28321678321678323,
165
+ "eval_loss": 1.5823713541030884,
166
+ "eval_runtime": 5.3625,
167
+ "eval_samples_per_second": 53.334,
168
+ "eval_steps_per_second": 0.559,
169
+ "step": 55
170
+ },
171
+ {
172
+ "epoch": 17.85,
173
+ "eval_accuracy": 0.27972027972027974,
174
+ "eval_loss": 1.5806101560592651,
175
+ "eval_runtime": 4.7671,
176
+ "eval_samples_per_second": 59.995,
177
+ "eval_steps_per_second": 0.629,
178
+ "step": 58
179
+ },
180
+ {
181
+ "epoch": 18.77,
182
+ "eval_accuracy": 0.2692307692307692,
183
+ "eval_loss": 1.5787912607192993,
184
+ "eval_runtime": 4.3086,
185
+ "eval_samples_per_second": 66.378,
186
+ "eval_steps_per_second": 0.696,
187
+ "step": 61
188
+ },
189
+ {
190
+ "epoch": 20.0,
191
+ "eval_accuracy": 0.2692307692307692,
192
+ "eval_loss": 1.576175332069397,
193
+ "eval_runtime": 5.3175,
194
+ "eval_samples_per_second": 53.784,
195
+ "eval_steps_per_second": 0.564,
196
+ "step": 65
197
+ },
198
+ {
199
+ "epoch": 20.92,
200
+ "eval_accuracy": 0.26573426573426573,
201
+ "eval_loss": 1.5740149021148682,
202
+ "eval_runtime": 4.5172,
203
+ "eval_samples_per_second": 63.314,
204
+ "eval_steps_per_second": 0.664,
205
+ "step": 68
206
+ },
207
+ {
208
+ "epoch": 21.85,
209
+ "eval_accuracy": 0.25524475524475526,
210
+ "eval_loss": 1.5717105865478516,
211
+ "eval_runtime": 3.9011,
212
+ "eval_samples_per_second": 73.312,
213
+ "eval_steps_per_second": 0.769,
214
+ "step": 71
215
+ },
216
+ {
217
+ "epoch": 22.77,
218
+ "eval_accuracy": 0.2517482517482518,
219
+ "eval_loss": 1.5693939924240112,
220
+ "eval_runtime": 3.9307,
221
+ "eval_samples_per_second": 72.76,
222
+ "eval_steps_per_second": 0.763,
223
+ "step": 74
224
+ },
225
+ {
226
+ "epoch": 24.0,
227
+ "eval_accuracy": 0.23776223776223776,
228
+ "eval_loss": 1.566083312034607,
229
+ "eval_runtime": 3.7134,
230
+ "eval_samples_per_second": 77.019,
231
+ "eval_steps_per_second": 0.808,
232
+ "step": 78
233
+ },
234
+ {
235
+ "epoch": 24.92,
236
+ "eval_accuracy": 0.23426573426573427,
237
+ "eval_loss": 1.5634570121765137,
238
+ "eval_runtime": 4.5234,
239
+ "eval_samples_per_second": 63.226,
240
+ "eval_steps_per_second": 0.663,
241
+ "step": 81
242
+ },
243
+ {
244
+ "epoch": 25.85,
245
+ "eval_accuracy": 0.22377622377622378,
246
+ "eval_loss": 1.5608404874801636,
247
+ "eval_runtime": 4.4129,
248
+ "eval_samples_per_second": 64.81,
249
+ "eval_steps_per_second": 0.68,
250
+ "step": 84
251
+ },
252
+ {
253
+ "epoch": 26.77,
254
+ "eval_accuracy": 0.22377622377622378,
255
+ "eval_loss": 1.5581375360488892,
256
+ "eval_runtime": 4.7168,
257
+ "eval_samples_per_second": 60.635,
258
+ "eval_steps_per_second": 0.636,
259
+ "step": 87
260
+ },
261
+ {
262
+ "epoch": 28.0,
263
+ "eval_accuracy": 0.22727272727272727,
264
+ "eval_loss": 1.5542311668395996,
265
+ "eval_runtime": 5.4736,
266
+ "eval_samples_per_second": 52.251,
267
+ "eval_steps_per_second": 0.548,
268
+ "step": 91
269
+ },
270
+ {
271
+ "epoch": 28.92,
272
+ "eval_accuracy": 0.22727272727272727,
273
+ "eval_loss": 1.5511480569839478,
274
+ "eval_runtime": 5.6532,
275
+ "eval_samples_per_second": 50.591,
276
+ "eval_steps_per_second": 0.531,
277
+ "step": 94
278
+ },
279
+ {
280
+ "epoch": 29.85,
281
+ "eval_accuracy": 0.22727272727272727,
282
+ "eval_loss": 1.5479341745376587,
283
+ "eval_runtime": 5.2852,
284
+ "eval_samples_per_second": 54.113,
285
+ "eval_steps_per_second": 0.568,
286
+ "step": 97
287
+ },
288
+ {
289
+ "epoch": 30.77,
290
+ "grad_norm": 68930.8125,
291
+ "learning_rate": 2.857142857142857e-05,
292
+ "loss": 1.5431,
293
+ "step": 100
294
+ },
295
+ {
296
+ "epoch": 30.77,
297
+ "eval_accuracy": 0.22727272727272727,
298
+ "eval_loss": 1.5448040962219238,
299
+ "eval_runtime": 4.6157,
300
+ "eval_samples_per_second": 61.962,
301
+ "eval_steps_per_second": 0.65,
302
+ "step": 100
303
+ },
304
+ {
305
+ "epoch": 32.0,
306
+ "eval_accuracy": 0.22727272727272727,
307
+ "eval_loss": 1.5407565832138062,
308
+ "eval_runtime": 6.2131,
309
+ "eval_samples_per_second": 46.032,
310
+ "eval_steps_per_second": 0.483,
311
+ "step": 104
312
+ },
313
+ {
314
+ "epoch": 32.92,
315
+ "eval_accuracy": 0.22727272727272727,
316
+ "eval_loss": 1.5379865169525146,
317
+ "eval_runtime": 4.645,
318
+ "eval_samples_per_second": 61.571,
319
+ "eval_steps_per_second": 0.646,
320
+ "step": 107
321
+ },
322
+ {
323
+ "epoch": 33.85,
324
+ "eval_accuracy": 0.22727272727272727,
325
+ "eval_loss": 1.5359200239181519,
326
+ "eval_runtime": 5.5884,
327
+ "eval_samples_per_second": 51.178,
328
+ "eval_steps_per_second": 0.537,
329
+ "step": 110
330
+ },
331
+ {
332
+ "epoch": 34.77,
333
+ "eval_accuracy": 0.22727272727272727,
334
+ "eval_loss": 1.5345218181610107,
335
+ "eval_runtime": 4.5718,
336
+ "eval_samples_per_second": 62.557,
337
+ "eval_steps_per_second": 0.656,
338
+ "step": 113
339
+ },
340
+ {
341
+ "epoch": 36.0,
342
+ "eval_accuracy": 0.22727272727272727,
343
+ "eval_loss": 1.5334985256195068,
344
+ "eval_runtime": 5.3526,
345
+ "eval_samples_per_second": 53.432,
346
+ "eval_steps_per_second": 0.56,
347
+ "step": 117
348
+ },
349
+ {
350
+ "epoch": 36.92,
351
+ "eval_accuracy": 0.22727272727272727,
352
+ "eval_loss": 1.5340909957885742,
353
+ "eval_runtime": 4.471,
354
+ "eval_samples_per_second": 63.967,
355
+ "eval_steps_per_second": 0.671,
356
+ "step": 120
357
+ },
358
+ {
359
+ "epoch": 37.85,
360
+ "eval_accuracy": 0.22727272727272727,
361
+ "eval_loss": 1.5361381769180298,
362
+ "eval_runtime": 3.5623,
363
+ "eval_samples_per_second": 80.286,
364
+ "eval_steps_per_second": 0.842,
365
+ "step": 123
366
+ },
367
+ {
368
+ "epoch": 38.77,
369
+ "eval_accuracy": 0.22727272727272727,
370
+ "eval_loss": 1.5397439002990723,
371
+ "eval_runtime": 4.9023,
372
+ "eval_samples_per_second": 58.34,
373
+ "eval_steps_per_second": 0.612,
374
+ "step": 126
375
+ },
376
+ {
377
+ "epoch": 40.0,
378
+ "eval_accuracy": 0.22727272727272727,
379
+ "eval_loss": 1.5478534698486328,
380
+ "eval_runtime": 3.7352,
381
+ "eval_samples_per_second": 76.569,
382
+ "eval_steps_per_second": 0.803,
383
+ "step": 130
384
+ },
385
+ {
386
+ "epoch": 40.92,
387
+ "eval_accuracy": 0.22727272727272727,
388
+ "eval_loss": 1.5564229488372803,
389
+ "eval_runtime": 4.3225,
390
+ "eval_samples_per_second": 66.166,
391
+ "eval_steps_per_second": 0.694,
392
+ "step": 133
393
+ },
394
+ {
395
+ "epoch": 41.85,
396
+ "eval_accuracy": 0.22727272727272727,
397
+ "eval_loss": 1.5678777694702148,
398
+ "eval_runtime": 4.6076,
399
+ "eval_samples_per_second": 62.072,
400
+ "eval_steps_per_second": 0.651,
401
+ "step": 136
402
+ },
403
+ {
404
+ "epoch": 42.77,
405
+ "eval_accuracy": 0.22727272727272727,
406
+ "eval_loss": 1.5821971893310547,
407
+ "eval_runtime": 4.2697,
408
+ "eval_samples_per_second": 66.983,
409
+ "eval_steps_per_second": 0.703,
410
+ "step": 139
411
+ },
412
+ {
413
+ "epoch": 44.0,
414
+ "eval_accuracy": 0.22727272727272727,
415
+ "eval_loss": 1.6002099514007568,
416
+ "eval_runtime": 4.533,
417
+ "eval_samples_per_second": 63.094,
418
+ "eval_steps_per_second": 0.662,
419
+ "step": 143
420
+ },
421
+ {
422
+ "epoch": 44.92,
423
+ "eval_accuracy": 0.22727272727272727,
424
+ "eval_loss": 1.6109449863433838,
425
+ "eval_runtime": 3.9799,
426
+ "eval_samples_per_second": 71.861,
427
+ "eval_steps_per_second": 0.754,
428
+ "step": 146
429
+ },
430
+ {
431
+ "epoch": 45.85,
432
+ "eval_accuracy": 0.22727272727272727,
433
+ "eval_loss": 1.6145771741867065,
434
+ "eval_runtime": 4.3613,
435
+ "eval_samples_per_second": 65.576,
436
+ "eval_steps_per_second": 0.688,
437
+ "step": 149
438
+ },
439
+ {
440
+ "epoch": 46.15,
441
+ "grad_norm": 45833.69921875,
442
+ "learning_rate": 2.857142857142857e-05,
443
+ "loss": 1.4033,
444
+ "step": 150
445
+ },
446
+ {
447
+ "epoch": 46.77,
448
+ "eval_accuracy": 0.22727272727272727,
449
+ "eval_loss": 1.6130825281143188,
450
+ "eval_runtime": 4.2963,
451
+ "eval_samples_per_second": 66.568,
452
+ "eval_steps_per_second": 0.698,
453
+ "step": 152
454
+ },
455
+ {
456
+ "epoch": 48.0,
457
+ "eval_accuracy": 0.22727272727272727,
458
+ "eval_loss": 1.6008453369140625,
459
+ "eval_runtime": 4.063,
460
+ "eval_samples_per_second": 70.391,
461
+ "eval_steps_per_second": 0.738,
462
+ "step": 156
463
+ },
464
+ {
465
+ "epoch": 48.92,
466
+ "eval_accuracy": 0.24125874125874125,
467
+ "eval_loss": 1.586226224899292,
468
+ "eval_runtime": 4.5029,
469
+ "eval_samples_per_second": 63.515,
470
+ "eval_steps_per_second": 0.666,
471
+ "step": 159
472
+ },
473
+ {
474
+ "epoch": 49.85,
475
+ "eval_accuracy": 0.2692307692307692,
476
+ "eval_loss": 1.572645902633667,
477
+ "eval_runtime": 5.0597,
478
+ "eval_samples_per_second": 56.525,
479
+ "eval_steps_per_second": 0.593,
480
+ "step": 162
481
+ },
482
+ {
483
+ "epoch": 50.77,
484
+ "eval_accuracy": 0.2692307692307692,
485
+ "eval_loss": 1.559901237487793,
486
+ "eval_runtime": 4.4174,
487
+ "eval_samples_per_second": 64.744,
488
+ "eval_steps_per_second": 0.679,
489
+ "step": 165
490
+ },
491
+ {
492
+ "epoch": 52.0,
493
+ "eval_accuracy": 0.2867132867132867,
494
+ "eval_loss": 1.5458828210830688,
495
+ "eval_runtime": 4.357,
496
+ "eval_samples_per_second": 65.642,
497
+ "eval_steps_per_second": 0.689,
498
+ "step": 169
499
+ },
500
+ {
501
+ "epoch": 52.92,
502
+ "eval_accuracy": 0.2937062937062937,
503
+ "eval_loss": 1.5382803678512573,
504
+ "eval_runtime": 5.6394,
505
+ "eval_samples_per_second": 50.714,
506
+ "eval_steps_per_second": 0.532,
507
+ "step": 172
508
+ },
509
+ {
510
+ "epoch": 53.85,
511
+ "eval_accuracy": 0.3146853146853147,
512
+ "eval_loss": 1.5310516357421875,
513
+ "eval_runtime": 4.4695,
514
+ "eval_samples_per_second": 63.989,
515
+ "eval_steps_per_second": 0.671,
516
+ "step": 175
517
+ },
518
+ {
519
+ "epoch": 54.77,
520
+ "eval_accuracy": 0.32517482517482516,
521
+ "eval_loss": 1.5242317914962769,
522
+ "eval_runtime": 3.8554,
523
+ "eval_samples_per_second": 74.181,
524
+ "eval_steps_per_second": 0.778,
525
+ "step": 178
526
+ },
527
+ {
528
+ "epoch": 56.0,
529
+ "eval_accuracy": 0.3356643356643357,
530
+ "eval_loss": 1.5169461965560913,
531
+ "eval_runtime": 3.9817,
532
+ "eval_samples_per_second": 71.828,
533
+ "eval_steps_per_second": 0.753,
534
+ "step": 182
535
+ },
536
+ {
537
+ "epoch": 56.92,
538
+ "eval_accuracy": 0.34265734265734266,
539
+ "eval_loss": 1.5103094577789307,
540
+ "eval_runtime": 3.9287,
541
+ "eval_samples_per_second": 72.797,
542
+ "eval_steps_per_second": 0.764,
543
+ "step": 185
544
+ },
545
+ {
546
+ "epoch": 57.85,
547
+ "eval_accuracy": 0.34615384615384615,
548
+ "eval_loss": 1.5055506229400635,
549
+ "eval_runtime": 4.3922,
550
+ "eval_samples_per_second": 65.115,
551
+ "eval_steps_per_second": 0.683,
552
+ "step": 188
553
+ },
554
+ {
555
+ "epoch": 58.77,
556
+ "eval_accuracy": 0.34615384615384615,
557
+ "eval_loss": 1.4995349645614624,
558
+ "eval_runtime": 4.2261,
559
+ "eval_samples_per_second": 67.675,
560
+ "eval_steps_per_second": 0.71,
561
+ "step": 191
562
+ },
563
+ {
564
+ "epoch": 60.0,
565
+ "eval_accuracy": 0.34965034965034963,
566
+ "eval_loss": 1.4939184188842773,
567
+ "eval_runtime": 3.9946,
568
+ "eval_samples_per_second": 71.597,
569
+ "eval_steps_per_second": 0.751,
570
+ "step": 195
571
+ },
572
+ {
573
+ "epoch": 60.92,
574
+ "eval_accuracy": 0.36013986013986016,
575
+ "eval_loss": 1.4870301485061646,
576
+ "eval_runtime": 4.7123,
577
+ "eval_samples_per_second": 60.693,
578
+ "eval_steps_per_second": 0.637,
579
+ "step": 198
580
+ },
581
+ {
582
+ "epoch": 61.54,
583
+ "grad_norm": 27324.4609375,
584
+ "learning_rate": 2.6984126984126984e-05,
585
+ "loss": 1.2485,
586
+ "step": 200
587
+ },
588
+ {
589
+ "epoch": 61.85,
590
+ "eval_accuracy": 0.36713286713286714,
591
+ "eval_loss": 1.4828742742538452,
592
+ "eval_runtime": 4.8484,
593
+ "eval_samples_per_second": 58.989,
594
+ "eval_steps_per_second": 0.619,
595
+ "step": 201
596
+ },
597
+ {
598
+ "epoch": 62.77,
599
+ "eval_accuracy": 0.3741258741258741,
600
+ "eval_loss": 1.4735387563705444,
601
+ "eval_runtime": 4.203,
602
+ "eval_samples_per_second": 68.047,
603
+ "eval_steps_per_second": 0.714,
604
+ "step": 204
605
+ },
606
+ {
607
+ "epoch": 64.0,
608
+ "eval_accuracy": 0.3811188811188811,
609
+ "eval_loss": 1.4612373113632202,
610
+ "eval_runtime": 4.6341,
611
+ "eval_samples_per_second": 61.716,
612
+ "eval_steps_per_second": 0.647,
613
+ "step": 208
614
+ },
615
+ {
616
+ "epoch": 64.92,
617
+ "eval_accuracy": 0.3986013986013986,
618
+ "eval_loss": 1.4491915702819824,
619
+ "eval_runtime": 3.9863,
620
+ "eval_samples_per_second": 71.745,
621
+ "eval_steps_per_second": 0.753,
622
+ "step": 211
623
+ },
624
+ {
625
+ "epoch": 65.85,
626
+ "eval_accuracy": 0.4125874125874126,
627
+ "eval_loss": 1.4364999532699585,
628
+ "eval_runtime": 4.1321,
629
+ "eval_samples_per_second": 69.214,
630
+ "eval_steps_per_second": 0.726,
631
+ "step": 214
632
+ },
633
+ {
634
+ "epoch": 66.77,
635
+ "eval_accuracy": 0.4230769230769231,
636
+ "eval_loss": 1.4226809740066528,
637
+ "eval_runtime": 4.2397,
638
+ "eval_samples_per_second": 67.458,
639
+ "eval_steps_per_second": 0.708,
640
+ "step": 217
641
+ },
642
+ {
643
+ "epoch": 68.0,
644
+ "eval_accuracy": 0.43356643356643354,
645
+ "eval_loss": 1.4095807075500488,
646
+ "eval_runtime": 3.8645,
647
+ "eval_samples_per_second": 74.007,
648
+ "eval_steps_per_second": 0.776,
649
+ "step": 221
650
+ },
651
+ {
652
+ "epoch": 68.92,
653
+ "eval_accuracy": 0.4370629370629371,
654
+ "eval_loss": 1.4010183811187744,
655
+ "eval_runtime": 4.5348,
656
+ "eval_samples_per_second": 63.068,
657
+ "eval_steps_per_second": 0.662,
658
+ "step": 224
659
+ },
660
+ {
661
+ "epoch": 69.85,
662
+ "eval_accuracy": 0.4405594405594406,
663
+ "eval_loss": 1.3949679136276245,
664
+ "eval_runtime": 4.4414,
665
+ "eval_samples_per_second": 64.394,
666
+ "eval_steps_per_second": 0.675,
667
+ "step": 227
668
+ },
669
+ {
670
+ "epoch": 70.77,
671
+ "eval_accuracy": 0.4370629370629371,
672
+ "eval_loss": 1.3919552564620972,
673
+ "eval_runtime": 4.3028,
674
+ "eval_samples_per_second": 66.468,
675
+ "eval_steps_per_second": 0.697,
676
+ "step": 230
677
+ },
678
+ {
679
+ "epoch": 72.0,
680
+ "eval_accuracy": 0.4405594405594406,
681
+ "eval_loss": 1.3798925876617432,
682
+ "eval_runtime": 3.4387,
683
+ "eval_samples_per_second": 83.17,
684
+ "eval_steps_per_second": 0.872,
685
+ "step": 234
686
+ },
687
+ {
688
+ "epoch": 72.92,
689
+ "eval_accuracy": 0.44755244755244755,
690
+ "eval_loss": 1.366864800453186,
691
+ "eval_runtime": 4.6503,
692
+ "eval_samples_per_second": 61.502,
693
+ "eval_steps_per_second": 0.645,
694
+ "step": 237
695
+ },
696
+ {
697
+ "epoch": 73.85,
698
+ "eval_accuracy": 0.45454545454545453,
699
+ "eval_loss": 1.3514918088912964,
700
+ "eval_runtime": 4.5609,
701
+ "eval_samples_per_second": 62.707,
702
+ "eval_steps_per_second": 0.658,
703
+ "step": 240
704
+ },
705
+ {
706
+ "epoch": 74.77,
707
+ "eval_accuracy": 0.47202797202797203,
708
+ "eval_loss": 1.3400850296020508,
709
+ "eval_runtime": 3.8017,
710
+ "eval_samples_per_second": 75.229,
711
+ "eval_steps_per_second": 0.789,
712
+ "step": 243
713
+ },
714
+ {
715
+ "epoch": 76.0,
716
+ "eval_accuracy": 0.4825174825174825,
717
+ "eval_loss": 1.3286209106445312,
718
+ "eval_runtime": 5.7477,
719
+ "eval_samples_per_second": 49.759,
720
+ "eval_steps_per_second": 0.522,
721
+ "step": 247
722
+ },
723
+ {
724
+ "epoch": 76.92,
725
+ "grad_norm": 23198.236328125,
726
+ "learning_rate": 2.5396825396825397e-05,
727
+ "loss": 1.1198,
728
+ "step": 250
729
+ },
730
+ {
731
+ "epoch": 76.92,
732
+ "eval_accuracy": 0.486013986013986,
733
+ "eval_loss": 1.317462682723999,
734
+ "eval_runtime": 4.5266,
735
+ "eval_samples_per_second": 63.182,
736
+ "eval_steps_per_second": 0.663,
737
+ "step": 250
738
+ },
739
+ {
740
+ "epoch": 77.85,
741
+ "eval_accuracy": 0.48951048951048953,
742
+ "eval_loss": 1.3067171573638916,
743
+ "eval_runtime": 3.882,
744
+ "eval_samples_per_second": 73.673,
745
+ "eval_steps_per_second": 0.773,
746
+ "step": 253
747
+ },
748
+ {
749
+ "epoch": 78.77,
750
+ "eval_accuracy": 0.4825174825174825,
751
+ "eval_loss": 1.3013015985488892,
752
+ "eval_runtime": 4.0902,
753
+ "eval_samples_per_second": 69.923,
754
+ "eval_steps_per_second": 0.733,
755
+ "step": 256
756
+ },
757
+ {
758
+ "epoch": 80.0,
759
+ "eval_accuracy": 0.479020979020979,
760
+ "eval_loss": 1.2954434156417847,
761
+ "eval_runtime": 5.4081,
762
+ "eval_samples_per_second": 52.884,
763
+ "eval_steps_per_second": 0.555,
764
+ "step": 260
765
+ },
766
+ {
767
+ "epoch": 80.92,
768
+ "eval_accuracy": 0.486013986013986,
769
+ "eval_loss": 1.289677381515503,
770
+ "eval_runtime": 4.384,
771
+ "eval_samples_per_second": 65.238,
772
+ "eval_steps_per_second": 0.684,
773
+ "step": 263
774
+ },
775
+ {
776
+ "epoch": 81.85,
777
+ "eval_accuracy": 0.486013986013986,
778
+ "eval_loss": 1.283199667930603,
779
+ "eval_runtime": 4.3325,
780
+ "eval_samples_per_second": 66.013,
781
+ "eval_steps_per_second": 0.692,
782
+ "step": 266
783
+ },
784
+ {
785
+ "epoch": 82.77,
786
+ "eval_accuracy": 0.4825174825174825,
787
+ "eval_loss": 1.2712346315383911,
788
+ "eval_runtime": 4.6039,
789
+ "eval_samples_per_second": 62.121,
790
+ "eval_steps_per_second": 0.652,
791
+ "step": 269
792
+ },
793
+ {
794
+ "epoch": 84.0,
795
+ "eval_accuracy": 0.493006993006993,
796
+ "eval_loss": 1.2584125995635986,
797
+ "eval_runtime": 4.5791,
798
+ "eval_samples_per_second": 62.458,
799
+ "eval_steps_per_second": 0.655,
800
+ "step": 273
801
+ },
802
+ {
803
+ "epoch": 84.92,
804
+ "eval_accuracy": 0.4965034965034965,
805
+ "eval_loss": 1.2516244649887085,
806
+ "eval_runtime": 4.8825,
807
+ "eval_samples_per_second": 58.577,
808
+ "eval_steps_per_second": 0.614,
809
+ "step": 276
810
+ },
811
+ {
812
+ "epoch": 85.85,
813
+ "eval_accuracy": 0.5,
814
+ "eval_loss": 1.2455971240997314,
815
+ "eval_runtime": 3.9744,
816
+ "eval_samples_per_second": 71.96,
817
+ "eval_steps_per_second": 0.755,
818
+ "step": 279
819
+ },
820
+ {
821
+ "epoch": 86.77,
822
+ "eval_accuracy": 0.5104895104895105,
823
+ "eval_loss": 1.2443982362747192,
824
+ "eval_runtime": 4.5207,
825
+ "eval_samples_per_second": 63.265,
826
+ "eval_steps_per_second": 0.664,
827
+ "step": 282
828
+ },
829
+ {
830
+ "epoch": 88.0,
831
+ "eval_accuracy": 0.5104895104895105,
832
+ "eval_loss": 1.2373132705688477,
833
+ "eval_runtime": 5.6152,
834
+ "eval_samples_per_second": 50.933,
835
+ "eval_steps_per_second": 0.534,
836
+ "step": 286
837
+ },
838
+ {
839
+ "epoch": 88.92,
840
+ "eval_accuracy": 0.513986013986014,
841
+ "eval_loss": 1.2309471368789673,
842
+ "eval_runtime": 4.7969,
843
+ "eval_samples_per_second": 59.622,
844
+ "eval_steps_per_second": 0.625,
845
+ "step": 289
846
+ },
847
+ {
848
+ "epoch": 89.85,
849
+ "eval_accuracy": 0.5209790209790209,
850
+ "eval_loss": 1.2219436168670654,
851
+ "eval_runtime": 4.2518,
852
+ "eval_samples_per_second": 67.266,
853
+ "eval_steps_per_second": 0.706,
854
+ "step": 292
855
+ },
856
+ {
857
+ "epoch": 90.77,
858
+ "eval_accuracy": 0.5209790209790209,
859
+ "eval_loss": 1.2145464420318604,
860
+ "eval_runtime": 4.6368,
861
+ "eval_samples_per_second": 61.68,
862
+ "eval_steps_per_second": 0.647,
863
+ "step": 295
864
+ },
865
+ {
866
+ "epoch": 92.0,
867
+ "eval_accuracy": 0.527972027972028,
868
+ "eval_loss": 1.2054263353347778,
869
+ "eval_runtime": 4.2071,
870
+ "eval_samples_per_second": 67.98,
871
+ "eval_steps_per_second": 0.713,
872
+ "step": 299
873
+ },
874
+ {
875
+ "epoch": 92.31,
876
+ "grad_norm": 29195.7578125,
877
+ "learning_rate": 2.380952380952381e-05,
878
+ "loss": 0.9915,
879
+ "step": 300
880
+ },
881
+ {
882
+ "epoch": 92.92,
883
+ "eval_accuracy": 0.534965034965035,
884
+ "eval_loss": 1.1981616020202637,
885
+ "eval_runtime": 4.3609,
886
+ "eval_samples_per_second": 65.583,
887
+ "eval_steps_per_second": 0.688,
888
+ "step": 302
889
+ },
890
+ {
891
+ "epoch": 93.85,
892
+ "eval_accuracy": 0.5384615384615384,
893
+ "eval_loss": 1.1913262605667114,
894
+ "eval_runtime": 3.9073,
895
+ "eval_samples_per_second": 73.197,
896
+ "eval_steps_per_second": 0.768,
897
+ "step": 305
898
+ },
899
+ {
900
+ "epoch": 94.77,
901
+ "eval_accuracy": 0.5454545454545454,
902
+ "eval_loss": 1.185881495475769,
903
+ "eval_runtime": 3.928,
904
+ "eval_samples_per_second": 72.811,
905
+ "eval_steps_per_second": 0.764,
906
+ "step": 308
907
+ },
908
+ {
909
+ "epoch": 96.0,
910
+ "eval_accuracy": 0.548951048951049,
911
+ "eval_loss": 1.179394006729126,
912
+ "eval_runtime": 4.1933,
913
+ "eval_samples_per_second": 68.204,
914
+ "eval_steps_per_second": 0.715,
915
+ "step": 312
916
+ },
917
+ {
918
+ "epoch": 96.92,
919
+ "eval_accuracy": 0.5454545454545454,
920
+ "eval_loss": 1.1733678579330444,
921
+ "eval_runtime": 5.0205,
922
+ "eval_samples_per_second": 56.967,
923
+ "eval_steps_per_second": 0.598,
924
+ "step": 315
925
+ },
926
+ {
927
+ "epoch": 97.85,
928
+ "eval_accuracy": 0.5524475524475524,
929
+ "eval_loss": 1.1637603044509888,
930
+ "eval_runtime": 4.8886,
931
+ "eval_samples_per_second": 58.503,
932
+ "eval_steps_per_second": 0.614,
933
+ "step": 318
934
+ },
935
+ {
936
+ "epoch": 98.77,
937
+ "eval_accuracy": 0.5524475524475524,
938
+ "eval_loss": 1.1549575328826904,
939
+ "eval_runtime": 4.9266,
940
+ "eval_samples_per_second": 58.052,
941
+ "eval_steps_per_second": 0.609,
942
+ "step": 321
943
+ },
944
+ {
945
+ "epoch": 100.0,
946
+ "eval_accuracy": 0.548951048951049,
947
+ "eval_loss": 1.1464989185333252,
948
+ "eval_runtime": 4.7642,
949
+ "eval_samples_per_second": 60.032,
950
+ "eval_steps_per_second": 0.63,
951
+ "step": 325
952
+ },
953
+ {
954
+ "epoch": 100.92,
955
+ "eval_accuracy": 0.5594405594405595,
956
+ "eval_loss": 1.1443748474121094,
957
+ "eval_runtime": 4.7025,
958
+ "eval_samples_per_second": 60.819,
959
+ "eval_steps_per_second": 0.638,
960
+ "step": 328
961
+ },
962
+ {
963
+ "epoch": 101.85,
964
+ "eval_accuracy": 0.5629370629370629,
965
+ "eval_loss": 1.1359333992004395,
966
+ "eval_runtime": 4.6342,
967
+ "eval_samples_per_second": 61.715,
968
+ "eval_steps_per_second": 0.647,
969
+ "step": 331
970
+ },
971
+ {
972
+ "epoch": 102.77,
973
+ "eval_accuracy": 0.5664335664335665,
974
+ "eval_loss": 1.1271060705184937,
975
+ "eval_runtime": 4.4245,
976
+ "eval_samples_per_second": 64.639,
977
+ "eval_steps_per_second": 0.678,
978
+ "step": 334
979
+ },
980
+ {
981
+ "epoch": 104.0,
982
+ "eval_accuracy": 0.5769230769230769,
983
+ "eval_loss": 1.109040379524231,
984
+ "eval_runtime": 4.9047,
985
+ "eval_samples_per_second": 58.311,
986
+ "eval_steps_per_second": 0.612,
987
+ "step": 338
988
+ },
989
+ {
990
+ "epoch": 104.92,
991
+ "eval_accuracy": 0.5944055944055944,
992
+ "eval_loss": 1.0972033739089966,
993
+ "eval_runtime": 4.5473,
994
+ "eval_samples_per_second": 62.895,
995
+ "eval_steps_per_second": 0.66,
996
+ "step": 341
997
+ },
998
+ {
999
+ "epoch": 105.85,
1000
+ "eval_accuracy": 0.6013986013986014,
1001
+ "eval_loss": 1.090105414390564,
1002
+ "eval_runtime": 3.7875,
1003
+ "eval_samples_per_second": 75.511,
1004
+ "eval_steps_per_second": 0.792,
1005
+ "step": 344
1006
+ },
1007
+ {
1008
+ "epoch": 106.77,
1009
+ "eval_accuracy": 0.6083916083916084,
1010
+ "eval_loss": 1.0809463262557983,
1011
+ "eval_runtime": 4.7656,
1012
+ "eval_samples_per_second": 60.014,
1013
+ "eval_steps_per_second": 0.63,
1014
+ "step": 347
1015
+ },
1016
+ {
1017
+ "epoch": 107.69,
1018
+ "grad_norm": 32308.33984375,
1019
+ "learning_rate": 2.222222222222222e-05,
1020
+ "loss": 0.8834,
1021
+ "step": 350
1022
+ },
1023
+ {
1024
+ "epoch": 108.0,
1025
+ "eval_accuracy": 0.6118881118881119,
1026
+ "eval_loss": 1.0683268308639526,
1027
+ "eval_runtime": 4.3145,
1028
+ "eval_samples_per_second": 66.288,
1029
+ "eval_steps_per_second": 0.695,
1030
+ "step": 351
1031
+ },
1032
+ {
1033
+ "epoch": 108.92,
1034
+ "eval_accuracy": 0.6223776223776224,
1035
+ "eval_loss": 1.0605404376983643,
1036
+ "eval_runtime": 4.6097,
1037
+ "eval_samples_per_second": 62.043,
1038
+ "eval_steps_per_second": 0.651,
1039
+ "step": 354
1040
+ },
1041
+ {
1042
+ "epoch": 109.85,
1043
+ "eval_accuracy": 0.6258741258741258,
1044
+ "eval_loss": 1.0562984943389893,
1045
+ "eval_runtime": 4.859,
1046
+ "eval_samples_per_second": 58.86,
1047
+ "eval_steps_per_second": 0.617,
1048
+ "step": 357
1049
+ },
1050
+ {
1051
+ "epoch": 110.77,
1052
+ "eval_accuracy": 0.6223776223776224,
1053
+ "eval_loss": 1.0537959337234497,
1054
+ "eval_runtime": 4.948,
1055
+ "eval_samples_per_second": 57.801,
1056
+ "eval_steps_per_second": 0.606,
1057
+ "step": 360
1058
+ },
1059
+ {
1060
+ "epoch": 112.0,
1061
+ "eval_accuracy": 0.6153846153846154,
1062
+ "eval_loss": 1.0491102933883667,
1063
+ "eval_runtime": 4.1434,
1064
+ "eval_samples_per_second": 69.026,
1065
+ "eval_steps_per_second": 0.724,
1066
+ "step": 364
1067
+ },
1068
+ {
1069
+ "epoch": 112.92,
1070
+ "eval_accuracy": 0.6118881118881119,
1071
+ "eval_loss": 1.044057011604309,
1072
+ "eval_runtime": 4.3774,
1073
+ "eval_samples_per_second": 65.336,
1074
+ "eval_steps_per_second": 0.685,
1075
+ "step": 367
1076
+ },
1077
+ {
1078
+ "epoch": 113.85,
1079
+ "eval_accuracy": 0.6118881118881119,
1080
+ "eval_loss": 1.0357924699783325,
1081
+ "eval_runtime": 4.7038,
1082
+ "eval_samples_per_second": 60.801,
1083
+ "eval_steps_per_second": 0.638,
1084
+ "step": 370
1085
+ },
1086
+ {
1087
+ "epoch": 114.77,
1088
+ "eval_accuracy": 0.6223776223776224,
1089
+ "eval_loss": 1.0194157361984253,
1090
+ "eval_runtime": 5.0902,
1091
+ "eval_samples_per_second": 56.187,
1092
+ "eval_steps_per_second": 0.589,
1093
+ "step": 373
1094
+ },
1095
+ {
1096
+ "epoch": 116.0,
1097
+ "eval_accuracy": 0.6293706293706294,
1098
+ "eval_loss": 1.0034115314483643,
1099
+ "eval_runtime": 4.386,
1100
+ "eval_samples_per_second": 65.208,
1101
+ "eval_steps_per_second": 0.684,
1102
+ "step": 377
1103
+ },
1104
+ {
1105
+ "epoch": 116.92,
1106
+ "eval_accuracy": 0.6258741258741258,
1107
+ "eval_loss": 0.9991269707679749,
1108
+ "eval_runtime": 5.2708,
1109
+ "eval_samples_per_second": 54.261,
1110
+ "eval_steps_per_second": 0.569,
1111
+ "step": 380
1112
+ },
1113
+ {
1114
+ "epoch": 117.85,
1115
+ "eval_accuracy": 0.6258741258741258,
1116
+ "eval_loss": 0.9959561824798584,
1117
+ "eval_runtime": 4.7556,
1118
+ "eval_samples_per_second": 60.139,
1119
+ "eval_steps_per_second": 0.631,
1120
+ "step": 383
1121
+ },
1122
+ {
1123
+ "epoch": 118.77,
1124
+ "eval_accuracy": 0.6293706293706294,
1125
+ "eval_loss": 0.9911425113677979,
1126
+ "eval_runtime": 4.0817,
1127
+ "eval_samples_per_second": 70.068,
1128
+ "eval_steps_per_second": 0.735,
1129
+ "step": 386
1130
+ },
1131
+ {
1132
+ "epoch": 120.0,
1133
+ "eval_accuracy": 0.6433566433566433,
1134
+ "eval_loss": 0.9834115505218506,
1135
+ "eval_runtime": 4.0058,
1136
+ "eval_samples_per_second": 71.396,
1137
+ "eval_steps_per_second": 0.749,
1138
+ "step": 390
1139
+ },
1140
+ {
1141
+ "epoch": 120.92,
1142
+ "eval_accuracy": 0.6433566433566433,
1143
+ "eval_loss": 0.9775691628456116,
1144
+ "eval_runtime": 4.3856,
1145
+ "eval_samples_per_second": 65.214,
1146
+ "eval_steps_per_second": 0.684,
1147
+ "step": 393
1148
+ },
1149
+ {
1150
+ "epoch": 121.85,
1151
+ "eval_accuracy": 0.6433566433566433,
1152
+ "eval_loss": 0.9772741198539734,
1153
+ "eval_runtime": 4.6976,
1154
+ "eval_samples_per_second": 60.882,
1155
+ "eval_steps_per_second": 0.639,
1156
+ "step": 396
1157
+ },
1158
+ {
1159
+ "epoch": 122.77,
1160
+ "eval_accuracy": 0.6433566433566433,
1161
+ "eval_loss": 0.9734641909599304,
1162
+ "eval_runtime": 4.6506,
1163
+ "eval_samples_per_second": 61.498,
1164
+ "eval_steps_per_second": 0.645,
1165
+ "step": 399
1166
+ },
1167
+ {
1168
+ "epoch": 123.08,
1169
+ "grad_norm": 27630.990234375,
1170
+ "learning_rate": 2.0634920634920633e-05,
1171
+ "loss": 0.7786,
1172
+ "step": 400
1173
+ },
1174
+ {
1175
+ "epoch": 124.0,
1176
+ "eval_accuracy": 0.6398601398601399,
1177
+ "eval_loss": 0.9730696082115173,
1178
+ "eval_runtime": 3.9976,
1179
+ "eval_samples_per_second": 71.542,
1180
+ "eval_steps_per_second": 0.75,
1181
+ "step": 403
1182
+ },
1183
+ {
1184
+ "epoch": 124.92,
1185
+ "eval_accuracy": 0.6433566433566433,
1186
+ "eval_loss": 0.9727755188941956,
1187
+ "eval_runtime": 4.0553,
1188
+ "eval_samples_per_second": 70.525,
1189
+ "eval_steps_per_second": 0.74,
1190
+ "step": 406
1191
+ },
1192
+ {
1193
+ "epoch": 125.85,
1194
+ "eval_accuracy": 0.6573426573426573,
1195
+ "eval_loss": 0.9657326936721802,
1196
+ "eval_runtime": 4.4666,
1197
+ "eval_samples_per_second": 64.031,
1198
+ "eval_steps_per_second": 0.672,
1199
+ "step": 409
1200
+ },
1201
+ {
1202
+ "epoch": 126.77,
1203
+ "eval_accuracy": 0.6573426573426573,
1204
+ "eval_loss": 0.9547586441040039,
1205
+ "eval_runtime": 4.6999,
1206
+ "eval_samples_per_second": 60.852,
1207
+ "eval_steps_per_second": 0.638,
1208
+ "step": 412
1209
+ },
1210
+ {
1211
+ "epoch": 128.0,
1212
+ "eval_accuracy": 0.6643356643356644,
1213
+ "eval_loss": 0.942358136177063,
1214
+ "eval_runtime": 4.8438,
1215
+ "eval_samples_per_second": 59.045,
1216
+ "eval_steps_per_second": 0.619,
1217
+ "step": 416
1218
+ },
1219
+ {
1220
+ "epoch": 128.92,
1221
+ "eval_accuracy": 0.6678321678321678,
1222
+ "eval_loss": 0.9391436576843262,
1223
+ "eval_runtime": 4.4506,
1224
+ "eval_samples_per_second": 64.261,
1225
+ "eval_steps_per_second": 0.674,
1226
+ "step": 419
1227
+ },
1228
+ {
1229
+ "epoch": 129.85,
1230
+ "eval_accuracy": 0.6678321678321678,
1231
+ "eval_loss": 0.9418392777442932,
1232
+ "eval_runtime": 4.2912,
1233
+ "eval_samples_per_second": 66.648,
1234
+ "eval_steps_per_second": 0.699,
1235
+ "step": 422
1236
+ },
1237
+ {
1238
+ "epoch": 130.77,
1239
+ "eval_accuracy": 0.6608391608391608,
1240
+ "eval_loss": 0.9476207494735718,
1241
+ "eval_runtime": 4.7281,
1242
+ "eval_samples_per_second": 60.49,
1243
+ "eval_steps_per_second": 0.635,
1244
+ "step": 425
1245
+ },
1246
+ {
1247
+ "epoch": 132.0,
1248
+ "eval_accuracy": 0.6643356643356644,
1249
+ "eval_loss": 0.9457269310951233,
1250
+ "eval_runtime": 4.314,
1251
+ "eval_samples_per_second": 66.295,
1252
+ "eval_steps_per_second": 0.695,
1253
+ "step": 429
1254
+ },
1255
+ {
1256
+ "epoch": 132.92,
1257
+ "eval_accuracy": 0.6643356643356644,
1258
+ "eval_loss": 0.941338062286377,
1259
+ "eval_runtime": 3.916,
1260
+ "eval_samples_per_second": 73.033,
1261
+ "eval_steps_per_second": 0.766,
1262
+ "step": 432
1263
+ },
1264
+ {
1265
+ "epoch": 133.85,
1266
+ "eval_accuracy": 0.6678321678321678,
1267
+ "eval_loss": 0.9334166049957275,
1268
+ "eval_runtime": 4.5886,
1269
+ "eval_samples_per_second": 62.329,
1270
+ "eval_steps_per_second": 0.654,
1271
+ "step": 435
1272
+ },
1273
+ {
1274
+ "epoch": 134.77,
1275
+ "eval_accuracy": 0.6678321678321678,
1276
+ "eval_loss": 0.9328890442848206,
1277
+ "eval_runtime": 4.1417,
1278
+ "eval_samples_per_second": 69.054,
1279
+ "eval_steps_per_second": 0.724,
1280
+ "step": 438
1281
+ },
1282
+ {
1283
+ "epoch": 136.0,
1284
+ "eval_accuracy": 0.6713286713286714,
1285
+ "eval_loss": 0.9333996772766113,
1286
+ "eval_runtime": 4.538,
1287
+ "eval_samples_per_second": 63.023,
1288
+ "eval_steps_per_second": 0.661,
1289
+ "step": 442
1290
+ },
1291
+ {
1292
+ "epoch": 136.92,
1293
+ "eval_accuracy": 0.6713286713286714,
1294
+ "eval_loss": 0.9264596700668335,
1295
+ "eval_runtime": 4.6642,
1296
+ "eval_samples_per_second": 61.318,
1297
+ "eval_steps_per_second": 0.643,
1298
+ "step": 445
1299
+ },
1300
+ {
1301
+ "epoch": 137.85,
1302
+ "eval_accuracy": 0.6713286713286714,
1303
+ "eval_loss": 0.9186587929725647,
1304
+ "eval_runtime": 4.5978,
1305
+ "eval_samples_per_second": 62.204,
1306
+ "eval_steps_per_second": 0.652,
1307
+ "step": 448
1308
+ },
1309
+ {
1310
+ "epoch": 138.46,
1311
+ "grad_norm": 34684.0078125,
1312
+ "learning_rate": 1.9047619047619046e-05,
1313
+ "loss": 0.7133,
1314
+ "step": 450
1315
+ },
1316
+ {
1317
+ "epoch": 138.77,
1318
+ "eval_accuracy": 0.6678321678321678,
1319
+ "eval_loss": 0.916916012763977,
1320
+ "eval_runtime": 4.1718,
1321
+ "eval_samples_per_second": 68.556,
1322
+ "eval_steps_per_second": 0.719,
1323
+ "step": 451
1324
+ },
1325
+ {
1326
+ "epoch": 140.0,
1327
+ "eval_accuracy": 0.6713286713286714,
1328
+ "eval_loss": 0.9141567349433899,
1329
+ "eval_runtime": 4.8158,
1330
+ "eval_samples_per_second": 59.388,
1331
+ "eval_steps_per_second": 0.623,
1332
+ "step": 455
1333
+ },
1334
+ {
1335
+ "epoch": 140.92,
1336
+ "eval_accuracy": 0.6713286713286714,
1337
+ "eval_loss": 0.9131244421005249,
1338
+ "eval_runtime": 4.3984,
1339
+ "eval_samples_per_second": 65.024,
1340
+ "eval_steps_per_second": 0.682,
1341
+ "step": 458
1342
+ },
1343
+ {
1344
+ "epoch": 141.85,
1345
+ "eval_accuracy": 0.6783216783216783,
1346
+ "eval_loss": 0.9160958528518677,
1347
+ "eval_runtime": 3.9738,
1348
+ "eval_samples_per_second": 71.971,
1349
+ "eval_steps_per_second": 0.755,
1350
+ "step": 461
1351
+ },
1352
+ {
1353
+ "epoch": 142.77,
1354
+ "eval_accuracy": 0.6678321678321678,
1355
+ "eval_loss": 0.9223662614822388,
1356
+ "eval_runtime": 3.7836,
1357
+ "eval_samples_per_second": 75.589,
1358
+ "eval_steps_per_second": 0.793,
1359
+ "step": 464
1360
+ },
1361
+ {
1362
+ "epoch": 144.0,
1363
+ "eval_accuracy": 0.6748251748251748,
1364
+ "eval_loss": 0.9139449000358582,
1365
+ "eval_runtime": 4.0554,
1366
+ "eval_samples_per_second": 70.522,
1367
+ "eval_steps_per_second": 0.74,
1368
+ "step": 468
1369
+ },
1370
+ {
1371
+ "epoch": 144.92,
1372
+ "eval_accuracy": 0.6748251748251748,
1373
+ "eval_loss": 0.9089756608009338,
1374
+ "eval_runtime": 4.4989,
1375
+ "eval_samples_per_second": 63.571,
1376
+ "eval_steps_per_second": 0.667,
1377
+ "step": 471
1378
+ },
1379
+ {
1380
+ "epoch": 145.85,
1381
+ "eval_accuracy": 0.6713286713286714,
1382
+ "eval_loss": 0.9072948694229126,
1383
+ "eval_runtime": 3.984,
1384
+ "eval_samples_per_second": 71.788,
1385
+ "eval_steps_per_second": 0.753,
1386
+ "step": 474
1387
+ },
1388
+ {
1389
+ "epoch": 146.77,
1390
+ "eval_accuracy": 0.6608391608391608,
1391
+ "eval_loss": 0.9110231995582581,
1392
+ "eval_runtime": 4.596,
1393
+ "eval_samples_per_second": 62.228,
1394
+ "eval_steps_per_second": 0.653,
1395
+ "step": 477
1396
+ },
1397
+ {
1398
+ "epoch": 148.0,
1399
+ "eval_accuracy": 0.6573426573426573,
1400
+ "eval_loss": 0.9167369604110718,
1401
+ "eval_runtime": 4.7051,
1402
+ "eval_samples_per_second": 60.785,
1403
+ "eval_steps_per_second": 0.638,
1404
+ "step": 481
1405
+ },
1406
+ {
1407
+ "epoch": 148.92,
1408
+ "eval_accuracy": 0.6643356643356644,
1409
+ "eval_loss": 0.9118071794509888,
1410
+ "eval_runtime": 3.9295,
1411
+ "eval_samples_per_second": 72.783,
1412
+ "eval_steps_per_second": 0.763,
1413
+ "step": 484
1414
+ },
1415
+ {
1416
+ "epoch": 149.85,
1417
+ "eval_accuracy": 0.6713286713286714,
1418
+ "eval_loss": 0.8996461629867554,
1419
+ "eval_runtime": 4.5063,
1420
+ "eval_samples_per_second": 63.466,
1421
+ "eval_steps_per_second": 0.666,
1422
+ "step": 487
1423
+ },
1424
+ {
1425
+ "epoch": 150.77,
1426
+ "eval_accuracy": 0.6748251748251748,
1427
+ "eval_loss": 0.8903929591178894,
1428
+ "eval_runtime": 4.0074,
1429
+ "eval_samples_per_second": 71.369,
1430
+ "eval_steps_per_second": 0.749,
1431
+ "step": 490
1432
+ },
1433
+ {
1434
+ "epoch": 152.0,
1435
+ "eval_accuracy": 0.6748251748251748,
1436
+ "eval_loss": 0.8889052867889404,
1437
+ "eval_runtime": 4.2482,
1438
+ "eval_samples_per_second": 67.323,
1439
+ "eval_steps_per_second": 0.706,
1440
+ "step": 494
1441
+ },
1442
+ {
1443
+ "epoch": 152.92,
1444
+ "eval_accuracy": 0.6713286713286714,
1445
+ "eval_loss": 0.889894425868988,
1446
+ "eval_runtime": 4.7658,
1447
+ "eval_samples_per_second": 60.011,
1448
+ "eval_steps_per_second": 0.629,
1449
+ "step": 497
1450
+ },
1451
+ {
1452
+ "epoch": 153.85,
1453
+ "grad_norm": 27670.865234375,
1454
+ "learning_rate": 1.746031746031746e-05,
1455
+ "loss": 0.6674,
1456
+ "step": 500
1457
+ },
1458
+ {
1459
+ "epoch": 153.85,
1460
+ "eval_accuracy": 0.6748251748251748,
1461
+ "eval_loss": 0.887377917766571,
1462
+ "eval_runtime": 4.6951,
1463
+ "eval_samples_per_second": 60.915,
1464
+ "eval_steps_per_second": 0.639,
1465
+ "step": 500
1466
+ },
1467
+ {
1468
+ "epoch": 154.77,
1469
+ "eval_accuracy": 0.6748251748251748,
1470
+ "eval_loss": 0.8873924016952515,
1471
+ "eval_runtime": 3.8042,
1472
+ "eval_samples_per_second": 75.181,
1473
+ "eval_steps_per_second": 0.789,
1474
+ "step": 503
1475
+ },
1476
+ {
1477
+ "epoch": 156.0,
1478
+ "eval_accuracy": 0.6748251748251748,
1479
+ "eval_loss": 0.8905075788497925,
1480
+ "eval_runtime": 3.9282,
1481
+ "eval_samples_per_second": 72.806,
1482
+ "eval_steps_per_second": 0.764,
1483
+ "step": 507
1484
+ },
1485
+ {
1486
+ "epoch": 156.92,
1487
+ "eval_accuracy": 0.6783216783216783,
1488
+ "eval_loss": 0.8881194591522217,
1489
+ "eval_runtime": 4.2085,
1490
+ "eval_samples_per_second": 67.957,
1491
+ "eval_steps_per_second": 0.713,
1492
+ "step": 510
1493
+ },
1494
+ {
1495
+ "epoch": 157.85,
1496
+ "eval_accuracy": 0.6748251748251748,
1497
+ "eval_loss": 0.882903516292572,
1498
+ "eval_runtime": 5.345,
1499
+ "eval_samples_per_second": 53.508,
1500
+ "eval_steps_per_second": 0.561,
1501
+ "step": 513
1502
+ },
1503
+ {
1504
+ "epoch": 158.77,
1505
+ "eval_accuracy": 0.6783216783216783,
1506
+ "eval_loss": 0.8809071183204651,
1507
+ "eval_runtime": 4.4142,
1508
+ "eval_samples_per_second": 64.791,
1509
+ "eval_steps_per_second": 0.68,
1510
+ "step": 516
1511
+ },
1512
+ {
1513
+ "epoch": 160.0,
1514
+ "eval_accuracy": 0.6783216783216783,
1515
+ "eval_loss": 0.8780828714370728,
1516
+ "eval_runtime": 3.6498,
1517
+ "eval_samples_per_second": 78.361,
1518
+ "eval_steps_per_second": 0.822,
1519
+ "step": 520
1520
+ },
1521
+ {
1522
+ "epoch": 160.92,
1523
+ "eval_accuracy": 0.6818181818181818,
1524
+ "eval_loss": 0.8776365518569946,
1525
+ "eval_runtime": 3.4668,
1526
+ "eval_samples_per_second": 82.497,
1527
+ "eval_steps_per_second": 0.865,
1528
+ "step": 523
1529
+ },
1530
+ {
1531
+ "epoch": 161.85,
1532
+ "eval_accuracy": 0.6783216783216783,
1533
+ "eval_loss": 0.8795685768127441,
1534
+ "eval_runtime": 3.8004,
1535
+ "eval_samples_per_second": 75.256,
1536
+ "eval_steps_per_second": 0.789,
1537
+ "step": 526
1538
+ },
1539
+ {
1540
+ "epoch": 162.77,
1541
+ "eval_accuracy": 0.6818181818181818,
1542
+ "eval_loss": 0.8795468807220459,
1543
+ "eval_runtime": 3.8694,
1544
+ "eval_samples_per_second": 73.913,
1545
+ "eval_steps_per_second": 0.775,
1546
+ "step": 529
1547
+ },
1548
+ {
1549
+ "epoch": 164.0,
1550
+ "eval_accuracy": 0.6783216783216783,
1551
+ "eval_loss": 0.8797011971473694,
1552
+ "eval_runtime": 4.1348,
1553
+ "eval_samples_per_second": 69.169,
1554
+ "eval_steps_per_second": 0.726,
1555
+ "step": 533
1556
+ },
1557
+ {
1558
+ "epoch": 164.92,
1559
+ "eval_accuracy": 0.6783216783216783,
1560
+ "eval_loss": 0.8706856966018677,
1561
+ "eval_runtime": 4.5762,
1562
+ "eval_samples_per_second": 62.498,
1563
+ "eval_steps_per_second": 0.656,
1564
+ "step": 536
1565
+ },
1566
+ {
1567
+ "epoch": 165.85,
1568
+ "eval_accuracy": 0.6783216783216783,
1569
+ "eval_loss": 0.8697258830070496,
1570
+ "eval_runtime": 3.5794,
1571
+ "eval_samples_per_second": 79.901,
1572
+ "eval_steps_per_second": 0.838,
1573
+ "step": 539
1574
+ },
1575
+ {
1576
+ "epoch": 166.77,
1577
+ "eval_accuracy": 0.6783216783216783,
1578
+ "eval_loss": 0.8723975419998169,
1579
+ "eval_runtime": 5.761,
1580
+ "eval_samples_per_second": 49.644,
1581
+ "eval_steps_per_second": 0.521,
1582
+ "step": 542
1583
+ },
1584
+ {
1585
+ "epoch": 168.0,
1586
+ "eval_accuracy": 0.6748251748251748,
1587
+ "eval_loss": 0.870445966720581,
1588
+ "eval_runtime": 4.2907,
1589
+ "eval_samples_per_second": 66.656,
1590
+ "eval_steps_per_second": 0.699,
1591
+ "step": 546
1592
+ },
1593
+ {
1594
+ "epoch": 168.92,
1595
+ "eval_accuracy": 0.6748251748251748,
1596
+ "eval_loss": 0.8693636655807495,
1597
+ "eval_runtime": 4.5637,
1598
+ "eval_samples_per_second": 62.668,
1599
+ "eval_steps_per_second": 0.657,
1600
+ "step": 549
1601
+ },
1602
+ {
1603
+ "epoch": 169.23,
1604
+ "grad_norm": 67537.203125,
1605
+ "learning_rate": 1.5873015873015872e-05,
1606
+ "loss": 0.6305,
1607
+ "step": 550
1608
+ },
1609
+ {
1610
+ "epoch": 169.85,
1611
+ "eval_accuracy": 0.6748251748251748,
1612
+ "eval_loss": 0.8739539980888367,
1613
+ "eval_runtime": 4.5496,
1614
+ "eval_samples_per_second": 62.862,
1615
+ "eval_steps_per_second": 0.659,
1616
+ "step": 552
1617
+ },
1618
+ {
1619
+ "epoch": 170.77,
1620
+ "eval_accuracy": 0.6748251748251748,
1621
+ "eval_loss": 0.8713040947914124,
1622
+ "eval_runtime": 4.3907,
1623
+ "eval_samples_per_second": 65.138,
1624
+ "eval_steps_per_second": 0.683,
1625
+ "step": 555
1626
+ },
1627
+ {
1628
+ "epoch": 172.0,
1629
+ "eval_accuracy": 0.6783216783216783,
1630
+ "eval_loss": 0.8682331442832947,
1631
+ "eval_runtime": 4.1777,
1632
+ "eval_samples_per_second": 68.459,
1633
+ "eval_steps_per_second": 0.718,
1634
+ "step": 559
1635
+ },
1636
+ {
1637
+ "epoch": 172.92,
1638
+ "eval_accuracy": 0.6783216783216783,
1639
+ "eval_loss": 0.868798553943634,
1640
+ "eval_runtime": 3.5218,
1641
+ "eval_samples_per_second": 81.207,
1642
+ "eval_steps_per_second": 0.852,
1643
+ "step": 562
1644
+ },
1645
+ {
1646
+ "epoch": 173.85,
1647
+ "eval_accuracy": 0.6818181818181818,
1648
+ "eval_loss": 0.8692768216133118,
1649
+ "eval_runtime": 5.0064,
1650
+ "eval_samples_per_second": 57.127,
1651
+ "eval_steps_per_second": 0.599,
1652
+ "step": 565
1653
+ },
1654
+ {
1655
+ "epoch": 174.77,
1656
+ "eval_accuracy": 0.6783216783216783,
1657
+ "eval_loss": 0.874369204044342,
1658
+ "eval_runtime": 4.1257,
1659
+ "eval_samples_per_second": 69.322,
1660
+ "eval_steps_per_second": 0.727,
1661
+ "step": 568
1662
+ },
1663
+ {
1664
+ "epoch": 176.0,
1665
+ "eval_accuracy": 0.6783216783216783,
1666
+ "eval_loss": 0.8759630918502808,
1667
+ "eval_runtime": 4.4848,
1668
+ "eval_samples_per_second": 63.771,
1669
+ "eval_steps_per_second": 0.669,
1670
+ "step": 572
1671
+ },
1672
+ {
1673
+ "epoch": 176.92,
1674
+ "eval_accuracy": 0.6853146853146853,
1675
+ "eval_loss": 0.8696449398994446,
1676
+ "eval_runtime": 4.1683,
1677
+ "eval_samples_per_second": 68.613,
1678
+ "eval_steps_per_second": 0.72,
1679
+ "step": 575
1680
+ },
1681
+ {
1682
+ "epoch": 177.85,
1683
+ "eval_accuracy": 0.6853146853146853,
1684
+ "eval_loss": 0.8668593764305115,
1685
+ "eval_runtime": 4.3889,
1686
+ "eval_samples_per_second": 65.165,
1687
+ "eval_steps_per_second": 0.684,
1688
+ "step": 578
1689
+ },
1690
+ {
1691
+ "epoch": 178.77,
1692
+ "eval_accuracy": 0.6853146853146853,
1693
+ "eval_loss": 0.8641146421432495,
1694
+ "eval_runtime": 4.0742,
1695
+ "eval_samples_per_second": 70.197,
1696
+ "eval_steps_per_second": 0.736,
1697
+ "step": 581
1698
+ },
1699
+ {
1700
+ "epoch": 180.0,
1701
+ "eval_accuracy": 0.6713286713286714,
1702
+ "eval_loss": 0.8696537613868713,
1703
+ "eval_runtime": 4.1345,
1704
+ "eval_samples_per_second": 69.173,
1705
+ "eval_steps_per_second": 0.726,
1706
+ "step": 585
1707
+ },
1708
+ {
1709
+ "epoch": 180.92,
1710
+ "eval_accuracy": 0.6748251748251748,
1711
+ "eval_loss": 0.8678367733955383,
1712
+ "eval_runtime": 3.994,
1713
+ "eval_samples_per_second": 71.607,
1714
+ "eval_steps_per_second": 0.751,
1715
+ "step": 588
1716
+ },
1717
+ {
1718
+ "epoch": 181.85,
1719
+ "eval_accuracy": 0.6818181818181818,
1720
+ "eval_loss": 0.8620542287826538,
1721
+ "eval_runtime": 4.32,
1722
+ "eval_samples_per_second": 66.204,
1723
+ "eval_steps_per_second": 0.694,
1724
+ "step": 591
1725
+ },
1726
+ {
1727
+ "epoch": 182.77,
1728
+ "eval_accuracy": 0.6888111888111889,
1729
+ "eval_loss": 0.8557011485099792,
1730
+ "eval_runtime": 4.7717,
1731
+ "eval_samples_per_second": 59.937,
1732
+ "eval_steps_per_second": 0.629,
1733
+ "step": 594
1734
+ },
1735
+ {
1736
+ "epoch": 184.0,
1737
+ "eval_accuracy": 0.6888111888111889,
1738
+ "eval_loss": 0.848114013671875,
1739
+ "eval_runtime": 4.0948,
1740
+ "eval_samples_per_second": 69.845,
1741
+ "eval_steps_per_second": 0.733,
1742
+ "step": 598
1743
+ },
1744
+ {
1745
+ "epoch": 184.62,
1746
+ "grad_norm": 36502.2421875,
1747
+ "learning_rate": 1.4285714285714285e-05,
1748
+ "loss": 0.6095,
1749
+ "step": 600
1750
+ },
1751
+ {
1752
+ "epoch": 184.92,
1753
+ "eval_accuracy": 0.6888111888111889,
1754
+ "eval_loss": 0.8428906798362732,
1755
+ "eval_runtime": 4.6887,
1756
+ "eval_samples_per_second": 60.997,
1757
+ "eval_steps_per_second": 0.64,
1758
+ "step": 601
1759
+ },
1760
+ {
1761
+ "epoch": 185.85,
1762
+ "eval_accuracy": 0.6888111888111889,
1763
+ "eval_loss": 0.8413122892379761,
1764
+ "eval_runtime": 3.8998,
1765
+ "eval_samples_per_second": 73.337,
1766
+ "eval_steps_per_second": 0.769,
1767
+ "step": 604
1768
+ },
1769
+ {
1770
+ "epoch": 186.77,
1771
+ "eval_accuracy": 0.6923076923076923,
1772
+ "eval_loss": 0.8402045965194702,
1773
+ "eval_runtime": 4.1508,
1774
+ "eval_samples_per_second": 68.903,
1775
+ "eval_steps_per_second": 0.723,
1776
+ "step": 607
1777
+ },
1778
+ {
1779
+ "epoch": 188.0,
1780
+ "eval_accuracy": 0.6888111888111889,
1781
+ "eval_loss": 0.8415275812149048,
1782
+ "eval_runtime": 4.4966,
1783
+ "eval_samples_per_second": 63.603,
1784
+ "eval_steps_per_second": 0.667,
1785
+ "step": 611
1786
+ },
1787
+ {
1788
+ "epoch": 188.92,
1789
+ "eval_accuracy": 0.6923076923076923,
1790
+ "eval_loss": 0.8409523963928223,
1791
+ "eval_runtime": 4.0007,
1792
+ "eval_samples_per_second": 71.488,
1793
+ "eval_steps_per_second": 0.75,
1794
+ "step": 614
1795
+ },
1796
+ {
1797
+ "epoch": 189.85,
1798
+ "eval_accuracy": 0.6853146853146853,
1799
+ "eval_loss": 0.8388563394546509,
1800
+ "eval_runtime": 4.5212,
1801
+ "eval_samples_per_second": 63.257,
1802
+ "eval_steps_per_second": 0.664,
1803
+ "step": 617
1804
+ },
1805
+ {
1806
+ "epoch": 190.77,
1807
+ "eval_accuracy": 0.6853146853146853,
1808
+ "eval_loss": 0.8353860378265381,
1809
+ "eval_runtime": 4.6112,
1810
+ "eval_samples_per_second": 62.023,
1811
+ "eval_steps_per_second": 0.651,
1812
+ "step": 620
1813
+ },
1814
+ {
1815
+ "epoch": 192.0,
1816
+ "eval_accuracy": 0.6888111888111889,
1817
+ "eval_loss": 0.8356983661651611,
1818
+ "eval_runtime": 4.6563,
1819
+ "eval_samples_per_second": 61.422,
1820
+ "eval_steps_per_second": 0.644,
1821
+ "step": 624
1822
+ },
1823
+ {
1824
+ "epoch": 192.92,
1825
+ "eval_accuracy": 0.6958041958041958,
1826
+ "eval_loss": 0.8400572538375854,
1827
+ "eval_runtime": 5.369,
1828
+ "eval_samples_per_second": 53.269,
1829
+ "eval_steps_per_second": 0.559,
1830
+ "step": 627
1831
+ },
1832
+ {
1833
+ "epoch": 193.85,
1834
+ "eval_accuracy": 0.6958041958041958,
1835
+ "eval_loss": 0.844892144203186,
1836
+ "eval_runtime": 4.0956,
1837
+ "eval_samples_per_second": 69.831,
1838
+ "eval_steps_per_second": 0.732,
1839
+ "step": 630
1840
+ },
1841
+ {
1842
+ "epoch": 194.77,
1843
+ "eval_accuracy": 0.6958041958041958,
1844
+ "eval_loss": 0.8478845357894897,
1845
+ "eval_runtime": 4.6385,
1846
+ "eval_samples_per_second": 61.658,
1847
+ "eval_steps_per_second": 0.647,
1848
+ "step": 633
1849
+ },
1850
+ {
1851
+ "epoch": 196.0,
1852
+ "eval_accuracy": 0.6923076923076923,
1853
+ "eval_loss": 0.8454630374908447,
1854
+ "eval_runtime": 4.4423,
1855
+ "eval_samples_per_second": 64.381,
1856
+ "eval_steps_per_second": 0.675,
1857
+ "step": 637
1858
+ },
1859
+ {
1860
+ "epoch": 196.92,
1861
+ "eval_accuracy": 0.6923076923076923,
1862
+ "eval_loss": 0.8421822190284729,
1863
+ "eval_runtime": 3.8632,
1864
+ "eval_samples_per_second": 74.032,
1865
+ "eval_steps_per_second": 0.777,
1866
+ "step": 640
1867
+ },
1868
+ {
1869
+ "epoch": 197.85,
1870
+ "eval_accuracy": 0.6923076923076923,
1871
+ "eval_loss": 0.8425044417381287,
1872
+ "eval_runtime": 5.1031,
1873
+ "eval_samples_per_second": 56.044,
1874
+ "eval_steps_per_second": 0.588,
1875
+ "step": 643
1876
+ },
1877
+ {
1878
+ "epoch": 198.77,
1879
+ "eval_accuracy": 0.6923076923076923,
1880
+ "eval_loss": 0.8436546325683594,
1881
+ "eval_runtime": 4.9685,
1882
+ "eval_samples_per_second": 57.562,
1883
+ "eval_steps_per_second": 0.604,
1884
+ "step": 646
1885
+ },
1886
+ {
1887
+ "epoch": 200.0,
1888
+ "grad_norm": 66285.84375,
1889
+ "learning_rate": 1.2698412698412699e-05,
1890
+ "loss": 0.5908,
1891
+ "step": 650
1892
+ },
1893
+ {
1894
+ "epoch": 200.0,
1895
+ "eval_accuracy": 0.6958041958041958,
1896
+ "eval_loss": 0.8366544246673584,
1897
+ "eval_runtime": 4.3292,
1898
+ "eval_samples_per_second": 66.063,
1899
+ "eval_steps_per_second": 0.693,
1900
+ "step": 650
1901
+ },
1902
+ {
1903
+ "epoch": 200.92,
1904
+ "eval_accuracy": 0.6993006993006993,
1905
+ "eval_loss": 0.834704577922821,
1906
+ "eval_runtime": 4.7887,
1907
+ "eval_samples_per_second": 59.724,
1908
+ "eval_steps_per_second": 0.626,
1909
+ "step": 653
1910
+ },
1911
+ {
1912
+ "epoch": 201.85,
1913
+ "eval_accuracy": 0.6958041958041958,
1914
+ "eval_loss": 0.8286824226379395,
1915
+ "eval_runtime": 4.388,
1916
+ "eval_samples_per_second": 65.178,
1917
+ "eval_steps_per_second": 0.684,
1918
+ "step": 656
1919
+ },
1920
+ {
1921
+ "epoch": 202.77,
1922
+ "eval_accuracy": 0.6923076923076923,
1923
+ "eval_loss": 0.8259890079498291,
1924
+ "eval_runtime": 3.7365,
1925
+ "eval_samples_per_second": 76.543,
1926
+ "eval_steps_per_second": 0.803,
1927
+ "step": 659
1928
+ },
1929
+ {
1930
+ "epoch": 204.0,
1931
+ "eval_accuracy": 0.6958041958041958,
1932
+ "eval_loss": 0.8263576626777649,
1933
+ "eval_runtime": 4.9175,
1934
+ "eval_samples_per_second": 58.159,
1935
+ "eval_steps_per_second": 0.61,
1936
+ "step": 663
1937
+ },
1938
+ {
1939
+ "epoch": 204.92,
1940
+ "eval_accuracy": 0.6958041958041958,
1941
+ "eval_loss": 0.8295235633850098,
1942
+ "eval_runtime": 4.3071,
1943
+ "eval_samples_per_second": 66.401,
1944
+ "eval_steps_per_second": 0.697,
1945
+ "step": 666
1946
+ },
1947
+ {
1948
+ "epoch": 205.85,
1949
+ "eval_accuracy": 0.6923076923076923,
1950
+ "eval_loss": 0.8301726579666138,
1951
+ "eval_runtime": 3.7499,
1952
+ "eval_samples_per_second": 76.268,
1953
+ "eval_steps_per_second": 0.8,
1954
+ "step": 669
1955
+ },
1956
+ {
1957
+ "epoch": 206.77,
1958
+ "eval_accuracy": 0.6923076923076923,
1959
+ "eval_loss": 0.828461766242981,
1960
+ "eval_runtime": 3.8022,
1961
+ "eval_samples_per_second": 75.219,
1962
+ "eval_steps_per_second": 0.789,
1963
+ "step": 672
1964
+ },
1965
+ {
1966
+ "epoch": 208.0,
1967
+ "eval_accuracy": 0.6923076923076923,
1968
+ "eval_loss": 0.831078052520752,
1969
+ "eval_runtime": 4.2868,
1970
+ "eval_samples_per_second": 66.716,
1971
+ "eval_steps_per_second": 0.7,
1972
+ "step": 676
1973
+ },
1974
+ {
1975
+ "epoch": 208.92,
1976
+ "eval_accuracy": 0.6923076923076923,
1977
+ "eval_loss": 0.8320910334587097,
1978
+ "eval_runtime": 4.474,
1979
+ "eval_samples_per_second": 63.925,
1980
+ "eval_steps_per_second": 0.671,
1981
+ "step": 679
1982
+ },
1983
+ {
1984
+ "epoch": 209.85,
1985
+ "eval_accuracy": 0.6923076923076923,
1986
+ "eval_loss": 0.8305550813674927,
1987
+ "eval_runtime": 4.1246,
1988
+ "eval_samples_per_second": 69.341,
1989
+ "eval_steps_per_second": 0.727,
1990
+ "step": 682
1991
+ },
1992
+ {
1993
+ "epoch": 210.77,
1994
+ "eval_accuracy": 0.6923076923076923,
1995
+ "eval_loss": 0.8302868604660034,
1996
+ "eval_runtime": 4.9131,
1997
+ "eval_samples_per_second": 58.212,
1998
+ "eval_steps_per_second": 0.611,
1999
+ "step": 685
2000
+ },
2001
+ {
2002
+ "epoch": 212.0,
2003
+ "eval_accuracy": 0.6993006993006993,
2004
+ "eval_loss": 0.8256182670593262,
2005
+ "eval_runtime": 4.5542,
2006
+ "eval_samples_per_second": 62.8,
2007
+ "eval_steps_per_second": 0.659,
2008
+ "step": 689
2009
+ },
2010
+ {
2011
+ "epoch": 212.92,
2012
+ "eval_accuracy": 0.6958041958041958,
2013
+ "eval_loss": 0.8230299353599548,
2014
+ "eval_runtime": 4.2845,
2015
+ "eval_samples_per_second": 66.752,
2016
+ "eval_steps_per_second": 0.7,
2017
+ "step": 692
2018
+ },
2019
+ {
2020
+ "epoch": 213.85,
2021
+ "eval_accuracy": 0.6958041958041958,
2022
+ "eval_loss": 0.819442868232727,
2023
+ "eval_runtime": 4.4153,
2024
+ "eval_samples_per_second": 64.775,
2025
+ "eval_steps_per_second": 0.679,
2026
+ "step": 695
2027
+ },
2028
+ {
2029
+ "epoch": 214.77,
2030
+ "eval_accuracy": 0.6958041958041958,
2031
+ "eval_loss": 0.8183168768882751,
2032
+ "eval_runtime": 4.9672,
2033
+ "eval_samples_per_second": 57.577,
2034
+ "eval_steps_per_second": 0.604,
2035
+ "step": 698
2036
+ },
2037
+ {
2038
+ "epoch": 215.38,
2039
+ "grad_norm": 29832.03125,
2040
+ "learning_rate": 1.111111111111111e-05,
2041
+ "loss": 0.5763,
2042
+ "step": 700
2043
+ },
2044
+ {
2045
+ "epoch": 216.0,
2046
+ "eval_accuracy": 0.6958041958041958,
2047
+ "eval_loss": 0.8231977224349976,
2048
+ "eval_runtime": 4.6354,
2049
+ "eval_samples_per_second": 61.699,
2050
+ "eval_steps_per_second": 0.647,
2051
+ "step": 702
2052
+ },
2053
+ {
2054
+ "epoch": 216.92,
2055
+ "eval_accuracy": 0.6888111888111889,
2056
+ "eval_loss": 0.8236932158470154,
2057
+ "eval_runtime": 3.7182,
2058
+ "eval_samples_per_second": 76.92,
2059
+ "eval_steps_per_second": 0.807,
2060
+ "step": 705
2061
+ },
2062
+ {
2063
+ "epoch": 217.85,
2064
+ "eval_accuracy": 0.6993006993006993,
2065
+ "eval_loss": 0.8195610642433167,
2066
+ "eval_runtime": 3.5502,
2067
+ "eval_samples_per_second": 80.56,
2068
+ "eval_steps_per_second": 0.845,
2069
+ "step": 708
2070
+ },
2071
+ {
2072
+ "epoch": 218.77,
2073
+ "eval_accuracy": 0.6993006993006993,
2074
+ "eval_loss": 0.8142436742782593,
2075
+ "eval_runtime": 4.9155,
2076
+ "eval_samples_per_second": 58.184,
2077
+ "eval_steps_per_second": 0.61,
2078
+ "step": 711
2079
+ },
2080
+ {
2081
+ "epoch": 220.0,
2082
+ "eval_accuracy": 0.6993006993006993,
2083
+ "eval_loss": 0.8115321397781372,
2084
+ "eval_runtime": 4.0939,
2085
+ "eval_samples_per_second": 69.86,
2086
+ "eval_steps_per_second": 0.733,
2087
+ "step": 715
2088
+ },
2089
+ {
2090
+ "epoch": 220.92,
2091
+ "eval_accuracy": 0.6993006993006993,
2092
+ "eval_loss": 0.8130100965499878,
2093
+ "eval_runtime": 4.2197,
2094
+ "eval_samples_per_second": 67.777,
2095
+ "eval_steps_per_second": 0.711,
2096
+ "step": 718
2097
+ },
2098
+ {
2099
+ "epoch": 221.85,
2100
+ "eval_accuracy": 0.7027972027972028,
2101
+ "eval_loss": 0.8156144022941589,
2102
+ "eval_runtime": 4.2344,
2103
+ "eval_samples_per_second": 67.542,
2104
+ "eval_steps_per_second": 0.708,
2105
+ "step": 721
2106
+ },
2107
+ {
2108
+ "epoch": 222.77,
2109
+ "eval_accuracy": 0.6958041958041958,
2110
+ "eval_loss": 0.8200713992118835,
2111
+ "eval_runtime": 4.8181,
2112
+ "eval_samples_per_second": 59.36,
2113
+ "eval_steps_per_second": 0.623,
2114
+ "step": 724
2115
+ },
2116
+ {
2117
+ "epoch": 224.0,
2118
+ "eval_accuracy": 0.6958041958041958,
2119
+ "eval_loss": 0.8227414488792419,
2120
+ "eval_runtime": 4.5671,
2121
+ "eval_samples_per_second": 62.621,
2122
+ "eval_steps_per_second": 0.657,
2123
+ "step": 728
2124
+ },
2125
+ {
2126
+ "epoch": 224.92,
2127
+ "eval_accuracy": 0.6958041958041958,
2128
+ "eval_loss": 0.8232228755950928,
2129
+ "eval_runtime": 5.221,
2130
+ "eval_samples_per_second": 54.779,
2131
+ "eval_steps_per_second": 0.575,
2132
+ "step": 731
2133
+ },
2134
+ {
2135
+ "epoch": 225.85,
2136
+ "eval_accuracy": 0.6923076923076923,
2137
+ "eval_loss": 0.8198325634002686,
2138
+ "eval_runtime": 4.2136,
2139
+ "eval_samples_per_second": 67.875,
2140
+ "eval_steps_per_second": 0.712,
2141
+ "step": 734
2142
+ },
2143
+ {
2144
+ "epoch": 226.77,
2145
+ "eval_accuracy": 0.6923076923076923,
2146
+ "eval_loss": 0.8151125311851501,
2147
+ "eval_runtime": 4.8801,
2148
+ "eval_samples_per_second": 58.606,
2149
+ "eval_steps_per_second": 0.615,
2150
+ "step": 737
2151
+ },
2152
+ {
2153
+ "epoch": 228.0,
2154
+ "eval_accuracy": 0.6923076923076923,
2155
+ "eval_loss": 0.8136410713195801,
2156
+ "eval_runtime": 5.2461,
2157
+ "eval_samples_per_second": 54.516,
2158
+ "eval_steps_per_second": 0.572,
2159
+ "step": 741
2160
+ },
2161
+ {
2162
+ "epoch": 228.92,
2163
+ "eval_accuracy": 0.6923076923076923,
2164
+ "eval_loss": 0.8134062886238098,
2165
+ "eval_runtime": 3.6429,
2166
+ "eval_samples_per_second": 78.509,
2167
+ "eval_steps_per_second": 0.824,
2168
+ "step": 744
2169
+ },
2170
+ {
2171
+ "epoch": 229.85,
2172
+ "eval_accuracy": 0.6958041958041958,
2173
+ "eval_loss": 0.8123226761817932,
2174
+ "eval_runtime": 4.8374,
2175
+ "eval_samples_per_second": 59.122,
2176
+ "eval_steps_per_second": 0.62,
2177
+ "step": 747
2178
+ },
2179
+ {
2180
+ "epoch": 230.77,
2181
+ "grad_norm": 27062.134765625,
2182
+ "learning_rate": 9.523809523809523e-06,
2183
+ "loss": 0.57,
2184
+ "step": 750
2185
+ },
2186
+ {
2187
+ "epoch": 230.77,
2188
+ "eval_accuracy": 0.6958041958041958,
2189
+ "eval_loss": 0.8095433115959167,
2190
+ "eval_runtime": 3.9409,
2191
+ "eval_samples_per_second": 72.572,
2192
+ "eval_steps_per_second": 0.761,
2193
+ "step": 750
2194
+ },
2195
+ {
2196
+ "epoch": 232.0,
2197
+ "eval_accuracy": 0.6958041958041958,
2198
+ "eval_loss": 0.8082302212715149,
2199
+ "eval_runtime": 4.0933,
2200
+ "eval_samples_per_second": 69.87,
2201
+ "eval_steps_per_second": 0.733,
2202
+ "step": 754
2203
+ },
2204
+ {
2205
+ "epoch": 232.92,
2206
+ "eval_accuracy": 0.6958041958041958,
2207
+ "eval_loss": 0.8084114193916321,
2208
+ "eval_runtime": 4.4952,
2209
+ "eval_samples_per_second": 63.624,
2210
+ "eval_steps_per_second": 0.667,
2211
+ "step": 757
2212
+ },
2213
+ {
2214
+ "epoch": 233.85,
2215
+ "eval_accuracy": 0.6923076923076923,
2216
+ "eval_loss": 0.8113557696342468,
2217
+ "eval_runtime": 4.6955,
2218
+ "eval_samples_per_second": 60.909,
2219
+ "eval_steps_per_second": 0.639,
2220
+ "step": 760
2221
+ },
2222
+ {
2223
+ "epoch": 234.77,
2224
+ "eval_accuracy": 0.6923076923076923,
2225
+ "eval_loss": 0.8130276799201965,
2226
+ "eval_runtime": 4.9303,
2227
+ "eval_samples_per_second": 58.009,
2228
+ "eval_steps_per_second": 0.608,
2229
+ "step": 763
2230
+ },
2231
+ {
2232
+ "epoch": 236.0,
2233
+ "eval_accuracy": 0.6923076923076923,
2234
+ "eval_loss": 0.8153804540634155,
2235
+ "eval_runtime": 3.6663,
2236
+ "eval_samples_per_second": 78.007,
2237
+ "eval_steps_per_second": 0.818,
2238
+ "step": 767
2239
+ },
2240
+ {
2241
+ "epoch": 236.92,
2242
+ "eval_accuracy": 0.6923076923076923,
2243
+ "eval_loss": 0.8160205483436584,
2244
+ "eval_runtime": 4.6226,
2245
+ "eval_samples_per_second": 61.87,
2246
+ "eval_steps_per_second": 0.649,
2247
+ "step": 770
2248
+ },
2249
+ {
2250
+ "epoch": 237.85,
2251
+ "eval_accuracy": 0.6888111888111889,
2252
+ "eval_loss": 0.8126419186592102,
2253
+ "eval_runtime": 4.6278,
2254
+ "eval_samples_per_second": 61.801,
2255
+ "eval_steps_per_second": 0.648,
2256
+ "step": 773
2257
+ },
2258
+ {
2259
+ "epoch": 238.77,
2260
+ "eval_accuracy": 0.6888111888111889,
2261
+ "eval_loss": 0.8113960027694702,
2262
+ "eval_runtime": 3.8362,
2263
+ "eval_samples_per_second": 74.552,
2264
+ "eval_steps_per_second": 0.782,
2265
+ "step": 776
2266
+ },
2267
+ {
2268
+ "epoch": 240.0,
2269
+ "eval_accuracy": 0.6923076923076923,
2270
+ "eval_loss": 0.8041169047355652,
2271
+ "eval_runtime": 5.2095,
2272
+ "eval_samples_per_second": 54.9,
2273
+ "eval_steps_per_second": 0.576,
2274
+ "step": 780
2275
+ },
2276
+ {
2277
+ "epoch": 240.92,
2278
+ "eval_accuracy": 0.6923076923076923,
2279
+ "eval_loss": 0.8005608916282654,
2280
+ "eval_runtime": 4.0128,
2281
+ "eval_samples_per_second": 71.273,
2282
+ "eval_steps_per_second": 0.748,
2283
+ "step": 783
2284
+ },
2285
+ {
2286
+ "epoch": 241.85,
2287
+ "eval_accuracy": 0.6958041958041958,
2288
+ "eval_loss": 0.7987480163574219,
2289
+ "eval_runtime": 4.8789,
2290
+ "eval_samples_per_second": 58.619,
2291
+ "eval_steps_per_second": 0.615,
2292
+ "step": 786
2293
+ },
2294
+ {
2295
+ "epoch": 242.77,
2296
+ "eval_accuracy": 0.6993006993006993,
2297
+ "eval_loss": 0.7977189421653748,
2298
+ "eval_runtime": 4.5854,
2299
+ "eval_samples_per_second": 62.372,
2300
+ "eval_steps_per_second": 0.654,
2301
+ "step": 789
2302
+ },
2303
+ {
2304
+ "epoch": 244.0,
2305
+ "eval_accuracy": 0.6993006993006993,
2306
+ "eval_loss": 0.8001275658607483,
2307
+ "eval_runtime": 4.7528,
2308
+ "eval_samples_per_second": 60.175,
2309
+ "eval_steps_per_second": 0.631,
2310
+ "step": 793
2311
+ },
2312
+ {
2313
+ "epoch": 244.92,
2314
+ "eval_accuracy": 0.6958041958041958,
2315
+ "eval_loss": 0.8043994903564453,
2316
+ "eval_runtime": 4.2699,
2317
+ "eval_samples_per_second": 66.98,
2318
+ "eval_steps_per_second": 0.703,
2319
+ "step": 796
2320
+ },
2321
+ {
2322
+ "epoch": 245.85,
2323
+ "eval_accuracy": 0.6958041958041958,
2324
+ "eval_loss": 0.8082275390625,
2325
+ "eval_runtime": 4.2996,
2326
+ "eval_samples_per_second": 66.518,
2327
+ "eval_steps_per_second": 0.698,
2328
+ "step": 799
2329
+ },
2330
+ {
2331
+ "epoch": 246.15,
2332
+ "grad_norm": 99001.8359375,
2333
+ "learning_rate": 7.936507936507936e-06,
2334
+ "loss": 0.5456,
2335
+ "step": 800
2336
+ },
2337
+ {
2338
+ "epoch": 246.77,
2339
+ "eval_accuracy": 0.6888111888111889,
2340
+ "eval_loss": 0.8120755553245544,
2341
+ "eval_runtime": 4.5242,
2342
+ "eval_samples_per_second": 63.216,
2343
+ "eval_steps_per_second": 0.663,
2344
+ "step": 802
2345
+ },
2346
+ {
2347
+ "epoch": 248.0,
2348
+ "eval_accuracy": 0.6888111888111889,
2349
+ "eval_loss": 0.8106970191001892,
2350
+ "eval_runtime": 4.4479,
2351
+ "eval_samples_per_second": 64.3,
2352
+ "eval_steps_per_second": 0.674,
2353
+ "step": 806
2354
+ },
2355
+ {
2356
+ "epoch": 248.92,
2357
+ "eval_accuracy": 0.6958041958041958,
2358
+ "eval_loss": 0.806368887424469,
2359
+ "eval_runtime": 4.1522,
2360
+ "eval_samples_per_second": 68.88,
2361
+ "eval_steps_per_second": 0.723,
2362
+ "step": 809
2363
+ },
2364
+ {
2365
+ "epoch": 249.85,
2366
+ "eval_accuracy": 0.6958041958041958,
2367
+ "eval_loss": 0.8042352199554443,
2368
+ "eval_runtime": 4.4213,
2369
+ "eval_samples_per_second": 64.687,
2370
+ "eval_steps_per_second": 0.679,
2371
+ "step": 812
2372
+ },
2373
+ {
2374
+ "epoch": 250.77,
2375
+ "eval_accuracy": 0.6958041958041958,
2376
+ "eval_loss": 0.8005724549293518,
2377
+ "eval_runtime": 4.4134,
2378
+ "eval_samples_per_second": 64.802,
2379
+ "eval_steps_per_second": 0.68,
2380
+ "step": 815
2381
+ },
2382
+ {
2383
+ "epoch": 252.0,
2384
+ "eval_accuracy": 0.6958041958041958,
2385
+ "eval_loss": 0.7968676090240479,
2386
+ "eval_runtime": 3.8229,
2387
+ "eval_samples_per_second": 74.812,
2388
+ "eval_steps_per_second": 0.785,
2389
+ "step": 819
2390
+ },
2391
+ {
2392
+ "epoch": 252.92,
2393
+ "eval_accuracy": 0.6993006993006993,
2394
+ "eval_loss": 0.7954707741737366,
2395
+ "eval_runtime": 4.2693,
2396
+ "eval_samples_per_second": 66.99,
2397
+ "eval_steps_per_second": 0.703,
2398
+ "step": 822
2399
+ },
2400
+ {
2401
+ "epoch": 253.85,
2402
+ "eval_accuracy": 0.6958041958041958,
2403
+ "eval_loss": 0.7973347902297974,
2404
+ "eval_runtime": 4.1401,
2405
+ "eval_samples_per_second": 69.081,
2406
+ "eval_steps_per_second": 0.725,
2407
+ "step": 825
2408
+ },
2409
+ {
2410
+ "epoch": 254.77,
2411
+ "eval_accuracy": 0.6958041958041958,
2412
+ "eval_loss": 0.8001494407653809,
2413
+ "eval_runtime": 4.4851,
2414
+ "eval_samples_per_second": 63.767,
2415
+ "eval_steps_per_second": 0.669,
2416
+ "step": 828
2417
+ },
2418
+ {
2419
+ "epoch": 256.0,
2420
+ "eval_accuracy": 0.6888111888111889,
2421
+ "eval_loss": 0.80350661277771,
2422
+ "eval_runtime": 4.4996,
2423
+ "eval_samples_per_second": 63.562,
2424
+ "eval_steps_per_second": 0.667,
2425
+ "step": 832
2426
+ },
2427
+ {
2428
+ "epoch": 256.92,
2429
+ "eval_accuracy": 0.6853146853146853,
2430
+ "eval_loss": 0.8035485148429871,
2431
+ "eval_runtime": 4.5713,
2432
+ "eval_samples_per_second": 62.564,
2433
+ "eval_steps_per_second": 0.656,
2434
+ "step": 835
2435
+ },
2436
+ {
2437
+ "epoch": 257.85,
2438
+ "eval_accuracy": 0.6923076923076923,
2439
+ "eval_loss": 0.8012282252311707,
2440
+ "eval_runtime": 4.0638,
2441
+ "eval_samples_per_second": 70.377,
2442
+ "eval_steps_per_second": 0.738,
2443
+ "step": 838
2444
+ },
2445
+ {
2446
+ "epoch": 258.77,
2447
+ "eval_accuracy": 0.6923076923076923,
2448
+ "eval_loss": 0.8000492453575134,
2449
+ "eval_runtime": 4.443,
2450
+ "eval_samples_per_second": 64.372,
2451
+ "eval_steps_per_second": 0.675,
2452
+ "step": 841
2453
+ },
2454
+ {
2455
+ "epoch": 260.0,
2456
+ "eval_accuracy": 0.6888111888111889,
2457
+ "eval_loss": 0.7963055968284607,
2458
+ "eval_runtime": 5.2655,
2459
+ "eval_samples_per_second": 54.316,
2460
+ "eval_steps_per_second": 0.57,
2461
+ "step": 845
2462
+ },
2463
+ {
2464
+ "epoch": 260.92,
2465
+ "eval_accuracy": 0.6958041958041958,
2466
+ "eval_loss": 0.7927840352058411,
2467
+ "eval_runtime": 5.1407,
2468
+ "eval_samples_per_second": 55.634,
2469
+ "eval_steps_per_second": 0.584,
2470
+ "step": 848
2471
+ },
2472
+ {
2473
+ "epoch": 261.54,
2474
+ "grad_norm": 24108.591796875,
2475
+ "learning_rate": 6.349206349206349e-06,
2476
+ "loss": 0.5369,
2477
+ "step": 850
2478
+ },
2479
+ {
2480
+ "epoch": 261.85,
2481
+ "eval_accuracy": 0.6923076923076923,
2482
+ "eval_loss": 0.7919009327888489,
2483
+ "eval_runtime": 3.8577,
2484
+ "eval_samples_per_second": 74.138,
2485
+ "eval_steps_per_second": 0.778,
2486
+ "step": 851
2487
+ },
2488
+ {
2489
+ "epoch": 262.77,
2490
+ "eval_accuracy": 0.6888111888111889,
2491
+ "eval_loss": 0.791265606880188,
2492
+ "eval_runtime": 4.1966,
2493
+ "eval_samples_per_second": 68.151,
2494
+ "eval_steps_per_second": 0.715,
2495
+ "step": 854
2496
+ },
2497
+ {
2498
+ "epoch": 264.0,
2499
+ "eval_accuracy": 0.6888111888111889,
2500
+ "eval_loss": 0.7929325699806213,
2501
+ "eval_runtime": 4.063,
2502
+ "eval_samples_per_second": 70.391,
2503
+ "eval_steps_per_second": 0.738,
2504
+ "step": 858
2505
+ },
2506
+ {
2507
+ "epoch": 264.92,
2508
+ "eval_accuracy": 0.6818181818181818,
2509
+ "eval_loss": 0.7954928278923035,
2510
+ "eval_runtime": 4.3933,
2511
+ "eval_samples_per_second": 65.099,
2512
+ "eval_steps_per_second": 0.683,
2513
+ "step": 861
2514
+ },
2515
+ {
2516
+ "epoch": 265.85,
2517
+ "eval_accuracy": 0.6853146853146853,
2518
+ "eval_loss": 0.7962778210639954,
2519
+ "eval_runtime": 4.4424,
2520
+ "eval_samples_per_second": 64.38,
2521
+ "eval_steps_per_second": 0.675,
2522
+ "step": 864
2523
+ },
2524
+ {
2525
+ "epoch": 266.77,
2526
+ "eval_accuracy": 0.6888111888111889,
2527
+ "eval_loss": 0.7951834201812744,
2528
+ "eval_runtime": 4.2605,
2529
+ "eval_samples_per_second": 67.128,
2530
+ "eval_steps_per_second": 0.704,
2531
+ "step": 867
2532
+ },
2533
+ {
2534
+ "epoch": 268.0,
2535
+ "eval_accuracy": 0.6888111888111889,
2536
+ "eval_loss": 0.7936495542526245,
2537
+ "eval_runtime": 4.9467,
2538
+ "eval_samples_per_second": 57.816,
2539
+ "eval_steps_per_second": 0.606,
2540
+ "step": 871
2541
+ },
2542
+ {
2543
+ "epoch": 268.92,
2544
+ "eval_accuracy": 0.6853146853146853,
2545
+ "eval_loss": 0.7928897738456726,
2546
+ "eval_runtime": 4.9925,
2547
+ "eval_samples_per_second": 57.286,
2548
+ "eval_steps_per_second": 0.601,
2549
+ "step": 874
2550
+ },
2551
+ {
2552
+ "epoch": 269.85,
2553
+ "eval_accuracy": 0.6853146853146853,
2554
+ "eval_loss": 0.7933365702629089,
2555
+ "eval_runtime": 4.4133,
2556
+ "eval_samples_per_second": 64.804,
2557
+ "eval_steps_per_second": 0.68,
2558
+ "step": 877
2559
+ },
2560
+ {
2561
+ "epoch": 270.77,
2562
+ "eval_accuracy": 0.6853146853146853,
2563
+ "eval_loss": 0.7940818071365356,
2564
+ "eval_runtime": 4.0519,
2565
+ "eval_samples_per_second": 70.584,
2566
+ "eval_steps_per_second": 0.74,
2567
+ "step": 880
2568
+ },
2569
+ {
2570
+ "epoch": 272.0,
2571
+ "eval_accuracy": 0.6853146853146853,
2572
+ "eval_loss": 0.7939559817314148,
2573
+ "eval_runtime": 4.2845,
2574
+ "eval_samples_per_second": 66.753,
2575
+ "eval_steps_per_second": 0.7,
2576
+ "step": 884
2577
+ },
2578
+ {
2579
+ "epoch": 272.92,
2580
+ "eval_accuracy": 0.6853146853146853,
2581
+ "eval_loss": 0.7929409742355347,
2582
+ "eval_runtime": 4.885,
2583
+ "eval_samples_per_second": 58.546,
2584
+ "eval_steps_per_second": 0.614,
2585
+ "step": 887
2586
+ },
2587
+ {
2588
+ "epoch": 273.85,
2589
+ "eval_accuracy": 0.6853146853146853,
2590
+ "eval_loss": 0.7929646968841553,
2591
+ "eval_runtime": 3.7177,
2592
+ "eval_samples_per_second": 76.929,
2593
+ "eval_steps_per_second": 0.807,
2594
+ "step": 890
2595
+ },
2596
+ {
2597
+ "epoch": 274.77,
2598
+ "eval_accuracy": 0.6853146853146853,
2599
+ "eval_loss": 0.7942932844161987,
2600
+ "eval_runtime": 4.7663,
2601
+ "eval_samples_per_second": 60.004,
2602
+ "eval_steps_per_second": 0.629,
2603
+ "step": 893
2604
+ },
2605
+ {
2606
+ "epoch": 276.0,
2607
+ "eval_accuracy": 0.6853146853146853,
2608
+ "eval_loss": 0.7943535447120667,
2609
+ "eval_runtime": 4.0017,
2610
+ "eval_samples_per_second": 71.47,
2611
+ "eval_steps_per_second": 0.75,
2612
+ "step": 897
2613
+ },
2614
+ {
2615
+ "epoch": 276.92,
2616
+ "grad_norm": 30744.533203125,
2617
+ "learning_rate": 4.7619047619047615e-06,
2618
+ "loss": 0.5388,
2619
+ "step": 900
2620
+ },
2621
+ {
2622
+ "epoch": 276.92,
2623
+ "eval_accuracy": 0.6853146853146853,
2624
+ "eval_loss": 0.7933218479156494,
2625
+ "eval_runtime": 4.3013,
2626
+ "eval_samples_per_second": 66.492,
2627
+ "eval_steps_per_second": 0.697,
2628
+ "step": 900
2629
+ },
2630
+ {
2631
+ "epoch": 277.85,
2632
+ "eval_accuracy": 0.6853146853146853,
2633
+ "eval_loss": 0.7914408445358276,
2634
+ "eval_runtime": 4.8732,
2635
+ "eval_samples_per_second": 58.689,
2636
+ "eval_steps_per_second": 0.616,
2637
+ "step": 903
2638
+ },
2639
+ {
2640
+ "epoch": 278.77,
2641
+ "eval_accuracy": 0.6853146853146853,
2642
+ "eval_loss": 0.7903594970703125,
2643
+ "eval_runtime": 4.6519,
2644
+ "eval_samples_per_second": 61.48,
2645
+ "eval_steps_per_second": 0.645,
2646
+ "step": 906
2647
+ },
2648
+ {
2649
+ "epoch": 280.0,
2650
+ "eval_accuracy": 0.6853146853146853,
2651
+ "eval_loss": 0.7888299822807312,
2652
+ "eval_runtime": 4.5788,
2653
+ "eval_samples_per_second": 62.462,
2654
+ "eval_steps_per_second": 0.655,
2655
+ "step": 910
2656
+ },
2657
+ {
2658
+ "epoch": 280.92,
2659
+ "eval_accuracy": 0.6853146853146853,
2660
+ "eval_loss": 0.7900360822677612,
2661
+ "eval_runtime": 4.5971,
2662
+ "eval_samples_per_second": 62.213,
2663
+ "eval_steps_per_second": 0.653,
2664
+ "step": 913
2665
+ },
2666
+ {
2667
+ "epoch": 281.85,
2668
+ "eval_accuracy": 0.6853146853146853,
2669
+ "eval_loss": 0.7905992865562439,
2670
+ "eval_runtime": 4.4545,
2671
+ "eval_samples_per_second": 64.205,
2672
+ "eval_steps_per_second": 0.673,
2673
+ "step": 916
2674
+ },
2675
+ {
2676
+ "epoch": 282.77,
2677
+ "eval_accuracy": 0.6853146853146853,
2678
+ "eval_loss": 0.7911333441734314,
2679
+ "eval_runtime": 4.4274,
2680
+ "eval_samples_per_second": 64.598,
2681
+ "eval_steps_per_second": 0.678,
2682
+ "step": 919
2683
+ },
2684
+ {
2685
+ "epoch": 284.0,
2686
+ "eval_accuracy": 0.6853146853146853,
2687
+ "eval_loss": 0.7906560897827148,
2688
+ "eval_runtime": 3.9207,
2689
+ "eval_samples_per_second": 72.947,
2690
+ "eval_steps_per_second": 0.765,
2691
+ "step": 923
2692
+ },
2693
+ {
2694
+ "epoch": 284.92,
2695
+ "eval_accuracy": 0.6853146853146853,
2696
+ "eval_loss": 0.7906984686851501,
2697
+ "eval_runtime": 4.5603,
2698
+ "eval_samples_per_second": 62.715,
2699
+ "eval_steps_per_second": 0.658,
2700
+ "step": 926
2701
+ },
2702
+ {
2703
+ "epoch": 285.85,
2704
+ "eval_accuracy": 0.6818181818181818,
2705
+ "eval_loss": 0.7905350923538208,
2706
+ "eval_runtime": 4.8134,
2707
+ "eval_samples_per_second": 59.418,
2708
+ "eval_steps_per_second": 0.623,
2709
+ "step": 929
2710
+ },
2711
+ {
2712
+ "epoch": 286.77,
2713
+ "eval_accuracy": 0.6818181818181818,
2714
+ "eval_loss": 0.7899833917617798,
2715
+ "eval_runtime": 4.0697,
2716
+ "eval_samples_per_second": 70.275,
2717
+ "eval_steps_per_second": 0.737,
2718
+ "step": 932
2719
+ },
2720
+ {
2721
+ "epoch": 288.0,
2722
+ "eval_accuracy": 0.6853146853146853,
2723
+ "eval_loss": 0.7901102304458618,
2724
+ "eval_runtime": 4.0126,
2725
+ "eval_samples_per_second": 71.276,
2726
+ "eval_steps_per_second": 0.748,
2727
+ "step": 936
2728
+ },
2729
+ {
2730
+ "epoch": 288.92,
2731
+ "eval_accuracy": 0.6853146853146853,
2732
+ "eval_loss": 0.7902336120605469,
2733
+ "eval_runtime": 3.8328,
2734
+ "eval_samples_per_second": 74.619,
2735
+ "eval_steps_per_second": 0.783,
2736
+ "step": 939
2737
+ },
2738
+ {
2739
+ "epoch": 289.85,
2740
+ "eval_accuracy": 0.6853146853146853,
2741
+ "eval_loss": 0.7909765839576721,
2742
+ "eval_runtime": 3.9497,
2743
+ "eval_samples_per_second": 72.411,
2744
+ "eval_steps_per_second": 0.76,
2745
+ "step": 942
2746
+ },
2747
+ {
2748
+ "epoch": 290.77,
2749
+ "eval_accuracy": 0.6888111888111889,
2750
+ "eval_loss": 0.7913976907730103,
2751
+ "eval_runtime": 4.7881,
2752
+ "eval_samples_per_second": 59.731,
2753
+ "eval_steps_per_second": 0.627,
2754
+ "step": 945
2755
+ },
2756
+ {
2757
+ "epoch": 292.0,
2758
+ "eval_accuracy": 0.6888111888111889,
2759
+ "eval_loss": 0.7919970750808716,
2760
+ "eval_runtime": 4.0436,
2761
+ "eval_samples_per_second": 70.729,
2762
+ "eval_steps_per_second": 0.742,
2763
+ "step": 949
2764
+ },
2765
+ {
2766
+ "epoch": 292.31,
2767
+ "grad_norm": 41198.3515625,
2768
+ "learning_rate": 3.1746031746031746e-06,
2769
+ "loss": 0.5261,
2770
+ "step": 950
2771
+ },
2772
+ {
2773
+ "epoch": 292.92,
2774
+ "eval_accuracy": 0.6853146853146853,
2775
+ "eval_loss": 0.7927921414375305,
2776
+ "eval_runtime": 3.9219,
2777
+ "eval_samples_per_second": 72.923,
2778
+ "eval_steps_per_second": 0.765,
2779
+ "step": 952
2780
+ },
2781
+ {
2782
+ "epoch": 293.85,
2783
+ "eval_accuracy": 0.6888111888111889,
2784
+ "eval_loss": 0.793153703212738,
2785
+ "eval_runtime": 4.3649,
2786
+ "eval_samples_per_second": 65.522,
2787
+ "eval_steps_per_second": 0.687,
2788
+ "step": 955
2789
+ },
2790
+ {
2791
+ "epoch": 294.77,
2792
+ "eval_accuracy": 0.6888111888111889,
2793
+ "eval_loss": 0.7925400733947754,
2794
+ "eval_runtime": 4.2064,
2795
+ "eval_samples_per_second": 67.992,
2796
+ "eval_steps_per_second": 0.713,
2797
+ "step": 958
2798
+ },
2799
+ {
2800
+ "epoch": 296.0,
2801
+ "eval_accuracy": 0.6888111888111889,
2802
+ "eval_loss": 0.7922278046607971,
2803
+ "eval_runtime": 4.03,
2804
+ "eval_samples_per_second": 70.968,
2805
+ "eval_steps_per_second": 0.744,
2806
+ "step": 962
2807
+ },
2808
+ {
2809
+ "epoch": 296.92,
2810
+ "eval_accuracy": 0.6888111888111889,
2811
+ "eval_loss": 0.7919090986251831,
2812
+ "eval_runtime": 4.4889,
2813
+ "eval_samples_per_second": 63.713,
2814
+ "eval_steps_per_second": 0.668,
2815
+ "step": 965
2816
+ },
2817
+ {
2818
+ "epoch": 297.85,
2819
+ "eval_accuracy": 0.6888111888111889,
2820
+ "eval_loss": 0.7922202348709106,
2821
+ "eval_runtime": 4.3742,
2822
+ "eval_samples_per_second": 65.383,
2823
+ "eval_steps_per_second": 0.686,
2824
+ "step": 968
2825
+ },
2826
+ {
2827
+ "epoch": 298.77,
2828
+ "eval_accuracy": 0.6888111888111889,
2829
+ "eval_loss": 0.7921380400657654,
2830
+ "eval_runtime": 4.27,
2831
+ "eval_samples_per_second": 66.979,
2832
+ "eval_steps_per_second": 0.703,
2833
+ "step": 971
2834
+ },
2835
+ {
2836
+ "epoch": 300.0,
2837
+ "eval_accuracy": 0.6853146853146853,
2838
+ "eval_loss": 0.7912278175354004,
2839
+ "eval_runtime": 4.209,
2840
+ "eval_samples_per_second": 67.95,
2841
+ "eval_steps_per_second": 0.713,
2842
+ "step": 975
2843
+ },
2844
+ {
2845
+ "epoch": 300.92,
2846
+ "eval_accuracy": 0.6853146853146853,
2847
+ "eval_loss": 0.7907286882400513,
2848
+ "eval_runtime": 4.5975,
2849
+ "eval_samples_per_second": 62.208,
2850
+ "eval_steps_per_second": 0.653,
2851
+ "step": 978
2852
+ },
2853
+ {
2854
+ "epoch": 301.85,
2855
+ "eval_accuracy": 0.6853146853146853,
2856
+ "eval_loss": 0.7895866632461548,
2857
+ "eval_runtime": 4.0629,
2858
+ "eval_samples_per_second": 70.394,
2859
+ "eval_steps_per_second": 0.738,
2860
+ "step": 981
2861
+ },
2862
+ {
2863
+ "epoch": 302.77,
2864
+ "eval_accuracy": 0.6888111888111889,
2865
+ "eval_loss": 0.7885376811027527,
2866
+ "eval_runtime": 4.0112,
2867
+ "eval_samples_per_second": 71.301,
2868
+ "eval_steps_per_second": 0.748,
2869
+ "step": 984
2870
+ },
2871
+ {
2872
+ "epoch": 304.0,
2873
+ "eval_accuracy": 0.6888111888111889,
2874
+ "eval_loss": 0.7877256870269775,
2875
+ "eval_runtime": 4.4199,
2876
+ "eval_samples_per_second": 64.708,
2877
+ "eval_steps_per_second": 0.679,
2878
+ "step": 988
2879
+ },
2880
+ {
2881
+ "epoch": 304.92,
2882
+ "eval_accuracy": 0.6888111888111889,
2883
+ "eval_loss": 0.7874112725257874,
2884
+ "eval_runtime": 4.0366,
2885
+ "eval_samples_per_second": 70.852,
2886
+ "eval_steps_per_second": 0.743,
2887
+ "step": 991
2888
+ },
2889
+ {
2890
+ "epoch": 305.85,
2891
+ "eval_accuracy": 0.6888111888111889,
2892
+ "eval_loss": 0.7876228094100952,
2893
+ "eval_runtime": 4.3519,
2894
+ "eval_samples_per_second": 65.718,
2895
+ "eval_steps_per_second": 0.689,
2896
+ "step": 994
2897
+ },
2898
+ {
2899
+ "epoch": 306.77,
2900
+ "eval_accuracy": 0.6888111888111889,
2901
+ "eval_loss": 0.7879106402397156,
2902
+ "eval_runtime": 5.3443,
2903
+ "eval_samples_per_second": 53.515,
2904
+ "eval_steps_per_second": 0.561,
2905
+ "step": 997
2906
+ },
2907
+ {
2908
+ "epoch": 307.69,
2909
+ "grad_norm": 31167.6875,
2910
+ "learning_rate": 1.5873015873015873e-06,
2911
+ "loss": 0.5188,
2912
+ "step": 1000
2913
+ },
2914
+ {
2915
+ "epoch": 308.0,
2916
+ "eval_accuracy": 0.6888111888111889,
2917
+ "eval_loss": 0.7883804440498352,
2918
+ "eval_runtime": 4.1413,
2919
+ "eval_samples_per_second": 69.06,
2920
+ "eval_steps_per_second": 0.724,
2921
+ "step": 1001
2922
+ },
2923
+ {
2924
+ "epoch": 308.92,
2925
+ "eval_accuracy": 0.6888111888111889,
2926
+ "eval_loss": 0.7886692881584167,
2927
+ "eval_runtime": 4.049,
2928
+ "eval_samples_per_second": 70.634,
2929
+ "eval_steps_per_second": 0.741,
2930
+ "step": 1004
2931
+ },
2932
+ {
2933
+ "epoch": 309.85,
2934
+ "eval_accuracy": 0.6888111888111889,
2935
+ "eval_loss": 0.7890444397926331,
2936
+ "eval_runtime": 4.612,
2937
+ "eval_samples_per_second": 62.012,
2938
+ "eval_steps_per_second": 0.65,
2939
+ "step": 1007
2940
+ },
2941
+ {
2942
+ "epoch": 310.77,
2943
+ "eval_accuracy": 0.6888111888111889,
2944
+ "eval_loss": 0.7894096970558167,
2945
+ "eval_runtime": 3.8027,
2946
+ "eval_samples_per_second": 75.209,
2947
+ "eval_steps_per_second": 0.789,
2948
+ "step": 1010
2949
+ },
2950
+ {
2951
+ "epoch": 312.0,
2952
+ "eval_accuracy": 0.6888111888111889,
2953
+ "eval_loss": 0.7899323105812073,
2954
+ "eval_runtime": 4.3345,
2955
+ "eval_samples_per_second": 65.983,
2956
+ "eval_steps_per_second": 0.692,
2957
+ "step": 1014
2958
+ },
2959
+ {
2960
+ "epoch": 312.92,
2961
+ "eval_accuracy": 0.6888111888111889,
2962
+ "eval_loss": 0.7903538346290588,
2963
+ "eval_runtime": 4.5846,
2964
+ "eval_samples_per_second": 62.383,
2965
+ "eval_steps_per_second": 0.654,
2966
+ "step": 1017
2967
+ },
2968
+ {
2969
+ "epoch": 313.85,
2970
+ "eval_accuracy": 0.6923076923076923,
2971
+ "eval_loss": 0.7907257080078125,
2972
+ "eval_runtime": 4.136,
2973
+ "eval_samples_per_second": 69.148,
2974
+ "eval_steps_per_second": 0.725,
2975
+ "step": 1020
2976
+ },
2977
+ {
2978
+ "epoch": 314.77,
2979
+ "eval_accuracy": 0.6923076923076923,
2980
+ "eval_loss": 0.790963888168335,
2981
+ "eval_runtime": 4.2526,
2982
+ "eval_samples_per_second": 67.252,
2983
+ "eval_steps_per_second": 0.705,
2984
+ "step": 1023
2985
+ },
2986
+ {
2987
+ "epoch": 316.0,
2988
+ "eval_accuracy": 0.6923076923076923,
2989
+ "eval_loss": 0.7912085056304932,
2990
+ "eval_runtime": 4.1188,
2991
+ "eval_samples_per_second": 69.437,
2992
+ "eval_steps_per_second": 0.728,
2993
+ "step": 1027
2994
+ },
2995
+ {
2996
+ "epoch": 316.92,
2997
+ "eval_accuracy": 0.6923076923076923,
2998
+ "eval_loss": 0.7911705374717712,
2999
+ "eval_runtime": 4.1524,
3000
+ "eval_samples_per_second": 68.876,
3001
+ "eval_steps_per_second": 0.722,
3002
+ "step": 1030
3003
+ },
3004
+ {
3005
+ "epoch": 317.85,
3006
+ "eval_accuracy": 0.6923076923076923,
3007
+ "eval_loss": 0.7911967039108276,
3008
+ "eval_runtime": 3.9058,
3009
+ "eval_samples_per_second": 73.225,
3010
+ "eval_steps_per_second": 0.768,
3011
+ "step": 1033
3012
+ },
3013
+ {
3014
+ "epoch": 318.77,
3015
+ "eval_accuracy": 0.6923076923076923,
3016
+ "eval_loss": 0.7912610173225403,
3017
+ "eval_runtime": 4.6095,
3018
+ "eval_samples_per_second": 62.046,
3019
+ "eval_steps_per_second": 0.651,
3020
+ "step": 1036
3021
+ },
3022
+ {
3023
+ "epoch": 320.0,
3024
+ "eval_accuracy": 0.6923076923076923,
3025
+ "eval_loss": 0.7912730574607849,
3026
+ "eval_runtime": 5.5705,
3027
+ "eval_samples_per_second": 51.342,
3028
+ "eval_steps_per_second": 0.539,
3029
+ "step": 1040
3030
+ },
3031
+ {
3032
+ "epoch": 320.92,
3033
+ "eval_accuracy": 0.6923076923076923,
3034
+ "eval_loss": 0.7911974787712097,
3035
+ "eval_runtime": 4.9154,
3036
+ "eval_samples_per_second": 58.185,
3037
+ "eval_steps_per_second": 0.61,
3038
+ "step": 1043
3039
+ },
3040
+ {
3041
+ "epoch": 321.85,
3042
+ "eval_accuracy": 0.6923076923076923,
3043
+ "eval_loss": 0.7911575436592102,
3044
+ "eval_runtime": 4.8387,
3045
+ "eval_samples_per_second": 59.107,
3046
+ "eval_steps_per_second": 0.62,
3047
+ "step": 1046
3048
+ },
3049
+ {
3050
+ "epoch": 322.77,
3051
+ "eval_accuracy": 0.6923076923076923,
3052
+ "eval_loss": 0.7911355495452881,
3053
+ "eval_runtime": 4.1368,
3054
+ "eval_samples_per_second": 69.135,
3055
+ "eval_steps_per_second": 0.725,
3056
+ "step": 1049
3057
+ },
3058
+ {
3059
+ "epoch": 323.08,
3060
+ "grad_norm": 53824.44140625,
3061
+ "learning_rate": 0.0,
3062
+ "loss": 0.5194,
3063
+ "step": 1050
3064
+ },
3065
+ {
3066
+ "epoch": 323.08,
3067
+ "eval_accuracy": 0.6923076923076923,
3068
+ "eval_loss": 0.7911302447319031,
3069
+ "eval_runtime": 4.2304,
3070
+ "eval_samples_per_second": 67.606,
3071
+ "eval_steps_per_second": 0.709,
3072
+ "step": 1050
3073
+ },
3074
+ {
3075
+ "epoch": 323.08,
3076
+ "step": 1050,
3077
+ "total_flos": 4.380490432252032e+18,
3078
+ "train_loss": 0.8143934268043155,
3079
+ "train_runtime": 4784.9132,
3080
+ "train_samples_per_second": 113.231,
3081
+ "train_steps_per_second": 0.219
3082
+ }
3083
+ ],
3084
+ "logging_steps": 50,
3085
+ "max_steps": 1050,
3086
+ "num_input_tokens_seen": 0,
3087
+ "num_train_epochs": 350,
3088
+ "save_steps": 500,
3089
+ "total_flos": 4.380490432252032e+18,
3090
+ "train_batch_size": 128,
3091
+ "trial_name": null,
3092
+ "trial_params": null
3093
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e664dcab0876a5ff3b49e3509d0b9d2b6ac89be97cbdc2ed077af9625a3833a9
3
+ size 4984