erbacher committed
Commit ec0e9a3 · 1 Parent(s): 4648926

Model save
Files changed (5)
  1. README.md +66 -0
  2. all_results.json +13 -0
  3. eval_results.json +8 -0
  4. train_results.json +8 -0
  5. trainer_state.json +1088 -0
README.md ADDED
@@ -0,0 +1,66 @@
+ ---
+ license: apache-2.0
+ base_model: mistralai/Mistral-7B-v0.1
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: mistral-convsearch-7b
+ results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # mistral-convsearch-7b
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.6596
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 32
+ - total_train_batch_size: 512
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - num_epochs: 5
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 0.9605 | 0.62 | 34 | 0.9471 |
+ | 0.7935 | 1.62 | 68 | 0.7886 |
+ | 0.7104 | 2.62 | 102 | 0.7083 |
+ | 0.6798 | 3.62 | 136 | 0.6753 |
+ | 0.6571 | 4.62 | 170 | 0.6582 |
+
+
+ ### Framework versions
+
+ - Transformers 4.35.0
+ - Pytorch 2.1.1+cu118
+ - Datasets 2.14.6
+ - Tokenizers 0.14.1
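The batch-size figures in the hyperparameter list above can be cross-checked against `trainer_state.json`. A minimal sketch, assuming the standard Hugging Face Trainer accounting (a trailing partial gradient-accumulation window at the end of an epoch is dropped); the variable names here are illustrative, not from the training script:

```python
import math

# Values taken from the model card and train_results.json.
train_samples = 27870
per_device_batch = 8
num_devices = 2
grad_accum = 32

# Effective samples consumed per optimizer step.
effective_batch = per_device_batch * num_devices * grad_accum

# Micro-batches per epoch across both GPUs, then optimizer updates per epoch.
micro_steps = math.ceil(train_samples / (per_device_batch * num_devices))
updates_per_epoch = micro_steps // grad_accum

num_epochs = 5
max_steps = updates_per_epoch * num_epochs

print(effective_batch, updates_per_epoch, max_steps)  # → 512 54 270
```

The result matches `total_train_batch_size: 512` in the card and `"max_steps": 270` in `trainer_state.json`, and explains why each epoch boundary in the log falls at a multiple of 34 steps only fractionally (epoch 0.62 at step 34, since 34 × 512 / 27870 ≈ 0.62).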
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "epoch": 4.62,
+ "eval_loss": 0.6595966815948486,
+ "eval_runtime": 22.784,
+ "eval_samples": 200,
+ "eval_samples_per_second": 8.778,
+ "eval_steps_per_second": 0.571,
+ "train_loss": 0.8249162855393747,
+ "train_runtime": 52349.2428,
+ "train_samples": 27870,
+ "train_samples_per_second": 2.662,
+ "train_steps_per_second": 0.005
+ }
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 4.62,
+ "eval_loss": 0.6595966815948486,
+ "eval_runtime": 22.784,
+ "eval_samples": 200,
+ "eval_samples_per_second": 8.778,
+ "eval_steps_per_second": 0.571
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 4.62,
+ "train_loss": 0.8249162855393747,
+ "train_runtime": 52349.2428,
+ "train_samples": 27870,
+ "train_samples_per_second": 2.662,
+ "train_steps_per_second": 0.005
+ }
trainer_state.json ADDED
@@ -0,0 +1,1088 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 4.622273249138921,
+ "eval_steps": 500,
+ "global_step": 170,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.02,
+ "learning_rate": 1.9999323080037623e-05,
+ "loss": 1.399,
+ "step": 1
+ },
+ {
+ "epoch": 0.04,
+ "learning_rate": 1.999729241179462e-05,
+ "loss": 1.4014,
+ "step": 2
+ },
+ {
+ "epoch": 0.06,
+ "learning_rate": 1.999390827019096e-05,
+ "loss": 1.3897,
+ "step": 3
+ },
+ {
+ "epoch": 0.07,
+ "learning_rate": 1.998917111338525e-05,
+ "loss": 1.3771,
+ "step": 4
+ },
+ {
+ "epoch": 0.09,
+ "learning_rate": 1.9983081582712684e-05,
+ "loss": 1.3699,
+ "step": 5
+ },
+ {
+ "epoch": 0.11,
+ "learning_rate": 1.9975640502598243e-05,
+ "loss": 1.3372,
+ "step": 6
+ },
+ {
+ "epoch": 0.13,
+ "learning_rate": 1.996684888044506e-05,
+ "loss": 1.3135,
+ "step": 7
+ },
+ {
+ "epoch": 0.15,
+ "learning_rate": 1.9956707906498046e-05,
+ "loss": 1.2847,
+ "step": 8
+ },
+ {
+ "epoch": 0.17,
+ "learning_rate": 1.9945218953682736e-05,
+ "loss": 1.2688,
+ "step": 9
+ },
+ {
+ "epoch": 0.18,
+ "learning_rate": 1.9932383577419432e-05,
+ "loss": 1.256,
+ "step": 10
+ },
+ {
+ "epoch": 0.2,
+ "learning_rate": 1.9918203515412616e-05,
+ "loss": 1.238,
+ "step": 11
+ },
+ {
+ "epoch": 0.22,
+ "learning_rate": 1.9902680687415704e-05,
+ "loss": 1.2066,
+ "step": 12
+ },
+ {
+ "epoch": 0.24,
+ "learning_rate": 1.9885817194971116e-05,
+ "loss": 1.1979,
+ "step": 13
+ },
+ {
+ "epoch": 0.26,
+ "learning_rate": 1.9867615321125796e-05,
+ "loss": 1.1685,
+ "step": 14
+ },
+ {
+ "epoch": 0.28,
+ "learning_rate": 1.9848077530122083e-05,
+ "loss": 1.1574,
+ "step": 15
+ },
+ {
+ "epoch": 0.29,
+ "learning_rate": 1.9827206467064133e-05,
+ "loss": 1.1427,
+ "step": 16
+ },
+ {
+ "epoch": 0.31,
+ "learning_rate": 1.9805004957559795e-05,
+ "loss": 1.122,
+ "step": 17
+ },
+ {
+ "epoch": 0.33,
+ "learning_rate": 1.9781476007338058e-05,
+ "loss": 1.1201,
+ "step": 18
+ },
+ {
+ "epoch": 0.35,
+ "learning_rate": 1.9756622801842144e-05,
+ "loss": 1.0971,
+ "step": 19
+ },
+ {
+ "epoch": 0.37,
+ "learning_rate": 1.973044870579824e-05,
+ "loss": 1.086,
+ "step": 20
+ },
+ {
+ "epoch": 0.39,
+ "learning_rate": 1.9702957262759964e-05,
+ "loss": 1.0764,
+ "step": 21
+ },
+ {
+ "epoch": 0.4,
+ "learning_rate": 1.967415219462864e-05,
+ "loss": 1.0692,
+ "step": 22
+ },
+ {
+ "epoch": 0.42,
+ "learning_rate": 1.964403740114939e-05,
+ "loss": 1.0525,
+ "step": 23
+ },
+ {
+ "epoch": 0.44,
+ "learning_rate": 1.961261695938319e-05,
+ "loss": 1.0412,
+ "step": 24
+ },
+ {
+ "epoch": 0.46,
+ "learning_rate": 1.957989512315489e-05,
+ "loss": 1.0329,
+ "step": 25
+ },
+ {
+ "epoch": 0.48,
+ "learning_rate": 1.954587632247732e-05,
+ "loss": 1.023,
+ "step": 26
+ },
+ {
+ "epoch": 0.5,
+ "learning_rate": 1.9510565162951538e-05,
+ "loss": 1.019,
+ "step": 27
+ },
+ {
+ "epoch": 0.51,
+ "learning_rate": 1.9473966425143292e-05,
+ "loss": 1.0028,
+ "step": 28
+ },
+ {
+ "epoch": 0.53,
+ "learning_rate": 1.9436085063935837e-05,
+ "loss": 0.9944,
+ "step": 29
+ },
+ {
+ "epoch": 0.55,
+ "learning_rate": 1.9396926207859085e-05,
+ "loss": 0.9901,
+ "step": 30
+ },
+ {
+ "epoch": 0.57,
+ "learning_rate": 1.9356495158395317e-05,
+ "loss": 0.9879,
+ "step": 31
+ },
+ {
+ "epoch": 0.59,
+ "learning_rate": 1.9314797389261426e-05,
+ "loss": 0.9729,
+ "step": 32
+ },
+ {
+ "epoch": 0.61,
+ "learning_rate": 1.9271838545667876e-05,
+ "loss": 0.9611,
+ "step": 33
+ },
+ {
+ "epoch": 0.62,
+ "learning_rate": 1.9227624443554425e-05,
+ "loss": 0.9605,
+ "step": 34
+ },
+ {
+ "epoch": 0.62,
+ "eval_loss": 0.9471226930618286,
+ "eval_runtime": 24.0918,
+ "eval_samples_per_second": 8.302,
+ "eval_steps_per_second": 0.54,
+ "step": 34
+ },
+ {
+ "epoch": 1.02,
+ "learning_rate": 1.9182161068802742e-05,
+ "loss": 0.9495,
+ "step": 35
+ },
+ {
+ "epoch": 1.04,
+ "learning_rate": 1.913545457642601e-05,
+ "loss": 0.948,
+ "step": 36
+ },
+ {
+ "epoch": 1.05,
+ "learning_rate": 1.9087511289735646e-05,
+ "loss": 0.9363,
+ "step": 37
+ },
+ {
+ "epoch": 1.07,
+ "learning_rate": 1.9038337699485207e-05,
+ "loss": 0.9356,
+ "step": 38
+ },
+ {
+ "epoch": 1.09,
+ "learning_rate": 1.8987940462991673e-05,
+ "loss": 0.9322,
+ "step": 39
+ },
+ {
+ "epoch": 1.11,
+ "learning_rate": 1.8936326403234125e-05,
+ "loss": 0.919,
+ "step": 40
+ },
+ {
+ "epoch": 1.13,
+ "learning_rate": 1.8883502507930044e-05,
+ "loss": 0.9141,
+ "step": 41
+ },
+ {
+ "epoch": 1.15,
+ "learning_rate": 1.8829475928589272e-05,
+ "loss": 0.9057,
+ "step": 42
+ },
+ {
+ "epoch": 1.16,
+ "learning_rate": 1.877425397954582e-05,
+ "loss": 0.8991,
+ "step": 43
+ },
+ {
+ "epoch": 1.18,
+ "learning_rate": 1.8717844136967626e-05,
+ "loss": 0.8994,
+ "step": 44
+ },
+ {
+ "epoch": 1.2,
+ "learning_rate": 1.866025403784439e-05,
+ "loss": 0.8916,
+ "step": 45
+ },
+ {
+ "epoch": 1.22,
+ "learning_rate": 1.860149147895366e-05,
+ "loss": 0.8881,
+ "step": 46
+ },
+ {
+ "epoch": 1.24,
+ "learning_rate": 1.854156441580526e-05,
+ "loss": 0.8798,
+ "step": 47
+ },
+ {
+ "epoch": 1.26,
+ "learning_rate": 1.848048096156426e-05,
+ "loss": 0.8753,
+ "step": 48
+ },
+ {
+ "epoch": 1.27,
+ "learning_rate": 1.8418249385952575e-05,
+ "loss": 0.8738,
+ "step": 49
+ },
+ {
+ "epoch": 1.29,
+ "learning_rate": 1.8354878114129368e-05,
+ "loss": 0.871,
+ "step": 50
+ },
+ {
+ "epoch": 1.31,
+ "learning_rate": 1.8290375725550417e-05,
+ "loss": 0.8617,
+ "step": 51
+ },
+ {
+ "epoch": 1.33,
+ "learning_rate": 1.8224750952806626e-05,
+ "loss": 0.8646,
+ "step": 52
+ },
+ {
+ "epoch": 1.35,
+ "learning_rate": 1.8158012680441723e-05,
+ "loss": 0.8547,
+ "step": 53
+ },
+ {
+ "epoch": 1.37,
+ "learning_rate": 1.8090169943749477e-05,
+ "loss": 0.8523,
+ "step": 54
+ },
+ {
+ "epoch": 1.39,
+ "learning_rate": 1.802123192755044e-05,
+ "loss": 0.8425,
+ "step": 55
+ },
+ {
+ "epoch": 1.4,
+ "learning_rate": 1.795120796494848e-05,
+ "loss": 0.8455,
+ "step": 56
+ },
+ {
+ "epoch": 1.42,
+ "learning_rate": 1.788010753606722e-05,
+ "loss": 0.8367,
+ "step": 57
+ },
+ {
+ "epoch": 1.44,
+ "learning_rate": 1.7807940266766595e-05,
+ "loss": 0.832,
+ "step": 58
+ },
+ {
+ "epoch": 1.46,
+ "learning_rate": 1.7734715927339642e-05,
+ "loss": 0.828,
+ "step": 59
+ },
+ {
+ "epoch": 1.48,
+ "learning_rate": 1.766044443118978e-05,
+ "loss": 0.8302,
+ "step": 60
+ },
+ {
+ "epoch": 1.5,
+ "learning_rate": 1.7585135833488692e-05,
+ "loss": 0.8194,
+ "step": 61
+ },
+ {
+ "epoch": 1.51,
+ "learning_rate": 1.7508800329814993e-05,
+ "loss": 0.8123,
+ "step": 62
+ },
+ {
+ "epoch": 1.53,
+ "learning_rate": 1.7431448254773943e-05,
+ "loss": 0.8113,
+ "step": 63
+ },
+ {
+ "epoch": 1.55,
+ "learning_rate": 1.735309008059829e-05,
+ "loss": 0.813,
+ "step": 64
+ },
+ {
+ "epoch": 1.57,
+ "learning_rate": 1.7273736415730488e-05,
+ "loss": 0.8104,
+ "step": 65
+ },
+ {
+ "epoch": 1.59,
+ "learning_rate": 1.7193398003386514e-05,
+ "loss": 0.8008,
+ "step": 66
+ },
+ {
+ "epoch": 1.61,
+ "learning_rate": 1.711208572010137e-05,
+ "loss": 0.7963,
+ "step": 67
+ },
+ {
+ "epoch": 1.62,
+ "learning_rate": 1.702981057425662e-05,
+ "loss": 0.7935,
+ "step": 68
+ },
+ {
+ "epoch": 1.62,
+ "eval_loss": 0.7886030077934265,
+ "eval_runtime": 23.1527,
+ "eval_samples_per_second": 8.638,
+ "eval_steps_per_second": 0.561,
+ "step": 68
+ },
+ {
+ "epoch": 2.02,
+ "learning_rate": 1.6946583704589973e-05,
+ "loss": 0.7855,
+ "step": 69
+ },
+ {
+ "epoch": 2.04,
+ "learning_rate": 1.686241637868734e-05,
+ "loss": 0.7851,
+ "step": 70
+ },
+ {
+ "epoch": 2.05,
+ "learning_rate": 1.6777319991457325e-05,
+ "loss": 0.7832,
+ "step": 71
+ },
+ {
+ "epoch": 2.07,
+ "learning_rate": 1.6691306063588583e-05,
+ "loss": 0.7768,
+ "step": 72
+ },
+ {
+ "epoch": 2.09,
+ "learning_rate": 1.6604386239990077e-05,
+ "loss": 0.7786,
+ "step": 73
+ },
+ {
+ "epoch": 2.11,
+ "learning_rate": 1.6516572288214555e-05,
+ "loss": 0.772,
+ "step": 74
+ },
+ {
+ "epoch": 2.13,
+ "learning_rate": 1.6427876096865394e-05,
+ "loss": 0.7644,
+ "step": 75
+ },
+ {
+ "epoch": 2.15,
+ "learning_rate": 1.63383096739871e-05,
+ "loss": 0.7616,
+ "step": 76
+ },
+ {
+ "epoch": 2.16,
+ "learning_rate": 1.6247885145439602e-05,
+ "loss": 0.761,
+ "step": 77
+ },
+ {
+ "epoch": 2.18,
+ "learning_rate": 1.6156614753256583e-05,
+ "loss": 0.7587,
+ "step": 78
+ },
+ {
+ "epoch": 2.2,
+ "learning_rate": 1.6064510853988137e-05,
+ "loss": 0.7573,
+ "step": 79
+ },
+ {
+ "epoch": 2.22,
+ "learning_rate": 1.5971585917027864e-05,
+ "loss": 0.7558,
+ "step": 80
+ },
+ {
+ "epoch": 2.24,
+ "learning_rate": 1.5877852522924733e-05,
+ "loss": 0.7519,
+ "step": 81
+ },
+ {
+ "epoch": 2.26,
+ "learning_rate": 1.5783323361679865e-05,
+ "loss": 0.7427,
+ "step": 82
+ },
+ {
+ "epoch": 2.27,
+ "learning_rate": 1.568801123102852e-05,
+ "loss": 0.7504,
+ "step": 83
+ },
+ {
+ "epoch": 2.29,
+ "learning_rate": 1.5591929034707468e-05,
+ "loss": 0.7435,
+ "step": 84
+ },
+ {
+ "epoch": 2.31,
+ "learning_rate": 1.5495089780708062e-05,
+ "loss": 0.7417,
+ "step": 85
+ },
+ {
+ "epoch": 2.33,
+ "learning_rate": 1.539750657951513e-05,
+ "loss": 0.7445,
+ "step": 86
+ },
+ {
+ "epoch": 2.35,
+ "learning_rate": 1.529919264233205e-05,
+ "loss": 0.7416,
+ "step": 87
+ },
+ {
+ "epoch": 2.37,
+ "learning_rate": 1.5200161279292154e-05,
+ "loss": 0.7392,
+ "step": 88
+ },
+ {
+ "epoch": 2.38,
+ "learning_rate": 1.5100425897656754e-05,
+ "loss": 0.7369,
+ "step": 89
+ },
+ {
+ "epoch": 2.4,
+ "learning_rate": 1.5000000000000002e-05,
+ "loss": 0.7346,
+ "step": 90
+ },
+ {
+ "epoch": 2.42,
+ "learning_rate": 1.4898897182380872e-05,
+ "loss": 0.7293,
+ "step": 91
+ },
+ {
+ "epoch": 2.44,
+ "learning_rate": 1.4797131132502464e-05,
+ "loss": 0.726,
+ "step": 92
+ },
+ {
+ "epoch": 2.46,
+ "learning_rate": 1.469471562785891e-05,
+ "loss": 0.729,
+ "step": 93
+ },
+ {
+ "epoch": 2.48,
+ "learning_rate": 1.4591664533870118e-05,
+ "loss": 0.7262,
+ "step": 94
+ },
+ {
+ "epoch": 2.49,
+ "learning_rate": 1.4487991802004625e-05,
+ "loss": 0.7296,
+ "step": 95
+ },
+ {
+ "epoch": 2.51,
+ "learning_rate": 1.4383711467890776e-05,
+ "loss": 0.7249,
+ "step": 96
+ },
+ {
+ "epoch": 2.53,
+ "learning_rate": 1.4278837649416543e-05,
+ "loss": 0.7173,
+ "step": 97
+ },
+ {
+ "epoch": 2.55,
+ "learning_rate": 1.417338454481818e-05,
+ "loss": 0.7206,
+ "step": 98
+ },
+ {
+ "epoch": 2.57,
+ "learning_rate": 1.4067366430758004e-05,
+ "loss": 0.7233,
+ "step": 99
+ },
+ {
+ "epoch": 2.59,
+ "learning_rate": 1.396079766039157e-05,
+ "loss": 0.72,
+ "step": 100
+ },
+ {
+ "epoch": 2.61,
+ "learning_rate": 1.3853692661424485e-05,
+ "loss": 0.7152,
+ "step": 101
+ },
+ {
+ "epoch": 2.62,
+ "learning_rate": 1.3746065934159123e-05,
+ "loss": 0.7104,
+ "step": 102
+ },
+ {
+ "epoch": 2.62,
+ "eval_loss": 0.7083348631858826,
+ "eval_runtime": 23.1348,
+ "eval_samples_per_second": 8.645,
+ "eval_steps_per_second": 0.562,
+ "step": 102
+ },
+ {
+ "epoch": 3.02,
+ "learning_rate": 1.3637932049531517e-05,
+ "loss": 0.7093,
+ "step": 103
+ },
+ {
+ "epoch": 3.04,
+ "learning_rate": 1.3529305647138689e-05,
+ "loss": 0.7102,
+ "step": 104
+ },
+ {
+ "epoch": 3.05,
+ "learning_rate": 1.342020143325669e-05,
+ "loss": 0.7093,
+ "step": 105
+ },
+ {
+ "epoch": 3.07,
+ "learning_rate": 1.3310634178849583e-05,
+ "loss": 0.7063,
+ "step": 106
+ },
+ {
+ "epoch": 3.09,
+ "learning_rate": 1.3200618717569716e-05,
+ "loss": 0.7098,
+ "step": 107
+ },
+ {
+ "epoch": 3.11,
+ "learning_rate": 1.3090169943749475e-05,
+ "loss": 0.7039,
+ "step": 108
+ },
+ {
+ "epoch": 3.13,
+ "learning_rate": 1.297930281038482e-05,
+ "loss": 0.702,
+ "step": 109
+ },
+ {
+ "epoch": 3.15,
+ "learning_rate": 1.2868032327110904e-05,
+ "loss": 0.6992,
+ "step": 110
+ },
+ {
+ "epoch": 3.16,
+ "learning_rate": 1.2756373558169992e-05,
+ "loss": 0.6958,
+ "step": 111
+ },
+ {
+ "epoch": 3.18,
+ "learning_rate": 1.2644341620372025e-05,
+ "loss": 0.6987,
+ "step": 112
+ },
+ {
+ "epoch": 3.2,
+ "learning_rate": 1.253195168104802e-05,
+ "loss": 0.6971,
+ "step": 113
+ },
+ {
+ "epoch": 3.22,
+ "learning_rate": 1.2419218955996677e-05,
+ "loss": 0.7017,
+ "step": 114
+ },
+ {
+ "epoch": 3.24,
+ "learning_rate": 1.2306158707424402e-05,
+ "loss": 0.6922,
+ "step": 115
+ },
+ {
+ "epoch": 3.26,
+ "learning_rate": 1.2192786241879033e-05,
+ "loss": 0.6908,
+ "step": 116
+ },
+ {
+ "epoch": 3.27,
+ "learning_rate": 1.2079116908177592e-05,
+ "loss": 0.6964,
+ "step": 117
+ },
+ {
+ "epoch": 3.29,
+ "learning_rate": 1.1965166095328302e-05,
+ "loss": 0.6948,
+ "step": 118
+ },
+ {
+ "epoch": 3.31,
+ "learning_rate": 1.1850949230447146e-05,
+ "loss": 0.6899,
+ "step": 119
+ },
+ {
+ "epoch": 3.33,
+ "learning_rate": 1.1736481776669307e-05,
+ "loss": 0.6951,
+ "step": 120
+ },
+ {
+ "epoch": 3.35,
+ "learning_rate": 1.1621779231055677e-05,
+ "loss": 0.6913,
+ "step": 121
+ },
+ {
+ "epoch": 3.37,
+ "learning_rate": 1.1506857122494832e-05,
+ "loss": 0.69,
+ "step": 122
+ },
+ {
+ "epoch": 3.38,
+ "learning_rate": 1.1391731009600655e-05,
+ "loss": 0.6901,
+ "step": 123
+ },
+ {
+ "epoch": 3.4,
+ "learning_rate": 1.127641647860595e-05,
+ "loss": 0.6898,
+ "step": 124
+ },
+ {
+ "epoch": 3.42,
+ "learning_rate": 1.1160929141252303e-05,
+ "loss": 0.6859,
+ "step": 125
+ },
+ {
+ "epoch": 3.44,
+ "learning_rate": 1.1045284632676535e-05,
+ "loss": 0.686,
+ "step": 126
+ },
+ {
+ "epoch": 3.46,
+ "learning_rate": 1.0929498609293925e-05,
+ "loss": 0.682,
+ "step": 127
+ },
+ {
+ "epoch": 3.48,
+ "learning_rate": 1.0813586746678584e-05,
+ "loss": 0.6862,
+ "step": 128
+ },
+ {
+ "epoch": 3.49,
+ "learning_rate": 1.0697564737441254e-05,
+ "loss": 0.687,
+ "step": 129
+ },
+ {
+ "epoch": 3.51,
+ "learning_rate": 1.0581448289104759e-05,
+ "loss": 0.6805,
+ "step": 130
+ },
+ {
+ "epoch": 3.53,
+ "learning_rate": 1.046525312197747e-05,
+ "loss": 0.679,
+ "step": 131
+ },
+ {
+ "epoch": 3.55,
+ "learning_rate": 1.0348994967025012e-05,
+ "loss": 0.6851,
+ "step": 132
+ },
+ {
+ "epoch": 3.57,
+ "learning_rate": 1.0232689563740563e-05,
+ "loss": 0.6884,
+ "step": 133
+ },
+ {
+ "epoch": 3.59,
+ "learning_rate": 1.0116352658013973e-05,
+ "loss": 0.6786,
+ "step": 134
+ },
+ {
+ "epoch": 3.6,
+ "learning_rate": 1e-05,
+ "loss": 0.6755,
+ "step": 135
+ },
+ {
+ "epoch": 3.62,
+ "learning_rate": 9.883647341986032e-06,
+ "loss": 0.6798,
+ "step": 136
+ },
+ {
+ "epoch": 3.62,
+ "eval_loss": 0.6752661466598511,
+ "eval_runtime": 23.1352,
+ "eval_samples_per_second": 8.645,
+ "eval_steps_per_second": 0.562,
+ "step": 136
+ },
+ {
+ "epoch": 4.02,
+ "learning_rate": 9.767310436259438e-06,
+ "loss": 0.6762,
+ "step": 137
+ },
+ {
+ "epoch": 4.03,
+ "learning_rate": 9.651005032974994e-06,
+ "loss": 0.6758,
+ "step": 138
+ },
+ {
+ "epoch": 4.05,
+ "learning_rate": 9.534746878022533e-06,
+ "loss": 0.6737,
+ "step": 139
+ },
+ {
+ "epoch": 4.07,
+ "learning_rate": 9.418551710895243e-06,
+ "loss": 0.6783,
+ "step": 140
+ },
+ {
+ "epoch": 4.09,
+ "learning_rate": 9.302435262558748e-06,
+ "loss": 0.6744,
+ "step": 141
+ },
+ {
+ "epoch": 4.11,
+ "learning_rate": 9.18641325332142e-06,
+ "loss": 0.6722,
+ "step": 142
+ },
+ {
+ "epoch": 4.13,
+ "learning_rate": 9.07050139070608e-06,
+ "loss": 0.6715,
+ "step": 143
+ },
+ {
+ "epoch": 4.14,
+ "learning_rate": 8.954715367323468e-06,
+ "loss": 0.6696,
+ "step": 144
+ },
+ {
+ "epoch": 4.16,
+ "learning_rate": 8.839070858747697e-06,
+ "loss": 0.666,
+ "step": 145
+ },
+ {
+ "epoch": 4.18,
+ "learning_rate": 8.723583521394054e-06,
+ "loss": 0.6709,
+ "step": 146
+ },
+ {
+ "epoch": 4.2,
+ "learning_rate": 8.60826899039935e-06,
+ "loss": 0.6739,
+ "step": 147
+ },
+ {
+ "epoch": 4.22,
+ "learning_rate": 8.49314287750517e-06,
+ "loss": 0.6678,
+ "step": 148
+ },
+ {
+ "epoch": 4.24,
+ "learning_rate": 8.378220768944328e-06,
+ "loss": 0.6628,
+ "step": 149
+ },
+ {
+ "epoch": 4.25,
+ "learning_rate": 8.263518223330698e-06,
+ "loss": 0.6722,
+ "step": 150
+ },
+ {
+ "epoch": 4.27,
+ "learning_rate": 8.149050769552856e-06,
+ "loss": 0.6669,
+ "step": 151
+ },
+ {
+ "epoch": 4.29,
+ "learning_rate": 8.034833904671698e-06,
+ "loss": 0.6685,
+ "step": 152
+ },
+ {
+ "epoch": 4.31,
+ "learning_rate": 7.92088309182241e-06,
+ "loss": 0.6648,
+ "step": 153
+ },
+ {
+ "epoch": 4.33,
+ "learning_rate": 7.807213758120965e-06,
+ "loss": 0.6693,
+ "step": 154
+ },
+ {
+ "epoch": 4.35,
+ "learning_rate": 7.6938412925756e-06,
+ "loss": 0.669,
+ "step": 155
+ },
+ {
+ "epoch": 4.37,
+ "learning_rate": 7.580781044003324e-06,
+ "loss": 0.6664,
+ "step": 156
+ },
+ {
+ "epoch": 4.38,
+ "learning_rate": 7.468048318951983e-06,
+ "loss": 0.6661,
+ "step": 157
+ },
+ {
+ "epoch": 4.4,
+ "learning_rate": 7.355658379627981e-06,
+ "loss": 0.6661,
+ "step": 158
+ },
+ {
+ "epoch": 4.42,
+ "learning_rate": 7.243626441830009e-06,
+ "loss": 0.6642,
+ "step": 159
+ },
+ {
+ "epoch": 4.44,
+ "learning_rate": 7.131967672889101e-06,
+ "loss": 0.6648,
+ "step": 160
+ },
+ {
+ "epoch": 4.46,
+ "learning_rate": 7.02069718961518e-06,
+ "loss": 0.6606,
+ "step": 161
+ },
+ {
+ "epoch": 4.48,
+ "learning_rate": 6.909830056250527e-06,
+ "loss": 0.6613,
+ "step": 162
+ },
+ {
+ "epoch": 4.49,
+ "learning_rate": 6.799381282430284e-06,
+ "loss": 0.668,
+ "step": 163
+ },
+ {
+ "epoch": 4.51,
+ "learning_rate": 6.689365821150421e-06,
+ "loss": 0.6605,
+ "step": 164
+ },
+ {
+ "epoch": 4.53,
+ "learning_rate": 6.579798566743314e-06,
+ "loss": 0.6632,
+ "step": 165
+ },
+ {
+ "epoch": 4.55,
+ "learning_rate": 6.4706943528613135e-06,
+ "loss": 0.6587,
+ "step": 166
+ },
+ {
+ "epoch": 4.57,
+ "learning_rate": 6.362067950468489e-06,
+ "loss": 0.666,
+ "step": 167
+ },
+ {
+ "epoch": 4.59,
+ "learning_rate": 6.25393406584088e-06,
+ "loss": 0.6672,
+ "step": 168
+ },
+ {
+ "epoch": 4.6,
+ "learning_rate": 6.146307338575519e-06,
+ "loss": 0.663,
+ "step": 169
+ },
+ {
+ "epoch": 4.62,
+ "learning_rate": 6.039202339608432e-06,
+ "loss": 0.6571,
+ "step": 170
+ },
+ {
+ "epoch": 4.62,
+ "eval_loss": 0.6581608653068542,
+ "eval_runtime": 23.2031,
+ "eval_samples_per_second": 8.62,
+ "eval_steps_per_second": 0.56,
+ "step": 170
+ },
+ {
+ "epoch": 4.62,
+ "step": 170,
+ "total_flos": 3.168954324143309e+16,
+ "train_loss": 0.8249162855393747,
+ "train_runtime": 52349.2428,
+ "train_samples_per_second": 2.662,
+ "train_steps_per_second": 0.005
+ }
+ ],
+ "logging_steps": 1,
+ "max_steps": 270,
+ "num_train_epochs": 5,
+ "save_steps": 500,
+ "total_flos": 3.168954324143309e+16,
+ "trial_name": null,
+ "trial_params": null
+ }