genies-llm committed on
Commit b11e9a8 · verified · 1 Parent(s): eda07c8

Model save

Files changed (5)
  1. README.md +58 -0
  2. all_results.json +8 -0
  3. generation_config.json +11 -0
  4. train_results.json +8 -0
  5. trainer_state.json +1411 -0
README.md ADDED
@@ -0,0 +1,58 @@
+ ---
+ base_model: Qwen/Qwen2.5-Coder-7B-Instruct
+ library_name: transformers
+ model_name: text2sql-sft-v9
+ tags:
+ - generated_from_trainer
+ - trl
+ - sft
+ licence: license
+ ---
+
+ # Model Card for text2sql-sft-v9
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct).
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="genies-llm/text2sql-sft-v9", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/genies-rnd/text2sql-sft/runs/v732vvn5)
+
+
+ This model was trained with SFT.
+
+ ### Framework versions
+
+ - TRL: 0.18.0
+ - Transformers: 4.52.3
+ - Pytorch: 2.6.0
+ - Datasets: 4.0.0
+ - Tokenizers: 0.21.4
+
+ ## Citations
+
+
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
all_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "total_flos": 2.276376686268252e+17,
+ "train_loss": 0.25581319124726526,
+ "train_runtime": 2707.0199,
+ "train_samples": 7285,
+ "train_samples_per_second": 8.073,
+ "train_steps_per_second": 0.063
+ }
generation_config.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "bos_token_id": 151643,
+ "do_sample": true,
+ "eos_token_id": 151645,
+ "pad_token_id": 151643,
+ "repetition_penalty": 1.1,
+ "temperature": 0.7,
+ "top_k": 20,
+ "top_p": 0.8,
+ "transformers_version": "4.52.3"
+ }
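The decoding settings above can be sanity-checked before serving. A minimal sketch (the JSON literal below is copied verbatim from the generation_config.json diff above; the parsing itself uses only the standard library and is not part of the commit):

```python
import json

# Decoding settings as committed in generation_config.json (copied verbatim).
raw_config = """
{
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": 151645,
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8,
  "transformers_version": "4.52.3"
}
"""
config = json.loads(raw_config)

# Sampling is enabled, so temperature/top_k/top_p all take effect at inference.
assert config["do_sample"] is True
# pad and bos share the same token id, as is common for Qwen2.5 checkpoints.
assert config["pad_token_id"] == config["bos_token_id"] == 151643
print(config["temperature"], config["top_k"], config["top_p"])  # → 0.7 20 0.8
```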
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "total_flos": 2.276376686268252e+17,
+ "train_loss": 0.25581319124726526,
+ "train_runtime": 2707.0199,
+ "train_samples": 7285,
+ "train_samples_per_second": 8.073,
+ "train_steps_per_second": 0.063
+ }
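The reported throughput figures are internally consistent, which a little arithmetic confirms (values copied from train_results.json above; the epoch count of 3 and the 171 optimizer steps come from trainer_state.json below):

```python
# Figures copied from train_results.json; num_epochs = 3.0 per trainer_state.json.
train_runtime = 2707.0199  # seconds
train_samples = 7285       # training examples per epoch
num_epochs = 3

# samples/sec = total samples processed / wall-clock time
samples_per_second = train_samples * num_epochs / train_runtime
assert round(samples_per_second, 3) == 8.073  # matches the reported value

# steps/sec * runtime recovers the 171 global steps logged in trainer_state.json
assert round(0.063 * train_runtime) == 171

print(round(samples_per_second, 3))  # → 8.073
```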
trainer_state.json ADDED
@@ -0,0 +1,1411 @@
+ {
+ "best_global_step": null,
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 3.0,
+ "eval_steps": 500,
+ "global_step": 171,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.017543859649122806,
+ "grad_norm": 3.77577782421705,
+ "learning_rate": 0.0,
+ "loss": 1.1733,
+ "num_tokens": 427344.0,
+ "step": 1
+ },
+ {
+ "epoch": 0.03508771929824561,
+ "grad_norm": 3.7130563556909215,
+ "learning_rate": 1.6666666666666667e-06,
+ "loss": 1.164,
+ "num_tokens": 866449.0,
+ "step": 2
+ },
+ {
+ "epoch": 0.05263157894736842,
+ "grad_norm": 3.970572390407664,
+ "learning_rate": 3.3333333333333333e-06,
+ "loss": 1.2254,
+ "num_tokens": 1268563.0,
+ "step": 3
+ },
+ {
+ "epoch": 0.07017543859649122,
+ "grad_norm": 3.512030300056592,
+ "learning_rate": 5e-06,
+ "loss": 1.2077,
+ "num_tokens": 1669265.0,
+ "step": 4
+ },
+ {
+ "epoch": 0.08771929824561403,
+ "grad_norm": 2.37795690695823,
+ "learning_rate": 6.666666666666667e-06,
+ "loss": 1.0413,
+ "num_tokens": 2093363.0,
+ "step": 5
+ },
+ {
+ "epoch": 0.10526315789473684,
+ "grad_norm": 1.525864968420599,
+ "learning_rate": 8.333333333333334e-06,
+ "loss": 0.874,
+ "num_tokens": 2505147.0,
+ "step": 6
+ },
+ {
+ "epoch": 0.12280701754385964,
+ "grad_norm": 1.4347947716935,
+ "learning_rate": 1e-05,
+ "loss": 0.8159,
+ "num_tokens": 2931742.0,
+ "step": 7
+ },
+ {
+ "epoch": 0.14035087719298245,
+ "grad_norm": 2.6084462153475063,
+ "learning_rate": 9.999184354855868e-06,
+ "loss": 0.6802,
+ "num_tokens": 3316057.0,
+ "step": 8
+ },
+ {
+ "epoch": 0.15789473684210525,
+ "grad_norm": 1.5740859932252258,
+ "learning_rate": 9.996737715102133e-06,
+ "loss": 0.6019,
+ "num_tokens": 3761399.0,
+ "step": 9
+ },
+ {
+ "epoch": 0.17543859649122806,
+ "grad_norm": 1.5450933348615719,
+ "learning_rate": 9.99266096766761e-06,
+ "loss": 0.5439,
+ "num_tokens": 4179394.0,
+ "step": 10
+ },
+ {
+ "epoch": 0.19298245614035087,
+ "grad_norm": 0.9018401685238229,
+ "learning_rate": 9.98695559040975e-06,
+ "loss": 0.4438,
+ "num_tokens": 4600345.0,
+ "step": 11
+ },
+ {
+ "epoch": 0.21052631578947367,
+ "grad_norm": 0.5196800459649759,
+ "learning_rate": 9.979623651578881e-06,
+ "loss": 0.395,
+ "num_tokens": 4980610.0,
+ "step": 12
+ },
+ {
+ "epoch": 0.22807017543859648,
+ "grad_norm": 0.4799071601752034,
+ "learning_rate": 9.970667809068476e-06,
+ "loss": 0.3892,
+ "num_tokens": 5395359.0,
+ "step": 13
+ },
+ {
+ "epoch": 0.24561403508771928,
+ "grad_norm": 0.31207959270885294,
+ "learning_rate": 9.960091309451626e-06,
+ "loss": 0.3808,
+ "num_tokens": 5805663.0,
+ "step": 14
+ },
+ {
+ "epoch": 0.2631578947368421,
+ "grad_norm": 0.31760448278346387,
+ "learning_rate": 9.947897986804131e-06,
+ "loss": 0.3708,
+ "num_tokens": 6201438.0,
+ "step": 15
+ },
+ {
+ "epoch": 0.2807017543859649,
+ "grad_norm": 0.3009817279323404,
+ "learning_rate": 9.93409226131462e-06,
+ "loss": 0.3557,
+ "num_tokens": 6614132.0,
+ "step": 16
+ },
+ {
+ "epoch": 0.2982456140350877,
+ "grad_norm": 0.2599822684525422,
+ "learning_rate": 9.91867913768218e-06,
+ "loss": 0.3284,
+ "num_tokens": 7030814.0,
+ "step": 17
+ },
+ {
+ "epoch": 0.3157894736842105,
+ "grad_norm": 0.24197734921107386,
+ "learning_rate": 9.901664203302126e-06,
+ "loss": 0.3272,
+ "num_tokens": 7453748.0,
+ "step": 18
+ },
+ {
+ "epoch": 0.3333333333333333,
+ "grad_norm": 0.2741912470593866,
+ "learning_rate": 9.883053626240503e-06,
+ "loss": 0.3181,
+ "num_tokens": 7874681.0,
+ "step": 19
+ },
+ {
+ "epoch": 0.3508771929824561,
+ "grad_norm": 0.21560866373997145,
+ "learning_rate": 9.862854152998112e-06,
+ "loss": 0.3102,
+ "num_tokens": 8349009.0,
+ "step": 20
+ },
+ {
+ "epoch": 0.3684210526315789,
+ "grad_norm": 0.203790119667157,
+ "learning_rate": 9.841073106064852e-06,
+ "loss": 0.3038,
+ "num_tokens": 8779295.0,
+ "step": 21
+ },
+ {
+ "epoch": 0.38596491228070173,
+ "grad_norm": 0.21543607735771148,
+ "learning_rate": 9.81771838126524e-06,
+ "loss": 0.3039,
+ "num_tokens": 9155380.0,
+ "step": 22
+ },
+ {
+ "epoch": 0.40350877192982454,
+ "grad_norm": 0.19448461940928388,
+ "learning_rate": 9.792798444896107e-06,
+ "loss": 0.2923,
+ "num_tokens": 9572881.0,
+ "step": 23
+ },
+ {
+ "epoch": 0.42105263157894735,
+ "grad_norm": 0.20104199677846465,
+ "learning_rate": 9.766322330657499e-06,
+ "loss": 0.3004,
+ "num_tokens": 9972480.0,
+ "step": 24
+ },
+ {
+ "epoch": 0.43859649122807015,
+ "grad_norm": 0.19715422132031898,
+ "learning_rate": 9.738299636377863e-06,
+ "loss": 0.291,
+ "num_tokens": 10406948.0,
+ "step": 25
+ },
+ {
+ "epoch": 0.45614035087719296,
+ "grad_norm": 0.19192801250536232,
+ "learning_rate": 9.70874052053476e-06,
+ "loss": 0.289,
+ "num_tokens": 10821942.0,
+ "step": 26
+ },
+ {
+ "epoch": 0.47368421052631576,
+ "grad_norm": 0.18616122639207905,
+ "learning_rate": 9.677655698572326e-06,
+ "loss": 0.2622,
+ "num_tokens": 11234170.0,
+ "step": 27
+ },
+ {
+ "epoch": 0.49122807017543857,
+ "grad_norm": 0.192192334546507,
+ "learning_rate": 9.645056439016827e-06,
+ "loss": 0.275,
+ "num_tokens": 11627308.0,
+ "step": 28
+ },
+ {
+ "epoch": 0.5087719298245614,
+ "grad_norm": 0.18829460979027127,
+ "learning_rate": 9.610954559391704e-06,
+ "loss": 0.2754,
+ "num_tokens": 12022656.0,
+ "step": 29
+ },
+ {
+ "epoch": 0.5263157894736842,
+ "grad_norm": 0.17300283701775698,
+ "learning_rate": 9.57536242193364e-06,
+ "loss": 0.2692,
+ "num_tokens": 12444405.0,
+ "step": 30
+ },
+ {
+ "epoch": 0.543859649122807,
+ "grad_norm": 0.17695771762278176,
+ "learning_rate": 9.538292929111114e-06,
+ "loss": 0.2734,
+ "num_tokens": 12837998.0,
+ "step": 31
+ },
+ {
+ "epoch": 0.5614035087719298,
+ "grad_norm": 0.16869993658358656,
+ "learning_rate": 9.499759518947156e-06,
+ "loss": 0.2657,
+ "num_tokens": 13261798.0,
+ "step": 32
+ },
+ {
+ "epoch": 0.5789473684210527,
+ "grad_norm": 0.15857386041778282,
+ "learning_rate": 9.459776160147941e-06,
+ "loss": 0.2559,
+ "num_tokens": 13717762.0,
+ "step": 33
+ },
+ {
+ "epoch": 0.5964912280701754,
+ "grad_norm": 0.16181853610382824,
+ "learning_rate": 9.418357347038999e-06,
+ "loss": 0.2427,
+ "num_tokens": 14142042.0,
+ "step": 34
+ },
+ {
+ "epoch": 0.6140350877192983,
+ "grad_norm": 0.16729961186553155,
+ "learning_rate": 9.375518094310904e-06,
+ "loss": 0.2546,
+ "num_tokens": 14543269.0,
+ "step": 35
+ },
+ {
+ "epoch": 0.631578947368421,
+ "grad_norm": 0.16125107833920507,
+ "learning_rate": 9.331273931576306e-06,
+ "loss": 0.2455,
+ "num_tokens": 14941779.0,
+ "step": 36
+ },
+ {
+ "epoch": 0.6491228070175439,
+ "grad_norm": 0.1651336881757046,
+ "learning_rate": 9.285640897740316e-06,
+ "loss": 0.2479,
+ "num_tokens": 15378768.0,
+ "step": 37
+ },
+ {
+ "epoch": 0.6666666666666666,
+ "grad_norm": 0.16309612539485216,
+ "learning_rate": 9.238635535186247e-06,
+ "loss": 0.2524,
+ "num_tokens": 15784254.0,
+ "step": 38
+ },
+ {
+ "epoch": 0.6842105263157895,
+ "grad_norm": 0.1613815194130245,
+ "learning_rate": 9.19027488377886e-06,
+ "loss": 0.252,
+ "num_tokens": 16187575.0,
+ "step": 39
+ },
+ {
+ "epoch": 0.7017543859649122,
+ "grad_norm": 0.15494556090577644,
+ "learning_rate": 9.140576474687263e-06,
+ "loss": 0.2397,
+ "num_tokens": 16592627.0,
+ "step": 40
+ },
+ {
+ "epoch": 0.7192982456140351,
+ "grad_norm": 0.15730027265317648,
+ "learning_rate": 9.0895583240297e-06,
+ "loss": 0.2396,
+ "num_tokens": 17019496.0,
+ "step": 41
+ },
+ {
+ "epoch": 0.7368421052631579,
+ "grad_norm": 0.15536856311696803,
+ "learning_rate": 9.037238926342544e-06,
+ "loss": 0.2388,
+ "num_tokens": 17448731.0,
+ "step": 42
+ },
+ {
+ "epoch": 0.7543859649122807,
+ "grad_norm": 0.16373703052589486,
+ "learning_rate": 8.983637247875872e-06,
+ "loss": 0.2447,
+ "num_tokens": 17852418.0,
+ "step": 43
+ },
+ {
+ "epoch": 0.7719298245614035,
+ "grad_norm": 0.1564039931823383,
+ "learning_rate": 8.92877271971802e-06,
+ "loss": 0.2317,
+ "num_tokens": 18284414.0,
+ "step": 44
+ },
+ {
+ "epoch": 0.7894736842105263,
+ "grad_norm": 0.15673969833275106,
+ "learning_rate": 8.872665230751644e-06,
+ "loss": 0.2445,
+ "num_tokens": 18700575.0,
+ "step": 45
+ },
+ {
+ "epoch": 0.8070175438596491,
+ "grad_norm": 0.16227301290569535,
+ "learning_rate": 8.815335120443822e-06,
+ "loss": 0.2369,
+ "num_tokens": 19112507.0,
+ "step": 46
+ },
+ {
+ "epoch": 0.8245614035087719,
+ "grad_norm": 0.16192875727409162,
+ "learning_rate": 8.756803171472817e-06,
+ "loss": 0.2488,
+ "num_tokens": 19497572.0,
+ "step": 47
+ },
+ {
+ "epoch": 0.8421052631578947,
+ "grad_norm": 0.14878902875215574,
+ "learning_rate": 8.69709060219416e-06,
+ "loss": 0.221,
+ "num_tokens": 19887057.0,
+ "step": 48
+ },
+ {
+ "epoch": 0.8596491228070176,
+ "grad_norm": 0.168304643511618,
+ "learning_rate": 8.636219058948823e-06,
+ "loss": 0.2338,
+ "num_tokens": 20294327.0,
+ "step": 49
+ },
+ {
+ "epoch": 0.8771929824561403,
+ "grad_norm": 0.14440251287899045,
+ "learning_rate": 8.574210608216206e-06,
+ "loss": 0.2165,
+ "num_tokens": 20731445.0,
+ "step": 50
+ },
+ {
+ "epoch": 0.8947368421052632,
+ "grad_norm": 0.14573622062248015,
+ "learning_rate": 8.511087728614863e-06,
+ "loss": 0.2291,
+ "num_tokens": 21165129.0,
+ "step": 51
+ },
+ {
+ "epoch": 0.9122807017543859,
+ "grad_norm": 0.15767251624934578,
+ "learning_rate": 8.446873302753783e-06,
+ "loss": 0.2231,
+ "num_tokens": 21564437.0,
+ "step": 52
+ },
+ {
+ "epoch": 0.9298245614035088,
+ "grad_norm": 0.14330856767528197,
+ "learning_rate": 8.381590608937251e-06,
+ "loss": 0.2274,
+ "num_tokens": 22012280.0,
+ "step": 53
+ },
+ {
+ "epoch": 0.9473684210526315,
+ "grad_norm": 0.1524166655671973,
+ "learning_rate": 8.315263312726248e-06,
+ "loss": 0.2131,
+ "num_tokens": 22396001.0,
+ "step": 54
+ },
+ {
+ "epoch": 0.9649122807017544,
+ "grad_norm": 0.15336585183985868,
+ "learning_rate": 8.247915458359473e-06,
+ "loss": 0.2195,
+ "num_tokens": 22793769.0,
+ "step": 55
+ },
+ {
+ "epoch": 0.9824561403508771,
+ "grad_norm": 0.1588593078419976,
+ "learning_rate": 8.179571460037096e-06,
+ "loss": 0.2345,
+ "num_tokens": 23201717.0,
+ "step": 56
+ },
+ {
+ "epoch": 1.0,
+ "grad_norm": 0.14666483129173052,
+ "learning_rate": 8.110256093070393e-06,
+ "loss": 0.2346,
+ "num_tokens": 23647950.0,
+ "step": 57
+ },
+ {
+ "epoch": 1.0175438596491229,
+ "grad_norm": 0.15418874889368148,
+ "learning_rate": 8.039994484900463e-06,
+ "loss": 0.2268,
+ "num_tokens": 24100529.0,
+ "step": 58
+ },
+ {
+ "epoch": 1.0350877192982457,
+ "grad_norm": 0.14747387727593939,
+ "learning_rate": 7.968812105989316e-06,
+ "loss": 0.2155,
+ "num_tokens": 24540892.0,
+ "step": 59
+ },
+ {
+ "epoch": 1.0526315789473684,
+ "grad_norm": 0.154346076339797,
+ "learning_rate": 7.896734760586599e-06,
+ "loss": 0.2057,
+ "num_tokens": 24956824.0,
+ "step": 60
+ },
+ {
+ "epoch": 1.0701754385964912,
+ "grad_norm": 0.14685651214148715,
+ "learning_rate": 7.82378857737533e-06,
+ "loss": 0.2036,
+ "num_tokens": 25384518.0,
+ "step": 61
+ },
+ {
+ "epoch": 1.087719298245614,
+ "grad_norm": 0.16326674348993506,
+ "learning_rate": 7.75e-06,
+ "loss": 0.2001,
+ "num_tokens": 25771807.0,
+ "step": 62
+ },
+ {
+ "epoch": 1.1052631578947367,
+ "grad_norm": 0.147771119836904,
+ "learning_rate": 7.675395777480538e-06,
+ "loss": 0.1996,
+ "num_tokens": 26177417.0,
+ "step": 63
+ },
+ {
+ "epoch": 1.1228070175438596,
+ "grad_norm": 0.14003657083220583,
+ "learning_rate": 7.600002954515532e-06,
+ "loss": 0.2072,
+ "num_tokens": 26622325.0,
+ "step": 64
+ },
+ {
+ "epoch": 1.1403508771929824,
+ "grad_norm": 0.15332767124685198,
+ "learning_rate": 7.523848861678297e-06,
+ "loss": 0.2065,
+ "num_tokens": 27045078.0,
+ "step": 65
+ },
+ {
+ "epoch": 1.1578947368421053,
+ "grad_norm": 0.15183433287347486,
+ "learning_rate": 7.446961105509289e-06,
+ "loss": 0.2032,
+ "num_tokens": 27438828.0,
+ "step": 66
+ },
+ {
+ "epoch": 1.1754385964912282,
+ "grad_norm": 0.14554656331938712,
+ "learning_rate": 7.36936755850849e-06,
+ "loss": 0.2054,
+ "num_tokens": 27854689.0,
+ "step": 67
+ },
+ {
+ "epoch": 1.1929824561403508,
+ "grad_norm": 0.1537836129829156,
+ "learning_rate": 7.2910963490313815e-06,
+ "loss": 0.1949,
+ "num_tokens": 28233580.0,
+ "step": 68
+ },
+ {
+ "epoch": 1.2105263157894737,
+ "grad_norm": 0.14572761360447276,
+ "learning_rate": 7.212175851092154e-06,
+ "loss": 0.1958,
+ "num_tokens": 28641897.0,
+ "step": 69
+ },
+ {
+ "epoch": 1.2280701754385965,
+ "grad_norm": 0.13708384426430809,
+ "learning_rate": 7.132634674077884e-06,
+ "loss": 0.2021,
+ "num_tokens": 29084929.0,
+ "step": 70
+ },
+ {
+ "epoch": 1.2456140350877192,
+ "grad_norm": 0.1486755044006831,
+ "learning_rate": 7.052501652377368e-06,
+ "loss": 0.2044,
+ "num_tokens": 29482516.0,
+ "step": 71
+ },
+ {
+ "epoch": 1.263157894736842,
+ "grad_norm": 0.14957101767673275,
+ "learning_rate": 6.971805834928399e-06,
+ "loss": 0.2048,
+ "num_tokens": 29899136.0,
+ "step": 72
+ },
+ {
+ "epoch": 1.280701754385965,
+ "grad_norm": 0.1486401064457622,
+ "learning_rate": 6.890576474687264e-06,
+ "loss": 0.2068,
+ "num_tokens": 30317666.0,
+ "step": 73
+ },
+ {
+ "epoch": 1.2982456140350878,
+ "grad_norm": 0.15958496167902586,
+ "learning_rate": 6.808843018024296e-06,
+ "loss": 0.1986,
+ "num_tokens": 30734034.0,
+ "step": 74
+ },
+ {
+ "epoch": 1.3157894736842106,
+ "grad_norm": 0.1383546863269268,
+ "learning_rate": 6.726635094049291e-06,
+ "loss": 0.199,
+ "num_tokens": 31155917.0,
+ "step": 75
+ },
+ {
+ "epoch": 1.3333333333333333,
+ "grad_norm": 0.14368049999314014,
+ "learning_rate": 6.643982503870693e-06,
+ "loss": 0.2032,
+ "num_tokens": 31573757.0,
+ "step": 76
+ },
+ {
+ "epoch": 1.3508771929824561,
+ "grad_norm": 0.13900291410105262,
+ "learning_rate": 6.560915209792424e-06,
+ "loss": 0.2016,
+ "num_tokens": 32010739.0,
+ "step": 77
+ },
+ {
+ "epoch": 1.368421052631579,
+ "grad_norm": 0.13844326865953724,
+ "learning_rate": 6.477463324452286e-06,
+ "loss": 0.1925,
+ "num_tokens": 32424467.0,
+ "step": 78
+ },
+ {
+ "epoch": 1.3859649122807016,
+ "grad_norm": 0.1433757292045691,
+ "learning_rate": 6.393657099905854e-06,
+ "loss": 0.2008,
+ "num_tokens": 32834770.0,
+ "step": 79
+ },
+ {
+ "epoch": 1.4035087719298245,
+ "grad_norm": 0.14196299072627255,
+ "learning_rate": 6.309526916659843e-06,
+ "loss": 0.1924,
+ "num_tokens": 33255872.0,
+ "step": 80
+ },
+ {
+ "epoch": 1.4210526315789473,
+ "grad_norm": 0.13753823275156205,
+ "learning_rate": 6.225103272658889e-06,
+ "loss": 0.2034,
+ "num_tokens": 33706927.0,
+ "step": 81
+ },
+ {
+ "epoch": 1.4385964912280702,
+ "grad_norm": 0.1384289808314504,
+ "learning_rate": 6.140416772229785e-06,
+ "loss": 0.1917,
+ "num_tokens": 34112601.0,
+ "step": 82
+ },
+ {
+ "epoch": 1.456140350877193,
+ "grad_norm": 0.14627667997521285,
+ "learning_rate": 6.0554981149871276e-06,
+ "loss": 0.2063,
+ "num_tokens": 34517104.0,
+ "step": 83
+ },
+ {
+ "epoch": 1.4736842105263157,
+ "grad_norm": 0.1558573666402862,
+ "learning_rate": 5.970378084704441e-06,
+ "loss": 0.1994,
+ "num_tokens": 34897309.0,
+ "step": 84
+ },
+ {
+ "epoch": 1.4912280701754386,
+ "grad_norm": 0.1417112898190135,
+ "learning_rate": 5.88508753815478e-06,
+ "loss": 0.1881,
+ "num_tokens": 35307793.0,
+ "step": 85
+ },
+ {
+ "epoch": 1.5087719298245614,
+ "grad_norm": 0.1385582692497573,
+ "learning_rate": 5.799657393924869e-06,
+ "loss": 0.198,
+ "num_tokens": 35741435.0,
+ "step": 86
+ },
+ {
+ "epoch": 1.526315789473684,
+ "grad_norm": 0.15662493183414183,
+ "learning_rate": 5.714118621206843e-06,
+ "loss": 0.1909,
+ "num_tokens": 36110154.0,
+ "step": 87
+ },
+ {
+ "epoch": 1.543859649122807,
+ "grad_norm": 0.14798100623872662,
+ "learning_rate": 5.6285022285716325e-06,
+ "loss": 0.2063,
+ "num_tokens": 36508508.0,
+ "step": 88
+ },
+ {
+ "epoch": 1.5614035087719298,
+ "grad_norm": 0.13949297838603725,
+ "learning_rate": 5.542839252728096e-06,
+ "loss": 0.2056,
+ "num_tokens": 36962199.0,
+ "step": 89
+ },
+ {
+ "epoch": 1.5789473684210527,
+ "grad_norm": 0.1388395627421214,
+ "learning_rate": 5.457160747271906e-06,
+ "loss": 0.1977,
+ "num_tokens": 37416119.0,
+ "step": 90
+ },
+ {
+ "epoch": 1.5964912280701755,
+ "grad_norm": 0.13753404705909872,
+ "learning_rate": 5.371497771428368e-06,
+ "loss": 0.1988,
+ "num_tokens": 37844052.0,
+ "step": 91
+ },
+ {
+ "epoch": 1.6140350877192984,
+ "grad_norm": 0.13813649436163167,
+ "learning_rate": 5.2858813787931605e-06,
+ "loss": 0.193,
+ "num_tokens": 38281149.0,
+ "step": 92
+ },
+ {
+ "epoch": 1.631578947368421,
+ "grad_norm": 0.13735934501319846,
+ "learning_rate": 5.2003426060751324e-06,
+ "loss": 0.1948,
+ "num_tokens": 38696776.0,
+ "step": 93
+ },
+ {
+ "epoch": 1.6491228070175439,
+ "grad_norm": 0.14464652766257102,
+ "learning_rate": 5.114912461845223e-06,
+ "loss": 0.1954,
+ "num_tokens": 39118421.0,
+ "step": 94
+ },
+ {
+ "epoch": 1.6666666666666665,
+ "grad_norm": 0.1441541884884212,
+ "learning_rate": 5.02962191529556e-06,
+ "loss": 0.1969,
+ "num_tokens": 39531921.0,
+ "step": 95
+ },
+ {
+ "epoch": 1.6842105263157894,
+ "grad_norm": 0.14549147633711634,
+ "learning_rate": 4.944501885012875e-06,
+ "loss": 0.1987,
+ "num_tokens": 39942510.0,
+ "step": 96
+ },
+ {
+ "epoch": 1.7017543859649122,
+ "grad_norm": 0.14239386063143547,
+ "learning_rate": 4.859583227770218e-06,
+ "loss": 0.1942,
+ "num_tokens": 40349157.0,
+ "step": 97
+ },
+ {
+ "epoch": 1.719298245614035,
+ "grad_norm": 0.14101938277964904,
+ "learning_rate": 4.774896727341113e-06,
+ "loss": 0.1896,
+ "num_tokens": 40755487.0,
+ "step": 98
+ },
+ {
+ "epoch": 1.736842105263158,
+ "grad_norm": 0.1513563377956916,
+ "learning_rate": 4.6904730833401575e-06,
+ "loss": 0.1741,
+ "num_tokens": 41109588.0,
+ "step": 99
+ },
+ {
+ "epoch": 1.7543859649122808,
+ "grad_norm": 0.14171664584567437,
+ "learning_rate": 4.606342900094147e-06,
+ "loss": 0.1978,
+ "num_tokens": 41549463.0,
+ "step": 100
+ },
+ {
+ "epoch": 1.7719298245614035,
+ "grad_norm": 0.1403769471140474,
+ "learning_rate": 4.5225366755477165e-06,
+ "loss": 0.2018,
+ "num_tokens": 41986009.0,
+ "step": 101
+ },
+ {
+ "epoch": 1.7894736842105263,
+ "grad_norm": 0.14588775725754724,
+ "learning_rate": 4.439084790207577e-06,
+ "loss": 0.1991,
+ "num_tokens": 42393517.0,
+ "step": 102
+ },
+ {
+ "epoch": 1.807017543859649,
+ "grad_norm": 0.14195463960673751,
+ "learning_rate": 4.35601749612931e-06,
+ "loss": 0.1954,
+ "num_tokens": 42788971.0,
+ "step": 103
+ },
+ {
+ "epoch": 1.8245614035087718,
+ "grad_norm": 0.14486517700153345,
+ "learning_rate": 4.273364905950711e-06,
+ "loss": 0.2001,
+ "num_tokens": 43200059.0,
+ "step": 104
+ },
+ {
+ "epoch": 1.8421052631578947,
+ "grad_norm": 0.14536497153998434,
+ "learning_rate": 4.191156981975704e-06,
+ "loss": 0.1881,
+ "num_tokens": 43591515.0,
+ "step": 105
+ },
+ {
+ "epoch": 1.8596491228070176,
+ "grad_norm": 0.1483672348465985,
+ "learning_rate": 4.109423525312738e-06,
+ "loss": 0.1936,
+ "num_tokens": 43989015.0,
+ "step": 106
+ },
+ {
+ "epoch": 1.8771929824561404,
+ "grad_norm": 0.14387471159557752,
+ "learning_rate": 4.028194165071603e-06,
+ "loss": 0.1959,
+ "num_tokens": 44390867.0,
+ "step": 107
+ },
+ {
+ "epoch": 1.8947368421052633,
+ "grad_norm": 0.14319263387686854,
+ "learning_rate": 3.9474983476226335e-06,
+ "loss": 0.2026,
+ "num_tokens": 44814288.0,
+ "step": 108
+ },
+ {
+ "epoch": 1.912280701754386,
+ "grad_norm": 0.13718763298366718,
+ "learning_rate": 3.867365325922116e-06,
+ "loss": 0.1919,
+ "num_tokens": 45232685.0,
+ "step": 109
+ },
+ {
+ "epoch": 1.9298245614035088,
+ "grad_norm": 0.13661747990592807,
+ "learning_rate": 3.7878241489078473e-06,
+ "loss": 0.192,
+ "num_tokens": 45633905.0,
+ "step": 110
+ },
+ {
+ "epoch": 1.9473684210526314,
+ "grad_norm": 0.13757723840377134,
+ "learning_rate": 3.7089036509686216e-06,
+ "loss": 0.196,
+ "num_tokens": 46052270.0,
+ "step": 111
+ },
+ {
+ "epoch": 1.9649122807017543,
+ "grad_norm": 0.14009156799615108,
+ "learning_rate": 3.630632441491512e-06,
+ "loss": 0.1945,
+ "num_tokens": 46479271.0,
+ "step": 112
+ },
+ {
+ "epoch": 1.9824561403508771,
+ "grad_norm": 0.1392559652525668,
+ "learning_rate": 3.5530388944907124e-06,
+ "loss": 0.1985,
+ "num_tokens": 46884227.0,
+ "step": 113
+ },
+ {
+ "epoch": 2.0,
+ "grad_norm": 0.13976428969132587,
+ "learning_rate": 3.476151138321705e-06,
+ "loss": 0.1995,
+ "num_tokens": 47297644.0,
+ "step": 114
+ },
+ {
+ "epoch": 2.017543859649123,
+ "grad_norm": 0.1378198428541279,
+ "learning_rate": 3.3999970454844688e-06,
+ "loss": 0.1724,
+ "num_tokens": 47688068.0,
+ "step": 115
+ },
+ {
+ "epoch": 2.0350877192982457,
+ "grad_norm": 0.134440422974191,
+ "learning_rate": 3.3246042225194626e-06,
+ "loss": 0.1796,
+ "num_tokens": 48092477.0,
+ "step": 116
+ },
+ {
+ "epoch": 2.0526315789473686,
+ "grad_norm": 0.13660484419562605,
+ "learning_rate": 3.2500000000000015e-06,
+ "loss": 0.1763,
+ "num_tokens": 48476841.0,
+ "step": 117
+ },
+ {
+ "epoch": 2.0701754385964914,
+ "grad_norm": 0.14109474340650238,
+ "learning_rate": 3.176211422624672e-06,
+ "loss": 0.1778,
+ "num_tokens": 48854905.0,
+ "step": 118
+ },
+ {
+ "epoch": 2.087719298245614,
+ "grad_norm": 0.13774654351946805,
+ "learning_rate": 3.103265239413401e-06,
+ "loss": 0.1793,
+ "num_tokens": 49295065.0,
+ "step": 119
+ },
+ {
+ "epoch": 2.1052631578947367,
+ "grad_norm": 0.14705463035874308,
+ "learning_rate": 3.0311878940106864e-06,
+ "loss": 0.1885,
+ "num_tokens": 49711843.0,
+ "step": 120
+ },
+ {
+ "epoch": 2.1228070175438596,
+ "grad_norm": 0.13965440849358451,
+ "learning_rate": 2.9600055150995397e-06,
+ "loss": 0.1804,
+ "num_tokens": 50121373.0,
+ "step": 121
+ },
+ {
+ "epoch": 2.1403508771929824,
+ "grad_norm": 0.1431354792667028,
+ "learning_rate": 2.889743906929609e-06,
+ "loss": 0.1761,
+ "num_tokens": 50524660.0,
+ "step": 122
+ },
+ {
+ "epoch": 2.1578947368421053,
+ "grad_norm": 0.13549946577694855,
+ "learning_rate": 2.820428539962905e-06,
+ "loss": 0.1756,
+ "num_tokens": 50952097.0,
+ "step": 123
+ },
+ {
+ "epoch": 2.175438596491228,
+ "grad_norm": 0.13874042982824947,
+ "learning_rate": 2.7520845416405285e-06,
+ "loss": 0.1787,
+ "num_tokens": 51357662.0,
+ "step": 124
+ },
+ {
+ "epoch": 2.192982456140351,
+ "grad_norm": 0.13352052067268536,
+ "learning_rate": 2.6847366872737535e-06,
+ "loss": 0.1786,
+ "num_tokens": 51772391.0,
+ "step": 125
+ },
+ {
+ "epoch": 2.2105263157894735,
+ "grad_norm": 0.13750830287403998,
+ "learning_rate": 2.618409391062751e-06,
+ "loss": 0.1827,
+ "num_tokens": 52198396.0,
+ "step": 126
+ },
+ {
+ "epoch": 2.2280701754385963,
+ "grad_norm": 0.14077287411728898,
+ "learning_rate": 2.5531266972462176e-06,
+ "loss": 0.1786,
+ "num_tokens": 52585564.0,
+ "step": 127
+ },
+ {
+ "epoch": 2.245614035087719,
+ "grad_norm": 0.13893984896019573,
+ "learning_rate": 2.4889122713851397e-06,
+ "loss": 0.1788,
+ "num_tokens": 52997398.0,
+ "step": 128
+ },
+ {
+ "epoch": 2.263157894736842,
+ "grad_norm": 0.13788162656378736,
+ "learning_rate": 2.425789391783796e-06,
+ "loss": 0.1878,
+ "num_tokens": 53407933.0,
+ "step": 129
+ },
+ {
+ "epoch": 2.280701754385965,
+ "grad_norm": 0.13629331805149528,
+ "learning_rate": 2.36378094105118e-06,
+ "loss": 0.1836,
+ "num_tokens": 53817667.0,
+ "step": 130
+ },
+ {
+ "epoch": 2.2982456140350878,
+ "grad_norm": 0.145846640939152,
+ "learning_rate": 2.302909397805841e-06,
+ "loss": 0.1761,
+ "num_tokens": 54208139.0,
+ "step": 131
+ },
+ {
+ "epoch": 2.3157894736842106,
+ "grad_norm": 0.1415158735561498,
+ "learning_rate": 2.2431968285271843e-06,
+ "loss": 0.1861,
+ "num_tokens": 54616138.0,
+ "step": 132
+ },
+ {
+ "epoch": 2.3333333333333335,
+ "grad_norm": 0.1399694993181749,
+ "learning_rate": 2.1846648795561777e-06,
+ "loss": 0.18,
+ "num_tokens": 55028264.0,
+ "step": 133
+ },
+ {
+ "epoch": 2.3508771929824563,
1078
+ "grad_norm": 0.1340221566625987,
1079
+ "learning_rate": 2.1273347692483574e-06,
1080
+ "loss": 0.1818,
1081
+ "num_tokens": 55474995.0,
1082
+ "step": 134
1083
+ },
1084
+ {
1085
+ "epoch": 2.3684210526315788,
1086
+ "grad_norm": 0.13728502667314055,
1087
+ "learning_rate": 2.071227280281982e-06,
1088
+ "loss": 0.1697,
1089
+ "num_tokens": 55872252.0,
1090
+ "step": 135
1091
+ },
1092
+ {
1093
+ "epoch": 2.3859649122807016,
1094
+ "grad_norm": 0.13569940106251407,
1095
+ "learning_rate": 2.016362752124129e-06,
1096
+ "loss": 0.1799,
1097
+ "num_tokens": 56295990.0,
1098
+ "step": 136
1099
+ },
1100
+ {
1101
+ "epoch": 2.4035087719298245,
1102
+ "grad_norm": 0.1433225385861297,
1103
+ "learning_rate": 1.9627610736574575e-06,
1104
+ "loss": 0.1744,
1105
+ "num_tokens": 56700633.0,
1106
+ "step": 137
1107
+ },
1108
+ {
1109
+ "epoch": 2.4210526315789473,
1110
+ "grad_norm": 0.13712140562366157,
1111
+ "learning_rate": 1.9104416759703017e-06,
1112
+ "loss": 0.1772,
1113
+ "num_tokens": 57123351.0,
1114
+ "step": 138
1115
+ },
1116
+ {
1117
+ "epoch": 2.43859649122807,
1118
+ "grad_norm": 0.14064914274676912,
1119
+ "learning_rate": 1.8594235253127373e-06,
1120
+ "loss": 0.1794,
1121
+ "num_tokens": 57541451.0,
1122
+ "step": 139
1123
+ },
1124
+ {
1125
+ "epoch": 2.456140350877193,
1126
+ "grad_norm": 0.15170132064659694,
1127
+ "learning_rate": 1.8097251162211405e-06,
1128
+ "loss": 0.1831,
1129
+ "num_tokens": 57962223.0,
1130
+ "step": 140
1131
+ },
1132
+ {
1133
+ "epoch": 2.473684210526316,
1134
+ "grad_norm": 0.13964776563103484,
1135
+ "learning_rate": 1.7613644648137543e-06,
1136
+ "loss": 0.1756,
1137
+ "num_tokens": 58375881.0,
1138
+ "step": 141
1139
+ },
1140
+ {
1141
+ "epoch": 2.4912280701754383,
1142
+ "grad_norm": 0.13507579048092097,
1143
+ "learning_rate": 1.7143591022596846e-06,
1144
+ "loss": 0.1821,
1145
+ "num_tokens": 58796929.0,
1146
+ "step": 142
1147
+ },
1148
+ {
1149
+ "epoch": 2.5087719298245617,
1150
+ "grad_norm": 0.13875107577532086,
1151
+ "learning_rate": 1.6687260684236943e-06,
1152
+ "loss": 0.1773,
1153
+ "num_tokens": 59207995.0,
1154
+ "step": 143
1155
+ },
1156
+ {
1157
+ "epoch": 2.526315789473684,
1158
+ "grad_norm": 0.14061593378122658,
1159
+ "learning_rate": 1.6244819056890975e-06,
1160
+ "loss": 0.1716,
1161
+ "num_tokens": 59582578.0,
1162
+ "step": 144
1163
+ },
1164
+ {
1165
+ "epoch": 2.543859649122807,
1166
+ "grad_norm": 0.12901477335373565,
1167
+ "learning_rate": 1.5816426529610035e-06,
1168
+ "loss": 0.1764,
1169
+ "num_tokens": 60014351.0,
1170
+ "step": 145
1171
+ },
1172
+ {
1173
+ "epoch": 2.56140350877193,
1174
+ "grad_norm": 0.13513262564013573,
1175
+ "learning_rate": 1.5402238398520614e-06,
1176
+ "loss": 0.1742,
1177
+ "num_tokens": 60428513.0,
1178
+ "step": 146
1179
+ },
1180
+ {
1181
+ "epoch": 2.5789473684210527,
1182
+ "grad_norm": 0.12744611421871882,
1183
+ "learning_rate": 1.5002404810528452e-06,
1184
+ "loss": 0.1798,
1185
+ "num_tokens": 60870775.0,
1186
+ "step": 147
1187
+ },
1188
+ {
1189
+ "epoch": 2.5964912280701755,
1190
+ "grad_norm": 0.1281932184087842,
1191
+ "learning_rate": 1.4617070708888882e-06,
1192
+ "loss": 0.1788,
1193
+ "num_tokens": 61333167.0,
1194
+ "step": 148
1195
+ },
1196
+ {
1197
+ "epoch": 2.6140350877192984,
1198
+ "grad_norm": 0.13398144271039164,
1199
+ "learning_rate": 1.4246375780663613e-06,
1200
+ "loss": 0.1792,
1201
+ "num_tokens": 61737623.0,
1202
+ "step": 149
1203
+ },
1204
+ {
1205
+ "epoch": 2.6315789473684212,
1206
+ "grad_norm": 0.13540743049220252,
1207
+ "learning_rate": 1.389045440608296e-06,
1208
+ "loss": 0.1755,
1209
+ "num_tokens": 62143293.0,
1210
+ "step": 150
1211
+ },
1212
+ {
1213
+ "epoch": 2.6491228070175437,
1214
+ "grad_norm": 0.13564465493581726,
1215
+ "learning_rate": 1.354943560983175e-06,
1216
+ "loss": 0.1735,
1217
+ "num_tokens": 62558499.0,
1218
+ "step": 151
1219
+ },
1220
+ {
1221
+ "epoch": 2.6666666666666665,
1222
+ "grad_norm": 0.12805186009140426,
1223
+ "learning_rate": 1.3223443014276738e-06,
1224
+ "loss": 0.1736,
1225
+ "num_tokens": 63004628.0,
1226
+ "step": 152
1227
+ },
1228
+ {
1229
+ "epoch": 2.6842105263157894,
1230
+ "grad_norm": 0.1328569132316143,
1231
+ "learning_rate": 1.2912594794652406e-06,
1232
+ "loss": 0.1642,
1233
+ "num_tokens": 63387346.0,
1234
+ "step": 153
1235
+ },
1236
+ {
1237
+ "epoch": 2.7017543859649122,
1238
+ "grad_norm": 0.1325321320124978,
1239
+ "learning_rate": 1.2617003636221394e-06,
1240
+ "loss": 0.169,
1241
+ "num_tokens": 63804970.0,
1242
+ "step": 154
1243
+ },
1244
+ {
1245
+ "epoch": 2.719298245614035,
1246
+ "grad_norm": 0.13540714382771668,
1247
+ "learning_rate": 1.2336776693425028e-06,
1248
+ "loss": 0.1744,
1249
+ "num_tokens": 64196162.0,
1250
+ "step": 155
1251
+ },
1252
+ {
1253
+ "epoch": 2.736842105263158,
1254
+ "grad_norm": 0.14020781213013872,
1255
+ "learning_rate": 1.2072015551038933e-06,
1256
+ "loss": 0.1811,
1257
+ "num_tokens": 64585657.0,
1258
+ "step": 156
1259
+ },
1260
+ {
1261
+ "epoch": 2.754385964912281,
1262
+ "grad_norm": 0.14012421310202808,
1263
+ "learning_rate": 1.1822816187347625e-06,
1264
+ "loss": 0.1882,
1265
+ "num_tokens": 64990929.0,
1266
+ "step": 157
1267
+ },
1268
+ {
1269
+ "epoch": 2.7719298245614032,
1270
+ "grad_norm": 0.13359559789919473,
1271
+ "learning_rate": 1.1589268939351499e-06,
1272
+ "loss": 0.1644,
1273
+ "num_tokens": 65419394.0,
1274
+ "step": 158
1275
+ },
1276
+ {
1277
+ "epoch": 2.7894736842105265,
1278
+ "grad_norm": 0.1293973137684263,
1279
+ "learning_rate": 1.1371458470018896e-06,
1280
+ "loss": 0.1686,
1281
+ "num_tokens": 65848256.0,
1282
+ "step": 159
1283
+ },
1284
+ {
1285
+ "epoch": 2.807017543859649,
1286
+ "grad_norm": 0.12796590503255867,
1287
+ "learning_rate": 1.1169463737594995e-06,
1288
+ "loss": 0.173,
1289
+ "num_tokens": 66276026.0,
1290
+ "step": 160
1291
+ },
1292
+ {
1293
+ "epoch": 2.824561403508772,
1294
+ "grad_norm": 0.1386629969970847,
1295
+ "learning_rate": 1.0983357966978747e-06,
1296
+ "loss": 0.1698,
1297
+ "num_tokens": 66662640.0,
1298
+ "step": 161
1299
+ },
1300
+ {
1301
+ "epoch": 2.8421052631578947,
1302
+ "grad_norm": 0.1312256058443758,
1303
+ "learning_rate": 1.0813208623178199e-06,
1304
+ "loss": 0.1831,
1305
+ "num_tokens": 67101128.0,
1306
+ "step": 162
1307
+ },
1308
+ {
1309
+ "epoch": 2.8596491228070176,
1310
+ "grad_norm": 0.13566699518356568,
1311
+ "learning_rate": 1.0659077386853817e-06,
1312
+ "loss": 0.1918,
1313
+ "num_tokens": 67527335.0,
1314
+ "step": 163
1315
+ },
1316
+ {
1317
+ "epoch": 2.8771929824561404,
1318
+ "grad_norm": 0.13207372833151468,
1319
+ "learning_rate": 1.0521020131958692e-06,
1320
+ "loss": 0.18,
1321
+ "num_tokens": 67953220.0,
1322
+ "step": 164
1323
+ },
1324
+ {
1325
+ "epoch": 2.8947368421052633,
1326
+ "grad_norm": 0.1351068484295404,
1327
+ "learning_rate": 1.0399086905483752e-06,
1328
+ "loss": 0.1796,
1329
+ "num_tokens": 68401961.0,
1330
+ "step": 165
1331
+ },
1332
+ {
1333
+ "epoch": 2.912280701754386,
1334
+ "grad_norm": 0.13512574709033598,
1335
+ "learning_rate": 1.0293321909315242e-06,
1336
+ "loss": 0.1742,
1337
+ "num_tokens": 68815465.0,
1338
+ "step": 166
1339
+ },
1340
+ {
1341
+ "epoch": 2.9298245614035086,
1342
+ "grad_norm": 0.13483699443340522,
1343
+ "learning_rate": 1.0203763484211196e-06,
1344
+ "loss": 0.1778,
1345
+ "num_tokens": 69255767.0,
1346
+ "step": 167
1347
+ },
1348
+ {
1349
+ "epoch": 2.9473684210526314,
1350
+ "grad_norm": 0.13142810366700336,
1351
+ "learning_rate": 1.0130444095902514e-06,
1352
+ "loss": 0.1842,
1353
+ "num_tokens": 69678619.0,
1354
+ "step": 168
1355
+ },
1356
+ {
1357
+ "epoch": 2.9649122807017543,
1358
+ "grad_norm": 0.13693412077884642,
1359
+ "learning_rate": 1.0073390323323897e-06,
1360
+ "loss": 0.177,
1361
+ "num_tokens": 70098414.0,
1362
+ "step": 169
1363
+ },
1364
+ {
1365
+ "epoch": 2.982456140350877,
1366
+ "grad_norm": 0.1320466096940847,
1367
+ "learning_rate": 1.0032622848978689e-06,
1368
+ "loss": 0.168,
1369
+ "num_tokens": 70513950.0,
1370
+ "step": 170
1371
+ },
1372
+ {
1373
+ "epoch": 3.0,
1374
+ "grad_norm": 0.1365255306036787,
1375
+ "learning_rate": 1.000815645144134e-06,
1376
+ "loss": 0.1794,
1377
+ "num_tokens": 70937090.0,
1378
+ "step": 171
1379
+ },
1380
+ {
1381
+ "epoch": 3.0,
1382
+ "step": 171,
1383
+ "total_flos": 2.276376686268252e+17,
1384
+ "train_loss": 0.25581319124726526,
1385
+ "train_runtime": 2707.0199,
1386
+ "train_samples_per_second": 8.073,
1387
+ "train_steps_per_second": 0.063
1388
+ }
1389
+ ],
1390
+ "logging_steps": 1,
1391
+ "max_steps": 171,
1392
+ "num_input_tokens_seen": 0,
1393
+ "num_train_epochs": 3,
1394
+ "save_steps": 500,
1395
+ "stateful_callbacks": {
1396
+ "TrainerControl": {
1397
+ "args": {
1398
+ "should_epoch_stop": false,
1399
+ "should_evaluate": false,
1400
+ "should_log": false,
1401
+ "should_save": true,
1402
+ "should_training_stop": true
1403
+ },
1404
+ "attributes": {}
1405
+ }
1406
+ },
1407
+ "total_flos": 2.276376686268252e+17,
1408
+ "train_batch_size": 8,
1409
+ "trial_name": null,
1410
+ "trial_params": null
1411
+ }