AmberYifan committed on
Commit 5492ebd · verified · 1 Parent(s): 6d3d9ec

Model save
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 1.0,
+     "total_flos": 0.0,
+     "train_loss": 0.02150259389877319,
+     "train_runtime": 2425.829,
+     "train_samples": 9999,
+     "train_samples_per_second": 4.122,
+     "train_steps_per_second": 0.515
+ }
generation_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "bos_token_id": 151643,
+   "do_sample": true,
+   "eos_token_id": [
+     151645,
+     151643
+   ],
+   "pad_token_id": 151643,
+   "repetition_penalty": 1.05,
+   "temperature": 0.7,
+   "top_k": 20,
+   "top_p": 0.8,
+   "transformers_version": "4.53.3"
+ }
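
The decoding settings above can be sanity-checked offline without loading the model; a minimal sketch with the same values reproduced as a plain dict (copied verbatim from the diff; the token-id comment assumes a Qwen-style tokenizer, which these ids match):

```python
# generation_config.json above, as a plain dict (values copied verbatim).
generation_config = {
    "bos_token_id": 151643,
    "do_sample": True,
    "eos_token_id": [151645, 151643],
    "pad_token_id": 151643,
    "repetition_penalty": 1.05,
    "temperature": 0.7,
    "top_k": 20,
    "top_p": 0.8,
    "transformers_version": "4.53.3",
}

# Sampling is enabled, and padding reuses one of the stop ids, so generation
# halts on either 151645 (<|im_end|> in Qwen-style tokenizers) or 151643.
assert generation_config["do_sample"]
assert generation_config["pad_token_id"] in generation_config["eos_token_id"]
```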
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 1.0,
+     "total_flos": 0.0,
+     "train_loss": 0.02150259389877319,
+     "train_runtime": 2425.829,
+     "train_samples": 9999,
+     "train_samples_per_second": 4.122,
+     "train_steps_per_second": 0.515
+ }
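
As a quick consistency check, the throughput numbers above can be recomputed from the sample count, step count, and runtime (a minimal sketch; `global_step` is taken from trainer_state.json in this same commit, and small differences are just the trainer's own rounding):

```python
# Values copied from train_results.json above.
train_samples = 9999
train_runtime = 2425.829   # seconds
global_step = 1250         # from trainer_state.json in this commit

samples_per_second = train_samples / train_runtime
steps_per_second = global_step / train_runtime

# Both round to the logged values.
print(round(samples_per_second, 3), round(steps_per_second, 3))  # → 4.122 0.515
```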
trainer_state.json ADDED
@@ -0,0 +1,2076 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 1.0,
6
+ "eval_steps": 500,
7
+ "global_step": 1250,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.0008,
14
+ "grad_norm": 234.60278509074556,
15
+ "learning_rate": 0.0,
16
+ "logits/chosen": 0.107421875,
17
+ "logits/rejected": 0.08984375,
18
+ "logps/chosen": -262.0,
19
+ "logps/rejected": -342.0,
20
+ "loss": 0.6914,
21
+ "nll_loss": 1.015625,
22
+ "rewards/accuracies": 0.0,
23
+ "rewards/chosen": 0.0,
24
+ "rewards/margins": 0.0,
25
+ "rewards/rejected": 0.0,
26
+ "step": 1
27
+ },
28
+ {
29
+ "epoch": 0.008,
30
+ "grad_norm": 282.5477340162511,
31
+ "learning_rate": 3.6e-08,
32
+ "logits/chosen": -0.1501736044883728,
33
+ "logits/rejected": 0.009562174789607525,
34
+ "logps/chosen": -338.22222900390625,
35
+ "logps/rejected": -378.0,
36
+ "loss": 0.6885,
37
+ "nll_loss": 0.9717881679534912,
38
+ "rewards/accuracies": 0.3194444477558136,
39
+ "rewards/chosen": 0.015223185531795025,
40
+ "rewards/margins": 0.0276963971555233,
41
+ "rewards/rejected": -0.01256646029651165,
42
+ "step": 10
43
+ },
44
+ {
45
+ "epoch": 0.016,
46
+ "grad_norm": 206.12921184995773,
47
+ "learning_rate": 7.599999999999999e-08,
48
+ "logits/chosen": 0.1673583984375,
49
+ "logits/rejected": 0.0367431640625,
50
+ "logps/chosen": -207.0500030517578,
51
+ "logps/rejected": -415.20001220703125,
52
+ "loss": 0.6115,
53
+ "nll_loss": 0.9085937738418579,
54
+ "rewards/accuracies": 0.675000011920929,
55
+ "rewards/chosen": -0.02422180213034153,
56
+ "rewards/margins": 0.18081054091453552,
57
+ "rewards/rejected": -0.20512695610523224,
58
+ "step": 20
59
+ },
60
+ {
61
+ "epoch": 0.024,
62
+ "grad_norm": 158.12897373074685,
63
+ "learning_rate": 1.16e-07,
64
+ "logits/chosen": -0.01387939415872097,
65
+ "logits/rejected": 0.06098632887005806,
66
+ "logps/chosen": -323.70001220703125,
67
+ "logps/rejected": -389.6000061035156,
68
+ "loss": 0.4236,
69
+ "nll_loss": 0.9488281011581421,
70
+ "rewards/accuracies": 0.949999988079071,
71
+ "rewards/chosen": -0.1749267578125,
72
+ "rewards/margins": 0.7095702886581421,
73
+ "rewards/rejected": -0.8843749761581421,
74
+ "step": 30
75
+ },
76
+ {
77
+ "epoch": 0.032,
78
+ "grad_norm": 97.36760160402432,
79
+ "learning_rate": 1.56e-07,
80
+ "logits/chosen": -0.011962890625,
81
+ "logits/rejected": 0.02890625037252903,
82
+ "logps/chosen": -329.5,
83
+ "logps/rejected": -412.3999938964844,
84
+ "loss": 0.2188,
85
+ "nll_loss": 0.99609375,
86
+ "rewards/accuracies": 0.9750000238418579,
87
+ "rewards/chosen": -0.4403320252895355,
88
+ "rewards/margins": 1.7859375476837158,
89
+ "rewards/rejected": -2.2265625,
90
+ "step": 40
91
+ },
92
+ {
93
+ "epoch": 0.04,
94
+ "grad_norm": 67.3942566184986,
95
+ "learning_rate": 1.96e-07,
96
+ "logits/chosen": 0.07175292819738388,
97
+ "logits/rejected": -0.018310546875,
98
+ "logps/chosen": -261.70001220703125,
99
+ "logps/rejected": -420.20001220703125,
100
+ "loss": 0.1252,
101
+ "nll_loss": 1.0148437023162842,
102
+ "rewards/accuracies": 1.0,
103
+ "rewards/chosen": -0.517773449420929,
104
+ "rewards/margins": 2.8515625,
105
+ "rewards/rejected": -3.3734374046325684,
106
+ "step": 50
107
+ },
108
+ {
109
+ "epoch": 0.048,
110
+ "grad_norm": 25.358406316482952,
111
+ "learning_rate": 2.3599999999999997e-07,
112
+ "logits/chosen": 0.17060546576976776,
113
+ "logits/rejected": 0.15923461318016052,
114
+ "logps/chosen": -264.1000061035156,
115
+ "logps/rejected": -445.6000061035156,
116
+ "loss": 0.0376,
117
+ "nll_loss": 0.9632812738418579,
118
+ "rewards/accuracies": 1.0,
119
+ "rewards/chosen": -0.8160156011581421,
120
+ "rewards/margins": 4.489062309265137,
121
+ "rewards/rejected": -5.306250095367432,
122
+ "step": 60
123
+ },
124
+ {
125
+ "epoch": 0.056,
126
+ "grad_norm": 83.05887552576917,
127
+ "learning_rate": 2.7600000000000004e-07,
128
+ "logits/chosen": 0.147216796875,
129
+ "logits/rejected": 0.18815918266773224,
130
+ "logps/chosen": -271.70001220703125,
131
+ "logps/rejected": -455.20001220703125,
132
+ "loss": 0.0631,
133
+ "nll_loss": 0.940625011920929,
134
+ "rewards/accuracies": 0.987500011920929,
135
+ "rewards/chosen": -1.316015601158142,
136
+ "rewards/margins": 6.240624904632568,
137
+ "rewards/rejected": -7.556250095367432,
138
+ "step": 70
139
+ },
140
+ {
141
+ "epoch": 0.064,
142
+ "grad_norm": 25.278258079670092,
143
+ "learning_rate": 3.1599999999999997e-07,
144
+ "logits/chosen": 0.17365722358226776,
145
+ "logits/rejected": 0.2812866270542145,
146
+ "logps/chosen": -328.8999938964844,
147
+ "logps/rejected": -457.0,
148
+ "loss": 0.0121,
149
+ "nll_loss": 1.033203125,
150
+ "rewards/accuracies": 1.0,
151
+ "rewards/chosen": -1.478515625,
152
+ "rewards/margins": 7.271874904632568,
153
+ "rewards/rejected": -8.743749618530273,
154
+ "step": 80
155
+ },
156
+ {
157
+ "epoch": 0.072,
158
+ "grad_norm": 8.036414090919388,
159
+ "learning_rate": 3.5599999999999996e-07,
160
+ "logits/chosen": 0.3252929747104645,
161
+ "logits/rejected": 0.3529296815395355,
162
+ "logps/chosen": -282.1000061035156,
163
+ "logps/rejected": -486.0,
164
+ "loss": 0.0349,
165
+ "nll_loss": 1.041406273841858,
166
+ "rewards/accuracies": 0.9750000238418579,
167
+ "rewards/chosen": -2.042187452316284,
168
+ "rewards/margins": 8.703125,
169
+ "rewards/rejected": -10.743749618530273,
170
+ "step": 90
171
+ },
172
+ {
173
+ "epoch": 0.08,
174
+ "grad_norm": 0.06277898862179868,
175
+ "learning_rate": 3.96e-07,
176
+ "logits/chosen": 0.11843261867761612,
177
+ "logits/rejected": 0.28974610567092896,
178
+ "logps/chosen": -338.29998779296875,
179
+ "logps/rejected": -518.5999755859375,
180
+ "loss": 0.0113,
181
+ "nll_loss": 1.068750023841858,
182
+ "rewards/accuracies": 1.0,
183
+ "rewards/chosen": -1.958593726158142,
184
+ "rewards/margins": 10.34375,
185
+ "rewards/rejected": -12.306249618530273,
186
+ "step": 100
187
+ },
188
+ {
189
+ "epoch": 0.088,
190
+ "grad_norm": 186.15959679077883,
191
+ "learning_rate": 4.36e-07,
192
+ "logits/chosen": 0.16660156846046448,
193
+ "logits/rejected": 0.23691406846046448,
194
+ "logps/chosen": -336.79998779296875,
195
+ "logps/rejected": -498.3999938964844,
196
+ "loss": 0.0368,
197
+ "nll_loss": 1.010156273841858,
198
+ "rewards/accuracies": 0.987500011920929,
199
+ "rewards/chosen": -2.2421875,
200
+ "rewards/margins": 10.175000190734863,
201
+ "rewards/rejected": -12.431249618530273,
202
+ "step": 110
203
+ },
204
+ {
205
+ "epoch": 0.096,
206
+ "grad_norm": 1.030662736090751,
207
+ "learning_rate": 4.76e-07,
208
+ "logits/chosen": 0.3314208984375,
209
+ "logits/rejected": 0.39873045682907104,
210
+ "logps/chosen": -291.20001220703125,
211
+ "logps/rejected": -559.7999877929688,
212
+ "loss": 0.0055,
213
+ "nll_loss": 0.977734386920929,
214
+ "rewards/accuracies": 1.0,
215
+ "rewards/chosen": -2.3921875953674316,
216
+ "rewards/margins": 13.274999618530273,
217
+ "rewards/rejected": -15.681249618530273,
218
+ "step": 120
219
+ },
220
+ {
221
+ "epoch": 0.104,
222
+ "grad_norm": 10.790384157037435,
223
+ "learning_rate": 4.982222222222223e-07,
224
+ "logits/chosen": 0.33642578125,
225
+ "logits/rejected": 0.3980468809604645,
226
+ "logps/chosen": -316.8999938964844,
227
+ "logps/rejected": -563.7999877929688,
228
+ "loss": 0.0057,
229
+ "nll_loss": 1.100000023841858,
230
+ "rewards/accuracies": 1.0,
231
+ "rewards/chosen": -2.5250000953674316,
232
+ "rewards/margins": 14.125,
233
+ "rewards/rejected": -16.65625,
234
+ "step": 130
235
+ },
236
+ {
237
+ "epoch": 0.112,
238
+ "grad_norm": 6.132805095404835,
239
+ "learning_rate": 4.937777777777777e-07,
240
+ "logits/chosen": 0.42326658964157104,
241
+ "logits/rejected": 0.41484373807907104,
242
+ "logps/chosen": -286.6000061035156,
243
+ "logps/rejected": -567.0,
244
+ "loss": 0.0025,
245
+ "nll_loss": 1.1179687976837158,
246
+ "rewards/accuracies": 1.0,
247
+ "rewards/chosen": -3.112499952316284,
248
+ "rewards/margins": 14.568750381469727,
249
+ "rewards/rejected": -17.6875,
250
+ "step": 140
251
+ },
252
+ {
253
+ "epoch": 0.12,
254
+ "grad_norm": 0.021958637023240073,
255
+ "learning_rate": 4.893333333333333e-07,
256
+ "logits/chosen": 0.45771485567092896,
257
+ "logits/rejected": 0.517138659954071,
258
+ "logps/chosen": -292.1000061035156,
259
+ "logps/rejected": -587.5999755859375,
260
+ "loss": 0.0195,
261
+ "nll_loss": 1.0183594226837158,
262
+ "rewards/accuracies": 0.987500011920929,
263
+ "rewards/chosen": -3.3921875953674316,
264
+ "rewards/margins": 15.631250381469727,
265
+ "rewards/rejected": -19.018749237060547,
266
+ "step": 150
267
+ },
268
+ {
269
+ "epoch": 0.128,
270
+ "grad_norm": 0.13947826744502106,
271
+ "learning_rate": 4.848888888888888e-07,
272
+ "logits/chosen": 0.2855468690395355,
273
+ "logits/rejected": 0.37548828125,
274
+ "logps/chosen": -297.95001220703125,
275
+ "logps/rejected": -582.4000244140625,
276
+ "loss": 0.0012,
277
+ "nll_loss": 1.080078125,
278
+ "rewards/accuracies": 1.0,
279
+ "rewards/chosen": -2.78515625,
280
+ "rewards/margins": 15.831250190734863,
281
+ "rewards/rejected": -18.618749618530273,
282
+ "step": 160
283
+ },
284
+ {
285
+ "epoch": 0.136,
286
+ "grad_norm": 0.16318898226871553,
287
+ "learning_rate": 4.804444444444444e-07,
288
+ "logits/chosen": 0.34228515625,
289
+ "logits/rejected": 0.42265623807907104,
290
+ "logps/chosen": -280.8999938964844,
291
+ "logps/rejected": -588.0,
292
+ "loss": 0.0029,
293
+ "nll_loss": 1.0382812023162842,
294
+ "rewards/accuracies": 1.0,
295
+ "rewards/chosen": -3.028125047683716,
296
+ "rewards/margins": 16.481250762939453,
297
+ "rewards/rejected": -19.512500762939453,
298
+ "step": 170
299
+ },
300
+ {
301
+ "epoch": 0.144,
302
+ "grad_norm": 16.913628536041927,
303
+ "learning_rate": 4.76e-07,
304
+ "logits/chosen": 0.29877930879592896,
305
+ "logits/rejected": 0.38134765625,
306
+ "logps/chosen": -337.3999938964844,
307
+ "logps/rejected": -592.4000244140625,
308
+ "loss": 0.0117,
309
+ "nll_loss": 1.0945312976837158,
310
+ "rewards/accuracies": 1.0,
311
+ "rewards/chosen": -2.65234375,
312
+ "rewards/margins": 17.700000762939453,
313
+ "rewards/rejected": -20.362499237060547,
314
+ "step": 180
315
+ },
316
+ {
317
+ "epoch": 0.152,
318
+ "grad_norm": 9.054585013706896,
319
+ "learning_rate": 4.7155555555555556e-07,
320
+ "logits/chosen": 0.4610839784145355,
321
+ "logits/rejected": 0.5546875,
322
+ "logps/chosen": -301.6000061035156,
323
+ "logps/rejected": -574.4000244140625,
324
+ "loss": 0.0198,
325
+ "nll_loss": 1.0695312023162842,
326
+ "rewards/accuracies": 0.987500011920929,
327
+ "rewards/chosen": -2.5296874046325684,
328
+ "rewards/margins": 16.774999618530273,
329
+ "rewards/rejected": -19.318750381469727,
330
+ "step": 190
331
+ },
332
+ {
333
+ "epoch": 0.16,
334
+ "grad_norm": 0.025795673574277502,
335
+ "learning_rate": 4.6711111111111104e-07,
336
+ "logits/chosen": 0.42558592557907104,
337
+ "logits/rejected": 0.5215820074081421,
338
+ "logps/chosen": -290.5,
339
+ "logps/rejected": -604.4000244140625,
340
+ "loss": 0.0011,
341
+ "nll_loss": 1.0128905773162842,
342
+ "rewards/accuracies": 1.0,
343
+ "rewards/chosen": -2.750781297683716,
344
+ "rewards/margins": 18.137500762939453,
345
+ "rewards/rejected": -20.887500762939453,
346
+ "step": 200
347
+ },
348
+ {
349
+ "epoch": 0.168,
350
+ "grad_norm": 0.049230968184915055,
351
+ "learning_rate": 4.6266666666666663e-07,
352
+ "logits/chosen": 0.4349609315395355,
353
+ "logits/rejected": 0.5816406011581421,
354
+ "logps/chosen": -299.20001220703125,
355
+ "logps/rejected": -581.4000244140625,
356
+ "loss": 0.0012,
357
+ "nll_loss": 1.058984398841858,
358
+ "rewards/accuracies": 1.0,
359
+ "rewards/chosen": -2.5531249046325684,
360
+ "rewards/margins": 17.506250381469727,
361
+ "rewards/rejected": -20.075000762939453,
362
+ "step": 210
363
+ },
364
+ {
365
+ "epoch": 0.176,
366
+ "grad_norm": 0.12051591908423682,
367
+ "learning_rate": 4.5822222222222216e-07,
368
+ "logits/chosen": 0.33723145723342896,
369
+ "logits/rejected": 0.4976562559604645,
370
+ "logps/chosen": -331.29998779296875,
371
+ "logps/rejected": -598.7999877929688,
372
+ "loss": 0.0014,
373
+ "nll_loss": 1.0636718273162842,
374
+ "rewards/accuracies": 1.0,
375
+ "rewards/chosen": -2.875,
376
+ "rewards/margins": 17.549999237060547,
377
+ "rewards/rejected": -20.412500381469727,
378
+ "step": 220
379
+ },
380
+ {
381
+ "epoch": 0.184,
382
+ "grad_norm": 0.02082872725280439,
383
+ "learning_rate": 4.5377777777777775e-07,
384
+ "logits/chosen": 0.44482421875,
385
+ "logits/rejected": 0.587109386920929,
386
+ "logps/chosen": -266.6000061035156,
387
+ "logps/rejected": -608.0,
388
+ "loss": 0.0113,
389
+ "nll_loss": 0.9273437261581421,
390
+ "rewards/accuracies": 0.987500011920929,
391
+ "rewards/chosen": -2.444531202316284,
392
+ "rewards/margins": 18.037500381469727,
393
+ "rewards/rejected": -20.487499237060547,
394
+ "step": 230
395
+ },
396
+ {
397
+ "epoch": 0.192,
398
+ "grad_norm": 1.2819745634114876,
399
+ "learning_rate": 4.493333333333333e-07,
400
+ "logits/chosen": 0.3896484375,
401
+ "logits/rejected": 0.533886730670929,
402
+ "logps/chosen": -330.5,
403
+ "logps/rejected": -569.7999877929688,
404
+ "loss": 0.0097,
405
+ "nll_loss": 0.998828113079071,
406
+ "rewards/accuracies": 1.0,
407
+ "rewards/chosen": -2.526562452316284,
408
+ "rewards/margins": 17.625,
409
+ "rewards/rejected": -20.162500381469727,
410
+ "step": 240
411
+ },
412
+ {
413
+ "epoch": 0.2,
414
+ "grad_norm": 0.6312660403152253,
415
+ "learning_rate": 4.4488888888888887e-07,
416
+ "logits/chosen": 0.39438170194625854,
417
+ "logits/rejected": 0.45518797636032104,
418
+ "logps/chosen": -317.6000061035156,
419
+ "logps/rejected": -540.5999755859375,
420
+ "loss": 0.0351,
421
+ "nll_loss": 1.03515625,
422
+ "rewards/accuracies": 0.987500011920929,
423
+ "rewards/chosen": -2.47265625,
424
+ "rewards/margins": 15.524999618530273,
425
+ "rewards/rejected": -18.0,
426
+ "step": 250
427
+ },
428
+ {
429
+ "epoch": 0.208,
430
+ "grad_norm": 0.013098055970262888,
431
+ "learning_rate": 4.4044444444444445e-07,
432
+ "logits/chosen": 0.3513244688510895,
433
+ "logits/rejected": 0.47832030057907104,
434
+ "logps/chosen": -316.3999938964844,
435
+ "logps/rejected": -599.5999755859375,
436
+ "loss": 0.0237,
437
+ "nll_loss": 1.089453101158142,
438
+ "rewards/accuracies": 0.987500011920929,
439
+ "rewards/chosen": -2.32421875,
440
+ "rewards/margins": 17.71875,
441
+ "rewards/rejected": -20.075000762939453,
442
+ "step": 260
443
+ },
444
+ {
445
+ "epoch": 0.216,
446
+ "grad_norm": 0.02660915986368852,
447
+ "learning_rate": 4.36e-07,
448
+ "logits/chosen": 0.4693359434604645,
449
+ "logits/rejected": 0.5892578363418579,
450
+ "logps/chosen": -300.79998779296875,
451
+ "logps/rejected": -594.2000122070312,
452
+ "loss": 0.0108,
453
+ "nll_loss": 1.070703148841858,
454
+ "rewards/accuracies": 1.0,
455
+ "rewards/chosen": -3.1343750953674316,
456
+ "rewards/margins": 17.587499618530273,
457
+ "rewards/rejected": -20.737499237060547,
458
+ "step": 270
459
+ },
460
+ {
461
+ "epoch": 0.224,
462
+ "grad_norm": 0.026490947430251415,
463
+ "learning_rate": 4.3155555555555557e-07,
464
+ "logits/chosen": 0.40800780057907104,
465
+ "logits/rejected": 0.58984375,
466
+ "logps/chosen": -319.79998779296875,
467
+ "logps/rejected": -617.4000244140625,
468
+ "loss": 0.0016,
469
+ "nll_loss": 1.0261719226837158,
470
+ "rewards/accuracies": 1.0,
471
+ "rewards/chosen": -2.5062499046325684,
472
+ "rewards/margins": 19.556249618530273,
473
+ "rewards/rejected": -22.087499618530273,
474
+ "step": 280
475
+ },
476
+ {
477
+ "epoch": 0.232,
478
+ "grad_norm": 0.026710360631259922,
479
+ "learning_rate": 4.271111111111111e-07,
480
+ "logits/chosen": 0.554211437702179,
481
+ "logits/rejected": 0.658398449420929,
482
+ "logps/chosen": -281.5,
483
+ "logps/rejected": -627.2000122070312,
484
+ "loss": 0.0065,
485
+ "nll_loss": 0.9703124761581421,
486
+ "rewards/accuracies": 1.0,
487
+ "rewards/chosen": -2.4585938453674316,
488
+ "rewards/margins": 21.037500381469727,
489
+ "rewards/rejected": -23.5,
490
+ "step": 290
491
+ },
492
+ {
493
+ "epoch": 0.24,
494
+ "grad_norm": 0.0629472840725987,
495
+ "learning_rate": 4.226666666666667e-07,
496
+ "logits/chosen": 0.492919921875,
497
+ "logits/rejected": 0.6646484136581421,
498
+ "logps/chosen": -280.3999938964844,
499
+ "logps/rejected": -636.4000244140625,
500
+ "loss": 0.0011,
501
+ "nll_loss": 1.0402343273162842,
502
+ "rewards/accuracies": 1.0,
503
+ "rewards/chosen": -2.484375,
504
+ "rewards/margins": 21.662500381469727,
505
+ "rewards/rejected": -24.125,
506
+ "step": 300
507
+ },
508
+ {
509
+ "epoch": 0.248,
510
+ "grad_norm": 0.015481044961707728,
511
+ "learning_rate": 4.1822222222222217e-07,
512
+ "logits/chosen": 0.518505871295929,
513
+ "logits/rejected": 0.6767578125,
514
+ "logps/chosen": -301.6000061035156,
515
+ "logps/rejected": -672.4000244140625,
516
+ "loss": 0.001,
517
+ "nll_loss": 1.0242187976837158,
518
+ "rewards/accuracies": 1.0,
519
+ "rewards/chosen": -2.29296875,
520
+ "rewards/margins": 22.825000762939453,
521
+ "rewards/rejected": -25.087499618530273,
522
+ "step": 310
523
+ },
524
+ {
525
+ "epoch": 0.256,
526
+ "grad_norm": 0.011465936087268223,
527
+ "learning_rate": 4.1377777777777776e-07,
528
+ "logits/chosen": 0.39887696504592896,
529
+ "logits/rejected": 0.503710925579071,
530
+ "logps/chosen": -398.6000061035156,
531
+ "logps/rejected": -589.7999877929688,
532
+ "loss": 0.0023,
533
+ "nll_loss": 1.108984351158142,
534
+ "rewards/accuracies": 1.0,
535
+ "rewards/chosen": -2.241406202316284,
536
+ "rewards/margins": 18.181249618530273,
537
+ "rewards/rejected": -20.412500381469727,
538
+ "step": 320
539
+ },
540
+ {
541
+ "epoch": 0.264,
542
+ "grad_norm": 0.009443129326879707,
543
+ "learning_rate": 4.093333333333333e-07,
544
+ "logits/chosen": 0.42631834745407104,
545
+ "logits/rejected": 0.555468738079071,
546
+ "logps/chosen": -311.70001220703125,
547
+ "logps/rejected": -587.2000122070312,
548
+ "loss": 0.0022,
549
+ "nll_loss": 0.985546886920929,
550
+ "rewards/accuracies": 1.0,
551
+ "rewards/chosen": -1.837499976158142,
552
+ "rewards/margins": 18.837499618530273,
553
+ "rewards/rejected": -20.649999618530273,
554
+ "step": 330
555
+ },
556
+ {
557
+ "epoch": 0.272,
558
+ "grad_norm": 0.0101159188120434,
559
+ "learning_rate": 4.048888888888889e-07,
560
+ "logits/chosen": 0.3521057069301605,
561
+ "logits/rejected": 0.47089844942092896,
562
+ "logps/chosen": -259.6000061035156,
563
+ "logps/rejected": -614.2000122070312,
564
+ "loss": 0.001,
565
+ "nll_loss": 0.975390613079071,
566
+ "rewards/accuracies": 1.0,
567
+ "rewards/chosen": -1.9140625,
568
+ "rewards/margins": 20.387500762939453,
569
+ "rewards/rejected": -22.318750381469727,
570
+ "step": 340
571
+ },
572
+ {
573
+ "epoch": 0.28,
574
+ "grad_norm": 0.01605643719718742,
575
+ "learning_rate": 4.004444444444444e-07,
576
+ "logits/chosen": 0.3182617127895355,
577
+ "logits/rejected": 0.4351562559604645,
578
+ "logps/chosen": -269.79998779296875,
579
+ "logps/rejected": -607.2000122070312,
580
+ "loss": 0.0078,
581
+ "nll_loss": 0.967968761920929,
582
+ "rewards/accuracies": 1.0,
583
+ "rewards/chosen": -1.5499999523162842,
584
+ "rewards/margins": 20.274999618530273,
585
+ "rewards/rejected": -21.799999237060547,
586
+ "step": 350
587
+ },
588
+ {
589
+ "epoch": 0.288,
590
+ "grad_norm": 0.01795986015889223,
591
+ "learning_rate": 3.96e-07,
592
+ "logits/chosen": 0.4128173887729645,
593
+ "logits/rejected": 0.5787109136581421,
594
+ "logps/chosen": -280.20001220703125,
595
+ "logps/rejected": -583.5999755859375,
596
+ "loss": 0.0054,
597
+ "nll_loss": 1.0378906726837158,
598
+ "rewards/accuracies": 1.0,
599
+ "rewards/chosen": -1.6953125,
600
+ "rewards/margins": 19.5,
601
+ "rewards/rejected": -21.212499618530273,
602
+ "step": 360
603
+ },
604
+ {
605
+ "epoch": 0.296,
606
+ "grad_norm": 0.032900120617305545,
607
+ "learning_rate": 3.9155555555555553e-07,
608
+ "logits/chosen": 0.3366943299770355,
609
+ "logits/rejected": 0.56396484375,
610
+ "logps/chosen": -314.6000061035156,
611
+ "logps/rejected": -623.2000122070312,
612
+ "loss": 0.0012,
613
+ "nll_loss": 1.031640648841858,
614
+ "rewards/accuracies": 1.0,
615
+ "rewards/chosen": -1.967187523841858,
616
+ "rewards/margins": 22.0,
617
+ "rewards/rejected": -23.962499618530273,
618
+ "step": 370
619
+ },
620
+ {
621
+ "epoch": 0.304,
622
+ "grad_norm": 0.03368133834851561,
623
+ "learning_rate": 3.871111111111111e-07,
624
+ "logits/chosen": 0.443115234375,
625
+ "logits/rejected": 0.6058593988418579,
626
+ "logps/chosen": -303.5,
627
+ "logps/rejected": -611.2000122070312,
628
+ "loss": 0.0033,
629
+ "nll_loss": 1.0988280773162842,
630
+ "rewards/accuracies": 1.0,
631
+ "rewards/chosen": -2.264843702316284,
632
+ "rewards/margins": 20.337499618530273,
633
+ "rewards/rejected": -22.612499237060547,
634
+ "step": 380
635
+ },
636
+ {
637
+ "epoch": 0.312,
638
+ "grad_norm": 8.91531946245202,
639
+ "learning_rate": 3.8266666666666665e-07,
640
+ "logits/chosen": 0.37250977754592896,
641
+ "logits/rejected": 0.570019543170929,
642
+ "logps/chosen": -355.79998779296875,
643
+ "logps/rejected": -587.0,
644
+ "loss": 0.0075,
645
+ "nll_loss": 1.019140601158142,
646
+ "rewards/accuracies": 1.0,
647
+ "rewards/chosen": -1.7722656726837158,
648
+ "rewards/margins": 18.850000381469727,
649
+ "rewards/rejected": -20.612499237060547,
650
+ "step": 390
651
+ },
652
+ {
653
+ "epoch": 0.32,
654
+ "grad_norm": 0.014188316050120144,
655
+ "learning_rate": 3.7822222222222224e-07,
656
+ "logits/chosen": 0.28227537870407104,
657
+ "logits/rejected": 0.44189453125,
658
+ "logps/chosen": -302.5,
659
+ "logps/rejected": -622.2000122070312,
660
+ "loss": 0.0018,
661
+ "nll_loss": 1.0828125476837158,
662
+ "rewards/accuracies": 1.0,
663
+ "rewards/chosen": -1.759374976158142,
664
+ "rewards/margins": 20.856250762939453,
665
+ "rewards/rejected": -22.625,
666
+ "step": 400
667
+ },
668
+ {
669
+ "epoch": 0.328,
670
+ "grad_norm": 0.01629482028046429,
671
+ "learning_rate": 3.7377777777777777e-07,
672
+ "logits/chosen": 0.4126953184604645,
673
+ "logits/rejected": 0.501953125,
674
+ "logps/chosen": -356.8999938964844,
675
+ "logps/rejected": -628.0,
676
+ "loss": 0.001,
677
+ "nll_loss": 1.040624976158142,
678
+ "rewards/accuracies": 1.0,
679
+ "rewards/chosen": -2.0132813453674316,
680
+ "rewards/margins": 21.137500762939453,
681
+ "rewards/rejected": -23.149999618530273,
682
+ "step": 410
683
+ },
684
+ {
685
+ "epoch": 0.336,
686
+ "grad_norm": 0.017142773967819217,
687
+ "learning_rate": 3.693333333333333e-07,
688
+ "logits/chosen": 0.3507751524448395,
689
+ "logits/rejected": 0.4869628846645355,
690
+ "logps/chosen": -333.95001220703125,
691
+ "logps/rejected": -605.2000122070312,
692
+ "loss": 0.001,
693
+ "nll_loss": 0.9984375238418579,
694
+ "rewards/accuracies": 1.0,
695
+ "rewards/chosen": -1.606054663658142,
696
+ "rewards/margins": 20.237499237060547,
697
+ "rewards/rejected": -21.862499237060547,
698
+ "step": 420
699
+ },
700
+ {
701
+ "epoch": 0.344,
702
+ "grad_norm": 0.18744587283856406,
703
+ "learning_rate": 3.6488888888888884e-07,
704
+ "logits/chosen": 0.4670043885707855,
705
+ "logits/rejected": 0.5835937261581421,
706
+ "logps/chosen": -272.5,
707
+ "logps/rejected": -609.2000122070312,
708
+ "loss": 0.001,
709
+ "nll_loss": 0.9847656488418579,
710
+ "rewards/accuracies": 1.0,
711
+ "rewards/chosen": -1.394140601158142,
712
+ "rewards/margins": 20.149999618530273,
713
+ "rewards/rejected": -21.5625,
714
+ "step": 430
715
+ },
716
+ {
717
+ "epoch": 0.352,
718
+ "grad_norm": 0.7115134446778305,
719
+ "learning_rate": 3.604444444444444e-07,
720
+ "logits/chosen": 0.32639771699905396,
721
+ "logits/rejected": 0.49003905057907104,
722
+ "logps/chosen": -271.1000061035156,
723
+ "logps/rejected": -617.0,
724
+ "loss": 0.001,
725
+ "nll_loss": 0.901562511920929,
726
+ "rewards/accuracies": 1.0,
727
+ "rewards/chosen": -1.182031273841858,
728
+ "rewards/margins": 21.112499237060547,
729
+ "rewards/rejected": -22.3125,
730
+ "step": 440
731
+ },
732
+ {
733
+ "epoch": 0.36,
734
+ "grad_norm": 0.08581115615312221,
735
+ "learning_rate": 3.5599999999999996e-07,
736
+ "logits/chosen": 0.4197753965854645,
737
+ "logits/rejected": 0.5601562261581421,
738
+ "logps/chosen": -299.20001220703125,
739
+ "logps/rejected": -558.4000244140625,
740
+ "loss": 0.0051,
741
+ "nll_loss": 0.979296863079071,
742
+ "rewards/accuracies": 1.0,
743
+ "rewards/chosen": -0.885546863079071,
744
+ "rewards/margins": 18.625,
745
+ "rewards/rejected": -19.512500762939453,
746
+ "step": 450
747
+ },
748
+ {
749
+ "epoch": 0.368,
750
+ "grad_norm": 0.050537690816069084,
751
+ "learning_rate": 3.5155555555555554e-07,
752
+ "logits/chosen": 0.36616212129592896,
+ "logits/rejected": 0.5220702886581421,
+ "logps/chosen": -293.20001220703125,
+ "logps/rejected": -601.4000244140625,
+ "loss": 0.0011,
+ "nll_loss": 1.007421851158142,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -1.21875,
+ "rewards/margins": 20.475000381469727,
+ "rewards/rejected": -21.6875,
+ "step": 460
+ },
+ {
+ "epoch": 0.376,
+ "grad_norm": 0.02175945703204624,
+ "learning_rate": 3.471111111111111e-07,
+ "logits/chosen": 0.4150390625,
+ "logits/rejected": 0.5416015386581421,
+ "logps/chosen": -276.8999938964844,
+ "logps/rejected": -617.7999877929688,
+ "loss": 0.0011,
+ "nll_loss": 1.1179687976837158,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -1.5373046398162842,
+ "rewards/margins": 20.962499618530273,
+ "rewards/rejected": -22.5,
+ "step": 470
+ },
+ {
+ "epoch": 0.384,
+ "grad_norm": 0.026063826437843437,
+ "learning_rate": 3.4266666666666666e-07,
+ "logits/chosen": 0.45039063692092896,
+ "logits/rejected": 0.612500011920929,
+ "logps/chosen": -272.3999938964844,
+ "logps/rejected": -599.7999877929688,
+ "loss": 0.0011,
+ "nll_loss": 0.9125000238418579,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -1.2761719226837158,
+ "rewards/margins": 21.975000381469727,
+ "rewards/rejected": -23.25,
+ "step": 480
+ },
+ {
+ "epoch": 0.392,
+ "grad_norm": 0.014336581093924161,
+ "learning_rate": 3.382222222222222e-07,
+ "logits/chosen": 0.38768309354782104,
+ "logits/rejected": 0.51953125,
+ "logps/chosen": -373.20001220703125,
+ "logps/rejected": -591.5999755859375,
+ "loss": 0.0011,
+ "nll_loss": 1.019921898841858,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -1.96875,
+ "rewards/margins": 20.899999618530273,
+ "rewards/rejected": -22.862499237060547,
+ "step": 490
+ },
+ {
+ "epoch": 0.4,
+ "grad_norm": 0.015681965151029324,
+ "learning_rate": 3.337777777777778e-07,
+ "logits/chosen": 0.28288573026657104,
+ "logits/rejected": 0.4932617247104645,
+ "logps/chosen": -301.70001220703125,
+ "logps/rejected": -696.0,
+ "loss": 0.0011,
+ "nll_loss": 1.058203101158142,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -1.958593726158142,
+ "rewards/margins": 25.9375,
+ "rewards/rejected": -27.912500381469727,
+ "step": 500
+ },
828
+ {
+ "epoch": 0.408,
+ "grad_norm": 0.029148974279792687,
+ "learning_rate": 3.293333333333333e-07,
+ "logits/chosen": 0.31782227754592896,
+ "logits/rejected": 0.4458984434604645,
+ "logps/chosen": -302.6000061035156,
+ "logps/rejected": -636.0,
+ "loss": 0.0136,
+ "nll_loss": 0.9886718988418579,
+ "rewards/accuracies": 0.987500011920929,
+ "rewards/chosen": -1.5339844226837158,
+ "rewards/margins": 22.774999618530273,
+ "rewards/rejected": -24.287500381469727,
+ "step": 510
+ },
+ {
+ "epoch": 0.416,
+ "grad_norm": 0.019209526445581045,
+ "learning_rate": 3.248888888888889e-07,
+ "logits/chosen": 0.3396972715854645,
+ "logits/rejected": 0.4786132872104645,
+ "logps/chosen": -296.29998779296875,
+ "logps/rejected": -648.0,
+ "loss": 0.0011,
+ "nll_loss": 1.05078125,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -1.097070336341858,
+ "rewards/margins": 23.450000762939453,
+ "rewards/rejected": -24.575000762939453,
+ "step": 520
+ },
+ {
+ "epoch": 0.424,
+ "grad_norm": 0.016375124643676898,
+ "learning_rate": 3.204444444444444e-07,
+ "logits/chosen": 0.2938476502895355,
+ "logits/rejected": 0.45917969942092896,
+ "logps/chosen": -328.5,
+ "logps/rejected": -683.2000122070312,
+ "loss": 0.0011,
+ "nll_loss": 1.078515648841858,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.935742199420929,
+ "rewards/margins": 24.362499237060547,
+ "rewards/rejected": -25.274999618530273,
+ "step": 530
+ },
+ {
+ "epoch": 0.432,
+ "grad_norm": 0.03023805422018604,
+ "learning_rate": 3.1599999999999997e-07,
+ "logits/chosen": 0.3899902403354645,
+ "logits/rejected": 0.4580078125,
+ "logps/chosen": -258.04998779296875,
+ "logps/rejected": -600.0,
+ "loss": 0.0009,
+ "nll_loss": 0.878125011920929,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.675488293170929,
+ "rewards/margins": 21.325000762939453,
+ "rewards/rejected": -22.0,
+ "step": 540
+ },
+ {
+ "epoch": 0.44,
+ "grad_norm": 0.01610635441474965,
+ "learning_rate": 3.115555555555555e-07,
+ "logits/chosen": 0.3402954041957855,
+ "logits/rejected": 0.46113282442092896,
+ "logps/chosen": -274.29998779296875,
+ "logps/rejected": -666.4000244140625,
+ "loss": 0.001,
+ "nll_loss": 0.967968761920929,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.11225585639476776,
+ "rewards/margins": 23.512500762939453,
+ "rewards/rejected": -23.625,
+ "step": 550
+ },
908
+ {
+ "epoch": 0.448,
+ "grad_norm": 0.09940501062248701,
+ "learning_rate": 3.071111111111111e-07,
+ "logits/chosen": 0.13032226264476776,
+ "logits/rejected": 0.25639647245407104,
+ "logps/chosen": -349.79998779296875,
+ "logps/rejected": -608.5999755859375,
+ "loss": 0.0056,
+ "nll_loss": 1.128515601158142,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.763476550579071,
+ "rewards/margins": 20.899999618530273,
+ "rewards/rejected": -21.637500762939453,
+ "step": 560
+ },
+ {
+ "epoch": 0.456,
+ "grad_norm": 0.015044416279848708,
+ "learning_rate": 3.026666666666666e-07,
+ "logits/chosen": 0.2685302793979645,
+ "logits/rejected": 0.44746094942092896,
+ "logps/chosen": -277.29998779296875,
+ "logps/rejected": -619.2000122070312,
+ "loss": 0.0032,
+ "nll_loss": 0.977343738079071,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.5107055902481079,
+ "rewards/margins": 22.412500381469727,
+ "rewards/rejected": -22.912500381469727,
+ "step": 570
+ },
+ {
+ "epoch": 0.464,
+ "grad_norm": 0.010492314365964279,
+ "learning_rate": 2.982222222222222e-07,
+ "logits/chosen": 0.24605712294578552,
+ "logits/rejected": 0.38178712129592896,
+ "logps/chosen": -282.45001220703125,
+ "logps/rejected": -596.4000244140625,
+ "loss": 0.001,
+ "nll_loss": 0.9585937261581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.3153625428676605,
+ "rewards/margins": 21.8125,
+ "rewards/rejected": -22.125,
+ "step": 580
+ },
+ {
+ "epoch": 0.472,
+ "grad_norm": 0.011244630233026583,
+ "learning_rate": 2.937777777777778e-07,
+ "logits/chosen": 0.185791015625,
+ "logits/rejected": 0.36054688692092896,
+ "logps/chosen": -298.29998779296875,
+ "logps/rejected": -561.4000244140625,
+ "loss": 0.001,
+ "nll_loss": 0.9632812738418579,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.29365235567092896,
+ "rewards/margins": 20.21875,
+ "rewards/rejected": -20.512500762939453,
+ "step": 590
+ },
+ {
+ "epoch": 0.48,
+ "grad_norm": 0.014891293094420017,
+ "learning_rate": 2.8933333333333333e-07,
+ "logits/chosen": 0.3611816465854645,
+ "logits/rejected": 0.46757811307907104,
+ "logps/chosen": -321.70001220703125,
+ "logps/rejected": -618.0,
+ "loss": 0.0012,
+ "nll_loss": 1.0792968273162842,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -1.0291016101837158,
+ "rewards/margins": 22.037500381469727,
+ "rewards/rejected": -23.049999237060547,
+ "step": 600
+ },
988
+ {
+ "epoch": 0.488,
+ "grad_norm": 0.016715039332789686,
+ "learning_rate": 2.848888888888889e-07,
+ "logits/chosen": 0.3804687559604645,
+ "logits/rejected": 0.546875,
+ "logps/chosen": -266.79998779296875,
+ "logps/rejected": -629.5999755859375,
+ "loss": 0.0009,
+ "nll_loss": 0.9234374761581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.6703125238418579,
+ "rewards/margins": 22.493749618530273,
+ "rewards/rejected": -23.149999618530273,
+ "step": 610
+ },
+ {
+ "epoch": 0.496,
+ "grad_norm": 0.007446183758849815,
+ "learning_rate": 2.8044444444444445e-07,
+ "logits/chosen": 0.38258057832717896,
+ "logits/rejected": 0.46435546875,
+ "logps/chosen": -264.04998779296875,
+ "logps/rejected": -692.4000244140625,
+ "loss": 0.001,
+ "nll_loss": 0.966015636920929,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.674023449420929,
+ "rewards/margins": 25.799999237060547,
+ "rewards/rejected": -26.475000381469727,
+ "step": 620
+ },
+ {
+ "epoch": 0.504,
+ "grad_norm": 0.06960779594217158,
+ "learning_rate": 2.7600000000000004e-07,
+ "logits/chosen": 0.22910156846046448,
+ "logits/rejected": 0.3974609375,
+ "logps/chosen": -257.70001220703125,
+ "logps/rejected": -636.4000244140625,
+ "loss": 0.0136,
+ "nll_loss": 0.969531238079071,
+ "rewards/accuracies": 0.987500011920929,
+ "rewards/chosen": -1.108984351158142,
+ "rewards/margins": 23.912500381469727,
+ "rewards/rejected": -25.012500762939453,
+ "step": 630
+ },
+ {
+ "epoch": 0.512,
+ "grad_norm": 0.018519025389240995,
+ "learning_rate": 2.715555555555555e-07,
+ "logits/chosen": 0.5155273675918579,
+ "logits/rejected": 0.7171875238418579,
+ "logps/chosen": -312.0,
+ "logps/rejected": -652.0,
+ "loss": 0.0041,
+ "nll_loss": 0.944531261920929,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -1.06640625,
+ "rewards/margins": 25.774999618530273,
+ "rewards/rejected": -26.850000381469727,
+ "step": 640
+ },
+ {
+ "epoch": 0.52,
+ "grad_norm": 0.02373063016084682,
+ "learning_rate": 2.671111111111111e-07,
+ "logits/chosen": 0.4817748963832855,
+ "logits/rejected": 0.6402343511581421,
+ "logps/chosen": -285.20001220703125,
+ "logps/rejected": -635.0,
+ "loss": 0.0023,
+ "nll_loss": 1.037500023841858,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -2.0257811546325684,
+ "rewards/margins": 25.0,
+ "rewards/rejected": -27.0,
+ "step": 650
+ },
1068
+ {
+ "epoch": 0.528,
+ "grad_norm": 0.04312745328739261,
+ "learning_rate": 2.6266666666666664e-07,
+ "logits/chosen": 0.45927733182907104,
+ "logits/rejected": 0.6669921875,
+ "logps/chosen": -315.5,
+ "logps/rejected": -683.5999755859375,
+ "loss": 0.0011,
+ "nll_loss": 1.0671875476837158,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -1.6691405773162842,
+ "rewards/margins": 27.862499237060547,
+ "rewards/rejected": -29.524999618530273,
+ "step": 660
+ },
+ {
+ "epoch": 0.536,
+ "grad_norm": 8.406219083200051,
+ "learning_rate": 2.582222222222222e-07,
+ "logits/chosen": 0.51904296875,
+ "logits/rejected": 0.666015625,
+ "logps/chosen": -280.0,
+ "logps/rejected": -706.7999877929688,
+ "loss": 0.0013,
+ "nll_loss": 1.0515625476837158,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -1.578515648841858,
+ "rewards/margins": 28.737499237060547,
+ "rewards/rejected": -30.299999237060547,
+ "step": 670
+ },
+ {
+ "epoch": 0.544,
+ "grad_norm": 0.00797165057348372,
+ "learning_rate": 2.5377777777777776e-07,
+ "logits/chosen": 0.425048828125,
+ "logits/rejected": 0.6025390625,
+ "logps/chosen": -304.20001220703125,
+ "logps/rejected": -616.2000122070312,
+ "loss": 0.0073,
+ "nll_loss": 0.931640625,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -1.0339844226837158,
+ "rewards/margins": 23.674999237060547,
+ "rewards/rejected": -24.6875,
+ "step": 680
+ },
+ {
+ "epoch": 0.552,
+ "grad_norm": 0.02117016630770859,
+ "learning_rate": 2.493333333333333e-07,
+ "logits/chosen": 0.43408203125,
+ "logits/rejected": 0.5884765386581421,
+ "logps/chosen": -271.20001220703125,
+ "logps/rejected": -651.2000122070312,
+ "loss": 0.001,
+ "nll_loss": 1.0207030773162842,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.912792980670929,
+ "rewards/margins": 25.412500381469727,
+ "rewards/rejected": -26.325000762939453,
+ "step": 690
+ },
+ {
+ "epoch": 0.56,
+ "grad_norm": 0.05013906609321948,
+ "learning_rate": 2.448888888888889e-07,
+ "logits/chosen": 0.5001465082168579,
+ "logits/rejected": 0.587890625,
+ "logps/chosen": -291.0,
+ "logps/rejected": -642.5999755859375,
+ "loss": 0.0011,
+ "nll_loss": 1.056249976158142,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.5542968511581421,
+ "rewards/margins": 24.600000381469727,
+ "rewards/rejected": -25.149999618530273,
+ "step": 700
+ },
1148
+ {
+ "epoch": 0.568,
+ "grad_norm": 0.010508494344882753,
+ "learning_rate": 2.404444444444444e-07,
+ "logits/chosen": 0.4154296815395355,
+ "logits/rejected": 0.53466796875,
+ "logps/chosen": -281.8999938964844,
+ "logps/rejected": -616.5999755859375,
+ "loss": 0.001,
+ "nll_loss": 1.004296898841858,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.6047607660293579,
+ "rewards/margins": 23.875,
+ "rewards/rejected": -24.487499237060547,
+ "step": 710
+ },
+ {
+ "epoch": 0.576,
+ "grad_norm": 0.02085599761375334,
+ "learning_rate": 2.3599999999999997e-07,
+ "logits/chosen": 0.40234375,
+ "logits/rejected": 0.593945324420929,
+ "logps/chosen": -295.20001220703125,
+ "logps/rejected": -631.2000122070312,
+ "loss": 0.0011,
+ "nll_loss": 1.082421898841858,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.5250488519668579,
+ "rewards/margins": 23.5625,
+ "rewards/rejected": -24.112499237060547,
+ "step": 720
+ },
+ {
+ "epoch": 0.584,
+ "grad_norm": 0.022872642095837056,
+ "learning_rate": 2.3155555555555553e-07,
+ "logits/chosen": 0.3960937559604645,
+ "logits/rejected": 0.5274413824081421,
+ "logps/chosen": -269.79998779296875,
+ "logps/rejected": -597.2000122070312,
+ "loss": 0.0009,
+ "nll_loss": 0.9195312261581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.25908201932907104,
+ "rewards/margins": 22.393749237060547,
+ "rewards/rejected": -22.643749237060547,
+ "step": 730
+ },
+ {
+ "epoch": 0.592,
+ "grad_norm": 0.012548738999103248,
+ "learning_rate": 2.2711111111111112e-07,
+ "logits/chosen": 0.3612304627895355,
+ "logits/rejected": 0.47871094942092896,
+ "logps/chosen": -264.20001220703125,
+ "logps/rejected": -630.0,
+ "loss": 0.0009,
+ "nll_loss": 0.899609386920929,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.07817383110523224,
+ "rewards/margins": 23.512500762939453,
+ "rewards/rejected": -23.649999618530273,
+ "step": 740
+ },
+ {
+ "epoch": 0.6,
+ "grad_norm": 0.009873598939126866,
+ "learning_rate": 2.2266666666666668e-07,
+ "logits/chosen": 0.3773437440395355,
+ "logits/rejected": 0.5074218511581421,
+ "logps/chosen": -303.3999938964844,
+ "logps/rejected": -563.2000122070312,
+ "loss": 0.0009,
+ "nll_loss": 0.873828113079071,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.1182861328125,
+ "rewards/margins": 20.112499237060547,
+ "rewards/rejected": -19.987499237060547,
+ "step": 750
+ },
1228
+ {
+ "epoch": 0.608,
+ "grad_norm": 2.1722284009630792,
+ "learning_rate": 2.1822222222222224e-07,
+ "logits/chosen": 0.45292967557907104,
+ "logits/rejected": 0.45878905057907104,
+ "logps/chosen": -267.79998779296875,
+ "logps/rejected": -575.7999877929688,
+ "loss": 0.0013,
+ "nll_loss": 0.91796875,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.02753906324505806,
+ "rewards/margins": 20.549999237060547,
+ "rewards/rejected": -20.587499618530273,
+ "step": 760
+ },
+ {
+ "epoch": 0.616,
+ "grad_norm": 0.015566357128601925,
+ "learning_rate": 2.1377777777777777e-07,
+ "logits/chosen": 0.3982177674770355,
+ "logits/rejected": 0.540234386920929,
+ "logps/chosen": -265.5,
+ "logps/rejected": -687.5999755859375,
+ "loss": 0.0096,
+ "nll_loss": 0.9742187261581421,
+ "rewards/accuracies": 0.987500011920929,
+ "rewards/chosen": -0.7779296636581421,
+ "rewards/margins": 28.487499237060547,
+ "rewards/rejected": -29.274999618530273,
+ "step": 770
+ },
+ {
+ "epoch": 0.624,
+ "grad_norm": 0.0852360643714337,
+ "learning_rate": 2.0933333333333333e-07,
+ "logits/chosen": 0.3617187440395355,
+ "logits/rejected": 0.48701173067092896,
+ "logps/chosen": -265.8500061035156,
+ "logps/rejected": -620.0,
+ "loss": 0.0009,
+ "nll_loss": 0.9300781488418579,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.07557983696460724,
+ "rewards/margins": 23.362499237060547,
+ "rewards/rejected": -23.274999618530273,
+ "step": 780
+ },
+ {
+ "epoch": 0.632,
+ "grad_norm": 0.00755420050240308,
+ "learning_rate": 2.048888888888889e-07,
+ "logits/chosen": 0.28996580839157104,
+ "logits/rejected": 0.535351574420929,
+ "logps/chosen": -256.5,
+ "logps/rejected": -633.2000122070312,
+ "loss": 0.007,
+ "nll_loss": 0.967578113079071,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.40595704317092896,
+ "rewards/margins": 24.274999618530273,
+ "rewards/rejected": -24.700000762939453,
+ "step": 790
+ },
+ {
+ "epoch": 0.64,
+ "grad_norm": 0.01713698594594714,
+ "learning_rate": 2.0044444444444445e-07,
+ "logits/chosen": 0.24697265028953552,
+ "logits/rejected": 0.4248046875,
+ "logps/chosen": -283.8999938964844,
+ "logps/rejected": -630.0,
+ "loss": 0.001,
+ "nll_loss": 0.9644531011581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.10643310844898224,
+ "rewards/margins": 23.575000762939453,
+ "rewards/rejected": -23.487499237060547,
+ "step": 800
+ },
1308
+ {
+ "epoch": 0.648,
+ "grad_norm": 0.008040661363419143,
+ "learning_rate": 1.96e-07,
+ "logits/chosen": 0.31492918729782104,
+ "logits/rejected": 0.41838377714157104,
+ "logps/chosen": -306.79998779296875,
+ "logps/rejected": -615.2000122070312,
+ "loss": 0.0014,
+ "nll_loss": 0.9761718511581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.2574706971645355,
+ "rewards/margins": 22.5,
+ "rewards/rejected": -22.762500762939453,
+ "step": 810
+ },
+ {
+ "epoch": 0.656,
+ "grad_norm": 0.011050522622669116,
+ "learning_rate": 1.9155555555555554e-07,
+ "logits/chosen": 0.36284178495407104,
+ "logits/rejected": 0.5755859613418579,
+ "logps/chosen": -284.6000061035156,
+ "logps/rejected": -627.2000122070312,
+ "loss": 0.001,
+ "nll_loss": 1.021875023841858,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.15849609673023224,
+ "rewards/margins": 24.737499237060547,
+ "rewards/rejected": -24.912500381469727,
+ "step": 820
+ },
+ {
+ "epoch": 0.664,
+ "grad_norm": 1.3243299715147776,
+ "learning_rate": 1.871111111111111e-07,
+ "logits/chosen": 0.40766602754592896,
+ "logits/rejected": 0.55078125,
+ "logps/chosen": -256.0,
+ "logps/rejected": -638.7999877929688,
+ "loss": 0.001,
+ "nll_loss": 0.9371093511581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.19802245497703552,
+ "rewards/margins": 25.137500762939453,
+ "rewards/rejected": -24.962499618530273,
+ "step": 830
+ },
+ {
+ "epoch": 0.672,
+ "grad_norm": 0.14410920057877055,
+ "learning_rate": 1.8266666666666666e-07,
+ "logits/chosen": 0.37744140625,
+ "logits/rejected": 0.558398425579071,
+ "logps/chosen": -298.3999938964844,
+ "logps/rejected": -634.0,
+ "loss": 0.0016,
+ "nll_loss": 0.9925781488418579,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.5572754144668579,
+ "rewards/margins": 24.287500381469727,
+ "rewards/rejected": -24.862499237060547,
+ "step": 840
+ },
+ {
+ "epoch": 0.68,
+ "grad_norm": 0.008316643842253967,
+ "learning_rate": 1.7822222222222222e-07,
+ "logits/chosen": 0.3272949159145355,
+ "logits/rejected": 0.524121105670929,
+ "logps/chosen": -298.6000061035156,
+ "logps/rejected": -672.7999877929688,
+ "loss": 0.0127,
+ "nll_loss": 0.9996093511581421,
+ "rewards/accuracies": 0.987500011920929,
+ "rewards/chosen": -0.3311523497104645,
+ "rewards/margins": 25.774999618530273,
+ "rewards/rejected": -26.100000381469727,
+ "step": 850
+ },
1388
+ {
+ "epoch": 0.688,
+ "grad_norm": 0.008261030762950254,
+ "learning_rate": 1.7377777777777778e-07,
+ "logits/chosen": 0.4465576112270355,
+ "logits/rejected": 0.6617187261581421,
+ "logps/chosen": -281.1000061035156,
+ "logps/rejected": -612.0,
+ "loss": 0.0011,
+ "nll_loss": 0.98046875,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.07645263522863388,
+ "rewards/margins": 22.987499237060547,
+ "rewards/rejected": -23.0625,
+ "step": 860
+ },
+ {
+ "epoch": 0.696,
+ "grad_norm": 0.012271312104445061,
+ "learning_rate": 1.6933333333333334e-07,
+ "logits/chosen": 0.4715820252895355,
+ "logits/rejected": 0.6121581792831421,
+ "logps/chosen": -285.6000061035156,
+ "logps/rejected": -622.4000244140625,
+ "loss": 0.0011,
+ "nll_loss": 0.889453113079071,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.10859374701976776,
+ "rewards/margins": 24.6875,
+ "rewards/rejected": -24.587499618530273,
+ "step": 870
+ },
+ {
+ "epoch": 0.704,
+ "grad_norm": 0.008446600990786186,
+ "learning_rate": 1.6488888888888887e-07,
+ "logits/chosen": 0.4478515684604645,
+ "logits/rejected": 0.648632824420929,
+ "logps/chosen": -293.20001220703125,
+ "logps/rejected": -619.2000122070312,
+ "loss": 0.0008,
+ "nll_loss": 0.8179687261581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.14560547471046448,
+ "rewards/margins": 24.024999618530273,
+ "rewards/rejected": -23.875,
+ "step": 880
+ },
+ {
+ "epoch": 0.712,
+ "grad_norm": 0.007390463100531587,
+ "learning_rate": 1.6044444444444443e-07,
+ "logits/chosen": 0.47856444120407104,
+ "logits/rejected": 0.5884765386581421,
+ "logps/chosen": -263.3999938964844,
+ "logps/rejected": -658.0,
+ "loss": 0.0009,
+ "nll_loss": 0.9476562738418579,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.11513672024011612,
+ "rewards/margins": 26.512500762939453,
+ "rewards/rejected": -26.375,
+ "step": 890
+ },
+ {
+ "epoch": 0.72,
+ "grad_norm": 0.00835958049715363,
+ "learning_rate": 1.56e-07,
+ "logits/chosen": 0.24870605766773224,
+ "logits/rejected": 0.455322265625,
+ "logps/chosen": -257.79998779296875,
+ "logps/rejected": -668.0,
+ "loss": 0.0009,
+ "nll_loss": 0.9390624761581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": -0.04535522311925888,
+ "rewards/margins": 26.987499237060547,
+ "rewards/rejected": -27.037500381469727,
+ "step": 900
+ },
1468
+ {
+ "epoch": 0.728,
+ "grad_norm": 0.009631942998860495,
+ "learning_rate": 1.5155555555555555e-07,
+ "logits/chosen": 0.4524902403354645,
+ "logits/rejected": 0.631640613079071,
+ "logps/chosen": -226.6999969482422,
+ "logps/rejected": -674.4000244140625,
+ "loss": 0.0009,
+ "nll_loss": 0.9175781011581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.06406249850988388,
+ "rewards/margins": 27.8125,
+ "rewards/rejected": -27.75,
+ "step": 910
+ },
+ {
+ "epoch": 0.736,
+ "grad_norm": 0.0089109719938143,
+ "learning_rate": 1.4711111111111111e-07,
+ "logits/chosen": 0.3174072206020355,
+ "logits/rejected": 0.40791016817092896,
+ "logps/chosen": -313.5,
+ "logps/rejected": -609.5999755859375,
+ "loss": 0.001,
+ "nll_loss": 0.943359375,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.3261352479457855,
+ "rewards/margins": 23.612499237060547,
+ "rewards/rejected": -23.287500381469727,
+ "step": 920
+ },
+ {
+ "epoch": 0.744,
+ "grad_norm": 0.017708759105074332,
+ "learning_rate": 1.4266666666666665e-07,
+ "logits/chosen": 0.30195313692092896,
+ "logits/rejected": 0.4756835997104645,
+ "logps/chosen": -248.89999389648438,
+ "logps/rejected": -651.5999755859375,
+ "loss": 0.0009,
+ "nll_loss": 0.919921875,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.13002929091453552,
+ "rewards/margins": 26.625,
+ "rewards/rejected": -26.487499237060547,
+ "step": 930
+ },
+ {
+ "epoch": 0.752,
+ "grad_norm": 0.02847716886316666,
+ "learning_rate": 1.382222222222222e-07,
+ "logits/chosen": 0.3302246034145355,
+ "logits/rejected": 0.5365234613418579,
+ "logps/chosen": -278.1000061035156,
+ "logps/rejected": -616.4000244140625,
+ "loss": 0.021,
+ "nll_loss": 1.017187476158142,
+ "rewards/accuracies": 0.987500011920929,
+ "rewards/chosen": 0.10097656399011612,
+ "rewards/margins": 23.318750381469727,
+ "rewards/rejected": -23.237499237060547,
+ "step": 940
+ },
+ {
+ "epoch": 0.76,
+ "grad_norm": 0.0045187999902578015,
+ "learning_rate": 1.3377777777777777e-07,
+ "logits/chosen": 0.31437987089157104,
+ "logits/rejected": 0.5342773199081421,
+ "logps/chosen": -319.70001220703125,
+ "logps/rejected": -623.0,
+ "loss": 0.001,
+ "nll_loss": 0.9664062261581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.3951171934604645,
+ "rewards/margins": 24.412500381469727,
+ "rewards/rejected": -24.0,
+ "step": 950
+ },
1548
+ {
+ "epoch": 0.768,
+ "grad_norm": 0.07958922138660834,
+ "learning_rate": 1.2933333333333333e-07,
+ "logits/chosen": 0.3418945372104645,
+ "logits/rejected": 0.6171875,
+ "logps/chosen": -278.20001220703125,
+ "logps/rejected": -639.2000122070312,
+ "loss": 0.0009,
+ "nll_loss": 0.899609386920929,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.687060534954071,
+ "rewards/margins": 24.649999618530273,
+ "rewards/rejected": -24.0,
+ "step": 960
+ },
+ {
+ "epoch": 0.776,
+ "grad_norm": 0.008878025257232514,
+ "learning_rate": 1.2488888888888889e-07,
+ "logits/chosen": 0.36860352754592896,
+ "logits/rejected": 0.5000976324081421,
+ "logps/chosen": -252.1999969482422,
+ "logps/rejected": -642.4000244140625,
+ "loss": 0.0008,
+ "nll_loss": 0.837890625,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.6498047113418579,
+ "rewards/margins": 24.712499618530273,
+ "rewards/rejected": -24.075000762939453,
+ "step": 970
+ },
+ {
+ "epoch": 0.784,
+ "grad_norm": 0.00886428654417825,
+ "learning_rate": 1.2044444444444445e-07,
+ "logits/chosen": 0.3182617127895355,
+ "logits/rejected": 0.526562511920929,
+ "logps/chosen": -296.20001220703125,
+ "logps/rejected": -644.4000244140625,
+ "loss": 0.0009,
+ "nll_loss": 0.914843738079071,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.883837878704071,
+ "rewards/margins": 24.725000381469727,
+ "rewards/rejected": -23.837499618530273,
+ "step": 980
+ },
+ {
+ "epoch": 0.792,
+ "grad_norm": 0.010048774551857776,
+ "learning_rate": 1.16e-07,
+ "logits/chosen": 0.20156249403953552,
+ "logits/rejected": 0.45976561307907104,
+ "logps/chosen": -333.5,
+ "logps/rejected": -593.0,
+ "loss": 0.0011,
+ "nll_loss": 1.0207030773162842,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.47138673067092896,
+ "rewards/margins": 22.174999237060547,
+ "rewards/rejected": -21.6875,
+ "step": 990
+ },
+ {
+ "epoch": 0.8,
+ "grad_norm": 5.74291381581626,
+ "learning_rate": 1.1155555555555555e-07,
+ "logits/chosen": 0.3219238221645355,
+ "logits/rejected": 0.49858397245407104,
+ "logps/chosen": -278.75,
+ "logps/rejected": -644.2000122070312,
+ "loss": 0.0019,
+ "nll_loss": 1.019140601158142,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.7679687738418579,
+ "rewards/margins": 25.0,
+ "rewards/rejected": -24.200000762939453,
+ "step": 1000
+ },
1628
+ {
+ "epoch": 0.808,
+ "grad_norm": 0.02192293336950942,
+ "learning_rate": 1.0711111111111111e-07,
+ "logits/chosen": 0.569140613079071,
+ "logits/rejected": 0.702343761920929,
+ "logps/chosen": -254.3000030517578,
+ "logps/rejected": -685.5999755859375,
+ "loss": 0.0009,
+ "nll_loss": 0.868359386920929,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.591113269329071,
+ "rewards/margins": 27.024999618530273,
+ "rewards/rejected": -26.424999237060547,
+ "step": 1010
+ },
+ {
+ "epoch": 0.816,
+ "grad_norm": 0.009300952984474926,
+ "learning_rate": 1.0266666666666666e-07,
+ "logits/chosen": 0.533111572265625,
+ "logits/rejected": 0.6361328363418579,
+ "logps/chosen": -233.25,
+ "logps/rejected": -630.4000244140625,
+ "loss": 0.0009,
+ "nll_loss": 0.8453124761581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.5562499761581421,
+ "rewards/margins": 24.325000762939453,
+ "rewards/rejected": -23.75,
+ "step": 1020
+ },
+ {
+ "epoch": 0.824,
+ "grad_norm": 0.046196100888199323,
+ "learning_rate": 9.822222222222222e-08,
+ "logits/chosen": 0.42668455839157104,
+ "logits/rejected": 0.6005859375,
+ "logps/chosen": -274.5,
+ "logps/rejected": -626.4000244140625,
+ "loss": 0.001,
+ "nll_loss": 0.940625011920929,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.694628894329071,
+ "rewards/margins": 24.924999237060547,
+ "rewards/rejected": -24.25,
+ "step": 1030
+ },
+ {
+ "epoch": 0.832,
+ "grad_norm": 0.005292906779736119,
+ "learning_rate": 9.377777777777778e-08,
+ "logits/chosen": 0.4056640565395355,
+ "logits/rejected": 0.615234375,
+ "logps/chosen": -284.70001220703125,
+ "logps/rejected": -645.5999755859375,
+ "loss": 0.0041,
+ "nll_loss": 0.9710937738418579,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.47343748807907104,
+ "rewards/margins": 25.4375,
+ "rewards/rejected": -24.962499618530273,
+ "step": 1040
+ },
+ {
+ "epoch": 0.84,
+ "grad_norm": 0.07957225270899694,
+ "learning_rate": 8.933333333333333e-08,
+ "logits/chosen": 0.4527343809604645,
+ "logits/rejected": 0.659960925579071,
+ "logps/chosen": -297.8999938964844,
+ "logps/rejected": -642.2000122070312,
+ "loss": 0.0011,
+ "nll_loss": 1.062890648841858,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.14885254204273224,
+ "rewards/margins": 25.674999237060547,
+ "rewards/rejected": -25.549999237060547,
+ "step": 1050
+ },
1708
+ {
+ "epoch": 0.848,
+ "grad_norm": 0.02087149426239816,
+ "learning_rate": 8.488888888888889e-08,
+ "logits/chosen": 0.45966798067092896,
+ "logits/rejected": 0.640820324420929,
+ "logps/chosen": -267.29998779296875,
+ "logps/rejected": -657.5999755859375,
+ "loss": 0.0048,
+ "nll_loss": 0.8785156011581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.26695555448532104,
+ "rewards/margins": 27.325000762939453,
+ "rewards/rejected": -27.0625,
+ "step": 1060
+ },
+ {
+ "epoch": 0.856,
+ "grad_norm": 0.013941417548488667,
+ "learning_rate": 8.044444444444445e-08,
+ "logits/chosen": 0.39580076932907104,
+ "logits/rejected": 0.616015613079071,
+ "logps/chosen": -279.3999938964844,
+ "logps/rejected": -689.5999755859375,
+ "loss": 0.001,
+ "nll_loss": 0.974609375,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.3509277403354645,
+ "rewards/margins": 29.162500381469727,
+ "rewards/rejected": -28.825000762939453,
+ "step": 1070
+ },
+ {
+ "epoch": 0.864,
+ "grad_norm": 0.05356592295983735,
+ "learning_rate": 7.599999999999999e-08,
+ "logits/chosen": 0.35834962129592896,
+ "logits/rejected": 0.5601562261581421,
+ "logps/chosen": -264.6000061035156,
+ "logps/rejected": -654.4000244140625,
+ "loss": 0.001,
+ "nll_loss": 0.9996093511581421,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.139892578125,
+ "rewards/margins": 26.475000381469727,
+ "rewards/rejected": -26.337499618530273,
+ "step": 1080
+ },
+ {
+ "epoch": 0.872,
+ "grad_norm": 0.04051423211552107,
+ "learning_rate": 7.155555555555555e-08,
+ "logits/chosen": 0.4306640625,
+ "logits/rejected": 0.589648425579071,
+ "logps/chosen": -277.3999938964844,
+ "logps/rejected": -653.5999755859375,
+ "loss": 0.0045,
+ "nll_loss": 0.9339843988418579,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.3447265625,
+ "rewards/margins": 26.325000762939453,
+ "rewards/rejected": -25.975000381469727,
+ "step": 1090
+ },
+ {
+ "epoch": 0.88,
+ "grad_norm": 0.02104507845831199,
+ "learning_rate": 6.71111111111111e-08,
+ "logits/chosen": 0.269287109375,
+ "logits/rejected": 0.5531250238418579,
+ "logps/chosen": -333.29998779296875,
+ "logps/rejected": -603.0,
+ "loss": 0.001,
+ "nll_loss": 0.9921875,
+ "rewards/accuracies": 1.0,
+ "rewards/chosen": 0.49541014432907104,
+ "rewards/margins": 23.712499618530273,
+ "rewards/rejected": -23.212499618530273,
+ "step": 1100
+ },
1788
+ {
1789
+ "epoch": 0.888,
1790
+ "grad_norm": 0.026162991645433887,
1791
+ "learning_rate": 6.266666666666666e-08,
1792
+ "logits/chosen": 0.5933593511581421,
1793
+ "logits/rejected": 0.702343761920929,
1794
+ "logps/chosen": -242.4499969482422,
1795
+ "logps/rejected": -657.5999755859375,
1796
+ "loss": 0.0009,
1797
+ "nll_loss": 0.883984386920929,
1798
+ "rewards/accuracies": 1.0,
1799
+ "rewards/chosen": 0.2967773377895355,
1800
+ "rewards/margins": 26.862499237060547,
1801
+ "rewards/rejected": -26.575000762939453,
1802
+ "step": 1110
1803
+ },
1804
+ {
1805
+ "epoch": 0.896,
1806
+ "grad_norm": 0.03066308474144947,
1807
+ "learning_rate": 5.822222222222222e-08,
1808
+ "logits/chosen": 0.4716796875,
1809
+ "logits/rejected": 0.6839843988418579,
1810
+ "logps/chosen": -220.10000610351562,
1811
+ "logps/rejected": -684.7999877929688,
1812
+ "loss": 0.0009,
1813
+ "nll_loss": 0.9468749761581421,
1814
+ "rewards/accuracies": 1.0,
1815
+ "rewards/chosen": 0.584765613079071,
1816
+ "rewards/margins": 28.125,
1817
+ "rewards/rejected": -27.549999237060547,
1818
+ "step": 1120
1819
+ },
1820
+ {
1821
+ "epoch": 0.904,
1822
+ "grad_norm": 0.023039708522050593,
1823
+ "learning_rate": 5.377777777777778e-08,
1824
+ "logits/chosen": 0.3741699159145355,
1825
+ "logits/rejected": 0.5889648199081421,
1826
+ "logps/chosen": -277.29998779296875,
1827
+ "logps/rejected": -665.7999877929688,
1828
+ "loss": 0.0014,
1829
+ "nll_loss": 0.9859374761581421,
1830
+ "rewards/accuracies": 1.0,
1831
+ "rewards/chosen": 0.4515624940395355,
1832
+ "rewards/margins": 26.174999237060547,
1833
+ "rewards/rejected": -25.725000381469727,
1834
+ "step": 1130
1835
+ },
1836
+ {
1837
+ "epoch": 0.912,
1838
+ "grad_norm": 0.04174862262602521,
1839
+ "learning_rate": 4.933333333333333e-08,
1840
+ "logits/chosen": 0.3539062440395355,
1841
+ "logits/rejected": 0.5293945074081421,
1842
+ "logps/chosen": -331.8999938964844,
1843
+ "logps/rejected": -592.0,
1844
+ "loss": 0.0064,
1845
+ "nll_loss": 0.887890636920929,
1846
+ "rewards/accuracies": 1.0,
1847
+ "rewards/chosen": 0.621826171875,
1848
+ "rewards/margins": 22.587499618530273,
1849
+ "rewards/rejected": -21.975000381469727,
1850
+ "step": 1140
1851
+ },
1852
+ {
1853
+ "epoch": 0.92,
1854
+ "grad_norm": 0.36396437493512307,
1855
+ "learning_rate": 4.4888888888888885e-08,
1856
+ "logits/chosen": 0.39692384004592896,
1857
+ "logits/rejected": 0.5400390625,
1858
+ "logps/chosen": -262.8999938964844,
1859
+ "logps/rejected": -643.7999877929688,
1860
+ "loss": 0.001,
1861
+ "nll_loss": 0.9195312261581421,
1862
+ "rewards/accuracies": 1.0,
1863
+ "rewards/chosen": 0.6341797113418579,
1864
+ "rewards/margins": 25.431249618530273,
1865
+ "rewards/rejected": -24.799999237060547,
1866
+ "step": 1150
1867
+ },
1868
+ {
1869
+ "epoch": 0.928,
1870
+ "grad_norm": 0.01230667079616308,
1871
+ "learning_rate": 4.044444444444444e-08,
1872
+ "logits/chosen": 0.29730224609375,
1873
+ "logits/rejected": 0.5694335699081421,
1874
+ "logps/chosen": -283.8999938964844,
1875
+ "logps/rejected": -612.5999755859375,
1876
+ "loss": 0.0009,
1877
+ "nll_loss": 0.8515625,
1878
+ "rewards/accuracies": 1.0,
1879
+ "rewards/chosen": 0.666796863079071,
1880
+ "rewards/margins": 24.575000762939453,
1881
+ "rewards/rejected": -23.899999618530273,
1882
+ "step": 1160
1883
+ },
1884
+ {
1885
+ "epoch": 0.936,
1886
+ "grad_norm": 0.014453975200438642,
1887
+ "learning_rate": 3.6e-08,
1888
+ "logits/chosen": 0.3432373106479645,
1889
+ "logits/rejected": 0.5855468511581421,
1890
+ "logps/chosen": -291.3999938964844,
1891
+ "logps/rejected": -665.2000122070312,
1892
+ "loss": 0.001,
1893
+ "nll_loss": 1.019921898841858,
1894
+ "rewards/accuracies": 1.0,
1895
+ "rewards/chosen": 0.6011718511581421,
1896
+ "rewards/margins": 26.850000381469727,
1897
+ "rewards/rejected": -26.262500762939453,
1898
+ "step": 1170
1899
+ },
1900
+ {
1901
+ "epoch": 0.944,
1902
+ "grad_norm": 0.01768958135813815,
1903
+ "learning_rate": 3.155555555555556e-08,
1904
+ "logits/chosen": 0.31098634004592896,
1905
+ "logits/rejected": 0.5472656488418579,
1906
+ "logps/chosen": -295.70001220703125,
1907
+ "logps/rejected": -587.2000122070312,
1908
+ "loss": 0.0055,
1909
+ "nll_loss": 0.8902343511581421,
1910
+ "rewards/accuracies": 1.0,
1911
+ "rewards/chosen": 0.17841796576976776,
1912
+ "rewards/margins": 22.399999618530273,
1913
+ "rewards/rejected": -22.225000381469727,
1914
+ "step": 1180
1915
+ },
1916
+ {
1917
+ "epoch": 0.952,
1918
+ "grad_norm": 0.009303330717789412,
1919
+ "learning_rate": 2.7111111111111108e-08,
1920
+ "logits/chosen": 0.263427734375,
1921
+ "logits/rejected": 0.49003905057907104,
1922
+ "logps/chosen": -262.6000061035156,
1923
+ "logps/rejected": -649.2000122070312,
1924
+ "loss": 0.0009,
1925
+ "nll_loss": 0.8871093988418579,
1926
+ "rewards/accuracies": 1.0,
1927
+ "rewards/chosen": 0.3846679627895355,
1928
+ "rewards/margins": 25.137500762939453,
1929
+ "rewards/rejected": -24.762500762939453,
1930
+ "step": 1190
1931
+ },
1932
+ {
1933
+ "epoch": 0.96,
1934
+ "grad_norm": 0.033005829470572054,
1935
+ "learning_rate": 2.2666666666666668e-08,
1936
+ "logits/chosen": 0.3676391541957855,
1937
+ "logits/rejected": 0.5830078125,
1938
+ "logps/chosen": -295.70001220703125,
1939
+ "logps/rejected": -625.2000122070312,
1940
+ "loss": 0.001,
1941
+ "nll_loss": 0.9925781488418579,
1942
+ "rewards/accuracies": 1.0,
1943
+ "rewards/chosen": 0.7079833745956421,
1944
+ "rewards/margins": 25.0625,
1945
+ "rewards/rejected": -24.3125,
1946
+ "step": 1200
1947
+ },
1948
+ {
1949
+ "epoch": 0.968,
1950
+ "grad_norm": 0.040751069146410926,
1951
+ "learning_rate": 1.822222222222222e-08,
1952
+ "logits/chosen": 0.3670410215854645,
1953
+ "logits/rejected": 0.5015624761581421,
1954
+ "logps/chosen": -256.29998779296875,
1955
+ "logps/rejected": -645.4000244140625,
1956
+ "loss": 0.0049,
1957
+ "nll_loss": 0.907031238079071,
1958
+ "rewards/accuracies": 1.0,
1959
+ "rewards/chosen": 0.8086913824081421,
1960
+ "rewards/margins": 26.362499237060547,
1961
+ "rewards/rejected": -25.549999237060547,
1962
+ "step": 1210
1963
+ },
1964
+ {
1965
+ "epoch": 0.976,
1966
+ "grad_norm": 0.28250040304556245,
1967
+ "learning_rate": 1.3777777777777778e-08,
1968
+ "logits/chosen": 0.431640625,
1969
+ "logits/rejected": 0.626171886920929,
1970
+ "logps/chosen": -267.3500061035156,
1971
+ "logps/rejected": -647.5999755859375,
1972
+ "loss": 0.0021,
1973
+ "nll_loss": 0.9437500238418579,
1974
+ "rewards/accuracies": 1.0,
1975
+ "rewards/chosen": 0.539379894733429,
1976
+ "rewards/margins": 26.012500762939453,
1977
+ "rewards/rejected": -25.462499618530273,
1978
+ "step": 1220
1979
+ },
1980
+ {
1981
+ "epoch": 0.984,
1982
+ "grad_norm": 0.039993394141365025,
1983
+ "learning_rate": 9.333333333333334e-09,
1984
+ "logits/chosen": 0.45268553495407104,
1985
+ "logits/rejected": 0.6796875,
1986
+ "logps/chosen": -277.79998779296875,
1987
+ "logps/rejected": -639.5999755859375,
1988
+ "loss": 0.0008,
1989
+ "nll_loss": 0.813281238079071,
1990
+ "rewards/accuracies": 1.0,
1991
+ "rewards/chosen": 0.44189453125,
1992
+ "rewards/margins": 26.350000381469727,
1993
+ "rewards/rejected": -25.924999237060547,
1994
+ "step": 1230
1995
+ },
1996
+ {
1997
+ "epoch": 0.992,
1998
+ "grad_norm": 0.010994332832260341,
1999
+ "learning_rate": 4.888888888888888e-09,
2000
+ "logits/chosen": 0.42723387479782104,
2001
+ "logits/rejected": 0.5927734375,
2002
+ "logps/chosen": -252.60000610351562,
2003
+ "logps/rejected": -644.7999877929688,
2004
+ "loss": 0.0012,
2005
+ "nll_loss": 0.8550781011581421,
2006
+ "rewards/accuracies": 1.0,
2007
+ "rewards/chosen": 0.605175793170929,
2008
+ "rewards/margins": 25.524999618530273,
2009
+ "rewards/rejected": -24.924999237060547,
2010
+ "step": 1240
2011
+ },
2012
+ {
2013
+ "epoch": 1.0,
2014
+ "grad_norm": 0.02065652761694791,
2015
+ "learning_rate": 4.4444444444444443e-10,
2016
+ "logits/chosen": 0.35319823026657104,
2017
+ "logits/rejected": 0.5824218988418579,
2018
+ "logps/chosen": -258.6000061035156,
2019
+ "logps/rejected": -651.2000122070312,
2020
+ "loss": 0.0134,
2021
+ "nll_loss": 0.932421863079071,
2022
+ "rewards/accuracies": 0.987500011920929,
2023
+ "rewards/chosen": 0.7752929925918579,
2024
+ "rewards/margins": 26.612499237060547,
2025
+ "rewards/rejected": -25.837499618530273,
2026
+ "step": 1250
2027
+ },
2028
+ {
2029
+ "epoch": 1.0,
2030
+ "eval_logits/chosen": 0.22201773524284363,
2031
+ "eval_logits/rejected": 0.42946213483810425,
2032
+ "eval_logps/chosen": -328.9230651855469,
2033
+ "eval_logps/rejected": -597.076904296875,
2034
+ "eval_loss": 0.012361373752355576,
2035
+ "eval_nll_loss": 0.9699519276618958,
2036
+ "eval_rewards/accuracies": 0.9903846383094788,
2037
+ "eval_rewards/chosen": 0.4366079568862915,
2038
+ "eval_rewards/margins": 22.413461685180664,
2039
+ "eval_rewards/rejected": -21.975961685180664,
2040
+ "eval_runtime": 8.634,
2041
+ "eval_samples_per_second": 11.582,
2042
+ "eval_steps_per_second": 1.506,
2043
+ "step": 1250
2044
+ },
2045
+ {
2046
+ "epoch": 1.0,
2047
+ "step": 1250,
2048
+ "total_flos": 0.0,
2049
+ "train_loss": 0.02150259389877319,
2050
+ "train_runtime": 2425.829,
2051
+ "train_samples_per_second": 4.122,
2052
+ "train_steps_per_second": 0.515
2053
+ }
2054
+ ],
2055
+ "logging_steps": 10,
2056
+ "max_steps": 1250,
2057
+ "num_input_tokens_seen": 0,
2058
+ "num_train_epochs": 1,
2059
+ "save_steps": 500,
2060
+ "stateful_callbacks": {
2061
+ "TrainerControl": {
2062
+ "args": {
2063
+ "should_epoch_stop": false,
2064
+ "should_evaluate": false,
2065
+ "should_log": false,
2066
+ "should_save": true,
2067
+ "should_training_stop": true
2068
+ },
2069
+ "attributes": {}
2070
+ }
2071
+ },
2072
+ "total_flos": 0.0,
2073
+ "train_batch_size": 2,
2074
+ "trial_name": null,
2075
+ "trial_params": null
2076
+ }