amirali1985 committed
Commit b09eb77 · verified · 1 Parent(s): 4d8109c

Upload add_sub_sorl_v1_abs50_50K

add_sub_sorl_v1_abs50_50K/config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "architectures": [
+     "SorlModelWrapper"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": null,
+   "dtype": "float32",
+   "eos_token_id": null,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 510,
+   "initializer_range": 0.02,
+   "intermediate_size": 2040,
+   "layer_types": [
+     "full_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 128,
+   "max_window_layers": 28,
+   "model_type": "qwen3",
+   "num_attention_heads": 3,
+   "num_hidden_layers": 2,
+   "num_key_value_heads": 3,
+   "pad_token_id": null,
+   "rms_norm_eps": 1e-06,
+   "rope_parameters": {
+     "rope_theta": 10000.0,
+     "rope_type": "default"
+   },
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "transformers_version": "5.5.0",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 151694
+ }
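The config above sets `head_dim` (128) independently of `hidden_size` (510), so the attention projections are 3 × 128 = 384 wide rather than 510. The sketch below estimates the parameter count implied by these values. It assumes the standard Qwen3 decoder layout (bias-free q/k/v/o projections, a gated SiLU MLP with three weight matrices, two RMSNorms per layer plus per-head q/k norms, and an untied output head). The helper names are hypothetical, and any extra parameters introduced by `SorlModelWrapper` are not counted.

```python
# Values copied from config.json in this commit.
CONFIG = {
    "hidden_size": 510,
    "num_attention_heads": 3,
    "num_key_value_heads": 3,
    "head_dim": 128,
    "intermediate_size": 2040,
    "num_hidden_layers": 2,
    "vocab_size": 151694,
    "tie_word_embeddings": False,
}


def attention_width(cfg):
    """Width of the concatenated query heads (need not equal hidden_size)."""
    return cfg["num_attention_heads"] * cfg["head_dim"]


def layer_params(cfg):
    """Parameters in one decoder layer, assuming a standard Qwen3 block."""
    h = cfg["hidden_size"]
    q_out = attention_width(cfg)
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]
    # q_proj, k_proj, v_proj, o_proj (no biases: attention_bias is false)
    attn = h * q_out + 2 * h * kv_out + q_out * h
    # gate_proj, up_proj, down_proj
    mlp = 3 * h * cfg["intermediate_size"]
    # input/post-attention RMSNorm weights plus q_norm/k_norm over head_dim
    norms = 2 * h + 2 * cfg["head_dim"]
    return attn + mlp + norms


def total_params(cfg):
    """Embeddings + decoder layers + final norm + (untied) output head."""
    h = cfg["hidden_size"]
    embed = cfg["vocab_size"] * h
    head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    return embed + cfg["num_hidden_layers"] * layer_params(cfg) + h + head


if __name__ == "__main__":
    print("attention width:", attention_width(CONFIG))
    print("params per layer:", layer_params(CONFIG))
    print("total params:", total_params(CONFIG))
```

Under these assumptions, almost all of the weights sit in the two vocabulary-sized matrices (embedding and output head); the two decoder layers contribute only a small share.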
add_sub_sorl_v1_abs50_50K/generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "_from_model_config": true,
+   "output_attentions": false,
+   "output_hidden_states": false,
+   "transformers_version": "5.5.0",
+   "use_cache": true
+ }
add_sub_sorl_v1_abs50_50K/metrics.json ADDED
@@ -0,0 +1,2257 @@
1
+ {
2
+ "history": {
3
+ "step": [
4
+ 50,
5
+ 100,
6
+ 150,
7
+ 200,
8
+ 250,
9
+ 300,
10
+ 350,
11
+ 400,
12
+ 450,
13
+ 500,
14
+ 550,
15
+ 600,
16
+ 650,
17
+ 700,
18
+ 750,
19
+ 832,
20
+ 882,
21
+ 932,
22
+ 982,
23
+ 1032,
24
+ 1082,
25
+ 1132,
26
+ 1182,
27
+ 1232,
28
+ 1282,
29
+ 1332,
30
+ 1382,
31
+ 1432,
32
+ 1482,
33
+ 1532,
34
+ 1614,
35
+ 1664,
36
+ 1714,
37
+ 1764,
38
+ 1814,
39
+ 1864,
40
+ 1914,
41
+ 1964,
42
+ 2014,
43
+ 2064,
44
+ 2114,
45
+ 2164,
46
+ 2214,
47
+ 2264,
48
+ 2314,
49
+ 2396,
50
+ 2446,
51
+ 2496,
52
+ 2546,
53
+ 2596,
54
+ 2646,
55
+ 2696,
56
+ 2746,
57
+ 2796,
58
+ 2846,
59
+ 2896,
60
+ 2946,
61
+ 2996,
62
+ 3046,
63
+ 3096,
64
+ 3178,
65
+ 3228,
66
+ 3278,
67
+ 3328,
68
+ 3378,
69
+ 3428,
70
+ 3478,
71
+ 3528,
72
+ 3578,
73
+ 3628,
74
+ 3678,
75
+ 3728,
76
+ 3778,
77
+ 3828,
78
+ 3878,
79
+ 3960,
80
+ 4010,
81
+ 4060,
82
+ 4110,
83
+ 4160,
84
+ 4210,
85
+ 4260,
86
+ 4310,
87
+ 4360,
88
+ 4410,
89
+ 4460,
90
+ 4510,
91
+ 4560,
92
+ 4610,
93
+ 4660,
94
+ 4742,
95
+ 4792,
96
+ 4842,
97
+ 4892,
98
+ 4942,
99
+ 4992,
100
+ 5042,
101
+ 5092,
102
+ 5142,
103
+ 5192,
104
+ 5242,
105
+ 5292,
106
+ 5342,
107
+ 5392,
108
+ 5442,
109
+ 5524,
110
+ 5574,
111
+ 5624,
112
+ 5674,
113
+ 5724,
114
+ 5774,
115
+ 5824,
116
+ 5874,
117
+ 5924,
118
+ 5974,
119
+ 6024,
120
+ 6074,
121
+ 6124,
122
+ 6174,
123
+ 6224,
124
+ 6306,
125
+ 6356,
126
+ 6406,
127
+ 6456,
128
+ 6506,
129
+ 6556,
130
+ 6606,
131
+ 6656,
132
+ 6706,
133
+ 6756,
134
+ 6806,
135
+ 6856,
136
+ 6906,
137
+ 6956,
138
+ 7006,
139
+ 7088,
140
+ 7138,
141
+ 7188,
142
+ 7238,
143
+ 7288,
144
+ 7338,
145
+ 7388,
146
+ 7438,
147
+ 7488,
148
+ 7538,
149
+ 7588,
150
+ 7638,
151
+ 7688,
152
+ 7738,
153
+ 7788
154
+ ],
155
+ "loss": [
156
+ 7.256625652313232,
157
+ 3.0191478729248047,
158
+ 2.4123916625976562,
159
+ 2.1674225330352783,
160
+ 2.1556787490844727,
161
+ 1.8286030292510986,
162
+ 2.14564847946167,
163
+ 1.900666356086731,
164
+ 1.652380108833313,
165
+ 1.6646597385406494,
166
+ 1.5041191577911377,
167
+ 1.546173334121704,
168
+ 0.7197972536087036,
169
+ 0.08199936151504517,
170
+ -3.1339755058288574,
171
+ -10.444205284118652,
172
+ -10.103414535522461,
173
+ -12.214149475097656,
174
+ -12.902936935424805,
175
+ -13.465953826904297,
176
+ -13.666388511657715,
177
+ -14.849788665771484,
178
+ -14.489677429199219,
179
+ -14.085216522216797,
180
+ -15.04929256439209,
181
+ -14.683259010314941,
182
+ -13.91379451751709,
183
+ -15.094310760498047,
184
+ -13.880881309509277,
185
+ -14.374068260192871,
186
+ -14.42313003540039,
187
+ -14.229098320007324,
188
+ -15.09312629699707,
189
+ -14.730694770812988,
190
+ -14.6654634475708,
191
+ -13.932817459106445,
192
+ -14.541162490844727,
193
+ -13.534533500671387,
194
+ -14.629840850830078,
195
+ -14.549479484558105,
196
+ -14.572128295898438,
197
+ -14.297953605651855,
198
+ -14.748451232910156,
199
+ -14.658515930175781,
200
+ -14.846949577331543,
201
+ -14.570510864257812,
202
+ -14.642899513244629,
203
+ -15.091267585754395,
204
+ -14.61697769165039,
205
+ -14.35709285736084,
206
+ -14.233384132385254,
207
+ -13.99492073059082,
208
+ -13.887598037719727,
209
+ -13.822426795959473,
210
+ -13.868623733520508,
211
+ -13.392026901245117,
212
+ -14.518260955810547,
213
+ -12.745328903198242,
214
+ -9.385150909423828,
215
+ -8.4232177734375,
216
+ -5.4400715827941895,
217
+ -5.325277805328369,
218
+ -5.132301330566406,
219
+ -4.313063621520996,
220
+ -3.695901870727539,
221
+ -3.7530672550201416,
222
+ -3.7038750648498535,
223
+ -3.661875009536743,
224
+ -3.2753937244415283,
225
+ -3.360624074935913,
226
+ -2.504333257675171,
227
+ -3.0331249237060547,
228
+ -3.0124270915985107,
229
+ -2.910573720932007,
230
+ -3.7368786334991455,
231
+ -2.7395472526550293,
232
+ -2.6733858585357666,
233
+ -2.886018991470337,
234
+ -2.6889913082122803,
235
+ -2.5152089595794678,
236
+ -2.2789855003356934,
237
+ -2.7361295223236084,
238
+ -2.4921624660491943,
239
+ -2.7641701698303223,
240
+ -4.348557949066162,
241
+ -2.480908155441284,
242
+ -2.207150936126709,
243
+ -2.63488507270813,
244
+ -2.250746726989746,
245
+ -2.611670970916748,
246
+ -2.541576385498047,
247
+ -2.34275484085083,
248
+ -2.689805746078491,
249
+ -2.281898021697998,
250
+ -2.7649309635162354,
251
+ -2.1240293979644775,
252
+ -2.607470750808716,
253
+ -2.4728121757507324,
254
+ -2.0544490814208984,
255
+ -2.0300307273864746,
256
+ -2.1023619174957275,
257
+ -1.887721061706543,
258
+ -2.0482027530670166,
259
+ -1.9892445802688599,
260
+ -1.9722803831100464,
261
+ -2.084959030151367,
262
+ -2.446387529373169,
263
+ -1.835938811302185,
264
+ -2.5824408531188965,
265
+ -2.1360812187194824,
266
+ -1.9915828704833984,
267
+ -1.6431983709335327,
268
+ -1.4731082916259766,
269
+ -1.7144525051116943,
270
+ -1.7019168138504028,
271
+ -1.6519891023635864,
272
+ -1.7413341999053955,
273
+ -1.588895320892334,
274
+ -1.539756417274475,
275
+ -1.904645562171936,
276
+ -1.5546034574508667,
277
+ -1.7151820659637451,
278
+ -1.450798511505127,
279
+ -1.574430227279663,
280
+ -1.8458795547485352,
281
+ -1.5194255113601685,
282
+ -1.6603127717971802,
283
+ -2.0166845321655273,
284
+ -1.2642289400100708,
285
+ -1.7409981489181519,
286
+ -1.6157901287078857,
287
+ -1.5526642799377441,
288
+ -1.2320806980133057,
289
+ -1.5149484872817993,
290
+ -1.4476909637451172,
291
+ -1.6797325611114502,
292
+ -1.3895442485809326,
293
+ -1.231076717376709,
294
+ -1.2591882944107056,
295
+ -1.7371399402618408,
296
+ -1.5310636758804321,
297
+ -1.4542627334594727,
298
+ -1.2413599491119385,
299
+ -1.226067304611206,
300
+ -1.6244958639144897,
301
+ -1.3765522241592407,
302
+ -1.362652063369751,
303
+ -1.392403244972229,
304
+ -1.3551605939865112,
305
+ -1.0807950496673584
306
+ ],
307
+ "base_loss": [
308
+ 6.029807090759277,
309
+ 2.3067009449005127,
310
+ 1.9342564344406128,
311
+ 1.8905317783355713,
312
+ 1.8440930843353271,
313
+ 1.8317269086837769,
314
+ 1.8801648616790771,
315
+ 1.8195487260818481,
316
+ 1.7616671323776245,
317
+ 1.7560369968414307,
318
+ 1.7867367267608643,
319
+ 1.7833213806152344,
320
+ 1.7948106527328491,
321
+ 1.7519046068191528,
322
+ 1.948911428451538,
323
+ 2.001823902130127,
324
+ 1.8075017929077148,
325
+ 1.8323737382888794,
326
+ 1.797086477279663,
327
+ 1.8139381408691406,
328
+ 1.751043438911438,
329
+ 1.8313592672348022,
330
+ 1.751556634902954,
331
+ 1.7140840291976929,
332
+ 1.789836049079895,
333
+ 1.7125715017318726,
334
+ 1.6791929006576538,
335
+ 1.7678550481796265,
336
+ 1.6537859439849854,
337
+ 1.6602624654769897,
338
+ 1.6753387451171875,
339
+ 1.6629526615142822,
340
+ 1.7369420528411865,
341
+ 1.683754563331604,
342
+ 1.6948719024658203,
343
+ 1.6238089799880981,
344
+ 1.6669832468032837,
345
+ 1.5685263872146606,
346
+ 1.6912544965744019,
347
+ 1.6462609767913818,
348
+ 1.6818253993988037,
349
+ 1.636511206626892,
350
+ 1.6857174634933472,
351
+ 1.669575810432434,
352
+ 1.6751861572265625,
353
+ 1.6484739780426025,
354
+ 1.6467863321304321,
355
+ 1.6952067613601685,
356
+ 1.6533762216567993,
357
+ 1.616503357887268,
358
+ 1.6140310764312744,
359
+ 1.5757274627685547,
360
+ 1.5750893354415894,
361
+ 1.556767225265503,
362
+ 1.5602978467941284,
363
+ 1.515640139579773,
364
+ 1.6252305507659912,
365
+ 1.4415589570999146,
366
+ 1.0609798431396484,
367
+ 0.9612196683883667,
368
+ 0.616158127784729,
369
+ 0.6064944863319397,
370
+ 0.5862141251564026,
371
+ 0.4924693703651428,
372
+ 0.4313984513282776,
373
+ 0.42700475454330444,
374
+ 0.42180582880973816,
375
+ 0.41957685351371765,
376
+ 0.37408900260925293,
377
+ 0.38498300313949585,
378
+ 0.28954702615737915,
379
+ 0.348208487033844,
380
+ 0.3462330996990204,
381
+ 0.3341756761074066,
382
+ 0.4224683344364166,
383
+ 0.3134305477142334,
384
+ 0.3083657920360565,
385
+ 0.3302149474620819,
386
+ 0.30693626403808594,
387
+ 0.2877400815486908,
388
+ 0.2639155685901642,
389
+ 0.3185356557369232,
390
+ 0.2856982350349426,
391
+ 0.31456050276756287,
392
+ 0.49335941672325134,
393
+ 0.28367578983306885,
394
+ 0.25294229388237,
395
+ 0.30259618163108826,
396
+ 0.2595691978931427,
397
+ 0.30289167165756226,
398
+ 0.291527658700943,
399
+ 0.26810169219970703,
400
+ 0.3059633672237396,
401
+ 0.2611132562160492,
402
+ 0.3158523738384247,
403
+ 0.2444075345993042,
404
+ 0.29604029655456543,
405
+ 0.2825319468975067,
406
+ 0.23427459597587585,
407
+ 0.23259370028972626,
408
+ 0.2404058873653412,
409
+ 0.21756012737751007,
410
+ 0.23280969262123108,
411
+ 0.22855813801288605,
412
+ 0.2257063388824463,
413
+ 0.23912663757801056,
414
+ 0.27966248989105225,
415
+ 0.21003004908561707,
416
+ 0.29382458329200745,
417
+ 0.2435278445482254,
418
+ 0.2295869141817093,
419
+ 0.18904468417167664,
420
+ 0.17142455279827118,
421
+ 0.1948322355747223,
422
+ 0.19676467776298523,
423
+ 0.19100677967071533,
424
+ 0.19870205223560333,
425
+ 0.18229179084300995,
426
+ 0.17854009568691254,
427
+ 0.2182820737361908,
428
+ 0.1792173683643341,
429
+ 0.19656817615032196,
430
+ 0.16844725608825684,
431
+ 0.18259193003177643,
432
+ 0.21280623972415924,
433
+ 0.17412368953227997,
434
+ 0.19087910652160645,
435
+ 0.22902394831180573,
436
+ 0.14896346628665924,
437
+ 0.20004956424236298,
438
+ 0.18591725826263428,
439
+ 0.17941395938396454,
440
+ 0.14243106544017792,
441
+ 0.17380957305431366,
442
+ 0.16619420051574707,
443
+ 0.19304493069648743,
444
+ 0.15973064303398132,
445
+ 0.1419793665409088,
446
+ 0.1448678970336914,
447
+ 0.2004563808441162,
448
+ 0.17605385184288025,
449
+ 0.16834628582000732,
450
+ 0.14428769052028656,
451
+ 0.14093129336833954,
452
+ 0.18535971641540527,
453
+ 0.1585533171892166,
454
+ 0.1568572074174881,
455
+ 0.15938758850097656,
456
+ 0.15521882474422455,
457
+ 0.126205712556839
458
+ ],
459
+ "info_loss": [
460
+ -0.20561504364013672,
461
+ -0.03972625732421875,
462
+ -0.048748135566711426,
463
+ -0.06750452518463135,
464
+ -0.06405246257781982,
465
+ -0.09477114677429199,
466
+ -0.06761360168457031,
467
+ -0.08672130107879639,
468
+ -0.10533666610717773,
469
+ -0.10386943817138672,
470
+ -0.12152564525604248,
471
+ -0.11574661731719971,
472
+ -0.19611811637878418,
473
+ -0.24177753925323486,
474
+ -0.5657413005828857,
475
+ -1.2865196466445923,
476
+ -1.2276489734649658,
477
+ -1.4379183053970337,
478
+ -1.5007377862930298,
479
+ -1.5571799278259277,
480
+ -1.5694807767868042,
481
+ -1.6956429481506348,
482
+ -1.6478774547576904,
483
+ -1.6037368774414062,
484
+ -1.7057857513427734,
485
+ -1.6578631401062012,
486
+ -1.5784200429916382,
487
+ -1.7046573162078857,
488
+ -1.569814682006836,
489
+ -1.6190284490585327,
490
+ -1.6252418756484985,
491
+ -1.6039395332336426,
492
+ -1.6967638731002808,
493
+ -1.6540355682373047,
494
+ -1.6475841999053955,
495
+ -1.5689774751663208,
496
+ -1.635087251663208,
497
+ -1.5238193273544312,
498
+ -1.6450031995773315,
499
+ -1.630631685256958,
500
+ -1.637630820274353,
501
+ -1.6060518026351929,
502
+ -1.6621246337890625,
503
+ -1.6464064121246338,
504
+ -1.6617577075958252,
505
+ -1.6344650983810425,
506
+ -1.6399197578430176,
507
+ -1.6889852285385132,
508
+ -1.639074444770813,
509
+ -1.6081275939941406,
510
+ -1.5951539278030396,
511
+ -1.566024661064148,
512
+ -1.5557743310928345,
513
+ -1.5495342016220093,
514
+ -1.5556063652038574,
515
+ -1.5009874105453491,
516
+ -1.6231387853622437,
517
+ -1.4322359561920166,
518
+ -1.0561802387237549,
519
+ -0.9466190338134766,
520
+ -0.6137336492538452,
521
+ -0.6038743257522583,
522
+ -0.5819218754768372,
523
+ -0.4909442663192749,
524
+ -0.4217419922351837,
525
+ -0.4254307448863983,
526
+ -0.4203161597251892,
527
+ -0.4181394577026367,
528
+ -0.3734082281589508,
529
+ -0.38232800364494324,
530
+ -0.2871572971343994,
531
+ -0.34667515754699707,
532
+ -0.3433464467525482,
533
+ -0.3323284983634949,
534
+ -0.4220069944858551,
535
+ -0.3129675090312958,
536
+ -0.30473649501800537,
537
+ -0.32990047335624695,
538
+ -0.3063972592353821,
539
+ -0.28762325644493103,
540
+ -0.2610863447189331,
541
+ -0.312552273273468,
542
+ -0.2848331332206726,
543
+ -0.3143031597137451,
544
+ -0.49174460768699646,
545
+ -0.2831017076969147,
546
+ -0.25272035598754883,
547
+ -0.3016011118888855,
548
+ -0.2592092454433441,
549
+ -0.2988108992576599,
550
+ -0.2913071811199188,
551
+ -0.2679586410522461,
552
+ -0.3054676353931427,
553
+ -0.26094746589660645,
554
+ -0.3152065873146057,
555
+ -0.24421927332878113,
556
+ -0.295553594827652,
557
+ -0.28226152062416077,
558
+ -0.2340558022260666,
559
+ -0.23213818669319153,
560
+ -0.24020898342132568,
561
+ -0.21750272810459137,
562
+ -0.23276019096374512,
563
+ -0.22842691838741302,
564
+ -0.22563359141349792,
565
+ -0.23904889822006226,
566
+ -0.2794150114059448,
567
+ -0.20988059043884277,
568
+ -0.29378125071525574,
569
+ -0.24314871430397034,
570
+ -0.2293650060892105,
571
+ -0.18900363147258759,
572
+ -0.17137543857097626,
573
+ -0.1948050558567047,
574
+ -0.19659706950187683,
575
+ -0.19093747437000275,
576
+ -0.19865652918815613,
577
+ -0.1822470873594284,
578
+ -0.1785041093826294,
579
+ -0.21816547214984894,
580
+ -0.17919395864009857,
581
+ -0.19648776948451996,
582
+ -0.16839498281478882,
583
+ -0.18214161694049835,
584
+ -0.212591290473938,
585
+ -0.17409944534301758,
586
+ -0.19081123173236847,
587
+ -0.2289951741695404,
588
+ -0.14660219848155975,
589
+ -0.20001469552516937,
590
+ -0.18589811027050018,
591
+ -0.1793726533651352,
592
+ -0.14236652851104736,
593
+ -0.1737845093011856,
594
+ -0.16613999009132385,
595
+ -0.19301597774028778,
596
+ -0.15971405804157257,
597
+ -0.14195962250232697,
598
+ -0.14485248923301697,
599
+ -0.20041891932487488,
600
+ -0.17602841556072235,
601
+ -0.16832193732261658,
602
+ -0.14427153766155243,
603
+ -0.14090751111507416,
604
+ -0.18532036244869232,
605
+ -0.1585266888141632,
606
+ -0.15683983266353607,
607
+ -0.15936271846294403,
608
+ -0.1551767736673355,
609
+ -0.12618587911128998
610
+ ],
611
+ "abs_loss": [
612
+ 3.7497520446777344,
613
+ 2.6481971740722656,
614
+ 2.8985848426818848,
615
+ 2.818944215774536,
616
+ 2.870543956756592,
617
+ 2.872476100921631,
618
+ 2.758188247680664,
619
+ 2.8428196907043457,
620
+ 2.7872250080108643,
621
+ 2.910464286804199,
622
+ 2.752615213394165,
623
+ 2.7820749282836914,
624
+ 2.7789087295532227,
625
+ 2.835494041442871,
626
+ 2.5386619567871094,
627
+ 2.0468544960021973,
628
+ 1.7976104021072388,
629
+ 1.6245003938674927,
630
+ 1.4929277896881104,
631
+ 1.4750802516937256,
632
+ 1.5771673917770386,
633
+ 1.666552186012268,
634
+ 1.0843603610992432,
635
+ 1.545884132385254,
636
+ 1.4265331029891968,
637
+ 0.999521017074585,
638
+ 1.2854008674621582,
639
+ 1.195152759552002,
640
+ 1.1227796077728271,
641
+ 0.9406141042709351,
642
+ 1.0983028411865234,
643
+ 0.85012286901474,
644
+ 0.8116261959075928,
645
+ 0.7224507927894592,
646
+ 0.8427834510803223,
647
+ 0.8805156946182251,
648
+ 0.8626446723937988,
649
+ 0.9968369007110596,
650
+ 0.9508752822875977,
651
+ 0.8513461351394653,
652
+ 0.6574622392654419,
653
+ 0.8986561298370361,
654
+ 0.5942917466163635,
655
+ 0.7797585129737854,
656
+ 0.7031676173210144,
657
+ 0.6163696050643921,
658
+ 0.6315547227859497,
659
+ 0.7303791046142578,
660
+ 0.5330150723457336,
661
+ 0.5637519359588623,
662
+ 0.5594895482063293,
663
+ 0.5726730823516846,
664
+ 0.5400386452674866,
665
+ 0.5256966352462769,
666
+ 0.44924086332321167,
667
+ 0.47774025797843933,
668
+ 0.5889784097671509,
669
+ 0.4862280488014221,
670
+ 0.5457887649536133,
671
+ 0.4811607599258423,
672
+ 0.5763522982597351,
673
+ 0.4618305563926697,
674
+ 0.49888068437576294,
675
+ 0.5134003758430481,
676
+ 0.4723852872848511,
677
+ 0.4225727915763855,
678
+ 0.3785339295864105,
679
+ 0.3270162343978882,
680
+ 0.3133487105369568,
681
+ 0.33950677514076233,
682
+ 0.32967162132263184,
683
+ 0.3334130644798279,
684
+ 0.31977397203445435,
685
+ 0.39836356043815613,
686
+ 0.3870335817337036,
687
+ 0.27905893325805664,
688
+ 0.38264474272727966,
689
+ 0.3101273775100708,
690
+ 0.2242627888917923,
691
+ 0.24193796515464783,
692
+ 0.222909078001976,
693
+ 0.20927608013153076,
694
+ 0.23676183819770813,
695
+ 0.30053985118865967,
696
+ 0.34205877780914307,
697
+ 0.22115281224250793,
698
+ 0.252194344997406,
699
+ 0.2795361876487732,
700
+ 0.2963107228279114,
701
+ 0.26521268486976624,
702
+ 0.3244202733039856,
703
+ 0.28083187341690063,
704
+ 0.18782655894756317,
705
+ 0.273157000541687,
706
+ 0.21937644481658936,
707
+ 0.2191070020198822,
708
+ 0.2309451699256897,
709
+ 0.2086794078350067,
710
+ 0.20153307914733887,
711
+ 0.2183530181646347,
712
+ 0.17671014368534088,
713
+ 0.1841510534286499,
714
+ 0.22086754441261292,
715
+ 0.23169928789138794,
716
+ 0.27007248997688293,
717
+ 0.20892009139060974,
718
+ 0.17916454374790192,
719
+ 0.20730438828468323,
720
+ 0.20521558821201324,
721
+ 0.17557865381240845,
722
+ 0.18920457363128662,
723
+ 0.20690660178661346,
724
+ 0.1799832135438919,
725
+ 0.15656515955924988,
726
+ 0.18862086534500122,
727
+ 0.17772457003593445,
728
+ 0.17366942763328552,
729
+ 0.16974030435085297,
730
+ 0.16747745871543884,
731
+ 0.16128060221672058,
732
+ 0.19063544273376465,
733
+ 0.1916358768939972,
734
+ 0.20035310089588165,
735
+ 0.16092176735401154,
736
+ 0.1909703016281128,
737
+ 0.1657269448041916,
738
+ 0.1640641689300537,
739
+ 0.14971278607845306,
740
+ 0.10842731595039368,
741
+ 0.2196534425020218,
742
+ 0.16537384688854218,
743
+ 0.1963886320590973,
744
+ 0.16384944319725037,
745
+ 0.13870114088058472,
746
+ 0.147671639919281,
747
+ 0.17166978120803833,
748
+ 0.15626773238182068,
749
+ 0.17788034677505493,
750
+ 0.1344836950302124,
751
+ 0.1465415060520172,
752
+ 0.19216865301132202,
753
+ 0.13693669438362122,
754
+ 0.1630391776561737,
755
+ 0.11989884078502655,
756
+ 0.1325468271970749,
757
+ 0.1899355947971344,
758
+ 0.12264394760131836,
759
+ 0.12508927285671234,
760
+ 0.1484333872795105,
761
+ 0.13628548383712769
762
+ ],
763
+ "zipf_loss": [
764
+ 2.907993793487549,
765
+ 0.8448899388313293,
766
+ 0.6757582426071167,
767
+ 0.6700415015220642,
768
+ 0.6650558114051819,
769
+ 0.6573399305343628,
770
+ 0.6658006906509399,
771
+ 0.6640486717224121,
772
+ 0.6653571128845215,
773
+ 0.6562708020210266,
774
+ 0.6573773622512817,
775
+ 0.6421106457710266,
776
+ 0.6082768440246582,
777
+ 0.46432074904441833,
778
+ 0.3206597566604614,
779
+ 0.21448186039924622,
780
+ 0.1858120709657669,
781
+ 0.17020957171916962,
782
+ 0.1580614596605301,
783
+ 0.14439967274665833,
784
+ 0.11965931951999664,
785
+ 0.1086258664727211,
786
+ 0.12910524010658264,
787
+ 0.08348017930984497,
788
+ 0.07607515156269073,
789
+ 0.0828506276011467,
790
+ 0.06267359852790833,
791
+ 0.06489258259534836,
792
+ 0.051201872527599335,
793
+ 0.06189261004328728,
794
+ 0.04411982744932175,
795
+ 0.06233106553554535,
796
+ 0.05640706419944763,
797
+ 0.05366092175245285,
798
+ 0.031228451058268547,
799
+ 0.04509659856557846,
800
+ 0.05646108090877533,
801
+ 0.0354497954249382,
802
+ 0.033848002552986145,
803
+ 0.025441396981477737,
804
+ 0.05660788342356682,
805
+ 0.0361880287528038,
806
+ 0.12764789164066315,
807
+ 0.057995907962322235,
808
+ 0.025125017389655113,
809
+ 0.06402861326932907,
810
+ 0.046355463564395905,
811
+ 0.03033997118473053,
812
+ 0.06708943098783493,
813
+ 0.05130421370267868,
814
+ 0.04817532002925873,
815
+ 0.03233150392770767,
816
+ 0.041051946580410004,
817
+ 0.06357830762863159,
818
+ 0.08221856504678726,
819
+ 0.05443323031067848,
820
+ 0.028998034074902534,
821
+ 0.08684802800416946,
822
+ 0.0610925555229187,
823
+ 0.0336371473968029,
824
+ 0.023472048342227936,
825
+ 0.06078788638114929,
826
+ 0.05081494152545929,
827
+ 0.05256975442171097,
828
+ 0.042881276458501816,
829
+ 0.031977828592061996,
830
+ 0.03962726891040802,
831
+ 0.06724098324775696,
832
+ 0.05326463282108307,
833
+ 0.04372230917215347,
834
+ 0.04472571983933449,
835
+ 0.052076928317546844,
836
+ 0.042826779186725616,
837
+ 0.03869928792119026,
838
+ 0.02201938070356846,
839
+ 0.04879137873649597,
840
+ 0.02734885923564434,
841
+ 0.05175810307264328,
842
+ 0.04561843350529671,
843
+ 0.04908981919288635,
844
+ 0.045671574771404266,
845
+ 0.04992988705635071,
846
+ 0.04679465293884277,
847
+ 0.0342470221221447,
848
+ 0.04132283851504326,
849
+ 0.04431803524494171,
850
+ 0.04189082980155945,
851
+ 0.0505763441324234,
852
+ 0.05214542895555496,
853
+ 0.047025106847286224,
854
+ 0.04752563685178757,
855
+ 0.04064689949154854,
856
+ 0.04012472555041313,
857
+ 0.03914778679609299,
858
+ 0.04934476315975189,
859
+ 0.051845304667949677,
860
+ 0.028930390253663063,
861
+ 0.04640305042266846,
862
+ 0.03168107569217682,
863
+ 0.03692198544740677,
864
+ 0.041651077568531036,
865
+ 0.05133098363876343,
866
+ 0.024502430111169815,
867
+ 0.04329643025994301,
868
+ 0.031341951340436935,
869
+ 0.04551147669553757,
870
+ 0.0501837432384491,
871
+ 0.03210654854774475,
872
+ 0.041025444865226746,
873
+ 0.03432022035121918,
874
+ 0.0535600446164608,
875
+ 0.03710262104868889,
876
+ 0.051223304122686386,
877
+ 0.02310934104025364,
878
+ 0.04842713102698326,
879
+ 0.04860638454556465,
880
+ 0.029162120074033737,
881
+ 0.03430991619825363,
882
+ 0.04999687895178795,
883
+ 0.042598988860845566,
884
+ 0.03905525058507919,
885
+ 0.03396385908126831,
886
+ 0.04466875270009041,
887
+ 0.04830180108547211,
888
+ 0.048130013048648834,
889
+ 0.03087255358695984,
890
+ 0.040513940155506134,
891
+ 0.029271963983774185,
892
+ 0.04198697581887245,
893
+ 0.037133827805519104,
894
+ 0.04073633253574371,
895
+ 0.04200948029756546,
896
+ 0.032768625766038895,
897
+ 0.03521694615483284,
898
+ 0.03274751082062721,
899
+ 0.040215421468019485,
900
+ 0.03223889321088791,
901
+ 0.02875204011797905,
902
+ 0.031020455062389374,
903
+ 0.05193880572915077,
904
+ 0.03394973650574684,
905
+ 0.046916697174310684,
906
+ 0.04076382517814636,
907
+ 0.030086800456047058,
908
+ 0.03009340539574623,
909
+ 0.031167635694146156,
910
+ 0.03662469983100891,
911
+ 0.029327519237995148,
912
+ 0.02654491364955902,
913
+ 0.04122946411371231
914
+ ],
915
+ "denoise_loss": [],
916
+ "ortho_loss": [
917
+ 0.10270898044109344,
918
+ 0.045592378824949265,
919
+ 0.034897129982709885,
920
+ 0.03289920464158058,
921
+ 0.03318662568926811,
922
+ 0.035863105207681656,
923
+ 0.03718609735369682,
924
+ 0.04016987606883049,
925
+ 0.04869673773646355,
926
+ 0.053654879331588745,
927
+ 0.061492759734392166,
928
+ 0.06294921040534973,
929
+ 0.06910595297813416,
930
+ 0.07273681461811066,
931
+ 0.076898954808712,
932
+ 0.0958031639456749,
933
+ 0.1058020293712616,
934
+ 0.11720982193946838,
935
+ 0.13132666051387787,
936
+ 0.1452067792415619,
937
+ 0.15491452813148499,
938
+ 0.16184890270233154,
939
+ 0.17446120083332062,
940
+ 0.18189102411270142,
941
+ 0.1888672560453415,
942
+ 0.19659653306007385,
943
+ 0.1996685415506363,
944
+ 0.20556356012821198,
945
+ 0.2079426646232605,
946
+ 0.21441443264484406,
947
+ 0.2170758992433548,
948
+ 0.22126887738704681,
949
+ 0.22582681477069855,
950
+ 0.22605378925800323,
951
+ 0.2280348688364029,
952
+ 0.2285757064819336,
953
+ 0.22907176613807678,
954
+ 0.23065848648548126,
955
+ 0.23264732956886292,
956
+ 0.234690859913826,
957
+ 0.23757733404636383,
958
+ 0.23970840871334076,
959
+ 0.2384733259677887,
960
+ 0.23657508194446564,
961
+ 0.23852205276489258,
962
+ 0.24019040167331696,
963
+ 0.23936468362808228,
964
+ 0.23304446041584015,
965
+ 0.23387999832630157,
966
+ 0.2349344789981842,
967
+ 0.23946575820446014,
968
+ 0.24182644486427307,
969
+ 0.24514825642108917,
970
+ 0.2487141191959381,
971
+ 0.2516993284225464,
972
+ 0.2516952455043793,
973
+ 0.2530873715877533,
974
+ 0.2569259703159332,
975
+ 0.2646262049674988,
976
+ 0.2720961272716522,
977
+ 0.27863892912864685,
978
+ 0.2827668786048889,
979
+ 0.28322896361351013,
980
+ 0.29026034474372864,
981
+ 0.296204149723053,
982
+ 0.2996149957180023,
983
+ 0.3005698323249817,
984
+ 0.30707406997680664,
985
+ 0.3101491630077362,
986
+ 0.3167986273765564,
987
+ 0.3269317150115967,
988
+ 0.32879483699798584,
989
+ 0.33054113388061523,
990
+ 0.33136507868766785,
991
+ 0.3334370255470276,
992
+ 0.3420204818248749,
993
+ 0.34395188093185425,
994
+ 0.3462192118167877,
995
+ 0.350506067276001,
996
+ 0.3514798581600189,
997
+ 0.35246649384498596,
998
+ 0.3512493073940277,
999
+ 0.3530470132827759,
1000
+ 0.355428546667099,
1001
+ 0.359707772731781,
1002
+ 0.3637158274650574,
1003
+ 0.3649544417858124,
1004
+ 0.3662114441394806,
1005
+ 0.3675302267074585,
1006
+ 0.37116730213165283,
1007
+ 0.37062546610832214,
1008
+ 0.37135010957717896,
1009
+ 0.3697737455368042,
1010
+ 0.3697727620601654,
1011
+ 0.3714151084423065,
1012
+ 0.37123236060142517,
1013
+ 0.3738124966621399,
1014
+ 0.3752833604812622,
1015
+ 0.37488511204719543,
1016
+ 0.3777393102645874,
1017
+ 0.37878528237342834,
1018
+ 0.3812442421913147,
1019
+ 0.38351762294769287,
1020
+ 0.38525545597076416,
1021
+ 0.3856000006198883,
1022
+ 0.3862100839614868,
1023
+ 0.3856704533100128,
1024
+ 0.38774314522743225,
1025
+ 0.38804271817207336,
1026
+ 0.3885641396045685,
1027
+ 0.38839849829673767,
1028
+ 0.3905911147594452,
1029
+ 0.39113733172416687,
1030
+ 0.39186158776283264,
1031
+ 0.39222121238708496,
1032
+ 0.39306116104125977,
1033
+ 0.3945384621620178,
1034
+ 0.39511334896087646,
1035
+ 0.39469024538993835,
1036
+ 0.39563339948654175,
1037
+ 0.39637646079063416,
1038
+ 0.3954394459724426,
1039
+ 0.39604392647743225,
1040
+ 0.3968430161476135,
1041
+ 0.3972108066082001,
1042
+ 0.39802664518356323,
1043
+ 0.4002288281917572,
1044
+ 0.4006965458393097,
1045
+ 0.40092286467552185,
1046
+ 0.4012592136859894,
1047
+ 0.4014245271682739,
1048
+ 0.4025733470916748,
1049
+ 0.4031515419483185,
1050
+ 0.4036944806575775,
1051
+ 0.40446820855140686,
1052
+ 0.40457797050476074,
1053
+ 0.4052768647670746,
1054
+ 0.4059322774410248,
1055
+ 0.4069337546825409,
1056
+ 0.4073435664176941,
1057
+ 0.40719661116600037,
1058
+ 0.40762969851493835,
1059
+ 0.40837234258651733,
1060
+ 0.40862128138542175,
1061
+ 0.4101361334323883,
1062
+ 0.41058194637298584,
1063
+ 0.4111683964729309,
1064
+ 0.4115133285522461,
1065
+ 0.41178953647613525,
1066
+ 0.41227221488952637
1067
+ ],
1068
+ "lr": [
1069
+ 7.840000000000001e-05,
1070
+ 8e-05,
1071
+ 8e-05,
1072
+ 8e-05,
1073
+ 8e-05,
1074
+ 8e-05,
1075
+ 8e-05,
1076
+ 8e-05,
1077
+ 8e-05,
1078
+ 8e-05,
1079
+ 8e-05,
1080
+ 8e-05,
1081
+ 8e-05,
1082
+ 8e-05,
1083
+ 8e-05,
1084
+ 8e-05,
1085
+ 8e-05,
1086
+ 8e-05,
1087
+ 8e-05,
1088
+ 8e-05,
1089
+ 8e-05,
1090
+ 8e-05,
1091
+ 8e-05,
1092
+ 8e-05,
1093
+ 8e-05,
1094
+ 8e-05,
1095
+ 8e-05,
1096
+ 8e-05,
1097
+ 8e-05,
1098
+ 8e-05,
1099
+ 8e-05,
1100
+ 8e-05,
1101
+ 8e-05,
1102
+ 8e-05,
1103
+ 8e-05,
1104
+ 8e-05,
1105
+ 8e-05,
1106
+ 8e-05,
1107
+ 8e-05,
1108
+ 8e-05,
1109
+ 8e-05,
1110
+ 8e-05,
1111
+ 8e-05,
1112
+ 8e-05,
1113
+ 8e-05,
1114
+ 8e-05,
1115
+ 8e-05,
1116
+ 8e-05,
1117
+ 8e-05,
1118
+ 8e-05,
1119
+ 8e-05,
1120
+ 8e-05,
1121
+ 8e-05,
1122
+ 8e-05,
1123
+ 8e-05,
1124
+ 8e-05,
1125
+ 8e-05,
1126
+ 8e-05,
1127
+ 8e-05,
1128
+ 8e-05,
1129
+ 8e-05,
1130
+ 8e-05,
1131
+ 8e-05,
1132
+ 8e-05,
1133
+ 8e-05,
1134
+ 8e-05,
1135
+ 8e-05,
1136
+ 8e-05,
1137
+ 8e-05,
1138
+ 8e-05,
1139
+ 8e-05,
1140
+ 8e-05,
1141
+ 8e-05,
1142
+ 8e-05,
1143
+ 8e-05,
1144
+ 8e-05,
1145
+ 8e-05,
1146
+ 8e-05,
1147
+ 8e-05,
1148
+ 8e-05,
1149
+ 8e-05,
1150
+ 8e-05,
1151
+ 8e-05,
1152
+ 8e-05,
1153
+ 8e-05,
1154
+ 8e-05,
1155
+ 8e-05,
1156
+ 8e-05,
1157
+ 8e-05,
1158
+ 8e-05,
1159
+ 7.932818532818534e-05,
1160
+ 7.816988416988418e-05,
1161
+ 7.701158301158302e-05,
1162
+ 7.585328185328185e-05,
1163
+ 7.469498069498071e-05,
1164
+ 7.353667953667954e-05,
1165
+ 7.237837837837838e-05,
1166
+ 7.122007722007721e-05,
1167
+ 7.006177606177606e-05,
1168
+ 6.890347490347492e-05,
1169
+ 6.774517374517375e-05,
1170
+ 6.65868725868726e-05,
1171
+ 6.542857142857144e-05,
1172
+ 6.427027027027027e-05,
1173
+ 6.311196911196911e-05,
1174
+ 6.121235521235521e-05,
1175
+ 6.0054054054054064e-05,
1176
+ 5.8895752895752895e-05,
1177
+ 5.773745173745175e-05,
1178
+ 5.6579150579150584e-05,
1179
+ 5.542084942084943e-05,
1180
+ 5.426254826254825e-05,
1181
+ 5.310424710424711e-05,
1182
+ 5.194594594594594e-05,
1183
+ 5.0787644787644786e-05,
1184
+ 4.9629343629343644e-05,
1185
+ 4.8471042471042475e-05,
1186
+ 4.7312741312741326e-05,
1187
+ 4.615444015444014e-05,
1188
+ 4.4996138996139e-05,
1189
+ 4.309652509652511e-05,
1190
+ 4.1938223938223946e-05,
1191
+ 4.07799227799228e-05,
1192
+ 3.962162162162162e-05,
1193
+ 3.846332046332047e-05,
1194
+ 3.73050193050193e-05,
1195
+ 3.6146718146718155e-05,
1196
+ 3.4988416988416986e-05,
1197
+ 3.383011583011584e-05,
1198
+ 3.267181467181467e-05,
1199
+ 3.151351351351352e-05,
1200
+ 3.0355212355212367e-05,
1201
+ 2.9196911196911198e-05,
1202
+ 2.8038610038610046e-05,
1203
+ 2.6880308880308876e-05,
1204
+ 2.4980694980694983e-05,
1205
+ 2.3822393822393838e-05,
1206
+ 2.266409266409267e-05,
1207
+ 2.1505791505791517e-05,
1208
+ 2.0347490347490348e-05,
1209
+ 1.9189189189189195e-05,
1210
+ 1.8030888030888026e-05,
1211
+ 1.6872586872586878e-05,
1212
+ 1.571428571428571e-05,
1213
+ 1.455598455598456e-05,
1214
+ 1.3397683397683389e-05,
1215
+ 1.223938223938224e-05,
1216
+ 1.1081081081081092e-05,
1217
+ 9.92277992277992e-06,
1218
+ 8.764478764478772e-06
1219
+ ],
1220
+ "emb_lr": [],
1221
+ "eval_step": [
1222
+ 750,
1223
+ 1532,
1224
+ 2314,
1225
+ 3096,
1226
+ 3878,
1227
+ 4660,
1228
+ 5442,
1229
+ 6224,
1230
+ 7006,
1231
+ 7788
1232
+ ],
1233
+ "eval_accuracy": [
1234
+ 0.03,
1235
+ 0.92,
1236
+ 0.95,
1237
+ 0.99,
1238
+ 1.0,
1239
+ 1.0,
1240
+ 1.0,
1241
+ 1.0,
1242
+ 1.0,
1243
+ 1.0
1244
+ ]
1245
+ },
1246
+ "final_accuracy": 1.0,
1247
+ "sft_eval": {
1248
+ "config": {
1249
+ "ops": "add_sub",
1250
+ "K": null,
1251
+ "mode": "sft",
1252
+ "n_digits": 6,
1253
+ "n_per_split": 50
1254
+ },
1255
+ "splits": {
1256
+ "add_S0": {
1257
+ "full_accuracy": 0.66,
1258
+ "n_examples": 50,
1259
+ "per_subtask": {
1260
+ "SA": {
1261
+ "accuracy": 0.9457627118644067,
1262
+ "count": 295
1263
+ },
1264
+ "SS": {
1265
+ "accuracy": 0.9272727272727272,
1266
+ "count": 55
1267
+ }
1268
+ }
1269
+ },
1270
+ "add_S1": {
1271
+ "full_accuracy": 0.78,
1272
+ "n_examples": 50,
1273
+ "per_subtask": {
1274
+ "SA": {
1275
+ "accuracy": 0.9761904761904762,
1276
+ "count": 126
1277
+ },
1278
+ "SC": {
1279
+ "accuracy": 0.9367088607594937,
1280
+ "count": 79
1281
+ },
1282
+ "SS": {
1283
+ "accuracy": 1.0,
1284
+ "count": 21
1285
+ },
1286
+ "UC": {
1287
+ "accuracy": 0.9758064516129032,
1288
+ "count": 124
1289
+ }
1290
+ }
1291
+ },
1292
+ "add_S2": {
1293
+ "full_accuracy": 0.42,
1294
+ "n_examples": 50,
1295
+ "per_subtask": {
1296
+ "SA": {
1297
+ "accuracy": 0.92,
1298
+ "count": 75
1299
+ },
1300
+ "SC": {
1301
+ "accuracy": 0.9193548387096774,
1302
+ "count": 62
1303
+ },
1304
+ "SS": {
1305
+ "accuracy": 0.8205128205128205,
1306
+ "count": 39
1307
+ },
1308
+ "UC": {
1309
+ "accuracy": 0.8378378378378378,
1310
+ "count": 111
1311
+ },
1312
+ "US": {
1313
+ "accuracy": 0.9047619047619048,
1314
+ "count": 63
1315
+ }
1316
+ }
1317
+ },
1318
+ "add_S3": {
1319
+ "full_accuracy": 0.34,
1320
+ "n_examples": 50,
1321
+ "per_subtask": {
1322
+ "SA": {
1323
+ "accuracy": 0.95,
1324
+ "count": 60
1325
+ },
1326
+ "SC": {
1327
+ "accuracy": 0.8947368421052632,
1328
+ "count": 57
1329
+ },
1330
+ "SS": {
1331
+ "accuracy": 1.0,
1332
+ "count": 19
1333
+ },
1334
+ "UC": {
1335
+ "accuracy": 0.7884615384615384,
1336
+ "count": 104
1337
+ },
1338
+ "US": {
1339
+ "accuracy": 0.8545454545454545,
1340
+ "count": 110
1341
+ }
1342
+ }
1343
+ },
1344
+ "add_S4": {
1345
+ "full_accuracy": 0.36,
1346
+ "n_examples": 50,
1347
+ "per_subtask": {
1348
+ "SA": {
1349
+ "accuracy": 1.0,
1350
+ "count": 48
1351
+ },
1352
+ "SC": {
1353
+ "accuracy": 0.9807692307692307,
1354
+ "count": 52
1355
+ },
1356
+ "SS": {
1357
+ "accuracy": 0.8571428571428571,
1358
+ "count": 7
1359
+ },
1360
+ "UC": {
1361
+ "accuracy": 0.7303370786516854,
1362
+ "count": 89
1363
+ },
1364
+ "US": {
1365
+ "accuracy": 0.6688311688311688,
1366
+ "count": 154
1367
+ }
1368
+ }
1369
+ },
1370
+ "add_S5": {
1371
+ "full_accuracy": 0.18,
1372
+ "n_examples": 50,
1373
+ "per_subtask": {
1374
+ "SA": {
1375
+ "accuracy": 1.0,
1376
+ "count": 50
1377
+ },
1378
+ "SC": {
1379
+ "accuracy": 0.98,
1380
+ "count": 50
1381
+ },
1382
+ "UC": {
1383
+ "accuracy": 0.44,
1384
+ "count": 50
1385
+ },
1386
+ "US": {
1387
+ "accuracy": 0.375,
1388
+ "count": 200
1389
+ }
1390
+ }
1391
+ },
1392
+ "add_S6": {
1393
+ "full_accuracy": 0.3,
1394
+ "n_examples": 50,
1395
+ "per_subtask": {
1396
+ "SC": {
1397
+ "accuracy": 1.0,
1398
+ "count": 50
1399
+ },
1400
+ "UC": {
1401
+ "accuracy": 0.34,
1402
+ "count": 50
1403
+ },
1404
+ "US": {
1405
+ "accuracy": 0.452,
1406
+ "count": 250
1407
+ }
1408
+ }
1409
+ },
1410
+ "add_random": {
1411
+ "full_accuracy": 0.795,
1412
+ "n_examples": 200,
1413
+ "per_subtask": {
1414
+ "SA": {
1415
+ "accuracy": 0.9791183294663574,
1416
+ "count": 431
1417
+ },
1418
+ "SC": {
1419
+ "accuracy": 0.9778481012658228,
1420
+ "count": 316
1421
+ },
1422
+ "SS": {
1423
+ "accuracy": 0.8717948717948718,
1424
+ "count": 39
1425
+ },
1426
+ "UC": {
1427
+ "accuracy": 0.95,
1428
+ "count": 560
1429
+ },
1430
+ "US": {
1431
+ "accuracy": 0.9259259259259259,
1432
+ "count": 54
1433
+ }
1434
+ }
1435
+ },
1436
+ "add_C3": {
1437
+ "full_accuracy": 0.58,
1438
+ "n_examples": 50,
1439
+ "per_subtask": {
1440
+ "SA": {
1441
+ "accuracy": 1.0,
1442
+ "count": 150
1443
+ },
1444
+ "SC": {
1445
+ "accuracy": 1.0,
1446
+ "count": 50
1447
+ },
1448
+ "UC": {
1449
+ "accuracy": 0.8365384615384616,
1450
+ "count": 104
1451
+ },
1452
+ "US": {
1453
+ "accuracy": 0.8260869565217391,
1454
+ "count": 46
1455
+ }
1456
+ }
1457
+ },
1458
+ "add_C4": {
1459
+ "full_accuracy": 0.42,
1460
+ "n_examples": 50,
1461
+ "per_subtask": {
1462
+ "SA": {
1463
+ "accuracy": 0.99,
1464
+ "count": 100
1465
+ },
1466
+ "SC": {
1467
+ "accuracy": 1.0,
1468
+ "count": 50
1469
+ },
1470
+ "UC": {
1471
+ "accuracy": 0.8130081300813008,
1472
+ "count": 123
1473
+ },
1474
+ "US": {
1475
+ "accuracy": 0.6883116883116883,
1476
+ "count": 77
1477
+ }
1478
+ }
1479
+ },
1480
+ "add_C5": {
1481
+ "full_accuracy": 0.32,
1482
+ "n_examples": 50,
1483
+ "per_subtask": {
1484
+ "SA": {
1485
+ "accuracy": 1.0,
1486
+ "count": 50
1487
+ },
1488
+ "SC": {
1489
+ "accuracy": 0.98,
1490
+ "count": 50
1491
+ },
1492
+ "UC": {
1493
+ "accuracy": 0.7792207792207793,
1494
+ "count": 154
1495
+ },
1496
+ "US": {
1497
+ "accuracy": 0.8541666666666666,
1498
+ "count": 96
1499
+ }
1500
+ }
1501
+ },
1502
+ "add_C6": {
1503
+ "full_accuracy": 0.28,
1504
+ "n_examples": 50,
1505
+ "per_subtask": {
1506
+ "SC": {
1507
+ "accuracy": 1.0,
1508
+ "count": 50
1509
+ },
1510
+ "UC": {
1511
+ "accuracy": 0.7692307692307693,
1512
+ "count": 182
1513
+ },
1514
+ "US": {
1515
+ "accuracy": 0.7711864406779662,
1516
+ "count": 118
1517
+ }
1518
+ }
1519
+ },
1520
+ "sub_M0": {
1521
+ "full_accuracy": 0.64,
1522
+ "n_examples": 50,
1523
+ "per_subtask": {
1524
+ "MD": {
1525
+ "accuracy": 0.935374149659864,
1526
+ "count": 294
1527
+ },
1528
+ "ME": {
1529
+ "accuracy": 1.0,
1530
+ "count": 56
1531
+ }
1532
+ }
1533
+ },
1534
+ "sub_M1": {
1535
+ "full_accuracy": 0.84,
1536
+ "n_examples": 50,
1537
+ "per_subtask": {
1538
+ "MD": {
1539
+ "accuracy": 0.958041958041958,
1540
+ "count": 143
1541
+ },
1542
+ "MB": {
1543
+ "accuracy": 1.0,
1544
+ "count": 69
1545
+ },
1546
+ "ME": {
1547
+ "accuracy": 1.0,
1548
+ "count": 15
1549
+ },
1550
+ "UB": {
1551
+ "accuracy": 0.975609756097561,
1552
+ "count": 123
1553
+ }
1554
+ }
1555
+ },
1556
+ "sub_M2": {
1557
+ "full_accuracy": 0.36,
1558
+ "n_examples": 50,
1559
+ "per_subtask": {
1560
+ "MD": {
1561
+ "accuracy": 0.9351851851851852,
1562
+ "count": 108
1563
+ },
1564
+ "MB": {
1565
+ "accuracy": 1.0,
1566
+ "count": 52
1567
+ },
1568
+ "ME": {
1569
+ "accuracy": 0.9615384615384616,
1570
+ "count": 52
1571
+ },
1572
+ "UB": {
1573
+ "accuracy": 0.6551724137931034,
1574
+ "count": 87
1575
+ },
1576
+ "UD": {
1577
+ "accuracy": 1.0,
1578
+ "count": 51
1579
+ }
1580
+ }
1581
+ },
1582
+ "sub_M3": {
1583
+ "full_accuracy": 0.12,
1584
+ "n_examples": 50,
1585
+ "per_subtask": {
1586
+ "MD": {
1587
+ "accuracy": 0.9893617021276596,
1588
+ "count": 94
1589
+ },
1590
+ "MB": {
1591
+ "accuracy": 0.9803921568627451,
1592
+ "count": 51
1593
+ },
1594
+ "ME": {
1595
+ "accuracy": 1.0,
1596
+ "count": 25
1597
+ },
1598
+ "UB": {
1599
+ "accuracy": 0.6410256410256411,
1600
+ "count": 78
1601
+ },
1602
+ "UD": {
1603
+ "accuracy": 0.6078431372549019,
1604
+ "count": 102
1605
+ }
1606
+ }
1607
+ },
1608
+ "sub_M4": {
1609
+ "full_accuracy": 0.06,
1610
+ "n_examples": 50,
1611
+ "per_subtask": {
1612
+ "MD": {
1613
+ "accuracy": 0.99,
1614
+ "count": 100
1615
+ },
1616
+ "MB": {
1617
+ "accuracy": 1.0,
1618
+ "count": 50
1619
+ },
1620
+ "UB": {
1621
+ "accuracy": 0.66,
1622
+ "count": 50
1623
+ },
1624
+ "UD": {
1625
+ "accuracy": 0.34,
1626
+ "count": 150
1627
+ }
1628
+ }
1629
+ },
1630
+ "sub_M5": {
1631
+ "full_accuracy": 0.2,
1632
+ "n_examples": 50,
1633
+ "per_subtask": {
1634
+ "MD": {
1635
+ "accuracy": 1.0,
1636
+ "count": 50
1637
+ },
1638
+ "MB": {
1639
+ "accuracy": 1.0,
1640
+ "count": 50
1641
+ },
1642
+ "UB": {
1643
+ "accuracy": 0.82,
1644
+ "count": 50
1645
+ },
1646
+ "UD": {
1647
+ "accuracy": 0.38,
1648
+ "count": 200
1649
+ }
1650
+ }
1651
+ },
1652
+ "sub_random": {
1653
+ "full_accuracy": 0.775,
1654
+ "n_examples": 200,
1655
+ "per_subtask": {
1656
+ "MD": {
1657
+ "accuracy": 0.9693877551020408,
1658
+ "count": 588
1659
+ },
1660
+ "MB": {
1661
+ "accuracy": 0.9925373134328358,
1662
+ "count": 268
1663
+ },
1664
+ "ME": {
1665
+ "accuracy": 1.0,
1666
+ "count": 60
1667
+ },
1668
+ "UB": {
1669
+ "accuracy": 0.9328859060402684,
1670
+ "count": 447
1671
+ },
1672
+ "UD": {
1673
+ "accuracy": 0.8918918918918919,
1674
+ "count": 37
1675
+ }
1676
+ }
1677
+ },
1678
+ "sub_B3": {
1679
+ "full_accuracy": 0.56,
1680
+ "n_examples": 50,
1681
+ "per_subtask": {
1682
+ "MD": {
1683
+ "accuracy": 0.9933333333333333,
1684
+ "count": 150
1685
+ },
1686
+ "MB": {
1687
+ "accuracy": 1.0,
1688
+ "count": 50
1689
+ },
1690
+ "UB": {
1691
+ "accuracy": 0.8317757009345794,
1692
+ "count": 107
1693
+ },
1694
+ "UD": {
1695
+ "accuracy": 0.8372093023255814,
1696
+ "count": 43
1697
+ }
1698
+ }
1699
+ },
1700
+ "sub_B4": {
1701
+ "full_accuracy": 0.34,
1702
+ "n_examples": 50,
1703
+ "per_subtask": {
1704
+ "MD": {
1705
+ "accuracy": 1.0,
1706
+ "count": 100
1707
+ },
1708
+ "MB": {
1709
+ "accuracy": 1.0,
1710
+ "count": 50
1711
+ },
1712
+ "UB": {
1713
+ "accuracy": 0.8157894736842105,
1714
+ "count": 114
1715
+ },
1716
+ "UD": {
1717
+ "accuracy": 0.6046511627906976,
1718
+ "count": 86
1719
+ }
1720
+ }
1721
+ },
1722
+ "sub_B5": {
1723
+ "full_accuracy": 0.28,
1724
+ "n_examples": 50,
1725
+ "per_subtask": {
1726
+ "MD": {
1727
+ "accuracy": 1.0,
1728
+ "count": 50
1729
+ },
1730
+ "MB": {
1731
+ "accuracy": 1.0,
1732
+ "count": 50
1733
+ },
1734
+ "UB": {
1735
+ "accuracy": 0.7843137254901961,
1736
+ "count": 153
1737
+ },
1738
+ "UD": {
1739
+ "accuracy": 0.6597938144329897,
1740
+ "count": 97
1741
+ }
1742
+ }
1743
+ }
1744
+ },
1745
+ "summary": {
1746
+ "overall_accuracy": 0.5107142857142857,
1747
+ "total_examples": 1400,
1748
+ "n_splits": 22
1749
+ }
1750
+ },
1751
+ "sorl_eval": {
1752
+ "config": {
1753
+ "ops": "add_sub",
1754
+ "K": 4,
1755
+ "mode": "sorl",
1756
+ "n_digits": 6,
1757
+ "n_per_split": 50
1758
+ },
1759
+ "splits": {
1760
+ "add_S0": {
1761
+ "full_accuracy": 1.0,
1762
+ "n_examples": 50,
1763
+ "per_subtask": {
1764
+ "SA": {
1765
+ "accuracy": 1.0,
1766
+ "count": 295
1767
+ },
1768
+ "SS": {
1769
+ "accuracy": 1.0,
1770
+ "count": 55
1771
+ }
1772
+ }
1773
+ },
1774
+ "add_S1": {
1775
+ "full_accuracy": 1.0,
1776
+ "n_examples": 50,
1777
+ "per_subtask": {
1778
+ "SA": {
1779
+ "accuracy": 1.0,
1780
+ "count": 126
1781
+ },
1782
+ "SC": {
1783
+ "accuracy": 1.0,
1784
+ "count": 79
1785
+ },
1786
+ "SS": {
1787
+ "accuracy": 1.0,
1788
+ "count": 21
1789
+ },
1790
+ "UC": {
1791
+ "accuracy": 1.0,
1792
+ "count": 124
1793
+ }
1794
+ }
1795
+ },
1796
+ "add_S2": {
1797
+ "full_accuracy": 1.0,
1798
+ "n_examples": 50,
1799
+ "per_subtask": {
1800
+ "SA": {
1801
+ "accuracy": 1.0,
1802
+ "count": 75
1803
+ },
1804
+ "SC": {
1805
+ "accuracy": 1.0,
1806
+ "count": 62
1807
+ },
1808
+ "SS": {
1809
+ "accuracy": 1.0,
1810
+ "count": 39
1811
+ },
1812
+ "UC": {
1813
+ "accuracy": 1.0,
1814
+ "count": 111
1815
+ },
1816
+ "US": {
1817
+ "accuracy": 1.0,
1818
+ "count": 63
1819
+ }
1820
+ }
1821
+ },
1822
+ "add_S3": {
1823
+ "full_accuracy": 1.0,
1824
+ "n_examples": 50,
1825
+ "per_subtask": {
1826
+ "SA": {
1827
+ "accuracy": 1.0,
1828
+ "count": 60
1829
+ },
1830
+ "SC": {
1831
+ "accuracy": 1.0,
1832
+ "count": 57
1833
+ },
1834
+ "SS": {
1835
+ "accuracy": 1.0,
1836
+ "count": 19
1837
+ },
1838
+ "UC": {
1839
+ "accuracy": 1.0,
1840
+ "count": 104
1841
+ },
1842
+ "US": {
1843
+ "accuracy": 1.0,
1844
+ "count": 110
1845
+ }
1846
+ }
1847
+ },
1848
+ "add_S4": {
1849
+ "full_accuracy": 1.0,
1850
+ "n_examples": 50,
1851
+ "per_subtask": {
1852
+ "SA": {
1853
+ "accuracy": 1.0,
1854
+ "count": 48
1855
+ },
1856
+ "SC": {
1857
+ "accuracy": 1.0,
1858
+ "count": 52
1859
+ },
1860
+ "SS": {
1861
+ "accuracy": 1.0,
1862
+ "count": 7
1863
+ },
1864
+ "UC": {
1865
+ "accuracy": 1.0,
1866
+ "count": 89
1867
+ },
1868
+ "US": {
1869
+ "accuracy": 1.0,
1870
+ "count": 154
1871
+ }
1872
+ }
1873
+ },
1874
+ "add_S5": {
1875
+ "full_accuracy": 1.0,
1876
+ "n_examples": 50,
1877
+ "per_subtask": {
1878
+ "SA": {
1879
+ "accuracy": 1.0,
1880
+ "count": 50
1881
+ },
1882
+ "SC": {
1883
+ "accuracy": 1.0,
1884
+ "count": 50
1885
+ },
1886
+ "UC": {
1887
+ "accuracy": 1.0,
1888
+ "count": 50
1889
+ },
1890
+ "US": {
1891
+ "accuracy": 1.0,
1892
+ "count": 200
1893
+ }
1894
+ }
1895
+ },
1896
+ "add_S6": {
1897
+ "full_accuracy": 1.0,
1898
+ "n_examples": 50,
1899
+ "per_subtask": {
1900
+ "SC": {
1901
+ "accuracy": 1.0,
1902
+ "count": 50
1903
+ },
1904
+ "UC": {
1905
+ "accuracy": 1.0,
1906
+ "count": 50
1907
+ },
1908
+ "US": {
1909
+ "accuracy": 1.0,
1910
+ "count": 250
1911
+ }
1912
+ }
1913
+ },
1914
+ "add_random": {
1915
+ "full_accuracy": 1.0,
1916
+ "n_examples": 200,
1917
+ "per_subtask": {
1918
+ "SA": {
1919
+ "accuracy": 1.0,
1920
+ "count": 431
1921
+ },
1922
+ "SC": {
1923
+ "accuracy": 1.0,
1924
+ "count": 316
1925
+ },
1926
+ "SS": {
1927
+ "accuracy": 1.0,
1928
+ "count": 39
1929
+ },
1930
+ "UC": {
1931
+ "accuracy": 1.0,
1932
+ "count": 560
1933
+ },
1934
+ "US": {
1935
+ "accuracy": 1.0,
1936
+ "count": 54
1937
+ }
1938
+ }
1939
+ },
1940
+ "add_C3": {
1941
+ "full_accuracy": 1.0,
1942
+ "n_examples": 50,
1943
+ "per_subtask": {
1944
+ "SA": {
1945
+ "accuracy": 1.0,
1946
+ "count": 150
1947
+ },
1948
+ "SC": {
1949
+ "accuracy": 1.0,
1950
+ "count": 50
1951
+ },
1952
+ "UC": {
1953
+ "accuracy": 1.0,
1954
+ "count": 104
1955
+ },
1956
+ "US": {
1957
+ "accuracy": 1.0,
1958
+ "count": 46
1959
+ }
1960
+ }
1961
+ },
1962
+ "add_C4": {
1963
+ "full_accuracy": 1.0,
1964
+ "n_examples": 50,
1965
+ "per_subtask": {
1966
+ "SA": {
1967
+ "accuracy": 1.0,
1968
+ "count": 100
1969
+ },
1970
+ "SC": {
1971
+ "accuracy": 1.0,
1972
+ "count": 50
1973
+ },
1974
+ "UC": {
1975
+ "accuracy": 1.0,
1976
+ "count": 123
1977
+ },
1978
+ "US": {
1979
+ "accuracy": 1.0,
1980
+ "count": 77
1981
+ }
1982
+ }
1983
+ },
1984
+ "add_C5": {
1985
+ "full_accuracy": 1.0,
1986
+ "n_examples": 50,
1987
+ "per_subtask": {
1988
+ "SA": {
1989
+ "accuracy": 1.0,
1990
+ "count": 50
1991
+ },
1992
+ "SC": {
1993
+ "accuracy": 1.0,
1994
+ "count": 50
1995
+ },
1996
+ "UC": {
1997
+ "accuracy": 1.0,
1998
+ "count": 154
1999
+ },
2000
+ "US": {
2001
+ "accuracy": 1.0,
2002
+ "count": 96
2003
+ }
2004
+ }
2005
+ },
2006
+ "add_C6": {
2007
+ "full_accuracy": 1.0,
2008
+ "n_examples": 50,
2009
+ "per_subtask": {
2010
+ "SC": {
2011
+ "accuracy": 1.0,
2012
+ "count": 50
2013
+ },
2014
+ "UC": {
2015
+ "accuracy": 1.0,
2016
+ "count": 182
2017
+ },
2018
+ "US": {
2019
+ "accuracy": 1.0,
2020
+ "count": 118
2021
+ }
2022
+ }
2023
+ },
2024
+ "sub_M0": {
2025
+ "full_accuracy": 1.0,
2026
+ "n_examples": 50,
2027
+ "per_subtask": {
2028
+ "MD": {
2029
+ "accuracy": 1.0,
2030
+ "count": 294
2031
+ },
2032
+ "ME": {
2033
+ "accuracy": 1.0,
2034
+ "count": 56
2035
+ }
2036
+ }
2037
+ },
2038
+ "sub_M1": {
2039
+ "full_accuracy": 1.0,
2040
+ "n_examples": 50,
2041
+ "per_subtask": {
2042
+ "MD": {
2043
+ "accuracy": 1.0,
2044
+ "count": 143
2045
+ },
2046
+ "MB": {
2047
+ "accuracy": 1.0,
2048
+ "count": 69
2049
+ },
2050
+ "ME": {
2051
+ "accuracy": 1.0,
2052
+ "count": 15
2053
+ },
2054
+ "UB": {
2055
+ "accuracy": 1.0,
2056
+ "count": 123
2057
+ }
2058
+ }
2059
+ },
2060
+ "sub_M2": {
2061
+ "full_accuracy": 1.0,
2062
+ "n_examples": 50,
2063
+ "per_subtask": {
2064
+ "MD": {
2065
+ "accuracy": 1.0,
2066
+ "count": 108
2067
+ },
2068
+ "MB": {
2069
+ "accuracy": 1.0,
2070
+ "count": 52
2071
+ },
2072
+ "ME": {
2073
+ "accuracy": 1.0,
2074
+ "count": 52
2075
+ },
2076
+ "UB": {
2077
+ "accuracy": 1.0,
2078
+ "count": 87
2079
+ },
2080
+ "UD": {
2081
+ "accuracy": 1.0,
2082
+ "count": 51
2083
+ }
2084
+ }
2085
+ },
2086
+ "sub_M3": {
2087
+ "full_accuracy": 1.0,
2088
+ "n_examples": 50,
2089
+ "per_subtask": {
2090
+ "MD": {
2091
+ "accuracy": 1.0,
2092
+ "count": 94
2093
+ },
2094
+ "MB": {
2095
+ "accuracy": 1.0,
2096
+ "count": 51
2097
+ },
2098
+ "ME": {
2099
+ "accuracy": 1.0,
2100
+ "count": 25
2101
+ },
2102
+ "UB": {
2103
+ "accuracy": 1.0,
2104
+ "count": 78
2105
+ },
2106
+ "UD": {
2107
+ "accuracy": 1.0,
2108
+ "count": 102
2109
+ }
2110
+ }
2111
+ },
2112
+ "sub_M4": {
2113
+ "full_accuracy": 1.0,
2114
+ "n_examples": 50,
2115
+ "per_subtask": {
2116
+ "MD": {
2117
+ "accuracy": 1.0,
2118
+ "count": 100
2119
+ },
2120
+ "MB": {
2121
+ "accuracy": 1.0,
2122
+ "count": 50
2123
+ },
2124
+ "UB": {
2125
+ "accuracy": 1.0,
2126
+ "count": 50
2127
+ },
2128
+ "UD": {
2129
+ "accuracy": 1.0,
2130
+ "count": 150
2131
+ }
2132
+ }
2133
+ },
2134
+ "sub_M5": {
2135
+ "full_accuracy": 1.0,
2136
+ "n_examples": 50,
2137
+ "per_subtask": {
2138
+ "MD": {
2139
+ "accuracy": 1.0,
2140
+ "count": 50
2141
+ },
2142
+ "MB": {
2143
+ "accuracy": 1.0,
2144
+ "count": 50
2145
+ },
2146
+ "UB": {
2147
+ "accuracy": 1.0,
2148
+ "count": 50
2149
+ },
2150
+ "UD": {
2151
+ "accuracy": 1.0,
2152
+ "count": 200
2153
+ }
2154
+ }
2155
+ },
2156
+ "sub_random": {
2157
+ "full_accuracy": 1.0,
2158
+ "n_examples": 200,
2159
+ "per_subtask": {
2160
+ "MD": {
2161
+ "accuracy": 1.0,
2162
+ "count": 588
2163
+ },
2164
+ "MB": {
2165
+ "accuracy": 1.0,
2166
+ "count": 268
2167
+ },
2168
+ "ME": {
2169
+ "accuracy": 1.0,
2170
+ "count": 60
2171
+ },
2172
+ "UB": {
2173
+ "accuracy": 1.0,
2174
+ "count": 447
2175
+ },
2176
+ "UD": {
2177
+ "accuracy": 1.0,
2178
+ "count": 37
2179
+ }
2180
+ }
2181
+ },
2182
+ "sub_B3": {
2183
+ "full_accuracy": 1.0,
2184
+ "n_examples": 50,
2185
+ "per_subtask": {
2186
+ "MD": {
2187
+ "accuracy": 1.0,
2188
+ "count": 150
2189
+ },
2190
+ "MB": {
2191
+ "accuracy": 1.0,
2192
+ "count": 50
2193
+ },
2194
+ "UB": {
2195
+ "accuracy": 1.0,
2196
+ "count": 107
2197
+ },
2198
+ "UD": {
2199
+ "accuracy": 1.0,
2200
+ "count": 43
2201
+ }
2202
+ }
2203
+ },
2204
+ "sub_B4": {
2205
+ "full_accuracy": 1.0,
2206
+ "n_examples": 50,
2207
+ "per_subtask": {
2208
+ "MD": {
2209
+ "accuracy": 1.0,
2210
+ "count": 100
2211
+ },
2212
+ "MB": {
2213
+ "accuracy": 1.0,
2214
+ "count": 50
2215
+ },
2216
+ "UB": {
2217
+ "accuracy": 1.0,
2218
+ "count": 114
2219
+ },
2220
+ "UD": {
2221
+ "accuracy": 1.0,
2222
+ "count": 86
2223
+ }
2224
+ }
2225
+ },
2226
+ "sub_B5": {
2227
+ "full_accuracy": 1.0,
2228
+ "n_examples": 50,
2229
+ "per_subtask": {
2230
+ "MD": {
2231
+ "accuracy": 1.0,
2232
+ "count": 50
2233
+ },
2234
+ "MB": {
2235
+ "accuracy": 1.0,
2236
+ "count": 50
2237
+ },
2238
+ "UB": {
2239
+ "accuracy": 1.0,
2240
+ "count": 153
2241
+ },
2242
+ "UD": {
2243
+ "accuracy": 1.0,
2244
+ "count": 97
2245
+ }
2246
+ }
2247
+ }
2248
+ },
2249
+ "summary": {
2250
+ "overall_accuracy": 1.0,
2251
+ "total_examples": 1400,
2252
+ "n_splits": 22
2253
+ }
2254
+ },
2255
+ "sorl_overall_accuracy": 1.0,
2256
+ "sft_overall_accuracy": 0.5107142857142857
2257
+ }
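The per-split gap between the two evaluation modes recorded in metrics.json can be tabulated directly. The following is a minimal sketch, assuming the `sft_eval` / `sorl_eval` layout shown above; the `accuracy_gaps` helper and the inlined two-split excerpt are illustrative and not part of the repository.

```python
import json

# Excerpt with the same nesting as metrics.json; the values are copied from
# the add_S5 and sub_M4 entries above. To run on the real file, replace this
# with json.load(open("metrics.json")) (the local path is an assumption).
EXCERPT = json.loads("""
{
  "sft_eval": {"splits": {
    "add_S5": {"full_accuracy": 0.18, "n_examples": 50},
    "sub_M4": {"full_accuracy": 0.06, "n_examples": 50}
  }},
  "sorl_eval": {"splits": {
    "add_S5": {"full_accuracy": 1.0, "n_examples": 50},
    "sub_M4": {"full_accuracy": 1.0, "n_examples": 50}
  }}
}
""")


def accuracy_gaps(metrics):
    """Return {split: sorl_accuracy - sft_accuracy} for splits in both evals."""
    sft = metrics["sft_eval"]["splits"]
    sorl = metrics["sorl_eval"]["splits"]
    return {
        name: sorl[name]["full_accuracy"] - sft[name]["full_accuracy"]
        for name in sft
        if name in sorl
    }


gaps = accuracy_gaps(EXCERPT)
# Sort so the splits with the largest gap come first.
ranked = sorted(gaps.items(), key=lambda kv: -kv[1])
for name, gap in ranked:
    print(f"{name}: +{gap:.2f}")
```

Only `full_accuracy` is compared here; the same loop could be extended to the `per_subtask` entries if a finer breakdown is needed.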
add_sub_sorl_v1_abs50_50K/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3807a271457535e90facd4a2c2b7452fb1470e8345fa33b069288d979eff66d5
+ size 650466940
add_sub_sorl_v1_abs50_50K/train_config.json ADDED
@@ -0,0 +1,35 @@
+ {
+ "mode": "sorl",
+ "ops": "add_sub",
+ "n_digits": 6,
+ "n_layer": 2,
+ "n_head": 3,
+ "n_embd": 510,
+ "abs_vocab": 50,
+ "K": 4,
+ "alpha_info_gain": 10.0,
+ "alpha_abs": 0.1,
+ "alpha_soft_zipf": 1.0,
+ "batch_size": 64,
+ "num_epochs": 10,
+ "dataset_size": 50000,
+ "lr": 8e-05,
+ "output_dir": "ckpt/sweep/as_sorl_abs50_K4_50K",
+ "device": "cuda",
+ "push_to_hub": true,
+ "no_wandb": false,
+ "n_params": 162540062,
+ "run_name": "add_sub_sorl_v1_abs50_50K",
+ "git_commit": "800625019270114adcda289bbd550c4f1109a514",
+ "timestamp": "2026-04-12T03:52:52.696357+00:00",
+ "tokenizer": "Qwen/Qwen3-0.6B",
+ "dataset_repo": "thoughtworks/arithmetic-sorl-data",
+ "dataset_config": "add_sub_6digit",
+ "model_repo": "thoughtworks/arithmetic-sorl",
+ "trainer_version": "v1",
+ "wandb_run_id": "d8644t69",
+ "wandb_url": "https://wandb.ai/nlp_and_interpretability/sorl-arithmetic/runs/d8644t69",
+ "final_accuracy": 1.0,
+ "sft_accuracy": 0.5107142857142857,
+ "eval_method": "ArithmeticEvaluator"
+ }
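The `n_params` value in train_config.json can be cross-checked against the shapes in config.json. The sketch below assumes the standard Qwen3 decoder layout (bias-free q/k/v/o projections, RMSNorm on each query and key head, a gated MLP with gate, up and down projections, two RMSNorm weights per layer and one final norm) with untied input and output embeddings, as `tie_word_embeddings: false` suggests. Whether `SorlModelWrapper` adds further parameters is not stated in this commit, so treat the breakdown as an assumption; `count_params` is a hypothetical helper.

```python
# Values taken from config.json and train_config.json in this commit.
VOCAB_SIZE = 151694   # vocab_size
HIDDEN = 510          # hidden_size / n_embd
INTERMEDIATE = 2040   # intermediate_size
N_LAYERS = 2          # num_hidden_layers / n_layer
N_HEADS = 3           # num_attention_heads / n_head
N_KV_HEADS = 3        # num_key_value_heads
HEAD_DIM = 128        # head_dim
REPORTED_N_PARAMS = 162540062  # n_params


def count_params():
    # Assumption: untied input embedding and output head, no bias terms.
    embed = VOCAB_SIZE * HIDDEN
    lm_head = VOCAB_SIZE * HIDDEN
    # Assumption: attention uses q/k/v/o projections sized by head_dim.
    q = HIDDEN * N_HEADS * HEAD_DIM
    k = HIDDEN * N_KV_HEADS * HEAD_DIM
    v = HIDDEN * N_KV_HEADS * HEAD_DIM
    o = N_HEADS * HEAD_DIM * HIDDEN
    # Assumption: one RMSNorm weight of size head_dim each for q and k.
    qk_norm = 2 * HEAD_DIM
    # Assumption: gated MLP with gate, up and down projections.
    mlp = 3 * HIDDEN * INTERMEDIATE
    # Two RMSNorm weight vectors per layer (before attention, before MLP).
    layer_norms = 2 * HIDDEN
    per_layer = q + k + v + o + qk_norm + mlp + layer_norms
    final_norm = HIDDEN
    return embed + lm_head + N_LAYERS * per_layer + final_norm


total = count_params()
print(total, total == REPORTED_N_PARAMS)
```

Under these assumptions the two embedding matrices account for 154,727,880 of the parameters, and the total agrees with the reported `n_params`.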