[TRAIN] Qwen2.5-0.5B-Instruct / yelp_polarity
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.7303, 'grad_norm': 4.7769694328308105, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.7849, 'grad_norm': 0.8657419085502625, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 2.2527, 'grad_norm': 0.6747444272041321, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 17.0597, 'train_samples_per_second': 46.894, 'train_steps_per_second': 5.862, 'train_loss': 2.5282428288459777, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / yelp_polarity
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.1805, 'grad_norm': 2.5839781761169434, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.741, 'grad_norm': 0.8343231678009033, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 2.1358, 'grad_norm': 0.7408742308616638, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 23.9512, 'train_samples_per_second': 33.401, 'train_steps_per_second': 4.175, 'train_loss': 2.452775387763977, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / yahoo_topics
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.5828, 'grad_norm': 7.462729454040527, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.69, 'grad_norm': 1.3501135110855103, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.6278, 'grad_norm': 0.7101731300354004, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 11.1499, 'train_samples_per_second': 71.749, 'train_steps_per_second': 8.969, 'train_loss': 1.1778222179412843, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / yahoo_topics
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.0504, 'grad_norm': 3.948367118835449, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.6769, 'grad_norm': 0.8289307951927185, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.719, 'grad_norm': 0.7602887153625488, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 15.0409, 'train_samples_per_second': 53.188, 'train_steps_per_second': 6.649, 'train_loss': 1.2217226791381837, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / setfit_qnli
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.6959, 'grad_norm': 7.206658840179443, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1738, 'grad_norm': 2.3098304271698, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.246, 'grad_norm': 0.8799184560775757, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.537, 'train_samples_per_second': 59.097, 'train_steps_per_second': 7.387, 'train_loss': 1.7251459455490112, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / setfit_qnli
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.3207, 'grad_norm': 3.4501419067382812, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.3036, 'grad_norm': 1.5148175954818726, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.3368, 'grad_norm': 0.8041808605194092, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 19.2885, 'train_samples_per_second': 41.476, 'train_steps_per_second': 5.184, 'train_loss': 1.840370125770569, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / setfit_mnli
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.7572, 'grad_norm': 7.329052925109863, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.0955, 'grad_norm': 1.332726001739502, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.1722, 'grad_norm': 1.0915731191635132, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 12.5607, 'train_samples_per_second': 63.691, 'train_steps_per_second': 7.961, 'train_loss': 1.6504256796836854, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / setfit_mnli
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.5361, 'grad_norm': 3.802908182144165, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.3165, 'grad_norm': 1.068376064300537, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.3286, 'grad_norm': 1.183143973350525, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 17.7105, 'train_samples_per_second': 45.171, 'train_steps_per_second': 5.646, 'train_loss': 1.8447417879104615, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / setfit_rte
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.7704, 'grad_norm': 6.176489353179932, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.245, 'grad_norm': 1.4168987274169922, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.5001, 'grad_norm': 0.8886827826499939, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 15.8692, 'train_samples_per_second': 50.412, 'train_steps_per_second': 6.301, 'train_loss': 1.8877661609649659, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / setfit_rte
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.3427, 'grad_norm': 3.0718681812286377, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.3075, 'grad_norm': 0.8789539337158203, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.5508, 'grad_norm': 0.8922062516212463, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 22.9748, 'train_samples_per_second': 34.821, 'train_steps_per_second': 4.353, 'train_loss': 1.9495039463043213, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / setfit_mrpc
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.5646, 'grad_norm': 6.482349872589111, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1609, 'grad_norm': 1.498671054840088, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.1077, 'grad_norm': 1.2300711870193481, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 12.675, 'train_samples_per_second': 63.116, 'train_steps_per_second': 7.89, 'train_loss': 1.6482944440841676, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / setfit_mrpc
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.1908, 'grad_norm': 3.27897572517395, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.2278, 'grad_norm': 1.3972417116165161, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.1679, 'grad_norm': 1.1495029926300049, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 17.4505, 'train_samples_per_second': 45.844, 'train_steps_per_second': 5.73, 'train_loss': 1.7174697971343995, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / setfit_qqp
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.2107, 'grad_norm': 9.564720153808594, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1536, 'grad_norm': 2.2898943424224854, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.9036, 'grad_norm': 1.9666653871536255, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 11.0988, 'train_samples_per_second': 72.08, 'train_steps_per_second': 9.01, 'train_loss': 1.5491774892807006, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / setfit_qqp
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.8175, 'grad_norm': 4.59446907043457, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.2373, 'grad_norm': 1.7829301357269287, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0236, 'grad_norm': 0.9115185141563416, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 14.7927, 'train_samples_per_second': 54.081, 'train_steps_per_second': 6.76, 'train_loss': 1.656243953704834, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / snli_main
FAIL snli_main: 'NoneType' object is not subscriptable
[TRAIN] Qwen2.5-0.5B-Instruct / paws
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.5934, 'grad_norm': 6.102503776550293, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.9913, 'grad_norm': 2.3907556533813477, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.9893, 'grad_norm': 1.0990138053894043, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 12.8257, 'train_samples_per_second': 62.375, 'train_steps_per_second': 7.797, 'train_loss': 1.5063319826126098, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / paws
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.1323, 'grad_norm': 3.2562813758850098, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.0692, 'grad_norm': 1.2330275774002075, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0592, 'grad_norm': 1.0995728969573975, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 18.2495, 'train_samples_per_second': 43.837, 'train_steps_per_second': 5.48, 'train_loss': 1.5848482275009155, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / mteb_emo
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.8069, 'grad_norm': 8.230766296386719, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.8378, 'grad_norm': 1.1878981590270996, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.8893, 'grad_norm': 0.9681587219238281, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 10.9856, 'train_samples_per_second': 72.823, 'train_steps_per_second': 9.103, 'train_loss': 1.3832484269142151, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / mteb_emo
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.3094, 'grad_norm': 4.186059951782227, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.8931, 'grad_norm': 0.6450718641281128, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.9777, 'grad_norm': 0.8589568734169006, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 15.2721, 'train_samples_per_second': 52.383, 'train_steps_per_second': 6.548, 'train_loss': 1.4595746660232545, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / mteb_tweet_sent
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.174, 'grad_norm': 9.287677764892578, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1783, 'grad_norm': 1.6081738471984863, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0981, 'grad_norm': 1.098804235458374, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 10.0871, 'train_samples_per_second': 79.309, 'train_steps_per_second': 9.914, 'train_loss': 1.6581892824172975, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / mteb_tweet_sent
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.9047, 'grad_norm': 4.823108673095703, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.2931, 'grad_norm': 0.8228617906570435, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.1957, 'grad_norm': 1.1062841415405273, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.3568, 'train_samples_per_second': 59.895, 'train_steps_per_second': 7.487, 'train_loss': 1.770520305633545, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / mteb_toxic_conv
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.8327, 'grad_norm': 5.095354080200195, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.7047, 'grad_norm': 1.2102032899856567, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.8282, 'grad_norm': 0.7912783622741699, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 16.1364, 'train_samples_per_second': 49.577, 'train_steps_per_second': 6.197, 'train_loss': 2.277755131721497, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / mteb_toxic_conv
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.2113, 'grad_norm': 2.6563172340393066, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.6816, 'grad_norm': 0.9523308277130127, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.8255, 'grad_norm': 0.8748573660850525, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 23.3874, 'train_samples_per_second': 34.206, 'train_steps_per_second': 4.276, 'train_loss': 2.2688763523101807, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / mteb_amazon_cf
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.0523, 'grad_norm': 9.151971817016602, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1372, 'grad_norm': 1.455265998840332, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0694, 'grad_norm': 0.7700763940811157, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 11.0029, 'train_samples_per_second': 72.708, 'train_steps_per_second': 9.088, 'train_loss': 1.6224427938461303, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / mteb_amazon_cf
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.6138, 'grad_norm': 4.544278621673584, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1775, 'grad_norm': 0.9337619543075562, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.1355, 'grad_norm': 0.723482072353363, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 15.2983, 'train_samples_per_second': 52.293, 'train_steps_per_second': 6.537, 'train_loss': 1.6808272743225097, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / dair_emo_unsplit
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.8074, 'grad_norm': 8.336443901062012, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.8721, 'grad_norm': 1.2432242631912231, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.8747, 'grad_norm': 0.7010195851325989, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 11.1363, 'train_samples_per_second': 71.837, 'train_steps_per_second': 8.98, 'train_loss': 1.3927378106117247, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / dair_emo_unsplit
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.2904, 'grad_norm': 4.192126750946045, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.9423, 'grad_norm': 0.7681553959846497, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.9669, 'grad_norm': 0.8587647676467896, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 15.4846, 'train_samples_per_second': 51.664, 'train_steps_per_second': 6.458, 'train_loss': 1.4780806350708007, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / setfit_yelp_full
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.7862, 'grad_norm': 5.573526382446289, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.7039, 'grad_norm': 0.8480653166770935, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 2.1478, 'grad_norm': 0.7421320676803589, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 16.8957, 'train_samples_per_second': 47.349, 'train_steps_per_second': 5.919, 'train_loss': 2.436655189990997, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / setfit_yelp_full
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.1259, 'grad_norm': 2.8407199382781982, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.5924, 'grad_norm': 0.847999095916748, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 2.0286, 'grad_norm': 0.8495175838470459, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 24.3012, 'train_samples_per_second': 32.92, 'train_steps_per_second': 4.115, 'train_loss': 2.325834422111511, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / setfit_emotion
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.6796, 'grad_norm': 8.679558753967285, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.8993, 'grad_norm': 1.4098697900772095, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.8971, 'grad_norm': 0.7674059271812439, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 11.0804, 'train_samples_per_second': 72.2, 'train_steps_per_second': 9.025, 'train_loss': 1.415963158607483, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / setfit_emotion
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.2258, 'grad_norm': 4.511800765991211, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.9514, 'grad_norm': 0.8560807108879089, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.9838, 'grad_norm': 0.7192825078964233, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 15.4355, 'train_samples_per_second': 51.829, 'train_steps_per_second': 6.479, 'train_loss': 1.4903493642807006, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / setfit_sst5_alt
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.8562, 'grad_norm': 7.73358154296875, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.058, 'grad_norm': 1.4864909648895264, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.9958, 'grad_norm': 0.9336028695106506, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 11.1382, 'train_samples_per_second': 71.825, 'train_steps_per_second': 8.978, 'train_loss': 1.5448474097251892, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / setfit_sst5_alt
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.533, 'grad_norm': 4.299591541290283, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1324, 'grad_norm': 0.8781896829605103, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0859, 'grad_norm': 1.1681996583938599, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 14.9527, 'train_samples_per_second': 53.502, 'train_steps_per_second': 6.688, 'train_loss': 1.6331668090820313, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / setfit_student_qcat
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.222, 'grad_norm': 6.4504008293151855, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.7676, 'grad_norm': 1.245484471321106, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.9295, 'grad_norm': 0.9734070897102356, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 16.7493, 'train_samples_per_second': 47.763, 'train_steps_per_second': 5.97, 'train_loss': 1.3630782985687255, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / setfit_student_qcat
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.6534, 'grad_norm': 3.073930025100708, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.8414, 'grad_norm': 1.0740983486175537, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0049, 'grad_norm': 0.8595083355903625, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 23.9597, 'train_samples_per_second': 33.389, 'train_steps_per_second': 4.174, 'train_loss': 1.4412736368179322, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / setfit_movie_reviews
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.3526, 'grad_norm': 8.657792091369629, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1798, 'grad_norm': 1.4296244382858276, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.1388, 'grad_norm': 1.635110855102539, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 10.5597, 'train_samples_per_second': 75.76, 'train_steps_per_second': 9.47, 'train_loss': 1.681065092086792, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / setfit_movie_reviews
Repo card metadata block was not found. Setting CardData to empty.
WARNING:huggingface_hub.repocard:Repo card metadata block was not found. Setting CardData to empty.
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 5.0628, 'grad_norm': 4.693039894104004, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.3569, 'grad_norm': 1.6031795740127563, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.2367, 'grad_norm': 1.3692783117294312, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.7401, 'train_samples_per_second': 58.224, 'train_steps_per_second': 7.278, 'train_loss': 1.8238403606414795, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / sms_spam
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.6678, 'grad_norm': 9.387018203735352, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.5725, 'grad_norm': 2.320296049118042, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.3857, 'grad_norm': 0.8593599200248718, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 11.7612, 'train_samples_per_second': 68.02, 'train_steps_per_second': 8.503, 'train_loss': 2.0000625324249266, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / sms_spam
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 5.2133, 'grad_norm': 4.438594818115234, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.5525, 'grad_norm': 0.9178706407546997, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.4744, 'grad_norm': 0.9547106027603149, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 15.3862, 'train_samples_per_second': 51.995, 'train_steps_per_second': 6.499, 'train_loss': 2.040070538520813, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / snips
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.7415, 'grad_norm': 6.773946762084961, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.024390243902439025}
{'train_runtime': 5.0066, 'train_samples_per_second': 65.514, 'train_steps_per_second': 8.189, 'train_loss': 2.3904829083419425, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / snips
{'loss': 4.2322, 'grad_norm': 3.6766297817230225, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.024390243902439025}
{'train_runtime': 6.6162, 'train_samples_per_second': 49.575, 'train_steps_per_second': 6.197, 'train_loss': 2.2383127794033144, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / toxic_chat
{'loss': 4.2878, 'grad_norm': 11.425216674804688, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1228, 'grad_norm': 1.2395150661468506, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0886, 'grad_norm': 0.9984734058380127, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 14.4319, 'train_samples_per_second': 55.433, 'train_steps_per_second': 6.929, 'train_loss': 1.627325358390808, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / toxic_chat
{'loss': 4.9473, 'grad_norm': 5.387019157409668, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.2314, 'grad_norm': 1.1496485471725464, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.2022, 'grad_norm': 1.3217898607254028, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 19.4465, 'train_samples_per_second': 41.138, 'train_steps_per_second': 5.142, 'train_loss': 1.7439907598495483, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / hate_offensive_lang
{'loss': 4.1901, 'grad_norm': 7.31283712387085, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.4641, 'grad_norm': 1.5787039995193481, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.4027, 'grad_norm': 0.8544707894325256, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 11.7338, 'train_samples_per_second': 68.179, 'train_steps_per_second': 8.522, 'train_loss': 1.9506223344802855, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / hate_offensive_lang
{'loss': 4.7699, 'grad_norm': 3.9460268020629883, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.4483, 'grad_norm': 1.2194733619689941, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.4445, 'grad_norm': 0.9832725524902344, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 15.3135, 'train_samples_per_second': 52.242, 'train_steps_per_second': 6.53, 'train_loss': 1.969619345664978, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / amazon_massive_scenario
{'loss': 3.7737, 'grad_norm': 6.923394203186035, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.8803, 'grad_norm': 3.135934829711914, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.3266, 'grad_norm': 0.7962337136268616, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 12.1362, 'train_samples_per_second': 65.919, 'train_steps_per_second': 8.24, 'train_loss': 1.1223828840255736, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / amazon_massive_scenario
{'loss': 4.1677, 'grad_norm': 3.5334181785583496, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.7375, 'grad_norm': 0.7452163100242615, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.4004, 'grad_norm': 0.6548231840133667, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 16.2556, 'train_samples_per_second': 49.214, 'train_steps_per_second': 6.152, 'train_loss': 1.0932469034194947, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / trec_fine
The repository for CogComp/trec contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/CogComp/trec.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N]
FAIL trec_fine: The repository for CogComp/trec contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/CogComp/trec.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
[TRAIN] Qwen2.5-0.5B-Instruct / ag_setfit
{'loss': 3.999, 'grad_norm': 6.8160881996154785, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.2971, 'grad_norm': 1.6471203565597534, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
FAIL ag_setfit: Loading this dataset requires you to execute custom code contained in the dataset repository on your local machine. Please set the option `trust_remote_code=True` to permit loading of this dataset.
[TRAIN] Qwen2.5-0.5B-Instruct / cardiffnlp_topic
FAIL cardiffnlp_topic: 'train'
[TRAIN] Qwen2.5-0.5B-Instruct / financial_pb_pos
The repository for takala/financial_phrasebank contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/takala/financial_phrasebank.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N]
FAIL financial_pb_pos: The repository for takala/financial_phrasebank contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/takala/financial_phrasebank.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
[TRAIN] Qwen2.5-0.5B-Instruct / clinc_small_skip
FAIL clinc_small_skip: 'NoneType' object is not iterable
[TRAIN] Qwen2.5-0.5B-Instruct / hate_speech18
The repository for hate_speech18 contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/hate_speech18.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N]
FAIL hate_speech18: The repository for hate_speech18 contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/hate_speech18.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
[TRAIN] Qwen2.5-0.5B-Instruct / rotten_alt
{'loss': 4.433, 'grad_norm': 9.2361478805542, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.2875, 'grad_norm': 1.727787733078003, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.2166, 'grad_norm': 0.9453311562538147, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 10.2424, 'train_samples_per_second': 78.106, 'train_steps_per_second': 9.763, 'train_loss': 1.7735167932510376, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / rotten_alt
{'loss': 5.0912, 'grad_norm': 4.759712219238281, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.4038, 'grad_norm': 1.4830732345581055, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.2859, 'grad_norm': 1.2477036714553833, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.9507, 'train_samples_per_second': 57.345, 'train_steps_per_second': 7.168, 'train_loss': 1.871756911277771, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / yelp_polarity_test
{'loss': 3.7862, 'grad_norm': 5.573526382446289, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.7039, 'grad_norm': 0.8480653166770935, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 2.1478, 'grad_norm': 0.7421320676803589, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 17.1604, 'train_samples_per_second': 46.619, 'train_steps_per_second': 5.827, 'train_loss': 2.436655189990997, 'epoch': 1.0}
[TRAIN] Llama-3.2-1B-Instruct / yelp_polarity_test
{'loss': 4.1259, 'grad_norm': 2.8407199382781982, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.5924, 'grad_norm': 0.847999095916748, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 2.0286, 'grad_norm': 0.8495175838470459, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 24.3883, 'train_samples_per_second': 32.803, 'train_steps_per_second': 4.1, 'train_loss': 2.325834422111511, 'epoch': 1.0}
[TRAIN] Qwen2.5-0.5B-Instruct / dynasent_r1
The repository for dynabench/dynasent contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/dynabench/dynasent.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N]
FAIL dynasent_r1: The repository for dynabench/dynasent contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/dynabench/dynasent.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
[TRAIN] Qwen2.5-0.5B-Instruct / dynasent_r2
The repository for dynabench/dynasent contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/dynabench/dynasent.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N]
FAIL dynasent_r2: The repository for dynabench/dynasent contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/dynabench/dynasent.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.


FAILED: ['snli_main', 'trec_fine', 'ag_setfit', 'cardiffnlp_topic', 'financial_pb_pos', 'clinc_small_skip', 'hate_speech18', 'dynasent_r1', 'dynasent_r2']
Successfully trained 25 new anchors
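
The per-run results and failures in the log above can be tabulated programmatically. The following is a minimal sketch, assuming only the line shapes visible in this log (`[TRAIN] <model> / <dataset>` headers, Trainer-style dict lines, and `FAIL <dataset>: <reason>` lines). The name `summarize_log` and both regular expressions are hypothetical helpers, not part of the script that produced the log.

```python
import ast
import re

# Assumed line shapes, taken from the log above (hypothetical patterns).
TRAIN_RE = re.compile(r"^\[TRAIN\] (?P<model>\S+) / (?P<dataset>\S+)$")
# search() is used because a FAIL message can follow an interactive prompt
# on the same line; "FAILED: [...]" does not match since "FAIL" must be
# followed by a space.
FAIL_RE = re.compile(r"FAIL (?P<dataset>\w+): (?P<reason>.*)$")


def summarize_log(lines):
    """Return (results, failures) parsed from the training log lines.

    results maps (model, dataset) to the final reported train_loss;
    failures maps dataset to the first reported failure reason.
    """
    results = {}
    failures = {}
    current = None  # (model, dataset) of the run being parsed
    for raw in lines:
        line = raw.strip()
        header = TRAIN_RE.match(line)
        if header:
            current = (header["model"], header["dataset"])
            continue
        failed = FAIL_RE.search(line)
        if failed:
            failures.setdefault(failed["dataset"], failed["reason"])
            continue
        # Only the end-of-run summary dict carries 'train_runtime'.
        if current and line.startswith("{") and "train_runtime" in line:
            record = ast.literal_eval(line)
            results[current] = record["train_loss"]
    return results, failures


if __name__ == "__main__":
    sample = [
        "[TRAIN] Qwen2.5-0.5B-Instruct / snips",
        "{'train_runtime': 5.0066, 'train_samples_per_second': 65.514, "
        "'train_steps_per_second': 8.189, 'train_loss': 2.3904829083419425, "
        "'epoch': 1.0}",
        "[TRAIN] Qwen2.5-0.5B-Instruct / cardiffnlp_topic",
        "FAIL cardiffnlp_topic: 'train'",
    ]
    results, failures = summarize_log(sample)
    for (model, dataset), loss in results.items():
        print(f"{model:28s} {dataset:24s} train_loss={loss:.4f}")
    for dataset, reason in failures.items():
        print(f"FAILED {dataset}: {reason}")
```

Because a failed dataset never emits a `train_runtime` summary line, runs such as `ag_setfit` (which logged two loss steps before failing) appear only in `failures`, not in `results`.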