Device: cuda
Loading tokenizer: /tmp/eval/multilingual_32k.model
Loading base model: /tmp/eval/best_model.pt
Model loaded: 3.04B parameters
Loading SFT data from: /tmp/sft_data_v2
Train: 3949348 tokens, Val: 201020 tokens
Using 8-bit AdamW (bitsandbytes)

Starting SFT training for 4000 steps...
Batch size: 1 x 4 accum = 4 effective, Seq len: 2048, LR: 2e-05
Step 10/4000 | Loss: 2.3791 | LR: 0.000001 | TPS: 1196 | 68s
Step 20/4000 | Loss: 2.5346 | LR: 0.000002 | TPS: 1418 | 116s
Step 30/4000 | Loss: 2.7910 | LR: 0.000003 | TPS: 1511 | 163s
Step 40/4000 | Loss: 2.5189 | LR: 0.000004 | TPS: 1562 | 210s
Step 50/4000 | Loss: 2.5049 | LR: 0.000005 | TPS: 1594 | 257s
Step 60/4000 | Loss: 2.5417 | LR: 0.000006 | TPS: 1616 | 304s
Step 70/4000 | Loss: 2.2374 | LR: 0.000007 | TPS: 1633 | 351s
Step 80/4000 | Loss: 2.5328 | LR: 0.000008 | TPS: 1645 | 398s
Step 90/4000 | Loss: 2.5359 | LR: 0.000009 | TPS: 1655 | 445s
Step 100/4000 | Loss: 2.4830 | LR: 0.000010 | TPS: 1663 | 493s
Step 110/4000 | Loss: 2.3015 | LR: 0.000011 | TPS: 1669 | 540s
Step 120/4000 | Loss: 2.4667 | LR: 0.000012 | TPS: 1675 | 587s
Step 130/4000 | Loss: 2.3792 | LR: 0.000013 | TPS: 1680 | 634s
Step 140/4000 | Loss: 2.3918 | LR: 0.000014 | TPS: 1684 | 681s
Step 150/4000 | Loss: 2.3368 | LR: 0.000015 | TPS: 1687 | 728s
Step 160/4000 | Loss: 2.4838 | LR: 0.000016 | TPS: 1690 | 775s
Step 170/4000 | Loss: 2.3578 | LR: 0.000017 | TPS: 1693 | 823s
Step 180/4000 | Loss: 2.5485 | LR: 0.000018 | TPS: 1695 | 870s
Step 190/4000 | Loss: 2.0834 | LR: 0.000019 | TPS: 1698 | 917s
Step 200/4000 | Loss: 1.9784 | LR: 0.000020 | TPS: 1699 | 964s
Step 210/4000 | Loss: 2.4826 | LR: 0.000020 | TPS: 1701 | 1011s
Step 220/4000 | Loss: 2.3540 | LR: 0.000020 | TPS: 1703 | 1058s
Step 230/4000 | Loss: 2.2093 | LR: 0.000020 | TPS: 1704 | 1105s
Step 240/4000 | Loss: 2.2137 | LR: 0.000020 | TPS: 1706 | 1153s
Step 250/4000 | Loss: 2.2151 | LR: 0.000020 | TPS: 1707 | 1200s
Step 260/4000 | Loss: 2.2535 | LR: 0.000020 | TPS: 1708 | 1247s
Step 270/4000 | Loss: 2.2235 | LR: 0.000020 | TPS: 1709 | 1294s
Step 280/4000 | Loss: 2.0449 | LR: 0.000020 | TPS: 1710 | 1341s
Step 290/4000 | Loss: 2.1502 | LR: 0.000020 | TPS: 1711 | 1388s
Step 300/4000 | Loss: 2.3716 | LR: 0.000020 | TPS: 1712 | 1435s
Step 310/4000 | Loss: 2.1591 | LR: 0.000020 | TPS: 1713 | 1483s
Step 320/4000 | Loss: 2.2153 | LR: 0.000020 | TPS: 1714 | 1530s
Step 330/4000 | Loss: 2.2023 | LR: 0.000020 | TPS: 1714 | 1577s
Step 340/4000 | Loss: 2.3968 | LR: 0.000020 | TPS: 1715 | 1624s
Step 350/4000 | Loss: 2.1146 | LR: 0.000020 | TPS: 1716 | 1671s
Step 360/4000 | Loss: 2.1857 | LR: 0.000020 | TPS: 1716 | 1718s
Step 370/4000 | Loss: 2.1965 | LR: 0.000020 | TPS: 1717 | 1765s
Step 380/4000 | Loss: 2.1613 | LR: 0.000020 | TPS: 1717 | 1813s
Step 390/4000 | Loss: 2.3080 | LR: 0.000020 | TPS: 1718 | 1860s
Step 400/4000 | Loss: 2.2964 | LR: 0.000020 | TPS: 1718 | 1907s
  📊 Val loss: 2.2256 (NEW BEST!)
  💾 Best model saved to /tmp/sft/sft_model_v2.pt
Step 410/4000 | Loss: 2.2859 | LR: 0.000020 | TPS: 1703 | 1973s
Step 420/4000 | Loss: 2.1711 | LR: 0.000020 | TPS: 1703 | 2020s
Step 430/4000 | Loss: 2.1434 | LR: 0.000020 | TPS: 1704 | 2067s
Step 440/4000 | Loss: 2.2115 | LR: 0.000020 | TPS: 1705 | 2114s
Step 450/4000 | Loss: 2.2985 | LR: 0.000020 | TPS: 1706 | 2161s
Step 460/4000 | Loss: 1.9845 | LR: 0.000020 | TPS: 1707 | 2208s
Step 470/4000 | Loss: 2.3135 | LR: 0.000020 | TPS: 1707 | 2255s
Step 480/4000 | Loss: 2.3004 | LR: 0.000020 | TPS: 1708 | 2302s
Step 490/4000 | Loss: 2.1841 | LR: 0.000020 | TPS: 1709 | 2349s
Step 500/4000 | Loss: 2.3647 | LR: 0.000020 | TPS: 1709 | 2396s
Step 510/4000 | Loss: 2.1587 | LR: 0.000020 | TPS: 1710 | 2443s
Step 520/4000 | Loss: 2.0790 | LR: 0.000020 | TPS: 1711 | 2490s
Step 530/4000 | Loss: 2.0842 | LR: 0.000020 | TPS: 1711 | 2537s
Step 540/4000 | Loss: 2.4031 | LR: 0.000020 | TPS: 1712 | 2584s
Step 550/4000 | Loss: 2.3037 | LR: 0.000020 | TPS: 1712 | 2632s
Step 560/4000 | Loss: 2.2433 | LR: 0.000020 | TPS: 1713 | 2679s
Step 570/4000 | Loss: 2.1670 | LR: 0.000020 | TPS: 1713 | 2726s
Step 580/4000 | Loss: 2.1579 | LR: 0.000020 | TPS: 1714 | 2773s
Step 590/4000 | Loss: 1.9392 | LR: 0.000020 | TPS: 1714 | 2820s
Step 600/4000 | Loss: 2.1226 | LR: 0.000020 | TPS: 1715 | 2867s
Step 610/4000 | Loss: 2.2641 | LR: 0.000019 | TPS: 1715 | 2914s
Step 620/4000 | Loss: 2.0771 | LR: 0.000019 | TPS: 1715 | 2961s
Step 630/4000 | Loss: 2.4527 | LR: 0.000019 | TPS: 1716 | 3008s
Step 640/4000 | Loss: 2.2605 | LR: 0.000019 | TPS: 1716 | 3055s
Step 650/4000 | Loss: 1.9801 | LR: 0.000019 | TPS: 1717 | 3102s
Step 660/4000 | Loss: 2.4208 | LR: 0.000019 | TPS: 1717 | 3149s
Step 670/4000 | Loss: 2.3331 | LR: 0.000019 | TPS: 1717 | 3196s
Step 680/4000 | Loss: 2.1299 | LR: 0.000019 | TPS: 1718 | 3243s
Step 690/4000 | Loss: 2.1551 | LR: 0.000019 | TPS: 1718 | 3290s
Step 700/4000 | Loss: 2.0940 | LR: 0.000019 | TPS: 1718 | 3337s
Step 710/4000 | Loss: 2.0533 | LR: 0.000019 | TPS: 1719 | 3384s
Step 720/4000 | Loss: 2.2076 | LR: 0.000019 | TPS: 1719 | 3431s
Step 730/4000 | Loss: 1.9816 | LR: 0.000019 | TPS: 1719 | 3478s
Step 740/4000 | Loss: 2.1420 | LR: 0.000019 | TPS: 1719 | 3526s
Step 750/4000 | Loss: 2.2928 | LR: 0.000019 | TPS: 1720 | 3573s
Step 760/4000 | Loss: 2.1035 | LR: 0.000019 | TPS: 1720 | 3620s
Step 770/4000 | Loss: 2.1663 | LR: 0.000019 | TPS: 1720 | 3667s
Step 780/4000 | Loss: 2.2270 | LR: 0.000019 | TPS: 1721 | 3714s
Step 790/4000 | Loss: 2.1436 | LR: 0.000019 | TPS: 1721 | 3761s
Step 800/4000 | Loss: 2.3599 | LR: 0.000019 | TPS: 1721 | 3808s
  📊 Val loss: 2.1960 (NEW BEST!)
  💾 Best model saved to /tmp/sft/sft_model_v2.pt
Step 810/4000 | Loss: 2.2325 | LR: 0.000019 | TPS: 1696 | 3912s
Step 820/4000 | Loss: 2.0798 | LR: 0.000019 | TPS: 1696 | 3960s
Step 830/4000 | Loss: 2.1527 | LR: 0.000019 | TPS: 1697 | 4007s
Step 840/4000 | Loss: 2.2046 | LR: 0.000019 | TPS: 1697 | 4054s
Step 850/4000 | Loss: 2.0648 | LR: 0.000019 | TPS: 1698 | 4101s
Step 860/4000 | Loss: 2.1708 | LR: 0.000019 | TPS: 1698 | 4148s
Step 870/4000 | Loss: 2.3088 | LR: 0.000019 | TPS: 1699 | 4195s
Step 880/4000 | Loss: 1.9936 | LR: 0.000019 | TPS: 1699 | 4242s
Step 890/4000 | Loss: 2.1869 | LR: 0.000019 | TPS: 1700 | 4290s
Step 900/4000 | Loss: 2.4199 | LR: 0.000019 | TPS: 1700 | 4337s
Step 910/4000 | Loss: 2.3803 | LR: 0.000018 | TPS: 1700 | 4384s
Step 920/4000 | Loss: 2.0193 | LR: 0.000018 | TPS: 1701 | 4431s
Step 930/4000 | Loss: 2.1047 | LR: 0.000018 | TPS: 1701 | 4478s
Step 940/4000 | Loss: 2.1449 | LR: 0.000018 | TPS: 1702 | 4525s
Step 950/4000 | Loss: 2.1521 | LR: 0.000018 | TPS: 1702 | 4572s
Step 960/4000 | Loss: 2.2820 | LR: 0.000018 | TPS: 1702 | 4620s
Step 970/4000 | Loss: 2.2996 | LR: 0.000018 | TPS: 1703 | 4667s
Step 980/4000 | Loss: 2.3187 | LR: 0.000018 | TPS: 1703 | 4714s
Step 990/4000 | Loss: 2.1756 | LR: 0.000018 | TPS: 1703 | 4761s
Step 1000/4000 | Loss: 1.9765 | LR: 0.000018 | TPS: 1704 | 4808s

  🔤 Generation samples (step 1000):
    [EN] The capital of France is located in Normandy.
    [HE] מלזיה.
    [AR] باريس.
    [FA] پاریس یکی از شهرهای بزرگ و تاریخی جهان است که دارای جاذبه های طبیعی، فرهنگی و اقتصادی متعددی می باشد. شهر پاریس در غرب کشورمان قرار دارد و به عنوان یکی از مهم ترین مراکز تجاری و مالی دنیا شناخته شده ا
    [TRANSLATE] "תודה על הכול, אבא. אני כאן איתך בכל רגע נתון."

Step 1010/4000 | Loss: 2.1665 | LR: 0.000018 | TPS: 1703 | 4859s
Step 1020/4000 | Loss: 2.1047 | LR: 0.000018 | TPS: 1703 | 4906s
Step 1030/4000 | Loss: 2.2359 | LR: 0.000018 | TPS: 1704 | 4953s
Step 1040/4000 | Loss: 2.0109 | LR: 0.000018 | TPS: 1704 | 5000s
Step 1050/4000 | Loss: 2.1515 | LR: 0.000018 | TPS: 1704 | 5047s
Step 1060/4000 | Loss: 2.0880 | LR: 0.000018 | TPS: 1705 | 5094s
Step 1070/4000 | Loss: 2.2460 | LR: 0.000018 | TPS: 1705 | 5142s
Step 1080/4000 | Loss: 1.9325 | LR: 0.000018 | TPS: 1705 | 5189s
Step 1090/4000 | Loss: 2.2283 | LR: 0.000018 | TPS: 1705 | 5236s
Step 1100/4000 | Loss: 2.3303 | LR: 0.000018 | TPS: 1706 | 5283s
Step 1110/4000 | Loss: 2.1772 | LR: 0.000018 | TPS: 1706 | 5330s
Step 1120/4000 | Loss: 2.1615 | LR: 0.000018 | TPS: 1706 | 5377s
Step 1130/4000 | Loss: 2.1470 | LR: 0.000017 | TPS: 1707 | 5424s
Step 1140/4000 | Loss: 1.9640 | LR: 0.000017 | TPS: 1707 | 5472s
Step 1150/4000 | Loss: 2.1891 | LR: 0.000017 | TPS: 1707 | 5519s
Step 1160/4000 | Loss: 2.2183 | LR: 0.000017 | TPS: 1707 | 5566s
Step 1170/4000 | Loss: 2.0268 | LR: 0.000017 | TPS: 1708 | 5613s
Step 1180/4000 | Loss: 2.2234 | LR: 0.000017 | TPS: 1708 | 5660s
Step 1190/4000 | Loss: 2.1961 | LR: 0.000017 | TPS: 1708 | 5707s
Step 1200/4000 | Loss: 2.2019 | LR: 0.000017 | TPS: 1708 | 5754s
  📊 Val loss: 2.2238 
Step 1210/4000 | Loss: 2.0809 | LR: 0.000017 | TPS: 1707 | 5807s
Step 1220/4000 | Loss: 2.1716 | LR: 0.000017 | TPS: 1707 | 5854s
Step 1230/4000 | Loss: 2.2607 | LR: 0.000017 | TPS: 1707 | 5901s
Step 1240/4000 | Loss: 2.1838 | LR: 0.000017 | TPS: 1708 | 5949s
Step 1250/4000 | Loss: 2.0725 | LR: 0.000017 | TPS: 1708 | 5996s
Step 1260/4000 | Loss: 2.2797 | LR: 0.000017 | TPS: 1708 | 6043s
Step 1270/4000 | Loss: 2.0366 | LR: 0.000017 | TPS: 1708 | 6090s
Step 1280/4000 | Loss: 2.1469 | LR: 0.000017 | TPS: 1709 | 6137s
Step 1290/4000 | Loss: 2.1541 | LR: 0.000017 | TPS: 1709 | 6184s
Step 1300/4000 | Loss: 2.0311 | LR: 0.000017 | TPS: 1709 | 6231s
Step 1310/4000 | Loss: 2.1828 | LR: 0.000016 | TPS: 1709 | 6279s
Step 1320/4000 | Loss: 2.2004 | LR: 0.000016 | TPS: 1709 | 6326s
Step 1330/4000 | Loss: 2.2589 | LR: 0.000016 | TPS: 1710 | 6373s
Step 1340/4000 | Loss: 2.1475 | LR: 0.000016 | TPS: 1710 | 6420s
Step 1350/4000 | Loss: 2.1672 | LR: 0.000016 | TPS: 1710 | 6467s
Step 1360/4000 | Loss: 2.1921 | LR: 0.000016 | TPS: 1710 | 6514s
Step 1370/4000 | Loss: 2.0689 | LR: 0.000016 | TPS: 1710 | 6561s
Step 1380/4000 | Loss: 2.2560 | LR: 0.000016 | TPS: 1711 | 6609s
Step 1390/4000 | Loss: 1.9519 | LR: 0.000016 | TPS: 1711 | 6656s
Step 1400/4000 | Loss: 1.9671 | LR: 0.000016 | TPS: 1711 | 6703s
Step 1410/4000 | Loss: 2.1535 | LR: 0.000016 | TPS: 1711 | 6750s
Step 1420/4000 | Loss: 2.1726 | LR: 0.000016 | TPS: 1711 | 6797s
Step 1430/4000 | Loss: 2.0854 | LR: 0.000016 | TPS: 1712 | 6844s
Step 1440/4000 | Loss: 2.0955 | LR: 0.000016 | TPS: 1712 | 6891s
Step 1450/4000 | Loss: 2.1260 | LR: 0.000016 | TPS: 1712 | 6939s
Step 1460/4000 | Loss: 2.2860 | LR: 0.000016 | TPS: 1712 | 6986s
Step 1470/4000 | Loss: 1.6098 | LR: 0.000015 | TPS: 1712 | 7033s
Step 1480/4000 | Loss: 2.1327 | LR: 0.000015 | TPS: 1712 | 7080s
Step 1490/4000 | Loss: 2.0506 | LR: 0.000015 | TPS: 1713 | 7127s
Step 1500/4000 | Loss: 2.0568 | LR: 0.000015 | TPS: 1713 | 7174s
Step 1510/4000 | Loss: 2.0177 | LR: 0.000015 | TPS: 1713 | 7221s
Step 1520/4000 | Loss: 2.0383 | LR: 0.000015 | TPS: 1713 | 7269s
Step 1530/4000 | Loss: 2.0994 | LR: 0.000015 | TPS: 1713 | 7316s
Step 1540/4000 | Loss: 2.0863 | LR: 0.000015 | TPS: 1713 | 7363s
Step 1550/4000 | Loss: 2.3287 | LR: 0.000015 | TPS: 1714 | 7410s
Step 1560/4000 | Loss: 2.1585 | LR: 0.000015 | TPS: 1714 | 7457s
Step 1570/4000 | Loss: 1.9781 | LR: 0.000015 | TPS: 1714 | 7504s
Step 1580/4000 | Loss: 1.9344 | LR: 0.000015 | TPS: 1714 | 7551s
Step 1590/4000 | Loss: 2.1031 | LR: 0.000015 | TPS: 1714 | 7599s
Step 1600/4000 | Loss: 2.2633 | LR: 0.000015 | TPS: 1714 | 7646s
  📊 Val loss: 2.1164 (NEW BEST!)
  💾 Best model saved to /tmp/sft/sft_model_v2.pt
Step 1610/4000 | Loss: 2.0217 | LR: 0.000015 | TPS: 1702 | 7750s
Step 1620/4000 | Loss: 2.0437 | LR: 0.000014 | TPS: 1702 | 7797s
Step 1630/4000 | Loss: 2.3588 | LR: 0.000014 | TPS: 1702 | 7844s
Step 1640/4000 | Loss: 2.1927 | LR: 0.000014 | TPS: 1702 | 7892s
Step 1650/4000 | Loss: 1.9298 | LR: 0.000014 | TPS: 1703 | 7939s
Step 1660/4000 | Loss: 2.1604 | LR: 0.000014 | TPS: 1703 | 7986s
Step 1670/4000 | Loss: 2.0326 | LR: 0.000014 | TPS: 1703 | 8033s
Step 1680/4000 | Loss: 2.1872 | LR: 0.000014 | TPS: 1703 | 8080s
Step 1690/4000 | Loss: 2.0633 | LR: 0.000014 | TPS: 1703 | 8127s
Step 1700/4000 | Loss: 2.2547 | LR: 0.000014 | TPS: 1704 | 8174s
Step 1710/4000 | Loss: 1.8940 | LR: 0.000014 | TPS: 1704 | 8221s
Step 1720/4000 | Loss: 2.0726 | LR: 0.000014 | TPS: 1704 | 8269s
Step 1730/4000 | Loss: 2.0857 | LR: 0.000014 | TPS: 1704 | 8316s
Step 1740/4000 | Loss: 2.0686 | LR: 0.000014 | TPS: 1704 | 8363s
Step 1750/4000 | Loss: 2.1306 | LR: 0.000014 | TPS: 1705 | 8410s
Step 1760/4000 | Loss: 2.0932 | LR: 0.000013 | TPS: 1705 | 8457s
Step 1770/4000 | Loss: 2.0751 | LR: 0.000013 | TPS: 1705 | 8504s
Step 1780/4000 | Loss: 2.1802 | LR: 0.000013 | TPS: 1705 | 8551s
Step 1790/4000 | Loss: 1.6657 | LR: 0.000013 | TPS: 1705 | 8599s
Step 1800/4000 | Loss: 2.1290 | LR: 0.000013 | TPS: 1706 | 8646s
Step 1810/4000 | Loss: 2.1032 | LR: 0.000013 | TPS: 1706 | 8693s
Step 1820/4000 | Loss: 2.1255 | LR: 0.000013 | TPS: 1706 | 8740s
Step 1830/4000 | Loss: 2.1091 | LR: 0.000013 | TPS: 1706 | 8787s
Step 1840/4000 | Loss: 1.9875 | LR: 0.000013 | TPS: 1706 | 8834s
Step 1850/4000 | Loss: 1.9615 | LR: 0.000013 | TPS: 1706 | 8881s
Step 1860/4000 | Loss: 2.0189 | LR: 0.000013 | TPS: 1707 | 8929s
Step 1870/4000 | Loss: 2.1387 | LR: 0.000013 | TPS: 1707 | 8976s
Step 1880/4000 | Loss: 2.0963 | LR: 0.000013 | TPS: 1707 | 9023s
Step 1890/4000 | Loss: 2.1750 | LR: 0.000013 | TPS: 1707 | 9070s
Step 1900/4000 | Loss: 2.3945 | LR: 0.000012 | TPS: 1707 | 9117s
Step 1910/4000 | Loss: 2.1515 | LR: 0.000012 | TPS: 1707 | 9164s
Step 1920/4000 | Loss: 2.2224 | LR: 0.000012 | TPS: 1708 | 9211s
Step 1930/4000 | Loss: 2.3160 | LR: 0.000012 | TPS: 1708 | 9259s
Step 1940/4000 | Loss: 2.0126 | LR: 0.000012 | TPS: 1708 | 9306s
Step 1950/4000 | Loss: 2.2443 | LR: 0.000012 | TPS: 1708 | 9353s
Step 1960/4000 | Loss: 1.9590 | LR: 0.000012 | TPS: 1708 | 9400s
Step 1970/4000 | Loss: 2.2280 | LR: 0.000012 | TPS: 1708 | 9447s
Step 1980/4000 | Loss: 1.9723 | LR: 0.000012 | TPS: 1708 | 9494s
Step 1990/4000 | Loss: 2.0697 | LR: 0.000012 | TPS: 1709 | 9541s
Step 2000/4000 | Loss: 2.0568 | LR: 0.000012 | TPS: 1709 | 9589s
  📊 Val loss: 2.1674 

  🔤 Generation samples (step 2000):
    [EN] Paris (pronounced "Paris") is a city located in northeastern France. It borders Germany to the east, with Belgium and Luxembourg as its easternmost provinces.
    [HE] בצרפת, העיר העתיקה היא אזור התיירות העיקרי.
    [AR] باريس
    [FA] پاریس، پایتخت کشور فرانسه است.
    [TRANSLATE] The answer is YES.

Step 2010/4000 | Loss: 1.9474 | LR: 0.000012 | TPS: 1708 | 9643s
Step 2020/4000 | Loss: 2.1131 | LR: 0.000012 | TPS: 1708 | 9690s
Step 2030/4000 | Loss: 2.0446 | LR: 0.000012 | TPS: 1708 | 9737s
Step 2040/4000 | Loss: 2.2229 | LR: 0.000011 | TPS: 1708 | 9784s
Step 2050/4000 | Loss: 2.1576 | LR: 0.000011 | TPS: 1708 | 9832s
Step 2060/4000 | Loss: 2.1899 | LR: 0.000011 | TPS: 1708 | 9879s
Step 2070/4000 | Loss: 2.0957 | LR: 0.000011 | TPS: 1708 | 9926s
Step 2080/4000 | Loss: 2.2643 | LR: 0.000011 | TPS: 1709 | 9973s
Step 2090/4000 | Loss: 2.0676 | LR: 0.000011 | TPS: 1709 | 10020s
Step 2100/4000 | Loss: 2.1386 | LR: 0.000011 | TPS: 1709 | 10067s
Step 2110/4000 | Loss: 2.1891 | LR: 0.000011 | TPS: 1709 | 10114s
Step 2120/4000 | Loss: 1.9532 | LR: 0.000011 | TPS: 1709 | 10162s
Step 2130/4000 | Loss: 1.9766 | LR: 0.000011 | TPS: 1709 | 10209s
Step 2140/4000 | Loss: 2.3656 | LR: 0.000011 | TPS: 1709 | 10256s
Step 2150/4000 | Loss: 2.0545 | LR: 0.000011 | TPS: 1709 | 10303s
Step 2160/4000 | Loss: 1.9706 | LR: 0.000011 | TPS: 1710 | 10350s
Step 2170/4000 | Loss: 2.0302 | LR: 0.000010 | TPS: 1710 | 10397s
Step 2180/4000 | Loss: 2.1752 | LR: 0.000010 | TPS: 1710 | 10444s
Step 2190/4000 | Loss: 2.1455 | LR: 0.000010 | TPS: 1710 | 10492s
Step 2200/4000 | Loss: 2.2238 | LR: 0.000010 | TPS: 1710 | 10539s
Step 2210/4000 | Loss: 2.1010 | LR: 0.000010 | TPS: 1710 | 10586s
Step 2220/4000 | Loss: 2.1831 | LR: 0.000010 | TPS: 1710 | 10633s
Step 2230/4000 | Loss: 1.6542 | LR: 0.000010 | TPS: 1710 | 10680s
Step 2240/4000 | Loss: 2.1102 | LR: 0.000010 | TPS: 1711 | 10727s
Step 2250/4000 | Loss: 2.2099 | LR: 0.000010 | TPS: 1711 | 10774s
Step 2260/4000 | Loss: 2.1750 | LR: 0.000010 | TPS: 1711 | 10821s
Step 2270/4000 | Loss: 2.2369 | LR: 0.000010 | TPS: 1711 | 10869s
Step 2280/4000 | Loss: 2.0393 | LR: 0.000010 | TPS: 1711 | 10916s
Step 2290/4000 | Loss: 2.3140 | LR: 0.000010 | TPS: 1711 | 10963s
Step 2300/4000 | Loss: 2.0601 | LR: 0.000010 | TPS: 1711 | 11010s
Step 2310/4000 | Loss: 2.1472 | LR: 0.000009 | TPS: 1711 | 11057s
Step 2320/4000 | Loss: 2.0987 | LR: 0.000009 | TPS: 1712 | 11104s
Step 2330/4000 | Loss: 2.0354 | LR: 0.000009 | TPS: 1712 | 11152s
Step 2340/4000 | Loss: 1.9309 | LR: 0.000009 | TPS: 1712 | 11199s
Step 2350/4000 | Loss: 2.1222 | LR: 0.000009 | TPS: 1712 | 11246s
Step 2360/4000 | Loss: 1.9861 | LR: 0.000009 | TPS: 1712 | 11293s
Step 2370/4000 | Loss: 2.1986 | LR: 0.000009 | TPS: 1712 | 11340s
Step 2380/4000 | Loss: 2.0335 | LR: 0.000009 | TPS: 1712 | 11387s
Step 2390/4000 | Loss: 2.2123 | LR: 0.000009 | TPS: 1712 | 11434s
Step 2400/4000 | Loss: 2.0287 | LR: 0.000009 | TPS: 1712 | 11482s
  📊 Val loss: 2.1943 
Step 2410/4000 | Loss: 2.0483 | LR: 0.000009 | TPS: 1712 | 11534s
Step 2420/4000 | Loss: 2.0710 | LR: 0.000009 | TPS: 1712 | 11581s
Step 2430/4000 | Loss: 2.3005 | LR: 0.000009 | TPS: 1712 | 11629s
Step 2440/4000 | Loss: 2.0617 | LR: 0.000009 | TPS: 1712 | 11676s
Step 2450/4000 | Loss: 2.2063 | LR: 0.000008 | TPS: 1712 | 11723s
Step 2460/4000 | Loss: 2.0405 | LR: 0.000008 | TPS: 1712 | 11770s
Step 2470/4000 | Loss: 2.2280 | LR: 0.000008 | TPS: 1712 | 11817s
Step 2480/4000 | Loss: 2.3856 | LR: 0.000008 | TPS: 1712 | 11864s
Step 2490/4000 | Loss: 1.9853 | LR: 0.000008 | TPS: 1712 | 11911s
Step 2500/4000 | Loss: 2.0673 | LR: 0.000008 | TPS: 1713 | 11959s
Step 2510/4000 | Loss: 2.1777 | LR: 0.000008 | TPS: 1713 | 12006s
Step 2520/4000 | Loss: 1.9846 | LR: 0.000008 | TPS: 1713 | 12053s
Step 2530/4000 | Loss: 2.1922 | LR: 0.000008 | TPS: 1713 | 12100s
Step 2540/4000 | Loss: 2.0542 | LR: 0.000008 | TPS: 1713 | 12147s
Step 2550/4000 | Loss: 2.1041 | LR: 0.000008 | TPS: 1713 | 12194s
Step 2560/4000 | Loss: 2.0099 | LR: 0.000008 | TPS: 1713 | 12241s
Step 2570/4000 | Loss: 1.8186 | LR: 0.000008 | TPS: 1713 | 12289s
Step 2580/4000 | Loss: 2.2079 | LR: 0.000008 | TPS: 1713 | 12336s
Step 2590/4000 | Loss: 1.9931 | LR: 0.000007 | TPS: 1713 | 12383s
Step 2600/4000 | Loss: 2.0986 | LR: 0.000007 | TPS: 1714 | 12430s
Step 2610/4000 | Loss: 2.0439 | LR: 0.000007 | TPS: 1714 | 12477s
Step 2620/4000 | Loss: 1.9408 | LR: 0.000007 | TPS: 1714 | 12524s
Step 2630/4000 | Loss: 2.1992 | LR: 0.000007 | TPS: 1714 | 12571s
Step 2640/4000 | Loss: 2.0929 | LR: 0.000007 | TPS: 1714 | 12619s
Step 2650/4000 | Loss: 1.9728 | LR: 0.000007 | TPS: 1714 | 12666s
Step 2660/4000 | Loss: 1.8369 | LR: 0.000007 | TPS: 1714 | 12713s
Step 2670/4000 | Loss: 1.9926 | LR: 0.000007 | TPS: 1714 | 12760s
Step 2680/4000 | Loss: 2.0414 | LR: 0.000007 | TPS: 1714 | 12807s
Step 2690/4000 | Loss: 2.1368 | LR: 0.000007 | TPS: 1714 | 12854s
Step 2700/4000 | Loss: 2.0254 | LR: 0.000007 | TPS: 1714 | 12901s
Step 2710/4000 | Loss: 2.1572 | LR: 0.000007 | TPS: 1715 | 12948s
Step 2720/4000 | Loss: 2.0418 | LR: 0.000007 | TPS: 1715 | 12996s
Step 2730/4000 | Loss: 2.1235 | LR: 0.000007 | TPS: 1715 | 13043s
Step 2740/4000 | Loss: 2.0756 | LR: 0.000006 | TPS: 1715 | 13090s
Step 2750/4000 | Loss: 2.1417 | LR: 0.000006 | TPS: 1715 | 13137s
Step 2760/4000 | Loss: 1.9427 | LR: 0.000006 | TPS: 1715 | 13184s
Step 2770/4000 | Loss: 2.1166 | LR: 0.000006 | TPS: 1715 | 13231s
Step 2780/4000 | Loss: 1.9711 | LR: 0.000006 | TPS: 1715 | 13278s
Step 2790/4000 | Loss: 2.1390 | LR: 0.000006 | TPS: 1715 | 13326s
Step 2800/4000 | Loss: 2.0557 | LR: 0.000006 | TPS: 1715 | 13373s
  📊 Val loss: 2.1839 
Step 2810/4000 | Loss: 2.0581 | LR: 0.000006 | TPS: 1715 | 13425s
Step 2820/4000 | Loss: 2.1139 | LR: 0.000006 | TPS: 1715 | 13473s
Step 2830/4000 | Loss: 2.1228 | LR: 0.000006 | TPS: 1715 | 13520s
Step 2840/4000 | Loss: 1.9685 | LR: 0.000006 | TPS: 1715 | 13567s
Step 2850/4000 | Loss: 2.1206 | LR: 0.000006 | TPS: 1715 | 13614s
Step 2860/4000 | Loss: 2.1942 | LR: 0.000006 | TPS: 1715 | 13661s
Step 2870/4000 | Loss: 1.9068 | LR: 0.000006 | TPS: 1715 | 13708s
Step 2880/4000 | Loss: 2.2099 | LR: 0.000006 | TPS: 1715 | 13755s
Step 2890/4000 | Loss: 2.0948 | LR: 0.000006 | TPS: 1715 | 13803s
Step 2900/4000 | Loss: 2.0630 | LR: 0.000005 | TPS: 1715 | 13850s
Step 2910/4000 | Loss: 1.9867 | LR: 0.000005 | TPS: 1715 | 13897s
Step 2920/4000 | Loss: 2.0602 | LR: 0.000005 | TPS: 1715 | 13944s
Step 2930/4000 | Loss: 2.0163 | LR: 0.000005 | TPS: 1716 | 13991s
Step 2940/4000 | Loss: 2.0337 | LR: 0.000005 | TPS: 1716 | 14038s
Step 2950/4000 | Loss: 2.2476 | LR: 0.000005 | TPS: 1716 | 14085s
Step 2960/4000 | Loss: 2.0430 | LR: 0.000005 | TPS: 1716 | 14133s
Step 2970/4000 | Loss: 2.3037 | LR: 0.000005 | TPS: 1716 | 14180s
Step 2980/4000 | Loss: 2.0831 | LR: 0.000005 | TPS: 1716 | 14227s
Step 2990/4000 | Loss: 2.1781 | LR: 0.000005 | TPS: 1716 | 14274s
Step 3000/4000 | Loss: 2.0784 | LR: 0.000005 | TPS: 1716 | 14321s

  🔤 Generation samples (step 3000):
    [EN] The city of Paris is a metropolitan area in Europe, consisting of 57 counties. Its main cities include Lyons, Bordeaux and Valence.
    [HE] איטליה.
    [AR] باريس.
    [FA] پاریس پایتخت کشور فرانسه و یکی از شهرهای بزرگ این کشور است. شهر پاریس در شمال غربی قاره اروپا قرار دارد.
    [TRANSLATE] You are the first one in the world to learn how to think.

Step 3010/4000 | Loss: 2.1244 | LR: 0.000005 | TPS: 1716 | 14370s
Step 3020/4000 | Loss: 2.1107 | LR: 0.000005 | TPS: 1716 | 14417s
Step 3030/4000 | Loss: 2.3589 | LR: 0.000005 | TPS: 1716 | 14464s
Step 3040/4000 | Loss: 2.0592 | LR: 0.000005 | TPS: 1716 | 14511s
Step 3050/4000 | Loss: 2.0730 | LR: 0.000005 | TPS: 1716 | 14559s
Step 3060/4000 | Loss: 2.1365 | LR: 0.000005 | TPS: 1716 | 14606s
Step 3070/4000 | Loss: 1.9819 | LR: 0.000005 | TPS: 1716 | 14653s
Step 3080/4000 | Loss: 2.2175 | LR: 0.000004 | TPS: 1716 | 14700s
Step 3090/4000 | Loss: 2.1442 | LR: 0.000004 | TPS: 1716 | 14747s
Step 3100/4000 | Loss: 2.0811 | LR: 0.000004 | TPS: 1717 | 14794s
Step 3110/4000 | Loss: 2.1427 | LR: 0.000004 | TPS: 1717 | 14841s
Step 3120/4000 | Loss: 2.1722 | LR: 0.000004 | TPS: 1717 | 14889s
Step 3130/4000 | Loss: 2.0577 | LR: 0.000004 | TPS: 1717 | 14936s
Step 3140/4000 | Loss: 2.0873 | LR: 0.000004 | TPS: 1717 | 14983s
Step 3150/4000 | Loss: 2.2920 | LR: 0.000004 | TPS: 1717 | 15030s
Step 3160/4000 | Loss: 1.8839 | LR: 0.000004 | TPS: 1717 | 15077s
Step 3170/4000 | Loss: 2.0144 | LR: 0.000004 | TPS: 1717 | 15124s
Step 3180/4000 | Loss: 1.9689 | LR: 0.000004 | TPS: 1717 | 15171s
Step 3190/4000 | Loss: 2.2123 | LR: 0.000004 | TPS: 1717 | 15219s
Step 3200/4000 | Loss: 2.0510 | LR: 0.000004 | TPS: 1717 | 15266s
  📊 Val loss: 2.1269 
Step 3210/4000 | Loss: 2.4087 | LR: 0.000004 | TPS: 1717 | 15318s
Step 3220/4000 | Loss: 2.2608 | LR: 0.000004 | TPS: 1717 | 15365s
Step 3230/4000 | Loss: 2.1930 | LR: 0.000004 | TPS: 1717 | 15413s
Step 3240/4000 | Loss: 2.0713 | LR: 0.000004 | TPS: 1717 | 15460s
Step 3250/4000 | Loss: 2.2660 | LR: 0.000004 | TPS: 1717 | 15507s
Step 3260/4000 | Loss: 1.9479 | LR: 0.000004 | TPS: 1717 | 15554s
Step 3270/4000 | Loss: 1.9657 | LR: 0.000004 | TPS: 1717 | 15601s
Step 3280/4000 | Loss: 2.1884 | LR: 0.000004 | TPS: 1717 | 15648s
Step 3290/4000 | Loss: 2.0927 | LR: 0.000004 | TPS: 1717 | 15695s
Step 3300/4000 | Loss: 2.0393 | LR: 0.000003 | TPS: 1717 | 15743s
Step 3310/4000 | Loss: 2.1302 | LR: 0.000003 | TPS: 1717 | 15790s
Step 3320/4000 | Loss: 2.0059 | LR: 0.000003 | TPS: 1717 | 15837s
Step 3330/4000 | Loss: 1.8687 | LR: 0.000003 | TPS: 1717 | 15884s
Step 3340/4000 | Loss: 2.0293 | LR: 0.000003 | TPS: 1717 | 15931s
Step 3350/4000 | Loss: 2.1500 | LR: 0.000003 | TPS: 1718 | 15978s
Step 3360/4000 | Loss: 1.9667 | LR: 0.000003 | TPS: 1718 | 16025s
Step 3370/4000 | Loss: 2.1206 | LR: 0.000003 | TPS: 1718 | 16073s
Step 3380/4000 | Loss: 2.3028 | LR: 0.000003 | TPS: 1718 | 16120s
Step 3390/4000 | Loss: 2.0075 | LR: 0.000003 | TPS: 1718 | 16167s
Step 3400/4000 | Loss: 2.0562 | LR: 0.000003 | TPS: 1718 | 16214s
Step 3410/4000 | Loss: 1.9977 | LR: 0.000003 | TPS: 1718 | 16261s
Step 3420/4000 | Loss: 2.1680 | LR: 0.000003 | TPS: 1718 | 16308s
Step 3430/4000 | Loss: 2.0009 | LR: 0.000003 | TPS: 1718 | 16355s
Step 3440/4000 | Loss: 1.8301 | LR: 0.000003 | TPS: 1718 | 16403s
Step 3450/4000 | Loss: 2.0239 | LR: 0.000003 | TPS: 1718 | 16450s
Step 3460/4000 | Loss: 2.0535 | LR: 0.000003 | TPS: 1718 | 16497s
Step 3470/4000 | Loss: 2.1348 | LR: 0.000003 | TPS: 1718 | 16544s
Step 3480/4000 | Loss: 2.0337 | LR: 0.000003 | TPS: 1718 | 16591s
Step 3490/4000 | Loss: 1.9342 | LR: 0.000003 | TPS: 1718 | 16638s
Step 3500/4000 | Loss: 2.0052 | LR: 0.000003 | TPS: 1718 | 16685s
Step 3510/4000 | Loss: 1.9902 | LR: 0.000003 | TPS: 1718 | 16732s
Step 3520/4000 | Loss: 2.1567 | LR: 0.000003 | TPS: 1719 | 16780s
Step 3530/4000 | Loss: 2.0515 | LR: 0.000003 | TPS: 1719 | 16827s
Step 3540/4000 | Loss: 2.1572 | LR: 0.000003 | TPS: 1719 | 16874s
Step 3550/4000 | Loss: 2.1381 | LR: 0.000003 | TPS: 1719 | 16921s
Step 3560/4000 | Loss: 2.0383 | LR: 0.000003 | TPS: 1719 | 16968s
Step 3570/4000 | Loss: 2.3566 | LR: 0.000003 | TPS: 1719 | 17015s
Step 3580/4000 | Loss: 1.9773 | LR: 0.000003 | TPS: 1719 | 17062s
Step 3590/4000 | Loss: 2.0418 | LR: 0.000003 | TPS: 1719 | 17110s
Step 3600/4000 | Loss: 2.1756 | LR: 0.000002 | TPS: 1719 | 17157s
  📊 Val loss: 2.1478 
Step 3610/4000 | Loss: 2.0761 | LR: 0.000002 | TPS: 1718 | 17209s
Step 3620/4000 | Loss: 2.1353 | LR: 0.000002 | TPS: 1718 | 17257s
Step 3630/4000 | Loss: 2.1856 | LR: 0.000002 | TPS: 1719 | 17304s
Step 3640/4000 | Loss: 2.1298 | LR: 0.000002 | TPS: 1719 | 17351s
Step 3650/4000 | Loss: 2.0784 | LR: 0.000002 | TPS: 1719 | 17398s
Step 3660/4000 | Loss: 2.0533 | LR: 0.000002 | TPS: 1719 | 17445s
Step 3670/4000 | Loss: 2.2151 | LR: 0.000002 | TPS: 1719 | 17492s
Step 3680/4000 | Loss: 2.0177 | LR: 0.000002 | TPS: 1719 | 17539s
Step 3690/4000 | Loss: 2.1048 | LR: 0.000002 | TPS: 1719 | 17587s
Step 3700/4000 | Loss: 2.0629 | LR: 0.000002 | TPS: 1719 | 17634s
Step 3710/4000 | Loss: 2.0375 | LR: 0.000002 | TPS: 1719 | 17681s
Step 3720/4000 | Loss: 2.2282 | LR: 0.000002 | TPS: 1719 | 17728s
Step 3730/4000 | Loss: 2.2049 | LR: 0.000002 | TPS: 1719 | 17775s
Step 3740/4000 | Loss: 2.0247 | LR: 0.000002 | TPS: 1719 | 17822s
Step 3750/4000 | Loss: 2.0337 | LR: 0.000002 | TPS: 1719 | 17869s
Step 3760/4000 | Loss: 2.0922 | LR: 0.000002 | TPS: 1719 | 17917s
Step 3770/4000 | Loss: 2.1018 | LR: 0.000002 | TPS: 1719 | 17964s
Step 3780/4000 | Loss: 2.1183 | LR: 0.000002 | TPS: 1719 | 18011s
Step 3790/4000 | Loss: 2.2469 | LR: 0.000002 | TPS: 1719 | 18058s
Step 3800/4000 | Loss: 2.1373 | LR: 0.000002 | TPS: 1719 | 18105s
Step 3810/4000 | Loss: 2.1103 | LR: 0.000002 | TPS: 1719 | 18152s
Step 3820/4000 | Loss: 2.0317 | LR: 0.000002 | TPS: 1719 | 18199s
Step 3830/4000 | Loss: 2.0022 | LR: 0.000002 | TPS: 1720 | 18247s
Step 3840/4000 | Loss: 2.1618 | LR: 0.000002 | TPS: 1720 | 18294s
Step 3850/4000 | Loss: 2.1421 | LR: 0.000002 | TPS: 1720 | 18341s
Step 3860/4000 | Loss: 1.9279 | LR: 0.000002 | TPS: 1720 | 18388s
Step 3870/4000 | Loss: 2.1657 | LR: 0.000002 | TPS: 1720 | 18435s
Step 3880/4000 | Loss: 2.1433 | LR: 0.000002 | TPS: 1720 | 18482s
Step 3890/4000 | Loss: 2.0893 | LR: 0.000002 | TPS: 1720 | 18529s
Step 3900/4000 | Loss: 2.0036 | LR: 0.000002 | TPS: 1720 | 18576s
Step 3910/4000 | Loss: 2.0691 | LR: 0.000002 | TPS: 1720 | 18624s
Step 3920/4000 | Loss: 2.0282 | LR: 0.000002 | TPS: 1720 | 18671s
Step 3930/4000 | Loss: 1.9818 | LR: 0.000002 | TPS: 1720 | 18718s
Step 3940/4000 | Loss: 2.1466 | LR: 0.000002 | TPS: 1720 | 18765s
Step 3950/4000 | Loss: 2.0455 | LR: 0.000002 | TPS: 1720 | 18812s
Step 3960/4000 | Loss: 2.1226 | LR: 0.000002 | TPS: 1720 | 18859s
Step 3970/4000 | Loss: 1.9890 | LR: 0.000002 | TPS: 1720 | 18906s
Step 3980/4000 | Loss: 2.1891 | LR: 0.000002 | TPS: 1720 | 18954s
Step 3990/4000 | Loss: 1.8920 | LR: 0.000002 | TPS: 1720 | 19001s
Step 4000/4000 | Loss: 2.0073 | LR: 0.000002 | TPS: 1720 | 19048s
  📊 Val loss: 2.1472 

  🔤 Generation samples (step 4000):
    [EN] The capital of France consists of 38 cities, 26.9% (14) of which are in the metropolitan area.
    [HE] צרפת היא אחת מיעדי התיירות הפופולאריים ביותר בעולם, בשל היותה מוקד משיכה תיירותי משמעותי עבור תיירים מכל רחבי העולם. העיר בנויה משני חלקים עיקריים - כיכר ד'ארסאן (Droite Sud) ורחוב ד'ארסאן (De La Roch
    [AR] باريس.
    [FA] پاریس شهری بزرگ و تاریخی در شمال غربی اروپا است.
    [TRANSLATE] It’s very short.


============================================================
SFT TRAINING COMPLETE
Steps: 4000, Time: 19057s (317.6min)
Best val loss: 2.1164
Model saved to: /tmp/sft/sft_model_v2.pt
============================================================
Uploading to S3...