ronnengmail committed
Commit bffe8cd · verified · 1 Parent(s): b408473

Upload eval/sft_3b.log with huggingface_hub

Files changed (1)
  1. eval/sft_3b.log +463 -0
eval/sft_3b.log ADDED
@@ -0,0 +1,463 @@
1
+ Device: cuda
2
+ Loading tokenizer: /tmp/eval/multilingual_32k.model
3
+ Loading base model: /tmp/eval/best_model.pt
4
+ Model loaded: 3.04B parameters
5
+ Loading SFT data from: /tmp/sft_data_v2
6
+ Train: 3949348 tokens, Val: 201020 tokens
7
+ Using 8-bit AdamW (bitsandbytes)
8
+
9
+ Starting SFT training for 4000 steps...
10
+ Batch size: 1 x 4 accum = 4 effective, Seq len: 2048, LR: 2e-05
11
+ Step 10/4000 | Loss: 2.3791 | LR: 0.000001 | TPS: 1196 | 68s
12
+ Step 20/4000 | Loss: 2.5346 | LR: 0.000002 | TPS: 1418 | 116s
13
+ Step 30/4000 | Loss: 2.7910 | LR: 0.000003 | TPS: 1511 | 163s
14
+ Step 40/4000 | Loss: 2.5189 | LR: 0.000004 | TPS: 1562 | 210s
15
+ Step 50/4000 | Loss: 2.5049 | LR: 0.000005 | TPS: 1594 | 257s
16
+ Step 60/4000 | Loss: 2.5417 | LR: 0.000006 | TPS: 1616 | 304s
17
+ Step 70/4000 | Loss: 2.2374 | LR: 0.000007 | TPS: 1633 | 351s
18
+ Step 80/4000 | Loss: 2.5328 | LR: 0.000008 | TPS: 1645 | 398s
19
+ Step 90/4000 | Loss: 2.5359 | LR: 0.000009 | TPS: 1655 | 445s
20
+ Step 100/4000 | Loss: 2.4830 | LR: 0.000010 | TPS: 1663 | 493s
21
+ Step 110/4000 | Loss: 2.3015 | LR: 0.000011 | TPS: 1669 | 540s
22
+ Step 120/4000 | Loss: 2.4667 | LR: 0.000012 | TPS: 1675 | 587s
23
+ Step 130/4000 | Loss: 2.3792 | LR: 0.000013 | TPS: 1680 | 634s
24
+ Step 140/4000 | Loss: 2.3918 | LR: 0.000014 | TPS: 1684 | 681s
25
+ Step 150/4000 | Loss: 2.3368 | LR: 0.000015 | TPS: 1687 | 728s
26
+ Step 160/4000 | Loss: 2.4838 | LR: 0.000016 | TPS: 1690 | 775s
27
+ Step 170/4000 | Loss: 2.3578 | LR: 0.000017 | TPS: 1693 | 823s
28
+ Step 180/4000 | Loss: 2.5485 | LR: 0.000018 | TPS: 1695 | 870s
29
+ Step 190/4000 | Loss: 2.0834 | LR: 0.000019 | TPS: 1698 | 917s
30
+ Step 200/4000 | Loss: 1.9784 | LR: 0.000020 | TPS: 1699 | 964s
31
+ Step 210/4000 | Loss: 2.4826 | LR: 0.000020 | TPS: 1701 | 1011s
32
+ Step 220/4000 | Loss: 2.3540 | LR: 0.000020 | TPS: 1703 | 1058s
33
+ Step 230/4000 | Loss: 2.2093 | LR: 0.000020 | TPS: 1704 | 1105s
34
+ Step 240/4000 | Loss: 2.2137 | LR: 0.000020 | TPS: 1706 | 1153s
35
+ Step 250/4000 | Loss: 2.2151 | LR: 0.000020 | TPS: 1707 | 1200s
36
+ Step 260/4000 | Loss: 2.2535 | LR: 0.000020 | TPS: 1708 | 1247s
37
+ Step 270/4000 | Loss: 2.2235 | LR: 0.000020 | TPS: 1709 | 1294s
38
+ Step 280/4000 | Loss: 2.0449 | LR: 0.000020 | TPS: 1710 | 1341s
39
+ Step 290/4000 | Loss: 2.1502 | LR: 0.000020 | TPS: 1711 | 1388s
40
+ Step 300/4000 | Loss: 2.3716 | LR: 0.000020 | TPS: 1712 | 1435s
41
+ Step 310/4000 | Loss: 2.1591 | LR: 0.000020 | TPS: 1713 | 1483s
42
+ Step 320/4000 | Loss: 2.2153 | LR: 0.000020 | TPS: 1714 | 1530s
43
+ Step 330/4000 | Loss: 2.2023 | LR: 0.000020 | TPS: 1714 | 1577s
44
+ Step 340/4000 | Loss: 2.3968 | LR: 0.000020 | TPS: 1715 | 1624s
45
+ Step 350/4000 | Loss: 2.1146 | LR: 0.000020 | TPS: 1716 | 1671s
46
+ Step 360/4000 | Loss: 2.1857 | LR: 0.000020 | TPS: 1716 | 1718s
47
+ Step 370/4000 | Loss: 2.1965 | LR: 0.000020 | TPS: 1717 | 1765s
48
+ Step 380/4000 | Loss: 2.1613 | LR: 0.000020 | TPS: 1717 | 1813s
49
+ Step 390/4000 | Loss: 2.3080 | LR: 0.000020 | TPS: 1718 | 1860s
50
+ Step 400/4000 | Loss: 2.2964 | LR: 0.000020 | TPS: 1718 | 1907s
51
+ 📊 Val loss: 2.2256 (NEW BEST!)
52
+ 💾 Best model saved to /tmp/sft/sft_model_v2.pt
53
+ Step 410/4000 | Loss: 2.2859 | LR: 0.000020 | TPS: 1703 | 1973s
54
+ Step 420/4000 | Loss: 2.1711 | LR: 0.000020 | TPS: 1703 | 2020s
55
+ Step 430/4000 | Loss: 2.1434 | LR: 0.000020 | TPS: 1704 | 2067s
56
+ Step 440/4000 | Loss: 2.2115 | LR: 0.000020 | TPS: 1705 | 2114s
57
+ Step 450/4000 | Loss: 2.2985 | LR: 0.000020 | TPS: 1706 | 2161s
58
+ Step 460/4000 | Loss: 1.9845 | LR: 0.000020 | TPS: 1707 | 2208s
59
+ Step 470/4000 | Loss: 2.3135 | LR: 0.000020 | TPS: 1707 | 2255s
60
+ Step 480/4000 | Loss: 2.3004 | LR: 0.000020 | TPS: 1708 | 2302s
61
+ Step 490/4000 | Loss: 2.1841 | LR: 0.000020 | TPS: 1709 | 2349s
62
+ Step 500/4000 | Loss: 2.3647 | LR: 0.000020 | TPS: 1709 | 2396s
63
+ Step 510/4000 | Loss: 2.1587 | LR: 0.000020 | TPS: 1710 | 2443s
64
+ Step 520/4000 | Loss: 2.0790 | LR: 0.000020 | TPS: 1711 | 2490s
65
+ Step 530/4000 | Loss: 2.0842 | LR: 0.000020 | TPS: 1711 | 2537s
66
+ Step 540/4000 | Loss: 2.4031 | LR: 0.000020 | TPS: 1712 | 2584s
67
+ Step 550/4000 | Loss: 2.3037 | LR: 0.000020 | TPS: 1712 | 2632s
68
+ Step 560/4000 | Loss: 2.2433 | LR: 0.000020 | TPS: 1713 | 2679s
69
+ Step 570/4000 | Loss: 2.1670 | LR: 0.000020 | TPS: 1713 | 2726s
70
+ Step 580/4000 | Loss: 2.1579 | LR: 0.000020 | TPS: 1714 | 2773s
71
+ Step 590/4000 | Loss: 1.9392 | LR: 0.000020 | TPS: 1714 | 2820s
72
+ Step 600/4000 | Loss: 2.1226 | LR: 0.000020 | TPS: 1715 | 2867s
73
+ Step 610/4000 | Loss: 2.2641 | LR: 0.000019 | TPS: 1715 | 2914s
74
+ Step 620/4000 | Loss: 2.0771 | LR: 0.000019 | TPS: 1715 | 2961s
75
+ Step 630/4000 | Loss: 2.4527 | LR: 0.000019 | TPS: 1716 | 3008s
76
+ Step 640/4000 | Loss: 2.2605 | LR: 0.000019 | TPS: 1716 | 3055s
77
+ Step 650/4000 | Loss: 1.9801 | LR: 0.000019 | TPS: 1717 | 3102s
78
+ Step 660/4000 | Loss: 2.4208 | LR: 0.000019 | TPS: 1717 | 3149s
79
+ Step 670/4000 | Loss: 2.3331 | LR: 0.000019 | TPS: 1717 | 3196s
80
+ Step 680/4000 | Loss: 2.1299 | LR: 0.000019 | TPS: 1718 | 3243s
81
+ Step 690/4000 | Loss: 2.1551 | LR: 0.000019 | TPS: 1718 | 3290s
82
+ Step 700/4000 | Loss: 2.0940 | LR: 0.000019 | TPS: 1718 | 3337s
83
+ Step 710/4000 | Loss: 2.0533 | LR: 0.000019 | TPS: 1719 | 3384s
84
+ Step 720/4000 | Loss: 2.2076 | LR: 0.000019 | TPS: 1719 | 3431s
85
+ Step 730/4000 | Loss: 1.9816 | LR: 0.000019 | TPS: 1719 | 3478s
86
+ Step 740/4000 | Loss: 2.1420 | LR: 0.000019 | TPS: 1719 | 3526s
87
+ Step 750/4000 | Loss: 2.2928 | LR: 0.000019 | TPS: 1720 | 3573s
88
+ Step 760/4000 | Loss: 2.1035 | LR: 0.000019 | TPS: 1720 | 3620s
89
+ Step 770/4000 | Loss: 2.1663 | LR: 0.000019 | TPS: 1720 | 3667s
90
+ Step 780/4000 | Loss: 2.2270 | LR: 0.000019 | TPS: 1721 | 3714s
91
+ Step 790/4000 | Loss: 2.1436 | LR: 0.000019 | TPS: 1721 | 3761s
92
+ Step 800/4000 | Loss: 2.3599 | LR: 0.000019 | TPS: 1721 | 3808s
93
+ 📊 Val loss: 2.1960 (NEW BEST!)
94
+ 💾 Best model saved to /tmp/sft/sft_model_v2.pt
95
+ Step 810/4000 | Loss: 2.2325 | LR: 0.000019 | TPS: 1696 | 3912s
96
+ Step 820/4000 | Loss: 2.0798 | LR: 0.000019 | TPS: 1696 | 3960s
97
+ Step 830/4000 | Loss: 2.1527 | LR: 0.000019 | TPS: 1697 | 4007s
98
+ Step 840/4000 | Loss: 2.2046 | LR: 0.000019 | TPS: 1697 | 4054s
99
+ Step 850/4000 | Loss: 2.0648 | LR: 0.000019 | TPS: 1698 | 4101s
100
+ Step 860/4000 | Loss: 2.1708 | LR: 0.000019 | TPS: 1698 | 4148s
101
+ Step 870/4000 | Loss: 2.3088 | LR: 0.000019 | TPS: 1699 | 4195s
102
+ Step 880/4000 | Loss: 1.9936 | LR: 0.000019 | TPS: 1699 | 4242s
103
+ Step 890/4000 | Loss: 2.1869 | LR: 0.000019 | TPS: 1700 | 4290s
104
+ Step 900/4000 | Loss: 2.4199 | LR: 0.000019 | TPS: 1700 | 4337s
105
+ Step 910/4000 | Loss: 2.3803 | LR: 0.000018 | TPS: 1700 | 4384s
106
+ Step 920/4000 | Loss: 2.0193 | LR: 0.000018 | TPS: 1701 | 4431s
107
+ Step 930/4000 | Loss: 2.1047 | LR: 0.000018 | TPS: 1701 | 4478s
108
+ Step 940/4000 | Loss: 2.1449 | LR: 0.000018 | TPS: 1702 | 4525s
109
+ Step 950/4000 | Loss: 2.1521 | LR: 0.000018 | TPS: 1702 | 4572s
110
+ Step 960/4000 | Loss: 2.2820 | LR: 0.000018 | TPS: 1702 | 4620s
111
+ Step 970/4000 | Loss: 2.2996 | LR: 0.000018 | TPS: 1703 | 4667s
112
+ Step 980/4000 | Loss: 2.3187 | LR: 0.000018 | TPS: 1703 | 4714s
113
+ Step 990/4000 | Loss: 2.1756 | LR: 0.000018 | TPS: 1703 | 4761s
114
+ Step 1000/4000 | Loss: 1.9765 | LR: 0.000018 | TPS: 1704 | 4808s
115
+
116
+ 🔤 Generation samples (step 1000):
117
+ [EN] The capital of France is located in Normandy.
118
+ [HE] מלזיה.
119
+ [AR] باريس.
120
+ [FA] پاریس یکی از شهرهای بزرگ و تاریخی جهان است که دارای جاذبه های طبیعی، فرهنگی و اقتصادی متعددی می باشد. شهر پاریس در غرب کشورمان قرار دارد و به عنوان یکی از مهم ترین مراکز تجاری و مالی دنیا شناخته شده ا
121
+ [TRANSLATE] "תודה על הכול, אבא. אני כאן איתך בכל רגע נתון."
122
+
123
+ Step 1010/4000 | Loss: 2.1665 | LR: 0.000018 | TPS: 1703 | 4859s
124
+ Step 1020/4000 | Loss: 2.1047 | LR: 0.000018 | TPS: 1703 | 4906s
125
+ Step 1030/4000 | Loss: 2.2359 | LR: 0.000018 | TPS: 1704 | 4953s
126
+ Step 1040/4000 | Loss: 2.0109 | LR: 0.000018 | TPS: 1704 | 5000s
127
+ Step 1050/4000 | Loss: 2.1515 | LR: 0.000018 | TPS: 1704 | 5047s
128
+ Step 1060/4000 | Loss: 2.0880 | LR: 0.000018 | TPS: 1705 | 5094s
129
+ Step 1070/4000 | Loss: 2.2460 | LR: 0.000018 | TPS: 1705 | 5142s
130
+ Step 1080/4000 | Loss: 1.9325 | LR: 0.000018 | TPS: 1705 | 5189s
131
+ Step 1090/4000 | Loss: 2.2283 | LR: 0.000018 | TPS: 1705 | 5236s
132
+ Step 1100/4000 | Loss: 2.3303 | LR: 0.000018 | TPS: 1706 | 5283s
133
+ Step 1110/4000 | Loss: 2.1772 | LR: 0.000018 | TPS: 1706 | 5330s
134
+ Step 1120/4000 | Loss: 2.1615 | LR: 0.000018 | TPS: 1706 | 5377s
135
+ Step 1130/4000 | Loss: 2.1470 | LR: 0.000017 | TPS: 1707 | 5424s
136
+ Step 1140/4000 | Loss: 1.9640 | LR: 0.000017 | TPS: 1707 | 5472s
137
+ Step 1150/4000 | Loss: 2.1891 | LR: 0.000017 | TPS: 1707 | 5519s
138
+ Step 1160/4000 | Loss: 2.2183 | LR: 0.000017 | TPS: 1707 | 5566s
139
+ Step 1170/4000 | Loss: 2.0268 | LR: 0.000017 | TPS: 1708 | 5613s
140
+ Step 1180/4000 | Loss: 2.2234 | LR: 0.000017 | TPS: 1708 | 5660s
141
+ Step 1190/4000 | Loss: 2.1961 | LR: 0.000017 | TPS: 1708 | 5707s
142
+ Step 1200/4000 | Loss: 2.2019 | LR: 0.000017 | TPS: 1708 | 5754s
143
+ 📊 Val loss: 2.2238
144
+ Step 1210/4000 | Loss: 2.0809 | LR: 0.000017 | TPS: 1707 | 5807s
145
+ Step 1220/4000 | Loss: 2.1716 | LR: 0.000017 | TPS: 1707 | 5854s
146
+ Step 1230/4000 | Loss: 2.2607 | LR: 0.000017 | TPS: 1707 | 5901s
147
+ Step 1240/4000 | Loss: 2.1838 | LR: 0.000017 | TPS: 1708 | 5949s
148
+ Step 1250/4000 | Loss: 2.0725 | LR: 0.000017 | TPS: 1708 | 5996s
149
+ Step 1260/4000 | Loss: 2.2797 | LR: 0.000017 | TPS: 1708 | 6043s
150
+ Step 1270/4000 | Loss: 2.0366 | LR: 0.000017 | TPS: 1708 | 6090s
151
+ Step 1280/4000 | Loss: 2.1469 | LR: 0.000017 | TPS: 1709 | 6137s
152
+ Step 1290/4000 | Loss: 2.1541 | LR: 0.000017 | TPS: 1709 | 6184s
153
+ Step 1300/4000 | Loss: 2.0311 | LR: 0.000017 | TPS: 1709 | 6231s
154
+ Step 1310/4000 | Loss: 2.1828 | LR: 0.000016 | TPS: 1709 | 6279s
155
+ Step 1320/4000 | Loss: 2.2004 | LR: 0.000016 | TPS: 1709 | 6326s
156
+ Step 1330/4000 | Loss: 2.2589 | LR: 0.000016 | TPS: 1710 | 6373s
157
+ Step 1340/4000 | Loss: 2.1475 | LR: 0.000016 | TPS: 1710 | 6420s
158
+ Step 1350/4000 | Loss: 2.1672 | LR: 0.000016 | TPS: 1710 | 6467s
159
+ Step 1360/4000 | Loss: 2.1921 | LR: 0.000016 | TPS: 1710 | 6514s
160
+ Step 1370/4000 | Loss: 2.0689 | LR: 0.000016 | TPS: 1710 | 6561s
161
+ Step 1380/4000 | Loss: 2.2560 | LR: 0.000016 | TPS: 1711 | 6609s
162
+ Step 1390/4000 | Loss: 1.9519 | LR: 0.000016 | TPS: 1711 | 6656s
163
+ Step 1400/4000 | Loss: 1.9671 | LR: 0.000016 | TPS: 1711 | 6703s
164
+ Step 1410/4000 | Loss: 2.1535 | LR: 0.000016 | TPS: 1711 | 6750s
165
+ Step 1420/4000 | Loss: 2.1726 | LR: 0.000016 | TPS: 1711 | 6797s
166
+ Step 1430/4000 | Loss: 2.0854 | LR: 0.000016 | TPS: 1712 | 6844s
167
+ Step 1440/4000 | Loss: 2.0955 | LR: 0.000016 | TPS: 1712 | 6891s
168
+ Step 1450/4000 | Loss: 2.1260 | LR: 0.000016 | TPS: 1712 | 6939s
169
+ Step 1460/4000 | Loss: 2.2860 | LR: 0.000016 | TPS: 1712 | 6986s
170
+ Step 1470/4000 | Loss: 1.6098 | LR: 0.000015 | TPS: 1712 | 7033s
171
+ Step 1480/4000 | Loss: 2.1327 | LR: 0.000015 | TPS: 1712 | 7080s
172
+ Step 1490/4000 | Loss: 2.0506 | LR: 0.000015 | TPS: 1713 | 7127s
173
+ Step 1500/4000 | Loss: 2.0568 | LR: 0.000015 | TPS: 1713 | 7174s
174
+ Step 1510/4000 | Loss: 2.0177 | LR: 0.000015 | TPS: 1713 | 7221s
175
+ Step 1520/4000 | Loss: 2.0383 | LR: 0.000015 | TPS: 1713 | 7269s
176
+ Step 1530/4000 | Loss: 2.0994 | LR: 0.000015 | TPS: 1713 | 7316s
177
+ Step 1540/4000 | Loss: 2.0863 | LR: 0.000015 | TPS: 1713 | 7363s
178
+ Step 1550/4000 | Loss: 2.3287 | LR: 0.000015 | TPS: 1714 | 7410s
179
+ Step 1560/4000 | Loss: 2.1585 | LR: 0.000015 | TPS: 1714 | 7457s
180
+ Step 1570/4000 | Loss: 1.9781 | LR: 0.000015 | TPS: 1714 | 7504s
181
+ Step 1580/4000 | Loss: 1.9344 | LR: 0.000015 | TPS: 1714 | 7551s
182
+ Step 1590/4000 | Loss: 2.1031 | LR: 0.000015 | TPS: 1714 | 7599s
183
+ Step 1600/4000 | Loss: 2.2633 | LR: 0.000015 | TPS: 1714 | 7646s
184
+ 📊 Val loss: 2.1164 (NEW BEST!)
185
+ 💾 Best model saved to /tmp/sft/sft_model_v2.pt
186
+ Step 1610/4000 | Loss: 2.0217 | LR: 0.000015 | TPS: 1702 | 7750s
187
+ Step 1620/4000 | Loss: 2.0437 | LR: 0.000014 | TPS: 1702 | 7797s
188
+ Step 1630/4000 | Loss: 2.3588 | LR: 0.000014 | TPS: 1702 | 7844s
189
+ Step 1640/4000 | Loss: 2.1927 | LR: 0.000014 | TPS: 1702 | 7892s
190
+ Step 1650/4000 | Loss: 1.9298 | LR: 0.000014 | TPS: 1703 | 7939s
191
+ Step 1660/4000 | Loss: 2.1604 | LR: 0.000014 | TPS: 1703 | 7986s
192
+ Step 1670/4000 | Loss: 2.0326 | LR: 0.000014 | TPS: 1703 | 8033s
193
+ Step 1680/4000 | Loss: 2.1872 | LR: 0.000014 | TPS: 1703 | 8080s
194
+ Step 1690/4000 | Loss: 2.0633 | LR: 0.000014 | TPS: 1703 | 8127s
195
+ Step 1700/4000 | Loss: 2.2547 | LR: 0.000014 | TPS: 1704 | 8174s
196
+ Step 1710/4000 | Loss: 1.8940 | LR: 0.000014 | TPS: 1704 | 8221s
197
+ Step 1720/4000 | Loss: 2.0726 | LR: 0.000014 | TPS: 1704 | 8269s
198
+ Step 1730/4000 | Loss: 2.0857 | LR: 0.000014 | TPS: 1704 | 8316s
199
+ Step 1740/4000 | Loss: 2.0686 | LR: 0.000014 | TPS: 1704 | 8363s
200
+ Step 1750/4000 | Loss: 2.1306 | LR: 0.000014 | TPS: 1705 | 8410s
201
+ Step 1760/4000 | Loss: 2.0932 | LR: 0.000013 | TPS: 1705 | 8457s
202
+ Step 1770/4000 | Loss: 2.0751 | LR: 0.000013 | TPS: 1705 | 8504s
203
+ Step 1780/4000 | Loss: 2.1802 | LR: 0.000013 | TPS: 1705 | 8551s
204
+ Step 1790/4000 | Loss: 1.6657 | LR: 0.000013 | TPS: 1705 | 8599s
205
+ Step 1800/4000 | Loss: 2.1290 | LR: 0.000013 | TPS: 1706 | 8646s
206
+ Step 1810/4000 | Loss: 2.1032 | LR: 0.000013 | TPS: 1706 | 8693s
207
+ Step 1820/4000 | Loss: 2.1255 | LR: 0.000013 | TPS: 1706 | 8740s
208
+ Step 1830/4000 | Loss: 2.1091 | LR: 0.000013 | TPS: 1706 | 8787s
209
+ Step 1840/4000 | Loss: 1.9875 | LR: 0.000013 | TPS: 1706 | 8834s
210
+ Step 1850/4000 | Loss: 1.9615 | LR: 0.000013 | TPS: 1706 | 8881s
211
+ Step 1860/4000 | Loss: 2.0189 | LR: 0.000013 | TPS: 1707 | 8929s
212
+ Step 1870/4000 | Loss: 2.1387 | LR: 0.000013 | TPS: 1707 | 8976s
213
+ Step 1880/4000 | Loss: 2.0963 | LR: 0.000013 | TPS: 1707 | 9023s
214
+ Step 1890/4000 | Loss: 2.1750 | LR: 0.000013 | TPS: 1707 | 9070s
215
+ Step 1900/4000 | Loss: 2.3945 | LR: 0.000012 | TPS: 1707 | 9117s
216
+ Step 1910/4000 | Loss: 2.1515 | LR: 0.000012 | TPS: 1707 | 9164s
217
+ Step 1920/4000 | Loss: 2.2224 | LR: 0.000012 | TPS: 1708 | 9211s
218
+ Step 1930/4000 | Loss: 2.3160 | LR: 0.000012 | TPS: 1708 | 9259s
219
+ Step 1940/4000 | Loss: 2.0126 | LR: 0.000012 | TPS: 1708 | 9306s
220
+ Step 1950/4000 | Loss: 2.2443 | LR: 0.000012 | TPS: 1708 | 9353s
221
+ Step 1960/4000 | Loss: 1.9590 | LR: 0.000012 | TPS: 1708 | 9400s
222
+ Step 1970/4000 | Loss: 2.2280 | LR: 0.000012 | TPS: 1708 | 9447s
223
+ Step 1980/4000 | Loss: 1.9723 | LR: 0.000012 | TPS: 1708 | 9494s
224
+ Step 1990/4000 | Loss: 2.0697 | LR: 0.000012 | TPS: 1709 | 9541s
225
+ Step 2000/4000 | Loss: 2.0568 | LR: 0.000012 | TPS: 1709 | 9589s
226
+ 📊 Val loss: 2.1674
227
+
228
+ 🔤 Generation samples (step 2000):
229
+ [EN] Paris (pronounced "Paris") is a city located in northeastern France. It borders Germany to the east, with Belgium and Luxembourg as its easternmost provinces.
230
+ [HE] בצרפת, העיר העתיקה היא אזור התיירות העיקרי.
231
+ [AR] باريس
232
+ [FA] پاریس، پایتخت کشور فرانسه است.
233
+ [TRANSLATE] The answer is YES.
234
+
235
+ Step 2010/4000 | Loss: 1.9474 | LR: 0.000012 | TPS: 1708 | 9643s
236
+ Step 2020/4000 | Loss: 2.1131 | LR: 0.000012 | TPS: 1708 | 9690s
237
+ Step 2030/4000 | Loss: 2.0446 | LR: 0.000012 | TPS: 1708 | 9737s
238
+ Step 2040/4000 | Loss: 2.2229 | LR: 0.000011 | TPS: 1708 | 9784s
239
+ Step 2050/4000 | Loss: 2.1576 | LR: 0.000011 | TPS: 1708 | 9832s
240
+ Step 2060/4000 | Loss: 2.1899 | LR: 0.000011 | TPS: 1708 | 9879s
241
+ Step 2070/4000 | Loss: 2.0957 | LR: 0.000011 | TPS: 1708 | 9926s
242
+ Step 2080/4000 | Loss: 2.2643 | LR: 0.000011 | TPS: 1709 | 9973s
243
+ Step 2090/4000 | Loss: 2.0676 | LR: 0.000011 | TPS: 1709 | 10020s
244
+ Step 2100/4000 | Loss: 2.1386 | LR: 0.000011 | TPS: 1709 | 10067s
245
+ Step 2110/4000 | Loss: 2.1891 | LR: 0.000011 | TPS: 1709 | 10114s
246
+ Step 2120/4000 | Loss: 1.9532 | LR: 0.000011 | TPS: 1709 | 10162s
247
+ Step 2130/4000 | Loss: 1.9766 | LR: 0.000011 | TPS: 1709 | 10209s
248
+ Step 2140/4000 | Loss: 2.3656 | LR: 0.000011 | TPS: 1709 | 10256s
249
+ Step 2150/4000 | Loss: 2.0545 | LR: 0.000011 | TPS: 1709 | 10303s
250
+ Step 2160/4000 | Loss: 1.9706 | LR: 0.000011 | TPS: 1710 | 10350s
251
+ Step 2170/4000 | Loss: 2.0302 | LR: 0.000010 | TPS: 1710 | 10397s
252
+ Step 2180/4000 | Loss: 2.1752 | LR: 0.000010 | TPS: 1710 | 10444s
253
+ Step 2190/4000 | Loss: 2.1455 | LR: 0.000010 | TPS: 1710 | 10492s
254
+ Step 2200/4000 | Loss: 2.2238 | LR: 0.000010 | TPS: 1710 | 10539s
255
+ Step 2210/4000 | Loss: 2.1010 | LR: 0.000010 | TPS: 1710 | 10586s
256
+ Step 2220/4000 | Loss: 2.1831 | LR: 0.000010 | TPS: 1710 | 10633s
257
+ Step 2230/4000 | Loss: 1.6542 | LR: 0.000010 | TPS: 1710 | 10680s
258
+ Step 2240/4000 | Loss: 2.1102 | LR: 0.000010 | TPS: 1711 | 10727s
259
+ Step 2250/4000 | Loss: 2.2099 | LR: 0.000010 | TPS: 1711 | 10774s
260
+ Step 2260/4000 | Loss: 2.1750 | LR: 0.000010 | TPS: 1711 | 10821s
261
+ Step 2270/4000 | Loss: 2.2369 | LR: 0.000010 | TPS: 1711 | 10869s
262
+ Step 2280/4000 | Loss: 2.0393 | LR: 0.000010 | TPS: 1711 | 10916s
263
+ Step 2290/4000 | Loss: 2.3140 | LR: 0.000010 | TPS: 1711 | 10963s
264
+ Step 2300/4000 | Loss: 2.0601 | LR: 0.000010 | TPS: 1711 | 11010s
265
+ Step 2310/4000 | Loss: 2.1472 | LR: 0.000009 | TPS: 1711 | 11057s
266
+ Step 2320/4000 | Loss: 2.0987 | LR: 0.000009 | TPS: 1712 | 11104s
267
+ Step 2330/4000 | Loss: 2.0354 | LR: 0.000009 | TPS: 1712 | 11152s
268
+ Step 2340/4000 | Loss: 1.9309 | LR: 0.000009 | TPS: 1712 | 11199s
269
+ Step 2350/4000 | Loss: 2.1222 | LR: 0.000009 | TPS: 1712 | 11246s
270
+ Step 2360/4000 | Loss: 1.9861 | LR: 0.000009 | TPS: 1712 | 11293s
271
+ Step 2370/4000 | Loss: 2.1986 | LR: 0.000009 | TPS: 1712 | 11340s
272
+ Step 2380/4000 | Loss: 2.0335 | LR: 0.000009 | TPS: 1712 | 11387s
273
+ Step 2390/4000 | Loss: 2.2123 | LR: 0.000009 | TPS: 1712 | 11434s
274
+ Step 2400/4000 | Loss: 2.0287 | LR: 0.000009 | TPS: 1712 | 11482s
275
+ 📊 Val loss: 2.1943
276
+ Step 2410/4000 | Loss: 2.0483 | LR: 0.000009 | TPS: 1712 | 11534s
277
+ Step 2420/4000 | Loss: 2.0710 | LR: 0.000009 | TPS: 1712 | 11581s
278
+ Step 2430/4000 | Loss: 2.3005 | LR: 0.000009 | TPS: 1712 | 11629s
279
+ Step 2440/4000 | Loss: 2.0617 | LR: 0.000009 | TPS: 1712 | 11676s
280
+ Step 2450/4000 | Loss: 2.2063 | LR: 0.000008 | TPS: 1712 | 11723s
281
+ Step 2460/4000 | Loss: 2.0405 | LR: 0.000008 | TPS: 1712 | 11770s
282
+ Step 2470/4000 | Loss: 2.2280 | LR: 0.000008 | TPS: 1712 | 11817s
283
+ Step 2480/4000 | Loss: 2.3856 | LR: 0.000008 | TPS: 1712 | 11864s
284
+ Step 2490/4000 | Loss: 1.9853 | LR: 0.000008 | TPS: 1712 | 11911s
285
+ Step 2500/4000 | Loss: 2.0673 | LR: 0.000008 | TPS: 1713 | 11959s
286
+ Step 2510/4000 | Loss: 2.1777 | LR: 0.000008 | TPS: 1713 | 12006s
287
+ Step 2520/4000 | Loss: 1.9846 | LR: 0.000008 | TPS: 1713 | 12053s
288
+ Step 2530/4000 | Loss: 2.1922 | LR: 0.000008 | TPS: 1713 | 12100s
289
+ Step 2540/4000 | Loss: 2.0542 | LR: 0.000008 | TPS: 1713 | 12147s
290
+ Step 2550/4000 | Loss: 2.1041 | LR: 0.000008 | TPS: 1713 | 12194s
291
+ Step 2560/4000 | Loss: 2.0099 | LR: 0.000008 | TPS: 1713 | 12241s
292
+ Step 2570/4000 | Loss: 1.8186 | LR: 0.000008 | TPS: 1713 | 12289s
293
+ Step 2580/4000 | Loss: 2.2079 | LR: 0.000008 | TPS: 1713 | 12336s
294
+ Step 2590/4000 | Loss: 1.9931 | LR: 0.000007 | TPS: 1713 | 12383s
295
+ Step 2600/4000 | Loss: 2.0986 | LR: 0.000007 | TPS: 1714 | 12430s
296
+ Step 2610/4000 | Loss: 2.0439 | LR: 0.000007 | TPS: 1714 | 12477s
297
+ Step 2620/4000 | Loss: 1.9408 | LR: 0.000007 | TPS: 1714 | 12524s
298
+ Step 2630/4000 | Loss: 2.1992 | LR: 0.000007 | TPS: 1714 | 12571s
299
+ Step 2640/4000 | Loss: 2.0929 | LR: 0.000007 | TPS: 1714 | 12619s
300
+ Step 2650/4000 | Loss: 1.9728 | LR: 0.000007 | TPS: 1714 | 12666s
301
+ Step 2660/4000 | Loss: 1.8369 | LR: 0.000007 | TPS: 1714 | 12713s
302
+ Step 2670/4000 | Loss: 1.9926 | LR: 0.000007 | TPS: 1714 | 12760s
303
+ Step 2680/4000 | Loss: 2.0414 | LR: 0.000007 | TPS: 1714 | 12807s
304
+ Step 2690/4000 | Loss: 2.1368 | LR: 0.000007 | TPS: 1714 | 12854s
305
+ Step 2700/4000 | Loss: 2.0254 | LR: 0.000007 | TPS: 1714 | 12901s
306
+ Step 2710/4000 | Loss: 2.1572 | LR: 0.000007 | TPS: 1715 | 12948s
307
+ Step 2720/4000 | Loss: 2.0418 | LR: 0.000007 | TPS: 1715 | 12996s
308
+ Step 2730/4000 | Loss: 2.1235 | LR: 0.000007 | TPS: 1715 | 13043s
309
+ Step 2740/4000 | Loss: 2.0756 | LR: 0.000006 | TPS: 1715 | 13090s
310
+ Step 2750/4000 | Loss: 2.1417 | LR: 0.000006 | TPS: 1715 | 13137s
311
+ Step 2760/4000 | Loss: 1.9427 | LR: 0.000006 | TPS: 1715 | 13184s
312
+ Step 2770/4000 | Loss: 2.1166 | LR: 0.000006 | TPS: 1715 | 13231s
313
+ Step 2780/4000 | Loss: 1.9711 | LR: 0.000006 | TPS: 1715 | 13278s
314
+ Step 2790/4000 | Loss: 2.1390 | LR: 0.000006 | TPS: 1715 | 13326s
315
+ Step 2800/4000 | Loss: 2.0557 | LR: 0.000006 | TPS: 1715 | 13373s
316
+ 📊 Val loss: 2.1839
317
+ Step 2810/4000 | Loss: 2.0581 | LR: 0.000006 | TPS: 1715 | 13425s
318
+ Step 2820/4000 | Loss: 2.1139 | LR: 0.000006 | TPS: 1715 | 13473s
319
+ Step 2830/4000 | Loss: 2.1228 | LR: 0.000006 | TPS: 1715 | 13520s
320
+ Step 2840/4000 | Loss: 1.9685 | LR: 0.000006 | TPS: 1715 | 13567s
321
+ Step 2850/4000 | Loss: 2.1206 | LR: 0.000006 | TPS: 1715 | 13614s
322
+ Step 2860/4000 | Loss: 2.1942 | LR: 0.000006 | TPS: 1715 | 13661s
323
+ Step 2870/4000 | Loss: 1.9068 | LR: 0.000006 | TPS: 1715 | 13708s
324
+ Step 2880/4000 | Loss: 2.2099 | LR: 0.000006 | TPS: 1715 | 13755s
325
+ Step 2890/4000 | Loss: 2.0948 | LR: 0.000006 | TPS: 1715 | 13803s
326
+ Step 2900/4000 | Loss: 2.0630 | LR: 0.000005 | TPS: 1715 | 13850s
327
+ Step 2910/4000 | Loss: 1.9867 | LR: 0.000005 | TPS: 1715 | 13897s
328
+ Step 2920/4000 | Loss: 2.0602 | LR: 0.000005 | TPS: 1715 | 13944s
329
+ Step 2930/4000 | Loss: 2.0163 | LR: 0.000005 | TPS: 1716 | 13991s
330
+ Step 2940/4000 | Loss: 2.0337 | LR: 0.000005 | TPS: 1716 | 14038s
331
+ Step 2950/4000 | Loss: 2.2476 | LR: 0.000005 | TPS: 1716 | 14085s
332
+ Step 2960/4000 | Loss: 2.0430 | LR: 0.000005 | TPS: 1716 | 14133s
333
+ Step 2970/4000 | Loss: 2.3037 | LR: 0.000005 | TPS: 1716 | 14180s
334
+ Step 2980/4000 | Loss: 2.0831 | LR: 0.000005 | TPS: 1716 | 14227s
335
+ Step 2990/4000 | Loss: 2.1781 | LR: 0.000005 | TPS: 1716 | 14274s
336
+ Step 3000/4000 | Loss: 2.0784 | LR: 0.000005 | TPS: 1716 | 14321s
337
+
338
+ 🔤 Generation samples (step 3000):
339
+ [EN] The city of Paris is a metropolitan area in Europe, consisting of 57 counties. Its main cities include Lyons, Bordeaux and Valence.
340
+ [HE] איטליה.
341
+ [AR] باريس.
342
+ [FA] پاریس پایتخت کشور فرانسه و یکی از شهرهای بزرگ این کشور است. شهر پاریس در شمال غربی قاره اروپا قرار دارد.
343
+ [TRANSLATE] You are the first one in the world to learn how to think.
344
+
345
+ Step 3010/4000 | Loss: 2.1244 | LR: 0.000005 | TPS: 1716 | 14370s
346
+ Step 3020/4000 | Loss: 2.1107 | LR: 0.000005 | TPS: 1716 | 14417s
347
+ Step 3030/4000 | Loss: 2.3589 | LR: 0.000005 | TPS: 1716 | 14464s
348
+ Step 3040/4000 | Loss: 2.0592 | LR: 0.000005 | TPS: 1716 | 14511s
349
+ Step 3050/4000 | Loss: 2.0730 | LR: 0.000005 | TPS: 1716 | 14559s
350
+ Step 3060/4000 | Loss: 2.1365 | LR: 0.000005 | TPS: 1716 | 14606s
351
+ Step 3070/4000 | Loss: 1.9819 | LR: 0.000005 | TPS: 1716 | 14653s
352
+ Step 3080/4000 | Loss: 2.2175 | LR: 0.000004 | TPS: 1716 | 14700s
353
+ Step 3090/4000 | Loss: 2.1442 | LR: 0.000004 | TPS: 1716 | 14747s
354
+ Step 3100/4000 | Loss: 2.0811 | LR: 0.000004 | TPS: 1717 | 14794s
355
+ Step 3110/4000 | Loss: 2.1427 | LR: 0.000004 | TPS: 1717 | 14841s
356
+ Step 3120/4000 | Loss: 2.1722 | LR: 0.000004 | TPS: 1717 | 14889s
357
+ Step 3130/4000 | Loss: 2.0577 | LR: 0.000004 | TPS: 1717 | 14936s
358
+ Step 3140/4000 | Loss: 2.0873 | LR: 0.000004 | TPS: 1717 | 14983s
359
+ Step 3150/4000 | Loss: 2.2920 | LR: 0.000004 | TPS: 1717 | 15030s
360
+ Step 3160/4000 | Loss: 1.8839 | LR: 0.000004 | TPS: 1717 | 15077s
361
+ Step 3170/4000 | Loss: 2.0144 | LR: 0.000004 | TPS: 1717 | 15124s
362
+ Step 3180/4000 | Loss: 1.9689 | LR: 0.000004 | TPS: 1717 | 15171s
363
+ Step 3190/4000 | Loss: 2.2123 | LR: 0.000004 | TPS: 1717 | 15219s
364
+ Step 3200/4000 | Loss: 2.0510 | LR: 0.000004 | TPS: 1717 | 15266s
365
+ 📊 Val loss: 2.1269
366
+ Step 3210/4000 | Loss: 2.4087 | LR: 0.000004 | TPS: 1717 | 15318s
367
+ Step 3220/4000 | Loss: 2.2608 | LR: 0.000004 | TPS: 1717 | 15365s
368
+ Step 3230/4000 | Loss: 2.1930 | LR: 0.000004 | TPS: 1717 | 15413s
369
+ Step 3240/4000 | Loss: 2.0713 | LR: 0.000004 | TPS: 1717 | 15460s
370
+ Step 3250/4000 | Loss: 2.2660 | LR: 0.000004 | TPS: 1717 | 15507s
371
+ Step 3260/4000 | Loss: 1.9479 | LR: 0.000004 | TPS: 1717 | 15554s
372
+ Step 3270/4000 | Loss: 1.9657 | LR: 0.000004 | TPS: 1717 | 15601s
373
+ Step 3280/4000 | Loss: 2.1884 | LR: 0.000004 | TPS: 1717 | 15648s
374
+ Step 3290/4000 | Loss: 2.0927 | LR: 0.000004 | TPS: 1717 | 15695s
375
+ Step 3300/4000 | Loss: 2.0393 | LR: 0.000003 | TPS: 1717 | 15743s
376
+ Step 3310/4000 | Loss: 2.1302 | LR: 0.000003 | TPS: 1717 | 15790s
377
+ Step 3320/4000 | Loss: 2.0059 | LR: 0.000003 | TPS: 1717 | 15837s
378
+ Step 3330/4000 | Loss: 1.8687 | LR: 0.000003 | TPS: 1717 | 15884s
379
+ Step 3340/4000 | Loss: 2.0293 | LR: 0.000003 | TPS: 1717 | 15931s
380
+ Step 3350/4000 | Loss: 2.1500 | LR: 0.000003 | TPS: 1718 | 15978s
381
+ Step 3360/4000 | Loss: 1.9667 | LR: 0.000003 | TPS: 1718 | 16025s
382
+ Step 3370/4000 | Loss: 2.1206 | LR: 0.000003 | TPS: 1718 | 16073s
383
+ Step 3380/4000 | Loss: 2.3028 | LR: 0.000003 | TPS: 1718 | 16120s
384
+ Step 3390/4000 | Loss: 2.0075 | LR: 0.000003 | TPS: 1718 | 16167s
385
+ Step 3400/4000 | Loss: 2.0562 | LR: 0.000003 | TPS: 1718 | 16214s
386
+ Step 3410/4000 | Loss: 1.9977 | LR: 0.000003 | TPS: 1718 | 16261s
387
+ Step 3420/4000 | Loss: 2.1680 | LR: 0.000003 | TPS: 1718 | 16308s
388
+ Step 3430/4000 | Loss: 2.0009 | LR: 0.000003 | TPS: 1718 | 16355s
389
+ Step 3440/4000 | Loss: 1.8301 | LR: 0.000003 | TPS: 1718 | 16403s
390
+ Step 3450/4000 | Loss: 2.0239 | LR: 0.000003 | TPS: 1718 | 16450s
391
+ Step 3460/4000 | Loss: 2.0535 | LR: 0.000003 | TPS: 1718 | 16497s
392
+ Step 3470/4000 | Loss: 2.1348 | LR: 0.000003 | TPS: 1718 | 16544s
393
+ Step 3480/4000 | Loss: 2.0337 | LR: 0.000003 | TPS: 1718 | 16591s
394
+ Step 3490/4000 | Loss: 1.9342 | LR: 0.000003 | TPS: 1718 | 16638s
395
+ Step 3500/4000 | Loss: 2.0052 | LR: 0.000003 | TPS: 1718 | 16685s
396
+ Step 3510/4000 | Loss: 1.9902 | LR: 0.000003 | TPS: 1718 | 16732s
397
+ Step 3520/4000 | Loss: 2.1567 | LR: 0.000003 | TPS: 1719 | 16780s
398
+ Step 3530/4000 | Loss: 2.0515 | LR: 0.000003 | TPS: 1719 | 16827s
399
+ Step 3540/4000 | Loss: 2.1572 | LR: 0.000003 | TPS: 1719 | 16874s
400
+ Step 3550/4000 | Loss: 2.1381 | LR: 0.000003 | TPS: 1719 | 16921s
401
+ Step 3560/4000 | Loss: 2.0383 | LR: 0.000003 | TPS: 1719 | 16968s
402
+ Step 3570/4000 | Loss: 2.3566 | LR: 0.000003 | TPS: 1719 | 17015s
403
+ Step 3580/4000 | Loss: 1.9773 | LR: 0.000003 | TPS: 1719 | 17062s
404
+ Step 3590/4000 | Loss: 2.0418 | LR: 0.000003 | TPS: 1719 | 17110s
405
+ Step 3600/4000 | Loss: 2.1756 | LR: 0.000002 | TPS: 1719 | 17157s
406
+ 📊 Val loss: 2.1478
407
+ Step 3610/4000 | Loss: 2.0761 | LR: 0.000002 | TPS: 1718 | 17209s
408
+ Step 3620/4000 | Loss: 2.1353 | LR: 0.000002 | TPS: 1718 | 17257s
409
+ Step 3630/4000 | Loss: 2.1856 | LR: 0.000002 | TPS: 1719 | 17304s
410
+ Step 3640/4000 | Loss: 2.1298 | LR: 0.000002 | TPS: 1719 | 17351s
411
+ Step 3650/4000 | Loss: 2.0784 | LR: 0.000002 | TPS: 1719 | 17398s
412
+ Step 3660/4000 | Loss: 2.0533 | LR: 0.000002 | TPS: 1719 | 17445s
413
+ Step 3670/4000 | Loss: 2.2151 | LR: 0.000002 | TPS: 1719 | 17492s
414
+ Step 3680/4000 | Loss: 2.0177 | LR: 0.000002 | TPS: 1719 | 17539s
415
+ Step 3690/4000 | Loss: 2.1048 | LR: 0.000002 | TPS: 1719 | 17587s
416
+ Step 3700/4000 | Loss: 2.0629 | LR: 0.000002 | TPS: 1719 | 17634s
417
+ Step 3710/4000 | Loss: 2.0375 | LR: 0.000002 | TPS: 1719 | 17681s
418
+ Step 3720/4000 | Loss: 2.2282 | LR: 0.000002 | TPS: 1719 | 17728s
419
+ Step 3730/4000 | Loss: 2.2049 | LR: 0.000002 | TPS: 1719 | 17775s
420
+ Step 3740/4000 | Loss: 2.0247 | LR: 0.000002 | TPS: 1719 | 17822s
421
+ Step 3750/4000 | Loss: 2.0337 | LR: 0.000002 | TPS: 1719 | 17869s
422
+ Step 3760/4000 | Loss: 2.0922 | LR: 0.000002 | TPS: 1719 | 17917s
423
+ Step 3770/4000 | Loss: 2.1018 | LR: 0.000002 | TPS: 1719 | 17964s
424
+ Step 3780/4000 | Loss: 2.1183 | LR: 0.000002 | TPS: 1719 | 18011s
425
+ Step 3790/4000 | Loss: 2.2469 | LR: 0.000002 | TPS: 1719 | 18058s
426
+ Step 3800/4000 | Loss: 2.1373 | LR: 0.000002 | TPS: 1719 | 18105s
427
+ Step 3810/4000 | Loss: 2.1103 | LR: 0.000002 | TPS: 1719 | 18152s
428
+ Step 3820/4000 | Loss: 2.0317 | LR: 0.000002 | TPS: 1719 | 18199s
429
+ Step 3830/4000 | Loss: 2.0022 | LR: 0.000002 | TPS: 1720 | 18247s
430
+ Step 3840/4000 | Loss: 2.1618 | LR: 0.000002 | TPS: 1720 | 18294s
431
+ Step 3850/4000 | Loss: 2.1421 | LR: 0.000002 | TPS: 1720 | 18341s
432
+ Step 3860/4000 | Loss: 1.9279 | LR: 0.000002 | TPS: 1720 | 18388s
433
+ Step 3870/4000 | Loss: 2.1657 | LR: 0.000002 | TPS: 1720 | 18435s
434
+ Step 3880/4000 | Loss: 2.1433 | LR: 0.000002 | TPS: 1720 | 18482s
435
+ Step 3890/4000 | Loss: 2.0893 | LR: 0.000002 | TPS: 1720 | 18529s
436
+ Step 3900/4000 | Loss: 2.0036 | LR: 0.000002 | TPS: 1720 | 18576s
437
+ Step 3910/4000 | Loss: 2.0691 | LR: 0.000002 | TPS: 1720 | 18624s
438
+ Step 3920/4000 | Loss: 2.0282 | LR: 0.000002 | TPS: 1720 | 18671s
439
+ Step 3930/4000 | Loss: 1.9818 | LR: 0.000002 | TPS: 1720 | 18718s
440
+ Step 3940/4000 | Loss: 2.1466 | LR: 0.000002 | TPS: 1720 | 18765s
441
+ Step 3950/4000 | Loss: 2.0455 | LR: 0.000002 | TPS: 1720 | 18812s
442
+ Step 3960/4000 | Loss: 2.1226 | LR: 0.000002 | TPS: 1720 | 18859s
443
+ Step 3970/4000 | Loss: 1.9890 | LR: 0.000002 | TPS: 1720 | 18906s
444
+ Step 3980/4000 | Loss: 2.1891 | LR: 0.000002 | TPS: 1720 | 18954s
445
+ Step 3990/4000 | Loss: 1.8920 | LR: 0.000002 | TPS: 1720 | 19001s
446
+ Step 4000/4000 | Loss: 2.0073 | LR: 0.000002 | TPS: 1720 | 19048s
447
+ 📊 Val loss: 2.1472
448
+
449
+ 🔤 Generation samples (step 4000):
450
+ [EN] The capital of France consists of 38 cities, 26.9% (14) of which are in the metropolitan area.
451
+ [HE] צרפת היא אחת מיעדי התיירות הפופולאריים ביותר בעולם, בשל היותה מוקד משיכה תיירותי משמעותי עבור תיירים מכל רחבי העולם. העיר בנויה משני חלקים עיקריים - כיכר ד'ארסאן (Droite Sud) ורחוב ד'ארסאן (De La Roch
452
+ [AR] باريس.
453
+ [FA] پاریس شهری بزرگ و تاریخی در شمال غربی اروپا است.
454
+ [TRANSLATE] It’s very short.
455
+
456
+
457
+ ============================================================
458
+ SFT TRAINING COMPLETE
459
+ Steps: 4000, Time: 19057s (317.6min)
460
+ Best val loss: 2.1164
461
+ Model saved to: /tmp/sft/sft_model_v2.pt
462
+ ============================================================
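The summary figures above can be cross-checked against the header of the log. The sketch below is not part of the training script. It assumes that every step consumes `batch size x accumulation x sequence length` tokens (fully packed sequences), that TPS is cumulative tokens divided by elapsed seconds, and that the reported loss is mean per-token cross-entropy in nats. All variable and function names are hypothetical.

```python
import math

# Values copied from the log above.
MICRO_BATCH = 1
GRAD_ACCUM = 4
SEQ_LEN = 2048
TOTAL_STEPS = 4000
TRAIN_TOKENS = 3_949_348
TOTAL_SECONDS = 19_057
BEST_VAL_LOSS = 2.1164

# Assumption: every sequence is packed to the full SEQ_LEN.
tokens_per_step = MICRO_BATCH * GRAD_ACCUM * SEQ_LEN
tokens_seen = tokens_per_step * TOTAL_STEPS
epochs = tokens_seen / TRAIN_TOKENS
minutes = TOTAL_SECONDS / 60


def cumulative_tps(step: int, elapsed_s: float) -> float:
    """Tokens per second averaged over the whole run so far (assumed definition)."""
    return step * tokens_per_step / elapsed_s


# Assumption: the loss is mean cross-entropy in nats, so exp(loss) is perplexity.
best_val_ppl = math.exp(BEST_VAL_LOSS)

print(f"tokens per step : {tokens_per_step}")
print(f"tokens seen     : {tokens_seen}")
print(f"passes over data: {epochs:.2f}")
print(f"wall time       : {minutes:.1f} min")
print(f"TPS at step 400 : {cumulative_tps(400, 1907):.0f}")
print(f"TPS at step 4000: {cumulative_tps(4000, 19048):.0f}")
print(f"best val ppl    : {best_val_ppl:.2f}")
```

Under these assumptions the run sees each training token roughly eight times, which may help explain why validation loss stops improving after step 1600 while training loss keeps drifting down.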
463
+ Uploading to S3...
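
The LR column in the log is consistent with a linear warmup over the first 200 steps followed by cosine decay to 10% of the peak rate. The sketch below is a reconstruction from the logged values, not the training script's code: `WARMUP_STEPS`, `MIN_LR`, and the function `lr_at` are inferred or hypothetical, and the real schedule may differ in details such as off-by-one step indexing.

```python
import math

PEAK_LR = 2e-5      # from the log header ("LR: 2e-05")
TOTAL_STEPS = 4000  # from the log header
WARMUP_STEPS = 200  # inferred: LR first reaches 0.000020 at step 200
MIN_LR = 2e-6       # inferred: LR logged at step 4000


def lr_at(step: int) -> float:
    """Linear warmup, then cosine decay from PEAK_LR down to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


# Print in the same 6-decimal format the log uses.
for step in (10, 100, 200, 600, 610, 2000, 3590, 3600, 4000):
    print(f"Step {step}/{TOTAL_STEPS} | LR: {lr_at(step):.6f}")
```

The printed values match the log at each of these steps, including the points where the rounded value changes (600 to 610 and 3590 to 3600).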