Commit ·
1511d6d
1
Parent(s): 45630f8
new data
Browse files- logs/main_log.txt +120 -0
logs/main_log.txt
CHANGED
|
@@ -11180,3 +11180,123 @@ time (ms)
|
|
| 11180 |
time (ms)
|
| 11181 |
iteration 331/ 159576 | consumed samples: 5296 | elapsed time per iteration (ms): 13678.9 | learning rate: 1.469E-06 | global batch size: 16 | lm loss: 8.243130E+00 | loss scale: 4096.0 | grad norm: 39935.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11182 |
time (ms)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 11180 |
time (ms)
|
| 11181 |
iteration 331/ 159576 | consumed samples: 5296 | elapsed time per iteration (ms): 13678.9 | learning rate: 1.469E-06 | global batch size: 16 | lm loss: 8.243130E+00 | loss scale: 4096.0 | grad norm: 39935.584 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11182 |
time (ms)
|
| 11183 |
+
iteration 332/ 159576 | consumed samples: 5312 | elapsed time per iteration (ms): 13653.3 | learning rate: 1.473E-06 | global batch size: 16 | lm loss: 8.148146E+00 | loss scale: 4096.0 | grad norm: 31710.971 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11184 |
+
time (ms)
|
| 11185 |
+
iteration 333/ 159576 | consumed samples: 5328 | elapsed time per iteration (ms): 13982.9 | learning rate: 1.478E-06 | global batch size: 16 | lm loss: 8.055049E+00 | loss scale: 4096.0 | grad norm: 40555.458 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11186 |
+
time (ms)
|
| 11187 |
+
iteration 334/ 159576 | consumed samples: 5344 | elapsed time per iteration (ms): 13576.5 | learning rate: 1.482E-06 | global batch size: 16 | lm loss: 8.154724E+00 | loss scale: 4096.0 | grad norm: 98189.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11188 |
+
time (ms)
|
| 11189 |
+
iteration 335/ 159576 | consumed samples: 5360 | elapsed time per iteration (ms): 13666.3 | learning rate: 1.487E-06 | global batch size: 16 | lm loss: 8.056485E+00 | loss scale: 4096.0 | grad norm: 53277.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11190 |
+
time (ms)
|
| 11191 |
+
iteration 336/ 159576 | consumed samples: 5376 | elapsed time per iteration (ms): 13667.7 | learning rate: 1.491E-06 | global batch size: 16 | lm loss: 7.902112E+00 | loss scale: 4096.0 | grad norm: 35520.620 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11192 |
+
time (ms)
|
| 11193 |
+
iteration 337/ 159576 | consumed samples: 5392 | elapsed time per iteration (ms): 14189.1 | learning rate: 1.496E-06 | global batch size: 16 | lm loss: 8.211933E+00 | loss scale: 4096.0 | grad norm: 102636.452 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11194 |
+
time (ms)
|
| 11195 |
+
iteration 338/ 159576 | consumed samples: 5408 | elapsed time per iteration (ms): 13538.3 | learning rate: 1.500E-06 | global batch size: 16 | lm loss: 8.077993E+00 | loss scale: 4096.0 | grad norm: 74161.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11196 |
+
time (ms)
|
| 11197 |
+
iteration 339/ 159576 | consumed samples: 5424 | elapsed time per iteration (ms): 13690.1 | learning rate: 1.504E-06 | global batch size: 16 | lm loss: 8.002722E+00 | loss scale: 4096.0 | grad norm: 41178.202 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11198 |
+
time (ms)
|
| 11199 |
+
iteration 340/ 159576 | consumed samples: 5440 | elapsed time per iteration (ms): 13761.4 | learning rate: 1.509E-06 | global batch size: 16 | lm loss: 8.070647E+00 | loss scale: 4096.0 | grad norm: 146660.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11200 |
+
time (ms)
|
| 11201 |
+
iteration 341/ 159576 | consumed samples: 5456 | elapsed time per iteration (ms): 13679.6 | learning rate: 1.513E-06 | global batch size: 16 | lm loss: 8.211810E+00 | loss scale: 4096.0 | grad norm: 56011.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11202 |
+
time (ms)
|
| 11203 |
+
iteration 342/ 159576 | consumed samples: 5472 | elapsed time per iteration (ms): 13958.7 | learning rate: 1.518E-06 | global batch size: 16 | lm loss: 8.028828E+00 | loss scale: 4096.0 | grad norm: 45507.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11204 |
+
time (ms)
|
| 11205 |
+
iteration 343/ 159576 | consumed samples: 5488 | elapsed time per iteration (ms): 13796.1 | learning rate: 1.522E-06 | global batch size: 16 | lm loss: 8.000618E+00 | loss scale: 4096.0 | grad norm: 41366.016 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11206 |
+
time (ms)
|
| 11207 |
+
iteration 344/ 159576 | consumed samples: 5504 | elapsed time per iteration (ms): 13566.5 | learning rate: 1.527E-06 | global batch size: 16 | lm loss: 8.106353E+00 | loss scale: 4096.0 | grad norm: 86487.826 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11208 |
+
time (ms)
|
| 11209 |
+
iteration 345/ 159576 | consumed samples: 5520 | elapsed time per iteration (ms): 13617.7 | learning rate: 1.531E-06 | global batch size: 16 | lm loss: 8.130958E+00 | loss scale: 4096.0 | grad norm: 65559.636 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11210 |
+
time (ms)
|
| 11211 |
+
iteration 346/ 159576 | consumed samples: 5536 | elapsed time per iteration (ms): 14006.3 | learning rate: 1.536E-06 | global batch size: 16 | lm loss: 8.100373E+00 | loss scale: 4096.0 | grad norm: 50918.888 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11212 |
+
time (ms)
|
| 11213 |
+
iteration 347/ 159576 | consumed samples: 5552 | elapsed time per iteration (ms): 13652.0 | learning rate: 1.540E-06 | global batch size: 16 | lm loss: 8.193462E+00 | loss scale: 4096.0 | grad norm: 49482.923 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11214 |
+
time (ms)
|
| 11215 |
+
iteration 348/ 159576 | consumed samples: 5568 | elapsed time per iteration (ms): 13785.4 | learning rate: 1.544E-06 | global batch size: 16 | lm loss: 8.185720E+00 | loss scale: 4096.0 | grad norm: 33616.818 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11216 |
+
time (ms)
|
| 11217 |
+
iteration 349/ 159576 | consumed samples: 5584 | elapsed time per iteration (ms): 13534.7 | learning rate: 1.549E-06 | global batch size: 16 | lm loss: 7.997324E+00 | loss scale: 4096.0 | grad norm: 41224.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11218 |
+
time (ms)
|
| 11219 |
+
iteration 350/ 159576 | consumed samples: 5600 | elapsed time per iteration (ms): 14148.0 | learning rate: 1.553E-06 | global batch size: 16 | lm loss: 8.069170E+00 | loss scale: 4096.0 | grad norm: 61139.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11220 |
+
time (ms)
|
| 11221 |
+
iteration 351/ 159576 | consumed samples: 5616 | elapsed time per iteration (ms): 13626.0 | learning rate: 1.558E-06 | global batch size: 16 | lm loss: 8.052499E+00 | loss scale: 4096.0 | grad norm: 58965.426 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11222 |
+
time (ms)
|
| 11223 |
+
iteration 352/ 159576 | consumed samples: 5632 | elapsed time per iteration (ms): 13633.5 | learning rate: 1.562E-06 | global batch size: 16 | lm loss: 8.036291E+00 | loss scale: 4096.0 | grad norm: 38820.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11224 |
+
time (ms)
|
| 11225 |
+
iteration 353/ 159576 | consumed samples: 5648 | elapsed time per iteration (ms): 13648.6 | learning rate: 1.567E-06 | global batch size: 16 | lm loss: 8.007360E+00 | loss scale: 4096.0 | grad norm: 33342.929 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11226 |
+
time (ms)
|
| 11227 |
+
iteration 354/ 159576 | consumed samples: 5664 | elapsed time per iteration (ms): 13707.0 | learning rate: 1.571E-06 | global batch size: 16 | lm loss: 7.890161E+00 | loss scale: 4096.0 | grad norm: 62589.896 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11228 |
+
time (ms)
|
| 11229 |
+
iteration 355/ 159576 | consumed samples: 5680 | elapsed time per iteration (ms): 14101.4 | learning rate: 1.575E-06 | global batch size: 16 | lm loss: 8.034273E+00 | loss scale: 4096.0 | grad norm: 62100.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11230 |
+
time (ms)
|
| 11231 |
+
iteration 356/ 159576 | consumed samples: 5696 | elapsed time per iteration (ms): 13548.4 | learning rate: 1.580E-06 | global batch size: 16 | lm loss: 7.964279E+00 | loss scale: 4096.0 | grad norm: 37283.643 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11232 |
+
time (ms)
|
| 11233 |
+
iteration 357/ 159576 | consumed samples: 5712 | elapsed time per iteration (ms): 13655.3 | learning rate: 1.584E-06 | global batch size: 16 | lm loss: 7.882459E+00 | loss scale: 4096.0 | grad norm: 36278.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11234 |
+
time (ms)
|
| 11235 |
+
iteration 358/ 159576 | consumed samples: 5728 | elapsed time per iteration (ms): 13872.1 | learning rate: 1.589E-06 | global batch size: 16 | lm loss: 8.081428E+00 | loss scale: 4096.0 | grad norm: 59624.520 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11236 |
+
time (ms)
|
| 11237 |
+
iteration 359/ 159576 | consumed samples: 5744 | elapsed time per iteration (ms): 13830.3 | learning rate: 1.593E-06 | global batch size: 16 | lm loss: 8.345490E+00 | loss scale: 4096.0 | grad norm: 101818.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11238 |
+
time (ms)
|
| 11239 |
+
iteration 360/ 159576 | consumed samples: 5760 | elapsed time per iteration (ms): 13738.3 | learning rate: 1.598E-06 | global batch size: 16 | lm loss: 8.090802E+00 | loss scale: 4096.0 | grad norm: 37735.210 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11240 |
+
time (ms)
|
| 11241 |
+
iteration 361/ 159576 | consumed samples: 5776 | elapsed time per iteration (ms): 13673.7 | learning rate: 1.602E-06 | global batch size: 16 | lm loss: 7.934822E+00 | loss scale: 4096.0 | grad norm: 35051.225 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11242 |
+
time (ms)
|
| 11243 |
+
iteration 362/ 159576 | consumed samples: 5792 | elapsed time per iteration (ms): 13779.0 | learning rate: 1.607E-06 | global batch size: 16 | lm loss: 8.217977E+00 | loss scale: 4096.0 | grad norm: 81671.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11244 |
+
time (ms)
|
| 11245 |
+
iteration 363/ 159576 | consumed samples: 5808 | elapsed time per iteration (ms): 14148.6 | learning rate: 1.611E-06 | global batch size: 16 | lm loss: 7.956856E+00 | loss scale: 4096.0 | grad norm: 123728.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11246 |
+
time (ms)
|
| 11247 |
+
iteration 364/ 159576 | consumed samples: 5824 | elapsed time per iteration (ms): 13509.6 | learning rate: 1.615E-06 | global batch size: 16 | lm loss: 7.980748E+00 | loss scale: 4096.0 | grad norm: 64323.538 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11248 |
+
time (ms)
|
| 11249 |
+
iteration 365/ 159576 | consumed samples: 5840 | elapsed time per iteration (ms): 13791.1 | learning rate: 1.620E-06 | global batch size: 16 | lm loss: 7.927495E+00 | loss scale: 4096.0 | grad norm: 38595.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11250 |
+
time (ms)
|
| 11251 |
+
iteration 366/ 159576 | consumed samples: 5856 | elapsed time per iteration (ms): 13535.8 | learning rate: 1.624E-06 | global batch size: 16 | lm loss: 7.992770E+00 | loss scale: 4096.0 | grad norm: 34786.799 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11252 |
+
time (ms)
|
| 11253 |
+
iteration 367/ 159576 | consumed samples: 5872 | elapsed time per iteration (ms): 13709.6 | learning rate: 1.629E-06 | global batch size: 16 | lm loss: 8.033854E+00 | loss scale: 4096.0 | grad norm: 26681.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11254 |
+
time (ms)
|
| 11255 |
+
iteration 368/ 159576 | consumed samples: 5888 | elapsed time per iteration (ms): 13923.8 | learning rate: 1.633E-06 | global batch size: 16 | lm loss: 8.086361E+00 | loss scale: 4096.0 | grad norm: 116063.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11256 |
+
time (ms)
|
| 11257 |
+
iteration 369/ 159576 | consumed samples: 5904 | elapsed time per iteration (ms): 13743.2 | learning rate: 1.638E-06 | global batch size: 16 | lm loss: 8.136069E+00 | loss scale: 4096.0 | grad norm: 192843.981 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11258 |
+
time (ms)
|
| 11259 |
+
iteration 370/ 159576 | consumed samples: 5920 | elapsed time per iteration (ms): 13586.5 | learning rate: 1.642E-06 | global batch size: 16 | lm loss: 8.213842E+00 | loss scale: 4096.0 | grad norm: 66749.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11260 |
+
time (ms)
|
| 11261 |
+
iteration 371/ 159576 | consumed samples: 5936 | elapsed time per iteration (ms): 13637.5 | learning rate: 1.646E-06 | global batch size: 16 | lm loss: 7.862526E+00 | loss scale: 4096.0 | grad norm: 35628.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11262 |
+
time (ms)
|
| 11263 |
+
iteration 372/ 159576 | consumed samples: 5952 | elapsed time per iteration (ms): 14269.3 | learning rate: 1.651E-06 | global batch size: 16 | lm loss: 8.111351E+00 | loss scale: 4096.0 | grad norm: 51284.654 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11264 |
+
time (ms)
|
| 11265 |
+
iteration 373/ 159576 | consumed samples: 5968 | elapsed time per iteration (ms): 13424.8 | learning rate: 1.655E-06 | global batch size: 16 | lm loss: 7.860275E+00 | loss scale: 4096.0 | grad norm: 51885.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11266 |
+
time (ms)
|
| 11267 |
+
iteration 374/ 159576 | consumed samples: 5984 | elapsed time per iteration (ms): 13638.9 | learning rate: 1.660E-06 | global batch size: 16 | lm loss: 7.995843E+00 | loss scale: 4096.0 | grad norm: 40982.716 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11268 |
+
time (ms)
|
| 11269 |
+
iteration 375/ 159576 | consumed samples: 6000 | elapsed time per iteration (ms): 13719.8 | learning rate: 1.664E-06 | global batch size: 16 | lm loss: 7.989121E+00 | loss scale: 4096.0 | grad norm: 43694.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11270 |
+
time (ms)
|
| 11271 |
+
iteration 376/ 159576 | consumed samples: 6016 | elapsed time per iteration (ms): 13718.2 | learning rate: 1.669E-06 | global batch size: 16 | lm loss: 8.054690E+00 | loss scale: 4096.0 | grad norm: 56142.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11272 |
+
time (ms)
|
| 11273 |
+
iteration 377/ 159576 | consumed samples: 6032 | elapsed time per iteration (ms): 14087.0 | learning rate: 1.673E-06 | global batch size: 16 | lm loss: 8.145277E+00 | loss scale: 4096.0 | grad norm: 77837.877 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11274 |
+
time (ms)
|
| 11275 |
+
iteration 378/ 159576 | consumed samples: 6048 | elapsed time per iteration (ms): 13621.7 | learning rate: 1.678E-06 | global batch size: 16 | lm loss: 7.879861E+00 | loss scale: 4096.0 | grad norm: 35054.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11276 |
+
time (ms)
|
| 11277 |
+
iteration 379/ 159576 | consumed samples: 6064 | elapsed time per iteration (ms): 13676.7 | learning rate: 1.682E-06 | global batch size: 16 | lm loss: 7.996103E+00 | loss scale: 4096.0 | grad norm: 31871.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11278 |
+
time (ms)
|
| 11279 |
+
iteration 380/ 159576 | consumed samples: 6080 | elapsed time per iteration (ms): 13756.2 | learning rate: 1.686E-06 | global batch size: 16 | lm loss: 7.788074E+00 | loss scale: 4096.0 | grad norm: 30378.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11280 |
+
time (ms)
|
| 11281 |
+
iteration 381/ 159576 | consumed samples: 6096 | elapsed time per iteration (ms): 13731.7 | learning rate: 1.691E-06 | global batch size: 16 | lm loss: 7.998044E+00 | loss scale: 4096.0 | grad norm: 78167.228 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11282 |
+
time (ms)
|
| 11283 |
+
iteration 382/ 159576 | consumed samples: 6112 | elapsed time per iteration (ms): 13696.8 | learning rate: 1.695E-06 | global batch size: 16 | lm loss: 8.001510E+00 | loss scale: 4096.0 | grad norm: 57981.800 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11284 |
+
time (ms)
|
| 11285 |
+
iteration 383/ 159576 | consumed samples: 6128 | elapsed time per iteration (ms): 13688.0 | learning rate: 1.700E-06 | global batch size: 16 | lm loss: 8.043833E+00 | loss scale: 4096.0 | grad norm: 40631.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11286 |
+
time (ms)
|
| 11287 |
+
iteration 384/ 159576 | consumed samples: 6144 | elapsed time per iteration (ms): 13680.4 | learning rate: 1.704E-06 | global batch size: 16 | lm loss: 8.029270E+00 | loss scale: 4096.0 | grad norm: 31579.477 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11288 |
+
time (ms)
|
| 11289 |
+
iteration 385/ 159576 | consumed samples: 6160 | elapsed time per iteration (ms): 14057.5 | learning rate: 1.709E-06 | global batch size: 16 | lm loss: 8.156369E+00 | loss scale: 4096.0 | grad norm: 87842.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11290 |
+
time (ms)
|
| 11291 |
+
iteration 386/ 159576 | consumed samples: 6176 | elapsed time per iteration (ms): 13765.1 | learning rate: 1.713E-06 | global batch size: 16 | lm loss: 8.024692E+00 | loss scale: 4096.0 | grad norm: 56881.857 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11292 |
+
time (ms)
|
| 11293 |
+
iteration 387/ 159576 | consumed samples: 6192 | elapsed time per iteration (ms): 13768.8 | learning rate: 1.717E-06 | global batch size: 16 | lm loss: 7.997876E+00 | loss scale: 4096.0 | grad norm: 31105.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11294 |
+
time (ms)
|
| 11295 |
+
iteration 388/ 159576 | consumed samples: 6208 | elapsed time per iteration (ms): 13433.5 | learning rate: 1.722E-06 | global batch size: 16 | lm loss: 7.985063E+00 | loss scale: 4096.0 | grad norm: 78090.353 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11296 |
+
time (ms)
|
| 11297 |
+
iteration 389/ 159576 | consumed samples: 6224 | elapsed time per iteration (ms): 13675.2 | learning rate: 1.726E-06 | global batch size: 16 | lm loss: 7.926050E+00 | loss scale: 4096.0 | grad norm: 61534.683 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11298 |
+
time (ms)
|
| 11299 |
+
iteration 390/ 159576 | consumed samples: 6240 | elapsed time per iteration (ms): 13989.4 | learning rate: 1.731E-06 | global batch size: 16 | lm loss: 7.938218E+00 | loss scale: 4096.0 | grad norm: 37749.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11300 |
+
time (ms)
|
| 11301 |
+
iteration 391/ 159576 | consumed samples: 6256 | elapsed time per iteration (ms): 13663.4 | learning rate: 1.735E-06 | global batch size: 16 | lm loss: 7.835842E+00 | loss scale: 4096.0 | grad norm: 48700.287 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
| 11302 |
+
time (ms)
|