bigscience-bot committed on
Commit 4588f4c · 1 Parent(s): 229827f
Files changed (1)
  1. logs/main_log.txt +77 -0
logs/main_log.txt CHANGED
@@ -8309,3 +8309,80 @@ valid loss at iteration 17000 | lm loss value: 1.893942E+00 | lm loss PPL: 6.645
8309
  iteration 17400/ 296023 | consumed samples: 3829184 | consumed tokens: 7842168832 | elapsed time per iteration (ms): 4656.6 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 1.909471E+00 | loss scale: 32768.0 | grad norm: 5274.381 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
8310
  iteration 17600/ 296023 | consumed samples: 3931584 | consumed tokens: 8051884032 | elapsed time per iteration (ms): 4643.5 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 1.909943E+00 | loss scale: 32768.0 | grad norm: 6170.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
8311
  iteration 17800/ 296023 | consumed samples: 4033984 | consumed tokens: 8261599232 | elapsed time per iteration (ms): 4653.0 | learning rate: 1.987E-04 | global batch size: 512 | lm loss: 1.948319E+00 | loss scale: 32768.0 | grad norm: 6084.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
8312
+ [2021-11-05 09:56:34,720] [INFO] [logging.py:68:log_dist] [Rank 0] step=18000, skipped=38, lr=[0.00019863630787211344, 0.00019863630787211344], mom=[(0.9, 0.999), (0.9, 0.999)]
8313
+ steps: 18000 loss: 2.0673 iter time (s): 0.002 samples/sec: 218843.518
8314
+ iteration 18000/ 296023 | consumed samples: 4136384 | consumed tokens: 8471314432 | elapsed time per iteration (ms): 4665.8 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 1.892513E+00 | loss scale: 65536.0 | grad norm: 12609.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
8315
+ -------------------------------------------------------------------------------------------
8316
+ valid loss at iteration 18000 | lm loss value: 1.864149E+00 | lm loss PPL: 6.450444E+00 |
8317
+ -------------------------------------------------------------------------------------------
8318
+ saving checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints
8319
+ [2021-11-05 09:58:26,257] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/mp_rank_00_model_states.pt
8320
+ [2021-11-05 09:58:26,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_30_mp_rank_01_optim_states.pt
8321
+ [2021-11-05 09:58:26,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_31_mp_rank_01_optim_states.pt
8322
+ [2021-11-05 09:58:26,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_01_optim_states.pt
8323
+ [2021-11-05 09:58:26,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_00_optim_states.pt
8324
+ [2021-11-05 09:58:26,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_7_mp_rank_01_optim_states.pt
8325
+ [2021-11-05 09:58:26,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_5_mp_rank_00_optim_states.pt
8326
+ [2021-11-05 09:58:26,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_12_mp_rank_01_optim_states.pt
8327
+ [2021-11-05 09:58:26,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_14_mp_rank_01_optim_states.pt
8328
+ [2021-11-05 09:58:26,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_26_mp_rank_00_optim_states.pt
8329
+ [2021-11-05 09:58:26,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_25_mp_rank_00_optim_states.pt
8330
+ [2021-11-05 09:58:26,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_4_mp_rank_00_optim_states.pt
8331
+ [2021-11-05 09:58:26,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_6_mp_rank_01_optim_states.pt
8332
+ [2021-11-05 09:58:26,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_19_mp_rank_00_optim_states.pt
8333
+ [2021-11-05 09:58:26,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_00_optim_states.pt
8334
+ [2021-11-05 09:58:26,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_11_mp_rank_01_optim_states.pt
8335
+ [2021-11-05 09:58:26,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_9_mp_rank_00_optim_states.pt
8336
+ [2021-11-05 09:58:26,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_10_mp_rank_00_optim_states.pt
8337
+ [2021-11-05 09:58:26,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_18_mp_rank_01_optim_states.pt
8338
+ [2021-11-05 09:58:26,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_24_mp_rank_01_optim_states.pt
8339
+ [2021-11-05 09:58:26,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_27_mp_rank_01_optim_states.pt
8340
+ [2021-11-05 09:58:26,697] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_9_mp_rank_01_optim_states.pt
8341
+ [2021-11-05 09:58:26,702] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_17_mp_rank_01_optim_states.pt
8342
+ [2021-11-05 09:58:26,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_28_mp_rank_00_optim_states.pt
8343
+ [2021-11-05 09:58:26,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_21_mp_rank_01_optim_states.pt
8344
+ [2021-11-05 09:58:26,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_20_mp_rank_01_optim_states.pt
8345
+ [2021-11-05 09:58:26,712] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_25_mp_rank_01_optim_states.pt
8346
+ [2021-11-05 09:58:26,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_01_optim_states.pt
8347
+ [2021-11-05 09:58:26,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_16_mp_rank_00_optim_states.pt
8348
+ [2021-11-05 09:58:26,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_17_mp_rank_00_optim_states.pt
8349
+ [2021-11-05 09:58:26,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_4_mp_rank_01_optim_states.pt
8350
+ [2021-11-05 09:58:26,715] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_5_mp_rank_01_optim_states.pt
8351
+ [2021-11-05 09:58:26,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_29_mp_rank_00_optim_states.pt
8352
+ [2021-11-05 09:58:26,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_29_mp_rank_01_optim_states.pt
8353
+ [2021-11-05 09:58:26,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_18_mp_rank_00_optim_states.pt
8354
+ [2021-11-05 09:58:26,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_22_mp_rank_01_optim_states.pt
8355
+ [2021-11-05 09:58:26,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_19_mp_rank_01_optim_states.pt
8356
+ [2021-11-05 09:58:26,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_26_mp_rank_01_optim_states.pt
8357
+ [2021-11-05 09:58:26,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_13_mp_rank_00_optim_states.pt
8358
+ [2021-11-05 09:58:26,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_12_mp_rank_00_optim_states.pt
8359
+ [2021-11-05 09:58:26,722] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_7_mp_rank_00_optim_states.pt
8360
+ [2021-11-05 09:58:26,722] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_28_mp_rank_01_optim_states.pt
8361
+ [2021-11-05 09:58:26,724] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_11_mp_rank_00_optim_states.pt
8362
+ [2021-11-05 09:58:26,725] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_24_mp_rank_00_optim_states.pt
8363
+ [2021-11-05 09:58:26,726] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_27_mp_rank_00_optim_states.pt
8364
+ [2021-11-05 09:58:26,720] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_22_mp_rank_00_optim_states.pt
8365
+ [2021-11-05 09:58:26,721] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_21_mp_rank_00_optim_states.pt
8366
+ [2021-11-05 09:58:26,726] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_15_mp_rank_01_optim_states.pt
8367
+ [2021-11-05 09:58:26,726] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_10_mp_rank_01_optim_states.pt
8368
+ [2021-11-05 09:58:26,728] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_6_mp_rank_00_optim_states.pt
8369
+ [2021-11-05 09:58:26,728] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_00_optim_states.pt
8370
+ [2021-11-05 09:58:26,729] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_01_optim_states.pt
8371
+ [2021-11-05 09:58:26,729] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_23_mp_rank_01_optim_states.pt
8372
+ [2021-11-05 09:58:26,729] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_14_mp_rank_00_optim_states.pt
8373
+ [2021-11-05 09:58:26,730] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_8_mp_rank_00_optim_states.pt
8374
+ [2021-11-05 09:58:26,731] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_8_mp_rank_01_optim_states.pt
8375
+ [2021-11-05 09:58:26,732] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_31_mp_rank_00_optim_states.pt
8376
+ [2021-11-05 09:58:26,733] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_15_mp_rank_00_optim_states.pt
8377
+ [2021-11-05 09:58:26,734] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_20_mp_rank_00_optim_states.pt
8378
+ [2021-11-05 09:58:26,735] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_30_mp_rank_00_optim_states.pt
8379
+ [2021-11-05 09:58:26,742] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_23_mp_rank_00_optim_states.pt
8380
+ [2021-11-05 09:58:26,742] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_13_mp_rank_01_optim_states.pt
8381
+ [2021-11-05 09:58:26,745] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_00_optim_states.pt
8382
+ [2021-11-05 09:58:26,748] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_16_mp_rank_01_optim_states.pt
8383
+ [2021-11-05 09:58:26,750] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_01_optim_states.pt
8384
+ successfully saved checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/checkpoints/tr6e-1B3-pile/checkpoints
8385
+ time (ms) | save-checkpoint: 2752.92
8386
+ iteration 18200/ 296023 | consumed samples: 4238784 | consumed tokens: 8681029632 | elapsed time per iteration (ms): 5209.2 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 1.918148E+00 | loss scale: 65536.0 | grad norm: 9739.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
8387
+ iteration 18400/ 296023 | consumed samples: 4341184 | consumed tokens: 8890744832 | elapsed time per iteration (ms): 4642.1 | learning rate: 1.985E-04 | global batch size: 512 | lm loss: 1.884550E+00 | loss scale: 16384.0 | grad norm: 2549.465 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
8388
+ iteration 18600/ 296023 | consumed samples: 4443584 | consumed tokens: 9100460032 | elapsed time per iteration (ms): 4652.5 | learning rate: 1.984E-04 | global batch size: 512 | lm loss: 1.880175E+00 | loss scale: 16384.0 | grad norm: 2965.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |