INFO:__main__:Initializing DDP settings...
INFO:__main__: is_ddp = True
INFO:__main__:Initializing PyTorch settings...
INFO:__main__:Initializing models and optimizers...
INFO:__main__: Resuming training from `/root/autodl-tmp/checkpoint_20241130_005942_step_307000.pt`...
INFO:__main__: block_size = 512
INFO:__main__: vocab_size = 16384
INFO:__main__: n_layer = 24
INFO:__main__: n_embd = 768
INFO:__main__: n_head = 16
INFO:__main__: n_kv_head = 8
INFO:__main__: n_hidden = 2048
INFO:__main__: Parameters
INFO:__main__: Total = 168,334,080 (0.16833408B)
INFO:__main__: Train = 168,334,080 (100.0%)
INFO:__main__:Loading dataset...
INFO:__main__: Train set 0 : 12,881,662 samples (6,595,410,944 tokens)
INFO:__main__: Valid set 0 : 644,124 samples (329,791,488 tokens)
INFO:__main__:2024-11-30 01:52:01 | Start training from iteration #307000
INFO:__main__:2024-11-30 01:52:03 | Validation | Step: 307000 | Val_loss: 9.048 | Best_val_loss: 19.4081
INFO:__main__:2024-11-30 01:52:04 | Epoch: 0 | Step: 307000 | Dataset: 0-360 | Loss: 11.708 | 800 ms/step , 86316.45 GFLOP/s , 0.0 tokens/s
INFO:__main__:2024-11-30 01:52:11 | Epoch: 0 | Step: 307010 | Dataset: 0-2760 | Loss: 10.047 | 595 ms/step , 115984.47 GFLOP/s , 172977.9 tokens/s
INFO:__main__:2024-11-30 01:52:18 | Epoch: 0 | Step: 307020 | Dataset: 0-5160 | Loss: 9.328 | 595 ms/step , 115936.44 GFLOP/s , 173372.9 tokens/s
INFO:__main__:2024-11-30 01:52:25 | Epoch: 0 | Step: 307030 | Dataset: 0-7560 | Loss: 7.429 | 596 ms/step , 115870.28 GFLOP/s , 174068.2 tokens/s
INFO:__main__:2024-11-30 01:52:32 | Epoch: 0 | Step: 307040 | Dataset: 0-9960 | Loss: 6.315 | 596 ms/step , 115750.07 GFLOP/s , 173931.4 tokens/s
INFO:__main__:2024-11-30 01:52:39 | Epoch: 0 | Step: 307050 | Dataset: 0-12360 | Loss: 6.389 | 597 ms/step , 115690.23 GFLOP/s , 174016.3 tokens/s
INFO:__main__:2024-11-30 01:52:46 | Epoch: 0 | Step: 307060 | Dataset: 0-14760 | Loss: 5.521 | 598 ms/step , 115487.44 GFLOP/s , 173944.2 tokens/s
INFO:__main__:2024-11-30 01:52:53 | Epoch: 0 | Step: 307070 | Dataset: 0-17160 | Loss: 5.991 | 598 ms/step , 115439.35 GFLOP/s , 173727.7 tokens/s
INFO:__main__:2024-11-30 01:53:00 | Epoch: 0 | Step: 307080 | Dataset: 0-19560 | Loss: 5.880 | 597 ms/step , 115589.69 GFLOP/s , 173657.1 tokens/s
INFO:__main__:2024-11-30 01:53:07 | Epoch: 0 | Step: 307090 | Dataset: 0-21960 | Loss: 5.425 | 598 ms/step , 115401.81 GFLOP/s , 173454.9 tokens/s
INFO:__main__:2024-11-30 01:53:14 | Epoch: 0 | Step: 307100 | Dataset: 0-24360 | Loss: 5.806 | 598 ms/step , 115476.47 GFLOP/s , 173502.0 tokens/s
INFO:__main__:2024-11-30 01:53:21 | Epoch: 0 | Step: 307110 | Dataset: 0-26760 | Loss: 5.769 | 598 ms/step , 115366.48 GFLOP/s , 173323.2 tokens/s
INFO:__main__:2024-11-30 01:53:29 | Epoch: 0 | Step: 307120 | Dataset: 0-29160 | Loss: 5.952 | 598 ms/step , 115394.34 GFLOP/s , 173388.4 tokens/s
INFO:__main__:2024-11-30 01:53:36 | Epoch: 0 | Step: 307130 | Dataset: 0-31560 | Loss: 5.694 | 598 ms/step , 115410.10 GFLOP/s , 173439.6 tokens/s
INFO:__main__:2024-11-30 01:53:43 | Epoch: 0 | Step: 307140 | Dataset: 0-33960 | Loss: 5.570 | 598 ms/step , 115384.95 GFLOP/s , 173314.2 tokens/s
INFO:__main__:2024-11-30 01:53:50 | Epoch: 0 | Step: 307150 | Dataset: 0-36360 | Loss: 5.733 | 598 ms/step , 115364.76 GFLOP/s , 173270.4 tokens/s
INFO:__main__:2024-11-30 01:53:57 | Epoch: 0 | Step: 307160 | Dataset: 0-38760 | Loss: 5.718 | 599 ms/step , 115187.45 GFLOP/s , 173397.8 tokens/s
INFO:__main__:2024-11-30 01:54:04 | Epoch: 0 | Step: 307170 | Dataset: 0-41160 | Loss: 5.543 | 599 ms/step , 115292.44 GFLOP/s , 173450.0 tokens/s
INFO:__main__:2024-11-30 01:54:11 | Epoch: 0 | Step: 307180 | Dataset: 0-43560 | Loss: 5.779 | 598 ms/step , 115347.95 GFLOP/s , 173406.0 tokens/s
INFO:__main__:2024-11-30 01:54:18 | Epoch: 0 | Step: 307190 | Dataset: 0-45960 | Loss: 5.647 | 598 ms/step , 115325.72 GFLOP/s , 173420.1 tokens/s
INFO:__main__:2024-11-30 01:54:25 | Epoch: 0 | Step: 307200 | Dataset: 0-48360 | Loss: 5.803 | 599 ms/step , 115278.63 GFLOP/s , 173453.2 tokens/s
INFO:__main__:2024-11-30 01:54:32 | Epoch: 0 | Step: 307210 | Dataset: 0-50760 | Loss: 5.609 | 598 ms/step , 115329.78 GFLOP/s , 173431.2 tokens/s
INFO:__main__:2024-11-30 01:54:39 | Epoch: 0 | Step: 307220 | Dataset: 0-53160 | Loss: 5.613 | 598 ms/step , 115348.93 GFLOP/s , 173462.1 tokens/s
INFO:__main__:2024-11-30 01:54:46 | Epoch: 0 | Step: 307230 | Dataset: 0-55560 | Loss: 5.703 | 599 ms/step , 115214.01 GFLOP/s , 173362.4 tokens/s
INFO:__main__:2024-11-30 01:54:54 | Epoch: 0 | Step: 307240 | Dataset: 0-57960 | Loss: 5.327 | 599 ms/step , 115278.22 GFLOP/s , 173345.5 tokens/s
INFO:__main__:2024-11-30 01:55:01 | Epoch: 0 | Step: 307250 | Dataset: 0-60360 | Loss: 5.461 | 599 ms/step , 115132.55 GFLOP/s , 173245.4 tokens/s
INFO:__main__:2024-11-30 01:55:08 | Epoch: 0 | Step: 307260 | Dataset: 0-62760 | Loss: 5.175 | 599 ms/step , 115268.33 GFLOP/s , 173281.4 tokens/s
INFO:__main__:2024-11-30 01:55:15 | Epoch: 0 | Step: 307270 | Dataset: 0-65160 | Loss: 5.479 | 599 ms/step , 115272.68 GFLOP/s , 173320.7 tokens/s
INFO:__main__:2024-11-30 01:55:22 | Epoch: 0 | Step: 307280 | Dataset: 0-67560 | Loss: 5.524 | 598 ms/step , 115335.30 GFLOP/s , 173436.5 tokens/s
INFO:__main__:2024-11-30 01:55:29 | Epoch: 0 | Step: 307290 | Dataset: 0-69960 | Loss: 5.416 | 598 ms/step , 115372.71 GFLOP/s , 173473.2 tokens/s
INFO:__main__:2024-11-30 01:55:36 | Epoch: 0 | Step: 307300 | Dataset: 0-72360 | Loss: 5.445 | 599 ms/step , 115139.72 GFLOP/s , 173281.2 tokens/s
INFO:__main__:2024-11-30 01:55:43 | Epoch: 0 | Step: 307310 | Dataset: 0-74760 | Loss: 5.024 | 601 ms/step , 114830.76 GFLOP/s , 173320.5 tokens/s
INFO:__main__:2024-11-30 01:55:50 | Epoch: 0 | Step: 307320 | Dataset: 0-77160 | Loss: 5.227 | 599 ms/step , 115268.87 GFLOP/s , 173319.0 tokens/s
INFO:__main__:2024-11-30 01:55:57 | Epoch: 0 | Step: 307330 | Dataset: 0-79560 | Loss: 5.302 | 601 ms/step , 114918.13 GFLOP/s , 173286.8 tokens/s
INFO:__main__:2024-11-30 01:56:04 | Epoch: 0 | Step: 307340 | Dataset: 0-81960 | Loss: 5.146 | 599 ms/step , 115293.51 GFLOP/s , 173284.4 tokens/s
INFO:__main__:2024-11-30 01:56:12 | Epoch: 0 | Step: 307350 | Dataset: 0-84360 | Loss: 5.326 | 598 ms/step , 115442.28 GFLOP/s , 173359.9 tokens/s
INFO:__main__:2024-11-30 01:56:19 | Epoch: 0 | Step: 307360 | Dataset: 0-86760 | Loss: 5.067 | 598 ms/step , 115392.96 GFLOP/s , 173471.3 tokens/s
INFO:__main__:2024-11-30 01:56:26 | Epoch: 0 | Step: 307370 | Dataset: 0-89160 | Loss: 5.479 | 598 ms/step , 115318.25 GFLOP/s , 173368.4 tokens/s
INFO:__main__:2024-11-30 01:56:33 | Epoch: 0 | Step: 307380 | Dataset: 0-91560 | Loss: 4.974 | 599 ms/step , 115164.56 GFLOP/s , 173280.7 tokens/s
INFO:__main__:2024-11-30 01:56:40 | Epoch: 0 | Step: 307390 | Dataset: 0-93960 | Loss: 4.972 | 599 ms/step , 115239.42 GFLOP/s , 173316.2 tokens/s
INFO:__main__:2024-11-30 01:56:47 | Epoch: 0 | Step: 307400 | Dataset: 0-96360 | Loss: 5.001 | 599 ms/step , 115152.06 GFLOP/s , 173305.0 tokens/s
INFO:__main__:2024-11-30 01:56:54 | Epoch: 0 | Step: 307410 | Dataset: 0-98760 | Loss: 4.716 | 599 ms/step , 115298.66 GFLOP/s , 173372.6 tokens/s
INFO:__main__:2024-11-30 01:57:01 | Epoch: 0 | Step: 307420 | Dataset: 0-101160 | Loss: 5.141 | 601 ms/step , 114919.59 GFLOP/s , 173283.3 tokens/s
INFO:__main__:2024-11-30 01:57:08 | Epoch: 0 | Step: 307430 | Dataset: 0-103560 | Loss: 5.091 | 599 ms/step , 115264.27 GFLOP/s , 173431.1 tokens/s
INFO:__main__:2024-11-30 01:57:15 | Epoch: 0 | Step: 307440 | Dataset: 0-105960 | Loss: 4.812 | 599 ms/step , 115190.06 GFLOP/s , 173472.3 tokens/s
INFO:__main__:2024-11-30 01:57:22 | Epoch: 0 | Step: 307450 | Dataset: 0-108360 | Loss: 4.799 | 599 ms/step , 115265.15 GFLOP/s , 173345.5 tokens/s
INFO:__main__:2024-11-30 01:57:30 | Epoch: 0 | Step: 307460 | Dataset: 0-110760 | Loss: 4.830 | 599 ms/step , 115219.22 GFLOP/s , 173286.2 tokens/s
INFO:__main__:2024-11-30 01:57:37 | Epoch: 0 | Step: 307470 | Dataset: 0-113160 | Loss: 4.914 | 599 ms/step , 115247.83 GFLOP/s , 173306.3 tokens/s
INFO:__main__:2024-11-30 01:57:44 | Epoch: 0 | Step: 307480 | Dataset: 0-115560 | Loss: 4.722 | 599 ms/step , 115241.48 GFLOP/s , 173323.1 tokens/s
INFO:__main__:2024-11-30 01:57:51 | Epoch: 0 | Step: 307490 | Dataset: 0-117960 | Loss: 4.741 | 599 ms/step , 115263.13 GFLOP/s , 173339.4 tokens/s
INFO:__main__:2024-11-30 01:57:58 | Validation | Step: 307500 | Val_loss: 5.613 | Best_val_loss: 19.4081
INFO:__main__:2024-11-30 01:57:58 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_015758_step_307500.pt`
INFO:__main__:2024-11-30 01:58:01 | Epoch: 0 | Step: 307500 | Dataset: 0-120360 | Loss: 4.672 | 595 ms/step , 116028.37 GFLOP/s , 120433.0 tokens/s
INFO:__main__:2024-11-30 01:58:08 | Epoch: 0 | Step: 307510 | Dataset: 0-122760 | Loss: 4.837 | 598 ms/step , 115387.53 GFLOP/s , 173482.5 tokens/s
INFO:__main__:2024-11-30 01:58:15 | Epoch: 0 | Step: 307520 | Dataset: 0-125160 | Loss: 4.639 | 599 ms/step , 115225.30 GFLOP/s , 173219.8 tokens/s
INFO:__main__:2024-11-30 01:58:22 | Epoch: 0 | Step: 307530 | Dataset: 0-127560 | Loss: 4.963 | 599 ms/step , 115290.19 GFLOP/s , 173365.3 tokens/s
INFO:__main__:2024-11-30 01:58:29 | Epoch: 0 | Step: 307540 | Dataset: 0-129960 | Loss: 4.528 | 599 ms/step , 115229.97 GFLOP/s , 173306.7 tokens/s
INFO:__main__:2024-11-30 01:58:36 | Epoch: 0 | Step: 307550 | Dataset: 0-132360 | Loss: 4.761 | 599 ms/step , 115275.81 GFLOP/s , 173374.9 tokens/s
INFO:__main__:2024-11-30 01:58:44 | Epoch: 0 | Step: 307560 | Dataset: 0-134760 | Loss: 4.852 | 599 ms/step , 115247.61 GFLOP/s , 173302.7 tokens/s
INFO:__main__:2024-11-30 01:58:51 | Epoch: 0 | Step: 307570 | Dataset: 0-137160 | Loss: 4.667 | 599 ms/step , 115203.63 GFLOP/s , 173363.7 tokens/s
INFO:__main__:2024-11-30 01:58:58 | Epoch: 0 | Step: 307580 | Dataset: 0-139560 | Loss: 4.517 | 598 ms/step , 115434.02 GFLOP/s , 173541.9 tokens/s
INFO:__main__:2024-11-30 01:59:05 | Epoch: 0 | Step: 307590 | Dataset: 0-141960 | Loss: 4.396 | 598 ms/step , 115367.70 GFLOP/s , 173385.1 tokens/s
INFO:__main__:2024-11-30 01:59:12 | Epoch: 0 | Step: 307600 | Dataset: 0-144360 | Loss: 4.634 | 599 ms/step , 115294.77 GFLOP/s , 173328.0 tokens/s
INFO:__main__:2024-11-30 01:59:19 | Epoch: 0 | Step: 307610 | Dataset: 0-146760 | Loss: 4.700 | 598 ms/step , 115354.96 GFLOP/s , 173362.2 tokens/s
INFO:__main__:2024-11-30 01:59:26 | Epoch: 0 | Step: 307620 | Dataset: 0-149160 | Loss: 4.478 | 599 ms/step , 115306.92 GFLOP/s , 173348.6 tokens/s
INFO:__main__:2024-11-30 01:59:33 | Epoch: 0 | Step: 307630 | Dataset: 0-151560 | Loss: 4.362 | 599 ms/step , 115212.54 GFLOP/s , 173339.3 tokens/s
INFO:__main__:2024-11-30 01:59:40 | Epoch: 0 | Step: 307640 | Dataset: 0-153960 | Loss: 4.733 | 599 ms/step , 115184.35 GFLOP/s , 173302.0 tokens/s
INFO:__main__:2024-11-30 01:59:47 | Epoch: 0 | Step: 307650 | Dataset: 0-156360 | Loss: 4.437 | 599 ms/step , 115304.11 GFLOP/s , 173440.4 tokens/s
INFO:__main__:2024-11-30 01:59:54 | Epoch: 0 | Step: 307660 | Dataset: 0-158760 | Loss: 4.584 | 598 ms/step , 115327.74 GFLOP/s , 173457.6 tokens/s
INFO:__main__:2024-11-30 02:00:01 | Epoch: 0 | Step: 307670 | Dataset: 0-161160 | Loss: 4.349 | 599 ms/step , 115195.57 GFLOP/s , 173329.1 tokens/s
INFO:__main__:2024-11-30 02:00:09 | Epoch: 0 | Step: 307680 | Dataset: 0-163560 | Loss: 4.204 | 598 ms/step , 115428.88 GFLOP/s , 173338.6 tokens/s
INFO:__main__:2024-11-30 02:00:16 | Epoch: 0 | Step: 307690 | Dataset: 0-165960 | Loss: 4.359 | 599 ms/step , 115188.27 GFLOP/s , 173206.0 tokens/s
INFO:__main__:2024-11-30 02:00:23 | Epoch: 0 | Step: 307700 | Dataset: 0-168360 | Loss: 4.311 | 599 ms/step , 115173.57 GFLOP/s , 173171.0 tokens/s
INFO:__main__:2024-11-30 02:00:30 | Epoch: 0 | Step: 307710 | Dataset: 0-170760 | Loss: 4.494 | 599 ms/step , 115213.69 GFLOP/s , 173255.7 tokens/s
INFO:__main__:2024-11-30 02:00:37 | Epoch: 0 | Step: 307720 | Dataset: 0-173160 | Loss: 4.125 | 598 ms/step , 115345.14 GFLOP/s , 173279.7 tokens/s
INFO:__main__:2024-11-30 02:00:44 | Epoch: 0 | Step: 307730 | Dataset: 0-175560 | Loss: 4.185 | 599 ms/step , 115234.84 GFLOP/s , 173413.5 tokens/s
INFO:__main__:2024-11-30 02:00:51 | Epoch: 0 | Step: 307740 | Dataset: 0-177960 | Loss: 4.354 | 599 ms/step , 115202.77 GFLOP/s , 173349.2 tokens/s
INFO:__main__:2024-11-30 02:00:58 | Epoch: 0 | Step: 307750 | Dataset: 0-180360 | Loss: 4.230 | 599 ms/step , 115232.08 GFLOP/s , 173251.3 tokens/s
INFO:__main__:2024-11-30 02:01:05 | Epoch: 0 | Step: 307760 | Dataset: 0-182760 | Loss: 4.287 | 598 ms/step , 115313.40 GFLOP/s , 173290.9 tokens/s
INFO:__main__:2024-11-30 02:01:12 | Epoch: 0 | Step: 307770 | Dataset: 0-185160 | Loss: 4.310 | 599 ms/step , 115171.03 GFLOP/s , 173253.1 tokens/s
INFO:__main__:2024-11-30 02:01:19 | Epoch: 0 | Step: 307780 | Dataset: 0-187560 | Loss: 4.202 | 598 ms/step , 115350.36 GFLOP/s , 173277.0 tokens/s
INFO:__main__:2024-11-30 02:01:27 | Epoch: 0 | Step: 307790 | Dataset: 0-189960 | Loss: 4.011 | 599 ms/step , 115181.81 GFLOP/s , 173288.2 tokens/s
INFO:__main__:2024-11-30 02:01:34 | Epoch: 0 | Step: 307800 | Dataset: 0-192360 | Loss: 4.169 | 599 ms/step , 115264.29 GFLOP/s , 173184.9 tokens/s
INFO:__main__:2024-11-30 02:01:41 | Epoch: 0 | Step: 307810 | Dataset: 0-194760 | Loss: 4.181 | 599 ms/step , 115216.33 GFLOP/s , 173446.2 tokens/s
INFO:__main__:2024-11-30 02:01:48 | Epoch: 0 | Step: 307820 | Dataset: 0-197160 | Loss: 4.178 | 599 ms/step , 115134.47 GFLOP/s , 173351.7 tokens/s
INFO:__main__:2024-11-30 02:01:55 | Epoch: 0 | Step: 307830 | Dataset: 0-199560 | Loss: 4.020 | 600 ms/step , 114961.26 GFLOP/s , 173254.6 tokens/s
INFO:__main__:2024-11-30 02:02:02 | Epoch: 0 | Step: 307840 | Dataset: 0-201960 | Loss: 3.949 | 599 ms/step , 115247.74 GFLOP/s , 173349.0 tokens/s
INFO:__main__:2024-11-30 02:02:09 | Epoch: 0 | Step: 307850 | Dataset: 0-204360 | Loss: 3.877 | 598 ms/step , 115386.52 GFLOP/s , 173328.8 tokens/s
INFO:__main__:2024-11-30 02:02:16 | Epoch: 0 | Step: 307860 | Dataset: 0-206760 | Loss: 4.015 | 599 ms/step , 115181.59 GFLOP/s , 173300.4 tokens/s
INFO:__main__:2024-11-30 02:02:23 | Epoch: 0 | Step: 307870 | Dataset: 0-209160 | Loss: 4.041 | 598 ms/step , 115355.11 GFLOP/s , 173352.9 tokens/s
INFO:__main__:2024-11-30 02:02:30 | Epoch: 0 | Step: 307880 | Dataset: 0-211560 | Loss: 3.893 | 598 ms/step , 115313.35 GFLOP/s , 173482.0 tokens/s
INFO:__main__:2024-11-30 02:02:37 | Epoch: 0 | Step: 307890 | Dataset: 0-213960 | Loss: 3.747 | 599 ms/step , 115224.24 GFLOP/s , 173443.8 tokens/s
INFO:__main__:2024-11-30 02:02:45 | Epoch: 0 | Step: 307900 | Dataset: 0-216360 | Loss: 3.697 | 598 ms/step , 115378.47 GFLOP/s , 173350.0 tokens/s
INFO:__main__:2024-11-30 02:02:52 | Epoch: 0 | Step: 307910 | Dataset: 0-218760 | Loss: 3.923 | 599 ms/step , 115229.28 GFLOP/s , 173318.0 tokens/s
INFO:__main__:2024-11-30 02:02:59 | Epoch: 0 | Step: 307920 | Dataset: 0-221160 | Loss: 3.684 | 599 ms/step , 115129.26 GFLOP/s , 173308.7 tokens/s
INFO:__main__:2024-11-30 02:03:06 | Epoch: 0 | Step: 307930 | Dataset: 0-223560 | Loss: 3.594 | 598 ms/step , 115311.93 GFLOP/s , 173360.7 tokens/s
INFO:__main__:2024-11-30 02:03:13 | Epoch: 0 | Step: 307940 | Dataset: 0-225960 | Loss: 3.707 | 599 ms/step , 115158.24 GFLOP/s , 173357.4 tokens/s
INFO:__main__:2024-11-30 02:03:20 | Epoch: 0 | Step: 307950 | Dataset: 0-228360 | Loss: 3.711 | 598 ms/step , 115425.85 GFLOP/s , 173416.9 tokens/s
INFO:__main__:2024-11-30 02:03:27 | Epoch: 0 | Step: 307960 | Dataset: 0-230760 | Loss: 3.780 | 599 ms/step , 115294.45 GFLOP/s , 173496.5 tokens/s
INFO:__main__:2024-11-30 02:03:34 | Epoch: 0 | Step: 307970 | Dataset: 0-233160 | Loss: 3.713 | 599 ms/step , 115165.02 GFLOP/s , 173418.7 tokens/s
INFO:__main__:2024-11-30 02:03:41 | Epoch: 0 | Step: 307980 | Dataset: 0-235560 | Loss: 3.627 | 598 ms/step , 115344.31 GFLOP/s , 173300.2 tokens/s
INFO:__main__:2024-11-30 02:03:48 | Epoch: 0 | Step: 307990 | Dataset: 0-237960 | Loss: 3.656 | 600 ms/step , 115069.80 GFLOP/s , 173331.4 tokens/s
INFO:__main__:2024-11-30 02:03:56 | Validation | Step: 308000 | Val_loss: 4.126 | Best_val_loss: 5.6127
INFO:__main__:2024-11-30 02:03:56 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_020356_step_308000.pt`
INFO:__main__:2024-11-30 02:03:59 | Epoch: 0 | Step: 308000 | Dataset: 0-240360 | Loss: 3.602 | 596 ms/step , 115852.65 GFLOP/s , 118390.1 tokens/s
INFO:__main__:2024-11-30 02:04:06 | Epoch: 0 | Step: 308010 | Dataset: 0-242760 | Loss: 3.569 | 598 ms/step , 115312.77 GFLOP/s , 173344.2 tokens/s
INFO:__main__:2024-11-30 02:04:13 | Epoch: 0 | Step: 308020 | Dataset: 0-245160 | Loss: 3.592 | 598 ms/step , 115361.21 GFLOP/s , 173304.4 tokens/s
INFO:__main__:2024-11-30 02:04:20 | Epoch: 0 | Step: 308030 | Dataset: 0-247560 | Loss: 3.469 | 599 ms/step , 115306.23 GFLOP/s , 173468.8 tokens/s
INFO:__main__:2024-11-30 02:04:27 | Epoch: 0 | Step: 308040 | Dataset: 0-249960 | Loss: 3.511 | 599 ms/step , 115287.89 GFLOP/s , 173524.0 tokens/s
INFO:__main__:2024-11-30 02:04:34 | Epoch: 0 | Step: 308050 | Dataset: 0-252360 | Loss: 3.473 | 598 ms/step , 115350.39 GFLOP/s , 173378.2 tokens/s
INFO:__main__:2024-11-30 02:04:41 | Epoch: 0 | Step: 308060 | Dataset: 0-254760 | Loss: 3.279 | 599 ms/step , 115256.65 GFLOP/s , 173392.7 tokens/s
INFO:__main__:2024-11-30 02:04:48 | Epoch: 0 | Step: 308070 | Dataset: 0-257160 | Loss: 3.306 | 599 ms/step , 115299.62 GFLOP/s , 173357.2 tokens/s
INFO:__main__:2024-11-30 02:04:55 | Epoch: 0 | Step: 308080 | Dataset: 0-259560 | Loss: 3.430 | 599 ms/step , 115289.35 GFLOP/s , 173380.1 tokens/s
INFO:__main__:2024-11-30 02:05:02 | Epoch: 0 | Step: 308090 | Dataset: 0-261960 | Loss: 3.325 | 599 ms/step , 115236.67 GFLOP/s , 173355.6 tokens/s
INFO:__main__:2024-11-30 02:05:10 | Epoch: 0 | Step: 308100 | Dataset: 0-264360 | Loss: 4.392 | 598 ms/step , 115421.65 GFLOP/s , 173446.1 tokens/s
INFO:__main__:2024-11-30 02:05:17 | Epoch: 0 | Step: 308110 | Dataset: 0-266760 | Loss: 3.988 | 599 ms/step , 115228.01 GFLOP/s , 173497.7 tokens/s
INFO:__main__:2024-11-30 02:05:24 | Epoch: 0 | Step: 308120 | Dataset: 0-269160 | Loss: 4.215 | 599 ms/step , 115290.33 GFLOP/s , 173381.3 tokens/s
INFO:__main__:2024-11-30 02:05:31 | Epoch: 0 | Step: 308130 | Dataset: 0-271560 | Loss: 3.992 | 599 ms/step , 115296.21 GFLOP/s , 173350.8 tokens/s
INFO:__main__:2024-11-30 02:05:38 | Epoch: 0 | Step: 308140 | Dataset: 0-273960 | Loss: 3.696 | 599 ms/step , 115196.23 GFLOP/s , 173340.9 tokens/s
INFO:__main__:2024-11-30 02:05:45 | Epoch: 0 | Step: 308150 | Dataset: 0-276360 | Loss: 2.504 | 600 ms/step , 115115.91 GFLOP/s , 173346.4 tokens/s
INFO:__main__:2024-11-30 02:05:52 | Epoch: 0 | Step: 308160 | Dataset: 0-278760 | Loss: 3.930 | 599 ms/step , 115158.50 GFLOP/s , 173355.9 tokens/s
INFO:__main__:2024-11-30 02:05:59 | Epoch: 0 | Step: 308170 | Dataset: 0-281160 | Loss: 3.882 | 599 ms/step , 115226.48 GFLOP/s , 173324.3 tokens/s
INFO:__main__:2024-11-30 02:06:06 | Epoch: 0 | Step: 308180 | Dataset: 0-283560 | Loss: 3.838 | 599 ms/step , 115307.32 GFLOP/s , 173472.3 tokens/s
INFO:__main__:2024-11-30 02:06:13 | Epoch: 0 | Step: 308190 | Dataset: 0-285960 | Loss: 2.397 | 598 ms/step , 115346.35 GFLOP/s , 173458.6 tokens/s
INFO:__main__:2024-11-30 02:06:20 | Epoch: 0 | Step: 308200 | Dataset: 0-288360 | Loss: 4.735 | 599 ms/step , 115303.54 GFLOP/s , 173379.2 tokens/s
INFO:__main__:2024-11-30 02:06:28 | Epoch: 0 | Step: 308210 | Dataset: 0-290760 | Loss: 4.086 | 599 ms/step , 115280.22 GFLOP/s , 173358.9 tokens/s
INFO:__main__:2024-11-30 02:06:35 | Epoch: 0 | Step: 308220 | Dataset: 0-293160 | Loss: 3.478 | 599 ms/step , 115298.00 GFLOP/s , 173381.8 tokens/s
INFO:__main__:2024-11-30 02:06:42 | Epoch: 0 | Step: 308230 | Dataset: 0-295560 | Loss: 3.472 | 599 ms/step , 115190.51 GFLOP/s , 173348.4 tokens/s
INFO:__main__:2024-11-30 02:06:49 | Epoch: 0 | Step: 308240 | Dataset: 0-297960 | Loss: 3.514 | 599 ms/step , 115308.72 GFLOP/s , 173323.6 tokens/s
INFO:__main__:2024-11-30 02:06:56 | Epoch: 0 | Step: 308250 | Dataset: 0-300360 | Loss: 3.498 | 598 ms/step , 115388.28 GFLOP/s , 173424.4 tokens/s
INFO:__main__:2024-11-30 02:07:03 | Epoch: 0 | Step: 308260 | Dataset: 0-302760 | Loss: 3.057 | 599 ms/step , 115295.97 GFLOP/s , 173507.2 tokens/s
INFO:__main__:2024-11-30 02:07:10 | Epoch: 0 | Step: 308270 | Dataset: 0-305160 | Loss: 3.342 | 599 ms/step , 115209.16 GFLOP/s , 173427.8 tokens/s
INFO:__main__:2024-11-30 02:07:17 | Epoch: 0 | Step: 308280 | Dataset: 0-307560 | Loss: 3.333 | 598 ms/step , 115384.78 GFLOP/s , 173373.4 tokens/s
INFO:__main__:2024-11-30 02:07:24 | Epoch: 0 | Step: 308290 | Dataset: 0-309960 | Loss: 2.456 | 599 ms/step , 115249.59 GFLOP/s , 173372.5 tokens/s
INFO:__main__:2024-11-30 02:07:31 | Epoch: 0 | Step: 308300 | Dataset: 0-312360 | Loss: 3.357 | 599 ms/step , 115298.56 GFLOP/s , 173335.1 tokens/s
INFO:__main__:2024-11-30 02:07:38 | Epoch: 0 | Step: 308310 | Dataset: 0-314760 | Loss: 3.151 | 599 ms/step , 115271.69 GFLOP/s , 173348.5 tokens/s
INFO:__main__:2024-11-30 02:07:45 | Epoch: 0 | Step: 308320 | Dataset: 0-317160 | Loss: 3.598 | 598 ms/step , 115314.97 GFLOP/s , 173353.6 tokens/s
INFO:__main__:2024-11-30 02:07:53 | Epoch: 0 | Step: 308330 | Dataset: 0-319560 | Loss: 3.451 | 598 ms/step , 115433.95 GFLOP/s , 173553.9 tokens/s
INFO:__main__:2024-11-30 02:08:00 | Epoch: 0 | Step: 308340 | Dataset: 0-321960 | Loss: 3.522 | 598 ms/step , 115449.94 GFLOP/s , 173582.0 tokens/s
INFO:__main__:2024-11-30 02:08:07 | Epoch: 0 | Step: 308350 | Dataset: 0-324360 | Loss: 3.473 | 598 ms/step , 115431.83 GFLOP/s , 173421.4 tokens/s
INFO:__main__:2024-11-30 02:08:14 | Epoch: 0 | Step: 308360 | Dataset: 0-326760 | Loss: 3.439 | 599 ms/step , 115306.12 GFLOP/s , 173422.8 tokens/s
INFO:__main__:2024-11-30 02:08:21 | Epoch: 0 | Step: 308370 | Dataset: 0-329160 | Loss: 3.501 | 598 ms/step , 115331.08 GFLOP/s , 173308.3 tokens/s
INFO:__main__:2024-11-30 02:08:28 | Epoch: 0 | Step: 308380 | Dataset: 0-331560 | Loss: 3.441 | 598 ms/step , 115320.15 GFLOP/s , 173395.4 tokens/s
INFO:__main__:2024-11-30 02:08:35 | Epoch: 0 | Step: 308390 | Dataset: 0-333960 | Loss: 3.481 | 598 ms/step , 115370.10 GFLOP/s , 173414.4 tokens/s
INFO:__main__:2024-11-30 02:08:42 | Epoch: 0 | Step: 308400 | Dataset: 0-336360 | Loss: 3.371 | 598 ms/step , 115353.33 GFLOP/s , 173409.9 tokens/s
INFO:__main__:2024-11-30 02:08:49 | Epoch: 0 | Step: 308410 | Dataset: 0-338760 | Loss: 3.112 | 598 ms/step , 115446.29 GFLOP/s , 173578.0 tokens/s
INFO:__main__:2024-11-30 02:08:56 | Epoch: 0 | Step: 308420 | Dataset: 0-341160 | Loss: 3.308 | 599 ms/step , 115279.09 GFLOP/s , 173436.3 tokens/s
INFO:__main__:2024-11-30 02:09:03 | Epoch: 0 | Step: 308430 | Dataset: 0-343560 | Loss: 3.290 | 598 ms/step , 115316.11 GFLOP/s , 173339.3 tokens/s
INFO:__main__:2024-11-30 02:09:11 | Epoch: 0 | Step: 308440 | Dataset: 0-345960 | Loss: 3.322 | 599 ms/step , 115282.55 GFLOP/s , 173384.3 tokens/s
INFO:__main__:2024-11-30 02:09:18 | Epoch: 0 | Step: 308450 | Dataset: 0-348360 | Loss: 3.132 | 598 ms/step , 115467.68 GFLOP/s , 173431.6 tokens/s
INFO:__main__:2024-11-30 02:09:25 | Epoch: 0 | Step: 308460 | Dataset: 0-350760 | Loss: 2.982 | 598 ms/step , 115328.44 GFLOP/s , 173391.8 tokens/s
INFO:__main__:2024-11-30 02:09:32 | Epoch: 0 | Step: 308470 | Dataset: 0-353160 | Loss: 3.105 | 598 ms/step , 115355.15 GFLOP/s , 173428.0 tokens/s
INFO:__main__:2024-11-30 02:09:39 | Epoch: 0 | Step: 308480 | Dataset: 0-355560 | Loss: 3.129 | 598 ms/step , 115341.83 GFLOP/s , 173548.0 tokens/s
INFO:__main__:2024-11-30 02:09:46 | Epoch: 0 | Step: 308490 | Dataset: 0-357960 | Loss: 3.065 | 598 ms/step , 115416.05 GFLOP/s , 173571.3 tokens/s
INFO:__main__:2024-11-30 02:09:54 | Validation | Step: 308500 | Val_loss: 2.625 | Best_val_loss: 4.1259
INFO:__main__:2024-11-30 02:09:54 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_020954_step_308500.pt`
INFO:__main__:2024-11-30 02:09:56 | Epoch: 0 | Step: 308500 | Dataset: 0-360360 | Loss: 3.000 | 595 ms/step , 115919.95 GFLOP/s , 117383.6 tokens/s
INFO:__main__:2024-11-30 02:10:03 | Epoch: 0 | Step: 308510 | Dataset: 0-362760 | Loss: 3.019 | 598 ms/step , 115458.82 GFLOP/s , 173442.3 tokens/s
INFO:__main__:2024-11-30 02:10:11 | Epoch: 0 | Step: 308520 | Dataset: 0-365160 | Loss: 2.982 | 598 ms/step , 115337.99 GFLOP/s , 173366.4 tokens/s
INFO:__main__:2024-11-30 02:10:18 | Epoch: 0 | Step: 308530 | Dataset: 0-367560 | Loss: 2.772 | 598 ms/step , 115429.71 GFLOP/s , 173436.9 tokens/s
INFO:__main__:2024-11-30 02:10:25 | Epoch: 0 | Step: 308540 | Dataset: 0-369960 | Loss: 2.961 | 597 ms/step , 115526.30 GFLOP/s , 173494.6 tokens/s
INFO:__main__:2024-11-30 02:10:32 | Epoch: 0 | Step: 308550 | Dataset: 0-372360 | Loss: 2.851 | 597 ms/step , 115565.69 GFLOP/s , 173596.0 tokens/s
INFO:__main__:2024-11-30 02:10:39 | Epoch: 0 | Step: 308560 | Dataset: 0-374760 | Loss: 3.431 | 598 ms/step , 115353.77 GFLOP/s , 173578.1 tokens/s
INFO:__main__:2024-11-30 02:10:46 | Epoch: 0 | Step: 308570 | Dataset: 0-377160 | Loss: 2.724 | 598 ms/step , 115397.24 GFLOP/s , 173475.9 tokens/s
INFO:__main__:2024-11-30 02:10:53 | Epoch: 0 | Step: 308580 | Dataset: 0-379560 | Loss: 2.669 | 599 ms/step , 115252.06 GFLOP/s , 173482.4 tokens/s
INFO:__main__:2024-11-30 02:11:00 | Epoch: 0 | Step: 308590 | Dataset: 0-381960 | Loss: 2.343 | 598 ms/step , 115332.39 GFLOP/s , 173432.2 tokens/s
INFO:__main__:2024-11-30 02:11:07 | Epoch: 0 | Step: 308600 | Dataset: 0-384360 | Loss: 3.040 | 598 ms/step , 115358.65 GFLOP/s , 173443.8 tokens/s
INFO:__main__:2024-11-30 02:11:14 | Epoch: 0 | Step: 308610 | Dataset: 0-386760 | Loss: 2.307 | 599 ms/step , 115271.89 GFLOP/s , 173367.3 tokens/s
INFO:__main__:2024-11-30 02:11:21 | Epoch: 0 | Step: 308620 | Dataset: 0-389160 | Loss: 1.801 | 598 ms/step , 115317.13 GFLOP/s , 173426.6 tokens/s
INFO:__main__:2024-11-30 02:11:28 | Epoch: 0 | Step: 308630 | Dataset: 0-391560 | Loss: 2.749 | 599 ms/step , 115273.39 GFLOP/s , 173519.7 tokens/s
INFO:__main__:2024-11-30 02:11:36 | Epoch: 0 | Step: 308640 | Dataset: 0-393960 | Loss: 2.412 | 601 ms/step , 114872.25 GFLOP/s , 173469.1 tokens/s
INFO:__main__:2024-11-30 02:11:43 | Epoch: 0 | Step: 308650 | Dataset: 0-396360 | Loss: 2.439 | 600 ms/step , 115113.48 GFLOP/s , 173341.2 tokens/s
INFO:__main__:2024-11-30 02:11:50 | Epoch: 0 | Step: 308660 | Dataset: 0-398760 | Loss: 2.531 | 599 ms/step , 115216.19 GFLOP/s , 173387.3 tokens/s
INFO:__main__:2024-11-30 02:11:57 | Epoch: 0 | Step: 308670 | Dataset: 0-401160 | Loss: 2.374 | 598 ms/step , 115343.61 GFLOP/s , 173330.9 tokens/s
INFO:__main__:2024-11-30 02:12:04 | Epoch: 0 | Step: 308680 | Dataset: 0-403560 | Loss: 2.327 | 599 ms/step , 115244.27 GFLOP/s , 173408.0 tokens/s
INFO:__main__:2024-11-30 02:12:11 | Epoch: 0 | Step: 308690 | Dataset: 0-405960 | Loss: 2.275 | 599 ms/step , 115299.69 GFLOP/s , 173372.5 tokens/s
INFO:__main__:2024-11-30 02:12:18 | Epoch: 0 | Step: 308700 | Dataset: 0-408360 | Loss: 2.254 | 598 ms/step , 115392.68 GFLOP/s , 173424.3 tokens/s
INFO:__main__:2024-11-30 02:12:25 | Epoch: 0 | Step: 308710 | Dataset: 0-410760 | Loss: 2.449 | 598 ms/step , 115420.88 GFLOP/s , 173578.2 tokens/s
INFO:__main__:2024-11-30 02:12:32 | Epoch: 0 | Step: 308720 | Dataset: 0-413160 | Loss: 2.357 | 599 ms/step , 115293.14 GFLOP/s , 173354.9 tokens/s
INFO:__main__:2024-11-30 02:12:39 | Epoch: 0 | Step: 308730 | Dataset: 0-415560 | Loss: 2.427 | 599 ms/step , 115214.50 GFLOP/s , 173353.7 tokens/s
INFO:__main__:2024-11-30 02:12:46 | Epoch: 0 | Step: 308740 | Dataset: 0-417960 | Loss: 2.455 | 598 ms/step , 115405.43 GFLOP/s , 173446.5 tokens/s
INFO:__main__:2024-11-30 02:12:54 | Epoch: 0 | Step: 308750 | Dataset: 0-420360 | Loss: 2.019 | 599 ms/step , 115206.73 GFLOP/s , 173357.6 tokens/s
INFO:__main__:2024-11-30 02:13:01 | Epoch: 0 | Step: 308760 | Dataset: 0-422760 | Loss: 2.057 | 598 ms/step , 115386.41 GFLOP/s , 173407.5 tokens/s
INFO:__main__:2024-11-30 02:13:08 | Epoch: 0 | Step: 308770 | Dataset: 0-425160 | Loss: 2.203 | 599 ms/step , 115279.41 GFLOP/s , 173447.3 tokens/s
INFO:__main__:2024-11-30 02:13:15 | Epoch: 0 | Step: 308780 | Dataset: 0-427560 | Loss: 1.967 | 599 ms/step , 115282.55 GFLOP/s , 173525.1 tokens/s
INFO:__main__:2024-11-30 02:13:22 | Epoch: 0 | Step: 308790 | Dataset: 0-429960 | Loss: 2.075 | 599 ms/step , 115187.60 GFLOP/s , 173492.5 tokens/s
INFO:__main__:2024-11-30 02:13:29 | Epoch: 0 | Step: 308800 | Dataset: 0-432360 | Loss: 2.468 | 599 ms/step , 115177.44 GFLOP/s , 173355.6 tokens/s
INFO:__main__:2024-11-30 02:13:36 | Epoch: 0 | Step: 308810 | Dataset: 0-434760 | Loss: 1.886 | 598 ms/step , 115342.77 GFLOP/s , 173412.7 tokens/s
INFO:__main__:2024-11-30 02:13:43 | Epoch: 0 | Step: 308820 | Dataset: 0-437160 | Loss: 1.997 | 599 ms/step , 115254.86 GFLOP/s , 173288.8 tokens/s
INFO:__main__:2024-11-30 02:13:50 | Epoch: 0 | Step: 308830 | Dataset: 0-439560 | Loss: 2.422 | 598 ms/step , 115365.34 GFLOP/s , 173400.0 tokens/s
INFO:__main__:2024-11-30 02:13:57 | Epoch: 0 | Step: 308840 | Dataset: 0-441960 | Loss: 1.723 | 598 ms/step , 115417.53 GFLOP/s , 173434.1 tokens/s
INFO:__main__:2024-11-30 02:14:04 | Epoch: 0 | Step: 308850 | Dataset: 0-444360 | Loss: 1.972 | 598 ms/step , 115419.58 GFLOP/s , 173534.9 tokens/s
INFO:__main__:2024-11-30 02:14:11 | Epoch: 0 | Step: 308860 | Dataset: 0-446760 | Loss: 1.736 | 598 ms/step , 115381.16 GFLOP/s , 173433.8 tokens/s
INFO:__main__:2024-11-30 02:14:19 | Epoch: 0 | Step: 308870 | Dataset: 0-449160 | Loss: 1.766 | 598 ms/step , 115319.48 GFLOP/s , 173469.3 tokens/s
INFO:__main__:2024-11-30 02:14:26 | Epoch: 0 | Step: 308880 | Dataset: 0-451560 | Loss: 1.701 | 598 ms/step , 115439.96 GFLOP/s , 173438.9 tokens/s
INFO:__main__:2024-11-30 02:14:33 | Epoch: 0 | Step: 308890 | Dataset: 0-453960 | Loss: 1.663 | 598 ms/step , 115358.57 GFLOP/s , 173470.9 tokens/s
INFO:__main__:2024-11-30 02:14:40 | Epoch: 0 | Step: 308900 | Dataset: 0-456360 | Loss: 1.632 | 598 ms/step , 115381.64 GFLOP/s , 173433.7 tokens/s
INFO:__main__:2024-11-30 02:14:47 | Epoch: 0 | Step: 308910 | Dataset: 0-458760 | Loss: 1.596 | 598 ms/step , 115339.13 GFLOP/s , 173419.9 tokens/s
INFO:__main__:2024-11-30 02:14:54 | Epoch: 0 | Step: 308920 | Dataset: 0-461160 | Loss: 1.434 | 597 ms/step , 115524.73 GFLOP/s , 173475.9 tokens/s
INFO:__main__:2024-11-30 02:15:01 | Epoch: 0 | Step: 308930 | Dataset: 0-463560 | Loss: 1.521 | 598 ms/step , 115334.52 GFLOP/s , 173621.1 tokens/s
INFO:__main__:2024-11-30 02:15:08 | Epoch: 0 | Step: 308940 | Dataset: 0-465960 | Loss: 1.762 | 598 ms/step , 115470.69 GFLOP/s , 173470.6 tokens/s
INFO:__main__:2024-11-30 02:15:15 | Epoch: 0 | Step: 308950 | Dataset: 0-468360 | Loss: 1.562 | 599 ms/step , 115294.46 GFLOP/s , 173409.6 tokens/s
INFO:__main__:2024-11-30 02:15:22 | Epoch: 0 | Step: 308960 | Dataset: 0-470760 | Loss: 1.857 | 600 ms/step , 115114.41 GFLOP/s , 173396.9 tokens/s
INFO:__main__:2024-11-30 02:15:29 | Epoch: 0 | Step: 308970 | Dataset: 0-473160 | Loss: 2.023 | 599 ms/step , 115261.35 GFLOP/s , 173350.0 tokens/s
INFO:__main__:2024-11-30 02:15:36 | Epoch: 0 | Step: 308980 | Dataset: 0-475560 | Loss: 1.586 | 598 ms/step , 115354.06 GFLOP/s , 173397.2 tokens/s
INFO:__main__:2024-11-30 02:15:44 | Epoch: 0 | Step: 308990 | Dataset: 0-477960 | Loss: 1.812 | 598 ms/step , 115344.97 GFLOP/s , 173313.5 tokens/s
INFO:__main__:2024-11-30 02:15:51 | Validation | Step: 309000 | Val_loss: 1.356 | Best_val_loss: 2.6251
INFO:__main__:2024-11-30 02:15:51 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_021551_step_309000.pt`
INFO:__main__:2024-11-30 02:15:54 | Epoch: 0 | Step: 309000 | Dataset: 0-480360 | Loss: 1.612 | 595 ms/step , 116051.75 GFLOP/s , 120076.5 tokens/s
INFO:__main__:2024-11-30 02:16:01 | Epoch: 0 | Step: 309010 | Dataset: 0-482760 | Loss: 1.582 | 598 ms/step , 115356.94 GFLOP/s , 173418.9 tokens/s
INFO:__main__:2024-11-30 02:16:08 | Epoch: 0 | Step: 309020 | Dataset: 0-485160 | Loss: 1.822 | 598 ms/step , 115490.27 GFLOP/s , 173347.5 tokens/s
INFO:__main__:2024-11-30 02:16:15 | Epoch: 0 | Step: 309030 | Dataset: 0-487560 | Loss: 1.660 | 599 ms/step , 115308.31 GFLOP/s , 173288.3 tokens/s
INFO:__main__:2024-11-30 02:16:22 | Epoch: 0 | Step: 309040 | Dataset: 0-489960 | Loss: 2.174 | 602 ms/step , 114631.46 GFLOP/s , 173265.8 tokens/s
INFO:__main__:2024-11-30 02:16:29 | Epoch: 0 | Step: 309050 | Dataset: 0-492360 | Loss: 1.503 | 598 ms/step , 115445.13 GFLOP/s , 173445.8 tokens/s
INFO:__main__:2024-11-30 02:16:36 | Epoch: 0 | Step: 309060 | Dataset: 0-494760 | Loss: 1.663 | 599 ms/step , 115263.41 GFLOP/s , 173389.8 tokens/s
INFO:__main__:2024-11-30 02:16:43 | Epoch: 0 | Step: 309070 | Dataset: 0-497160 | Loss: 1.264 | 599 ms/step , 115266.88 GFLOP/s , 173529.1 tokens/s
INFO:__main__:2024-11-30 02:16:50 | Epoch: 0 | Step: 309080 | Dataset: 0-499560 | Loss: 1.801 | 598 ms/step , 115363.13 GFLOP/s , 173516.0 tokens/s
INFO:__main__:2024-11-30 02:16:58 | Epoch: 0 | Step: 309090 | Dataset: 0-501960 | Loss: 2.548 | 599 ms/step , 115204.11 GFLOP/s , 173372.7 tokens/s
INFO:__main__:2024-11-30 02:17:05 | Epoch: 0 | Step: 309100 | Dataset: 0-504360 | Loss: 1.061 | 598 ms/step , 115441.65 GFLOP/s , 173384.2 tokens/s
INFO:__main__:2024-11-30 02:17:12 | Epoch: 0 | Step: 309110 | Dataset: 0-506760 | Loss: 1.187 | 598 ms/step , 115382.87 GFLOP/s , 173422.6 tokens/s
INFO:__main__:2024-11-30 02:17:19 | Epoch: 0 | Step: 309120 | Dataset: 0-509160 | Loss: 4.885 | 599 ms/step , 115215.49 GFLOP/s , 173363.6 tokens/s
INFO:__main__:2024-11-30 02:17:26 | Epoch: 0 | Step: 309130 | Dataset: 0-511560 | Loss: 2.505 | 599 ms/step , 115164.59 GFLOP/s , 173150.1 tokens/s
INFO:__main__:2024-11-30 02:17:33 | Epoch: 0 | Step: 309140 | Dataset: 0-513960 | Loss: 1.572 | 598 ms/step , 115426.33 GFLOP/s , 173403.9 tokens/s
INFO:__main__:2024-11-30 02:17:40 | Epoch: 0 | Step: 309150 | Dataset: 0-516360 | Loss: 1.216 | 599 ms/step , 115153.18 GFLOP/s , 173539.5 tokens/s
INFO:__main__:2024-11-30 02:17:47 | Epoch: 0 | Step: 309160 | Dataset: 0-518760 | Loss: 1.127 | 599 ms/step , 115164.36 GFLOP/s , 173247.4 tokens/s
INFO:__main__:2024-11-30 02:17:54 | Epoch: 0 | Step: 309170 | Dataset: 0-521160 | Loss: 1.506 | 598 ms/step , 115409.34 GFLOP/s , 173368.8 tokens/s
INFO:__main__:2024-11-30 02:18:01 | Epoch: 0 | Step: 309180 | Dataset: 0-523560 | Loss: 0.887 | 598 ms/step , 115373.11 GFLOP/s , 173329.7 tokens/s
INFO:__main__:2024-11-30 02:18:08 | Epoch: 0 | Step: 309190 | Dataset: 0-525960 | Loss: 1.649 | 599 ms/step , 115243.11 GFLOP/s , 173293.6 tokens/s
INFO:__main__:2024-11-30 02:18:16 | Epoch: 0 | Step: 309200 | Dataset: 0-528360 | Loss: 1.625 | 600 ms/step , 115055.30 GFLOP/s , 173318.8 tokens/s
INFO:__main__:2024-11-30 02:18:23 | Epoch: 0 | Step: 309210 | Dataset: 0-530760 | Loss: 1.649 | 599 ms/step , 115196.03 GFLOP/s , 173321.9 tokens/s
INFO:__main__:2024-11-30 02:18:30 | Epoch: 0 | Step: 309220 | Dataset: 0-533160 | Loss: 1.598 | 599 ms/step , 115278.43 GFLOP/s , 173470.8 tokens/s
INFO:__main__:2024-11-30 02:18:37 | Epoch: 0 | Step: 309230 | Dataset: 0-535560 | Loss: 1.587 | 598 ms/step , 115339.60 GFLOP/s , 173468.7 tokens/s
INFO:__main__:2024-11-30 02:18:44 | Epoch: 0 | Step: 309240 | Dataset: 0-537960 | Loss: 1.538 | 599 ms/step , 115169.84 GFLOP/s , 173348.1 tokens/s
INFO:__main__:2024-11-30 02:18:51 | Epoch: 0 | Step: 309250 | Dataset: 0-540360 | Loss: 1.567 | 599 ms/step , 115271.89 GFLOP/s , 173322.3 tokens/s
INFO:__main__:2024-11-30 02:18:58 | Epoch: 0 | Step: 309260 | Dataset: 0-542760 | Loss: 1.095 | 599 ms/step , 115219.81 GFLOP/s , 173293.8 tokens/s
INFO:__main__:2024-11-30 02:19:05 | Epoch: 0 | Step: 309270 | Dataset: 0-545160 | Loss: 1.080 | 599 ms/step , 115283.80 GFLOP/s , 173391.5 tokens/s
INFO:__main__:2024-11-30 02:19:12 | Epoch: 0 | Step: 309280 | Dataset: 0-547560 | Loss: 1.061 | 599 ms/step , 115261.72 GFLOP/s , 173332.4 tokens/s
INFO:__main__:2024-11-30 02:19:19 | Epoch: 0 | Step: 309290 | Dataset: 0-549960 | Loss: 1.038 | 598 ms/step , 115475.05 GFLOP/s , 173375.4 tokens/s
INFO:__main__:2024-11-30 02:19:26 | Epoch: 0 | Step: 309300 | Dataset: 0-552360 | Loss: 1.044 | 598 ms/step , 115410.49 GFLOP/s , 173560.4 tokens/s
INFO:__main__:2024-11-30 02:19:34 | Epoch: 0 | Step: 309310 | Dataset: 0-554760 | Loss: 0.979 | 599 ms/step , 115217.16 GFLOP/s , 173467.9 tokens/s
INFO:__main__:2024-11-30 02:19:41 | Epoch: 0 | Step: 309320 | Dataset: 0-557160 | Loss: 0.980 | 599 ms/step , 115252.44 GFLOP/s , 173343.9 tokens/s
INFO:__main__:2024-11-30 02:19:48 | Epoch: 0 | Step: 309330 | Dataset: 0-559560 | Loss: 0.992 | 598 ms/step , 115328.88 GFLOP/s , 173419.0 tokens/s
INFO:__main__:2024-11-30 02:19:55 | Epoch: 0 | Step: 309340 | Dataset: 0-561960 | Loss: 0.981 | 599 ms/step , 115289.46 GFLOP/s , 173416.9 tokens/s
INFO:__main__:2024-11-30 02:20:02 | Epoch: 0 | Step: 309350 | Dataset: 0-564360 | Loss: 1.013 | 598 ms/step , 115310.72 GFLOP/s , 173370.9 tokens/s
INFO:__main__:2024-11-30 02:20:09 | Epoch: 0 | Step: 309360 | Dataset: 0-566760 | Loss: 1.705 | 599 ms/step , 115282.52 GFLOP/s , 173361.6 tokens/s
INFO:__main__:2024-11-30 02:20:16 | Epoch: 0 | Step: 309370 | Dataset: 0-569160 | Loss: 1.600 | 598 ms/step , 115331.24 GFLOP/s , 173500.7 tokens/s
INFO:__main__:2024-11-30 02:20:23 | Epoch: 0 | Step: 309380 | Dataset: 0-571560 | Loss: 1.627 | 599 ms/step , 115286.99 GFLOP/s , 173523.4 tokens/s
INFO:__main__:2024-11-30 02:20:30 | Epoch: 0 | Step: 309390 | Dataset: 0-573960 | Loss: 1.623 | 599 ms/step , 115256.45 GFLOP/s , 173388.2 tokens/s
INFO:__main__:2024-11-30 02:20:37 | Epoch: 0 | Step: 309400 | Dataset: 0-576360 | Loss: 1.547 | 600 ms/step , 115042.34 GFLOP/s , 173364.4 tokens/s
INFO:__main__:2024-11-30 02:20:44 | Epoch: 0 | Step: 309410 | Dataset: 0-578760 | Loss: 1.585 | 599 ms/step , 115306.92 GFLOP/s , 173397.3 tokens/s
INFO:__main__:2024-11-30 02:20:51 | Epoch: 0 | Step: 309420 | Dataset: 0-581160 | Loss: 1.606 | 599 ms/step , 115232.15 GFLOP/s , 173325.9 tokens/s
INFO:__main__:2024-11-30 02:20:59 | Epoch: 0 | Step: 309430 | 
Dataset: 0-583560 | Loss: 1.529 | 599 ms/step , 115265.94 GFLOP/s , 173342.3 tokens/s INFO:__main__:2024-11-30 02:21:06 | Epoch: 0 | Step: 309440 | Dataset: 0-585960 | Loss: 1.531 | 598 ms/step , 115328.35 GFLOP/s , 173385.5 tokens/s INFO:__main__:2024-11-30 02:21:13 | Epoch: 0 | Step: 309450 | Dataset: 0-588360 | Loss: 1.496 | 598 ms/step , 115386.16 GFLOP/s , 173516.5 tokens/s INFO:__main__:2024-11-30 02:21:20 | Epoch: 0 | Step: 309460 | Dataset: 0-590760 | Loss: 1.555 | 598 ms/step , 115331.86 GFLOP/s , 173422.1 tokens/s INFO:__main__:2024-11-30 02:21:27 | Epoch: 0 | Step: 309470 | Dataset: 0-593160 | Loss: 1.537 | 599 ms/step , 115167.57 GFLOP/s , 173356.6 tokens/s INFO:__main__:2024-11-30 02:21:34 | Epoch: 0 | Step: 309480 | Dataset: 0-595560 | Loss: 1.656 | 599 ms/step , 115250.55 GFLOP/s , 173383.0 tokens/s INFO:__main__:2024-11-30 02:21:41 | Epoch: 0 | Step: 309490 | Dataset: 0-597960 | Loss: 1.530 | 599 ms/step , 115271.44 GFLOP/s , 173355.2 tokens/s INFO:__main__:2024-11-30 02:21:49 | Validation | Step: 309500 | Val_loss: 0.720 | Best_val_loss: 1.3561 INFO:__main__:2024-11-30 02:21:49 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_022149_step_309500.pt` INFO:__main__:2024-11-30 02:21:52 | Epoch: 0 | Step: 309500 | Dataset: 0-600360 | Loss: 1.476 | 595 ms/step , 115962.43 GFLOP/s , 117425.9 tokens/s INFO:__main__:2024-11-30 02:21:59 | Epoch: 0 | Step: 309510 | Dataset: 0-602760 | Loss: 1.424 | 598 ms/step , 115408.79 GFLOP/s , 173360.2 tokens/s INFO:__main__:2024-11-30 02:22:06 | Epoch: 0 | Step: 309520 | Dataset: 0-605160 | Loss: 1.347 | 598 ms/step , 115461.75 GFLOP/s , 173480.8 tokens/s INFO:__main__:2024-11-30 02:22:13 | Epoch: 0 | Step: 309530 | Dataset: 0-607560 | Loss: 1.395 | 598 ms/step , 115375.51 GFLOP/s , 173394.5 tokens/s INFO:__main__:2024-11-30 02:22:20 | Epoch: 0 | Step: 309540 | Dataset: 0-609960 | Loss: 1.476 | 598 ms/step , 115430.98 GFLOP/s , 173277.1 tokens/s INFO:__main__:2024-11-30 02:22:27 | 
Epoch: 0 | Step: 309550 | Dataset: 0-612360 | Loss: 1.506 | 598 ms/step , 115424.70 GFLOP/s , 173366.8 tokens/s INFO:__main__:2024-11-30 02:22:34 | Epoch: 0 | Step: 309560 | Dataset: 0-614760 | Loss: 1.424 | 598 ms/step , 115454.34 GFLOP/s , 173307.0 tokens/s INFO:__main__:2024-11-30 02:22:41 | Epoch: 0 | Step: 309570 | Dataset: 0-617160 | Loss: 1.472 | 599 ms/step , 115281.00 GFLOP/s , 173369.1 tokens/s INFO:__main__:2024-11-30 02:22:48 | Epoch: 0 | Step: 309580 | Dataset: 0-619560 | Loss: 1.538 | 599 ms/step , 115293.61 GFLOP/s , 173392.0 tokens/s INFO:__main__:2024-11-30 02:22:55 | Epoch: 0 | Step: 309590 | Dataset: 0-621960 | Loss: 1.499 | 599 ms/step , 115203.80 GFLOP/s , 173454.4 tokens/s INFO:__main__:2024-11-30 02:23:02 | Epoch: 0 | Step: 309600 | Dataset: 0-624360 | Loss: 1.353 | 598 ms/step , 115396.41 GFLOP/s , 173522.3 tokens/s INFO:__main__:2024-11-30 02:23:09 | Epoch: 0 | Step: 309610 | Dataset: 0-626760 | Loss: 1.414 | 599 ms/step , 115237.61 GFLOP/s , 173385.9 tokens/s INFO:__main__:2024-11-30 02:23:17 | Epoch: 0 | Step: 309620 | Dataset: 0-629160 | Loss: 1.468 | 599 ms/step , 115267.77 GFLOP/s , 173335.8 tokens/s INFO:__main__:2024-11-30 02:23:24 | Epoch: 0 | Step: 309630 | Dataset: 0-631560 | Loss: 1.479 | 599 ms/step , 115282.67 GFLOP/s , 173367.4 tokens/s INFO:__main__:2024-11-30 02:23:31 | Epoch: 0 | Step: 309640 | Dataset: 0-633960 | Loss: 1.510 | 599 ms/step , 115280.34 GFLOP/s , 173309.3 tokens/s INFO:__main__:2024-11-30 02:23:38 | Epoch: 0 | Step: 309650 | Dataset: 0-636360 | Loss: 1.447 | 599 ms/step , 115219.56 GFLOP/s , 173327.6 tokens/s INFO:__main__:2024-11-30 02:23:45 | Epoch: 0 | Step: 309660 | Dataset: 0-638760 | Loss: 1.406 | 599 ms/step , 115239.04 GFLOP/s , 173388.2 tokens/s INFO:__main__:2024-11-30 02:23:52 | Epoch: 0 | Step: 309670 | Dataset: 0-641160 | Loss: 1.349 | 598 ms/step , 115366.74 GFLOP/s , 173487.7 tokens/s INFO:__main__:2024-11-30 02:23:59 | Epoch: 0 | Step: 309680 | Dataset: 0-643560 | Loss: 1.408 | 599 ms/step , 
115298.93 GFLOP/s , 173433.7 tokens/s INFO:__main__:2024-11-30 02:24:06 | Epoch: 0 | Step: 309690 | Dataset: 0-645960 | Loss: 1.379 | 599 ms/step , 115300.44 GFLOP/s , 173359.1 tokens/s INFO:__main__:2024-11-30 02:24:13 | Epoch: 0 | Step: 309700 | Dataset: 0-648360 | Loss: 1.409 | 599 ms/step , 115247.49 GFLOP/s , 173377.5 tokens/s INFO:__main__:2024-11-30 02:24:20 | Epoch: 0 | Step: 309710 | Dataset: 0-650760 | Loss: 1.392 | 599 ms/step , 115228.87 GFLOP/s , 173351.0 tokens/s INFO:__main__:2024-11-30 02:24:27 | Epoch: 0 | Step: 309720 | Dataset: 0-653160 | Loss: 1.315 | 599 ms/step , 115148.83 GFLOP/s , 173368.9 tokens/s INFO:__main__:2024-11-30 02:24:35 | Epoch: 0 | Step: 309730 | Dataset: 0-655560 | Loss: 1.374 | 599 ms/step , 115236.04 GFLOP/s , 173380.0 tokens/s INFO:__main__:2024-11-30 02:24:42 | Epoch: 0 | Step: 309740 | Dataset: 0-657960 | Loss: 0.814 | 598 ms/step , 115366.33 GFLOP/s , 173470.0 tokens/s INFO:__main__:2024-11-30 02:24:49 | Epoch: 0 | Step: 309750 | Dataset: 0-660360 | Loss: 0.768 | 599 ms/step , 115262.37 GFLOP/s , 173534.5 tokens/s INFO:__main__:2024-11-30 02:24:56 | Epoch: 0 | Step: 309760 | Dataset: 0-662760 | Loss: 0.756 | 598 ms/step , 115325.55 GFLOP/s , 173444.4 tokens/s INFO:__main__:2024-11-30 02:25:03 | Epoch: 0 | Step: 309770 | Dataset: 0-665160 | Loss: 0.707 | 599 ms/step , 115245.48 GFLOP/s , 173384.1 tokens/s INFO:__main__:2024-11-30 02:25:10 | Epoch: 0 | Step: 309780 | Dataset: 0-667560 | Loss: 0.721 | 599 ms/step , 115161.92 GFLOP/s , 173339.1 tokens/s INFO:__main__:2024-11-30 02:25:17 | Epoch: 0 | Step: 309790 | Dataset: 0-669960 | Loss: 0.813 | 599 ms/step , 115191.49 GFLOP/s , 173379.2 tokens/s INFO:__main__:2024-11-30 02:25:24 | Epoch: 0 | Step: 309800 | Dataset: 0-672360 | Loss: 0.723 | 598 ms/step , 115316.17 GFLOP/s , 173362.9 tokens/s INFO:__main__:2024-11-30 02:25:31 | Epoch: 0 | Step: 309810 | Dataset: 0-674760 | Loss: 0.734 | 598 ms/step , 115358.04 GFLOP/s , 173369.2 tokens/s INFO:__main__:2024-11-30 02:25:38 | 
Epoch: 0 | Step: 309820 | Dataset: 0-677160 | Loss: 0.682 | 598 ms/step , 115373.57 GFLOP/s , 173545.0 tokens/s INFO:__main__:2024-11-30 02:25:45 | Epoch: 0 | Step: 309830 | Dataset: 0-679560 | Loss: 0.671 | 599 ms/step , 115290.16 GFLOP/s , 173481.3 tokens/s INFO:__main__:2024-11-30 02:25:52 | Epoch: 0 | Step: 309840 | Dataset: 0-681960 | Loss: 0.698 | 598 ms/step , 115339.13 GFLOP/s , 173361.9 tokens/s INFO:__main__:2024-11-30 02:26:00 | Epoch: 0 | Step: 309850 | Dataset: 0-684360 | Loss: 0.675 | 598 ms/step , 115384.07 GFLOP/s , 173384.0 tokens/s INFO:__main__:2024-11-30 02:26:07 | Epoch: 0 | Step: 309860 | Dataset: 0-686760 | Loss: 0.746 | 598 ms/step , 115316.00 GFLOP/s , 173374.2 tokens/s INFO:__main__:2024-11-30 02:26:14 | Epoch: 0 | Step: 309870 | Dataset: 0-689160 | Loss: 0.642 | 599 ms/step , 115297.40 GFLOP/s , 173386.2 tokens/s INFO:__main__:2024-11-30 02:26:21 | Epoch: 0 | Step: 309880 | Dataset: 0-691560 | Loss: 0.713 | 598 ms/step , 115352.83 GFLOP/s , 173452.0 tokens/s INFO:__main__:2024-11-30 02:26:28 | Epoch: 0 | Step: 309890 | Dataset: 0-693960 | Loss: 0.686 | 598 ms/step , 115414.40 GFLOP/s , 173403.3 tokens/s INFO:__main__:2024-11-30 02:26:35 | Epoch: 0 | Step: 309900 | Dataset: 0-696360 | Loss: 0.739 | 598 ms/step , 115360.90 GFLOP/s , 173522.2 tokens/s INFO:__main__:2024-11-30 02:26:42 | Epoch: 0 | Step: 309910 | Dataset: 0-698760 | Loss: 0.655 | 598 ms/step , 115393.22 GFLOP/s , 173398.4 tokens/s INFO:__main__:2024-11-30 02:26:49 | Epoch: 0 | Step: 309920 | Dataset: 0-701160 | Loss: 0.693 | 598 ms/step , 115411.35 GFLOP/s , 173469.3 tokens/s INFO:__main__:2024-11-30 02:26:56 | Epoch: 0 | Step: 309930 | Dataset: 0-703560 | Loss: 0.719 | 598 ms/step , 115329.67 GFLOP/s , 173411.3 tokens/s INFO:__main__:2024-11-30 02:27:03 | Epoch: 0 | Step: 309940 | Dataset: 0-705960 | Loss: 0.621 | 598 ms/step , 115383.83 GFLOP/s , 173385.9 tokens/s INFO:__main__:2024-11-30 02:27:10 | Epoch: 0 | Step: 309950 | Dataset: 0-708360 | Loss: 0.657 | 599 ms/step , 
115226.00 GFLOP/s , 173402.1 tokens/s INFO:__main__:2024-11-30 02:27:18 | Epoch: 0 | Step: 309960 | Dataset: 0-710760 | Loss: 0.631 | 599 ms/step , 115294.12 GFLOP/s , 173421.7 tokens/s INFO:__main__:2024-11-30 02:27:25 | Epoch: 0 | Step: 309970 | Dataset: 0-713160 | Loss: 0.689 | 598 ms/step , 115382.35 GFLOP/s , 173593.1 tokens/s INFO:__main__:2024-11-30 02:27:32 | Epoch: 0 | Step: 309980 | Dataset: 0-715560 | Loss: 0.654 | 599 ms/step , 115299.33 GFLOP/s , 173560.6 tokens/s INFO:__main__:2024-11-30 02:27:39 | Epoch: 0 | Step: 309990 | Dataset: 0-717960 | Loss: 0.714 | 599 ms/step , 115306.08 GFLOP/s , 173404.1 tokens/s INFO:__main__:2024-11-30 02:27:46 | Validation | Step: 310000 | Val_loss: 0.534 | Best_val_loss: 0.7204 INFO:__main__:2024-11-30 02:27:46 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_022746_step_310000.pt` INFO:__main__:2024-11-30 02:27:49 | Epoch: 0 | Step: 310000 | Dataset: 0-720360 | Loss: 0.590 | 595 ms/step , 116044.51 GFLOP/s , 117218.2 tokens/s INFO:__main__:2024-11-30 02:27:56 | Epoch: 0 | Step: 310010 | Dataset: 0-722760 | Loss: 0.638 | 598 ms/step , 115340.62 GFLOP/s , 173436.1 tokens/s INFO:__main__:2024-11-30 02:28:03 | Epoch: 0 | Step: 310020 | Dataset: 0-725160 | Loss: 0.714 | 598 ms/step , 115318.46 GFLOP/s , 173336.7 tokens/s INFO:__main__:2024-11-30 02:28:10 | Epoch: 0 | Step: 310030 | Dataset: 0-727560 | Loss: 0.585 | 598 ms/step , 115376.80 GFLOP/s , 173395.2 tokens/s INFO:__main__:2024-11-30 02:28:18 | Epoch: 0 | Step: 310040 | Dataset: 0-729960 | Loss: 0.671 | 597 ms/step , 115522.26 GFLOP/s , 173389.2 tokens/s INFO:__main__:2024-11-30 02:28:25 | Epoch: 0 | Step: 310050 | Dataset: 0-732360 | Loss: 0.668 | 598 ms/step , 115408.36 GFLOP/s , 173342.8 tokens/s INFO:__main__:2024-11-30 02:28:32 | Epoch: 0 | Step: 310060 | Dataset: 0-734760 | Loss: 0.631 | 598 ms/step , 115376.68 GFLOP/s , 173388.4 tokens/s INFO:__main__:2024-11-30 02:28:39 | Epoch: 0 | Step: 310070 | Dataset: 0-737160 | Loss: 
0.583 | 599 ms/step , 115253.16 GFLOP/s , 173430.5 tokens/s INFO:__main__:2024-11-30 02:28:46 | Epoch: 0 | Step: 310080 | Dataset: 0-739560 | Loss: 0.670 | 599 ms/step , 115278.89 GFLOP/s , 173395.1 tokens/s INFO:__main__:2024-11-30 02:28:53 | Epoch: 0 | Step: 310090 | Dataset: 0-741960 | Loss: 0.573 | 598 ms/step , 115317.85 GFLOP/s , 173393.6 tokens/s INFO:__main__:2024-11-30 02:29:00 | Epoch: 0 | Step: 310100 | Dataset: 0-744360 | Loss: 0.616 | 599 ms/step , 115254.41 GFLOP/s , 173372.0 tokens/s INFO:__main__:2024-11-30 02:29:07 | Epoch: 0 | Step: 310110 | Dataset: 0-746760 | Loss: 0.612 | 598 ms/step , 115348.47 GFLOP/s , 173491.0 tokens/s INFO:__main__:2024-11-30 02:29:14 | Epoch: 0 | Step: 310120 | Dataset: 0-749160 | Loss: 0.590 | 598 ms/step , 115326.40 GFLOP/s , 173615.2 tokens/s INFO:__main__:2024-11-30 02:29:21 | Epoch: 0 | Step: 310130 | Dataset: 0-751560 | Loss: 0.653 | 598 ms/step , 115339.38 GFLOP/s , 173540.6 tokens/s INFO:__main__:2024-11-30 02:29:28 | Epoch: 0 | Step: 310140 | Dataset: 0-753960 | Loss: 0.636 | 598 ms/step , 115373.98 GFLOP/s , 173473.5 tokens/s INFO:__main__:2024-11-30 02:29:36 | Epoch: 0 | Step: 310150 | Dataset: 0-756360 | Loss: 0.582 | 598 ms/step , 115384.10 GFLOP/s , 173470.1 tokens/s INFO:__main__:2024-11-30 02:29:43 | Epoch: 0 | Step: 310160 | Dataset: 0-758760 | Loss: 0.591 | 597 ms/step , 115513.13 GFLOP/s , 173547.4 tokens/s INFO:__main__:2024-11-30 02:29:50 | Epoch: 0 | Step: 310170 | Dataset: 0-761160 | Loss: 0.624 | 598 ms/step , 115457.53 GFLOP/s , 173489.5 tokens/s INFO:__main__:2024-11-30 02:29:57 | Epoch: 0 | Step: 310180 | Dataset: 0-763560 | Loss: 0.609 | 598 ms/step , 115375.36 GFLOP/s , 173497.1 tokens/s INFO:__main__:2024-11-30 02:30:04 | Epoch: 0 | Step: 310190 | Dataset: 0-765960 | Loss: 0.652 | 598 ms/step , 115471.83 GFLOP/s , 173647.1 tokens/s INFO:__main__:2024-11-30 02:30:11 | Epoch: 0 | Step: 310200 | Dataset: 0-768360 | Loss: 0.595 | 598 ms/step , 115491.16 GFLOP/s , 173682.5 tokens/s 
INFO:__main__:2024-11-30 02:30:18 | Epoch: 0 | Step: 310210 | Dataset: 0-770760 | Loss: 0.614 | 597 ms/step , 115506.29 GFLOP/s , 173586.0 tokens/s INFO:__main__:2024-11-30 02:30:25 | Epoch: 0 | Step: 310220 | Dataset: 0-773160 | Loss: 0.629 | 598 ms/step , 115437.91 GFLOP/s , 173525.7 tokens/s INFO:__main__:2024-11-30 02:30:32 | Epoch: 0 | Step: 310230 | Dataset: 0-775560 | Loss: 0.563 | 598 ms/step , 115439.57 GFLOP/s , 173513.8 tokens/s INFO:__main__:2024-11-30 02:30:39 | Epoch: 0 | Step: 310240 | Dataset: 0-777960 | Loss: 0.613 | 598 ms/step , 115310.58 GFLOP/s , 173479.3 tokens/s INFO:__main__:2024-11-30 02:30:46 | Epoch: 0 | Step: 310250 | Dataset: 0-780360 | Loss: 0.616 | 598 ms/step , 115390.80 GFLOP/s , 173472.5 tokens/s INFO:__main__:2024-11-30 02:30:53 | Epoch: 0 | Step: 310260 | Dataset: 0-782760 | Loss: 0.651 | 598 ms/step , 115451.97 GFLOP/s , 173546.5 tokens/s INFO:__main__:2024-11-30 02:31:00 | Epoch: 0 | Step: 310270 | Dataset: 0-785160 | Loss: 0.566 | 598 ms/step , 115408.67 GFLOP/s , 173657.6 tokens/s INFO:__main__:2024-11-30 02:31:08 | Epoch: 0 | Step: 310280 | Dataset: 0-787560 | Loss: 0.903 | 598 ms/step , 115398.24 GFLOP/s , 173577.2 tokens/s INFO:__main__:2024-11-30 02:31:15 | Epoch: 0 | Step: 310290 | Dataset: 0-789960 | Loss: 0.833 | 599 ms/step , 115231.40 GFLOP/s , 173501.8 tokens/s INFO:__main__:2024-11-30 02:31:22 | Epoch: 0 | Step: 310300 | Dataset: 0-792360 | Loss: 0.797 | 598 ms/step , 115350.13 GFLOP/s , 173469.1 tokens/s INFO:__main__:2024-11-30 02:31:29 | Epoch: 0 | Step: 310310 | Dataset: 0-794760 | Loss: 0.854 | 598 ms/step , 115345.99 GFLOP/s , 173465.1 tokens/s INFO:__main__:2024-11-30 02:31:36 | Epoch: 0 | Step: 310320 | Dataset: 0-797160 | Loss: 0.844 | 598 ms/step , 115367.03 GFLOP/s , 173497.4 tokens/s INFO:__main__:2024-11-30 02:31:43 | Epoch: 0 | Step: 310330 | Dataset: 0-799560 | Loss: 0.731 | 598 ms/step , 115347.67 GFLOP/s , 173456.5 tokens/s INFO:__main__:2024-11-30 02:31:50 | Epoch: 0 | Step: 310340 | Dataset: 
0-801960 | Loss: 0.890 | 597 ms/step , 115575.47 GFLOP/s , 173575.8 tokens/s INFO:__main__:2024-11-30 02:31:57 | Epoch: 0 | Step: 310350 | Dataset: 0-804360 | Loss: 0.813 | 598 ms/step , 115474.94 GFLOP/s , 173688.9 tokens/s INFO:__main__:2024-11-30 02:32:04 | Epoch: 0 | Step: 310360 | Dataset: 0-806760 | Loss: 0.909 | 598 ms/step , 115366.55 GFLOP/s , 173536.8 tokens/s INFO:__main__:2024-11-30 02:32:11 | Epoch: 0 | Step: 310370 | Dataset: 0-809160 | Loss: 0.934 | 598 ms/step , 115362.35 GFLOP/s , 173496.7 tokens/s INFO:__main__:2024-11-30 02:32:18 | Epoch: 0 | Step: 310380 | Dataset: 0-811560 | Loss: 0.812 | 598 ms/step , 115313.69 GFLOP/s , 173462.0 tokens/s INFO:__main__:2024-11-30 02:32:25 | Epoch: 0 | Step: 310390 | Dataset: 0-813960 | Loss: 0.944 | 599 ms/step , 115273.28 GFLOP/s , 173488.5 tokens/s INFO:__main__:2024-11-30 02:32:33 | Epoch: 0 | Step: 310400 | Dataset: 0-816360 | Loss: 0.877 | 598 ms/step , 115325.88 GFLOP/s , 173483.5 tokens/s INFO:__main__:2024-11-30 02:32:40 | Epoch: 0 | Step: 310410 | Dataset: 0-818760 | Loss: 0.822 | 598 ms/step , 115328.21 GFLOP/s , 173490.7 tokens/s INFO:__main__:2024-11-30 02:32:47 | Epoch: 0 | Step: 310420 | Dataset: 0-821160 | Loss: 0.812 | 597 ms/step , 115550.67 GFLOP/s , 173650.9 tokens/s INFO:__main__:2024-11-30 02:32:54 | Epoch: 0 | Step: 310430 | Dataset: 0-823560 | Loss: 0.945 | 598 ms/step , 115368.11 GFLOP/s , 173571.2 tokens/s INFO:__main__:2024-11-30 02:33:01 | Epoch: 0 | Step: 310440 | Dataset: 0-825960 | Loss: 0.907 | 599 ms/step , 115196.11 GFLOP/s , 173429.4 tokens/s INFO:__main__:2024-11-30 02:33:08 | Epoch: 0 | Step: 310450 | Dataset: 0-828360 | Loss: 0.839 | 598 ms/step , 115309.28 GFLOP/s , 173523.3 tokens/s INFO:__main__:2024-11-30 02:33:15 | Epoch: 0 | Step: 310460 | Dataset: 0-830760 | Loss: 0.886 | 598 ms/step , 115364.07 GFLOP/s , 173467.3 tokens/s INFO:__main__:2024-11-30 02:33:22 | Epoch: 0 | Step: 310470 | Dataset: 0-833160 | Loss: 0.851 | 599 ms/step , 115296.28 GFLOP/s , 173510.0 
tokens/s INFO:__main__:2024-11-30 02:33:29 | Epoch: 0 | Step: 310480 | Dataset: 0-835560 | Loss: 0.894 | 598 ms/step , 115330.36 GFLOP/s , 173488.9 tokens/s INFO:__main__:2024-11-30 02:33:36 | Epoch: 0 | Step: 310490 | Dataset: 0-837960 | Loss: 0.822 | 598 ms/step , 115450.45 GFLOP/s , 173567.9 tokens/s INFO:__main__:2024-11-30 02:33:44 | Validation | Step: 310500 | Val_loss: 0.488 | Best_val_loss: 0.5343 INFO:__main__:2024-11-30 02:33:44 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_023344_step_310500.pt` INFO:__main__:2024-11-30 02:33:47 | Epoch: 0 | Step: 310500 | Dataset: 0-840360 | Loss: 0.792 | 595 ms/step , 116067.24 GFLOP/s , 119082.9 tokens/s INFO:__main__:2024-11-30 02:33:54 | Epoch: 0 | Step: 310510 | Dataset: 0-842760 | Loss: 0.728 | 598 ms/step , 115448.82 GFLOP/s , 173526.6 tokens/s INFO:__main__:2024-11-30 02:34:01 | Epoch: 0 | Step: 310520 | Dataset: 0-845160 | Loss: 0.868 | 598 ms/step , 115445.38 GFLOP/s , 173442.2 tokens/s INFO:__main__:2024-11-30 02:34:08 | Epoch: 0 | Step: 310530 | Dataset: 0-847560 | Loss: 0.870 | 597 ms/step , 115517.11 GFLOP/s , 173300.6 tokens/s INFO:__main__:2024-11-30 02:34:15 | Epoch: 0 | Step: 310540 | Dataset: 0-849960 | Loss: 0.818 | 599 ms/step , 115289.04 GFLOP/s , 173323.3 tokens/s INFO:__main__:2024-11-30 02:34:22 | Epoch: 0 | Step: 310550 | Dataset: 0-852360 | Loss: 0.856 | 598 ms/step , 115428.68 GFLOP/s , 173389.1 tokens/s INFO:__main__:2024-11-30 02:34:29 | Epoch: 0 | Step: 310560 | Dataset: 0-854760 | Loss: 0.795 | 598 ms/step , 115423.73 GFLOP/s , 173576.3 tokens/s INFO:__main__:2024-11-30 02:34:36 | Epoch: 0 | Step: 310570 | Dataset: 0-857160 | Loss: 0.812 | 598 ms/step , 115394.59 GFLOP/s , 173688.7 tokens/s INFO:__main__:2024-11-30 02:34:43 | Epoch: 0 | Step: 310580 | Dataset: 0-859560 | Loss: 0.762 | 598 ms/step , 115444.18 GFLOP/s , 173637.5 tokens/s INFO:__main__:2024-11-30 02:34:50 | Epoch: 0 | Step: 310590 | Dataset: 0-861960 | Loss: 0.766 | 600 ms/step , 
115026.22 GFLOP/s , 173481.0 tokens/s INFO:__main__:2024-11-30 02:34:57 | Epoch: 0 | Step: 310600 | Dataset: 0-864360 | Loss: 0.752 | 598 ms/step , 115406.40 GFLOP/s , 173525.5 tokens/s INFO:__main__:2024-11-30 02:35:05 | Epoch: 0 | Step: 310610 | Dataset: 0-866760 | Loss: 0.932 | 598 ms/step , 115330.34 GFLOP/s , 173499.6 tokens/s INFO:__main__:2024-11-30 02:35:12 | Epoch: 0 | Step: 310620 | Dataset: 0-869160 | Loss: 0.840 | 598 ms/step , 115408.65 GFLOP/s , 173460.9 tokens/s INFO:__main__:2024-11-30 02:35:19 | Epoch: 0 | Step: 310630 | Dataset: 0-871560 | Loss: 0.773 | 598 ms/step , 115430.08 GFLOP/s , 173511.1 tokens/s INFO:__main__:2024-11-30 02:35:26 | Epoch: 0 | Step: 310640 | Dataset: 0-873960 | Loss: 0.840 | 597 ms/step , 115550.25 GFLOP/s , 173626.0 tokens/s INFO:__main__:2024-11-30 02:35:33 | Epoch: 0 | Step: 310650 | Dataset: 0-876360 | Loss: 0.738 | 598 ms/step , 115431.49 GFLOP/s , 173696.8 tokens/s INFO:__main__:2024-11-30 02:35:40 | Epoch: 0 | Step: 310660 | Dataset: 0-878760 | Loss: 0.836 | 599 ms/step , 115279.00 GFLOP/s , 173521.7 tokens/s INFO:__main__:2024-11-30 02:35:47 | Epoch: 0 | Step: 310670 | Dataset: 0-881160 | Loss: 0.736 | 598 ms/step , 115328.61 GFLOP/s , 173517.0 tokens/s INFO:__main__:2024-11-30 02:35:54 | Epoch: 0 | Step: 310680 | Dataset: 0-883560 | Loss: 0.818 | 598 ms/step , 115433.40 GFLOP/s , 173534.7 tokens/s INFO:__main__:2024-11-30 02:36:01 | Epoch: 0 | Step: 310690 | Dataset: 0-885960 | Loss: 0.749 | 598 ms/step , 115364.99 GFLOP/s , 173548.3 tokens/s INFO:__main__:2024-11-30 02:36:08 | Epoch: 0 | Step: 310700 | Dataset: 0-888360 | Loss: 0.787 | 598 ms/step , 115378.91 GFLOP/s , 173505.8 tokens/s INFO:__main__:2024-11-30 02:36:15 | Epoch: 0 | Step: 310710 | Dataset: 0-890760 | Loss: 0.770 | 598 ms/step , 115446.60 GFLOP/s , 173501.5 tokens/s INFO:__main__:2024-11-30 02:36:22 | Epoch: 0 | Step: 310720 | Dataset: 0-893160 | Loss: 0.803 | 597 ms/step , 115549.76 GFLOP/s , 173665.4 tokens/s INFO:__main__:2024-11-30 02:36:29 | 
Epoch: 0 | Step: 310730 | Dataset: 0-895560 | Loss: 0.733 | 598 ms/step , 115461.61 GFLOP/s , 173625.7 tokens/s INFO:__main__:2024-11-30 02:36:37 | Epoch: 0 | Step: 310740 | Dataset: 0-897960 | Loss: 0.809 | 598 ms/step , 115316.99 GFLOP/s , 173540.0 tokens/s INFO:__main__:2024-11-30 02:36:44 | Epoch: 0 | Step: 310750 | Dataset: 0-900360 | Loss: 0.736 | 598 ms/step , 115413.64 GFLOP/s , 173541.0 tokens/s INFO:__main__:2024-11-30 02:36:51 | Epoch: 0 | Step: 310760 | Dataset: 0-902760 | Loss: 0.763 | 598 ms/step , 115436.86 GFLOP/s , 173507.2 tokens/s INFO:__main__:2024-11-30 02:36:58 | Epoch: 0 | Step: 310770 | Dataset: 0-905160 | Loss: 0.835 | 598 ms/step , 115470.24 GFLOP/s , 173537.8 tokens/s INFO:__main__:2024-11-30 02:37:05 | Epoch: 0 | Step: 310780 | Dataset: 0-907560 | Loss: 0.797 | 598 ms/step , 115494.81 GFLOP/s , 173511.0 tokens/s INFO:__main__:2024-11-30 02:37:12 | Epoch: 0 | Step: 310790 | Dataset: 0-909960 | Loss: 0.908 | 597 ms/step , 115557.93 GFLOP/s , 173573.1 tokens/s INFO:__main__:2024-11-30 02:37:19 | Epoch: 0 | Step: 310800 | Dataset: 0-912360 | Loss: 0.800 | 598 ms/step , 115404.75 GFLOP/s , 173643.5 tokens/s INFO:__main__:2024-11-30 02:37:26 | Epoch: 0 | Step: 310810 | Dataset: 0-914760 | Loss: 0.781 | 598 ms/step , 115377.14 GFLOP/s , 173578.0 tokens/s INFO:__main__:2024-11-30 02:37:33 | Epoch: 0 | Step: 310820 | Dataset: 0-917160 | Loss: 0.753 | 598 ms/step , 115398.07 GFLOP/s , 173497.8 tokens/s INFO:__main__:2024-11-30 02:37:40 | Epoch: 0 | Step: 310830 | Dataset: 0-919560 | Loss: 0.910 | 598 ms/step , 115409.12 GFLOP/s , 173499.2 tokens/s INFO:__main__:2024-11-30 02:37:47 | Epoch: 0 | Step: 310840 | Dataset: 0-921960 | Loss: 0.794 | 598 ms/step , 115427.30 GFLOP/s , 173513.9 tokens/s INFO:__main__:2024-11-30 02:37:54 | Epoch: 0 | Step: 310850 | Dataset: 0-924360 | Loss: 0.831 | 599 ms/step , 115241.83 GFLOP/s , 173488.5 tokens/s INFO:__main__:2024-11-30 02:38:02 | Epoch: 0 | Step: 310860 | Dataset: 0-926760 | Loss: 0.852 | 599 ms/step , 
115272.90 GFLOP/s , 173499.3 tokens/s INFO:__main__:2024-11-30 02:38:09 | Epoch: 0 | Step: 310870 | Dataset: 0-929160 | Loss: 0.790 | 598 ms/step , 115447.17 GFLOP/s , 173594.7 tokens/s INFO:__main__:2024-11-30 02:38:16 | Epoch: 0 | Step: 310880 | Dataset: 0-931560 | Loss: 0.793 | 599 ms/step , 115287.29 GFLOP/s , 173619.3 tokens/s INFO:__main__:2024-11-30 02:38:23 | Epoch: 0 | Step: 310890 | Dataset: 0-933960 | Loss: 0.843 | 598 ms/step , 115402.83 GFLOP/s , 173535.8 tokens/s INFO:__main__:2024-11-30 02:38:30 | Epoch: 0 | Step: 310900 | Dataset: 0-936360 | Loss: 0.794 | 598 ms/step , 115388.01 GFLOP/s , 173493.9 tokens/s INFO:__main__:2024-11-30 02:38:37 | Epoch: 0 | Step: 310910 | Dataset: 0-938760 | Loss: 0.685 | 598 ms/step , 115322.49 GFLOP/s , 173445.0 tokens/s INFO:__main__:2024-11-30 02:38:44 | Epoch: 0 | Step: 310920 | Dataset: 0-941160 | Loss: 0.884 | 598 ms/step , 115349.24 GFLOP/s , 173506.8 tokens/s INFO:__main__:2024-11-30 02:38:51 | Epoch: 0 | Step: 310930 | Dataset: 0-943560 | Loss: 0.871 | 598 ms/step , 115358.31 GFLOP/s , 173500.9 tokens/s INFO:__main__:2024-11-30 02:38:58 | Epoch: 0 | Step: 310940 | Dataset: 0-945960 | Loss: 0.807 | 598 ms/step , 115435.72 GFLOP/s , 173515.7 tokens/s INFO:__main__:2024-11-30 02:39:05 | Epoch: 0 | Step: 310950 | Dataset: 0-948360 | Loss: 0.744 | 598 ms/step , 115457.24 GFLOP/s , 173621.1 tokens/s INFO:__main__:2024-11-30 02:39:12 | Epoch: 0 | Step: 310960 | Dataset: 0-950760 | Loss: 0.881 | 598 ms/step , 115399.64 GFLOP/s , 173600.3 tokens/s INFO:__main__:2024-11-30 02:39:19 | Epoch: 0 | Step: 310970 | Dataset: 0-953160 | Loss: 0.737 | 598 ms/step , 115488.38 GFLOP/s , 173466.3 tokens/s INFO:__main__:2024-11-30 02:39:26 | Epoch: 0 | Step: 310980 | Dataset: 0-955560 | Loss: 0.773 | 599 ms/step , 115173.20 GFLOP/s , 173449.3 tokens/s INFO:__main__:2024-11-30 02:39:34 | Epoch: 0 | Step: 310990 | Dataset: 0-957960 | Loss: 0.689 | 599 ms/step , 115154.61 GFLOP/s , 173509.2 tokens/s INFO:__main__:2024-11-30 02:39:41 | 
Validation | Step: 311000 | Val_loss: 0.455 | Best_val_loss: 0.4879 INFO:__main__:2024-11-30 02:39:41 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_023941_step_311000.pt` INFO:__main__:2024-11-30 02:39:44 | Epoch: 0 | Step: 311000 | Dataset: 0-960360 | Loss: 0.807 | 595 ms/step , 116002.37 GFLOP/s , 118149.4 tokens/s INFO:__main__:2024-11-30 02:39:51 | Epoch: 0 | Step: 311010 | Dataset: 0-962760 | Loss: 0.868 | 598 ms/step , 115437.03 GFLOP/s , 173467.8 tokens/s INFO:__main__:2024-11-30 02:39:58 | Epoch: 0 | Step: 311020 | Dataset: 0-965160 | Loss: 0.782 | 598 ms/step , 115341.01 GFLOP/s , 173383.4 tokens/s INFO:__main__:2024-11-30 02:40:05 | Epoch: 0 | Step: 311030 | Dataset: 0-967560 | Loss: 0.792 | 598 ms/step , 115491.69 GFLOP/s , 173511.4 tokens/s INFO:__main__:2024-11-30 02:40:12 | Epoch: 0 | Step: 311040 | Dataset: 0-969960 | Loss: 0.738 | 598 ms/step , 115373.67 GFLOP/s , 173430.5 tokens/s INFO:__main__:2024-11-30 02:40:19 | Epoch: 0 | Step: 311050 | Dataset: 0-972360 | Loss: 0.813 | 598 ms/step , 115329.49 GFLOP/s , 173267.9 tokens/s INFO:__main__:2024-11-30 02:40:26 | Epoch: 0 | Step: 311060 | Dataset: 0-974760 | Loss: 0.824 | 598 ms/step , 115405.27 GFLOP/s , 173313.3 tokens/s INFO:__main__:2024-11-30 02:40:34 | Epoch: 0 | Step: 311070 | Dataset: 0-977160 | Loss: 0.775 | 599 ms/step , 115202.14 GFLOP/s , 173518.2 tokens/s INFO:__main__:2024-11-30 02:40:41 | Epoch: 0 | Step: 311080 | Dataset: 0-979560 | Loss: 0.792 | 599 ms/step , 115217.71 GFLOP/s , 173402.5 tokens/s INFO:__main__:2024-11-30 02:40:48 | Epoch: 0 | Step: 311090 | Dataset: 0-981960 | Loss: 0.860 | 598 ms/step , 115439.89 GFLOP/s , 173532.8 tokens/s INFO:__main__:2024-11-30 02:40:55 | Epoch: 0 | Step: 311100 | Dataset: 0-984360 | Loss: 0.790 | 598 ms/step , 115461.36 GFLOP/s , 173649.2 tokens/s INFO:__main__:2024-11-30 02:41:02 | Epoch: 0 | Step: 311110 | Dataset: 0-986760 | Loss: 0.877 | 598 ms/step , 115421.14 GFLOP/s , 173619.9 tokens/s 
INFO:__main__:2024-11-30 02:41:09 | Epoch: 0 | Step: 311120 | Dataset: 0-989160 | Loss: 0.793 | 599 ms/step , 115293.35 GFLOP/s , 173549.1 tokens/s INFO:__main__:2024-11-30 02:41:16 | Epoch: 0 | Step: 311130 | Dataset: 0-991560 | Loss: 0.839 | 599 ms/step , 115303.99 GFLOP/s , 173507.2 tokens/s INFO:__main__:2024-11-30 02:41:23 | Epoch: 0 | Step: 311140 | Dataset: 0-993960 | Loss: 0.782 | 599 ms/step , 115201.37 GFLOP/s , 173501.3 tokens/s INFO:__main__:2024-11-30 02:41:30 | Epoch: 0 | Step: 311150 | Dataset: 0-996360 | Loss: 0.736 | 598 ms/step , 115430.56 GFLOP/s , 173483.3 tokens/s INFO:__main__:2024-11-30 02:41:37 | Epoch: 0 | Step: 311160 | Dataset: 0-998760 | Loss: 0.820 | 598 ms/step , 115367.07 GFLOP/s , 173396.5 tokens/s INFO:__main__:2024-11-30 02:41:44 | Epoch: 0 | Step: 311170 | Dataset: 0-1001160 | Loss: 0.700 | 599 ms/step , 115268.77 GFLOP/s , 173580.6 tokens/s INFO:__main__:2024-11-30 02:41:51 | Epoch: 0 | Step: 311180 | Dataset: 0-1003560 | Loss: 0.806 | 597 ms/step , 115514.28 GFLOP/s , 173708.3 tokens/s INFO:__main__:2024-11-30 02:41:59 | Epoch: 0 | Step: 311190 | Dataset: 0-1005960 | Loss: 0.812 | 598 ms/step , 115333.04 GFLOP/s , 173495.6 tokens/s INFO:__main__:2024-11-30 02:42:06 | Epoch: 0 | Step: 311200 | Dataset: 0-1008360 | Loss: 0.718 | 599 ms/step , 115247.75 GFLOP/s , 173468.3 tokens/s INFO:__main__:2024-11-30 02:42:13 | Epoch: 0 | Step: 311210 | Dataset: 0-1010760 | Loss: 0.814 | 601 ms/step , 114905.42 GFLOP/s , 173398.5 tokens/s INFO:__main__:2024-11-30 02:42:20 | Epoch: 0 | Step: 311220 | Dataset: 0-1013160 | Loss: 0.806 | 598 ms/step , 115318.09 GFLOP/s , 173477.4 tokens/s INFO:__main__:2024-11-30 02:42:27 | Epoch: 0 | Step: 311230 | Dataset: 0-1015560 | Loss: 0.721 | 598 ms/step , 115384.16 GFLOP/s , 173468.9 tokens/s INFO:__main__:2024-11-30 02:42:34 | Epoch: 0 | Step: 311240 | Dataset: 0-1017960 | Loss: 0.801 | 598 ms/step , 115446.02 GFLOP/s , 173476.3 tokens/s INFO:__main__:2024-11-30 02:42:41 | Epoch: 0 | Step: 311250 | 
Dataset: 0-1020360 | Loss: 0.909 | 598 ms/step , 115411.25 GFLOP/s , 173552.5 tokens/s INFO:__main__:2024-11-30 02:42:48 | Epoch: 0 | Step: 311260 | Dataset: 0-1022760 | Loss: 0.780 | 598 ms/step , 115390.71 GFLOP/s , 173565.6 tokens/s INFO:__main__:2024-11-30 02:42:55 | Epoch: 0 | Step: 311270 | Dataset: 0-1025160 | Loss: 0.829 | 598 ms/step , 115344.12 GFLOP/s , 173437.3 tokens/s INFO:__main__:2024-11-30 02:43:02 | Epoch: 0 | Step: 311280 | Dataset: 0-1027560 | Loss: 0.686 | 598 ms/step , 115315.71 GFLOP/s , 173497.4 tokens/s INFO:__main__:2024-11-30 02:43:09 | Epoch: 0 | Step: 311290 | Dataset: 0-1029960 | Loss: 0.849 | 599 ms/step , 115286.63 GFLOP/s , 173428.0 tokens/s INFO:__main__:2024-11-30 02:43:16 | Epoch: 0 | Step: 311300 | Dataset: 0-1032360 | Loss: 0.770 | 598 ms/step , 115338.14 GFLOP/s , 173499.8 tokens/s INFO:__main__:2024-11-30 02:43:24 | Epoch: 0 | Step: 311310 | Dataset: 0-1034760 | Loss: 0.721 | 598 ms/step , 115404.54 GFLOP/s , 173489.3 tokens/s INFO:__main__:2024-11-30 02:43:31 | Epoch: 0 | Step: 311320 | Dataset: 0-1037160 | Loss: 0.817 | 598 ms/step , 115430.51 GFLOP/s , 173527.8 tokens/s INFO:__main__:2024-11-30 02:43:38 | Epoch: 0 | Step: 311330 | Dataset: 0-1039560 | Loss: 0.738 | 598 ms/step , 115464.03 GFLOP/s , 173668.8 tokens/s INFO:__main__:2024-11-30 02:43:45 | Epoch: 0 | Step: 311340 | Dataset: 0-1041960 | Loss: 0.797 | 598 ms/step , 115457.03 GFLOP/s , 173580.0 tokens/s INFO:__main__:2024-11-30 02:43:52 | Epoch: 0 | Step: 311350 | Dataset: 0-1044360 | Loss: 0.767 | 598 ms/step , 115343.31 GFLOP/s , 173427.4 tokens/s INFO:__main__:2024-11-30 02:43:59 | Epoch: 0 | Step: 311360 | Dataset: 0-1046760 | Loss: 0.714 | 598 ms/step , 115326.11 GFLOP/s , 173447.9 tokens/s INFO:__main__:2024-11-30 02:44:06 | Epoch: 0 | Step: 311370 | Dataset: 0-1049160 | Loss: 0.799 | 599 ms/step , 115293.61 GFLOP/s , 173474.9 tokens/s INFO:__main__:2024-11-30 02:44:13 | Epoch: 0 | Step: 311380 | Dataset: 0-1051560 | Loss: 0.791 | 598 ms/step , 115330.33 
GFLOP/s , 173416.5 tokens/s INFO:__main__:2024-11-30 02:44:20 | Epoch: 0 | Step: 311390 | Dataset: 0-1053960 | Loss: 0.851 | 599 ms/step , 115244.05 GFLOP/s , 173416.3 tokens/s INFO:__main__:2024-11-30 02:44:27 | Epoch: 0 | Step: 311400 | Dataset: 0-1056360 | Loss: 0.750 | 598 ms/step , 115382.89 GFLOP/s , 173545.4 tokens/s INFO:__main__:2024-11-30 02:44:34 | Epoch: 0 | Step: 311410 | Dataset: 0-1058760 | Loss: 0.836 | 599 ms/step , 115269.43 GFLOP/s , 173577.2 tokens/s INFO:__main__:2024-11-30 02:44:41 | Epoch: 0 | Step: 311420 | Dataset: 0-1061160 | Loss: 0.735 | 599 ms/step , 115291.62 GFLOP/s , 173437.5 tokens/s INFO:__main__:2024-11-30 02:44:49 | Epoch: 0 | Step: 311430 | Dataset: 0-1063560 | Loss: 0.859 | 601 ms/step , 114909.90 GFLOP/s , 173313.1 tokens/s INFO:__main__:2024-11-30 02:44:56 | Epoch: 0 | Step: 311440 | Dataset: 0-1065960 | Loss: 0.857 | 599 ms/step , 115292.87 GFLOP/s , 173406.3 tokens/s INFO:__main__:2024-11-30 02:45:03 | Epoch: 0 | Step: 311450 | Dataset: 0-1068360 | Loss: 0.786 | 598 ms/step , 115327.62 GFLOP/s , 173423.7 tokens/s INFO:__main__:2024-11-30 02:45:10 | Epoch: 0 | Step: 311460 | Dataset: 0-1070760 | Loss: 0.826 | 599 ms/step , 115207.32 GFLOP/s , 173394.0 tokens/s INFO:__main__:2024-11-30 02:45:17 | Epoch: 0 | Step: 311470 | Dataset: 0-1073160 | Loss: 0.951 | 598 ms/step , 115415.22 GFLOP/s , 173425.8 tokens/s INFO:__main__:2024-11-30 02:45:24 | Epoch: 0 | Step: 311480 | Dataset: 0-1075560 | Loss: 0.822 | 598 ms/step , 115470.00 GFLOP/s , 173552.5 tokens/s INFO:__main__:2024-11-30 02:45:31 | Epoch: 0 | Step: 311490 | Dataset: 0-1077960 | Loss: 0.907 | 598 ms/step , 115346.03 GFLOP/s , 173508.3 tokens/s INFO:__main__:2024-11-30 02:45:39 | Validation | Step: 311500 | Val_loss: 0.436 | Best_val_loss: 0.4546 INFO:__main__:2024-11-30 02:45:39 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_024539_step_311500.pt` INFO:__main__:2024-11-30 02:45:41 | Epoch: 0 | Step: 311500 | Dataset: 0-1080360 | Loss: 
0.821 | 595 ms/step , 116003.83 GFLOP/s , 117666.8 tokens/s INFO:__main__:2024-11-30 02:45:49 | Epoch: 0 | Step: 311510 | Dataset: 0-1082760 | Loss: 0.775 | 598 ms/step , 115426.93 GFLOP/s , 173412.1 tokens/s INFO:__main__:2024-11-30 02:45:56 | Epoch: 0 | Step: 311520 | Dataset: 0-1085160 | Loss: 0.766 | 598 ms/step , 115333.25 GFLOP/s , 173335.1 tokens/s INFO:__main__:2024-11-30 02:46:03 | Epoch: 0 | Step: 311530 | Dataset: 0-1087560 | Loss: 0.809 | 598 ms/step , 115312.82 GFLOP/s , 173288.6 tokens/s INFO:__main__:2024-11-30 02:46:10 | Epoch: 0 | Step: 311540 | Dataset: 0-1089960 | Loss: 0.875 | 598 ms/step , 115423.15 GFLOP/s , 173350.4 tokens/s INFO:__main__:2024-11-30 02:46:17 | Epoch: 0 | Step: 311550 | Dataset: 0-1092360 | Loss: 0.842 | 600 ms/step , 114997.84 GFLOP/s , 173574.7 tokens/s INFO:__main__:2024-11-30 02:46:24 | Epoch: 0 | Step: 311560 | Dataset: 0-1094760 | Loss: 0.727 | 599 ms/step , 115263.69 GFLOP/s , 173510.0 tokens/s INFO:__main__:2024-11-30 02:46:31 | Epoch: 0 | Step: 311570 | Dataset: 0-1097160 | Loss: 0.833 | 599 ms/step , 115266.48 GFLOP/s , 173387.3 tokens/s INFO:__main__:2024-11-30 02:46:38 | Epoch: 0 | Step: 311580 | Dataset: 0-1099560 | Loss: 0.708 | 599 ms/step , 115194.92 GFLOP/s , 173351.4 tokens/s INFO:__main__:2024-11-30 02:46:45 | Epoch: 0 | Step: 311590 | Dataset: 0-1101960 | Loss: 0.780 | 598 ms/step , 115310.67 GFLOP/s , 173398.0 tokens/s INFO:__main__:2024-11-30 02:46:52 | Epoch: 0 | Step: 311600 | Dataset: 0-1104360 | Loss: 0.800 | 600 ms/step , 115026.54 GFLOP/s , 173340.9 tokens/s INFO:__main__:2024-11-30 02:46:59 | Epoch: 0 | Step: 311610 | Dataset: 0-1106760 | Loss: 0.728 | 599 ms/step , 115201.66 GFLOP/s , 173355.5 tokens/s INFO:__main__:2024-11-30 02:47:07 | Epoch: 0 | Step: 311620 | Dataset: 0-1109160 | Loss: 0.835 | 598 ms/step , 115396.56 GFLOP/s , 173449.8 tokens/s INFO:__main__:2024-11-30 02:47:14 | Epoch: 0 | Step: 311630 | Dataset: 0-1111560 | Loss: 0.884 | 598 ms/step , 115318.99 GFLOP/s , 173492.9 tokens/s 
INFO:__main__:2024-11-30 02:47:21 | Epoch: 0 | Step: 311640 | Dataset: 0-1113960 | Loss: 0.821 | 599 ms/step , 115280.06 GFLOP/s , 173377.2 tokens/s INFO:__main__:2024-11-30 02:47:28 | Epoch: 0 | Step: 311650 | Dataset: 0-1116360 | Loss: 0.710 | 599 ms/step , 115190.24 GFLOP/s , 173318.8 tokens/s INFO:__main__:2024-11-30 02:47:35 | Epoch: 0 | Step: 311660 | Dataset: 0-1118760 | Loss: 0.832 | 598 ms/step , 115321.77 GFLOP/s , 173391.1 tokens/s INFO:__main__:2024-11-30 02:47:42 | Epoch: 0 | Step: 311670 | Dataset: 0-1121160 | Loss: 0.701 | 599 ms/step , 115181.10 GFLOP/s , 173314.6 tokens/s INFO:__main__:2024-11-30 02:47:49 | Epoch: 0 | Step: 311680 | Dataset: 0-1123560 | Loss: 0.840 | 599 ms/step , 115258.04 GFLOP/s , 173336.9 tokens/s INFO:__main__:2024-11-30 02:47:56 | Epoch: 0 | Step: 311690 | Dataset: 0-1125960 | Loss: 0.724 | 598 ms/step , 115316.74 GFLOP/s , 173364.6 tokens/s INFO:__main__:2024-11-30 02:48:03 | Epoch: 0 | Step: 311700 | Dataset: 0-1128360 | Loss: 0.722 | 599 ms/step , 115195.53 GFLOP/s , 173438.5 tokens/s INFO:__main__:2024-11-30 02:48:10 | Epoch: 0 | Step: 311710 | Dataset: 0-1130760 | Loss: 0.824 | 599 ms/step , 115171.75 GFLOP/s , 173436.3 tokens/s INFO:__main__:2024-11-30 02:48:17 | Epoch: 0 | Step: 311720 | Dataset: 0-1133160 | Loss: 0.797 | 598 ms/step , 115322.13 GFLOP/s , 173332.3 tokens/s INFO:__main__:2024-11-30 02:48:24 | Epoch: 0 | Step: 311730 | Dataset: 0-1135560 | Loss: 0.764 | 599 ms/step , 115286.54 GFLOP/s , 173300.6 tokens/s INFO:__main__:2024-11-30 02:48:32 | Epoch: 0 | Step: 311740 | Dataset: 0-1137960 | Loss: 0.788 | 599 ms/step , 115259.49 GFLOP/s , 173321.1 tokens/s INFO:__main__:2024-11-30 02:48:39 | Epoch: 0 | Step: 311750 | Dataset: 0-1140360 | Loss: 0.835 | 599 ms/step , 115244.65 GFLOP/s , 173342.1 tokens/s INFO:__main__:2024-11-30 02:48:46 | Epoch: 0 | Step: 311760 | Dataset: 0-1142760 | Loss: 0.772 | 599 ms/step , 115226.16 GFLOP/s , 173307.2 tokens/s INFO:__main__:2024-11-30 02:48:53 | Epoch: 0 | Step: 311770 | 
Dataset: 0-1145160 | Loss: 0.840 | 599 ms/step , 115257.83 GFLOP/s , 173432.5 tokens/s INFO:__main__:2024-11-30 02:49:00 | Epoch: 0 | Step: 311780 | Dataset: 0-1147560 | Loss: 0.769 | 599 ms/step , 115253.72 GFLOP/s , 173492.5 tokens/s INFO:__main__:2024-11-30 02:49:07 | Epoch: 0 | Step: 311790 | Dataset: 0-1149960 | Loss: 0.872 | 599 ms/step , 115258.87 GFLOP/s , 173330.4 tokens/s INFO:__main__:2024-11-30 02:49:14 | Epoch: 0 | Step: 311800 | Dataset: 0-1152360 | Loss: 0.731 | 599 ms/step , 115236.10 GFLOP/s , 173361.8 tokens/s INFO:__main__:2024-11-30 02:49:21 | Epoch: 0 | Step: 311810 | Dataset: 0-1154760 | Loss: 0.837 | 599 ms/step , 115230.48 GFLOP/s , 173356.5 tokens/s INFO:__main__:2024-11-30 02:49:28 | Epoch: 0 | Step: 311820 | Dataset: 0-1157160 | Loss: 0.755 | 599 ms/step , 115136.20 GFLOP/s , 173354.6 tokens/s INFO:__main__:2024-11-30 02:49:35 | Epoch: 0 | Step: 311830 | Dataset: 0-1159560 | Loss: 0.728 | 599 ms/step , 115167.15 GFLOP/s , 173366.0 tokens/s INFO:__main__:2024-11-30 02:49:42 | Epoch: 0 | Step: 311840 | Dataset: 0-1161960 | Loss: 0.778 | 598 ms/step , 115424.41 GFLOP/s , 173360.0 tokens/s INFO:__main__:2024-11-30 02:49:50 | Epoch: 0 | Step: 311850 | Dataset: 0-1164360 | Loss: 0.797 | 598 ms/step , 115358.33 GFLOP/s , 173489.3 tokens/s INFO:__main__:2024-11-30 02:49:57 | Epoch: 0 | Step: 311860 | Dataset: 0-1166760 | Loss: 0.740 | 597 ms/step , 115633.43 GFLOP/s , 173504.7 tokens/s INFO:__main__:2024-11-30 02:50:04 | Epoch: 0 | Step: 311870 | Dataset: 0-1169160 | Loss: 0.825 | 596 ms/step , 115795.31 GFLOP/s , 173730.7 tokens/s INFO:__main__:2024-11-30 02:50:11 | Epoch: 0 | Step: 311880 | Dataset: 0-1171560 | Loss: 0.698 | 597 ms/step , 115621.03 GFLOP/s , 173740.4 tokens/s INFO:__main__:2024-11-30 02:50:18 | Epoch: 0 | Step: 311890 | Dataset: 0-1173960 | Loss: 0.775 | 597 ms/step , 115651.58 GFLOP/s , 173745.8 tokens/s INFO:__main__:2024-11-30 02:50:25 | Epoch: 0 | Step: 311900 | Dataset: 0-1176360 | Loss: 0.788 | 597 ms/step , 115686.89 
GFLOP/s , 173709.5 tokens/s INFO:__main__:2024-11-30 02:50:32 | Epoch: 0 | Step: 311910 | Dataset: 0-1178760 | Loss: 0.696 | 597 ms/step , 115557.89 GFLOP/s , 173708.2 tokens/s INFO:__main__:2024-11-30 02:50:39 | Epoch: 0 | Step: 311920 | Dataset: 0-1181160 | Loss: 0.476 | 596 ms/step , 115825.94 GFLOP/s , 173845.7 tokens/s INFO:__main__:2024-11-30 02:50:46 | Epoch: 0 | Step: 311930 | Dataset: 0-1183560 | Loss: 0.459 | 596 ms/step , 115824.92 GFLOP/s , 173839.3 tokens/s INFO:__main__:2024-11-30 02:50:53 | Epoch: 0 | Step: 311940 | Dataset: 0-1185960 | Loss: 0.421 | 596 ms/step , 115783.03 GFLOP/s , 173873.8 tokens/s INFO:__main__:2024-11-30 02:51:00 | Epoch: 0 | Step: 311950 | Dataset: 0-1188360 | Loss: 0.468 | 596 ms/step , 115762.81 GFLOP/s , 173827.1 tokens/s INFO:__main__:2024-11-30 02:51:07 | Epoch: 0 | Step: 311960 | Dataset: 0-1190760 | Loss: 0.474 | 596 ms/step , 115796.37 GFLOP/s , 173741.8 tokens/s INFO:__main__:2024-11-30 02:51:14 | Epoch: 0 | Step: 311970 | Dataset: 0-1193160 | Loss: 0.502 | 596 ms/step , 115700.66 GFLOP/s , 173800.2 tokens/s INFO:__main__:2024-11-30 02:51:21 | Epoch: 0 | Step: 311980 | Dataset: 0-1195560 | Loss: 0.424 | 596 ms/step , 115826.55 GFLOP/s , 173724.0 tokens/s INFO:__main__:2024-11-30 02:51:29 | Epoch: 0 | Step: 311990 | Dataset: 0-1197960 | Loss: 0.438 | 596 ms/step , 115778.72 GFLOP/s , 173803.8 tokens/s INFO:__main__:2024-11-30 02:51:36 | Validation | Step: 312000 | Val_loss: 0.460 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 02:51:36 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_025136_step_312000.pt` INFO:__main__:2024-11-30 02:51:39 | Epoch: 0 | Step: 312000 | Dataset: 0-1200360 | Loss: 0.486 | 593 ms/step , 116301.46 GFLOP/s , 120684.6 tokens/s INFO:__main__:2024-11-30 02:51:46 | Epoch: 0 | Step: 312010 | Dataset: 0-1202760 | Loss: 0.413 | 598 ms/step , 115322.93 GFLOP/s , 173506.1 tokens/s INFO:__main__:2024-11-30 02:51:53 | Epoch: 0 | Step: 312020 | Dataset: 0-1205160 | Loss: 
0.398 | 598 ms/step , 115381.53 GFLOP/s , 173351.5 tokens/s INFO:__main__:2024-11-30 02:52:00 | Epoch: 0 | Step: 312030 | Dataset: 0-1207560 | Loss: 0.424 | 600 ms/step , 114992.41 GFLOP/s , 173431.2 tokens/s INFO:__main__:2024-11-30 02:52:07 | Epoch: 0 | Step: 312040 | Dataset: 0-1209960 | Loss: 0.406 | 598 ms/step , 115358.02 GFLOP/s , 173515.7 tokens/s INFO:__main__:2024-11-30 02:52:14 | Epoch: 0 | Step: 312050 | Dataset: 0-1212360 | Loss: 0.424 | 598 ms/step , 115496.50 GFLOP/s , 173492.7 tokens/s INFO:__main__:2024-11-30 02:52:21 | Epoch: 0 | Step: 312060 | Dataset: 0-1214760 | Loss: 0.436 | 598 ms/step , 115377.51 GFLOP/s , 173520.1 tokens/s INFO:__main__:2024-11-30 02:52:28 | Epoch: 0 | Step: 312070 | Dataset: 0-1217160 | Loss: 0.396 | 598 ms/step , 115375.25 GFLOP/s , 173519.2 tokens/s INFO:__main__:2024-11-30 02:52:35 | Epoch: 0 | Step: 312080 | Dataset: 0-1219560 | Loss: 0.371 | 598 ms/step , 115408.09 GFLOP/s , 173652.2 tokens/s INFO:__main__:2024-11-30 02:52:42 | Epoch: 0 | Step: 312090 | Dataset: 0-1221960 | Loss: 0.406 | 598 ms/step , 115348.87 GFLOP/s , 173528.5 tokens/s INFO:__main__:2024-11-30 02:52:50 | Epoch: 0 | Step: 312100 | Dataset: 0-1224360 | Loss: 0.403 | 600 ms/step , 115109.88 GFLOP/s , 173447.1 tokens/s INFO:__main__:2024-11-30 02:52:57 | Epoch: 0 | Step: 312110 | Dataset: 0-1226760 | Loss: 0.428 | 598 ms/step , 115322.32 GFLOP/s , 173452.7 tokens/s INFO:__main__:2024-11-30 02:53:04 | Epoch: 0 | Step: 312120 | Dataset: 0-1229160 | Loss: 0.422 | 598 ms/step , 115352.69 GFLOP/s , 173455.4 tokens/s INFO:__main__:2024-11-30 02:53:11 | Epoch: 0 | Step: 312130 | Dataset: 0-1231560 | Loss: 0.399 | 597 ms/step , 115681.72 GFLOP/s , 173824.4 tokens/s INFO:__main__:2024-11-30 02:53:18 | Epoch: 0 | Step: 312140 | Dataset: 0-1233960 | Loss: 0.445 | 596 ms/step , 115746.12 GFLOP/s , 173769.0 tokens/s INFO:__main__:2024-11-30 02:53:25 | Epoch: 0 | Step: 312150 | Dataset: 0-1236360 | Loss: 0.438 | 596 ms/step , 115774.29 GFLOP/s , 173886.5 tokens/s 
INFO:__main__:2024-11-30 02:53:32 | Epoch: 0 | Step: 312160 | Dataset: 0-1238760 | Loss: 0.419 | 596 ms/step , 115756.34 GFLOP/s , 173962.2 tokens/s INFO:__main__:2024-11-30 02:53:39 | Epoch: 0 | Step: 312170 | Dataset: 0-1241160 | Loss: 0.447 | 596 ms/step , 115741.20 GFLOP/s , 173777.3 tokens/s INFO:__main__:2024-11-30 02:53:46 | Epoch: 0 | Step: 312180 | Dataset: 0-1243560 | Loss: 0.430 | 596 ms/step , 115711.06 GFLOP/s , 173755.5 tokens/s INFO:__main__:2024-11-30 02:53:53 | Epoch: 0 | Step: 312190 | Dataset: 0-1245960 | Loss: 0.452 | 596 ms/step , 115715.04 GFLOP/s , 173755.8 tokens/s INFO:__main__:2024-11-30 02:54:00 | Epoch: 0 | Step: 312200 | Dataset: 0-1248360 | Loss: 0.439 | 596 ms/step , 115837.40 GFLOP/s , 173806.0 tokens/s INFO:__main__:2024-11-30 02:54:07 | Epoch: 0 | Step: 312210 | Dataset: 0-1250760 | Loss: 0.428 | 597 ms/step , 115506.18 GFLOP/s , 173820.2 tokens/s INFO:__main__:2024-11-30 02:54:14 | Epoch: 0 | Step: 312220 | Dataset: 0-1253160 | Loss: 0.418 | 596 ms/step , 115769.45 GFLOP/s , 173836.4 tokens/s INFO:__main__:2024-11-30 02:54:21 | Epoch: 0 | Step: 312230 | Dataset: 0-1255560 | Loss: 0.366 | 595 ms/step , 115911.66 GFLOP/s , 173933.1 tokens/s INFO:__main__:2024-11-30 02:54:29 | Epoch: 0 | Step: 312240 | Dataset: 0-1257960 | Loss: 0.373 | 597 ms/step , 115663.78 GFLOP/s , 173797.8 tokens/s INFO:__main__:2024-11-30 02:54:36 | Epoch: 0 | Step: 312250 | Dataset: 0-1260360 | Loss: 0.438 | 596 ms/step , 115830.02 GFLOP/s , 173733.1 tokens/s INFO:__main__:2024-11-30 02:54:43 | Epoch: 0 | Step: 312260 | Dataset: 0-1262760 | Loss: 0.397 | 599 ms/step , 115128.00 GFLOP/s , 173758.1 tokens/s INFO:__main__:2024-11-30 02:54:50 | Epoch: 0 | Step: 312270 | Dataset: 0-1265160 | Loss: 0.382 | 596 ms/step , 115805.60 GFLOP/s , 173822.8 tokens/s INFO:__main__:2024-11-30 02:54:57 | Epoch: 0 | Step: 312280 | Dataset: 0-1267560 | Loss: 0.432 | 597 ms/step , 115693.48 GFLOP/s , 173746.0 tokens/s INFO:__main__:2024-11-30 02:55:04 | Epoch: 0 | Step: 312290 | 
Dataset: 0-1269960 | Loss: 0.372 | 596 ms/step , 115785.64 GFLOP/s , 173843.0 tokens/s INFO:__main__:2024-11-30 02:55:11 | Epoch: 0 | Step: 312300 | Dataset: 0-1272360 | Loss: 0.444 | 595 ms/step , 115926.31 GFLOP/s , 173886.0 tokens/s INFO:__main__:2024-11-30 02:55:18 | Epoch: 0 | Step: 312310 | Dataset: 0-1274760 | Loss: 0.399 | 596 ms/step , 115712.61 GFLOP/s , 173963.2 tokens/s INFO:__main__:2024-11-30 02:55:25 | Epoch: 0 | Step: 312320 | Dataset: 0-1277160 | Loss: 0.410 | 597 ms/step , 115641.27 GFLOP/s , 173789.4 tokens/s INFO:__main__:2024-11-30 02:55:32 | Epoch: 0 | Step: 312330 | Dataset: 0-1279560 | Loss: 0.443 | 596 ms/step , 115803.70 GFLOP/s , 173751.3 tokens/s INFO:__main__:2024-11-30 02:55:39 | Epoch: 0 | Step: 312340 | Dataset: 0-1281960 | Loss: 0.401 | 597 ms/step , 115509.12 GFLOP/s , 173773.4 tokens/s INFO:__main__:2024-11-30 02:55:46 | Epoch: 0 | Step: 312350 | Dataset: 0-1284360 | Loss: 0.476 | 596 ms/step , 115730.63 GFLOP/s , 173813.5 tokens/s INFO:__main__:2024-11-30 02:55:53 | Epoch: 0 | Step: 312360 | Dataset: 0-1286760 | Loss: 0.409 | 597 ms/step , 115692.12 GFLOP/s , 173687.4 tokens/s INFO:__main__:2024-11-30 02:56:00 | Epoch: 0 | Step: 312370 | Dataset: 0-1289160 | Loss: 0.447 | 596 ms/step , 115782.70 GFLOP/s , 173777.2 tokens/s INFO:__main__:2024-11-30 02:56:08 | Epoch: 0 | Step: 312380 | Dataset: 0-1291560 | Loss: 0.381 | 596 ms/step , 115797.53 GFLOP/s , 173842.2 tokens/s INFO:__main__:2024-11-30 02:56:15 | Epoch: 0 | Step: 312390 | Dataset: 0-1293960 | Loss: 0.413 | 596 ms/step , 115858.02 GFLOP/s , 173831.7 tokens/s INFO:__main__:2024-11-30 02:56:22 | Epoch: 0 | Step: 312400 | Dataset: 0-1296360 | Loss: 0.401 | 597 ms/step , 115647.24 GFLOP/s , 173771.8 tokens/s INFO:__main__:2024-11-30 02:56:29 | Epoch: 0 | Step: 312410 | Dataset: 0-1298760 | Loss: 0.402 | 596 ms/step , 115770.11 GFLOP/s , 173731.2 tokens/s INFO:__main__:2024-11-30 02:56:36 | Epoch: 0 | Step: 312420 | Dataset: 0-1301160 | Loss: 0.412 | 597 ms/step , 115665.50 
GFLOP/s , 173719.6 tokens/s INFO:__main__:2024-11-30 02:56:43 | Epoch: 0 | Step: 312430 | Dataset: 0-1303560 | Loss: 0.394 | 596 ms/step , 115736.44 GFLOP/s , 173724.3 tokens/s INFO:__main__:2024-11-30 02:56:50 | Epoch: 0 | Step: 312440 | Dataset: 0-1305960 | Loss: 0.453 | 596 ms/step , 115825.47 GFLOP/s , 173757.0 tokens/s INFO:__main__:2024-11-30 02:56:57 | Epoch: 0 | Step: 312450 | Dataset: 0-1308360 | Loss: 0.400 | 596 ms/step , 115846.92 GFLOP/s , 173737.6 tokens/s INFO:__main__:2024-11-30 02:57:04 | Epoch: 0 | Step: 312460 | Dataset: 0-1310760 | Loss: 0.451 | 598 ms/step , 115358.27 GFLOP/s , 173782.8 tokens/s INFO:__main__:2024-11-30 02:57:11 | Epoch: 0 | Step: 312470 | Dataset: 0-1313160 | Loss: 0.538 | 596 ms/step , 115725.65 GFLOP/s , 173631.3 tokens/s INFO:__main__:2024-11-30 02:57:18 | Epoch: 0 | Step: 312480 | Dataset: 0-1315560 | Loss: 1.603 | 597 ms/step , 115629.92 GFLOP/s , 173693.3 tokens/s INFO:__main__:2024-11-30 02:57:25 | Epoch: 0 | Step: 312490 | Dataset: 0-1317960 | Loss: 0.388 | 597 ms/step , 115543.47 GFLOP/s , 173761.9 tokens/s INFO:__main__:2024-11-30 02:57:33 | Validation | Step: 312500 | Val_loss: 0.572 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 02:57:34 | Epoch: 0 | Step: 312500 | Dataset: 0-1320360 | Loss: 0.576 | 596 ms/step , 115767.86 GFLOP/s , 147771.6 tokens/s INFO:__main__:2024-11-30 02:57:41 | Epoch: 0 | Step: 312510 | Dataset: 0-1322760 | Loss: 1.077 | 597 ms/step , 115627.56 GFLOP/s , 173735.8 tokens/s INFO:__main__:2024-11-30 02:57:48 | Epoch: 0 | Step: 312520 | Dataset: 0-1325160 | Loss: 1.040 | 596 ms/step , 115778.50 GFLOP/s , 173930.2 tokens/s INFO:__main__:2024-11-30 02:57:55 | Epoch: 0 | Step: 312530 | Dataset: 0-1327560 | Loss: 0.828 | 597 ms/step , 115612.48 GFLOP/s , 173937.1 tokens/s INFO:__main__:2024-11-30 02:58:02 | Epoch: 0 | Step: 312540 | Dataset: 0-1329960 | Loss: 0.671 | 596 ms/step , 115746.05 GFLOP/s , 173661.5 tokens/s INFO:__main__:2024-11-30 02:58:09 | Epoch: 0 | Step: 312550 | Dataset: 
0-1332360 | Loss: 0.648 | 596 ms/step , 115730.19 GFLOP/s , 173730.5 tokens/s INFO:__main__:2024-11-30 02:58:16 | Epoch: 0 | Step: 312560 | Dataset: 0-1334760 | Loss: 0.430 | 596 ms/step , 115863.42 GFLOP/s , 173837.4 tokens/s INFO:__main__:2024-11-30 02:58:23 | Epoch: 0 | Step: 312570 | Dataset: 0-1337160 | Loss: 0.585 | 597 ms/step , 115610.27 GFLOP/s , 173682.6 tokens/s INFO:__main__:2024-11-30 02:58:30 | Epoch: 0 | Step: 312580 | Dataset: 0-1339560 | Loss: 0.455 | 596 ms/step , 115744.61 GFLOP/s , 173801.5 tokens/s INFO:__main__:2024-11-30 02:58:37 | Epoch: 0 | Step: 312590 | Dataset: 0-1341960 | Loss: 0.232 | 596 ms/step , 115796.91 GFLOP/s , 173887.1 tokens/s INFO:__main__:2024-11-30 02:58:44 | Epoch: 0 | Step: 312600 | Dataset: 0-1344360 | Loss: 0.467 | 596 ms/step , 115806.55 GFLOP/s , 173883.1 tokens/s INFO:__main__:2024-11-30 02:58:51 | Epoch: 0 | Step: 312610 | Dataset: 0-1346760 | Loss: 0.631 | 596 ms/step , 115737.31 GFLOP/s , 173879.6 tokens/s INFO:__main__:2024-11-30 02:58:58 | Epoch: 0 | Step: 312620 | Dataset: 0-1349160 | Loss: 0.919 | 596 ms/step , 115844.54 GFLOP/s , 173716.8 tokens/s INFO:__main__:2024-11-30 02:59:06 | Epoch: 0 | Step: 312630 | Dataset: 0-1351560 | Loss: 0.239 | 602 ms/step , 114685.13 GFLOP/s , 173515.7 tokens/s INFO:__main__:2024-11-30 02:59:13 | Epoch: 0 | Step: 312640 | Dataset: 0-1353960 | Loss: 1.297 | 597 ms/step , 115517.96 GFLOP/s , 173676.7 tokens/s INFO:__main__:2024-11-30 02:59:20 | Epoch: 0 | Step: 312650 | Dataset: 0-1356360 | Loss: 1.306 | 598 ms/step , 115443.11 GFLOP/s , 173601.7 tokens/s INFO:__main__:2024-11-30 02:59:27 | Epoch: 0 | Step: 312660 | Dataset: 0-1358760 | Loss: 1.305 | 598 ms/step , 115406.89 GFLOP/s , 173645.5 tokens/s INFO:__main__:2024-11-30 02:59:34 | Epoch: 0 | Step: 312670 | Dataset: 0-1361160 | Loss: 1.291 | 597 ms/step , 115681.80 GFLOP/s , 173645.6 tokens/s INFO:__main__:2024-11-30 02:59:41 | Epoch: 0 | Step: 312680 | Dataset: 0-1363560 | Loss: 1.242 | 597 ms/step , 115684.00 GFLOP/s , 
173666.0 tokens/s INFO:__main__:2024-11-30 02:59:48 | Epoch: 0 | Step: 312690 | Dataset: 0-1365960 | Loss: 1.241 | 597 ms/step , 115583.59 GFLOP/s , 173673.8 tokens/s INFO:__main__:2024-11-30 02:59:55 | Epoch: 0 | Step: 312700 | Dataset: 0-1368360 | Loss: 1.261 | 597 ms/step , 115676.60 GFLOP/s , 173554.9 tokens/s INFO:__main__:2024-11-30 03:00:02 | Epoch: 0 | Step: 312710 | Dataset: 0-1370760 | Loss: 1.214 | 597 ms/step , 115520.70 GFLOP/s , 173572.0 tokens/s INFO:__main__:2024-11-30 03:00:09 | Epoch: 0 | Step: 312720 | Dataset: 0-1373160 | Loss: 1.225 | 598 ms/step , 115460.82 GFLOP/s , 173558.8 tokens/s INFO:__main__:2024-11-30 03:00:16 | Epoch: 0 | Step: 312730 | Dataset: 0-1375560 | Loss: 1.209 | 598 ms/step , 115466.79 GFLOP/s , 173483.8 tokens/s INFO:__main__:2024-11-30 03:00:23 | Epoch: 0 | Step: 312740 | Dataset: 0-1377960 | Loss: 1.240 | 597 ms/step , 115581.30 GFLOP/s , 173427.6 tokens/s INFO:__main__:2024-11-30 03:00:31 | Epoch: 0 | Step: 312750 | Dataset: 0-1380360 | Loss: 1.204 | 599 ms/step , 115142.65 GFLOP/s , 173585.6 tokens/s INFO:__main__:2024-11-30 03:00:38 | Epoch: 0 | Step: 312760 | Dataset: 0-1382760 | Loss: 1.211 | 598 ms/step , 115459.14 GFLOP/s , 173714.8 tokens/s INFO:__main__:2024-11-30 03:00:45 | Epoch: 0 | Step: 312770 | Dataset: 0-1385160 | Loss: 1.183 | 597 ms/step , 115594.93 GFLOP/s , 173498.4 tokens/s INFO:__main__:2024-11-30 03:00:52 | Epoch: 0 | Step: 312780 | Dataset: 0-1387560 | Loss: 1.196 | 598 ms/step , 115365.39 GFLOP/s , 173453.9 tokens/s INFO:__main__:2024-11-30 03:00:59 | Epoch: 0 | Step: 312790 | Dataset: 0-1389960 | Loss: 1.155 | 598 ms/step , 115346.57 GFLOP/s , 173441.6 tokens/s INFO:__main__:2024-11-30 03:01:06 | Epoch: 0 | Step: 312800 | Dataset: 0-1392360 | Loss: 1.182 | 598 ms/step , 115332.74 GFLOP/s , 173452.8 tokens/s INFO:__main__:2024-11-30 03:01:13 | Epoch: 0 | Step: 312810 | Dataset: 0-1394760 | Loss: 1.193 | 597 ms/step , 115565.51 GFLOP/s , 173535.2 tokens/s INFO:__main__:2024-11-30 03:01:20 | Epoch: 0 
| Step: 312820 | Dataset: 0-1397160 | Loss: 1.205 | 597 ms/step , 115621.16 GFLOP/s , 173542.8 tokens/s INFO:__main__:2024-11-30 03:01:27 | Epoch: 0 | Step: 312830 | Dataset: 0-1399560 | Loss: 1.153 | 597 ms/step , 115661.54 GFLOP/s , 173615.7 tokens/s INFO:__main__:2024-11-30 03:01:34 | Epoch: 0 | Step: 312840 | Dataset: 0-1401960 | Loss: 1.163 | 597 ms/step , 115511.90 GFLOP/s , 173723.6 tokens/s INFO:__main__:2024-11-30 03:01:41 | Epoch: 0 | Step: 312850 | Dataset: 0-1404360 | Loss: 1.151 | 598 ms/step , 115496.44 GFLOP/s , 173597.3 tokens/s INFO:__main__:2024-11-30 03:01:48 | Epoch: 0 | Step: 312860 | Dataset: 0-1406760 | Loss: 1.171 | 598 ms/step , 115383.59 GFLOP/s , 173560.0 tokens/s INFO:__main__:2024-11-30 03:01:55 | Epoch: 0 | Step: 312870 | Dataset: 0-1409160 | Loss: 1.157 | 597 ms/step , 115669.81 GFLOP/s , 173646.3 tokens/s INFO:__main__:2024-11-30 03:02:03 | Epoch: 0 | Step: 312880 | Dataset: 0-1411560 | Loss: 1.210 | 597 ms/step , 115537.32 GFLOP/s , 173570.0 tokens/s INFO:__main__:2024-11-30 03:02:10 | Epoch: 0 | Step: 312890 | Dataset: 0-1413960 | Loss: 1.180 | 597 ms/step , 115586.34 GFLOP/s , 173598.6 tokens/s INFO:__main__:2024-11-30 03:02:17 | Epoch: 0 | Step: 312900 | Dataset: 0-1416360 | Loss: 1.160 | 597 ms/step , 115609.18 GFLOP/s , 173546.3 tokens/s INFO:__main__:2024-11-30 03:02:24 | Epoch: 0 | Step: 312910 | Dataset: 0-1418760 | Loss: 1.178 | 597 ms/step , 115619.11 GFLOP/s , 173764.2 tokens/s INFO:__main__:2024-11-30 03:02:31 | Epoch: 0 | Step: 312920 | Dataset: 0-1421160 | Loss: 1.181 | 598 ms/step , 115464.41 GFLOP/s , 173537.8 tokens/s INFO:__main__:2024-11-30 03:02:38 | Epoch: 0 | Step: 312930 | Dataset: 0-1423560 | Loss: 1.173 | 597 ms/step , 115655.79 GFLOP/s , 173616.7 tokens/s INFO:__main__:2024-11-30 03:02:45 | Epoch: 0 | Step: 312940 | Dataset: 0-1425960 | Loss: 1.177 | 597 ms/step , 115582.97 GFLOP/s , 173572.6 tokens/s INFO:__main__:2024-11-30 03:02:52 | Epoch: 0 | Step: 312950 | Dataset: 0-1428360 | Loss: 1.147 | 598 
ms/step , 115444.00 GFLOP/s , 173594.2 tokens/s INFO:__main__:2024-11-30 03:02:59 | Epoch: 0 | Step: 312960 | Dataset: 0-1430760 | Loss: 1.165 | 598 ms/step , 115470.92 GFLOP/s , 173601.0 tokens/s INFO:__main__:2024-11-30 03:03:06 | Epoch: 0 | Step: 312970 | Dataset: 0-1433160 | Loss: 1.106 | 597 ms/step , 115630.85 GFLOP/s , 173629.1 tokens/s INFO:__main__:2024-11-30 03:03:13 | Epoch: 0 | Step: 312980 | Dataset: 0-1435560 | Loss: 1.122 | 598 ms/step , 115435.76 GFLOP/s , 173718.9 tokens/s INFO:__main__:2024-11-30 03:03:20 | Epoch: 0 | Step: 312990 | Dataset: 0-1437960 | Loss: 1.157 | 597 ms/step , 115603.24 GFLOP/s , 173652.8 tokens/s INFO:__main__:2024-11-30 03:03:28 | Validation | Step: 313000 | Val_loss: 0.576 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 03:03:28 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_030328_step_313000.pt` INFO:__main__:2024-11-30 03:03:31 | Epoch: 0 | Step: 313000 | Dataset: 0-1440360 | Loss: 1.135 | 596 ms/step , 115745.27 GFLOP/s , 117052.1 tokens/s INFO:__main__:2024-11-30 03:03:38 | Epoch: 0 | Step: 313010 | Dataset: 0-1442760 | Loss: 1.018 | 599 ms/step , 115291.61 GFLOP/s , 173275.8 tokens/s INFO:__main__:2024-11-30 03:03:45 | Epoch: 0 | Step: 313020 | Dataset: 0-1445160 | Loss: 0.985 | 600 ms/step , 115036.07 GFLOP/s , 173150.5 tokens/s INFO:__main__:2024-11-30 03:03:52 | Epoch: 0 | Step: 313030 | Dataset: 0-1447560 | Loss: 0.885 | 600 ms/step , 114982.44 GFLOP/s , 173037.8 tokens/s INFO:__main__:2024-11-30 03:03:59 | Epoch: 0 | Step: 313040 | Dataset: 0-1449960 | Loss: 0.915 | 599 ms/step , 115298.14 GFLOP/s , 173269.1 tokens/s INFO:__main__:2024-11-30 03:04:06 | Epoch: 0 | Step: 313050 | Dataset: 0-1452360 | Loss: 0.984 | 599 ms/step , 115196.70 GFLOP/s , 173335.8 tokens/s INFO:__main__:2024-11-30 03:04:13 | Epoch: 0 | Step: 313060 | Dataset: 0-1454760 | Loss: 0.941 | 599 ms/step , 115203.86 GFLOP/s , 173335.2 tokens/s INFO:__main__:2024-11-30 03:04:21 | Epoch: 0 | Step: 313070 | 
Dataset: 0-1457160 | Loss: 0.910 | 600 ms/step , 115072.30 GFLOP/s , 173081.0 tokens/s INFO:__main__:2024-11-30 03:04:28 | Epoch: 0 | Step: 313080 | Dataset: 0-1459560 | Loss: 0.947 | 599 ms/step , 115124.24 GFLOP/s , 173200.3 tokens/s INFO:__main__:2024-11-30 03:04:35 | Epoch: 0 | Step: 313090 | Dataset: 0-1461960 | Loss: 0.795 | 600 ms/step , 115111.71 GFLOP/s , 173145.7 tokens/s INFO:__main__:2024-11-30 03:04:42 | Epoch: 0 | Step: 313100 | Dataset: 0-1464360 | Loss: 0.989 | 600 ms/step , 115005.60 GFLOP/s , 173099.4 tokens/s INFO:__main__:2024-11-30 03:04:49 | Epoch: 0 | Step: 313110 | Dataset: 0-1466760 | Loss: 0.819 | 600 ms/step , 114959.17 GFLOP/s , 173121.1 tokens/s INFO:__main__:2024-11-30 03:04:56 | Epoch: 0 | Step: 313120 | Dataset: 0-1469160 | Loss: 0.955 | 600 ms/step , 115024.50 GFLOP/s , 173148.4 tokens/s INFO:__main__:2024-11-30 03:05:03 | Epoch: 0 | Step: 313130 | Dataset: 0-1471560 | Loss: 0.946 | 600 ms/step , 115113.25 GFLOP/s , 173286.6 tokens/s INFO:__main__:2024-11-30 03:05:10 | Epoch: 0 | Step: 313140 | Dataset: 0-1473960 | Loss: 0.845 | 605 ms/step , 114140.63 GFLOP/s , 173130.6 tokens/s INFO:__main__:2024-11-30 03:05:17 | Epoch: 0 | Step: 313150 | Dataset: 0-1476360 | Loss: 0.814 | 601 ms/step , 114859.68 GFLOP/s , 173139.3 tokens/s INFO:__main__:2024-11-30 03:05:24 | Epoch: 0 | Step: 313160 | Dataset: 0-1478760 | Loss: 0.828 | 599 ms/step , 115182.07 GFLOP/s , 173016.8 tokens/s INFO:__main__:2024-11-30 03:05:32 | Epoch: 0 | Step: 313170 | Dataset: 0-1481160 | Loss: 0.856 | 600 ms/step , 115090.49 GFLOP/s , 173173.4 tokens/s INFO:__main__:2024-11-30 03:05:39 | Epoch: 0 | Step: 313180 | Dataset: 0-1483560 | Loss: 0.890 | 599 ms/step , 115141.66 GFLOP/s , 173127.4 tokens/s INFO:__main__:2024-11-30 03:05:46 | Epoch: 0 | Step: 313190 | Dataset: 0-1485960 | Loss: 0.885 | 598 ms/step , 115401.43 GFLOP/s , 173486.0 tokens/s INFO:__main__:2024-11-30 03:05:53 | Epoch: 0 | Step: 313200 | Dataset: 0-1488360 | Loss: 0.916 | 596 ms/step , 115717.53 
GFLOP/s , 173660.8 tokens/s INFO:__main__:2024-11-30 03:06:00 | Epoch: 0 | Step: 313210 | Dataset: 0-1490760 | Loss: 0.872 | 598 ms/step , 115416.24 GFLOP/s , 173667.3 tokens/s INFO:__main__:2024-11-30 03:06:07 | Epoch: 0 | Step: 313220 | Dataset: 0-1493160 | Loss: 0.792 | 598 ms/step , 115416.27 GFLOP/s , 173641.4 tokens/s INFO:__main__:2024-11-30 03:06:14 | Epoch: 0 | Step: 313230 | Dataset: 0-1495560 | Loss: 0.921 | 598 ms/step , 115357.12 GFLOP/s , 173514.7 tokens/s INFO:__main__:2024-11-30 03:06:21 | Epoch: 0 | Step: 313240 | Dataset: 0-1497960 | Loss: 0.902 | 597 ms/step , 115538.59 GFLOP/s , 173499.4 tokens/s INFO:__main__:2024-11-30 03:06:28 | Epoch: 0 | Step: 313250 | Dataset: 0-1500360 | Loss: 0.883 | 597 ms/step , 115540.81 GFLOP/s , 173590.1 tokens/s INFO:__main__:2024-11-30 03:06:35 | Epoch: 0 | Step: 313260 | Dataset: 0-1502760 | Loss: 0.899 | 598 ms/step , 115431.34 GFLOP/s , 173603.8 tokens/s INFO:__main__:2024-11-30 03:06:42 | Epoch: 0 | Step: 313270 | Dataset: 0-1505160 | Loss: 0.914 | 598 ms/step , 115440.87 GFLOP/s , 173594.1 tokens/s INFO:__main__:2024-11-30 03:06:49 | Epoch: 0 | Step: 313280 | Dataset: 0-1507560 | Loss: 0.905 | 597 ms/step , 115670.62 GFLOP/s , 173810.5 tokens/s INFO:__main__:2024-11-30 03:06:56 | Epoch: 0 | Step: 313290 | Dataset: 0-1509960 | Loss: 0.887 | 598 ms/step , 115474.38 GFLOP/s , 173572.2 tokens/s INFO:__main__:2024-11-30 03:07:04 | Epoch: 0 | Step: 313300 | Dataset: 0-1512360 | Loss: 0.954 | 598 ms/step , 115409.52 GFLOP/s , 173532.8 tokens/s INFO:__main__:2024-11-30 03:07:11 | Epoch: 0 | Step: 313310 | Dataset: 0-1514760 | Loss: 0.873 | 598 ms/step , 115464.01 GFLOP/s , 173566.2 tokens/s INFO:__main__:2024-11-30 03:07:18 | Epoch: 0 | Step: 313320 | Dataset: 0-1517160 | Loss: 0.925 | 597 ms/step , 115573.65 GFLOP/s , 173534.2 tokens/s INFO:__main__:2024-11-30 03:07:25 | Epoch: 0 | Step: 313330 | Dataset: 0-1519560 | Loss: 1.043 | 598 ms/step , 115421.17 GFLOP/s , 173547.3 tokens/s INFO:__main__:2024-11-30 03:07:32 
| Epoch: 0 | Step: 313340 | Dataset: 0-1521960 | Loss: 1.101 | 598 ms/step , 115473.78 GFLOP/s , 173529.0 tokens/s INFO:__main__:2024-11-30 03:07:39 | Epoch: 0 | Step: 313350 | Dataset: 0-1524360 | Loss: 1.017 | 596 ms/step , 115744.85 GFLOP/s , 173639.3 tokens/s INFO:__main__:2024-11-30 03:07:46 | Epoch: 0 | Step: 313360 | Dataset: 0-1526760 | Loss: 1.009 | 597 ms/step , 115512.00 GFLOP/s , 173637.4 tokens/s INFO:__main__:2024-11-30 03:07:53 | Epoch: 0 | Step: 313370 | Dataset: 0-1529160 | Loss: 1.090 | 598 ms/step , 115395.83 GFLOP/s , 173491.6 tokens/s INFO:__main__:2024-11-30 03:08:00 | Epoch: 0 | Step: 313380 | Dataset: 0-1531560 | Loss: 1.045 | 597 ms/step , 115584.65 GFLOP/s , 173559.7 tokens/s INFO:__main__:2024-11-30 03:08:07 | Epoch: 0 | Step: 313390 | Dataset: 0-1533960 | Loss: 0.975 | 598 ms/step , 115443.33 GFLOP/s , 173566.8 tokens/s INFO:__main__:2024-11-30 03:08:14 | Epoch: 0 | Step: 313400 | Dataset: 0-1536360 | Loss: 0.972 | 598 ms/step , 115462.83 GFLOP/s , 173447.1 tokens/s INFO:__main__:2024-11-30 03:08:21 | Epoch: 0 | Step: 313410 | Dataset: 0-1538760 | Loss: 0.978 | 598 ms/step , 115389.84 GFLOP/s , 173509.1 tokens/s INFO:__main__:2024-11-30 03:08:29 | Epoch: 0 | Step: 313420 | Dataset: 0-1541160 | Loss: 0.974 | 597 ms/step , 115537.99 GFLOP/s , 173443.4 tokens/s INFO:__main__:2024-11-30 03:08:36 | Epoch: 0 | Step: 313430 | Dataset: 0-1543560 | Loss: 1.006 | 598 ms/step , 115491.17 GFLOP/s , 173719.5 tokens/s INFO:__main__:2024-11-30 03:08:43 | Epoch: 0 | Step: 313440 | Dataset: 0-1545960 | Loss: 0.940 | 598 ms/step , 115410.06 GFLOP/s , 173582.2 tokens/s INFO:__main__:2024-11-30 03:08:50 | Epoch: 0 | Step: 313450 | Dataset: 0-1548360 | Loss: 0.980 | 599 ms/step , 115200.52 GFLOP/s , 173558.0 tokens/s INFO:__main__:2024-11-30 03:08:57 | Epoch: 0 | Step: 313460 | Dataset: 0-1550760 | Loss: 1.016 | 602 ms/step , 114698.82 GFLOP/s , 173293.9 tokens/s INFO:__main__:2024-11-30 03:09:04 | Epoch: 0 | Step: 313470 | Dataset: 0-1553160 | Loss: 1.003 | 
598 ms/step , 115451.64 GFLOP/s , 173443.4 tokens/s INFO:__main__:2024-11-30 03:09:11 | Epoch: 0 | Step: 313480 | Dataset: 0-1555560 | Loss: 1.033 | 599 ms/step , 115231.17 GFLOP/s , 173438.7 tokens/s INFO:__main__:2024-11-30 03:09:18 | Epoch: 0 | Step: 313490 | Dataset: 0-1557960 | Loss: 0.945 | 598 ms/step , 115417.12 GFLOP/s , 173463.8 tokens/s INFO:__main__:2024-11-30 03:09:26 | Validation | Step: 313500 | Val_loss: 0.573 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 03:09:26 | Epoch: 0 | Step: 313500 | Dataset: 0-1560360 | Loss: 0.951 | 596 ms/step , 115817.50 GFLOP/s , 147753.7 tokens/s INFO:__main__:2024-11-30 03:09:34 | Epoch: 0 | Step: 313510 | Dataset: 0-1562760 | Loss: 0.970 | 597 ms/step , 115569.68 GFLOP/s , 173690.8 tokens/s INFO:__main__:2024-11-30 03:09:41 | Epoch: 0 | Step: 313520 | Dataset: 0-1565160 | Loss: 0.953 | 598 ms/step , 115315.68 GFLOP/s , 173466.1 tokens/s INFO:__main__:2024-11-30 03:09:48 | Epoch: 0 | Step: 313530 | Dataset: 0-1567560 | Loss: 0.960 | 598 ms/step , 115388.27 GFLOP/s , 173555.1 tokens/s INFO:__main__:2024-11-30 03:09:55 | Epoch: 0 | Step: 313540 | Dataset: 0-1569960 | Loss: 0.949 | 598 ms/step , 115487.97 GFLOP/s , 173563.9 tokens/s INFO:__main__:2024-11-30 03:10:02 | Epoch: 0 | Step: 313550 | Dataset: 0-1572360 | Loss: 1.001 | 597 ms/step , 115521.79 GFLOP/s , 173567.0 tokens/s INFO:__main__:2024-11-30 03:10:09 | Epoch: 0 | Step: 313560 | Dataset: 0-1574760 | Loss: 0.769 | 598 ms/step , 115449.59 GFLOP/s , 173509.4 tokens/s INFO:__main__:2024-11-30 03:10:16 | Epoch: 0 | Step: 313570 | Dataset: 0-1577160 | Loss: 0.709 | 597 ms/step , 115611.38 GFLOP/s , 173713.7 tokens/s INFO:__main__:2024-11-30 03:10:23 | Epoch: 0 | Step: 313580 | Dataset: 0-1579560 | Loss: 0.773 | 597 ms/step , 115633.44 GFLOP/s , 173829.3 tokens/s INFO:__main__:2024-11-30 03:10:30 | Epoch: 0 | Step: 313590 | Dataset: 0-1581960 | Loss: 0.755 | 599 ms/step , 115161.73 GFLOP/s , 173624.3 tokens/s INFO:__main__:2024-11-30 03:10:37 | Epoch: 0 | Step: 
313600 | Dataset: 0-1584360 | Loss: 0.691 | 597 ms/step , 115534.76 GFLOP/s , 173646.2 tokens/s INFO:__main__:2024-11-30 03:10:44 | Epoch: 0 | Step: 313610 | Dataset: 0-1586760 | Loss: 0.684 | 599 ms/step , 115260.91 GFLOP/s , 173657.9 tokens/s INFO:__main__:2024-11-30 03:10:51 | Epoch: 0 | Step: 313620 | Dataset: 0-1589160 | Loss: 0.651 | 598 ms/step , 115486.58 GFLOP/s , 173631.2 tokens/s INFO:__main__:2024-11-30 03:10:58 | Epoch: 0 | Step: 313630 | Dataset: 0-1591560 | Loss: 0.762 | 598 ms/step , 115348.98 GFLOP/s , 173612.6 tokens/s INFO:__main__:2024-11-30 03:11:06 | Epoch: 0 | Step: 313640 | Dataset: 0-1593960 | Loss: 0.686 | 598 ms/step , 115418.58 GFLOP/s , 173572.8 tokens/s INFO:__main__:2024-11-30 03:11:13 | Epoch: 0 | Step: 313650 | Dataset: 0-1596360 | Loss: 0.691 | 598 ms/step , 115346.22 GFLOP/s , 173756.2 tokens/s INFO:__main__:2024-11-30 03:11:20 | Epoch: 0 | Step: 313660 | Dataset: 0-1598760 | Loss: 0.677 | 597 ms/step , 115675.15 GFLOP/s , 173725.2 tokens/s INFO:__main__:2024-11-30 03:11:27 | Epoch: 0 | Step: 313670 | Dataset: 0-1601160 | Loss: 0.687 | 597 ms/step , 115609.72 GFLOP/s , 173673.0 tokens/s INFO:__main__:2024-11-30 03:11:34 | Epoch: 0 | Step: 313680 | Dataset: 0-1603560 | Loss: 0.652 | 598 ms/step , 115426.72 GFLOP/s , 173736.5 tokens/s INFO:__main__:2024-11-30 03:11:41 | Epoch: 0 | Step: 313690 | Dataset: 0-1605960 | Loss: 0.619 | 598 ms/step , 115312.13 GFLOP/s , 173663.3 tokens/s INFO:__main__:2024-11-30 03:11:48 | Epoch: 0 | Step: 313700 | Dataset: 0-1608360 | Loss: 0.673 | 597 ms/step , 115627.59 GFLOP/s , 173647.6 tokens/s INFO:__main__:2024-11-30 03:11:55 | Epoch: 0 | Step: 313710 | Dataset: 0-1610760 | Loss: 0.667 | 597 ms/step , 115538.62 GFLOP/s , 173640.7 tokens/s INFO:__main__:2024-11-30 03:12:02 | Epoch: 0 | Step: 313720 | Dataset: 0-1613160 | Loss: 0.607 | 596 ms/step , 115784.24 GFLOP/s , 173709.5 tokens/s INFO:__main__:2024-11-30 03:12:09 | Epoch: 0 | Step: 313730 | Dataset: 0-1615560 | Loss: 0.629 | 596 ms/step , 
115807.31 GFLOP/s , 173797.8 tokens/s INFO:__main__:2024-11-30 03:12:16 | Epoch: 0 | Step: 313740 | Dataset: 0-1617960 | Loss: 0.665 | 597 ms/step , 115599.98 GFLOP/s , 173601.9 tokens/s INFO:__main__:2024-11-30 03:12:23 | Epoch: 0 | Step: 313750 | Dataset: 0-1620360 | Loss: 0.631 | 596 ms/step , 115710.05 GFLOP/s , 173745.4 tokens/s INFO:__main__:2024-11-30 03:12:30 | Epoch: 0 | Step: 313760 | Dataset: 0-1622760 | Loss: 0.710 | 597 ms/step , 115689.54 GFLOP/s , 173694.5 tokens/s INFO:__main__:2024-11-30 03:12:37 | Epoch: 0 | Step: 313770 | Dataset: 0-1625160 | Loss: 0.678 | 597 ms/step , 115661.52 GFLOP/s , 173646.8 tokens/s INFO:__main__:2024-11-30 03:12:45 | Epoch: 0 | Step: 313780 | Dataset: 0-1627560 | Loss: 0.602 | 597 ms/step , 115603.99 GFLOP/s , 173666.5 tokens/s INFO:__main__:2024-11-30 03:12:52 | Epoch: 0 | Step: 313790 | Dataset: 0-1629960 | Loss: 0.685 | 598 ms/step , 115493.50 GFLOP/s , 173684.1 tokens/s INFO:__main__:2024-11-30 03:12:59 | Epoch: 0 | Step: 313800 | Dataset: 0-1632360 | Loss: 0.642 | 596 ms/step , 115797.50 GFLOP/s , 173778.0 tokens/s INFO:__main__:2024-11-30 03:13:06 | Epoch: 0 | Step: 313810 | Dataset: 0-1634760 | Loss: 0.618 | 597 ms/step , 115668.28 GFLOP/s , 173726.3 tokens/s INFO:__main__:2024-11-30 03:13:13 | Epoch: 0 | Step: 313820 | Dataset: 0-1637160 | Loss: 0.644 | 597 ms/step , 115635.45 GFLOP/s , 173711.9 tokens/s INFO:__main__:2024-11-30 03:13:20 | Epoch: 0 | Step: 313830 | Dataset: 0-1639560 | Loss: 0.674 | 597 ms/step , 115507.99 GFLOP/s , 173708.9 tokens/s INFO:__main__:2024-11-30 03:13:27 | Epoch: 0 | Step: 313840 | Dataset: 0-1641960 | Loss: 0.667 | 597 ms/step , 115628.07 GFLOP/s , 173694.0 tokens/s INFO:__main__:2024-11-30 03:13:34 | Epoch: 0 | Step: 313850 | Dataset: 0-1644360 | Loss: 0.687 | 597 ms/step , 115609.96 GFLOP/s , 173725.8 tokens/s INFO:__main__:2024-11-30 03:13:41 | Epoch: 0 | Step: 313860 | Dataset: 0-1646760 | Loss: 0.680 | 597 ms/step , 115548.86 GFLOP/s , 173729.7 tokens/s INFO:__main__:2024-11-30 
03:13:48 | Epoch: 0 | Step: 313870 | Dataset: 0-1649160 | Loss: 0.729 | 596 ms/step , 115788.00 GFLOP/s , 173640.1 tokens/s INFO:__main__:2024-11-30 03:13:55 | Epoch: 0 | Step: 313880 | Dataset: 0-1651560 | Loss: 0.566 | 596 ms/step , 115739.56 GFLOP/s , 173727.2 tokens/s INFO:__main__:2024-11-30 03:14:02 | Epoch: 0 | Step: 313890 | Dataset: 0-1653960 | Loss: 0.703 | 598 ms/step , 115491.66 GFLOP/s , 173832.3 tokens/s INFO:__main__:2024-11-30 03:14:09 | Epoch: 0 | Step: 313900 | Dataset: 0-1656360 | Loss: 0.609 | 597 ms/step , 115599.90 GFLOP/s , 173710.1 tokens/s INFO:__main__:2024-11-30 03:14:17 | Epoch: 0 | Step: 313910 | Dataset: 0-1658760 | Loss: 0.675 | 597 ms/step , 115682.50 GFLOP/s , 173718.2 tokens/s INFO:__main__:2024-11-30 03:14:24 | Epoch: 0 | Step: 313920 | Dataset: 0-1661160 | Loss: 0.647 | 597 ms/step , 115657.95 GFLOP/s , 173712.8 tokens/s INFO:__main__:2024-11-30 03:14:31 | Epoch: 0 | Step: 313930 | Dataset: 0-1663560 | Loss: 0.585 | 597 ms/step , 115610.95 GFLOP/s , 173709.6 tokens/s INFO:__main__:2024-11-30 03:14:38 | Epoch: 0 | Step: 313940 | Dataset: 0-1665960 | Loss: 0.650 | 597 ms/step , 115678.49 GFLOP/s , 172520.5 tokens/s INFO:__main__:2024-11-30 03:14:45 | Epoch: 0 | Step: 313950 | Dataset: 0-1668360 | Loss: 0.642 | 596 ms/step , 115807.02 GFLOP/s , 173862.4 tokens/s INFO:__main__:2024-11-30 03:14:52 | Epoch: 0 | Step: 313960 | Dataset: 0-1670760 | Loss: 0.698 | 596 ms/step , 115726.89 GFLOP/s , 173873.1 tokens/s INFO:__main__:2024-11-30 03:14:59 | Epoch: 0 | Step: 313970 | Dataset: 0-1673160 | Loss: 0.623 | 597 ms/step , 115529.20 GFLOP/s , 173744.8 tokens/s INFO:__main__:2024-11-30 03:15:06 | Epoch: 0 | Step: 313980 | Dataset: 0-1675560 | Loss: 0.657 | 596 ms/step , 115748.00 GFLOP/s , 173712.4 tokens/s INFO:__main__:2024-11-30 03:15:13 | Epoch: 0 | Step: 313990 | Dataset: 0-1677960 | Loss: 0.635 | 597 ms/step , 115538.73 GFLOP/s , 173745.6 tokens/s INFO:__main__:2024-11-30 03:15:21 | Validation | Step: 314000 | Val_loss: 0.532 | 
Best_val_loss: 0.4363 INFO:__main__:2024-11-30 03:15:21 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_031521_step_314000.pt` INFO:__main__:2024-11-30 03:15:23 | Epoch: 0 | Step: 314000 | Dataset: 0-1680360 | Loss: 0.626 | 595 ms/step , 115977.88 GFLOP/s , 118829.2 tokens/s INFO:__main__:2024-11-30 03:15:31 | Epoch: 0 | Step: 314010 | Dataset: 0-1682760 | Loss: 0.633 | 598 ms/step , 115363.23 GFLOP/s , 173361.2 tokens/s INFO:__main__:2024-11-30 03:15:38 | Epoch: 0 | Step: 314020 | Dataset: 0-1685160 | Loss: 0.614 | 598 ms/step , 115459.28 GFLOP/s , 173390.4 tokens/s INFO:__main__:2024-11-30 03:15:45 | Epoch: 0 | Step: 314030 | Dataset: 0-1687560 | Loss: 0.596 | 598 ms/step , 115398.99 GFLOP/s , 173547.0 tokens/s INFO:__main__:2024-11-30 03:15:52 | Epoch: 0 | Step: 314040 | Dataset: 0-1689960 | Loss: 0.665 | 599 ms/step , 115183.08 GFLOP/s , 173429.1 tokens/s INFO:__main__:2024-11-30 03:15:59 | Epoch: 0 | Step: 314050 | Dataset: 0-1692360 | Loss: 0.581 | 598 ms/step , 115342.47 GFLOP/s , 173378.8 tokens/s INFO:__main__:2024-11-30 03:16:06 | Epoch: 0 | Step: 314060 | Dataset: 0-1694760 | Loss: 0.619 | 599 ms/step , 115175.55 GFLOP/s , 173397.5 tokens/s INFO:__main__:2024-11-30 03:16:13 | Epoch: 0 | Step: 314070 | Dataset: 0-1697160 | Loss: 0.639 | 599 ms/step , 115191.40 GFLOP/s , 173366.0 tokens/s INFO:__main__:2024-11-30 03:16:20 | Epoch: 0 | Step: 314080 | Dataset: 0-1699560 | Loss: 0.623 | 599 ms/step , 115296.64 GFLOP/s , 173413.4 tokens/s INFO:__main__:2024-11-30 03:16:27 | Epoch: 0 | Step: 314090 | Dataset: 0-1701960 | Loss: 0.610 | 598 ms/step , 115331.59 GFLOP/s , 173381.9 tokens/s INFO:__main__:2024-11-30 03:16:34 | Epoch: 0 | Step: 314100 | Dataset: 0-1704360 | Loss: 1.091 | 598 ms/step , 115324.84 GFLOP/s , 173516.1 tokens/s INFO:__main__:2024-11-30 03:16:41 | Epoch: 0 | Step: 314110 | Dataset: 0-1706760 | Loss: 1.293 | 599 ms/step , 115127.84 GFLOP/s , 173448.3 tokens/s INFO:__main__:2024-11-30 03:16:49 | Epoch: 0 | 
Step: 314120 | Dataset: 0-1709160 | Loss: 1.350 | 599 ms/step , 115188.86 GFLOP/s , 173264.0 tokens/s INFO:__main__:2024-11-30 03:16:56 | Epoch: 0 | Step: 314130 | Dataset: 0-1711560 | Loss: 1.277 | 599 ms/step , 115201.90 GFLOP/s , 173280.8 tokens/s INFO:__main__:2024-11-30 03:17:03 | Epoch: 0 | Step: 314140 | Dataset: 0-1713960 | Loss: 1.284 | 599 ms/step , 115220.09 GFLOP/s , 173266.1 tokens/s INFO:__main__:2024-11-30 03:17:10 | Epoch: 0 | Step: 314150 | Dataset: 0-1716360 | Loss: 1.275 | 599 ms/step , 115273.25 GFLOP/s , 173329.7 tokens/s INFO:__main__:2024-11-30 03:17:17 | Epoch: 0 | Step: 314160 | Dataset: 0-1718760 | Loss: 1.260 | 599 ms/step , 115146.52 GFLOP/s , 173230.2 tokens/s INFO:__main__:2024-11-30 03:17:24 | Epoch: 0 | Step: 314170 | Dataset: 0-1721160 | Loss: 1.336 | 598 ms/step , 115335.82 GFLOP/s , 173313.9 tokens/s INFO:__main__:2024-11-30 03:17:31 | Epoch: 0 | Step: 314180 | Dataset: 0-1723560 | Loss: 0.948 | 598 ms/step , 115332.91 GFLOP/s , 173485.5 tokens/s INFO:__main__:2024-11-30 03:17:38 | Epoch: 0 | Step: 314190 | Dataset: 0-1725960 | Loss: 0.942 | 599 ms/step , 115209.01 GFLOP/s , 173448.6 tokens/s INFO:__main__:2024-11-30 03:17:45 | Epoch: 0 | Step: 314200 | Dataset: 0-1728360 | Loss: 0.927 | 598 ms/step , 115312.31 GFLOP/s , 173256.5 tokens/s INFO:__main__:2024-11-30 03:17:52 | Epoch: 0 | Step: 314210 | Dataset: 0-1730760 | Loss: 0.905 | 599 ms/step , 115203.42 GFLOP/s , 173335.1 tokens/s INFO:__main__:2024-11-30 03:17:59 | Epoch: 0 | Step: 314220 | Dataset: 0-1733160 | Loss: 0.900 | 598 ms/step , 115365.05 GFLOP/s , 173350.2 tokens/s INFO:__main__:2024-11-30 03:18:07 | Epoch: 0 | Step: 314230 | Dataset: 0-1735560 | Loss: 0.882 | 599 ms/step , 115186.94 GFLOP/s , 173313.8 tokens/s INFO:__main__:2024-11-30 03:18:14 | Epoch: 0 | Step: 314240 | Dataset: 0-1737960 | Loss: 0.875 | 600 ms/step , 115114.97 GFLOP/s , 173333.3 tokens/s INFO:__main__:2024-11-30 03:18:21 | Epoch: 0 | Step: 314250 | Dataset: 0-1740360 | Loss: 0.855 | 598 ms/step 
, 115346.18 GFLOP/s , 173422.2 tokens/s INFO:__main__:2024-11-30 03:18:28 | Epoch: 0 | Step: 314260 | Dataset: 0-1742760 | Loss: 0.875 | 599 ms/step , 115264.53 GFLOP/s , 173491.7 tokens/s INFO:__main__:2024-11-30 03:18:35 | Epoch: 0 | Step: 314270 | Dataset: 0-1745160 | Loss: 0.836 | 599 ms/step , 115206.27 GFLOP/s , 173382.2 tokens/s INFO:__main__:2024-11-30 03:18:42 | Epoch: 0 | Step: 314280 | Dataset: 0-1747560 | Loss: 0.823 | 599 ms/step , 115243.30 GFLOP/s , 173294.2 tokens/s INFO:__main__:2024-11-30 03:18:49 | Epoch: 0 | Step: 314290 | Dataset: 0-1749960 | Loss: 0.818 | 599 ms/step , 115272.76 GFLOP/s , 173369.0 tokens/s INFO:__main__:2024-11-30 03:18:56 | Epoch: 0 | Step: 314300 | Dataset: 0-1752360 | Loss: 0.841 | 599 ms/step , 115261.25 GFLOP/s , 173306.1 tokens/s INFO:__main__:2024-11-30 03:19:03 | Epoch: 0 | Step: 314310 | Dataset: 0-1754760 | Loss: 0.820 | 599 ms/step , 115198.31 GFLOP/s , 173372.1 tokens/s INFO:__main__:2024-11-30 03:19:10 | Epoch: 0 | Step: 314320 | Dataset: 0-1757160 | Loss: 0.801 | 598 ms/step , 115315.10 GFLOP/s , 173347.6 tokens/s INFO:__main__:2024-11-30 03:19:17 | Epoch: 0 | Step: 314330 | Dataset: 0-1759560 | Loss: 0.830 | 596 ms/step , 115716.12 GFLOP/s , 173791.3 tokens/s INFO:__main__:2024-11-30 03:19:24 | Epoch: 0 | Step: 314340 | Dataset: 0-1761960 | Loss: 0.816 | 596 ms/step , 115845.35 GFLOP/s , 173843.5 tokens/s INFO:__main__:2024-11-30 03:19:32 | Epoch: 0 | Step: 314350 | Dataset: 0-1764360 | Loss: 0.794 | 597 ms/step , 115654.55 GFLOP/s , 173729.4 tokens/s INFO:__main__:2024-11-30 03:19:39 | Epoch: 0 | Step: 314360 | Dataset: 0-1766760 | Loss: 0.806 | 597 ms/step , 115599.14 GFLOP/s , 173731.3 tokens/s INFO:__main__:2024-11-30 03:19:46 | Epoch: 0 | Step: 314370 | Dataset: 0-1769160 | Loss: 0.813 | 597 ms/step , 115562.18 GFLOP/s , 173635.1 tokens/s INFO:__main__:2024-11-30 03:19:53 | Epoch: 0 | Step: 314380 | Dataset: 0-1771560 | Loss: 0.798 | 597 ms/step , 115683.98 GFLOP/s , 173765.9 tokens/s 
INFO:__main__:2024-11-30 03:20:00 | Epoch: 0 | Step: 314390 | Dataset: 0-1773960 | Loss: 0.778 | 597 ms/step , 115601.87 GFLOP/s , 173692.7 tokens/s INFO:__main__:2024-11-30 03:20:07 | Epoch: 0 | Step: 314400 | Dataset: 0-1776360 | Loss: 0.793 | 596 ms/step , 115777.55 GFLOP/s , 173802.5 tokens/s INFO:__main__:2024-11-30 03:20:14 | Epoch: 0 | Step: 314410 | Dataset: 0-1778760 | Loss: 0.818 | 597 ms/step , 115655.89 GFLOP/s , 173909.7 tokens/s INFO:__main__:2024-11-30 03:20:21 | Epoch: 0 | Step: 314420 | Dataset: 0-1781160 | Loss: 0.773 | 596 ms/step , 115716.05 GFLOP/s , 173816.1 tokens/s INFO:__main__:2024-11-30 03:20:28 | Epoch: 0 | Step: 314430 | Dataset: 0-1783560 | Loss: 0.774 | 596 ms/step , 115703.12 GFLOP/s , 173665.2 tokens/s INFO:__main__:2024-11-30 03:20:35 | Epoch: 0 | Step: 314440 | Dataset: 0-1785960 | Loss: 0.796 | 598 ms/step , 115480.12 GFLOP/s , 173805.9 tokens/s INFO:__main__:2024-11-30 03:20:42 | Epoch: 0 | Step: 314450 | Dataset: 0-1788360 | Loss: 0.745 | 597 ms/step , 115688.49 GFLOP/s , 173773.6 tokens/s INFO:__main__:2024-11-30 03:20:49 | Epoch: 0 | Step: 314460 | Dataset: 0-1790760 | Loss: 1.267 | 598 ms/step , 115467.03 GFLOP/s , 173760.6 tokens/s INFO:__main__:2024-11-30 03:20:56 | Epoch: 0 | Step: 314470 | Dataset: 0-1793160 | Loss: 1.268 | 597 ms/step , 115621.23 GFLOP/s , 173697.4 tokens/s INFO:__main__:2024-11-30 03:21:03 | Epoch: 0 | Step: 314480 | Dataset: 0-1795560 | Loss: 1.180 | 597 ms/step , 115656.08 GFLOP/s , 173786.4 tokens/s INFO:__main__:2024-11-30 03:21:11 | Epoch: 0 | Step: 314490 | Dataset: 0-1797960 | Loss: 1.126 | 596 ms/step , 115710.44 GFLOP/s , 173785.2 tokens/s INFO:__main__:2024-11-30 03:21:18 | Validation | Step: 314500 | Val_loss: 0.588 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 03:21:19 | Epoch: 0 | Step: 314500 | Dataset: 0-1800360 | Loss: 1.241 | 596 ms/step , 115745.58 GFLOP/s , 147914.3 tokens/s INFO:__main__:2024-11-30 03:21:26 | Epoch: 0 | Step: 314510 | Dataset: 0-1802760 | Loss: 1.220 | 597 
ms/step , 115506.51 GFLOP/s , 173729.0 tokens/s INFO:__main__:2024-11-30 03:21:33 | Epoch: 0 | Step: 314520 | Dataset: 0-1805160 | Loss: 1.077 | 597 ms/step , 115670.80 GFLOP/s , 173770.2 tokens/s INFO:__main__:2024-11-30 03:21:40 | Epoch: 0 | Step: 314530 | Dataset: 0-1807560 | Loss: 1.190 | 597 ms/step , 115641.42 GFLOP/s , 173704.1 tokens/s INFO:__main__:2024-11-30 03:21:47 | Epoch: 0 | Step: 314540 | Dataset: 0-1809960 | Loss: 1.157 | 597 ms/step , 115590.66 GFLOP/s , 173670.1 tokens/s INFO:__main__:2024-11-30 03:21:54 | Epoch: 0 | Step: 314550 | Dataset: 0-1812360 | Loss: 1.167 | 597 ms/step , 115666.29 GFLOP/s , 173711.1 tokens/s INFO:__main__:2024-11-30 03:22:01 | Epoch: 0 | Step: 314560 | Dataset: 0-1814760 | Loss: 1.093 | 597 ms/step , 115647.65 GFLOP/s , 173787.8 tokens/s INFO:__main__:2024-11-30 03:22:08 | Epoch: 0 | Step: 314570 | Dataset: 0-1817160 | Loss: 1.197 | 597 ms/step , 115636.04 GFLOP/s , 173727.6 tokens/s INFO:__main__:2024-11-30 03:22:15 | Epoch: 0 | Step: 314580 | Dataset: 0-1819560 | Loss: 1.052 | 597 ms/step , 115664.92 GFLOP/s , 173626.3 tokens/s INFO:__main__:2024-11-30 03:22:22 | Epoch: 0 | Step: 314590 | Dataset: 0-1821960 | Loss: 1.085 | 597 ms/step , 115617.04 GFLOP/s , 173673.1 tokens/s INFO:__main__:2024-11-30 03:22:30 | Epoch: 0 | Step: 314600 | Dataset: 0-1824360 | Loss: 1.056 | 598 ms/step , 115447.13 GFLOP/s , 173650.1 tokens/s INFO:__main__:2024-11-30 03:22:37 | Epoch: 0 | Step: 314610 | Dataset: 0-1826760 | Loss: 1.052 | 597 ms/step , 115567.70 GFLOP/s , 173686.2 tokens/s INFO:__main__:2024-11-30 03:22:44 | Epoch: 0 | Step: 314620 | Dataset: 0-1829160 | Loss: 1.054 | 597 ms/step , 115504.38 GFLOP/s , 173614.7 tokens/s INFO:__main__:2024-11-30 03:22:51 | Epoch: 0 | Step: 314630 | Dataset: 0-1831560 | Loss: 1.089 | 597 ms/step , 115692.96 GFLOP/s , 173686.7 tokens/s INFO:__main__:2024-11-30 03:22:58 | Epoch: 0 | Step: 314640 | Dataset: 0-1833960 | Loss: 1.178 | 597 ms/step , 115507.17 GFLOP/s , 173727.4 tokens/s 
INFO:__main__:2024-11-30 03:23:05 | Epoch: 0 | Step: 314650 | Dataset: 0-1836360 | Loss: 0.797 | 598 ms/step , 115461.95 GFLOP/s , 173743.9 tokens/s INFO:__main__:2024-11-30 03:23:12 | Epoch: 0 | Step: 314660 | Dataset: 0-1838760 | Loss: 0.772 | 597 ms/step , 115548.62 GFLOP/s , 173664.8 tokens/s INFO:__main__:2024-11-30 03:23:19 | Epoch: 0 | Step: 314670 | Dataset: 0-1841160 | Loss: 0.711 | 597 ms/step , 115558.17 GFLOP/s , 173677.1 tokens/s INFO:__main__:2024-11-30 03:23:26 | Epoch: 0 | Step: 314680 | Dataset: 0-1843560 | Loss: 0.696 | 598 ms/step , 115465.34 GFLOP/s , 173683.4 tokens/s INFO:__main__:2024-11-30 03:23:33 | Epoch: 0 | Step: 314690 | Dataset: 0-1845960 | Loss: 0.713 | 596 ms/step , 115727.78 GFLOP/s , 173668.2 tokens/s INFO:__main__:2024-11-30 03:23:40 | Epoch: 0 | Step: 314700 | Dataset: 0-1848360 | Loss: 0.816 | 598 ms/step , 115368.77 GFLOP/s , 173631.5 tokens/s INFO:__main__:2024-11-30 03:23:47 | Epoch: 0 | Step: 314710 | Dataset: 0-1850760 | Loss: 0.742 | 596 ms/step , 115792.58 GFLOP/s , 173708.5 tokens/s INFO:__main__:2024-11-30 03:23:54 | Epoch: 0 | Step: 314720 | Dataset: 0-1853160 | Loss: 0.665 | 596 ms/step , 115736.60 GFLOP/s , 173861.2 tokens/s INFO:__main__:2024-11-30 03:24:02 | Epoch: 0 | Step: 314730 | Dataset: 0-1855560 | Loss: 0.668 | 597 ms/step , 115572.55 GFLOP/s , 173596.2 tokens/s INFO:__main__:2024-11-30 03:24:09 | Epoch: 0 | Step: 314740 | Dataset: 0-1857960 | Loss: 0.759 | 596 ms/step , 115782.24 GFLOP/s , 173668.7 tokens/s INFO:__main__:2024-11-30 03:24:16 | Epoch: 0 | Step: 314750 | Dataset: 0-1860360 | Loss: 0.854 | 597 ms/step , 115647.71 GFLOP/s , 173662.2 tokens/s INFO:__main__:2024-11-30 03:24:23 | Epoch: 0 | Step: 314760 | Dataset: 0-1862760 | Loss: 0.721 | 597 ms/step , 115579.29 GFLOP/s , 173625.3 tokens/s INFO:__main__:2024-11-30 03:24:30 | Epoch: 0 | Step: 314770 | Dataset: 0-1865160 | Loss: 0.749 | 597 ms/step , 115649.37 GFLOP/s , 173637.0 tokens/s INFO:__main__:2024-11-30 03:24:37 | Epoch: 0 | Step: 314780 | 
Dataset: 0-1867560 | Loss: 0.771 | 597 ms/step , 115651.18 GFLOP/s , 173715.6 tokens/s INFO:__main__:2024-11-30 03:24:44 | Epoch: 0 | Step: 314790 | Dataset: 0-1869960 | Loss: 0.737 | 596 ms/step , 115843.82 GFLOP/s , 173745.7 tokens/s INFO:__main__:2024-11-30 03:24:51 | Epoch: 0 | Step: 314800 | Dataset: 0-1872360 | Loss: 0.794 | 597 ms/step , 115536.61 GFLOP/s , 173757.3 tokens/s INFO:__main__:2024-11-30 03:24:58 | Epoch: 0 | Step: 314810 | Dataset: 0-1874760 | Loss: 0.780 | 597 ms/step , 115513.61 GFLOP/s , 173642.2 tokens/s INFO:__main__:2024-11-30 03:25:05 | Epoch: 0 | Step: 314820 | Dataset: 0-1877160 | Loss: 0.819 | 597 ms/step , 115543.33 GFLOP/s , 173679.0 tokens/s INFO:__main__:2024-11-30 03:25:12 | Epoch: 0 | Step: 314830 | Dataset: 0-1879560 | Loss: 0.726 | 596 ms/step , 115748.20 GFLOP/s , 173735.5 tokens/s INFO:__main__:2024-11-30 03:25:19 | Epoch: 0 | Step: 314840 | Dataset: 0-1881960 | Loss: 0.801 | 597 ms/step , 115685.40 GFLOP/s , 173732.5 tokens/s INFO:__main__:2024-11-30 03:25:26 | Epoch: 0 | Step: 314850 | Dataset: 0-1884360 | Loss: 0.699 | 597 ms/step , 115578.68 GFLOP/s , 173696.2 tokens/s INFO:__main__:2024-11-30 03:25:34 | Epoch: 0 | Step: 314860 | Dataset: 0-1886760 | Loss: 0.738 | 597 ms/step , 115640.63 GFLOP/s , 173786.0 tokens/s INFO:__main__:2024-11-30 03:25:41 | Epoch: 0 | Step: 314870 | Dataset: 0-1889160 | Loss: 0.768 | 597 ms/step , 115691.69 GFLOP/s , 173844.2 tokens/s INFO:__main__:2024-11-30 03:25:48 | Epoch: 0 | Step: 314880 | Dataset: 0-1891560 | Loss: 0.746 | 597 ms/step , 115583.02 GFLOP/s , 173782.0 tokens/s INFO:__main__:2024-11-30 03:25:55 | Epoch: 0 | Step: 314890 | Dataset: 0-1893960 | Loss: 0.691 | 596 ms/step , 115729.21 GFLOP/s , 173741.4 tokens/s INFO:__main__:2024-11-30 03:26:02 | Epoch: 0 | Step: 314900 | Dataset: 0-1896360 | Loss: 0.748 | 597 ms/step , 115605.17 GFLOP/s , 173670.4 tokens/s INFO:__main__:2024-11-30 03:26:09 | Epoch: 0 | Step: 314910 | Dataset: 0-1898760 | Loss: 0.771 | 596 ms/step , 115767.63 
GFLOP/s , 173652.5 tokens/s INFO:__main__:2024-11-30 03:26:16 | Epoch: 0 | Step: 314920 | Dataset: 0-1901160 | Loss: 0.808 | 597 ms/step , 115588.32 GFLOP/s , 173695.7 tokens/s INFO:__main__:2024-11-30 03:26:23 | Epoch: 0 | Step: 314930 | Dataset: 0-1903560 | Loss: 0.795 | 596 ms/step , 115753.07 GFLOP/s , 173710.8 tokens/s INFO:__main__:2024-11-30 03:26:30 | Epoch: 0 | Step: 314940 | Dataset: 0-1905960 | Loss: 0.819 | 597 ms/step , 115611.38 GFLOP/s , 173907.7 tokens/s INFO:__main__:2024-11-30 03:26:37 | Epoch: 0 | Step: 314950 | Dataset: 0-1908360 | Loss: 0.807 | 596 ms/step , 115713.67 GFLOP/s , 173811.8 tokens/s INFO:__main__:2024-11-30 03:26:44 | Epoch: 0 | Step: 314960 | Dataset: 0-1910760 | Loss: 0.796 | 598 ms/step , 115491.56 GFLOP/s , 173736.3 tokens/s INFO:__main__:2024-11-30 03:26:51 | Epoch: 0 | Step: 314970 | Dataset: 0-1913160 | Loss: 0.804 | 597 ms/step , 115572.02 GFLOP/s , 173641.7 tokens/s INFO:__main__:2024-11-30 03:26:58 | Epoch: 0 | Step: 314980 | Dataset: 0-1915560 | Loss: 0.871 | 597 ms/step , 115615.62 GFLOP/s , 173829.2 tokens/s INFO:__main__:2024-11-30 03:27:05 | Epoch: 0 | Step: 314990 | Dataset: 0-1917960 | Loss: 0.806 | 596 ms/step , 115808.10 GFLOP/s , 173661.7 tokens/s INFO:__main__:2024-11-30 03:27:13 | Validation | Step: 315000 | Val_loss: 0.558 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 03:27:13 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_032713_step_315000.pt` INFO:__main__:2024-11-30 03:27:16 | Epoch: 0 | Step: 315000 | Dataset: 0-1920360 | Loss: 0.818 | 595 ms/step , 115975.94 GFLOP/s , 119175.5 tokens/s INFO:__main__:2024-11-30 03:27:23 | Epoch: 0 | Step: 315010 | Dataset: 0-1922760 | Loss: 0.756 | 598 ms/step , 115427.66 GFLOP/s , 173479.1 tokens/s INFO:__main__:2024-11-30 03:27:30 | Epoch: 0 | Step: 315020 | Dataset: 0-1925160 | Loss: 0.744 | 599 ms/step , 115208.84 GFLOP/s , 173407.9 tokens/s INFO:__main__:2024-11-30 03:27:37 | Epoch: 0 | Step: 315030 | Dataset: 0-1927560 | Loss: 
0.731 | 599 ms/step , 115207.02 GFLOP/s , 173382.0 tokens/s INFO:__main__:2024-11-30 03:27:44 | Epoch: 0 | Step: 315040 | Dataset: 0-1929960 | Loss: 0.765 | 598 ms/step , 115328.83 GFLOP/s , 173412.4 tokens/s INFO:__main__:2024-11-30 03:27:51 | Epoch: 0 | Step: 315050 | Dataset: 0-1932360 | Loss: 0.689 | 599 ms/step , 115192.22 GFLOP/s , 173363.7 tokens/s INFO:__main__:2024-11-30 03:27:58 | Epoch: 0 | Step: 315060 | Dataset: 0-1934760 | Loss: 0.684 | 599 ms/step , 115268.06 GFLOP/s , 173308.4 tokens/s INFO:__main__:2024-11-30 03:28:05 | Epoch: 0 | Step: 315070 | Dataset: 0-1937160 | Loss: 0.651 | 599 ms/step , 115123.85 GFLOP/s , 173263.7 tokens/s INFO:__main__:2024-11-30 03:28:12 | Epoch: 0 | Step: 315080 | Dataset: 0-1939560 | Loss: 0.809 | 599 ms/step , 115236.41 GFLOP/s , 173456.9 tokens/s INFO:__main__:2024-11-30 03:28:20 | Epoch: 0 | Step: 315090 | Dataset: 0-1941960 | Loss: 0.854 | 598 ms/step , 115339.25 GFLOP/s , 173499.7 tokens/s INFO:__main__:2024-11-30 03:28:27 | Epoch: 0 | Step: 315100 | Dataset: 0-1944360 | Loss: 0.727 | 599 ms/step , 115186.27 GFLOP/s , 173348.4 tokens/s INFO:__main__:2024-11-30 03:28:34 | Epoch: 0 | Step: 315110 | Dataset: 0-1946760 | Loss: 0.724 | 599 ms/step , 115209.66 GFLOP/s , 173302.0 tokens/s INFO:__main__:2024-11-30 03:28:41 | Epoch: 0 | Step: 315120 | Dataset: 0-1949160 | Loss: 0.816 | 599 ms/step , 115196.07 GFLOP/s , 173320.3 tokens/s INFO:__main__:2024-11-30 03:28:48 | Epoch: 0 | Step: 315130 | Dataset: 0-1951560 | Loss: 0.782 | 599 ms/step , 115259.23 GFLOP/s , 173309.1 tokens/s INFO:__main__:2024-11-30 03:28:55 | Epoch: 0 | Step: 315140 | Dataset: 0-1953960 | Loss: 0.742 | 599 ms/step , 115186.92 GFLOP/s , 173263.7 tokens/s INFO:__main__:2024-11-30 03:29:02 | Epoch: 0 | Step: 315150 | Dataset: 0-1956360 | Loss: 0.772 | 598 ms/step , 115419.08 GFLOP/s , 173369.8 tokens/s INFO:__main__:2024-11-30 03:29:09 | Epoch: 0 | Step: 315160 | Dataset: 0-1958760 | Loss: 0.750 | 599 ms/step , 115262.58 GFLOP/s , 173435.0 tokens/s 
INFO:__main__:2024-11-30 03:29:16 | Epoch: 0 | Step: 315170 | Dataset: 0-1961160 | Loss: 0.661 | 598 ms/step , 115316.74 GFLOP/s , 173403.0 tokens/s INFO:__main__:2024-11-30 03:29:23 | Epoch: 0 | Step: 315180 | Dataset: 0-1963560 | Loss: 0.695 | 599 ms/step , 115283.72 GFLOP/s , 173359.8 tokens/s INFO:__main__:2024-11-30 03:29:30 | Epoch: 0 | Step: 315190 | Dataset: 0-1965960 | Loss: 0.712 | 600 ms/step , 115107.14 GFLOP/s , 173284.8 tokens/s INFO:__main__:2024-11-30 03:29:38 | Epoch: 0 | Step: 315200 | Dataset: 0-1968360 | Loss: 0.605 | 599 ms/step , 115284.15 GFLOP/s , 173401.8 tokens/s INFO:__main__:2024-11-30 03:29:45 | Epoch: 0 | Step: 315210 | Dataset: 0-1970760 | Loss: 0.594 | 599 ms/step , 115231.75 GFLOP/s , 173406.3 tokens/s INFO:__main__:2024-11-30 03:29:52 | Epoch: 0 | Step: 315220 | Dataset: 0-1973160 | Loss: 0.545 | 599 ms/step , 115308.82 GFLOP/s , 173400.9 tokens/s INFO:__main__:2024-11-30 03:29:59 | Epoch: 0 | Step: 315230 | Dataset: 0-1975560 | Loss: 0.591 | 597 ms/step , 115671.96 GFLOP/s , 173715.3 tokens/s INFO:__main__:2024-11-30 03:30:06 | Epoch: 0 | Step: 315240 | Dataset: 0-1977960 | Loss: 0.536 | 595 ms/step , 115895.88 GFLOP/s , 173913.2 tokens/s INFO:__main__:2024-11-30 03:30:13 | Epoch: 0 | Step: 315250 | Dataset: 0-1980360 | Loss: 0.514 | 597 ms/step , 115512.46 GFLOP/s , 173849.4 tokens/s INFO:__main__:2024-11-30 03:30:20 | Epoch: 0 | Step: 315260 | Dataset: 0-1982760 | Loss: 0.494 | 596 ms/step , 115820.80 GFLOP/s , 173760.7 tokens/s INFO:__main__:2024-11-30 03:30:27 | Epoch: 0 | Step: 315270 | Dataset: 0-1985160 | Loss: 0.499 | 596 ms/step , 115734.47 GFLOP/s , 173809.7 tokens/s INFO:__main__:2024-11-30 03:30:34 | Epoch: 0 | Step: 315280 | Dataset: 0-1987560 | Loss: 0.480 | 596 ms/step , 115826.40 GFLOP/s , 173861.6 tokens/s INFO:__main__:2024-11-30 03:30:41 | Epoch: 0 | Step: 315290 | Dataset: 0-1989960 | Loss: 0.534 | 597 ms/step , 115592.99 GFLOP/s , 173821.8 tokens/s INFO:__main__:2024-11-30 03:30:48 | Epoch: 0 | Step: 315300 | 
Dataset: 0-1992360 | Loss: 0.312 | 596 ms/step , 115833.10 GFLOP/s , 173848.2 tokens/s INFO:__main__:2024-11-30 03:30:55 | Epoch: 0 | Step: 315310 | Dataset: 0-1994760 | Loss: 0.286 | 596 ms/step , 115863.05 GFLOP/s , 173990.1 tokens/s INFO:__main__:2024-11-30 03:31:02 | Epoch: 0 | Step: 315320 | Dataset: 0-1997160 | Loss: 0.292 | 597 ms/step , 115648.27 GFLOP/s , 173882.1 tokens/s INFO:__main__:2024-11-30 03:31:09 | Epoch: 0 | Step: 315330 | Dataset: 0-1999560 | Loss: 0.262 | 596 ms/step , 115711.98 GFLOP/s , 173830.4 tokens/s INFO:__main__:2024-11-30 03:31:17 | Epoch: 0 | Step: 315340 | Dataset: 0-2001960 | Loss: 0.260 | 596 ms/step , 115740.34 GFLOP/s , 173877.9 tokens/s INFO:__main__:2024-11-30 03:31:24 | Epoch: 0 | Step: 315350 | Dataset: 0-2004360 | Loss: 0.243 | 597 ms/step , 115626.66 GFLOP/s , 173868.7 tokens/s INFO:__main__:2024-11-30 03:31:31 | Epoch: 0 | Step: 315360 | Dataset: 0-2006760 | Loss: 0.220 | 596 ms/step , 115797.92 GFLOP/s , 173729.8 tokens/s INFO:__main__:2024-11-30 03:31:38 | Epoch: 0 | Step: 315370 | Dataset: 0-2009160 | Loss: 0.219 | 596 ms/step , 115773.34 GFLOP/s , 173821.2 tokens/s INFO:__main__:2024-11-30 03:31:45 | Epoch: 0 | Step: 315380 | Dataset: 0-2011560 | Loss: 0.199 | 595 ms/step , 115909.30 GFLOP/s , 173847.5 tokens/s INFO:__main__:2024-11-30 03:31:52 | Epoch: 0 | Step: 315390 | Dataset: 0-2013960 | Loss: 0.203 | 597 ms/step , 115639.93 GFLOP/s , 173886.2 tokens/s INFO:__main__:2024-11-30 03:31:59 | Epoch: 0 | Step: 315400 | Dataset: 0-2016360 | Loss: 0.181 | 597 ms/step , 115694.70 GFLOP/s , 173834.9 tokens/s INFO:__main__:2024-11-30 03:32:06 | Epoch: 0 | Step: 315410 | Dataset: 0-2018760 | Loss: 0.173 | 597 ms/step , 115523.22 GFLOP/s , 173848.5 tokens/s INFO:__main__:2024-11-30 03:32:13 | Epoch: 0 | Step: 315420 | Dataset: 0-2021160 | Loss: 0.182 | 596 ms/step , 115770.54 GFLOP/s , 173772.6 tokens/s INFO:__main__:2024-11-30 03:32:20 | Epoch: 0 | Step: 315430 | Dataset: 0-2023560 | Loss: 0.154 | 598 ms/step , 115345.06 
GFLOP/s , 173831.3 tokens/s INFO:__main__:2024-11-30 03:32:27 | Epoch: 0 | Step: 315440 | Dataset: 0-2025960 | Loss: 0.168 | 596 ms/step , 115770.11 GFLOP/s , 173742.9 tokens/s INFO:__main__:2024-11-30 03:32:34 | Epoch: 0 | Step: 315450 | Dataset: 0-2028360 | Loss: 0.164 | 596 ms/step , 115732.41 GFLOP/s , 173781.3 tokens/s INFO:__main__:2024-11-30 03:32:41 | Epoch: 0 | Step: 315460 | Dataset: 0-2030760 | Loss: 0.170 | 596 ms/step , 115803.94 GFLOP/s , 173905.4 tokens/s INFO:__main__:2024-11-30 03:32:48 | Epoch: 0 | Step: 315470 | Dataset: 0-2033160 | Loss: 0.159 | 596 ms/step , 115798.97 GFLOP/s , 173844.8 tokens/s INFO:__main__:2024-11-30 03:32:55 | Epoch: 0 | Step: 315480 | Dataset: 0-2035560 | Loss: 0.157 | 596 ms/step , 115779.17 GFLOP/s , 173827.3 tokens/s INFO:__main__:2024-11-30 03:33:03 | Epoch: 0 | Step: 315490 | Dataset: 0-2037960 | Loss: 0.149 | 596 ms/step , 115795.52 GFLOP/s , 173806.9 tokens/s INFO:__main__:2024-11-30 03:33:10 | Validation | Step: 315500 | Val_loss: 0.608 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 03:33:11 | Epoch: 0 | Step: 315500 | Dataset: 0-2040360 | Loss: 0.153 | 595 ms/step , 116014.05 GFLOP/s , 147917.3 tokens/s INFO:__main__:2024-11-30 03:33:18 | Epoch: 0 | Step: 315510 | Dataset: 0-2042760 | Loss: 1.223 | 597 ms/step , 115668.52 GFLOP/s , 173885.1 tokens/s INFO:__main__:2024-11-30 03:33:25 | Epoch: 0 | Step: 315520 | Dataset: 0-2045160 | Loss: 1.227 | 598 ms/step , 115436.16 GFLOP/s , 173751.2 tokens/s INFO:__main__:2024-11-30 03:33:32 | Epoch: 0 | Step: 315530 | Dataset: 0-2047560 | Loss: 1.047 | 595 ms/step , 115975.79 GFLOP/s , 173922.8 tokens/s INFO:__main__:2024-11-30 03:33:39 | Epoch: 0 | Step: 315540 | Dataset: 0-2049960 | Loss: 1.157 | 596 ms/step , 115753.35 GFLOP/s , 173878.9 tokens/s INFO:__main__:2024-11-30 03:33:46 | Epoch: 0 | Step: 315550 | Dataset: 0-2052360 | Loss: 1.056 | 597 ms/step , 115628.54 GFLOP/s , 173801.0 tokens/s INFO:__main__:2024-11-30 03:33:53 | Epoch: 0 | Step: 315560 | Dataset: 
0-2054760 | Loss: 0.850 | 596 ms/step , 115784.05 GFLOP/s , 173774.4 tokens/s INFO:__main__:2024-11-30 03:34:00 | Epoch: 0 | Step: 315570 | Dataset: 0-2057160 | Loss: 0.859 | 597 ms/step , 115685.23 GFLOP/s , 173736.0 tokens/s INFO:__main__:2024-11-30 03:34:07 | Epoch: 0 | Step: 315580 | Dataset: 0-2059560 | Loss: 0.862 | 597 ms/step , 115657.43 GFLOP/s , 173775.4 tokens/s INFO:__main__:2024-11-30 03:34:14 | Epoch: 0 | Step: 315590 | Dataset: 0-2061960 | Loss: 0.849 | 596 ms/step , 115696.04 GFLOP/s , 173768.0 tokens/s INFO:__main__:2024-11-30 03:34:22 | Epoch: 0 | Step: 315600 | Dataset: 0-2064360 | Loss: 0.826 | 597 ms/step , 115535.20 GFLOP/s , 173785.5 tokens/s INFO:__main__:2024-11-30 03:34:29 | Epoch: 0 | Step: 315610 | Dataset: 0-2066760 | Loss: 0.812 | 596 ms/step , 115768.57 GFLOP/s , 173850.2 tokens/s INFO:__main__:2024-11-30 03:34:36 | Epoch: 0 | Step: 315620 | Dataset: 0-2069160 | Loss: 0.762 | 597 ms/step , 115616.43 GFLOP/s , 173783.0 tokens/s INFO:__main__:2024-11-30 03:34:43 | Epoch: 0 | Step: 315630 | Dataset: 0-2071560 | Loss: 0.823 | 596 ms/step , 115771.16 GFLOP/s , 173682.2 tokens/s INFO:__main__:2024-11-30 03:34:50 | Epoch: 0 | Step: 315640 | Dataset: 0-2073960 | Loss: 0.785 | 597 ms/step , 115643.47 GFLOP/s , 173726.3 tokens/s INFO:__main__:2024-11-30 03:34:57 | Epoch: 0 | Step: 315650 | Dataset: 0-2076360 | Loss: 0.810 | 597 ms/step , 115597.56 GFLOP/s , 173655.1 tokens/s INFO:__main__:2024-11-30 03:35:04 | Epoch: 0 | Step: 315660 | Dataset: 0-2078760 | Loss: 0.780 | 597 ms/step , 115551.06 GFLOP/s , 173709.1 tokens/s INFO:__main__:2024-11-30 03:35:11 | Epoch: 0 | Step: 315670 | Dataset: 0-2081160 | Loss: 0.767 | 596 ms/step , 115729.08 GFLOP/s , 173680.1 tokens/s INFO:__main__:2024-11-30 03:35:18 | Epoch: 0 | Step: 315680 | Dataset: 0-2083560 | Loss: 0.781 | 596 ms/step , 115759.30 GFLOP/s , 173768.3 tokens/s INFO:__main__:2024-11-30 03:35:25 | Epoch: 0 | Step: 315690 | Dataset: 0-2085960 | Loss: 0.782 | 596 ms/step , 115757.84 GFLOP/s , 
173755.8 tokens/s INFO:__main__:2024-11-30 03:35:32 | Epoch: 0 | Step: 315700 | Dataset: 0-2088360 | Loss: 0.741 | 597 ms/step , 115664.24 GFLOP/s , 173684.3 tokens/s INFO:__main__:2024-11-30 03:35:39 | Epoch: 0 | Step: 315710 | Dataset: 0-2090760 | Loss: 0.770 | 597 ms/step , 115584.40 GFLOP/s , 173792.3 tokens/s INFO:__main__:2024-11-30 03:35:46 | Epoch: 0 | Step: 315720 | Dataset: 0-2093160 | Loss: 0.748 | 597 ms/step , 115621.77 GFLOP/s , 173666.9 tokens/s INFO:__main__:2024-11-30 03:35:53 | Epoch: 0 | Step: 315730 | Dataset: 0-2095560 | Loss: 0.709 | 597 ms/step , 115664.27 GFLOP/s , 173720.1 tokens/s INFO:__main__:2024-11-30 03:36:01 | Epoch: 0 | Step: 315740 | Dataset: 0-2097960 | Loss: 0.618 | 597 ms/step , 115648.74 GFLOP/s , 173680.0 tokens/s INFO:__main__:2024-11-30 03:36:08 | Epoch: 0 | Step: 315750 | Dataset: 0-2100360 | Loss: 0.629 | 597 ms/step , 115650.73 GFLOP/s , 173717.3 tokens/s INFO:__main__:2024-11-30 03:36:15 | Epoch: 0 | Step: 315760 | Dataset: 0-2102760 | Loss: 0.542 | 596 ms/step , 115740.52 GFLOP/s , 173872.4 tokens/s INFO:__main__:2024-11-30 03:36:22 | Epoch: 0 | Step: 315770 | Dataset: 0-2105160 | Loss: 0.575 | 597 ms/step , 115618.08 GFLOP/s , 173813.7 tokens/s INFO:__main__:2024-11-30 03:36:29 | Epoch: 0 | Step: 315780 | Dataset: 0-2107560 | Loss: 0.594 | 597 ms/step , 115649.19 GFLOP/s , 173707.6 tokens/s INFO:__main__:2024-11-30 03:36:36 | Epoch: 0 | Step: 315790 | Dataset: 0-2109960 | Loss: 0.553 | 597 ms/step , 115608.63 GFLOP/s , 173677.3 tokens/s INFO:__main__:2024-11-30 03:36:43 | Epoch: 0 | Step: 315800 | Dataset: 0-2112360 | Loss: 0.559 | 597 ms/step , 115671.91 GFLOP/s , 173770.1 tokens/s INFO:__main__:2024-11-30 03:36:50 | Epoch: 0 | Step: 315810 | Dataset: 0-2114760 | Loss: 0.544 | 596 ms/step , 115741.88 GFLOP/s , 173679.7 tokens/s INFO:__main__:2024-11-30 03:36:57 | Epoch: 0 | Step: 315820 | Dataset: 0-2117160 | Loss: 0.566 | 598 ms/step , 115497.10 GFLOP/s , 173687.0 tokens/s INFO:__main__:2024-11-30 03:37:04 | Epoch: 0 
| Step: 315830 | Dataset: 0-2119560 | Loss: 0.555 | 596 ms/step , 115796.74 GFLOP/s , 173861.4 tokens/s INFO:__main__:2024-11-30 03:37:11 | Epoch: 0 | Step: 315840 | Dataset: 0-2121960 | Loss: 0.514 | 597 ms/step , 115688.75 GFLOP/s , 173844.5 tokens/s INFO:__main__:2024-11-30 03:37:18 | Epoch: 0 | Step: 315850 | Dataset: 0-2124360 | Loss: 0.498 | 596 ms/step , 115769.49 GFLOP/s , 173791.8 tokens/s INFO:__main__:2024-11-30 03:37:25 | Epoch: 0 | Step: 315860 | Dataset: 0-2126760 | Loss: 0.529 | 597 ms/step , 115656.55 GFLOP/s , 173708.8 tokens/s INFO:__main__:2024-11-30 03:37:33 | Epoch: 0 | Step: 315870 | Dataset: 0-2129160 | Loss: 0.533 | 597 ms/step , 115559.39 GFLOP/s , 173791.6 tokens/s INFO:__main__:2024-11-30 03:37:40 | Epoch: 0 | Step: 315880 | Dataset: 0-2131560 | Loss: 0.500 | 596 ms/step , 115696.93 GFLOP/s , 173822.2 tokens/s INFO:__main__:2024-11-30 03:37:47 | Epoch: 0 | Step: 315890 | Dataset: 0-2133960 | Loss: 0.495 | 597 ms/step , 115624.35 GFLOP/s , 173744.1 tokens/s INFO:__main__:2024-11-30 03:37:54 | Epoch: 0 | Step: 315900 | Dataset: 0-2136360 | Loss: 0.554 | 596 ms/step , 115818.41 GFLOP/s , 173753.0 tokens/s INFO:__main__:2024-11-30 03:38:01 | Epoch: 0 | Step: 315910 | Dataset: 0-2138760 | Loss: 0.551 | 596 ms/step , 115760.85 GFLOP/s , 173850.3 tokens/s INFO:__main__:2024-11-30 03:38:08 | Epoch: 0 | Step: 315920 | Dataset: 0-2141160 | Loss: 0.599 | 596 ms/step , 115716.28 GFLOP/s , 173807.9 tokens/s INFO:__main__:2024-11-30 03:38:15 | Epoch: 0 | Step: 315930 | Dataset: 0-2143560 | Loss: 0.508 | 597 ms/step , 115583.66 GFLOP/s , 173676.4 tokens/s INFO:__main__:2024-11-30 03:38:22 | Epoch: 0 | Step: 315940 | Dataset: 0-2145960 | Loss: 0.485 | 597 ms/step , 115575.16 GFLOP/s , 173703.1 tokens/s INFO:__main__:2024-11-30 03:38:29 | Epoch: 0 | Step: 315950 | Dataset: 0-2148360 | Loss: 0.471 | 596 ms/step , 115710.48 GFLOP/s , 173716.6 tokens/s INFO:__main__:2024-11-30 03:38:36 | Epoch: 0 | Step: 315960 | Dataset: 0-2150760 | Loss: 0.501 | 597 
ms/step , 115624.58 GFLOP/s , 173722.9 tokens/s INFO:__main__:2024-11-30 03:38:43 | Epoch: 0 | Step: 315970 | Dataset: 0-2153160 | Loss: 0.535 | 598 ms/step , 115490.11 GFLOP/s , 173745.1 tokens/s INFO:__main__:2024-11-30 03:38:50 | Epoch: 0 | Step: 315980 | Dataset: 0-2155560 | Loss: 0.589 | 596 ms/step , 115782.06 GFLOP/s , 173805.2 tokens/s INFO:__main__:2024-11-30 03:38:57 | Epoch: 0 | Step: 315990 | Dataset: 0-2157960 | Loss: 0.525 | 597 ms/step , 115674.60 GFLOP/s , 173900.4 tokens/s INFO:__main__:2024-11-30 03:39:05 | Validation | Step: 316000 | Val_loss: 0.519 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 03:39:05 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_033905_step_316000.pt` INFO:__main__:2024-11-30 03:39:08 | Epoch: 0 | Step: 316000 | Dataset: 0-2160360 | Loss: 0.533 | 595 ms/step , 115962.08 GFLOP/s , 119413.2 tokens/s INFO:__main__:2024-11-30 03:39:15 | Epoch: 0 | Step: 316010 | Dataset: 0-2162760 | Loss: 0.568 | 598 ms/step , 115393.31 GFLOP/s , 173421.6 tokens/s INFO:__main__:2024-11-30 03:39:22 | Epoch: 0 | Step: 316020 | Dataset: 0-2165160 | Loss: 0.537 | 599 ms/step , 115154.23 GFLOP/s , 173358.6 tokens/s INFO:__main__:2024-11-30 03:39:29 | Epoch: 0 | Step: 316030 | Dataset: 0-2167560 | Loss: 0.502 | 599 ms/step , 115249.43 GFLOP/s , 173414.6 tokens/s INFO:__main__:2024-11-30 03:39:36 | Epoch: 0 | Step: 316040 | Dataset: 0-2169960 | Loss: 0.467 | 599 ms/step , 115252.20 GFLOP/s , 173394.0 tokens/s INFO:__main__:2024-11-30 03:39:43 | Epoch: 0 | Step: 316050 | Dataset: 0-2172360 | Loss: 0.503 | 598 ms/step , 115385.94 GFLOP/s , 173459.6 tokens/s INFO:__main__:2024-11-30 03:39:50 | Epoch: 0 | Step: 316060 | Dataset: 0-2174760 | Loss: 0.500 | 598 ms/step , 115339.74 GFLOP/s , 173525.5 tokens/s INFO:__main__:2024-11-30 03:39:57 | Epoch: 0 | Step: 316070 | Dataset: 0-2177160 | Loss: 0.480 | 599 ms/step , 115174.28 GFLOP/s , 173373.4 tokens/s INFO:__main__:2024-11-30 03:40:04 | Epoch: 0 | Step: 316080 | 
Dataset: 0-2179560 | Loss: 0.526 | 599 ms/step , 115257.74 GFLOP/s , 173362.4 tokens/s INFO:__main__:2024-11-30 03:40:11 | Epoch: 0 | Step: 316090 | Dataset: 0-2181960 | Loss: 0.490 | 598 ms/step , 115367.35 GFLOP/s , 173368.0 tokens/s INFO:__main__:2024-11-30 03:40:19 | Epoch: 0 | Step: 316100 | Dataset: 0-2184360 | Loss: 0.583 | 599 ms/step , 115233.87 GFLOP/s , 173367.7 tokens/s INFO:__main__:2024-11-30 03:40:26 | Epoch: 0 | Step: 316110 | Dataset: 0-2186760 | Loss: 0.513 | 599 ms/step , 115293.30 GFLOP/s , 173349.5 tokens/s INFO:__main__:2024-11-30 03:40:33 | Epoch: 0 | Step: 316120 | Dataset: 0-2189160 | Loss: 0.656 | 599 ms/step , 115287.73 GFLOP/s , 173365.0 tokens/s INFO:__main__:2024-11-30 03:40:40 | Epoch: 0 | Step: 316130 | Dataset: 0-2191560 | Loss: 0.659 | 599 ms/step , 115239.70 GFLOP/s , 173507.9 tokens/s INFO:__main__:2024-11-30 03:40:47 | Epoch: 0 | Step: 316140 | Dataset: 0-2193960 | Loss: 0.792 | 598 ms/step , 115317.11 GFLOP/s , 173382.9 tokens/s INFO:__main__:2024-11-30 03:40:54 | Epoch: 0 | Step: 316150 | Dataset: 0-2196360 | Loss: 0.719 | 599 ms/step , 115136.77 GFLOP/s , 173294.1 tokens/s INFO:__main__:2024-11-30 03:41:01 | Epoch: 0 | Step: 316160 | Dataset: 0-2198760 | Loss: 0.744 | 597 ms/step , 115606.16 GFLOP/s , 173601.4 tokens/s INFO:__main__:2024-11-30 03:41:08 | Epoch: 0 | Step: 316170 | Dataset: 0-2201160 | Loss: 0.760 | 597 ms/step , 115659.94 GFLOP/s , 173768.2 tokens/s INFO:__main__:2024-11-30 03:41:15 | Epoch: 0 | Step: 316180 | Dataset: 0-2203560 | Loss: 0.735 | 596 ms/step , 115733.32 GFLOP/s , 173663.8 tokens/s INFO:__main__:2024-11-30 03:41:22 | Epoch: 0 | Step: 316190 | Dataset: 0-2205960 | Loss: 0.724 | 598 ms/step , 115447.85 GFLOP/s , 173682.6 tokens/s INFO:__main__:2024-11-30 03:41:29 | Epoch: 0 | Step: 316200 | Dataset: 0-2208360 | Loss: 0.684 | 597 ms/step , 115618.73 GFLOP/s , 173786.3 tokens/s INFO:__main__:2024-11-30 03:41:36 | Epoch: 0 | Step: 316210 | Dataset: 0-2210760 | Loss: 0.791 | 596 ms/step , 115742.58 
GFLOP/s , 173843.8 tokens/s INFO:__main__:2024-11-30 03:41:43 | Epoch: 0 | Step: 316220 | Dataset: 0-2213160 | Loss: 0.756 | 597 ms/step , 115644.51 GFLOP/s , 173654.3 tokens/s INFO:__main__:2024-11-30 03:41:51 | Epoch: 0 | Step: 316230 | Dataset: 0-2215560 | Loss: 0.722 | 597 ms/step , 115588.57 GFLOP/s , 173747.1 tokens/s INFO:__main__:2024-11-30 03:41:58 | Epoch: 0 | Step: 316240 | Dataset: 0-2217960 | Loss: 0.678 | 597 ms/step , 115652.19 GFLOP/s , 173719.1 tokens/s INFO:__main__:2024-11-30 03:42:05 | Epoch: 0 | Step: 316250 | Dataset: 0-2220360 | Loss: 0.673 | 597 ms/step , 115652.77 GFLOP/s , 173732.2 tokens/s INFO:__main__:2024-11-30 03:42:12 | Epoch: 0 | Step: 316260 | Dataset: 0-2222760 | Loss: 0.658 | 597 ms/step , 115598.17 GFLOP/s , 173689.2 tokens/s INFO:__main__:2024-11-30 03:42:19 | Epoch: 0 | Step: 316270 | Dataset: 0-2225160 | Loss: 0.744 | 596 ms/step , 115765.69 GFLOP/s , 173701.4 tokens/s INFO:__main__:2024-11-30 03:42:26 | Epoch: 0 | Step: 316280 | Dataset: 0-2227560 | Loss: 0.746 | 599 ms/step , 115228.42 GFLOP/s , 173750.1 tokens/s INFO:__main__:2024-11-30 03:42:33 | Epoch: 0 | Step: 316290 | Dataset: 0-2229960 | Loss: 0.637 | 596 ms/step , 115836.87 GFLOP/s , 173781.8 tokens/s INFO:__main__:2024-11-30 03:42:40 | Epoch: 0 | Step: 316300 | Dataset: 0-2232360 | Loss: 0.629 | 597 ms/step , 115620.74 GFLOP/s , 173780.2 tokens/s INFO:__main__:2024-11-30 03:42:47 | Epoch: 0 | Step: 316310 | Dataset: 0-2234760 | Loss: 0.616 | 596 ms/step , 115737.86 GFLOP/s , 173826.8 tokens/s INFO:__main__:2024-11-30 03:42:54 | Epoch: 0 | Step: 316320 | Dataset: 0-2237160 | Loss: 0.613 | 597 ms/step , 115592.16 GFLOP/s , 173727.0 tokens/s INFO:__main__:2024-11-30 03:43:01 | Epoch: 0 | Step: 316330 | Dataset: 0-2239560 | Loss: 0.610 | 597 ms/step , 115609.35 GFLOP/s , 173735.3 tokens/s INFO:__main__:2024-11-30 03:43:08 | Epoch: 0 | Step: 316340 | Dataset: 0-2241960 | Loss: 0.601 | 595 ms/step , 115891.76 GFLOP/s , 173739.0 tokens/s INFO:__main__:2024-11-30 03:43:15 
| Epoch: 0 | Step: 316350 | Dataset: 0-2244360 | Loss: 0.574 | 596 ms/step , 115713.69 GFLOP/s , 173888.8 tokens/s INFO:__main__:2024-11-30 03:43:22 | Epoch: 0 | Step: 316360 | Dataset: 0-2246760 | Loss: 0.308 | 595 ms/step , 115935.59 GFLOP/s , 173970.0 tokens/s INFO:__main__:2024-11-30 03:43:30 | Epoch: 0 | Step: 316370 | Dataset: 0-2249160 | Loss: 0.297 | 596 ms/step , 115867.05 GFLOP/s , 173873.0 tokens/s INFO:__main__:2024-11-30 03:43:37 | Epoch: 0 | Step: 316380 | Dataset: 0-2251560 | Loss: 0.284 | 597 ms/step , 115670.88 GFLOP/s , 173858.5 tokens/s INFO:__main__:2024-11-30 03:43:44 | Epoch: 0 | Step: 316390 | Dataset: 0-2253960 | Loss: 0.301 | 597 ms/step , 115672.37 GFLOP/s , 173838.3 tokens/s INFO:__main__:2024-11-30 03:43:51 | Epoch: 0 | Step: 316400 | Dataset: 0-2256360 | Loss: 0.259 | 596 ms/step , 115738.14 GFLOP/s , 173811.1 tokens/s INFO:__main__:2024-11-30 03:43:58 | Epoch: 0 | Step: 316410 | Dataset: 0-2258760 | Loss: 0.246 | 597 ms/step , 115607.56 GFLOP/s , 173799.6 tokens/s INFO:__main__:2024-11-30 03:44:05 | Epoch: 0 | Step: 316420 | Dataset: 0-2261160 | Loss: 0.269 | 595 ms/step , 115905.47 GFLOP/s , 173954.2 tokens/s INFO:__main__:2024-11-30 03:44:12 | Epoch: 0 | Step: 316430 | Dataset: 0-2263560 | Loss: 0.260 | 597 ms/step , 115617.55 GFLOP/s , 173896.2 tokens/s INFO:__main__:2024-11-30 03:44:19 | Epoch: 0 | Step: 316440 | Dataset: 0-2265960 | Loss: 0.247 | 596 ms/step , 115751.34 GFLOP/s , 173844.6 tokens/s INFO:__main__:2024-11-30 03:44:26 | Epoch: 0 | Step: 316450 | Dataset: 0-2268360 | Loss: 0.230 | 596 ms/step , 115785.68 GFLOP/s , 173839.4 tokens/s INFO:__main__:2024-11-30 03:44:33 | Epoch: 0 | Step: 316460 | Dataset: 0-2270760 | Loss: 0.225 | 597 ms/step , 115573.00 GFLOP/s , 173763.6 tokens/s INFO:__main__:2024-11-30 03:44:40 | Epoch: 0 | Step: 316470 | Dataset: 0-2273160 | Loss: 0.230 | 597 ms/step , 115558.53 GFLOP/s , 173768.7 tokens/s INFO:__main__:2024-11-30 03:44:47 | Epoch: 0 | Step: 316480 | Dataset: 0-2275560 | Loss: 0.237 | 
596 ms/step , 115707.87 GFLOP/s , 173820.7 tokens/s INFO:__main__:2024-11-30 03:44:54 | Epoch: 0 | Step: 316490 | Dataset: 0-2277960 | Loss: 0.235 | 597 ms/step , 115605.43 GFLOP/s , 173809.7 tokens/s INFO:__main__:2024-11-30 03:45:02 | Validation | Step: 316500 | Val_loss: 0.561 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 03:45:03 | Epoch: 0 | Step: 316500 | Dataset: 0-2280360 | Loss: 0.224 | 595 ms/step , 116073.93 GFLOP/s , 147983.3 tokens/s INFO:__main__:2024-11-30 03:45:10 | Epoch: 0 | Step: 316510 | Dataset: 0-2282760 | Loss: 0.228 | 597 ms/step , 115666.35 GFLOP/s , 174075.2 tokens/s INFO:__main__:2024-11-30 03:45:17 | Epoch: 0 | Step: 316520 | Dataset: 0-2285160 | Loss: 0.224 | 596 ms/step , 115751.44 GFLOP/s , 173871.0 tokens/s INFO:__main__:2024-11-30 03:45:24 | Epoch: 0 | Step: 316530 | Dataset: 0-2287560 | Loss: 0.226 | 597 ms/step , 115637.24 GFLOP/s , 173861.2 tokens/s INFO:__main__:2024-11-30 03:45:31 | Epoch: 0 | Step: 316540 | Dataset: 0-2289960 | Loss: 0.200 | 597 ms/step , 115590.41 GFLOP/s , 173759.3 tokens/s INFO:__main__:2024-11-30 03:45:38 | Epoch: 0 | Step: 316550 | Dataset: 0-2292360 | Loss: 0.212 | 596 ms/step , 115738.78 GFLOP/s , 173891.2 tokens/s INFO:__main__:2024-11-30 03:45:45 | Epoch: 0 | Step: 316560 | Dataset: 0-2294760 | Loss: 1.315 | 597 ms/step , 115662.27 GFLOP/s , 173770.1 tokens/s INFO:__main__:2024-11-30 03:45:52 | Epoch: 0 | Step: 316570 | Dataset: 0-2297160 | Loss: 1.366 | 597 ms/step , 115674.46 GFLOP/s , 173726.1 tokens/s INFO:__main__:2024-11-30 03:45:59 | Epoch: 0 | Step: 316580 | Dataset: 0-2299560 | Loss: 0.484 | 597 ms/step , 115623.34 GFLOP/s , 173824.2 tokens/s INFO:__main__:2024-11-30 03:46:06 | Epoch: 0 | Step: 316590 | Dataset: 0-2301960 | Loss: 0.541 | 598 ms/step , 115481.96 GFLOP/s , 173749.2 tokens/s INFO:__main__:2024-11-30 03:46:13 | Epoch: 0 | Step: 316600 | Dataset: 0-2304360 | Loss: 0.470 | 597 ms/step , 115633.19 GFLOP/s , 173680.2 tokens/s INFO:__main__:2024-11-30 03:46:20 | Epoch: 0 | Step: 
316610 | Dataset: 0-2306760 | Loss: 0.509 | 597 ms/step , 115559.14 GFLOP/s , 173663.4 tokens/s INFO:__main__:2024-11-30 03:46:28 | Epoch: 0 | Step: 316620 | Dataset: 0-2309160 | Loss: 0.466 | 597 ms/step , 115509.50 GFLOP/s , 173684.2 tokens/s INFO:__main__:2024-11-30 03:46:35 | Epoch: 0 | Step: 316630 | Dataset: 0-2311560 | Loss: 0.558 | 597 ms/step , 115663.28 GFLOP/s , 173801.3 tokens/s INFO:__main__:2024-11-30 03:46:42 | Epoch: 0 | Step: 316640 | Dataset: 0-2313960 | Loss: 0.497 | 597 ms/step , 115670.87 GFLOP/s , 173724.7 tokens/s INFO:__main__:2024-11-30 03:46:49 | Epoch: 0 | Step: 316650 | Dataset: 0-2316360 | Loss: 0.475 | 597 ms/step , 115611.19 GFLOP/s , 173837.0 tokens/s INFO:__main__:2024-11-30 03:46:56 | Epoch: 0 | Step: 316660 | Dataset: 0-2318760 | Loss: 0.497 | 597 ms/step , 115637.48 GFLOP/s , 173779.6 tokens/s INFO:__main__:2024-11-30 03:47:03 | Epoch: 0 | Step: 316670 | Dataset: 0-2321160 | Loss: 0.500 | 597 ms/step , 115621.96 GFLOP/s , 173674.3 tokens/s INFO:__main__:2024-11-30 03:47:10 | Epoch: 0 | Step: 316680 | Dataset: 0-2323560 | Loss: 0.501 | 597 ms/step , 115597.35 GFLOP/s , 173675.6 tokens/s INFO:__main__:2024-11-30 03:47:17 | Epoch: 0 | Step: 316690 | Dataset: 0-2325960 | Loss: 0.432 | 596 ms/step , 115716.93 GFLOP/s , 173676.1 tokens/s INFO:__main__:2024-11-30 03:47:24 | Epoch: 0 | Step: 316700 | Dataset: 0-2328360 | Loss: 0.439 | 597 ms/step , 115690.22 GFLOP/s , 173624.9 tokens/s INFO:__main__:2024-11-30 03:47:31 | Epoch: 0 | Step: 316710 | Dataset: 0-2330760 | Loss: 0.495 | 597 ms/step , 115598.39 GFLOP/s , 173718.8 tokens/s INFO:__main__:2024-11-30 03:47:38 | Epoch: 0 | Step: 316720 | Dataset: 0-2333160 | Loss: 0.474 | 597 ms/step , 115547.50 GFLOP/s , 173722.7 tokens/s INFO:__main__:2024-11-30 03:47:45 | Epoch: 0 | Step: 316730 | Dataset: 0-2335560 | Loss: 0.509 | 596 ms/step , 115767.24 GFLOP/s , 173808.4 tokens/s INFO:__main__:2024-11-30 03:47:52 | Epoch: 0 | Step: 316740 | Dataset: 0-2337960 | Loss: 0.434 | 598 ms/step , 
115475.91 GFLOP/s , 173737.9 tokens/s INFO:__main__:2024-11-30 03:47:59 | Epoch: 0 | Step: 316750 | Dataset: 0-2340360 | Loss: 0.458 | 597 ms/step , 115628.86 GFLOP/s , 173657.0 tokens/s INFO:__main__:2024-11-30 03:48:07 | Epoch: 0 | Step: 316760 | Dataset: 0-2342760 | Loss: 0.452 | 597 ms/step , 115578.22 GFLOP/s , 173685.3 tokens/s INFO:__main__:2024-11-30 03:48:14 | Epoch: 0 | Step: 316770 | Dataset: 0-2345160 | Loss: 0.441 | 597 ms/step , 115524.01 GFLOP/s , 173658.3 tokens/s INFO:__main__:2024-11-30 03:48:21 | Epoch: 0 | Step: 316780 | Dataset: 0-2347560 | Loss: 0.464 | 597 ms/step , 115679.40 GFLOP/s , 173699.2 tokens/s INFO:__main__:2024-11-30 03:48:28 | Epoch: 0 | Step: 316790 | Dataset: 0-2349960 | Loss: 0.535 | 597 ms/step , 115577.15 GFLOP/s , 173683.6 tokens/s INFO:__main__:2024-11-30 03:48:35 | Epoch: 0 | Step: 316800 | Dataset: 0-2352360 | Loss: 0.467 | 596 ms/step , 115725.41 GFLOP/s , 173669.9 tokens/s INFO:__main__:2024-11-30 03:48:42 | Epoch: 0 | Step: 316810 | Dataset: 0-2354760 | Loss: 0.487 | 597 ms/step , 115629.08 GFLOP/s , 173845.6 tokens/s INFO:__main__:2024-11-30 03:48:49 | Epoch: 0 | Step: 316820 | Dataset: 0-2357160 | Loss: 0.498 | 599 ms/step , 115287.20 GFLOP/s , 173589.9 tokens/s INFO:__main__:2024-11-30 03:48:56 | Epoch: 0 | Step: 316830 | Dataset: 0-2359560 | Loss: 0.517 | 597 ms/step , 115614.53 GFLOP/s , 173662.8 tokens/s INFO:__main__:2024-11-30 03:49:03 | Epoch: 0 | Step: 316840 | Dataset: 0-2361960 | Loss: 0.949 | 597 ms/step , 115510.78 GFLOP/s , 173607.5 tokens/s INFO:__main__:2024-11-30 03:49:10 | Epoch: 0 | Step: 316850 | Dataset: 0-2364360 | Loss: 0.859 | 597 ms/step , 115562.60 GFLOP/s , 173499.7 tokens/s INFO:__main__:2024-11-30 03:49:17 | Epoch: 0 | Step: 316860 | Dataset: 0-2366760 | Loss: 0.877 | 597 ms/step , 115641.78 GFLOP/s , 173582.5 tokens/s INFO:__main__:2024-11-30 03:49:24 | Epoch: 0 | Step: 316870 | Dataset: 0-2369160 | Loss: 0.765 | 597 ms/step , 115688.53 GFLOP/s , 173633.9 tokens/s INFO:__main__:2024-11-30 
03:49:31 | Epoch: 0 | Step: 316880 | Dataset: 0-2371560 | Loss: 0.831 | 597 ms/step , 115511.75 GFLOP/s , 173734.5 tokens/s INFO:__main__:2024-11-30 03:49:39 | Epoch: 0 | Step: 316890 | Dataset: 0-2373960 | Loss: 0.824 | 598 ms/step , 115384.22 GFLOP/s , 173562.6 tokens/s INFO:__main__:2024-11-30 03:49:46 | Epoch: 0 | Step: 316900 | Dataset: 0-2376360 | Loss: 0.819 | 598 ms/step , 115388.91 GFLOP/s , 173670.4 tokens/s INFO:__main__:2024-11-30 03:49:53 | Epoch: 0 | Step: 316910 | Dataset: 0-2378760 | Loss: 0.863 | 597 ms/step , 115539.05 GFLOP/s , 173558.6 tokens/s INFO:__main__:2024-11-30 03:50:00 | Epoch: 0 | Step: 316920 | Dataset: 0-2381160 | Loss: 0.827 | 597 ms/step , 115560.92 GFLOP/s , 173577.6 tokens/s INFO:__main__:2024-11-30 03:50:07 | Epoch: 0 | Step: 316930 | Dataset: 0-2383560 | Loss: 0.777 | 598 ms/step , 115353.91 GFLOP/s , 173564.6 tokens/s INFO:__main__:2024-11-30 03:50:14 | Epoch: 0 | Step: 316940 | Dataset: 0-2385960 | Loss: 0.818 | 597 ms/step , 115604.04 GFLOP/s , 173635.0 tokens/s INFO:__main__:2024-11-30 03:50:21 | Epoch: 0 | Step: 316950 | Dataset: 0-2388360 | Loss: 0.895 | 597 ms/step , 115615.77 GFLOP/s , 173619.3 tokens/s INFO:__main__:2024-11-30 03:50:28 | Epoch: 0 | Step: 316960 | Dataset: 0-2390760 | Loss: 0.815 | 597 ms/step , 115601.73 GFLOP/s , 173626.0 tokens/s INFO:__main__:2024-11-30 03:50:35 | Epoch: 0 | Step: 316970 | Dataset: 0-2393160 | Loss: 0.788 | 597 ms/step , 115606.19 GFLOP/s , 173564.6 tokens/s INFO:__main__:2024-11-30 03:50:42 | Epoch: 0 | Step: 316980 | Dataset: 0-2395560 | Loss: 0.764 | 597 ms/step , 115555.36 GFLOP/s , 173555.9 tokens/s INFO:__main__:2024-11-30 03:50:49 | Epoch: 0 | Step: 316990 | Dataset: 0-2397960 | Loss: 0.788 | 598 ms/step , 115497.99 GFLOP/s , 173600.6 tokens/s INFO:__main__:2024-11-30 03:50:57 | Validation | Step: 317000 | Val_loss: 0.553 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 03:50:57 | Saving full-param checkpoint to 
`/root/autodl-tmp/checkpoint/checkpoint_20241130_035057_step_317000.pt` INFO:__main__:2024-11-30 03:51:00 | Epoch: 0 | Step: 317000 | Dataset: 0-2400360 | Loss: 0.816 | 596 ms/step , 115793.79 GFLOP/s , 118667.2 tokens/s INFO:__main__:2024-11-30 03:51:07 | Epoch: 0 | Step: 317010 | Dataset: 0-2402760 | Loss: 0.775 | 599 ms/step , 115160.02 GFLOP/s , 173198.1 tokens/s INFO:__main__:2024-11-30 03:51:14 | Epoch: 0 | Step: 317020 | Dataset: 0-2405160 | Loss: 0.801 | 599 ms/step , 115250.20 GFLOP/s , 173359.5 tokens/s INFO:__main__:2024-11-30 03:51:21 | Epoch: 0 | Step: 317030 | Dataset: 0-2407560 | Loss: 0.832 | 599 ms/step , 115267.06 GFLOP/s , 173462.0 tokens/s INFO:__main__:2024-11-30 03:51:28 | Epoch: 0 | Step: 317040 | Dataset: 0-2409960 | Loss: 0.788 | 598 ms/step , 115444.19 GFLOP/s , 173419.8 tokens/s INFO:__main__:2024-11-30 03:51:35 | Epoch: 0 | Step: 317050 | Dataset: 0-2412360 | Loss: 0.811 | 597 ms/step , 115595.47 GFLOP/s , 173705.2 tokens/s INFO:__main__:2024-11-30 03:51:42 | Epoch: 0 | Step: 317060 | Dataset: 0-2414760 | Loss: 0.812 | 598 ms/step , 115440.72 GFLOP/s , 173742.9 tokens/s INFO:__main__:2024-11-30 03:51:49 | Epoch: 0 | Step: 317070 | Dataset: 0-2417160 | Loss: 0.760 | 597 ms/step , 115505.81 GFLOP/s , 173727.8 tokens/s INFO:__main__:2024-11-30 03:51:56 | Epoch: 0 | Step: 317080 | Dataset: 0-2419560 | Loss: 0.812 | 597 ms/step , 115629.12 GFLOP/s , 173658.7 tokens/s INFO:__main__:2024-11-30 03:52:03 | Epoch: 0 | Step: 317090 | Dataset: 0-2421960 | Loss: 0.803 | 597 ms/step , 115583.81 GFLOP/s , 173662.6 tokens/s INFO:__main__:2024-11-30 03:52:10 | Epoch: 0 | Step: 317100 | Dataset: 0-2424360 | Loss: 0.831 | 597 ms/step , 115615.80 GFLOP/s , 173795.4 tokens/s INFO:__main__:2024-11-30 03:52:18 | Epoch: 0 | Step: 317110 | Dataset: 0-2426760 | Loss: 0.696 | 597 ms/step , 115601.84 GFLOP/s , 173700.3 tokens/s INFO:__main__:2024-11-30 03:52:25 | Epoch: 0 | Step: 317120 | Dataset: 0-2429160 | Loss: 0.815 | 597 ms/step , 115560.42 GFLOP/s , 173674.7 
tokens/s INFO:__main__:2024-11-30 03:52:32 | Epoch: 0 | Step: 317130 | Dataset: 0-2431560 | Loss: 0.734 | 598 ms/step , 115463.06 GFLOP/s , 173645.6 tokens/s INFO:__main__:2024-11-30 03:52:39 | Epoch: 0 | Step: 317140 | Dataset: 0-2433960 | Loss: 0.789 | 597 ms/step , 115565.73 GFLOP/s , 173705.1 tokens/s INFO:__main__:2024-11-30 03:52:46 | Epoch: 0 | Step: 317150 | Dataset: 0-2436360 | Loss: 0.771 | 598 ms/step , 115450.90 GFLOP/s , 173598.8 tokens/s INFO:__main__:2024-11-30 03:52:53 | Epoch: 0 | Step: 317160 | Dataset: 0-2438760 | Loss: 0.779 | 597 ms/step , 115517.47 GFLOP/s , 173633.1 tokens/s INFO:__main__:2024-11-30 03:53:00 | Epoch: 0 | Step: 317170 | Dataset: 0-2441160 | Loss: 0.794 | 597 ms/step , 115620.26 GFLOP/s , 173699.4 tokens/s INFO:__main__:2024-11-30 03:53:07 | Epoch: 0 | Step: 317180 | Dataset: 0-2443560 | Loss: 0.794 | 597 ms/step , 115574.03 GFLOP/s , 173769.9 tokens/s INFO:__main__:2024-11-30 03:53:14 | Epoch: 0 | Step: 317190 | Dataset: 0-2445960 | Loss: 0.767 | 597 ms/step , 115542.96 GFLOP/s , 173661.9 tokens/s INFO:__main__:2024-11-30 03:53:21 | Epoch: 0 | Step: 317200 | Dataset: 0-2448360 | Loss: 0.777 | 597 ms/step , 115558.04 GFLOP/s , 173632.6 tokens/s INFO:__main__:2024-11-30 03:53:28 | Epoch: 0 | Step: 317210 | Dataset: 0-2450760 | Loss: 0.714 | 598 ms/step , 115459.60 GFLOP/s , 173630.8 tokens/s INFO:__main__:2024-11-30 03:53:35 | Epoch: 0 | Step: 317220 | Dataset: 0-2453160 | Loss: 0.741 | 598 ms/step , 115501.45 GFLOP/s , 173614.2 tokens/s INFO:__main__:2024-11-30 03:53:42 | Epoch: 0 | Step: 317230 | Dataset: 0-2455560 | Loss: 0.804 | 598 ms/step , 115432.66 GFLOP/s , 173623.6 tokens/s INFO:__main__:2024-11-30 03:53:50 | Epoch: 0 | Step: 317240 | Dataset: 0-2457960 | Loss: 0.818 | 597 ms/step , 115542.28 GFLOP/s , 173659.2 tokens/s INFO:__main__:2024-11-30 03:53:57 | Epoch: 0 | Step: 317250 | Dataset: 0-2460360 | Loss: 0.783 | 597 ms/step , 115506.73 GFLOP/s , 173725.4 tokens/s INFO:__main__:2024-11-30 03:54:04 | Epoch: 0 | Step: 
317260 | Dataset: 0-2462760 | Loss: 0.785 | 598 ms/step , 115358.23 GFLOP/s , 173589.6 tokens/s INFO:__main__:2024-11-30 03:54:11 | Epoch: 0 | Step: 317270 | Dataset: 0-2465160 | Loss: 0.820 | 598 ms/step , 115420.59 GFLOP/s , 173628.3 tokens/s INFO:__main__:2024-11-30 03:54:18 | Epoch: 0 | Step: 317280 | Dataset: 0-2467560 | Loss: 0.761 | 598 ms/step , 115437.30 GFLOP/s , 173605.6 tokens/s INFO:__main__:2024-11-30 03:54:25 | Epoch: 0 | Step: 317290 | Dataset: 0-2469960 | Loss: 0.825 | 598 ms/step , 115449.91 GFLOP/s , 173695.0 tokens/s INFO:__main__:2024-11-30 03:54:32 | Epoch: 0 | Step: 317300 | Dataset: 0-2472360 | Loss: 0.850 | 598 ms/step , 115366.75 GFLOP/s , 173558.2 tokens/s INFO:__main__:2024-11-30 03:54:39 | Epoch: 0 | Step: 317310 | Dataset: 0-2474760 | Loss: 0.732 | 598 ms/step , 115458.40 GFLOP/s , 173590.7 tokens/s INFO:__main__:2024-11-30 03:54:46 | Epoch: 0 | Step: 317320 | Dataset: 0-2477160 | Loss: 0.758 | 597 ms/step , 115647.86 GFLOP/s , 173730.0 tokens/s INFO:__main__:2024-11-30 03:54:53 | Epoch: 0 | Step: 317330 | Dataset: 0-2479560 | Loss: 0.743 | 597 ms/step , 115540.10 GFLOP/s , 173737.5 tokens/s INFO:__main__:2024-11-30 03:55:00 | Epoch: 0 | Step: 317340 | Dataset: 0-2481960 | Loss: 0.711 | 597 ms/step , 115511.07 GFLOP/s , 173592.6 tokens/s INFO:__main__:2024-11-30 03:55:07 | Epoch: 0 | Step: 317350 | Dataset: 0-2484360 | Loss: 0.777 | 598 ms/step , 115486.65 GFLOP/s , 173594.2 tokens/s INFO:__main__:2024-11-30 03:55:14 | Epoch: 0 | Step: 317360 | Dataset: 0-2486760 | Loss: 0.784 | 598 ms/step , 115481.60 GFLOP/s , 173576.0 tokens/s INFO:__main__:2024-11-30 03:55:22 | Epoch: 0 | Step: 317370 | Dataset: 0-2489160 | Loss: 0.770 | 597 ms/step , 115506.74 GFLOP/s , 173563.4 tokens/s INFO:__main__:2024-11-30 03:55:29 | Epoch: 0 | Step: 317380 | Dataset: 0-2491560 | Loss: 0.639 | 597 ms/step , 115562.89 GFLOP/s , 173662.6 tokens/s INFO:__main__:2024-11-30 03:55:36 | Epoch: 0 | Step: 317390 | Dataset: 0-2493960 | Loss: 0.606 | 597 ms/step , 
115653.29 GFLOP/s , 173689.4 tokens/s INFO:__main__:2024-11-30 03:55:43 | Epoch: 0 | Step: 317400 | Dataset: 0-2496360 | Loss: 0.682 | 597 ms/step , 115578.27 GFLOP/s , 173766.6 tokens/s INFO:__main__:2024-11-30 03:55:50 | Epoch: 0 | Step: 317410 | Dataset: 0-2498760 | Loss: 0.597 | 597 ms/step , 115687.40 GFLOP/s , 173692.5 tokens/s INFO:__main__:2024-11-30 03:55:57 | Epoch: 0 | Step: 317420 | Dataset: 0-2501160 | Loss: 0.642 | 598 ms/step , 115447.59 GFLOP/s , 173657.2 tokens/s INFO:__main__:2024-11-30 03:56:04 | Epoch: 0 | Step: 317430 | Dataset: 0-2503560 | Loss: 0.564 | 597 ms/step , 115513.23 GFLOP/s , 173646.3 tokens/s INFO:__main__:2024-11-30 03:56:11 | Epoch: 0 | Step: 317440 | Dataset: 0-2505960 | Loss: 0.649 | 597 ms/step , 115592.45 GFLOP/s , 173682.6 tokens/s INFO:__main__:2024-11-30 03:56:18 | Epoch: 0 | Step: 317450 | Dataset: 0-2508360 | Loss: 0.655 | 598 ms/step , 115441.48 GFLOP/s , 173608.4 tokens/s INFO:__main__:2024-11-30 03:56:25 | Epoch: 0 | Step: 317460 | Dataset: 0-2510760 | Loss: 0.559 | 596 ms/step , 115697.73 GFLOP/s , 173665.0 tokens/s INFO:__main__:2024-11-30 03:56:32 | Epoch: 0 | Step: 317470 | Dataset: 0-2513160 | Loss: 0.617 | 597 ms/step , 115532.64 GFLOP/s , 173828.1 tokens/s INFO:__main__:2024-11-30 03:56:39 | Epoch: 0 | Step: 317480 | Dataset: 0-2515560 | Loss: 0.648 | 597 ms/step , 115633.07 GFLOP/s , 173729.8 tokens/s INFO:__main__:2024-11-30 03:56:46 | Epoch: 0 | Step: 317490 | Dataset: 0-2517960 | Loss: 0.627 | 597 ms/step , 115567.80 GFLOP/s , 173591.1 tokens/s INFO:__main__:2024-11-30 03:56:54 | Validation | Step: 317500 | Val_loss: 0.574 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 03:56:55 | Epoch: 0 | Step: 317500 | Dataset: 0-2520360 | Loss: 0.598 | 595 ms/step , 115914.40 GFLOP/s , 147498.5 tokens/s INFO:__main__:2024-11-30 03:57:02 | Epoch: 0 | Step: 317510 | Dataset: 0-2522760 | Loss: 0.547 | 597 ms/step , 115572.58 GFLOP/s , 173767.4 tokens/s INFO:__main__:2024-11-30 03:57:09 | Epoch: 0 | Step: 317520 | 
Dataset: 0-2525160 | Loss: 0.667 | 597 ms/step , 115527.80 GFLOP/s , 173715.3 tokens/s INFO:__main__:2024-11-30 03:57:16 | Epoch: 0 | Step: 317530 | Dataset: 0-2527560 | Loss: 0.662 | 597 ms/step , 115585.29 GFLOP/s , 173630.2 tokens/s INFO:__main__:2024-11-30 03:57:23 | Epoch: 0 | Step: 317540 | Dataset: 0-2529960 | Loss: 0.693 | 597 ms/step , 115588.18 GFLOP/s , 173716.5 tokens/s INFO:__main__:2024-11-30 03:57:30 | Epoch: 0 | Step: 317550 | Dataset: 0-2532360 | Loss: 0.601 | 598 ms/step , 115398.09 GFLOP/s , 173830.4 tokens/s INFO:__main__:2024-11-30 03:57:37 | Epoch: 0 | Step: 317560 | Dataset: 0-2534760 | Loss: 0.599 | 597 ms/step , 115543.50 GFLOP/s , 173666.7 tokens/s INFO:__main__:2024-11-30 03:57:44 | Epoch: 0 | Step: 317570 | Dataset: 0-2537160 | Loss: 0.545 | 598 ms/step , 115343.30 GFLOP/s , 173655.1 tokens/s INFO:__main__:2024-11-30 03:57:51 | Epoch: 0 | Step: 317580 | Dataset: 0-2539560 | Loss: 0.618 | 597 ms/step , 115624.22 GFLOP/s , 173625.5 tokens/s INFO:__main__:2024-11-30 03:57:58 | Epoch: 0 | Step: 317590 | Dataset: 0-2541960 | Loss: 0.618 | 597 ms/step , 115565.91 GFLOP/s , 173635.6 tokens/s INFO:__main__:2024-11-30 03:58:06 | Epoch: 0 | Step: 317600 | Dataset: 0-2544360 | Loss: 0.653 | 597 ms/step , 115577.46 GFLOP/s , 173612.4 tokens/s INFO:__main__:2024-11-30 03:58:13 | Epoch: 0 | Step: 317610 | Dataset: 0-2546760 | Loss: 0.638 | 597 ms/step , 115567.66 GFLOP/s , 173655.9 tokens/s INFO:__main__:2024-11-30 03:58:20 | Epoch: 0 | Step: 317620 | Dataset: 0-2549160 | Loss: 0.592 | 597 ms/step , 115693.23 GFLOP/s , 173783.8 tokens/s INFO:__main__:2024-11-30 03:58:27 | Epoch: 0 | Step: 317630 | Dataset: 0-2551560 | Loss: 0.653 | 597 ms/step , 115635.78 GFLOP/s , 173664.6 tokens/s INFO:__main__:2024-11-30 03:58:34 | Epoch: 0 | Step: 317640 | Dataset: 0-2553960 | Loss: 0.711 | 598 ms/step , 115474.56 GFLOP/s , 173620.9 tokens/s INFO:__main__:2024-11-30 03:58:41 | Epoch: 0 | Step: 317650 | Dataset: 0-2556360 | Loss: 0.619 | 598 ms/step , 115372.59 
GFLOP/s , 173670.1 tokens/s INFO:__main__:2024-11-30 03:58:48 | Epoch: 0 | Step: 317660 | Dataset: 0-2558760 | Loss: 0.627 | 597 ms/step , 115543.73 GFLOP/s , 173594.3 tokens/s INFO:__main__:2024-11-30 03:58:55 | Epoch: 0 | Step: 317670 | Dataset: 0-2561160 | Loss: 0.630 | 598 ms/step , 115484.59 GFLOP/s , 173638.5 tokens/s INFO:__main__:2024-11-30 03:59:02 | Epoch: 0 | Step: 317680 | Dataset: 0-2563560 | Loss: 0.577 | 596 ms/step , 115738.16 GFLOP/s , 173646.5 tokens/s INFO:__main__:2024-11-30 03:59:09 | Epoch: 0 | Step: 317690 | Dataset: 0-2565960 | Loss: 0.635 | 597 ms/step , 115648.87 GFLOP/s , 173758.3 tokens/s INFO:__main__:2024-11-30 03:59:16 | Epoch: 0 | Step: 317700 | Dataset: 0-2568360 | Loss: 0.644 | 598 ms/step , 115501.38 GFLOP/s , 173745.5 tokens/s INFO:__main__:2024-11-30 03:59:23 | Epoch: 0 | Step: 317710 | Dataset: 0-2570760 | Loss: 0.652 | 597 ms/step , 115534.75 GFLOP/s , 173645.3 tokens/s INFO:__main__:2024-11-30 03:59:30 | Epoch: 0 | Step: 317720 | Dataset: 0-2573160 | Loss: 0.546 | 597 ms/step , 115579.25 GFLOP/s , 173589.6 tokens/s INFO:__main__:2024-11-30 03:59:37 | Epoch: 0 | Step: 317730 | Dataset: 0-2575560 | Loss: 0.583 | 597 ms/step , 115563.95 GFLOP/s , 173642.3 tokens/s INFO:__main__:2024-11-30 03:59:45 | Epoch: 0 | Step: 317740 | Dataset: 0-2577960 | Loss: 0.640 | 598 ms/step , 115388.44 GFLOP/s , 173617.5 tokens/s INFO:__main__:2024-11-30 03:59:52 | Epoch: 0 | Step: 317750 | Dataset: 0-2580360 | Loss: 0.561 | 598 ms/step , 115321.56 GFLOP/s , 173621.0 tokens/s INFO:__main__:2024-11-30 03:59:59 | Epoch: 0 | Step: 317760 | Dataset: 0-2582760 | Loss: 0.606 | 597 ms/step , 115597.21 GFLOP/s , 173612.6 tokens/s INFO:__main__:2024-11-30 04:00:06 | Epoch: 0 | Step: 317770 | Dataset: 0-2585160 | Loss: 0.655 | 597 ms/step , 115677.24 GFLOP/s , 173749.4 tokens/s INFO:__main__:2024-11-30 04:00:13 | Epoch: 0 | Step: 317780 | Dataset: 0-2587560 | Loss: 0.628 | 598 ms/step , 115495.66 GFLOP/s , 173566.4 tokens/s INFO:__main__:2024-11-30 04:00:20 
| Epoch: 0 | Step: 317790 | Dataset: 0-2589960 | Loss: 0.683 | 598 ms/step , 115336.36 GFLOP/s , 173531.7 tokens/s INFO:__main__:2024-11-30 04:00:27 | Epoch: 0 | Step: 317800 | Dataset: 0-2592360 | Loss: 0.655 | 598 ms/step , 115335.82 GFLOP/s , 173558.4 tokens/s INFO:__main__:2024-11-30 04:00:34 | Epoch: 0 | Step: 317810 | Dataset: 0-2594760 | Loss: 0.615 | 598 ms/step , 115430.55 GFLOP/s , 173582.2 tokens/s INFO:__main__:2024-11-30 04:00:41 | Epoch: 0 | Step: 317820 | Dataset: 0-2597160 | Loss: 0.597 | 597 ms/step , 115630.78 GFLOP/s , 173570.2 tokens/s INFO:__main__:2024-11-30 04:00:48 | Epoch: 0 | Step: 317830 | Dataset: 0-2599560 | Loss: 0.663 | 598 ms/step , 115422.62 GFLOP/s , 173458.9 tokens/s INFO:__main__:2024-11-30 04:00:55 | Epoch: 0 | Step: 317840 | Dataset: 0-2601960 | Loss: 0.553 | 597 ms/step , 115668.28 GFLOP/s , 173675.7 tokens/s INFO:__main__:2024-11-30 04:01:02 | Epoch: 0 | Step: 317850 | Dataset: 0-2604360 | Loss: 0.610 | 597 ms/step , 115579.35 GFLOP/s , 173677.1 tokens/s INFO:__main__:2024-11-30 04:01:10 | Epoch: 0 | Step: 317860 | Dataset: 0-2606760 | Loss: 0.572 | 598 ms/step , 115475.00 GFLOP/s , 173537.4 tokens/s INFO:__main__:2024-11-30 04:01:17 | Epoch: 0 | Step: 317870 | Dataset: 0-2609160 | Loss: 0.594 | 597 ms/step , 115643.72 GFLOP/s , 173607.8 tokens/s INFO:__main__:2024-11-30 04:01:24 | Epoch: 0 | Step: 317880 | Dataset: 0-2611560 | Loss: 0.560 | 597 ms/step , 115573.54 GFLOP/s , 173684.9 tokens/s INFO:__main__:2024-11-30 04:01:31 | Epoch: 0 | Step: 317890 | Dataset: 0-2613960 | Loss: 0.603 | 597 ms/step , 115561.35 GFLOP/s , 173638.2 tokens/s INFO:__main__:2024-11-30 04:01:38 | Epoch: 0 | Step: 317900 | Dataset: 0-2616360 | Loss: 0.548 | 597 ms/step , 115533.41 GFLOP/s , 173648.9 tokens/s INFO:__main__:2024-11-30 04:01:45 | Epoch: 0 | Step: 317910 | Dataset: 0-2618760 | Loss: 0.655 | 596 ms/step , 115737.72 GFLOP/s , 173708.1 tokens/s INFO:__main__:2024-11-30 04:01:52 | Epoch: 0 | Step: 317920 | Dataset: 0-2621160 | Loss: 0.642 | 
597 ms/step , 115612.03 GFLOP/s , 173808.3 tokens/s INFO:__main__:2024-11-30 04:01:59 | Epoch: 0 | Step: 317930 | Dataset: 0-2623560 | Loss: 0.930 | 596 ms/step , 115735.06 GFLOP/s , 173798.7 tokens/s INFO:__main__:2024-11-30 04:02:06 | Epoch: 0 | Step: 317940 | Dataset: 0-2625960 | Loss: 0.980 | 597 ms/step , 115657.41 GFLOP/s , 173739.2 tokens/s INFO:__main__:2024-11-30 04:02:13 | Epoch: 0 | Step: 317950 | Dataset: 0-2628360 | Loss: 0.876 | 596 ms/step , 115713.80 GFLOP/s , 173719.7 tokens/s INFO:__main__:2024-11-30 04:02:20 | Epoch: 0 | Step: 317960 | Dataset: 0-2630760 | Loss: 0.973 | 596 ms/step , 115717.64 GFLOP/s , 173755.2 tokens/s INFO:__main__:2024-11-30 04:02:27 | Epoch: 0 | Step: 317970 | Dataset: 0-2633160 | Loss: 0.955 | 597 ms/step , 115675.04 GFLOP/s , 173765.5 tokens/s INFO:__main__:2024-11-30 04:02:34 | Epoch: 0 | Step: 317980 | Dataset: 0-2635560 | Loss: 0.898 | 597 ms/step , 115683.86 GFLOP/s , 173718.9 tokens/s INFO:__main__:2024-11-30 04:02:41 | Epoch: 0 | Step: 317990 | Dataset: 0-2637960 | Loss: 0.888 | 597 ms/step , 115675.01 GFLOP/s , 173822.2 tokens/s INFO:__main__:2024-11-30 04:02:49 | Validation | Step: 318000 | Val_loss: 1.005 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 04:02:49 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_040249_step_318000.pt` INFO:__main__:2024-11-30 04:02:52 | Epoch: 0 | Step: 318000 | Dataset: 0-2640360 | Loss: 0.908 | 595 ms/step , 116080.01 GFLOP/s , 118561.7 tokens/s INFO:__main__:2024-11-30 04:02:59 | Epoch: 0 | Step: 318010 | Dataset: 0-2642760 | Loss: 0.956 | 599 ms/step , 115236.71 GFLOP/s , 173252.3 tokens/s INFO:__main__:2024-11-30 04:03:06 | Epoch: 0 | Step: 318020 | Dataset: 0-2645160 | Loss: 0.839 | 598 ms/step , 115377.54 GFLOP/s , 173334.4 tokens/s INFO:__main__:2024-11-30 04:03:13 | Epoch: 0 | Step: 318030 | Dataset: 0-2647560 | Loss: 0.915 | 598 ms/step , 115392.42 GFLOP/s , 173429.5 tokens/s INFO:__main__:2024-11-30 04:03:20 | Epoch: 0 | Step: 318040 | 
Dataset: 0-2649960 | Loss: 0.918 | 597 ms/step , 115610.88 GFLOP/s , 173821.9 tokens/s INFO:__main__:2024-11-30 04:03:27 | Epoch: 0 | Step: 318050 | Dataset: 0-2652360 | Loss: 0.961 | 598 ms/step , 115445.03 GFLOP/s , 173829.2 tokens/s INFO:__main__:2024-11-30 04:03:34 | Epoch: 0 | Step: 318060 | Dataset: 0-2654760 | Loss: 0.941 | 596 ms/step , 115858.72 GFLOP/s , 173918.6 tokens/s INFO:__main__:2024-11-30 04:03:41 | Epoch: 0 | Step: 318070 | Dataset: 0-2657160 | Loss: 0.884 | 596 ms/step , 115784.31 GFLOP/s , 173928.2 tokens/s INFO:__main__:2024-11-30 04:03:48 | Epoch: 0 | Step: 318080 | Dataset: 0-2659560 | Loss: 0.863 | 597 ms/step , 115550.93 GFLOP/s , 173818.8 tokens/s INFO:__main__:2024-11-30 04:03:55 | Epoch: 0 | Step: 318090 | Dataset: 0-2661960 | Loss: 0.892 | 596 ms/step , 115700.60 GFLOP/s , 173807.5 tokens/s INFO:__main__:2024-11-30 04:04:03 | Epoch: 0 | Step: 318100 | Dataset: 0-2664360 | Loss: 0.811 | 597 ms/step , 115636.63 GFLOP/s , 173775.4 tokens/s INFO:__main__:2024-11-30 04:04:10 | Epoch: 0 | Step: 318110 | Dataset: 0-2666760 | Loss: 0.848 | 597 ms/step , 115658.98 GFLOP/s , 173747.0 tokens/s INFO:__main__:2024-11-30 04:04:17 | Epoch: 0 | Step: 318120 | Dataset: 0-2669160 | Loss: 0.865 | 597 ms/step , 115621.97 GFLOP/s , 173790.1 tokens/s INFO:__main__:2024-11-30 04:04:24 | Epoch: 0 | Step: 318130 | Dataset: 0-2671560 | Loss: 0.892 | 596 ms/step , 115776.40 GFLOP/s , 173830.6 tokens/s INFO:__main__:2024-11-30 04:04:31 | Epoch: 0 | Step: 318140 | Dataset: 0-2673960 | Loss: 0.946 | 596 ms/step , 115871.21 GFLOP/s , 173930.3 tokens/s INFO:__main__:2024-11-30 04:04:38 | Epoch: 0 | Step: 318150 | Dataset: 0-2676360 | Loss: 0.885 | 597 ms/step , 115515.86 GFLOP/s , 173850.3 tokens/s INFO:__main__:2024-11-30 04:04:45 | Epoch: 0 | Step: 318160 | Dataset: 0-2678760 | Loss: 0.924 | 596 ms/step , 115836.14 GFLOP/s , 173725.4 tokens/s INFO:__main__:2024-11-30 04:04:52 | Epoch: 0 | Step: 318170 | Dataset: 0-2681160 | Loss: 0.963 | 597 ms/step , 115519.55 
GFLOP/s , 173780.6 tokens/s INFO:__main__:2024-11-30 04:04:59 | Epoch: 0 | Step: 318180 | Dataset: 0-2683560 | Loss: 0.829 | 597 ms/step , 115689.09 GFLOP/s , 173790.7 tokens/s INFO:__main__:2024-11-30 04:05:06 | Epoch: 0 | Step: 318190 | Dataset: 0-2685960 | Loss: 0.938 | 597 ms/step , 115645.78 GFLOP/s , 173754.1 tokens/s INFO:__main__:2024-11-30 04:05:13 | Epoch: 0 | Step: 318200 | Dataset: 0-2688360 | Loss: 0.631 | 597 ms/step , 115529.72 GFLOP/s , 173766.2 tokens/s INFO:__main__:2024-11-30 04:05:20 | Epoch: 0 | Step: 318210 | Dataset: 0-2690760 | Loss: 0.637 | 597 ms/step , 115688.27 GFLOP/s , 173781.1 tokens/s INFO:__main__:2024-11-30 04:05:27 | Epoch: 0 | Step: 318220 | Dataset: 0-2693160 | Loss: 0.625 | 598 ms/step , 115494.24 GFLOP/s , 173822.7 tokens/s INFO:__main__:2024-11-30 04:05:34 | Epoch: 0 | Step: 318230 | Dataset: 0-2695560 | Loss: 0.530 | 597 ms/step , 115615.15 GFLOP/s , 173720.9 tokens/s INFO:__main__:2024-11-30 04:05:42 | Epoch: 0 | Step: 318240 | Dataset: 0-2697960 | Loss: 0.576 | 597 ms/step , 115517.63 GFLOP/s , 173654.6 tokens/s INFO:__main__:2024-11-30 04:05:49 | Epoch: 0 | Step: 318250 | Dataset: 0-2700360 | Loss: 0.567 | 597 ms/step , 115572.88 GFLOP/s , 173678.3 tokens/s INFO:__main__:2024-11-30 04:05:56 | Epoch: 0 | Step: 318260 | Dataset: 0-2702760 | Loss: 0.555 | 597 ms/step , 115647.71 GFLOP/s , 173667.0 tokens/s INFO:__main__:2024-11-30 04:06:03 | Epoch: 0 | Step: 318270 | Dataset: 0-2705160 | Loss: 0.531 | 597 ms/step , 115686.93 GFLOP/s , 173670.0 tokens/s INFO:__main__:2024-11-30 04:06:10 | Epoch: 0 | Step: 318280 | Dataset: 0-2707560 | Loss: 0.540 | 597 ms/step , 115622.28 GFLOP/s , 173659.3 tokens/s INFO:__main__:2024-11-30 04:06:17 | Epoch: 0 | Step: 318290 | Dataset: 0-2709960 | Loss: 0.532 | 597 ms/step , 115510.66 GFLOP/s , 173788.5 tokens/s INFO:__main__:2024-11-30 04:06:24 | Epoch: 0 | Step: 318300 | Dataset: 0-2712360 | Loss: 0.527 | 598 ms/step , 115485.81 GFLOP/s , 173785.1 tokens/s INFO:__main__:2024-11-30 04:06:31 
| Epoch: 0 | Step: 318310 | Dataset: 0-2714760 | Loss: 0.509 | 598 ms/step , 115445.37 GFLOP/s , 173646.2 tokens/s INFO:__main__:2024-11-30 04:06:38 | Epoch: 0 | Step: 318320 | Dataset: 0-2717160 | Loss: 0.507 | 597 ms/step , 115560.71 GFLOP/s , 173671.3 tokens/s INFO:__main__:2024-11-30 04:06:45 | Epoch: 0 | Step: 318330 | Dataset: 0-2719560 | Loss: 0.518 | 597 ms/step , 115621.30 GFLOP/s , 173669.4 tokens/s INFO:__main__:2024-11-30 04:06:52 | Epoch: 0 | Step: 318340 | Dataset: 0-2721960 | Loss: 0.490 | 597 ms/step , 115534.75 GFLOP/s , 173602.1 tokens/s INFO:__main__:2024-11-30 04:06:59 | Epoch: 0 | Step: 318350 | Dataset: 0-2724360 | Loss: 0.492 | 597 ms/step , 115619.45 GFLOP/s , 173642.4 tokens/s INFO:__main__:2024-11-30 04:07:06 | Epoch: 0 | Step: 318360 | Dataset: 0-2726760 | Loss: 0.501 | 597 ms/step , 115691.81 GFLOP/s , 173762.5 tokens/s INFO:__main__:2024-11-30 04:07:14 | Epoch: 0 | Step: 318370 | Dataset: 0-2729160 | Loss: 0.512 | 596 ms/step , 115786.11 GFLOP/s , 173807.0 tokens/s INFO:__main__:2024-11-30 04:07:21 | Epoch: 0 | Step: 318380 | Dataset: 0-2731560 | Loss: 0.506 | 598 ms/step , 115479.38 GFLOP/s , 173652.8 tokens/s INFO:__main__:2024-11-30 04:07:28 | Epoch: 0 | Step: 318390 | Dataset: 0-2733960 | Loss: 0.512 | 597 ms/step , 115529.38 GFLOP/s , 173715.6 tokens/s INFO:__main__:2024-11-30 04:07:35 | Epoch: 0 | Step: 318400 | Dataset: 0-2736360 | Loss: 0.469 | 597 ms/step , 115673.96 GFLOP/s , 173714.0 tokens/s INFO:__main__:2024-11-30 04:07:42 | Epoch: 0 | Step: 318410 | Dataset: 0-2738760 | Loss: 0.555 | 597 ms/step , 115608.30 GFLOP/s , 173655.0 tokens/s INFO:__main__:2024-11-30 04:07:49 | Epoch: 0 | Step: 318420 | Dataset: 0-2741160 | Loss: 0.488 | 597 ms/step , 115666.51 GFLOP/s , 173573.9 tokens/s INFO:__main__:2024-11-30 04:07:56 | Epoch: 0 | Step: 318430 | Dataset: 0-2743560 | Loss: 0.498 | 599 ms/step , 115296.90 GFLOP/s , 173631.0 tokens/s INFO:__main__:2024-11-30 04:08:03 | Epoch: 0 | Step: 318440 | Dataset: 0-2745960 | Loss: 0.476 | 
597 ms/step , 115559.49 GFLOP/s , 173793.4 tokens/s INFO:__main__:2024-11-30 04:08:10 | Epoch: 0 | Step: 318450 | Dataset: 0-2748360 | Loss: 0.494 | 596 ms/step , 115738.97 GFLOP/s , 173678.7 tokens/s INFO:__main__:2024-11-30 04:08:17 | Epoch: 0 | Step: 318460 | Dataset: 0-2750760 | Loss: 0.474 | 597 ms/step , 115557.72 GFLOP/s , 173545.4 tokens/s INFO:__main__:2024-11-30 04:08:24 | Epoch: 0 | Step: 318470 | Dataset: 0-2753160 | Loss: 0.749 | 597 ms/step , 115652.99 GFLOP/s , 173615.4 tokens/s INFO:__main__:2024-11-30 04:08:31 | Epoch: 0 | Step: 318480 | Dataset: 0-2755560 | Loss: 0.815 | 598 ms/step , 115384.14 GFLOP/s , 173651.5 tokens/s INFO:__main__:2024-11-30 04:08:38 | Epoch: 0 | Step: 318490 | Dataset: 0-2757960 | Loss: 0.690 | 598 ms/step , 115498.41 GFLOP/s , 173629.9 tokens/s INFO:__main__:2024-11-30 04:08:46 | Validation | Step: 318500 | Val_loss: 0.982 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 04:08:47 | Epoch: 0 | Step: 318500 | Dataset: 0-2760360 | Loss: 0.695 | 596 ms/step , 115869.36 GFLOP/s , 147671.4 tokens/s INFO:__main__:2024-11-30 04:08:54 | Epoch: 0 | Step: 318510 | Dataset: 0-2762760 | Loss: 0.763 | 597 ms/step , 115594.87 GFLOP/s , 173792.7 tokens/s INFO:__main__:2024-11-30 04:09:01 | Epoch: 0 | Step: 318520 | Dataset: 0-2765160 | Loss: 0.762 | 597 ms/step , 115526.72 GFLOP/s , 173786.7 tokens/s INFO:__main__:2024-11-30 04:09:08 | Epoch: 0 | Step: 318530 | Dataset: 0-2767560 | Loss: 0.790 | 597 ms/step , 115559.34 GFLOP/s , 173733.1 tokens/s INFO:__main__:2024-11-30 04:09:15 | Epoch: 0 | Step: 318540 | Dataset: 0-2769960 | Loss: 0.733 | 597 ms/step , 115625.66 GFLOP/s , 173629.6 tokens/s INFO:__main__:2024-11-30 04:09:22 | Epoch: 0 | Step: 318550 | Dataset: 0-2772360 | Loss: 0.662 | 597 ms/step , 115520.00 GFLOP/s , 173579.2 tokens/s INFO:__main__:2024-11-30 04:09:29 | Epoch: 0 | Step: 318560 | Dataset: 0-2774760 | Loss: 0.718 | 597 ms/step , 115568.87 GFLOP/s , 173636.0 tokens/s INFO:__main__:2024-11-30 04:09:36 | Epoch: 0 | Step: 
318570 | Dataset: 0-2777160 | Loss: 0.686 | 597 ms/step , 115510.77 GFLOP/s , 173632.5 tokens/s INFO:__main__:2024-11-30 04:09:43 | Epoch: 0 | Step: 318580 | Dataset: 0-2779560 | Loss: 0.684 | 598 ms/step , 115477.07 GFLOP/s , 173642.5 tokens/s INFO:__main__:2024-11-30 04:09:50 | Epoch: 0 | Step: 318590 | Dataset: 0-2781960 | Loss: 0.733 | 597 ms/step , 115612.78 GFLOP/s , 173806.1 tokens/s INFO:__main__:2024-11-30 04:09:58 | Epoch: 0 | Step: 318600 | Dataset: 0-2784360 | Loss: 0.701 | 597 ms/step , 115608.13 GFLOP/s , 173626.3 tokens/s INFO:__main__:2024-11-30 04:10:05 | Epoch: 0 | Step: 318610 | Dataset: 0-2786760 | Loss: 0.781 | 597 ms/step , 115569.02 GFLOP/s , 173666.1 tokens/s INFO:__main__:2024-11-30 04:10:12 | Epoch: 0 | Step: 318620 | Dataset: 0-2789160 | Loss: 0.816 | 598 ms/step , 115456.53 GFLOP/s , 173615.6 tokens/s INFO:__main__:2024-11-30 04:10:19 | Epoch: 0 | Step: 318630 | Dataset: 0-2791560 | Loss: 0.779 | 597 ms/step , 115551.46 GFLOP/s , 173625.7 tokens/s INFO:__main__:2024-11-30 04:10:26 | Epoch: 0 | Step: 318640 | Dataset: 0-2793960 | Loss: 0.810 | 597 ms/step , 115505.48 GFLOP/s , 173594.4 tokens/s INFO:__main__:2024-11-30 04:10:33 | Epoch: 0 | Step: 318650 | Dataset: 0-2796360 | Loss: 0.746 | 598 ms/step , 115475.36 GFLOP/s , 173657.0 tokens/s INFO:__main__:2024-11-30 04:10:40 | Epoch: 0 | Step: 318660 | Dataset: 0-2798760 | Loss: 0.656 | 596 ms/step , 115745.49 GFLOP/s , 173677.6 tokens/s INFO:__main__:2024-11-30 04:10:47 | Epoch: 0 | Step: 318670 | Dataset: 0-2801160 | Loss: 0.736 | 597 ms/step , 115586.06 GFLOP/s , 173759.1 tokens/s INFO:__main__:2024-11-30 04:10:54 | Epoch: 0 | Step: 318680 | Dataset: 0-2803560 | Loss: 0.743 | 598 ms/step , 115482.14 GFLOP/s , 173669.4 tokens/s INFO:__main__:2024-11-30 04:11:01 | Epoch: 0 | Step: 318690 | Dataset: 0-2805960 | Loss: 0.754 | 598 ms/step , 115408.63 GFLOP/s , 173618.7 tokens/s INFO:__main__:2024-11-30 04:11:08 | Epoch: 0 | Step: 318700 | Dataset: 0-2808360 | Loss: 0.678 | 597 ms/step , 
115622.26 GFLOP/s , 173598.1 tokens/s INFO:__main__:2024-11-30 04:11:15 | Epoch: 0 | Step: 318710 | Dataset: 0-2810760 | Loss: 0.713 | 597 ms/step , 115597.29 GFLOP/s , 173645.3 tokens/s INFO:__main__:2024-11-30 04:11:22 | Epoch: 0 | Step: 318720 | Dataset: 0-2813160 | Loss: 0.660 | 598 ms/step , 115436.97 GFLOP/s , 173672.7 tokens/s INFO:__main__:2024-11-30 04:11:30 | Epoch: 0 | Step: 318730 | Dataset: 0-2815560 | Loss: 0.845 | 597 ms/step , 115606.74 GFLOP/s , 173569.7 tokens/s INFO:__main__:2024-11-30 04:11:37 | Epoch: 0 | Step: 318740 | Dataset: 0-2817960 | Loss: 0.792 | 598 ms/step , 115438.84 GFLOP/s , 173765.0 tokens/s INFO:__main__:2024-11-30 04:11:44 | Epoch: 0 | Step: 318750 | Dataset: 0-2820360 | Loss: 0.721 | 596 ms/step , 115732.12 GFLOP/s , 173676.3 tokens/s INFO:__main__:2024-11-30 04:11:51 | Epoch: 0 | Step: 318760 | Dataset: 0-2822760 | Loss: 0.662 | 597 ms/step , 115683.74 GFLOP/s , 173618.1 tokens/s INFO:__main__:2024-11-30 04:11:58 | Epoch: 0 | Step: 318770 | Dataset: 0-2825160 | Loss: 0.636 | 598 ms/step , 115419.44 GFLOP/s , 173629.7 tokens/s INFO:__main__:2024-11-30 04:12:05 | Epoch: 0 | Step: 318780 | Dataset: 0-2827560 | Loss: 0.842 | 598 ms/step , 115310.01 GFLOP/s , 173610.5 tokens/s INFO:__main__:2024-11-30 04:12:12 | Epoch: 0 | Step: 318790 | Dataset: 0-2829960 | Loss: 0.711 | 597 ms/step , 115630.64 GFLOP/s , 173613.4 tokens/s INFO:__main__:2024-11-30 04:12:19 | Epoch: 0 | Step: 318800 | Dataset: 0-2832360 | Loss: 0.645 | 598 ms/step , 115492.84 GFLOP/s , 173636.2 tokens/s INFO:__main__:2024-11-30 04:12:26 | Epoch: 0 | Step: 318810 | Dataset: 0-2834760 | Loss: 0.814 | 597 ms/step , 115675.80 GFLOP/s , 173693.5 tokens/s INFO:__main__:2024-11-30 04:12:33 | Epoch: 0 | Step: 318820 | Dataset: 0-2837160 | Loss: 0.752 | 596 ms/step , 115699.50 GFLOP/s , 173703.2 tokens/s INFO:__main__:2024-11-30 04:12:40 | Epoch: 0 | Step: 318830 | Dataset: 0-2839560 | Loss: 0.623 | 598 ms/step , 115496.83 GFLOP/s , 173706.2 tokens/s INFO:__main__:2024-11-30 
04:12:47 | Epoch: 0 | Step: 318840 | Dataset: 0-2841960 | Loss: 0.681 | 598 ms/step , 115402.21 GFLOP/s , 173590.7 tokens/s INFO:__main__:2024-11-30 04:12:54 | Epoch: 0 | Step: 318850 | Dataset: 0-2844360 | Loss: 0.731 | 597 ms/step , 115524.27 GFLOP/s , 173621.2 tokens/s INFO:__main__:2024-11-30 04:13:01 | Epoch: 0 | Step: 318860 | Dataset: 0-2846760 | Loss: 0.687 | 597 ms/step , 115627.47 GFLOP/s , 173634.3 tokens/s INFO:__main__:2024-11-30 04:13:09 | Epoch: 0 | Step: 318870 | Dataset: 0-2849160 | Loss: 0.766 | 597 ms/step , 115540.39 GFLOP/s , 173652.1 tokens/s INFO:__main__:2024-11-30 04:13:16 | Epoch: 0 | Step: 318880 | Dataset: 0-2851560 | Loss: 0.775 | 597 ms/step , 115597.72 GFLOP/s , 173605.9 tokens/s INFO:__main__:2024-11-30 04:13:23 | Epoch: 0 | Step: 318890 | Dataset: 0-2853960 | Loss: 0.641 | 596 ms/step , 115739.29 GFLOP/s , 173733.5 tokens/s INFO:__main__:2024-11-30 04:13:30 | Epoch: 0 | Step: 318900 | Dataset: 0-2856360 | Loss: 0.670 | 596 ms/step , 115735.17 GFLOP/s , 173777.3 tokens/s INFO:__main__:2024-11-30 04:13:37 | Epoch: 0 | Step: 318910 | Dataset: 0-2858760 | Loss: 0.731 | 597 ms/step , 115658.29 GFLOP/s , 173604.2 tokens/s INFO:__main__:2024-11-30 04:13:44 | Epoch: 0 | Step: 318920 | Dataset: 0-2861160 | Loss: 0.808 | 597 ms/step , 115587.70 GFLOP/s , 173602.3 tokens/s INFO:__main__:2024-11-30 04:13:51 | Epoch: 0 | Step: 318930 | Dataset: 0-2863560 | Loss: 0.727 | 599 ms/step , 115189.51 GFLOP/s , 173609.8 tokens/s INFO:__main__:2024-11-30 04:13:58 | Epoch: 0 | Step: 318940 | Dataset: 0-2865960 | Loss: 0.714 | 598 ms/step , 115376.19 GFLOP/s , 173634.3 tokens/s INFO:__main__:2024-11-30 04:14:05 | Epoch: 0 | Step: 318950 | Dataset: 0-2868360 | Loss: 0.793 | 597 ms/step , 115556.59 GFLOP/s , 173635.0 tokens/s INFO:__main__:2024-11-30 04:14:12 | Epoch: 0 | Step: 318960 | Dataset: 0-2870760 | Loss: 0.716 | 597 ms/step , 115584.31 GFLOP/s , 173678.6 tokens/s INFO:__main__:2024-11-30 04:14:19 | Epoch: 0 | Step: 318970 | Dataset: 0-2873160 | 
Loss: 0.753 | 597 ms/step , 115626.03 GFLOP/s , 173736.0 tokens/s INFO:__main__:2024-11-30 04:14:26 | Epoch: 0 | Step: 318980 | Dataset: 0-2875560 | Loss: 0.586 | 598 ms/step , 115447.06 GFLOP/s , 173622.9 tokens/s INFO:__main__:2024-11-30 04:14:33 | Epoch: 0 | Step: 318990 | Dataset: 0-2877960 | Loss: 0.852 | 598 ms/step , 115493.33 GFLOP/s , 173595.2 tokens/s INFO:__main__:2024-11-30 04:14:41 | Validation | Step: 319000 | Val_loss: 0.972 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 04:14:41 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_041441_step_319000.pt` INFO:__main__:2024-11-30 04:14:44 | Epoch: 0 | Step: 319000 | Dataset: 0-2880360 | Loss: 0.763 | 595 ms/step , 115992.84 GFLOP/s , 118083.6 tokens/s INFO:__main__:2024-11-30 04:14:51 | Epoch: 0 | Step: 319010 | Dataset: 0-2882760 | Loss: 0.703 | 598 ms/step , 115325.29 GFLOP/s , 173322.1 tokens/s INFO:__main__:2024-11-30 04:14:58 | Epoch: 0 | Step: 319020 | Dataset: 0-2885160 | Loss: 0.102 | 596 ms/step , 115744.35 GFLOP/s , 173438.7 tokens/s INFO:__main__:2024-11-30 04:15:05 | Epoch: 0 | Step: 319030 | Dataset: 0-2887560 | Loss: 0.910 | 597 ms/step , 115668.54 GFLOP/s , 173893.8 tokens/s INFO:__main__:2024-11-30 04:15:12 | Epoch: 0 | Step: 319040 | Dataset: 0-2889960 | Loss: 0.894 | 598 ms/step , 115375.87 GFLOP/s , 173906.6 tokens/s INFO:__main__:2024-11-30 04:15:19 | Epoch: 0 | Step: 319050 | Dataset: 0-2892360 | Loss: 0.891 | 597 ms/step , 115605.80 GFLOP/s , 173756.3 tokens/s INFO:__main__:2024-11-30 04:15:26 | Epoch: 0 | Step: 319060 | Dataset: 0-2894760 | Loss: 0.888 | 598 ms/step , 115454.19 GFLOP/s , 173646.0 tokens/s INFO:__main__:2024-11-30 04:15:33 | Epoch: 0 | Step: 319070 | Dataset: 0-2897160 | Loss: 0.873 | 598 ms/step , 115495.39 GFLOP/s , 173723.1 tokens/s INFO:__main__:2024-11-30 04:15:41 | Epoch: 0 | Step: 319080 | Dataset: 0-2899560 | Loss: 0.863 | 597 ms/step , 115673.65 GFLOP/s , 173660.6 tokens/s INFO:__main__:2024-11-30 04:15:48 | Epoch: 0 | 
Step: 319090 | Dataset: 0-2901960 | Loss: 0.911 | 597 ms/step , 115548.92 GFLOP/s , 173656.1 tokens/s INFO:__main__:2024-11-30 04:15:55 | Epoch: 0 | Step: 319100 | Dataset: 0-2904360 | Loss: 0.868 | 598 ms/step , 115477.61 GFLOP/s , 173761.9 tokens/s INFO:__main__:2024-11-30 04:16:02 | Epoch: 0 | Step: 319110 | Dataset: 0-2906760 | Loss: 0.884 | 597 ms/step , 115650.48 GFLOP/s , 173763.5 tokens/s INFO:__main__:2024-11-30 04:16:09 | Epoch: 0 | Step: 319120 | Dataset: 0-2909160 | Loss: 0.842 | 597 ms/step , 115533.52 GFLOP/s , 173691.3 tokens/s INFO:__main__:2024-11-30 04:16:16 | Epoch: 0 | Step: 319130 | Dataset: 0-2911560 | Loss: 0.868 | 598 ms/step , 115396.64 GFLOP/s , 173663.0 tokens/s INFO:__main__:2024-11-30 04:16:23 | Epoch: 0 | Step: 319140 | Dataset: 0-2913960 | Loss: 0.821 | 598 ms/step , 115401.66 GFLOP/s , 173643.6 tokens/s INFO:__main__:2024-11-30 04:16:30 | Epoch: 0 | Step: 319150 | Dataset: 0-2916360 | Loss: 0.824 | 597 ms/step , 115684.97 GFLOP/s , 173703.7 tokens/s INFO:__main__:2024-11-30 04:16:37 | Epoch: 0 | Step: 319160 | Dataset: 0-2918760 | Loss: 0.813 | 597 ms/step , 115560.64 GFLOP/s , 173640.5 tokens/s INFO:__main__:2024-11-30 04:16:44 | Epoch: 0 | Step: 319170 | Dataset: 0-2921160 | Loss: 0.882 | 597 ms/step , 115584.10 GFLOP/s , 173586.9 tokens/s INFO:__main__:2024-11-30 04:16:51 | Epoch: 0 | Step: 319180 | Dataset: 0-2923560 | Loss: 0.840 | 597 ms/step , 115631.70 GFLOP/s , 173718.0 tokens/s INFO:__main__:2024-11-30 04:16:58 | Epoch: 0 | Step: 319190 | Dataset: 0-2925960 | Loss: 0.857 | 598 ms/step , 115471.78 GFLOP/s , 173745.1 tokens/s INFO:__main__:2024-11-30 04:17:05 | Epoch: 0 | Step: 319200 | Dataset: 0-2928360 | Loss: 0.841 | 597 ms/step , 115604.67 GFLOP/s , 173631.9 tokens/s INFO:__main__:2024-11-30 04:17:12 | Epoch: 0 | Step: 319210 | Dataset: 0-2930760 | Loss: 0.859 | 598 ms/step , 115452.46 GFLOP/s , 173588.6 tokens/s INFO:__main__:2024-11-30 04:17:20 | Epoch: 0 | Step: 319220 | Dataset: 0-2933160 | Loss: 0.845 | 597 ms/step 
, 115522.41 GFLOP/s , 173575.7 tokens/s INFO:__main__:2024-11-30 04:17:27 | Epoch: 0 | Step: 319230 | Dataset: 0-2935560 | Loss: 0.835 | 597 ms/step , 115580.57 GFLOP/s , 173566.4 tokens/s INFO:__main__:2024-11-30 04:17:34 | Epoch: 0 | Step: 319240 | Dataset: 0-2937960 | Loss: 0.846 | 598 ms/step , 115446.69 GFLOP/s , 173551.5 tokens/s INFO:__main__:2024-11-30 04:17:41 | Epoch: 0 | Step: 319250 | Dataset: 0-2940360 | Loss: 0.848 | 597 ms/step , 115554.12 GFLOP/s , 173643.7 tokens/s INFO:__main__:2024-11-30 04:17:48 | Epoch: 0 | Step: 319260 | Dataset: 0-2942760 | Loss: 0.866 | 597 ms/step , 115595.26 GFLOP/s , 173746.9 tokens/s INFO:__main__:2024-11-30 04:17:55 | Epoch: 0 | Step: 319270 | Dataset: 0-2945160 | Loss: 0.812 | 598 ms/step , 115420.95 GFLOP/s , 173598.8 tokens/s INFO:__main__:2024-11-30 04:18:02 | Epoch: 0 | Step: 319280 | Dataset: 0-2947560 | Loss: 0.827 | 597 ms/step , 115583.96 GFLOP/s , 173624.5 tokens/s INFO:__main__:2024-11-30 04:18:09 | Epoch: 0 | Step: 319290 | Dataset: 0-2949960 | Loss: 0.853 | 598 ms/step , 115438.19 GFLOP/s , 173562.2 tokens/s INFO:__main__:2024-11-30 04:18:16 | Epoch: 0 | Step: 319300 | Dataset: 0-2952360 | Loss: 0.838 | 597 ms/step , 115510.05 GFLOP/s , 173565.0 tokens/s INFO:__main__:2024-11-30 04:18:23 | Epoch: 0 | Step: 319310 | Dataset: 0-2954760 | Loss: 0.804 | 597 ms/step , 115582.10 GFLOP/s , 173578.9 tokens/s INFO:__main__:2024-11-30 04:18:30 | Epoch: 0 | Step: 319320 | Dataset: 0-2957160 | Loss: 0.816 | 597 ms/step , 115561.17 GFLOP/s , 173555.8 tokens/s INFO:__main__:2024-11-30 04:18:37 | Epoch: 0 | Step: 319330 | Dataset: 0-2959560 | Loss: 0.796 | 596 ms/step , 115716.98 GFLOP/s , 173676.4 tokens/s INFO:__main__:2024-11-30 04:18:44 | Epoch: 0 | Step: 319340 | Dataset: 0-2961960 | Loss: 0.869 | 597 ms/step , 115604.20 GFLOP/s , 173715.8 tokens/s INFO:__main__:2024-11-30 04:18:52 | Epoch: 0 | Step: 319350 | Dataset: 0-2964360 | Loss: 0.811 | 598 ms/step , 115460.58 GFLOP/s , 173536.8 tokens/s 
INFO:__main__:2024-11-30 04:18:59 | Epoch: 0 | Step: 319360 | Dataset: 0-2966760 | Loss: 0.829 | 598 ms/step , 115460.16 GFLOP/s , 173613.3 tokens/s INFO:__main__:2024-11-30 04:19:06 | Epoch: 0 | Step: 319370 | Dataset: 0-2969160 | Loss: 0.842 | 598 ms/step , 115441.11 GFLOP/s , 173540.8 tokens/s INFO:__main__:2024-11-30 04:19:13 | Epoch: 0 | Step: 319380 | Dataset: 0-2971560 | Loss: 0.821 | 597 ms/step , 115533.83 GFLOP/s , 173615.4 tokens/s INFO:__main__:2024-11-30 04:19:20 | Epoch: 0 | Step: 319390 | Dataset: 0-2973960 | Loss: 0.858 | 597 ms/step , 115515.64 GFLOP/s , 173577.8 tokens/s INFO:__main__:2024-11-30 04:19:27 | Epoch: 0 | Step: 319400 | Dataset: 0-2976360 | Loss: 0.785 | 599 ms/step , 115262.80 GFLOP/s , 173584.4 tokens/s INFO:__main__:2024-11-30 04:19:34 | Epoch: 0 | Step: 319410 | Dataset: 0-2978760 | Loss: 0.828 | 597 ms/step , 115558.58 GFLOP/s , 173690.4 tokens/s INFO:__main__:2024-11-30 04:19:41 | Epoch: 0 | Step: 319420 | Dataset: 0-2981160 | Loss: 0.813 | 598 ms/step , 115445.64 GFLOP/s , 173678.6 tokens/s INFO:__main__:2024-11-30 04:19:48 | Epoch: 0 | Step: 319430 | Dataset: 0-2983560 | Loss: 0.811 | 598 ms/step , 115381.87 GFLOP/s , 173570.2 tokens/s INFO:__main__:2024-11-30 04:19:55 | Epoch: 0 | Step: 319440 | Dataset: 0-2985960 | Loss: 0.846 | 598 ms/step , 115392.01 GFLOP/s , 173549.9 tokens/s INFO:__main__:2024-11-30 04:20:02 | Epoch: 0 | Step: 319450 | Dataset: 0-2988360 | Loss: 0.837 | 598 ms/step , 115462.72 GFLOP/s , 173661.3 tokens/s INFO:__main__:2024-11-30 04:20:09 | Epoch: 0 | Step: 319460 | Dataset: 0-2990760 | Loss: 0.779 | 597 ms/step , 115526.48 GFLOP/s , 173594.8 tokens/s INFO:__main__:2024-11-30 04:20:17 | Epoch: 0 | Step: 319470 | Dataset: 0-2993160 | Loss: 0.774 | 597 ms/step , 115543.04 GFLOP/s , 173554.3 tokens/s INFO:__main__:2024-11-30 04:20:24 | Epoch: 0 | Step: 319480 | Dataset: 0-2995560 | Loss: 0.816 | 597 ms/step , 115615.35 GFLOP/s , 173652.5 tokens/s INFO:__main__:2024-11-30 04:20:31 | Epoch: 0 | Step: 319490 | 
Dataset: 0-2997960 | Loss: 0.808 | 596 ms/step , 115743.11 GFLOP/s , 173724.7 tokens/s INFO:__main__:2024-11-30 04:20:38 | Validation | Step: 319500 | Val_loss: 0.947 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 04:20:39 | Epoch: 0 | Step: 319500 | Dataset: 0-3000360 | Loss: 0.808 | 597 ms/step , 115623.87 GFLOP/s , 147656.9 tokens/s INFO:__main__:2024-11-30 04:20:46 | Epoch: 0 | Step: 319510 | Dataset: 0-3002760 | Loss: 0.796 | 598 ms/step , 115473.63 GFLOP/s , 173711.4 tokens/s INFO:__main__:2024-11-30 04:20:53 | Epoch: 0 | Step: 319520 | Dataset: 0-3005160 | Loss: 0.824 | 597 ms/step , 115511.46 GFLOP/s , 173704.8 tokens/s INFO:__main__:2024-11-30 04:21:00 | Epoch: 0 | Step: 319530 | Dataset: 0-3007560 | Loss: 0.845 | 597 ms/step , 115528.57 GFLOP/s , 173648.0 tokens/s INFO:__main__:2024-11-30 04:21:07 | Epoch: 0 | Step: 319540 | Dataset: 0-3009960 | Loss: 0.785 | 597 ms/step , 115585.86 GFLOP/s , 173622.3 tokens/s INFO:__main__:2024-11-30 04:21:14 | Epoch: 0 | Step: 319550 | Dataset: 0-3012360 | Loss: 0.796 | 597 ms/step , 115657.81 GFLOP/s , 173679.3 tokens/s INFO:__main__:2024-11-30 04:21:21 | Epoch: 0 | Step: 319560 | Dataset: 0-3014760 | Loss: 0.852 | 596 ms/step , 115707.85 GFLOP/s , 173840.5 tokens/s INFO:__main__:2024-11-30 04:21:28 | Epoch: 0 | Step: 319570 | Dataset: 0-3017160 | Loss: 0.741 | 597 ms/step , 115663.67 GFLOP/s , 173760.6 tokens/s INFO:__main__:2024-11-30 04:21:36 | Epoch: 0 | Step: 319580 | Dataset: 0-3019560 | Loss: 0.701 | 598 ms/step , 115464.26 GFLOP/s , 173702.1 tokens/s INFO:__main__:2024-11-30 04:21:43 | Epoch: 0 | Step: 319590 | Dataset: 0-3021960 | Loss: 0.794 | 598 ms/step , 115442.05 GFLOP/s , 173709.6 tokens/s INFO:__main__:2024-11-30 04:21:50 | Epoch: 0 | Step: 319600 | Dataset: 0-3024360 | Loss: 0.683 | 597 ms/step , 115548.76 GFLOP/s , 173648.0 tokens/s INFO:__main__:2024-11-30 04:21:57 | Epoch: 0 | Step: 319610 | Dataset: 0-3026760 | Loss: 0.750 | 596 ms/step , 115804.68 GFLOP/s , 173688.5 tokens/s 
INFO:__main__:2024-11-30 04:22:04 | Epoch: 0 | Step: 319620 | Dataset: 0-3029160 | Loss: 0.685 | 598 ms/step , 115398.42 GFLOP/s , 173692.2 tokens/s INFO:__main__:2024-11-30 04:22:11 | Epoch: 0 | Step: 319630 | Dataset: 0-3031560 | Loss: 0.789 | 596 ms/step , 115728.78 GFLOP/s , 173826.4 tokens/s INFO:__main__:2024-11-30 04:22:18 | Epoch: 0 | Step: 319640 | Dataset: 0-3033960 | Loss: 0.792 | 597 ms/step , 115511.97 GFLOP/s , 173787.3 tokens/s INFO:__main__:2024-11-30 04:22:25 | Epoch: 0 | Step: 319650 | Dataset: 0-3036360 | Loss: 0.731 | 598 ms/step , 115463.64 GFLOP/s , 173663.1 tokens/s INFO:__main__:2024-11-30 04:22:32 | Epoch: 0 | Step: 319660 | Dataset: 0-3038760 | Loss: 0.675 | 597 ms/step , 115559.28 GFLOP/s , 173649.7 tokens/s INFO:__main__:2024-11-30 04:22:39 | Epoch: 0 | Step: 319670 | Dataset: 0-3041160 | Loss: 0.696 | 597 ms/step , 115615.99 GFLOP/s , 173702.0 tokens/s INFO:__main__:2024-11-30 04:22:46 | Epoch: 0 | Step: 319680 | Dataset: 0-3043560 | Loss: 0.789 | 597 ms/step , 115550.12 GFLOP/s , 173685.4 tokens/s INFO:__main__:2024-11-30 04:22:53 | Epoch: 0 | Step: 319690 | Dataset: 0-3045960 | Loss: 0.709 | 597 ms/step , 115601.70 GFLOP/s , 173719.7 tokens/s INFO:__main__:2024-11-30 04:23:00 | Epoch: 0 | Step: 319700 | Dataset: 0-3048360 | Loss: 0.686 | 596 ms/step , 115720.15 GFLOP/s , 173745.7 tokens/s INFO:__main__:2024-11-30 04:23:08 | Epoch: 0 | Step: 319710 | Dataset: 0-3050760 | Loss: 0.715 | 597 ms/step , 115665.92 GFLOP/s , 173865.2 tokens/s INFO:__main__:2024-11-30 04:23:15 | Epoch: 0 | Step: 319720 | Dataset: 0-3053160 | Loss: 0.716 | 597 ms/step , 115620.65 GFLOP/s , 173782.8 tokens/s INFO:__main__:2024-11-30 04:23:22 | Epoch: 0 | Step: 319730 | Dataset: 0-3055560 | Loss: 0.806 | 597 ms/step , 115514.57 GFLOP/s , 173672.0 tokens/s INFO:__main__:2024-11-30 04:23:29 | Epoch: 0 | Step: 319740 | Dataset: 0-3057960 | Loss: 0.745 | 598 ms/step , 115493.17 GFLOP/s , 173711.7 tokens/s INFO:__main__:2024-11-30 04:23:36 | Epoch: 0 | Step: 319750 | 
Dataset: 0-3060360 | Loss: 0.742 | 597 ms/step , 115675.47 GFLOP/s , 173649.5 tokens/s INFO:__main__:2024-11-30 04:23:43 | Epoch: 0 | Step: 319760 | Dataset: 0-3062760 | Loss: 0.694 | 597 ms/step , 115674.64 GFLOP/s , 173711.7 tokens/s INFO:__main__:2024-11-30 04:23:50 | Epoch: 0 | Step: 319770 | Dataset: 0-3065160 | Loss: 0.725 | 597 ms/step , 115512.50 GFLOP/s , 173693.0 tokens/s INFO:__main__:2024-11-30 04:23:57 | Epoch: 0 | Step: 319780 | Dataset: 0-3067560 | Loss: 0.702 | 596 ms/step , 115721.53 GFLOP/s , 173781.2 tokens/s INFO:__main__:2024-11-30 04:24:04 | Epoch: 0 | Step: 319790 | Dataset: 0-3069960 | Loss: 0.835 | 597 ms/step , 115550.83 GFLOP/s , 173778.0 tokens/s INFO:__main__:2024-11-30 04:24:11 | Epoch: 0 | Step: 319800 | Dataset: 0-3072360 | Loss: 0.640 | 597 ms/step , 115624.34 GFLOP/s , 173688.3 tokens/s INFO:__main__:2024-11-30 04:24:18 | Epoch: 0 | Step: 319810 | Dataset: 0-3074760 | Loss: 0.681 | 597 ms/step , 115637.63 GFLOP/s , 173671.0 tokens/s INFO:__main__:2024-11-30 04:24:25 | Epoch: 0 | Step: 319820 | Dataset: 0-3077160 | Loss: 0.696 | 597 ms/step , 115658.11 GFLOP/s , 173709.3 tokens/s INFO:__main__:2024-11-30 04:24:32 | Epoch: 0 | Step: 319830 | Dataset: 0-3079560 | Loss: 0.676 | 598 ms/step , 115432.03 GFLOP/s , 173645.0 tokens/s INFO:__main__:2024-11-30 04:24:39 | Epoch: 0 | Step: 319840 | Dataset: 0-3081960 | Loss: 0.712 | 597 ms/step , 115506.64 GFLOP/s , 173726.4 tokens/s INFO:__main__:2024-11-30 04:24:47 | Epoch: 0 | Step: 319850 | Dataset: 0-3084360 | Loss: 0.701 | 597 ms/step , 115670.47 GFLOP/s , 173738.4 tokens/s INFO:__main__:2024-11-30 04:24:54 | Epoch: 0 | Step: 319860 | Dataset: 0-3086760 | Loss: 0.651 | 596 ms/step , 115699.67 GFLOP/s , 173772.9 tokens/s INFO:__main__:2024-11-30 04:25:01 | Epoch: 0 | Step: 319870 | Dataset: 0-3089160 | Loss: 0.745 | 597 ms/step , 115642.99 GFLOP/s , 173768.4 tokens/s INFO:__main__:2024-11-30 04:25:08 | Epoch: 0 | Step: 319880 | Dataset: 0-3091560 | Loss: 0.704 | 597 ms/step , 115570.77 
GFLOP/s , 173670.5 tokens/s INFO:__main__:2024-11-30 04:25:15 | Epoch: 0 | Step: 319890 | Dataset: 0-3093960 | Loss: 0.685 | 596 ms/step , 115728.74 GFLOP/s , 173530.6 tokens/s INFO:__main__:2024-11-30 04:25:22 | Epoch: 0 | Step: 319900 | Dataset: 0-3096360 | Loss: 0.722 | 598 ms/step , 115497.53 GFLOP/s , 173671.6 tokens/s INFO:__main__:2024-11-30 04:25:29 | Epoch: 0 | Step: 319910 | Dataset: 0-3098760 | Loss: 0.813 | 597 ms/step , 115663.08 GFLOP/s , 173698.2 tokens/s INFO:__main__:2024-11-30 04:25:36 | Epoch: 0 | Step: 319920 | Dataset: 0-3101160 | Loss: 0.702 | 597 ms/step , 115584.06 GFLOP/s , 173655.5 tokens/s INFO:__main__:2024-11-30 04:25:43 | Epoch: 0 | Step: 319930 | Dataset: 0-3103560 | Loss: 0.641 | 597 ms/step , 115680.28 GFLOP/s , 173783.2 tokens/s INFO:__main__:2024-11-30 04:25:50 | Epoch: 0 | Step: 319940 | Dataset: 0-3105960 | Loss: 0.676 | 597 ms/step , 115619.41 GFLOP/s , 173785.9 tokens/s INFO:__main__:2024-11-30 04:25:57 | Epoch: 0 | Step: 319950 | Dataset: 0-3108360 | Loss: 0.671 | 597 ms/step , 115650.42 GFLOP/s , 173678.9 tokens/s INFO:__main__:2024-11-30 04:26:04 | Epoch: 0 | Step: 319960 | Dataset: 0-3110760 | Loss: 0.732 | 597 ms/step , 115638.78 GFLOP/s , 173697.2 tokens/s INFO:__main__:2024-11-30 04:26:11 | Epoch: 0 | Step: 319970 | Dataset: 0-3113160 | Loss: 0.725 | 598 ms/step , 115481.95 GFLOP/s , 173642.9 tokens/s INFO:__main__:2024-11-30 04:26:19 | Epoch: 0 | Step: 319980 | Dataset: 0-3115560 | Loss: 0.711 | 597 ms/step , 115510.40 GFLOP/s , 173693.1 tokens/s INFO:__main__:2024-11-30 04:26:26 | Epoch: 0 | Step: 319990 | Dataset: 0-3117960 | Loss: 0.743 | 597 ms/step , 115594.06 GFLOP/s , 173683.2 tokens/s INFO:__main__:2024-11-30 04:26:33 | Validation | Step: 320000 | Val_loss: 0.943 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 04:26:33 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_042633_step_320000.pt` INFO:__main__:2024-11-30 04:26:36 | Epoch: 0 | Step: 320000 | Dataset: 0-3120360 | Loss: 
0.738 | 596 ms/step , 115852.44 GFLOP/s , 120837.4 tokens/s INFO:__main__:2024-11-30 04:26:43 | Epoch: 0 | Step: 320010 | Dataset: 0-3122760 | Loss: 0.711 | 598 ms/step , 115370.50 GFLOP/s , 173384.9 tokens/s INFO:__main__:2024-11-30 04:26:50 | Epoch: 0 | Step: 320020 | Dataset: 0-3125160 | Loss: 0.724 | 599 ms/step , 115258.37 GFLOP/s , 173306.6 tokens/s INFO:__main__:2024-11-30 04:26:57 | Epoch: 0 | Step: 320030 | Dataset: 0-3127560 | Loss: 0.728 | 597 ms/step , 115554.13 GFLOP/s , 173626.1 tokens/s INFO:__main__:2024-11-30 04:27:04 | Epoch: 0 | Step: 320040 | Dataset: 0-3129960 | Loss: 0.746 | 597 ms/step , 115673.11 GFLOP/s , 173769.0 tokens/s INFO:__main__:2024-11-30 04:27:11 | Epoch: 0 | Step: 320050 | Dataset: 0-3132360 | Loss: 0.708 | 597 ms/step , 115554.96 GFLOP/s , 173823.4 tokens/s INFO:__main__:2024-11-30 04:27:18 | Epoch: 0 | Step: 320060 | Dataset: 0-3134760 | Loss: 0.835 | 596 ms/step , 115749.22 GFLOP/s , 173733.5 tokens/s INFO:__main__:2024-11-30 04:27:25 | Epoch: 0 | Step: 320070 | Dataset: 0-3137160 | Loss: 0.701 | 596 ms/step , 115748.87 GFLOP/s , 173826.9 tokens/s INFO:__main__:2024-11-30 04:27:32 | Epoch: 0 | Step: 320080 | Dataset: 0-3139560 | Loss: 0.694 | 596 ms/step , 115814.56 GFLOP/s , 173826.8 tokens/s INFO:__main__:2024-11-30 04:27:39 | Epoch: 0 | Step: 320090 | Dataset: 0-3141960 | Loss: 0.686 | 598 ms/step , 115487.07 GFLOP/s , 173818.3 tokens/s INFO:__main__:2024-11-30 04:27:47 | Epoch: 0 | Step: 320100 | Dataset: 0-3144360 | Loss: 0.705 | 598 ms/step , 115490.35 GFLOP/s , 173655.7 tokens/s INFO:__main__:2024-11-30 04:27:54 | Epoch: 0 | Step: 320110 | Dataset: 0-3146760 | Loss: 0.430 | 597 ms/step , 115632.48 GFLOP/s , 173716.0 tokens/s INFO:__main__:2024-11-30 04:28:01 | Epoch: 0 | Step: 320120 | Dataset: 0-3149160 | Loss: 0.359 | 597 ms/step , 115642.88 GFLOP/s , 173711.9 tokens/s INFO:__main__:2024-11-30 04:28:08 | Epoch: 0 | Step: 320130 | Dataset: 0-3151560 | Loss: 0.376 | 597 ms/step , 115580.71 GFLOP/s , 173802.3 tokens/s 
INFO:__main__:2024-11-30 04:28:15 | Epoch: 0 | Step: 320140 | Dataset: 0-3153960 | Loss: 0.415 | 598 ms/step , 115498.63 GFLOP/s , 173737.9 tokens/s INFO:__main__:2024-11-30 04:28:22 | Epoch: 0 | Step: 320150 | Dataset: 0-3156360 | Loss: 0.377 | 597 ms/step , 115579.71 GFLOP/s , 173819.2 tokens/s INFO:__main__:2024-11-30 04:28:29 | Epoch: 0 | Step: 320160 | Dataset: 0-3158760 | Loss: 0.414 | 597 ms/step , 115612.87 GFLOP/s , 173863.5 tokens/s INFO:__main__:2024-11-30 04:28:36 | Epoch: 0 | Step: 320170 | Dataset: 0-3161160 | Loss: 0.415 | 597 ms/step , 115568.77 GFLOP/s , 173772.9 tokens/s INFO:__main__:2024-11-30 04:28:43 | Epoch: 0 | Step: 320180 | Dataset: 0-3163560 | Loss: 0.429 | 596 ms/step , 115701.58 GFLOP/s , 173776.9 tokens/s INFO:__main__:2024-11-30 04:28:50 | Epoch: 0 | Step: 320190 | Dataset: 0-3165960 | Loss: 0.416 | 597 ms/step , 115680.70 GFLOP/s , 173729.9 tokens/s INFO:__main__:2024-11-30 04:28:57 | Epoch: 0 | Step: 320200 | Dataset: 0-3168360 | Loss: 0.394 | 596 ms/step , 115719.00 GFLOP/s , 173783.0 tokens/s INFO:__main__:2024-11-30 04:29:04 | Epoch: 0 | Step: 320210 | Dataset: 0-3170760 | Loss: 0.416 | 597 ms/step , 115514.02 GFLOP/s , 173702.6 tokens/s INFO:__main__:2024-11-30 04:29:11 | Epoch: 0 | Step: 320220 | Dataset: 0-3173160 | Loss: 0.374 | 597 ms/step , 115568.10 GFLOP/s , 173779.2 tokens/s INFO:__main__:2024-11-30 04:29:18 | Epoch: 0 | Step: 320230 | Dataset: 0-3175560 | Loss: 0.412 | 596 ms/step , 115724.15 GFLOP/s , 173915.7 tokens/s INFO:__main__:2024-11-30 04:29:26 | Epoch: 0 | Step: 320240 | Dataset: 0-3177960 | Loss: 0.371 | 596 ms/step , 115752.87 GFLOP/s , 173831.2 tokens/s INFO:__main__:2024-11-30 04:29:33 | Epoch: 0 | Step: 320250 | Dataset: 0-3180360 | Loss: 0.341 | 597 ms/step , 115600.72 GFLOP/s , 173740.4 tokens/s INFO:__main__:2024-11-30 04:29:40 | Epoch: 0 | Step: 320260 | Dataset: 0-3182760 | Loss: 0.385 | 597 ms/step , 115587.25 GFLOP/s , 173759.3 tokens/s INFO:__main__:2024-11-30 04:29:47 | Epoch: 0 | Step: 320270 | 
Dataset: 0-3185160 | Loss: 0.384 | 596 ms/step , 115845.34 GFLOP/s , 173717.3 tokens/s INFO:__main__:2024-11-30 04:29:54 | Epoch: 0 | Step: 320280 | Dataset: 0-3187560 | Loss: 0.421 | 597 ms/step , 115673.96 GFLOP/s , 173752.7 tokens/s INFO:__main__:2024-11-30 04:30:01 | Epoch: 0 | Step: 320290 | Dataset: 0-3189960 | Loss: 0.407 | 598 ms/step , 115480.19 GFLOP/s , 173731.1 tokens/s INFO:__main__:2024-11-30 04:30:08 | Epoch: 0 | Step: 320300 | Dataset: 0-3192360 | Loss: 0.351 | 597 ms/step , 115687.98 GFLOP/s , 173788.5 tokens/s INFO:__main__:2024-11-30 04:30:15 | Epoch: 0 | Step: 320310 | Dataset: 0-3194760 | Loss: 0.372 | 596 ms/step , 115768.92 GFLOP/s , 173877.0 tokens/s INFO:__main__:2024-11-30 04:30:22 | Epoch: 0 | Step: 320320 | Dataset: 0-3197160 | Loss: 0.386 | 597 ms/step , 115694.16 GFLOP/s , 173808.7 tokens/s INFO:__main__:2024-11-30 04:30:29 | Epoch: 0 | Step: 320330 | Dataset: 0-3199560 | Loss: 0.423 | 597 ms/step , 115540.46 GFLOP/s , 173798.1 tokens/s INFO:__main__:2024-11-30 04:30:36 | Epoch: 0 | Step: 320340 | Dataset: 0-3201960 | Loss: 0.408 | 597 ms/step , 115646.99 GFLOP/s , 173821.7 tokens/s INFO:__main__:2024-11-30 04:30:43 | Epoch: 0 | Step: 320350 | Dataset: 0-3204360 | Loss: 0.361 | 598 ms/step , 115439.71 GFLOP/s , 173727.5 tokens/s INFO:__main__:2024-11-30 04:30:50 | Epoch: 0 | Step: 320360 | Dataset: 0-3206760 | Loss: 0.424 | 596 ms/step , 115740.84 GFLOP/s , 173774.2 tokens/s INFO:__main__:2024-11-30 04:30:57 | Epoch: 0 | Step: 320370 | Dataset: 0-3209160 | Loss: 0.390 | 596 ms/step , 115792.40 GFLOP/s , 173828.2 tokens/s INFO:__main__:2024-11-30 04:31:05 | Epoch: 0 | Step: 320380 | Dataset: 0-3211560 | Loss: 0.395 | 596 ms/step , 115800.78 GFLOP/s , 173960.0 tokens/s INFO:__main__:2024-11-30 04:31:12 | Epoch: 0 | Step: 320390 | Dataset: 0-3213960 | Loss: 0.389 | 597 ms/step , 115560.46 GFLOP/s , 173750.8 tokens/s INFO:__main__:2024-11-30 04:31:19 | Epoch: 0 | Step: 320400 | Dataset: 0-3216360 | Loss: 0.371 | 597 ms/step , 115645.78 
GFLOP/s , 173767.6 tokens/s INFO:__main__:2024-11-30 04:31:26 | Epoch: 0 | Step: 320410 | Dataset: 0-3218760 | Loss: 0.395 | 598 ms/step , 115498.51 GFLOP/s , 173778.2 tokens/s INFO:__main__:2024-11-30 04:31:33 | Epoch: 0 | Step: 320420 | Dataset: 0-3221160 | Loss: 0.444 | 598 ms/step , 115479.07 GFLOP/s , 173751.2 tokens/s INFO:__main__:2024-11-30 04:31:40 | Epoch: 0 | Step: 320430 | Dataset: 0-3223560 | Loss: 0.408 | 596 ms/step , 115714.66 GFLOP/s , 173772.5 tokens/s INFO:__main__:2024-11-30 04:31:47 | Epoch: 0 | Step: 320440 | Dataset: 0-3225960 | Loss: 0.443 | 597 ms/step , 115689.49 GFLOP/s , 173802.8 tokens/s INFO:__main__:2024-11-30 04:31:54 | Epoch: 0 | Step: 320450 | Dataset: 0-3228360 | Loss: 0.384 | 596 ms/step , 115886.01 GFLOP/s , 173848.7 tokens/s INFO:__main__:2024-11-30 04:32:01 | Epoch: 0 | Step: 320460 | Dataset: 0-3230760 | Loss: 0.346 | 597 ms/step , 115670.91 GFLOP/s , 173913.1 tokens/s INFO:__main__:2024-11-30 04:32:08 | Epoch: 0 | Step: 320470 | Dataset: 0-3233160 | Loss: 0.327 | 597 ms/step , 115550.63 GFLOP/s , 173807.7 tokens/s INFO:__main__:2024-11-30 04:32:15 | Epoch: 0 | Step: 320480 | Dataset: 0-3235560 | Loss: 0.351 | 596 ms/step , 115717.57 GFLOP/s , 173771.0 tokens/s INFO:__main__:2024-11-30 04:32:22 | Epoch: 0 | Step: 320490 | Dataset: 0-3237960 | Loss: 0.428 | 598 ms/step , 115472.30 GFLOP/s , 173747.8 tokens/s INFO:__main__:2024-11-30 04:32:30 | Validation | Step: 320500 | Val_loss: 0.968 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 04:32:31 | Epoch: 0 | Step: 320500 | Dataset: 0-3240360 | Loss: 0.395 | 596 ms/step , 115816.83 GFLOP/s , 147770.1 tokens/s INFO:__main__:2024-11-30 04:32:38 | Epoch: 0 | Step: 320510 | Dataset: 0-3242760 | Loss: 0.386 | 597 ms/step , 115665.78 GFLOP/s , 173848.1 tokens/s INFO:__main__:2024-11-30 04:32:45 | Epoch: 0 | Step: 320520 | Dataset: 0-3245160 | Loss: 0.379 | 596 ms/step , 115764.73 GFLOP/s , 173871.7 tokens/s INFO:__main__:2024-11-30 04:32:52 | Epoch: 0 | Step: 320530 | Dataset: 
0-3247560 | Loss: 0.319 | 597 ms/step , 115694.56 GFLOP/s , 173918.9 tokens/s INFO:__main__:2024-11-30 04:32:59 | Epoch: 0 | Step: 320540 | Dataset: 0-3249960 | Loss: 0.379 | 597 ms/step , 115611.09 GFLOP/s , 173866.0 tokens/s INFO:__main__:2024-11-30 04:33:06 | Epoch: 0 | Step: 320550 | Dataset: 0-3252360 | Loss: 0.410 | 597 ms/step , 115523.66 GFLOP/s , 173777.6 tokens/s INFO:__main__:2024-11-30 04:33:13 | Epoch: 0 | Step: 320560 | Dataset: 0-3254760 | Loss: 0.392 | 597 ms/step , 115626.93 GFLOP/s , 173772.4 tokens/s INFO:__main__:2024-11-30 04:33:20 | Epoch: 0 | Step: 320570 | Dataset: 0-3257160 | Loss: 0.383 | 597 ms/step , 115543.16 GFLOP/s , 173717.2 tokens/s INFO:__main__:2024-11-30 04:33:27 | Epoch: 0 | Step: 320580 | Dataset: 0-3259560 | Loss: 0.366 | 597 ms/step , 115615.44 GFLOP/s , 173746.0 tokens/s INFO:__main__:2024-11-30 04:33:34 | Epoch: 0 | Step: 320590 | Dataset: 0-3261960 | Loss: 0.381 | 597 ms/step , 115682.47 GFLOP/s , 173771.5 tokens/s INFO:__main__:2024-11-30 04:33:41 | Epoch: 0 | Step: 320600 | Dataset: 0-3264360 | Loss: 0.404 | 599 ms/step , 115217.50 GFLOP/s , 173741.5 tokens/s INFO:__main__:2024-11-30 04:33:48 | Epoch: 0 | Step: 320610 | Dataset: 0-3266760 | Loss: 0.454 | 597 ms/step , 115647.85 GFLOP/s , 173857.3 tokens/s INFO:__main__:2024-11-30 04:33:55 | Epoch: 0 | Step: 320620 | Dataset: 0-3269160 | Loss: 0.365 | 598 ms/step , 115425.23 GFLOP/s , 173723.7 tokens/s INFO:__main__:2024-11-30 04:34:03 | Epoch: 0 | Step: 320630 | Dataset: 0-3271560 | Loss: 0.387 | 596 ms/step , 115729.50 GFLOP/s , 173754.4 tokens/s INFO:__main__:2024-11-30 04:34:10 | Epoch: 0 | Step: 320640 | Dataset: 0-3273960 | Loss: 0.399 | 597 ms/step , 115614.94 GFLOP/s , 173624.6 tokens/s INFO:__main__:2024-11-30 04:34:17 | Epoch: 0 | Step: 320650 | Dataset: 0-3276360 | Loss: 0.357 | 597 ms/step , 115509.10 GFLOP/s , 173706.8 tokens/s INFO:__main__:2024-11-30 04:34:24 | Epoch: 0 | Step: 320660 | Dataset: 0-3278760 | Loss: 0.462 | 597 ms/step , 115625.93 GFLOP/s , 
173756.2 tokens/s INFO:__main__:2024-11-30 04:34:31 | Epoch: 0 | Step: 320670 | Dataset: 0-3281160 | Loss: 0.450 | 597 ms/step , 115648.18 GFLOP/s , 173798.6 tokens/s INFO:__main__:2024-11-30 04:34:38 | Epoch: 0 | Step: 320680 | Dataset: 0-3283560 | Loss: 0.442 | 596 ms/step , 115762.62 GFLOP/s , 173860.8 tokens/s INFO:__main__:2024-11-30 04:34:45 | Epoch: 0 | Step: 320690 | Dataset: 0-3285960 | Loss: 0.442 | 597 ms/step , 115517.33 GFLOP/s , 173776.0 tokens/s INFO:__main__:2024-11-30 04:34:52 | Epoch: 0 | Step: 320700 | Dataset: 0-3288360 | Loss: 0.448 | 598 ms/step , 115495.54 GFLOP/s , 173709.0 tokens/s INFO:__main__:2024-11-30 04:34:59 | Epoch: 0 | Step: 320710 | Dataset: 0-3290760 | Loss: 0.432 | 596 ms/step , 115706.68 GFLOP/s , 173723.0 tokens/s INFO:__main__:2024-11-30 04:35:06 | Epoch: 0 | Step: 320720 | Dataset: 0-3293160 | Loss: 0.489 | 597 ms/step , 115510.32 GFLOP/s , 173759.7 tokens/s INFO:__main__:2024-11-30 04:35:13 | Epoch: 0 | Step: 320730 | Dataset: 0-3295560 | Loss: 0.488 | 598 ms/step , 115419.81 GFLOP/s , 173704.0 tokens/s INFO:__main__:2024-11-30 04:35:20 | Epoch: 0 | Step: 320740 | Dataset: 0-3297960 | Loss: 0.478 | 597 ms/step , 115679.66 GFLOP/s , 173714.4 tokens/s INFO:__main__:2024-11-30 04:35:27 | Epoch: 0 | Step: 320750 | Dataset: 0-3300360 | Loss: 0.428 | 596 ms/step , 115766.86 GFLOP/s , 173811.1 tokens/s INFO:__main__:2024-11-30 04:35:34 | Epoch: 0 | Step: 320760 | Dataset: 0-3302760 | Loss: 0.438 | 596 ms/step , 115730.39 GFLOP/s , 173881.4 tokens/s INFO:__main__:2024-11-30 04:35:42 | Epoch: 0 | Step: 320770 | Dataset: 0-3305160 | Loss: 0.456 | 596 ms/step , 115744.49 GFLOP/s , 173741.5 tokens/s INFO:__main__:2024-11-30 04:35:49 | Epoch: 0 | Step: 320780 | Dataset: 0-3307560 | Loss: 0.475 | 597 ms/step , 115621.21 GFLOP/s , 173739.8 tokens/s INFO:__main__:2024-11-30 04:35:56 | Epoch: 0 | Step: 320790 | Dataset: 0-3309960 | Loss: 0.426 | 597 ms/step , 115518.80 GFLOP/s , 173780.3 tokens/s INFO:__main__:2024-11-30 04:36:03 | Epoch: 0 
| Step: 320800 | Dataset: 0-3312360 | Loss: 0.462 | 598 ms/step , 115496.49 GFLOP/s , 173759.4 tokens/s INFO:__main__:2024-11-30 04:36:10 | Epoch: 0 | Step: 320810 | Dataset: 0-3314760 | Loss: 0.494 | 596 ms/step , 115732.13 GFLOP/s , 173722.8 tokens/s INFO:__main__:2024-11-30 04:36:17 | Epoch: 0 | Step: 320820 | Dataset: 0-3317160 | Loss: 0.466 | 597 ms/step , 115634.74 GFLOP/s , 173703.7 tokens/s INFO:__main__:2024-11-30 04:36:24 | Epoch: 0 | Step: 320830 | Dataset: 0-3319560 | Loss: 0.477 | 596 ms/step , 115701.43 GFLOP/s , 173883.2 tokens/s INFO:__main__:2024-11-30 04:36:31 | Epoch: 0 | Step: 320840 | Dataset: 0-3321960 | Loss: 0.433 | 597 ms/step , 115570.31 GFLOP/s , 173741.5 tokens/s INFO:__main__:2024-11-30 04:36:38 | Epoch: 0 | Step: 320850 | Dataset: 0-3324360 | Loss: 0.420 | 597 ms/step , 115615.61 GFLOP/s , 173676.2 tokens/s INFO:__main__:2024-11-30 04:36:45 | Epoch: 0 | Step: 320860 | Dataset: 0-3326760 | Loss: 0.454 | 596 ms/step , 115726.43 GFLOP/s , 173721.5 tokens/s INFO:__main__:2024-11-30 04:36:52 | Epoch: 0 | Step: 320870 | Dataset: 0-3329160 | Loss: 0.450 | 597 ms/step , 115522.54 GFLOP/s , 173749.8 tokens/s INFO:__main__:2024-11-30 04:36:59 | Epoch: 0 | Step: 320880 | Dataset: 0-3331560 | Loss: 0.415 | 597 ms/step , 115528.73 GFLOP/s , 173703.9 tokens/s INFO:__main__:2024-11-30 04:37:06 | Epoch: 0 | Step: 320890 | Dataset: 0-3333960 | Loss: 0.470 | 597 ms/step , 115655.54 GFLOP/s , 173713.3 tokens/s INFO:__main__:2024-11-30 04:37:13 | Epoch: 0 | Step: 320900 | Dataset: 0-3336360 | Loss: 0.463 | 597 ms/step , 115683.25 GFLOP/s , 173912.9 tokens/s INFO:__main__:2024-11-30 04:37:21 | Epoch: 0 | Step: 320910 | Dataset: 0-3338760 | Loss: 0.486 | 596 ms/step , 115714.03 GFLOP/s , 173828.2 tokens/s INFO:__main__:2024-11-30 04:37:28 | Epoch: 0 | Step: 320920 | Dataset: 0-3341160 | Loss: 0.467 | 597 ms/step , 115604.21 GFLOP/s , 173653.2 tokens/s INFO:__main__:2024-11-30 04:37:35 | Epoch: 0 | Step: 320930 | Dataset: 0-3343560 | Loss: 0.423 | 597 
ms/step , 115584.09 GFLOP/s , 173803.4 tokens/s INFO:__main__:2024-11-30 04:37:42 | Epoch: 0 | Step: 320940 | Dataset: 0-3345960 | Loss: 0.454 | 596 ms/step , 115751.20 GFLOP/s , 173721.2 tokens/s INFO:__main__:2024-11-30 04:37:49 | Epoch: 0 | Step: 320950 | Dataset: 0-3348360 | Loss: 0.463 | 597 ms/step , 115581.44 GFLOP/s , 173766.6 tokens/s INFO:__main__:2024-11-30 04:37:56 | Epoch: 0 | Step: 320960 | Dataset: 0-3350760 | Loss: 0.450 | 596 ms/step , 115759.22 GFLOP/s , 173792.0 tokens/s INFO:__main__:2024-11-30 04:38:03 | Epoch: 0 | Step: 320970 | Dataset: 0-3353160 | Loss: 0.445 | 596 ms/step , 115699.97 GFLOP/s , 173798.0 tokens/s INFO:__main__:2024-11-30 04:38:10 | Epoch: 0 | Step: 320980 | Dataset: 0-3355560 | Loss: 0.487 | 596 ms/step , 115744.50 GFLOP/s , 173915.3 tokens/s INFO:__main__:2024-11-30 04:38:17 | Epoch: 0 | Step: 320990 | Dataset: 0-3357960 | Loss: 0.457 | 597 ms/step , 115555.41 GFLOP/s , 173742.4 tokens/s INFO:__main__:2024-11-30 04:38:25 | Validation | Step: 321000 | Val_loss: 1.012 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 04:38:25 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_043825_step_321000.pt` INFO:__main__:2024-11-30 04:38:28 | Epoch: 0 | Step: 321000 | Dataset: 0-3360360 | Loss: 0.480 | 595 ms/step , 115929.53 GFLOP/s , 118032.8 tokens/s INFO:__main__:2024-11-30 04:38:35 | Epoch: 0 | Step: 321010 | Dataset: 0-3362760 | Loss: 0.459 | 599 ms/step , 115256.08 GFLOP/s , 173376.3 tokens/s INFO:__main__:2024-11-30 04:38:42 | Epoch: 0 | Step: 321020 | Dataset: 0-3365160 | Loss: 0.421 | 600 ms/step , 115109.43 GFLOP/s , 173357.5 tokens/s INFO:__main__:2024-11-30 04:38:49 | Epoch: 0 | Step: 321030 | Dataset: 0-3367560 | Loss: 0.453 | 598 ms/step , 115397.72 GFLOP/s , 173360.7 tokens/s INFO:__main__:2024-11-30 04:38:56 | Epoch: 0 | Step: 321040 | Dataset: 0-3369960 | Loss: 0.504 | 598 ms/step , 115327.57 GFLOP/s , 173286.6 tokens/s INFO:__main__:2024-11-30 04:39:03 | Epoch: 0 | Step: 321050 | 
Dataset: 0-3372360 | Loss: 0.458 | 598 ms/step , 115379.11 GFLOP/s , 173480.1 tokens/s INFO:__main__:2024-11-30 04:39:10 | Epoch: 0 | Step: 321060 | Dataset: 0-3374760 | Loss: 0.465 | 597 ms/step , 115575.79 GFLOP/s , 173738.0 tokens/s INFO:__main__:2024-11-30 04:39:17 | Epoch: 0 | Step: 321070 | Dataset: 0-3377160 | Loss: 0.436 | 596 ms/step , 115771.62 GFLOP/s , 173858.2 tokens/s INFO:__main__:2024-11-30 04:39:24 | Epoch: 0 | Step: 321080 | Dataset: 0-3379560 | Loss: 0.439 | 597 ms/step , 115527.98 GFLOP/s , 173831.0 tokens/s INFO:__main__:2024-11-30 04:39:31 | Epoch: 0 | Step: 321090 | Dataset: 0-3381960 | Loss: 0.475 | 596 ms/step , 115719.44 GFLOP/s , 173789.1 tokens/s INFO:__main__:2024-11-30 04:39:38 | Epoch: 0 | Step: 321100 | Dataset: 0-3384360 | Loss: 0.426 | 598 ms/step , 115394.17 GFLOP/s , 173773.4 tokens/s INFO:__main__:2024-11-30 04:39:45 | Epoch: 0 | Step: 321110 | Dataset: 0-3386760 | Loss: 0.444 | 596 ms/step , 115774.87 GFLOP/s , 173875.8 tokens/s INFO:__main__:2024-11-30 04:39:52 | Epoch: 0 | Step: 321120 | Dataset: 0-3389160 | Loss: 0.476 | 597 ms/step , 115655.67 GFLOP/s , 173898.7 tokens/s INFO:__main__:2024-11-30 04:39:59 | Epoch: 0 | Step: 321130 | Dataset: 0-3391560 | Loss: 0.486 | 596 ms/step , 115744.30 GFLOP/s , 173848.0 tokens/s INFO:__main__:2024-11-30 04:40:07 | Epoch: 0 | Step: 321140 | Dataset: 0-3393960 | Loss: 0.991 | 597 ms/step , 115631.27 GFLOP/s , 173778.5 tokens/s INFO:__main__:2024-11-30 04:40:14 | Epoch: 0 | Step: 321150 | Dataset: 0-3396360 | Loss: 1.025 | 596 ms/step , 115697.81 GFLOP/s , 173684.0 tokens/s INFO:__main__:2024-11-30 04:40:21 | Epoch: 0 | Step: 321160 | Dataset: 0-3398760 | Loss: 1.082 | 596 ms/step , 115718.71 GFLOP/s , 173764.1 tokens/s INFO:__main__:2024-11-30 04:40:28 | Epoch: 0 | Step: 321170 | Dataset: 0-3401160 | Loss: 1.748 | 596 ms/step , 115784.55 GFLOP/s , 173734.2 tokens/s INFO:__main__:2024-11-30 04:40:35 | Epoch: 0 | Step: 321180 | Dataset: 0-3403560 | Loss: 1.029 | 597 ms/step , 115682.98 
GFLOP/s , 173672.7 tokens/s INFO:__main__:2024-11-30 04:40:42 | Epoch: 0 | Step: 321190 | Dataset: 0-3405960 | Loss: 1.552 | 597 ms/step , 115534.79 GFLOP/s , 173833.1 tokens/s INFO:__main__:2024-11-30 04:40:49 | Epoch: 0 | Step: 321200 | Dataset: 0-3408360 | Loss: 0.392 | 596 ms/step , 115706.38 GFLOP/s , 173869.0 tokens/s INFO:__main__:2024-11-30 04:40:56 | Epoch: 0 | Step: 321210 | Dataset: 0-3410760 | Loss: 0.386 | 597 ms/step , 115582.40 GFLOP/s , 173838.3 tokens/s INFO:__main__:2024-11-30 04:41:03 | Epoch: 0 | Step: 321220 | Dataset: 0-3413160 | Loss: 0.412 | 597 ms/step , 115675.32 GFLOP/s , 173857.7 tokens/s INFO:__main__:2024-11-30 04:41:10 | Epoch: 0 | Step: 321230 | Dataset: 0-3415560 | Loss: 0.435 | 596 ms/step , 115754.41 GFLOP/s , 173877.6 tokens/s INFO:__main__:2024-11-30 04:41:17 | Epoch: 0 | Step: 321240 | Dataset: 0-3417960 | Loss: 0.392 | 597 ms/step , 115540.09 GFLOP/s , 173861.7 tokens/s INFO:__main__:2024-11-30 04:41:24 | Epoch: 0 | Step: 321250 | Dataset: 0-3420360 | Loss: 0.438 | 596 ms/step , 115696.49 GFLOP/s , 173793.5 tokens/s INFO:__main__:2024-11-30 04:41:31 | Epoch: 0 | Step: 321260 | Dataset: 0-3422760 | Loss: 0.364 | 597 ms/step , 115652.07 GFLOP/s , 173876.1 tokens/s INFO:__main__:2024-11-30 04:41:38 | Epoch: 0 | Step: 321270 | Dataset: 0-3425160 | Loss: 0.400 | 597 ms/step , 115656.44 GFLOP/s , 173959.1 tokens/s INFO:__main__:2024-11-30 04:41:46 | Epoch: 0 | Step: 321280 | Dataset: 0-3427560 | Loss: 0.389 | 596 ms/step , 115741.57 GFLOP/s , 173876.7 tokens/s INFO:__main__:2024-11-30 04:41:53 | Epoch: 0 | Step: 321290 | Dataset: 0-3429960 | Loss: 0.344 | 597 ms/step , 115606.28 GFLOP/s , 173827.4 tokens/s INFO:__main__:2024-11-30 04:42:00 | Epoch: 0 | Step: 321300 | Dataset: 0-3432360 | Loss: 0.403 | 596 ms/step , 115721.26 GFLOP/s , 173824.1 tokens/s INFO:__main__:2024-11-30 04:42:07 | Epoch: 0 | Step: 321310 | Dataset: 0-3434760 | Loss: 0.379 | 597 ms/step , 115641.42 GFLOP/s , 173816.4 tokens/s INFO:__main__:2024-11-30 04:42:14 
| Epoch: 0 | Step: 321320 | Dataset: 0-3437160 | Loss: 0.386 | 598 ms/step , 115486.18 GFLOP/s , 173838.2 tokens/s INFO:__main__:2024-11-30 04:42:21 | Epoch: 0 | Step: 321330 | Dataset: 0-3439560 | Loss: 0.350 | 596 ms/step , 115842.80 GFLOP/s , 173880.5 tokens/s INFO:__main__:2024-11-30 04:42:28 | Epoch: 0 | Step: 321340 | Dataset: 0-3441960 | Loss: 0.335 | 596 ms/step , 115767.74 GFLOP/s , 173906.3 tokens/s INFO:__main__:2024-11-30 04:42:35 | Epoch: 0 | Step: 321350 | Dataset: 0-3444360 | Loss: 0.388 | 596 ms/step , 115739.14 GFLOP/s , 173942.8 tokens/s INFO:__main__:2024-11-30 04:42:42 | Epoch: 0 | Step: 321360 | Dataset: 0-3446760 | Loss: 0.415 | 596 ms/step , 115740.25 GFLOP/s , 173833.2 tokens/s INFO:__main__:2024-11-30 04:42:49 | Epoch: 0 | Step: 321370 | Dataset: 0-3449160 | Loss: 0.445 | 597 ms/step , 115596.85 GFLOP/s , 173788.5 tokens/s INFO:__main__:2024-11-30 04:42:56 | Epoch: 0 | Step: 321380 | Dataset: 0-3451560 | Loss: 0.395 | 596 ms/step , 115740.47 GFLOP/s , 173768.8 tokens/s INFO:__main__:2024-11-30 04:43:03 | Epoch: 0 | Step: 321390 | Dataset: 0-3453960 | Loss: 0.415 | 596 ms/step , 115711.18 GFLOP/s , 173830.4 tokens/s INFO:__main__:2024-11-30 04:43:10 | Epoch: 0 | Step: 321400 | Dataset: 0-3456360 | Loss: 0.349 | 596 ms/step , 115769.79 GFLOP/s , 173791.0 tokens/s INFO:__main__:2024-11-30 04:43:17 | Epoch: 0 | Step: 321410 | Dataset: 0-3458760 | Loss: 0.370 | 596 ms/step , 115764.50 GFLOP/s , 173756.8 tokens/s INFO:__main__:2024-11-30 04:43:24 | Epoch: 0 | Step: 321420 | Dataset: 0-3461160 | Loss: 0.415 | 596 ms/step , 115754.84 GFLOP/s , 173945.1 tokens/s INFO:__main__:2024-11-30 04:43:32 | Epoch: 0 | Step: 321430 | Dataset: 0-3463560 | Loss: 0.378 | 596 ms/step , 115703.50 GFLOP/s , 173911.6 tokens/s INFO:__main__:2024-11-30 04:43:39 | Epoch: 0 | Step: 321440 | Dataset: 0-3465960 | Loss: 0.409 | 597 ms/step , 115653.36 GFLOP/s , 173779.4 tokens/s INFO:__main__:2024-11-30 04:43:46 | Epoch: 0 | Step: 321450 | Dataset: 0-3468360 | Loss: 0.372 | 
596 ms/step , 115785.06 GFLOP/s , 173773.9 tokens/s INFO:__main__:2024-11-30 04:43:53 | Epoch: 0 | Step: 321460 | Dataset: 0-3470760 | Loss: 0.423 | 597 ms/step , 115636.42 GFLOP/s , 173737.1 tokens/s INFO:__main__:2024-11-30 04:44:00 | Epoch: 0 | Step: 321470 | Dataset: 0-3473160 | Loss: 0.302 | 597 ms/step , 115675.58 GFLOP/s , 173851.2 tokens/s INFO:__main__:2024-11-30 04:44:07 | Epoch: 0 | Step: 321480 | Dataset: 0-3475560 | Loss: 0.354 | 596 ms/step , 115708.84 GFLOP/s , 173825.2 tokens/s INFO:__main__:2024-11-30 04:44:14 | Epoch: 0 | Step: 321490 | Dataset: 0-3477960 | Loss: 0.341 | 596 ms/step , 115804.43 GFLOP/s , 173874.5 tokens/s INFO:__main__:2024-11-30 04:44:22 | Validation | Step: 321500 | Val_loss: 0.960 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 04:44:22 | Epoch: 0 | Step: 321500 | Dataset: 0-3480360 | Loss: 0.317 | 594 ms/step , 116155.07 GFLOP/s , 147884.1 tokens/s INFO:__main__:2024-11-30 04:44:29 | Epoch: 0 | Step: 321510 | Dataset: 0-3482760 | Loss: 0.352 | 596 ms/step , 115738.29 GFLOP/s , 173888.5 tokens/s INFO:__main__:2024-11-30 04:44:36 | Epoch: 0 | Step: 321520 | Dataset: 0-3485160 | Loss: 0.371 | 596 ms/step , 115719.43 GFLOP/s , 173915.7 tokens/s INFO:__main__:2024-11-30 04:44:43 | Epoch: 0 | Step: 321530 | Dataset: 0-3487560 | Loss: 0.352 | 597 ms/step , 115525.49 GFLOP/s , 173796.0 tokens/s INFO:__main__:2024-11-30 04:44:51 | Epoch: 0 | Step: 321540 | Dataset: 0-3489960 | Loss: 0.402 | 596 ms/step , 115721.25 GFLOP/s , 173920.1 tokens/s INFO:__main__:2024-11-30 04:44:58 | Epoch: 0 | Step: 321550 | Dataset: 0-3492360 | Loss: 0.360 | 598 ms/step , 115486.15 GFLOP/s , 173812.5 tokens/s INFO:__main__:2024-11-30 04:45:05 | Epoch: 0 | Step: 321560 | Dataset: 0-3494760 | Loss: 0.416 | 596 ms/step , 115758.57 GFLOP/s , 173882.2 tokens/s INFO:__main__:2024-11-30 04:45:12 | Epoch: 0 | Step: 321570 | Dataset: 0-3497160 | Loss: 0.389 | 596 ms/step , 115785.39 GFLOP/s , 173994.7 tokens/s INFO:__main__:2024-11-30 04:45:19 | Epoch: 0 | Step: 
321580 | Dataset: 0-3499560 | Loss: 0.419 | 597 ms/step , 115649.68 GFLOP/s , 173878.1 tokens/s INFO:__main__:2024-11-30 04:45:26 | Epoch: 0 | Step: 321590 | Dataset: 0-3501960 | Loss: 0.377 | 597 ms/step , 115605.49 GFLOP/s , 173794.7 tokens/s INFO:__main__:2024-11-30 04:45:33 | Epoch: 0 | Step: 321600 | Dataset: 0-3504360 | Loss: 0.392 | 597 ms/step , 115675.60 GFLOP/s , 173810.2 tokens/s INFO:__main__:2024-11-30 04:45:40 | Epoch: 0 | Step: 321610 | Dataset: 0-3506760 | Loss: 0.412 | 597 ms/step , 115646.47 GFLOP/s , 173788.2 tokens/s INFO:__main__:2024-11-30 04:45:47 | Epoch: 0 | Step: 321620 | Dataset: 0-3509160 | Loss: 0.404 | 596 ms/step , 115700.32 GFLOP/s , 173807.5 tokens/s INFO:__main__:2024-11-30 04:45:54 | Epoch: 0 | Step: 321630 | Dataset: 0-3511560 | Loss: 0.370 | 597 ms/step , 115690.37 GFLOP/s , 173762.7 tokens/s INFO:__main__:2024-11-30 04:46:01 | Epoch: 0 | Step: 321640 | Dataset: 0-3513960 | Loss: 0.396 | 596 ms/step , 115713.75 GFLOP/s , 173959.9 tokens/s INFO:__main__:2024-11-30 04:46:08 | Epoch: 0 | Step: 321650 | Dataset: 0-3516360 | Loss: 0.409 | 598 ms/step , 115350.05 GFLOP/s , 173889.7 tokens/s INFO:__main__:2024-11-30 04:46:15 | Epoch: 0 | Step: 321660 | Dataset: 0-3518760 | Loss: 0.418 | 597 ms/step , 115680.75 GFLOP/s , 173805.2 tokens/s INFO:__main__:2024-11-30 04:46:22 | Epoch: 0 | Step: 321670 | Dataset: 0-3521160 | Loss: 0.368 | 598 ms/step , 115313.45 GFLOP/s , 173797.2 tokens/s INFO:__main__:2024-11-30 04:46:30 | Epoch: 0 | Step: 321680 | Dataset: 0-3523560 | Loss: 0.378 | 597 ms/step , 115627.81 GFLOP/s , 173766.5 tokens/s INFO:__main__:2024-11-30 04:46:37 | Epoch: 0 | Step: 321690 | Dataset: 0-3525960 | Loss: 0.399 | 597 ms/step , 115691.71 GFLOP/s , 173807.4 tokens/s INFO:__main__:2024-11-30 04:46:44 | Epoch: 0 | Step: 321700 | Dataset: 0-3528360 | Loss: 0.405 | 597 ms/step , 115694.27 GFLOP/s , 173837.3 tokens/s INFO:__main__:2024-11-30 04:46:51 | Epoch: 0 | Step: 321710 | Dataset: 0-3530760 | Loss: 0.370 | 597 ms/step , 
115511.47 GFLOP/s , 173832.6 tokens/s INFO:__main__:2024-11-30 04:46:58 | Epoch: 0 | Step: 321720 | Dataset: 0-3533160 | Loss: 0.400 | 596 ms/step , 115758.58 GFLOP/s , 173942.5 tokens/s INFO:__main__:2024-11-30 04:47:05 | Epoch: 0 | Step: 321730 | Dataset: 0-3535560 | Loss: 0.394 | 596 ms/step , 115705.26 GFLOP/s , 173862.0 tokens/s INFO:__main__:2024-11-30 04:47:12 | Epoch: 0 | Step: 321740 | Dataset: 0-3537960 | Loss: 0.426 | 597 ms/step , 115630.20 GFLOP/s , 173806.2 tokens/s INFO:__main__:2024-11-30 04:47:19 | Epoch: 0 | Step: 321750 | Dataset: 0-3540360 | Loss: 0.632 | 596 ms/step , 115698.21 GFLOP/s , 173764.1 tokens/s INFO:__main__:2024-11-30 04:47:26 | Epoch: 0 | Step: 321760 | Dataset: 0-3542760 | Loss: 0.612 | 597 ms/step , 115584.37 GFLOP/s , 173781.3 tokens/s INFO:__main__:2024-11-30 04:47:33 | Epoch: 0 | Step: 321770 | Dataset: 0-3545160 | Loss: 0.634 | 597 ms/step , 115531.03 GFLOP/s , 173742.9 tokens/s INFO:__main__:2024-11-30 04:47:40 | Epoch: 0 | Step: 321780 | Dataset: 0-3547560 | Loss: 0.621 | 596 ms/step , 115715.08 GFLOP/s , 173722.4 tokens/s INFO:__main__:2024-11-30 04:47:47 | Epoch: 0 | Step: 321790 | Dataset: 0-3549960 | Loss: 0.648 | 597 ms/step , 115624.69 GFLOP/s , 173826.1 tokens/s INFO:__main__:2024-11-30 04:47:54 | Epoch: 0 | Step: 321800 | Dataset: 0-3552360 | Loss: 0.605 | 597 ms/step , 115504.43 GFLOP/s , 173870.0 tokens/s INFO:__main__:2024-11-30 04:48:01 | Epoch: 0 | Step: 321810 | Dataset: 0-3554760 | Loss: 0.621 | 597 ms/step , 115621.03 GFLOP/s , 173799.7 tokens/s INFO:__main__:2024-11-30 04:48:08 | Epoch: 0 | Step: 321820 | Dataset: 0-3557160 | Loss: 0.588 | 596 ms/step , 115718.30 GFLOP/s , 173697.3 tokens/s INFO:__main__:2024-11-30 04:48:16 | Epoch: 0 | Step: 321830 | Dataset: 0-3559560 | Loss: 0.570 | 597 ms/step , 115634.23 GFLOP/s , 173705.4 tokens/s INFO:__main__:2024-11-30 04:48:23 | Epoch: 0 | Step: 321840 | Dataset: 0-3561960 | Loss: 0.636 | 597 ms/step , 115518.01 GFLOP/s , 173742.8 tokens/s INFO:__main__:2024-11-30 
04:48:30 | Epoch: 0 | Step: 321850 | Dataset: 0-3564360 | Loss: 0.599 | 596 ms/step , 115723.89 GFLOP/s , 173735.0 tokens/s INFO:__main__:2024-11-30 04:48:37 | Epoch: 0 | Step: 321860 | Dataset: 0-3566760 | Loss: 0.602 | 597 ms/step , 115640.14 GFLOP/s , 173833.6 tokens/s INFO:__main__:2024-11-30 04:48:44 | Epoch: 0 | Step: 321870 | Dataset: 0-3569160 | Loss: 0.601 | 596 ms/step , 115699.51 GFLOP/s , 173783.7 tokens/s INFO:__main__:2024-11-30 04:48:51 | Epoch: 0 | Step: 321880 | Dataset: 0-3571560 | Loss: 0.608 | 597 ms/step , 115505.10 GFLOP/s , 173713.5 tokens/s INFO:__main__:2024-11-30 04:48:58 | Epoch: 0 | Step: 321890 | Dataset: 0-3573960 | Loss: 0.567 | 598 ms/step , 115438.33 GFLOP/s , 173696.7 tokens/s INFO:__main__:2024-11-30 04:49:05 | Epoch: 0 | Step: 321900 | Dataset: 0-3576360 | Loss: 0.578 | 597 ms/step , 115549.82 GFLOP/s , 173741.6 tokens/s INFO:__main__:2024-11-30 04:49:12 | Epoch: 0 | Step: 321910 | Dataset: 0-3578760 | Loss: 0.659 | 597 ms/step , 115568.92 GFLOP/s , 173707.3 tokens/s INFO:__main__:2024-11-30 04:49:19 | Epoch: 0 | Step: 321920 | Dataset: 0-3581160 | Loss: 0.583 | 598 ms/step , 115451.81 GFLOP/s , 173676.3 tokens/s INFO:__main__:2024-11-30 04:49:26 | Epoch: 0 | Step: 321930 | Dataset: 0-3583560 | Loss: 0.639 | 597 ms/step , 115612.28 GFLOP/s , 173759.3 tokens/s INFO:__main__:2024-11-30 04:49:33 | Epoch: 0 | Step: 321940 | Dataset: 0-3585960 | Loss: 0.599 | 597 ms/step , 115593.33 GFLOP/s , 173851.1 tokens/s INFO:__main__:2024-11-30 04:49:40 | Epoch: 0 | Step: 321950 | Dataset: 0-3588360 | Loss: 0.659 | 597 ms/step , 115628.05 GFLOP/s , 173807.5 tokens/s INFO:__main__:2024-11-30 04:49:48 | Epoch: 0 | Step: 321960 | Dataset: 0-3590760 | Loss: 0.609 | 597 ms/step , 115650.86 GFLOP/s , 173756.9 tokens/s INFO:__main__:2024-11-30 04:49:55 | Epoch: 0 | Step: 321970 | Dataset: 0-3593160 | Loss: 0.548 | 597 ms/step , 115575.69 GFLOP/s , 173711.0 tokens/s INFO:__main__:2024-11-30 04:50:02 | Epoch: 0 | Step: 321980 | Dataset: 0-3595560 | 
Loss: 0.607 | 597 ms/step , 115603.59 GFLOP/s , 173761.7 tokens/s INFO:__main__:2024-11-30 04:50:09 | Epoch: 0 | Step: 321990 | Dataset: 0-3597960 | Loss: 0.562 | 597 ms/step , 115611.97 GFLOP/s , 173655.4 tokens/s INFO:__main__:2024-11-30 04:50:16 | Validation | Step: 322000 | Val_loss: 0.983 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 04:50:16 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_045016_step_322000.pt` INFO:__main__:2024-11-30 04:50:19 | Epoch: 0 | Step: 322000 | Dataset: 0-3600360 | Loss: 0.575 | 595 ms/step , 115944.31 GFLOP/s , 118706.0 tokens/s INFO:__main__:2024-11-30 04:50:26 | Epoch: 0 | Step: 322010 | Dataset: 0-3602760 | Loss: 0.633 | 597 ms/step , 115528.25 GFLOP/s , 173577.8 tokens/s INFO:__main__:2024-11-30 04:50:33 | Epoch: 0 | Step: 322020 | Dataset: 0-3605160 | Loss: 0.666 | 599 ms/step , 115184.35 GFLOP/s , 173411.6 tokens/s INFO:__main__:2024-11-30 04:50:40 | Epoch: 0 | Step: 322030 | Dataset: 0-3607560 | Loss: 0.651 | 598 ms/step , 115328.33 GFLOP/s , 173261.4 tokens/s INFO:__main__:2024-11-30 04:50:47 | Epoch: 0 | Step: 322040 | Dataset: 0-3609960 | Loss: 0.567 | 599 ms/step , 115157.57 GFLOP/s , 173188.6 tokens/s INFO:__main__:2024-11-30 04:50:55 | Epoch: 0 | Step: 322050 | Dataset: 0-3612360 | Loss: 0.625 | 599 ms/step , 115262.67 GFLOP/s , 173267.0 tokens/s INFO:__main__:2024-11-30 04:51:02 | Epoch: 0 | Step: 322060 | Dataset: 0-3614760 | Loss: 0.602 | 597 ms/step , 115590.32 GFLOP/s , 173681.5 tokens/s INFO:__main__:2024-11-30 04:51:09 | Epoch: 0 | Step: 322070 | Dataset: 0-3617160 | Loss: 0.652 | 597 ms/step , 115659.08 GFLOP/s , 173784.3 tokens/s INFO:__main__:2024-11-30 04:51:16 | Epoch: 0 | Step: 322080 | Dataset: 0-3619560 | Loss: 0.662 | 596 ms/step , 115758.91 GFLOP/s , 173856.0 tokens/s INFO:__main__:2024-11-30 04:51:23 | Epoch: 0 | Step: 322090 | Dataset: 0-3621960 | Loss: 0.584 | 598 ms/step , 115498.82 GFLOP/s , 173915.1 tokens/s INFO:__main__:2024-11-30 04:51:30 | Epoch: 0 | 
Step: 322100 | Dataset: 0-3624360 | Loss: 0.688 | 596 ms/step , 115778.44 GFLOP/s , 173806.7 tokens/s INFO:__main__:2024-11-30 04:51:37 | Epoch: 0 | Step: 322110 | Dataset: 0-3626760 | Loss: 0.609 | 597 ms/step , 115581.44 GFLOP/s , 173787.2 tokens/s INFO:__main__:2024-11-30 04:51:44 | Epoch: 0 | Step: 322120 | Dataset: 0-3629160 | Loss: 0.640 | 597 ms/step , 115674.51 GFLOP/s , 173822.8 tokens/s INFO:__main__:2024-11-30 04:51:51 | Epoch: 0 | Step: 322130 | Dataset: 0-3631560 | Loss: 0.609 | 597 ms/step , 115510.40 GFLOP/s , 173783.4 tokens/s INFO:__main__:2024-11-30 04:51:58 | Epoch: 0 | Step: 322140 | Dataset: 0-3633960 | Loss: 0.564 | 597 ms/step , 115691.00 GFLOP/s , 173788.4 tokens/s INFO:__main__:2024-11-30 04:52:05 | Epoch: 0 | Step: 322150 | Dataset: 0-3636360 | Loss: 0.592 | 597 ms/step , 115681.63 GFLOP/s , 173768.0 tokens/s INFO:__main__:2024-11-30 04:52:12 | Epoch: 0 | Step: 322160 | Dataset: 0-3638760 | Loss: 0.635 | 596 ms/step , 115852.08 GFLOP/s , 173875.4 tokens/s INFO:__main__:2024-11-30 04:52:19 | Epoch: 0 | Step: 322170 | Dataset: 0-3641160 | Loss: 0.547 | 598 ms/step , 115501.54 GFLOP/s , 173836.6 tokens/s INFO:__main__:2024-11-30 04:52:26 | Epoch: 0 | Step: 322180 | Dataset: 0-3643560 | Loss: 0.617 | 597 ms/step , 115687.68 GFLOP/s , 173706.3 tokens/s INFO:__main__:2024-11-30 04:52:34 | Epoch: 0 | Step: 322190 | Dataset: 0-3645960 | Loss: 0.664 | 597 ms/step , 115632.33 GFLOP/s , 173625.8 tokens/s INFO:__main__:2024-11-30 04:52:41 | Epoch: 0 | Step: 322200 | Dataset: 0-3648360 | Loss: 0.585 | 597 ms/step , 115633.78 GFLOP/s , 173725.5 tokens/s INFO:__main__:2024-11-30 04:52:48 | Epoch: 0 | Step: 322210 | Dataset: 0-3650760 | Loss: 0.570 | 596 ms/step , 115746.99 GFLOP/s , 173757.1 tokens/s INFO:__main__:2024-11-30 04:52:55 | Epoch: 0 | Step: 322220 | Dataset: 0-3653160 | Loss: 0.556 | 597 ms/step , 115563.62 GFLOP/s , 173798.8 tokens/s INFO:__main__:2024-11-30 04:53:02 | Epoch: 0 | Step: 322230 | Dataset: 0-3655560 | Loss: 0.570 | 596 ms/step 
, 115865.39 GFLOP/s , 173866.1 tokens/s INFO:__main__:2024-11-30 04:53:09 | Epoch: 0 | Step: 322240 | Dataset: 0-3657960 | Loss: 0.579 | 598 ms/step , 115465.52 GFLOP/s , 173919.6 tokens/s INFO:__main__:2024-11-30 04:53:16 | Epoch: 0 | Step: 322250 | Dataset: 0-3660360 | Loss: 0.645 | 596 ms/step , 115707.98 GFLOP/s , 173749.4 tokens/s INFO:__main__:2024-11-30 04:53:23 | Epoch: 0 | Step: 322260 | Dataset: 0-3662760 | Loss: 0.584 | 596 ms/step , 115716.76 GFLOP/s , 173756.4 tokens/s INFO:__main__:2024-11-30 04:53:30 | Epoch: 0 | Step: 322270 | Dataset: 0-3665160 | Loss: 0.567 | 597 ms/step , 115694.31 GFLOP/s , 173780.9 tokens/s INFO:__main__:2024-11-30 04:53:37 | Epoch: 0 | Step: 322280 | Dataset: 0-3667560 | Loss: 0.639 | 597 ms/step , 115585.31 GFLOP/s , 173773.8 tokens/s INFO:__main__:2024-11-30 04:53:44 | Epoch: 0 | Step: 322290 | Dataset: 0-3669960 | Loss: 0.598 | 598 ms/step , 115391.41 GFLOP/s , 173710.7 tokens/s INFO:__main__:2024-11-30 04:53:51 | Epoch: 0 | Step: 322300 | Dataset: 0-3672360 | Loss: 0.472 | 596 ms/step , 115762.53 GFLOP/s , 173822.1 tokens/s INFO:__main__:2024-11-30 04:53:58 | Epoch: 0 | Step: 322310 | Dataset: 0-3674760 | Loss: 0.454 | 596 ms/step , 115827.99 GFLOP/s , 173922.2 tokens/s INFO:__main__:2024-11-30 04:54:05 | Epoch: 0 | Step: 322320 | Dataset: 0-3677160 | Loss: 0.457 | 597 ms/step , 115647.06 GFLOP/s , 173829.1 tokens/s INFO:__main__:2024-11-30 04:54:12 | Epoch: 0 | Step: 322330 | Dataset: 0-3679560 | Loss: 0.478 | 597 ms/step , 115657.43 GFLOP/s , 173772.0 tokens/s INFO:__main__:2024-11-30 04:54:20 | Epoch: 0 | Step: 322340 | Dataset: 0-3681960 | Loss: 0.435 | 597 ms/step , 115679.69 GFLOP/s , 173741.8 tokens/s INFO:__main__:2024-11-30 04:54:27 | Epoch: 0 | Step: 322350 | Dataset: 0-3684360 | Loss: 0.493 | 597 ms/step , 115641.87 GFLOP/s , 173779.7 tokens/s INFO:__main__:2024-11-30 04:54:34 | Epoch: 0 | Step: 322360 | Dataset: 0-3686760 | Loss: 0.438 | 596 ms/step , 115697.35 GFLOP/s , 173798.3 tokens/s 
INFO:__main__:2024-11-30 04:54:41 | Epoch: 0 | Step: 322370 | Dataset: 0-3689160 | Loss: 0.491 | 597 ms/step , 115643.05 GFLOP/s , 173854.4 tokens/s INFO:__main__:2024-11-30 04:54:48 | Epoch: 0 | Step: 322380 | Dataset: 0-3691560 | Loss: 0.434 | 596 ms/step , 115797.71 GFLOP/s , 173867.3 tokens/s INFO:__main__:2024-11-30 04:54:55 | Epoch: 0 | Step: 322390 | Dataset: 0-3693960 | Loss: 0.411 | 596 ms/step , 115824.40 GFLOP/s , 173969.9 tokens/s INFO:__main__:2024-11-30 04:55:02 | Epoch: 0 | Step: 322400 | Dataset: 0-3696360 | Loss: 0.458 | 597 ms/step , 115540.82 GFLOP/s , 173786.0 tokens/s INFO:__main__:2024-11-30 04:55:09 | Epoch: 0 | Step: 322410 | Dataset: 0-3698760 | Loss: 0.419 | 597 ms/step , 115533.88 GFLOP/s , 173838.7 tokens/s INFO:__main__:2024-11-30 04:55:16 | Epoch: 0 | Step: 322420 | Dataset: 0-3701160 | Loss: 0.442 | 597 ms/step , 115594.80 GFLOP/s , 173774.8 tokens/s INFO:__main__:2024-11-30 04:55:23 | Epoch: 0 | Step: 322430 | Dataset: 0-3703560 | Loss: 0.397 | 596 ms/step , 115786.34 GFLOP/s , 173742.2 tokens/s INFO:__main__:2024-11-30 04:55:30 | Epoch: 0 | Step: 322440 | Dataset: 0-3705960 | Loss: 0.411 | 597 ms/step , 115621.45 GFLOP/s , 173759.1 tokens/s INFO:__main__:2024-11-30 04:55:37 | Epoch: 0 | Step: 322450 | Dataset: 0-3708360 | Loss: 0.511 | 596 ms/step , 115790.48 GFLOP/s , 173863.0 tokens/s INFO:__main__:2024-11-30 04:55:44 | Epoch: 0 | Step: 322460 | Dataset: 0-3710760 | Loss: 0.471 | 596 ms/step , 115815.43 GFLOP/s , 173868.3 tokens/s INFO:__main__:2024-11-30 04:55:51 | Epoch: 0 | Step: 322470 | Dataset: 0-3713160 | Loss: 0.477 | 597 ms/step , 115685.71 GFLOP/s , 173805.8 tokens/s INFO:__main__:2024-11-30 04:55:59 | Epoch: 0 | Step: 322480 | Dataset: 0-3715560 | Loss: 0.436 | 598 ms/step , 115385.32 GFLOP/s , 173736.8 tokens/s INFO:__main__:2024-11-30 04:56:06 | Epoch: 0 | Step: 322490 | Dataset: 0-3717960 | Loss: 0.409 | 596 ms/step , 115721.59 GFLOP/s , 173777.1 tokens/s INFO:__main__:2024-11-30 04:56:13 | Validation | Step: 322500 
| Val_loss: 0.985 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 04:56:14 | Epoch: 0 | Step: 322500 | Dataset: 0-3720360 | Loss: 0.441 | 595 ms/step , 115917.18 GFLOP/s , 147805.0 tokens/s INFO:__main__:2024-11-30 04:56:21 | Epoch: 0 | Step: 322510 | Dataset: 0-3722760 | Loss: 0.439 | 596 ms/step , 115743.79 GFLOP/s , 173871.9 tokens/s INFO:__main__:2024-11-30 04:56:28 | Epoch: 0 | Step: 322520 | Dataset: 0-3725160 | Loss: 0.440 | 597 ms/step , 115691.20 GFLOP/s , 173809.9 tokens/s INFO:__main__:2024-11-30 04:56:35 | Epoch: 0 | Step: 322530 | Dataset: 0-3727560 | Loss: 0.445 | 596 ms/step , 115723.27 GFLOP/s , 173980.5 tokens/s INFO:__main__:2024-11-30 04:56:42 | Epoch: 0 | Step: 322540 | Dataset: 0-3729960 | Loss: 0.461 | 597 ms/step , 115666.98 GFLOP/s , 173939.9 tokens/s INFO:__main__:2024-11-30 04:56:49 | Epoch: 0 | Step: 322550 | Dataset: 0-3732360 | Loss: 0.505 | 597 ms/step , 115574.37 GFLOP/s , 173839.6 tokens/s INFO:__main__:2024-11-30 04:56:56 | Epoch: 0 | Step: 322560 | Dataset: 0-3734760 | Loss: 0.428 | 596 ms/step , 115743.49 GFLOP/s , 173782.3 tokens/s INFO:__main__:2024-11-30 04:57:03 | Epoch: 0 | Step: 322570 | Dataset: 0-3737160 | Loss: 0.495 | 597 ms/step , 115545.80 GFLOP/s , 173823.9 tokens/s INFO:__main__:2024-11-30 04:57:10 | Epoch: 0 | Step: 322580 | Dataset: 0-3739560 | Loss: 0.417 | 596 ms/step , 115753.70 GFLOP/s , 173778.7 tokens/s INFO:__main__:2024-11-30 04:57:18 | Epoch: 0 | Step: 322590 | Dataset: 0-3741960 | Loss: 0.433 | 597 ms/step , 115503.22 GFLOP/s , 173801.2 tokens/s INFO:__main__:2024-11-30 04:57:25 | Epoch: 0 | Step: 322600 | Dataset: 0-3744360 | Loss: 0.471 | 596 ms/step , 115813.40 GFLOP/s , 173790.0 tokens/s INFO:__main__:2024-11-30 04:57:32 | Epoch: 0 | Step: 322610 | Dataset: 0-3746760 | Loss: 0.456 | 595 ms/step , 115907.98 GFLOP/s , 173998.6 tokens/s INFO:__main__:2024-11-30 04:57:39 | Epoch: 0 | Step: 322620 | Dataset: 0-3749160 | Loss: 0.461 | 598 ms/step , 115345.31 GFLOP/s , 173700.0 tokens/s 
INFO:__main__:2024-11-30 04:57:46 | Epoch: 0 | Step: 322630 | Dataset: 0-3751560 | Loss: 0.421 | 597 ms/step , 115551.70 GFLOP/s , 173804.0 tokens/s INFO:__main__:2024-11-30 04:57:53 | Epoch: 0 | Step: 322640 | Dataset: 0-3753960 | Loss: 0.434 | 597 ms/step , 115640.30 GFLOP/s , 173797.7 tokens/s INFO:__main__:2024-11-30 04:58:00 | Epoch: 0 | Step: 322650 | Dataset: 0-3756360 | Loss: 0.434 | 597 ms/step , 115580.09 GFLOP/s , 173750.6 tokens/s INFO:__main__:2024-11-30 04:58:07 | Epoch: 0 | Step: 322660 | Dataset: 0-3758760 | Loss: 0.393 | 596 ms/step , 115788.82 GFLOP/s , 173777.0 tokens/s INFO:__main__:2024-11-30 04:58:14 | Epoch: 0 | Step: 322670 | Dataset: 0-3761160 | Loss: 0.421 | 597 ms/step , 115638.57 GFLOP/s , 173777.3 tokens/s INFO:__main__:2024-11-30 04:58:21 | Epoch: 0 | Step: 322680 | Dataset: 0-3763560 | Loss: 0.426 | 596 ms/step , 115726.71 GFLOP/s , 173959.2 tokens/s INFO:__main__:2024-11-30 04:58:28 | Epoch: 0 | Step: 322690 | Dataset: 0-3765960 | Loss: 0.500 | 596 ms/step , 115739.52 GFLOP/s , 173879.9 tokens/s INFO:__main__:2024-11-30 04:58:35 | Epoch: 0 | Step: 322700 | Dataset: 0-3768360 | Loss: 0.449 | 597 ms/step , 115597.92 GFLOP/s , 173733.1 tokens/s INFO:__main__:2024-11-30 04:58:42 | Epoch: 0 | Step: 322710 | Dataset: 0-3770760 | Loss: 0.444 | 597 ms/step , 115663.97 GFLOP/s , 173745.7 tokens/s INFO:__main__:2024-11-30 04:58:49 | Epoch: 0 | Step: 322720 | Dataset: 0-3773160 | Loss: 0.429 | 596 ms/step , 115703.87 GFLOP/s , 173731.9 tokens/s INFO:__main__:2024-11-30 04:58:57 | Epoch: 0 | Step: 322730 | Dataset: 0-3775560 | Loss: 0.481 | 597 ms/step , 115621.20 GFLOP/s , 173774.5 tokens/s INFO:__main__:2024-11-30 04:59:04 | Epoch: 0 | Step: 322740 | Dataset: 0-3777960 | Loss: 0.450 | 596 ms/step , 115726.10 GFLOP/s , 173758.1 tokens/s INFO:__main__:2024-11-30 04:59:11 | Epoch: 0 | Step: 322750 | Dataset: 0-3780360 | Loss: 0.464 | 596 ms/step , 115810.11 GFLOP/s , 173816.7 tokens/s INFO:__main__:2024-11-30 04:59:18 | Epoch: 0 | Step: 322760 | 
Dataset: 0-3782760 | Loss: 0.433 | 596 ms/step , 115749.05 GFLOP/s , 173931.9 tokens/s INFO:__main__:2024-11-30 04:59:25 | Epoch: 0 | Step: 322770 | Dataset: 0-3785160 | Loss: 0.478 | 596 ms/step , 115732.84 GFLOP/s , 173839.1 tokens/s INFO:__main__:2024-11-30 04:59:32 | Epoch: 0 | Step: 322780 | Dataset: 0-3787560 | Loss: 0.451 | 597 ms/step , 115571.67 GFLOP/s , 173751.5 tokens/s INFO:__main__:2024-11-30 04:59:39 | Epoch: 0 | Step: 322790 | Dataset: 0-3789960 | Loss: 0.426 | 598 ms/step , 115472.53 GFLOP/s , 173777.8 tokens/s INFO:__main__:2024-11-30 04:59:46 | Epoch: 0 | Step: 322800 | Dataset: 0-3792360 | Loss: 0.461 | 597 ms/step , 115686.66 GFLOP/s , 173781.2 tokens/s INFO:__main__:2024-11-30 04:59:53 | Epoch: 0 | Step: 322810 | Dataset: 0-3794760 | Loss: 0.417 | 597 ms/step , 115644.38 GFLOP/s , 173711.8 tokens/s INFO:__main__:2024-11-30 05:00:00 | Epoch: 0 | Step: 322820 | Dataset: 0-3797160 | Loss: 0.468 | 597 ms/step , 115518.04 GFLOP/s , 173718.5 tokens/s INFO:__main__:2024-11-30 05:00:07 | Epoch: 0 | Step: 322830 | Dataset: 0-3799560 | Loss: 0.440 | 596 ms/step , 115722.80 GFLOP/s , 173898.9 tokens/s INFO:__main__:2024-11-30 05:00:14 | Epoch: 0 | Step: 322840 | Dataset: 0-3801960 | Loss: 0.815 | 596 ms/step , 115699.22 GFLOP/s , 173834.6 tokens/s INFO:__main__:2024-11-30 05:00:21 | Epoch: 0 | Step: 322850 | Dataset: 0-3804360 | Loss: 0.746 | 597 ms/step , 115581.32 GFLOP/s , 173648.4 tokens/s INFO:__main__:2024-11-30 05:00:28 | Epoch: 0 | Step: 322860 | Dataset: 0-3806760 | Loss: 0.681 | 597 ms/step , 115633.42 GFLOP/s , 173671.7 tokens/s INFO:__main__:2024-11-30 05:00:36 | Epoch: 0 | Step: 322870 | Dataset: 0-3809160 | Loss: 0.607 | 597 ms/step , 115526.49 GFLOP/s , 173658.6 tokens/s INFO:__main__:2024-11-30 05:00:43 | Epoch: 0 | Step: 322880 | Dataset: 0-3811560 | Loss: 0.731 | 597 ms/step , 115506.57 GFLOP/s , 173624.3 tokens/s INFO:__main__:2024-11-30 05:00:50 | Epoch: 0 | Step: 322890 | Dataset: 0-3813960 | Loss: 0.721 | 597 ms/step , 115687.79 
GFLOP/s , 173651.3 tokens/s INFO:__main__:2024-11-30 05:00:57 | Epoch: 0 | Step: 322900 | Dataset: 0-3816360 | Loss: 0.624 | 597 ms/step , 115686.25 GFLOP/s , 173727.9 tokens/s INFO:__main__:2024-11-30 05:01:04 | Epoch: 0 | Step: 322910 | Dataset: 0-3818760 | Loss: 0.751 | 597 ms/step , 115678.72 GFLOP/s , 173842.0 tokens/s INFO:__main__:2024-11-30 05:01:11 | Epoch: 0 | Step: 322920 | Dataset: 0-3821160 | Loss: 0.659 | 597 ms/step , 115616.47 GFLOP/s , 173619.5 tokens/s INFO:__main__:2024-11-30 05:01:18 | Epoch: 0 | Step: 322930 | Dataset: 0-3823560 | Loss: 0.740 | 597 ms/step , 115524.28 GFLOP/s , 173704.7 tokens/s INFO:__main__:2024-11-30 05:01:25 | Epoch: 0 | Step: 322940 | Dataset: 0-3825960 | Loss: 0.741 | 597 ms/step , 115550.90 GFLOP/s , 173658.1 tokens/s INFO:__main__:2024-11-30 05:01:32 | Epoch: 0 | Step: 322950 | Dataset: 0-3828360 | Loss: 0.682 | 597 ms/step , 115595.03 GFLOP/s , 173703.3 tokens/s INFO:__main__:2024-11-30 05:01:39 | Epoch: 0 | Step: 322960 | Dataset: 0-3830760 | Loss: 0.826 | 597 ms/step , 115584.32 GFLOP/s , 173675.3 tokens/s INFO:__main__:2024-11-30 05:01:46 | Epoch: 0 | Step: 322970 | Dataset: 0-3833160 | Loss: 0.721 | 597 ms/step , 115673.63 GFLOP/s , 173754.4 tokens/s INFO:__main__:2024-11-30 05:01:53 | Epoch: 0 | Step: 322980 | Dataset: 0-3835560 | Loss: 0.802 | 596 ms/step , 115778.89 GFLOP/s , 173911.0 tokens/s INFO:__main__:2024-11-30 05:02:00 | Epoch: 0 | Step: 322990 | Dataset: 0-3837960 | Loss: 0.743 | 597 ms/step , 115538.60 GFLOP/s , 173792.6 tokens/s INFO:__main__:2024-11-30 05:02:08 | Validation | Step: 323000 | Val_loss: 1.046 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 05:02:08 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_050208_step_323000.pt` INFO:__main__:2024-11-30 05:02:11 | Epoch: 0 | Step: 323000 | Dataset: 0-3840360 | Loss: 0.748 | 595 ms/step , 115939.19 GFLOP/s , 118257.5 tokens/s INFO:__main__:2024-11-30 05:02:18 | Epoch: 0 | Step: 323010 | Dataset: 0-3842760 | Loss: 
0.641 | 598 ms/step , 115455.19 GFLOP/s , 173392.8 tokens/s INFO:__main__:2024-11-30 05:02:25 | Epoch: 0 | Step: 323020 | Dataset: 0-3845160 | Loss: 0.685 | 599 ms/step , 115291.56 GFLOP/s , 173280.7 tokens/s INFO:__main__:2024-11-30 05:02:32 | Epoch: 0 | Step: 323030 | Dataset: 0-3847560 | Loss: 0.716 | 599 ms/step , 115142.59 GFLOP/s , 173283.5 tokens/s INFO:__main__:2024-11-30 05:02:39 | Epoch: 0 | Step: 323040 | Dataset: 0-3849960 | Loss: 0.737 | 598 ms/step , 115365.30 GFLOP/s , 173218.9 tokens/s INFO:__main__:2024-11-30 05:02:46 | Epoch: 0 | Step: 323050 | Dataset: 0-3852360 | Loss: 0.674 | 598 ms/step , 115464.15 GFLOP/s , 173370.4 tokens/s INFO:__main__:2024-11-30 05:02:53 | Epoch: 0 | Step: 323060 | Dataset: 0-3854760 | Loss: 0.709 | 599 ms/step , 115231.86 GFLOP/s , 173394.7 tokens/s INFO:__main__:2024-11-30 05:03:00 | Epoch: 0 | Step: 323070 | Dataset: 0-3857160 | Loss: 0.676 | 597 ms/step , 115584.46 GFLOP/s , 173626.9 tokens/s INFO:__main__:2024-11-30 05:03:07 | Epoch: 0 | Step: 323080 | Dataset: 0-3859560 | Loss: 0.668 | 597 ms/step , 115652.63 GFLOP/s , 173822.6 tokens/s INFO:__main__:2024-11-30 05:03:15 | Epoch: 0 | Step: 323090 | Dataset: 0-3861960 | Loss: 0.641 | 597 ms/step , 115666.16 GFLOP/s , 173777.5 tokens/s INFO:__main__:2024-11-30 05:03:22 | Epoch: 0 | Step: 323100 | Dataset: 0-3864360 | Loss: 0.799 | 597 ms/step , 115531.23 GFLOP/s , 173790.5 tokens/s INFO:__main__:2024-11-30 05:03:29 | Epoch: 0 | Step: 323110 | Dataset: 0-3866760 | Loss: 0.689 | 596 ms/step , 115742.21 GFLOP/s , 173751.9 tokens/s INFO:__main__:2024-11-30 05:03:36 | Epoch: 0 | Step: 323120 | Dataset: 0-3869160 | Loss: 0.875 | 597 ms/step , 115674.56 GFLOP/s , 173827.7 tokens/s INFO:__main__:2024-11-30 05:03:43 | Epoch: 0 | Step: 323130 | Dataset: 0-3871560 | Loss: 0.753 | 596 ms/step , 115724.92 GFLOP/s , 173832.8 tokens/s INFO:__main__:2024-11-30 05:03:50 | Epoch: 0 | Step: 323140 | Dataset: 0-3873960 | Loss: 0.685 | 598 ms/step , 115499.41 GFLOP/s , 173826.2 tokens/s 
INFO:__main__:2024-11-30 05:03:57 | Epoch: 0 | Step: 323150 | Dataset: 0-3876360 | Loss: 0.732 | 597 ms/step , 115647.37 GFLOP/s , 173782.2 tokens/s INFO:__main__:2024-11-30 05:04:04 | Epoch: 0 | Step: 323160 | Dataset: 0-3878760 | Loss: 0.677 | 596 ms/step , 115704.01 GFLOP/s , 173763.1 tokens/s INFO:__main__:2024-11-30 05:04:11 | Epoch: 0 | Step: 323170 | Dataset: 0-3881160 | Loss: 0.675 | 596 ms/step , 115706.67 GFLOP/s , 173727.1 tokens/s INFO:__main__:2024-11-30 05:04:18 | Epoch: 0 | Step: 323180 | Dataset: 0-3883560 | Loss: 0.742 | 597 ms/step , 115633.89 GFLOP/s , 173687.2 tokens/s INFO:__main__:2024-11-30 05:04:25 | Epoch: 0 | Step: 323190 | Dataset: 0-3885960 | Loss: 0.669 | 598 ms/step , 115338.20 GFLOP/s , 173682.9 tokens/s INFO:__main__:2024-11-30 05:04:32 | Epoch: 0 | Step: 323200 | Dataset: 0-3888360 | Loss: 0.698 | 596 ms/step , 115817.07 GFLOP/s , 173930.5 tokens/s INFO:__main__:2024-11-30 05:04:39 | Epoch: 0 | Step: 323210 | Dataset: 0-3890760 | Loss: 0.664 | 597 ms/step , 115624.30 GFLOP/s , 173751.5 tokens/s INFO:__main__:2024-11-30 05:04:46 | Epoch: 0 | Step: 323220 | Dataset: 0-3893160 | Loss: 0.661 | 597 ms/step , 115639.26 GFLOP/s , 173775.0 tokens/s INFO:__main__:2024-11-30 05:04:54 | Epoch: 0 | Step: 323230 | Dataset: 0-3895560 | Loss: 0.719 | 597 ms/step , 115659.05 GFLOP/s , 173683.5 tokens/s INFO:__main__:2024-11-30 05:05:01 | Epoch: 0 | Step: 323240 | Dataset: 0-3897960 | Loss: 0.679 | 597 ms/step , 115548.84 GFLOP/s , 173776.8 tokens/s INFO:__main__:2024-11-30 05:05:08 | Epoch: 0 | Step: 323250 | Dataset: 0-3900360 | Loss: 0.746 | 597 ms/step , 115638.75 GFLOP/s , 173727.2 tokens/s INFO:__main__:2024-11-30 05:05:15 | Epoch: 0 | Step: 323260 | Dataset: 0-3902760 | Loss: 0.708 | 597 ms/step , 115650.98 GFLOP/s , 173720.0 tokens/s INFO:__main__:2024-11-30 05:05:22 | Epoch: 0 | Step: 323270 | Dataset: 0-3905160 | Loss: 0.846 | 597 ms/step , 115611.37 GFLOP/s , 173817.1 tokens/s INFO:__main__:2024-11-30 05:05:29 | Epoch: 0 | Step: 323280 | 
Dataset: 0-3907560 | Loss: 0.674 | 596 ms/step , 115714.09 GFLOP/s , 173860.1 tokens/s INFO:__main__:2024-11-30 05:05:36 | Epoch: 0 | Step: 323290 | Dataset: 0-3909960 | Loss: 0.700 | 597 ms/step , 115559.19 GFLOP/s , 173725.2 tokens/s INFO:__main__:2024-11-30 05:05:43 | Epoch: 0 | Step: 323300 | Dataset: 0-3912360 | Loss: 0.638 | 596 ms/step , 115718.68 GFLOP/s , 173734.2 tokens/s INFO:__main__:2024-11-30 05:05:50 | Epoch: 0 | Step: 323310 | Dataset: 0-3914760 | Loss: 0.641 | 597 ms/step , 115529.20 GFLOP/s , 173738.8 tokens/s INFO:__main__:2024-11-30 05:05:57 | Epoch: 0 | Step: 323320 | Dataset: 0-3917160 | Loss: 0.736 | 597 ms/step , 115594.30 GFLOP/s , 173772.5 tokens/s INFO:__main__:2024-11-30 05:06:04 | Epoch: 0 | Step: 323330 | Dataset: 0-3919560 | Loss: 0.707 | 597 ms/step , 115577.24 GFLOP/s , 173787.7 tokens/s INFO:__main__:2024-11-30 05:06:11 | Epoch: 0 | Step: 323340 | Dataset: 0-3921960 | Loss: 0.613 | 597 ms/step , 115570.17 GFLOP/s , 173774.6 tokens/s INFO:__main__:2024-11-30 05:06:18 | Epoch: 0 | Step: 323350 | Dataset: 0-3924360 | Loss: 0.786 | 596 ms/step , 115742.94 GFLOP/s , 173830.8 tokens/s INFO:__main__:2024-11-30 05:06:25 | Epoch: 0 | Step: 323360 | Dataset: 0-3926760 | Loss: 0.683 | 596 ms/step , 115714.35 GFLOP/s , 173801.4 tokens/s INFO:__main__:2024-11-30 05:06:33 | Epoch: 0 | Step: 323370 | Dataset: 0-3929160 | Loss: 0.761 | 597 ms/step , 115682.69 GFLOP/s , 173725.5 tokens/s INFO:__main__:2024-11-30 05:06:40 | Epoch: 0 | Step: 323380 | Dataset: 0-3931560 | Loss: 0.739 | 598 ms/step , 115464.93 GFLOP/s , 173645.7 tokens/s INFO:__main__:2024-11-30 05:06:47 | Epoch: 0 | Step: 323390 | Dataset: 0-3933960 | Loss: 0.810 | 597 ms/step , 115511.72 GFLOP/s , 173666.4 tokens/s INFO:__main__:2024-11-30 05:06:54 | Epoch: 0 | Step: 323400 | Dataset: 0-3936360 | Loss: 0.773 | 598 ms/step , 115450.78 GFLOP/s , 173641.2 tokens/s INFO:__main__:2024-11-30 05:07:01 | Epoch: 0 | Step: 323410 | Dataset: 0-3938760 | Loss: 0.818 | 598 ms/step , 115464.76 
GFLOP/s , 173605.5 tokens/s INFO:__main__:2024-11-30 05:07:08 | Epoch: 0 | Step: 323420 | Dataset: 0-3941160 | Loss: 0.759 | 597 ms/step , 115658.81 GFLOP/s , 173673.4 tokens/s INFO:__main__:2024-11-30 05:07:15 | Epoch: 0 | Step: 323430 | Dataset: 0-3943560 | Loss: 0.771 | 597 ms/step , 115520.59 GFLOP/s , 173787.8 tokens/s INFO:__main__:2024-11-30 05:07:22 | Epoch: 0 | Step: 323440 | Dataset: 0-3945960 | Loss: 0.759 | 598 ms/step , 115327.09 GFLOP/s , 173649.3 tokens/s INFO:__main__:2024-11-30 05:07:29 | Epoch: 0 | Step: 323450 | Dataset: 0-3948360 | Loss: 0.801 | 598 ms/step , 115440.38 GFLOP/s , 173589.1 tokens/s INFO:__main__:2024-11-30 05:07:36 | Epoch: 0 | Step: 323460 | Dataset: 0-3950760 | Loss: 0.819 | 597 ms/step , 115640.09 GFLOP/s , 173628.3 tokens/s INFO:__main__:2024-11-30 05:07:43 | Epoch: 0 | Step: 323470 | Dataset: 0-3953160 | Loss: 0.768 | 597 ms/step , 115628.69 GFLOP/s , 173588.5 tokens/s INFO:__main__:2024-11-30 05:07:50 | Epoch: 0 | Step: 323480 | Dataset: 0-3955560 | Loss: 0.736 | 597 ms/step , 115685.79 GFLOP/s , 173562.2 tokens/s INFO:__main__:2024-11-30 05:07:57 | Epoch: 0 | Step: 323490 | Dataset: 0-3957960 | Loss: 0.840 | 597 ms/step , 115570.34 GFLOP/s , 173608.5 tokens/s INFO:__main__:2024-11-30 05:08:05 | Validation | Step: 323500 | Val_loss: 1.414 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 05:08:06 | Epoch: 0 | Step: 323500 | Dataset: 0-3960360 | Loss: 0.798 | 596 ms/step , 115857.52 GFLOP/s , 147850.2 tokens/s INFO:__main__:2024-11-30 05:08:13 | Epoch: 0 | Step: 323510 | Dataset: 0-3962760 | Loss: 0.762 | 597 ms/step , 115614.40 GFLOP/s , 173758.4 tokens/s INFO:__main__:2024-11-30 05:08:20 | Epoch: 0 | Step: 323520 | Dataset: 0-3965160 | Loss: 0.705 | 597 ms/step , 115667.18 GFLOP/s , 173650.8 tokens/s INFO:__main__:2024-11-30 05:08:27 | Epoch: 0 | Step: 323530 | Dataset: 0-3967560 | Loss: 0.743 | 598 ms/step , 115468.45 GFLOP/s , 173668.4 tokens/s INFO:__main__:2024-11-30 05:08:34 | Epoch: 0 | Step: 323540 | Dataset: 
0-3969960 | Loss: 0.731 | 598 ms/step , 115440.94 GFLOP/s , 173659.5 tokens/s INFO:__main__:2024-11-30 05:08:41 | Epoch: 0 | Step: 323550 | Dataset: 0-3972360 | Loss: 0.738 | 597 ms/step , 115504.91 GFLOP/s , 173636.5 tokens/s INFO:__main__:2024-11-30 05:08:48 | Epoch: 0 | Step: 323560 | Dataset: 0-3974760 | Loss: 0.778 | 598 ms/step , 115489.95 GFLOP/s , 173654.6 tokens/s INFO:__main__:2024-11-30 05:08:55 | Epoch: 0 | Step: 323570 | Dataset: 0-3977160 | Loss: 0.742 | 596 ms/step , 115782.54 GFLOP/s , 173787.0 tokens/s INFO:__main__:2024-11-30 05:09:02 | Epoch: 0 | Step: 323580 | Dataset: 0-3979560 | Loss: 0.740 | 597 ms/step , 115567.05 GFLOP/s , 173717.7 tokens/s INFO:__main__:2024-11-30 05:09:09 | Epoch: 0 | Step: 323590 | Dataset: 0-3981960 | Loss: 0.754 | 598 ms/step , 115394.26 GFLOP/s , 173628.0 tokens/s INFO:__main__:2024-11-30 05:09:17 | Epoch: 0 | Step: 323600 | Dataset: 0-3984360 | Loss: 0.804 | 597 ms/step , 115508.76 GFLOP/s , 173644.7 tokens/s INFO:__main__:2024-11-30 05:09:24 | Epoch: 0 | Step: 323610 | Dataset: 0-3986760 | Loss: 0.775 | 597 ms/step , 115616.86 GFLOP/s , 173630.1 tokens/s INFO:__main__:2024-11-30 05:09:31 | Epoch: 0 | Step: 323620 | Dataset: 0-3989160 | Loss: 0.699 | 597 ms/step , 115590.79 GFLOP/s , 173658.7 tokens/s INFO:__main__:2024-11-30 05:09:38 | Epoch: 0 | Step: 323630 | Dataset: 0-3991560 | Loss: 0.832 | 597 ms/step , 115515.16 GFLOP/s , 173632.6 tokens/s INFO:__main__:2024-11-30 05:09:45 | Epoch: 0 | Step: 323640 | Dataset: 0-3993960 | Loss: 0.725 | 597 ms/step , 115603.69 GFLOP/s , 173713.4 tokens/s INFO:__main__:2024-11-30 05:09:52 | Epoch: 0 | Step: 323650 | Dataset: 0-3996360 | Loss: 0.736 | 597 ms/step , 115657.16 GFLOP/s , 173773.1 tokens/s INFO:__main__:2024-11-30 05:09:59 | Epoch: 0 | Step: 323660 | Dataset: 0-3998760 | Loss: 0.764 | 597 ms/step , 115527.21 GFLOP/s , 173667.6 tokens/s INFO:__main__:2024-11-30 05:10:06 | Epoch: 0 | Step: 323670 | Dataset: 0-4001160 | Loss: 0.757 | 598 ms/step , 115413.20 GFLOP/s , 
173676.9 tokens/s INFO:__main__:2024-11-30 05:10:13 | Epoch: 0 | Step: 323680 | Dataset: 0-4003560 | Loss: 0.765 | 598 ms/step , 115465.29 GFLOP/s , 173552.2 tokens/s INFO:__main__:2024-11-30 05:10:20 | Epoch: 0 | Step: 323690 | Dataset: 0-4005960 | Loss: 0.707 | 597 ms/step , 115509.18 GFLOP/s , 173635.9 tokens/s INFO:__main__:2024-11-30 05:10:27 | Epoch: 0 | Step: 323700 | Dataset: 0-4008360 | Loss: 0.746 | 597 ms/step , 115589.34 GFLOP/s , 173705.4 tokens/s INFO:__main__:2024-11-30 05:10:34 | Epoch: 0 | Step: 323710 | Dataset: 0-4010760 | Loss: 0.794 | 596 ms/step , 115707.27 GFLOP/s , 173571.2 tokens/s INFO:__main__:2024-11-30 05:10:41 | Epoch: 0 | Step: 323720 | Dataset: 0-4013160 | Loss: 0.718 | 597 ms/step , 115618.92 GFLOP/s , 173731.2 tokens/s INFO:__main__:2024-11-30 05:10:49 | Epoch: 0 | Step: 323730 | Dataset: 0-4015560 | Loss: 0.668 | 598 ms/step , 115479.15 GFLOP/s , 173697.6 tokens/s INFO:__main__:2024-11-30 05:10:56 | Epoch: 0 | Step: 323740 | Dataset: 0-4017960 | Loss: 0.737 | 598 ms/step , 115416.90 GFLOP/s , 173646.2 tokens/s INFO:__main__:2024-11-30 05:11:03 | Epoch: 0 | Step: 323750 | Dataset: 0-4020360 | Loss: 0.750 | 599 ms/step , 115260.15 GFLOP/s , 173578.6 tokens/s INFO:__main__:2024-11-30 05:11:10 | Epoch: 0 | Step: 323760 | Dataset: 0-4022760 | Loss: 0.765 | 597 ms/step , 115606.37 GFLOP/s , 173621.1 tokens/s INFO:__main__:2024-11-30 05:11:17 | Epoch: 0 | Step: 323770 | Dataset: 0-4025160 | Loss: 0.775 | 598 ms/step , 115382.81 GFLOP/s , 173646.1 tokens/s INFO:__main__:2024-11-30 05:11:24 | Epoch: 0 | Step: 323780 | Dataset: 0-4027560 | Loss: 0.758 | 598 ms/step , 115486.04 GFLOP/s , 173644.3 tokens/s INFO:__main__:2024-11-30 05:11:31 | Epoch: 0 | Step: 323790 | Dataset: 0-4029960 | Loss: 0.759 | 598 ms/step , 115456.44 GFLOP/s , 173733.1 tokens/s INFO:__main__:2024-11-30 05:11:38 | Epoch: 0 | Step: 323800 | Dataset: 0-4032360 | Loss: 0.733 | 597 ms/step , 115682.90 GFLOP/s , 173761.4 tokens/s INFO:__main__:2024-11-30 05:11:45 | Epoch: 0 
| Step: 323810 | Dataset: 0-4034760 | Loss: 0.830 | 598 ms/step , 115473.80 GFLOP/s , 173686.4 tokens/s INFO:__main__:2024-11-30 05:11:52 | Epoch: 0 | Step: 323820 | Dataset: 0-4037160 | Loss: 0.742 | 597 ms/step , 115506.27 GFLOP/s , 173693.2 tokens/s INFO:__main__:2024-11-30 05:11:59 | Epoch: 0 | Step: 323830 | Dataset: 0-4039560 | Loss: 0.753 | 597 ms/step , 115542.70 GFLOP/s , 173679.2 tokens/s INFO:__main__:2024-11-30 05:12:06 | Epoch: 0 | Step: 323840 | Dataset: 0-4041960 | Loss: 0.727 | 597 ms/step , 115593.47 GFLOP/s , 173656.8 tokens/s INFO:__main__:2024-11-30 05:12:13 | Epoch: 0 | Step: 323850 | Dataset: 0-4044360 | Loss: 0.748 | 598 ms/step , 115356.77 GFLOP/s , 173637.8 tokens/s INFO:__main__:2024-11-30 05:12:21 | Epoch: 0 | Step: 323860 | Dataset: 0-4046760 | Loss: 0.719 | 597 ms/step , 115682.63 GFLOP/s , 173632.8 tokens/s INFO:__main__:2024-11-30 05:12:28 | Epoch: 0 | Step: 323870 | Dataset: 0-4049160 | Loss: 0.734 | 598 ms/step , 115314.17 GFLOP/s , 173823.7 tokens/s INFO:__main__:2024-11-30 05:12:35 | Epoch: 0 | Step: 323880 | Dataset: 0-4051560 | Loss: 0.783 | 597 ms/step , 115574.69 GFLOP/s , 173719.5 tokens/s INFO:__main__:2024-11-30 05:12:42 | Epoch: 0 | Step: 323890 | Dataset: 0-4053960 | Loss: 0.704 | 598 ms/step , 115489.17 GFLOP/s , 173681.3 tokens/s INFO:__main__:2024-11-30 05:12:49 | Epoch: 0 | Step: 323900 | Dataset: 0-4056360 | Loss: 0.759 | 598 ms/step , 115369.54 GFLOP/s , 173720.7 tokens/s INFO:__main__:2024-11-30 05:12:56 | Epoch: 0 | Step: 323910 | Dataset: 0-4058760 | Loss: 0.703 | 598 ms/step , 115451.10 GFLOP/s , 173638.2 tokens/s INFO:__main__:2024-11-30 05:13:03 | Epoch: 0 | Step: 323920 | Dataset: 0-4061160 | Loss: 0.718 | 597 ms/step , 115630.93 GFLOP/s , 173671.2 tokens/s INFO:__main__:2024-11-30 05:13:10 | Epoch: 0 | Step: 323930 | Dataset: 0-4063560 | Loss: 0.758 | 598 ms/step , 115461.72 GFLOP/s , 173681.8 tokens/s INFO:__main__:2024-11-30 05:13:17 | Epoch: 0 | Step: 323940 | Dataset: 0-4065960 | Loss: 0.973 | 597 
ms/step , 115687.29 GFLOP/s , 173823.7 tokens/s INFO:__main__:2024-11-30 05:13:24 | Epoch: 0 | Step: 323950 | Dataset: 0-4068360 | Loss: 0.884 | 597 ms/step , 115593.00 GFLOP/s , 173877.8 tokens/s INFO:__main__:2024-11-30 05:13:31 | Epoch: 0 | Step: 323960 | Dataset: 0-4070760 | Loss: 0.946 | 597 ms/step , 115564.01 GFLOP/s , 173728.9 tokens/s INFO:__main__:2024-11-30 05:13:38 | Epoch: 0 | Step: 323970 | Dataset: 0-4073160 | Loss: 0.860 | 597 ms/step , 115647.52 GFLOP/s , 173636.0 tokens/s INFO:__main__:2024-11-30 05:13:45 | Epoch: 0 | Step: 323980 | Dataset: 0-4075560 | Loss: 0.893 | 597 ms/step , 115611.88 GFLOP/s , 173628.8 tokens/s INFO:__main__:2024-11-30 05:13:52 | Epoch: 0 | Step: 323990 | Dataset: 0-4077960 | Loss: 0.920 | 598 ms/step , 115485.41 GFLOP/s , 173692.9 tokens/s INFO:__main__:2024-11-30 05:14:00 | Validation | Step: 324000 | Val_loss: 1.309 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 05:14:00 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_051400_step_324000.pt` INFO:__main__:2024-11-30 05:14:03 | Epoch: 0 | Step: 324000 | Dataset: 0-4080360 | Loss: 0.833 | 596 ms/step , 115866.91 GFLOP/s , 118552.4 tokens/s INFO:__main__:2024-11-30 05:14:10 | Epoch: 0 | Step: 324010 | Dataset: 0-4082760 | Loss: 0.911 | 598 ms/step , 115398.45 GFLOP/s , 173402.0 tokens/s INFO:__main__:2024-11-30 05:14:17 | Epoch: 0 | Step: 324020 | Dataset: 0-4085160 | Loss: 0.841 | 602 ms/step , 114637.82 GFLOP/s , 173324.9 tokens/s INFO:__main__:2024-11-30 05:14:24 | Epoch: 0 | Step: 324030 | Dataset: 0-4087560 | Loss: 0.951 | 599 ms/step , 115266.38 GFLOP/s , 173177.5 tokens/s INFO:__main__:2024-11-30 05:14:31 | Epoch: 0 | Step: 324040 | Dataset: 0-4089960 | Loss: 0.693 | 599 ms/step , 115267.96 GFLOP/s , 173198.4 tokens/s INFO:__main__:2024-11-30 05:14:38 | Epoch: 0 | Step: 324050 | Dataset: 0-4092360 | Loss: 0.706 | 599 ms/step , 115268.56 GFLOP/s , 173258.0 tokens/s INFO:__main__:2024-11-30 05:14:45 | Epoch: 0 | Step: 324060 | 
Dataset: 0-4094760 | Loss: 0.682 | 599 ms/step , 115277.64 GFLOP/s , 173268.1 tokens/s INFO:__main__:2024-11-30 05:14:52 | Epoch: 0 | Step: 324070 | Dataset: 0-4097160 | Loss: 0.690 | 599 ms/step , 115218.69 GFLOP/s , 173285.5 tokens/s INFO:__main__:2024-11-30 05:15:00 | Epoch: 0 | Step: 324080 | Dataset: 0-4099560 | Loss: 0.706 | 599 ms/step , 115293.07 GFLOP/s , 173333.0 tokens/s INFO:__main__:2024-11-30 05:15:07 | Epoch: 0 | Step: 324090 | Dataset: 0-4101960 | Loss: 0.667 | 599 ms/step , 115281.97 GFLOP/s , 173441.1 tokens/s INFO:__main__:2024-11-30 05:15:14 | Epoch: 0 | Step: 324100 | Dataset: 0-4104360 | Loss: 0.667 | 599 ms/step , 115186.91 GFLOP/s , 173413.8 tokens/s INFO:__main__:2024-11-30 05:15:21 | Epoch: 0 | Step: 324110 | Dataset: 0-4106760 | Loss: 0.667 | 599 ms/step , 115160.96 GFLOP/s , 173286.3 tokens/s INFO:__main__:2024-11-30 05:15:28 | Epoch: 0 | Step: 324120 | Dataset: 0-4109160 | Loss: 0.650 | 597 ms/step , 115584.89 GFLOP/s , 173766.9 tokens/s INFO:__main__:2024-11-30 05:15:35 | Epoch: 0 | Step: 324130 | Dataset: 0-4111560 | Loss: 0.654 | 597 ms/step , 115642.37 GFLOP/s , 173761.0 tokens/s INFO:__main__:2024-11-30 05:15:42 | Epoch: 0 | Step: 324140 | Dataset: 0-4113960 | Loss: 0.652 | 597 ms/step , 115561.13 GFLOP/s , 173765.9 tokens/s INFO:__main__:2024-11-30 05:15:49 | Epoch: 0 | Step: 324150 | Dataset: 0-4116360 | Loss: 0.649 | 597 ms/step , 115669.43 GFLOP/s , 173762.8 tokens/s INFO:__main__:2024-11-30 05:15:56 | Epoch: 0 | Step: 324160 | Dataset: 0-4118760 | Loss: 0.654 | 596 ms/step , 115734.02 GFLOP/s , 173794.2 tokens/s INFO:__main__:2024-11-30 05:16:03 | Epoch: 0 | Step: 324170 | Dataset: 0-4121160 | Loss: 0.653 | 596 ms/step , 115705.64 GFLOP/s , 173906.1 tokens/s INFO:__main__:2024-11-30 05:16:10 | Epoch: 0 | Step: 324180 | Dataset: 0-4123560 | Loss: 0.681 | 598 ms/step , 115494.73 GFLOP/s , 173798.5 tokens/s INFO:__main__:2024-11-30 05:16:17 | Epoch: 0 | Step: 324190 | Dataset: 0-4125960 | Loss: 0.677 | 597 ms/step , 115626.72 
GFLOP/s , 173785.9 tokens/s INFO:__main__:2024-11-30 05:16:24 | Epoch: 0 | Step: 324200 | Dataset: 0-4128360 | Loss: 0.667 | 598 ms/step , 115429.68 GFLOP/s , 173759.5 tokens/s INFO:__main__:2024-11-30 05:16:32 | Epoch: 0 | Step: 324210 | Dataset: 0-4130760 | Loss: 0.627 | 597 ms/step , 115657.95 GFLOP/s , 173782.9 tokens/s INFO:__main__:2024-11-30 05:16:39 | Epoch: 0 | Step: 324220 | Dataset: 0-4133160 | Loss: 0.631 | 597 ms/step , 115563.37 GFLOP/s , 173793.3 tokens/s INFO:__main__:2024-11-30 05:16:46 | Epoch: 0 | Step: 324230 | Dataset: 0-4135560 | Loss: 0.597 | 596 ms/step , 115754.87 GFLOP/s , 173771.0 tokens/s INFO:__main__:2024-11-30 05:16:53 | Epoch: 0 | Step: 324240 | Dataset: 0-4137960 | Loss: 0.625 | 596 ms/step , 115722.75 GFLOP/s , 173880.4 tokens/s INFO:__main__:2024-11-30 05:17:00 | Epoch: 0 | Step: 324250 | Dataset: 0-4140360 | Loss: 0.672 | 596 ms/step , 115703.51 GFLOP/s , 173885.7 tokens/s INFO:__main__:2024-11-30 05:17:07 | Epoch: 0 | Step: 324260 | Dataset: 0-4142760 | Loss: 0.659 | 597 ms/step , 115588.48 GFLOP/s , 173727.4 tokens/s INFO:__main__:2024-11-30 05:17:14 | Epoch: 0 | Step: 324270 | Dataset: 0-4145160 | Loss: 0.646 | 597 ms/step , 115689.46 GFLOP/s , 173758.3 tokens/s INFO:__main__:2024-11-30 05:17:21 | Epoch: 0 | Step: 324280 | Dataset: 0-4147560 | Loss: 0.630 | 597 ms/step , 115575.55 GFLOP/s , 173793.5 tokens/s INFO:__main__:2024-11-30 05:17:28 | Epoch: 0 | Step: 324290 | Dataset: 0-4149960 | Loss: 0.613 | 596 ms/step , 115751.09 GFLOP/s , 173799.5 tokens/s INFO:__main__:2024-11-30 05:17:35 | Epoch: 0 | Step: 324300 | Dataset: 0-4152360 | Loss: 0.642 | 598 ms/step , 115444.51 GFLOP/s , 173745.2 tokens/s INFO:__main__:2024-11-30 05:17:42 | Epoch: 0 | Step: 324310 | Dataset: 0-4154760 | Loss: 0.619 | 597 ms/step , 115620.37 GFLOP/s , 173835.6 tokens/s INFO:__main__:2024-11-30 05:17:49 | Epoch: 0 | Step: 324320 | Dataset: 0-4157160 | Loss: 0.610 | 596 ms/step , 115779.84 GFLOP/s , 173917.1 tokens/s INFO:__main__:2024-11-30 05:17:56 
| Epoch: 0 | Step: 324330 | Dataset: 0-4159560 | Loss: 0.648 | 597 ms/step , 115657.08 GFLOP/s , 173750.1 tokens/s INFO:__main__:2024-11-30 05:18:03 | Epoch: 0 | Step: 324340 | Dataset: 0-4161960 | Loss: 0.611 | 597 ms/step , 115667.80 GFLOP/s , 173710.2 tokens/s INFO:__main__:2024-11-30 05:18:11 | Epoch: 0 | Step: 324350 | Dataset: 0-4164360 | Loss: 0.615 | 598 ms/step , 115457.70 GFLOP/s , 173768.5 tokens/s INFO:__main__:2024-11-30 05:18:18 | Epoch: 0 | Step: 324360 | Dataset: 0-4166760 | Loss: 0.590 | 597 ms/step , 115667.79 GFLOP/s , 173740.9 tokens/s INFO:__main__:2024-11-30 05:18:25 | Epoch: 0 | Step: 324370 | Dataset: 0-4169160 | Loss: 0.611 | 597 ms/step , 115694.64 GFLOP/s , 173717.6 tokens/s INFO:__main__:2024-11-30 05:18:32 | Epoch: 0 | Step: 324380 | Dataset: 0-4171560 | Loss: 0.593 | 597 ms/step , 115670.18 GFLOP/s , 173753.6 tokens/s INFO:__main__:2024-11-30 05:18:39 | Epoch: 0 | Step: 324390 | Dataset: 0-4173960 | Loss: 0.611 | 597 ms/step , 115650.49 GFLOP/s , 173858.0 tokens/s INFO:__main__:2024-11-30 05:18:46 | Epoch: 0 | Step: 324400 | Dataset: 0-4176360 | Loss: 0.621 | 597 ms/step , 115679.20 GFLOP/s , 173863.2 tokens/s INFO:__main__:2024-11-30 05:18:53 | Epoch: 0 | Step: 324410 | Dataset: 0-4178760 | Loss: 0.662 | 597 ms/step , 115635.69 GFLOP/s , 173746.1 tokens/s INFO:__main__:2024-11-30 05:19:00 | Epoch: 0 | Step: 324420 | Dataset: 0-4181160 | Loss: 0.600 | 597 ms/step , 115537.30 GFLOP/s , 173715.4 tokens/s INFO:__main__:2024-11-30 05:19:07 | Epoch: 0 | Step: 324430 | Dataset: 0-4183560 | Loss: 0.654 | 597 ms/step , 115511.54 GFLOP/s , 173725.2 tokens/s INFO:__main__:2024-11-30 05:19:14 | Epoch: 0 | Step: 324440 | Dataset: 0-4185960 | Loss: 0.614 | 597 ms/step , 115695.52 GFLOP/s , 173726.7 tokens/s INFO:__main__:2024-11-30 05:19:21 | Epoch: 0 | Step: 324450 | Dataset: 0-4188360 | Loss: 0.664 | 597 ms/step , 115540.17 GFLOP/s , 173742.5 tokens/s INFO:__main__:2024-11-30 05:19:28 | Epoch: 0 | Step: 324460 | Dataset: 0-4190760 | Loss: 0.618 | 
596 ms/step , 115740.09 GFLOP/s , 173756.2 tokens/s INFO:__main__:2024-11-30 05:19:35 | Epoch: 0 | Step: 324470 | Dataset: 0-4193160 | Loss: 0.660 | 597 ms/step , 115668.73 GFLOP/s , 173898.0 tokens/s INFO:__main__:2024-11-30 05:19:42 | Epoch: 0 | Step: 324480 | Dataset: 0-4195560 | Loss: 0.520 | 596 ms/step , 115739.37 GFLOP/s , 173760.6 tokens/s INFO:__main__:2024-11-30 05:19:50 | Epoch: 0 | Step: 324490 | Dataset: 0-4197960 | Loss: 0.556 | 597 ms/step , 115632.47 GFLOP/s , 173887.5 tokens/s INFO:__main__:2024-11-30 05:19:57 | Validation | Step: 324500 | Val_loss: 1.317 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 05:19:58 | Epoch: 0 | Step: 324500 | Dataset: 0-4200360 | Loss: 0.533 | 595 ms/step , 116009.23 GFLOP/s , 147895.4 tokens/s INFO:__main__:2024-11-30 05:20:05 | Epoch: 0 | Step: 324510 | Dataset: 0-4202760 | Loss: 0.473 | 597 ms/step , 115633.97 GFLOP/s , 173942.2 tokens/s INFO:__main__:2024-11-30 05:20:12 | Epoch: 0 | Step: 324520 | Dataset: 0-4205160 | Loss: 0.529 | 596 ms/step , 115704.17 GFLOP/s , 173872.9 tokens/s INFO:__main__:2024-11-30 05:20:19 | Epoch: 0 | Step: 324530 | Dataset: 0-4207560 | Loss: 0.440 | 597 ms/step , 115605.11 GFLOP/s , 173915.7 tokens/s INFO:__main__:2024-11-30 05:20:26 | Epoch: 0 | Step: 324540 | Dataset: 0-4209960 | Loss: 0.473 | 596 ms/step , 115866.82 GFLOP/s , 174020.7 tokens/s INFO:__main__:2024-11-30 05:20:33 | Epoch: 0 | Step: 324550 | Dataset: 0-4212360 | Loss: 0.509 | 597 ms/step , 115665.43 GFLOP/s , 173932.9 tokens/s INFO:__main__:2024-11-30 05:20:40 | Epoch: 0 | Step: 324560 | Dataset: 0-4214760 | Loss: 0.497 | 597 ms/step , 115668.37 GFLOP/s , 173863.0 tokens/s INFO:__main__:2024-11-30 05:20:47 | Epoch: 0 | Step: 324570 | Dataset: 0-4217160 | Loss: 0.546 | 597 ms/step , 115653.82 GFLOP/s , 173914.5 tokens/s INFO:__main__:2024-11-30 05:20:54 | Epoch: 0 | Step: 324580 | Dataset: 0-4219560 | Loss: 0.471 | 597 ms/step , 115693.78 GFLOP/s , 173833.9 tokens/s INFO:__main__:2024-11-30 05:21:01 | Epoch: 0 | Step: 
324590 | Dataset: 0-4221960 | Loss: 0.514 | 598 ms/step , 115438.70 GFLOP/s , 173813.9 tokens/s INFO:__main__:2024-11-30 05:21:08 | Epoch: 0 | Step: 324600 | Dataset: 0-4224360 | Loss: 0.454 | 596 ms/step , 115728.87 GFLOP/s , 173820.9 tokens/s INFO:__main__:2024-11-30 05:21:16 | Epoch: 0 | Step: 324610 | Dataset: 0-4226760 | Loss: 0.478 | 597 ms/step , 115622.74 GFLOP/s , 173864.9 tokens/s INFO:__main__:2024-11-30 05:21:23 | Epoch: 0 | Step: 324620 | Dataset: 0-4229160 | Loss: 0.494 | 596 ms/step , 115742.98 GFLOP/s , 173959.0 tokens/s INFO:__main__:2024-11-30 05:21:30 | Epoch: 0 | Step: 324630 | Dataset: 0-4231560 | Loss: 0.476 | 596 ms/step , 115727.74 GFLOP/s , 173861.4 tokens/s INFO:__main__:2024-11-30 05:21:37 | Epoch: 0 | Step: 324640 | Dataset: 0-4233960 | Loss: 0.478 | 596 ms/step , 115776.30 GFLOP/s , 173883.2 tokens/s INFO:__main__:2024-11-30 05:21:44 | Epoch: 0 | Step: 324650 | Dataset: 0-4236360 | Loss: 0.494 | 596 ms/step , 115752.50 GFLOP/s , 173827.3 tokens/s INFO:__main__:2024-11-30 05:21:51 | Epoch: 0 | Step: 324660 | Dataset: 0-4238760 | Loss: 0.501 | 596 ms/step , 115758.14 GFLOP/s , 173892.4 tokens/s INFO:__main__:2024-11-30 05:21:58 | Epoch: 0 | Step: 324670 | Dataset: 0-4241160 | Loss: 0.479 | 597 ms/step , 115580.94 GFLOP/s , 173770.8 tokens/s INFO:__main__:2024-11-30 05:22:05 | Epoch: 0 | Step: 324680 | Dataset: 0-4243560 | Loss: 0.463 | 597 ms/step , 115672.75 GFLOP/s , 173880.9 tokens/s INFO:__main__:2024-11-30 05:22:12 | Epoch: 0 | Step: 324690 | Dataset: 0-4245960 | Loss: 0.459 | 596 ms/step , 115815.94 GFLOP/s , 173943.0 tokens/s INFO:__main__:2024-11-30 05:22:19 | Epoch: 0 | Step: 324700 | Dataset: 0-4248360 | Loss: 0.596 | 597 ms/step , 115553.30 GFLOP/s , 173866.9 tokens/s INFO:__main__:2024-11-30 05:22:26 | Epoch: 0 | Step: 324710 | Dataset: 0-4250760 | Loss: 0.450 | 597 ms/step , 115688.88 GFLOP/s , 173892.1 tokens/s INFO:__main__:2024-11-30 05:22:33 | Epoch: 0 | Step: 324720 | Dataset: 0-4253160 | Loss: 0.473 | 597 ms/step , 
115632.33 GFLOP/s , 173796.3 tokens/s INFO:__main__:2024-11-30 05:22:40 | Epoch: 0 | Step: 324730 | Dataset: 0-4255560 | Loss: 0.492 | 596 ms/step , 115700.84 GFLOP/s , 173891.3 tokens/s INFO:__main__:2024-11-30 05:22:47 | Epoch: 0 | Step: 324740 | Dataset: 0-4257960 | Loss: 0.521 | 596 ms/step , 115728.42 GFLOP/s , 173794.6 tokens/s INFO:__main__:2024-11-30 05:22:54 | Epoch: 0 | Step: 324750 | Dataset: 0-4260360 | Loss: 0.448 | 597 ms/step , 115579.53 GFLOP/s , 173824.7 tokens/s INFO:__main__:2024-11-30 05:23:02 | Epoch: 0 | Step: 324760 | Dataset: 0-4262760 | Loss: 0.525 | 596 ms/step , 115701.86 GFLOP/s , 173840.0 tokens/s INFO:__main__:2024-11-30 05:23:09 | Epoch: 0 | Step: 324770 | Dataset: 0-4265160 | Loss: 0.524 | 596 ms/step , 115840.10 GFLOP/s , 173996.9 tokens/s INFO:__main__:2024-11-30 05:23:16 | Epoch: 0 | Step: 324780 | Dataset: 0-4267560 | Loss: 0.465 | 598 ms/step , 115463.68 GFLOP/s , 173874.3 tokens/s INFO:__main__:2024-11-30 05:23:23 | Epoch: 0 | Step: 324790 | Dataset: 0-4269960 | Loss: 0.486 | 597 ms/step , 115612.12 GFLOP/s , 173855.1 tokens/s INFO:__main__:2024-11-30 05:23:30 | Epoch: 0 | Step: 324800 | Dataset: 0-4272360 | Loss: 0.475 | 597 ms/step , 115672.34 GFLOP/s , 173916.0 tokens/s INFO:__main__:2024-11-30 05:23:37 | Epoch: 0 | Step: 324810 | Dataset: 0-4274760 | Loss: 0.501 | 596 ms/step , 115836.67 GFLOP/s , 173849.3 tokens/s INFO:__main__:2024-11-30 05:23:44 | Epoch: 0 | Step: 324820 | Dataset: 0-4277160 | Loss: 0.457 | 597 ms/step , 115691.68 GFLOP/s , 173766.3 tokens/s INFO:__main__:2024-11-30 05:23:51 | Epoch: 0 | Step: 324830 | Dataset: 0-4279560 | Loss: 0.422 | 596 ms/step , 115699.22 GFLOP/s , 173884.1 tokens/s INFO:__main__:2024-11-30 05:23:58 | Epoch: 0 | Step: 324840 | Dataset: 0-4281960 | Loss: 0.508 | 596 ms/step , 115798.09 GFLOP/s , 173980.5 tokens/s INFO:__main__:2024-11-30 05:24:05 | Epoch: 0 | Step: 324850 | Dataset: 0-4284360 | Loss: 0.487 | 596 ms/step , 115792.95 GFLOP/s , 173871.4 tokens/s INFO:__main__:2024-11-30 
05:24:12 | Epoch: 0 | Step: 324860 | Dataset: 0-4286760 | Loss: 0.547 | 596 ms/step , 115739.30 GFLOP/s , 173787.8 tokens/s INFO:__main__:2024-11-30 05:24:19 | Epoch: 0 | Step: 324870 | Dataset: 0-4289160 | Loss: 0.479 | 596 ms/step , 115716.35 GFLOP/s , 173773.9 tokens/s INFO:__main__:2024-11-30 05:24:26 | Epoch: 0 | Step: 324880 | Dataset: 0-4291560 | Loss: 0.471 | 596 ms/step , 115758.47 GFLOP/s , 173827.6 tokens/s INFO:__main__:2024-11-30 05:24:33 | Epoch: 0 | Step: 324890 | Dataset: 0-4293960 | Loss: 0.473 | 596 ms/step , 115740.63 GFLOP/s , 173803.1 tokens/s INFO:__main__:2024-11-30 05:24:41 | Epoch: 0 | Step: 324900 | Dataset: 0-4296360 | Loss: 0.556 | 597 ms/step , 115610.19 GFLOP/s , 173808.5 tokens/s INFO:__main__:2024-11-30 05:24:48 | Epoch: 0 | Step: 324910 | Dataset: 0-4298760 | Loss: 0.509 | 596 ms/step , 115740.31 GFLOP/s , 173920.3 tokens/s INFO:__main__:2024-11-30 05:24:55 | Epoch: 0 | Step: 324920 | Dataset: 0-4301160 | Loss: 0.519 | 596 ms/step , 115783.88 GFLOP/s , 174049.2 tokens/s INFO:__main__:2024-11-30 05:25:02 | Epoch: 0 | Step: 324930 | Dataset: 0-4303560 | Loss: 0.433 | 596 ms/step , 115703.91 GFLOP/s , 173890.5 tokens/s INFO:__main__:2024-11-30 05:25:09 | Epoch: 0 | Step: 324940 | Dataset: 0-4305960 | Loss: 0.555 | 596 ms/step , 115745.35 GFLOP/s , 173904.1 tokens/s INFO:__main__:2024-11-30 05:25:16 | Epoch: 0 | Step: 324950 | Dataset: 0-4308360 | Loss: 0.501 | 596 ms/step , 115730.54 GFLOP/s , 173810.3 tokens/s INFO:__main__:2024-11-30 05:25:23 | Epoch: 0 | Step: 324960 | Dataset: 0-4310760 | Loss: 0.512 | 597 ms/step , 115609.05 GFLOP/s , 173794.8 tokens/s INFO:__main__:2024-11-30 05:25:30 | Epoch: 0 | Step: 324970 | Dataset: 0-4313160 | Loss: 0.495 | 596 ms/step , 115728.42 GFLOP/s , 173877.2 tokens/s INFO:__main__:2024-11-30 05:25:37 | Epoch: 0 | Step: 324980 | Dataset: 0-4315560 | Loss: 0.429 | 596 ms/step , 115766.65 GFLOP/s , 173883.5 tokens/s INFO:__main__:2024-11-30 05:25:44 | Epoch: 0 | Step: 324990 | Dataset: 0-4317960 | 
Loss: 0.526 | 596 ms/step , 115768.56 GFLOP/s , 173952.6 tokens/s INFO:__main__:2024-11-30 05:25:52 | Validation | Step: 325000 | Val_loss: 1.419 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 05:25:52 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_052552_step_325000.pt` INFO:__main__:2024-11-30 05:25:54 | Epoch: 0 | Step: 325000 | Dataset: 0-4320360 | Loss: 0.493 | 594 ms/step , 116139.74 GFLOP/s , 118920.1 tokens/s INFO:__main__:2024-11-30 05:26:02 | Epoch: 0 | Step: 325010 | Dataset: 0-4322760 | Loss: 0.483 | 598 ms/step , 115343.90 GFLOP/s , 173437.1 tokens/s INFO:__main__:2024-11-30 05:26:09 | Epoch: 0 | Step: 325020 | Dataset: 0-4325160 | Loss: 1.873 | 599 ms/step , 115138.15 GFLOP/s , 173113.5 tokens/s INFO:__main__:2024-11-30 05:26:16 | Epoch: 0 | Step: 325030 | Dataset: 0-4327560 | Loss: 0.598 | 600 ms/step , 115084.80 GFLOP/s , 173215.2 tokens/s INFO:__main__:2024-11-30 05:26:23 | Epoch: 0 | Step: 325040 | Dataset: 0-4329960 | Loss: 0.684 | 598 ms/step , 115309.71 GFLOP/s , 173269.3 tokens/s INFO:__main__:2024-11-30 05:26:30 | Epoch: 0 | Step: 325050 | Dataset: 0-4332360 | Loss: 0.558 | 598 ms/step , 115333.55 GFLOP/s , 173221.8 tokens/s INFO:__main__:2024-11-30 05:26:37 | Epoch: 0 | Step: 325060 | Dataset: 0-4334760 | Loss: 0.600 | 599 ms/step , 115247.90 GFLOP/s , 173268.7 tokens/s INFO:__main__:2024-11-30 05:26:44 | Epoch: 0 | Step: 325070 | Dataset: 0-4337160 | Loss: 0.631 | 599 ms/step , 115287.71 GFLOP/s , 173373.8 tokens/s INFO:__main__:2024-11-30 05:26:51 | Epoch: 0 | Step: 325080 | Dataset: 0-4339560 | Loss: 0.634 | 599 ms/step , 115139.69 GFLOP/s , 173266.1 tokens/s INFO:__main__:2024-11-30 05:26:58 | Epoch: 0 | Step: 325090 | Dataset: 0-4341960 | Loss: 0.601 | 599 ms/step , 115301.29 GFLOP/s , 173152.9 tokens/s INFO:__main__:2024-11-30 05:27:05 | Epoch: 0 | Step: 325100 | Dataset: 0-4344360 | Loss: 0.594 | 598 ms/step , 115360.18 GFLOP/s , 173252.9 tokens/s INFO:__main__:2024-11-30 05:27:12 | Epoch: 0 | 
Step: 325110 | Dataset: 0-4346760 | Loss: 0.622 | 599 ms/step , 115146.97 GFLOP/s , 173312.1 tokens/s INFO:__main__:2024-11-30 05:27:20 | Epoch: 0 | Step: 325120 | Dataset: 0-4349160 | Loss: 0.650 | 599 ms/step , 115307.49 GFLOP/s , 173297.4 tokens/s INFO:__main__:2024-11-30 05:27:27 | Epoch: 0 | Step: 325130 | Dataset: 0-4351560 | Loss: 0.623 | 596 ms/step , 115734.93 GFLOP/s , 173381.7 tokens/s INFO:__main__:2024-11-30 05:27:34 | Epoch: 0 | Step: 325140 | Dataset: 0-4353960 | Loss: 0.616 | 596 ms/step , 115866.54 GFLOP/s , 173877.2 tokens/s INFO:__main__:2024-11-30 05:27:41 | Epoch: 0 | Step: 325150 | Dataset: 0-4356360 | Loss: 0.646 | 597 ms/step , 115568.07 GFLOP/s , 173899.3 tokens/s INFO:__main__:2024-11-30 05:27:48 | Epoch: 0 | Step: 325160 | Dataset: 0-4358760 | Loss: 0.459 | 596 ms/step , 115741.36 GFLOP/s , 173796.3 tokens/s INFO:__main__:2024-11-30 05:27:55 | Epoch: 0 | Step: 325170 | Dataset: 0-4361160 | Loss: 0.455 | 597 ms/step , 115647.75 GFLOP/s , 173853.5 tokens/s INFO:__main__:2024-11-30 05:28:02 | Epoch: 0 | Step: 325180 | Dataset: 0-4363560 | Loss: 0.477 | 597 ms/step , 115594.30 GFLOP/s , 173764.7 tokens/s INFO:__main__:2024-11-30 05:28:09 | Epoch: 0 | Step: 325190 | Dataset: 0-4365960 | Loss: 0.485 | 597 ms/step , 115548.87 GFLOP/s , 173816.7 tokens/s INFO:__main__:2024-11-30 05:28:16 | Epoch: 0 | Step: 325200 | Dataset: 0-4368360 | Loss: 0.432 | 597 ms/step , 115653.81 GFLOP/s , 173810.0 tokens/s INFO:__main__:2024-11-30 05:28:23 | Epoch: 0 | Step: 325210 | Dataset: 0-4370760 | Loss: 0.463 | 596 ms/step , 115775.97 GFLOP/s , 173864.9 tokens/s INFO:__main__:2024-11-30 05:28:30 | Epoch: 0 | Step: 325220 | Dataset: 0-4373160 | Loss: 0.435 | 597 ms/step , 115594.36 GFLOP/s , 173924.1 tokens/s INFO:__main__:2024-11-30 05:28:37 | Epoch: 0 | Step: 325230 | Dataset: 0-4375560 | Loss: 0.428 | 597 ms/step , 115549.87 GFLOP/s , 173884.8 tokens/s INFO:__main__:2024-11-30 05:28:44 | Epoch: 0 | Step: 325240 | Dataset: 0-4377960 | Loss: 0.470 | 597 ms/step 
, 115679.02 GFLOP/s , 173863.6 tokens/s INFO:__main__:2024-11-30 05:28:51 | Epoch: 0 | Step: 325250 | Dataset: 0-4380360 | Loss: 0.434 | 596 ms/step , 115700.35 GFLOP/s , 173846.8 tokens/s INFO:__main__:2024-11-30 05:28:59 | Epoch: 0 | Step: 325260 | Dataset: 0-4382760 | Loss: 0.504 | 597 ms/step , 115668.44 GFLOP/s , 173780.8 tokens/s INFO:__main__:2024-11-30 05:29:06 | Epoch: 0 | Step: 325270 | Dataset: 0-4385160 | Loss: 0.398 | 597 ms/step , 115576.12 GFLOP/s , 173794.7 tokens/s INFO:__main__:2024-11-30 05:29:13 | Epoch: 0 | Step: 325280 | Dataset: 0-4387560 | Loss: 0.461 | 596 ms/step , 115759.98 GFLOP/s , 173839.4 tokens/s INFO:__main__:2024-11-30 05:29:20 | Epoch: 0 | Step: 325290 | Dataset: 0-4389960 | Loss: 0.427 | 596 ms/step , 115700.12 GFLOP/s , 173875.1 tokens/s INFO:__main__:2024-11-30 05:29:27 | Epoch: 0 | Step: 325300 | Dataset: 0-4392360 | Loss: 0.430 | 596 ms/step , 115775.82 GFLOP/s , 173852.9 tokens/s INFO:__main__:2024-11-30 05:29:34 | Epoch: 0 | Step: 325310 | Dataset: 0-4394760 | Loss: 0.435 | 597 ms/step , 115622.94 GFLOP/s , 173869.6 tokens/s INFO:__main__:2024-11-30 05:29:41 | Epoch: 0 | Step: 325320 | Dataset: 0-4397160 | Loss: 0.456 | 596 ms/step , 115766.72 GFLOP/s , 173706.4 tokens/s INFO:__main__:2024-11-30 05:29:48 | Epoch: 0 | Step: 325330 | Dataset: 0-4399560 | Loss: 0.468 | 597 ms/step , 115612.45 GFLOP/s , 173882.1 tokens/s INFO:__main__:2024-11-30 05:29:55 | Epoch: 0 | Step: 325340 | Dataset: 0-4401960 | Loss: 0.412 | 597 ms/step , 115693.84 GFLOP/s , 173771.6 tokens/s INFO:__main__:2024-11-30 05:30:02 | Epoch: 0 | Step: 325350 | Dataset: 0-4404360 | Loss: 0.412 | 597 ms/step , 115606.68 GFLOP/s , 173762.6 tokens/s INFO:__main__:2024-11-30 05:30:09 | Epoch: 0 | Step: 325360 | Dataset: 0-4406760 | Loss: 0.396 | 596 ms/step , 115841.42 GFLOP/s , 173789.4 tokens/s INFO:__main__:2024-11-30 05:30:16 | Epoch: 0 | Step: 325370 | Dataset: 0-4409160 | Loss: 0.343 | 596 ms/step , 115704.37 GFLOP/s , 173891.0 tokens/s 
INFO:__main__:2024-11-30 05:30:23 | Epoch: 0 | Step: 325380 | Dataset: 0-4411560 | Loss: 0.436 | 597 ms/step , 115686.13 GFLOP/s , 173906.0 tokens/s INFO:__main__:2024-11-30 05:30:30 | Epoch: 0 | Step: 325390 | Dataset: 0-4413960 | Loss: 0.418 | 596 ms/step , 115759.80 GFLOP/s , 173780.3 tokens/s INFO:__main__:2024-11-30 05:30:37 | Epoch: 0 | Step: 325400 | Dataset: 0-4416360 | Loss: 0.371 | 597 ms/step , 115622.80 GFLOP/s , 173821.1 tokens/s INFO:__main__:2024-11-30 05:30:45 | Epoch: 0 | Step: 325410 | Dataset: 0-4418760 | Loss: 0.418 | 596 ms/step , 115720.68 GFLOP/s , 173792.7 tokens/s INFO:__main__:2024-11-30 05:30:52 | Epoch: 0 | Step: 325420 | Dataset: 0-4421160 | Loss: 0.457 | 597 ms/step , 115667.09 GFLOP/s , 173821.6 tokens/s INFO:__main__:2024-11-30 05:30:59 | Epoch: 0 | Step: 325430 | Dataset: 0-4423560 | Loss: 0.451 | 597 ms/step , 115691.09 GFLOP/s , 173749.0 tokens/s INFO:__main__:2024-11-30 05:31:06 | Epoch: 0 | Step: 325440 | Dataset: 0-4425960 | Loss: 0.383 | 596 ms/step , 115760.44 GFLOP/s , 173798.7 tokens/s INFO:__main__:2024-11-30 05:31:13 | Epoch: 0 | Step: 325450 | Dataset: 0-4428360 | Loss: 0.415 | 596 ms/step , 115749.68 GFLOP/s , 173904.8 tokens/s INFO:__main__:2024-11-30 05:31:20 | Epoch: 0 | Step: 325460 | Dataset: 0-4430760 | Loss: 0.434 | 597 ms/step , 115688.41 GFLOP/s , 173806.9 tokens/s INFO:__main__:2024-11-30 05:31:27 | Epoch: 0 | Step: 325470 | Dataset: 0-4433160 | Loss: 0.457 | 597 ms/step , 115670.98 GFLOP/s , 173803.8 tokens/s INFO:__main__:2024-11-30 05:31:34 | Epoch: 0 | Step: 325480 | Dataset: 0-4435560 | Loss: 0.456 | 597 ms/step , 115672.70 GFLOP/s , 173771.4 tokens/s INFO:__main__:2024-11-30 05:31:41 | Epoch: 0 | Step: 325490 | Dataset: 0-4437960 | Loss: 0.469 | 597 ms/step , 115630.96 GFLOP/s , 173748.3 tokens/s INFO:__main__:2024-11-30 05:31:49 | Validation | Step: 325500 | Val_loss: 1.385 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 05:31:49 | Epoch: 0 | Step: 325500 | Dataset: 0-4440360 | Loss: 0.427 | 595 
ms/step , 116012.43 GFLOP/s , 147875.1 tokens/s INFO:__main__:2024-11-30 05:31:56 | Epoch: 0 | Step: 325510 | Dataset: 0-4442760 | Loss: 0.438 | 596 ms/step , 115733.84 GFLOP/s , 173806.7 tokens/s INFO:__main__:2024-11-30 05:32:04 | Epoch: 0 | Step: 325520 | Dataset: 0-4445160 | Loss: 0.462 | 597 ms/step , 115642.43 GFLOP/s , 173963.6 tokens/s INFO:__main__:2024-11-30 05:32:11 | Epoch: 0 | Step: 325530 | Dataset: 0-4447560 | Loss: 0.425 | 597 ms/step , 115654.19 GFLOP/s , 173950.4 tokens/s INFO:__main__:2024-11-30 05:32:18 | Epoch: 0 | Step: 325540 | Dataset: 0-4449960 | Loss: 0.416 | 597 ms/step , 115640.94 GFLOP/s , 173771.4 tokens/s INFO:__main__:2024-11-30 05:32:25 | Epoch: 0 | Step: 325550 | Dataset: 0-4452360 | Loss: 0.432 | 597 ms/step , 115628.09 GFLOP/s , 173799.1 tokens/s INFO:__main__:2024-11-30 05:32:32 | Epoch: 0 | Step: 325560 | Dataset: 0-4454760 | Loss: 0.461 | 596 ms/step , 115800.72 GFLOP/s , 173796.1 tokens/s INFO:__main__:2024-11-30 05:32:39 | Epoch: 0 | Step: 325570 | Dataset: 0-4457160 | Loss: 0.349 | 596 ms/step , 115728.83 GFLOP/s , 173788.8 tokens/s INFO:__main__:2024-11-30 05:32:46 | Epoch: 0 | Step: 325580 | Dataset: 0-4459560 | Loss: 0.392 | 596 ms/step , 115736.94 GFLOP/s , 173821.6 tokens/s INFO:__main__:2024-11-30 05:32:53 | Epoch: 0 | Step: 325590 | Dataset: 0-4461960 | Loss: 0.380 | 597 ms/step , 115683.52 GFLOP/s , 173921.4 tokens/s INFO:__main__:2024-11-30 05:33:00 | Epoch: 0 | Step: 325600 | Dataset: 0-4464360 | Loss: 0.426 | 596 ms/step , 115779.58 GFLOP/s , 174009.2 tokens/s INFO:__main__:2024-11-30 05:33:07 | Epoch: 0 | Step: 325610 | Dataset: 0-4466760 | Loss: 0.330 | 596 ms/step , 115785.13 GFLOP/s , 173919.4 tokens/s INFO:__main__:2024-11-30 05:33:14 | Epoch: 0 | Step: 325620 | Dataset: 0-4469160 | Loss: 0.348 | 596 ms/step , 115730.83 GFLOP/s , 173854.9 tokens/s INFO:__main__:2024-11-30 05:33:21 | Epoch: 0 | Step: 325630 | Dataset: 0-4471560 | Loss: 0.348 | 597 ms/step , 115527.05 GFLOP/s , 173841.2 tokens/s 
INFO:__main__:2024-11-30 05:33:28 | Epoch: 0 | Step: 325640 | Dataset: 0-4473960 | Loss: 0.398 | 597 ms/step , 115646.34 GFLOP/s , 173813.2 tokens/s INFO:__main__:2024-11-30 05:33:35 | Epoch: 0 | Step: 325650 | Dataset: 0-4476360 | Loss: 0.375 | 596 ms/step , 115739.98 GFLOP/s , 173832.8 tokens/s INFO:__main__:2024-11-30 05:33:43 | Epoch: 0 | Step: 325660 | Dataset: 0-4478760 | Loss: 0.348 | 597 ms/step , 115625.06 GFLOP/s , 173858.7 tokens/s INFO:__main__:2024-11-30 05:33:50 | Epoch: 0 | Step: 325670 | Dataset: 0-4481160 | Loss: 0.339 | 595 ms/step , 115907.09 GFLOP/s , 173980.7 tokens/s INFO:__main__:2024-11-30 05:33:57 | Epoch: 0 | Step: 325680 | Dataset: 0-4483560 | Loss: 0.348 | 596 ms/step , 115792.19 GFLOP/s , 174022.4 tokens/s INFO:__main__:2024-11-30 05:34:04 | Epoch: 0 | Step: 325690 | Dataset: 0-4485960 | Loss: 0.361 | 596 ms/step , 115797.42 GFLOP/s , 173840.7 tokens/s INFO:__main__:2024-11-30 05:34:11 | Epoch: 0 | Step: 325700 | Dataset: 0-4488360 | Loss: 0.340 | 596 ms/step , 115732.85 GFLOP/s , 173879.1 tokens/s INFO:__main__:2024-11-30 05:34:18 | Epoch: 0 | Step: 325710 | Dataset: 0-4490760 | Loss: 0.278 | 597 ms/step , 115633.23 GFLOP/s , 173798.2 tokens/s INFO:__main__:2024-11-30 05:34:25 | Epoch: 0 | Step: 325720 | Dataset: 0-4493160 | Loss: 0.382 | 596 ms/step , 115804.51 GFLOP/s , 173909.9 tokens/s INFO:__main__:2024-11-30 05:34:32 | Epoch: 0 | Step: 325730 | Dataset: 0-4495560 | Loss: 0.317 | 596 ms/step , 115743.41 GFLOP/s , 173822.5 tokens/s INFO:__main__:2024-11-30 05:34:39 | Epoch: 0 | Step: 325740 | Dataset: 0-4497960 | Loss: 0.309 | 596 ms/step , 115797.99 GFLOP/s , 173882.0 tokens/s INFO:__main__:2024-11-30 05:34:46 | Epoch: 0 | Step: 325750 | Dataset: 0-4500360 | Loss: 0.325 | 596 ms/step , 115870.44 GFLOP/s , 173980.2 tokens/s INFO:__main__:2024-11-30 05:34:53 | Epoch: 0 | Step: 325760 | Dataset: 0-4502760 | Loss: 0.390 | 597 ms/step , 115643.70 GFLOP/s , 173980.7 tokens/s INFO:__main__:2024-11-30 05:35:00 | Epoch: 0 | Step: 325770 | 
Dataset: 0-4505160 | Loss: 0.352 | 596 ms/step , 115743.80 GFLOP/s , 173912.8 tokens/s INFO:__main__:2024-11-30 05:35:07 | Epoch: 0 | Step: 325780 | Dataset: 0-4507560 | Loss: 0.421 | 596 ms/step , 115738.33 GFLOP/s , 173845.4 tokens/s INFO:__main__:2024-11-30 05:35:14 | Epoch: 0 | Step: 325790 | Dataset: 0-4509960 | Loss: 0.355 | 597 ms/step , 115658.88 GFLOP/s , 173805.5 tokens/s INFO:__main__:2024-11-30 05:35:21 | Epoch: 0 | Step: 325800 | Dataset: 0-4512360 | Loss: 0.310 | 596 ms/step , 115731.94 GFLOP/s , 173794.1 tokens/s INFO:__main__:2024-11-30 05:35:29 | Epoch: 0 | Step: 325810 | Dataset: 0-4514760 | Loss: 0.312 | 596 ms/step , 115759.29 GFLOP/s , 173932.3 tokens/s INFO:__main__:2024-11-30 05:35:36 | Epoch: 0 | Step: 325820 | Dataset: 0-4517160 | Loss: 0.375 | 596 ms/step , 115737.66 GFLOP/s , 173897.4 tokens/s INFO:__main__:2024-11-30 05:35:43 | Epoch: 0 | Step: 325830 | Dataset: 0-4519560 | Loss: 0.364 | 598 ms/step , 115406.95 GFLOP/s , 173962.8 tokens/s INFO:__main__:2024-11-30 05:35:50 | Epoch: 0 | Step: 325840 | Dataset: 0-4521960 | Loss: 0.370 | 596 ms/step , 115714.44 GFLOP/s , 173828.7 tokens/s INFO:__main__:2024-11-30 05:35:57 | Epoch: 0 | Step: 325850 | Dataset: 0-4524360 | Loss: 0.289 | 596 ms/step , 115744.52 GFLOP/s , 173825.1 tokens/s INFO:__main__:2024-11-30 05:36:04 | Epoch: 0 | Step: 325860 | Dataset: 0-4526760 | Loss: 0.349 | 596 ms/step , 115713.76 GFLOP/s , 173847.4 tokens/s INFO:__main__:2024-11-30 05:36:11 | Epoch: 0 | Step: 325870 | Dataset: 0-4529160 | Loss: 0.337 | 598 ms/step , 115485.66 GFLOP/s , 173846.0 tokens/s INFO:__main__:2024-11-30 05:36:18 | Epoch: 0 | Step: 325880 | Dataset: 0-4531560 | Loss: 0.416 | 598 ms/step , 115398.84 GFLOP/s , 173799.6 tokens/s INFO:__main__:2024-11-30 05:36:25 | Epoch: 0 | Step: 325890 | Dataset: 0-4533960 | Loss: 0.338 | 597 ms/step , 115669.85 GFLOP/s , 173823.5 tokens/s INFO:__main__:2024-11-30 05:36:32 | Epoch: 0 | Step: 325900 | Dataset: 0-4536360 | Loss: 0.378 | 596 ms/step , 115755.18 
GFLOP/s , 173983.6 tokens/s INFO:__main__:2024-11-30 05:36:39 | Epoch: 0 | Step: 325910 | Dataset: 0-4538760 | Loss: 0.353 | 596 ms/step , 115762.92 GFLOP/s , 173983.2 tokens/s INFO:__main__:2024-11-30 05:36:46 | Epoch: 0 | Step: 325920 | Dataset: 0-4541160 | Loss: 0.340 | 597 ms/step , 115615.18 GFLOP/s , 173887.5 tokens/s INFO:__main__:2024-11-30 05:36:53 | Epoch: 0 | Step: 325930 | Dataset: 0-4543560 | Loss: 0.420 | 597 ms/step , 115656.64 GFLOP/s , 173823.1 tokens/s INFO:__main__:2024-11-30 05:37:00 | Epoch: 0 | Step: 325940 | Dataset: 0-4545960 | Loss: 0.361 | 596 ms/step , 115747.65 GFLOP/s , 173895.8 tokens/s INFO:__main__:2024-11-30 05:37:07 | Epoch: 0 | Step: 325950 | Dataset: 0-4548360 | Loss: 0.324 | 597 ms/step , 115689.01 GFLOP/s , 173846.7 tokens/s INFO:__main__:2024-11-30 05:37:15 | Epoch: 0 | Step: 325960 | Dataset: 0-4550760 | Loss: 0.361 | 596 ms/step , 115725.53 GFLOP/s , 173824.6 tokens/s INFO:__main__:2024-11-30 05:37:22 | Epoch: 0 | Step: 325970 | Dataset: 0-4553160 | Loss: 0.403 | 597 ms/step , 115680.84 GFLOP/s , 173869.8 tokens/s INFO:__main__:2024-11-30 05:37:29 | Epoch: 0 | Step: 325980 | Dataset: 0-4555560 | Loss: 0.335 | 596 ms/step , 115747.93 GFLOP/s , 174034.1 tokens/s INFO:__main__:2024-11-30 05:37:36 | Epoch: 0 | Step: 325990 | Dataset: 0-4557960 | Loss: 0.297 | 596 ms/step , 115834.65 GFLOP/s , 173949.1 tokens/s INFO:__main__:2024-11-30 05:37:43 | Validation | Step: 326000 | Val_loss: 1.466 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 05:37:43 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_053743_step_326000.pt` INFO:__main__:2024-11-30 05:37:46 | Epoch: 0 | Step: 326000 | Dataset: 0-4560360 | Loss: 0.285 | 595 ms/step , 116024.19 GFLOP/s , 118405.5 tokens/s INFO:__main__:2024-11-30 05:37:53 | Epoch: 0 | Step: 326010 | Dataset: 0-4562760 | Loss: 0.297 | 598 ms/step , 115315.93 GFLOP/s , 173442.3 tokens/s INFO:__main__:2024-11-30 05:38:00 | Epoch: 0 | Step: 326020 | Dataset: 0-4565160 | Loss: 
0.345 | 598 ms/step , 115319.97 GFLOP/s , 173398.5 tokens/s INFO:__main__:2024-11-30 05:38:07 | Epoch: 0 | Step: 326030 | Dataset: 0-4567560 | Loss: 0.337 | 598 ms/step , 115418.32 GFLOP/s , 173416.6 tokens/s INFO:__main__:2024-11-30 05:38:14 | Epoch: 0 | Step: 326040 | Dataset: 0-4569960 | Loss: 0.334 | 596 ms/step , 115785.95 GFLOP/s , 173377.4 tokens/s INFO:__main__:2024-11-30 05:38:22 | Epoch: 0 | Step: 326050 | Dataset: 0-4572360 | Loss: 0.287 | 596 ms/step , 115736.97 GFLOP/s , 173690.7 tokens/s INFO:__main__:2024-11-30 05:38:29 | Epoch: 0 | Step: 326060 | Dataset: 0-4574760 | Loss: 0.378 | 596 ms/step , 115751.54 GFLOP/s , 174062.4 tokens/s INFO:__main__:2024-11-30 05:38:36 | Epoch: 0 | Step: 326070 | Dataset: 0-4577160 | Loss: 0.317 | 596 ms/step , 115807.67 GFLOP/s , 173897.7 tokens/s INFO:__main__:2024-11-30 05:38:43 | Epoch: 0 | Step: 326080 | Dataset: 0-4579560 | Loss: 0.358 | 597 ms/step , 115690.60 GFLOP/s , 173837.6 tokens/s INFO:__main__:2024-11-30 05:38:50 | Epoch: 0 | Step: 326090 | Dataset: 0-4581960 | Loss: 0.289 | 596 ms/step , 115807.32 GFLOP/s , 173815.8 tokens/s INFO:__main__:2024-11-30 05:38:57 | Epoch: 0 | Step: 326100 | Dataset: 0-4584360 | Loss: 0.348 | 596 ms/step , 115731.91 GFLOP/s , 173883.8 tokens/s INFO:__main__:2024-11-30 05:39:04 | Epoch: 0 | Step: 326110 | Dataset: 0-4586760 | Loss: 0.349 | 596 ms/step , 115707.16 GFLOP/s , 173916.9 tokens/s INFO:__main__:2024-11-30 05:39:11 | Epoch: 0 | Step: 326120 | Dataset: 0-4589160 | Loss: 0.757 | 596 ms/step , 115755.82 GFLOP/s , 173840.2 tokens/s INFO:__main__:2024-11-30 05:39:18 | Epoch: 0 | Step: 326130 | Dataset: 0-4591560 | Loss: 0.820 | 596 ms/step , 115735.44 GFLOP/s , 173790.8 tokens/s INFO:__main__:2024-11-30 05:39:25 | Epoch: 0 | Step: 326140 | Dataset: 0-4593960 | Loss: 0.701 | 597 ms/step , 115586.53 GFLOP/s , 173616.9 tokens/s INFO:__main__:2024-11-30 05:39:32 | Epoch: 0 | Step: 326150 | Dataset: 0-4596360 | Loss: 0.755 | 598 ms/step , 115501.26 GFLOP/s , 173599.8 tokens/s 
INFO:__main__:2024-11-30 05:39:39 | Epoch: 0 | Step: 326160 | Dataset: 0-4598760 | Loss: 0.742 | 597 ms/step , 115605.03 GFLOP/s , 173694.4 tokens/s INFO:__main__:2024-11-30 05:39:46 | Epoch: 0 | Step: 326170 | Dataset: 0-4601160 | Loss: 0.665 | 597 ms/step , 115526.49 GFLOP/s , 173634.5 tokens/s INFO:__main__:2024-11-30 05:39:53 | Epoch: 0 | Step: 326180 | Dataset: 0-4603560 | Loss: 0.758 | 597 ms/step , 115573.12 GFLOP/s , 173648.6 tokens/s INFO:__main__:2024-11-30 05:40:01 | Epoch: 0 | Step: 326190 | Dataset: 0-4605960 | Loss: 0.806 | 597 ms/step , 115650.42 GFLOP/s , 173678.4 tokens/s INFO:__main__:2024-11-30 05:40:08 | Epoch: 0 | Step: 326200 | Dataset: 0-4608360 | Loss: 0.753 | 597 ms/step , 115674.56 GFLOP/s , 173750.7 tokens/s INFO:__main__:2024-11-30 05:40:15 | Epoch: 0 | Step: 326210 | Dataset: 0-4610760 | Loss: 0.759 | 597 ms/step , 115687.58 GFLOP/s , 173758.9 tokens/s INFO:__main__:2024-11-30 05:40:22 | Epoch: 0 | Step: 326220 | Dataset: 0-4613160 | Loss: 0.705 | 597 ms/step , 115670.82 GFLOP/s , 173626.0 tokens/s INFO:__main__:2024-11-30 05:40:29 | Epoch: 0 | Step: 326230 | Dataset: 0-4615560 | Loss: 0.780 | 597 ms/step , 115614.03 GFLOP/s , 173606.1 tokens/s INFO:__main__:2024-11-30 05:40:36 | Epoch: 0 | Step: 326240 | Dataset: 0-4617960 | Loss: 0.701 | 601 ms/step , 114909.24 GFLOP/s , 173667.8 tokens/s INFO:__main__:2024-11-30 05:40:43 | Epoch: 0 | Step: 326250 | Dataset: 0-4620360 | Loss: 0.735 | 597 ms/step , 115523.67 GFLOP/s , 173586.1 tokens/s INFO:__main__:2024-11-30 05:40:50 | Epoch: 0 | Step: 326260 | Dataset: 0-4622760 | Loss: 0.684 | 598 ms/step , 115474.17 GFLOP/s , 173668.3 tokens/s INFO:__main__:2024-11-30 05:40:57 | Epoch: 0 | Step: 326270 | Dataset: 0-4625160 | Loss: 0.756 | 597 ms/step , 115657.38 GFLOP/s , 173594.9 tokens/s INFO:__main__:2024-11-30 05:41:04 | Epoch: 0 | Step: 326280 | Dataset: 0-4627560 | Loss: 0.770 | 597 ms/step , 115614.64 GFLOP/s , 173702.3 tokens/s INFO:__main__:2024-11-30 05:41:11 | Epoch: 0 | Step: 326290 | 
Dataset: 0-4629960 | Loss: 0.692 | 598 ms/step , 115387.87 GFLOP/s , 173633.3 tokens/s INFO:__main__:2024-11-30 05:41:18 | Epoch: 0 | Step: 326300 | Dataset: 0-4632360 | Loss: 0.636 | 598 ms/step , 115475.53 GFLOP/s , 173580.6 tokens/s INFO:__main__:2024-11-30 05:41:25 | Epoch: 0 | Step: 326310 | Dataset: 0-4634760 | Loss: 0.722 | 597 ms/step , 115560.32 GFLOP/s , 173590.7 tokens/s INFO:__main__:2024-11-30 05:41:32 | Epoch: 0 | Step: 326320 | Dataset: 0-4637160 | Loss: 0.682 | 597 ms/step , 115504.57 GFLOP/s , 173629.1 tokens/s INFO:__main__:2024-11-30 05:41:40 | Epoch: 0 | Step: 326330 | Dataset: 0-4639560 | Loss: 0.681 | 598 ms/step , 115352.50 GFLOP/s , 173594.6 tokens/s INFO:__main__:2024-11-30 05:41:47 | Epoch: 0 | Step: 326340 | Dataset: 0-4641960 | Loss: 0.810 | 598 ms/step , 115440.27 GFLOP/s , 173598.0 tokens/s INFO:__main__:2024-11-30 05:41:54 | Epoch: 0 | Step: 326350 | Dataset: 0-4644360 | Loss: 0.649 | 596 ms/step , 115706.28 GFLOP/s , 173797.8 tokens/s INFO:__main__:2024-11-30 05:42:01 | Epoch: 0 | Step: 326360 | Dataset: 0-4646760 | Loss: 0.682 | 598 ms/step , 115500.69 GFLOP/s , 173723.4 tokens/s INFO:__main__:2024-11-30 05:42:08 | Epoch: 0 | Step: 326370 | Dataset: 0-4649160 | Loss: 0.638 | 597 ms/step , 115502.59 GFLOP/s , 173657.3 tokens/s INFO:__main__:2024-11-30 05:42:15 | Epoch: 0 | Step: 326380 | Dataset: 0-4651560 | Loss: 0.708 | 597 ms/step , 115658.26 GFLOP/s , 173694.9 tokens/s INFO:__main__:2024-11-30 05:42:22 | Epoch: 0 | Step: 326390 | Dataset: 0-4653960 | Loss: 0.654 | 598 ms/step , 115496.59 GFLOP/s , 173630.8 tokens/s INFO:__main__:2024-11-30 05:42:29 | Epoch: 0 | Step: 326400 | Dataset: 0-4656360 | Loss: 0.751 | 598 ms/step , 115474.95 GFLOP/s , 173599.9 tokens/s INFO:__main__:2024-11-30 05:42:36 | Epoch: 0 | Step: 326410 | Dataset: 0-4658760 | Loss: 0.705 | 598 ms/step , 115469.97 GFLOP/s , 173688.3 tokens/s INFO:__main__:2024-11-30 05:42:43 | Epoch: 0 | Step: 326420 | Dataset: 0-4661160 | Loss: 0.769 | 598 ms/step , 115463.50 
GFLOP/s , 173663.1 tokens/s INFO:__main__:2024-11-30 05:42:50 | Epoch: 0 | Step: 326430 | Dataset: 0-4663560 | Loss: 0.605 | 596 ms/step , 115818.97 GFLOP/s , 173658.5 tokens/s INFO:__main__:2024-11-30 05:42:57 | Epoch: 0 | Step: 326440 | Dataset: 0-4665960 | Loss: 0.694 | 597 ms/step , 115586.20 GFLOP/s , 173696.9 tokens/s INFO:__main__:2024-11-30 05:43:04 | Epoch: 0 | Step: 326450 | Dataset: 0-4668360 | Loss: 0.700 | 597 ms/step , 115594.68 GFLOP/s , 173633.9 tokens/s INFO:__main__:2024-11-30 05:43:12 | Epoch: 0 | Step: 326460 | Dataset: 0-4670760 | Loss: 0.703 | 598 ms/step , 115450.43 GFLOP/s , 173608.3 tokens/s INFO:__main__:2024-11-30 05:43:19 | Epoch: 0 | Step: 326470 | Dataset: 0-4673160 | Loss: 0.717 | 597 ms/step , 115527.85 GFLOP/s , 173572.7 tokens/s INFO:__main__:2024-11-30 05:43:26 | Epoch: 0 | Step: 326480 | Dataset: 0-4675560 | Loss: 0.690 | 598 ms/step , 115404.51 GFLOP/s , 173606.6 tokens/s INFO:__main__:2024-11-30 05:43:33 | Epoch: 0 | Step: 326490 | Dataset: 0-4677960 | Loss: 0.685 | 597 ms/step , 115537.30 GFLOP/s , 173631.3 tokens/s INFO:__main__:2024-11-30 05:43:40 | Validation | Step: 326500 | Val_loss: 1.510 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 05:43:41 | Epoch: 0 | Step: 326500 | Dataset: 0-4680360 | Loss: 0.611 | 596 ms/step , 115717.80 GFLOP/s , 147744.3 tokens/s INFO:__main__:2024-11-30 05:43:48 | Epoch: 0 | Step: 326510 | Dataset: 0-4682760 | Loss: 0.688 | 597 ms/step , 115684.06 GFLOP/s , 173794.3 tokens/s INFO:__main__:2024-11-30 05:43:55 | Epoch: 0 | Step: 326520 | Dataset: 0-4685160 | Loss: 0.705 | 598 ms/step , 115427.97 GFLOP/s , 173627.3 tokens/s INFO:__main__:2024-11-30 05:44:02 | Epoch: 0 | Step: 326530 | Dataset: 0-4687560 | Loss: 0.680 | 596 ms/step , 115709.45 GFLOP/s , 173669.9 tokens/s INFO:__main__:2024-11-30 05:44:09 | Epoch: 0 | Step: 326540 | Dataset: 0-4689960 | Loss: 0.669 | 596 ms/step , 115699.53 GFLOP/s , 173611.9 tokens/s INFO:__main__:2024-11-30 05:44:16 | Epoch: 0 | Step: 326550 | Dataset: 
0-4692360 | Loss: 0.621 | 597 ms/step , 115640.61 GFLOP/s , 173633.8 tokens/s INFO:__main__:2024-11-30 05:44:24 | Epoch: 0 | Step: 326560 | Dataset: 0-4694760 | Loss: 0.735 | 597 ms/step , 115525.93 GFLOP/s , 173623.0 tokens/s INFO:__main__:2024-11-30 05:44:31 | Epoch: 0 | Step: 326570 | Dataset: 0-4697160 | Loss: 0.702 | 597 ms/step , 115604.72 GFLOP/s , 173676.2 tokens/s INFO:__main__:2024-11-30 05:44:38 | Epoch: 0 | Step: 326580 | Dataset: 0-4699560 | Loss: 0.707 | 597 ms/step , 115651.53 GFLOP/s , 173783.1 tokens/s INFO:__main__:2024-11-30 05:44:45 | Epoch: 0 | Step: 326590 | Dataset: 0-4701960 | Loss: 0.636 | 597 ms/step , 115580.80 GFLOP/s , 173652.3 tokens/s INFO:__main__:2024-11-30 05:44:52 | Epoch: 0 | Step: 326600 | Dataset: 0-4704360 | Loss: 0.743 | 598 ms/step , 115346.48 GFLOP/s , 173643.9 tokens/s INFO:__main__:2024-11-30 05:44:59 | Epoch: 0 | Step: 326610 | Dataset: 0-4706760 | Loss: 0.744 | 597 ms/step , 115572.01 GFLOP/s , 173597.5 tokens/s INFO:__main__:2024-11-30 05:45:06 | Epoch: 0 | Step: 326620 | Dataset: 0-4709160 | Loss: 0.570 | 598 ms/step , 115461.77 GFLOP/s , 173605.7 tokens/s INFO:__main__:2024-11-30 05:45:13 | Epoch: 0 | Step: 326630 | Dataset: 0-4711560 | Loss: 0.744 | 597 ms/step , 115577.12 GFLOP/s , 173601.0 tokens/s INFO:__main__:2024-11-30 05:45:20 | Epoch: 0 | Step: 326640 | Dataset: 0-4713960 | Loss: 0.675 | 598 ms/step , 115456.33 GFLOP/s , 173537.1 tokens/s INFO:__main__:2024-11-30 05:45:27 | Epoch: 0 | Step: 326650 | Dataset: 0-4716360 | Loss: 0.670 | 596 ms/step , 115723.80 GFLOP/s , 173777.0 tokens/s INFO:__main__:2024-11-30 05:45:34 | Epoch: 0 | Step: 326660 | Dataset: 0-4718760 | Loss: 0.751 | 597 ms/step , 115529.19 GFLOP/s , 173715.3 tokens/s INFO:__main__:2024-11-30 05:45:41 | Epoch: 0 | Step: 326670 | Dataset: 0-4721160 | Loss: 0.423 | 596 ms/step , 115700.23 GFLOP/s , 173821.2 tokens/s INFO:__main__:2024-11-30 05:45:48 | Epoch: 0 | Step: 326680 | Dataset: 0-4723560 | Loss: 0.377 | 597 ms/step , 115690.52 GFLOP/s , 
173686.1 tokens/s INFO:__main__:2024-11-30 05:45:56 | Epoch: 0 | Step: 326690 | Dataset: 0-4725960 | Loss: 0.389 | 596 ms/step , 115740.05 GFLOP/s , 173716.6 tokens/s INFO:__main__:2024-11-30 05:46:03 | Epoch: 0 | Step: 326700 | Dataset: 0-4728360 | Loss: 0.411 | 597 ms/step , 115544.89 GFLOP/s , 173746.2 tokens/s INFO:__main__:2024-11-30 05:46:10 | Epoch: 0 | Step: 326710 | Dataset: 0-4730760 | Loss: 0.382 | 597 ms/step , 115600.03 GFLOP/s , 173722.4 tokens/s INFO:__main__:2024-11-30 05:46:17 | Epoch: 0 | Step: 326720 | Dataset: 0-4733160 | Loss: 0.364 | 596 ms/step , 115718.61 GFLOP/s , 173732.6 tokens/s INFO:__main__:2024-11-30 05:46:24 | Epoch: 0 | Step: 326730 | Dataset: 0-4735560 | Loss: 0.386 | 596 ms/step , 115855.10 GFLOP/s , 173862.6 tokens/s INFO:__main__:2024-11-30 05:46:31 | Epoch: 0 | Step: 326740 | Dataset: 0-4737960 | Loss: 0.363 | 597 ms/step , 115597.56 GFLOP/s , 173741.4 tokens/s INFO:__main__:2024-11-30 05:46:38 | Epoch: 0 | Step: 326750 | Dataset: 0-4740360 | Loss: 0.385 | 597 ms/step , 115661.01 GFLOP/s , 173797.1 tokens/s INFO:__main__:2024-11-30 05:46:45 | Epoch: 0 | Step: 326760 | Dataset: 0-4742760 | Loss: 0.377 | 597 ms/step , 115668.65 GFLOP/s , 173700.3 tokens/s INFO:__main__:2024-11-30 05:46:52 | Epoch: 0 | Step: 326770 | Dataset: 0-4745160 | Loss: 0.395 | 596 ms/step , 115702.11 GFLOP/s , 173775.1 tokens/s INFO:__main__:2024-11-30 05:46:59 | Epoch: 0 | Step: 326780 | Dataset: 0-4747560 | Loss: 0.349 | 597 ms/step , 115690.55 GFLOP/s , 173744.4 tokens/s INFO:__main__:2024-11-30 05:47:06 | Epoch: 0 | Step: 326790 | Dataset: 0-4749960 | Loss: 0.396 | 597 ms/step , 115659.54 GFLOP/s , 173742.4 tokens/s INFO:__main__:2024-11-30 05:47:13 | Epoch: 0 | Step: 326800 | Dataset: 0-4752360 | Loss: 0.373 | 599 ms/step , 115308.11 GFLOP/s , 173823.9 tokens/s INFO:__main__:2024-11-30 05:47:20 | Epoch: 0 | Step: 326810 | Dataset: 0-4754760 | Loss: 0.374 | 597 ms/step , 115628.54 GFLOP/s , 173703.1 tokens/s INFO:__main__:2024-11-30 05:47:27 | Epoch: 0 
| Step: 326820 | Dataset: 0-4757160 | Loss: 0.408 | 599 ms/step , 115281.99 GFLOP/s , 173688.5 tokens/s INFO:__main__:2024-11-30 05:47:35 | Epoch: 0 | Step: 326830 | Dataset: 0-4759560 | Loss: 0.378 | 597 ms/step , 115506.38 GFLOP/s , 173705.2 tokens/s INFO:__main__:2024-11-30 05:47:42 | Epoch: 0 | Step: 326840 | Dataset: 0-4761960 | Loss: 0.381 | 598 ms/step , 115349.40 GFLOP/s , 173690.0 tokens/s INFO:__main__:2024-11-30 05:47:49 | Epoch: 0 | Step: 326850 | Dataset: 0-4764360 | Loss: 0.356 | 597 ms/step , 115569.64 GFLOP/s , 173721.2 tokens/s INFO:__main__:2024-11-30 05:47:56 | Epoch: 0 | Step: 326860 | Dataset: 0-4766760 | Loss: 0.362 | 597 ms/step , 115609.26 GFLOP/s , 173729.5 tokens/s INFO:__main__:2024-11-30 05:48:03 | Epoch: 0 | Step: 326870 | Dataset: 0-4769160 | Loss: 0.401 | 596 ms/step , 115765.94 GFLOP/s , 173759.0 tokens/s INFO:__main__:2024-11-30 05:48:10 | Epoch: 0 | Step: 326880 | Dataset: 0-4771560 | Loss: 0.398 | 597 ms/step , 115693.62 GFLOP/s , 173880.5 tokens/s INFO:__main__:2024-11-30 05:48:17 | Epoch: 0 | Step: 326890 | Dataset: 0-4773960 | Loss: 0.369 | 597 ms/step , 115695.46 GFLOP/s , 173774.5 tokens/s INFO:__main__:2024-11-30 05:48:24 | Epoch: 0 | Step: 326900 | Dataset: 0-4776360 | Loss: 0.339 | 598 ms/step , 115480.33 GFLOP/s , 173698.6 tokens/s INFO:__main__:2024-11-30 05:48:31 | Epoch: 0 | Step: 326910 | Dataset: 0-4778760 | Loss: 0.359 | 598 ms/step , 115467.76 GFLOP/s , 173644.3 tokens/s INFO:__main__:2024-11-30 05:48:38 | Epoch: 0 | Step: 326920 | Dataset: 0-4781160 | Loss: 0.362 | 597 ms/step , 115667.73 GFLOP/s , 173660.8 tokens/s INFO:__main__:2024-11-30 05:48:45 | Epoch: 0 | Step: 326930 | Dataset: 0-4783560 | Loss: 0.386 | 597 ms/step , 115599.64 GFLOP/s , 173640.5 tokens/s INFO:__main__:2024-11-30 05:48:52 | Epoch: 0 | Step: 326940 | Dataset: 0-4785960 | Loss: 0.383 | 598 ms/step , 115344.80 GFLOP/s , 173733.0 tokens/s INFO:__main__:2024-11-30 05:48:59 | Epoch: 0 | Step: 326950 | Dataset: 0-4788360 | Loss: 0.397 | 596 
ms/step , 115802.91 GFLOP/s , 173858.5 tokens/s INFO:__main__:2024-11-30 05:49:06 | Epoch: 0 | Step: 326960 | Dataset: 0-4790760 | Loss: 0.374 | 597 ms/step , 115628.83 GFLOP/s , 173860.6 tokens/s INFO:__main__:2024-11-30 05:49:14 | Epoch: 0 | Step: 326970 | Dataset: 0-4793160 | Loss: 0.361 | 597 ms/step , 115670.82 GFLOP/s , 173751.7 tokens/s INFO:__main__:2024-11-30 05:49:21 | Epoch: 0 | Step: 326980 | Dataset: 0-4795560 | Loss: 0.413 | 598 ms/step , 115487.41 GFLOP/s , 173776.6 tokens/s INFO:__main__:2024-11-30 05:49:28 | Epoch: 0 | Step: 326990 | Dataset: 0-4797960 | Loss: 0.379 | 597 ms/step , 115527.08 GFLOP/s , 173662.9 tokens/s INFO:__main__:2024-11-30 05:49:35 | Validation | Step: 327000 | Val_loss: 1.318 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 05:49:35 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_054935_step_327000.pt` INFO:__main__:2024-11-30 05:49:38 | Epoch: 0 | Step: 327000 | Dataset: 0-4800360 | Loss: 0.376 | 594 ms/step , 116087.17 GFLOP/s , 118934.1 tokens/s INFO:__main__:2024-11-30 05:49:45 | Epoch: 0 | Step: 327010 | Dataset: 0-4802760 | Loss: 0.448 | 597 ms/step , 115612.37 GFLOP/s , 173344.2 tokens/s INFO:__main__:2024-11-30 05:49:52 | Epoch: 0 | Step: 327020 | Dataset: 0-4805160 | Loss: 0.449 | 597 ms/step , 115554.97 GFLOP/s , 173475.4 tokens/s INFO:__main__:2024-11-30 05:49:59 | Epoch: 0 | Step: 327030 | Dataset: 0-4807560 | Loss: 0.376 | 596 ms/step , 115772.21 GFLOP/s , 173549.0 tokens/s INFO:__main__:2024-11-30 05:50:06 | Epoch: 0 | Step: 327040 | Dataset: 0-4809960 | Loss: 0.405 | 597 ms/step , 115655.81 GFLOP/s , 173891.7 tokens/s INFO:__main__:2024-11-30 05:50:13 | Epoch: 0 | Step: 327050 | Dataset: 0-4812360 | Loss: 0.360 | 596 ms/step , 115784.63 GFLOP/s , 173842.8 tokens/s INFO:__main__:2024-11-30 05:50:21 | Epoch: 0 | Step: 327060 | Dataset: 0-4814760 | Loss: 0.332 | 597 ms/step , 115678.25 GFLOP/s , 173785.4 tokens/s INFO:__main__:2024-11-30 05:50:28 | Epoch: 0 | Step: 327070 | 
Dataset: 0-4817160 | Loss: 0.395 | 596 ms/step , 115708.34 GFLOP/s , 173753.3 tokens/s INFO:__main__:2024-11-30 05:50:35 | Epoch: 0 | Step: 327080 | Dataset: 0-4819560 | Loss: 0.419 | 597 ms/step , 115600.23 GFLOP/s , 173722.6 tokens/s INFO:__main__:2024-11-30 05:50:42 | Epoch: 0 | Step: 327090 | Dataset: 0-4821960 | Loss: 0.398 | 597 ms/step , 115530.99 GFLOP/s , 173768.8 tokens/s INFO:__main__:2024-11-30 05:50:49 | Epoch: 0 | Step: 327100 | Dataset: 0-4824360 | Loss: 0.355 | 596 ms/step , 115879.21 GFLOP/s , 173927.3 tokens/s INFO:__main__:2024-11-30 05:50:56 | Epoch: 0 | Step: 327110 | Dataset: 0-4826760 | Loss: 0.381 | 596 ms/step , 115738.86 GFLOP/s , 173876.3 tokens/s INFO:__main__:2024-11-30 05:51:03 | Epoch: 0 | Step: 327120 | Dataset: 0-4829160 | Loss: 0.368 | 596 ms/step , 115747.65 GFLOP/s , 173813.0 tokens/s INFO:__main__:2024-11-30 05:51:10 | Epoch: 0 | Step: 327130 | Dataset: 0-4831560 | Loss: 0.394 | 597 ms/step , 115579.16 GFLOP/s , 173677.8 tokens/s INFO:__main__:2024-11-30 05:51:17 | Epoch: 0 | Step: 327140 | Dataset: 0-4833960 | Loss: 0.392 | 597 ms/step , 115658.93 GFLOP/s , 173721.0 tokens/s INFO:__main__:2024-11-30 05:51:24 | Epoch: 0 | Step: 327150 | Dataset: 0-4836360 | Loss: 0.382 | 597 ms/step , 115564.84 GFLOP/s , 173692.8 tokens/s INFO:__main__:2024-11-30 05:51:31 | Epoch: 0 | Step: 327160 | Dataset: 0-4838760 | Loss: 0.390 | 597 ms/step , 115667.64 GFLOP/s , 173780.9 tokens/s INFO:__main__:2024-11-30 05:51:38 | Epoch: 0 | Step: 327170 | Dataset: 0-4841160 | Loss: 0.377 | 596 ms/step , 115755.71 GFLOP/s , 173842.5 tokens/s INFO:__main__:2024-11-30 05:51:45 | Epoch: 0 | Step: 327180 | Dataset: 0-4843560 | Loss: 0.348 | 595 ms/step , 115905.32 GFLOP/s , 173820.4 tokens/s INFO:__main__:2024-11-30 05:51:52 | Epoch: 0 | Step: 327190 | Dataset: 0-4845960 | Loss: 0.384 | 597 ms/step , 115587.74 GFLOP/s , 173766.3 tokens/s INFO:__main__:2024-11-30 05:52:00 | Epoch: 0 | Step: 327200 | Dataset: 0-4848360 | Loss: 0.366 | 597 ms/step , 115660.24 
GFLOP/s , 173733.8 tokens/s INFO:__main__:2024-11-30 05:52:07 | Epoch: 0 | Step: 327210 | Dataset: 0-4850760 | Loss: 0.731 | 596 ms/step , 115698.30 GFLOP/s , 173679.3 tokens/s INFO:__main__:2024-11-30 05:52:14 | Epoch: 0 | Step: 327220 | Dataset: 0-4853160 | Loss: 0.753 | 597 ms/step , 115520.78 GFLOP/s , 173587.7 tokens/s INFO:__main__:2024-11-30 05:52:21 | Epoch: 0 | Step: 327230 | Dataset: 0-4855560 | Loss: 0.627 | 597 ms/step , 115669.74 GFLOP/s , 173656.0 tokens/s INFO:__main__:2024-11-30 05:52:28 | Epoch: 0 | Step: 327240 | Dataset: 0-4857960 | Loss: 0.590 | 597 ms/step , 115599.07 GFLOP/s , 173748.0 tokens/s INFO:__main__:2024-11-30 05:52:35 | Epoch: 0 | Step: 327250 | Dataset: 0-4860360 | Loss: 0.540 | 596 ms/step , 115831.11 GFLOP/s , 173892.3 tokens/s INFO:__main__:2024-11-30 05:52:42 | Epoch: 0 | Step: 327260 | Dataset: 0-4862760 | Loss: 0.554 | 597 ms/step , 115691.48 GFLOP/s , 173865.4 tokens/s INFO:__main__:2024-11-30 05:52:49 | Epoch: 0 | Step: 327270 | Dataset: 0-4865160 | Loss: 0.547 | 597 ms/step , 115684.68 GFLOP/s , 173837.0 tokens/s INFO:__main__:2024-11-30 05:52:56 | Epoch: 0 | Step: 327280 | Dataset: 0-4867560 | Loss: 0.623 | 597 ms/step , 115607.16 GFLOP/s , 173762.8 tokens/s INFO:__main__:2024-11-30 05:53:03 | Epoch: 0 | Step: 327290 | Dataset: 0-4869960 | Loss: 0.995 | 597 ms/step , 115571.30 GFLOP/s , 173745.9 tokens/s INFO:__main__:2024-11-30 05:53:10 | Epoch: 0 | Step: 327300 | Dataset: 0-4872360 | Loss: 0.981 | 598 ms/step , 115498.14 GFLOP/s , 173652.3 tokens/s INFO:__main__:2024-11-30 05:53:17 | Epoch: 0 | Step: 327310 | Dataset: 0-4874760 | Loss: 0.956 | 597 ms/step , 115650.61 GFLOP/s , 173720.0 tokens/s INFO:__main__:2024-11-30 05:53:24 | Epoch: 0 | Step: 327320 | Dataset: 0-4877160 | Loss: 0.896 | 596 ms/step , 115703.62 GFLOP/s , 173721.9 tokens/s INFO:__main__:2024-11-30 05:53:31 | Epoch: 0 | Step: 327330 | Dataset: 0-4879560 | Loss: 0.941 | 597 ms/step , 115577.26 GFLOP/s , 173791.6 tokens/s INFO:__main__:2024-11-30 05:53:39 
| Epoch: 0 | Step: 327340 | Dataset: 0-4881960 | Loss: 0.979 | 597 ms/step , 115682.03 GFLOP/s , 173800.8 tokens/s INFO:__main__:2024-11-30 05:53:46 | Epoch: 0 | Step: 327350 | Dataset: 0-4884360 | Loss: 1.002 | 597 ms/step , 115677.35 GFLOP/s , 173663.7 tokens/s INFO:__main__:2024-11-30 05:53:53 | Epoch: 0 | Step: 327360 | Dataset: 0-4886760 | Loss: 0.951 | 598 ms/step , 115480.23 GFLOP/s , 173683.2 tokens/s INFO:__main__:2024-11-30 05:54:00 | Epoch: 0 | Step: 327370 | Dataset: 0-4889160 | Loss: 0.956 | 597 ms/step , 115636.16 GFLOP/s , 173575.3 tokens/s INFO:__main__:2024-11-30 05:54:07 | Epoch: 0 | Step: 327380 | Dataset: 0-4891560 | Loss: 0.909 | 597 ms/step , 115625.68 GFLOP/s , 173690.5 tokens/s INFO:__main__:2024-11-30 05:54:14 | Epoch: 0 | Step: 327390 | Dataset: 0-4893960 | Loss: 1.009 | 596 ms/step , 115766.82 GFLOP/s , 173660.9 tokens/s INFO:__main__:2024-11-30 05:54:21 | Epoch: 0 | Step: 327400 | Dataset: 0-4896360 | Loss: 0.987 | 597 ms/step , 115685.15 GFLOP/s , 173747.4 tokens/s INFO:__main__:2024-11-30 05:54:28 | Epoch: 0 | Step: 327410 | Dataset: 0-4898760 | Loss: 1.078 | 596 ms/step , 115732.28 GFLOP/s , 173780.1 tokens/s INFO:__main__:2024-11-30 05:54:35 | Epoch: 0 | Step: 327420 | Dataset: 0-4901160 | Loss: 1.052 | 597 ms/step , 115679.05 GFLOP/s , 173711.2 tokens/s INFO:__main__:2024-11-30 05:54:42 | Epoch: 0 | Step: 327430 | Dataset: 0-4903560 | Loss: 1.100 | 597 ms/step , 115620.08 GFLOP/s , 173651.0 tokens/s INFO:__main__:2024-11-30 05:54:49 | Epoch: 0 | Step: 327440 | Dataset: 0-4905960 | Loss: 1.249 | 598 ms/step , 115361.77 GFLOP/s , 173613.0 tokens/s INFO:__main__:2024-11-30 05:54:56 | Epoch: 0 | Step: 327450 | Dataset: 0-4908360 | Loss: 1.130 | 598 ms/step , 115358.31 GFLOP/s , 173565.1 tokens/s INFO:__main__:2024-11-30 05:55:03 | Epoch: 0 | Step: 327460 | Dataset: 0-4910760 | Loss: 1.197 | 598 ms/step , 115417.55 GFLOP/s , 173595.9 tokens/s INFO:__main__:2024-11-30 05:55:11 | Epoch: 0 | Step: 327470 | Dataset: 0-4913160 | Loss: 1.166 | 
597 ms/step , 115670.17 GFLOP/s , 173639.6 tokens/s INFO:__main__:2024-11-30 05:55:18 | Epoch: 0 | Step: 327480 | Dataset: 0-4915560 | Loss: 1.229 | 597 ms/step , 115639.28 GFLOP/s , 173798.4 tokens/s INFO:__main__:2024-11-30 05:55:25 | Epoch: 0 | Step: 327490 | Dataset: 0-4917960 | Loss: 1.176 | 597 ms/step , 115619.69 GFLOP/s , 173808.6 tokens/s INFO:__main__:2024-11-30 05:55:32 | Validation | Step: 327500 | Val_loss: 1.004 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 05:55:33 | Epoch: 0 | Step: 327500 | Dataset: 0-4920360 | Loss: 1.138 | 597 ms/step , 115654.62 GFLOP/s , 147742.5 tokens/s INFO:__main__:2024-11-30 05:55:40 | Epoch: 0 | Step: 327510 | Dataset: 0-4922760 | Loss: 1.136 | 598 ms/step , 115416.29 GFLOP/s , 173730.1 tokens/s INFO:__main__:2024-11-30 05:55:47 | Epoch: 0 | Step: 327520 | Dataset: 0-4925160 | Loss: 1.175 | 598 ms/step , 115369.33 GFLOP/s , 173652.0 tokens/s INFO:__main__:2024-11-30 05:55:54 | Epoch: 0 | Step: 327530 | Dataset: 0-4927560 | Loss: 1.090 | 597 ms/step , 115567.83 GFLOP/s , 173600.6 tokens/s INFO:__main__:2024-11-30 05:56:01 | Epoch: 0 | Step: 327540 | Dataset: 0-4929960 | Loss: 0.917 | 597 ms/step , 115572.28 GFLOP/s , 173651.0 tokens/s INFO:__main__:2024-11-30 05:56:08 | Epoch: 0 | Step: 327550 | Dataset: 0-4932360 | Loss: 0.973 | 596 ms/step , 115721.80 GFLOP/s , 173608.9 tokens/s INFO:__main__:2024-11-30 05:56:15 | Epoch: 0 | Step: 327560 | Dataset: 0-4934760 | Loss: 0.984 | 597 ms/step , 115574.91 GFLOP/s , 173711.5 tokens/s INFO:__main__:2024-11-30 05:56:22 | Epoch: 0 | Step: 327570 | Dataset: 0-4937160 | Loss: 1.045 | 597 ms/step , 115533.84 GFLOP/s , 173661.9 tokens/s INFO:__main__:2024-11-30 05:56:30 | Epoch: 0 | Step: 327580 | Dataset: 0-4939560 | Loss: 0.817 | 597 ms/step , 115576.92 GFLOP/s , 173622.0 tokens/s INFO:__main__:2024-11-30 05:56:37 | Epoch: 0 | Step: 327590 | Dataset: 0-4941960 | Loss: 0.622 | 597 ms/step , 115632.83 GFLOP/s , 173550.9 tokens/s INFO:__main__:2024-11-30 05:56:44 | Epoch: 0 | Step: 
327600 | Dataset: 0-4944360 | Loss: 0.509 | 597 ms/step , 115594.35 GFLOP/s , 173666.0 tokens/s INFO:__main__:2024-11-30 05:56:51 | Epoch: 0 | Step: 327610 | Dataset: 0-4946760 | Loss: 0.478 | 597 ms/step , 115609.99 GFLOP/s , 173705.7 tokens/s INFO:__main__:2024-11-30 05:56:58 | Epoch: 0 | Step: 327620 | Dataset: 0-4949160 | Loss: 0.897 | 598 ms/step , 115360.63 GFLOP/s , 173572.4 tokens/s INFO:__main__:2024-11-30 05:57:05 | Epoch: 0 | Step: 327630 | Dataset: 0-4951560 | Loss: 0.616 | 597 ms/step , 115631.96 GFLOP/s , 173725.7 tokens/s INFO:__main__:2024-11-30 05:57:12 | Epoch: 0 | Step: 327640 | Dataset: 0-4953960 | Loss: 0.688 | 597 ms/step , 115570.06 GFLOP/s , 173746.7 tokens/s INFO:__main__:2024-11-30 05:57:19 | Epoch: 0 | Step: 327650 | Dataset: 0-4956360 | Loss: 0.586 | 597 ms/step , 115659.33 GFLOP/s , 173543.1 tokens/s INFO:__main__:2024-11-30 05:57:26 | Epoch: 0 | Step: 327660 | Dataset: 0-4958760 | Loss: 0.634 | 597 ms/step , 115669.15 GFLOP/s , 173631.9 tokens/s INFO:__main__:2024-11-30 05:57:33 | Epoch: 0 | Step: 327670 | Dataset: 0-4961160 | Loss: 0.698 | 597 ms/step , 115550.27 GFLOP/s , 173669.2 tokens/s INFO:__main__:2024-11-30 05:57:40 | Epoch: 0 | Step: 327680 | Dataset: 0-4963560 | Loss: 0.854 | 598 ms/step , 115423.31 GFLOP/s , 173611.1 tokens/s INFO:__main__:2024-11-30 05:57:47 | Epoch: 0 | Step: 327690 | Dataset: 0-4965960 | Loss: 0.698 | 598 ms/step , 115434.51 GFLOP/s , 173699.6 tokens/s INFO:__main__:2024-11-30 05:57:54 | Epoch: 0 | Step: 327700 | Dataset: 0-4968360 | Loss: 0.621 | 597 ms/step , 115561.13 GFLOP/s , 173671.0 tokens/s INFO:__main__:2024-11-30 05:58:02 | Epoch: 0 | Step: 327710 | Dataset: 0-4970760 | Loss: 0.916 | 597 ms/step , 115633.64 GFLOP/s , 173710.7 tokens/s INFO:__main__:2024-11-30 05:58:09 | Epoch: 0 | Step: 327720 | Dataset: 0-4973160 | Loss: 0.656 | 597 ms/step , 115580.47 GFLOP/s , 173643.3 tokens/s INFO:__main__:2024-11-30 05:58:16 | Epoch: 0 | Step: 327730 | Dataset: 0-4975560 | Loss: 0.731 | 597 ms/step , 
115584.61 GFLOP/s , 173570.0 tokens/s INFO:__main__:2024-11-30 05:58:23 | Epoch: 0 | Step: 327740 | Dataset: 0-4977960 | Loss: 0.744 | 598 ms/step , 115492.03 GFLOP/s , 173625.9 tokens/s INFO:__main__:2024-11-30 05:58:30 | Epoch: 0 | Step: 327750 | Dataset: 0-4980360 | Loss: 0.644 | 598 ms/step , 115452.54 GFLOP/s , 173631.4 tokens/s INFO:__main__:2024-11-30 05:58:37 | Epoch: 0 | Step: 327760 | Dataset: 0-4982760 | Loss: 0.642 | 598 ms/step , 115402.83 GFLOP/s , 173479.2 tokens/s INFO:__main__:2024-11-30 05:58:44 | Epoch: 0 | Step: 327770 | Dataset: 0-4985160 | Loss: 0.631 | 598 ms/step , 115460.45 GFLOP/s , 173538.6 tokens/s INFO:__main__:2024-11-30 05:58:51 | Epoch: 0 | Step: 327780 | Dataset: 0-4987560 | Loss: 0.635 | 598 ms/step , 115492.15 GFLOP/s , 173665.3 tokens/s INFO:__main__:2024-11-30 05:58:58 | Epoch: 0 | Step: 327790 | Dataset: 0-4989960 | Loss: 0.734 | 597 ms/step , 115602.03 GFLOP/s , 173673.6 tokens/s INFO:__main__:2024-11-30 05:59:05 | Epoch: 0 | Step: 327800 | Dataset: 0-4992360 | Loss: 0.833 | 599 ms/step , 115226.71 GFLOP/s , 173440.8 tokens/s INFO:__main__:2024-11-30 05:59:12 | Epoch: 0 | Step: 327810 | Dataset: 0-4994760 | Loss: 0.850 | 598 ms/step , 115470.12 GFLOP/s , 173466.5 tokens/s INFO:__main__:2024-11-30 05:59:19 | Epoch: 0 | Step: 327820 | Dataset: 0-4997160 | Loss: 0.745 | 598 ms/step , 115364.13 GFLOP/s , 173435.1 tokens/s INFO:__main__:2024-11-30 05:59:27 | Epoch: 0 | Step: 327830 | Dataset: 0-4999560 | Loss: 0.733 | 598 ms/step , 115421.98 GFLOP/s , 173439.2 tokens/s INFO:__main__:2024-11-30 05:59:34 | Epoch: 0 | Step: 327840 | Dataset: 0-5001960 | Loss: 0.822 | 598 ms/step , 115321.71 GFLOP/s , 173377.1 tokens/s INFO:__main__:2024-11-30 05:59:41 | Epoch: 0 | Step: 327850 | Dataset: 0-5004360 | Loss: 0.796 | 598 ms/step , 115429.13 GFLOP/s , 173470.5 tokens/s INFO:__main__:2024-11-30 05:59:48 | Epoch: 0 | Step: 327860 | Dataset: 0-5006760 | Loss: 0.806 | 597 ms/step , 115540.76 GFLOP/s , 173526.9 tokens/s INFO:__main__:2024-11-30 
05:59:55 | Epoch: 0 | Step: 327870 | Dataset: 0-5009160 | Loss: 0.842 | 599 ms/step , 115269.90 GFLOP/s , 173513.8 tokens/s INFO:__main__:2024-11-30 06:00:02 | Epoch: 0 | Step: 327880 | Dataset: 0-5011560 | Loss: 0.730 | 597 ms/step , 115531.04 GFLOP/s , 173397.9 tokens/s INFO:__main__:2024-11-30 06:00:09 | Epoch: 0 | Step: 327890 | Dataset: 0-5013960 | Loss: 0.822 | 598 ms/step , 115496.85 GFLOP/s , 173387.0 tokens/s INFO:__main__:2024-11-30 06:00:16 | Epoch: 0 | Step: 327900 | Dataset: 0-5016360 | Loss: 0.829 | 598 ms/step , 115320.62 GFLOP/s , 173320.3 tokens/s INFO:__main__:2024-11-30 06:00:23 | Epoch: 0 | Step: 327910 | Dataset: 0-5018760 | Loss: 0.818 | 599 ms/step , 115296.21 GFLOP/s , 173312.1 tokens/s INFO:__main__:2024-11-30 06:00:30 | Epoch: 0 | Step: 327920 | Dataset: 0-5021160 | Loss: 0.794 | 598 ms/step , 115389.56 GFLOP/s , 173353.1 tokens/s INFO:__main__:2024-11-30 06:00:37 | Epoch: 0 | Step: 327930 | Dataset: 0-5023560 | Loss: 0.785 | 598 ms/step , 115381.08 GFLOP/s , 173469.1 tokens/s INFO:__main__:2024-11-30 06:00:44 | Epoch: 0 | Step: 327940 | Dataset: 0-5025960 | Loss: 0.865 | 598 ms/step , 115410.09 GFLOP/s , 173404.4 tokens/s INFO:__main__:2024-11-30 06:00:52 | Epoch: 0 | Step: 327950 | Dataset: 0-5028360 | Loss: 0.830 | 599 ms/step , 115266.29 GFLOP/s , 173304.8 tokens/s INFO:__main__:2024-11-30 06:00:59 | Epoch: 0 | Step: 327960 | Dataset: 0-5030760 | Loss: 0.822 | 598 ms/step , 115337.28 GFLOP/s , 173321.1 tokens/s INFO:__main__:2024-11-30 06:01:06 | Epoch: 0 | Step: 327970 | Dataset: 0-5033160 | Loss: 0.801 | 598 ms/step , 115381.34 GFLOP/s , 173318.3 tokens/s INFO:__main__:2024-11-30 06:01:13 | Epoch: 0 | Step: 327980 | Dataset: 0-5035560 | Loss: 0.810 | 598 ms/step , 115313.94 GFLOP/s , 173458.0 tokens/s INFO:__main__:2024-11-30 06:01:20 | Epoch: 0 | Step: 327990 | Dataset: 0-5037960 | Loss: 0.794 | 598 ms/step , 115418.02 GFLOP/s , 173426.1 tokens/s INFO:__main__:2024-11-30 06:01:28 | Validation | Step: 328000 | Val_loss: 0.640 | 
Best_val_loss: 0.4363 INFO:__main__:2024-11-30 06:01:28 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_060128_step_328000.pt` INFO:__main__:2024-11-30 06:01:30 | Epoch: 0 | Step: 328000 | Dataset: 0-5040360 | Loss: 0.803 | 596 ms/step , 115793.28 GFLOP/s , 120090.6 tokens/s INFO:__main__:2024-11-30 06:01:37 | Epoch: 0 | Step: 328010 | Dataset: 0-5042760 | Loss: 0.799 | 600 ms/step , 114962.00 GFLOP/s , 173130.5 tokens/s INFO:__main__:2024-11-30 06:01:44 | Epoch: 0 | Step: 328020 | Dataset: 0-5045160 | Loss: 0.772 | 598 ms/step , 115485.01 GFLOP/s , 173060.8 tokens/s INFO:__main__:2024-11-30 06:01:51 | Epoch: 0 | Step: 328030 | Dataset: 0-5047560 | Loss: 0.782 | 598 ms/step , 115452.95 GFLOP/s , 172986.0 tokens/s INFO:__main__:2024-11-30 06:01:59 | Epoch: 0 | Step: 328040 | Dataset: 0-5049960 | Loss: 0.877 | 598 ms/step , 115422.68 GFLOP/s , 173425.9 tokens/s INFO:__main__:2024-11-30 06:02:06 | Epoch: 0 | Step: 328050 | Dataset: 0-5052360 | Loss: 0.795 | 598 ms/step , 115461.91 GFLOP/s , 173483.2 tokens/s INFO:__main__:2024-11-30 06:02:13 | Epoch: 0 | Step: 328060 | Dataset: 0-5054760 | Loss: 0.831 | 597 ms/step , 115552.09 GFLOP/s , 173523.5 tokens/s INFO:__main__:2024-11-30 06:02:20 | Epoch: 0 | Step: 328070 | Dataset: 0-5057160 | Loss: 0.814 | 597 ms/step , 115625.19 GFLOP/s , 173525.8 tokens/s INFO:__main__:2024-11-30 06:02:27 | Epoch: 0 | Step: 328080 | Dataset: 0-5059560 | Loss: 0.825 | 598 ms/step , 115479.07 GFLOP/s , 173589.8 tokens/s INFO:__main__:2024-11-30 06:02:34 | Epoch: 0 | Step: 328090 | Dataset: 0-5061960 | Loss: 0.766 | 598 ms/step , 115359.46 GFLOP/s , 173502.9 tokens/s INFO:__main__:2024-11-30 06:02:41 | Epoch: 0 | Step: 328100 | Dataset: 0-5064360 | Loss: 0.820 | 598 ms/step , 115347.33 GFLOP/s , 173471.7 tokens/s INFO:__main__:2024-11-30 06:02:48 | Epoch: 0 | Step: 328110 | Dataset: 0-5066760 | Loss: 0.844 | 599 ms/step , 115298.56 GFLOP/s , 173426.0 tokens/s INFO:__main__:2024-11-30 06:02:55 | Epoch: 0 | 
Step: 328120 | Dataset: 0-5069160 | Loss: 0.856 | 599 ms/step , 115280.31 GFLOP/s , 173476.7 tokens/s INFO:__main__:2024-11-30 06:03:02 | Epoch: 0 | Step: 328130 | Dataset: 0-5071560 | Loss: 0.759 | 598 ms/step , 115371.21 GFLOP/s , 173446.3 tokens/s INFO:__main__:2024-11-30 06:03:09 | Epoch: 0 | Step: 328140 | Dataset: 0-5073960 | Loss: 0.868 | 598 ms/step , 115487.14 GFLOP/s , 173510.9 tokens/s INFO:__main__:2024-11-30 06:03:16 | Epoch: 0 | Step: 328150 | Dataset: 0-5076360 | Loss: 0.672 | 598 ms/step , 115450.37 GFLOP/s , 173497.0 tokens/s INFO:__main__:2024-11-30 06:03:24 | Epoch: 0 | Step: 328160 | Dataset: 0-5078760 | Loss: 0.810 | 598 ms/step , 115437.33 GFLOP/s , 173509.7 tokens/s INFO:__main__:2024-11-30 06:03:31 | Epoch: 0 | Step: 328170 | Dataset: 0-5081160 | Loss: 0.770 | 598 ms/step , 115344.67 GFLOP/s , 173438.6 tokens/s INFO:__main__:2024-11-30 06:03:38 | Epoch: 0 | Step: 328180 | Dataset: 0-5083560 | Loss: 0.828 | 598 ms/step , 115457.36 GFLOP/s , 173393.7 tokens/s INFO:__main__:2024-11-30 06:03:45 | Epoch: 0 | Step: 328190 | Dataset: 0-5085960 | Loss: 0.804 | 598 ms/step , 115362.23 GFLOP/s , 173481.0 tokens/s INFO:__main__:2024-11-30 06:03:52 | Epoch: 0 | Step: 328200 | Dataset: 0-5088360 | Loss: 0.789 | 598 ms/step , 115431.08 GFLOP/s , 173509.5 tokens/s INFO:__main__:2024-11-30 06:03:59 | Epoch: 0 | Step: 328210 | Dataset: 0-5090760 | Loss: 0.851 | 599 ms/step , 115297.76 GFLOP/s , 173299.9 tokens/s INFO:__main__:2024-11-30 06:04:06 | Epoch: 0 | Step: 328220 | Dataset: 0-5093160 | Loss: 0.769 | 598 ms/step , 115501.62 GFLOP/s , 173512.1 tokens/s INFO:__main__:2024-11-30 06:04:13 | Epoch: 0 | Step: 328230 | Dataset: 0-5095560 | Loss: 0.749 | 598 ms/step , 115457.46 GFLOP/s , 173597.3 tokens/s INFO:__main__:2024-11-30 06:04:20 | Epoch: 0 | Step: 328240 | Dataset: 0-5097960 | Loss: 0.845 | 598 ms/step , 115345.97 GFLOP/s , 173480.9 tokens/s INFO:__main__:2024-11-30 06:04:27 | Epoch: 0 | Step: 328250 | Dataset: 0-5100360 | Loss: 0.856 | 598 ms/step 
, 115489.59 GFLOP/s , 173445.9 tokens/s INFO:__main__:2024-11-30 06:04:34 | Epoch: 0 | Step: 328260 | Dataset: 0-5102760 | Loss: 0.784 | 598 ms/step , 115375.19 GFLOP/s , 173396.9 tokens/s INFO:__main__:2024-11-30 06:04:41 | Epoch: 0 | Step: 328270 | Dataset: 0-5105160 | Loss: 0.856 | 598 ms/step , 115405.67 GFLOP/s , 173480.2 tokens/s INFO:__main__:2024-11-30 06:04:49 | Epoch: 0 | Step: 328280 | Dataset: 0-5107560 | Loss: 0.857 | 598 ms/step , 115327.99 GFLOP/s , 173428.9 tokens/s INFO:__main__:2024-11-30 06:04:56 | Epoch: 0 | Step: 328290 | Dataset: 0-5109960 | Loss: 0.836 | 597 ms/step , 115516.40 GFLOP/s , 173460.5 tokens/s INFO:__main__:2024-11-30 06:05:03 | Epoch: 0 | Step: 328300 | Dataset: 0-5112360 | Loss: 0.535 | 596 ms/step , 115733.78 GFLOP/s , 173557.4 tokens/s INFO:__main__:2024-11-30 06:05:10 | Epoch: 0 | Step: 328310 | Dataset: 0-5114760 | Loss: 0.564 | 597 ms/step , 115601.88 GFLOP/s , 173758.6 tokens/s INFO:__main__:2024-11-30 06:05:17 | Epoch: 0 | Step: 328320 | Dataset: 0-5117160 | Loss: 0.518 | 597 ms/step , 115676.40 GFLOP/s , 173650.0 tokens/s INFO:__main__:2024-11-30 06:05:24 | Epoch: 0 | Step: 328330 | Dataset: 0-5119560 | Loss: 0.540 | 597 ms/step , 115518.41 GFLOP/s , 173642.3 tokens/s INFO:__main__:2024-11-30 06:05:31 | Epoch: 0 | Step: 328340 | Dataset: 0-5121960 | Loss: 0.454 | 597 ms/step , 115642.02 GFLOP/s , 173648.9 tokens/s INFO:__main__:2024-11-30 06:05:38 | Epoch: 0 | Step: 328350 | Dataset: 0-5124360 | Loss: 0.530 | 597 ms/step , 115611.17 GFLOP/s , 173628.7 tokens/s INFO:__main__:2024-11-30 06:05:45 | Epoch: 0 | Step: 328360 | Dataset: 0-5126760 | Loss: 0.495 | 597 ms/step , 115564.51 GFLOP/s , 173615.1 tokens/s INFO:__main__:2024-11-30 06:05:52 | Epoch: 0 | Step: 328370 | Dataset: 0-5129160 | Loss: 0.454 | 596 ms/step , 115700.06 GFLOP/s , 173690.1 tokens/s INFO:__main__:2024-11-30 06:05:59 | Epoch: 0 | Step: 328380 | Dataset: 0-5131560 | Loss: 0.472 | 596 ms/step , 115804.23 GFLOP/s , 173875.6 tokens/s 
INFO:__main__:2024-11-30 06:06:06 | Epoch: 0 | Step: 328390 | Dataset: 0-5133960 | Loss: 0.486 | 598 ms/step , 115405.28 GFLOP/s , 173694.0 tokens/s INFO:__main__:2024-11-30 06:06:13 | Epoch: 0 | Step: 328400 | Dataset: 0-5136360 | Loss: 0.449 | 598 ms/step , 115444.87 GFLOP/s , 173666.8 tokens/s INFO:__main__:2024-11-30 06:06:21 | Epoch: 0 | Step: 328410 | Dataset: 0-5138760 | Loss: 0.503 | 597 ms/step , 115687.15 GFLOP/s , 173703.4 tokens/s INFO:__main__:2024-11-30 06:06:28 | Epoch: 0 | Step: 328420 | Dataset: 0-5141160 | Loss: 0.547 | 597 ms/step , 115570.35 GFLOP/s , 173671.7 tokens/s INFO:__main__:2024-11-30 06:06:35 | Epoch: 0 | Step: 328430 | Dataset: 0-5143560 | Loss: 0.480 | 597 ms/step , 115670.13 GFLOP/s , 173651.0 tokens/s INFO:__main__:2024-11-30 06:06:42 | Epoch: 0 | Step: 328440 | Dataset: 0-5145960 | Loss: 0.441 | 597 ms/step , 115555.32 GFLOP/s , 173684.8 tokens/s INFO:__main__:2024-11-30 06:06:49 | Epoch: 0 | Step: 328450 | Dataset: 0-5148360 | Loss: 0.490 | 597 ms/step , 115633.76 GFLOP/s , 173698.8 tokens/s INFO:__main__:2024-11-30 06:06:56 | Epoch: 0 | Step: 328460 | Dataset: 0-5150760 | Loss: 0.496 | 597 ms/step , 115680.43 GFLOP/s , 173797.3 tokens/s INFO:__main__:2024-11-30 06:07:03 | Epoch: 0 | Step: 328470 | Dataset: 0-5153160 | Loss: 0.548 | 598 ms/step , 115433.37 GFLOP/s , 173635.9 tokens/s INFO:__main__:2024-11-30 06:07:10 | Epoch: 0 | Step: 328480 | Dataset: 0-5155560 | Loss: 0.445 | 597 ms/step , 115584.30 GFLOP/s , 173610.8 tokens/s INFO:__main__:2024-11-30 06:07:17 | Epoch: 0 | Step: 328490 | Dataset: 0-5157960 | Loss: 0.474 | 598 ms/step , 115463.46 GFLOP/s , 173594.0 tokens/s INFO:__main__:2024-11-30 06:07:25 | Validation | Step: 328500 | Val_loss: 0.628 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 06:07:25 | Epoch: 0 | Step: 328500 | Dataset: 0-5160360 | Loss: 0.466 | 597 ms/step , 115570.84 GFLOP/s , 147754.0 tokens/s INFO:__main__:2024-11-30 06:07:33 | Epoch: 0 | Step: 328510 | Dataset: 0-5162760 | Loss: 0.454 | 597 
ms/step , 115645.44 GFLOP/s , 173739.5 tokens/s INFO:__main__:2024-11-30 06:07:40 | Epoch: 0 | Step: 328520 | Dataset: 0-5165160 | Loss: 0.474 | 596 ms/step , 115767.82 GFLOP/s , 173708.3 tokens/s INFO:__main__:2024-11-30 06:07:47 | Epoch: 0 | Step: 328530 | Dataset: 0-5167560 | Loss: 0.482 | 596 ms/step , 115740.81 GFLOP/s , 173798.8 tokens/s INFO:__main__:2024-11-30 06:07:54 | Epoch: 0 | Step: 328540 | Dataset: 0-5169960 | Loss: 0.484 | 597 ms/step , 115621.58 GFLOP/s , 173753.4 tokens/s INFO:__main__:2024-11-30 06:08:01 | Epoch: 0 | Step: 328550 | Dataset: 0-5172360 | Loss: 0.464 | 597 ms/step , 115543.90 GFLOP/s , 173670.5 tokens/s INFO:__main__:2024-11-30 06:08:08 | Epoch: 0 | Step: 328560 | Dataset: 0-5174760 | Loss: 0.522 | 597 ms/step , 115568.52 GFLOP/s , 173698.4 tokens/s INFO:__main__:2024-11-30 06:08:15 | Epoch: 0 | Step: 328570 | Dataset: 0-5177160 | Loss: 0.488 | 597 ms/step , 115669.42 GFLOP/s , 173576.9 tokens/s INFO:__main__:2024-11-30 06:08:22 | Epoch: 0 | Step: 328580 | Dataset: 0-5179560 | Loss: 0.465 | 597 ms/step , 115568.17 GFLOP/s , 173625.6 tokens/s INFO:__main__:2024-11-30 06:08:29 | Epoch: 0 | Step: 328590 | Dataset: 0-5181960 | Loss: 0.487 | 597 ms/step , 115584.78 GFLOP/s , 173698.1 tokens/s INFO:__main__:2024-11-30 06:08:36 | Epoch: 0 | Step: 328600 | Dataset: 0-5184360 | Loss: 0.443 | 597 ms/step , 115571.12 GFLOP/s , 173750.7 tokens/s INFO:__main__:2024-11-30 06:08:43 | Epoch: 0 | Step: 328610 | Dataset: 0-5186760 | Loss: 0.516 | 598 ms/step , 115465.41 GFLOP/s , 173752.5 tokens/s INFO:__main__:2024-11-30 06:08:50 | Epoch: 0 | Step: 328620 | Dataset: 0-5189160 | Loss: 0.494 | 598 ms/step , 115463.31 GFLOP/s , 173613.8 tokens/s INFO:__main__:2024-11-30 06:08:57 | Epoch: 0 | Step: 328630 | Dataset: 0-5191560 | Loss: 0.476 | 597 ms/step , 115568.43 GFLOP/s , 173644.9 tokens/s INFO:__main__:2024-11-30 06:09:04 | Epoch: 0 | Step: 328640 | Dataset: 0-5193960 | Loss: 0.466 | 597 ms/step , 115516.83 GFLOP/s , 173657.9 tokens/s 
INFO:__main__:2024-11-30 06:09:12 | Epoch: 0 | Step: 328650 | Dataset: 0-5196360 | Loss: 0.541 | 598 ms/step , 115405.38 GFLOP/s , 173617.7 tokens/s INFO:__main__:2024-11-30 06:09:19 | Epoch: 0 | Step: 328660 | Dataset: 0-5198760 | Loss: 0.482 | 598 ms/step , 115432.64 GFLOP/s , 173606.7 tokens/s INFO:__main__:2024-11-30 06:09:26 | Epoch: 0 | Step: 328670 | Dataset: 0-5201160 | Loss: 0.488 | 596 ms/step , 115734.47 GFLOP/s , 173714.1 tokens/s INFO:__main__:2024-11-30 06:09:33 | Epoch: 0 | Step: 328680 | Dataset: 0-5203560 | Loss: 0.433 | 597 ms/step , 115681.61 GFLOP/s , 173789.0 tokens/s INFO:__main__:2024-11-30 06:09:40 | Epoch: 0 | Step: 328690 | Dataset: 0-5205960 | Loss: 0.472 | 597 ms/step , 115561.22 GFLOP/s , 173687.8 tokens/s INFO:__main__:2024-11-30 06:09:47 | Epoch: 0 | Step: 328700 | Dataset: 0-5208360 | Loss: 0.505 | 596 ms/step , 115695.89 GFLOP/s , 173568.7 tokens/s INFO:__main__:2024-11-30 06:09:54 | Epoch: 0 | Step: 328710 | Dataset: 0-5210760 | Loss: 0.482 | 597 ms/step , 115657.23 GFLOP/s , 173668.2 tokens/s INFO:__main__:2024-11-30 06:10:01 | Epoch: 0 | Step: 328720 | Dataset: 0-5213160 | Loss: 0.528 | 597 ms/step , 115583.76 GFLOP/s , 173663.2 tokens/s INFO:__main__:2024-11-30 06:10:08 | Epoch: 0 | Step: 328730 | Dataset: 0-5215560 | Loss: 0.441 | 596 ms/step , 115704.64 GFLOP/s , 173672.8 tokens/s INFO:__main__:2024-11-30 06:10:15 | Epoch: 0 | Step: 328740 | Dataset: 0-5217960 | Loss: 0.498 | 598 ms/step , 115401.60 GFLOP/s , 173619.9 tokens/s INFO:__main__:2024-11-30 06:10:22 | Epoch: 0 | Step: 328750 | Dataset: 0-5220360 | Loss: 0.507 | 596 ms/step , 115716.57 GFLOP/s , 173738.9 tokens/s INFO:__main__:2024-11-30 06:10:29 | Epoch: 0 | Step: 328760 | Dataset: 0-5222760 | Loss: 0.454 | 597 ms/step , 115670.94 GFLOP/s , 173720.3 tokens/s INFO:__main__:2024-11-30 06:10:36 | Epoch: 0 | Step: 328770 | Dataset: 0-5225160 | Loss: 0.420 | 597 ms/step , 115608.82 GFLOP/s , 173670.1 tokens/s INFO:__main__:2024-11-30 06:10:44 | Epoch: 0 | Step: 328780 | 
Dataset: 0-5227560 | Loss: 0.477 | 598 ms/step , 115481.53 GFLOP/s , 173637.6 tokens/s INFO:__main__:2024-11-30 06:10:51 | Epoch: 0 | Step: 328790 | Dataset: 0-5229960 | Loss: 0.488 | 598 ms/step , 115409.42 GFLOP/s , 173587.1 tokens/s INFO:__main__:2024-11-30 06:10:58 | Epoch: 0 | Step: 328800 | Dataset: 0-5232360 | Loss: 0.506 | 597 ms/step , 115678.15 GFLOP/s , 173688.9 tokens/s INFO:__main__:2024-11-30 06:11:05 | Epoch: 0 | Step: 328810 | Dataset: 0-5234760 | Loss: 0.505 | 597 ms/step , 115575.09 GFLOP/s , 173623.0 tokens/s INFO:__main__:2024-11-30 06:11:12 | Epoch: 0 | Step: 328820 | Dataset: 0-5237160 | Loss: 0.487 | 597 ms/step , 115580.66 GFLOP/s , 173628.1 tokens/s INFO:__main__:2024-11-30 06:11:19 | Epoch: 0 | Step: 328830 | Dataset: 0-5239560 | Loss: 0.499 | 597 ms/step , 115525.48 GFLOP/s , 173795.3 tokens/s INFO:__main__:2024-11-30 06:11:26 | Epoch: 0 | Step: 328840 | Dataset: 0-5241960 | Loss: 0.500 | 597 ms/step , 115686.67 GFLOP/s , 173735.6 tokens/s INFO:__main__:2024-11-30 06:11:33 | Epoch: 0 | Step: 328850 | Dataset: 0-5244360 | Loss: 0.640 | 597 ms/step , 115566.14 GFLOP/s , 173554.5 tokens/s INFO:__main__:2024-11-30 06:11:40 | Epoch: 0 | Step: 328860 | Dataset: 0-5246760 | Loss: 0.618 | 597 ms/step , 115583.53 GFLOP/s , 173571.6 tokens/s INFO:__main__:2024-11-30 06:11:47 | Epoch: 0 | Step: 328870 | Dataset: 0-5249160 | Loss: 0.639 | 597 ms/step , 115570.85 GFLOP/s , 173567.2 tokens/s INFO:__main__:2024-11-30 06:11:54 | Epoch: 0 | Step: 328880 | Dataset: 0-5251560 | Loss: 0.595 | 597 ms/step , 115563.28 GFLOP/s , 173631.1 tokens/s INFO:__main__:2024-11-30 06:12:01 | Epoch: 0 | Step: 328890 | Dataset: 0-5253960 | Loss: 0.629 | 597 ms/step , 115613.03 GFLOP/s , 173563.4 tokens/s INFO:__main__:2024-11-30 06:12:08 | Epoch: 0 | Step: 328900 | Dataset: 0-5256360 | Loss: 0.509 | 596 ms/step , 115701.35 GFLOP/s , 173658.8 tokens/s INFO:__main__:2024-11-30 06:12:16 | Epoch: 0 | Step: 328910 | Dataset: 0-5258760 | Loss: 0.637 | 598 ms/step , 115458.02 
GFLOP/s , 173772.6 tokens/s INFO:__main__:2024-11-30 06:12:23 | Epoch: 0 | Step: 328920 | Dataset: 0-5261160 | Loss: 0.613 | 597 ms/step , 115602.40 GFLOP/s , 173619.7 tokens/s INFO:__main__:2024-11-30 06:12:30 | Epoch: 0 | Step: 328930 | Dataset: 0-5263560 | Loss: 0.543 | 597 ms/step , 115661.22 GFLOP/s , 173565.4 tokens/s INFO:__main__:2024-11-30 06:12:37 | Epoch: 0 | Step: 328940 | Dataset: 0-5265960 | Loss: 0.584 | 597 ms/step , 115647.28 GFLOP/s , 173605.2 tokens/s INFO:__main__:2024-11-30 06:12:44 | Epoch: 0 | Step: 328950 | Dataset: 0-5268360 | Loss: 0.559 | 598 ms/step , 115440.30 GFLOP/s , 173490.2 tokens/s INFO:__main__:2024-11-30 06:12:51 | Epoch: 0 | Step: 328960 | Dataset: 0-5270760 | Loss: 0.524 | 597 ms/step , 115636.50 GFLOP/s , 173570.1 tokens/s INFO:__main__:2024-11-30 06:12:58 | Epoch: 0 | Step: 328970 | Dataset: 0-5273160 | Loss: 0.597 | 598 ms/step , 115484.47 GFLOP/s , 173571.3 tokens/s INFO:__main__:2024-11-30 06:13:05 | Epoch: 0 | Step: 328980 | Dataset: 0-5275560 | Loss: 0.567 | 597 ms/step , 115618.05 GFLOP/s , 173752.1 tokens/s INFO:__main__:2024-11-30 06:13:12 | Epoch: 0 | Step: 328990 | Dataset: 0-5277960 | Loss: 0.589 | 597 ms/step , 115576.89 GFLOP/s , 173677.0 tokens/s INFO:__main__:2024-11-30 06:13:20 | Validation | Step: 329000 | Val_loss: 0.771 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 06:13:20 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_061320_step_329000.pt` INFO:__main__:2024-11-30 06:13:23 | Epoch: 0 | Step: 329000 | Dataset: 0-5280360 | Loss: 0.526 | 596 ms/step , 115835.64 GFLOP/s , 118366.2 tokens/s INFO:__main__:2024-11-30 06:13:30 | Epoch: 0 | Step: 329010 | Dataset: 0-5282760 | Loss: 0.597 | 599 ms/step , 115177.66 GFLOP/s , 173203.9 tokens/s INFO:__main__:2024-11-30 06:13:37 | Epoch: 0 | Step: 329020 | Dataset: 0-5285160 | Loss: 0.619 | 597 ms/step , 115572.92 GFLOP/s , 173198.8 tokens/s INFO:__main__:2024-11-30 06:13:44 | Epoch: 0 | Step: 329030 | Dataset: 0-5287560 | Loss: 
0.602 | 597 ms/step , 115543.16 GFLOP/s , 173133.3 tokens/s INFO:__main__:2024-11-30 06:13:51 | Epoch: 0 | Step: 329040 | Dataset: 0-5289960 | Loss: 0.623 | 597 ms/step , 115578.40 GFLOP/s , 173452.5 tokens/s INFO:__main__:2024-11-30 06:13:58 | Epoch: 0 | Step: 329050 | Dataset: 0-5292360 | Loss: 0.634 | 598 ms/step , 115486.29 GFLOP/s , 173800.5 tokens/s INFO:__main__:2024-11-30 06:14:05 | Epoch: 0 | Step: 329060 | Dataset: 0-5294760 | Loss: 0.601 | 597 ms/step , 115573.32 GFLOP/s , 173715.7 tokens/s INFO:__main__:2024-11-30 06:14:12 | Epoch: 0 | Step: 329070 | Dataset: 0-5297160 | Loss: 0.604 | 598 ms/step , 115421.01 GFLOP/s , 173592.1 tokens/s INFO:__main__:2024-11-30 06:14:19 | Epoch: 0 | Step: 329080 | Dataset: 0-5299560 | Loss: 0.630 | 598 ms/step , 115414.37 GFLOP/s , 173606.1 tokens/s INFO:__main__:2024-11-30 06:14:26 | Epoch: 0 | Step: 329090 | Dataset: 0-5301960 | Loss: 0.535 | 598 ms/step , 115433.42 GFLOP/s , 173589.6 tokens/s INFO:__main__:2024-11-30 06:14:33 | Epoch: 0 | Step: 329100 | Dataset: 0-5304360 | Loss: 0.656 | 597 ms/step , 115556.69 GFLOP/s , 173552.5 tokens/s INFO:__main__:2024-11-30 06:14:40 | Epoch: 0 | Step: 329110 | Dataset: 0-5306760 | Loss: 0.639 | 597 ms/step , 115581.10 GFLOP/s , 173565.3 tokens/s INFO:__main__:2024-11-30 06:14:48 | Epoch: 0 | Step: 329120 | Dataset: 0-5309160 | Loss: 0.587 | 596 ms/step , 115765.03 GFLOP/s , 173644.9 tokens/s INFO:__main__:2024-11-30 06:14:55 | Epoch: 0 | Step: 329130 | Dataset: 0-5311560 | Loss: 0.567 | 597 ms/step , 115647.93 GFLOP/s , 173717.9 tokens/s INFO:__main__:2024-11-30 06:15:02 | Epoch: 0 | Step: 329140 | Dataset: 0-5313960 | Loss: 0.607 | 598 ms/step , 115479.73 GFLOP/s , 173541.7 tokens/s INFO:__main__:2024-11-30 06:15:09 | Epoch: 0 | Step: 329150 | Dataset: 0-5316360 | Loss: 0.576 | 598 ms/step , 115497.66 GFLOP/s , 173571.8 tokens/s INFO:__main__:2024-11-30 06:15:16 | Epoch: 0 | Step: 329160 | Dataset: 0-5318760 | Loss: 0.567 | 597 ms/step , 115584.24 GFLOP/s , 173563.4 tokens/s 
INFO:__main__:2024-11-30 06:15:23 | Epoch: 0 | Step: 329170 | Dataset: 0-5321160 | Loss: 0.591 | 598 ms/step , 115433.17 GFLOP/s , 173574.9 tokens/s INFO:__main__:2024-11-30 06:15:30 | Epoch: 0 | Step: 329180 | Dataset: 0-5323560 | Loss: 0.556 | 598 ms/step , 115473.73 GFLOP/s , 173544.7 tokens/s INFO:__main__:2024-11-30 06:15:37 | Epoch: 0 | Step: 329190 | Dataset: 0-5325960 | Loss: 0.624 | 597 ms/step , 115575.12 GFLOP/s , 173572.3 tokens/s INFO:__main__:2024-11-30 06:15:44 | Epoch: 0 | Step: 329200 | Dataset: 0-5328360 | Loss: 0.628 | 598 ms/step , 115496.66 GFLOP/s , 173739.9 tokens/s INFO:__main__:2024-11-30 06:15:51 | Epoch: 0 | Step: 329210 | Dataset: 0-5330760 | Loss: 0.582 | 597 ms/step , 115581.88 GFLOP/s , 173710.8 tokens/s INFO:__main__:2024-11-30 06:15:58 | Epoch: 0 | Step: 329220 | Dataset: 0-5333160 | Loss: 0.533 | 597 ms/step , 115557.44 GFLOP/s , 173494.5 tokens/s INFO:__main__:2024-11-30 06:16:05 | Epoch: 0 | Step: 329230 | Dataset: 0-5335560 | Loss: 0.591 | 597 ms/step , 115530.24 GFLOP/s , 173561.1 tokens/s INFO:__main__:2024-11-30 06:16:12 | Epoch: 0 | Step: 329240 | Dataset: 0-5337960 | Loss: 0.610 | 599 ms/step , 115291.16 GFLOP/s , 173529.0 tokens/s INFO:__main__:2024-11-30 06:16:20 | Epoch: 0 | Step: 329250 | Dataset: 0-5340360 | Loss: 0.550 | 598 ms/step , 115425.28 GFLOP/s , 173531.5 tokens/s INFO:__main__:2024-11-30 06:16:27 | Epoch: 0 | Step: 329260 | Dataset: 0-5342760 | Loss: 0.593 | 597 ms/step , 115520.53 GFLOP/s , 173508.3 tokens/s INFO:__main__:2024-11-30 06:16:34 | Epoch: 0 | Step: 329270 | Dataset: 0-5345160 | Loss: 0.540 | 597 ms/step , 115569.30 GFLOP/s , 173615.2 tokens/s INFO:__main__:2024-11-30 06:16:41 | Epoch: 0 | Step: 329280 | Dataset: 0-5347560 | Loss: 0.581 | 598 ms/step , 115473.36 GFLOP/s , 173694.0 tokens/s INFO:__main__:2024-11-30 06:16:48 | Epoch: 0 | Step: 329290 | Dataset: 0-5349960 | Loss: 0.633 | 598 ms/step , 115467.25 GFLOP/s , 173566.5 tokens/s INFO:__main__:2024-11-30 06:16:55 | Epoch: 0 | Step: 329300 | 
Dataset: 0-5352360 | Loss: 0.584 | 598 ms/step , 115472.54 GFLOP/s , 173541.9 tokens/s INFO:__main__:2024-11-30 06:17:02 | Epoch: 0 | Step: 329310 | Dataset: 0-5354760 | Loss: 0.634 | 597 ms/step , 115557.75 GFLOP/s , 173582.9 tokens/s INFO:__main__:2024-11-30 06:17:09 | Epoch: 0 | Step: 329320 | Dataset: 0-5357160 | Loss: 0.573 | 597 ms/step , 115547.59 GFLOP/s , 173509.0 tokens/s INFO:__main__:2024-11-30 06:17:16 | Epoch: 0 | Step: 329330 | Dataset: 0-5359560 | Loss: 0.684 | 598 ms/step , 115347.41 GFLOP/s , 173496.6 tokens/s INFO:__main__:2024-11-30 06:17:23 | Epoch: 0 | Step: 329340 | Dataset: 0-5361960 | Loss: 0.680 | 597 ms/step , 115510.81 GFLOP/s , 173598.1 tokens/s INFO:__main__:2024-11-30 06:17:30 | Epoch: 0 | Step: 329350 | Dataset: 0-5364360 | Loss: 0.554 | 597 ms/step , 115682.25 GFLOP/s , 173581.5 tokens/s INFO:__main__:2024-11-30 06:17:37 | Epoch: 0 | Step: 329360 | Dataset: 0-5366760 | Loss: 0.591 | 597 ms/step , 115647.09 GFLOP/s , 173631.1 tokens/s INFO:__main__:2024-11-30 06:17:45 | Epoch: 0 | Step: 329370 | Dataset: 0-5369160 | Loss: 0.566 | 598 ms/step , 115452.85 GFLOP/s , 173549.2 tokens/s INFO:__main__:2024-11-30 06:17:52 | Epoch: 0 | Step: 329380 | Dataset: 0-5371560 | Loss: 0.576 | 597 ms/step , 115577.79 GFLOP/s , 173534.4 tokens/s INFO:__main__:2024-11-30 06:17:59 | Epoch: 0 | Step: 329390 | Dataset: 0-5373960 | Loss: 0.562 | 597 ms/step , 115540.50 GFLOP/s , 173566.6 tokens/s INFO:__main__:2024-11-30 06:18:06 | Epoch: 0 | Step: 329400 | Dataset: 0-5376360 | Loss: 0.645 | 597 ms/step , 115533.97 GFLOP/s , 173491.0 tokens/s INFO:__main__:2024-11-30 06:18:13 | Epoch: 0 | Step: 329410 | Dataset: 0-5378760 | Loss: 0.688 | 597 ms/step , 115515.88 GFLOP/s , 173469.2 tokens/s INFO:__main__:2024-11-30 06:18:20 | Epoch: 0 | Step: 329420 | Dataset: 0-5381160 | Loss: 0.684 | 597 ms/step , 115535.75 GFLOP/s , 173599.1 tokens/s INFO:__main__:2024-11-30 06:18:27 | Epoch: 0 | Step: 329430 | Dataset: 0-5383560 | Loss: 0.606 | 597 ms/step , 115524.46 
GFLOP/s , 173688.6 tokens/s INFO:__main__:2024-11-30 06:18:34 | Epoch: 0 | Step: 329440 | Dataset: 0-5385960 | Loss: 0.613 | 597 ms/step , 115606.16 GFLOP/s , 173562.8 tokens/s INFO:__main__:2024-11-30 06:18:41 | Epoch: 0 | Step: 329450 | Dataset: 0-5388360 | Loss: 0.734 | 598 ms/step , 115402.77 GFLOP/s , 173482.1 tokens/s INFO:__main__:2024-11-30 06:18:48 | Epoch: 0 | Step: 329460 | Dataset: 0-5390760 | Loss: 0.720 | 598 ms/step , 115406.74 GFLOP/s , 173540.4 tokens/s INFO:__main__:2024-11-30 06:18:55 | Epoch: 0 | Step: 329470 | Dataset: 0-5393160 | Loss: 0.722 | 598 ms/step , 115432.21 GFLOP/s , 173546.1 tokens/s INFO:__main__:2024-11-30 06:19:02 | Epoch: 0 | Step: 329480 | Dataset: 0-5395560 | Loss: 0.745 | 597 ms/step , 115533.20 GFLOP/s , 173517.8 tokens/s INFO:__main__:2024-11-30 06:19:09 | Epoch: 0 | Step: 329490 | Dataset: 0-5397960 | Loss: 0.705 | 598 ms/step , 115498.68 GFLOP/s , 173553.6 tokens/s INFO:__main__:2024-11-30 06:19:17 | Validation | Step: 329500 | Val_loss: 0.749 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 06:19:18 | Epoch: 0 | Step: 329500 | Dataset: 0-5400360 | Loss: 0.589 | 594 ms/step , 116101.89 GFLOP/s , 147772.8 tokens/s INFO:__main__:2024-11-30 06:19:25 | Epoch: 0 | Step: 329510 | Dataset: 0-5402760 | Loss: 0.774 | 597 ms/step , 115583.40 GFLOP/s , 173656.3 tokens/s INFO:__main__:2024-11-30 06:19:32 | Epoch: 0 | Step: 329520 | Dataset: 0-5405160 | Loss: 0.716 | 598 ms/step , 115460.42 GFLOP/s , 173449.5 tokens/s INFO:__main__:2024-11-30 06:19:39 | Epoch: 0 | Step: 329530 | Dataset: 0-5407560 | Loss: 0.634 | 597 ms/step , 115564.96 GFLOP/s , 173521.2 tokens/s INFO:__main__:2024-11-30 06:19:46 | Epoch: 0 | Step: 329540 | Dataset: 0-5409960 | Loss: 0.673 | 598 ms/step , 115470.49 GFLOP/s , 173604.7 tokens/s INFO:__main__:2024-11-30 06:19:53 | Epoch: 0 | Step: 329550 | Dataset: 0-5412360 | Loss: 0.790 | 598 ms/step , 115382.62 GFLOP/s , 173529.9 tokens/s INFO:__main__:2024-11-30 06:20:00 | Epoch: 0 | Step: 329560 | Dataset: 
0-5414760 | Loss: 0.662 | 597 ms/step , 115512.95 GFLOP/s , 173450.6 tokens/s INFO:__main__:2024-11-30 06:20:07 | Epoch: 0 | Step: 329570 | Dataset: 0-5417160 | Loss: 0.688 | 597 ms/step , 115671.29 GFLOP/s , 173653.4 tokens/s INFO:__main__:2024-11-30 06:20:14 | Epoch: 0 | Step: 329580 | Dataset: 0-5419560 | Loss: 0.739 | 597 ms/step , 115563.92 GFLOP/s , 173682.4 tokens/s INFO:__main__:2024-11-30 06:20:22 | Epoch: 0 | Step: 329590 | Dataset: 0-5421960 | Loss: 0.689 | 598 ms/step , 115459.95 GFLOP/s , 173505.5 tokens/s INFO:__main__:2024-11-30 06:20:29 | Epoch: 0 | Step: 329600 | Dataset: 0-5424360 | Loss: 0.733 | 598 ms/step , 115351.89 GFLOP/s , 173518.6 tokens/s INFO:__main__:2024-11-30 06:20:36 | Epoch: 0 | Step: 329610 | Dataset: 0-5426760 | Loss: 0.665 | 598 ms/step , 115430.01 GFLOP/s , 173564.0 tokens/s INFO:__main__:2024-11-30 06:20:43 | Epoch: 0 | Step: 329620 | Dataset: 0-5429160 | Loss: 0.680 | 597 ms/step , 115511.40 GFLOP/s , 173467.6 tokens/s INFO:__main__:2024-11-30 06:20:50 | Epoch: 0 | Step: 329630 | Dataset: 0-5431560 | Loss: 0.688 | 598 ms/step , 115478.51 GFLOP/s , 173537.2 tokens/s INFO:__main__:2024-11-30 06:20:57 | Epoch: 0 | Step: 329640 | Dataset: 0-5433960 | Loss: 0.720 | 598 ms/step , 115451.05 GFLOP/s , 173545.4 tokens/s INFO:__main__:2024-11-30 06:21:04 | Epoch: 0 | Step: 329650 | Dataset: 0-5436360 | Loss: 0.777 | 597 ms/step , 115506.05 GFLOP/s , 173661.7 tokens/s INFO:__main__:2024-11-30 06:21:11 | Epoch: 0 | Step: 329660 | Dataset: 0-5438760 | Loss: 0.705 | 598 ms/step , 115416.87 GFLOP/s , 173602.4 tokens/s INFO:__main__:2024-11-30 06:21:18 | Epoch: 0 | Step: 329670 | Dataset: 0-5441160 | Loss: 0.614 | 597 ms/step , 115528.35 GFLOP/s , 173453.7 tokens/s INFO:__main__:2024-11-30 06:21:25 | Epoch: 0 | Step: 329680 | Dataset: 0-5443560 | Loss: 0.811 | 598 ms/step , 115442.23 GFLOP/s , 173499.7 tokens/s INFO:__main__:2024-11-30 06:21:32 | Epoch: 0 | Step: 329690 | Dataset: 0-5445960 | Loss: 0.681 | 597 ms/step , 115540.30 GFLOP/s , 
173493.3 tokens/s INFO:__main__:2024-11-30 06:21:39 | Epoch: 0 | Step: 329700 | Dataset: 0-5448360 | Loss: 0.775 | 597 ms/step , 115596.53 GFLOP/s , 173500.9 tokens/s INFO:__main__:2024-11-30 06:21:46 | Epoch: 0 | Step: 329710 | Dataset: 0-5450760 | Loss: 0.630 | 597 ms/step , 115569.29 GFLOP/s , 173514.1 tokens/s INFO:__main__:2024-11-30 06:21:54 | Epoch: 0 | Step: 329720 | Dataset: 0-5453160 | Loss: 0.690 | 597 ms/step , 115656.48 GFLOP/s , 173621.4 tokens/s INFO:__main__:2024-11-30 06:22:01 | Epoch: 0 | Step: 329730 | Dataset: 0-5455560 | Loss: 0.787 | 597 ms/step , 115547.89 GFLOP/s , 173610.4 tokens/s INFO:__main__:2024-11-30 06:22:08 | Epoch: 0 | Step: 329740 | Dataset: 0-5457960 | Loss: 0.701 | 598 ms/step , 115458.43 GFLOP/s , 173481.8 tokens/s INFO:__main__:2024-11-30 06:22:15 | Epoch: 0 | Step: 329750 | Dataset: 0-5460360 | Loss: 0.704 | 597 ms/step , 115663.35 GFLOP/s , 173599.5 tokens/s INFO:__main__:2024-11-30 06:22:22 | Epoch: 0 | Step: 329760 | Dataset: 0-5462760 | Loss: 0.695 | 597 ms/step , 115621.15 GFLOP/s , 173516.9 tokens/s INFO:__main__:2024-11-30 06:22:29 | Epoch: 0 | Step: 329770 | Dataset: 0-5465160 | Loss: 0.749 | 598 ms/step , 115428.02 GFLOP/s , 173494.6 tokens/s INFO:__main__:2024-11-30 06:22:36 | Epoch: 0 | Step: 329780 | Dataset: 0-5467560 | Loss: 0.686 | 598 ms/step , 115448.16 GFLOP/s , 173539.3 tokens/s INFO:__main__:2024-11-30 06:22:43 | Epoch: 0 | Step: 329790 | Dataset: 0-5469960 | Loss: 0.637 | 598 ms/step , 115464.19 GFLOP/s , 173487.1 tokens/s INFO:__main__:2024-11-30 06:22:50 | Epoch: 0 | Step: 329800 | Dataset: 0-5472360 | Loss: 0.677 | 597 ms/step , 115649.23 GFLOP/s , 173720.6 tokens/s INFO:__main__:2024-11-30 06:22:57 | Epoch: 0 | Step: 329810 | Dataset: 0-5474760 | Loss: 0.637 | 597 ms/step , 115673.01 GFLOP/s , 173637.0 tokens/s INFO:__main__:2024-11-30 06:23:04 | Epoch: 0 | Step: 329820 | Dataset: 0-5477160 | Loss: 0.719 | 597 ms/step , 115557.14 GFLOP/s , 173476.8 tokens/s INFO:__main__:2024-11-30 06:23:11 | Epoch: 0 
| Step: 329830 | Dataset: 0-5479560 | Loss: 0.653 | 598 ms/step , 115468.74 GFLOP/s , 173493.2 tokens/s INFO:__main__:2024-11-30 06:23:19 | Epoch: 0 | Step: 329840 | Dataset: 0-5481960 | Loss: 0.679 | 597 ms/step , 115534.80 GFLOP/s , 173482.9 tokens/s INFO:__main__:2024-11-30 06:23:26 | Epoch: 0 | Step: 329850 | Dataset: 0-5484360 | Loss: 0.770 | 598 ms/step , 115468.84 GFLOP/s , 173543.2 tokens/s INFO:__main__:2024-11-30 06:23:33 | Epoch: 0 | Step: 329860 | Dataset: 0-5486760 | Loss: 0.611 | 598 ms/step , 115338.73 GFLOP/s , 173518.3 tokens/s INFO:__main__:2024-11-30 06:23:40 | Epoch: 0 | Step: 329870 | Dataset: 0-5489160 | Loss: 0.655 | 597 ms/step , 115597.61 GFLOP/s , 173638.7 tokens/s INFO:__main__:2024-11-30 06:23:47 | Epoch: 0 | Step: 329880 | Dataset: 0-5491560 | Loss: 0.626 | 597 ms/step , 115639.05 GFLOP/s , 173676.7 tokens/s INFO:__main__:2024-11-30 06:23:54 | Epoch: 0 | Step: 329890 | Dataset: 0-5493960 | Loss: 0.740 | 598 ms/step , 115472.91 GFLOP/s , 173510.6 tokens/s INFO:__main__:2024-11-30 06:24:01 | Epoch: 0 | Step: 329900 | Dataset: 0-5496360 | Loss: 0.626 | 598 ms/step , 115488.58 GFLOP/s , 173472.8 tokens/s INFO:__main__:2024-11-30 06:24:08 | Epoch: 0 | Step: 329910 | Dataset: 0-5498760 | Loss: 0.623 | 597 ms/step , 115515.56 GFLOP/s , 173480.5 tokens/s INFO:__main__:2024-11-30 06:24:15 | Epoch: 0 | Step: 329920 | Dataset: 0-5501160 | Loss: 0.593 | 598 ms/step , 115452.67 GFLOP/s , 173469.3 tokens/s INFO:__main__:2024-11-30 06:24:22 | Epoch: 0 | Step: 329930 | Dataset: 0-5503560 | Loss: 0.680 | 598 ms/step , 115444.02 GFLOP/s , 173553.1 tokens/s INFO:__main__:2024-11-30 06:24:29 | Epoch: 0 | Step: 329940 | Dataset: 0-5505960 | Loss: 0.660 | 597 ms/step , 115617.44 GFLOP/s , 173550.6 tokens/s INFO:__main__:2024-11-30 06:24:36 | Epoch: 0 | Step: 329950 | Dataset: 0-5508360 | Loss: 0.719 | 597 ms/step , 115575.56 GFLOP/s , 173594.0 tokens/s INFO:__main__:2024-11-30 06:24:43 | Epoch: 0 | Step: 329960 | Dataset: 0-5510760 | Loss: 0.623 | 598 
ms/step , 115489.59 GFLOP/s , 173585.1 tokens/s INFO:__main__:2024-11-30 06:24:51 | Epoch: 0 | Step: 329970 | Dataset: 0-5513160 | Loss: 0.678 | 598 ms/step , 115448.12 GFLOP/s , 173510.3 tokens/s INFO:__main__:2024-11-30 06:24:58 | Epoch: 0 | Step: 329980 | Dataset: 0-5515560 | Loss: 0.744 | 598 ms/step , 115479.90 GFLOP/s , 173525.3 tokens/s INFO:__main__:2024-11-30 06:25:05 | Epoch: 0 | Step: 329990 | Dataset: 0-5517960 | Loss: 0.657 | 598 ms/step , 115387.22 GFLOP/s , 173522.6 tokens/s INFO:__main__:2024-11-30 06:25:12 | Validation | Step: 330000 | Val_loss: 0.765 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 06:25:12 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_062512_step_330000.pt` INFO:__main__:2024-11-30 06:25:15 | Epoch: 0 | Step: 330000 | Dataset: 0-5520360 | Loss: 0.680 | 597 ms/step , 115674.76 GFLOP/s , 117998.7 tokens/s INFO:__main__:2024-11-30 06:25:22 | Epoch: 0 | Step: 330010 | Dataset: 0-5522760 | Loss: 0.820 | 599 ms/step , 115263.17 GFLOP/s , 173191.6 tokens/s INFO:__main__:2024-11-30 06:25:29 | Epoch: 0 | Step: 330020 | Dataset: 0-5525160 | Loss: 0.689 | 597 ms/step , 115513.66 GFLOP/s , 173309.5 tokens/s INFO:__main__:2024-11-30 06:25:36 | Epoch: 0 | Step: 330030 | Dataset: 0-5527560 | Loss: 0.764 | 597 ms/step , 115556.24 GFLOP/s , 173199.7 tokens/s INFO:__main__:2024-11-30 06:25:44 | Epoch: 0 | Step: 330040 | Dataset: 0-5529960 | Loss: 0.708 | 598 ms/step , 115389.57 GFLOP/s , 173083.0 tokens/s INFO:__main__:2024-11-30 06:25:51 | Epoch: 0 | Step: 330050 | Dataset: 0-5532360 | Loss: 0.632 | 598 ms/step , 115411.27 GFLOP/s , 173631.7 tokens/s INFO:__main__:2024-11-30 06:25:58 | Epoch: 0 | Step: 330060 | Dataset: 0-5534760 | Loss: 0.767 | 597 ms/step , 115505.31 GFLOP/s , 173524.3 tokens/s INFO:__main__:2024-11-30 06:26:05 | Epoch: 0 | Step: 330070 | Dataset: 0-5537160 | Loss: 0.770 | 597 ms/step , 115549.77 GFLOP/s , 173540.6 tokens/s INFO:__main__:2024-11-30 06:26:12 | Epoch: 0 | Step: 330080 | 
Dataset: 0-5539560 | Loss: 0.756 | 598 ms/step , 115475.73 GFLOP/s , 173543.6 tokens/s INFO:__main__:2024-11-30 06:26:19 | Epoch: 0 | Step: 330090 | Dataset: 0-5541960 | Loss: 0.768 | 597 ms/step , 115570.80 GFLOP/s , 173649.3 tokens/s INFO:__main__:2024-11-30 06:26:26 | Epoch: 0 | Step: 330100 | Dataset: 0-5544360 | Loss: 0.690 | 597 ms/step , 115674.67 GFLOP/s , 173727.3 tokens/s INFO:__main__:2024-11-30 06:26:33 | Epoch: 0 | Step: 330110 | Dataset: 0-5546760 | Loss: 0.670 | 596 ms/step , 115703.08 GFLOP/s , 173557.0 tokens/s INFO:__main__:2024-11-30 06:26:40 | Epoch: 0 | Step: 330120 | Dataset: 0-5549160 | Loss: 0.742 | 597 ms/step , 115560.96 GFLOP/s , 173503.2 tokens/s INFO:__main__:2024-11-30 06:26:47 | Epoch: 0 | Step: 330130 | Dataset: 0-5551560 | Loss: 0.650 | 597 ms/step , 115599.73 GFLOP/s , 173547.9 tokens/s INFO:__main__:2024-11-30 06:26:54 | Epoch: 0 | Step: 330140 | Dataset: 0-5553960 | Loss: 0.706 | 597 ms/step , 115556.50 GFLOP/s , 173550.3 tokens/s INFO:__main__:2024-11-30 06:27:01 | Epoch: 0 | Step: 330150 | Dataset: 0-5556360 | Loss: 0.630 | 598 ms/step , 115493.50 GFLOP/s , 173487.7 tokens/s INFO:__main__:2024-11-30 06:27:08 | Epoch: 0 | Step: 330160 | Dataset: 0-5558760 | Loss: 0.767 | 598 ms/step , 115426.62 GFLOP/s , 173566.4 tokens/s INFO:__main__:2024-11-30 06:27:16 | Epoch: 0 | Step: 330170 | Dataset: 0-5561160 | Loss: 0.789 | 597 ms/step , 115572.58 GFLOP/s , 173712.1 tokens/s INFO:__main__:2024-11-30 06:27:23 | Epoch: 0 | Step: 330180 | Dataset: 0-5563560 | Loss: 0.676 | 597 ms/step , 115581.62 GFLOP/s , 173619.4 tokens/s INFO:__main__:2024-11-30 06:27:30 | Epoch: 0 | Step: 330190 | Dataset: 0-5565960 | Loss: 0.753 | 598 ms/step , 115476.85 GFLOP/s , 173545.0 tokens/s INFO:__main__:2024-11-30 06:27:37 | Epoch: 0 | Step: 330200 | Dataset: 0-5568360 | Loss: 0.739 | 598 ms/step , 115470.37 GFLOP/s , 173496.9 tokens/s INFO:__main__:2024-11-30 06:27:44 | Epoch: 0 | Step: 330210 | Dataset: 0-5570760 | Loss: 0.633 | 597 ms/step , 115566.56 
GFLOP/s , 173510.9 tokens/s INFO:__main__:2024-11-30 06:27:51 | Epoch: 0 | Step: 330220 | Dataset: 0-5573160 | Loss: 0.709 | 598 ms/step , 115394.15 GFLOP/s , 173487.6 tokens/s INFO:__main__:2024-11-30 06:27:58 | Epoch: 0 | Step: 330230 | Dataset: 0-5575560 | Loss: 0.745 | 598 ms/step , 115459.14 GFLOP/s , 173509.0 tokens/s INFO:__main__:2024-11-30 06:28:05 | Epoch: 0 | Step: 330240 | Dataset: 0-5577960 | Loss: 0.712 | 598 ms/step , 115478.93 GFLOP/s , 173618.5 tokens/s INFO:__main__:2024-11-30 06:28:12 | Epoch: 0 | Step: 330250 | Dataset: 0-5580360 | Loss: 0.719 | 596 ms/step , 115736.96 GFLOP/s , 173698.9 tokens/s INFO:__main__:2024-11-30 06:28:19 | Epoch: 0 | Step: 330260 | Dataset: 0-5582760 | Loss: 0.665 | 597 ms/step , 115626.49 GFLOP/s , 173530.2 tokens/s INFO:__main__:2024-11-30 06:28:26 | Epoch: 0 | Step: 330270 | Dataset: 0-5585160 | Loss: 0.635 | 597 ms/step , 115566.06 GFLOP/s , 173517.0 tokens/s INFO:__main__:2024-11-30 06:28:33 | Epoch: 0 | Step: 330280 | Dataset: 0-5587560 | Loss: 0.764 | 598 ms/step , 115420.74 GFLOP/s , 173541.9 tokens/s INFO:__main__:2024-11-30 06:28:41 | Epoch: 0 | Step: 330290 | Dataset: 0-5589960 | Loss: 0.633 | 597 ms/step , 115505.19 GFLOP/s , 173527.7 tokens/s INFO:__main__:2024-11-30 06:28:48 | Epoch: 0 | Step: 330300 | Dataset: 0-5592360 | Loss: 0.646 | 598 ms/step , 115501.21 GFLOP/s , 173503.4 tokens/s INFO:__main__:2024-11-30 06:28:55 | Epoch: 0 | Step: 330310 | Dataset: 0-5594760 | Loss: 0.712 | 597 ms/step , 115519.68 GFLOP/s , 173493.3 tokens/s INFO:__main__:2024-11-30 06:29:02 | Epoch: 0 | Step: 330320 | Dataset: 0-5597160 | Loss: 0.779 | 598 ms/step , 115439.19 GFLOP/s , 173657.7 tokens/s INFO:__main__:2024-11-30 06:29:09 | Epoch: 0 | Step: 330330 | Dataset: 0-5599560 | Loss: 0.651 | 597 ms/step , 115532.26 GFLOP/s , 173569.0 tokens/s INFO:__main__:2024-11-30 06:29:16 | Epoch: 0 | Step: 330340 | Dataset: 0-5601960 | Loss: 0.627 | 597 ms/step , 115584.28 GFLOP/s , 173487.9 tokens/s INFO:__main__:2024-11-30 06:29:23 
| Epoch: 0 | Step: 330350 | Dataset: 0-5604360 | Loss: 0.704 | 597 ms/step , 115518.23 GFLOP/s , 173503.4 tokens/s INFO:__main__:2024-11-30 06:29:30 | Epoch: 0 | Step: 330360 | Dataset: 0-5606760 | Loss: 0.607 | 597 ms/step , 115600.28 GFLOP/s , 173499.9 tokens/s INFO:__main__:2024-11-30 06:29:37 | Epoch: 0 | Step: 330370 | Dataset: 0-5609160 | Loss: 0.605 | 597 ms/step , 115521.15 GFLOP/s , 173494.9 tokens/s INFO:__main__:2024-11-30 06:29:44 | Epoch: 0 | Step: 330380 | Dataset: 0-5611560 | Loss: 0.672 | 597 ms/step , 115527.99 GFLOP/s , 173459.3 tokens/s INFO:__main__:2024-11-30 06:29:51 | Epoch: 0 | Step: 330390 | Dataset: 0-5613960 | Loss: 0.646 | 596 ms/step , 115719.66 GFLOP/s , 173545.3 tokens/s INFO:__main__:2024-11-30 06:29:58 | Epoch: 0 | Step: 330400 | Dataset: 0-5616360 | Loss: 0.665 | 597 ms/step , 115605.34 GFLOP/s , 173598.0 tokens/s INFO:__main__:2024-11-30 06:30:05 | Epoch: 0 | Step: 330410 | Dataset: 0-5618760 | Loss: 0.734 | 597 ms/step , 115549.29 GFLOP/s , 173553.4 tokens/s INFO:__main__:2024-11-30 06:30:13 | Epoch: 0 | Step: 330420 | Dataset: 0-5621160 | Loss: 0.613 | 597 ms/step , 115597.71 GFLOP/s , 173535.1 tokens/s INFO:__main__:2024-11-30 06:30:20 | Epoch: 0 | Step: 330430 | Dataset: 0-5623560 | Loss: 0.622 | 597 ms/step , 115557.96 GFLOP/s , 173519.6 tokens/s INFO:__main__:2024-11-30 06:30:27 | Epoch: 0 | Step: 330440 | Dataset: 0-5625960 | Loss: 0.693 | 597 ms/step , 115515.94 GFLOP/s , 173476.5 tokens/s INFO:__main__:2024-11-30 06:30:34 | Epoch: 0 | Step: 330450 | Dataset: 0-5628360 | Loss: 0.644 | 598 ms/step , 115438.60 GFLOP/s , 173529.5 tokens/s INFO:__main__:2024-11-30 06:30:41 | Epoch: 0 | Step: 330460 | Dataset: 0-5630760 | Loss: 0.685 | 597 ms/step , 115549.06 GFLOP/s , 173550.9 tokens/s INFO:__main__:2024-11-30 06:30:48 | Epoch: 0 | Step: 330470 | Dataset: 0-5633160 | Loss: 0.618 | 596 ms/step , 115767.20 GFLOP/s , 173686.5 tokens/s INFO:__main__:2024-11-30 06:30:55 | Epoch: 0 | Step: 330480 | Dataset: 0-5635560 | Loss: 0.766 | 
598 ms/step , 115447.99 GFLOP/s , 173547.6 tokens/s INFO:__main__:2024-11-30 06:31:02 | Epoch: 0 | Step: 330490 | Dataset: 0-5637960 | Loss: 0.357 | 596 ms/step , 115736.40 GFLOP/s , 173592.3 tokens/s INFO:__main__:2024-11-30 06:31:10 | Validation | Step: 330500 | Val_loss: 0.760 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 06:31:10 | Epoch: 0 | Step: 330500 | Dataset: 0-5640360 | Loss: 0.366 | 596 ms/step , 115723.40 GFLOP/s , 147742.1 tokens/s INFO:__main__:2024-11-30 06:31:18 | Epoch: 0 | Step: 330510 | Dataset: 0-5642760 | Loss: 0.465 | 596 ms/step , 115733.53 GFLOP/s , 173822.3 tokens/s INFO:__main__:2024-11-30 06:31:25 | Epoch: 0 | Step: 330520 | Dataset: 0-5645160 | Loss: 0.401 | 597 ms/step , 115622.57 GFLOP/s , 173797.2 tokens/s INFO:__main__:2024-11-30 06:31:32 | Epoch: 0 | Step: 330530 | Dataset: 0-5647560 | Loss: 0.351 | 597 ms/step , 115641.13 GFLOP/s , 173782.5 tokens/s INFO:__main__:2024-11-30 06:31:39 | Epoch: 0 | Step: 330540 | Dataset: 0-5649960 | Loss: 0.387 | 596 ms/step , 115824.37 GFLOP/s , 173865.0 tokens/s INFO:__main__:2024-11-30 06:31:46 | Epoch: 0 | Step: 330550 | Dataset: 0-5652360 | Loss: 0.430 | 597 ms/step , 115541.46 GFLOP/s , 173850.8 tokens/s INFO:__main__:2024-11-30 06:31:53 | Epoch: 0 | Step: 330560 | Dataset: 0-5654760 | Loss: 0.399 | 597 ms/step , 115592.30 GFLOP/s , 173750.9 tokens/s INFO:__main__:2024-11-30 06:32:00 | Epoch: 0 | Step: 330570 | Dataset: 0-5657160 | Loss: 0.389 | 597 ms/step , 115619.26 GFLOP/s , 173625.7 tokens/s INFO:__main__:2024-11-30 06:32:07 | Epoch: 0 | Step: 330580 | Dataset: 0-5659560 | Loss: 0.337 | 598 ms/step , 115494.75 GFLOP/s , 173770.8 tokens/s INFO:__main__:2024-11-30 06:32:14 | Epoch: 0 | Step: 330590 | Dataset: 0-5661960 | Loss: 0.368 | 597 ms/step , 115645.37 GFLOP/s , 173742.7 tokens/s INFO:__main__:2024-11-30 06:32:21 | Epoch: 0 | Step: 330600 | Dataset: 0-5664360 | Loss: 0.396 | 597 ms/step , 115658.52 GFLOP/s , 173688.2 tokens/s INFO:__main__:2024-11-30 06:32:28 | Epoch: 0 | Step: 
330610 | Dataset: 0-5666760 | Loss: 0.349 | 596 ms/step , 115787.52 GFLOP/s , 173786.2 tokens/s INFO:__main__:2024-11-30 06:32:35 | Epoch: 0 | Step: 330620 | Dataset: 0-5669160 | Loss: 0.385 | 597 ms/step , 115530.80 GFLOP/s , 173860.7 tokens/s INFO:__main__:2024-11-30 06:32:42 | Epoch: 0 | Step: 330630 | Dataset: 0-5671560 | Loss: 0.318 | 596 ms/step , 115714.74 GFLOP/s , 173819.5 tokens/s INFO:__main__:2024-11-30 06:32:49 | Epoch: 0 | Step: 330640 | Dataset: 0-5673960 | Loss: 0.394 | 597 ms/step , 115574.91 GFLOP/s , 173764.2 tokens/s INFO:__main__:2024-11-30 06:32:57 | Epoch: 0 | Step: 330650 | Dataset: 0-5676360 | Loss: 0.368 | 597 ms/step , 115634.11 GFLOP/s , 173784.3 tokens/s INFO:__main__:2024-11-30 06:33:04 | Epoch: 0 | Step: 330660 | Dataset: 0-5678760 | Loss: 0.378 | 597 ms/step , 115617.15 GFLOP/s , 173741.9 tokens/s INFO:__main__:2024-11-30 06:33:11 | Epoch: 0 | Step: 330670 | Dataset: 0-5681160 | Loss: 0.335 | 596 ms/step , 115727.10 GFLOP/s , 173729.7 tokens/s INFO:__main__:2024-11-30 06:33:18 | Epoch: 0 | Step: 330680 | Dataset: 0-5683560 | Loss: 0.370 | 597 ms/step , 115603.42 GFLOP/s , 173672.2 tokens/s INFO:__main__:2024-11-30 06:33:25 | Epoch: 0 | Step: 330690 | Dataset: 0-5685960 | Loss: 0.396 | 596 ms/step , 115859.42 GFLOP/s , 173930.6 tokens/s INFO:__main__:2024-11-30 06:33:32 | Epoch: 0 | Step: 330700 | Dataset: 0-5688360 | Loss: 0.375 | 596 ms/step , 115760.55 GFLOP/s , 173835.5 tokens/s INFO:__main__:2024-11-30 06:33:39 | Epoch: 0 | Step: 330710 | Dataset: 0-5690760 | Loss: 0.388 | 597 ms/step , 115613.60 GFLOP/s , 173708.3 tokens/s INFO:__main__:2024-11-30 06:33:46 | Epoch: 0 | Step: 330720 | Dataset: 0-5693160 | Loss: 0.389 | 598 ms/step , 115474.66 GFLOP/s , 173708.9 tokens/s INFO:__main__:2024-11-30 06:33:53 | Epoch: 0 | Step: 330730 | Dataset: 0-5695560 | Loss: 0.328 | 596 ms/step , 115739.36 GFLOP/s , 173708.8 tokens/s INFO:__main__:2024-11-30 06:34:00 | Epoch: 0 | Step: 330740 | Dataset: 0-5697960 | Loss: 0.382 | 596 ms/step , 
115731.71 GFLOP/s , 173669.2 tokens/s INFO:__main__:2024-11-30 06:34:07 | Epoch: 0 | Step: 330750 | Dataset: 0-5700360 | Loss: 0.408 | 597 ms/step , 115633.95 GFLOP/s , 173750.0 tokens/s INFO:__main__:2024-11-30 06:34:14 | Epoch: 0 | Step: 330760 | Dataset: 0-5702760 | Loss: 0.350 | 595 ms/step , 115941.80 GFLOP/s , 173803.2 tokens/s INFO:__main__:2024-11-30 06:34:21 | Epoch: 0 | Step: 330770 | Dataset: 0-5705160 | Loss: 0.376 | 596 ms/step , 115725.03 GFLOP/s , 173917.7 tokens/s INFO:__main__:2024-11-30 06:34:28 | Epoch: 0 | Step: 330780 | Dataset: 0-5707560 | Loss: 0.354 | 596 ms/step , 115724.76 GFLOP/s , 173733.1 tokens/s INFO:__main__:2024-11-30 06:34:36 | Epoch: 0 | Step: 330790 | Dataset: 0-5709960 | Loss: 0.391 | 596 ms/step , 115718.64 GFLOP/s , 173685.8 tokens/s INFO:__main__:2024-11-30 06:34:43 | Epoch: 0 | Step: 330800 | Dataset: 0-5712360 | Loss: 0.343 | 597 ms/step , 115551.40 GFLOP/s , 173758.5 tokens/s INFO:__main__:2024-11-30 06:34:50 | Epoch: 0 | Step: 330810 | Dataset: 0-5714760 | Loss: 0.394 | 597 ms/step , 115602.07 GFLOP/s , 173797.7 tokens/s INFO:__main__:2024-11-30 06:34:57 | Epoch: 0 | Step: 330820 | Dataset: 0-5717160 | Loss: 0.455 | 597 ms/step , 115523.61 GFLOP/s , 173722.3 tokens/s INFO:__main__:2024-11-30 06:35:04 | Epoch: 0 | Step: 330830 | Dataset: 0-5719560 | Loss: 0.344 | 596 ms/step , 115805.66 GFLOP/s , 173742.1 tokens/s INFO:__main__:2024-11-30 06:35:11 | Epoch: 0 | Step: 330840 | Dataset: 0-5721960 | Loss: 0.365 | 597 ms/step , 115679.52 GFLOP/s , 173903.8 tokens/s INFO:__main__:2024-11-30 06:35:18 | Epoch: 0 | Step: 330850 | Dataset: 0-5724360 | Loss: 0.388 | 596 ms/step , 115717.87 GFLOP/s , 173830.6 tokens/s INFO:__main__:2024-11-30 06:35:25 | Epoch: 0 | Step: 330860 | Dataset: 0-5726760 | Loss: 0.348 | 598 ms/step , 115487.79 GFLOP/s , 173762.6 tokens/s INFO:__main__:2024-11-30 06:35:32 | Epoch: 0 | Step: 330870 | Dataset: 0-5729160 | Loss: 0.368 | 596 ms/step , 115792.41 GFLOP/s , 173700.3 tokens/s INFO:__main__:2024-11-30 
06:35:39 | Epoch: 0 | Step: 330880 | Dataset: 0-5731560 | Loss: 0.353 | 596 ms/step , 115734.17 GFLOP/s , 173731.1 tokens/s INFO:__main__:2024-11-30 06:35:46 | Epoch: 0 | Step: 330890 | Dataset: 0-5733960 | Loss: 0.401 | 597 ms/step , 115595.98 GFLOP/s , 173726.3 tokens/s INFO:__main__:2024-11-30 06:35:53 | Epoch: 0 | Step: 330900 | Dataset: 0-5736360 | Loss: 0.367 | 597 ms/step , 115652.50 GFLOP/s , 173721.6 tokens/s INFO:__main__:2024-11-30 06:36:00 | Epoch: 0 | Step: 330910 | Dataset: 0-5738760 | Loss: 0.338 | 597 ms/step , 115678.23 GFLOP/s , 173836.2 tokens/s INFO:__main__:2024-11-30 06:36:07 | Epoch: 0 | Step: 330920 | Dataset: 0-5741160 | Loss: 0.387 | 596 ms/step , 115780.33 GFLOP/s , 173818.1 tokens/s INFO:__main__:2024-11-30 06:36:15 | Epoch: 0 | Step: 330930 | Dataset: 0-5743560 | Loss: 0.415 | 597 ms/step , 115644.80 GFLOP/s , 173745.9 tokens/s INFO:__main__:2024-11-30 06:36:22 | Epoch: 0 | Step: 330940 | Dataset: 0-5745960 | Loss: 0.377 | 597 ms/step , 115562.34 GFLOP/s , 173705.9 tokens/s INFO:__main__:2024-11-30 06:36:29 | Epoch: 0 | Step: 330950 | Dataset: 0-5748360 | Loss: 0.396 | 597 ms/step , 115638.83 GFLOP/s , 173720.2 tokens/s INFO:__main__:2024-11-30 06:36:36 | Epoch: 0 | Step: 330960 | Dataset: 0-5750760 | Loss: 0.408 | 598 ms/step , 115501.95 GFLOP/s , 173679.0 tokens/s INFO:__main__:2024-11-30 06:36:43 | Epoch: 0 | Step: 330970 | Dataset: 0-5753160 | Loss: 0.322 | 597 ms/step , 115577.64 GFLOP/s , 173699.2 tokens/s INFO:__main__:2024-11-30 06:36:50 | Epoch: 0 | Step: 330980 | Dataset: 0-5755560 | Loss: 0.355 | 597 ms/step , 115658.91 GFLOP/s , 173728.0 tokens/s INFO:__main__:2024-11-30 06:36:57 | Epoch: 0 | Step: 330990 | Dataset: 0-5757960 | Loss: 0.377 | 596 ms/step , 115711.38 GFLOP/s , 173848.9 tokens/s INFO:__main__:2024-11-30 06:37:05 | Validation | Step: 331000 | Val_loss: 0.771 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 06:37:05 | Saving full-param checkpoint to 
`/root/autodl-tmp/checkpoint/checkpoint_20241130_063705_step_331000.pt` INFO:__main__:2024-11-30 06:37:07 | Epoch: 0 | Step: 331000 | Dataset: 0-5760360 | Loss: 0.345 | 595 ms/step , 116034.20 GFLOP/s , 117907.9 tokens/s INFO:__main__:2024-11-30 06:37:14 | Epoch: 0 | Step: 331010 | Dataset: 0-5762760 | Loss: 0.380 | 598 ms/step , 115434.30 GFLOP/s , 173311.2 tokens/s INFO:__main__:2024-11-30 06:37:22 | Epoch: 0 | Step: 331020 | Dataset: 0-5765160 | Loss: 0.351 | 596 ms/step , 115780.56 GFLOP/s , 173150.1 tokens/s INFO:__main__:2024-11-30 06:37:29 | Epoch: 0 | Step: 331030 | Dataset: 0-5767560 | Loss: 0.502 | 597 ms/step , 115570.68 GFLOP/s , 173270.3 tokens/s INFO:__main__:2024-11-30 06:37:36 | Epoch: 0 | Step: 331040 | Dataset: 0-5769960 | Loss: 0.711 | 597 ms/step , 115529.87 GFLOP/s , 173102.0 tokens/s INFO:__main__:2024-11-30 06:37:43 | Epoch: 0 | Step: 331050 | Dataset: 0-5772360 | Loss: 0.793 | 598 ms/step , 115455.38 GFLOP/s , 173648.0 tokens/s INFO:__main__:2024-11-30 06:37:50 | Epoch: 0 | Step: 331060 | Dataset: 0-5774760 | Loss: 0.744 | 597 ms/step , 115591.31 GFLOP/s , 173765.8 tokens/s INFO:__main__:2024-11-30 06:37:57 | Epoch: 0 | Step: 331070 | Dataset: 0-5777160 | Loss: 0.741 | 598 ms/step , 115455.45 GFLOP/s , 173686.1 tokens/s INFO:__main__:2024-11-30 06:38:04 | Epoch: 0 | Step: 331080 | Dataset: 0-5779560 | Loss: 0.727 | 597 ms/step , 115596.96 GFLOP/s , 173560.4 tokens/s INFO:__main__:2024-11-30 06:38:11 | Epoch: 0 | Step: 331090 | Dataset: 0-5781960 | Loss: 0.645 | 597 ms/step , 115597.74 GFLOP/s , 173559.7 tokens/s INFO:__main__:2024-11-30 06:38:18 | Epoch: 0 | Step: 331100 | Dataset: 0-5784360 | Loss: 0.675 | 598 ms/step , 115423.91 GFLOP/s , 173528.4 tokens/s INFO:__main__:2024-11-30 06:38:25 | Epoch: 0 | Step: 331110 | Dataset: 0-5786760 | Loss: 0.783 | 598 ms/step , 115432.05 GFLOP/s , 173511.6 tokens/s INFO:__main__:2024-11-30 06:38:32 | Epoch: 0 | Step: 331120 | Dataset: 0-5789160 | Loss: 0.685 | 597 ms/step , 115507.79 GFLOP/s , 173506.4 
tokens/s INFO:__main__:2024-11-30 06:38:39 | Epoch: 0 | Step: 331130 | Dataset: 0-5791560 | Loss: 0.723 | 597 ms/step , 115642.68 GFLOP/s , 173584.7 tokens/s INFO:__main__:2024-11-30 06:38:47 | Epoch: 0 | Step: 331140 | Dataset: 0-5793960 | Loss: 0.756 | 597 ms/step , 115610.19 GFLOP/s , 173702.4 tokens/s INFO:__main__:2024-11-30 06:38:54 | Epoch: 0 | Step: 331150 | Dataset: 0-5796360 | Loss: 0.636 | 597 ms/step , 115557.68 GFLOP/s , 173532.6 tokens/s INFO:__main__:2024-11-30 06:39:01 | Epoch: 0 | Step: 331160 | Dataset: 0-5798760 | Loss: 0.691 | 598 ms/step , 115460.42 GFLOP/s , 173510.2 tokens/s INFO:__main__:2024-11-30 06:39:08 | Epoch: 0 | Step: 331170 | Dataset: 0-5801160 | Loss: 0.670 | 597 ms/step , 115561.05 GFLOP/s , 173427.7 tokens/s INFO:__main__:2024-11-30 06:39:15 | Epoch: 0 | Step: 331180 | Dataset: 0-5803560 | Loss: 0.769 | 598 ms/step , 115376.35 GFLOP/s , 173421.0 tokens/s INFO:__main__:2024-11-30 06:39:22 | Epoch: 0 | Step: 331190 | Dataset: 0-5805960 | Loss: 0.756 | 598 ms/step , 115409.04 GFLOP/s , 173385.9 tokens/s INFO:__main__:2024-11-30 06:39:29 | Epoch: 0 | Step: 331200 | Dataset: 0-5808360 | Loss: 0.745 | 598 ms/step , 115458.35 GFLOP/s , 173538.8 tokens/s INFO:__main__:2024-11-30 06:39:36 | Epoch: 0 | Step: 331210 | Dataset: 0-5810760 | Loss: 0.626 | 598 ms/step , 115475.10 GFLOP/s , 173637.8 tokens/s INFO:__main__:2024-11-30 06:39:43 | Epoch: 0 | Step: 331220 | Dataset: 0-5813160 | Loss: 0.713 | 598 ms/step , 115481.95 GFLOP/s , 173603.4 tokens/s INFO:__main__:2024-11-30 06:39:50 | Epoch: 0 | Step: 331230 | Dataset: 0-5815560 | Loss: 0.721 | 598 ms/step , 115480.69 GFLOP/s , 173498.4 tokens/s INFO:__main__:2024-11-30 06:39:57 | Epoch: 0 | Step: 331240 | Dataset: 0-5817960 | Loss: 0.675 | 597 ms/step , 115535.55 GFLOP/s , 173513.3 tokens/s INFO:__main__:2024-11-30 06:40:04 | Epoch: 0 | Step: 331250 | Dataset: 0-5820360 | Loss: 0.686 | 598 ms/step , 115408.73 GFLOP/s , 173476.2 tokens/s INFO:__main__:2024-11-30 06:40:12 | Epoch: 0 | Step: 
331260 | Dataset: 0-5822760 | Loss: 0.706 | 598 ms/step , 115473.48 GFLOP/s , 173464.7 tokens/s INFO:__main__:2024-11-30 06:40:19 | Epoch: 0 | Step: 331270 | Dataset: 0-5825160 | Loss: 0.703 | 597 ms/step , 115519.25 GFLOP/s , 173545.4 tokens/s INFO:__main__:2024-11-30 06:40:26 | Epoch: 0 | Step: 331280 | Dataset: 0-5827560 | Loss: 0.562 | 596 ms/step , 115795.33 GFLOP/s , 173610.8 tokens/s INFO:__main__:2024-11-30 06:40:33 | Epoch: 0 | Step: 331290 | Dataset: 0-5829960 | Loss: 0.674 | 598 ms/step , 115477.52 GFLOP/s , 173647.9 tokens/s INFO:__main__:2024-11-30 06:40:40 | Epoch: 0 | Step: 331300 | Dataset: 0-5832360 | Loss: 0.622 | 597 ms/step , 115514.58 GFLOP/s , 173487.4 tokens/s INFO:__main__:2024-11-30 06:40:47 | Epoch: 0 | Step: 331310 | Dataset: 0-5834760 | Loss: 0.647 | 597 ms/step , 115594.42 GFLOP/s , 173482.7 tokens/s INFO:__main__:2024-11-30 06:40:54 | Epoch: 0 | Step: 331320 | Dataset: 0-5837160 | Loss: 0.641 | 597 ms/step , 115506.18 GFLOP/s , 173492.6 tokens/s INFO:__main__:2024-11-30 06:41:01 | Epoch: 0 | Step: 331330 | Dataset: 0-5839560 | Loss: 0.686 | 598 ms/step , 115457.73 GFLOP/s , 173516.5 tokens/s INFO:__main__:2024-11-30 06:41:08 | Epoch: 0 | Step: 331340 | Dataset: 0-5841960 | Loss: 0.797 | 598 ms/step , 115486.64 GFLOP/s , 173484.1 tokens/s INFO:__main__:2024-11-30 06:41:15 | Epoch: 0 | Step: 331350 | Dataset: 0-5844360 | Loss: 0.623 | 597 ms/step , 115561.91 GFLOP/s , 173527.6 tokens/s INFO:__main__:2024-11-30 06:41:22 | Epoch: 0 | Step: 331360 | Dataset: 0-5846760 | Loss: 0.719 | 597 ms/step , 115561.07 GFLOP/s , 173575.1 tokens/s INFO:__main__:2024-11-30 06:41:29 | Epoch: 0 | Step: 331370 | Dataset: 0-5849160 | Loss: 0.657 | 598 ms/step , 115487.66 GFLOP/s , 173567.1 tokens/s INFO:__main__:2024-11-30 06:41:37 | Epoch: 0 | Step: 331380 | Dataset: 0-5851560 | Loss: 0.725 | 597 ms/step , 115599.89 GFLOP/s , 173484.9 tokens/s INFO:__main__:2024-11-30 06:41:44 | Epoch: 0 | Step: 331390 | Dataset: 0-5853960 | Loss: 0.655 | 598 ms/step , 
115391.48 GFLOP/s , 173492.7 tokens/s INFO:__main__:2024-11-30 06:41:51 | Epoch: 0 | Step: 331400 | Dataset: 0-5856360 | Loss: 0.706 | 598 ms/step , 115464.65 GFLOP/s , 173466.3 tokens/s INFO:__main__:2024-11-30 06:41:58 | Epoch: 0 | Step: 331410 | Dataset: 0-5858760 | Loss: 0.610 | 598 ms/step , 115455.31 GFLOP/s , 173442.2 tokens/s INFO:__main__:2024-11-30 06:42:05 | Epoch: 0 | Step: 331420 | Dataset: 0-5861160 | Loss: 0.730 | 597 ms/step , 115562.53 GFLOP/s , 173501.4 tokens/s INFO:__main__:2024-11-30 06:42:12 | Epoch: 0 | Step: 331430 | Dataset: 0-5863560 | Loss: 0.669 | 597 ms/step , 115574.52 GFLOP/s , 173576.1 tokens/s INFO:__main__:2024-11-30 06:42:19 | Epoch: 0 | Step: 331440 | Dataset: 0-5865960 | Loss: 0.690 | 597 ms/step , 115595.99 GFLOP/s , 173591.7 tokens/s INFO:__main__:2024-11-30 06:42:26 | Epoch: 0 | Step: 331450 | Dataset: 0-5868360 | Loss: 0.727 | 598 ms/step , 115442.71 GFLOP/s , 173556.5 tokens/s INFO:__main__:2024-11-30 06:42:33 | Epoch: 0 | Step: 331460 | Dataset: 0-5870760 | Loss: 0.770 | 598 ms/step , 115474.45 GFLOP/s , 173453.1 tokens/s INFO:__main__:2024-11-30 06:42:40 | Epoch: 0 | Step: 331470 | Dataset: 0-5873160 | Loss: 0.684 | 598 ms/step , 115443.46 GFLOP/s , 173488.6 tokens/s INFO:__main__:2024-11-30 06:42:47 | Epoch: 0 | Step: 331480 | Dataset: 0-5875560 | Loss: 0.683 | 598 ms/step , 115496.72 GFLOP/s , 173491.9 tokens/s INFO:__main__:2024-11-30 06:42:54 | Epoch: 0 | Step: 331490 | Dataset: 0-5877960 | Loss: 0.609 | 598 ms/step , 115445.77 GFLOP/s , 173485.2 tokens/s INFO:__main__:2024-11-30 06:43:02 | Validation | Step: 331500 | Val_loss: 0.784 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 06:43:03 | Epoch: 0 | Step: 331500 | Dataset: 0-5880360 | Loss: 0.719 | 596 ms/step , 115796.69 GFLOP/s , 147530.8 tokens/s INFO:__main__:2024-11-30 06:43:10 | Epoch: 0 | Step: 331510 | Dataset: 0-5882760 | Loss: 0.656 | 597 ms/step , 115579.36 GFLOP/s , 173732.0 tokens/s INFO:__main__:2024-11-30 06:43:17 | Epoch: 0 | Step: 331520 | 
Dataset: 0-5885160 | Loss: 0.687 | 598 ms/step , 115418.69 GFLOP/s , 173661.3 tokens/s INFO:__main__:2024-11-30 06:43:24 | Epoch: 0 | Step: 331530 | Dataset: 0-5887560 | Loss: 0.740 | 598 ms/step , 115364.57 GFLOP/s , 173528.5 tokens/s INFO:__main__:2024-11-30 06:43:31 | Epoch: 0 | Step: 331540 | Dataset: 0-5889960 | Loss: 0.669 | 598 ms/step , 115471.43 GFLOP/s , 173555.4 tokens/s INFO:__main__:2024-11-30 06:43:38 | Epoch: 0 | Step: 331550 | Dataset: 0-5892360 | Loss: 0.682 | 597 ms/step , 115606.41 GFLOP/s , 173485.2 tokens/s INFO:__main__:2024-11-30 06:43:45 | Epoch: 0 | Step: 331560 | Dataset: 0-5894760 | Loss: 0.624 | 597 ms/step , 115570.14 GFLOP/s , 173508.2 tokens/s INFO:__main__:2024-11-30 06:43:52 | Epoch: 0 | Step: 331570 | Dataset: 0-5897160 | Loss: 0.708 | 598 ms/step , 115407.52 GFLOP/s , 173460.8 tokens/s INFO:__main__:2024-11-30 06:43:59 | Epoch: 0 | Step: 331580 | Dataset: 0-5899560 | Loss: 0.698 | 597 ms/step , 115635.22 GFLOP/s , 173580.4 tokens/s INFO:__main__:2024-11-30 06:44:06 | Epoch: 0 | Step: 331590 | Dataset: 0-5901960 | Loss: 0.609 | 597 ms/step , 115543.45 GFLOP/s , 173551.6 tokens/s INFO:__main__:2024-11-30 06:44:14 | Epoch: 0 | Step: 331600 | Dataset: 0-5904360 | Loss: 0.690 | 598 ms/step , 115418.74 GFLOP/s , 173491.7 tokens/s INFO:__main__:2024-11-30 06:44:21 | Epoch: 0 | Step: 331610 | Dataset: 0-5906760 | Loss: 0.686 | 598 ms/step , 115424.40 GFLOP/s , 173461.4 tokens/s INFO:__main__:2024-11-30 06:44:28 | Epoch: 0 | Step: 331620 | Dataset: 0-5909160 | Loss: 0.677 | 597 ms/step , 115524.09 GFLOP/s , 173487.1 tokens/s INFO:__main__:2024-11-30 06:44:35 | Epoch: 0 | Step: 331630 | Dataset: 0-5911560 | Loss: 0.727 | 599 ms/step , 115270.47 GFLOP/s , 173485.6 tokens/s INFO:__main__:2024-11-30 06:44:42 | Epoch: 0 | Step: 331640 | Dataset: 0-5913960 | Loss: 0.701 | 598 ms/step , 115469.50 GFLOP/s , 173475.1 tokens/s INFO:__main__:2024-11-30 06:44:49 | Epoch: 0 | Step: 331650 | Dataset: 0-5916360 | Loss: 0.725 | 597 ms/step , 115593.52 
GFLOP/s , 173492.7 tokens/s INFO:__main__:2024-11-30 06:44:56 | Epoch: 0 | Step: 331660 | Dataset: 0-5918760 | Loss: 0.515 | 596 ms/step , 115716.98 GFLOP/s , 173608.0 tokens/s INFO:__main__:2024-11-30 06:45:03 | Epoch: 0 | Step: 331670 | Dataset: 0-5921160 | Loss: 0.694 | 597 ms/step , 115515.93 GFLOP/s , 173519.9 tokens/s INFO:__main__:2024-11-30 06:45:10 | Epoch: 0 | Step: 331680 | Dataset: 0-5923560 | Loss: 0.706 | 597 ms/step , 115567.32 GFLOP/s , 173460.0 tokens/s INFO:__main__:2024-11-30 06:45:17 | Epoch: 0 | Step: 331690 | Dataset: 0-5925960 | Loss: 0.665 | 598 ms/step , 115389.47 GFLOP/s , 173382.4 tokens/s INFO:__main__:2024-11-30 06:45:24 | Epoch: 0 | Step: 331700 | Dataset: 0-5928360 | Loss: 0.600 | 597 ms/step , 115554.10 GFLOP/s , 173533.6 tokens/s INFO:__main__:2024-11-30 06:45:31 | Epoch: 0 | Step: 331710 | Dataset: 0-5930760 | Loss: 0.685 | 597 ms/step , 115515.26 GFLOP/s , 173527.1 tokens/s INFO:__main__:2024-11-30 06:45:39 | Epoch: 0 | Step: 331720 | Dataset: 0-5933160 | Loss: 0.684 | 597 ms/step , 115527.55 GFLOP/s , 173522.6 tokens/s INFO:__main__:2024-11-30 06:45:46 | Epoch: 0 | Step: 331730 | Dataset: 0-5935560 | Loss: 0.612 | 597 ms/step , 115610.97 GFLOP/s , 173529.1 tokens/s INFO:__main__:2024-11-30 06:45:53 | Epoch: 0 | Step: 331740 | Dataset: 0-5937960 | Loss: 0.799 | 598 ms/step , 115476.05 GFLOP/s , 173554.6 tokens/s INFO:__main__:2024-11-30 06:46:00 | Epoch: 0 | Step: 331750 | Dataset: 0-5940360 | Loss: 0.750 | 598 ms/step , 115390.89 GFLOP/s , 173494.1 tokens/s INFO:__main__:2024-11-30 06:46:07 | Epoch: 0 | Step: 331760 | Dataset: 0-5942760 | Loss: 0.675 | 598 ms/step , 115499.58 GFLOP/s , 173468.2 tokens/s INFO:__main__:2024-11-30 06:46:14 | Epoch: 0 | Step: 331770 | Dataset: 0-5945160 | Loss: 0.701 | 597 ms/step , 115596.07 GFLOP/s , 173506.7 tokens/s INFO:__main__:2024-11-30 06:46:21 | Epoch: 0 | Step: 331780 | Dataset: 0-5947560 | Loss: 0.626 | 597 ms/step , 115532.16 GFLOP/s , 173502.6 tokens/s INFO:__main__:2024-11-30 06:46:28 
| Epoch: 0 | Step: 331790 | Dataset: 0-5949960 | Loss: 0.793 | 599 ms/step , 115174.17 GFLOP/s , 173400.2 tokens/s INFO:__main__:2024-11-30 06:46:35 | Epoch: 0 | Step: 331800 | Dataset: 0-5952360 | Loss: 0.666 | 598 ms/step , 115493.73 GFLOP/s , 173423.5 tokens/s INFO:__main__:2024-11-30 06:46:42 | Epoch: 0 | Step: 331810 | Dataset: 0-5954760 | Loss: 0.666 | 597 ms/step , 115598.46 GFLOP/s , 173682.1 tokens/s INFO:__main__:2024-11-30 06:46:49 | Epoch: 0 | Step: 331820 | Dataset: 0-5957160 | Loss: 0.683 | 598 ms/step , 115465.79 GFLOP/s , 173544.6 tokens/s INFO:__main__:2024-11-30 06:46:56 | Epoch: 0 | Step: 331830 | Dataset: 0-5959560 | Loss: 0.730 | 597 ms/step , 115550.62 GFLOP/s , 173449.1 tokens/s INFO:__main__:2024-11-30 06:47:04 | Epoch: 0 | Step: 331840 | Dataset: 0-5961960 | Loss: 0.714 | 598 ms/step , 115348.41 GFLOP/s , 173468.3 tokens/s INFO:__main__:2024-11-30 06:47:11 | Epoch: 0 | Step: 331850 | Dataset: 0-5964360 | Loss: 0.754 | 597 ms/step , 115553.80 GFLOP/s , 173497.0 tokens/s INFO:__main__:2024-11-30 06:47:18 | Epoch: 0 | Step: 331860 | Dataset: 0-5966760 | Loss: 0.657 | 597 ms/step , 115627.62 GFLOP/s , 173487.6 tokens/s INFO:__main__:2024-11-30 06:47:25 | Epoch: 0 | Step: 331870 | Dataset: 0-5969160 | Loss: 0.650 | 597 ms/step , 115505.45 GFLOP/s , 173497.0 tokens/s INFO:__main__:2024-11-30 06:47:32 | Epoch: 0 | Step: 331880 | Dataset: 0-5971560 | Loss: 0.636 | 596 ms/step , 115700.49 GFLOP/s , 173532.8 tokens/s INFO:__main__:2024-11-30 06:47:39 | Epoch: 0 | Step: 331890 | Dataset: 0-5973960 | Loss: 0.610 | 597 ms/step , 115605.52 GFLOP/s , 173633.2 tokens/s INFO:__main__:2024-11-30 06:47:46 | Epoch: 0 | Step: 331900 | Dataset: 0-5976360 | Loss: 0.733 | 597 ms/step , 115539.78 GFLOP/s , 173492.6 tokens/s INFO:__main__:2024-11-30 06:47:53 | Epoch: 0 | Step: 331910 | Dataset: 0-5978760 | Loss: 0.685 | 599 ms/step , 115123.59 GFLOP/s , 173472.7 tokens/s INFO:__main__:2024-11-30 06:48:00 | Epoch: 0 | Step: 331920 | Dataset: 0-5981160 | Loss: 0.772 | 
598 ms/step , 115462.05 GFLOP/s , 173460.4 tokens/s INFO:__main__:2024-11-30 06:48:07 | Epoch: 0 | Step: 331930 | Dataset: 0-5983560 | Loss: 0.696 | 598 ms/step , 115454.30 GFLOP/s , 173472.8 tokens/s INFO:__main__:2024-11-30 06:48:14 | Epoch: 0 | Step: 331940 | Dataset: 0-5985960 | Loss: 0.719 | 598 ms/step , 115393.10 GFLOP/s , 173533.6 tokens/s INFO:__main__:2024-11-30 06:48:21 | Epoch: 0 | Step: 331950 | Dataset: 0-5988360 | Loss: 0.689 | 597 ms/step , 115546.13 GFLOP/s , 173550.3 tokens/s INFO:__main__:2024-11-30 06:48:28 | Epoch: 0 | Step: 331960 | Dataset: 0-5990760 | Loss: 0.664 | 597 ms/step , 115606.72 GFLOP/s , 173637.3 tokens/s INFO:__main__:2024-11-30 06:48:36 | Epoch: 0 | Step: 331970 | Dataset: 0-5993160 | Loss: 0.752 | 597 ms/step , 115619.59 GFLOP/s , 173526.4 tokens/s INFO:__main__:2024-11-30 06:48:43 | Epoch: 0 | Step: 331980 | Dataset: 0-5995560 | Loss: 0.647 | 597 ms/step , 115518.22 GFLOP/s , 173458.6 tokens/s INFO:__main__:2024-11-30 06:48:50 | Epoch: 0 | Step: 331990 | Dataset: 0-5997960 | Loss: 0.703 | 598 ms/step , 115347.27 GFLOP/s , 173467.5 tokens/s INFO:__main__:2024-11-30 06:48:57 | Validation | Step: 332000 | Val_loss: 0.764 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 06:48:57 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_064857_step_332000.pt` INFO:__main__:2024-11-30 06:49:00 | Epoch: 0 | Step: 332000 | Dataset: 0-6000360 | Loss: 0.659 | 596 ms/step , 115845.83 GFLOP/s , 118712.2 tokens/s INFO:__main__:2024-11-30 06:49:07 | Epoch: 0 | Step: 332010 | Dataset: 0-6002760 | Loss: 0.736 | 600 ms/step , 115044.43 GFLOP/s , 173103.2 tokens/s INFO:__main__:2024-11-30 06:49:14 | Epoch: 0 | Step: 332020 | Dataset: 0-6005160 | Loss: 0.683 | 597 ms/step , 115643.08 GFLOP/s , 173177.6 tokens/s INFO:__main__:2024-11-30 06:49:21 | Epoch: 0 | Step: 332030 | Dataset: 0-6007560 | Loss: 0.571 | 596 ms/step , 115764.53 GFLOP/s , 173323.9 tokens/s INFO:__main__:2024-11-30 06:49:28 | Epoch: 0 | Step: 332040 | 
Dataset: 0-6009960 | Loss: 0.606 | 597 ms/step , 115563.87 GFLOP/s , 173532.4 tokens/s INFO:__main__:2024-11-30 06:49:36 | Epoch: 0 | Step: 332050 | Dataset: 0-6012360 | Loss: 0.587 | 597 ms/step , 115509.99 GFLOP/s , 173602.3 tokens/s INFO:__main__:2024-11-30 06:49:43 | Epoch: 0 | Step: 332060 | Dataset: 0-6014760 | Loss: 0.727 | 597 ms/step , 115513.47 GFLOP/s , 173547.0 tokens/s INFO:__main__:2024-11-30 06:49:50 | Epoch: 0 | Step: 332070 | Dataset: 0-6017160 | Loss: 0.685 | 597 ms/step , 115509.11 GFLOP/s , 173596.7 tokens/s INFO:__main__:2024-11-30 06:49:57 | Epoch: 0 | Step: 332080 | Dataset: 0-6019560 | Loss: 0.719 | 598 ms/step , 115446.66 GFLOP/s , 173536.1 tokens/s INFO:__main__:2024-11-30 06:50:04 | Epoch: 0 | Step: 332090 | Dataset: 0-6021960 | Loss: 0.680 | 599 ms/step , 115299.93 GFLOP/s , 173516.0 tokens/s INFO:__main__:2024-11-30 06:50:11 | Epoch: 0 | Step: 332100 | Dataset: 0-6024360 | Loss: 0.641 | 596 ms/step , 115725.22 GFLOP/s , 173658.6 tokens/s INFO:__main__:2024-11-30 06:50:18 | Epoch: 0 | Step: 332110 | Dataset: 0-6026760 | Loss: 0.628 | 597 ms/step , 115507.99 GFLOP/s , 173683.1 tokens/s INFO:__main__:2024-11-30 06:50:25 | Epoch: 0 | Step: 332120 | Dataset: 0-6029160 | Loss: 0.696 | 598 ms/step , 115407.80 GFLOP/s , 173515.5 tokens/s INFO:__main__:2024-11-30 06:50:32 | Epoch: 0 | Step: 332130 | Dataset: 0-6031560 | Loss: 0.439 | 597 ms/step , 115504.38 GFLOP/s , 173572.9 tokens/s INFO:__main__:2024-11-30 06:50:39 | Epoch: 0 | Step: 332140 | Dataset: 0-6033960 | Loss: 0.427 | 597 ms/step , 115629.63 GFLOP/s , 173672.3 tokens/s INFO:__main__:2024-11-30 06:50:46 | Epoch: 0 | Step: 332150 | Dataset: 0-6036360 | Loss: 0.401 | 597 ms/step , 115679.97 GFLOP/s , 173636.7 tokens/s INFO:__main__:2024-11-30 06:50:53 | Epoch: 0 | Step: 332160 | Dataset: 0-6038760 | Loss: 0.443 | 598 ms/step , 115422.95 GFLOP/s , 173611.1 tokens/s INFO:__main__:2024-11-30 06:51:00 | Epoch: 0 | Step: 332170 | Dataset: 0-6041160 | Loss: 0.448 | 597 ms/step , 115611.21 
GFLOP/s , 173655.8 tokens/s INFO:__main__:2024-11-30 06:51:08 | Epoch: 0 | Step: 332180 | Dataset: 0-6043560 | Loss: 0.379 | 596 ms/step , 115816.76 GFLOP/s , 173534.1 tokens/s INFO:__main__:2024-11-30 06:51:15 | Epoch: 0 | Step: 332190 | Dataset: 0-6045960 | Loss: 0.430 | 598 ms/step , 115501.71 GFLOP/s , 173657.5 tokens/s INFO:__main__:2024-11-30 06:51:22 | Epoch: 0 | Step: 332200 | Dataset: 0-6048360 | Loss: 0.436 | 597 ms/step , 115603.27 GFLOP/s , 173616.7 tokens/s INFO:__main__:2024-11-30 06:51:29 | Epoch: 0 | Step: 332210 | Dataset: 0-6050760 | Loss: 0.422 | 597 ms/step , 115654.05 GFLOP/s , 173569.9 tokens/s INFO:__main__:2024-11-30 06:51:36 | Epoch: 0 | Step: 332220 | Dataset: 0-6053160 | Loss: 0.428 | 597 ms/step , 115608.88 GFLOP/s , 173577.0 tokens/s INFO:__main__:2024-11-30 06:51:43 | Epoch: 0 | Step: 332230 | Dataset: 0-6055560 | Loss: 0.464 | 598 ms/step , 115317.39 GFLOP/s , 173663.8 tokens/s INFO:__main__:2024-11-30 06:51:50 | Epoch: 0 | Step: 332240 | Dataset: 0-6057960 | Loss: 0.472 | 598 ms/step , 115391.92 GFLOP/s , 173607.1 tokens/s INFO:__main__:2024-11-30 06:51:57 | Epoch: 0 | Step: 332250 | Dataset: 0-6060360 | Loss: 0.425 | 597 ms/step , 115664.65 GFLOP/s , 173740.2 tokens/s INFO:__main__:2024-11-30 06:52:04 | Epoch: 0 | Step: 332260 | Dataset: 0-6062760 | Loss: 0.496 | 597 ms/step , 115694.68 GFLOP/s , 173844.3 tokens/s INFO:__main__:2024-11-30 06:52:11 | Epoch: 0 | Step: 332270 | Dataset: 0-6065160 | Loss: 0.471 | 597 ms/step , 115548.39 GFLOP/s , 173653.1 tokens/s INFO:__main__:2024-11-30 06:52:18 | Epoch: 0 | Step: 332280 | Dataset: 0-6067560 | Loss: 0.404 | 597 ms/step , 115542.86 GFLOP/s , 173622.8 tokens/s INFO:__main__:2024-11-30 06:52:25 | Epoch: 0 | Step: 332290 | Dataset: 0-6069960 | Loss: 0.412 | 597 ms/step , 115513.63 GFLOP/s , 173615.0 tokens/s INFO:__main__:2024-11-30 06:52:32 | Epoch: 0 | Step: 332300 | Dataset: 0-6072360 | Loss: 0.442 | 598 ms/step , 115453.64 GFLOP/s , 173602.9 tokens/s INFO:__main__:2024-11-30 06:52:40 
| Epoch: 0 | Step: 332310 | Dataset: 0-6074760 | Loss: 0.420 | 597 ms/step , 115590.35 GFLOP/s , 173626.0 tokens/s INFO:__main__:2024-11-30 06:52:47 | Epoch: 0 | Step: 332320 | Dataset: 0-6077160 | Loss: 0.428 | 597 ms/step , 115587.42 GFLOP/s , 173605.1 tokens/s INFO:__main__:2024-11-30 06:52:54 | Epoch: 0 | Step: 332330 | Dataset: 0-6079560 | Loss: 0.459 | 597 ms/step , 115535.64 GFLOP/s , 173716.9 tokens/s INFO:__main__:2024-11-30 06:53:01 | Epoch: 0 | Step: 332340 | Dataset: 0-6081960 | Loss: 0.481 | 597 ms/step , 115645.52 GFLOP/s , 173607.5 tokens/s INFO:__main__:2024-11-30 06:53:08 | Epoch: 0 | Step: 332350 | Dataset: 0-6084360 | Loss: 0.412 | 597 ms/step , 115607.32 GFLOP/s , 173576.9 tokens/s INFO:__main__:2024-11-30 06:53:15 | Epoch: 0 | Step: 332360 | Dataset: 0-6086760 | Loss: 0.421 | 597 ms/step , 115614.59 GFLOP/s , 173545.5 tokens/s INFO:__main__:2024-11-30 06:53:22 | Epoch: 0 | Step: 332370 | Dataset: 0-6089160 | Loss: 0.412 | 596 ms/step , 115716.52 GFLOP/s , 173546.2 tokens/s INFO:__main__:2024-11-30 06:53:29 | Epoch: 0 | Step: 332380 | Dataset: 0-6091560 | Loss: 0.445 | 597 ms/step , 115548.06 GFLOP/s , 173604.4 tokens/s INFO:__main__:2024-11-30 06:53:36 | Epoch: 0 | Step: 332390 | Dataset: 0-6093960 | Loss: 0.440 | 598 ms/step , 115481.27 GFLOP/s , 173599.5 tokens/s INFO:__main__:2024-11-30 06:53:43 | Epoch: 0 | Step: 332400 | Dataset: 0-6096360 | Loss: 0.437 | 596 ms/step , 115764.26 GFLOP/s , 173752.6 tokens/s INFO:__main__:2024-11-30 06:53:50 | Epoch: 0 | Step: 332410 | Dataset: 0-6098760 | Loss: 0.412 | 597 ms/step , 115607.84 GFLOP/s , 173750.7 tokens/s INFO:__main__:2024-11-30 06:53:57 | Epoch: 0 | Step: 332420 | Dataset: 0-6101160 | Loss: 0.450 | 597 ms/step , 115518.50 GFLOP/s , 173613.9 tokens/s INFO:__main__:2024-11-30 06:54:04 | Epoch: 0 | Step: 332430 | Dataset: 0-6103560 | Loss: 0.421 | 598 ms/step , 115481.55 GFLOP/s , 173626.0 tokens/s INFO:__main__:2024-11-30 06:54:12 | Epoch: 0 | Step: 332440 | Dataset: 0-6105960 | Loss: 0.451 | 
597 ms/step , 115598.75 GFLOP/s , 173613.9 tokens/s INFO:__main__:2024-11-30 06:54:19 | Epoch: 0 | Step: 332450 | Dataset: 0-6108360 | Loss: 0.406 | 597 ms/step , 115611.96 GFLOP/s , 173634.5 tokens/s INFO:__main__:2024-11-30 06:54:26 | Epoch: 0 | Step: 332460 | Dataset: 0-6110760 | Loss: 0.400 | 597 ms/step , 115620.35 GFLOP/s , 173549.3 tokens/s INFO:__main__:2024-11-30 06:54:33 | Epoch: 0 | Step: 332470 | Dataset: 0-6113160 | Loss: 0.442 | 597 ms/step , 115685.22 GFLOP/s , 173609.6 tokens/s INFO:__main__:2024-11-30 06:54:40 | Epoch: 0 | Step: 332480 | Dataset: 0-6115560 | Loss: 0.441 | 597 ms/step , 115565.02 GFLOP/s , 173725.3 tokens/s INFO:__main__:2024-11-30 06:54:47 | Epoch: 0 | Step: 332490 | Dataset: 0-6117960 | Loss: 0.384 | 597 ms/step , 115545.80 GFLOP/s , 173585.8 tokens/s INFO:__main__:2024-11-30 06:54:55 | Validation | Step: 332500 | Val_loss: 0.821 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 06:54:55 | Epoch: 0 | Step: 332500 | Dataset: 0-6120360 | Loss: 0.381 | 596 ms/step , 115860.02 GFLOP/s , 147633.4 tokens/s INFO:__main__:2024-11-30 06:55:02 | Epoch: 0 | Step: 332510 | Dataset: 0-6122760 | Loss: 0.419 | 597 ms/step , 115581.79 GFLOP/s , 173649.8 tokens/s INFO:__main__:2024-11-30 06:55:09 | Epoch: 0 | Step: 332520 | Dataset: 0-6125160 | Loss: 0.425 | 597 ms/step , 115606.81 GFLOP/s , 173671.0 tokens/s INFO:__main__:2024-11-30 06:55:16 | Epoch: 0 | Step: 332530 | Dataset: 0-6127560 | Loss: 0.413 | 597 ms/step , 115633.69 GFLOP/s , 173673.5 tokens/s INFO:__main__:2024-11-30 06:55:24 | Epoch: 0 | Step: 332540 | Dataset: 0-6129960 | Loss: 0.471 | 596 ms/step , 115711.10 GFLOP/s , 173673.4 tokens/s INFO:__main__:2024-11-30 06:55:31 | Epoch: 0 | Step: 332550 | Dataset: 0-6132360 | Loss: 0.405 | 597 ms/step , 115625.04 GFLOP/s , 173854.7 tokens/s INFO:__main__:2024-11-30 06:55:38 | Epoch: 0 | Step: 332560 | Dataset: 0-6134760 | Loss: 0.414 | 597 ms/step , 115611.62 GFLOP/s , 173759.4 tokens/s INFO:__main__:2024-11-30 06:55:45 | Epoch: 0 | Step: 
332570 | Dataset: 0-6137160 | Loss: 0.402 | 597 ms/step , 115553.13 GFLOP/s , 173644.9 tokens/s INFO:__main__:2024-11-30 06:55:52 | Epoch: 0 | Step: 332580 | Dataset: 0-6139560 | Loss: 0.419 | 597 ms/step , 115628.07 GFLOP/s , 173586.3 tokens/s INFO:__main__:2024-11-30 06:55:59 | Epoch: 0 | Step: 332590 | Dataset: 0-6141960 | Loss: 0.469 | 597 ms/step , 115670.88 GFLOP/s , 173548.2 tokens/s INFO:__main__:2024-11-30 06:56:06 | Epoch: 0 | Step: 332600 | Dataset: 0-6144360 | Loss: 0.398 | 598 ms/step , 115480.22 GFLOP/s , 173635.3 tokens/s INFO:__main__:2024-11-30 06:56:13 | Epoch: 0 | Step: 332610 | Dataset: 0-6146760 | Loss: 0.419 | 597 ms/step , 115561.00 GFLOP/s , 173628.7 tokens/s INFO:__main__:2024-11-30 06:56:20 | Epoch: 0 | Step: 332620 | Dataset: 0-6149160 | Loss: 0.441 | 596 ms/step , 115827.73 GFLOP/s , 173741.5 tokens/s INFO:__main__:2024-11-30 06:56:27 | Epoch: 0 | Step: 332630 | Dataset: 0-6151560 | Loss: 0.461 | 597 ms/step , 115594.49 GFLOP/s , 173848.0 tokens/s INFO:__main__:2024-11-30 06:56:34 | Epoch: 0 | Step: 332640 | Dataset: 0-6153960 | Loss: 0.490 | 597 ms/step , 115566.81 GFLOP/s , 173653.4 tokens/s INFO:__main__:2024-11-30 06:56:41 | Epoch: 0 | Step: 332650 | Dataset: 0-6156360 | Loss: 0.446 | 597 ms/step , 115673.78 GFLOP/s , 173605.2 tokens/s INFO:__main__:2024-11-30 06:56:48 | Epoch: 0 | Step: 332660 | Dataset: 0-6158760 | Loss: 0.472 | 597 ms/step , 115619.08 GFLOP/s , 173597.0 tokens/s INFO:__main__:2024-11-30 06:56:56 | Epoch: 0 | Step: 332670 | Dataset: 0-6161160 | Loss: 0.615 | 597 ms/step , 115627.23 GFLOP/s , 173578.0 tokens/s INFO:__main__:2024-11-30 06:57:03 | Epoch: 0 | Step: 332680 | Dataset: 0-6163560 | Loss: 0.562 | 597 ms/step , 115567.74 GFLOP/s , 173527.2 tokens/s INFO:__main__:2024-11-30 06:57:10 | Epoch: 0 | Step: 332690 | Dataset: 0-6165960 | Loss: 0.568 | 597 ms/step , 115562.32 GFLOP/s , 173533.8 tokens/s INFO:__main__:2024-11-30 06:57:17 | Epoch: 0 | Step: 332700 | Dataset: 0-6168360 | Loss: 0.601 | 597 ms/step , 
115578.27 GFLOP/s , 173609.5 tokens/s INFO:__main__:2024-11-30 06:57:24 | Epoch: 0 | Step: 332710 | Dataset: 0-6170760 | Loss: 0.572 | 597 ms/step , 115540.12 GFLOP/s , 173522.0 tokens/s INFO:__main__:2024-11-30 06:57:31 | Epoch: 0 | Step: 332720 | Dataset: 0-6173160 | Loss: 0.495 | 597 ms/step , 115543.46 GFLOP/s , 173538.4 tokens/s INFO:__main__:2024-11-30 06:57:38 | Epoch: 0 | Step: 332730 | Dataset: 0-6175560 | Loss: 0.521 | 597 ms/step , 115525.72 GFLOP/s , 173502.4 tokens/s INFO:__main__:2024-11-30 06:57:45 | Epoch: 0 | Step: 332740 | Dataset: 0-6177960 | Loss: 0.556 | 597 ms/step , 115511.21 GFLOP/s , 173427.4 tokens/s INFO:__main__:2024-11-30 06:57:52 | Epoch: 0 | Step: 332750 | Dataset: 0-6180360 | Loss: 0.578 | 598 ms/step , 115316.64 GFLOP/s , 173544.9 tokens/s INFO:__main__:2024-11-30 06:57:59 | Epoch: 0 | Step: 332760 | Dataset: 0-6182760 | Loss: 0.539 | 598 ms/step , 115433.82 GFLOP/s , 173467.5 tokens/s INFO:__main__:2024-11-30 06:58:06 | Epoch: 0 | Step: 332770 | Dataset: 0-6185160 | Loss: 0.547 | 597 ms/step , 115602.04 GFLOP/s , 173574.6 tokens/s INFO:__main__:2024-11-30 06:58:13 | Epoch: 0 | Step: 332780 | Dataset: 0-6187560 | Loss: 0.546 | 597 ms/step , 115656.13 GFLOP/s , 173573.9 tokens/s INFO:__main__:2024-11-30 06:58:21 | Epoch: 0 | Step: 332790 | Dataset: 0-6189960 | Loss: 0.509 | 597 ms/step , 115596.55 GFLOP/s , 173537.0 tokens/s INFO:__main__:2024-11-30 06:58:28 | Epoch: 0 | Step: 332800 | Dataset: 0-6192360 | Loss: 0.542 | 598 ms/step , 115398.83 GFLOP/s , 173415.1 tokens/s INFO:__main__:2024-11-30 06:58:35 | Epoch: 0 | Step: 332810 | Dataset: 0-6194760 | Loss: 0.524 | 597 ms/step , 115584.67 GFLOP/s , 173520.4 tokens/s INFO:__main__:2024-11-30 06:58:42 | Epoch: 0 | Step: 332820 | Dataset: 0-6197160 | Loss: 0.588 | 598 ms/step , 115470.76 GFLOP/s , 173517.9 tokens/s INFO:__main__:2024-11-30 06:58:49 | Epoch: 0 | Step: 332830 | Dataset: 0-6199560 | Loss: 0.504 | 598 ms/step , 115411.76 GFLOP/s , 173391.2 tokens/s INFO:__main__:2024-11-30 
06:58:56 | Epoch: 0 | Step: 332840 | Dataset: 0-6201960 | Loss: 0.521 | 597 ms/step , 115547.19 GFLOP/s , 173437.8 tokens/s INFO:__main__:2024-11-30 06:59:03 | Epoch: 0 | Step: 332850 | Dataset: 0-6204360 | Loss: 0.597 | 597 ms/step , 115669.45 GFLOP/s , 173700.3 tokens/s INFO:__main__:2024-11-30 06:59:10 | Epoch: 0 | Step: 332860 | Dataset: 0-6206760 | Loss: 0.580 | 598 ms/step , 115495.96 GFLOP/s , 173547.8 tokens/s INFO:__main__:2024-11-30 06:59:17 | Epoch: 0 | Step: 332870 | Dataset: 0-6209160 | Loss: 0.632 | 598 ms/step , 115390.18 GFLOP/s , 173452.1 tokens/s INFO:__main__:2024-11-30 06:59:24 | Epoch: 0 | Step: 332880 | Dataset: 0-6211560 | Loss: 0.643 | 598 ms/step , 115480.87 GFLOP/s , 173482.1 tokens/s INFO:__main__:2024-11-30 06:59:31 | Epoch: 0 | Step: 332890 | Dataset: 0-6213960 | Loss: 0.622 | 598 ms/step , 115323.84 GFLOP/s , 173518.3 tokens/s INFO:__main__:2024-11-30 06:59:38 | Epoch: 0 | Step: 332900 | Dataset: 0-6216360 | Loss: 0.594 | 598 ms/step , 115459.52 GFLOP/s , 173473.9 tokens/s INFO:__main__:2024-11-30 06:59:46 | Epoch: 0 | Step: 332910 | Dataset: 0-6218760 | Loss: 0.536 | 598 ms/step , 115485.66 GFLOP/s , 173502.2 tokens/s INFO:__main__:2024-11-30 06:59:53 | Epoch: 0 | Step: 332920 | Dataset: 0-6221160 | Loss: 0.616 | 597 ms/step , 115577.18 GFLOP/s , 173556.1 tokens/s INFO:__main__:2024-11-30 07:00:00 | Epoch: 0 | Step: 332930 | Dataset: 0-6223560 | Loss: 0.513 | 597 ms/step , 115605.42 GFLOP/s , 173599.7 tokens/s INFO:__main__:2024-11-30 07:00:07 | Epoch: 0 | Step: 332940 | Dataset: 0-6225960 | Loss: 0.561 | 597 ms/step , 115595.76 GFLOP/s , 173544.4 tokens/s INFO:__main__:2024-11-30 07:00:14 | Epoch: 0 | Step: 332950 | Dataset: 0-6228360 | Loss: 0.546 | 598 ms/step , 115487.72 GFLOP/s , 173408.9 tokens/s INFO:__main__:2024-11-30 07:00:21 | Epoch: 0 | Step: 332960 | Dataset: 0-6230760 | Loss: 0.602 | 599 ms/step , 115293.12 GFLOP/s , 173352.0 tokens/s INFO:__main__:2024-11-30 07:00:28 | Epoch: 0 | Step: 332970 | Dataset: 0-6233160 | 
Loss: 0.586 | 598 ms/step , 115472.56 GFLOP/s , 173365.4 tokens/s INFO:__main__:2024-11-30 07:00:35 | Epoch: 0 | Step: 332980 | Dataset: 0-6235560 | Loss: 0.599 | 599 ms/step , 115265.11 GFLOP/s , 173322.1 tokens/s INFO:__main__:2024-11-30 07:00:42 | Epoch: 0 | Step: 332990 | Dataset: 0-6237960 | Loss: 0.543 | 597 ms/step , 115554.62 GFLOP/s , 173441.1 tokens/s INFO:__main__:2024-11-30 07:00:50 | Validation | Step: 333000 | Val_loss: 0.773 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 07:00:50 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_070050_step_333000.pt` INFO:__main__:2024-11-30 07:00:53 | Epoch: 0 | Step: 333000 | Dataset: 0-6240360 | Loss: 0.569 | 596 ms/step , 115792.16 GFLOP/s , 119266.6 tokens/s INFO:__main__:2024-11-30 07:01:00 | Epoch: 0 | Step: 333010 | Dataset: 0-6242760 | Loss: 0.630 | 599 ms/step , 115239.86 GFLOP/s , 173102.5 tokens/s INFO:__main__:2024-11-30 07:01:07 | Epoch: 0 | Step: 333020 | Dataset: 0-6245160 | Loss: 0.590 | 598 ms/step , 115456.24 GFLOP/s , 173135.1 tokens/s INFO:__main__:2024-11-30 07:01:14 | Epoch: 0 | Step: 333030 | Dataset: 0-6247560 | Loss: 0.597 | 597 ms/step , 115524.61 GFLOP/s , 173097.2 tokens/s INFO:__main__:2024-11-30 07:01:21 | Epoch: 0 | Step: 333040 | Dataset: 0-6249960 | Loss: 0.509 | 598 ms/step , 115398.09 GFLOP/s , 173028.8 tokens/s INFO:__main__:2024-11-30 07:01:28 | Epoch: 0 | Step: 333050 | Dataset: 0-6252360 | Loss: 0.569 | 598 ms/step , 115398.12 GFLOP/s , 173128.6 tokens/s INFO:__main__:2024-11-30 07:01:35 | Epoch: 0 | Step: 333060 | Dataset: 0-6254760 | Loss: 0.592 | 597 ms/step , 115553.71 GFLOP/s , 173610.0 tokens/s INFO:__main__:2024-11-30 07:01:42 | Epoch: 0 | Step: 333070 | Dataset: 0-6257160 | Loss: 0.561 | 600 ms/step , 115063.87 GFLOP/s , 173625.7 tokens/s INFO:__main__:2024-11-30 07:01:49 | Epoch: 0 | Step: 333080 | Dataset: 0-6259560 | Loss: 0.615 | 597 ms/step , 115578.18 GFLOP/s , 173585.8 tokens/s INFO:__main__:2024-11-30 07:01:56 | Epoch: 0 | 
Step: 333090 | Dataset: 0-6261960 | Loss: 0.591 | 597 ms/step , 115508.73 GFLOP/s , 173494.3 tokens/s INFO:__main__:2024-11-30 07:02:03 | Epoch: 0 | Step: 333100 | Dataset: 0-6264360 | Loss: 0.573 | 598 ms/step , 115406.39 GFLOP/s , 173499.6 tokens/s INFO:__main__:2024-11-30 07:02:10 | Epoch: 0 | Step: 333110 | Dataset: 0-6266760 | Loss: 0.589 | 598 ms/step , 115380.55 GFLOP/s , 173521.5 tokens/s INFO:__main__:2024-11-30 07:02:18 | Epoch: 0 | Step: 333120 | Dataset: 0-6269160 | Loss: 0.583 | 597 ms/step , 115622.93 GFLOP/s , 173523.6 tokens/s INFO:__main__:2024-11-30 07:02:25 | Epoch: 0 | Step: 333130 | Dataset: 0-6271560 | Loss: 0.590 | 598 ms/step , 115392.27 GFLOP/s , 173461.2 tokens/s INFO:__main__:2024-11-30 07:02:32 | Epoch: 0 | Step: 333140 | Dataset: 0-6273960 | Loss: 0.577 | 597 ms/step , 115568.79 GFLOP/s , 173550.2 tokens/s INFO:__main__:2024-11-30 07:02:39 | Epoch: 0 | Step: 333150 | Dataset: 0-6276360 | Loss: 0.564 | 598 ms/step , 115453.00 GFLOP/s , 173558.2 tokens/s INFO:__main__:2024-11-30 07:02:46 | Epoch: 0 | Step: 333160 | Dataset: 0-6278760 | Loss: 0.609 | 598 ms/step , 115488.09 GFLOP/s , 173502.2 tokens/s INFO:__main__:2024-11-30 07:02:53 | Epoch: 0 | Step: 333170 | Dataset: 0-6281160 | Loss: 0.550 | 598 ms/step , 115406.95 GFLOP/s , 173516.8 tokens/s INFO:__main__:2024-11-30 07:03:00 | Epoch: 0 | Step: 333180 | Dataset: 0-6283560 | Loss: 0.515 | 597 ms/step , 115521.47 GFLOP/s , 173451.9 tokens/s INFO:__main__:2024-11-30 07:03:07 | Epoch: 0 | Step: 333190 | Dataset: 0-6285960 | Loss: 0.519 | 598 ms/step , 115435.43 GFLOP/s , 173472.1 tokens/s INFO:__main__:2024-11-30 07:03:14 | Epoch: 0 | Step: 333200 | Dataset: 0-6288360 | Loss: 0.507 | 598 ms/step , 115476.82 GFLOP/s , 173601.2 tokens/s INFO:__main__:2024-11-30 07:03:21 | Epoch: 0 | Step: 333210 | Dataset: 0-6290760 | Loss: 0.455 | 597 ms/step , 115612.88 GFLOP/s , 173555.2 tokens/s INFO:__main__:2024-11-30 07:03:28 | Epoch: 0 | Step: 333220 | Dataset: 0-6293160 | Loss: 0.751 | 597 ms/step 
, 115595.19 GFLOP/s , 173692.2 tokens/s INFO:__main__:2024-11-30 07:03:35 | Epoch: 0 | Step: 333230 | Dataset: 0-6295560 | Loss: 0.721 | 597 ms/step , 115553.09 GFLOP/s , 173480.3 tokens/s INFO:__main__:2024-11-30 07:03:43 | Epoch: 0 | Step: 333240 | Dataset: 0-6297960 | Loss: 0.652 | 598 ms/step , 115464.34 GFLOP/s , 173452.5 tokens/s INFO:__main__:2024-11-30 07:03:50 | Epoch: 0 | Step: 333250 | Dataset: 0-6300360 | Loss: 0.650 | 598 ms/step , 115488.52 GFLOP/s , 173427.3 tokens/s INFO:__main__:2024-11-30 07:03:57 | Epoch: 0 | Step: 333260 | Dataset: 0-6302760 | Loss: 0.666 | 598 ms/step , 115334.77 GFLOP/s , 173409.7 tokens/s INFO:__main__:2024-11-30 07:04:04 | Epoch: 0 | Step: 333270 | Dataset: 0-6305160 | Loss: 0.723 | 598 ms/step , 115459.91 GFLOP/s , 173508.4 tokens/s INFO:__main__:2024-11-30 07:04:11 | Epoch: 0 | Step: 333280 | Dataset: 0-6307560 | Loss: 0.693 | 599 ms/step , 115269.62 GFLOP/s , 173358.4 tokens/s INFO:__main__:2024-11-30 07:04:18 | Epoch: 0 | Step: 333290 | Dataset: 0-6309960 | Loss: 0.759 | 597 ms/step , 115570.36 GFLOP/s , 173576.6 tokens/s INFO:__main__:2024-11-30 07:04:25 | Epoch: 0 | Step: 333300 | Dataset: 0-6312360 | Loss: 0.541 | 601 ms/step , 114803.27 GFLOP/s , 173510.9 tokens/s INFO:__main__:2024-11-30 07:04:32 | Epoch: 0 | Step: 333310 | Dataset: 0-6314760 | Loss: 0.670 | 598 ms/step , 115387.12 GFLOP/s , 173395.8 tokens/s INFO:__main__:2024-11-30 07:04:39 | Epoch: 0 | Step: 333320 | Dataset: 0-6317160 | Loss: 0.701 | 598 ms/step , 115449.35 GFLOP/s , 173448.2 tokens/s INFO:__main__:2024-11-30 07:04:46 | Epoch: 0 | Step: 333330 | Dataset: 0-6319560 | Loss: 0.688 | 598 ms/step , 115404.28 GFLOP/s , 173439.4 tokens/s INFO:__main__:2024-11-30 07:04:53 | Epoch: 0 | Step: 333340 | Dataset: 0-6321960 | Loss: 0.670 | 597 ms/step , 115584.57 GFLOP/s , 173409.5 tokens/s INFO:__main__:2024-11-30 07:05:00 | Epoch: 0 | Step: 333350 | Dataset: 0-6324360 | Loss: 0.687 | 598 ms/step , 115404.69 GFLOP/s , 173447.0 tokens/s 
INFO:__main__:2024-11-30 07:05:08 | Epoch: 0 | Step: 333360 | Dataset: 0-6326760 | Loss: 0.682 | 597 ms/step , 115516.26 GFLOP/s , 173496.5 tokens/s INFO:__main__:2024-11-30 07:05:15 | Epoch: 0 | Step: 333370 | Dataset: 0-6329160 | Loss: 0.677 | 598 ms/step , 115437.06 GFLOP/s , 173605.6 tokens/s INFO:__main__:2024-11-30 07:05:22 | Epoch: 0 | Step: 333380 | Dataset: 0-6331560 | Loss: 0.624 | 598 ms/step , 115489.35 GFLOP/s , 173436.2 tokens/s INFO:__main__:2024-11-30 07:05:29 | Epoch: 0 | Step: 333390 | Dataset: 0-6333960 | Loss: 0.610 | 598 ms/step , 115428.06 GFLOP/s , 173415.3 tokens/s INFO:__main__:2024-11-30 07:05:36 | Epoch: 0 | Step: 333400 | Dataset: 0-6336360 | Loss: 0.644 | 597 ms/step , 115527.09 GFLOP/s , 173396.8 tokens/s INFO:__main__:2024-11-30 07:05:43 | Epoch: 0 | Step: 333410 | Dataset: 0-6338760 | Loss: 0.637 | 598 ms/step , 115385.84 GFLOP/s , 173400.9 tokens/s INFO:__main__:2024-11-30 07:05:50 | Epoch: 0 | Step: 333420 | Dataset: 0-6341160 | Loss: 0.729 | 598 ms/step , 115439.30 GFLOP/s , 173472.0 tokens/s INFO:__main__:2024-11-30 07:05:57 | Epoch: 0 | Step: 333430 | Dataset: 0-6343560 | Loss: 0.772 | 598 ms/step , 115463.90 GFLOP/s , 173377.8 tokens/s INFO:__main__:2024-11-30 07:06:04 | Epoch: 0 | Step: 333440 | Dataset: 0-6345960 | Loss: 0.609 | 596 ms/step , 115755.70 GFLOP/s , 173602.0 tokens/s INFO:__main__:2024-11-30 07:06:11 | Epoch: 0 | Step: 333450 | Dataset: 0-6348360 | Loss: 0.581 | 597 ms/step , 115533.72 GFLOP/s , 173579.2 tokens/s INFO:__main__:2024-11-30 07:06:18 | Epoch: 0 | Step: 333460 | Dataset: 0-6350760 | Loss: 0.731 | 598 ms/step , 115438.21 GFLOP/s , 173393.4 tokens/s INFO:__main__:2024-11-30 07:06:25 | Epoch: 0 | Step: 333470 | Dataset: 0-6353160 | Loss: 0.703 | 598 ms/step , 115407.63 GFLOP/s , 173448.5 tokens/s INFO:__main__:2024-11-30 07:06:33 | Epoch: 0 | Step: 333480 | Dataset: 0-6355560 | Loss: 0.663 | 597 ms/step , 115504.03 GFLOP/s , 173413.6 tokens/s INFO:__main__:2024-11-30 07:06:40 | Epoch: 0 | Step: 333490 | 
Dataset: 0-6357960 | Loss: 0.699 | 598 ms/step , 115471.53 GFLOP/s , 173383.2 tokens/s INFO:__main__:2024-11-30 07:06:47 | Validation | Step: 333500 | Val_loss: 0.745 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 07:06:48 | Epoch: 0 | Step: 333500 | Dataset: 0-6360360 | Loss: 0.677 | 597 ms/step , 115616.73 GFLOP/s , 147540.1 tokens/s INFO:__main__:2024-11-30 07:06:55 | Epoch: 0 | Step: 333510 | Dataset: 0-6362760 | Loss: 0.628 | 597 ms/step , 115544.26 GFLOP/s , 173628.4 tokens/s INFO:__main__:2024-11-30 07:07:02 | Epoch: 0 | Step: 333520 | Dataset: 0-6365160 | Loss: 0.680 | 597 ms/step , 115541.40 GFLOP/s , 173698.7 tokens/s INFO:__main__:2024-11-30 07:07:09 | Epoch: 0 | Step: 333530 | Dataset: 0-6367560 | Loss: 0.669 | 598 ms/step , 115408.59 GFLOP/s , 173495.9 tokens/s INFO:__main__:2024-11-30 07:07:16 | Epoch: 0 | Step: 333540 | Dataset: 0-6369960 | Loss: 0.724 | 598 ms/step , 115416.35 GFLOP/s , 173513.9 tokens/s INFO:__main__:2024-11-30 07:07:23 | Epoch: 0 | Step: 333550 | Dataset: 0-6372360 | Loss: 0.716 | 598 ms/step , 115415.02 GFLOP/s , 173490.8 tokens/s INFO:__main__:2024-11-30 07:07:30 | Epoch: 0 | Step: 333560 | Dataset: 0-6374760 | Loss: 0.669 | 597 ms/step , 115532.04 GFLOP/s , 173424.0 tokens/s INFO:__main__:2024-11-30 07:07:38 | Epoch: 0 | Step: 333570 | Dataset: 0-6377160 | Loss: 0.627 | 598 ms/step , 115407.48 GFLOP/s , 173384.8 tokens/s INFO:__main__:2024-11-30 07:07:45 | Epoch: 0 | Step: 333580 | Dataset: 0-6379560 | Loss: 0.691 | 597 ms/step , 115580.54 GFLOP/s , 173501.8 tokens/s INFO:__main__:2024-11-30 07:07:52 | Epoch: 0 | Step: 333590 | Dataset: 0-6381960 | Loss: 0.763 | 598 ms/step , 115440.55 GFLOP/s , 173575.9 tokens/s INFO:__main__:2024-11-30 07:07:59 | Epoch: 0 | Step: 333600 | Dataset: 0-6384360 | Loss: 0.769 | 598 ms/step , 115368.33 GFLOP/s , 173503.3 tokens/s INFO:__main__:2024-11-30 07:08:06 | Epoch: 0 | Step: 333610 | Dataset: 0-6386760 | Loss: 0.679 | 597 ms/step , 115544.31 GFLOP/s , 173492.2 tokens/s 
INFO:__main__:2024-11-30 07:08:13 | Epoch: 0 | Step: 333620 | Dataset: 0-6389160 | Loss: 0.717 | 598 ms/step , 115459.12 GFLOP/s , 173358.5 tokens/s INFO:__main__:2024-11-30 07:08:20 | Epoch: 0 | Step: 333630 | Dataset: 0-6391560 | Loss: 0.729 | 598 ms/step , 115462.92 GFLOP/s , 173409.4 tokens/s INFO:__main__:2024-11-30 07:08:27 | Epoch: 0 | Step: 333640 | Dataset: 0-6393960 | Loss: 0.703 | 598 ms/step , 115474.24 GFLOP/s , 173411.9 tokens/s INFO:__main__:2024-11-30 07:08:34 | Epoch: 0 | Step: 333650 | Dataset: 0-6396360 | Loss: 0.639 | 598 ms/step , 115401.04 GFLOP/s , 173457.2 tokens/s INFO:__main__:2024-11-30 07:08:41 | Epoch: 0 | Step: 333660 | Dataset: 0-6398760 | Loss: 0.784 | 597 ms/step , 115544.58 GFLOP/s , 173518.5 tokens/s INFO:__main__:2024-11-30 07:08:48 | Epoch: 0 | Step: 333670 | Dataset: 0-6401160 | Loss: 0.679 | 597 ms/step , 115529.84 GFLOP/s , 173574.0 tokens/s INFO:__main__:2024-11-30 07:08:55 | Epoch: 0 | Step: 333680 | Dataset: 0-6403560 | Loss: 0.635 | 598 ms/step , 115345.61 GFLOP/s , 173462.9 tokens/s INFO:__main__:2024-11-30 07:09:03 | Epoch: 0 | Step: 333690 | Dataset: 0-6405960 | Loss: 0.610 | 598 ms/step , 115416.90 GFLOP/s , 173422.6 tokens/s INFO:__main__:2024-11-30 07:09:10 | Epoch: 0 | Step: 333700 | Dataset: 0-6408360 | Loss: 0.671 | 598 ms/step , 115440.82 GFLOP/s , 173420.1 tokens/s INFO:__main__:2024-11-30 07:09:17 | Epoch: 0 | Step: 333710 | Dataset: 0-6410760 | Loss: 0.662 | 598 ms/step , 115369.75 GFLOP/s , 173473.7 tokens/s INFO:__main__:2024-11-30 07:09:24 | Epoch: 0 | Step: 333720 | Dataset: 0-6413160 | Loss: 0.772 | 598 ms/step , 115463.66 GFLOP/s , 173379.8 tokens/s INFO:__main__:2024-11-30 07:09:31 | Epoch: 0 | Step: 333730 | Dataset: 0-6415560 | Loss: 0.767 | 598 ms/step , 115348.76 GFLOP/s , 173449.8 tokens/s INFO:__main__:2024-11-30 07:09:38 | Epoch: 0 | Step: 333740 | Dataset: 0-6417960 | Loss: 0.659 | 597 ms/step , 115520.01 GFLOP/s , 173556.7 tokens/s INFO:__main__:2024-11-30 07:09:45 | Epoch: 0 | Step: 333750 | 
Dataset: 0-6420360 | Loss: 0.661 | 598 ms/step , 115464.31 GFLOP/s , 173496.0 tokens/s INFO:__main__:2024-11-30 07:09:52 | Epoch: 0 | Step: 333760 | Dataset: 0-6422760 | Loss: 0.683 | 598 ms/step , 115397.61 GFLOP/s , 173477.5 tokens/s INFO:__main__:2024-11-30 07:09:59 | Epoch: 0 | Step: 333770 | Dataset: 0-6425160 | Loss: 0.412 | 597 ms/step , 115574.76 GFLOP/s , 173684.2 tokens/s INFO:__main__:2024-11-30 07:10:06 | Epoch: 0 | Step: 333780 | Dataset: 0-6427560 | Loss: 0.384 | 597 ms/step , 115686.30 GFLOP/s , 173630.7 tokens/s INFO:__main__:2024-11-30 07:10:13 | Epoch: 0 | Step: 333790 | Dataset: 0-6429960 | Loss: 0.357 | 597 ms/step , 115655.68 GFLOP/s , 173678.8 tokens/s INFO:__main__:2024-11-30 07:10:20 | Epoch: 0 | Step: 333800 | Dataset: 0-6432360 | Loss: 0.340 | 597 ms/step , 115658.28 GFLOP/s , 173726.4 tokens/s INFO:__main__:2024-11-30 07:10:27 | Epoch: 0 | Step: 333810 | Dataset: 0-6434760 | Loss: 0.315 | 597 ms/step , 115685.03 GFLOP/s , 173762.7 tokens/s INFO:__main__:2024-11-30 07:10:35 | Epoch: 0 | Step: 333820 | Dataset: 0-6437160 | Loss: 0.308 | 596 ms/step , 115765.18 GFLOP/s , 173879.0 tokens/s INFO:__main__:2024-11-30 07:10:42 | Epoch: 0 | Step: 333830 | Dataset: 0-6439560 | Loss: 0.290 | 597 ms/step , 115573.09 GFLOP/s , 173698.6 tokens/s INFO:__main__:2024-11-30 07:10:49 | Epoch: 0 | Step: 333840 | Dataset: 0-6441960 | Loss: 0.278 | 598 ms/step , 115345.70 GFLOP/s , 173676.2 tokens/s INFO:__main__:2024-11-30 07:10:56 | Epoch: 0 | Step: 333850 | Dataset: 0-6444360 | Loss: 0.270 | 597 ms/step , 115649.08 GFLOP/s , 173654.8 tokens/s INFO:__main__:2024-11-30 07:11:03 | Epoch: 0 | Step: 333860 | Dataset: 0-6446760 | Loss: 0.276 | 596 ms/step , 115731.42 GFLOP/s , 173680.3 tokens/s INFO:__main__:2024-11-30 07:11:10 | Epoch: 0 | Step: 333870 | Dataset: 0-6449160 | Loss: 0.255 | 597 ms/step , 115606.01 GFLOP/s , 173696.9 tokens/s INFO:__main__:2024-11-30 07:11:17 | Epoch: 0 | Step: 333880 | Dataset: 0-6451560 | Loss: 0.256 | 597 ms/step , 115683.81 
GFLOP/s , 173718.6 tokens/s INFO:__main__:2024-11-30 07:11:24 | Epoch: 0 | Step: 333890 | Dataset: 0-6453960 | Loss: 0.247 | 596 ms/step , 115760.11 GFLOP/s , 173884.3 tokens/s INFO:__main__:2024-11-30 07:11:31 | Epoch: 0 | Step: 333900 | Dataset: 0-6456360 | Loss: 0.241 | 597 ms/step , 115505.19 GFLOP/s , 173755.6 tokens/s INFO:__main__:2024-11-30 07:11:38 | Epoch: 0 | Step: 333910 | Dataset: 0-6458760 | Loss: 0.246 | 596 ms/step , 115700.03 GFLOP/s , 173659.5 tokens/s INFO:__main__:2024-11-30 07:11:45 | Epoch: 0 | Step: 333920 | Dataset: 0-6461160 | Loss: 0.227 | 597 ms/step , 115547.98 GFLOP/s , 173695.9 tokens/s INFO:__main__:2024-11-30 07:11:52 | Epoch: 0 | Step: 333930 | Dataset: 0-6463560 | Loss: 0.230 | 597 ms/step , 115565.03 GFLOP/s , 173705.6 tokens/s INFO:__main__:2024-11-30 07:11:59 | Epoch: 0 | Step: 333940 | Dataset: 0-6465960 | Loss: 0.230 | 597 ms/step , 115558.99 GFLOP/s , 173659.1 tokens/s INFO:__main__:2024-11-30 07:12:07 | Epoch: 0 | Step: 333950 | Dataset: 0-6468360 | Loss: 0.247 | 596 ms/step , 115707.41 GFLOP/s , 173686.0 tokens/s INFO:__main__:2024-11-30 07:12:14 | Epoch: 0 | Step: 333960 | Dataset: 0-6470760 | Loss: 0.234 | 597 ms/step , 115566.13 GFLOP/s , 173720.2 tokens/s INFO:__main__:2024-11-30 07:12:21 | Epoch: 0 | Step: 333970 | Dataset: 0-6473160 | Loss: 0.414 | 596 ms/step , 115734.39 GFLOP/s , 173778.2 tokens/s INFO:__main__:2024-11-30 07:12:28 | Epoch: 0 | Step: 333980 | Dataset: 0-6475560 | Loss: 0.408 | 597 ms/step , 115658.86 GFLOP/s , 173718.3 tokens/s INFO:__main__:2024-11-30 07:12:35 | Epoch: 0 | Step: 333990 | Dataset: 0-6477960 | Loss: 0.357 | 597 ms/step , 115664.92 GFLOP/s , 173638.2 tokens/s INFO:__main__:2024-11-30 07:12:42 | Validation | Step: 334000 | Val_loss: 0.827 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 07:12:42 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_071242_step_334000.pt` INFO:__main__:2024-11-30 07:12:45 | Epoch: 0 | Step: 334000 | Dataset: 0-6480360 | Loss: 
0.362 | 595 ms/step , 116064.42 GFLOP/s , 117060.1 tokens/s INFO:__main__:2024-11-30 07:12:52 | Epoch: 0 | Step: 334010 | Dataset: 0-6482760 | Loss: 0.358 | 599 ms/step , 115265.47 GFLOP/s , 173342.4 tokens/s INFO:__main__:2024-11-30 07:13:00 | Epoch: 0 | Step: 334020 | Dataset: 0-6485160 | Loss: 0.349 | 597 ms/step , 115525.80 GFLOP/s , 173258.5 tokens/s INFO:__main__:2024-11-30 07:13:07 | Epoch: 0 | Step: 334030 | Dataset: 0-6487560 | Loss: 0.322 | 596 ms/step , 115779.62 GFLOP/s , 173363.3 tokens/s INFO:__main__:2024-11-30 07:13:14 | Epoch: 0 | Step: 334040 | Dataset: 0-6489960 | Loss: 0.285 | 597 ms/step , 115641.05 GFLOP/s , 173401.8 tokens/s INFO:__main__:2024-11-30 07:13:21 | Epoch: 0 | Step: 334050 | Dataset: 0-6492360 | Loss: 0.333 | 597 ms/step , 115691.46 GFLOP/s , 173250.0 tokens/s INFO:__main__:2024-11-30 07:13:28 | Epoch: 0 | Step: 334060 | Dataset: 0-6494760 | Loss: 0.356 | 596 ms/step , 115722.17 GFLOP/s , 173286.6 tokens/s INFO:__main__:2024-11-30 07:13:35 | Epoch: 0 | Step: 334070 | Dataset: 0-6497160 | Loss: 0.367 | 597 ms/step , 115579.41 GFLOP/s , 173756.0 tokens/s INFO:__main__:2024-11-30 07:13:42 | Epoch: 0 | Step: 334080 | Dataset: 0-6499560 | Loss: 0.305 | 597 ms/step , 115650.84 GFLOP/s , 173762.8 tokens/s INFO:__main__:2024-11-30 07:13:49 | Epoch: 0 | Step: 334090 | Dataset: 0-6501960 | Loss: 0.299 | 597 ms/step , 115524.40 GFLOP/s , 173691.1 tokens/s INFO:__main__:2024-11-30 07:13:56 | Epoch: 0 | Step: 334100 | Dataset: 0-6504360 | Loss: 0.434 | 597 ms/step , 115667.46 GFLOP/s , 173702.0 tokens/s INFO:__main__:2024-11-30 07:14:03 | Epoch: 0 | Step: 334110 | Dataset: 0-6506760 | Loss: 0.313 | 596 ms/step , 115772.73 GFLOP/s , 173872.4 tokens/s INFO:__main__:2024-11-30 07:14:10 | Epoch: 0 | Step: 334120 | Dataset: 0-6509160 | Loss: 0.310 | 597 ms/step , 115665.45 GFLOP/s , 173715.8 tokens/s INFO:__main__:2024-11-30 07:14:17 | Epoch: 0 | Step: 334130 | Dataset: 0-6511560 | Loss: 0.293 | 597 ms/step , 115628.97 GFLOP/s , 173666.1 tokens/s 
INFO:__main__:2024-11-30 07:14:24 | Epoch: 0 | Step: 334140 | Dataset: 0-6513960 | Loss: 0.317 | 597 ms/step , 115601.00 GFLOP/s , 173701.0 tokens/s INFO:__main__:2024-11-30 07:14:32 | Epoch: 0 | Step: 334150 | Dataset: 0-6516360 | Loss: 0.292 | 597 ms/step , 115572.34 GFLOP/s , 173650.1 tokens/s INFO:__main__:2024-11-30 07:14:39 | Epoch: 0 | Step: 334160 | Dataset: 0-6518760 | Loss: 0.251 | 597 ms/step , 115584.82 GFLOP/s , 173642.5 tokens/s INFO:__main__:2024-11-30 07:14:46 | Epoch: 0 | Step: 334170 | Dataset: 0-6521160 | Loss: 0.291 | 597 ms/step , 115625.36 GFLOP/s , 173628.3 tokens/s INFO:__main__:2024-11-30 07:14:53 | Epoch: 0 | Step: 334180 | Dataset: 0-6523560 | Loss: 0.636 | 596 ms/step , 115712.99 GFLOP/s , 173751.0 tokens/s INFO:__main__:2024-11-30 07:15:00 | Epoch: 0 | Step: 334190 | Dataset: 0-6525960 | Loss: 0.630 | 597 ms/step , 115607.80 GFLOP/s , 173729.2 tokens/s INFO:__main__:2024-11-30 07:15:07 | Epoch: 0 | Step: 334200 | Dataset: 0-6528360 | Loss: 0.597 | 597 ms/step , 115533.54 GFLOP/s , 173666.7 tokens/s INFO:__main__:2024-11-30 07:15:14 | Epoch: 0 | Step: 334210 | Dataset: 0-6530760 | Loss: 0.547 | 597 ms/step , 115550.59 GFLOP/s , 173571.9 tokens/s INFO:__main__:2024-11-30 07:15:21 | Epoch: 0 | Step: 334220 | Dataset: 0-6533160 | Loss: 0.548 | 597 ms/step , 115653.33 GFLOP/s , 173583.2 tokens/s INFO:__main__:2024-11-30 07:15:28 | Epoch: 0 | Step: 334230 | Dataset: 0-6535560 | Loss: 0.579 | 597 ms/step , 115653.16 GFLOP/s , 173619.2 tokens/s INFO:__main__:2024-11-30 07:15:35 | Epoch: 0 | Step: 334240 | Dataset: 0-6537960 | Loss: 0.600 | 597 ms/step , 115558.33 GFLOP/s , 173598.3 tokens/s INFO:__main__:2024-11-30 07:15:42 | Epoch: 0 | Step: 334250 | Dataset: 0-6540360 | Loss: 0.554 | 596 ms/step , 115697.82 GFLOP/s , 173656.0 tokens/s INFO:__main__:2024-11-30 07:15:49 | Epoch: 0 | Step: 334260 | Dataset: 0-6542760 | Loss: 0.561 | 597 ms/step , 115555.38 GFLOP/s , 173714.1 tokens/s INFO:__main__:2024-11-30 07:15:56 | Epoch: 0 | Step: 334270 | 
Dataset: 0-6545160 | Loss: 0.562 | 597 ms/step , 115685.02 GFLOP/s , 173657.4 tokens/s INFO:__main__:2024-11-30 07:16:04 | Epoch: 0 | Step: 334280 | Dataset: 0-6547560 | Loss: 0.558 | 597 ms/step , 115637.15 GFLOP/s , 173625.8 tokens/s INFO:__main__:2024-11-30 07:16:11 | Epoch: 0 | Step: 334290 | Dataset: 0-6549960 | Loss: 0.563 | 597 ms/step , 115656.67 GFLOP/s , 173577.8 tokens/s INFO:__main__:2024-11-30 07:16:18 | Epoch: 0 | Step: 334300 | Dataset: 0-6552360 | Loss: 0.590 | 597 ms/step , 115535.95 GFLOP/s , 173590.9 tokens/s INFO:__main__:2024-11-30 07:16:25 | Epoch: 0 | Step: 334310 | Dataset: 0-6554760 | Loss: 0.417 | 597 ms/step , 115560.07 GFLOP/s , 173598.1 tokens/s INFO:__main__:2024-11-30 07:16:32 | Epoch: 0 | Step: 334320 | Dataset: 0-6557160 | Loss: 0.452 | 597 ms/step , 115621.40 GFLOP/s , 173529.3 tokens/s INFO:__main__:2024-11-30 07:16:39 | Epoch: 0 | Step: 334330 | Dataset: 0-6559560 | Loss: 0.362 | 596 ms/step , 115696.46 GFLOP/s , 173692.2 tokens/s INFO:__main__:2024-11-30 07:16:46 | Epoch: 0 | Step: 334340 | Dataset: 0-6561960 | Loss: 0.370 | 597 ms/step , 115529.42 GFLOP/s , 173663.9 tokens/s INFO:__main__:2024-11-30 07:16:53 | Epoch: 0 | Step: 334350 | Dataset: 0-6564360 | Loss: 0.376 | 599 ms/step , 115302.68 GFLOP/s , 173585.1 tokens/s INFO:__main__:2024-11-30 07:17:00 | Epoch: 0 | Step: 334360 | Dataset: 0-6566760 | Loss: 0.371 | 597 ms/step , 115504.38 GFLOP/s , 173594.9 tokens/s INFO:__main__:2024-11-30 07:17:07 | Epoch: 0 | Step: 334370 | Dataset: 0-6569160 | Loss: 0.403 | 597 ms/step , 115587.31 GFLOP/s , 173588.2 tokens/s INFO:__main__:2024-11-30 07:17:14 | Epoch: 0 | Step: 334380 | Dataset: 0-6571560 | Loss: 0.336 | 597 ms/step , 115588.96 GFLOP/s , 173601.7 tokens/s INFO:__main__:2024-11-30 07:17:21 | Epoch: 0 | Step: 334390 | Dataset: 0-6573960 | Loss: 0.373 | 597 ms/step , 115640.25 GFLOP/s , 173596.4 tokens/s INFO:__main__:2024-11-30 07:17:28 | Epoch: 0 | Step: 334400 | Dataset: 0-6576360 | Loss: 0.376 | 596 ms/step , 115769.79 
GFLOP/s , 173633.6 tokens/s INFO:__main__:2024-11-30 07:17:36 | Epoch: 0 | Step: 334410 | Dataset: 0-6578760 | Loss: 0.364 | 596 ms/step , 115842.57 GFLOP/s , 173730.3 tokens/s INFO:__main__:2024-11-30 07:17:43 | Epoch: 0 | Step: 334420 | Dataset: 0-6581160 | Loss: 0.412 | 597 ms/step , 115568.00 GFLOP/s , 173558.5 tokens/s INFO:__main__:2024-11-30 07:17:50 | Epoch: 0 | Step: 334430 | Dataset: 0-6583560 | Loss: 0.369 | 597 ms/step , 115679.59 GFLOP/s , 173587.2 tokens/s INFO:__main__:2024-11-30 07:17:57 | Epoch: 0 | Step: 334440 | Dataset: 0-6585960 | Loss: 0.379 | 597 ms/step , 115559.78 GFLOP/s , 173606.8 tokens/s INFO:__main__:2024-11-30 07:18:04 | Epoch: 0 | Step: 334450 | Dataset: 0-6588360 | Loss: 0.357 | 597 ms/step , 115665.79 GFLOP/s , 173627.5 tokens/s INFO:__main__:2024-11-30 07:18:11 | Epoch: 0 | Step: 334460 | Dataset: 0-6590760 | Loss: 0.347 | 598 ms/step , 115457.99 GFLOP/s , 173585.7 tokens/s INFO:__main__:2024-11-30 07:18:18 | Epoch: 0 | Step: 334470 | Dataset: 0-6593160 | Loss: 0.391 | 597 ms/step , 115692.37 GFLOP/s , 173582.6 tokens/s INFO:__main__:2024-11-30 07:18:25 | Epoch: 0 | Step: 334480 | Dataset: 0-6595560 | Loss: 0.377 | 597 ms/step , 115592.85 GFLOP/s , 173718.2 tokens/s INFO:__main__:2024-11-30 07:18:32 | Epoch: 0 | Step: 334490 | Dataset: 0-6597960 | Loss: 0.376 | 597 ms/step , 115613.12 GFLOP/s , 173751.8 tokens/s INFO:__main__:2024-11-30 07:18:40 | Validation | Step: 334500 | Val_loss: 0.825 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 07:18:40 | Epoch: 0 | Step: 334500 | Dataset: 0-6600360 | Loss: 0.379 | 596 ms/step , 115754.11 GFLOP/s , 147598.3 tokens/s INFO:__main__:2024-11-30 07:18:48 | Epoch: 0 | Step: 334510 | Dataset: 0-6602760 | Loss: 0.350 | 597 ms/step , 115561.94 GFLOP/s , 173644.5 tokens/s INFO:__main__:2024-11-30 07:18:55 | Epoch: 0 | Step: 334520 | Dataset: 0-6605160 | Loss: 0.346 | 597 ms/step , 115557.37 GFLOP/s , 173569.8 tokens/s INFO:__main__:2024-11-30 07:19:02 | Epoch: 0 | Step: 334530 | Dataset: 
0-6607560 | Loss: 0.344 | 598 ms/step , 115356.51 GFLOP/s , 173669.6 tokens/s INFO:__main__:2024-11-30 07:19:09 | Epoch: 0 | Step: 334540 | Dataset: 0-6609960 | Loss: 0.334 | 597 ms/step , 115567.04 GFLOP/s , 173584.5 tokens/s INFO:__main__:2024-11-30 07:19:16 | Epoch: 0 | Step: 334550 | Dataset: 0-6612360 | Loss: 0.369 | 596 ms/step , 115824.09 GFLOP/s , 173672.7 tokens/s INFO:__main__:2024-11-30 07:19:23 | Epoch: 0 | Step: 334560 | Dataset: 0-6614760 | Loss: 0.423 | 597 ms/step , 115538.18 GFLOP/s , 173782.7 tokens/s INFO:__main__:2024-11-30 07:19:30 | Epoch: 0 | Step: 334570 | Dataset: 0-6617160 | Loss: 0.416 | 597 ms/step , 115632.03 GFLOP/s , 173631.1 tokens/s INFO:__main__:2024-11-30 07:19:37 | Epoch: 0 | Step: 334580 | Dataset: 0-6619560 | Loss: 0.419 | 598 ms/step , 115460.17 GFLOP/s , 173629.8 tokens/s INFO:__main__:2024-11-30 07:19:44 | Epoch: 0 | Step: 334590 | Dataset: 0-6621960 | Loss: 0.388 | 598 ms/step , 115419.62 GFLOP/s , 173575.7 tokens/s INFO:__main__:2024-11-30 07:19:51 | Epoch: 0 | Step: 334600 | Dataset: 0-6624360 | Loss: 0.408 | 598 ms/step , 115442.11 GFLOP/s , 173556.3 tokens/s INFO:__main__:2024-11-30 07:19:58 | Epoch: 0 | Step: 334610 | Dataset: 0-6626760 | Loss: 0.353 | 597 ms/step , 115578.08 GFLOP/s , 173616.1 tokens/s INFO:__main__:2024-11-30 07:20:05 | Epoch: 0 | Step: 334620 | Dataset: 0-6629160 | Loss: 0.385 | 597 ms/step , 115505.80 GFLOP/s , 173637.1 tokens/s INFO:__main__:2024-11-30 07:20:12 | Epoch: 0 | Step: 334630 | Dataset: 0-6631560 | Loss: 0.379 | 596 ms/step , 115696.70 GFLOP/s , 173822.3 tokens/s INFO:__main__:2024-11-30 07:20:20 | Epoch: 0 | Step: 334640 | Dataset: 0-6633960 | Loss: 0.384 | 597 ms/step , 115576.21 GFLOP/s , 173753.6 tokens/s INFO:__main__:2024-11-30 07:20:27 | Epoch: 0 | Step: 334650 | Dataset: 0-6636360 | Loss: 0.335 | 597 ms/step , 115663.32 GFLOP/s , 173582.9 tokens/s INFO:__main__:2024-11-30 07:20:34 | Epoch: 0 | Step: 334660 | Dataset: 0-6638760 | Loss: 0.378 | 597 ms/step , 115529.63 GFLOP/s , 
173576.1 tokens/s INFO:__main__:2024-11-30 07:20:41 | Epoch: 0 | Step: 334670 | Dataset: 0-6641160 | Loss: 0.408 | 597 ms/step , 115637.96 GFLOP/s , 173647.5 tokens/s INFO:__main__:2024-11-30 07:20:48 | Epoch: 0 | Step: 334680 | Dataset: 0-6643560 | Loss: 0.390 | 597 ms/step , 115534.57 GFLOP/s , 173576.2 tokens/s INFO:__main__:2024-11-30 07:20:55 | Epoch: 0 | Step: 334690 | Dataset: 0-6645960 | Loss: 0.338 | 597 ms/step , 115600.22 GFLOP/s , 173519.3 tokens/s INFO:__main__:2024-11-30 07:21:02 | Epoch: 0 | Step: 334700 | Dataset: 0-6648360 | Loss: 0.363 | 596 ms/step , 115742.89 GFLOP/s , 173647.3 tokens/s INFO:__main__:2024-11-30 07:21:09 | Epoch: 0 | Step: 334710 | Dataset: 0-6650760 | Loss: 0.353 | 597 ms/step , 115619.59 GFLOP/s , 173658.9 tokens/s INFO:__main__:2024-11-30 07:21:16 | Epoch: 0 | Step: 334720 | Dataset: 0-6653160 | Loss: 0.321 | 597 ms/step , 115622.02 GFLOP/s , 173583.2 tokens/s INFO:__main__:2024-11-30 07:21:23 | Epoch: 0 | Step: 334730 | Dataset: 0-6655560 | Loss: 0.435 | 598 ms/step , 115480.81 GFLOP/s , 173667.9 tokens/s INFO:__main__:2024-11-30 07:21:30 | Epoch: 0 | Step: 334740 | Dataset: 0-6657960 | Loss: 0.374 | 598 ms/step , 115494.83 GFLOP/s , 173635.5 tokens/s INFO:__main__:2024-11-30 07:21:37 | Epoch: 0 | Step: 334750 | Dataset: 0-6660360 | Loss: 0.355 | 597 ms/step , 115534.35 GFLOP/s , 173581.1 tokens/s INFO:__main__:2024-11-30 07:21:44 | Epoch: 0 | Step: 334760 | Dataset: 0-6662760 | Loss: 0.365 | 597 ms/step , 115614.32 GFLOP/s , 173616.3 tokens/s INFO:__main__:2024-11-30 07:21:52 | Epoch: 0 | Step: 334770 | Dataset: 0-6665160 | Loss: 0.357 | 596 ms/step , 115747.82 GFLOP/s , 173614.2 tokens/s INFO:__main__:2024-11-30 07:21:59 | Epoch: 0 | Step: 334780 | Dataset: 0-6667560 | Loss: 0.344 | 596 ms/step , 115790.58 GFLOP/s , 173607.6 tokens/s INFO:__main__:2024-11-30 07:22:06 | Epoch: 0 | Step: 334790 | Dataset: 0-6669960 | Loss: 0.311 | 598 ms/step , 115457.57 GFLOP/s , 173552.3 tokens/s INFO:__main__:2024-11-30 07:22:13 | Epoch: 0 
| Step: 334800 | Dataset: 0-6672360 | Loss: 0.378 | 597 ms/step , 115552.95 GFLOP/s , 173665.2 tokens/s INFO:__main__:2024-11-30 07:22:20 | Epoch: 0 | Step: 334810 | Dataset: 0-6674760 | Loss: 0.336 | 598 ms/step , 115423.96 GFLOP/s , 173584.9 tokens/s INFO:__main__:2024-11-30 07:22:27 | Epoch: 0 | Step: 334820 | Dataset: 0-6677160 | Loss: 0.394 | 597 ms/step , 115545.81 GFLOP/s , 173576.9 tokens/s INFO:__main__:2024-11-30 07:22:34 | Epoch: 0 | Step: 334830 | Dataset: 0-6679560 | Loss: 0.387 | 597 ms/step , 115595.85 GFLOP/s , 173561.0 tokens/s INFO:__main__:2024-11-30 07:22:41 | Epoch: 0 | Step: 334840 | Dataset: 0-6681960 | Loss: 0.384 | 596 ms/step , 115715.82 GFLOP/s , 173590.7 tokens/s INFO:__main__:2024-11-30 07:22:48 | Epoch: 0 | Step: 334850 | Dataset: 0-6684360 | Loss: 0.354 | 596 ms/step , 115767.38 GFLOP/s , 173774.7 tokens/s INFO:__main__:2024-11-30 07:22:55 | Epoch: 0 | Step: 334860 | Dataset: 0-6686760 | Loss: 0.971 | 598 ms/step , 115445.85 GFLOP/s , 173630.2 tokens/s INFO:__main__:2024-11-30 07:23:02 | Epoch: 0 | Step: 334870 | Dataset: 0-6689160 | Loss: 1.038 | 597 ms/step , 115550.08 GFLOP/s , 173380.5 tokens/s INFO:__main__:2024-11-30 07:23:09 | Epoch: 0 | Step: 334880 | Dataset: 0-6691560 | Loss: 1.090 | 598 ms/step , 115445.66 GFLOP/s , 173387.1 tokens/s INFO:__main__:2024-11-30 07:23:17 | Epoch: 0 | Step: 334890 | Dataset: 0-6693960 | Loss: 1.038 | 598 ms/step , 115366.92 GFLOP/s , 173415.3 tokens/s INFO:__main__:2024-11-30 07:23:24 | Epoch: 0 | Step: 334900 | Dataset: 0-6696360 | Loss: 1.170 | 598 ms/step , 115451.97 GFLOP/s , 173406.4 tokens/s INFO:__main__:2024-11-30 07:23:31 | Epoch: 0 | Step: 334910 | Dataset: 0-6698760 | Loss: 1.061 | 597 ms/step , 115507.72 GFLOP/s , 173456.0 tokens/s INFO:__main__:2024-11-30 07:23:38 | Epoch: 0 | Step: 334920 | Dataset: 0-6701160 | Loss: 1.027 | 599 ms/step , 115120.27 GFLOP/s , 173474.3 tokens/s INFO:__main__:2024-11-30 07:23:45 | Epoch: 0 | Step: 334930 | Dataset: 0-6703560 | Loss: 1.075 | 598 
ms/step , 115433.78 GFLOP/s , 173536.0 tokens/s INFO:__main__:2024-11-30 07:23:52 | Epoch: 0 | Step: 334940 | Dataset: 0-6705960 | Loss: 0.841 | 598 ms/step , 115380.09 GFLOP/s , 173490.2 tokens/s INFO:__main__:2024-11-30 07:23:59 | Epoch: 0 | Step: 334950 | Dataset: 0-6708360 | Loss: 1.009 | 598 ms/step , 115469.68 GFLOP/s , 173404.7 tokens/s INFO:__main__:2024-11-30 07:24:06 | Epoch: 0 | Step: 334960 | Dataset: 0-6710760 | Loss: 0.877 | 597 ms/step , 115539.17 GFLOP/s , 173402.6 tokens/s INFO:__main__:2024-11-30 07:24:13 | Epoch: 0 | Step: 334970 | Dataset: 0-6713160 | Loss: 0.841 | 598 ms/step , 115398.13 GFLOP/s , 173398.1 tokens/s INFO:__main__:2024-11-30 07:24:20 | Epoch: 0 | Step: 334980 | Dataset: 0-6715560 | Loss: 0.981 | 598 ms/step , 115419.71 GFLOP/s , 173402.0 tokens/s INFO:__main__:2024-11-30 07:24:27 | Epoch: 0 | Step: 334990 | Dataset: 0-6717960 | Loss: 0.905 | 598 ms/step , 115502.07 GFLOP/s , 173409.4 tokens/s INFO:__main__:2024-11-30 07:24:35 | Validation | Step: 335000 | Val_loss: 0.801 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 07:24:35 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_072435_step_335000.pt` INFO:__main__:2024-11-30 07:24:38 | Epoch: 0 | Step: 335000 | Dataset: 0-6720360 | Loss: 0.987 | 596 ms/step , 115856.71 GFLOP/s , 120314.1 tokens/s INFO:__main__:2024-11-30 07:24:45 | Epoch: 0 | Step: 335010 | Dataset: 0-6722760 | Loss: 1.274 | 599 ms/step , 115120.68 GFLOP/s , 173085.3 tokens/s INFO:__main__:2024-11-30 07:24:52 | Epoch: 0 | Step: 335020 | Dataset: 0-6725160 | Loss: 1.215 | 597 ms/step , 115574.66 GFLOP/s , 173164.0 tokens/s INFO:__main__:2024-11-30 07:24:59 | Epoch: 0 | Step: 335030 | Dataset: 0-6727560 | Loss: 0.305 | 597 ms/step , 115627.46 GFLOP/s , 173312.7 tokens/s INFO:__main__:2024-11-30 07:25:06 | Epoch: 0 | Step: 335040 | Dataset: 0-6729960 | Loss: 0.375 | 597 ms/step , 115626.63 GFLOP/s , 173274.5 tokens/s INFO:__main__:2024-11-30 07:25:13 | Epoch: 0 | Step: 335050 | 
Dataset: 0-6732360 | Loss: 0.372 | 597 ms/step , 115512.20 GFLOP/s , 173397.6 tokens/s INFO:__main__:2024-11-30 07:25:20 | Epoch: 0 | Step: 335060 | Dataset: 0-6734760 | Loss: 0.313 | 597 ms/step , 115623.12 GFLOP/s , 173693.8 tokens/s INFO:__main__:2024-11-30 07:25:27 | Epoch: 0 | Step: 335070 | Dataset: 0-6737160 | Loss: 0.370 | 596 ms/step , 115718.14 GFLOP/s , 173841.1 tokens/s INFO:__main__:2024-11-30 07:25:34 | Epoch: 0 | Step: 335080 | Dataset: 0-6739560 | Loss: 0.307 | 597 ms/step , 115639.50 GFLOP/s , 173945.9 tokens/s INFO:__main__:2024-11-30 07:25:41 | Epoch: 0 | Step: 335090 | Dataset: 0-6741960 | Loss: 0.350 | 597 ms/step , 115635.50 GFLOP/s , 173693.6 tokens/s INFO:__main__:2024-11-30 07:25:48 | Epoch: 0 | Step: 335100 | Dataset: 0-6744360 | Loss: 0.354 | 597 ms/step , 115667.31 GFLOP/s , 173695.2 tokens/s INFO:__main__:2024-11-30 07:25:55 | Epoch: 0 | Step: 335110 | Dataset: 0-6746760 | Loss: 0.322 | 597 ms/step , 115642.33 GFLOP/s , 173667.1 tokens/s INFO:__main__:2024-11-30 07:26:03 | Epoch: 0 | Step: 335120 | Dataset: 0-6749160 | Loss: 0.317 | 597 ms/step , 115605.94 GFLOP/s , 173655.5 tokens/s INFO:__main__:2024-11-30 07:26:10 | Epoch: 0 | Step: 335130 | Dataset: 0-6751560 | Loss: 0.305 | 597 ms/step , 115603.61 GFLOP/s , 173668.0 tokens/s INFO:__main__:2024-11-30 07:26:17 | Epoch: 0 | Step: 335140 | Dataset: 0-6753960 | Loss: 0.336 | 597 ms/step , 115532.05 GFLOP/s , 173757.7 tokens/s INFO:__main__:2024-11-30 07:26:24 | Epoch: 0 | Step: 335150 | Dataset: 0-6756360 | Loss: 0.337 | 597 ms/step , 115551.28 GFLOP/s , 173820.9 tokens/s INFO:__main__:2024-11-30 07:26:31 | Epoch: 0 | Step: 335160 | Dataset: 0-6758760 | Loss: 0.291 | 597 ms/step , 115524.44 GFLOP/s , 173730.1 tokens/s INFO:__main__:2024-11-30 07:26:38 | Epoch: 0 | Step: 335170 | Dataset: 0-6761160 | Loss: 0.326 | 597 ms/step , 115603.82 GFLOP/s , 173664.4 tokens/s INFO:__main__:2024-11-30 07:26:45 | Epoch: 0 | Step: 335180 | Dataset: 0-6763560 | Loss: 0.324 | 597 ms/step , 115691.83 
GFLOP/s , 173703.6 tokens/s INFO:__main__:2024-11-30 07:26:52 | Epoch: 0 | Step: 335190 | Dataset: 0-6765960 | Loss: 0.340 | 597 ms/step , 115599.10 GFLOP/s , 173582.5 tokens/s INFO:__main__:2024-11-30 07:26:59 | Epoch: 0 | Step: 335200 | Dataset: 0-6768360 | Loss: 0.358 | 597 ms/step , 115543.82 GFLOP/s , 173655.2 tokens/s INFO:__main__:2024-11-30 07:27:06 | Epoch: 0 | Step: 335210 | Dataset: 0-6770760 | Loss: 0.338 | 597 ms/step , 115586.44 GFLOP/s , 173628.3 tokens/s INFO:__main__:2024-11-30 07:27:13 | Epoch: 0 | Step: 335220 | Dataset: 0-6773160 | Loss: 0.309 | 596 ms/step , 115755.69 GFLOP/s , 173802.1 tokens/s INFO:__main__:2024-11-30 07:27:20 | Epoch: 0 | Step: 335230 | Dataset: 0-6775560 | Loss: 0.353 | 597 ms/step , 115547.07 GFLOP/s , 173705.4 tokens/s INFO:__main__:2024-11-30 07:27:27 | Epoch: 0 | Step: 335240 | Dataset: 0-6777960 | Loss: 0.315 | 597 ms/step , 115567.99 GFLOP/s , 173634.4 tokens/s INFO:__main__:2024-11-30 07:27:35 | Epoch: 0 | Step: 335250 | Dataset: 0-6780360 | Loss: 0.297 | 597 ms/step , 115588.51 GFLOP/s , 173591.1 tokens/s INFO:__main__:2024-11-30 07:27:42 | Epoch: 0 | Step: 335260 | Dataset: 0-6782760 | Loss: 0.314 | 597 ms/step , 115597.63 GFLOP/s , 173590.6 tokens/s INFO:__main__:2024-11-30 07:27:49 | Epoch: 0 | Step: 335270 | Dataset: 0-6785160 | Loss: 0.313 | 597 ms/step , 115508.47 GFLOP/s , 173545.2 tokens/s INFO:__main__:2024-11-30 07:27:56 | Epoch: 0 | Step: 335280 | Dataset: 0-6787560 | Loss: 0.405 | 597 ms/step , 115535.20 GFLOP/s , 173689.2 tokens/s INFO:__main__:2024-11-30 07:28:03 | Epoch: 0 | Step: 335290 | Dataset: 0-6789960 | Loss: 0.398 | 596 ms/step , 115849.35 GFLOP/s , 173700.8 tokens/s INFO:__main__:2024-11-30 07:28:10 | Epoch: 0 | Step: 335300 | Dataset: 0-6792360 | Loss: 0.352 | 597 ms/step , 115628.68 GFLOP/s , 173834.8 tokens/s INFO:__main__:2024-11-30 07:28:17 | Epoch: 0 | Step: 335310 | Dataset: 0-6794760 | Loss: 0.341 | 597 ms/step , 115642.75 GFLOP/s , 173707.1 tokens/s INFO:__main__:2024-11-30 07:28:24 
| Epoch: 0 | Step: 335320 | Dataset: 0-6797160 | Loss: 0.300 | 597 ms/step , 115650.85 GFLOP/s , 173670.9 tokens/s INFO:__main__:2024-11-30 07:28:31 | Epoch: 0 | Step: 335330 | Dataset: 0-6799560 | Loss: 0.361 | 597 ms/step , 115536.21 GFLOP/s , 173615.5 tokens/s INFO:__main__:2024-11-30 07:28:38 | Epoch: 0 | Step: 335340 | Dataset: 0-6801960 | Loss: 0.344 | 597 ms/step , 115653.02 GFLOP/s , 173625.3 tokens/s INFO:__main__:2024-11-30 07:28:45 | Epoch: 0 | Step: 335350 | Dataset: 0-6804360 | Loss: 0.334 | 598 ms/step , 115411.74 GFLOP/s , 173651.9 tokens/s INFO:__main__:2024-11-30 07:28:52 | Epoch: 0 | Step: 335360 | Dataset: 0-6806760 | Loss: 0.317 | 598 ms/step , 115462.00 GFLOP/s , 173656.5 tokens/s INFO:__main__:2024-11-30 07:28:59 | Epoch: 0 | Step: 335370 | Dataset: 0-6809160 | Loss: 0.286 | 596 ms/step , 115708.33 GFLOP/s , 173790.1 tokens/s INFO:__main__:2024-11-30 07:29:06 | Epoch: 0 | Step: 335380 | Dataset: 0-6811560 | Loss: 0.346 | 597 ms/step , 115540.71 GFLOP/s , 173694.6 tokens/s INFO:__main__:2024-11-30 07:29:14 | Epoch: 0 | Step: 335390 | Dataset: 0-6813960 | Loss: 0.357 | 597 ms/step , 115573.03 GFLOP/s , 173622.3 tokens/s INFO:__main__:2024-11-30 07:29:21 | Epoch: 0 | Step: 335400 | Dataset: 0-6816360 | Loss: 0.652 | 598 ms/step , 115479.93 GFLOP/s , 173604.9 tokens/s INFO:__main__:2024-11-30 07:29:28 | Epoch: 0 | Step: 335410 | Dataset: 0-6818760 | Loss: 0.586 | 598 ms/step , 115436.90 GFLOP/s , 173447.8 tokens/s INFO:__main__:2024-11-30 07:29:35 | Epoch: 0 | Step: 335420 | Dataset: 0-6821160 | Loss: 0.408 | 596 ms/step , 115763.40 GFLOP/s , 173678.9 tokens/s INFO:__main__:2024-11-30 07:29:42 | Epoch: 0 | Step: 335430 | Dataset: 0-6823560 | Loss: 1.523 | 598 ms/step , 115435.57 GFLOP/s , 173558.8 tokens/s INFO:__main__:2024-11-30 07:29:49 | Epoch: 0 | Step: 335440 | Dataset: 0-6825960 | Loss: 0.293 | 596 ms/step , 115725.03 GFLOP/s , 173795.2 tokens/s INFO:__main__:2024-11-30 07:29:56 | Epoch: 0 | Step: 335450 | Dataset: 0-6828360 | Loss: 0.330 | 
596 ms/step , 115748.93 GFLOP/s , 173710.7 tokens/s INFO:__main__:2024-11-30 07:30:03 | Epoch: 0 | Step: 335460 | Dataset: 0-6830760 | Loss: 1.237 | 598 ms/step , 115362.72 GFLOP/s , 173553.2 tokens/s INFO:__main__:2024-11-30 07:30:10 | Epoch: 0 | Step: 335470 | Dataset: 0-6833160 | Loss: 1.124 | 597 ms/step , 115528.90 GFLOP/s , 173464.7 tokens/s INFO:__main__:2024-11-30 07:30:17 | Epoch: 0 | Step: 335480 | Dataset: 0-6835560 | Loss: 0.285 | 597 ms/step , 115620.52 GFLOP/s , 173549.6 tokens/s INFO:__main__:2024-11-30 07:30:24 | Epoch: 0 | Step: 335490 | Dataset: 0-6837960 | Loss: 1.131 | 597 ms/step , 115551.68 GFLOP/s , 173618.4 tokens/s INFO:__main__:2024-11-30 07:30:32 | Validation | Step: 335500 | Val_loss: 0.848 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 07:30:33 | Epoch: 0 | Step: 335500 | Dataset: 0-6840360 | Loss: 0.867 | 595 ms/step , 115940.78 GFLOP/s , 147685.9 tokens/s INFO:__main__:2024-11-30 07:30:40 | Epoch: 0 | Step: 335510 | Dataset: 0-6842760 | Loss: 0.386 | 596 ms/step , 115729.88 GFLOP/s , 173777.8 tokens/s INFO:__main__:2024-11-30 07:30:47 | Epoch: 0 | Step: 335520 | Dataset: 0-6845160 | Loss: 1.274 | 598 ms/step , 115356.58 GFLOP/s , 173735.6 tokens/s INFO:__main__:2024-11-30 07:30:54 | Epoch: 0 | Step: 335530 | Dataset: 0-6847560 | Loss: 1.061 | 597 ms/step , 115556.20 GFLOP/s , 173743.4 tokens/s INFO:__main__:2024-11-30 07:31:01 | Epoch: 0 | Step: 335540 | Dataset: 0-6849960 | Loss: 0.125 | 596 ms/step , 115776.29 GFLOP/s , 173670.5 tokens/s INFO:__main__:2024-11-30 07:31:08 | Epoch: 0 | Step: 335550 | Dataset: 0-6852360 | Loss: 2.174 | 598 ms/step , 115354.19 GFLOP/s , 173674.7 tokens/s INFO:__main__:2024-11-30 07:31:15 | Epoch: 0 | Step: 335560 | Dataset: 0-6854760 | Loss: 0.425 | 597 ms/step , 115623.05 GFLOP/s , 173494.4 tokens/s INFO:__main__:2024-11-30 07:31:22 | Epoch: 0 | Step: 335570 | Dataset: 0-6857160 | Loss: 1.125 | 598 ms/step , 115460.61 GFLOP/s , 173533.4 tokens/s INFO:__main__:2024-11-30 07:31:29 | Epoch: 0 | Step: 
335580 | Dataset: 0-6859560 | Loss: 0.320 | 596 ms/step , 115697.79 GFLOP/s , 173671.9 tokens/s INFO:__main__:2024-11-30 07:31:36 | Epoch: 0 | Step: 335590 | Dataset: 0-6861960 | Loss: 0.613 | 596 ms/step , 115739.71 GFLOP/s , 173693.1 tokens/s INFO:__main__:2024-11-30 07:31:43 | Epoch: 0 | Step: 335600 | Dataset: 0-6864360 | Loss: 0.306 | 596 ms/step , 115758.93 GFLOP/s , 173797.9 tokens/s INFO:__main__:2024-11-30 07:31:51 | Epoch: 0 | Step: 335610 | Dataset: 0-6866760 | Loss: 0.543 | 597 ms/step , 115643.78 GFLOP/s , 173718.1 tokens/s INFO:__main__:2024-11-30 07:31:58 | Epoch: 0 | Step: 335620 | Dataset: 0-6869160 | Loss: 0.124 | 598 ms/step , 115454.30 GFLOP/s , 173723.3 tokens/s INFO:__main__:2024-11-30 07:32:05 | Epoch: 0 | Step: 335630 | Dataset: 0-6871560 | Loss: 1.451 | 598 ms/step , 115418.82 GFLOP/s , 173674.7 tokens/s INFO:__main__:2024-11-30 07:32:12 | Epoch: 0 | Step: 335640 | Dataset: 0-6873960 | Loss: 1.070 | 597 ms/step , 115549.09 GFLOP/s , 173664.0 tokens/s INFO:__main__:2024-11-30 07:32:19 | Epoch: 0 | Step: 335650 | Dataset: 0-6876360 | Loss: 0.113 | 597 ms/step , 115659.85 GFLOP/s , 173718.2 tokens/s INFO:__main__:2024-11-30 07:32:26 | Epoch: 0 | Step: 335660 | Dataset: 0-6878760 | Loss: 0.138 | 596 ms/step , 115763.06 GFLOP/s , 173850.0 tokens/s INFO:__main__:2024-11-30 07:32:33 | Epoch: 0 | Step: 335670 | Dataset: 0-6881160 | Loss: 0.999 | 598 ms/step , 115466.83 GFLOP/s , 173940.0 tokens/s INFO:__main__:2024-11-30 07:32:40 | Epoch: 0 | Step: 335680 | Dataset: 0-6883560 | Loss: 1.101 | 599 ms/step , 115277.49 GFLOP/s , 173738.2 tokens/s INFO:__main__:2024-11-30 07:32:47 | Epoch: 0 | Step: 335690 | Dataset: 0-6885960 | Loss: 0.217 | 597 ms/step , 115633.94 GFLOP/s , 173483.9 tokens/s INFO:__main__:2024-11-30 07:32:54 | Epoch: 0 | Step: 335700 | Dataset: 0-6888360 | Loss: 1.095 | 598 ms/step , 115466.17 GFLOP/s , 173617.8 tokens/s INFO:__main__:2024-11-30 07:33:01 | Epoch: 0 | Step: 335710 | Dataset: 0-6890760 | Loss: 0.207 | 597 ms/step , 
115555.94 GFLOP/s , 173556.6 tokens/s INFO:__main__:2024-11-30 07:33:08 | Epoch: 0 | Step: 335720 | Dataset: 0-6893160 | Loss: 0.089 | 597 ms/step , 115630.01 GFLOP/s , 173747.0 tokens/s INFO:__main__:2024-11-30 07:33:15 | Epoch: 0 | Step: 335730 | Dataset: 0-6895560 | Loss: 0.412 | 597 ms/step , 115554.12 GFLOP/s , 173689.5 tokens/s INFO:__main__:2024-11-30 07:33:22 | Epoch: 0 | Step: 335740 | Dataset: 0-6897960 | Loss: 0.294 | 596 ms/step , 115743.38 GFLOP/s , 173833.4 tokens/s INFO:__main__:2024-11-30 07:33:30 | Epoch: 0 | Step: 335750 | Dataset: 0-6900360 | Loss: 0.334 | 598 ms/step , 115353.77 GFLOP/s , 173792.1 tokens/s INFO:__main__:2024-11-30 07:33:37 | Epoch: 0 | Step: 335760 | Dataset: 0-6902760 | Loss: 0.352 | 597 ms/step , 115606.16 GFLOP/s , 173621.9 tokens/s INFO:__main__:2024-11-30 07:33:44 | Epoch: 0 | Step: 335770 | Dataset: 0-6905160 | Loss: 0.116 | 596 ms/step , 115738.44 GFLOP/s , 173746.0 tokens/s INFO:__main__:2024-11-30 07:33:51 | Epoch: 0 | Step: 335780 | Dataset: 0-6907560 | Loss: 1.229 | 598 ms/step , 115481.61 GFLOP/s , 173556.9 tokens/s INFO:__main__:2024-11-30 07:33:58 | Epoch: 0 | Step: 335790 | Dataset: 0-6909960 | Loss: 0.229 | 597 ms/step , 115618.62 GFLOP/s , 173613.0 tokens/s INFO:__main__:2024-11-30 07:34:05 | Epoch: 0 | Step: 335800 | Dataset: 0-6912360 | Loss: 0.333 | 597 ms/step , 115642.06 GFLOP/s , 173729.1 tokens/s INFO:__main__:2024-11-30 07:34:12 | Epoch: 0 | Step: 335810 | Dataset: 0-6914760 | Loss: 0.412 | 596 ms/step , 115842.66 GFLOP/s , 173753.4 tokens/s INFO:__main__:2024-11-30 07:34:19 | Epoch: 0 | Step: 335820 | Dataset: 0-6917160 | Loss: 0.387 | 597 ms/step , 115656.46 GFLOP/s , 173928.3 tokens/s INFO:__main__:2024-11-30 07:34:26 | Epoch: 0 | Step: 335830 | Dataset: 0-6919560 | Loss: 0.138 | 596 ms/step , 115729.88 GFLOP/s , 173785.3 tokens/s INFO:__main__:2024-11-30 07:34:33 | Epoch: 0 | Step: 335840 | Dataset: 0-6921960 | Loss: 0.128 | 598 ms/step , 115497.45 GFLOP/s , 173720.3 tokens/s INFO:__main__:2024-11-30 
07:34:40 | Epoch: 0 | Step: 335850 | Dataset: 0-6924360 | Loss: 0.105 | 596 ms/step , 115745.35 GFLOP/s , 173859.9 tokens/s INFO:__main__:2024-11-30 07:34:47 | Epoch: 0 | Step: 335860 | Dataset: 0-6926760 | Loss: 1.272 | 598 ms/step , 115441.38 GFLOP/s , 173516.0 tokens/s INFO:__main__:2024-11-30 07:34:54 | Epoch: 0 | Step: 335870 | Dataset: 0-6929160 | Loss: 1.273 | 598 ms/step , 115316.59 GFLOP/s , 173316.4 tokens/s INFO:__main__:2024-11-30 07:35:02 | Epoch: 0 | Step: 335880 | Dataset: 0-6931560 | Loss: 0.292 | 597 ms/step , 115618.31 GFLOP/s , 173539.8 tokens/s INFO:__main__:2024-11-30 07:35:09 | Epoch: 0 | Step: 335890 | Dataset: 0-6933960 | Loss: 0.284 | 596 ms/step , 115768.15 GFLOP/s , 173888.2 tokens/s INFO:__main__:2024-11-30 07:35:16 | Epoch: 0 | Step: 335900 | Dataset: 0-6936360 | Loss: 0.291 | 596 ms/step , 115705.54 GFLOP/s , 173848.8 tokens/s INFO:__main__:2024-11-30 07:35:23 | Epoch: 0 | Step: 335910 | Dataset: 0-6938760 | Loss: 0.286 | 597 ms/step , 115576.69 GFLOP/s , 173808.5 tokens/s INFO:__main__:2024-11-30 07:35:30 | Epoch: 0 | Step: 335920 | Dataset: 0-6941160 | Loss: 0.578 | 597 ms/step , 115577.27 GFLOP/s , 173642.9 tokens/s INFO:__main__:2024-11-30 07:35:37 | Epoch: 0 | Step: 335930 | Dataset: 0-6943560 | Loss: 2.260 | 598 ms/step , 115412.28 GFLOP/s , 173478.2 tokens/s INFO:__main__:2024-11-30 07:35:44 | Epoch: 0 | Step: 335940 | Dataset: 0-6945960 | Loss: 1.075 | 599 ms/step , 115303.33 GFLOP/s , 173375.0 tokens/s INFO:__main__:2024-11-30 07:35:51 | Epoch: 0 | Step: 335950 | Dataset: 0-6948360 | Loss: 0.602 | 597 ms/step , 115597.11 GFLOP/s , 173538.0 tokens/s INFO:__main__:2024-11-30 07:35:58 | Epoch: 0 | Step: 335960 | Dataset: 0-6950760 | Loss: 0.576 | 596 ms/step , 115797.51 GFLOP/s , 173688.8 tokens/s INFO:__main__:2024-11-30 07:36:05 | Epoch: 0 | Step: 335970 | Dataset: 0-6953160 | Loss: 0.603 | 596 ms/step , 115773.85 GFLOP/s , 173749.5 tokens/s INFO:__main__:2024-11-30 07:36:12 | Epoch: 0 | Step: 335980 | Dataset: 0-6955560 | 
Loss: 0.532 | 597 ms/step , 115622.28 GFLOP/s , 173683.8 tokens/s INFO:__main__:2024-11-30 07:36:19 | Epoch: 0 | Step: 335990 | Dataset: 0-6957960 | Loss: 0.522 | 597 ms/step , 115514.73 GFLOP/s , 173629.9 tokens/s INFO:__main__:2024-11-30 07:36:27 | Validation | Step: 336000 | Val_loss: 0.847 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 07:36:27 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_073627_step_336000.pt` INFO:__main__:2024-11-30 07:36:30 | Epoch: 0 | Step: 336000 | Dataset: 0-6960360 | Loss: 0.569 | 598 ms/step , 115392.83 GFLOP/s , 118552.4 tokens/s INFO:__main__:2024-11-30 07:36:37 | Epoch: 0 | Step: 336010 | Dataset: 0-6962760 | Loss: 0.551 | 598 ms/step , 115442.61 GFLOP/s , 173267.2 tokens/s INFO:__main__:2024-11-30 07:36:44 | Epoch: 0 | Step: 336020 | Dataset: 0-6965160 | Loss: 0.532 | 599 ms/step , 115245.36 GFLOP/s , 173222.1 tokens/s INFO:__main__:2024-11-30 07:36:51 | Epoch: 0 | Step: 336030 | Dataset: 0-6967560 | Loss: 0.574 | 596 ms/step , 115775.35 GFLOP/s , 173354.9 tokens/s INFO:__main__:2024-11-30 07:36:58 | Epoch: 0 | Step: 336040 | Dataset: 0-6969960 | Loss: 0.560 | 597 ms/step , 115574.88 GFLOP/s , 173352.7 tokens/s INFO:__main__:2024-11-30 07:37:05 | Epoch: 0 | Step: 336050 | Dataset: 0-6972360 | Loss: 0.513 | 597 ms/step , 115602.46 GFLOP/s , 173286.2 tokens/s INFO:__main__:2024-11-30 07:37:12 | Epoch: 0 | Step: 336060 | Dataset: 0-6974760 | Loss: 0.550 | 597 ms/step , 115684.95 GFLOP/s , 173403.5 tokens/s INFO:__main__:2024-11-30 07:37:19 | Epoch: 0 | Step: 336070 | Dataset: 0-6977160 | Loss: 0.557 | 597 ms/step , 115664.94 GFLOP/s , 173684.9 tokens/s INFO:__main__:2024-11-30 07:37:26 | Epoch: 0 | Step: 336080 | Dataset: 0-6979560 | Loss: 0.578 | 597 ms/step , 115605.35 GFLOP/s , 173691.6 tokens/s INFO:__main__:2024-11-30 07:37:33 | Epoch: 0 | Step: 336090 | Dataset: 0-6981960 | Loss: 0.606 | 597 ms/step , 115663.22 GFLOP/s , 173692.7 tokens/s INFO:__main__:2024-11-30 07:37:41 | Epoch: 0 | 
Step: 336100 | Dataset: 0-6984360 | Loss: 0.565 | 597 ms/step , 115658.89 GFLOP/s , 173680.4 tokens/s INFO:__main__:2024-11-30 07:37:48 | Epoch: 0 | Step: 336110 | Dataset: 0-6986760 | Loss: 0.538 | 597 ms/step , 115624.21 GFLOP/s , 173867.5 tokens/s INFO:__main__:2024-11-30 07:37:55 | Epoch: 0 | Step: 336120 | Dataset: 0-6989160 | Loss: 0.524 | 597 ms/step , 115640.58 GFLOP/s , 173807.4 tokens/s INFO:__main__:2024-11-30 07:38:02 | Epoch: 0 | Step: 336130 | Dataset: 0-6991560 | Loss: 0.491 | 597 ms/step , 115628.00 GFLOP/s , 173734.4 tokens/s INFO:__main__:2024-11-30 07:38:09 | Epoch: 0 | Step: 336140 | Dataset: 0-6993960 | Loss: 0.632 | 597 ms/step , 115565.41 GFLOP/s , 173650.8 tokens/s INFO:__main__:2024-11-30 07:38:16 | Epoch: 0 | Step: 336150 | Dataset: 0-6996360 | Loss: 0.498 | 597 ms/step , 115549.68 GFLOP/s , 173695.8 tokens/s INFO:__main__:2024-11-30 07:38:23 | Epoch: 0 | Step: 336160 | Dataset: 0-6998760 | Loss: 0.521 | 597 ms/step , 115644.33 GFLOP/s , 173660.8 tokens/s INFO:__main__:2024-11-30 07:38:30 | Epoch: 0 | Step: 336170 | Dataset: 0-7001160 | Loss: 0.554 | 597 ms/step , 115600.10 GFLOP/s , 173676.2 tokens/s INFO:__main__:2024-11-30 07:38:37 | Epoch: 0 | Step: 336180 | Dataset: 0-7003560 | Loss: 0.577 | 596 ms/step , 115696.46 GFLOP/s , 173767.9 tokens/s INFO:__main__:2024-11-30 07:38:44 | Epoch: 0 | Step: 336190 | Dataset: 0-7005960 | Loss: 0.563 | 597 ms/step , 115611.97 GFLOP/s , 173907.3 tokens/s INFO:__main__:2024-11-30 07:38:51 | Epoch: 0 | Step: 336200 | Dataset: 0-7008360 | Loss: 0.560 | 597 ms/step , 115516.05 GFLOP/s , 173733.7 tokens/s INFO:__main__:2024-11-30 07:38:58 | Epoch: 0 | Step: 336210 | Dataset: 0-7010760 | Loss: 0.592 | 597 ms/step , 115533.65 GFLOP/s , 173606.3 tokens/s INFO:__main__:2024-11-30 07:39:05 | Epoch: 0 | Step: 336220 | Dataset: 0-7013160 | Loss: 0.538 | 597 ms/step , 115603.87 GFLOP/s , 173613.1 tokens/s INFO:__main__:2024-11-30 07:39:12 | Epoch: 0 | Step: 336230 | Dataset: 0-7015560 | Loss: 0.582 | 597 ms/step 
, 115634.76 GFLOP/s , 173679.5 tokens/s INFO:__main__:2024-11-30 07:39:20 | Epoch: 0 | Step: 336240 | Dataset: 0-7017960 | Loss: 0.609 | 597 ms/step , 115579.44 GFLOP/s , 173587.3 tokens/s INFO:__main__:2024-11-30 07:39:27 | Epoch: 0 | Step: 336250 | Dataset: 0-7020360 | Loss: 0.611 | 597 ms/step , 115602.23 GFLOP/s , 173687.9 tokens/s INFO:__main__:2024-11-30 07:39:34 | Epoch: 0 | Step: 336260 | Dataset: 0-7022760 | Loss: 0.601 | 597 ms/step , 115660.01 GFLOP/s , 173782.1 tokens/s INFO:__main__:2024-11-30 07:39:41 | Epoch: 0 | Step: 336270 | Dataset: 0-7025160 | Loss: 0.568 | 597 ms/step , 115583.89 GFLOP/s , 173748.9 tokens/s INFO:__main__:2024-11-30 07:39:48 | Epoch: 0 | Step: 336280 | Dataset: 0-7027560 | Loss: 0.598 | 597 ms/step , 115683.01 GFLOP/s , 173680.5 tokens/s INFO:__main__:2024-11-30 07:39:55 | Epoch: 0 | Step: 336290 | Dataset: 0-7029960 | Loss: 0.558 | 597 ms/step , 115524.64 GFLOP/s , 173700.0 tokens/s INFO:__main__:2024-11-30 07:40:02 | Epoch: 0 | Step: 336300 | Dataset: 0-7032360 | Loss: 0.429 | 597 ms/step , 115582.12 GFLOP/s , 173582.7 tokens/s INFO:__main__:2024-11-30 07:40:09 | Epoch: 0 | Step: 336310 | Dataset: 0-7034760 | Loss: 0.430 | 598 ms/step , 115430.70 GFLOP/s , 173495.9 tokens/s INFO:__main__:2024-11-30 07:40:16 | Epoch: 0 | Step: 336320 | Dataset: 0-7037160 | Loss: 0.429 | 597 ms/step , 115504.87 GFLOP/s , 173546.5 tokens/s INFO:__main__:2024-11-30 07:40:23 | Epoch: 0 | Step: 336330 | Dataset: 0-7039560 | Loss: 0.449 | 597 ms/step , 115652.12 GFLOP/s , 173659.7 tokens/s INFO:__main__:2024-11-30 07:40:30 | Epoch: 0 | Step: 336340 | Dataset: 0-7041960 | Loss: 0.341 | 596 ms/step , 115715.53 GFLOP/s , 173677.5 tokens/s INFO:__main__:2024-11-30 07:40:37 | Epoch: 0 | Step: 336350 | Dataset: 0-7044360 | Loss: 0.404 | 597 ms/step , 115601.77 GFLOP/s , 173594.0 tokens/s INFO:__main__:2024-11-30 07:40:45 | Epoch: 0 | Step: 336360 | Dataset: 0-7046760 | Loss: 0.419 | 598 ms/step , 115455.35 GFLOP/s , 173568.2 tokens/s 
INFO:__main__:2024-11-30 07:40:52 | Epoch: 0 | Step: 336370 | Dataset: 0-7049160 | Loss: 0.416 | 598 ms/step , 115414.13 GFLOP/s , 173520.6 tokens/s INFO:__main__:2024-11-30 07:40:59 | Epoch: 0 | Step: 336380 | Dataset: 0-7051560 | Loss: 0.428 | 597 ms/step , 115513.84 GFLOP/s , 173553.0 tokens/s INFO:__main__:2024-11-30 07:41:06 | Epoch: 0 | Step: 336390 | Dataset: 0-7053960 | Loss: 0.413 | 597 ms/step , 115569.95 GFLOP/s , 173513.6 tokens/s INFO:__main__:2024-11-30 07:41:13 | Epoch: 0 | Step: 336400 | Dataset: 0-7056360 | Loss: 0.365 | 596 ms/step , 115721.25 GFLOP/s , 173593.3 tokens/s INFO:__main__:2024-11-30 07:41:20 | Epoch: 0 | Step: 336410 | Dataset: 0-7058760 | Loss: 0.384 | 597 ms/step , 115675.67 GFLOP/s , 173757.8 tokens/s INFO:__main__:2024-11-30 07:41:27 | Epoch: 0 | Step: 336420 | Dataset: 0-7061160 | Loss: 0.378 | 597 ms/step , 115512.93 GFLOP/s , 173618.8 tokens/s INFO:__main__:2024-11-30 07:41:34 | Epoch: 0 | Step: 336430 | Dataset: 0-7063560 | Loss: 0.348 | 597 ms/step , 115603.64 GFLOP/s , 173551.2 tokens/s INFO:__main__:2024-11-30 07:41:41 | Epoch: 0 | Step: 336440 | Dataset: 0-7065960 | Loss: 0.425 | 598 ms/step , 115418.39 GFLOP/s , 173570.4 tokens/s INFO:__main__:2024-11-30 07:41:48 | Epoch: 0 | Step: 336450 | Dataset: 0-7068360 | Loss: 0.395 | 599 ms/step , 115290.14 GFLOP/s , 173558.8 tokens/s INFO:__main__:2024-11-30 07:41:55 | Epoch: 0 | Step: 336460 | Dataset: 0-7070760 | Loss: 0.396 | 596 ms/step , 115747.06 GFLOP/s , 173543.5 tokens/s INFO:__main__:2024-11-30 07:42:02 | Epoch: 0 | Step: 336470 | Dataset: 0-7073160 | Loss: 0.574 | 596 ms/step , 115705.02 GFLOP/s , 173664.3 tokens/s INFO:__main__:2024-11-30 07:42:09 | Epoch: 0 | Step: 336480 | Dataset: 0-7075560 | Loss: 0.530 | 596 ms/step , 115783.73 GFLOP/s , 173819.2 tokens/s INFO:__main__:2024-11-30 07:42:17 | Epoch: 0 | Step: 336490 | Dataset: 0-7077960 | Loss: 0.473 | 597 ms/step , 115683.75 GFLOP/s , 173810.8 tokens/s INFO:__main__:2024-11-30 07:42:24 | Validation | Step: 336500 
| Val_loss: 0.813 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 07:42:25 | Epoch: 0 | Step: 336500 | Dataset: 0-7080360 | Loss: 0.370 | 596 ms/step , 115779.98 GFLOP/s , 147623.5 tokens/s INFO:__main__:2024-11-30 07:42:32 | Epoch: 0 | Step: 336510 | Dataset: 0-7082760 | Loss: 0.441 | 597 ms/step , 115666.30 GFLOP/s , 173668.3 tokens/s INFO:__main__:2024-11-30 07:42:39 | Epoch: 0 | Step: 336520 | Dataset: 0-7085160 | Loss: 0.338 | 597 ms/step , 115632.94 GFLOP/s , 173558.7 tokens/s INFO:__main__:2024-11-30 07:42:46 | Epoch: 0 | Step: 336530 | Dataset: 0-7087560 | Loss: 0.369 | 597 ms/step , 115546.14 GFLOP/s , 173651.7 tokens/s INFO:__main__:2024-11-30 07:42:53 | Epoch: 0 | Step: 336540 | Dataset: 0-7089960 | Loss: 0.401 | 597 ms/step , 115529.65 GFLOP/s , 173613.6 tokens/s INFO:__main__:2024-11-30 07:43:00 | Epoch: 0 | Step: 336550 | Dataset: 0-7092360 | Loss: 0.347 | 597 ms/step , 115591.95 GFLOP/s , 173645.4 tokens/s INFO:__main__:2024-11-30 07:43:07 | Epoch: 0 | Step: 336560 | Dataset: 0-7094760 | Loss: 0.309 | 597 ms/step , 115651.85 GFLOP/s , 173768.4 tokens/s INFO:__main__:2024-11-30 07:43:14 | Epoch: 0 | Step: 336570 | Dataset: 0-7097160 | Loss: 0.433 | 598 ms/step , 115405.98 GFLOP/s , 173564.8 tokens/s INFO:__main__:2024-11-30 07:43:21 | Epoch: 0 | Step: 336580 | Dataset: 0-7099560 | Loss: 0.419 | 597 ms/step , 115673.05 GFLOP/s , 173595.9 tokens/s INFO:__main__:2024-11-30 07:43:29 | Epoch: 0 | Step: 336590 | Dataset: 0-7101960 | Loss: 0.355 | 597 ms/step , 115636.91 GFLOP/s , 173627.1 tokens/s INFO:__main__:2024-11-30 07:43:36 | Epoch: 0 | Step: 336600 | Dataset: 0-7104360 | Loss: 0.371 | 597 ms/step , 115518.85 GFLOP/s , 173587.7 tokens/s INFO:__main__:2024-11-30 07:43:43 | Epoch: 0 | Step: 336610 | Dataset: 0-7106760 | Loss: 0.377 | 597 ms/step , 115512.41 GFLOP/s , 173601.3 tokens/s INFO:__main__:2024-11-30 07:43:50 | Epoch: 0 | Step: 336620 | Dataset: 0-7109160 | Loss: 0.379 | 597 ms/step , 115623.57 GFLOP/s , 173613.8 tokens/s 
INFO:__main__:2024-11-30 07:43:57 | Epoch: 0 | Step: 336630 | Dataset: 0-7111560 | Loss: 0.375 | 598 ms/step , 115478.34 GFLOP/s , 173775.2 tokens/s INFO:__main__:2024-11-30 07:44:04 | Epoch: 0 | Step: 336640 | Dataset: 0-7113960 | Loss: 0.359 | 597 ms/step , 115568.15 GFLOP/s , 173681.0 tokens/s INFO:__main__:2024-11-30 07:44:11 | Epoch: 0 | Step: 336650 | Dataset: 0-7116360 | Loss: 0.397 | 596 ms/step , 115710.72 GFLOP/s , 173607.5 tokens/s INFO:__main__:2024-11-30 07:44:18 | Epoch: 0 | Step: 336660 | Dataset: 0-7118760 | Loss: 0.374 | 597 ms/step , 115550.25 GFLOP/s , 173652.6 tokens/s INFO:__main__:2024-11-30 07:44:25 | Epoch: 0 | Step: 336670 | Dataset: 0-7121160 | Loss: 0.354 | 598 ms/step , 115481.74 GFLOP/s , 173592.0 tokens/s INFO:__main__:2024-11-30 07:44:32 | Epoch: 0 | Step: 336680 | Dataset: 0-7123560 | Loss: 0.414 | 597 ms/step , 115629.10 GFLOP/s , 173604.5 tokens/s INFO:__main__:2024-11-30 07:44:39 | Epoch: 0 | Step: 336690 | Dataset: 0-7125960 | Loss: 0.398 | 597 ms/step , 115614.68 GFLOP/s , 173610.7 tokens/s INFO:__main__:2024-11-30 07:44:46 | Epoch: 0 | Step: 336700 | Dataset: 0-7128360 | Loss: 0.365 | 597 ms/step , 115617.43 GFLOP/s , 173682.8 tokens/s INFO:__main__:2024-11-30 07:44:53 | Epoch: 0 | Step: 336710 | Dataset: 0-7130760 | Loss: 0.419 | 597 ms/step , 115618.51 GFLOP/s , 173679.9 tokens/s INFO:__main__:2024-11-30 07:45:01 | Epoch: 0 | Step: 336720 | Dataset: 0-7133160 | Loss: 0.404 | 597 ms/step , 115630.70 GFLOP/s , 173615.6 tokens/s INFO:__main__:2024-11-30 07:45:08 | Epoch: 0 | Step: 336730 | Dataset: 0-7135560 | Loss: 0.333 | 597 ms/step , 115644.13 GFLOP/s , 173610.9 tokens/s INFO:__main__:2024-11-30 07:45:15 | Epoch: 0 | Step: 336740 | Dataset: 0-7137960 | Loss: 0.362 | 597 ms/step , 115654.15 GFLOP/s , 173578.1 tokens/s INFO:__main__:2024-11-30 07:45:22 | Epoch: 0 | Step: 336750 | Dataset: 0-7140360 | Loss: 0.388 | 598 ms/step , 115434.35 GFLOP/s , 173542.6 tokens/s INFO:__main__:2024-11-30 07:45:29 | Epoch: 0 | Step: 336760 | 
Dataset: 0-7142760 | Loss: 0.384 | 597 ms/step , 115561.85 GFLOP/s , 173530.8 tokens/s INFO:__main__:2024-11-30 07:45:36 | Epoch: 0 | Step: 336770 | Dataset: 0-7145160 | Loss: 0.392 | 597 ms/step , 115576.21 GFLOP/s , 173644.0 tokens/s INFO:__main__:2024-11-30 07:45:43 | Epoch: 0 | Step: 336780 | Dataset: 0-7147560 | Loss: 0.374 | 596 ms/step , 115748.39 GFLOP/s , 173870.9 tokens/s INFO:__main__:2024-11-30 07:45:50 | Epoch: 0 | Step: 336790 | Dataset: 0-7149960 | Loss: 0.341 | 597 ms/step , 115661.32 GFLOP/s , 173691.6 tokens/s INFO:__main__:2024-11-30 07:45:57 | Epoch: 0 | Step: 336800 | Dataset: 0-7152360 | Loss: 0.415 | 597 ms/step , 115502.44 GFLOP/s , 173566.3 tokens/s INFO:__main__:2024-11-30 07:46:04 | Epoch: 0 | Step: 336810 | Dataset: 0-7154760 | Loss: 0.347 | 597 ms/step , 115565.22 GFLOP/s , 173585.0 tokens/s INFO:__main__:2024-11-30 07:46:11 | Epoch: 0 | Step: 336820 | Dataset: 0-7157160 | Loss: 0.397 | 598 ms/step , 115416.58 GFLOP/s , 173595.4 tokens/s INFO:__main__:2024-11-30 07:46:18 | Epoch: 0 | Step: 336830 | Dataset: 0-7159560 | Loss: 0.393 | 597 ms/step , 115538.41 GFLOP/s , 173594.5 tokens/s INFO:__main__:2024-11-30 07:46:25 | Epoch: 0 | Step: 336840 | Dataset: 0-7161960 | Loss: 0.358 | 597 ms/step , 115619.11 GFLOP/s , 173578.4 tokens/s INFO:__main__:2024-11-30 07:46:33 | Epoch: 0 | Step: 336850 | Dataset: 0-7164360 | Loss: 0.376 | 597 ms/step , 115617.92 GFLOP/s , 173644.6 tokens/s INFO:__main__:2024-11-30 07:46:40 | Epoch: 0 | Step: 336860 | Dataset: 0-7166760 | Loss: 0.303 | 597 ms/step , 115598.90 GFLOP/s , 173659.6 tokens/s INFO:__main__:2024-11-30 07:46:47 | Epoch: 0 | Step: 336870 | Dataset: 0-7169160 | Loss: 0.354 | 596 ms/step , 115696.80 GFLOP/s , 173569.1 tokens/s INFO:__main__:2024-11-30 07:46:54 | Epoch: 0 | Step: 336880 | Dataset: 0-7171560 | Loss: 0.326 | 597 ms/step , 115568.35 GFLOP/s , 173621.4 tokens/s INFO:__main__:2024-11-30 07:47:01 | Epoch: 0 | Step: 336890 | Dataset: 0-7173960 | Loss: 0.368 | 597 ms/step , 115561.19 
GFLOP/s , 173590.3 tokens/s INFO:__main__:2024-11-30 07:47:08 | Epoch: 0 | Step: 336900 | Dataset: 0-7176360 | Loss: 0.384 | 597 ms/step , 115659.83 GFLOP/s , 173599.1 tokens/s INFO:__main__:2024-11-30 07:47:15 | Epoch: 0 | Step: 336910 | Dataset: 0-7178760 | Loss: 0.335 | 598 ms/step , 115458.57 GFLOP/s , 173590.5 tokens/s INFO:__main__:2024-11-30 07:47:22 | Epoch: 0 | Step: 336920 | Dataset: 0-7181160 | Loss: 0.409 | 597 ms/step , 115573.74 GFLOP/s , 173660.1 tokens/s INFO:__main__:2024-11-30 07:47:29 | Epoch: 0 | Step: 336930 | Dataset: 0-7183560 | Loss: 0.381 | 597 ms/step , 115641.33 GFLOP/s , 173732.4 tokens/s INFO:__main__:2024-11-30 07:47:36 | Epoch: 0 | Step: 336940 | Dataset: 0-7185960 | Loss: 0.417 | 597 ms/step , 115593.96 GFLOP/s , 173635.5 tokens/s INFO:__main__:2024-11-30 07:47:43 | Epoch: 0 | Step: 336950 | Dataset: 0-7188360 | Loss: 0.412 | 597 ms/step , 115621.97 GFLOP/s , 173611.9 tokens/s INFO:__main__:2024-11-30 07:47:50 | Epoch: 0 | Step: 336960 | Dataset: 0-7190760 | Loss: 0.400 | 597 ms/step , 115547.95 GFLOP/s , 173621.0 tokens/s INFO:__main__:2024-11-30 07:47:57 | Epoch: 0 | Step: 336970 | Dataset: 0-7193160 | Loss: 0.357 | 597 ms/step , 115566.25 GFLOP/s , 173622.0 tokens/s INFO:__main__:2024-11-30 07:48:05 | Epoch: 0 | Step: 336980 | Dataset: 0-7195560 | Loss: 0.366 | 597 ms/step , 115565.71 GFLOP/s , 173631.2 tokens/s INFO:__main__:2024-11-30 07:48:12 | Epoch: 0 | Step: 336990 | Dataset: 0-7197960 | Loss: 0.370 | 597 ms/step , 115546.23 GFLOP/s , 173575.5 tokens/s INFO:__main__:2024-11-30 07:48:19 | Validation | Step: 337000 | Val_loss: 0.836 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 07:48:19 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_074819_step_337000.pt` INFO:__main__:2024-11-30 07:48:22 | Epoch: 0 | Step: 337000 | Dataset: 0-7200360 | Loss: 0.339 | 595 ms/step , 116065.18 GFLOP/s , 120788.7 tokens/s INFO:__main__:2024-11-30 07:48:29 | Epoch: 0 | Step: 337010 | Dataset: 0-7202760 | Loss: 
0.392 | 599 ms/step , 115145.94 GFLOP/s , 173297.8 tokens/s INFO:__main__:2024-11-30 07:48:36 | Epoch: 0 | Step: 337020 | Dataset: 0-7205160 | Loss: 0.331 | 598 ms/step , 115323.23 GFLOP/s , 173242.4 tokens/s INFO:__main__:2024-11-30 07:48:43 | Epoch: 0 | Step: 337030 | Dataset: 0-7207560 | Loss: 0.400 | 597 ms/step , 115588.70 GFLOP/s , 173237.4 tokens/s INFO:__main__:2024-11-30 07:48:50 | Epoch: 0 | Step: 337040 | Dataset: 0-7209960 | Loss: 0.348 | 597 ms/step , 115654.08 GFLOP/s , 173189.8 tokens/s INFO:__main__:2024-11-30 07:48:57 | Epoch: 0 | Step: 337050 | Dataset: 0-7212360 | Loss: 0.371 | 597 ms/step , 115657.92 GFLOP/s , 173184.9 tokens/s INFO:__main__:2024-11-30 07:49:04 | Epoch: 0 | Step: 337060 | Dataset: 0-7214760 | Loss: 0.393 | 597 ms/step , 115679.37 GFLOP/s , 173199.8 tokens/s INFO:__main__:2024-11-30 07:49:11 | Epoch: 0 | Step: 337070 | Dataset: 0-7217160 | Loss: 0.436 | 596 ms/step , 115769.81 GFLOP/s , 173524.2 tokens/s INFO:__main__:2024-11-30 07:49:19 | Epoch: 0 | Step: 337080 | Dataset: 0-7219560 | Loss: 0.370 | 596 ms/step , 115759.59 GFLOP/s , 173831.4 tokens/s INFO:__main__:2024-11-30 07:49:26 | Epoch: 0 | Step: 337090 | Dataset: 0-7221960 | Loss: 0.379 | 597 ms/step , 115577.33 GFLOP/s , 173681.0 tokens/s INFO:__main__:2024-11-30 07:49:33 | Epoch: 0 | Step: 337100 | Dataset: 0-7224360 | Loss: 0.360 | 597 ms/step , 115586.71 GFLOP/s , 173634.6 tokens/s INFO:__main__:2024-11-30 07:49:40 | Epoch: 0 | Step: 337110 | Dataset: 0-7226760 | Loss: 0.339 | 597 ms/step , 115618.01 GFLOP/s , 173678.4 tokens/s INFO:__main__:2024-11-30 07:49:47 | Epoch: 0 | Step: 337120 | Dataset: 0-7229160 | Loss: 0.352 | 597 ms/step , 115653.00 GFLOP/s , 173690.9 tokens/s INFO:__main__:2024-11-30 07:49:54 | Epoch: 0 | Step: 337130 | Dataset: 0-7231560 | Loss: 0.350 | 597 ms/step , 115594.22 GFLOP/s , 173606.2 tokens/s INFO:__main__:2024-11-30 07:50:01 | Epoch: 0 | Step: 337140 | Dataset: 0-7233960 | Loss: 0.342 | 597 ms/step , 115622.52 GFLOP/s , 173667.0 tokens/s 
INFO:__main__:2024-11-30 07:50:08 | Epoch: 0 | Step: 337150 | Dataset: 0-7236360 | Loss: 0.411 | 596 ms/step , 115723.15 GFLOP/s , 173801.8 tokens/s INFO:__main__:2024-11-30 07:50:15 | Epoch: 0 | Step: 337160 | Dataset: 0-7238760 | Loss: 0.403 | 597 ms/step , 115666.20 GFLOP/s , 173692.0 tokens/s INFO:__main__:2024-11-30 07:50:22 | Epoch: 0 | Step: 337170 | Dataset: 0-7241160 | Loss: 0.372 | 597 ms/step , 115609.07 GFLOP/s , 173647.6 tokens/s INFO:__main__:2024-11-30 07:50:29 | Epoch: 0 | Step: 337180 | Dataset: 0-7243560 | Loss: 0.398 | 597 ms/step , 115571.27 GFLOP/s , 173627.1 tokens/s INFO:__main__:2024-11-30 07:50:36 | Epoch: 0 | Step: 337190 | Dataset: 0-7245960 | Loss: 0.334 | 597 ms/step , 115507.81 GFLOP/s , 173624.9 tokens/s INFO:__main__:2024-11-30 07:50:43 | Epoch: 0 | Step: 337200 | Dataset: 0-7248360 | Loss: 0.335 | 597 ms/step , 115568.59 GFLOP/s , 173673.8 tokens/s INFO:__main__:2024-11-30 07:50:50 | Epoch: 0 | Step: 337210 | Dataset: 0-7250760 | Loss: 0.425 | 597 ms/step , 115625.18 GFLOP/s , 173612.3 tokens/s INFO:__main__:2024-11-30 07:50:58 | Epoch: 0 | Step: 337220 | Dataset: 0-7253160 | Loss: 0.313 | 596 ms/step , 115766.53 GFLOP/s , 173801.4 tokens/s INFO:__main__:2024-11-30 07:51:05 | Epoch: 0 | Step: 337230 | Dataset: 0-7255560 | Loss: 0.358 | 598 ms/step , 115486.56 GFLOP/s , 173733.4 tokens/s INFO:__main__:2024-11-30 07:51:12 | Epoch: 0 | Step: 337240 | Dataset: 0-7257960 | Loss: 0.382 | 597 ms/step , 115604.03 GFLOP/s , 173607.3 tokens/s INFO:__main__:2024-11-30 07:51:19 | Epoch: 0 | Step: 337250 | Dataset: 0-7260360 | Loss: 0.398 | 597 ms/step , 115625.94 GFLOP/s , 173564.0 tokens/s INFO:__main__:2024-11-30 07:51:26 | Epoch: 0 | Step: 337260 | Dataset: 0-7262760 | Loss: 0.341 | 597 ms/step , 115553.03 GFLOP/s , 173600.6 tokens/s INFO:__main__:2024-11-30 07:51:33 | Epoch: 0 | Step: 337270 | Dataset: 0-7265160 | Loss: 0.344 | 597 ms/step , 115596.68 GFLOP/s , 173572.3 tokens/s INFO:__main__:2024-11-30 07:51:40 | Epoch: 0 | Step: 337280 | 
Dataset: 0-7267560 | Loss: 0.376 | 597 ms/step , 115546.99 GFLOP/s , 173648.4 tokens/s INFO:__main__:2024-11-30 07:51:47 | Epoch: 0 | Step: 337290 | Dataset: 0-7269960 | Loss: 0.341 | 596 ms/step , 115773.84 GFLOP/s , 173693.8 tokens/s INFO:__main__:2024-11-30 07:51:54 | Epoch: 0 | Step: 337300 | Dataset: 0-7272360 | Loss: 0.420 | 596 ms/step , 115714.27 GFLOP/s , 173787.1 tokens/s INFO:__main__:2024-11-30 07:52:01 | Epoch: 0 | Step: 337310 | Dataset: 0-7274760 | Loss: 0.366 | 598 ms/step , 115430.46 GFLOP/s , 173708.2 tokens/s INFO:__main__:2024-11-30 07:52:08 | Epoch: 0 | Step: 337320 | Dataset: 0-7277160 | Loss: 0.390 | 597 ms/step , 115615.53 GFLOP/s , 173680.4 tokens/s INFO:__main__:2024-11-30 07:52:15 | Epoch: 0 | Step: 337330 | Dataset: 0-7279560 | Loss: 0.366 | 597 ms/step , 115547.36 GFLOP/s , 173647.0 tokens/s INFO:__main__:2024-11-30 07:52:22 | Epoch: 0 | Step: 337340 | Dataset: 0-7281960 | Loss: 0.353 | 597 ms/step , 115634.66 GFLOP/s , 173647.4 tokens/s INFO:__main__:2024-11-30 07:52:30 | Epoch: 0 | Step: 337350 | Dataset: 0-7284360 | Loss: 0.394 | 597 ms/step , 115639.01 GFLOP/s , 173618.1 tokens/s INFO:__main__:2024-11-30 07:52:37 | Epoch: 0 | Step: 337360 | Dataset: 0-7286760 | Loss: 0.370 | 597 ms/step , 115601.34 GFLOP/s , 173600.2 tokens/s INFO:__main__:2024-11-30 07:52:44 | Epoch: 0 | Step: 337370 | Dataset: 0-7289160 | Loss: 0.382 | 596 ms/step , 115708.45 GFLOP/s , 173796.7 tokens/s INFO:__main__:2024-11-30 07:52:51 | Epoch: 0 | Step: 337380 | Dataset: 0-7291560 | Loss: 0.343 | 597 ms/step , 115629.68 GFLOP/s , 173782.0 tokens/s INFO:__main__:2024-11-30 07:52:58 | Epoch: 0 | Step: 337390 | Dataset: 0-7293960 | Loss: 0.375 | 597 ms/step , 115562.24 GFLOP/s , 173584.6 tokens/s INFO:__main__:2024-11-30 07:53:05 | Epoch: 0 | Step: 337400 | Dataset: 0-7296360 | Loss: 0.356 | 598 ms/step , 115489.72 GFLOP/s , 173617.0 tokens/s INFO:__main__:2024-11-30 07:53:12 | Epoch: 0 | Step: 337410 | Dataset: 0-7298760 | Loss: 0.353 | 597 ms/step , 115619.75 
GFLOP/s , 173651.6 tokens/s INFO:__main__:2024-11-30 07:53:19 | Epoch: 0 | Step: 337420 | Dataset: 0-7301160 | Loss: 0.370 | 597 ms/step , 115607.41 GFLOP/s , 173618.4 tokens/s INFO:__main__:2024-11-30 07:53:26 | Epoch: 0 | Step: 337430 | Dataset: 0-7303560 | Loss: 0.425 | 597 ms/step , 115561.98 GFLOP/s , 173604.4 tokens/s INFO:__main__:2024-11-30 07:53:33 | Epoch: 0 | Step: 337440 | Dataset: 0-7305960 | Loss: 0.390 | 597 ms/step , 115661.18 GFLOP/s , 173675.6 tokens/s INFO:__main__:2024-11-30 07:53:40 | Epoch: 0 | Step: 337450 | Dataset: 0-7308360 | Loss: 0.417 | 597 ms/step , 115508.36 GFLOP/s , 173733.9 tokens/s INFO:__main__:2024-11-30 07:53:47 | Epoch: 0 | Step: 337460 | Dataset: 0-7310760 | Loss: 0.415 | 597 ms/step , 115505.58 GFLOP/s , 173606.5 tokens/s INFO:__main__:2024-11-30 07:53:54 | Epoch: 0 | Step: 337470 | Dataset: 0-7313160 | Loss: 0.339 | 597 ms/step , 115688.56 GFLOP/s , 173580.5 tokens/s INFO:__main__:2024-11-30 07:54:02 | Epoch: 0 | Step: 337480 | Dataset: 0-7315560 | Loss: 0.394 | 598 ms/step , 115484.51 GFLOP/s , 173606.2 tokens/s INFO:__main__:2024-11-30 07:54:09 | Epoch: 0 | Step: 337490 | Dataset: 0-7317960 | Loss: 0.392 | 598 ms/step , 115487.62 GFLOP/s , 173589.0 tokens/s INFO:__main__:2024-11-30 07:54:16 | Validation | Step: 337500 | Val_loss: 0.857 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 07:54:17 | Epoch: 0 | Step: 337500 | Dataset: 0-7320360 | Loss: 0.380 | 596 ms/step , 115734.58 GFLOP/s , 147687.9 tokens/s INFO:__main__:2024-11-30 07:54:24 | Epoch: 0 | Step: 337510 | Dataset: 0-7322760 | Loss: 0.371 | 597 ms/step , 115568.59 GFLOP/s , 173663.5 tokens/s INFO:__main__:2024-11-30 07:54:31 | Epoch: 0 | Step: 337520 | Dataset: 0-7325160 | Loss: 0.369 | 596 ms/step , 115740.17 GFLOP/s , 173839.3 tokens/s INFO:__main__:2024-11-30 07:54:38 | Epoch: 0 | Step: 337530 | Dataset: 0-7327560 | Loss: 0.399 | 597 ms/step , 115618.20 GFLOP/s , 173759.4 tokens/s INFO:__main__:2024-11-30 07:54:45 | Epoch: 0 | Step: 337540 | Dataset: 
0-7329960 | Loss: 0.358 | 597 ms/step , 115616.19 GFLOP/s , 173643.8 tokens/s INFO:__main__:2024-11-30 07:54:52 | Epoch: 0 | Step: 337550 | Dataset: 0-7332360 | Loss: 0.351 | 597 ms/step , 115575.51 GFLOP/s , 173608.2 tokens/s INFO:__main__:2024-11-30 07:54:59 | Epoch: 0 | Step: 337560 | Dataset: 0-7334760 | Loss: 0.408 | 598 ms/step , 115490.16 GFLOP/s , 173654.1 tokens/s INFO:__main__:2024-11-30 07:55:06 | Epoch: 0 | Step: 337570 | Dataset: 0-7337160 | Loss: 0.368 | 597 ms/step , 115628.30 GFLOP/s , 173643.7 tokens/s INFO:__main__:2024-11-30 07:55:14 | Epoch: 0 | Step: 337580 | Dataset: 0-7339560 | Loss: 0.363 | 598 ms/step , 115397.88 GFLOP/s , 173604.9 tokens/s INFO:__main__:2024-11-30 07:55:21 | Epoch: 0 | Step: 337590 | Dataset: 0-7341960 | Loss: 0.733 | 597 ms/step , 115605.55 GFLOP/s , 173665.9 tokens/s INFO:__main__:2024-11-30 07:55:28 | Epoch: 0 | Step: 337600 | Dataset: 0-7344360 | Loss: 0.722 | 597 ms/step , 115592.18 GFLOP/s , 173622.7 tokens/s INFO:__main__:2024-11-30 07:55:35 | Epoch: 0 | Step: 337610 | Dataset: 0-7346760 | Loss: 0.733 | 597 ms/step , 115573.95 GFLOP/s , 173522.8 tokens/s INFO:__main__:2024-11-30 07:55:42 | Epoch: 0 | Step: 337620 | Dataset: 0-7349160 | Loss: 0.664 | 598 ms/step , 115479.32 GFLOP/s , 173511.3 tokens/s INFO:__main__:2024-11-30 07:55:49 | Epoch: 0 | Step: 337630 | Dataset: 0-7351560 | Loss: 0.718 | 598 ms/step , 115489.71 GFLOP/s , 173431.1 tokens/s INFO:__main__:2024-11-30 07:55:56 | Epoch: 0 | Step: 337640 | Dataset: 0-7353960 | Loss: 0.673 | 597 ms/step , 115519.46 GFLOP/s , 173479.5 tokens/s INFO:__main__:2024-11-30 07:56:03 | Epoch: 0 | Step: 337650 | Dataset: 0-7356360 | Loss: 0.655 | 597 ms/step , 115542.51 GFLOP/s , 173472.6 tokens/s INFO:__main__:2024-11-30 07:56:10 | Epoch: 0 | Step: 337660 | Dataset: 0-7358760 | Loss: 0.735 | 597 ms/step , 115518.49 GFLOP/s , 173475.1 tokens/s INFO:__main__:2024-11-30 07:56:17 | Epoch: 0 | Step: 337670 | Dataset: 0-7361160 | Loss: 0.593 | 597 ms/step , 115598.07 GFLOP/s , 
173691.1 tokens/s INFO:__main__:2024-11-30 07:56:24 | Epoch: 0 | Step: 337680 | Dataset: 0-7363560 | Loss: 0.721 | 598 ms/step , 115476.80 GFLOP/s , 173536.0 tokens/s INFO:__main__:2024-11-30 07:56:31 | Epoch: 0 | Step: 337690 | Dataset: 0-7365960 | Loss: 0.602 | 597 ms/step , 115534.08 GFLOP/s , 173519.2 tokens/s INFO:__main__:2024-11-30 07:56:39 | Epoch: 0 | Step: 337700 | Dataset: 0-7368360 | Loss: 0.710 | 598 ms/step , 115491.92 GFLOP/s , 173462.3 tokens/s INFO:__main__:2024-11-30 07:56:46 | Epoch: 0 | Step: 337710 | Dataset: 0-7370760 | Loss: 0.684 | 598 ms/step , 115452.53 GFLOP/s , 173465.7 tokens/s INFO:__main__:2024-11-30 07:56:53 | Epoch: 0 | Step: 337720 | Dataset: 0-7373160 | Loss: 0.674 | 597 ms/step , 115545.84 GFLOP/s , 173502.5 tokens/s INFO:__main__:2024-11-30 07:57:00 | Epoch: 0 | Step: 337730 | Dataset: 0-7375560 | Loss: 0.719 | 598 ms/step , 115413.44 GFLOP/s , 173502.2 tokens/s INFO:__main__:2024-11-30 07:57:07 | Epoch: 0 | Step: 337740 | Dataset: 0-7377960 | Loss: 0.662 | 597 ms/step , 115610.75 GFLOP/s , 173639.6 tokens/s INFO:__main__:2024-11-30 07:57:14 | Epoch: 0 | Step: 337750 | Dataset: 0-7380360 | Loss: 0.636 | 597 ms/step , 115521.09 GFLOP/s , 173660.4 tokens/s INFO:__main__:2024-11-30 07:57:21 | Epoch: 0 | Step: 337760 | Dataset: 0-7382760 | Loss: 0.703 | 597 ms/step , 115527.82 GFLOP/s , 173437.3 tokens/s INFO:__main__:2024-11-30 07:57:28 | Epoch: 0 | Step: 337770 | Dataset: 0-7385160 | Loss: 0.622 | 598 ms/step , 115409.00 GFLOP/s , 173526.1 tokens/s INFO:__main__:2024-11-30 07:57:35 | Epoch: 0 | Step: 337780 | Dataset: 0-7387560 | Loss: 0.635 | 598 ms/step , 115497.95 GFLOP/s , 173498.7 tokens/s INFO:__main__:2024-11-30 07:57:42 | Epoch: 0 | Step: 337790 | Dataset: 0-7389960 | Loss: 0.731 | 597 ms/step , 115542.28 GFLOP/s , 173484.6 tokens/s INFO:__main__:2024-11-30 07:57:49 | Epoch: 0 | Step: 337800 | Dataset: 0-7392360 | Loss: 0.623 | 597 ms/step , 115503.64 GFLOP/s , 173463.9 tokens/s INFO:__main__:2024-11-30 07:57:56 | Epoch: 0 
| Step: 337810 | Dataset: 0-7394760 | Loss: 0.631 | 598 ms/step , 115482.64 GFLOP/s , 173599.4 tokens/s INFO:__main__:2024-11-30 07:58:03 | Epoch: 0 | Step: 337820 | Dataset: 0-7397160 | Loss: 0.736 | 598 ms/step , 115493.54 GFLOP/s , 173606.3 tokens/s INFO:__main__:2024-11-30 07:58:11 | Epoch: 0 | Step: 337830 | Dataset: 0-7399560 | Loss: 0.608 | 598 ms/step , 115452.22 GFLOP/s , 173479.6 tokens/s INFO:__main__:2024-11-30 07:58:18 | Epoch: 0 | Step: 337840 | Dataset: 0-7401960 | Loss: 0.665 | 598 ms/step , 115499.32 GFLOP/s , 173415.8 tokens/s INFO:__main__:2024-11-30 07:58:25 | Epoch: 0 | Step: 337850 | Dataset: 0-7404360 | Loss: 0.570 | 598 ms/step , 115439.68 GFLOP/s , 173430.8 tokens/s INFO:__main__:2024-11-30 07:58:32 | Epoch: 0 | Step: 337860 | Dataset: 0-7406760 | Loss: 0.545 | 597 ms/step , 115505.32 GFLOP/s , 173503.5 tokens/s INFO:__main__:2024-11-30 07:58:39 | Epoch: 0 | Step: 337870 | Dataset: 0-7409160 | Loss: 0.765 | 597 ms/step , 115524.21 GFLOP/s , 173483.1 tokens/s INFO:__main__:2024-11-30 07:58:46 | Epoch: 0 | Step: 337880 | Dataset: 0-7411560 | Loss: 0.602 | 597 ms/step , 115558.59 GFLOP/s , 173463.3 tokens/s INFO:__main__:2024-11-30 07:58:53 | Epoch: 0 | Step: 337890 | Dataset: 0-7413960 | Loss: 0.682 | 598 ms/step , 115489.14 GFLOP/s , 173555.2 tokens/s INFO:__main__:2024-11-30 07:59:00 | Epoch: 0 | Step: 337900 | Dataset: 0-7416360 | Loss: 0.687 | 599 ms/step , 115301.35 GFLOP/s , 173546.6 tokens/s INFO:__main__:2024-11-30 07:59:07 | Epoch: 0 | Step: 337910 | Dataset: 0-7418760 | Loss: 0.675 | 598 ms/step , 115453.12 GFLOP/s , 173510.0 tokens/s INFO:__main__:2024-11-30 07:59:14 | Epoch: 0 | Step: 337920 | Dataset: 0-7421160 | Loss: 0.705 | 598 ms/step , 115378.99 GFLOP/s , 173436.9 tokens/s INFO:__main__:2024-11-30 07:59:21 | Epoch: 0 | Step: 337930 | Dataset: 0-7423560 | Loss: 0.683 | 597 ms/step , 115513.21 GFLOP/s , 173439.6 tokens/s INFO:__main__:2024-11-30 07:59:28 | Epoch: 0 | Step: 337940 | Dataset: 0-7425960 | Loss: 0.669 | 597 
ms/step , 115597.89 GFLOP/s , 173459.7 tokens/s INFO:__main__:2024-11-30 07:59:36 | Epoch: 0 | Step: 337950 | Dataset: 0-7428360 | Loss: 0.697 | 598 ms/step , 115408.81 GFLOP/s , 173458.3 tokens/s INFO:__main__:2024-11-30 07:59:43 | Epoch: 0 | Step: 337960 | Dataset: 0-7430760 | Loss: 0.643 | 597 ms/step , 115654.30 GFLOP/s , 173595.5 tokens/s INFO:__main__:2024-11-30 07:59:50 | Epoch: 0 | Step: 337970 | Dataset: 0-7433160 | Loss: 0.681 | 597 ms/step , 115530.23 GFLOP/s , 173668.7 tokens/s INFO:__main__:2024-11-30 07:59:57 | Epoch: 0 | Step: 337980 | Dataset: 0-7435560 | Loss: 0.685 | 597 ms/step , 115564.07 GFLOP/s , 173517.2 tokens/s INFO:__main__:2024-11-30 08:00:04 | Epoch: 0 | Step: 337990 | Dataset: 0-7437960 | Loss: 0.723 | 597 ms/step , 115507.43 GFLOP/s , 173520.9 tokens/s INFO:__main__:2024-11-30 08:00:12 | Validation | Step: 338000 | Val_loss: 0.823 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 08:00:12 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_080012_step_338000.pt` INFO:__main__:2024-11-30 08:00:14 | Epoch: 0 | Step: 338000 | Dataset: 0-7440360 | Loss: 0.675 | 596 ms/step , 115751.13 GFLOP/s , 118287.7 tokens/s INFO:__main__:2024-11-30 08:00:21 | Epoch: 0 | Step: 338010 | Dataset: 0-7442760 | Loss: 0.626 | 600 ms/step , 115060.44 GFLOP/s , 172948.4 tokens/s INFO:__main__:2024-11-30 08:00:28 | Epoch: 0 | Step: 338020 | Dataset: 0-7445160 | Loss: 0.602 | 600 ms/step , 114981.45 GFLOP/s , 172955.6 tokens/s INFO:__main__:2024-11-30 08:00:36 | Epoch: 0 | Step: 338030 | Dataset: 0-7447560 | Loss: 0.593 | 599 ms/step , 115170.41 GFLOP/s , 172998.4 tokens/s INFO:__main__:2024-11-30 08:00:43 | Epoch: 0 | Step: 338040 | Dataset: 0-7449960 | Loss: 0.616 | 597 ms/step , 115629.58 GFLOP/s , 173051.0 tokens/s INFO:__main__:2024-11-30 08:00:50 | Epoch: 0 | Step: 338050 | Dataset: 0-7452360 | Loss: 0.739 | 598 ms/step , 115425.38 GFLOP/s , 172973.5 tokens/s INFO:__main__:2024-11-30 08:00:57 | Epoch: 0 | Step: 338060 | 
Dataset: 0-7454760 | Loss: 0.653 | 598 ms/step , 115446.49 GFLOP/s , 172919.5 tokens/s INFO:__main__:2024-11-30 08:01:04 | Epoch: 0 | Step: 338070 | Dataset: 0-7457160 | Loss: 0.720 | 597 ms/step , 115531.40 GFLOP/s , 173182.4 tokens/s INFO:__main__:2024-11-30 08:01:11 | Epoch: 0 | Step: 338080 | Dataset: 0-7459560 | Loss: 0.665 | 597 ms/step , 115537.37 GFLOP/s , 173534.6 tokens/s INFO:__main__:2024-11-30 08:01:18 | Epoch: 0 | Step: 338090 | Dataset: 0-7461960 | Loss: 0.677 | 598 ms/step , 115502.06 GFLOP/s , 173510.7 tokens/s INFO:__main__:2024-11-30 08:01:25 | Epoch: 0 | Step: 338100 | Dataset: 0-7464360 | Loss: 0.696 | 598 ms/step , 115439.78 GFLOP/s , 173524.6 tokens/s INFO:__main__:2024-11-30 08:01:32 | Epoch: 0 | Step: 338110 | Dataset: 0-7466760 | Loss: 0.746 | 598 ms/step , 115442.54 GFLOP/s , 173599.4 tokens/s INFO:__main__:2024-11-30 08:01:39 | Epoch: 0 | Step: 338120 | Dataset: 0-7469160 | Loss: 0.704 | 598 ms/step , 115393.41 GFLOP/s , 173508.8 tokens/s INFO:__main__:2024-11-30 08:01:46 | Epoch: 0 | Step: 338130 | Dataset: 0-7471560 | Loss: 0.582 | 597 ms/step , 115541.01 GFLOP/s , 173550.2 tokens/s INFO:__main__:2024-11-30 08:01:54 | Epoch: 0 | Step: 338140 | Dataset: 0-7473960 | Loss: 0.629 | 598 ms/step , 115479.09 GFLOP/s , 173512.1 tokens/s INFO:__main__:2024-11-30 08:02:01 | Epoch: 0 | Step: 338150 | Dataset: 0-7476360 | Loss: 0.599 | 598 ms/step , 115408.12 GFLOP/s , 173458.6 tokens/s INFO:__main__:2024-11-30 08:02:08 | Epoch: 0 | Step: 338160 | Dataset: 0-7478760 | Loss: 0.641 | 598 ms/step , 115373.67 GFLOP/s , 173423.1 tokens/s INFO:__main__:2024-11-30 08:02:15 | Epoch: 0 | Step: 338170 | Dataset: 0-7481160 | Loss: 0.547 | 598 ms/step , 115396.73 GFLOP/s , 173507.0 tokens/s INFO:__main__:2024-11-30 08:02:22 | Epoch: 0 | Step: 338180 | Dataset: 0-7483560 | Loss: 0.610 | 597 ms/step , 115628.00 GFLOP/s , 173570.7 tokens/s INFO:__main__:2024-11-30 08:02:29 | Epoch: 0 | Step: 338190 | Dataset: 0-7485960 | Loss: 0.575 | 597 ms/step , 115591.60 
GFLOP/s , 173613.9 tokens/s INFO:__main__:2024-11-30 08:02:36 | Epoch: 0 | Step: 338200 | Dataset: 0-7488360 | Loss: 0.604 | 598 ms/step , 115487.50 GFLOP/s , 173463.6 tokens/s INFO:__main__:2024-11-30 08:02:43 | Epoch: 0 | Step: 338210 | Dataset: 0-7490760 | Loss: 0.547 | 598 ms/step , 115321.97 GFLOP/s , 173369.5 tokens/s INFO:__main__:2024-11-30 08:02:50 | Epoch: 0 | Step: 338220 | Dataset: 0-7493160 | Loss: 0.622 | 598 ms/step , 115406.87 GFLOP/s , 173443.8 tokens/s INFO:__main__:2024-11-30 08:02:57 | Epoch: 0 | Step: 338230 | Dataset: 0-7495560 | Loss: 0.534 | 598 ms/step , 115488.19 GFLOP/s , 173449.6 tokens/s INFO:__main__:2024-11-30 08:03:04 | Epoch: 0 | Step: 338240 | Dataset: 0-7497960 | Loss: 0.581 | 598 ms/step , 115471.88 GFLOP/s , 173391.5 tokens/s INFO:__main__:2024-11-30 08:03:11 | Epoch: 0 | Step: 338250 | Dataset: 0-7500360 | Loss: 0.569 | 597 ms/step , 115563.83 GFLOP/s , 173458.9 tokens/s INFO:__main__:2024-11-30 08:03:19 | Epoch: 0 | Step: 338260 | Dataset: 0-7502760 | Loss: 0.541 | 597 ms/step , 115504.21 GFLOP/s , 173609.3 tokens/s INFO:__main__:2024-11-30 08:03:26 | Epoch: 0 | Step: 338270 | Dataset: 0-7505160 | Loss: 0.572 | 598 ms/step , 115326.32 GFLOP/s , 173496.2 tokens/s INFO:__main__:2024-11-30 08:03:33 | Epoch: 0 | Step: 338280 | Dataset: 0-7507560 | Loss: 0.575 | 598 ms/step , 115450.49 GFLOP/s , 173452.3 tokens/s INFO:__main__:2024-11-30 08:03:40 | Epoch: 0 | Step: 338290 | Dataset: 0-7509960 | Loss: 0.605 | 599 ms/step , 115292.87 GFLOP/s , 173398.8 tokens/s INFO:__main__:2024-11-30 08:03:47 | Epoch: 0 | Step: 338300 | Dataset: 0-7512360 | Loss: 0.539 | 597 ms/step , 115595.53 GFLOP/s , 173503.3 tokens/s INFO:__main__:2024-11-30 08:03:54 | Epoch: 0 | Step: 338310 | Dataset: 0-7514760 | Loss: 0.584 | 598 ms/step , 115434.66 GFLOP/s , 173470.6 tokens/s INFO:__main__:2024-11-30 08:04:01 | Epoch: 0 | Step: 338320 | Dataset: 0-7517160 | Loss: 0.563 | 599 ms/step , 115306.64 GFLOP/s , 173449.3 tokens/s INFO:__main__:2024-11-30 08:04:08 
| Epoch: 0 | Step: 338330 | Dataset: 0-7519560 | Loss: 0.531 | 597 ms/step , 115547.65 GFLOP/s , 173509.6 tokens/s INFO:__main__:2024-11-30 08:04:15 | Epoch: 0 | Step: 338340 | Dataset: 0-7521960 | Loss: 0.551 | 597 ms/step , 115534.43 GFLOP/s , 173601.4 tokens/s INFO:__main__:2024-11-30 08:04:22 | Epoch: 0 | Step: 338350 | Dataset: 0-7524360 | Loss: 0.530 | 598 ms/step , 115454.12 GFLOP/s , 173496.7 tokens/s INFO:__main__:2024-11-30 08:04:29 | Epoch: 0 | Step: 338360 | Dataset: 0-7526760 | Loss: 0.567 | 598 ms/step , 115466.45 GFLOP/s , 173407.3 tokens/s INFO:__main__:2024-11-30 08:04:36 | Epoch: 0 | Step: 338370 | Dataset: 0-7529160 | Loss: 0.558 | 598 ms/step , 115464.71 GFLOP/s , 173527.8 tokens/s INFO:__main__:2024-11-30 08:04:44 | Epoch: 0 | Step: 338380 | Dataset: 0-7531560 | Loss: 0.563 | 598 ms/step , 115441.23 GFLOP/s , 173398.7 tokens/s INFO:__main__:2024-11-30 08:04:51 | Epoch: 0 | Step: 338390 | Dataset: 0-7533960 | Loss: 0.572 | 598 ms/step , 115419.32 GFLOP/s , 173473.1 tokens/s INFO:__main__:2024-11-30 08:04:58 | Epoch: 0 | Step: 338400 | Dataset: 0-7536360 | Loss: 0.569 | 597 ms/step , 115602.05 GFLOP/s , 173536.1 tokens/s INFO:__main__:2024-11-30 08:05:05 | Epoch: 0 | Step: 338410 | Dataset: 0-7538760 | Loss: 0.531 | 598 ms/step , 115470.53 GFLOP/s , 173635.3 tokens/s INFO:__main__:2024-11-30 08:05:12 | Epoch: 0 | Step: 338420 | Dataset: 0-7541160 | Loss: 0.612 | 598 ms/step , 115476.21 GFLOP/s , 173524.1 tokens/s INFO:__main__:2024-11-30 08:05:19 | Epoch: 0 | Step: 338430 | Dataset: 0-7543560 | Loss: 0.551 | 598 ms/step , 115475.56 GFLOP/s , 173473.7 tokens/s INFO:__main__:2024-11-30 08:05:26 | Epoch: 0 | Step: 338440 | Dataset: 0-7545960 | Loss: 0.627 | 598 ms/step , 115392.87 GFLOP/s , 173410.6 tokens/s INFO:__main__:2024-11-30 08:05:33 | Epoch: 0 | Step: 338450 | Dataset: 0-7548360 | Loss: 0.588 | 598 ms/step , 115475.08 GFLOP/s , 173493.4 tokens/s INFO:__main__:2024-11-30 08:05:40 | Epoch: 0 | Step: 338460 | Dataset: 0-7550760 | Loss: 0.554 | 
597 ms/step , 115609.59 GFLOP/s , 173493.7 tokens/s INFO:__main__:2024-11-30 08:05:47 | Epoch: 0 | Step: 338470 | Dataset: 0-7553160 | Loss: 0.612 | 598 ms/step , 115473.43 GFLOP/s , 173475.6 tokens/s INFO:__main__:2024-11-30 08:05:54 | Epoch: 0 | Step: 338480 | Dataset: 0-7555560 | Loss: 0.657 | 597 ms/step , 115577.00 GFLOP/s , 173559.2 tokens/s INFO:__main__:2024-11-30 08:06:01 | Epoch: 0 | Step: 338490 | Dataset: 0-7557960 | Loss: 0.584 | 598 ms/step , 115487.75 GFLOP/s , 173590.4 tokens/s INFO:__main__:2024-11-30 08:06:09 | Validation | Step: 338500 | Val_loss: 0.812 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 08:06:10 | Epoch: 0 | Step: 338500 | Dataset: 0-7560360 | Loss: 0.548 | 597 ms/step , 115576.03 GFLOP/s , 147565.2 tokens/s INFO:__main__:2024-11-30 08:06:17 | Epoch: 0 | Step: 338510 | Dataset: 0-7562760 | Loss: 0.581 | 598 ms/step , 115484.29 GFLOP/s , 173594.2 tokens/s INFO:__main__:2024-11-30 08:06:24 | Epoch: 0 | Step: 338520 | Dataset: 0-7565160 | Loss: 0.605 | 598 ms/step , 115485.47 GFLOP/s , 173520.7 tokens/s INFO:__main__:2024-11-30 08:06:31 | Epoch: 0 | Step: 338530 | Dataset: 0-7567560 | Loss: 0.573 | 597 ms/step , 115505.54 GFLOP/s , 173486.1 tokens/s INFO:__main__:2024-11-30 08:06:38 | Epoch: 0 | Step: 338540 | Dataset: 0-7569960 | Loss: 0.615 | 598 ms/step , 115397.54 GFLOP/s , 173476.5 tokens/s INFO:__main__:2024-11-30 08:06:45 | Epoch: 0 | Step: 338550 | Dataset: 0-7572360 | Loss: 0.597 | 597 ms/step , 115666.91 GFLOP/s , 173579.7 tokens/s INFO:__main__:2024-11-30 08:06:52 | Epoch: 0 | Step: 338560 | Dataset: 0-7574760 | Loss: 0.560 | 597 ms/step , 115563.64 GFLOP/s , 173659.5 tokens/s INFO:__main__:2024-11-30 08:06:59 | Epoch: 0 | Step: 338570 | Dataset: 0-7577160 | Loss: 0.579 | 597 ms/step , 115544.82 GFLOP/s , 173544.1 tokens/s INFO:__main__:2024-11-30 08:07:06 | Epoch: 0 | Step: 338580 | Dataset: 0-7579560 | Loss: 0.559 | 598 ms/step , 115412.58 GFLOP/s , 173411.1 tokens/s INFO:__main__:2024-11-30 08:07:14 | Epoch: 0 | Step: 
338590 | Dataset: 0-7581960 | Loss: 0.562 | 598 ms/step , 115482.83 GFLOP/s , 173484.8 tokens/s INFO:__main__:2024-11-30 08:07:21 | Epoch: 0 | Step: 338600 | Dataset: 0-7584360 | Loss: 0.582 | 598 ms/step , 115494.74 GFLOP/s , 173470.1 tokens/s INFO:__main__:2024-11-30 08:07:28 | Epoch: 0 | Step: 338610 | Dataset: 0-7586760 | Loss: 0.601 | 598 ms/step , 115396.81 GFLOP/s , 173463.3 tokens/s INFO:__main__:2024-11-30 08:07:35 | Epoch: 0 | Step: 338620 | Dataset: 0-7589160 | Loss: 0.513 | 598 ms/step , 115493.32 GFLOP/s , 173458.0 tokens/s INFO:__main__:2024-11-30 08:07:42 | Epoch: 0 | Step: 338630 | Dataset: 0-7591560 | Loss: 0.560 | 597 ms/step , 115629.52 GFLOP/s , 173571.0 tokens/s INFO:__main__:2024-11-30 08:07:49 | Epoch: 0 | Step: 338640 | Dataset: 0-7593960 | Loss: 0.542 | 598 ms/step , 115461.62 GFLOP/s , 173613.8 tokens/s INFO:__main__:2024-11-30 08:07:56 | Epoch: 0 | Step: 338650 | Dataset: 0-7596360 | Loss: 0.541 | 598 ms/step , 115465.93 GFLOP/s , 173433.2 tokens/s INFO:__main__:2024-11-30 08:08:03 | Epoch: 0 | Step: 338660 | Dataset: 0-7598760 | Loss: 0.531 | 598 ms/step , 115403.81 GFLOP/s , 173472.2 tokens/s INFO:__main__:2024-11-30 08:08:10 | Epoch: 0 | Step: 338670 | Dataset: 0-7601160 | Loss: 0.575 | 597 ms/step , 115544.50 GFLOP/s , 173492.6 tokens/s INFO:__main__:2024-11-30 08:08:17 | Epoch: 0 | Step: 338680 | Dataset: 0-7603560 | Loss: 0.388 | 597 ms/step , 115522.44 GFLOP/s , 173586.3 tokens/s INFO:__main__:2024-11-30 08:08:24 | Epoch: 0 | Step: 338690 | Dataset: 0-7605960 | Loss: 0.402 | 597 ms/step , 115595.14 GFLOP/s , 173596.0 tokens/s INFO:__main__:2024-11-30 08:08:31 | Epoch: 0 | Step: 338700 | Dataset: 0-7608360 | Loss: 0.352 | 597 ms/step , 115591.88 GFLOP/s , 173714.0 tokens/s INFO:__main__:2024-11-30 08:08:38 | Epoch: 0 | Step: 338710 | Dataset: 0-7610760 | Loss: 0.416 | 597 ms/step , 115611.97 GFLOP/s , 173775.3 tokens/s INFO:__main__:2024-11-30 08:08:46 | Epoch: 0 | Step: 338720 | Dataset: 0-7613160 | Loss: 0.355 | 597 ms/step , 
115682.91 GFLOP/s , 173641.0 tokens/s INFO:__main__:2024-11-30 08:08:53 | Epoch: 0 | Step: 338730 | Dataset: 0-7615560 | Loss: 0.421 | 597 ms/step , 115557.22 GFLOP/s , 173551.6 tokens/s INFO:__main__:2024-11-30 08:09:00 | Epoch: 0 | Step: 338740 | Dataset: 0-7617960 | Loss: 0.366 | 598 ms/step , 115396.36 GFLOP/s , 173601.0 tokens/s INFO:__main__:2024-11-30 08:09:07 | Epoch: 0 | Step: 338750 | Dataset: 0-7620360 | Loss: 0.368 | 597 ms/step , 115547.82 GFLOP/s , 173581.3 tokens/s INFO:__main__:2024-11-30 08:09:14 | Epoch: 0 | Step: 338760 | Dataset: 0-7622760 | Loss: 0.321 | 597 ms/step , 115629.75 GFLOP/s , 173630.5 tokens/s INFO:__main__:2024-11-30 08:09:21 | Epoch: 0 | Step: 338770 | Dataset: 0-7625160 | Loss: 0.362 | 596 ms/step , 115723.18 GFLOP/s , 173713.6 tokens/s INFO:__main__:2024-11-30 08:09:28 | Epoch: 0 | Step: 338780 | Dataset: 0-7627560 | Loss: 0.414 | 596 ms/step , 115716.81 GFLOP/s , 173739.0 tokens/s INFO:__main__:2024-11-30 08:09:35 | Epoch: 0 | Step: 338790 | Dataset: 0-7629960 | Loss: 0.386 | 598 ms/step , 115490.92 GFLOP/s , 173667.5 tokens/s INFO:__main__:2024-11-30 08:09:42 | Epoch: 0 | Step: 338800 | Dataset: 0-7632360 | Loss: 0.369 | 597 ms/step , 115587.55 GFLOP/s , 173635.4 tokens/s INFO:__main__:2024-11-30 08:09:49 | Epoch: 0 | Step: 338810 | Dataset: 0-7634760 | Loss: 0.384 | 597 ms/step , 115616.47 GFLOP/s , 173608.4 tokens/s INFO:__main__:2024-11-30 08:09:56 | Epoch: 0 | Step: 338820 | Dataset: 0-7637160 | Loss: 0.345 | 597 ms/step , 115583.99 GFLOP/s , 173627.1 tokens/s INFO:__main__:2024-11-30 08:10:03 | Epoch: 0 | Step: 338830 | Dataset: 0-7639560 | Loss: 0.374 | 597 ms/step , 115654.05 GFLOP/s , 173620.8 tokens/s INFO:__main__:2024-11-30 08:10:10 | Epoch: 0 | Step: 338840 | Dataset: 0-7641960 | Loss: 0.389 | 597 ms/step , 115551.22 GFLOP/s , 173617.0 tokens/s INFO:__main__:2024-11-30 08:10:18 | Epoch: 0 | Step: 338850 | Dataset: 0-7644360 | Loss: 0.292 | 596 ms/step , 115715.27 GFLOP/s , 173727.0 tokens/s INFO:__main__:2024-11-30 
08:10:25 | Epoch: 0 | Step: 338860 | Dataset: 0-7646760 | Loss: 0.375 | 597 ms/step , 115544.07 GFLOP/s , 173724.7 tokens/s INFO:__main__:2024-11-30 08:10:32 | Epoch: 0 | Step: 338870 | Dataset: 0-7649160 | Loss: 0.404 | 597 ms/step , 115579.04 GFLOP/s , 173579.2 tokens/s INFO:__main__:2024-11-30 08:10:39 | Epoch: 0 | Step: 338880 | Dataset: 0-7651560 | Loss: 0.398 | 598 ms/step , 115445.01 GFLOP/s , 173667.2 tokens/s INFO:__main__:2024-11-30 08:10:46 | Epoch: 0 | Step: 338890 | Dataset: 0-7653960 | Loss: 0.410 | 597 ms/step , 115565.25 GFLOP/s , 173631.9 tokens/s INFO:__main__:2024-11-30 08:10:53 | Epoch: 0 | Step: 338900 | Dataset: 0-7656360 | Loss: 0.339 | 598 ms/step , 115493.94 GFLOP/s , 173654.1 tokens/s INFO:__main__:2024-11-30 08:11:00 | Epoch: 0 | Step: 338910 | Dataset: 0-7658760 | Loss: 0.388 | 597 ms/step , 115510.47 GFLOP/s , 173643.7 tokens/s INFO:__main__:2024-11-30 08:11:07 | Epoch: 0 | Step: 338920 | Dataset: 0-7661160 | Loss: 0.358 | 596 ms/step , 115706.36 GFLOP/s , 173736.9 tokens/s INFO:__main__:2024-11-30 08:11:14 | Epoch: 0 | Step: 338930 | Dataset: 0-7663560 | Loss: 0.341 | 597 ms/step , 115631.07 GFLOP/s , 173869.3 tokens/s INFO:__main__:2024-11-30 08:11:21 | Epoch: 0 | Step: 338940 | Dataset: 0-7665960 | Loss: 0.362 | 597 ms/step , 115634.87 GFLOP/s , 173701.8 tokens/s INFO:__main__:2024-11-30 08:11:28 | Epoch: 0 | Step: 338950 | Dataset: 0-7668360 | Loss: 0.369 | 597 ms/step , 115572.55 GFLOP/s , 173632.1 tokens/s INFO:__main__:2024-11-30 08:11:35 | Epoch: 0 | Step: 338960 | Dataset: 0-7670760 | Loss: 0.361 | 598 ms/step , 115450.46 GFLOP/s , 173637.3 tokens/s INFO:__main__:2024-11-30 08:11:42 | Epoch: 0 | Step: 338970 | Dataset: 0-7673160 | Loss: 0.337 | 597 ms/step , 115588.74 GFLOP/s , 173560.9 tokens/s INFO:__main__:2024-11-30 08:11:50 | Epoch: 0 | Step: 338980 | Dataset: 0-7675560 | Loss: 0.337 | 597 ms/step , 115584.45 GFLOP/s , 173618.6 tokens/s INFO:__main__:2024-11-30 08:11:57 | Epoch: 0 | Step: 338990 | Dataset: 0-7677960 | 
Loss: 0.345 | 597 ms/step , 115610.82 GFLOP/s , 173673.8 tokens/s INFO:__main__:2024-11-30 08:12:04 | Validation | Step: 339000 | Val_loss: 0.875 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 08:12:04 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_081204_step_339000.pt` INFO:__main__:2024-11-30 08:12:07 | Epoch: 0 | Step: 339000 | Dataset: 0-7680360 | Loss: 0.415 | 595 ms/step , 115956.80 GFLOP/s , 120984.6 tokens/s INFO:__main__:2024-11-30 08:12:14 | Epoch: 0 | Step: 339010 | Dataset: 0-7682760 | Loss: 0.361 | 598 ms/step , 115376.20 GFLOP/s , 173300.7 tokens/s INFO:__main__:2024-11-30 08:12:21 | Epoch: 0 | Step: 339020 | Dataset: 0-7685160 | Loss: 0.313 | 599 ms/step , 115296.37 GFLOP/s , 173201.0 tokens/s INFO:__main__:2024-11-30 08:12:28 | Epoch: 0 | Step: 339030 | Dataset: 0-7687560 | Loss: 0.385 | 599 ms/step , 115230.16 GFLOP/s , 173166.3 tokens/s INFO:__main__:2024-11-30 08:12:35 | Epoch: 0 | Step: 339040 | Dataset: 0-7689960 | Loss: 0.322 | 599 ms/step , 115301.91 GFLOP/s , 173165.1 tokens/s INFO:__main__:2024-11-30 08:12:42 | Epoch: 0 | Step: 339050 | Dataset: 0-7692360 | Loss: 0.346 | 597 ms/step , 115608.56 GFLOP/s , 173163.2 tokens/s INFO:__main__:2024-11-30 08:12:49 | Epoch: 0 | Step: 339060 | Dataset: 0-7694760 | Loss: 0.411 | 597 ms/step , 115539.13 GFLOP/s , 173180.8 tokens/s INFO:__main__:2024-11-30 08:12:56 | Epoch: 0 | Step: 339070 | Dataset: 0-7697160 | Loss: 0.354 | 596 ms/step , 115790.45 GFLOP/s , 173404.7 tokens/s INFO:__main__:2024-11-30 08:13:04 | Epoch: 0 | Step: 339080 | Dataset: 0-7699560 | Loss: 0.339 | 596 ms/step , 115702.12 GFLOP/s , 173292.2 tokens/s INFO:__main__:2024-11-30 08:13:11 | Epoch: 0 | Step: 339090 | Dataset: 0-7701960 | Loss: 0.374 | 597 ms/step , 115594.72 GFLOP/s , 173127.5 tokens/s INFO:__main__:2024-11-30 08:13:18 | Epoch: 0 | Step: 339100 | Dataset: 0-7704360 | Loss: 0.393 | 597 ms/step , 115515.48 GFLOP/s , 173303.6 tokens/s INFO:__main__:2024-11-30 08:13:25 | Epoch: 0 | 
Step: 339110 | Dataset: 0-7706760 | Loss: 0.396 | 598 ms/step , 115427.68 GFLOP/s , 173634.3 tokens/s INFO:__main__:2024-11-30 08:13:32 | Epoch: 0 | Step: 339120 | Dataset: 0-7709160 | Loss: 0.342 | 597 ms/step , 115594.57 GFLOP/s , 173663.6 tokens/s INFO:__main__:2024-11-30 08:13:39 | Epoch: 0 | Step: 339130 | Dataset: 0-7711560 | Loss: 0.342 | 601 ms/step , 114845.44 GFLOP/s , 173538.2 tokens/s INFO:__main__:2024-11-30 08:13:46 | Epoch: 0 | Step: 339140 | Dataset: 0-7713960 | Loss: 0.363 | 598 ms/step , 115361.74 GFLOP/s , 173559.7 tokens/s INFO:__main__:2024-11-30 08:13:53 | Epoch: 0 | Step: 339150 | Dataset: 0-7716360 | Loss: 0.356 | 596 ms/step , 115696.15 GFLOP/s , 173823.7 tokens/s INFO:__main__:2024-11-30 08:14:00 | Epoch: 0 | Step: 339160 | Dataset: 0-7718760 | Loss: 0.413 | 597 ms/step , 115554.42 GFLOP/s , 173727.7 tokens/s INFO:__main__:2024-11-30 08:14:07 | Epoch: 0 | Step: 339170 | Dataset: 0-7721160 | Loss: 0.376 | 598 ms/step , 115335.55 GFLOP/s , 173610.0 tokens/s INFO:__main__:2024-11-30 08:14:14 | Epoch: 0 | Step: 339180 | Dataset: 0-7723560 | Loss: 0.326 | 597 ms/step , 115585.10 GFLOP/s , 173634.9 tokens/s INFO:__main__:2024-11-30 08:14:21 | Epoch: 0 | Step: 339190 | Dataset: 0-7725960 | Loss: 0.337 | 597 ms/step , 115646.19 GFLOP/s , 173628.2 tokens/s INFO:__main__:2024-11-30 08:14:28 | Epoch: 0 | Step: 339200 | Dataset: 0-7728360 | Loss: 0.365 | 597 ms/step , 115571.19 GFLOP/s , 173606.2 tokens/s INFO:__main__:2024-11-30 08:14:36 | Epoch: 0 | Step: 339210 | Dataset: 0-7730760 | Loss: 0.326 | 617 ms/step , 111765.95 GFLOP/s , 173062.0 tokens/s INFO:__main__:2024-11-30 08:14:43 | Epoch: 0 | Step: 339220 | Dataset: 0-7733160 | Loss: 0.399 | 598 ms/step , 115457.23 GFLOP/s , 173745.4 tokens/s INFO:__main__:2024-11-30 08:14:50 | Epoch: 0 | Step: 339230 | Dataset: 0-7735560 | Loss: 0.758 | 598 ms/step , 115493.11 GFLOP/s , 173592.2 tokens/s INFO:__main__:2024-11-30 08:14:57 | Epoch: 0 | Step: 339240 | Dataset: 0-7737960 | Loss: 0.773 | 599 ms/step 
, 115252.75 GFLOP/s , 173464.6 tokens/s INFO:__main__:2024-11-30 08:15:04 | Epoch: 0 | Step: 339250 | Dataset: 0-7740360 | Loss: 0.685 | 598 ms/step , 115428.52 GFLOP/s , 173383.9 tokens/s INFO:__main__:2024-11-30 08:15:11 | Epoch: 0 | Step: 339260 | Dataset: 0-7742760 | Loss: 0.693 | 597 ms/step , 115503.25 GFLOP/s , 173477.3 tokens/s INFO:__main__:2024-11-30 08:15:18 | Epoch: 0 | Step: 339270 | Dataset: 0-7745160 | Loss: 0.708 | 597 ms/step , 115526.34 GFLOP/s , 173454.3 tokens/s INFO:__main__:2024-11-30 08:15:25 | Epoch: 0 | Step: 339280 | Dataset: 0-7747560 | Loss: 0.733 | 598 ms/step , 115318.58 GFLOP/s , 173462.0 tokens/s INFO:__main__:2024-11-30 08:15:32 | Epoch: 0 | Step: 339290 | Dataset: 0-7749960 | Loss: 0.583 | 598 ms/step , 115494.19 GFLOP/s , 173443.8 tokens/s INFO:__main__:2024-11-30 08:15:39 | Epoch: 0 | Step: 339300 | Dataset: 0-7752360 | Loss: 0.654 | 598 ms/step , 115365.17 GFLOP/s , 173625.1 tokens/s INFO:__main__:2024-11-30 08:15:46 | Epoch: 0 | Step: 339310 | Dataset: 0-7754760 | Loss: 0.817 | 598 ms/step , 115391.41 GFLOP/s , 173595.9 tokens/s INFO:__main__:2024-11-30 08:15:53 | Epoch: 0 | Step: 339320 | Dataset: 0-7757160 | Loss: 0.661 | 599 ms/step , 115283.34 GFLOP/s , 173502.3 tokens/s INFO:__main__:2024-11-30 08:16:01 | Epoch: 0 | Step: 339330 | Dataset: 0-7759560 | Loss: 0.727 | 598 ms/step , 115336.40 GFLOP/s , 173432.2 tokens/s INFO:__main__:2024-11-30 08:16:08 | Epoch: 0 | Step: 339340 | Dataset: 0-7761960 | Loss: 0.721 | 598 ms/step , 115425.58 GFLOP/s , 173373.3 tokens/s INFO:__main__:2024-11-30 08:16:15 | Epoch: 0 | Step: 339350 | Dataset: 0-7764360 | Loss: 0.670 | 599 ms/step , 115279.98 GFLOP/s , 173438.4 tokens/s INFO:__main__:2024-11-30 08:16:22 | Epoch: 0 | Step: 339360 | Dataset: 0-7766760 | Loss: 0.718 | 599 ms/step , 115225.76 GFLOP/s , 173460.0 tokens/s INFO:__main__:2024-11-30 08:16:29 | Epoch: 0 | Step: 339370 | Dataset: 0-7769160 | Loss: 0.624 | 598 ms/step , 115484.81 GFLOP/s , 173568.3 tokens/s 
INFO:__main__:2024-11-30 08:16:36 | Epoch: 0 | Step: 339380 | Dataset: 0-7771560 | Loss: 0.731 | 597 ms/step , 115552.32 GFLOP/s , 173665.9 tokens/s INFO:__main__:2024-11-30 08:16:43 | Epoch: 0 | Step: 339390 | Dataset: 0-7773960 | Loss: 0.714 | 598 ms/step , 115392.52 GFLOP/s , 173552.2 tokens/s INFO:__main__:2024-11-30 08:16:50 | Epoch: 0 | Step: 339400 | Dataset: 0-7776360 | Loss: 0.623 | 598 ms/step , 115420.26 GFLOP/s , 173421.5 tokens/s INFO:__main__:2024-11-30 08:16:57 | Epoch: 0 | Step: 339410 | Dataset: 0-7778760 | Loss: 0.642 | 597 ms/step , 115508.14 GFLOP/s , 173432.8 tokens/s INFO:__main__:2024-11-30 08:17:04 | Epoch: 0 | Step: 339420 | Dataset: 0-7781160 | Loss: 0.690 | 598 ms/step , 115445.78 GFLOP/s , 173467.5 tokens/s INFO:__main__:2024-11-30 08:17:11 | Epoch: 0 | Step: 339430 | Dataset: 0-7783560 | Loss: 0.643 | 598 ms/step , 115445.38 GFLOP/s , 173452.5 tokens/s INFO:__main__:2024-11-30 08:17:18 | Epoch: 0 | Step: 339440 | Dataset: 0-7785960 | Loss: 0.687 | 598 ms/step , 115432.72 GFLOP/s , 173436.4 tokens/s INFO:__main__:2024-11-30 08:17:26 | Epoch: 0 | Step: 339450 | Dataset: 0-7788360 | Loss: 0.628 | 598 ms/step , 115497.01 GFLOP/s , 173570.7 tokens/s INFO:__main__:2024-11-30 08:17:33 | Epoch: 0 | Step: 339460 | Dataset: 0-7790760 | Loss: 0.757 | 598 ms/step , 115491.92 GFLOP/s , 173566.3 tokens/s INFO:__main__:2024-11-30 08:17:40 | Epoch: 0 | Step: 339470 | Dataset: 0-7793160 | Loss: 0.676 | 598 ms/step , 115403.01 GFLOP/s , 173462.5 tokens/s INFO:__main__:2024-11-30 08:17:47 | Epoch: 0 | Step: 339480 | Dataset: 0-7795560 | Loss: 0.697 | 598 ms/step , 115346.34 GFLOP/s , 173467.7 tokens/s INFO:__main__:2024-11-30 08:17:54 | Epoch: 0 | Step: 339490 | Dataset: 0-7797960 | Loss: 0.623 | 598 ms/step , 115368.88 GFLOP/s , 173462.0 tokens/s INFO:__main__:2024-11-30 08:18:02 | Validation | Step: 339500 | Val_loss: 0.624 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 08:18:02 | Epoch: 0 | Step: 339500 | Dataset: 0-7800360 | Loss: 0.753 | 597 
ms/step , 115503.45 GFLOP/s , 147441.0 tokens/s INFO:__main__:2024-11-30 08:18:09 | Epoch: 0 | Step: 339510 | Dataset: 0-7802760 | Loss: 0.708 | 598 ms/step , 115431.93 GFLOP/s , 173557.7 tokens/s INFO:__main__:2024-11-30 08:18:16 | Epoch: 0 | Step: 339520 | Dataset: 0-7805160 | Loss: 0.688 | 597 ms/step , 115592.42 GFLOP/s , 173661.9 tokens/s INFO:__main__:2024-11-30 08:18:23 | Epoch: 0 | Step: 339530 | Dataset: 0-7807560 | Loss: 0.744 | 598 ms/step , 115414.84 GFLOP/s , 173664.5 tokens/s INFO:__main__:2024-11-30 08:18:31 | Epoch: 0 | Step: 339540 | Dataset: 0-7809960 | Loss: 0.635 | 598 ms/step , 115499.50 GFLOP/s , 173506.6 tokens/s INFO:__main__:2024-11-30 08:18:38 | Epoch: 0 | Step: 339550 | Dataset: 0-7812360 | Loss: 0.704 | 598 ms/step , 115378.04 GFLOP/s , 173477.2 tokens/s INFO:__main__:2024-11-30 08:18:45 | Epoch: 0 | Step: 339560 | Dataset: 0-7814760 | Loss: 0.763 | 598 ms/step , 115452.11 GFLOP/s , 173415.7 tokens/s INFO:__main__:2024-11-30 08:18:52 | Epoch: 0 | Step: 339570 | Dataset: 0-7817160 | Loss: 0.626 | 597 ms/step , 115506.19 GFLOP/s , 173400.4 tokens/s INFO:__main__:2024-11-30 08:18:59 | Epoch: 0 | Step: 339580 | Dataset: 0-7819560 | Loss: 0.630 | 598 ms/step , 115431.63 GFLOP/s , 173534.9 tokens/s INFO:__main__:2024-11-30 08:19:06 | Epoch: 0 | Step: 339590 | Dataset: 0-7821960 | Loss: 0.713 | 598 ms/step , 115373.20 GFLOP/s , 173493.3 tokens/s INFO:__main__:2024-11-30 08:19:13 | Epoch: 0 | Step: 339600 | Dataset: 0-7824360 | Loss: 0.692 | 597 ms/step , 115536.28 GFLOP/s , 173650.8 tokens/s INFO:__main__:2024-11-30 08:19:20 | Epoch: 0 | Step: 339610 | Dataset: 0-7826760 | Loss: 0.707 | 598 ms/step , 115442.02 GFLOP/s , 173571.2 tokens/s INFO:__main__:2024-11-30 08:19:27 | Epoch: 0 | Step: 339620 | Dataset: 0-7829160 | Loss: 0.640 | 598 ms/step , 115394.15 GFLOP/s , 173373.3 tokens/s INFO:__main__:2024-11-30 08:19:34 | Epoch: 0 | Step: 339630 | Dataset: 0-7831560 | Loss: 0.630 | 597 ms/step , 115535.28 GFLOP/s , 173465.7 tokens/s 
INFO:__main__:2024-11-30 08:19:41 | Epoch: 0 | Step: 339640 | Dataset: 0-7833960 | Loss: 0.655 | 597 ms/step , 115648.50 GFLOP/s , 173529.9 tokens/s INFO:__main__:2024-11-30 08:19:48 | Epoch: 0 | Step: 339650 | Dataset: 0-7836360 | Loss: 0.761 | 598 ms/step , 115447.61 GFLOP/s , 173434.8 tokens/s INFO:__main__:2024-11-30 08:19:56 | Epoch: 0 | Step: 339660 | Dataset: 0-7838760 | Loss: 0.693 | 598 ms/step , 115434.08 GFLOP/s , 173462.9 tokens/s INFO:__main__:2024-11-30 08:20:03 | Epoch: 0 | Step: 339670 | Dataset: 0-7841160 | Loss: 0.609 | 598 ms/step , 115499.62 GFLOP/s , 173561.6 tokens/s INFO:__main__:2024-11-30 08:20:10 | Epoch: 0 | Step: 339680 | Dataset: 0-7843560 | Loss: 0.744 | 598 ms/step , 115478.25 GFLOP/s , 173607.7 tokens/s INFO:__main__:2024-11-30 08:20:17 | Epoch: 0 | Step: 339690 | Dataset: 0-7845960 | Loss: 0.694 | 598 ms/step , 115423.73 GFLOP/s , 173473.7 tokens/s INFO:__main__:2024-11-30 08:20:24 | Epoch: 0 | Step: 339700 | Dataset: 0-7848360 | Loss: 0.691 | 598 ms/step , 115337.70 GFLOP/s , 173478.7 tokens/s INFO:__main__:2024-11-30 08:20:31 | Epoch: 0 | Step: 339710 | Dataset: 0-7850760 | Loss: 0.637 | 598 ms/step , 115449.11 GFLOP/s , 173385.3 tokens/s INFO:__main__:2024-11-30 08:20:38 | Epoch: 0 | Step: 339720 | Dataset: 0-7853160 | Loss: 0.688 | 598 ms/step , 115379.89 GFLOP/s , 173465.0 tokens/s INFO:__main__:2024-11-30 08:20:45 | Epoch: 0 | Step: 339730 | Dataset: 0-7855560 | Loss: 0.650 | 597 ms/step , 115562.95 GFLOP/s , 173453.9 tokens/s INFO:__main__:2024-11-30 08:20:52 | Epoch: 0 | Step: 339740 | Dataset: 0-7857960 | Loss: 0.718 | 599 ms/step , 115270.22 GFLOP/s , 173475.5 tokens/s INFO:__main__:2024-11-30 08:20:59 | Epoch: 0 | Step: 339750 | Dataset: 0-7860360 | Loss: 0.662 | 597 ms/step , 115565.23 GFLOP/s , 173592.9 tokens/s INFO:__main__:2024-11-30 08:21:06 | Epoch: 0 | Step: 339760 | Dataset: 0-7862760 | Loss: 0.676 | 597 ms/step , 115562.80 GFLOP/s , 173606.3 tokens/s INFO:__main__:2024-11-30 08:21:13 | Epoch: 0 | Step: 339770 | 
Dataset: 0-7865160 | Loss: 0.298 | 596 ms/step , 115780.17 GFLOP/s , 173675.4 tokens/s INFO:__main__:2024-11-30 08:21:20 | Epoch: 0 | Step: 339780 | Dataset: 0-7867560 | Loss: 0.381 | 596 ms/step , 115773.45 GFLOP/s , 173901.7 tokens/s INFO:__main__:2024-11-30 08:21:28 | Epoch: 0 | Step: 339790 | Dataset: 0-7869960 | Loss: 0.296 | 596 ms/step , 115715.37 GFLOP/s , 173817.3 tokens/s INFO:__main__:2024-11-30 08:21:35 | Epoch: 0 | Step: 339800 | Dataset: 0-7872360 | Loss: 0.367 | 596 ms/step , 115786.49 GFLOP/s , 173934.3 tokens/s INFO:__main__:2024-11-30 08:21:42 | Epoch: 0 | Step: 339810 | Dataset: 0-7874760 | Loss: 0.290 | 596 ms/step , 115861.85 GFLOP/s , 173962.4 tokens/s INFO:__main__:2024-11-30 08:21:49 | Epoch: 0 | Step: 339820 | Dataset: 0-7877160 | Loss: 0.369 | 595 ms/step , 115955.43 GFLOP/s , 174010.5 tokens/s INFO:__main__:2024-11-30 08:21:56 | Epoch: 0 | Step: 339830 | Dataset: 0-7879560 | Loss: 0.349 | 595 ms/step , 115956.27 GFLOP/s , 174015.8 tokens/s INFO:__main__:2024-11-30 08:22:03 | Epoch: 0 | Step: 339840 | Dataset: 0-7881960 | Loss: 0.316 | 596 ms/step , 115819.79 GFLOP/s , 173996.7 tokens/s INFO:__main__:2024-11-30 08:22:10 | Epoch: 0 | Step: 339850 | Dataset: 0-7884360 | Loss: 0.321 | 595 ms/step , 115958.11 GFLOP/s , 174023.6 tokens/s INFO:__main__:2024-11-30 08:22:17 | Epoch: 0 | Step: 339860 | Dataset: 0-7886760 | Loss: 0.324 | 595 ms/step , 116048.29 GFLOP/s , 173978.3 tokens/s INFO:__main__:2024-11-30 08:22:24 | Epoch: 0 | Step: 339870 | Dataset: 0-7889160 | Loss: 0.346 | 596 ms/step , 115855.80 GFLOP/s , 174004.4 tokens/s INFO:__main__:2024-11-30 08:22:31 | Epoch: 0 | Step: 339880 | Dataset: 0-7891560 | Loss: 0.277 | 596 ms/step , 115846.60 GFLOP/s , 173984.9 tokens/s INFO:__main__:2024-11-30 08:22:38 | Epoch: 0 | Step: 339890 | Dataset: 0-7893960 | Loss: 0.274 | 595 ms/step , 115920.11 GFLOP/s , 174040.0 tokens/s INFO:__main__:2024-11-30 08:22:45 | Epoch: 0 | Step: 339900 | Dataset: 0-7896360 | Loss: 0.373 | 596 ms/step , 115793.73 
GFLOP/s , 173998.4 tokens/s INFO:__main__:2024-11-30 08:22:52 | Epoch: 0 | Step: 339910 | Dataset: 0-7898760 | Loss: 0.326 | 595 ms/step , 115912.75 GFLOP/s , 174020.5 tokens/s INFO:__main__:2024-11-30 08:22:59 | Epoch: 0 | Step: 339920 | Dataset: 0-7901160 | Loss: 0.321 | 595 ms/step , 116079.69 GFLOP/s , 174045.7 tokens/s INFO:__main__:2024-11-30 08:23:06 | Epoch: 0 | Step: 339930 | Dataset: 0-7903560 | Loss: 0.381 | 596 ms/step , 115817.67 GFLOP/s , 173940.1 tokens/s INFO:__main__:2024-11-30 08:23:13 | Epoch: 0 | Step: 339940 | Dataset: 0-7905960 | Loss: 0.326 | 595 ms/step , 115939.75 GFLOP/s , 173958.6 tokens/s INFO:__main__:2024-11-30 08:23:21 | Epoch: 0 | Step: 339950 | Dataset: 0-7908360 | Loss: 0.284 | 595 ms/step , 115917.32 GFLOP/s , 174093.3 tokens/s INFO:__main__:2024-11-30 08:23:28 | Epoch: 0 | Step: 339960 | Dataset: 0-7910760 | Loss: 0.367 | 595 ms/step , 115921.59 GFLOP/s , 173954.6 tokens/s INFO:__main__:2024-11-30 08:23:35 | Epoch: 0 | Step: 339970 | Dataset: 0-7913160 | Loss: 0.322 | 595 ms/step , 115992.63 GFLOP/s , 174049.7 tokens/s INFO:__main__:2024-11-30 08:23:42 | Epoch: 0 | Step: 339980 | Dataset: 0-7915560 | Loss: 0.292 | 595 ms/step , 116061.75 GFLOP/s , 174071.6 tokens/s INFO:__main__:2024-11-30 08:23:49 | Epoch: 0 | Step: 339990 | Dataset: 0-7917960 | Loss: 0.283 | 596 ms/step , 115815.90 GFLOP/s , 174100.9 tokens/s INFO:__main__:2024-11-30 08:23:56 | Validation | Step: 340000 | Val_loss: 0.384 | Best_val_loss: 0.4363 INFO:__main__:2024-11-30 08:23:56 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_082356_step_340000.pt` INFO:__main__:2024-11-30 08:23:59 | Epoch: 0 | Step: 340000 | Dataset: 0-7920360 | Loss: 0.320 | 594 ms/step , 116195.49 GFLOP/s , 120333.7 tokens/s INFO:__main__:2024-11-30 08:24:06 | Epoch: 0 | Step: 340010 | Dataset: 0-7922760 | Loss: 0.341 | 597 ms/step , 115599.13 GFLOP/s , 173682.1 tokens/s INFO:__main__:2024-11-30 08:24:13 | Epoch: 0 | Step: 340020 | Dataset: 0-7925160 | Loss: 
0.299 | 597 ms/step , 115592.75 GFLOP/s , 173667.2 tokens/s INFO:__main__:2024-11-30 08:24:20 | Epoch: 0 | Step: 340030 | Dataset: 0-7927560 | Loss: 0.316 | 598 ms/step , 115497.02 GFLOP/s , 173638.7 tokens/s INFO:__main__:2024-11-30 08:24:27 | Epoch: 0 | Step: 340040 | Dataset: 0-7929960 | Loss: 0.243 | 597 ms/step , 115532.96 GFLOP/s , 173677.9 tokens/s INFO:__main__:2024-11-30 08:24:34 | Epoch: 0 | Step: 340050 | Dataset: 0-7932360 | Loss: 0.369 | 598 ms/step , 115433.43 GFLOP/s , 173612.1 tokens/s INFO:__main__:2024-11-30 08:24:41 | Epoch: 0 | Step: 340060 | Dataset: 0-7934760 | Loss: 0.382 | 597 ms/step , 115556.33 GFLOP/s , 173558.9 tokens/s INFO:__main__:2024-11-30 08:24:49 | Epoch: 0 | Step: 340070 | Dataset: 0-7937160 | Loss: 0.374 | 596 ms/step , 115878.78 GFLOP/s , 173629.5 tokens/s INFO:__main__:2024-11-30 08:24:56 | Epoch: 0 | Step: 340080 | Dataset: 0-7939560 | Loss: 0.331 | 611 ms/step , 112991.82 GFLOP/s , 173379.1 tokens/s INFO:__main__:2024-11-30 08:25:03 | Epoch: 0 | Step: 340090 | Dataset: 0-7941960 | Loss: 0.249 | 595 ms/step , 115930.46 GFLOP/s , 173660.5 tokens/s INFO:__main__:2024-11-30 08:25:10 | Epoch: 0 | Step: 340100 | Dataset: 0-7944360 | Loss: 0.404 | 596 ms/step , 115796.76 GFLOP/s , 173592.8 tokens/s INFO:__main__:2024-11-30 08:25:17 | Epoch: 0 | Step: 340110 | Dataset: 0-7946760 | Loss: 0.321 | 597 ms/step , 115596.90 GFLOP/s , 173636.1 tokens/s INFO:__main__:2024-11-30 08:25:24 | Epoch: 0 | Step: 340120 | Dataset: 0-7949160 | Loss: 0.422 | 595 ms/step , 116073.54 GFLOP/s , 173614.4 tokens/s INFO:__main__:2024-11-30 08:25:31 | Epoch: 0 | Step: 340130 | Dataset: 0-7951560 | Loss: 0.371 | 596 ms/step , 115799.35 GFLOP/s , 173538.2 tokens/s INFO:__main__:2024-11-30 08:25:38 | Epoch: 0 | Step: 340140 | Dataset: 0-7953960 | Loss: 0.372 | 595 ms/step , 115963.92 GFLOP/s , 173538.5 tokens/s INFO:__main__:2024-11-30 08:25:45 | Epoch: 0 | Step: 340150 | Dataset: 0-7956360 | Loss: 0.297 | 595 ms/step , 115946.98 GFLOP/s , 173582.6 tokens/s 
INFO:__main__:2024-11-30 08:25:52 | Epoch: 0 | Step: 340160 | Dataset: 0-7958760 | Loss: 0.380 | 596 ms/step , 115801.02 GFLOP/s , 173493.9 tokens/s INFO:__main__:2024-11-30 08:25:59 | Epoch: 0 | Step: 340170 | Dataset: 0-7961160 | Loss: 0.355 | 595 ms/step , 115951.50 GFLOP/s , 173590.0 tokens/s INFO:__main__:2024-11-30 08:26:06 | Epoch: 0 | Step: 340180 | Dataset: 0-7963560 | Loss: 0.347 | 595 ms/step , 115919.87 GFLOP/s , 173592.5 tokens/s INFO:__main__:2024-11-30 08:26:13 | Epoch: 0 | Step: 340190 | Dataset: 0-7965960 | Loss: 0.292 | 596 ms/step , 115850.19 GFLOP/s , 173553.8 tokens/s INFO:__main__:2024-11-30 08:26:21 | Epoch: 0 | Step: 340200 | Dataset: 0-7968360 | Loss: 0.336 | 595 ms/step , 115945.47 GFLOP/s , 173712.2 tokens/s INFO:__main__:2024-11-30 08:26:28 | Epoch: 0 | Step: 340210 | Dataset: 0-7970760 | Loss: 0.267 | 595 ms/step , 115908.66 GFLOP/s , 173581.7 tokens/s INFO:__main__:2024-11-30 08:26:35 | Epoch: 0 | Step: 340220 | Dataset: 0-7973160 | Loss: 0.312 | 596 ms/step , 115878.26 GFLOP/s , 173564.1 tokens/s INFO:__main__:2024-11-30 08:26:42 | Epoch: 0 | Step: 340230 | Dataset: 0-7975560 | Loss: 0.343 | 596 ms/step , 115859.38 GFLOP/s , 173618.6 tokens/s INFO:__main__:2024-11-30 08:26:49 | Epoch: 0 | Step: 340240 | Dataset: 0-7977960 | Loss: 0.363 | 595 ms/step , 115937.47 GFLOP/s , 173566.1 tokens/s INFO:__main__:2024-11-30 08:26:56 | Epoch: 0 | Step: 340250 | Dataset: 0-7980360 | Loss: 0.349 | 595 ms/step , 116072.02 GFLOP/s , 173602.6 tokens/s INFO:__main__:2024-11-30 08:27:03 | Epoch: 0 | Step: 340260 | Dataset: 0-7982760 | Loss: 0.321 | 595 ms/step , 115911.29 GFLOP/s , 173585.7 tokens/s INFO:__main__:2024-11-30 08:27:10 | Epoch: 0 | Step: 340270 | Dataset: 0-7985160 | Loss: 0.317 | 596 ms/step , 115841.22 GFLOP/s , 173576.9 tokens/s INFO:__main__:2024-11-30 08:27:17 | Epoch: 0 | Step: 340280 | Dataset: 0-7987560 | Loss: 0.317 | 595 ms/step , 115949.57 GFLOP/s , 173669.5 tokens/s INFO:__main__:2024-11-30 08:27:24 | Epoch: 0 | Step: 340290 | 
Dataset: 0-7989960 | Loss: 0.339 | 596 ms/step , 115867.54 GFLOP/s , 173567.5 tokens/s INFO:__main__:2024-11-30 08:27:31 | Epoch: 0 | Step: 340300 | Dataset: 0-7992360 | Loss: 0.328 | 596 ms/step , 115826.50 GFLOP/s , 173588.2 tokens/s INFO:__main__:2024-11-30 08:27:38 | Epoch: 0 | Step: 340310 | Dataset: 0-7994760 | Loss: 0.346 | 595 ms/step , 115953.96 GFLOP/s , 173600.4 tokens/s INFO:__main__:2024-11-30 08:27:46 | Epoch: 0 | Step: 340320 | Dataset: 0-7997160 | Loss: 0.116 | 595 ms/step , 115958.34 GFLOP/s , 173451.9 tokens/s INFO:__main__:2024-11-30 08:27:53 | Epoch: 0 | Step: 340330 | Dataset: 0-7999560 | Loss: 0.887 | 597 ms/step , 115652.44 GFLOP/s , 173590.6 tokens/s INFO:__main__:2024-11-30 08:28:00 | Epoch: 0 | Step: 340340 | Dataset: 0-8001960 | Loss: 0.731 | 595 ms/step , 115892.62 GFLOP/s , 173568.6 tokens/s INFO:__main__:2024-11-30 08:28:07 | Epoch: 0 | Step: 340350 | Dataset: 0-8004360 | Loss: 0.727 | 595 ms/step , 115928.17 GFLOP/s , 173551.4 tokens/s INFO:__main__:2024-11-30 08:28:14 | Epoch: 0 | Step: 340360 | Dataset: 0-8006760 | Loss: 1.214 | 596 ms/step , 115883.58 GFLOP/s , 173685.4 tokens/s INFO:__main__:2024-11-30 08:28:21 | Epoch: 0 | Step: 340370 | Dataset: 0-8009160 | Loss: 0.121 | 594 ms/step , 116173.59 GFLOP/s , 173672.6 tokens/s INFO:__main__:2024-11-30 08:28:28 | Epoch: 0 | Step: 340380 | Dataset: 0-8011560 | Loss: 0.384 | 595 ms/step , 116075.78 GFLOP/s , 173673.6 tokens/s INFO:__main__:2024-11-30 08:28:35 | Epoch: 0 | Step: 340390 | Dataset: 0-8013960 | Loss: 0.468 | 596 ms/step , 115816.06 GFLOP/s , 173633.3 tokens/s INFO:__main__:2024-11-30 08:28:42 | Epoch: 0 | Step: 340400 | Dataset: 0-8016360 | Loss: 0.376 | 595 ms/step , 115897.31 GFLOP/s , 173654.2 tokens/s INFO:__main__:2024-11-30 08:28:49 | Epoch: 0 | Step: 340410 | Dataset: 0-8018760 | Loss: 0.382 | 595 ms/step , 116019.76 GFLOP/s , 173764.2 tokens/s INFO:__main__:2024-11-30 08:28:56 | Epoch: 0 | Step: 340420 | Dataset: 0-8021160 | Loss: 0.201 | 596 ms/step , 115877.38 
GFLOP/s , 173695.3 tokens/s INFO:__main__:2024-11-30 08:29:03 | Epoch: 0 | Step: 340430 | Dataset: 0-8023560 | Loss: 0.801 | 596 ms/step , 115726.43 GFLOP/s , 173573.7 tokens/s INFO:__main__:2024-11-30 08:29:10 | Epoch: 0 | Step: 340440 | Dataset: 0-8025960 | Loss: 0.879 | 596 ms/step , 115863.02 GFLOP/s , 173519.0 tokens/s INFO:__main__:2024-11-30 08:29:18 | Epoch: 0 | Step: 340450 | Dataset: 0-8028360 | Loss: 1.502 | 597 ms/step , 115650.71 GFLOP/s , 173408.0 tokens/s INFO:__main__:2024-11-30 08:29:25 | Epoch: 0 | Step: 340460 | Dataset: 0-8030760 | Loss: 0.542 | 596 ms/step , 115883.61 GFLOP/s , 173667.0 tokens/s INFO:__main__:2024-11-30 08:29:32 | Epoch: 0 | Step: 340470 | Dataset: 0-8033160 | Loss: 0.354 | 595 ms/step , 116016.20 GFLOP/s , 173632.7 tokens/s INFO:__main__:2024-11-30 08:29:39 | Epoch: 0 | Step: 340480 | Dataset: 0-8035560 | Loss: 0.199 | 596 ms/step , 115888.99 GFLOP/s , 173649.1 tokens/s INFO:__main__:2024-11-30 08:29:46 | Epoch: 0 | Step: 340490 | Dataset: 0-8037960 | Loss: 0.158 | 595 ms/step , 116051.76 GFLOP/s , 173723.8 tokens/s INFO:__main__:2024-11-30 08:29:53 | Validation | Step: 340500 | Val_loss: 0.386 | Best_val_loss: 0.3839 INFO:__main__:2024-11-30 08:29:54 | Epoch: 0 | Step: 340500 | Dataset: 0-8040360 | Loss: 0.900 | 595 ms/step , 116049.33 GFLOP/s , 147877.8 tokens/s INFO:__main__:2024-11-30 08:30:01 | Epoch: 0 | Step: 340510 | Dataset: 0-8042760 | Loss: 0.146 | 595 ms/step , 115977.96 GFLOP/s , 173682.5 tokens/s INFO:__main__:2024-11-30 08:30:08 | Epoch: 0 | Step: 340520 | Dataset: 0-8045160 | Loss: 0.203 | 595 ms/step , 115956.95 GFLOP/s , 173722.0 tokens/s INFO:__main__:2024-11-30 08:30:15 | Epoch: 0 | Step: 340530 | Dataset: 0-8047560 | Loss: 0.221 | 595 ms/step , 116055.27 GFLOP/s , 173844.8 tokens/s INFO:__main__:2024-11-30 08:30:22 | Epoch: 0 | Step: 340540 | Dataset: 0-8049960 | Loss: 0.202 | 595 ms/step , 116039.48 GFLOP/s , 173812.1 tokens/s INFO:__main__:2024-11-30 08:30:30 | Epoch: 0 | Step: 340550 | Dataset: 
0-8052360 | Loss: 0.230 | 596 ms/step , 115874.84 GFLOP/s , 173663.4 tokens/s INFO:__main__:2024-11-30 08:30:37 | Epoch: 0 | Step: 340560 | Dataset: 0-8054760 | Loss: 0.070 | 595 ms/step , 116029.37 GFLOP/s , 173647.0 tokens/s INFO:__main__:2024-11-30 08:30:44 | Epoch: 0 | Step: 340570 | Dataset: 0-8057160 | Loss: 0.791 | 595 ms/step , 115948.65 GFLOP/s , 173583.4 tokens/s INFO:__main__:2024-11-30 08:30:51 | Epoch: 0 | Step: 340580 | Dataset: 0-8059560 | Loss: 0.597 | 595 ms/step , 115969.15 GFLOP/s , 173501.3 tokens/s INFO:__main__:2024-11-30 08:30:58 | Epoch: 0 | Step: 340590 | Dataset: 0-8061960 | Loss: 0.753 | 596 ms/step , 115831.29 GFLOP/s , 173569.3 tokens/s INFO:__main__:2024-11-30 08:31:05 | Epoch: 0 | Step: 340600 | Dataset: 0-8064360 | Loss: 0.437 | 594 ms/step , 116105.36 GFLOP/s , 173592.3 tokens/s INFO:__main__:2024-11-30 08:31:12 | Epoch: 0 | Step: 340610 | Dataset: 0-8066760 | Loss: 0.603 | 595 ms/step , 115941.24 GFLOP/s , 173715.3 tokens/s INFO:__main__:2024-11-30 08:31:19 | Epoch: 0 | Step: 340620 | Dataset: 0-8069160 | Loss: 0.225 | 595 ms/step , 116026.07 GFLOP/s , 173701.5 tokens/s INFO:__main__:2024-11-30 08:31:26 | Epoch: 0 | Step: 340630 | Dataset: 0-8071560 | Loss: 1.000 | 596 ms/step , 115738.19 GFLOP/s , 173474.3 tokens/s INFO:__main__:2024-11-30 08:31:33 | Epoch: 0 | Step: 340640 | Dataset: 0-8073960 | Loss: 2.411 | 597 ms/step , 115657.70 GFLOP/s , 173450.4 tokens/s INFO:__main__:2024-11-30 08:31:40 | Epoch: 0 | Step: 340650 | Dataset: 0-8076360 | Loss: 0.699 | 595 ms/step , 116013.49 GFLOP/s , 173745.0 tokens/s INFO:__main__:2024-11-30 08:31:47 | Epoch: 0 | Step: 340660 | Dataset: 0-8078760 | Loss: 0.574 | 596 ms/step , 115820.35 GFLOP/s , 173640.5 tokens/s INFO:__main__:2024-11-30 08:31:54 | Epoch: 0 | Step: 340670 | Dataset: 0-8081160 | Loss: 0.381 | 595 ms/step , 115954.35 GFLOP/s , 173642.5 tokens/s INFO:__main__:2024-11-30 08:32:02 | Epoch: 0 | Step: 340680 | Dataset: 0-8083560 | Loss: 0.375 | 594 ms/step , 116089.28 GFLOP/s , 
173642.7 tokens/s INFO:__main__:2024-11-30 08:32:09 | Epoch: 0 | Step: 340690 | Dataset: 0-8085960 | Loss: 0.282 | 595 ms/step , 116027.47 GFLOP/s , 173665.4 tokens/s INFO:__main__:2024-11-30 08:32:16 | Epoch: 0 | Step: 340700 | Dataset: 0-8088360 | Loss: 0.641 | 595 ms/step , 115937.03 GFLOP/s , 173444.3 tokens/s INFO:__main__:2024-11-30 08:32:23 | Epoch: 0 | Step: 340710 | Dataset: 0-8090760 | Loss: 0.508 | 595 ms/step , 116041.84 GFLOP/s , 173562.6 tokens/s INFO:__main__:2024-11-30 08:32:30 | Epoch: 0 | Step: 340720 | Dataset: 0-8093160 | Loss: 0.310 | 595 ms/step , 115927.64 GFLOP/s , 173551.1 tokens/s INFO:__main__:2024-11-30 08:32:37 | Epoch: 0 | Step: 340730 | Dataset: 0-8095560 | Loss: 0.232 | 595 ms/step , 116002.15 GFLOP/s , 173631.9 tokens/s INFO:__main__:2024-11-30 08:32:44 | Epoch: 0 | Step: 340740 | Dataset: 0-8097960 | Loss: 0.461 | 596 ms/step , 115857.40 GFLOP/s , 173759.8 tokens/s INFO:__main__:2024-11-30 08:32:51 | Epoch: 0 | Step: 340750 | Dataset: 0-8100360 | Loss: 0.459 | 595 ms/step , 115920.55 GFLOP/s , 173617.3 tokens/s INFO:__main__:2024-11-30 08:32:58 | Epoch: 0 | Step: 340760 | Dataset: 0-8102760 | Loss: 1.259 | 596 ms/step , 115867.11 GFLOP/s , 173408.5 tokens/s INFO:__main__:2024-11-30 08:33:05 | Epoch: 0 | Step: 340770 | Dataset: 0-8105160 | Loss: 0.465 | 596 ms/step , 115849.39 GFLOP/s , 173615.0 tokens/s INFO:__main__:2024-11-30 08:33:12 | Epoch: 0 | Step: 340780 | Dataset: 0-8107560 | Loss: 0.271 | 595 ms/step , 116030.31 GFLOP/s , 173722.2 tokens/s INFO:__main__:2024-11-30 08:33:19 | Epoch: 0 | Step: 340790 | Dataset: 0-8109960 | Loss: 1.590 | 597 ms/step , 115673.94 GFLOP/s , 173499.5 tokens/s INFO:__main__:2024-11-30 08:33:26 | Epoch: 0 | Step: 340800 | Dataset: 0-8112360 | Loss: 0.928 | 596 ms/step , 115721.17 GFLOP/s , 173537.9 tokens/s INFO:__main__:2024-11-30 08:33:34 | Epoch: 0 | Step: 340810 | Dataset: 0-8114760 | Loss: 2.332 | 596 ms/step , 115722.22 GFLOP/s , 173544.3 tokens/s INFO:__main__:2024-11-30 08:33:41 | Epoch: 0 
| Step: 340820 | Dataset: 0-8117160 | Loss: 1.642 | 596 ms/step , 115769.65 GFLOP/s , 173353.7 tokens/s INFO:__main__:2024-11-30 08:33:48 | Epoch: 0 | Step: 340830 | Dataset: 0-8119560 | Loss: 1.164 | 597 ms/step , 115622.36 GFLOP/s , 173375.9 tokens/s INFO:__main__:2024-11-30 08:33:55 | Epoch: 0 | Step: 340840 | Dataset: 0-8121960 | Loss: 2.003 | 596 ms/step , 115727.17 GFLOP/s , 173290.6 tokens/s INFO:__main__:2024-11-30 08:34:02 | Epoch: 0 | Step: 340850 | Dataset: 0-8124360 | Loss: 1.425 | 596 ms/step , 115697.31 GFLOP/s , 173575.9 tokens/s INFO:__main__:2024-11-30 08:34:09 | Epoch: 0 | Step: 340860 | Dataset: 0-8126760 | Loss: 0.505 | 596 ms/step , 115729.18 GFLOP/s , 173900.2 tokens/s INFO:__main__:2024-11-30 08:34:16 | Epoch: 0 | Step: 340870 | Dataset: 0-8129160 | Loss: 0.397 | 596 ms/step , 115834.77 GFLOP/s , 174112.3 tokens/s INFO:__main__:2024-11-30 08:34:23 | Epoch: 0 | Step: 340880 | Dataset: 0-8131560 | Loss: 0.372 | 596 ms/step , 115824.97 GFLOP/s , 174146.8 tokens/s INFO:__main__:2024-11-30 08:34:30 | Epoch: 0 | Step: 340890 | Dataset: 0-8133960 | Loss: 0.392 | 596 ms/step , 115847.84 GFLOP/s , 174130.6 tokens/s INFO:__main__:2024-11-30 08:34:37 | Epoch: 0 | Step: 340900 | Dataset: 0-8136360 | Loss: 0.418 | 596 ms/step , 115887.69 GFLOP/s , 174130.8 tokens/s INFO:__main__:2024-11-30 08:34:44 | Epoch: 0 | Step: 340910 | Dataset: 0-8138760 | Loss: 0.384 | 595 ms/step , 115912.50 GFLOP/s , 174165.2 tokens/s INFO:__main__:2024-11-30 08:34:51 | Epoch: 0 | Step: 340920 | Dataset: 0-8141160 | Loss: 0.359 | 595 ms/step , 115944.95 GFLOP/s , 174178.7 tokens/s INFO:__main__:2024-11-30 08:34:58 | Epoch: 0 | Step: 340930 | Dataset: 0-8143560 | Loss: 0.402 | 596 ms/step , 115827.83 GFLOP/s , 174009.0 tokens/s INFO:__main__:2024-11-30 08:35:05 | Epoch: 0 | Step: 340940 | Dataset: 0-8145960 | Loss: 0.384 | 597 ms/step , 115627.72 GFLOP/s , 173965.8 tokens/s INFO:__main__:2024-11-30 08:35:12 | Epoch: 0 | Step: 340950 | Dataset: 0-8148360 | Loss: 0.358 | 595 
ms/step , 115960.85 GFLOP/s , 174060.9 tokens/s INFO:__main__:2024-11-30 08:35:20 | Epoch: 0 | Step: 340960 | Dataset: 0-8150760 | Loss: 0.377 | 596 ms/step , 115880.17 GFLOP/s , 174134.7 tokens/s INFO:__main__:2024-11-30 08:35:27 | Epoch: 0 | Step: 340970 | Dataset: 0-8153160 | Loss: 0.390 | 595 ms/step , 115931.43 GFLOP/s , 174094.8 tokens/s INFO:__main__:2024-11-30 08:35:34 | Epoch: 0 | Step: 340980 | Dataset: 0-8155560 | Loss: 0.339 | 595 ms/step , 115959.19 GFLOP/s , 174098.9 tokens/s INFO:__main__:2024-11-30 08:35:41 | Epoch: 0 | Step: 340990 | Dataset: 0-8157960 | Loss: 0.385 | 596 ms/step , 115800.79 GFLOP/s , 174102.3 tokens/s INFO:__main__:2024-11-30 08:35:48 | Validation | Step: 341000 | Val_loss: 0.379 | Best_val_loss: 0.3839 INFO:__main__:2024-11-30 08:35:48 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_083548_step_341000.pt` INFO:__main__:2024-11-30 08:35:51 | Epoch: 0 | Step: 341000 | Dataset: 0-8160360 | Loss: 0.336 | 594 ms/step , 116269.73 GFLOP/s , 120793.5 tokens/s INFO:__main__:2024-11-30 08:35:58 | Epoch: 0 | Step: 341010 | Dataset: 0-8162760 | Loss: 0.375 | 598 ms/step , 115423.02 GFLOP/s , 173638.5 tokens/s INFO:__main__:2024-11-30 08:36:05 | Epoch: 0 | Step: 341020 | Dataset: 0-8165160 | Loss: 0.387 | 597 ms/step , 115526.22 GFLOP/s , 173612.2 tokens/s INFO:__main__:2024-11-30 08:36:12 | Epoch: 0 | Step: 341030 | Dataset: 0-8167560 | Loss: 0.331 | 597 ms/step , 115604.42 GFLOP/s , 173657.6 tokens/s INFO:__main__:2024-11-30 08:36:19 | Epoch: 0 | Step: 341040 | Dataset: 0-8169960 | Loss: 0.360 | 596 ms/step , 115702.64 GFLOP/s , 173644.9 tokens/s INFO:__main__:2024-11-30 08:36:26 | Epoch: 0 | Step: 341050 | Dataset: 0-8172360 | Loss: 0.371 | 595 ms/step , 116004.82 GFLOP/s , 173722.3 tokens/s INFO:__main__:2024-11-30 08:36:33 | Epoch: 0 | Step: 341060 | Dataset: 0-8174760 | Loss: 0.395 | 595 ms/step , 115920.99 GFLOP/s , 173412.1 tokens/s INFO:__main__:2024-11-30 08:36:40 | Epoch: 0 | Step: 341070 | 
Dataset: 0-8177160 | Loss: 0.310 | 595 ms/step , 115985.06 GFLOP/s , 173555.5 tokens/s INFO:__main__:2024-11-30 08:36:48 | Epoch: 0 | Step: 341080 | Dataset: 0-8179560 | Loss: 0.392 | 595 ms/step , 115930.29 GFLOP/s , 173581.3 tokens/s INFO:__main__:2024-11-30 08:36:55 | Epoch: 0 | Step: 341090 | Dataset: 0-8181960 | Loss: 0.370 | 596 ms/step , 115812.38 GFLOP/s , 173574.5 tokens/s INFO:__main__:2024-11-30 08:37:02 | Epoch: 0 | Step: 341100 | Dataset: 0-8184360 | Loss: 0.398 | 595 ms/step , 115930.72 GFLOP/s , 173619.6 tokens/s INFO:__main__:2024-11-30 08:37:09 | Epoch: 0 | Step: 341110 | Dataset: 0-8186760 | Loss: 0.337 | 595 ms/step , 115957.10 GFLOP/s , 173579.0 tokens/s INFO:__main__:2024-11-30 08:37:16 | Epoch: 0 | Step: 341120 | Dataset: 0-8189160 | Loss: 0.360 | 596 ms/step , 115844.30 GFLOP/s , 173530.2 tokens/s INFO:__main__:2024-11-30 08:37:23 | Epoch: 0 | Step: 341130 | Dataset: 0-8191560 | Loss: 0.336 | 595 ms/step , 115942.94 GFLOP/s , 173537.6 tokens/s INFO:__main__:2024-11-30 08:37:30 | Epoch: 0 | Step: 341140 | Dataset: 0-8193960 | Loss: 0.355 | 595 ms/step , 116015.22 GFLOP/s , 173612.1 tokens/s INFO:__main__:2024-11-30 08:37:37 | Epoch: 0 | Step: 341150 | Dataset: 0-8196360 | Loss: 0.378 | 596 ms/step , 115888.83 GFLOP/s , 173497.6 tokens/s INFO:__main__:2024-11-30 08:37:44 | Epoch: 0 | Step: 341160 | Dataset: 0-8198760 | Loss: 0.327 | 596 ms/step , 115835.01 GFLOP/s , 173592.0 tokens/s INFO:__main__:2024-11-30 08:37:51 | Epoch: 0 | Step: 341170 | Dataset: 0-8201160 | Loss: 0.360 | 595 ms/step , 115946.02 GFLOP/s , 173556.9 tokens/s INFO:__main__:2024-11-30 08:37:58 | Epoch: 0 | Step: 341180 | Dataset: 0-8203560 | Loss: 0.367 | 596 ms/step , 115752.66 GFLOP/s , 173500.3 tokens/s INFO:__main__:2024-11-30 08:38:05 | Epoch: 0 | Step: 341190 | Dataset: 0-8205960 | Loss: 0.340 | 596 ms/step , 115811.77 GFLOP/s , 173471.9 tokens/s INFO:__main__:2024-11-30 08:38:12 | Epoch: 0 | Step: 341200 | Dataset: 0-8208360 | Loss: 0.374 | 595 ms/step , 115959.54 
GFLOP/s , 173478.6 tokens/s INFO:__main__:2024-11-30 08:38:20 | Epoch: 0 | Step: 341210 | Dataset: 0-8210760 | Loss: 0.392 | 596 ms/step , 115800.51 GFLOP/s , 173538.5 tokens/s INFO:__main__:2024-11-30 08:38:27 | Epoch: 0 | Step: 341220 | Dataset: 0-8213160 | Loss: 0.315 | 595 ms/step , 115970.53 GFLOP/s , 173563.9 tokens/s INFO:__main__:2024-11-30 08:38:34 | Epoch: 0 | Step: 341230 | Dataset: 0-8215560 | Loss: 0.406 | 595 ms/step , 115961.81 GFLOP/s , 173392.9 tokens/s INFO:__main__:2024-11-30 08:38:41 | Epoch: 0 | Step: 341240 | Dataset: 0-8217960 | Loss: 0.419 | 596 ms/step , 115871.60 GFLOP/s , 173545.7 tokens/s INFO:__main__:2024-11-30 08:38:48 | Epoch: 0 | Step: 341250 | Dataset: 0-8220360 | Loss: 0.354 | 596 ms/step , 115824.64 GFLOP/s , 173436.2 tokens/s INFO:__main__:2024-11-30 08:38:55 | Epoch: 0 | Step: 341260 | Dataset: 0-8222760 | Loss: 0.374 | 596 ms/step , 115843.34 GFLOP/s , 173707.4 tokens/s INFO:__main__:2024-11-30 08:39:02 | Epoch: 0 | Step: 341270 | Dataset: 0-8225160 | Loss: 0.337 | 598 ms/step , 115412.88 GFLOP/s , 174028.3 tokens/s INFO:__main__:2024-11-30 08:39:09 | Epoch: 0 | Step: 341280 | Dataset: 0-8227560 | Loss: 0.437 | 596 ms/step , 115835.27 GFLOP/s , 174049.7 tokens/s INFO:__main__:2024-11-30 08:39:16 | Epoch: 0 | Step: 341290 | Dataset: 0-8229960 | Loss: 0.345 | 596 ms/step , 115806.67 GFLOP/s , 174112.0 tokens/s INFO:__main__:2024-11-30 08:39:23 | Epoch: 0 | Step: 341300 | Dataset: 0-8232360 | Loss: 0.365 | 596 ms/step , 115871.95 GFLOP/s , 174011.8 tokens/s INFO:__main__:2024-11-30 08:39:30 | Epoch: 0 | Step: 341310 | Dataset: 0-8234760 | Loss: 0.349 | 595 ms/step , 115931.60 GFLOP/s , 174081.8 tokens/s INFO:__main__:2024-11-30 08:39:37 | Epoch: 0 | Step: 341320 | Dataset: 0-8237160 | Loss: 0.362 | 595 ms/step , 115906.86 GFLOP/s , 174053.4 tokens/s INFO:__main__:2024-11-30 08:39:44 | Epoch: 0 | Step: 341330 | Dataset: 0-8239560 | Loss: 0.383 | 595 ms/step , 115967.28 GFLOP/s , 174050.8 tokens/s INFO:__main__:2024-11-30 08:39:51 
| Epoch: 0 | Step: 341340 | Dataset: 0-8241960 | Loss: 0.401 | 595 ms/step , 115962.09 GFLOP/s , 174075.6 tokens/s INFO:__main__:2024-11-30 08:39:58 | Epoch: 0 | Step: 341350 | Dataset: 0-8244360 | Loss: 0.368 | 596 ms/step , 115779.91 GFLOP/s , 174101.3 tokens/s INFO:__main__:2024-11-30 08:40:06 | Epoch: 0 | Step: 341360 | Dataset: 0-8246760 | Loss: 0.367 | 595 ms/step , 115902.22 GFLOP/s , 174015.9 tokens/s INFO:__main__:2024-11-30 08:40:13 | Epoch: 0 | Step: 341370 | Dataset: 0-8249160 | Loss: 0.392 | 595 ms/step , 115936.91 GFLOP/s , 174054.1 tokens/s INFO:__main__:2024-11-30 08:40:20 | Epoch: 0 | Step: 341380 | Dataset: 0-8251560 | Loss: 0.388 | 596 ms/step , 115868.27 GFLOP/s , 174012.1 tokens/s INFO:__main__:2024-11-30 08:40:27 | Epoch: 0 | Step: 341390 | Dataset: 0-8253960 | Loss: 0.403 | 595 ms/step , 115898.83 GFLOP/s , 174066.7 tokens/s INFO:__main__:2024-11-30 08:40:34 | Epoch: 0 | Step: 341400 | Dataset: 0-8256360 | Loss: 0.366 | 595 ms/step , 116017.39 GFLOP/s , 174066.8 tokens/s INFO:__main__:2024-11-30 08:40:41 | Epoch: 0 | Step: 341410 | Dataset: 0-8258760 | Loss: 0.362 | 595 ms/step , 115961.20 GFLOP/s , 174082.6 tokens/s INFO:__main__:2024-11-30 08:40:48 | Epoch: 0 | Step: 341420 | Dataset: 0-8261160 | Loss: 0.351 | 596 ms/step , 115792.80 GFLOP/s , 173915.8 tokens/s INFO:__main__:2024-11-30 08:40:55 | Epoch: 0 | Step: 341430 | Dataset: 0-8263560 | Loss: 0.336 | 595 ms/step , 116056.74 GFLOP/s , 174126.9 tokens/s INFO:__main__:2024-11-30 08:41:02 | Epoch: 0 | Step: 341440 | Dataset: 0-8265960 | Loss: 0.423 | 596 ms/step , 115819.56 GFLOP/s , 174072.4 tokens/s INFO:__main__:2024-11-30 08:41:09 | Epoch: 0 | Step: 341450 | Dataset: 0-8268360 | Loss: 0.407 | 596 ms/step , 115806.98 GFLOP/s , 174115.2 tokens/s INFO:__main__:2024-11-30 08:41:16 | Epoch: 0 | Step: 341460 | Dataset: 0-8270760 | Loss: 0.350 | 596 ms/step , 115853.73 GFLOP/s , 174159.5 tokens/s INFO:__main__:2024-11-30 08:41:23 | Epoch: 0 | Step: 341470 | Dataset: 0-8273160 | Loss: 0.374 | 
595 ms/step , 115990.72 GFLOP/s , 174095.7 tokens/s INFO:__main__:2024-11-30 08:41:30 | Epoch: 0 | Step: 341480 | Dataset: 0-8275560 | Loss: 0.401 | 596 ms/step , 115821.29 GFLOP/s , 174112.5 tokens/s INFO:__main__:2024-11-30 08:41:37 | Epoch: 0 | Step: 341490 | Dataset: 0-8277960 | Loss: 0.368 | 595 ms/step , 115938.42 GFLOP/s , 174021.3 tokens/s INFO:__main__:2024-11-30 08:41:45 | Validation | Step: 341500 | Val_loss: 0.375 | Best_val_loss: 0.3786 INFO:__main__:2024-11-30 08:41:45 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_084145_step_341500.pt` INFO:__main__:2024-11-30 08:41:48 | Epoch: 0 | Step: 341500 | Dataset: 0-8280360 | Loss: 0.329 | 594 ms/step , 116089.75 GFLOP/s , 120704.0 tokens/s INFO:__main__:2024-11-30 08:41:55 | Epoch: 0 | Step: 341510 | Dataset: 0-8282760 | Loss: 0.296 | 597 ms/step , 115502.82 GFLOP/s , 173610.0 tokens/s INFO:__main__:2024-11-30 08:42:02 | Epoch: 0 | Step: 341520 | Dataset: 0-8285160 | Loss: 0.328 | 597 ms/step , 115598.63 GFLOP/s , 173482.2 tokens/s INFO:__main__:2024-11-30 08:42:09 | Epoch: 0 | Step: 341530 | Dataset: 0-8287560 | Loss: 0.350 | 598 ms/step , 115425.81 GFLOP/s , 173588.9 tokens/s INFO:__main__:2024-11-30 08:42:16 | Epoch: 0 | Step: 341540 | Dataset: 0-8289960 | Loss: 0.361 | 595 ms/step , 115938.40 GFLOP/s , 173529.5 tokens/s INFO:__main__:2024-11-30 08:42:23 | Epoch: 0 | Step: 341550 | Dataset: 0-8292360 | Loss: 0.357 | 596 ms/step , 115818.99 GFLOP/s , 173614.1 tokens/s INFO:__main__:2024-11-30 08:42:30 | Epoch: 0 | Step: 341560 | Dataset: 0-8294760 | Loss: 0.350 | 596 ms/step , 115831.64 GFLOP/s , 173564.6 tokens/s INFO:__main__:2024-11-30 08:42:37 | Epoch: 0 | Step: 341570 | Dataset: 0-8297160 | Loss: 0.323 | 595 ms/step , 115906.99 GFLOP/s , 173561.8 tokens/s INFO:__main__:2024-11-30 08:42:44 | Epoch: 0 | Step: 341580 | Dataset: 0-8299560 | Loss: 0.369 | 595 ms/step , 115920.75 GFLOP/s , 173592.7 tokens/s INFO:__main__:2024-11-30 08:42:51 | Epoch: 0 | Step: 341590 | 
Dataset: 0-8301960 | Loss: 0.387 | 595 ms/step , 115916.63 GFLOP/s , 173564.8 tokens/s INFO:__main__:2024-11-30 08:42:58 | Epoch: 0 | Step: 341600 | Dataset: 0-8304360 | Loss: 0.392 | 595 ms/step , 115898.59 GFLOP/s , 173531.5 tokens/s INFO:__main__:2024-11-30 08:43:05 | Epoch: 0 | Step: 341610 | Dataset: 0-8306760 | Loss: 0.409 | 595 ms/step , 115927.70 GFLOP/s , 173534.1 tokens/s INFO:__main__:2024-11-30 08:43:12 | Epoch: 0 | Step: 341620 | Dataset: 0-8309160 | Loss: 0.339 | 595 ms/step , 115914.13 GFLOP/s , 173677.3 tokens/s INFO:__main__:2024-11-30 08:43:20 | Epoch: 0 | Step: 341630 | Dataset: 0-8311560 | Loss: 0.336 | 596 ms/step , 115849.51 GFLOP/s , 173640.3 tokens/s INFO:__main__:2024-11-30 08:43:27 | Epoch: 0 | Step: 341640 | Dataset: 0-8313960 | Loss: 0.397 | 595 ms/step , 115904.64 GFLOP/s , 173514.6 tokens/s INFO:__main__:2024-11-30 08:43:34 | Epoch: 0 | Step: 341650 | Dataset: 0-8316360 | Loss: 0.366 | 596 ms/step , 115820.12 GFLOP/s , 173904.3 tokens/s INFO:__main__:2024-11-30 08:43:41 | Epoch: 0 | Step: 341660 | Dataset: 0-8318760 | Loss: 0.351 | 596 ms/step , 115839.98 GFLOP/s , 174090.2 tokens/s INFO:__main__:2024-11-30 08:43:48 | Epoch: 0 | Step: 341670 | Dataset: 0-8321160 | Loss: 0.327 | 596 ms/step , 115799.83 GFLOP/s , 174021.2 tokens/s INFO:__main__:2024-11-30 08:43:55 | Epoch: 0 | Step: 341680 | Dataset: 0-8323560 | Loss: 0.420 | 596 ms/step , 115703.00 GFLOP/s , 174087.9 tokens/s INFO:__main__:2024-11-30 08:44:02 | Epoch: 0 | Step: 341690 | Dataset: 0-8325960 | Loss: 0.396 | 595 ms/step , 115939.75 GFLOP/s , 174041.6 tokens/s INFO:__main__:2024-11-30 08:44:09 | Epoch: 0 | Step: 341700 | Dataset: 0-8328360 | Loss: 0.351 | 595 ms/step , 115924.27 GFLOP/s , 174043.8 tokens/s INFO:__main__:2024-11-30 08:44:16 | Epoch: 0 | Step: 341710 | Dataset: 0-8330760 | Loss: 0.373 | 596 ms/step , 115772.69 GFLOP/s , 174021.7 tokens/s INFO:__main__:2024-11-30 08:44:23 | Epoch: 0 | Step: 341720 | Dataset: 0-8333160 | Loss: 0.342 | 596 ms/step , 115840.03 
GFLOP/s , 174051.5 tokens/s INFO:__main__:2024-11-30 08:44:30 | Epoch: 0 | Step: 341730 | Dataset: 0-8335560 | Loss: 0.382 | 596 ms/step , 115867.06 GFLOP/s , 174034.0 tokens/s INFO:__main__:2024-11-30 08:44:37 | Epoch: 0 | Step: 341740 | Dataset: 0-8337960 | Loss: 0.360 | 596 ms/step , 115795.64 GFLOP/s , 174048.2 tokens/s INFO:__main__:2024-11-30 08:44:44 | Epoch: 0 | Step: 341750 | Dataset: 0-8340360 | Loss: 0.377 | 595 ms/step , 115927.82 GFLOP/s , 174023.7 tokens/s INFO:__main__:2024-11-30 08:44:51 | Epoch: 0 | Step: 341760 | Dataset: 0-8342760 | Loss: 0.340 | 595 ms/step , 115913.89 GFLOP/s , 174012.2 tokens/s INFO:__main__:2024-11-30 08:44:58 | Epoch: 0 | Step: 341770 | Dataset: 0-8345160 | Loss: 0.386 | 597 ms/step , 115572.76 GFLOP/s , 174002.4 tokens/s INFO:__main__:2024-11-30 08:45:05 | Epoch: 0 | Step: 341780 | Dataset: 0-8347560 | Loss: 0.338 | 596 ms/step , 115844.83 GFLOP/s , 174051.2 tokens/s INFO:__main__:2024-11-30 08:45:13 | Epoch: 0 | Step: 341790 | Dataset: 0-8349960 | Loss: 0.336 | 595 ms/step , 115898.25 GFLOP/s , 174066.6 tokens/s INFO:__main__:2024-11-30 08:45:20 | Epoch: 0 | Step: 341800 | Dataset: 0-8352360 | Loss: 0.345 | 597 ms/step , 115684.92 GFLOP/s , 174031.6 tokens/s INFO:__main__:2024-11-30 08:45:27 | Epoch: 0 | Step: 341810 | Dataset: 0-8354760 | Loss: 0.392 | 595 ms/step , 116002.65 GFLOP/s , 174091.9 tokens/s INFO:__main__:2024-11-30 08:45:34 | Epoch: 0 | Step: 341820 | Dataset: 0-8357160 | Loss: 0.411 | 595 ms/step , 115978.06 GFLOP/s , 174083.7 tokens/s INFO:__main__:2024-11-30 08:45:41 | Epoch: 0 | Step: 341830 | Dataset: 0-8359560 | Loss: 0.354 | 597 ms/step , 115664.67 GFLOP/s , 173983.2 tokens/s INFO:__main__:2024-11-30 08:45:48 | Epoch: 0 | Step: 341840 | Dataset: 0-8361960 | Loss: 0.359 | 595 ms/step , 115894.56 GFLOP/s , 174083.8 tokens/s INFO:__main__:2024-11-30 08:45:55 | Epoch: 0 | Step: 341850 | Dataset: 0-8364360 | Loss: 0.398 | 595 ms/step , 115925.98 GFLOP/s , 174096.3 tokens/s INFO:__main__:2024-11-30 08:46:02 
| Epoch: 0 | Step: 341860 | Dataset: 0-8366760 | Loss: 0.370 | 596 ms/step , 115846.15 GFLOP/s , 174077.2 tokens/s INFO:__main__:2024-11-30 08:46:09 | Epoch: 0 | Step: 341870 | Dataset: 0-8369160 | Loss: 0.380 | 596 ms/step , 115870.48 GFLOP/s , 174066.2 tokens/s INFO:__main__:2024-11-30 08:46:16 | Epoch: 0 | Step: 341880 | Dataset: 0-8371560 | Loss: 0.364 | 596 ms/step , 115885.84 GFLOP/s , 174084.0 tokens/s INFO:__main__:2024-11-30 08:46:23 | Epoch: 0 | Step: 341890 | Dataset: 0-8373960 | Loss: 0.355 | 596 ms/step , 115837.01 GFLOP/s , 173923.6 tokens/s INFO:__main__:2024-11-30 08:46:30 | Epoch: 0 | Step: 341900 | Dataset: 0-8376360 | Loss: 0.372 | 596 ms/step , 115796.39 GFLOP/s , 174050.9 tokens/s INFO:__main__:2024-11-30 08:46:37 | Epoch: 0 | Step: 341910 | Dataset: 0-8378760 | Loss: 0.392 | 596 ms/step , 115866.05 GFLOP/s , 174093.8 tokens/s INFO:__main__:2024-11-30 08:46:44 | Epoch: 0 | Step: 341920 | Dataset: 0-8381160 | Loss: 0.353 | 596 ms/step , 115764.30 GFLOP/s , 174045.7 tokens/s INFO:__main__:2024-11-30 08:46:51 | Epoch: 0 | Step: 341930 | Dataset: 0-8383560 | Loss: 0.340 | 595 ms/step , 116011.18 GFLOP/s , 174080.9 tokens/s INFO:__main__:2024-11-30 08:46:58 | Epoch: 0 | Step: 341940 | Dataset: 0-8385960 | Loss: 0.319 | 595 ms/step , 115955.48 GFLOP/s , 174088.3 tokens/s INFO:__main__:2024-11-30 08:47:05 | Epoch: 0 | Step: 341950 | Dataset: 0-8388360 | Loss: 0.362 | 596 ms/step , 115752.92 GFLOP/s , 174021.3 tokens/s INFO:__main__:2024-11-30 08:47:13 | Epoch: 0 | Step: 341960 | Dataset: 0-8390760 | Loss: 0.503 | 595 ms/step , 115892.53 GFLOP/s , 174025.4 tokens/s INFO:__main__:2024-11-30 08:47:20 | Epoch: 0 | Step: 341970 | Dataset: 0-8393160 | Loss: 0.500 | 597 ms/step , 115662.92 GFLOP/s , 173987.8 tokens/s INFO:__main__:2024-11-30 08:47:27 | Epoch: 0 | Step: 341980 | Dataset: 0-8395560 | Loss: 0.473 | 595 ms/step , 115958.19 GFLOP/s , 173918.3 tokens/s INFO:__main__:2024-11-30 08:47:34 | Epoch: 0 | Step: 341990 | Dataset: 0-8397960 | Loss: 0.489 | 
596 ms/step , 115698.22 GFLOP/s , 173971.1 tokens/s INFO:__main__:2024-11-30 08:47:41 | Validation | Step: 342000 | Val_loss: 0.369 | Best_val_loss: 0.3749 INFO:__main__:2024-11-30 08:47:41 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_084741_step_342000.pt` INFO:__main__:2024-11-30 08:47:44 | Epoch: 0 | Step: 342000 | Dataset: 0-8400360 | Loss: 0.441 | 594 ms/step , 116102.30 GFLOP/s , 121300.9 tokens/s INFO:__main__:2024-11-30 08:47:51 | Epoch: 0 | Step: 342010 | Dataset: 0-8402760 | Loss: 0.476 | 597 ms/step , 115536.69 GFLOP/s , 173530.2 tokens/s INFO:__main__:2024-11-30 08:47:58 | Epoch: 0 | Step: 342020 | Dataset: 0-8405160 | Loss: 0.479 | 599 ms/step , 115294.08 GFLOP/s , 173464.2 tokens/s INFO:__main__:2024-11-30 08:48:05 | Epoch: 0 | Step: 342030 | Dataset: 0-8407560 | Loss: 0.474 | 597 ms/step , 115644.80 GFLOP/s , 173539.8 tokens/s INFO:__main__:2024-11-30 08:48:12 | Epoch: 0 | Step: 342040 | Dataset: 0-8409960 | Loss: 0.435 | 596 ms/step , 115760.64 GFLOP/s , 173588.4 tokens/s INFO:__main__:2024-11-30 08:48:19 | Epoch: 0 | Step: 342050 | Dataset: 0-8412360 | Loss: 0.467 | 596 ms/step , 115734.67 GFLOP/s , 173496.7 tokens/s INFO:__main__:2024-11-30 08:48:26 | Epoch: 0 | Step: 342060 | Dataset: 0-8414760 | Loss: 0.452 | 596 ms/step , 115807.02 GFLOP/s , 173493.6 tokens/s INFO:__main__:2024-11-30 08:48:33 | Epoch: 0 | Step: 342070 | Dataset: 0-8417160 | Loss: 0.454 | 596 ms/step , 115861.54 GFLOP/s , 173501.3 tokens/s INFO:__main__:2024-11-30 08:48:41 | Epoch: 0 | Step: 342080 | Dataset: 0-8419560 | Loss: 0.480 | 596 ms/step , 115756.55 GFLOP/s , 173446.1 tokens/s INFO:__main__:2024-11-30 08:48:48 | Epoch: 0 | Step: 342090 | Dataset: 0-8421960 | Loss: 0.440 | 596 ms/step , 115794.01 GFLOP/s , 173458.5 tokens/s INFO:__main__:2024-11-30 08:48:55 | Epoch: 0 | Step: 342100 | Dataset: 0-8424360 | Loss: 0.457 | 596 ms/step , 115825.67 GFLOP/s , 173397.5 tokens/s INFO:__main__:2024-11-30 08:49:02 | Epoch: 0 | Step: 342110 | 
Dataset: 0-8426760 | Loss: 0.417 | 596 ms/step , 115703.93 GFLOP/s , 173548.3 tokens/s INFO:__main__:2024-11-30 08:49:09 | Epoch: 0 | Step: 342120 | Dataset: 0-8429160 | Loss: 0.433 | 596 ms/step , 115847.30 GFLOP/s , 174029.9 tokens/s INFO:__main__:2024-11-30 08:49:16 | Epoch: 0 | Step: 342130 | Dataset: 0-8431560 | Loss: 0.452 | 596 ms/step , 115809.95 GFLOP/s , 173939.5 tokens/s INFO:__main__:2024-11-30 08:49:23 | Epoch: 0 | Step: 342140 | Dataset: 0-8433960 | Loss: 0.473 | 597 ms/step , 115655.58 GFLOP/s , 173969.8 tokens/s INFO:__main__:2024-11-30 08:49:30 | Epoch: 0 | Step: 342150 | Dataset: 0-8436360 | Loss: 0.417 | 596 ms/step , 115864.76 GFLOP/s , 173995.3 tokens/s INFO:__main__:2024-11-30 08:49:37 | Epoch: 0 | Step: 342160 | Dataset: 0-8438760 | Loss: 0.442 | 596 ms/step , 115832.85 GFLOP/s , 174019.5 tokens/s INFO:__main__:2024-11-30 08:49:44 | Epoch: 0 | Step: 342170 | Dataset: 0-8441160 | Loss: 0.426 | 597 ms/step , 115563.94 GFLOP/s , 173955.5 tokens/s INFO:__main__:2024-11-30 08:49:51 | Epoch: 0 | Step: 342180 | Dataset: 0-8443560 | Loss: 0.403 | 595 ms/step , 115934.59 GFLOP/s , 173969.7 tokens/s INFO:__main__:2024-11-30 08:49:58 | Epoch: 0 | Step: 342190 | Dataset: 0-8445960 | Loss: 0.470 | 596 ms/step , 115738.78 GFLOP/s , 173865.2 tokens/s INFO:__main__:2024-11-30 08:50:05 | Epoch: 0 | Step: 342200 | Dataset: 0-8448360 | Loss: 0.452 | 596 ms/step , 115814.32 GFLOP/s , 173980.2 tokens/s INFO:__main__:2024-11-30 08:50:12 | Epoch: 0 | Step: 342210 | Dataset: 0-8450760 | Loss: 0.500 | 596 ms/step , 115748.78 GFLOP/s , 173975.9 tokens/s INFO:__main__:2024-11-30 08:50:19 | Epoch: 0 | Step: 342220 | Dataset: 0-8453160 | Loss: 0.431 | 596 ms/step , 115800.64 GFLOP/s , 174069.0 tokens/s INFO:__main__:2024-11-30 08:50:27 | Epoch: 0 | Step: 342230 | Dataset: 0-8455560 | Loss: 0.465 | 596 ms/step , 115712.55 GFLOP/s , 174028.1 tokens/s INFO:__main__:2024-11-30 08:50:34 | Epoch: 0 | Step: 342240 | Dataset: 0-8457960 | Loss: 0.394 | 596 ms/step , 115859.45 
GFLOP/s , 174102.4 tokens/s INFO:__main__:2024-11-30 08:50:41 | Epoch: 0 | Step: 342250 | Dataset: 0-8460360 | Loss: 0.434 | 596 ms/step , 115818.63 GFLOP/s , 173983.9 tokens/s INFO:__main__:2024-11-30 08:50:48 | Epoch: 0 | Step: 342260 | Dataset: 0-8462760 | Loss: 0.460 | 596 ms/step , 115722.12 GFLOP/s , 174055.3 tokens/s INFO:__main__:2024-11-30 08:50:55 | Epoch: 0 | Step: 342270 | Dataset: 0-8465160 | Loss: 0.440 | 596 ms/step , 115822.62 GFLOP/s , 174043.4 tokens/s INFO:__main__:2024-11-30 08:51:02 | Epoch: 0 | Step: 342280 | Dataset: 0-8467560 | Loss: 0.444 | 596 ms/step , 115740.47 GFLOP/s , 174052.3 tokens/s INFO:__main__:2024-11-30 08:51:09 | Epoch: 0 | Step: 342290 | Dataset: 0-8469960 | Loss: 0.434 | 596 ms/step , 115731.86 GFLOP/s , 174010.9 tokens/s INFO:__main__:2024-11-30 08:51:16 | Epoch: 0 | Step: 342300 | Dataset: 0-8472360 | Loss: 0.414 | 595 ms/step , 115975.37 GFLOP/s , 174060.5 tokens/s INFO:__main__:2024-11-30 08:51:23 | Epoch: 0 | Step: 342310 | Dataset: 0-8474760 | Loss: 0.457 | 595 ms/step , 115891.19 GFLOP/s , 174084.0 tokens/s INFO:__main__:2024-11-30 08:51:30 | Epoch: 0 | Step: 342320 | Dataset: 0-8477160 | Loss: 0.456 | 596 ms/step , 115701.27 GFLOP/s , 174081.5 tokens/s INFO:__main__:2024-11-30 08:51:37 | Epoch: 0 | Step: 342330 | Dataset: 0-8479560 | Loss: 0.408 | 596 ms/step , 115854.55 GFLOP/s , 173952.7 tokens/s INFO:__main__:2024-11-30 08:51:44 | Epoch: 0 | Step: 342340 | Dataset: 0-8481960 | Loss: 0.389 | 596 ms/step , 115713.04 GFLOP/s , 173867.9 tokens/s INFO:__main__:2024-11-30 08:51:51 | Epoch: 0 | Step: 342350 | Dataset: 0-8484360 | Loss: 0.391 | 596 ms/step , 115858.73 GFLOP/s , 173943.3 tokens/s INFO:__main__:2024-11-30 08:51:58 | Epoch: 0 | Step: 342360 | Dataset: 0-8486760 | Loss: 0.424 | 595 ms/step , 115942.46 GFLOP/s , 174000.8 tokens/s INFO:__main__:2024-11-30 08:52:05 | Epoch: 0 | Step: 342370 | Dataset: 0-8489160 | Loss: 0.415 | 596 ms/step , 115719.43 GFLOP/s , 173990.6 tokens/s INFO:__main__:2024-11-30 08:52:12 
| Epoch: 0 | Step: 342380 | Dataset: 0-8491560 | Loss: 0.436 | 596 ms/step , 115842.97 GFLOP/s , 174042.5 tokens/s INFO:__main__:2024-11-30 08:52:20 | Epoch: 0 | Step: 342390 | Dataset: 0-8493960 | Loss: 0.362 | 596 ms/step , 115804.39 GFLOP/s , 174042.9 tokens/s INFO:__main__:2024-11-30 08:52:27 | Epoch: 0 | Step: 342400 | Dataset: 0-8496360 | Loss: 0.445 | 595 ms/step , 115905.08 GFLOP/s , 174103.7 tokens/s INFO:__main__:2024-11-30 08:52:34 | Epoch: 0 | Step: 342410 | Dataset: 0-8498760 | Loss: 0.435 | 597 ms/step , 115668.67 GFLOP/s , 173982.8 tokens/s INFO:__main__:2024-11-30 08:52:41 | Epoch: 0 | Step: 342420 | Dataset: 0-8501160 | Loss: 0.435 | 596 ms/step , 115885.66 GFLOP/s , 174065.7 tokens/s INFO:__main__:2024-11-30 08:52:48 | Epoch: 0 | Step: 342430 | Dataset: 0-8503560 | Loss: 0.471 | 596 ms/step , 115825.12 GFLOP/s , 173937.9 tokens/s INFO:__main__:2024-11-30 08:52:55 | Epoch: 0 | Step: 342440 | Dataset: 0-8505960 | Loss: 0.435 | 596 ms/step , 115756.09 GFLOP/s , 174002.2 tokens/s INFO:__main__:2024-11-30 08:53:02 | Epoch: 0 | Step: 342450 | Dataset: 0-8508360 | Loss: 0.439 | 596 ms/step , 115711.69 GFLOP/s , 173977.2 tokens/s INFO:__main__:2024-11-30 08:53:09 | Epoch: 0 | Step: 342460 | Dataset: 0-8510760 | Loss: 0.447 | 596 ms/step , 115790.80 GFLOP/s , 174031.0 tokens/s INFO:__main__:2024-11-30 08:53:16 | Epoch: 0 | Step: 342470 | Dataset: 0-8513160 | Loss: 0.426 | 596 ms/step , 115826.76 GFLOP/s , 174016.6 tokens/s INFO:__main__:2024-11-30 08:53:23 | Epoch: 0 | Step: 342480 | Dataset: 0-8515560 | Loss: 0.453 | 596 ms/step , 115820.41 GFLOP/s , 174016.7 tokens/s INFO:__main__:2024-11-30 08:53:30 | Epoch: 0 | Step: 342490 | Dataset: 0-8517960 | Loss: 0.406 | 596 ms/step , 115823.87 GFLOP/s , 174037.7 tokens/s INFO:__main__:2024-11-30 08:53:38 | Validation | Step: 342500 | Val_loss: 0.362 | Best_val_loss: 0.3687 INFO:__main__:2024-11-30 08:53:38 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_085338_step_342500.pt` 
INFO:__main__:2024-11-30 08:53:40 | Epoch: 0 | Step: 342500 | Dataset: 0-8520360 | Loss: 0.445 | 594 ms/step , 116141.45 GFLOP/s , 121468.1 tokens/s INFO:__main__:2024-11-30 08:53:47 | Epoch: 0 | Step: 342510 | Dataset: 0-8522760 | Loss: 0.463 | 599 ms/step , 115272.42 GFLOP/s , 173598.9 tokens/s INFO:__main__:2024-11-30 08:53:54 | Epoch: 0 | Step: 342520 | Dataset: 0-8525160 | Loss: 0.500 | 595 ms/step , 116013.24 GFLOP/s , 173588.0 tokens/s INFO:__main__:2024-11-30 08:54:01 | Epoch: 0 | Step: 342530 | Dataset: 0-8527560 | Loss: 0.441 | 596 ms/step , 115879.47 GFLOP/s , 173692.2 tokens/s INFO:__main__:2024-11-30 08:54:09 | Epoch: 0 | Step: 342540 | Dataset: 0-8529960 | Loss: 0.459 | 595 ms/step , 116016.35 GFLOP/s , 173613.6 tokens/s INFO:__main__:2024-11-30 08:54:16 | Epoch: 0 | Step: 342550 | Dataset: 0-8532360 | Loss: 0.463 | 595 ms/step , 116012.91 GFLOP/s , 173595.9 tokens/s INFO:__main__:2024-11-30 08:54:23 | Epoch: 0 | Step: 342560 | Dataset: 0-8534760 | Loss: 0.457 | 596 ms/step , 115884.33 GFLOP/s , 173538.5 tokens/s INFO:__main__:2024-11-30 08:54:30 | Epoch: 0 | Step: 342570 | Dataset: 0-8537160 | Loss: 0.491 | 596 ms/step , 115704.22 GFLOP/s , 173448.3 tokens/s INFO:__main__:2024-11-30 08:54:37 | Epoch: 0 | Step: 342580 | Dataset: 0-8539560 | Loss: 0.478 | 595 ms/step , 115988.06 GFLOP/s , 173461.9 tokens/s INFO:__main__:2024-11-30 08:54:44 | Epoch: 0 | Step: 342590 | Dataset: 0-8541960 | Loss: 0.511 | 596 ms/step , 115806.07 GFLOP/s , 173698.6 tokens/s INFO:__main__:2024-11-30 08:54:51 | Epoch: 0 | Step: 342600 | Dataset: 0-8544360 | Loss: 0.443 | 596 ms/step , 115869.58 GFLOP/s , 173959.1 tokens/s INFO:__main__:2024-11-30 08:54:58 | Epoch: 0 | Step: 342610 | Dataset: 0-8546760 | Loss: 0.427 | 595 ms/step , 115915.62 GFLOP/s , 174003.5 tokens/s INFO:__main__:2024-11-30 08:55:05 | Epoch: 0 | Step: 342620 | Dataset: 0-8549160 | Loss: 0.523 | 596 ms/step , 115861.78 GFLOP/s , 173959.9 tokens/s INFO:__main__:2024-11-30 08:55:12 | Epoch: 0 | Step: 342630 | 
Dataset: 0-8551560 | Loss: 0.407 | 597 ms/step , 115692.85 GFLOP/s , 173984.7 tokens/s INFO:__main__:2024-11-30 08:55:19 | Epoch: 0 | Step: 342640 | Dataset: 0-8553960 | Loss: 0.457 | 596 ms/step , 115801.67 GFLOP/s , 173928.4 tokens/s INFO:__main__:2024-11-30 08:55:26 | Epoch: 0 | Step: 342650 | Dataset: 0-8556360 | Loss: 0.455 | 596 ms/step , 115886.48 GFLOP/s , 174038.4 tokens/s INFO:__main__:2024-11-30 08:55:33 | Epoch: 0 | Step: 342660 | Dataset: 0-8558760 | Loss: 0.441 | 596 ms/step , 115863.99 GFLOP/s , 174068.2 tokens/s INFO:__main__:2024-11-30 08:55:40 | Epoch: 0 | Step: 342670 | Dataset: 0-8561160 | Loss: 0.475 | 596 ms/step , 115838.16 GFLOP/s , 174044.4 tokens/s INFO:__main__:2024-11-30 08:55:48 | Epoch: 0 | Step: 342680 | Dataset: 0-8563560 | Loss: 0.464 | 596 ms/step , 115799.07 GFLOP/s , 174015.6 tokens/s INFO:__main__:2024-11-30 08:55:55 | Epoch: 0 | Step: 342690 | Dataset: 0-8565960 | Loss: 0.472 | 596 ms/step , 115725.64 GFLOP/s , 173996.8 tokens/s INFO:__main__:2024-11-30 08:56:02 | Epoch: 0 | Step: 342700 | Dataset: 0-8568360 | Loss: 0.459 | 596 ms/step , 115775.43 GFLOP/s , 173971.9 tokens/s INFO:__main__:2024-11-30 08:56:09 | Epoch: 0 | Step: 342710 | Dataset: 0-8570760 | Loss: 0.448 | 596 ms/step , 115880.47 GFLOP/s , 174012.8 tokens/s INFO:__main__:2024-11-30 08:56:16 | Epoch: 0 | Step: 342720 | Dataset: 0-8573160 | Loss: 0.467 | 596 ms/step , 115697.78 GFLOP/s , 173975.3 tokens/s INFO:__main__:2024-11-30 08:56:23 | Epoch: 0 | Step: 342730 | Dataset: 0-8575560 | Loss: 0.442 | 596 ms/step , 115820.83 GFLOP/s , 173882.7 tokens/s INFO:__main__:2024-11-30 08:56:30 | Epoch: 0 | Step: 342740 | Dataset: 0-8577960 | Loss: 0.489 | 596 ms/step , 115871.60 GFLOP/s , 173972.7 tokens/s INFO:__main__:2024-11-30 08:56:37 | Epoch: 0 | Step: 342750 | Dataset: 0-8580360 | Loss: 0.407 | 596 ms/step , 115731.19 GFLOP/s , 173922.0 tokens/s INFO:__main__:2024-11-30 08:56:44 | Epoch: 0 | Step: 342760 | Dataset: 0-8582760 | Loss: 0.429 | 596 ms/step , 115802.03 
GFLOP/s , 173958.6 tokens/s INFO:__main__:2024-11-30 08:56:51 | Epoch: 0 | Step: 342770 | Dataset: 0-8585160 | Loss: 0.444 | 596 ms/step , 115847.87 GFLOP/s , 173919.1 tokens/s INFO:__main__:2024-11-30 08:56:58 | Epoch: 0 | Step: 342780 | Dataset: 0-8587560 | Loss: 0.495 | 596 ms/step , 115734.40 GFLOP/s , 173904.0 tokens/s INFO:__main__:2024-11-30 08:57:05 | Epoch: 0 | Step: 342790 | Dataset: 0-8589960 | Loss: 0.428 | 596 ms/step , 115696.25 GFLOP/s , 173953.3 tokens/s INFO:__main__:2024-11-30 08:57:12 | Epoch: 0 | Step: 342800 | Dataset: 0-8592360 | Loss: 0.429 | 595 ms/step , 115899.64 GFLOP/s , 173975.6 tokens/s INFO:__main__:2024-11-30 08:57:19 | Epoch: 0 | Step: 342810 | Dataset: 0-8594760 | Loss: 0.446 | 597 ms/step , 115621.90 GFLOP/s , 173877.7 tokens/s INFO:__main__:2024-11-30 08:57:26 | Epoch: 0 | Step: 342820 | Dataset: 0-8597160 | Loss: 0.427 | 596 ms/step , 115701.16 GFLOP/s , 173894.9 tokens/s INFO:__main__:2024-11-30 08:57:33 | Epoch: 0 | Step: 342830 | Dataset: 0-8599560 | Loss: 0.428 | 595 ms/step , 115941.49 GFLOP/s , 173958.1 tokens/s INFO:__main__:2024-11-30 08:57:41 | Epoch: 0 | Step: 342840 | Dataset: 0-8601960 | Loss: 0.434 | 596 ms/step , 115797.45 GFLOP/s , 173932.0 tokens/s INFO:__main__:2024-11-30 08:57:48 | Epoch: 0 | Step: 342850 | Dataset: 0-8604360 | Loss: 0.432 | 597 ms/step , 115652.52 GFLOP/s , 173951.5 tokens/s INFO:__main__:2024-11-30 08:57:55 | Epoch: 0 | Step: 342860 | Dataset: 0-8606760 | Loss: 0.439 | 596 ms/step , 115795.97 GFLOP/s , 173928.4 tokens/s INFO:__main__:2024-11-30 08:58:02 | Epoch: 0 | Step: 342870 | Dataset: 0-8609160 | Loss: 0.429 | 596 ms/step , 115737.81 GFLOP/s , 173864.1 tokens/s INFO:__main__:2024-11-30 08:58:09 | Epoch: 0 | Step: 342880 | Dataset: 0-8611560 | Loss: 0.404 | 598 ms/step , 115417.93 GFLOP/s , 173842.3 tokens/s INFO:__main__:2024-11-30 08:58:16 | Epoch: 0 | Step: 342890 | Dataset: 0-8613960 | Loss: 0.440 | 597 ms/step , 115583.15 GFLOP/s , 173910.4 tokens/s INFO:__main__:2024-11-30 08:58:23 
| Epoch: 0 | Step: 342900 | Dataset: 0-8616360 | Loss: 0.439 | 597 ms/step , 115616.78 GFLOP/s , 173966.7 tokens/s INFO:__main__:2024-11-30 08:58:30 | Epoch: 0 | Step: 342910 | Dataset: 0-8618760 | Loss: 0.389 | 596 ms/step , 115834.17 GFLOP/s , 173955.8 tokens/s INFO:__main__:2024-11-30 08:58:37 | Epoch: 0 | Step: 342920 | Dataset: 0-8621160 | Loss: 0.441 | 597 ms/step , 115632.84 GFLOP/s , 173942.9 tokens/s INFO:__main__:2024-11-30 08:58:44 | Epoch: 0 | Step: 342930 | Dataset: 0-8623560 | Loss: 0.409 | 599 ms/step , 115245.56 GFLOP/s , 174009.7 tokens/s INFO:__main__:2024-11-30 08:58:51 | Epoch: 0 | Step: 342940 | Dataset: 0-8625960 | Loss: 0.431 | 596 ms/step , 115825.98 GFLOP/s , 173956.0 tokens/s INFO:__main__:2024-11-30 08:58:58 | Epoch: 0 | Step: 342950 | Dataset: 0-8628360 | Loss: 0.468 | 596 ms/step , 115804.36 GFLOP/s , 173975.0 tokens/s INFO:__main__:2024-11-30 08:59:05 | Epoch: 0 | Step: 342960 | Dataset: 0-8630760 | Loss: 0.432 | 596 ms/step , 115723.51 GFLOP/s , 174000.2 tokens/s INFO:__main__:2024-11-30 08:59:12 | Epoch: 0 | Step: 342970 | Dataset: 0-8633160 | Loss: 0.452 | 596 ms/step , 115838.20 GFLOP/s , 173941.2 tokens/s INFO:__main__:2024-11-30 08:59:19 | Epoch: 0 | Step: 342980 | Dataset: 0-8635560 | Loss: 0.395 | 596 ms/step , 115844.39 GFLOP/s , 174011.9 tokens/s INFO:__main__:2024-11-30 08:59:27 | Epoch: 0 | Step: 342990 | Dataset: 0-8637960 | Loss: 0.417 | 597 ms/step , 115694.71 GFLOP/s , 173944.3 tokens/s INFO:__main__:2024-11-30 08:59:34 | Validation | Step: 343000 | Val_loss: 0.408 | Best_val_loss: 0.3619 INFO:__main__:2024-11-30 08:59:34 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_085934_step_343000.pt` INFO:__main__:2024-11-30 08:59:37 | Epoch: 0 | Step: 343000 | Dataset: 0-8640360 | Loss: 0.442 | 595 ms/step , 115944.09 GFLOP/s , 121235.0 tokens/s INFO:__main__:2024-11-30 08:59:44 | Epoch: 0 | Step: 343010 | Dataset: 0-8642760 | Loss: 0.424 | 597 ms/step , 115509.52 GFLOP/s , 173481.8 tokens/s 
INFO:__main__:2024-11-30 08:59:51 | Epoch: 0 | Step: 343020 | Dataset: 0-8645160 | Loss: 0.404 | 597 ms/step , 115663.63 GFLOP/s , 173512.9 tokens/s INFO:__main__:2024-11-30 08:59:58 | Epoch: 0 | Step: 343030 | Dataset: 0-8647560 | Loss: 0.419 | 596 ms/step , 115809.27 GFLOP/s , 173577.7 tokens/s INFO:__main__:2024-11-30 09:00:05 | Epoch: 0 | Step: 343040 | Dataset: 0-8649960 | Loss: 0.414 | 595 ms/step , 115938.89 GFLOP/s , 173533.6 tokens/s INFO:__main__:2024-11-30 09:00:12 | Epoch: 0 | Step: 343050 | Dataset: 0-8652360 | Loss: 0.811 | 597 ms/step , 115661.54 GFLOP/s , 173356.2 tokens/s INFO:__main__:2024-11-30 09:00:19 | Epoch: 0 | Step: 343060 | Dataset: 0-8654760 | Loss: 0.834 | 598 ms/step , 115454.27 GFLOP/s , 173228.4 tokens/s INFO:__main__:2024-11-30 09:00:26 | Epoch: 0 | Step: 343070 | Dataset: 0-8657160 | Loss: 0.826 | 596 ms/step , 115701.58 GFLOP/s , 173289.2 tokens/s INFO:__main__:2024-11-30 09:00:33 | Epoch: 0 | Step: 343080 | Dataset: 0-8659560 | Loss: 0.794 | 597 ms/step , 115553.86 GFLOP/s , 173234.3 tokens/s INFO:__main__:2024-11-30 09:00:40 | Epoch: 0 | Step: 343090 | Dataset: 0-8661960 | Loss: 0.834 | 597 ms/step , 115654.66 GFLOP/s , 173209.4 tokens/s INFO:__main__:2024-11-30 09:00:48 | Epoch: 0 | Step: 343100 | Dataset: 0-8664360 | Loss: 0.831 | 597 ms/step , 115606.84 GFLOP/s , 173238.1 tokens/s INFO:__main__:2024-11-30 09:00:55 | Epoch: 0 | Step: 343110 | Dataset: 0-8666760 | Loss: 0.802 | 597 ms/step , 115542.40 GFLOP/s , 173664.4 tokens/s INFO:__main__:2024-11-30 09:01:02 | Epoch: 0 | Step: 343120 | Dataset: 0-8669160 | Loss: 0.883 | 597 ms/step , 115570.67 GFLOP/s , 173640.0 tokens/s INFO:__main__:2024-11-30 09:01:09 | Epoch: 0 | Step: 343130 | Dataset: 0-8671560 | Loss: 0.793 | 597 ms/step , 115575.04 GFLOP/s , 173731.3 tokens/s INFO:__main__:2024-11-30 09:01:16 | Epoch: 0 | Step: 343140 | Dataset: 0-8673960 | Loss: 0.797 | 599 ms/step , 115136.23 GFLOP/s , 173661.9 tokens/s INFO:__main__:2024-11-30 09:01:23 | Epoch: 0 | Step: 343150 | 
Dataset: 0-8676360 | Loss: 0.827 | 597 ms/step , 115603.45 GFLOP/s , 173740.5 tokens/s INFO:__main__:2024-11-30 09:01:30 | Epoch: 0 | Step: 343160 | Dataset: 0-8678760 | Loss: 0.783 | 596 ms/step , 115838.59 GFLOP/s , 173754.4 tokens/s INFO:__main__:2024-11-30 09:01:37 | Epoch: 0 | Step: 343170 | Dataset: 0-8681160 | Loss: 0.790 | 597 ms/step , 115531.10 GFLOP/s , 173909.7 tokens/s INFO:__main__:2024-11-30 09:01:44 | Epoch: 0 | Step: 343180 | Dataset: 0-8683560 | Loss: 0.741 | 597 ms/step , 115631.47 GFLOP/s , 173757.5 tokens/s INFO:__main__:2024-11-30 09:01:51 | Epoch: 0 | Step: 343190 | Dataset: 0-8685960 | Loss: 0.770 | 597 ms/step , 115519.82 GFLOP/s , 173829.7 tokens/s INFO:__main__:2024-11-30 09:01:58 | Epoch: 0 | Step: 343200 | Dataset: 0-8688360 | Loss: 0.838 | 597 ms/step , 115601.46 GFLOP/s , 173781.1 tokens/s INFO:__main__:2024-11-30 09:02:05 | Epoch: 0 | Step: 343210 | Dataset: 0-8690760 | Loss: 0.743 | 597 ms/step , 115530.24 GFLOP/s , 173770.3 tokens/s INFO:__main__:2024-11-30 09:02:12 | Epoch: 0 | Step: 343220 | Dataset: 0-8693160 | Loss: 0.804 | 597 ms/step , 115655.29 GFLOP/s , 173790.6 tokens/s INFO:__main__:2024-11-30 09:02:19 | Epoch: 0 | Step: 343230 | Dataset: 0-8695560 | Loss: 0.757 | 597 ms/step , 115673.76 GFLOP/s , 173785.9 tokens/s INFO:__main__:2024-11-30 09:02:27 | Epoch: 0 | Step: 343240 | Dataset: 0-8697960 | Loss: 0.814 | 597 ms/step , 115562.77 GFLOP/s , 173751.4 tokens/s INFO:__main__:2024-11-30 09:02:34 | Epoch: 0 | Step: 343250 | Dataset: 0-8700360 | Loss: 0.821 | 597 ms/step , 115615.11 GFLOP/s , 173773.2 tokens/s INFO:__main__:2024-11-30 09:02:41 | Epoch: 0 | Step: 343260 | Dataset: 0-8702760 | Loss: 0.794 | 597 ms/step , 115629.55 GFLOP/s , 173765.0 tokens/s INFO:__main__:2024-11-30 09:02:48 | Epoch: 0 | Step: 343270 | Dataset: 0-8705160 | Loss: 0.820 | 597 ms/step , 115567.42 GFLOP/s , 173681.6 tokens/s INFO:__main__:2024-11-30 09:02:55 | Epoch: 0 | Step: 343280 | Dataset: 0-8707560 | Loss: 0.815 | 597 ms/step , 115688.15 
GFLOP/s , 173712.5 tokens/s INFO:__main__:2024-11-30 09:03:02 | Epoch: 0 | Step: 343290 | Dataset: 0-8709960 | Loss: 0.819 | 599 ms/step , 115280.77 GFLOP/s , 173632.1 tokens/s INFO:__main__:2024-11-30 09:03:09 | Epoch: 0 | Step: 343300 | Dataset: 0-8712360 | Loss: 0.785 | 598 ms/step , 115483.66 GFLOP/s , 173731.9 tokens/s INFO:__main__:2024-11-30 09:03:16 | Epoch: 0 | Step: 343310 | Dataset: 0-8714760 | Loss: 0.747 | 596 ms/step , 115710.80 GFLOP/s , 173716.1 tokens/s INFO:__main__:2024-11-30 09:03:23 | Epoch: 0 | Step: 343320 | Dataset: 0-8717160 | Loss: 0.793 | 598 ms/step , 115480.56 GFLOP/s , 173726.0 tokens/s INFO:__main__:2024-11-30 09:03:30 | Epoch: 0 | Step: 343330 | Dataset: 0-8719560 | Loss: 0.809 | 597 ms/step , 115594.53 GFLOP/s , 173731.2 tokens/s INFO:__main__:2024-11-30 09:03:37 | Epoch: 0 | Step: 343340 | Dataset: 0-8721960 | Loss: 0.758 | 596 ms/step , 115762.68 GFLOP/s , 173705.8 tokens/s INFO:__main__:2024-11-30 09:03:44 | Epoch: 0 | Step: 343350 | Dataset: 0-8724360 | Loss: 0.757 | 597 ms/step , 115576.57 GFLOP/s , 173720.9 tokens/s INFO:__main__:2024-11-30 09:03:51 | Epoch: 0 | Step: 343360 | Dataset: 0-8726760 | Loss: 0.736 | 597 ms/step , 115621.32 GFLOP/s , 173752.5 tokens/s INFO:__main__:2024-11-30 09:03:58 | Epoch: 0 | Step: 343370 | Dataset: 0-8729160 | Loss: 0.743 | 596 ms/step , 115760.23 GFLOP/s , 173723.0 tokens/s INFO:__main__:2024-11-30 09:04:06 | Epoch: 0 | Step: 343380 | Dataset: 0-8731560 | Loss: 0.762 | 597 ms/step , 115514.08 GFLOP/s , 173787.0 tokens/s INFO:__main__:2024-11-30 09:04:13 | Epoch: 0 | Step: 343390 | Dataset: 0-8733960 | Loss: 0.765 | 597 ms/step , 115562.88 GFLOP/s , 173736.9 tokens/s INFO:__main__:2024-11-30 09:04:20 | Epoch: 0 | Step: 343400 | Dataset: 0-8736360 | Loss: 0.792 | 597 ms/step , 115623.08 GFLOP/s , 173738.7 tokens/s INFO:__main__:2024-11-30 09:04:27 | Epoch: 0 | Step: 343410 | Dataset: 0-8738760 | Loss: 0.819 | 597 ms/step , 115592.17 GFLOP/s , 173765.2 tokens/s INFO:__main__:2024-11-30 09:04:34 
| Epoch: 0 | Step: 343420 | Dataset: 0-8741160 | Loss: 0.733 | 597 ms/step , 115539.33 GFLOP/s , 173623.1 tokens/s INFO:__main__:2024-11-30 09:04:41 | Epoch: 0 | Step: 343430 | Dataset: 0-8743560 | Loss: 0.804 | 597 ms/step , 115628.05 GFLOP/s , 173723.3 tokens/s INFO:__main__:2024-11-30 09:04:48 | Epoch: 0 | Step: 343440 | Dataset: 0-8745960 | Loss: 0.824 | 597 ms/step , 115628.23 GFLOP/s , 173689.6 tokens/s INFO:__main__:2024-11-30 09:04:55 | Epoch: 0 | Step: 343450 | Dataset: 0-8748360 | Loss: 0.787 | 598 ms/step , 115429.63 GFLOP/s , 173737.4 tokens/s INFO:__main__:2024-11-30 09:05:02 | Epoch: 0 | Step: 343460 | Dataset: 0-8750760 | Loss: 0.751 | 596 ms/step , 115714.99 GFLOP/s , 173645.9 tokens/s INFO:__main__:2024-11-30 09:05:09 | Epoch: 0 | Step: 343470 | Dataset: 0-8753160 | Loss: 0.722 | 597 ms/step , 115563.64 GFLOP/s , 173741.3 tokens/s INFO:__main__:2024-11-30 09:05:16 | Epoch: 0 | Step: 343480 | Dataset: 0-8755560 | Loss: 0.773 | 597 ms/step , 115628.19 GFLOP/s , 173688.8 tokens/s INFO:__main__:2024-11-30 09:05:23 | Epoch: 0 | Step: 343490 | Dataset: 0-8757960 | Loss: 0.756 | 596 ms/step , 115697.18 GFLOP/s , 173755.2 tokens/s INFO:__main__:2024-11-30 09:05:31 | Validation | Step: 343500 | Val_loss: 0.379 | Best_val_loss: 0.3619 INFO:__main__:2024-11-30 09:05:32 | Epoch: 0 | Step: 343500 | Dataset: 0-8760360 | Loss: 0.806 | 596 ms/step , 115879.20 GFLOP/s , 147937.1 tokens/s INFO:__main__:2024-11-30 09:05:39 | Epoch: 0 | Step: 343510 | Dataset: 0-8762760 | Loss: 0.756 | 597 ms/step , 115539.87 GFLOP/s , 173763.8 tokens/s INFO:__main__:2024-11-30 09:05:46 | Epoch: 0 | Step: 343520 | Dataset: 0-8765160 | Loss: 0.717 | 596 ms/step , 115720.90 GFLOP/s , 173874.1 tokens/s INFO:__main__:2024-11-30 09:05:53 | Epoch: 0 | Step: 343530 | Dataset: 0-8767560 | Loss: 0.756 | 597 ms/step , 115603.54 GFLOP/s , 173783.9 tokens/s INFO:__main__:2024-11-30 09:06:00 | Epoch: 0 | Step: 343540 | Dataset: 0-8769960 | Loss: 0.771 | 597 ms/step , 115694.07 GFLOP/s , 173756.3 
tokens/s INFO:__main__:2024-11-30 09:06:07 | Epoch: 0 | Step: 343550 | Dataset: 0-8772360 | Loss: 0.786 | 597 ms/step , 115520.93 GFLOP/s , 173776.2 tokens/s INFO:__main__:2024-11-30 09:06:14 | Epoch: 0 | Step: 343560 | Dataset: 0-8774760 | Loss: 0.781 | 597 ms/step , 115600.54 GFLOP/s , 173760.8 tokens/s INFO:__main__:2024-11-30 09:06:21 | Epoch: 0 | Step: 343570 | Dataset: 0-8777160 | Loss: 0.761 | 597 ms/step , 115529.68 GFLOP/s , 173728.6 tokens/s INFO:__main__:2024-11-30 09:06:28 | Epoch: 0 | Step: 343580 | Dataset: 0-8779560 | Loss: 0.769 | 598 ms/step , 115483.18 GFLOP/s , 173631.3 tokens/s INFO:__main__:2024-11-30 09:06:35 | Epoch: 0 | Step: 343590 | Dataset: 0-8781960 | Loss: 0.820 | 597 ms/step , 115592.93 GFLOP/s , 173760.4 tokens/s INFO:__main__:2024-11-30 09:06:42 | Epoch: 0 | Step: 343600 | Dataset: 0-8784360 | Loss: 0.708 | 596 ms/step , 115743.51 GFLOP/s , 173796.8 tokens/s INFO:__main__:2024-11-30 09:06:49 | Epoch: 0 | Step: 343610 | Dataset: 0-8786760 | Loss: 0.717 | 598 ms/step , 115464.80 GFLOP/s , 173816.0 tokens/s INFO:__main__:2024-11-30 09:06:57 | Epoch: 0 | Step: 343620 | Dataset: 0-8789160 | Loss: 0.672 | 596 ms/step , 115807.18 GFLOP/s , 173857.6 tokens/s INFO:__main__:2024-11-30 09:07:04 | Epoch: 0 | Step: 343630 | Dataset: 0-8791560 | Loss: 0.619 | 596 ms/step , 115791.76 GFLOP/s , 173903.4 tokens/s INFO:__main__:2024-11-30 09:07:11 | Epoch: 0 | Step: 343640 | Dataset: 0-8793960 | Loss: 0.566 | 596 ms/step , 115719.02 GFLOP/s , 173790.7 tokens/s INFO:__main__:2024-11-30 09:07:18 | Epoch: 0 | Step: 343650 | Dataset: 0-8796360 | Loss: 0.753 | 597 ms/step , 115664.33 GFLOP/s , 173842.8 tokens/s INFO:__main__:2024-11-30 09:07:25 | Epoch: 0 | Step: 343660 | Dataset: 0-8798760 | Loss: 0.749 | 597 ms/step , 115658.79 GFLOP/s , 173836.9 tokens/s INFO:__main__:2024-11-30 09:07:32 | Epoch: 0 | Step: 343670 | Dataset: 0-8801160 | Loss: 0.676 | 597 ms/step , 115632.98 GFLOP/s , 173896.9 tokens/s INFO:__main__:2024-11-30 09:07:39 | Epoch: 0 | Step: 
343680 | Dataset: 0-8803560 | Loss: 0.632 | 596 ms/step , 115793.82 GFLOP/s , 173875.9 tokens/s INFO:__main__:2024-11-30 09:07:46 | Epoch: 0 | Step: 343690 | Dataset: 0-8805960 | Loss: 0.628 | 596 ms/step , 115726.32 GFLOP/s , 173884.3 tokens/s INFO:__main__:2024-11-30 09:07:53 | Epoch: 0 | Step: 343700 | Dataset: 0-8808360 | Loss: 0.616 | 597 ms/step , 115633.48 GFLOP/s , 173863.4 tokens/s INFO:__main__:2024-11-30 09:08:00 | Epoch: 0 | Step: 343710 | Dataset: 0-8810760 | Loss: 0.771 | 597 ms/step , 115662.33 GFLOP/s , 173823.9 tokens/s INFO:__main__:2024-11-30 09:08:07 | Epoch: 0 | Step: 343720 | Dataset: 0-8813160 | Loss: 0.672 | 596 ms/step , 115829.93 GFLOP/s , 173839.3 tokens/s INFO:__main__:2024-11-30 09:08:14 | Epoch: 0 | Step: 343730 | Dataset: 0-8815560 | Loss: 0.594 | 597 ms/step , 115524.19 GFLOP/s , 173779.9 tokens/s INFO:__main__:2024-11-30 09:08:21 | Epoch: 0 | Step: 343740 | Dataset: 0-8817960 | Loss: 0.683 | 596 ms/step , 115837.57 GFLOP/s , 173795.9 tokens/s INFO:__main__:2024-11-30 09:08:28 | Epoch: 0 | Step: 343750 | Dataset: 0-8820360 | Loss: 0.647 | 596 ms/step , 115802.69 GFLOP/s , 173893.0 tokens/s INFO:__main__:2024-11-30 09:08:35 | Epoch: 0 | Step: 343760 | Dataset: 0-8822760 | Loss: 0.703 | 597 ms/step , 115664.10 GFLOP/s , 173814.4 tokens/s INFO:__main__:2024-11-30 09:08:43 | Epoch: 0 | Step: 343770 | Dataset: 0-8825160 | Loss: 0.627 | 597 ms/step , 115687.98 GFLOP/s , 173820.4 tokens/s INFO:__main__:2024-11-30 09:08:50 | Epoch: 0 | Step: 343780 | Dataset: 0-8827560 | Loss: 0.600 | 597 ms/step , 115635.33 GFLOP/s , 173875.6 tokens/s INFO:__main__:2024-11-30 09:08:57 | Epoch: 0 | Step: 343790 | Dataset: 0-8829960 | Loss: 0.585 | 597 ms/step , 115646.28 GFLOP/s , 173871.1 tokens/s INFO:__main__:2024-11-30 09:09:04 | Epoch: 0 | Step: 343800 | Dataset: 0-8832360 | Loss: 0.666 | 596 ms/step , 115746.78 GFLOP/s , 173798.4 tokens/s INFO:__main__:2024-11-30 09:09:11 | Epoch: 0 | Step: 343810 | Dataset: 0-8834760 | Loss: 0.661 | 596 ms/step , 
115877.82 GFLOP/s , 173840.9 tokens/s INFO:__main__:2024-11-30 09:09:18 | Epoch: 0 | Step: 343820 | Dataset: 0-8837160 | Loss: 0.664 | 597 ms/step , 115691.78 GFLOP/s , 173823.3 tokens/s INFO:__main__:2024-11-30 09:09:25 | Epoch: 0 | Step: 343830 | Dataset: 0-8839560 | Loss: 0.691 | 596 ms/step , 115715.67 GFLOP/s , 173723.7 tokens/s INFO:__main__:2024-11-30 09:09:32 | Epoch: 0 | Step: 343840 | Dataset: 0-8841960 | Loss: 0.575 | 596 ms/step , 115789.04 GFLOP/s , 173843.2 tokens/s INFO:__main__:2024-11-30 09:09:39 | Epoch: 0 | Step: 343850 | Dataset: 0-8844360 | Loss: 0.584 | 597 ms/step , 115598.14 GFLOP/s , 173678.0 tokens/s INFO:__main__:2024-11-30 09:09:46 | Epoch: 0 | Step: 343860 | Dataset: 0-8846760 | Loss: 0.670 | 596 ms/step , 115703.42 GFLOP/s , 173699.3 tokens/s INFO:__main__:2024-11-30 09:09:53 | Epoch: 0 | Step: 343870 | Dataset: 0-8849160 | Loss: 0.645 | 597 ms/step , 115613.41 GFLOP/s , 173657.2 tokens/s INFO:__main__:2024-11-30 09:10:00 | Epoch: 0 | Step: 343880 | Dataset: 0-8851560 | Loss: 0.719 | 597 ms/step , 115555.25 GFLOP/s , 173740.8 tokens/s INFO:__main__:2024-11-30 09:10:07 | Epoch: 0 | Step: 343890 | Dataset: 0-8853960 | Loss: 0.652 | 596 ms/step , 115747.26 GFLOP/s , 173739.6 tokens/s INFO:__main__:2024-11-30 09:10:14 | Epoch: 0 | Step: 343900 | Dataset: 0-8856360 | Loss: 0.573 | 596 ms/step , 115842.27 GFLOP/s , 173830.0 tokens/s INFO:__main__:2024-11-30 09:10:22 | Epoch: 0 | Step: 343910 | Dataset: 0-8858760 | Loss: 0.631 | 596 ms/step , 115808.95 GFLOP/s , 173802.9 tokens/s INFO:__main__:2024-11-30 09:10:29 | Epoch: 0 | Step: 343920 | Dataset: 0-8861160 | Loss: 0.628 | 597 ms/step , 115524.16 GFLOP/s , 173627.6 tokens/s INFO:__main__:2024-11-30 09:10:36 | Epoch: 0 | Step: 343930 | Dataset: 0-8863560 | Loss: 0.645 | 597 ms/step , 115684.62 GFLOP/s , 173809.7 tokens/s INFO:__main__:2024-11-30 09:10:43 | Epoch: 0 | Step: 343940 | Dataset: 0-8865960 | Loss: 0.647 | 596 ms/step , 115703.19 GFLOP/s , 173764.6 tokens/s INFO:__main__:2024-11-30 
09:10:50 | Epoch: 0 | Step: 343950 | Dataset: 0-8868360 | Loss: 0.727 | 597 ms/step , 115582.77 GFLOP/s , 173842.3 tokens/s INFO:__main__:2024-11-30 09:10:57 | Epoch: 0 | Step: 343960 | Dataset: 0-8870760 | Loss: 0.664 | 596 ms/step , 115809.14 GFLOP/s , 173871.8 tokens/s INFO:__main__:2024-11-30 09:11:04 | Epoch: 0 | Step: 343970 | Dataset: 0-8873160 | Loss: 0.639 | 596 ms/step , 115788.55 GFLOP/s , 173841.7 tokens/s INFO:__main__:2024-11-30 09:11:11 | Epoch: 0 | Step: 343980 | Dataset: 0-8875560 | Loss: 0.633 | 597 ms/step , 115605.79 GFLOP/s , 173920.8 tokens/s INFO:__main__:2024-11-30 09:11:18 | Epoch: 0 | Step: 343990 | Dataset: 0-8877960 | Loss: 0.716 | 597 ms/step , 115594.39 GFLOP/s , 173836.7 tokens/s INFO:__main__:2024-11-30 09:11:26 | Validation | Step: 344000 | Val_loss: 0.349 | Best_val_loss: 0.3619 INFO:__main__:2024-11-30 09:11:26 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_091126_step_344000.pt` INFO:__main__:2024-11-30 09:11:28 | Epoch: 0 | Step: 344000 | Dataset: 0-8880360 | Loss: 0.649 | 595 ms/step , 115988.32 GFLOP/s , 121321.4 tokens/s INFO:__main__:2024-11-30 09:11:35 | Epoch: 0 | Step: 344010 | Dataset: 0-8882760 | Loss: 0.694 | 598 ms/step , 115392.36 GFLOP/s , 173317.6 tokens/s INFO:__main__:2024-11-30 09:11:42 | Epoch: 0 | Step: 344020 | Dataset: 0-8885160 | Loss: 0.763 | 597 ms/step , 115646.11 GFLOP/s , 173404.2 tokens/s INFO:__main__:2024-11-30 09:11:49 | Epoch: 0 | Step: 344030 | Dataset: 0-8887560 | Loss: 0.682 | 595 ms/step , 115911.83 GFLOP/s , 173445.8 tokens/s INFO:__main__:2024-11-30 09:11:57 | Epoch: 0 | Step: 344040 | Dataset: 0-8889960 | Loss: 0.623 | 596 ms/step , 115838.89 GFLOP/s , 173450.8 tokens/s INFO:__main__:2024-11-30 09:12:04 | Epoch: 0 | Step: 344050 | Dataset: 0-8892360 | Loss: 0.622 | 597 ms/step , 115648.30 GFLOP/s , 173359.8 tokens/s INFO:__main__:2024-11-30 09:12:11 | Epoch: 0 | Step: 344060 | Dataset: 0-8894760 | Loss: 0.694 | 596 ms/step , 115851.56 GFLOP/s , 173342.0 
tokens/s INFO:__main__:2024-11-30 09:12:18 | Epoch: 0 | Step: 344070 | Dataset: 0-8897160 | Loss: 0.654 | 595 ms/step , 115913.02 GFLOP/s , 173430.9 tokens/s INFO:__main__:2024-11-30 09:12:25 | Epoch: 0 | Step: 344080 | Dataset: 0-8899560 | Loss: 0.686 | 596 ms/step , 115765.63 GFLOP/s , 173839.0 tokens/s INFO:__main__:2024-11-30 09:12:32 | Epoch: 0 | Step: 344090 | Dataset: 0-8901960 | Loss: 0.654 | 599 ms/step , 115303.71 GFLOP/s , 173782.4 tokens/s INFO:__main__:2024-11-30 09:12:39 | Epoch: 0 | Step: 344100 | Dataset: 0-8904360 | Loss: 0.671 | 597 ms/step , 115682.29 GFLOP/s , 173902.7 tokens/s INFO:__main__:2024-11-30 09:12:46 | Epoch: 0 | Step: 344110 | Dataset: 0-8906760 | Loss: 0.772 | 596 ms/step , 115750.04 GFLOP/s , 173930.7 tokens/s INFO:__main__:2024-11-30 09:12:53 | Epoch: 0 | Step: 344120 | Dataset: 0-8909160 | Loss: 0.624 | 596 ms/step , 115845.21 GFLOP/s , 173837.9 tokens/s INFO:__main__:2024-11-30 09:13:00 | Epoch: 0 | Step: 344130 | Dataset: 0-8911560 | Loss: 0.701 | 597 ms/step , 115538.93 GFLOP/s , 173905.2 tokens/s INFO:__main__:2024-11-30 09:13:07 | Epoch: 0 | Step: 344140 | Dataset: 0-8913960 | Loss: 0.727 | 596 ms/step , 115727.73 GFLOP/s , 173839.1 tokens/s INFO:__main__:2024-11-30 09:13:14 | Epoch: 0 | Step: 344150 | Dataset: 0-8916360 | Loss: 0.735 | 597 ms/step , 115602.77 GFLOP/s , 173729.8 tokens/s INFO:__main__:2024-11-30 09:13:21 | Epoch: 0 | Step: 344160 | Dataset: 0-8918760 | Loss: 0.761 | 597 ms/step , 115613.46 GFLOP/s , 173744.2 tokens/s INFO:__main__:2024-11-30 09:13:29 | Epoch: 0 | Step: 344170 | Dataset: 0-8921160 | Loss: 0.718 | 597 ms/step , 115690.70 GFLOP/s , 173775.7 tokens/s INFO:__main__:2024-11-30 09:13:36 | Epoch: 0 | Step: 344180 | Dataset: 0-8923560 | Loss: 0.738 | 596 ms/step , 115697.21 GFLOP/s , 173814.0 tokens/s INFO:__main__:2024-11-30 09:13:43 | Epoch: 0 | Step: 344190 | Dataset: 0-8925960 | Loss: 0.745 | 597 ms/step , 115688.10 GFLOP/s , 173725.9 tokens/s INFO:__main__:2024-11-30 09:13:50 | Epoch: 0 | Step: 
344200 | Dataset: 0-8928360 | Loss: 0.778 | 597 ms/step , 115525.53 GFLOP/s , 173793.7 tokens/s INFO:__main__:2024-11-30 09:13:57 | Epoch: 0 | Step: 344210 | Dataset: 0-8930760 | Loss: 0.763 | 596 ms/step , 115704.56 GFLOP/s , 173805.5 tokens/s INFO:__main__:2024-11-30 09:14:04 | Epoch: 0 | Step: 344220 | Dataset: 0-8933160 | Loss: 0.752 | 597 ms/step , 115648.66 GFLOP/s , 173852.8 tokens/s INFO:__main__:2024-11-30 09:14:11 | Epoch: 0 | Step: 344230 | Dataset: 0-8935560 | Loss: 0.754 | 597 ms/step , 115676.67 GFLOP/s , 173735.7 tokens/s INFO:__main__:2024-11-30 09:14:18 | Epoch: 0 | Step: 344240 | Dataset: 0-8937960 | Loss: 0.656 | 597 ms/step , 115615.51 GFLOP/s , 173789.6 tokens/s INFO:__main__:2024-11-30 09:14:25 | Epoch: 0 | Step: 344250 | Dataset: 0-8940360 | Loss: 0.695 | 597 ms/step , 115620.41 GFLOP/s , 173777.0 tokens/s INFO:__main__:2024-11-30 09:14:32 | Epoch: 0 | Step: 344260 | Dataset: 0-8942760 | Loss: 0.735 | 597 ms/step , 115586.69 GFLOP/s , 173642.3 tokens/s INFO:__main__:2024-11-30 09:14:39 | Epoch: 0 | Step: 344270 | Dataset: 0-8945160 | Loss: 0.692 | 597 ms/step , 115598.03 GFLOP/s , 173761.7 tokens/s INFO:__main__:2024-11-30 09:14:46 | Epoch: 0 | Step: 344280 | Dataset: 0-8947560 | Loss: 0.750 | 596 ms/step , 115708.86 GFLOP/s , 173808.4 tokens/s INFO:__main__:2024-11-30 09:14:53 | Epoch: 0 | Step: 344290 | Dataset: 0-8949960 | Loss: 0.735 | 597 ms/step , 115566.09 GFLOP/s , 173787.7 tokens/s INFO:__main__:2024-11-30 09:15:00 | Epoch: 0 | Step: 344300 | Dataset: 0-8952360 | Loss: 0.710 | 596 ms/step , 115763.48 GFLOP/s , 173745.5 tokens/s INFO:__main__:2024-11-30 09:15:08 | Epoch: 0 | Step: 344310 | Dataset: 0-8954760 | Loss: 0.690 | 597 ms/step , 115631.21 GFLOP/s , 173715.8 tokens/s INFO:__main__:2024-11-30 09:15:15 | Epoch: 0 | Step: 344320 | Dataset: 0-8957160 | Loss: 0.664 | 596 ms/step , 115765.29 GFLOP/s , 173799.7 tokens/s INFO:__main__:2024-11-30 09:15:22 | Epoch: 0 | Step: 344330 | Dataset: 0-8959560 | Loss: 0.771 | 597 ms/step , 
115622.45 GFLOP/s , 173731.8 tokens/s INFO:__main__:2024-11-30 09:15:29 | Epoch: 0 | Step: 344340 | Dataset: 0-8961960 | Loss: 0.701 | 598 ms/step , 115492.48 GFLOP/s , 173784.5 tokens/s INFO:__main__:2024-11-30 09:15:36 | Epoch: 0 | Step: 344350 | Dataset: 0-8964360 | Loss: 0.747 | 598 ms/step , 115498.66 GFLOP/s , 173737.3 tokens/s INFO:__main__:2024-11-30 09:15:43 | Epoch: 0 | Step: 344360 | Dataset: 0-8966760 | Loss: 0.745 | 597 ms/step , 115672.98 GFLOP/s , 173777.8 tokens/s INFO:__main__:2024-11-30 09:15:50 | Epoch: 0 | Step: 344370 | Dataset: 0-8969160 | Loss: 0.707 | 597 ms/step , 115638.28 GFLOP/s , 173724.7 tokens/s INFO:__main__:2024-11-30 09:15:57 | Epoch: 0 | Step: 344380 | Dataset: 0-8971560 | Loss: 0.680 | 596 ms/step , 115798.19 GFLOP/s , 173753.8 tokens/s INFO:__main__:2024-11-30 09:16:04 | Epoch: 0 | Step: 344390 | Dataset: 0-8973960 | Loss: 0.709 | 598 ms/step , 115446.07 GFLOP/s , 173774.1 tokens/s INFO:__main__:2024-11-30 09:16:11 | Epoch: 0 | Step: 344400 | Dataset: 0-8976360 | Loss: 0.678 | 597 ms/step , 115570.92 GFLOP/s , 173720.7 tokens/s INFO:__main__:2024-11-30 09:16:18 | Epoch: 0 | Step: 344410 | Dataset: 0-8978760 | Loss: 0.667 | 597 ms/step , 115635.34 GFLOP/s , 173788.0 tokens/s INFO:__main__:2024-11-30 09:16:25 | Epoch: 0 | Step: 344420 | Dataset: 0-8981160 | Loss: 0.710 | 597 ms/step , 115605.32 GFLOP/s , 173781.2 tokens/s INFO:__main__:2024-11-30 09:16:32 | Epoch: 0 | Step: 344430 | Dataset: 0-8983560 | Loss: 0.720 | 596 ms/step , 115735.42 GFLOP/s , 173754.5 tokens/s INFO:__main__:2024-11-30 09:16:39 | Epoch: 0 | Step: 344440 | Dataset: 0-8985960 | Loss: 0.727 | 598 ms/step , 115478.55 GFLOP/s , 173712.5 tokens/s INFO:__main__:2024-11-30 09:16:47 | Epoch: 0 | Step: 344450 | Dataset: 0-8988360 | Loss: 0.673 | 597 ms/step , 115673.20 GFLOP/s , 173764.0 tokens/s INFO:__main__:2024-11-30 09:16:54 | Epoch: 0 | Step: 344460 | Dataset: 0-8990760 | Loss: 0.664 | 597 ms/step , 115585.46 GFLOP/s , 173778.7 tokens/s INFO:__main__:2024-11-30 
09:17:01 | Epoch: 0 | Step: 344470 | Dataset: 0-8993160 | Loss: 0.639 | 596 ms/step , 115724.77 GFLOP/s , 173729.9 tokens/s INFO:__main__:2024-11-30 09:17:08 | Epoch: 0 | Step: 344480 | Dataset: 0-8995560 | Loss: 0.726 | 597 ms/step , 115552.66 GFLOP/s , 173770.2 tokens/s INFO:__main__:2024-11-30 09:17:15 | Epoch: 0 | Step: 344490 | Dataset: 0-8997960 | Loss: 0.693 | 597 ms/step , 115513.28 GFLOP/s , 173706.0 tokens/s INFO:__main__:2024-11-30 09:17:22 | Validation | Step: 344500 | Val_loss: 0.414 | Best_val_loss: 0.3494 INFO:__main__:2024-11-30 09:17:23 | Epoch: 0 | Step: 344500 | Dataset: 0-9000360 | Loss: 0.686 | 597 ms/step , 115596.17 GFLOP/s , 147989.2 tokens/s INFO:__main__:2024-11-30 09:17:30 | Epoch: 0 | Step: 344510 | Dataset: 0-9002760 | Loss: 0.664 | 598 ms/step , 115313.51 GFLOP/s , 173891.9 tokens/s INFO:__main__:2024-11-30 09:17:37 | Epoch: 0 | Step: 344520 | Dataset: 0-9005160 | Loss: 0.716 | 598 ms/step , 115458.16 GFLOP/s , 173784.6 tokens/s INFO:__main__:2024-11-30 09:17:44 | Epoch: 0 | Step: 344530 | Dataset: 0-9007560 | Loss: 0.751 | 596 ms/step , 115751.46 GFLOP/s , 173737.8 tokens/s INFO:__main__:2024-11-30 09:17:51 | Epoch: 0 | Step: 344540 | Dataset: 0-9009960 | Loss: 0.670 | 596 ms/step , 115735.04 GFLOP/s , 173748.9 tokens/s INFO:__main__:2024-11-30 09:17:58 | Epoch: 0 | Step: 344550 | Dataset: 0-9012360 | Loss: 0.676 | 598 ms/step , 115366.83 GFLOP/s , 173766.3 tokens/s INFO:__main__:2024-11-30 09:18:06 | Epoch: 0 | Step: 344560 | Dataset: 0-9014760 | Loss: 0.734 | 596 ms/step , 115799.76 GFLOP/s , 173784.0 tokens/s INFO:__main__:2024-11-30 09:18:13 | Epoch: 0 | Step: 344570 | Dataset: 0-9017160 | Loss: 0.688 | 597 ms/step , 115550.31 GFLOP/s , 173752.0 tokens/s INFO:__main__:2024-11-30 09:18:20 | Epoch: 0 | Step: 344580 | Dataset: 0-9019560 | Loss: 0.755 | 597 ms/step , 115550.54 GFLOP/s , 173719.9 tokens/s INFO:__main__:2024-11-30 09:18:27 | Epoch: 0 | Step: 344590 | Dataset: 0-9021960 | Loss: 0.692 | 597 ms/step , 115619.52 GFLOP/s , 
173674.1 tokens/s INFO:__main__:2024-11-30 09:18:34 | Epoch: 0 | Step: 344600 | Dataset: 0-9024360 | Loss: 0.684 | 597 ms/step , 115616.93 GFLOP/s , 173701.4 tokens/s INFO:__main__:2024-11-30 09:18:41 | Epoch: 0 | Step: 344610 | Dataset: 0-9026760 | Loss: 0.687 | 597 ms/step , 115632.46 GFLOP/s , 173805.7 tokens/s INFO:__main__:2024-11-30 09:18:48 | Epoch: 0 | Step: 344620 | Dataset: 0-9029160 | Loss: 0.675 | 597 ms/step , 115675.18 GFLOP/s , 173704.4 tokens/s INFO:__main__:2024-11-30 09:18:55 | Epoch: 0 | Step: 344630 | Dataset: 0-9031560 | Loss: 0.712 | 597 ms/step , 115507.98 GFLOP/s , 173719.5 tokens/s INFO:__main__:2024-11-30 09:19:02 | Epoch: 0 | Step: 344640 | Dataset: 0-9033960 | Loss: 0.637 | 597 ms/step , 115621.22 GFLOP/s , 173795.9 tokens/s INFO:__main__:2024-11-30 09:19:09 | Epoch: 0 | Step: 344650 | Dataset: 0-9036360 | Loss: 0.735 | 597 ms/step , 115602.60 GFLOP/s , 173821.8 tokens/s INFO:__main__:2024-11-30 09:19:16 | Epoch: 0 | Step: 344660 | Dataset: 0-9038760 | Loss: 0.707 | 597 ms/step , 115664.97 GFLOP/s , 173840.2 tokens/s INFO:__main__:2024-11-30 09:19:23 | Epoch: 0 | Step: 344670 | Dataset: 0-9041160 | Loss: 0.775 | 597 ms/step , 115566.26 GFLOP/s , 173772.0 tokens/s INFO:__main__:2024-11-30 09:19:30 | Epoch: 0 | Step: 344680 | Dataset: 0-9043560 | Loss: 0.687 | 597 ms/step , 115629.38 GFLOP/s , 173751.8 tokens/s INFO:__main__:2024-11-30 09:19:37 | Epoch: 0 | Step: 344690 | Dataset: 0-9045960 | Loss: 0.363 | 596 ms/step , 115861.64 GFLOP/s , 173954.8 tokens/s INFO:__main__:2024-11-30 09:19:45 | Epoch: 0 | Step: 344700 | Dataset: 0-9048360 | Loss: 0.371 | 596 ms/step , 115828.31 GFLOP/s , 174050.6 tokens/s INFO:__main__:2024-11-30 09:19:52 | Epoch: 0 | Step: 344710 | Dataset: 0-9050760 | Loss: 0.375 | 595 ms/step , 115938.39 GFLOP/s , 174064.3 tokens/s INFO:__main__:2024-11-30 09:19:59 | Epoch: 0 | Step: 344720 | Dataset: 0-9053160 | Loss: 0.359 | 595 ms/step , 115975.54 GFLOP/s , 174100.1 tokens/s INFO:__main__:2024-11-30 09:20:06 | Epoch: 0 
| Step: 344730 | Dataset: 0-9055560 | Loss: 0.408 | 596 ms/step , 115843.78 GFLOP/s , 173963.3 tokens/s INFO:__main__:2024-11-30 09:20:13 | Epoch: 0 | Step: 344740 | Dataset: 0-9057960 | Loss: 0.363 | 596 ms/step , 115751.08 GFLOP/s , 174033.6 tokens/s INFO:__main__:2024-11-30 09:20:20 | Epoch: 0 | Step: 344750 | Dataset: 0-9060360 | Loss: 0.377 | 596 ms/step , 115799.14 GFLOP/s , 174073.8 tokens/s INFO:__main__:2024-11-30 09:20:27 | Epoch: 0 | Step: 344760 | Dataset: 0-9062760 | Loss: 0.378 | 596 ms/step , 115841.31 GFLOP/s , 174062.1 tokens/s INFO:__main__:2024-11-30 09:20:34 | Epoch: 0 | Step: 344770 | Dataset: 0-9065160 | Loss: 0.401 | 595 ms/step , 115991.12 GFLOP/s , 174079.9 tokens/s INFO:__main__:2024-11-30 09:20:41 | Epoch: 0 | Step: 344780 | Dataset: 0-9067560 | Loss: 0.382 | 595 ms/step , 115905.76 GFLOP/s , 174041.0 tokens/s INFO:__main__:2024-11-30 09:20:48 | Epoch: 0 | Step: 344790 | Dataset: 0-9069960 | Loss: 0.364 | 596 ms/step , 115833.92 GFLOP/s , 174090.1 tokens/s INFO:__main__:2024-11-30 09:20:55 | Epoch: 0 | Step: 344800 | Dataset: 0-9072360 | Loss: 0.353 | 596 ms/step , 115794.68 GFLOP/s , 174130.6 tokens/s INFO:__main__:2024-11-30 09:21:02 | Epoch: 0 | Step: 344810 | Dataset: 0-9074760 | Loss: 0.355 | 595 ms/step , 115890.87 GFLOP/s , 174068.3 tokens/s INFO:__main__:2024-11-30 09:21:09 | Epoch: 0 | Step: 344820 | Dataset: 0-9077160 | Loss: 0.334 | 596 ms/step , 115814.97 GFLOP/s , 174080.4 tokens/s INFO:__main__:2024-11-30 09:21:16 | Epoch: 0 | Step: 344830 | Dataset: 0-9079560 | Loss: 0.367 | 596 ms/step , 115874.81 GFLOP/s , 174067.0 tokens/s INFO:__main__:2024-11-30 09:21:23 | Epoch: 0 | Step: 344840 | Dataset: 0-9081960 | Loss: 0.384 | 597 ms/step , 115647.68 GFLOP/s , 173968.6 tokens/s INFO:__main__:2024-11-30 09:21:30 | Epoch: 0 | Step: 344850 | Dataset: 0-9084360 | Loss: 0.378 | 597 ms/step , 115684.86 GFLOP/s , 174096.3 tokens/s INFO:__main__:2024-11-30 09:21:37 | Epoch: 0 | Step: 344860 | Dataset: 0-9086760 | Loss: 0.341 | 596 
ms/step , 115818.23 GFLOP/s , 174039.6 tokens/s INFO:__main__:2024-11-30 09:21:45 | Epoch: 0 | Step: 344870 | Dataset: 0-9089160 | Loss: 0.368 | 597 ms/step , 115592.51 GFLOP/s , 174021.3 tokens/s INFO:__main__:2024-11-30 09:21:52 | Epoch: 0 | Step: 344880 | Dataset: 0-9091560 | Loss: 0.334 | 596 ms/step , 115761.26 GFLOP/s , 174042.8 tokens/s INFO:__main__:2024-11-30 09:21:59 | Epoch: 0 | Step: 344890 | Dataset: 0-9093960 | Loss: 0.314 | 596 ms/step , 115752.03 GFLOP/s , 174101.7 tokens/s INFO:__main__:2024-11-30 09:22:06 | Epoch: 0 | Step: 344900 | Dataset: 0-9096360 | Loss: 0.358 | 596 ms/step , 115792.24 GFLOP/s , 174080.5 tokens/s INFO:__main__:2024-11-30 09:22:13 | Epoch: 0 | Step: 344910 | Dataset: 0-9098760 | Loss: 0.398 | 596 ms/step , 115785.49 GFLOP/s , 174053.6 tokens/s INFO:__main__:2024-11-30 09:22:20 | Epoch: 0 | Step: 344920 | Dataset: 0-9101160 | Loss: 0.405 | 596 ms/step , 115823.82 GFLOP/s , 174039.7 tokens/s INFO:__main__:2024-11-30 09:22:27 | Epoch: 0 | Step: 344930 | Dataset: 0-9103560 | Loss: 0.368 | 596 ms/step , 115702.57 GFLOP/s , 174065.4 tokens/s INFO:__main__:2024-11-30 09:22:34 | Epoch: 0 | Step: 344940 | Dataset: 0-9105960 | Loss: 0.333 | 596 ms/step , 115831.70 GFLOP/s , 174042.1 tokens/s INFO:__main__:2024-11-30 09:22:41 | Epoch: 0 | Step: 344950 | Dataset: 0-9108360 | Loss: 0.375 | 596 ms/step , 115864.00 GFLOP/s , 173990.9 tokens/s INFO:__main__:2024-11-30 09:22:48 | Epoch: 0 | Step: 344960 | Dataset: 0-9110760 | Loss: 0.353 | 596 ms/step , 115723.57 GFLOP/s , 173973.9 tokens/s INFO:__main__:2024-11-30 09:22:55 | Epoch: 0 | Step: 344970 | Dataset: 0-9113160 | Loss: 0.310 | 596 ms/step , 115783.94 GFLOP/s , 174056.9 tokens/s INFO:__main__:2024-11-30 09:23:02 | Epoch: 0 | Step: 344980 | Dataset: 0-9115560 | Loss: 0.384 | 595 ms/step , 115948.01 GFLOP/s , 174064.0 tokens/s INFO:__main__:2024-11-30 09:23:09 | Epoch: 0 | Step: 344990 | Dataset: 0-9117960 | Loss: 0.339 | 596 ms/step , 115839.07 GFLOP/s , 174032.8 tokens/s 
INFO:__main__:2024-11-30 09:23:17 | Validation | Step: 345000 | Val_loss: 0.349 | Best_val_loss: 0.3494 INFO:__main__:2024-11-30 09:23:17 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_092317_step_345000.pt` INFO:__main__:2024-11-30 09:23:19 | Epoch: 0 | Step: 345000 | Dataset: 0-9120360 | Loss: 0.383 | 595 ms/step , 116024.99 GFLOP/s , 121247.2 tokens/s INFO:__main__:2024-11-30 09:23:26 | Epoch: 0 | Step: 345010 | Dataset: 0-9122760 | Loss: 0.368 | 598 ms/step , 115390.98 GFLOP/s , 173679.2 tokens/s INFO:__main__:2024-11-30 09:23:34 | Epoch: 0 | Step: 345020 | Dataset: 0-9125160 | Loss: 0.396 | 597 ms/step , 115527.67 GFLOP/s , 173650.8 tokens/s INFO:__main__:2024-11-30 09:23:41 | Epoch: 0 | Step: 345030 | Dataset: 0-9127560 | Loss: 0.321 | 596 ms/step , 115817.98 GFLOP/s , 173658.3 tokens/s INFO:__main__:2024-11-30 09:23:48 | Epoch: 0 | Step: 345040 | Dataset: 0-9129960 | Loss: 0.374 | 596 ms/step , 115779.29 GFLOP/s , 173621.9 tokens/s INFO:__main__:2024-11-30 09:23:55 | Epoch: 0 | Step: 345050 | Dataset: 0-9132360 | Loss: 0.337 | 596 ms/step , 115879.49 GFLOP/s , 173581.4 tokens/s INFO:__main__:2024-11-30 09:24:02 | Epoch: 0 | Step: 345060 | Dataset: 0-9134760 | Loss: 0.343 | 595 ms/step , 115916.50 GFLOP/s , 173593.7 tokens/s INFO:__main__:2024-11-30 09:24:09 | Epoch: 0 | Step: 345070 | Dataset: 0-9137160 | Loss: 0.430 | 595 ms/step , 115974.71 GFLOP/s , 173510.7 tokens/s INFO:__main__:2024-11-30 09:24:16 | Epoch: 0 | Step: 345080 | Dataset: 0-9139560 | Loss: 0.349 | 595 ms/step , 115983.30 GFLOP/s , 173509.0 tokens/s INFO:__main__:2024-11-30 09:24:23 | Epoch: 0 | Step: 345090 | Dataset: 0-9141960 | Loss: 0.432 | 596 ms/step , 115711.46 GFLOP/s , 173597.1 tokens/s INFO:__main__:2024-11-30 09:24:30 | Epoch: 0 | Step: 345100 | Dataset: 0-9144360 | Loss: 0.359 | 596 ms/step , 115831.00 GFLOP/s , 173507.4 tokens/s INFO:__main__:2024-11-30 09:24:37 | Epoch: 0 | Step: 345110 | Dataset: 0-9146760 | Loss: 0.394 | 595 ms/step , 
115928.20 GFLOP/s , 173535.3 tokens/s INFO:__main__:2024-11-30 09:24:44 | Epoch: 0 | Step: 345120 | Dataset: 0-9149160 | Loss: 0.290 | 596 ms/step , 115748.98 GFLOP/s , 173532.1 tokens/s INFO:__main__:2024-11-30 09:24:51 | Epoch: 0 | Step: 345130 | Dataset: 0-9151560 | Loss: 0.356 | 596 ms/step , 115781.01 GFLOP/s , 173536.3 tokens/s INFO:__main__:2024-11-30 09:24:59 | Epoch: 0 | Step: 345140 | Dataset: 0-9153960 | Loss: 0.375 | 595 ms/step , 115932.19 GFLOP/s , 173559.8 tokens/s INFO:__main__:2024-11-30 09:25:06 | Epoch: 0 | Step: 345150 | Dataset: 0-9156360 | Loss: 0.380 | 597 ms/step , 115675.46 GFLOP/s , 173501.3 tokens/s INFO:__main__:2024-11-30 09:25:13 | Epoch: 0 | Step: 345160 | Dataset: 0-9158760 | Loss: 0.380 | 595 ms/step , 115907.33 GFLOP/s , 173728.6 tokens/s INFO:__main__:2024-11-30 09:25:20 | Epoch: 0 | Step: 345170 | Dataset: 0-9161160 | Loss: 0.342 | 596 ms/step , 115848.65 GFLOP/s , 174065.0 tokens/s INFO:__main__:2024-11-30 09:25:27 | Epoch: 0 | Step: 345180 | Dataset: 0-9163560 | Loss: 0.369 | 597 ms/step , 115653.48 GFLOP/s , 174069.0 tokens/s INFO:__main__:2024-11-30 09:25:34 | Epoch: 0 | Step: 345190 | Dataset: 0-9165960 | Loss: 0.377 | 596 ms/step , 115716.47 GFLOP/s , 174041.3 tokens/s INFO:__main__:2024-11-30 09:25:41 | Epoch: 0 | Step: 345200 | Dataset: 0-9168360 | Loss: 0.368 | 597 ms/step , 115684.87 GFLOP/s , 174077.1 tokens/s INFO:__main__:2024-11-30 09:25:48 | Epoch: 0 | Step: 345210 | Dataset: 0-9170760 | Loss: 0.348 | 596 ms/step , 115735.55 GFLOP/s , 174110.3 tokens/s INFO:__main__:2024-11-30 09:25:55 | Epoch: 0 | Step: 345220 | Dataset: 0-9173160 | Loss: 0.366 | 596 ms/step , 115814.34 GFLOP/s , 174097.3 tokens/s INFO:__main__:2024-11-30 09:26:02 | Epoch: 0 | Step: 345230 | Dataset: 0-9175560 | Loss: 0.371 | 596 ms/step , 115789.02 GFLOP/s , 174079.5 tokens/s INFO:__main__:2024-11-30 09:26:09 | Epoch: 0 | Step: 345240 | Dataset: 0-9177960 | Loss: 0.359 | 596 ms/step , 115860.50 GFLOP/s , 174113.9 tokens/s INFO:__main__:2024-11-30 
09:26:16 | Epoch: 0 | Step: 345250 | Dataset: 0-9180360 | Loss: 0.353 | 595 ms/step , 115899.70 GFLOP/s , 174091.7 tokens/s INFO:__main__:2024-11-30 09:26:23 | Epoch: 0 | Step: 345260 | Dataset: 0-9182760 | Loss: 0.316 | 596 ms/step , 115877.37 GFLOP/s , 174079.2 tokens/s INFO:__main__:2024-11-30 09:26:30 | Epoch: 0 | Step: 345270 | Dataset: 0-9185160 | Loss: 0.332 | 596 ms/step , 115726.24 GFLOP/s , 173996.6 tokens/s INFO:__main__:2024-11-30 09:26:37 | Epoch: 0 | Step: 345280 | Dataset: 0-9187560 | Loss: 0.338 | 595 ms/step , 116063.78 GFLOP/s , 174019.9 tokens/s INFO:__main__:2024-11-30 09:26:44 | Epoch: 0 | Step: 345290 | Dataset: 0-9189960 | Loss: 0.362 | 596 ms/step , 115835.32 GFLOP/s , 173975.3 tokens/s INFO:__main__:2024-11-30 09:26:52 | Epoch: 0 | Step: 345300 | Dataset: 0-9192360 | Loss: 0.338 | 595 ms/step , 115897.28 GFLOP/s , 174092.6 tokens/s INFO:__main__:2024-11-30 09:26:59 | Epoch: 0 | Step: 345310 | Dataset: 0-9194760 | Loss: 0.327 | 595 ms/step , 115952.18 GFLOP/s , 174028.9 tokens/s INFO:__main__:2024-11-30 09:27:06 | Epoch: 0 | Step: 345320 | Dataset: 0-9197160 | Loss: 0.374 | 596 ms/step , 115752.92 GFLOP/s , 173978.6 tokens/s INFO:__main__:2024-11-30 09:27:13 | Epoch: 0 | Step: 345330 | Dataset: 0-9199560 | Loss: 0.339 | 597 ms/step , 115692.21 GFLOP/s , 174023.3 tokens/s INFO:__main__:2024-11-30 09:27:20 | Epoch: 0 | Step: 345340 | Dataset: 0-9201960 | Loss: 0.334 | 597 ms/step , 115609.95 GFLOP/s , 174038.1 tokens/s INFO:__main__:2024-11-30 09:27:27 | Epoch: 0 | Step: 345350 | Dataset: 0-9204360 | Loss: 0.361 | 597 ms/step , 115562.81 GFLOP/s , 174049.4 tokens/s INFO:__main__:2024-11-30 09:27:34 | Epoch: 0 | Step: 345360 | Dataset: 0-9206760 | Loss: 0.347 | 596 ms/step , 115835.20 GFLOP/s , 174079.3 tokens/s INFO:__main__:2024-11-30 09:27:41 | Epoch: 0 | Step: 345370 | Dataset: 0-9209160 | Loss: 0.373 | 596 ms/step , 115790.92 GFLOP/s , 174070.5 tokens/s INFO:__main__:2024-11-30 09:27:48 | Epoch: 0 | Step: 345380 | Dataset: 0-9211560 | 
Loss: 0.333 | 596 ms/step , 115812.73 GFLOP/s , 174018.2 tokens/s INFO:__main__:2024-11-30 09:27:55 | Epoch: 0 | Step: 345390 | Dataset: 0-9213960 | Loss: 0.364 | 596 ms/step , 115708.57 GFLOP/s , 174039.4 tokens/s INFO:__main__:2024-11-30 09:28:02 | Epoch: 0 | Step: 345400 | Dataset: 0-9216360 | Loss: 0.364 | 596 ms/step , 115715.50 GFLOP/s , 174008.5 tokens/s INFO:__main__:2024-11-30 09:28:09 | Epoch: 0 | Step: 345410 | Dataset: 0-9218760 | Loss: 0.339 | 596 ms/step , 115827.64 GFLOP/s , 174019.2 tokens/s INFO:__main__:2024-11-30 09:28:16 | Epoch: 0 | Step: 345420 | Dataset: 0-9221160 | Loss: 0.390 | 596 ms/step , 115833.64 GFLOP/s , 174001.9 tokens/s INFO:__main__:2024-11-30 09:28:23 | Epoch: 0 | Step: 345430 | Dataset: 0-9223560 | Loss: 0.367 | 595 ms/step , 115891.71 GFLOP/s , 174080.3 tokens/s INFO:__main__:2024-11-30 09:28:30 | Epoch: 0 | Step: 345440 | Dataset: 0-9225960 | Loss: 0.345 | 596 ms/step , 115753.72 GFLOP/s , 174019.0 tokens/s INFO:__main__:2024-11-30 09:28:37 | Epoch: 0 | Step: 345450 | Dataset: 0-9228360 | Loss: 0.431 | 596 ms/step , 115801.01 GFLOP/s , 174013.2 tokens/s INFO:__main__:2024-11-30 09:28:44 | Epoch: 0 | Step: 345460 | Dataset: 0-9230760 | Loss: 0.342 | 595 ms/step , 115896.94 GFLOP/s , 174073.6 tokens/s INFO:__main__:2024-11-30 09:28:52 | Epoch: 0 | Step: 345470 | Dataset: 0-9233160 | Loss: 0.369 | 596 ms/step , 115874.20 GFLOP/s , 173828.1 tokens/s INFO:__main__:2024-11-30 09:28:59 | Epoch: 0 | Step: 345480 | Dataset: 0-9235560 | Loss: 0.330 | 595 ms/step , 116007.53 GFLOP/s , 174088.4 tokens/s INFO:__main__:2024-11-30 09:29:06 | Epoch: 0 | Step: 345490 | Dataset: 0-9237960 | Loss: 0.380 | 595 ms/step , 115988.19 GFLOP/s , 174037.6 tokens/s INFO:__main__:2024-11-30 09:29:13 | Validation | Step: 345500 | Val_loss: 0.386 | Best_val_loss: 0.3489 INFO:__main__:2024-11-30 09:29:14 | Epoch: 0 | Step: 345500 | Dataset: 0-9240360 | Loss: 0.374 | 595 ms/step , 116061.58 GFLOP/s , 148132.7 tokens/s INFO:__main__:2024-11-30 09:29:21 | 
Epoch: 0 | Step: 345510 | Dataset: 0-9242760 | Loss: 0.312 | 597 ms/step , 115679.23 GFLOP/s , 174039.2 tokens/s INFO:__main__:2024-11-30 09:29:28 | Epoch: 0 | Step: 345520 | Dataset: 0-9245160 | Loss: 0.336 | 596 ms/step , 115826.19 GFLOP/s , 174017.8 tokens/s INFO:__main__:2024-11-30 09:29:35 | Epoch: 0 | Step: 345530 | Dataset: 0-9247560 | Loss: 0.348 | 596 ms/step , 115840.31 GFLOP/s , 174062.7 tokens/s INFO:__main__:2024-11-30 09:29:42 | Epoch: 0 | Step: 345540 | Dataset: 0-9249960 | Loss: 0.301 | 596 ms/step , 115778.10 GFLOP/s , 174027.5 tokens/s INFO:__main__:2024-11-30 09:29:49 | Epoch: 0 | Step: 345550 | Dataset: 0-9252360 | Loss: 0.330 | 596 ms/step , 115807.11 GFLOP/s , 174065.1 tokens/s INFO:__main__:2024-11-30 09:29:56 | Epoch: 0 | Step: 345560 | Dataset: 0-9254760 | Loss: 0.343 | 596 ms/step , 115769.68 GFLOP/s , 174019.2 tokens/s INFO:__main__:2024-11-30 09:30:03 | Epoch: 0 | Step: 345570 | Dataset: 0-9257160 | Loss: 0.353 | 596 ms/step , 115888.77 GFLOP/s , 174013.1 tokens/s INFO:__main__:2024-11-30 09:30:10 | Epoch: 0 | Step: 345580 | Dataset: 0-9259560 | Loss: 0.351 | 595 ms/step , 115939.26 GFLOP/s , 174084.0 tokens/s INFO:__main__:2024-11-30 09:30:18 | Epoch: 0 | Step: 345590 | Dataset: 0-9261960 | Loss: 0.320 | 596 ms/step , 115755.47 GFLOP/s , 174008.2 tokens/s INFO:__main__:2024-11-30 09:30:25 | Epoch: 0 | Step: 345600 | Dataset: 0-9264360 | Loss: 0.377 | 595 ms/step , 115940.93 GFLOP/s , 174014.2 tokens/s INFO:__main__:2024-11-30 09:30:32 | Epoch: 0 | Step: 345610 | Dataset: 0-9266760 | Loss: 0.403 | 596 ms/step , 115704.82 GFLOP/s , 173910.4 tokens/s INFO:__main__:2024-11-30 09:30:39 | Epoch: 0 | Step: 345620 | Dataset: 0-9269160 | Loss: 0.359 | 597 ms/step , 115665.43 GFLOP/s , 173979.3 tokens/s INFO:__main__:2024-11-30 09:30:46 | Epoch: 0 | Step: 345630 | Dataset: 0-9271560 | Loss: 0.359 | 595 ms/step , 115988.69 GFLOP/s , 173999.1 tokens/s INFO:__main__:2024-11-30 09:30:53 | Epoch: 0 | Step: 345640 | Dataset: 0-9273960 | Loss: 0.303 | 
596 ms/step , 115887.98 GFLOP/s , 174026.1 tokens/s INFO:__main__:2024-11-30 09:31:00 | Epoch: 0 | Step: 345650 | Dataset: 0-9276360 | Loss: 0.336 | 597 ms/step , 115631.13 GFLOP/s , 173967.2 tokens/s INFO:__main__:2024-11-30 09:31:07 | Epoch: 0 | Step: 345660 | Dataset: 0-9278760 | Loss: 0.361 | 596 ms/step , 115756.58 GFLOP/s , 173979.5 tokens/s INFO:__main__:2024-11-30 09:31:14 | Epoch: 0 | Step: 345670 | Dataset: 0-9281160 | Loss: 0.409 | 595 ms/step , 115979.00 GFLOP/s , 174036.8 tokens/s INFO:__main__:2024-11-30 09:31:21 | Epoch: 0 | Step: 345680 | Dataset: 0-9283560 | Loss: 0.355 | 596 ms/step , 115858.85 GFLOP/s , 173984.6 tokens/s INFO:__main__:2024-11-30 09:31:28 | Epoch: 0 | Step: 345690 | Dataset: 0-9285960 | Loss: 0.383 | 596 ms/step , 115815.55 GFLOP/s , 173990.3 tokens/s INFO:__main__:2024-11-30 09:31:35 | Epoch: 0 | Step: 345700 | Dataset: 0-9288360 | Loss: 0.386 | 596 ms/step , 115770.71 GFLOP/s , 173975.7 tokens/s INFO:__main__:2024-11-30 09:31:42 | Epoch: 0 | Step: 345710 | Dataset: 0-9290760 | Loss: 0.360 | 597 ms/step , 115668.88 GFLOP/s , 174025.3 tokens/s INFO:__main__:2024-11-30 09:31:49 | Epoch: 0 | Step: 345720 | Dataset: 0-9293160 | Loss: 0.393 | 596 ms/step , 115858.44 GFLOP/s , 174057.2 tokens/s INFO:__main__:2024-11-30 09:31:56 | Epoch: 0 | Step: 345730 | Dataset: 0-9295560 | Loss: 0.362 | 595 ms/step , 115999.25 GFLOP/s , 173978.8 tokens/s INFO:__main__:2024-11-30 09:32:03 | Epoch: 0 | Step: 345740 | Dataset: 0-9297960 | Loss: 0.350 | 596 ms/step , 115754.05 GFLOP/s , 174081.1 tokens/s INFO:__main__:2024-11-30 09:32:11 | Epoch: 0 | Step: 345750 | Dataset: 0-9300360 | Loss: 0.389 | 595 ms/step , 116018.76 GFLOP/s , 174047.1 tokens/s INFO:__main__:2024-11-30 09:32:18 | Epoch: 0 | Step: 345760 | Dataset: 0-9302760 | Loss: 0.344 | 596 ms/step , 115765.32 GFLOP/s , 173959.8 tokens/s INFO:__main__:2024-11-30 09:32:25 | Epoch: 0 | Step: 345770 | Dataset: 0-9305160 | Loss: 0.344 | 596 ms/step , 115887.98 GFLOP/s , 174075.8 tokens/s 
INFO:__main__:2024-11-30 09:32:32 | Epoch: 0 | Step: 345780 | Dataset: 0-9307560 | Loss: 0.341 | 595 ms/step , 116007.55 GFLOP/s , 174118.5 tokens/s INFO:__main__:2024-11-30 09:32:39 | Epoch: 0 | Step: 345790 | Dataset: 0-9309960 | Loss: 0.311 | 595 ms/step , 115890.45 GFLOP/s , 174049.8 tokens/s INFO:__main__:2024-11-30 09:32:46 | Epoch: 0 | Step: 345800 | Dataset: 0-9312360 | Loss: 0.331 | 595 ms/step , 116002.30 GFLOP/s , 174081.2 tokens/s INFO:__main__:2024-11-30 09:32:53 | Epoch: 0 | Step: 345810 | Dataset: 0-9314760 | Loss: 0.333 | 596 ms/step , 115844.31 GFLOP/s , 174077.8 tokens/s INFO:__main__:2024-11-30 09:33:00 | Epoch: 0 | Step: 345820 | Dataset: 0-9317160 | Loss: 0.309 | 596 ms/step , 115764.90 GFLOP/s , 174090.7 tokens/s INFO:__main__:2024-11-30 09:33:07 | Epoch: 0 | Step: 345830 | Dataset: 0-9319560 | Loss: 0.283 | 596 ms/step , 115866.24 GFLOP/s , 174098.7 tokens/s INFO:__main__:2024-11-30 09:33:14 | Epoch: 0 | Step: 345840 | Dataset: 0-9321960 | Loss: 0.328 | 595 ms/step , 116003.80 GFLOP/s , 174143.5 tokens/s INFO:__main__:2024-11-30 09:33:21 | Epoch: 0 | Step: 345850 | Dataset: 0-9324360 | Loss: 0.320 | 595 ms/step , 116075.81 GFLOP/s , 174043.9 tokens/s INFO:__main__:2024-11-30 09:33:28 | Epoch: 0 | Step: 345860 | Dataset: 0-9326760 | Loss: 0.323 | 595 ms/step , 115943.66 GFLOP/s , 174203.8 tokens/s INFO:__main__:2024-11-30 09:33:35 | Epoch: 0 | Step: 345870 | Dataset: 0-9329160 | Loss: 0.292 | 595 ms/step , 115954.61 GFLOP/s , 174173.7 tokens/s INFO:__main__:2024-11-30 09:33:42 | Epoch: 0 | Step: 345880 | Dataset: 0-9331560 | Loss: 0.327 | 595 ms/step , 115972.99 GFLOP/s , 174180.3 tokens/s INFO:__main__:2024-11-30 09:33:49 | Epoch: 0 | Step: 345890 | Dataset: 0-9333960 | Loss: 0.331 | 597 ms/step , 115574.88 GFLOP/s , 174217.7 tokens/s INFO:__main__:2024-11-30 09:33:56 | Epoch: 0 | Step: 345900 | Dataset: 0-9336360 | Loss: 0.289 | 596 ms/step , 115884.34 GFLOP/s , 174172.6 tokens/s INFO:__main__:2024-11-30 09:34:03 | Epoch: 0 | Step: 345910 | 
Dataset: 0-9338760 | Loss: 0.281 | 596 ms/step , 115831.20 GFLOP/s , 174170.4 tokens/s INFO:__main__:2024-11-30 09:34:10 | Epoch: 0 | Step: 345920 | Dataset: 0-9341160 | Loss: 0.345 | 596 ms/step , 115834.85 GFLOP/s , 174098.6 tokens/s INFO:__main__:2024-11-30 09:34:18 | Epoch: 0 | Step: 345930 | Dataset: 0-9343560 | Loss: 0.294 | 595 ms/step , 115915.02 GFLOP/s , 174182.2 tokens/s INFO:__main__:2024-11-30 09:34:25 | Epoch: 0 | Step: 345940 | Dataset: 0-9345960 | Loss: 0.289 | 596 ms/step , 115866.74 GFLOP/s , 174127.0 tokens/s INFO:__main__:2024-11-30 09:34:32 | Epoch: 0 | Step: 345950 | Dataset: 0-9348360 | Loss: 0.328 | 595 ms/step , 115940.67 GFLOP/s , 174124.8 tokens/s INFO:__main__:2024-11-30 09:34:39 | Epoch: 0 | Step: 345960 | Dataset: 0-9350760 | Loss: 0.354 | 595 ms/step , 115968.34 GFLOP/s , 174151.6 tokens/s INFO:__main__:2024-11-30 09:34:46 | Epoch: 0 | Step: 345970 | Dataset: 0-9353160 | Loss: 0.269 | 595 ms/step , 115955.90 GFLOP/s , 174095.8 tokens/s INFO:__main__:2024-11-30 09:34:53 | Epoch: 0 | Step: 345980 | Dataset: 0-9355560 | Loss: 0.314 | 596 ms/step , 115773.84 GFLOP/s , 174077.1 tokens/s INFO:__main__:2024-11-30 09:35:00 | Epoch: 0 | Step: 345990 | Dataset: 0-9357960 | Loss: 0.367 | 597 ms/step , 115661.82 GFLOP/s , 174073.2 tokens/s INFO:__main__:2024-11-30 09:35:07 | Validation | Step: 346000 | Val_loss: 0.359 | Best_val_loss: 0.3489 INFO:__main__:2024-11-30 09:35:07 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_093507_step_346000.pt` INFO:__main__:2024-11-30 09:35:10 | Epoch: 0 | Step: 346000 | Dataset: 0-9360360 | Loss: 0.252 | 595 ms/step , 116013.83 GFLOP/s , 120460.5 tokens/s INFO:__main__:2024-11-30 09:35:17 | Epoch: 0 | Step: 346010 | Dataset: 0-9362760 | Loss: 0.323 | 598 ms/step , 115472.98 GFLOP/s , 173643.0 tokens/s INFO:__main__:2024-11-30 09:35:24 | Epoch: 0 | Step: 346020 | Dataset: 0-9365160 | Loss: 0.293 | 597 ms/step , 115575.37 GFLOP/s , 173668.5 tokens/s INFO:__main__:2024-11-30 
09:35:31 | Epoch: 0 | Step: 346030 | Dataset: 0-9367560 | Loss: 0.331 | 595 ms/step , 116027.38 GFLOP/s , 173811.4 tokens/s INFO:__main__:2024-11-30 09:35:38 | Epoch: 0 | Step: 346040 | Dataset: 0-9369960 | Loss: 0.323 | 596 ms/step , 115860.77 GFLOP/s , 173711.8 tokens/s INFO:__main__:2024-11-30 09:35:45 | Epoch: 0 | Step: 346050 | Dataset: 0-9372360 | Loss: 0.361 | 595 ms/step , 115928.97 GFLOP/s , 173647.4 tokens/s INFO:__main__:2024-11-30 09:35:53 | Epoch: 0 | Step: 346060 | Dataset: 0-9374760 | Loss: 0.254 | 595 ms/step , 115948.46 GFLOP/s , 173691.6 tokens/s INFO:__main__:2024-11-30 09:36:00 | Epoch: 0 | Step: 346070 | Dataset: 0-9377160 | Loss: 0.327 | 595 ms/step , 115911.78 GFLOP/s , 173533.8 tokens/s INFO:__main__:2024-11-30 09:36:07 | Epoch: 0 | Step: 346080 | Dataset: 0-9379560 | Loss: 0.354 | 595 ms/step , 115902.66 GFLOP/s , 173580.1 tokens/s INFO:__main__:2024-11-30 09:36:14 | Epoch: 0 | Step: 346090 | Dataset: 0-9381960 | Loss: 0.324 | 595 ms/step , 115979.89 GFLOP/s , 173553.6 tokens/s INFO:__main__:2024-11-30 09:36:21 | Epoch: 0 | Step: 346100 | Dataset: 0-9384360 | Loss: 0.314 | 596 ms/step , 115809.63 GFLOP/s , 173580.9 tokens/s INFO:__main__:2024-11-30 09:36:28 | Epoch: 0 | Step: 346110 | Dataset: 0-9386760 | Loss: 0.317 | 595 ms/step , 115992.10 GFLOP/s , 173663.2 tokens/s INFO:__main__:2024-11-30 09:36:35 | Epoch: 0 | Step: 346120 | Dataset: 0-9389160 | Loss: 0.245 | 596 ms/step , 115810.93 GFLOP/s , 173678.2 tokens/s INFO:__main__:2024-11-30 09:36:42 | Epoch: 0 | Step: 346130 | Dataset: 0-9391560 | Loss: 0.419 | 596 ms/step , 115811.68 GFLOP/s , 174096.2 tokens/s INFO:__main__:2024-11-30 09:36:49 | Epoch: 0 | Step: 346140 | Dataset: 0-9393960 | Loss: 0.276 | 595 ms/step , 116013.60 GFLOP/s , 174125.9 tokens/s INFO:__main__:2024-11-30 09:36:56 | Epoch: 0 | Step: 346150 | Dataset: 0-9396360 | Loss: 0.277 | 596 ms/step , 115820.79 GFLOP/s , 174099.2 tokens/s INFO:__main__:2024-11-30 09:37:03 | Epoch: 0 | Step: 346160 | Dataset: 0-9398760 | 
Loss: 0.334 | 595 ms/step , 115990.64 GFLOP/s , 174156.6 tokens/s INFO:__main__:2024-11-30 09:37:10 | Epoch: 0 | Step: 346170 | Dataset: 0-9401160 | Loss: 0.312 | 596 ms/step , 115819.62 GFLOP/s , 174152.9 tokens/s INFO:__main__:2024-11-30 09:37:17 | Epoch: 0 | Step: 346180 | Dataset: 0-9403560 | Loss: 0.319 | 595 ms/step , 115982.16 GFLOP/s , 174125.8 tokens/s INFO:__main__:2024-11-30 09:37:24 | Epoch: 0 | Step: 346190 | Dataset: 0-9405960 | Loss: 0.391 | 596 ms/step , 115853.08 GFLOP/s , 174138.0 tokens/s INFO:__main__:2024-11-30 09:37:31 | Epoch: 0 | Step: 346200 | Dataset: 0-9408360 | Loss: 0.349 | 596 ms/step , 115880.29 GFLOP/s , 174188.7 tokens/s INFO:__main__:2024-11-30 09:37:39 | Epoch: 0 | Step: 346210 | Dataset: 0-9410760 | Loss: 0.375 | 595 ms/step , 115989.28 GFLOP/s , 174129.7 tokens/s INFO:__main__:2024-11-30 09:37:46 | Epoch: 0 | Step: 346220 | Dataset: 0-9413160 | Loss: 0.323 | 595 ms/step , 115890.40 GFLOP/s , 174162.7 tokens/s INFO:__main__:2024-11-30 09:37:53 | Epoch: 0 | Step: 346230 | Dataset: 0-9415560 | Loss: 0.340 | 596 ms/step , 115860.58 GFLOP/s , 174113.1 tokens/s INFO:__main__:2024-11-30 09:38:00 | Epoch: 0 | Step: 346240 | Dataset: 0-9417960 | Loss: 0.333 | 595 ms/step , 115902.64 GFLOP/s , 174152.5 tokens/s INFO:__main__:2024-11-30 09:38:07 | Epoch: 0 | Step: 346250 | Dataset: 0-9420360 | Loss: 0.355 | 596 ms/step , 115785.19 GFLOP/s , 174102.8 tokens/s INFO:__main__:2024-11-30 09:38:14 | Epoch: 0 | Step: 346260 | Dataset: 0-9422760 | Loss: 0.324 | 595 ms/step , 116000.86 GFLOP/s , 174119.1 tokens/s INFO:__main__:2024-11-30 09:38:21 | Epoch: 0 | Step: 346270 | Dataset: 0-9425160 | Loss: 0.356 | 597 ms/step , 115694.66 GFLOP/s , 174069.9 tokens/s INFO:__main__:2024-11-30 09:38:28 | Epoch: 0 | Step: 346280 | Dataset: 0-9427560 | Loss: 0.360 | 596 ms/step , 115865.58 GFLOP/s , 174065.6 tokens/s INFO:__main__:2024-11-30 09:38:35 | Epoch: 0 | Step: 346290 | Dataset: 0-9429960 | Loss: 0.300 | 595 ms/step , 116034.17 GFLOP/s , 174175.2 
tokens/s INFO:__main__:2024-11-30 09:38:42 | Epoch: 0 | Step: 346300 | Dataset: 0-9432360 | Loss: 0.292 | 597 ms/step , 115692.94 GFLOP/s , 174066.0 tokens/s INFO:__main__:2024-11-30 09:38:49 | Epoch: 0 | Step: 346310 | Dataset: 0-9434760 | Loss: 0.300 | 596 ms/step , 115777.20 GFLOP/s , 174063.2 tokens/s INFO:__main__:2024-11-30 09:38:56 | Epoch: 0 | Step: 346320 | Dataset: 0-9437160 | Loss: 0.287 | 595 ms/step , 115913.28 GFLOP/s , 174094.6 tokens/s INFO:__main__:2024-11-30 09:39:03 | Epoch: 0 | Step: 346330 | Dataset: 0-9439560 | Loss: 0.472 | 596 ms/step , 115825.13 GFLOP/s , 173960.6 tokens/s INFO:__main__:2024-11-30 09:39:10 | Epoch: 0 | Step: 346340 | Dataset: 0-9441960 | Loss: 0.475 | 596 ms/step , 115842.92 GFLOP/s , 173871.2 tokens/s INFO:__main__:2024-11-30 09:39:17 | Epoch: 0 | Step: 346350 | Dataset: 0-9444360 | Loss: 0.423 | 595 ms/step , 115896.27 GFLOP/s , 173970.5 tokens/s INFO:__main__:2024-11-30 09:39:24 | Epoch: 0 | Step: 346360 | Dataset: 0-9446760 | Loss: 0.431 | 596 ms/step , 115817.14 GFLOP/s , 173925.4 tokens/s INFO:__main__:2024-11-30 09:39:31 | Epoch: 0 | Step: 346370 | Dataset: 0-9449160 | Loss: 0.411 | 596 ms/step , 115877.19 GFLOP/s , 173910.2 tokens/s INFO:__main__:2024-11-30 09:39:39 | Epoch: 0 | Step: 346380 | Dataset: 0-9451560 | Loss: 0.484 | 596 ms/step , 115797.95 GFLOP/s , 173971.6 tokens/s INFO:__main__:2024-11-30 09:39:46 | Epoch: 0 | Step: 346390 | Dataset: 0-9453960 | Loss: 0.465 | 596 ms/step , 115701.90 GFLOP/s , 173933.7 tokens/s INFO:__main__:2024-11-30 09:39:53 | Epoch: 0 | Step: 346400 | Dataset: 0-9456360 | Loss: 0.408 | 595 ms/step , 115918.12 GFLOP/s , 174017.2 tokens/s INFO:__main__:2024-11-30 09:40:00 | Epoch: 0 | Step: 346410 | Dataset: 0-9458760 | Loss: 0.535 | 597 ms/step , 115648.69 GFLOP/s , 173970.3 tokens/s INFO:__main__:2024-11-30 09:40:07 | Epoch: 0 | Step: 346420 | Dataset: 0-9461160 | Loss: 0.462 | 596 ms/step , 115735.68 GFLOP/s , 174011.2 tokens/s INFO:__main__:2024-11-30 09:40:14 | Epoch: 0 | Step: 
346430 | Dataset: 0-9463560 | Loss: 0.415 | 596 ms/step , 115840.34 GFLOP/s , 173913.5 tokens/s INFO:__main__:2024-11-30 09:40:21 | Epoch: 0 | Step: 346440 | Dataset: 0-9465960 | Loss: 0.449 | 597 ms/step , 115635.51 GFLOP/s , 173946.7 tokens/s INFO:__main__:2024-11-30 09:40:28 | Epoch: 0 | Step: 346450 | Dataset: 0-9468360 | Loss: 0.445 | 596 ms/step , 115736.39 GFLOP/s , 173976.1 tokens/s INFO:__main__:2024-11-30 09:40:35 | Epoch: 0 | Step: 346460 | Dataset: 0-9470760 | Loss: 0.438 | 597 ms/step , 115638.79 GFLOP/s , 173873.0 tokens/s INFO:__main__:2024-11-30 09:40:42 | Epoch: 0 | Step: 346470 | Dataset: 0-9473160 | Loss: 0.472 | 596 ms/step , 115744.28 GFLOP/s , 173978.2 tokens/s INFO:__main__:2024-11-30 09:40:49 | Epoch: 0 | Step: 346480 | Dataset: 0-9475560 | Loss: 0.455 | 596 ms/step , 115738.25 GFLOP/s , 173973.2 tokens/s INFO:__main__:2024-11-30 09:40:56 | Epoch: 0 | Step: 346490 | Dataset: 0-9477960 | Loss: 0.498 | 596 ms/step , 115802.59 GFLOP/s , 174010.6 tokens/s INFO:__main__:2024-11-30 09:41:04 | Validation | Step: 346500 | Val_loss: 0.374 | Best_val_loss: 0.3489 INFO:__main__:2024-11-30 09:41:05 | Epoch: 0 | Step: 346500 | Dataset: 0-9480360 | Loss: 0.512 | 596 ms/step , 115750.12 GFLOP/s , 148110.2 tokens/s INFO:__main__:2024-11-30 09:41:12 | Epoch: 0 | Step: 346510 | Dataset: 0-9482760 | Loss: 0.416 | 596 ms/step , 115797.14 GFLOP/s , 174084.3 tokens/s INFO:__main__:2024-11-30 09:41:19 | Epoch: 0 | Step: 346520 | Dataset: 0-9485160 | Loss: 0.452 | 596 ms/step , 115743.01 GFLOP/s , 174028.6 tokens/s INFO:__main__:2024-11-30 09:41:26 | Epoch: 0 | Step: 346530 | Dataset: 0-9487560 | Loss: 0.480 | 596 ms/step , 115786.75 GFLOP/s , 173989.2 tokens/s INFO:__main__:2024-11-30 09:41:33 | Epoch: 0 | Step: 346540 | Dataset: 0-9489960 | Loss: 0.412 | 596 ms/step , 115808.82 GFLOP/s , 173981.9 tokens/s INFO:__main__:2024-11-30 09:41:40 | Epoch: 0 | Step: 346550 | Dataset: 0-9492360 | Loss: 0.427 | 596 ms/step , 115830.19 GFLOP/s , 173965.4 tokens/s 
INFO:__main__:2024-11-30 09:41:47 | Epoch: 0 | Step: 346560 | Dataset: 0-9494760 | Loss: 0.443 | 595 ms/step , 115958.95 GFLOP/s , 174026.2 tokens/s INFO:__main__:2024-11-30 09:41:54 | Epoch: 0 | Step: 346570 | Dataset: 0-9497160 | Loss: 0.431 | 596 ms/step , 115862.67 GFLOP/s , 174049.5 tokens/s INFO:__main__:2024-11-30 09:42:01 | Epoch: 0 | Step: 346580 | Dataset: 0-9499560 | Loss: 0.449 | 596 ms/step , 115881.77 GFLOP/s , 173972.7 tokens/s INFO:__main__:2024-11-30 09:42:08 | Epoch: 0 | Step: 346590 | Dataset: 0-9501960 | Loss: 0.458 | 597 ms/step , 115661.78 GFLOP/s , 174003.9 tokens/s INFO:__main__:2024-11-30 09:42:15 | Epoch: 0 | Step: 346600 | Dataset: 0-9504360 | Loss: 0.471 | 596 ms/step , 115885.21 GFLOP/s , 173952.7 tokens/s INFO:__main__:2024-11-30 09:42:22 | Epoch: 0 | Step: 346610 | Dataset: 0-9506760 | Loss: 0.479 | 597 ms/step , 115681.57 GFLOP/s , 173912.4 tokens/s INFO:__main__:2024-11-30 09:42:29 | Epoch: 0 | Step: 346620 | Dataset: 0-9509160 | Loss: 0.476 | 595 ms/step , 115898.73 GFLOP/s , 173940.4 tokens/s INFO:__main__:2024-11-30 09:42:36 | Epoch: 0 | Step: 346630 | Dataset: 0-9511560 | Loss: 0.448 | 597 ms/step , 115643.65 GFLOP/s , 173933.6 tokens/s INFO:__main__:2024-11-30 09:42:43 | Epoch: 0 | Step: 346640 | Dataset: 0-9513960 | Loss: 0.427 | 596 ms/step , 115876.11 GFLOP/s , 173870.4 tokens/s INFO:__main__:2024-11-30 09:42:50 | Epoch: 0 | Step: 346650 | Dataset: 0-9516360 | Loss: 0.464 | 596 ms/step , 115777.46 GFLOP/s , 173911.8 tokens/s INFO:__main__:2024-11-30 09:42:58 | Epoch: 0 | Step: 346660 | Dataset: 0-9518760 | Loss: 0.473 | 596 ms/step , 115720.67 GFLOP/s , 173993.9 tokens/s INFO:__main__:2024-11-30 09:43:05 | Epoch: 0 | Step: 346670 | Dataset: 0-9521160 | Loss: 0.464 | 596 ms/step , 115733.96 GFLOP/s , 173939.1 tokens/s INFO:__main__:2024-11-30 09:43:12 | Epoch: 0 | Step: 346680 | Dataset: 0-9523560 | Loss: 0.419 | 596 ms/step , 115871.34 GFLOP/s , 173924.5 tokens/s INFO:__main__:2024-11-30 09:43:19 | Epoch: 0 | Step: 346690 | 
Dataset: 0-9525960 | Loss: 0.478 | 597 ms/step , 115650.20 GFLOP/s , 173969.5 tokens/s INFO:__main__:2024-11-30 09:43:26 | Epoch: 0 | Step: 346700 | Dataset: 0-9528360 | Loss: 0.448 | 596 ms/step , 115820.03 GFLOP/s , 173946.0 tokens/s INFO:__main__:2024-11-30 09:43:33 | Epoch: 0 | Step: 346710 | Dataset: 0-9530760 | Loss: 0.445 | 596 ms/step , 115739.62 GFLOP/s , 173980.4 tokens/s INFO:__main__:2024-11-30 09:43:40 | Epoch: 0 | Step: 346720 | Dataset: 0-9533160 | Loss: 0.447 | 597 ms/step , 115565.16 GFLOP/s , 173959.6 tokens/s INFO:__main__:2024-11-30 09:43:47 | Epoch: 0 | Step: 346730 | Dataset: 0-9535560 | Loss: 0.475 | 596 ms/step , 115781.74 GFLOP/s , 173921.7 tokens/s INFO:__main__:2024-11-30 09:43:54 | Epoch: 0 | Step: 346740 | Dataset: 0-9537960 | Loss: 0.413 | 596 ms/step , 115846.97 GFLOP/s , 173999.2 tokens/s INFO:__main__:2024-11-30 09:44:01 | Epoch: 0 | Step: 346750 | Dataset: 0-9540360 | Loss: 0.499 | 596 ms/step , 115712.72 GFLOP/s , 173922.1 tokens/s INFO:__main__:2024-11-30 09:44:08 | Epoch: 0 | Step: 346760 | Dataset: 0-9542760 | Loss: 0.474 | 596 ms/step , 115789.09 GFLOP/s , 173966.4 tokens/s INFO:__main__:2024-11-30 09:44:15 | Epoch: 0 | Step: 346770 | Dataset: 0-9545160 | Loss: 0.459 | 596 ms/step , 115830.62 GFLOP/s , 173849.7 tokens/s INFO:__main__:2024-11-30 09:44:22 | Epoch: 0 | Step: 346780 | Dataset: 0-9547560 | Loss: 0.429 | 596 ms/step , 115719.61 GFLOP/s , 173922.4 tokens/s INFO:__main__:2024-11-30 09:44:29 | Epoch: 0 | Step: 346790 | Dataset: 0-9549960 | Loss: 0.470 | 596 ms/step , 115880.35 GFLOP/s , 173946.8 tokens/s INFO:__main__:2024-11-30 09:44:36 | Epoch: 0 | Step: 346800 | Dataset: 0-9552360 | Loss: 0.459 | 597 ms/step , 115513.76 GFLOP/s , 173866.8 tokens/s INFO:__main__:2024-11-30 09:44:44 | Epoch: 0 | Step: 346810 | Dataset: 0-9554760 | Loss: 0.458 | 596 ms/step , 115755.15 GFLOP/s , 173905.6 tokens/s INFO:__main__:2024-11-30 09:44:51 | Epoch: 0 | Step: 346820 | Dataset: 0-9557160 | Loss: 0.478 | 596 ms/step , 115752.94 
GFLOP/s , 173853.9 tokens/s INFO:__main__:2024-11-30 09:44:58 | Epoch: 0 | Step: 346830 | Dataset: 0-9559560 | Loss: 0.450 | 596 ms/step , 115758.93 GFLOP/s , 173921.4 tokens/s INFO:__main__:2024-11-30 09:45:05 | Epoch: 0 | Step: 346840 | Dataset: 0-9561960 | Loss: 0.483 | 596 ms/step , 115836.72 GFLOP/s , 173853.2 tokens/s INFO:__main__:2024-11-30 09:45:12 | Epoch: 0 | Step: 346850 | Dataset: 0-9564360 | Loss: 0.474 | 596 ms/step , 115818.60 GFLOP/s , 173852.1 tokens/s INFO:__main__:2024-11-30 09:45:19 | Epoch: 0 | Step: 346860 | Dataset: 0-9566760 | Loss: 0.471 | 597 ms/step , 115680.55 GFLOP/s , 173935.8 tokens/s INFO:__main__:2024-11-30 09:45:26 | Epoch: 0 | Step: 346870 | Dataset: 0-9569160 | Loss: 0.382 | 596 ms/step , 115696.23 GFLOP/s , 173857.0 tokens/s INFO:__main__:2024-11-30 09:45:33 | Epoch: 0 | Step: 346880 | Dataset: 0-9571560 | Loss: 0.389 | 596 ms/step , 115880.64 GFLOP/s , 174023.2 tokens/s INFO:__main__:2024-11-30 09:45:40 | Epoch: 0 | Step: 346890 | Dataset: 0-9573960 | Loss: 0.440 | 597 ms/step , 115688.08 GFLOP/s , 173963.3 tokens/s INFO:__main__:2024-11-30 09:45:47 | Epoch: 0 | Step: 346900 | Dataset: 0-9576360 | Loss: 0.435 | 596 ms/step , 115773.51 GFLOP/s , 173991.2 tokens/s INFO:__main__:2024-11-30 09:45:54 | Epoch: 0 | Step: 346910 | Dataset: 0-9578760 | Loss: 0.340 | 596 ms/step , 115793.84 GFLOP/s , 174026.4 tokens/s INFO:__main__:2024-11-30 09:46:01 | Epoch: 0 | Step: 346920 | Dataset: 0-9581160 | Loss: 0.339 | 595 ms/step , 115930.29 GFLOP/s , 173935.0 tokens/s INFO:__main__:2024-11-30 09:46:08 | Epoch: 0 | Step: 346930 | Dataset: 0-9583560 | Loss: 0.301 | 595 ms/step , 116011.69 GFLOP/s , 174045.7 tokens/s INFO:__main__:2024-11-30 09:46:15 | Epoch: 0 | Step: 346940 | Dataset: 0-9585960 | Loss: 0.383 | 596 ms/step , 115793.37 GFLOP/s , 173939.1 tokens/s INFO:__main__:2024-11-30 09:46:22 | Epoch: 0 | Step: 346950 | Dataset: 0-9588360 | Loss: 0.384 | 596 ms/step , 115838.21 GFLOP/s , 174062.4 tokens/s INFO:__main__:2024-11-30 09:46:29 
| Epoch: 0 | Step: 346960 | Dataset: 0-9590760 | Loss: 0.364 | 596 ms/step , 115776.74 GFLOP/s , 174052.0 tokens/s INFO:__main__:2024-11-30 09:46:37 | Epoch: 0 | Step: 346970 | Dataset: 0-9593160 | Loss: 0.369 | 596 ms/step , 115852.66 GFLOP/s , 173976.2 tokens/s INFO:__main__:2024-11-30 09:46:44 | Epoch: 0 | Step: 346980 | Dataset: 0-9595560 | Loss: 0.368 | 596 ms/step , 115855.65 GFLOP/s , 173984.1 tokens/s INFO:__main__:2024-11-30 09:46:51 | Epoch: 0 | Step: 346990 | Dataset: 0-9597960 | Loss: 0.379 | 595 ms/step , 116011.83 GFLOP/s , 174001.2 tokens/s INFO:__main__:2024-11-30 09:46:58 | Validation | Step: 347000 | Val_loss: 0.363 | Best_val_loss: 0.3489 INFO:__main__:2024-11-30 09:46:58 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_094658_step_347000.pt` INFO:__main__:2024-11-30 09:47:01 | Epoch: 0 | Step: 347000 | Dataset: 0-9600360 | Loss: 0.376 | 594 ms/step , 116105.41 GFLOP/s , 120361.7 tokens/s INFO:__main__:2024-11-30 09:47:08 | Epoch: 0 | Step: 347010 | Dataset: 0-9602760 | Loss: 0.394 | 598 ms/step , 115494.58 GFLOP/s , 173650.9 tokens/s INFO:__main__:2024-11-30 09:47:15 | Epoch: 0 | Step: 347020 | Dataset: 0-9605160 | Loss: 0.328 | 598 ms/step , 115366.68 GFLOP/s , 173528.6 tokens/s INFO:__main__:2024-11-30 09:47:22 | Epoch: 0 | Step: 347030 | Dataset: 0-9607560 | Loss: 0.322 | 597 ms/step , 115507.83 GFLOP/s , 173571.9 tokens/s INFO:__main__:2024-11-30 09:47:29 | Epoch: 0 | Step: 347040 | Dataset: 0-9609960 | Loss: 0.331 | 598 ms/step , 115481.93 GFLOP/s , 173610.7 tokens/s INFO:__main__:2024-11-30 09:47:36 | Epoch: 0 | Step: 347050 | Dataset: 0-9612360 | Loss: 0.346 | 595 ms/step , 115906.92 GFLOP/s , 173446.9 tokens/s INFO:__main__:2024-11-30 09:47:43 | Epoch: 0 | Step: 347060 | Dataset: 0-9614760 | Loss: 0.322 | 595 ms/step , 115982.26 GFLOP/s , 173490.3 tokens/s INFO:__main__:2024-11-30 09:47:50 | Epoch: 0 | Step: 347070 | Dataset: 0-9617160 | Loss: 0.345 | 595 ms/step , 116002.63 GFLOP/s , 173511.6 tokens/s 
INFO:__main__:2024-11-30 09:47:58 | Epoch: 0 | Step: 347080 | Dataset: 0-9619560 | Loss: 0.363 | 596 ms/step , 115863.59 GFLOP/s , 173581.7 tokens/s INFO:__main__:2024-11-30 09:48:05 | Epoch: 0 | Step: 347090 | Dataset: 0-9621960 | Loss: 0.346 | 595 ms/step , 115987.91 GFLOP/s , 173552.9 tokens/s INFO:__main__:2024-11-30 09:48:12 | Epoch: 0 | Step: 347100 | Dataset: 0-9624360 | Loss: 0.356 | 595 ms/step , 115975.84 GFLOP/s , 173595.8 tokens/s INFO:__main__:2024-11-30 09:48:19 | Epoch: 0 | Step: 347110 | Dataset: 0-9626760 | Loss: 0.382 | 595 ms/step , 115934.55 GFLOP/s , 173532.3 tokens/s INFO:__main__:2024-11-30 09:48:26 | Epoch: 0 | Step: 347120 | Dataset: 0-9629160 | Loss: 0.391 | 595 ms/step , 115999.26 GFLOP/s , 173611.4 tokens/s INFO:__main__:2024-11-30 09:48:33 | Epoch: 0 | Step: 347130 | Dataset: 0-9631560 | Loss: 0.355 | 596 ms/step , 115863.47 GFLOP/s , 173600.9 tokens/s INFO:__main__:2024-11-30 09:48:40 | Epoch: 0 | Step: 347140 | Dataset: 0-9633960 | Loss: 0.303 | 595 ms/step , 115992.17 GFLOP/s , 173554.1 tokens/s INFO:__main__:2024-11-30 09:48:47 | Epoch: 0 | Step: 347150 | Dataset: 0-9636360 | Loss: 0.335 | 595 ms/step , 115909.42 GFLOP/s , 173549.7 tokens/s INFO:__main__:2024-11-30 09:48:54 | Epoch: 0 | Step: 347160 | Dataset: 0-9638760 | Loss: 0.345 | 596 ms/step , 115713.83 GFLOP/s , 173581.6 tokens/s INFO:__main__:2024-11-30 09:49:01 | Epoch: 0 | Step: 347170 | Dataset: 0-9641160 | Loss: 0.407 | 596 ms/step , 115851.13 GFLOP/s , 173551.1 tokens/s INFO:__main__:2024-11-30 09:49:08 | Epoch: 0 | Step: 347180 | Dataset: 0-9643560 | Loss: 0.378 | 596 ms/step , 115861.34 GFLOP/s , 173580.4 tokens/s INFO:__main__:2024-11-30 09:49:15 | Epoch: 0 | Step: 347190 | Dataset: 0-9645960 | Loss: 0.409 | 596 ms/step , 115700.56 GFLOP/s , 173531.2 tokens/s INFO:__main__:2024-11-30 09:49:22 | Epoch: 0 | Step: 347200 | Dataset: 0-9648360 | Loss: 0.368 | 595 ms/step , 115893.27 GFLOP/s , 173557.7 tokens/s INFO:__main__:2024-11-30 09:49:30 | Epoch: 0 | Step: 347210 | 
Dataset: 0-9650760 | Loss: 0.357 | 596 ms/step , 115863.38 GFLOP/s , 173571.7 tokens/s INFO:__main__:2024-11-30 09:49:37 | Epoch: 0 | Step: 347220 | Dataset: 0-9653160 | Loss: 0.358 | 596 ms/step , 115885.93 GFLOP/s , 173684.3 tokens/s INFO:__main__:2024-11-30 09:49:44 | Epoch: 0 | Step: 347230 | Dataset: 0-9655560 | Loss: 0.362 | 595 ms/step , 116039.98 GFLOP/s , 173666.4 tokens/s INFO:__main__:2024-11-30 09:49:51 | Epoch: 0 | Step: 347240 | Dataset: 0-9657960 | Loss: 0.387 | 595 ms/step , 116033.89 GFLOP/s , 173560.1 tokens/s INFO:__main__:2024-11-30 09:49:58 | Epoch: 0 | Step: 347250 | Dataset: 0-9660360 | Loss: 0.368 | 595 ms/step , 115923.35 GFLOP/s , 173622.5 tokens/s INFO:__main__:2024-11-30 09:50:05 | Epoch: 0 | Step: 347260 | Dataset: 0-9662760 | Loss: 0.396 | 595 ms/step , 116024.95 GFLOP/s , 173562.4 tokens/s INFO:__main__:2024-11-30 09:50:12 | Epoch: 0 | Step: 347270 | Dataset: 0-9665160 | Loss: 0.370 | 595 ms/step , 115977.21 GFLOP/s , 173596.5 tokens/s INFO:__main__:2024-11-30 09:50:19 | Epoch: 0 | Step: 347280 | Dataset: 0-9667560 | Loss: 0.382 | 595 ms/step , 115970.34 GFLOP/s , 173604.6 tokens/s INFO:__main__:2024-11-30 09:50:26 | Epoch: 0 | Step: 347290 | Dataset: 0-9669960 | Loss: 0.346 | 595 ms/step , 116022.75 GFLOP/s , 173574.8 tokens/s INFO:__main__:2024-11-30 09:50:33 | Epoch: 0 | Step: 347300 | Dataset: 0-9672360 | Loss: 0.381 | 594 ms/step , 116137.78 GFLOP/s , 173538.2 tokens/s INFO:__main__:2024-11-30 09:50:40 | Epoch: 0 | Step: 347310 | Dataset: 0-9674760 | Loss: 0.354 | 595 ms/step , 115939.28 GFLOP/s , 173652.8 tokens/s INFO:__main__:2024-11-30 09:50:47 | Epoch: 0 | Step: 347320 | Dataset: 0-9677160 | Loss: 0.390 | 595 ms/step , 115938.85 GFLOP/s , 173599.0 tokens/s INFO:__main__:2024-11-30 09:50:54 | Epoch: 0 | Step: 347330 | Dataset: 0-9679560 | Loss: 0.322 | 594 ms/step , 116106.49 GFLOP/s , 173433.1 tokens/s INFO:__main__:2024-11-30 09:51:02 | Epoch: 0 | Step: 347340 | Dataset: 0-9681960 | Loss: 0.304 | 595 ms/step , 115948.81 
GFLOP/s , 173604.4 tokens/s INFO:__main__:2024-11-30 09:51:09 | Epoch: 0 | Step: 347350 | Dataset: 0-9684360 | Loss: 0.369 | 594 ms/step , 116103.56 GFLOP/s , 173629.7 tokens/s INFO:__main__:2024-11-30 09:51:16 | Epoch: 0 | Step: 347360 | Dataset: 0-9686760 | Loss: 0.343 | 595 ms/step , 115955.09 GFLOP/s , 173643.8 tokens/s INFO:__main__:2024-11-30 09:51:23 | Epoch: 0 | Step: 347370 | Dataset: 0-9689160 | Loss: 0.399 | 599 ms/step , 115218.74 GFLOP/s , 173427.3 tokens/s INFO:__main__:2024-11-30 09:51:30 | Epoch: 0 | Step: 347380 | Dataset: 0-9691560 | Loss: 0.376 | 595 ms/step , 115970.17 GFLOP/s , 173576.5 tokens/s INFO:__main__:2024-11-30 09:51:37 | Epoch: 0 | Step: 347390 | Dataset: 0-9693960 | Loss: 0.385 | 595 ms/step , 115992.02 GFLOP/s , 173627.1 tokens/s INFO:__main__:2024-11-30 09:51:44 | Epoch: 0 | Step: 347400 | Dataset: 0-9696360 | Loss: 0.374 | 596 ms/step , 115867.62 GFLOP/s , 173526.2 tokens/s INFO:__main__:2024-11-30 09:51:51 | Epoch: 0 | Step: 347410 | Dataset: 0-9698760 | Loss: 0.365 | 595 ms/step , 115907.72 GFLOP/s , 173642.1 tokens/s INFO:__main__:2024-11-30 09:51:58 | Epoch: 0 | Step: 347420 | Dataset: 0-9701160 | Loss: 0.502 | 595 ms/step , 115933.64 GFLOP/s , 173549.7 tokens/s INFO:__main__:2024-11-30 09:52:05 | Epoch: 0 | Step: 347430 | Dataset: 0-9703560 | Loss: 0.553 | 597 ms/step , 115684.03 GFLOP/s , 173510.2 tokens/s INFO:__main__:2024-11-30 09:52:12 | Epoch: 0 | Step: 347440 | Dataset: 0-9705960 | Loss: 0.585 | 595 ms/step , 115903.34 GFLOP/s , 173590.5 tokens/s INFO:__main__:2024-11-30 09:52:19 | Epoch: 0 | Step: 347450 | Dataset: 0-9708360 | Loss: 0.528 | 595 ms/step , 116009.89 GFLOP/s , 173510.4 tokens/s INFO:__main__:2024-11-30 09:52:27 | Epoch: 0 | Step: 347460 | Dataset: 0-9710760 | Loss: 0.551 | 596 ms/step , 115765.65 GFLOP/s , 173520.9 tokens/s INFO:__main__:2024-11-30 09:52:34 | Epoch: 0 | Step: 347470 | Dataset: 0-9713160 | Loss: 0.604 | 595 ms/step , 115912.53 GFLOP/s , 173603.9 tokens/s INFO:__main__:2024-11-30 09:52:41 
| Epoch: 0 | Step: 347480 | Dataset: 0-9715560 | Loss: 0.570 | 595 ms/step , 115965.93 GFLOP/s , 173437.8 tokens/s INFO:__main__:2024-11-30 09:52:48 | Epoch: 0 | Step: 347490 | Dataset: 0-9717960 | Loss: 0.580 | 595 ms/step , 116040.01 GFLOP/s , 173590.9 tokens/s INFO:__main__:2024-11-30 09:52:55 | Validation | Step: 347500 | Val_loss: 0.369 | Best_val_loss: 0.3489 INFO:__main__:2024-11-30 09:52:56 | Epoch: 0 | Step: 347500 | Dataset: 0-9720360 | Loss: 0.554 | 596 ms/step , 115712.28 GFLOP/s , 147733.6 tokens/s INFO:__main__:2024-11-30 09:53:03 | Epoch: 0 | Step: 347510 | Dataset: 0-9722760 | Loss: 0.586 | 596 ms/step , 115824.04 GFLOP/s , 173738.8 tokens/s INFO:__main__:2024-11-30 09:53:10 | Epoch: 0 | Step: 347520 | Dataset: 0-9725160 | Loss: 0.571 | 595 ms/step , 115907.80 GFLOP/s , 173634.3 tokens/s INFO:__main__:2024-11-30 09:53:17 | Epoch: 0 | Step: 347530 | Dataset: 0-9727560 | Loss: 0.571 | 596 ms/step , 115773.63 GFLOP/s , 173668.7 tokens/s INFO:__main__:2024-11-30 09:53:24 | Epoch: 0 | Step: 347540 | Dataset: 0-9729960 | Loss: 0.591 | 595 ms/step , 116007.59 GFLOP/s , 173650.4 tokens/s INFO:__main__:2024-11-30 09:53:31 | Epoch: 0 | Step: 347550 | Dataset: 0-9732360 | Loss: 0.585 | 596 ms/step , 115828.22 GFLOP/s , 173630.2 tokens/s INFO:__main__:2024-11-30 09:53:39 | Epoch: 0 | Step: 347560 | Dataset: 0-9734760 | Loss: 0.619 | 596 ms/step , 115839.21 GFLOP/s , 173678.0 tokens/s INFO:__main__:2024-11-30 09:53:46 | Epoch: 0 | Step: 347570 | Dataset: 0-9737160 | Loss: 0.541 | 597 ms/step , 115592.20 GFLOP/s , 173571.8 tokens/s INFO:__main__:2024-11-30 09:53:53 | Epoch: 0 | Step: 347580 | Dataset: 0-9739560 | Loss: 0.565 | 596 ms/step , 115782.99 GFLOP/s , 173522.8 tokens/s INFO:__main__:2024-11-30 09:54:00 | Epoch: 0 | Step: 347590 | Dataset: 0-9741960 | Loss: 0.576 | 596 ms/step , 115833.20 GFLOP/s , 173574.6 tokens/s INFO:__main__:2024-11-30 09:54:07 | Epoch: 0 | Step: 347600 | Dataset: 0-9744360 | Loss: 0.549 | 597 ms/step , 115651.09 GFLOP/s , 173497.3 
tokens/s INFO:__main__:2024-11-30 09:54:14 | Epoch: 0 | Step: 347610 | Dataset: 0-9746760 | Loss: 0.509 | 596 ms/step , 115832.93 GFLOP/s , 173525.0 tokens/s INFO:__main__:2024-11-30 09:54:21 | Epoch: 0 | Step: 347620 | Dataset: 0-9749160 | Loss: 0.600 | 596 ms/step , 115835.47 GFLOP/s , 173538.4 tokens/s INFO:__main__:2024-11-30 09:54:28 | Epoch: 0 | Step: 347630 | Dataset: 0-9751560 | Loss: 0.529 | 596 ms/step , 115699.52 GFLOP/s , 173531.9 tokens/s INFO:__main__:2024-11-30 09:54:35 | Epoch: 0 | Step: 347640 | Dataset: 0-9753960 | Loss: 0.560 | 595 ms/step , 116027.66 GFLOP/s , 173516.4 tokens/s INFO:__main__:2024-11-30 09:54:42 | Epoch: 0 | Step: 347650 | Dataset: 0-9756360 | Loss: 0.543 | 596 ms/step , 115871.28 GFLOP/s , 173377.5 tokens/s INFO:__main__:2024-11-30 09:54:49 | Epoch: 0 | Step: 347660 | Dataset: 0-9758760 | Loss: 0.595 | 595 ms/step , 115945.66 GFLOP/s , 173530.4 tokens/s INFO:__main__:2024-11-30 09:54:56 | Epoch: 0 | Step: 347670 | Dataset: 0-9761160 | Loss: 0.523 | 597 ms/step , 115639.97 GFLOP/s , 173464.0 tokens/s INFO:__main__:2024-11-30 09:55:04 | Epoch: 0 | Step: 347680 | Dataset: 0-9763560 | Loss: 0.579 | 597 ms/step , 115675.98 GFLOP/s , 173467.5 tokens/s INFO:__main__:2024-11-30 09:55:11 | Epoch: 0 | Step: 347690 | Dataset: 0-9765960 | Loss: 0.610 | 596 ms/step , 115885.38 GFLOP/s , 173477.9 tokens/s INFO:__main__:2024-11-30 09:55:18 | Epoch: 0 | Step: 347700 | Dataset: 0-9768360 | Loss: 0.557 | 596 ms/step , 115830.39 GFLOP/s , 173453.8 tokens/s INFO:__main__:2024-11-30 09:55:25 | Epoch: 0 | Step: 347710 | Dataset: 0-9770760 | Loss: 0.544 | 597 ms/step , 115537.09 GFLOP/s , 173411.9 tokens/s INFO:__main__:2024-11-30 09:55:32 | Epoch: 0 | Step: 347720 | Dataset: 0-9773160 | Loss: 0.561 | 596 ms/step , 115714.76 GFLOP/s , 173416.0 tokens/s INFO:__main__:2024-11-30 09:55:39 | Epoch: 0 | Step: 347730 | Dataset: 0-9775560 | Loss: 0.526 | 596 ms/step , 115819.09 GFLOP/s , 173454.8 tokens/s INFO:__main__:2024-11-30 09:55:46 | Epoch: 0 | Step: 
347740 | Dataset: 0-9777960 | Loss: 0.594 | 597 ms/step , 115613.94 GFLOP/s , 173437.4 tokens/s INFO:__main__:2024-11-30 09:55:53 | Epoch: 0 | Step: 347750 | Dataset: 0-9780360 | Loss: 0.511 | 596 ms/step , 115801.18 GFLOP/s , 173500.3 tokens/s INFO:__main__:2024-11-30 09:56:00 | Epoch: 0 | Step: 347760 | Dataset: 0-9782760 | Loss: 0.558 | 597 ms/step , 115678.87 GFLOP/s , 173323.8 tokens/s INFO:__main__:2024-11-30 09:56:07 | Epoch: 0 | Step: 347770 | Dataset: 0-9785160 | Loss: 0.586 | 596 ms/step , 115777.58 GFLOP/s , 173450.2 tokens/s INFO:__main__:2024-11-30 09:56:14 | Epoch: 0 | Step: 347780 | Dataset: 0-9787560 | Loss: 0.592 | 597 ms/step , 115613.26 GFLOP/s , 173278.3 tokens/s INFO:__main__:2024-11-30 09:56:21 | Epoch: 0 | Step: 347790 | Dataset: 0-9789960 | Loss: 0.555 | 597 ms/step , 115556.01 GFLOP/s , 173367.6 tokens/s INFO:__main__:2024-11-30 09:56:29 | Epoch: 0 | Step: 347800 | Dataset: 0-9792360 | Loss: 0.544 | 596 ms/step , 115834.55 GFLOP/s , 173387.2 tokens/s INFO:__main__:2024-11-30 09:56:36 | Epoch: 0 | Step: 347810 | Dataset: 0-9794760 | Loss: 0.582 | 597 ms/step , 115670.45 GFLOP/s , 173326.4 tokens/s INFO:__main__:2024-11-30 09:56:43 | Epoch: 0 | Step: 347820 | Dataset: 0-9797160 | Loss: 0.553 | 596 ms/step , 115703.96 GFLOP/s , 173387.5 tokens/s INFO:__main__:2024-11-30 09:56:50 | Epoch: 0 | Step: 347830 | Dataset: 0-9799560 | Loss: 0.512 | 597 ms/step , 115577.12 GFLOP/s , 173316.6 tokens/s INFO:__main__:2024-11-30 09:56:57 | Epoch: 0 | Step: 347840 | Dataset: 0-9801960 | Loss: 0.511 | 596 ms/step , 115728.03 GFLOP/s , 173336.4 tokens/s INFO:__main__:2024-11-30 09:57:04 | Epoch: 0 | Step: 347850 | Dataset: 0-9804360 | Loss: 0.693 | 597 ms/step , 115671.39 GFLOP/s , 173423.4 tokens/s INFO:__main__:2024-11-30 09:57:11 | Epoch: 0 | Step: 347860 | Dataset: 0-9806760 | Loss: 0.563 | 596 ms/step , 115696.19 GFLOP/s , 173327.0 tokens/s INFO:__main__:2024-11-30 09:57:18 | Epoch: 0 | Step: 347870 | Dataset: 0-9809160 | Loss: 0.629 | 595 ms/step , 
115929.76 GFLOP/s , 173364.2 tokens/s INFO:__main__:2024-11-30 09:57:25 | Epoch: 0 | Step: 347880 | Dataset: 0-9811560 | Loss: 0.559 | 597 ms/step , 115575.19 GFLOP/s , 173354.4 tokens/s INFO:__main__:2024-11-30 09:57:32 | Epoch: 0 | Step: 347890 | Dataset: 0-9813960 | Loss: 0.561 | 596 ms/step , 115700.55 GFLOP/s , 173536.3 tokens/s INFO:__main__:2024-11-30 09:57:39 | Epoch: 0 | Step: 347900 | Dataset: 0-9816360 | Loss: 0.538 | 597 ms/step , 115518.96 GFLOP/s , 173823.1 tokens/s INFO:__main__:2024-11-30 09:57:46 | Epoch: 0 | Step: 347910 | Dataset: 0-9818760 | Loss: 0.548 | 597 ms/step , 115565.92 GFLOP/s , 173837.0 tokens/s INFO:__main__:2024-11-30 09:57:54 | Epoch: 0 | Step: 347920 | Dataset: 0-9821160 | Loss: 0.512 | 596 ms/step , 115712.40 GFLOP/s , 173780.7 tokens/s INFO:__main__:2024-11-30 09:58:01 | Epoch: 0 | Step: 347930 | Dataset: 0-9823560 | Loss: 0.529 | 597 ms/step , 115637.27 GFLOP/s , 173802.0 tokens/s INFO:__main__:2024-11-30 09:58:08 | Epoch: 0 | Step: 347940 | Dataset: 0-9825960 | Loss: 0.582 | 597 ms/step , 115580.03 GFLOP/s , 173853.2 tokens/s INFO:__main__:2024-11-30 09:58:15 | Epoch: 0 | Step: 347950 | Dataset: 0-9828360 | Loss: 0.542 | 597 ms/step , 115690.61 GFLOP/s , 173852.5 tokens/s INFO:__main__:2024-11-30 09:58:22 | Epoch: 0 | Step: 347960 | Dataset: 0-9830760 | Loss: 0.609 | 597 ms/step , 115511.13 GFLOP/s , 173745.2 tokens/s INFO:__main__:2024-11-30 09:58:29 | Epoch: 0 | Step: 347970 | Dataset: 0-9833160 | Loss: 0.464 | 596 ms/step , 115762.96 GFLOP/s , 173884.6 tokens/s INFO:__main__:2024-11-30 09:58:36 | Epoch: 0 | Step: 347980 | Dataset: 0-9835560 | Loss: 0.386 | 596 ms/step , 115862.49 GFLOP/s , 173855.8 tokens/s INFO:__main__:2024-11-30 09:58:43 | Epoch: 0 | Step: 347990 | Dataset: 0-9837960 | Loss: 0.420 | 596 ms/step , 115729.21 GFLOP/s , 173882.4 tokens/s INFO:__main__:2024-11-30 09:58:51 | Validation | Step: 348000 | Val_loss: 0.390 | Best_val_loss: 0.3489 INFO:__main__:2024-11-30 09:58:51 | Saving full-param checkpoint to 
`/root/autodl-tmp/checkpoint/checkpoint_20241130_095851_step_348000.pt` INFO:__main__:2024-11-30 09:58:53 | Epoch: 0 | Step: 348000 | Dataset: 0-9840360 | Loss: 0.351 | 595 ms/step , 116001.09 GFLOP/s , 121049.5 tokens/s INFO:__main__:2024-11-30 09:59:00 | Epoch: 0 | Step: 348010 | Dataset: 0-9842760 | Loss: 0.386 | 597 ms/step , 115546.23 GFLOP/s , 173587.7 tokens/s INFO:__main__:2024-11-30 09:59:07 | Epoch: 0 | Step: 348020 | Dataset: 0-9845160 | Loss: 0.420 | 595 ms/step , 115894.37 GFLOP/s , 173637.6 tokens/s INFO:__main__:2024-11-30 09:59:14 | Epoch: 0 | Step: 348030 | Dataset: 0-9847560 | Loss: 0.385 | 598 ms/step , 115312.74 GFLOP/s , 173531.7 tokens/s INFO:__main__:2024-11-30 09:59:22 | Epoch: 0 | Step: 348040 | Dataset: 0-9849960 | Loss: 0.397 | 597 ms/step , 115587.44 GFLOP/s , 173496.7 tokens/s INFO:__main__:2024-11-30 09:59:29 | Epoch: 0 | Step: 348050 | Dataset: 0-9852360 | Loss: 0.336 | 596 ms/step , 115784.51 GFLOP/s , 173484.7 tokens/s INFO:__main__:2024-11-30 09:59:36 | Epoch: 0 | Step: 348060 | Dataset: 0-9854760 | Loss: 0.370 | 596 ms/step , 115822.92 GFLOP/s , 173459.6 tokens/s INFO:__main__:2024-11-30 09:59:43 | Epoch: 0 | Step: 348070 | Dataset: 0-9857160 | Loss: 0.356 | 596 ms/step , 115872.20 GFLOP/s , 173440.2 tokens/s INFO:__main__:2024-11-30 09:59:50 | Epoch: 0 | Step: 348080 | Dataset: 0-9859560 | Loss: 0.364 | 596 ms/step , 115775.41 GFLOP/s , 173592.1 tokens/s INFO:__main__:2024-11-30 09:59:57 | Epoch: 0 | Step: 348090 | Dataset: 0-9861960 | Loss: 0.379 | 597 ms/step , 115584.27 GFLOP/s , 173908.0 tokens/s INFO:__main__:2024-11-30 10:00:04 | Epoch: 0 | Step: 348100 | Dataset: 0-9864360 | Loss: 0.414 | 596 ms/step , 115772.96 GFLOP/s , 173936.4 tokens/s INFO:__main__:2024-11-30 10:00:11 | Epoch: 0 | Step: 348110 | Dataset: 0-9866760 | Loss: 0.357 | 597 ms/step , 115645.88 GFLOP/s , 173842.0 tokens/s INFO:__main__:2024-11-30 10:00:18 | Epoch: 0 | Step: 348120 | Dataset: 0-9869160 | Loss: 0.340 | 597 ms/step , 115531.73 GFLOP/s , 173834.0 
tokens/s INFO:__main__:2024-11-30 10:00:25 | Epoch: 0 | Step: 348130 | Dataset: 0-9871560 | Loss: 0.377 | 596 ms/step , 115803.57 GFLOP/s , 173743.6 tokens/s INFO:__main__:2024-11-30 10:00:32 | Epoch: 0 | Step: 348140 | Dataset: 0-9873960 | Loss: 0.428 | 596 ms/step , 115716.56 GFLOP/s , 173821.7 tokens/s INFO:__main__:2024-11-30 10:00:39 | Epoch: 0 | Step: 348150 | Dataset: 0-9876360 | Loss: 0.415 | 597 ms/step , 115645.96 GFLOP/s , 173790.3 tokens/s INFO:__main__:2024-11-30 10:00:46 | Epoch: 0 | Step: 348160 | Dataset: 0-9878760 | Loss: 0.450 | 596 ms/step , 115790.32 GFLOP/s , 173867.0 tokens/s INFO:__main__:2024-11-30 10:00:53 | Epoch: 0 | Step: 348170 | Dataset: 0-9881160 | Loss: 0.376 | 597 ms/step , 115674.24 GFLOP/s , 173877.0 tokens/s INFO:__main__:2024-11-30 10:01:01 | Epoch: 0 | Step: 348180 | Dataset: 0-9883560 | Loss: 0.327 | 596 ms/step , 115727.97 GFLOP/s , 173812.5 tokens/s INFO:__main__:2024-11-30 10:01:08 | Epoch: 0 | Step: 348190 | Dataset: 0-9885960 | Loss: 0.359 | 597 ms/step , 115599.26 GFLOP/s , 173783.7 tokens/s INFO:__main__:2024-11-30 10:01:15 | Epoch: 0 | Step: 348200 | Dataset: 0-9888360 | Loss: 0.377 | 596 ms/step , 115766.28 GFLOP/s , 173835.2 tokens/s INFO:__main__:2024-11-30 10:01:22 | Epoch: 0 | Step: 348210 | Dataset: 0-9890760 | Loss: 0.384 | 596 ms/step , 115816.44 GFLOP/s , 173848.6 tokens/s INFO:__main__:2024-11-30 10:01:29 | Epoch: 0 | Step: 348220 | Dataset: 0-9893160 | Loss: 0.379 | 596 ms/step , 115796.68 GFLOP/s , 173820.6 tokens/s INFO:__main__:2024-11-30 10:01:36 | Epoch: 0 | Step: 348230 | Dataset: 0-9895560 | Loss: 0.389 | 596 ms/step , 115730.75 GFLOP/s , 173878.5 tokens/s INFO:__main__:2024-11-30 10:01:43 | Epoch: 0 | Step: 348240 | Dataset: 0-9897960 | Loss: 0.367 | 597 ms/step , 115681.54 GFLOP/s , 173866.3 tokens/s INFO:__main__:2024-11-30 10:01:50 | Epoch: 0 | Step: 348250 | Dataset: 0-9900360 | Loss: 0.391 | 596 ms/step , 115750.78 GFLOP/s , 173865.8 tokens/s INFO:__main__:2024-11-30 10:01:57 | Epoch: 0 | Step: 
348260 | Dataset: 0-9902760 | Loss: 0.408 | 597 ms/step , 115688.68 GFLOP/s , 173913.3 tokens/s INFO:__main__:2024-11-30 10:02:04 | Epoch: 0 | Step: 348270 | Dataset: 0-9905160 | Loss: 0.429 | 596 ms/step , 115766.96 GFLOP/s , 173949.4 tokens/s INFO:__main__:2024-11-30 10:02:11 | Epoch: 0 | Step: 348280 | Dataset: 0-9907560 | Loss: 0.393 | 596 ms/step , 115807.81 GFLOP/s , 173805.2 tokens/s INFO:__main__:2024-11-30 10:02:18 | Epoch: 0 | Step: 348290 | Dataset: 0-9909960 | Loss: 0.386 | 596 ms/step , 115789.57 GFLOP/s , 173931.6 tokens/s INFO:__main__:2024-11-30 10:02:25 | Epoch: 0 | Step: 348300 | Dataset: 0-9912360 | Loss: 0.417 | 597 ms/step , 115683.71 GFLOP/s , 173920.5 tokens/s INFO:__main__:2024-11-30 10:02:32 | Epoch: 0 | Step: 348310 | Dataset: 0-9914760 | Loss: 0.381 | 596 ms/step , 115818.60 GFLOP/s , 173927.5 tokens/s INFO:__main__:2024-11-30 10:02:39 | Epoch: 0 | Step: 348320 | Dataset: 0-9917160 | Loss: 0.354 | 596 ms/step , 115761.61 GFLOP/s , 173941.4 tokens/s INFO:__main__:2024-11-30 10:02:47 | Epoch: 0 | Step: 348330 | Dataset: 0-9919560 | Loss: 0.400 | 596 ms/step , 115866.39 GFLOP/s , 173920.6 tokens/s INFO:__main__:2024-11-30 10:02:54 | Epoch: 0 | Step: 348340 | Dataset: 0-9921960 | Loss: 0.380 | 597 ms/step , 115595.09 GFLOP/s , 173731.9 tokens/s INFO:__main__:2024-11-30 10:03:01 | Epoch: 0 | Step: 348350 | Dataset: 0-9924360 | Loss: 0.376 | 596 ms/step , 115700.20 GFLOP/s , 174006.3 tokens/s INFO:__main__:2024-11-30 10:03:08 | Epoch: 0 | Step: 348360 | Dataset: 0-9926760 | Loss: 0.380 | 596 ms/step , 115813.49 GFLOP/s , 173885.2 tokens/s INFO:__main__:2024-11-30 10:03:15 | Epoch: 0 | Step: 348370 | Dataset: 0-9929160 | Loss: 0.368 | 596 ms/step , 115805.40 GFLOP/s , 173963.9 tokens/s INFO:__main__:2024-11-30 10:03:22 | Epoch: 0 | Step: 348380 | Dataset: 0-9931560 | Loss: 0.374 | 597 ms/step , 115681.90 GFLOP/s , 174004.3 tokens/s INFO:__main__:2024-11-30 10:03:29 | Epoch: 0 | Step: 348390 | Dataset: 0-9933960 | Loss: 0.367 | 596 ms/step , 
115793.73 GFLOP/s , 174022.6 tokens/s INFO:__main__:2024-11-30 10:03:36 | Epoch: 0 | Step: 348400 | Dataset: 0-9936360 | Loss: 0.402 | 595 ms/step , 115894.45 GFLOP/s , 173980.6 tokens/s INFO:__main__:2024-11-30 10:03:43 | Epoch: 0 | Step: 348410 | Dataset: 0-9938760 | Loss: 0.392 | 596 ms/step , 115788.45 GFLOP/s , 173976.3 tokens/s INFO:__main__:2024-11-30 10:03:50 | Epoch: 0 | Step: 348420 | Dataset: 0-9941160 | Loss: 0.370 | 596 ms/step , 115740.98 GFLOP/s , 173900.7 tokens/s INFO:__main__:2024-11-30 10:03:57 | Epoch: 0 | Step: 348430 | Dataset: 0-9943560 | Loss: 0.400 | 596 ms/step , 115837.85 GFLOP/s , 173959.8 tokens/s INFO:__main__:2024-11-30 10:04:04 | Epoch: 0 | Step: 348440 | Dataset: 0-9945960 | Loss: 0.371 | 596 ms/step , 115764.61 GFLOP/s , 174011.3 tokens/s INFO:__main__:2024-11-30 10:04:11 | Epoch: 0 | Step: 348450 | Dataset: 0-9948360 | Loss: 0.366 | 596 ms/step , 115729.11 GFLOP/s , 173997.0 tokens/s INFO:__main__:2024-11-30 10:04:18 | Epoch: 0 | Step: 348460 | Dataset: 0-9950760 | Loss: 0.356 | 596 ms/step , 115830.64 GFLOP/s , 173997.0 tokens/s INFO:__main__:2024-11-30 10:04:25 | Epoch: 0 | Step: 348470 | Dataset: 0-9953160 | Loss: 0.340 | 596 ms/step , 115850.69 GFLOP/s , 173993.9 tokens/s INFO:__main__:2024-11-30 10:04:32 | Epoch: 0 | Step: 348480 | Dataset: 0-9955560 | Loss: 0.340 | 598 ms/step , 115451.90 GFLOP/s , 173913.0 tokens/s INFO:__main__:2024-11-30 10:04:40 | Epoch: 0 | Step: 348490 | Dataset: 0-9957960 | Loss: 0.531 | 596 ms/step , 115779.05 GFLOP/s , 173852.3 tokens/s INFO:__main__:2024-11-30 10:04:47 | Validation | Step: 348500 | Val_loss: 0.388 | Best_val_loss: 0.3489 INFO:__main__:2024-11-30 10:04:48 | Epoch: 0 | Step: 348500 | Dataset: 0-9960360 | Loss: 0.609 | 596 ms/step , 115826.58 GFLOP/s , 148046.7 tokens/s INFO:__main__:2024-11-30 10:04:55 | Epoch: 0 | Step: 348510 | Dataset: 0-9962760 | Loss: 1.139 | 598 ms/step , 115471.47 GFLOP/s , 173493.3 tokens/s INFO:__main__:2024-11-30 10:05:02 | Epoch: 0 | Step: 348520 | 
Dataset: 0-9965160 | Loss: 1.086 | 597 ms/step , 115570.24 GFLOP/s , 173390.9 tokens/s INFO:__main__:2024-11-30 10:05:09 | Epoch: 0 | Step: 348530 | Dataset: 0-9967560 | Loss: 1.111 | 597 ms/step , 115611.23 GFLOP/s , 173397.8 tokens/s INFO:__main__:2024-11-30 10:05:16 | Epoch: 0 | Step: 348540 | Dataset: 0-9969960 | Loss: 1.158 | 597 ms/step , 115678.76 GFLOP/s , 173371.9 tokens/s INFO:__main__:2024-11-30 10:05:23 | Epoch: 0 | Step: 348550 | Dataset: 0-9972360 | Loss: 1.108 | 596 ms/step , 115802.71 GFLOP/s , 173402.9 tokens/s INFO:__main__:2024-11-30 10:05:30 | Epoch: 0 | Step: 348560 | Dataset: 0-9974760 | Loss: 1.103 | 597 ms/step , 115622.18 GFLOP/s , 173319.4 tokens/s INFO:__main__:2024-11-30 10:05:37 | Epoch: 0 | Step: 348570 | Dataset: 0-9977160 | Loss: 1.112 | 597 ms/step , 115641.52 GFLOP/s , 173319.4 tokens/s INFO:__main__:2024-11-30 10:05:45 | Epoch: 0 | Step: 348580 | Dataset: 0-9979560 | Loss: 1.073 | 597 ms/step , 115622.60 GFLOP/s , 173301.0 tokens/s INFO:__main__:2024-11-30 10:05:52 | Epoch: 0 | Step: 348590 | Dataset: 0-9981960 | Loss: 1.065 | 596 ms/step , 115791.56 GFLOP/s , 173283.1 tokens/s INFO:__main__:2024-11-30 10:05:59 | Epoch: 0 | Step: 348600 | Dataset: 0-9984360 | Loss: 1.061 | 597 ms/step , 115594.81 GFLOP/s , 173301.3 tokens/s INFO:__main__:2024-11-30 10:06:06 | Epoch: 0 | Step: 348610 | Dataset: 0-9986760 | Loss: 1.096 | 597 ms/step , 115693.28 GFLOP/s , 173264.9 tokens/s INFO:__main__:2024-11-30 10:06:13 | Epoch: 0 | Step: 348620 | Dataset: 0-9989160 | Loss: 1.052 | 597 ms/step , 115539.97 GFLOP/s , 173297.1 tokens/s INFO:__main__:2024-11-30 10:06:20 | Epoch: 0 | Step: 348630 | Dataset: 0-9991560 | Loss: 1.122 | 597 ms/step , 115519.60 GFLOP/s , 173335.6 tokens/s INFO:__main__:2024-11-30 10:06:27 | Epoch: 0 | Step: 348640 | Dataset: 0-9993960 | Loss: 1.065 | 596 ms/step , 115696.85 GFLOP/s , 173346.7 tokens/s INFO:__main__:2024-11-30 10:06:34 | Epoch: 0 | Step: 348650 | Dataset: 0-9996360 | Loss: 1.076 | 597 ms/step , 115556.52 
GFLOP/s , 173293.4 tokens/s INFO:__main__:2024-11-30 10:06:41 | Epoch: 0 | Step: 348660 | Dataset: 0-9998760 | Loss: 1.076 | 597 ms/step , 115635.01 GFLOP/s , 173243.0 tokens/s INFO:__main__:2024-11-30 10:06:48 | Epoch: 0 | Step: 348670 | Dataset: 0-10001160 | Loss: 1.068 | 596 ms/step , 115750.34 GFLOP/s , 173353.3 tokens/s INFO:__main__:2024-11-30 10:06:55 | Epoch: 0 | Step: 348680 | Dataset: 0-10003560 | Loss: 1.087 | 597 ms/step , 115525.00 GFLOP/s , 173274.9 tokens/s INFO:__main__:2024-11-30 10:07:03 | Epoch: 0 | Step: 348690 | Dataset: 0-10005960 | Loss: 1.109 | 596 ms/step , 115746.10 GFLOP/s , 173322.8 tokens/s INFO:__main__:2024-11-30 10:07:10 | Epoch: 0 | Step: 348700 | Dataset: 0-10008360 | Loss: 1.046 | 597 ms/step , 115604.09 GFLOP/s , 173277.9 tokens/s INFO:__main__:2024-11-30 10:07:17 | Epoch: 0 | Step: 348710 | Dataset: 0-10010760 | Loss: 1.079 | 597 ms/step , 115601.60 GFLOP/s , 173348.8 tokens/s INFO:__main__:2024-11-30 10:07:24 | Epoch: 0 | Step: 348720 | Dataset: 0-10013160 | Loss: 1.128 | 597 ms/step , 115586.64 GFLOP/s , 173278.1 tokens/s INFO:__main__:2024-11-30 10:07:31 | Epoch: 0 | Step: 348730 | Dataset: 0-10015560 | Loss: 1.088 | 596 ms/step , 115714.09 GFLOP/s , 173354.3 tokens/s INFO:__main__:2024-11-30 10:07:38 | Epoch: 0 | Step: 348740 | Dataset: 0-10017960 | Loss: 1.051 | 597 ms/step , 115616.93 GFLOP/s , 173328.5 tokens/s INFO:__main__:2024-11-30 10:07:45 | Epoch: 0 | Step: 348750 | Dataset: 0-10020360 | Loss: 1.092 | 597 ms/step , 115645.84 GFLOP/s , 173291.9 tokens/s INFO:__main__:2024-11-30 10:07:52 | Epoch: 0 | Step: 348760 | Dataset: 0-10022760 | Loss: 1.058 | 597 ms/step , 115509.56 GFLOP/s , 173330.8 tokens/s INFO:__main__:2024-11-30 10:07:59 | Epoch: 0 | Step: 348770 | Dataset: 0-10025160 | Loss: 1.102 | 596 ms/step , 115824.39 GFLOP/s , 173251.0 tokens/s INFO:__main__:2024-11-30 10:08:06 | Epoch: 0 | Step: 348780 | Dataset: 0-10027560 | Loss: 1.043 | 597 ms/step , 115675.44 GFLOP/s , 173364.9 tokens/s 
INFO:__main__:2024-11-30 10:08:13 | Epoch: 0 | Step: 348790 | Dataset: 0-10029960 | Loss: 1.047 | 596 ms/step , 115717.98 GFLOP/s , 173321.6 tokens/s INFO:__main__:2024-11-30 10:08:21 | Epoch: 0 | Step: 348800 | Dataset: 0-10032360 | Loss: 1.040 | 597 ms/step , 115592.66 GFLOP/s , 173330.2 tokens/s INFO:__main__:2024-11-30 10:08:28 | Epoch: 0 | Step: 348810 | Dataset: 0-10034760 | Loss: 1.073 | 597 ms/step , 115658.60 GFLOP/s , 173385.2 tokens/s INFO:__main__:2024-11-30 10:08:35 | Epoch: 0 | Step: 348820 | Dataset: 0-10037160 | Loss: 1.043 | 597 ms/step , 115624.76 GFLOP/s , 173369.2 tokens/s INFO:__main__:2024-11-30 10:08:42 | Epoch: 0 | Step: 348830 | Dataset: 0-10039560 | Loss: 1.089 | 597 ms/step , 115570.18 GFLOP/s , 173310.7 tokens/s INFO:__main__:2024-11-30 10:08:49 | Epoch: 0 | Step: 348840 | Dataset: 0-10041960 | Loss: 1.067 | 597 ms/step , 115539.57 GFLOP/s , 173288.4 tokens/s INFO:__main__:2024-11-30 10:08:56 | Epoch: 0 | Step: 348850 | Dataset: 0-10044360 | Loss: 1.058 | 596 ms/step , 115791.77 GFLOP/s , 173244.2 tokens/s INFO:__main__:2024-11-30 10:09:03 | Epoch: 0 | Step: 348860 | Dataset: 0-10046760 | Loss: 1.041 | 597 ms/step , 115559.62 GFLOP/s , 173297.0 tokens/s INFO:__main__:2024-11-30 10:09:10 | Epoch: 0 | Step: 348870 | Dataset: 0-10049160 | Loss: 1.067 | 597 ms/step , 115573.45 GFLOP/s , 173235.0 tokens/s INFO:__main__:2024-11-30 10:09:17 | Epoch: 0 | Step: 348880 | Dataset: 0-10051560 | Loss: 1.072 | 596 ms/step , 115800.28 GFLOP/s , 173229.5 tokens/s INFO:__main__:2024-11-30 10:09:24 | Epoch: 0 | Step: 348890 | Dataset: 0-10053960 | Loss: 1.052 | 597 ms/step , 115638.28 GFLOP/s , 173360.1 tokens/s INFO:__main__:2024-11-30 10:09:31 | Epoch: 0 | Step: 348900 | Dataset: 0-10056360 | Loss: 1.048 | 597 ms/step , 115613.74 GFLOP/s , 173389.2 tokens/s INFO:__main__:2024-11-30 10:09:39 | Epoch: 0 | Step: 348910 | Dataset: 0-10058760 | Loss: 1.068 | 597 ms/step , 115646.79 GFLOP/s , 173334.6 tokens/s INFO:__main__:2024-11-30 10:09:46 | Epoch: 0 | 
Step: 348920 | Dataset: 0-10061160 | Loss: 1.038 | 597 ms/step , 115590.89 GFLOP/s , 173332.1 tokens/s INFO:__main__:2024-11-30 10:09:53 | Epoch: 0 | Step: 348930 | Dataset: 0-10063560 | Loss: 1.037 | 596 ms/step , 115724.94 GFLOP/s , 173291.5 tokens/s INFO:__main__:2024-11-30 10:10:00 | Epoch: 0 | Step: 348940 | Dataset: 0-10065960 | Loss: 1.092 | 596 ms/step , 115732.87 GFLOP/s , 173352.0 tokens/s INFO:__main__:2024-11-30 10:10:07 | Epoch: 0 | Step: 348950 | Dataset: 0-10068360 | Loss: 1.042 | 597 ms/step , 115582.38 GFLOP/s , 173273.3 tokens/s INFO:__main__:2024-11-30 10:10:14 | Epoch: 0 | Step: 348960 | Dataset: 0-10070760 | Loss: 1.055 | 596 ms/step , 115723.27 GFLOP/s , 173411.3 tokens/s INFO:__main__:2024-11-30 10:10:21 | Epoch: 0 | Step: 348970 | Dataset: 0-10073160 | Loss: 1.070 | 596 ms/step , 115762.90 GFLOP/s , 173348.0 tokens/s INFO:__main__:2024-11-30 10:10:28 | Epoch: 0 | Step: 348980 | Dataset: 0-10075560 | Loss: 1.047 | 598 ms/step , 115397.97 GFLOP/s , 173344.7 tokens/s INFO:__main__:2024-11-30 10:10:35 | Epoch: 0 | Step: 348990 | Dataset: 0-10077960 | Loss: 1.045 | 596 ms/step , 115731.73 GFLOP/s , 173433.8 tokens/s INFO:__main__:2024-11-30 10:10:43 | Validation | Step: 349000 | Val_loss: 0.392 | Best_val_loss: 0.3489 INFO:__main__:2024-11-30 10:10:43 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_101043_step_349000.pt` INFO:__main__:2024-11-30 10:10:45 | Epoch: 0 | Step: 349000 | Dataset: 0-10080360 | Loss: 1.047 | 595 ms/step , 115946.12 GFLOP/s , 120982.2 tokens/s INFO:__main__:2024-11-30 10:10:52 | Epoch: 0 | Step: 349010 | Dataset: 0-10082760 | Loss: 1.037 | 599 ms/step , 115190.79 GFLOP/s , 173431.8 tokens/s INFO:__main__:2024-11-30 10:11:00 | Epoch: 0 | Step: 349020 | Dataset: 0-10085160 | Loss: 1.050 | 598 ms/step , 115372.96 GFLOP/s , 173466.6 tokens/s INFO:__main__:2024-11-30 10:11:07 | Epoch: 0 | Step: 349030 | Dataset: 0-10087560 | Loss: 1.062 | 598 ms/step , 115372.94 GFLOP/s , 173467.4 tokens/s 
INFO:__main__:2024-11-30 10:11:14 | Epoch: 0 | Step: 349040 | Dataset: 0-10089960 | Loss: 1.063 | 599 ms/step , 115288.57 GFLOP/s , 173415.3 tokens/s INFO:__main__:2024-11-30 10:11:21 | Epoch: 0 | Step: 349050 | Dataset: 0-10092360 | Loss: 1.060 | 599 ms/step , 115259.98 GFLOP/s , 173431.4 tokens/s INFO:__main__:2024-11-30 10:11:28 | Epoch: 0 | Step: 349060 | Dataset: 0-10094760 | Loss: 0.676 | 597 ms/step , 115553.46 GFLOP/s , 173523.0 tokens/s INFO:__main__:2024-11-30 10:11:35 | Epoch: 0 | Step: 349070 | Dataset: 0-10097160 | Loss: 0.680 | 598 ms/step , 115338.25 GFLOP/s , 173494.7 tokens/s INFO:__main__:2024-11-30 10:11:42 | Epoch: 0 | Step: 349080 | Dataset: 0-10099560 | Loss: 0.751 | 598 ms/step , 115470.78 GFLOP/s , 173469.1 tokens/s INFO:__main__:2024-11-30 10:11:49 | Epoch: 0 | Step: 349090 | Dataset: 0-10101960 | Loss: 0.636 | 598 ms/step , 115498.91 GFLOP/s , 173528.9 tokens/s INFO:__main__:2024-11-30 10:11:56 | Epoch: 0 | Step: 349100 | Dataset: 0-10104360 | Loss: 0.768 | 599 ms/step , 115126.18 GFLOP/s , 173436.6 tokens/s INFO:__main__:2024-11-30 10:12:03 | Epoch: 0 | Step: 349110 | Dataset: 0-10106760 | Loss: 0.652 | 598 ms/step , 115323.68 GFLOP/s , 173469.0 tokens/s INFO:__main__:2024-11-30 10:12:10 | Epoch: 0 | Step: 349120 | Dataset: 0-10109160 | Loss: 0.764 | 598 ms/step , 115472.14 GFLOP/s , 173462.6 tokens/s INFO:__main__:2024-11-30 10:12:17 | Epoch: 0 | Step: 349130 | Dataset: 0-10111560 | Loss: 0.715 | 598 ms/step , 115328.62 GFLOP/s , 173190.1 tokens/s INFO:__main__:2024-11-30 10:12:25 | Epoch: 0 | Step: 349140 | Dataset: 0-10113960 | Loss: 0.646 | 597 ms/step , 115505.50 GFLOP/s , 173458.2 tokens/s INFO:__main__:2024-11-30 10:12:32 | Epoch: 0 | Step: 349150 | Dataset: 0-10116360 | Loss: 0.685 | 597 ms/step , 115554.36 GFLOP/s , 173487.2 tokens/s INFO:__main__:2024-11-30 10:12:39 | Epoch: 0 | Step: 349160 | Dataset: 0-10118760 | Loss: 0.654 | 597 ms/step , 115511.95 GFLOP/s , 173503.2 tokens/s INFO:__main__:2024-11-30 10:12:46 | Epoch: 0 | 
Step: 349170 | Dataset: 0-10121160 | Loss: 0.722 | 599 ms/step , 115300.31 GFLOP/s , 173389.2 tokens/s INFO:__main__:2024-11-30 10:12:53 | Epoch: 0 | Step: 349180 | Dataset: 0-10123560 | Loss: 0.627 | 598 ms/step , 115408.74 GFLOP/s , 173351.0 tokens/s INFO:__main__:2024-11-30 10:13:00 | Epoch: 0 | Step: 349190 | Dataset: 0-10125960 | Loss: 0.623 | 598 ms/step , 115348.02 GFLOP/s , 173438.3 tokens/s INFO:__main__:2024-11-30 10:13:07 | Epoch: 0 | Step: 349200 | Dataset: 0-10128360 | Loss: 0.726 | 598 ms/step , 115391.57 GFLOP/s , 173447.1 tokens/s INFO:__main__:2024-11-30 10:13:14 | Epoch: 0 | Step: 349210 | Dataset: 0-10130760 | Loss: 0.710 | 598 ms/step , 115393.14 GFLOP/s , 173554.4 tokens/s INFO:__main__:2024-11-30 10:13:21 | Epoch: 0 | Step: 349220 | Dataset: 0-10133160 | Loss: 0.736 | 598 ms/step , 115374.19 GFLOP/s , 173582.2 tokens/s INFO:__main__:2024-11-30 10:13:28 | Epoch: 0 | Step: 349230 | Dataset: 0-10135560 | Loss: 0.591 | 598 ms/step , 115448.12 GFLOP/s , 173582.7 tokens/s INFO:__main__:2024-11-30 10:13:35 | Epoch: 0 | Step: 349240 | Dataset: 0-10137960 | Loss: 0.742 | 598 ms/step , 115419.91 GFLOP/s , 173587.6 tokens/s INFO:__main__:2024-11-30 10:13:42 | Epoch: 0 | Step: 349250 | Dataset: 0-10140360 | Loss: 0.568 | 598 ms/step , 115345.62 GFLOP/s , 173541.4 tokens/s INFO:__main__:2024-11-30 10:13:50 | Epoch: 0 | Step: 349260 | Dataset: 0-10142760 | Loss: 0.747 | 598 ms/step , 115437.22 GFLOP/s , 173580.0 tokens/s INFO:__main__:2024-11-30 10:13:57 | Epoch: 0 | Step: 349270 | Dataset: 0-10145160 | Loss: 0.607 | 598 ms/step , 115446.75 GFLOP/s , 173550.9 tokens/s INFO:__main__:2024-11-30 10:14:04 | Epoch: 0 | Step: 349280 | Dataset: 0-10147560 | Loss: 0.644 | 598 ms/step , 115370.50 GFLOP/s , 173516.8 tokens/s INFO:__main__:2024-11-30 10:14:11 | Epoch: 0 | Step: 349290 | Dataset: 0-10149960 | Loss: 0.639 | 598 ms/step , 115390.97 GFLOP/s , 173543.1 tokens/s INFO:__main__:2024-11-30 10:14:18 | Epoch: 0 | Step: 349300 | Dataset: 0-10152360 | Loss: 0.709 
| 598 ms/step , 115422.76 GFLOP/s , 173573.9 tokens/s INFO:__main__:2024-11-30 10:14:25 | Epoch: 0 | Step: 349310 | Dataset: 0-10154760 | Loss: 0.724 | 599 ms/step , 115294.30 GFLOP/s , 173603.4 tokens/s INFO:__main__:2024-11-30 10:14:32 | Epoch: 0 | Step: 349320 | Dataset: 0-10157160 | Loss: 0.646 | 598 ms/step , 115420.04 GFLOP/s , 173554.3 tokens/s INFO:__main__:2024-11-30 10:14:39 | Epoch: 0 | Step: 349330 | Dataset: 0-10159560 | Loss: 0.679 | 598 ms/step , 115335.81 GFLOP/s , 173570.1 tokens/s INFO:__main__:2024-11-30 10:14:46 | Epoch: 0 | Step: 349340 | Dataset: 0-10161960 | Loss: 0.648 | 598 ms/step , 115462.46 GFLOP/s , 173567.5 tokens/s INFO:__main__:2024-11-30 10:14:53 | Epoch: 0 | Step: 349350 | Dataset: 0-10164360 | Loss: 0.641 | 598 ms/step , 115406.36 GFLOP/s , 173549.8 tokens/s INFO:__main__:2024-11-30 10:15:00 | Epoch: 0 | Step: 349360 | Dataset: 0-10166760 | Loss: 0.763 | 599 ms/step , 115289.79 GFLOP/s , 173562.0 tokens/s INFO:__main__:2024-11-30 10:15:07 | Epoch: 0 | Step: 349370 | Dataset: 0-10169160 | Loss: 0.636 | 598 ms/step , 115365.12 GFLOP/s , 173574.5 tokens/s INFO:__main__:2024-11-30 10:15:15 | Epoch: 0 | Step: 349380 | Dataset: 0-10171560 | Loss: 0.695 | 597 ms/step , 115508.83 GFLOP/s , 173535.7 tokens/s INFO:__main__:2024-11-30 10:15:22 | Epoch: 0 | Step: 349390 | Dataset: 0-10173960 | Loss: 0.724 | 598 ms/step , 115338.36 GFLOP/s , 173570.0 tokens/s INFO:__main__:2024-11-30 10:15:29 | Epoch: 0 | Step: 349400 | Dataset: 0-10176360 | Loss: 0.647 | 598 ms/step , 115447.21 GFLOP/s , 173584.7 tokens/s INFO:__main__:2024-11-30 10:15:36 | Epoch: 0 | Step: 349410 | Dataset: 0-10178760 | Loss: 0.705 | 598 ms/step , 115499.83 GFLOP/s , 173347.8 tokens/s INFO:__main__:2024-11-30 10:15:43 | Epoch: 0 | Step: 349420 | Dataset: 0-10181160 | Loss: 0.627 | 598 ms/step , 115468.13 GFLOP/s , 173468.8 tokens/s INFO:__main__:2024-11-30 10:15:50 | Epoch: 0 | Step: 349430 | Dataset: 0-10183560 | Loss: 0.721 | 598 ms/step , 115397.76 GFLOP/s , 173572.0 
tokens/s INFO:__main__:2024-11-30 10:15:57 | Epoch: 0 | Step: 349440 | Dataset: 0-10185960 | Loss: 0.673 | 597 ms/step , 115587.10 GFLOP/s , 173469.6 tokens/s INFO:__main__:2024-11-30 10:16:04 | Epoch: 0 | Step: 349450 | Dataset: 0-10188360 | Loss: 0.538 | 598 ms/step , 115325.99 GFLOP/s , 173482.0 tokens/s INFO:__main__:2024-11-30 10:16:11 | Epoch: 0 | Step: 349460 | Dataset: 0-10190760 | Loss: 0.635 | 602 ms/step , 114639.18 GFLOP/s , 173410.8 tokens/s INFO:__main__:2024-11-30 10:16:18 | Epoch: 0 | Step: 349470 | Dataset: 0-10193160 | Loss: 0.729 | 598 ms/step , 115433.67 GFLOP/s , 173545.6 tokens/s INFO:__main__:2024-11-30 10:16:25 | Epoch: 0 | Step: 349480 | Dataset: 0-10195560 | Loss: 0.743 | 596 ms/step , 115732.66 GFLOP/s , 173552.5 tokens/s INFO:__main__:2024-11-30 10:16:32 | Epoch: 0 | Step: 349490 | Dataset: 0-10197960 | Loss: 0.722 | 596 ms/step , 115796.60 GFLOP/s , 173522.8 tokens/s INFO:__main__:2024-11-30 10:16:40 | Validation | Step: 349500 | Val_loss: 0.389 | Best_val_loss: 0.3489 INFO:__main__:2024-11-30 10:16:41 | Epoch: 0 | Step: 349500 | Dataset: 0-10200360 | Loss: 0.587 | 594 ms/step , 116098.43 GFLOP/s , 147904.7 tokens/s INFO:__main__:2024-11-30 10:16:48 | Epoch: 0 | Step: 349510 | Dataset: 0-10202760 | Loss: 0.707 | 595 ms/step , 115941.32 GFLOP/s , 173695.0 tokens/s INFO:__main__:2024-11-30 10:16:55 | Epoch: 0 | Step: 349520 | Dataset: 0-10205160 | Loss: 0.621 | 596 ms/step , 115795.50 GFLOP/s , 173637.4 tokens/s INFO:__main__:2024-11-30 10:17:02 | Epoch: 0 | Step: 349530 | Dataset: 0-10207560 | Loss: 0.642 | 596 ms/step , 115794.33 GFLOP/s , 173620.0 tokens/s INFO:__main__:2024-11-30 10:17:09 | Epoch: 0 | Step: 349540 | Dataset: 0-10209960 | Loss: 0.648 | 596 ms/step , 115785.78 GFLOP/s , 173574.5 tokens/s INFO:__main__:2024-11-30 10:17:16 | Epoch: 0 | Step: 349550 | Dataset: 0-10212360 | Loss: 0.649 | 596 ms/step , 115795.25 GFLOP/s , 173622.8 tokens/s INFO:__main__:2024-11-30 10:17:23 | Epoch: 0 | Step: 349560 | Dataset: 0-10214760 | 
Loss: 0.739 | 596 ms/step , 115877.35 GFLOP/s , 173472.4 tokens/s INFO:__main__:2024-11-30 10:17:30 | Epoch: 0 | Step: 349570 | Dataset: 0-10217160 | Loss: 0.606 | 596 ms/step , 115825.74 GFLOP/s , 173576.0 tokens/s INFO:__main__:2024-11-30 10:17:37 | Epoch: 0 | Step: 349580 | Dataset: 0-10219560 | Loss: 0.744 | 596 ms/step , 115827.34 GFLOP/s , 173418.3 tokens/s INFO:__main__:2024-11-30 10:17:44 | Epoch: 0 | Step: 349590 | Dataset: 0-10221960 | Loss: 0.696 | 596 ms/step , 115737.41 GFLOP/s , 173535.4 tokens/s INFO:__main__:2024-11-30 10:17:52 | Epoch: 0 | Step: 349600 | Dataset: 0-10224360 | Loss: 0.361 | 595 ms/step , 115934.61 GFLOP/s , 173593.0 tokens/s INFO:__main__:2024-11-30 10:17:59 | Epoch: 0 | Step: 349610 | Dataset: 0-10226760 | Loss: 0.344 | 595 ms/step , 115989.53 GFLOP/s , 173559.6 tokens/s INFO:__main__:2024-11-30 10:18:06 | Epoch: 0 | Step: 349620 | Dataset: 0-10229160 | Loss: 0.376 | 595 ms/step , 115954.86 GFLOP/s , 173592.2 tokens/s INFO:__main__:2024-11-30 10:18:13 | Epoch: 0 | Step: 349630 | Dataset: 0-10231560 | Loss: 0.339 | 595 ms/step , 115902.50 GFLOP/s , 173632.6 tokens/s INFO:__main__:2024-11-30 10:18:20 | Epoch: 0 | Step: 349640 | Dataset: 0-10233960 | Loss: 0.317 | 596 ms/step , 115842.49 GFLOP/s , 173578.0 tokens/s INFO:__main__:2024-11-30 10:18:27 | Epoch: 0 | Step: 349650 | Dataset: 0-10236360 | Loss: 0.303 | 595 ms/step , 115897.54 GFLOP/s , 173596.3 tokens/s INFO:__main__:2024-11-30 10:18:34 | Epoch: 0 | Step: 349660 | Dataset: 0-10238760 | Loss: 0.360 | 596 ms/step , 115784.78 GFLOP/s , 173613.0 tokens/s INFO:__main__:2024-11-30 10:18:41 | Epoch: 0 | Step: 349670 | Dataset: 0-10241160 | Loss: 0.396 | 596 ms/step , 115702.75 GFLOP/s , 173610.3 tokens/s INFO:__main__:2024-11-30 10:18:48 | Epoch: 0 | Step: 349680 | Dataset: 0-10243560 | Loss: 0.334 | 595 ms/step , 115925.24 GFLOP/s , 173625.0 tokens/s INFO:__main__:2024-11-30 10:18:55 | Epoch: 0 | Step: 349690 | Dataset: 0-10245960 | Loss: 0.384 | 596 ms/step , 115779.97 GFLOP/s , 
173588.3 tokens/s INFO:__main__:2024-11-30 10:19:02 | Epoch: 0 | Step: 349700 | Dataset: 0-10248360 | Loss: 0.336 | 596 ms/step , 115726.76 GFLOP/s , 173566.0 tokens/s INFO:__main__:2024-11-30 10:19:09 | Epoch: 0 | Step: 349710 | Dataset: 0-10250760 | Loss: 0.325 | 596 ms/step , 115880.58 GFLOP/s , 173522.8 tokens/s INFO:__main__:2024-11-30 10:19:16 | Epoch: 0 | Step: 349720 | Dataset: 0-10253160 | Loss: 0.361 | 595 ms/step , 115926.07 GFLOP/s , 173567.2 tokens/s INFO:__main__:2024-11-30 10:19:24 | Epoch: 0 | Step: 349730 | Dataset: 0-10255560 | Loss: 0.395 | 596 ms/step , 115866.15 GFLOP/s , 173600.7 tokens/s INFO:__main__:2024-11-30 10:19:31 | Epoch: 0 | Step: 349740 | Dataset: 0-10257960 | Loss: 0.356 | 595 ms/step , 115918.46 GFLOP/s , 173588.9 tokens/s INFO:__main__:2024-11-30 10:19:38 | Epoch: 0 | Step: 349750 | Dataset: 0-10260360 | Loss: 0.343 | 596 ms/step , 115851.29 GFLOP/s , 173497.0 tokens/s INFO:__main__:2024-11-30 10:19:45 | Epoch: 0 | Step: 349760 | Dataset: 0-10262760 | Loss: 0.373 | 597 ms/step , 115662.85 GFLOP/s , 173518.2 tokens/s INFO:__main__:2024-11-30 10:19:52 | Epoch: 0 | Step: 349770 | Dataset: 0-10265160 | Loss: 0.365 | 596 ms/step , 115845.61 GFLOP/s , 173578.1 tokens/s INFO:__main__:2024-11-30 10:19:59 | Epoch: 0 | Step: 349780 | Dataset: 0-10267560 | Loss: 0.349 | 596 ms/step , 115784.35 GFLOP/s , 173755.6 tokens/s INFO:__main__:2024-11-30 10:20:06 | Epoch: 0 | Step: 349790 | Dataset: 0-10269960 | Loss: 0.353 | 597 ms/step , 115585.81 GFLOP/s , 173977.5 tokens/s INFO:__main__:2024-11-30 10:20:13 | Epoch: 0 | Step: 349800 | Dataset: 0-10272360 | Loss: 0.330 | 595 ms/step , 115943.83 GFLOP/s , 174039.2 tokens/s INFO:__main__:2024-11-30 10:20:20 | Epoch: 0 | Step: 349810 | Dataset: 0-10274760 | Loss: 0.361 | 596 ms/step , 115849.69 GFLOP/s , 174034.8 tokens/s INFO:__main__:2024-11-30 10:20:27 | Epoch: 0 | Step: 349820 | Dataset: 0-10277160 | Loss: 0.290 | 596 ms/step , 115837.16 GFLOP/s , 173960.3 tokens/s INFO:__main__:2024-11-30 
10:20:34 | Epoch: 0 | Step: 349830 | Dataset: 0-10279560 | Loss: 0.378 | 596 ms/step , 115765.91 GFLOP/s , 174014.6 tokens/s INFO:__main__:2024-11-30 10:20:41 | Epoch: 0 | Step: 349840 | Dataset: 0-10281960 | Loss: 0.382 | 596 ms/step , 115855.12 GFLOP/s , 174097.8 tokens/s INFO:__main__:2024-11-30 10:20:48 | Epoch: 0 | Step: 349850 | Dataset: 0-10284360 | Loss: 0.359 | 597 ms/step , 115664.69 GFLOP/s , 174031.7 tokens/s INFO:__main__:2024-11-30 10:20:55 | Epoch: 0 | Step: 349860 | Dataset: 0-10286760 | Loss: 0.346 | 596 ms/step , 115840.74 GFLOP/s , 174001.8 tokens/s INFO:__main__:2024-11-30 10:21:02 | Epoch: 0 | Step: 349870 | Dataset: 0-10289160 | Loss: 0.412 | 596 ms/step , 115784.50 GFLOP/s , 173945.1 tokens/s INFO:__main__:2024-11-30 10:21:10 | Epoch: 0 | Step: 349880 | Dataset: 0-10291560 | Loss: 0.388 | 596 ms/step , 115731.25 GFLOP/s , 174015.7 tokens/s INFO:__main__:2024-11-30 10:21:17 | Epoch: 0 | Step: 349890 | Dataset: 0-10293960 | Loss: 0.372 | 596 ms/step , 115728.24 GFLOP/s , 174041.0 tokens/s INFO:__main__:2024-11-30 10:21:24 | Epoch: 0 | Step: 349900 | Dataset: 0-10296360 | Loss: 0.315 | 596 ms/step , 115810.98 GFLOP/s , 174013.6 tokens/s INFO:__main__:2024-11-30 10:21:31 | Epoch: 0 | Step: 349910 | Dataset: 0-10298760 | Loss: 0.359 | 597 ms/step , 115657.25 GFLOP/s , 173971.4 tokens/s INFO:__main__:2024-11-30 10:21:38 | Epoch: 0 | Step: 349920 | Dataset: 0-10301160 | Loss: 0.371 | 595 ms/step , 115921.11 GFLOP/s , 174012.3 tokens/s INFO:__main__:2024-11-30 10:21:45 | Epoch: 0 | Step: 349930 | Dataset: 0-10303560 | Loss: 0.407 | 596 ms/step , 115880.22 GFLOP/s , 174014.4 tokens/s INFO:__main__:2024-11-30 10:21:52 | Epoch: 0 | Step: 349940 | Dataset: 0-10305960 | Loss: 0.393 | 597 ms/step , 115618.57 GFLOP/s , 173987.1 tokens/s INFO:__main__:2024-11-30 10:21:59 | Epoch: 0 | Step: 349950 | Dataset: 0-10308360 | Loss: 0.366 | 596 ms/step , 115716.98 GFLOP/s , 174030.8 tokens/s INFO:__main__:2024-11-30 10:22:06 | Epoch: 0 | Step: 349960 | Dataset: 
0-10310760 | Loss: 0.337 | 596 ms/step , 115812.07 GFLOP/s , 173990.3 tokens/s INFO:__main__:2024-11-30 10:22:13 | Epoch: 0 | Step: 349970 | Dataset: 0-10313160 | Loss: 0.353 | 596 ms/step , 115820.86 GFLOP/s , 173998.1 tokens/s INFO:__main__:2024-11-30 10:22:20 | Epoch: 0 | Step: 349980 | Dataset: 0-10315560 | Loss: 0.358 | 596 ms/step , 115716.58 GFLOP/s , 173993.1 tokens/s INFO:__main__:2024-11-30 10:22:27 | Epoch: 0 | Step: 349990 | Dataset: 0-10317960 | Loss: 0.345 | 596 ms/step , 115703.86 GFLOP/s , 173911.2 tokens/s INFO:__main__:2024-11-30 10:22:35 | Validation | Step: 350000 | Val_loss: 0.349 | Best_val_loss: 0.3489 INFO:__main__:2024-11-30 10:22:35 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_102235_step_350000.pt` INFO:__main__:2024-11-30 10:22:37 | Epoch: 0 | Step: 350000 | Dataset: 0-10320360 | Loss: 0.366 | 595 ms/step , 116007.87 GFLOP/s , 121567.6 tokens/s INFO:__main__:2024-11-30 10:22:44 | Epoch: 0 | Step: 350010 | Dataset: 0-10322760 | Loss: 0.347 | 595 ms/step , 115939.70 GFLOP/s , 173718.3 tokens/s INFO:__main__:2024-11-30 10:22:52 | Epoch: 0 | Step: 350020 | Dataset: 0-10325160 | Loss: 0.327 | 596 ms/step , 115760.49 GFLOP/s , 173574.7 tokens/s INFO:__main__:2024-11-30 10:22:59 | Epoch: 0 | Step: 350030 | Dataset: 0-10327560 | Loss: 0.355 | 595 ms/step , 115928.86 GFLOP/s , 173580.5 tokens/s INFO:__main__:2024-11-30 10:23:06 | Epoch: 0 | Step: 350040 | Dataset: 0-10329960 | Loss: 0.321 | 595 ms/step , 115942.95 GFLOP/s , 173647.8 tokens/s INFO:__main__:2024-11-30 10:23:13 | Epoch: 0 | Step: 350050 | Dataset: 0-10332360 | Loss: 0.358 | 596 ms/step , 115833.40 GFLOP/s , 173613.8 tokens/s INFO:__main__:2024-11-30 10:23:20 | Epoch: 0 | Step: 350060 | Dataset: 0-10334760 | Loss: 0.403 | 596 ms/step , 115875.91 GFLOP/s , 173572.9 tokens/s INFO:__main__:2024-11-30 10:23:27 | Epoch: 0 | Step: 350070 | Dataset: 0-10337160 | Loss: 0.370 | 595 ms/step , 115927.59 GFLOP/s , 173532.5 tokens/s INFO:__main__:2024-11-30 
10:23:34 | Epoch: 0 | Step: 350080 | Dataset: 0-10339560 | Loss: 0.356 | 595 ms/step , 115917.50 GFLOP/s , 173549.7 tokens/s INFO:__main__:2024-11-30 10:23:41 | Epoch: 0 | Step: 350090 | Dataset: 0-10341960 | Loss: 0.351 | 595 ms/step , 115980.35 GFLOP/s , 173493.8 tokens/s INFO:__main__:2024-11-30 10:23:48 | Epoch: 0 | Step: 350100 | Dataset: 0-10344360 | Loss: 0.309 | 596 ms/step , 115879.08 GFLOP/s , 173497.7 tokens/s INFO:__main__:2024-11-30 10:23:55 | Epoch: 0 | Step: 350110 | Dataset: 0-10346760 | Loss: 0.338 | 596 ms/step , 115809.54 GFLOP/s , 173506.3 tokens/s INFO:__main__:2024-11-30 10:24:02 | Epoch: 0 | Step: 350120 | Dataset: 0-10349160 | Loss: 0.344 | 596 ms/step , 115793.45 GFLOP/s , 173415.5 tokens/s INFO:__main__:2024-11-30 10:24:09 | Epoch: 0 | Step: 350130 | Dataset: 0-10351560 | Loss: 0.352 | 595 ms/step , 115968.68 GFLOP/s , 173997.6 tokens/s INFO:__main__:2024-11-30 10:24:16 | Epoch: 0 | Step: 350140 | Dataset: 0-10353960 | Loss: 0.351 | 596 ms/step , 115825.65 GFLOP/s , 173995.3 tokens/s INFO:__main__:2024-11-30 10:24:24 | Epoch: 0 | Step: 350150 | Dataset: 0-10356360 | Loss: 0.621 | 596 ms/step , 115720.29 GFLOP/s , 173937.5 tokens/s INFO:__main__:2024-11-30 10:24:31 | Epoch: 0 | Step: 350160 | Dataset: 0-10358760 | Loss: 0.560 | 596 ms/step , 115710.35 GFLOP/s , 173868.3 tokens/s INFO:__main__:2024-11-30 10:24:38 | Epoch: 0 | Step: 350170 | Dataset: 0-10361160 | Loss: 0.625 | 596 ms/step , 115735.82 GFLOP/s , 173822.8 tokens/s INFO:__main__:2024-11-30 10:24:45 | Epoch: 0 | Step: 350180 | Dataset: 0-10363560 | Loss: 0.556 | 596 ms/step , 115774.35 GFLOP/s , 173908.2 tokens/s INFO:__main__:2024-11-30 10:24:52 | Epoch: 0 | Step: 350190 | Dataset: 0-10365960 | Loss: 0.591 | 597 ms/step , 115513.85 GFLOP/s , 173842.6 tokens/s INFO:__main__:2024-11-30 10:24:59 | Epoch: 0 | Step: 350200 | Dataset: 0-10368360 | Loss: 0.563 | 597 ms/step , 115686.91 GFLOP/s , 173885.6 tokens/s INFO:__main__:2024-11-30 10:25:06 | Epoch: 0 | Step: 350210 | Dataset: 
0-10370760 | Loss: 0.573 | 596 ms/step , 115759.25 GFLOP/s , 173929.4 tokens/s INFO:__main__:2024-11-30 10:25:13 | Epoch: 0 | Step: 350220 | Dataset: 0-10373160 | Loss: 0.566 | 597 ms/step , 115684.09 GFLOP/s , 173880.0 tokens/s INFO:__main__:2024-11-30 10:25:20 | Epoch: 0 | Step: 350230 | Dataset: 0-10375560 | Loss: 0.631 | 596 ms/step , 115827.47 GFLOP/s , 173889.3 tokens/s INFO:__main__:2024-11-30 10:25:27 | Epoch: 0 | Step: 350240 | Dataset: 0-10377960 | Loss: 0.583 | 596 ms/step , 115797.06 GFLOP/s , 173920.6 tokens/s INFO:__main__:2024-11-30 10:25:34 | Epoch: 0 | Step: 350250 | Dataset: 0-10380360 | Loss: 0.585 | 596 ms/step , 115705.39 GFLOP/s , 173837.9 tokens/s INFO:__main__:2024-11-30 10:25:41 | Epoch: 0 | Step: 350260 | Dataset: 0-10382760 | Loss: 0.617 | 596 ms/step , 115720.23 GFLOP/s , 173926.9 tokens/s INFO:__main__:2024-11-30 10:25:48 | Epoch: 0 | Step: 350270 | Dataset: 0-10385160 | Loss: 0.541 | 596 ms/step , 115712.39 GFLOP/s , 173918.0 tokens/s INFO:__main__:2024-11-30 10:25:55 | Epoch: 0 | Step: 350280 | Dataset: 0-10387560 | Loss: 0.579 | 597 ms/step , 115591.10 GFLOP/s , 173862.0 tokens/s INFO:__main__:2024-11-30 10:26:02 | Epoch: 0 | Step: 350290 | Dataset: 0-10389960 | Loss: 0.574 | 596 ms/step , 115730.55 GFLOP/s , 173866.8 tokens/s INFO:__main__:2024-11-30 10:26:10 | Epoch: 0 | Step: 350300 | Dataset: 0-10392360 | Loss: 0.515 | 596 ms/step , 115757.96 GFLOP/s , 173870.4 tokens/s INFO:__main__:2024-11-30 10:26:17 | Epoch: 0 | Step: 350310 | Dataset: 0-10394760 | Loss: 0.581 | 596 ms/step , 115763.58 GFLOP/s , 173904.7 tokens/s INFO:__main__:2024-11-30 10:26:24 | Epoch: 0 | Step: 350320 | Dataset: 0-10397160 | Loss: 0.576 | 596 ms/step , 115792.76 GFLOP/s , 173842.1 tokens/s INFO:__main__:2024-11-30 10:26:31 | Epoch: 0 | Step: 350330 | Dataset: 0-10399560 | Loss: 0.564 | 596 ms/step , 115808.42 GFLOP/s , 173914.9 tokens/s INFO:__main__:2024-11-30 10:26:38 | Epoch: 0 | Step: 350340 | Dataset: 0-10401960 | Loss: 0.562 | 596 ms/step , 
115780.97 GFLOP/s , 173930.0 tokens/s INFO:__main__:2024-11-30 10:26:45 | Epoch: 0 | Step: 350350 | Dataset: 0-10404360 | Loss: 0.513 | 596 ms/step , 115782.51 GFLOP/s , 173857.0 tokens/s INFO:__main__:2024-11-30 10:26:52 | Epoch: 0 | Step: 350360 | Dataset: 0-10406760 | Loss: 0.505 | 596 ms/step , 115721.48 GFLOP/s , 173824.0 tokens/s INFO:__main__:2024-11-30 10:26:59 | Epoch: 0 | Step: 350370 | Dataset: 0-10409160 | Loss: 0.626 | 596 ms/step , 115832.04 GFLOP/s , 173911.7 tokens/s INFO:__main__:2024-11-30 10:27:06 | Epoch: 0 | Step: 350380 | Dataset: 0-10411560 | Loss: 0.520 | 596 ms/step , 115751.16 GFLOP/s , 173916.4 tokens/s INFO:__main__:2024-11-30 10:27:13 | Epoch: 0 | Step: 350390 | Dataset: 0-10413960 | Loss: 0.614 | 596 ms/step , 115774.22 GFLOP/s , 173898.8 tokens/s INFO:__main__:2024-11-30 10:27:20 | Epoch: 0 | Step: 350400 | Dataset: 0-10416360 | Loss: 0.594 | 597 ms/step , 115611.39 GFLOP/s , 173859.7 tokens/s INFO:__main__:2024-11-30 10:27:27 | Epoch: 0 | Step: 350410 | Dataset: 0-10418760 | Loss: 1.098 | 596 ms/step , 115707.56 GFLOP/s , 173918.7 tokens/s INFO:__main__:2024-11-30 10:27:34 | Epoch: 0 | Step: 350420 | Dataset: 0-10421160 | Loss: 0.752 | 598 ms/step , 115352.23 GFLOP/s , 173774.7 tokens/s INFO:__main__:2024-11-30 10:27:41 | Epoch: 0 | Step: 350430 | Dataset: 0-10423560 | Loss: 0.526 | 596 ms/step , 115792.28 GFLOP/s , 173835.4 tokens/s INFO:__main__:2024-11-30 10:27:48 | Epoch: 0 | Step: 350440 | Dataset: 0-10425960 | Loss: 0.480 | 597 ms/step , 115643.54 GFLOP/s , 173907.8 tokens/s INFO:__main__:2024-11-30 10:27:56 | Epoch: 0 | Step: 350450 | Dataset: 0-10428360 | Loss: 0.569 | 597 ms/step , 115684.15 GFLOP/s , 173902.9 tokens/s INFO:__main__:2024-11-30 10:28:03 | Epoch: 0 | Step: 350460 | Dataset: 0-10430760 | Loss: 0.553 | 596 ms/step , 115846.85 GFLOP/s , 173857.8 tokens/s INFO:__main__:2024-11-30 10:28:10 | Epoch: 0 | Step: 350470 | Dataset: 0-10433160 | Loss: 0.579 | 597 ms/step , 115682.32 GFLOP/s , 173902.5 tokens/s 
INFO:__main__:2024-11-30 10:28:17 | Epoch: 0 | Step: 350480 | Dataset: 0-10435560 | Loss: 0.568 | 596 ms/step , 115777.65 GFLOP/s , 173887.0 tokens/s INFO:__main__:2024-11-30 10:28:24 | Epoch: 0 | Step: 350490 | Dataset: 0-10437960 | Loss: 0.626 | 595 ms/step , 115996.88 GFLOP/s , 173913.9 tokens/s INFO:__main__:2024-11-30 10:28:31 | Validation | Step: 350500 | Val_loss: 0.569 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 10:28:32 | Epoch: 0 | Step: 350500 | Dataset: 0-10440360 | Loss: 0.579 | 597 ms/step , 115637.83 GFLOP/s , 148015.4 tokens/s INFO:__main__:2024-11-30 10:28:39 | Epoch: 0 | Step: 350510 | Dataset: 0-10442760 | Loss: 0.565 | 597 ms/step , 115624.43 GFLOP/s , 173564.1 tokens/s INFO:__main__:2024-11-30 10:28:46 | Epoch: 0 | Step: 350520 | Dataset: 0-10445160 | Loss: 0.512 | 596 ms/step , 115813.33 GFLOP/s , 173456.2 tokens/s INFO:__main__:2024-11-30 10:28:53 | Epoch: 0 | Step: 350530 | Dataset: 0-10447560 | Loss: 0.593 | 596 ms/step , 115801.44 GFLOP/s , 173516.9 tokens/s INFO:__main__:2024-11-30 10:29:00 | Epoch: 0 | Step: 350540 | Dataset: 0-10449960 | Loss: 0.494 | 595 ms/step , 115898.16 GFLOP/s , 173529.0 tokens/s INFO:__main__:2024-11-30 10:29:07 | Epoch: 0 | Step: 350550 | Dataset: 0-10452360 | Loss: 0.498 | 596 ms/step , 115735.61 GFLOP/s , 173438.1 tokens/s INFO:__main__:2024-11-30 10:29:15 | Epoch: 0 | Step: 350560 | Dataset: 0-10454760 | Loss: 0.571 | 596 ms/step , 115837.51 GFLOP/s , 173603.8 tokens/s INFO:__main__:2024-11-30 10:29:22 | Epoch: 0 | Step: 350570 | Dataset: 0-10457160 | Loss: 0.522 | 596 ms/step , 115797.79 GFLOP/s , 173457.3 tokens/s INFO:__main__:2024-11-30 10:29:29 | Epoch: 0 | Step: 350580 | Dataset: 0-10459560 | Loss: 0.594 | 596 ms/step , 115749.53 GFLOP/s , 173511.6 tokens/s INFO:__main__:2024-11-30 10:29:36 | Epoch: 0 | Step: 350590 | Dataset: 0-10461960 | Loss: 0.556 | 596 ms/step , 115706.71 GFLOP/s , 173471.7 tokens/s INFO:__main__:2024-11-30 10:29:43 | Epoch: 0 | Step: 350600 | Dataset: 0-10464360 | Loss: 0.643 
| 595 ms/step , 116048.51 GFLOP/s , 173491.1 tokens/s INFO:__main__:2024-11-30 10:29:50 | Epoch: 0 | Step: 350610 | Dataset: 0-10466760 | Loss: 0.561 | 596 ms/step , 115755.57 GFLOP/s , 173520.0 tokens/s INFO:__main__:2024-11-30 10:29:57 | Epoch: 0 | Step: 350620 | Dataset: 0-10469160 | Loss: 0.528 | 596 ms/step , 115848.31 GFLOP/s , 173552.5 tokens/s INFO:__main__:2024-11-30 10:30:04 | Epoch: 0 | Step: 350630 | Dataset: 0-10471560 | Loss: 0.591 | 596 ms/step , 115885.25 GFLOP/s , 173512.5 tokens/s INFO:__main__:2024-11-30 10:30:11 | Epoch: 0 | Step: 350640 | Dataset: 0-10473960 | Loss: 0.538 | 596 ms/step , 115879.34 GFLOP/s , 173522.9 tokens/s INFO:__main__:2024-11-30 10:30:18 | Epoch: 0 | Step: 350650 | Dataset: 0-10476360 | Loss: 0.541 | 596 ms/step , 115825.10 GFLOP/s , 173508.4 tokens/s INFO:__main__:2024-11-30 10:30:25 | Epoch: 0 | Step: 350660 | Dataset: 0-10478760 | Loss: 0.624 | 596 ms/step , 115814.24 GFLOP/s , 173593.2 tokens/s INFO:__main__:2024-11-30 10:30:32 | Epoch: 0 | Step: 350670 | Dataset: 0-10481160 | Loss: 0.640 | 595 ms/step , 115919.74 GFLOP/s , 173533.8 tokens/s INFO:__main__:2024-11-30 10:30:40 | Epoch: 0 | Step: 350680 | Dataset: 0-10483560 | Loss: 0.532 | 596 ms/step , 115802.24 GFLOP/s , 173459.5 tokens/s INFO:__main__:2024-11-30 10:30:47 | Epoch: 0 | Step: 350690 | Dataset: 0-10485960 | Loss: 0.601 | 597 ms/step , 115625.05 GFLOP/s , 173503.0 tokens/s INFO:__main__:2024-11-30 10:30:54 | Epoch: 0 | Step: 350700 | Dataset: 0-10488360 | Loss: 0.796 | 597 ms/step , 115675.55 GFLOP/s , 173476.4 tokens/s INFO:__main__:2024-11-30 10:31:01 | Epoch: 0 | Step: 350710 | Dataset: 0-10490760 | Loss: 0.799 | 596 ms/step , 115755.00 GFLOP/s , 173422.9 tokens/s INFO:__main__:2024-11-30 10:31:08 | Epoch: 0 | Step: 350720 | Dataset: 0-10493160 | Loss: 0.779 | 597 ms/step , 115640.49 GFLOP/s , 173463.8 tokens/s INFO:__main__:2024-11-30 10:31:15 | Epoch: 0 | Step: 350730 | Dataset: 0-10495560 | Loss: 0.791 | 597 ms/step , 115566.17 GFLOP/s , 173420.3 
tokens/s INFO:__main__:2024-11-30 10:31:22 | Epoch: 0 | Step: 350740 | Dataset: 0-10497960 | Loss: 0.770 | 596 ms/step , 115804.90 GFLOP/s , 173384.0 tokens/s INFO:__main__:2024-11-30 10:31:29 | Epoch: 0 | Step: 350750 | Dataset: 0-10500360 | Loss: 0.747 | 597 ms/step , 115549.95 GFLOP/s , 173385.0 tokens/s INFO:__main__:2024-11-30 10:31:36 | Epoch: 0 | Step: 350760 | Dataset: 0-10502760 | Loss: 0.777 | 597 ms/step , 115635.30 GFLOP/s , 173391.7 tokens/s INFO:__main__:2024-11-30 10:31:43 | Epoch: 0 | Step: 350770 | Dataset: 0-10505160 | Loss: 0.802 | 597 ms/step , 115694.36 GFLOP/s , 173322.8 tokens/s INFO:__main__:2024-11-30 10:31:50 | Epoch: 0 | Step: 350780 | Dataset: 0-10507560 | Loss: 0.784 | 597 ms/step , 115627.71 GFLOP/s , 173267.4 tokens/s INFO:__main__:2024-11-30 10:31:57 | Epoch: 0 | Step: 350790 | Dataset: 0-10509960 | Loss: 0.740 | 596 ms/step , 115773.11 GFLOP/s , 173510.8 tokens/s INFO:__main__:2024-11-30 10:32:05 | Epoch: 0 | Step: 350800 | Dataset: 0-10512360 | Loss: 0.769 | 596 ms/step , 115762.47 GFLOP/s , 173394.4 tokens/s INFO:__main__:2024-11-30 10:32:12 | Epoch: 0 | Step: 350810 | Dataset: 0-10514760 | Loss: 0.756 | 596 ms/step , 115738.34 GFLOP/s , 173399.6 tokens/s INFO:__main__:2024-11-30 10:32:19 | Epoch: 0 | Step: 350820 | Dataset: 0-10517160 | Loss: 0.779 | 597 ms/step , 115559.03 GFLOP/s , 173458.3 tokens/s INFO:__main__:2024-11-30 10:32:26 | Epoch: 0 | Step: 350830 | Dataset: 0-10519560 | Loss: 0.756 | 596 ms/step , 115829.63 GFLOP/s , 173483.8 tokens/s INFO:__main__:2024-11-30 10:32:33 | Epoch: 0 | Step: 350840 | Dataset: 0-10521960 | Loss: 0.743 | 597 ms/step , 115657.49 GFLOP/s , 173417.9 tokens/s INFO:__main__:2024-11-30 10:32:40 | Epoch: 0 | Step: 350850 | Dataset: 0-10524360 | Loss: 0.738 | 596 ms/step , 115742.75 GFLOP/s , 173484.9 tokens/s INFO:__main__:2024-11-30 10:32:47 | Epoch: 0 | Step: 350860 | Dataset: 0-10526760 | Loss: 0.731 | 596 ms/step , 115724.52 GFLOP/s , 173460.3 tokens/s INFO:__main__:2024-11-30 10:32:54 | 
Epoch: 0 | Step: 350870 | Dataset: 0-10529160 | Loss: 0.806 | 596 ms/step , 115756.04 GFLOP/s , 173441.1 tokens/s INFO:__main__:2024-11-30 10:33:01 | Epoch: 0 | Step: 350880 | Dataset: 0-10531560 | Loss: 0.727 | 596 ms/step , 115784.51 GFLOP/s , 173495.4 tokens/s INFO:__main__:2024-11-30 10:33:08 | Epoch: 0 | Step: 350890 | Dataset: 0-10533960 | Loss: 0.750 | 596 ms/step , 115857.28 GFLOP/s , 173490.4 tokens/s INFO:__main__:2024-11-30 10:33:15 | Epoch: 0 | Step: 350900 | Dataset: 0-10536360 | Loss: 0.747 | 596 ms/step , 115752.06 GFLOP/s , 173471.3 tokens/s INFO:__main__:2024-11-30 10:33:23 | Epoch: 0 | Step: 350910 | Dataset: 0-10538760 | Loss: 0.789 | 597 ms/step , 115542.64 GFLOP/s , 173370.1 tokens/s INFO:__main__:2024-11-30 10:33:30 | Epoch: 0 | Step: 350920 | Dataset: 0-10541160 | Loss: 0.762 | 596 ms/step , 115883.05 GFLOP/s , 173512.0 tokens/s INFO:__main__:2024-11-30 10:33:37 | Epoch: 0 | Step: 350930 | Dataset: 0-10543560 | Loss: 0.792 | 596 ms/step , 115864.74 GFLOP/s , 173479.0 tokens/s INFO:__main__:2024-11-30 10:33:44 | Epoch: 0 | Step: 350940 | Dataset: 0-10545960 | Loss: 0.766 | 595 ms/step , 115901.18 GFLOP/s , 173513.0 tokens/s INFO:__main__:2024-11-30 10:33:51 | Epoch: 0 | Step: 350950 | Dataset: 0-10548360 | Loss: 0.768 | 596 ms/step , 115853.98 GFLOP/s , 173411.4 tokens/s INFO:__main__:2024-11-30 10:33:58 | Epoch: 0 | Step: 350960 | Dataset: 0-10550760 | Loss: 0.780 | 597 ms/step , 115581.66 GFLOP/s , 173421.2 tokens/s INFO:__main__:2024-11-30 10:34:05 | Epoch: 0 | Step: 350970 | Dataset: 0-10553160 | Loss: 0.746 | 596 ms/step , 115729.45 GFLOP/s , 173417.9 tokens/s INFO:__main__:2024-11-30 10:34:12 | Epoch: 0 | Step: 350980 | Dataset: 0-10555560 | Loss: 0.759 | 595 ms/step , 115940.74 GFLOP/s , 173481.6 tokens/s INFO:__main__:2024-11-30 10:34:19 | Epoch: 0 | Step: 350990 | Dataset: 0-10557960 | Loss: 0.785 | 596 ms/step , 115701.11 GFLOP/s , 173466.5 tokens/s INFO:__main__:2024-11-30 10:34:27 | Validation | Step: 351000 | Val_loss: 0.648 | 
Best_val_loss: 0.3487 INFO:__main__:2024-11-30 10:34:27 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_103427_step_351000.pt` INFO:__main__:2024-11-30 10:34:29 | Epoch: 0 | Step: 351000 | Dataset: 0-10560360 | Loss: 0.734 | 595 ms/step , 115908.82 GFLOP/s , 120511.7 tokens/s INFO:__main__:2024-11-30 10:34:36 | Epoch: 0 | Step: 351010 | Dataset: 0-10562760 | Loss: 0.732 | 599 ms/step , 115249.88 GFLOP/s , 173556.0 tokens/s INFO:__main__:2024-11-30 10:34:44 | Epoch: 0 | Step: 351020 | Dataset: 0-10565160 | Loss: 0.801 | 598 ms/step , 115410.34 GFLOP/s , 173549.8 tokens/s INFO:__main__:2024-11-30 10:34:51 | Epoch: 0 | Step: 351030 | Dataset: 0-10567560 | Loss: 0.732 | 598 ms/step , 115351.97 GFLOP/s , 173530.1 tokens/s INFO:__main__:2024-11-30 10:34:58 | Epoch: 0 | Step: 351040 | Dataset: 0-10569960 | Loss: 0.770 | 598 ms/step , 115468.90 GFLOP/s , 173512.3 tokens/s INFO:__main__:2024-11-30 10:35:05 | Epoch: 0 | Step: 351050 | Dataset: 0-10572360 | Loss: 0.742 | 597 ms/step , 115532.53 GFLOP/s , 173520.2 tokens/s INFO:__main__:2024-11-30 10:35:12 | Epoch: 0 | Step: 351060 | Dataset: 0-10574760 | Loss: 0.748 | 598 ms/step , 115411.54 GFLOP/s , 173511.7 tokens/s INFO:__main__:2024-11-30 10:35:19 | Epoch: 0 | Step: 351070 | Dataset: 0-10577160 | Loss: 0.755 | 599 ms/step , 115207.74 GFLOP/s , 173486.1 tokens/s INFO:__main__:2024-11-30 10:35:26 | Epoch: 0 | Step: 351080 | Dataset: 0-10579560 | Loss: 0.739 | 599 ms/step , 115279.73 GFLOP/s , 173513.2 tokens/s INFO:__main__:2024-11-30 10:35:33 | Epoch: 0 | Step: 351090 | Dataset: 0-10581960 | Loss: 0.760 | 598 ms/step , 115358.42 GFLOP/s , 173468.9 tokens/s INFO:__main__:2024-11-30 10:35:40 | Epoch: 0 | Step: 351100 | Dataset: 0-10584360 | Loss: 0.761 | 598 ms/step , 115485.50 GFLOP/s , 173505.9 tokens/s INFO:__main__:2024-11-30 10:35:47 | Epoch: 0 | Step: 351110 | Dataset: 0-10586760 | Loss: 0.725 | 598 ms/step , 115423.67 GFLOP/s , 173467.3 tokens/s INFO:__main__:2024-11-30 10:35:54 | 
Epoch: 0 | Step: 351120 | Dataset: 0-10589160 | Loss: 0.778 | 597 ms/step , 115560.90 GFLOP/s , 173469.3 tokens/s INFO:__main__:2024-11-30 10:36:01 | Epoch: 0 | Step: 351130 | Dataset: 0-10591560 | Loss: 0.752 | 598 ms/step , 115386.18 GFLOP/s , 173479.0 tokens/s INFO:__main__:2024-11-30 10:36:09 | Epoch: 0 | Step: 351140 | Dataset: 0-10593960 | Loss: 0.746 | 599 ms/step , 115210.79 GFLOP/s , 173508.0 tokens/s INFO:__main__:2024-11-30 10:36:16 | Epoch: 0 | Step: 351150 | Dataset: 0-10596360 | Loss: 0.825 | 598 ms/step , 115387.75 GFLOP/s , 173505.7 tokens/s INFO:__main__:2024-11-30 10:36:23 | Epoch: 0 | Step: 351160 | Dataset: 0-10598760 | Loss: 0.821 | 598 ms/step , 115373.64 GFLOP/s , 173426.3 tokens/s INFO:__main__:2024-11-30 10:36:30 | Epoch: 0 | Step: 351170 | Dataset: 0-10601160 | Loss: 0.730 | 598 ms/step , 115488.08 GFLOP/s , 173495.0 tokens/s INFO:__main__:2024-11-30 10:36:37 | Epoch: 0 | Step: 351180 | Dataset: 0-10603560 | Loss: 0.772 | 598 ms/step , 115462.06 GFLOP/s , 173448.2 tokens/s INFO:__main__:2024-11-30 10:36:44 | Epoch: 0 | Step: 351190 | Dataset: 0-10605960 | Loss: 0.748 | 598 ms/step , 115389.15 GFLOP/s , 173505.7 tokens/s INFO:__main__:2024-11-30 10:36:51 | Epoch: 0 | Step: 351200 | Dataset: 0-10608360 | Loss: 0.809 | 598 ms/step , 115471.43 GFLOP/s , 173501.5 tokens/s INFO:__main__:2024-11-30 10:36:58 | Epoch: 0 | Step: 351210 | Dataset: 0-10610760 | Loss: 0.807 | 597 ms/step , 115502.41 GFLOP/s , 173442.2 tokens/s INFO:__main__:2024-11-30 10:37:05 | Epoch: 0 | Step: 351220 | Dataset: 0-10613160 | Loss: 0.760 | 598 ms/step , 115394.79 GFLOP/s , 173459.2 tokens/s INFO:__main__:2024-11-30 10:37:12 | Epoch: 0 | Step: 351230 | Dataset: 0-10615560 | Loss: 0.807 | 598 ms/step , 115477.60 GFLOP/s , 173532.0 tokens/s INFO:__main__:2024-11-30 10:37:19 | Epoch: 0 | Step: 351240 | Dataset: 0-10617960 | Loss: 0.787 | 597 ms/step , 115530.66 GFLOP/s , 173555.8 tokens/s INFO:__main__:2024-11-30 10:37:26 | Epoch: 0 | Step: 351250 | Dataset: 0-10620360 | 
Loss: 0.675 | 602 ms/step , 114733.97 GFLOP/s , 173523.0 tokens/s INFO:__main__:2024-11-30 10:37:34 | Epoch: 0 | Step: 351260 | Dataset: 0-10622760 | Loss: 0.664 | 597 ms/step , 115565.90 GFLOP/s , 173499.4 tokens/s INFO:__main__:2024-11-30 10:37:41 | Epoch: 0 | Step: 351270 | Dataset: 0-10625160 | Loss: 0.610 | 597 ms/step , 115603.05 GFLOP/s , 173615.1 tokens/s INFO:__main__:2024-11-30 10:37:48 | Epoch: 0 | Step: 351280 | Dataset: 0-10627560 | Loss: 0.633 | 597 ms/step , 115565.74 GFLOP/s , 173575.1 tokens/s INFO:__main__:2024-11-30 10:37:55 | Epoch: 0 | Step: 351290 | Dataset: 0-10629960 | Loss: 0.631 | 596 ms/step , 115742.47 GFLOP/s , 173521.2 tokens/s INFO:__main__:2024-11-30 10:38:02 | Epoch: 0 | Step: 351300 | Dataset: 0-10632360 | Loss: 0.748 | 598 ms/step , 115498.35 GFLOP/s , 173626.1 tokens/s INFO:__main__:2024-11-30 10:38:09 | Epoch: 0 | Step: 351310 | Dataset: 0-10634760 | Loss: 0.627 | 598 ms/step , 115465.69 GFLOP/s , 173515.3 tokens/s INFO:__main__:2024-11-30 10:38:16 | Epoch: 0 | Step: 351320 | Dataset: 0-10637160 | Loss: 0.712 | 597 ms/step , 115525.53 GFLOP/s , 173604.1 tokens/s INFO:__main__:2024-11-30 10:38:23 | Epoch: 0 | Step: 351330 | Dataset: 0-10639560 | Loss: 0.679 | 597 ms/step , 115678.35 GFLOP/s , 173498.6 tokens/s INFO:__main__:2024-11-30 10:38:30 | Epoch: 0 | Step: 351340 | Dataset: 0-10641960 | Loss: 0.691 | 598 ms/step , 115448.77 GFLOP/s , 173498.3 tokens/s INFO:__main__:2024-11-30 10:38:37 | Epoch: 0 | Step: 351350 | Dataset: 0-10644360 | Loss: 0.660 | 597 ms/step , 115580.99 GFLOP/s , 173538.2 tokens/s INFO:__main__:2024-11-30 10:38:44 | Epoch: 0 | Step: 351360 | Dataset: 0-10646760 | Loss: 0.651 | 598 ms/step , 115452.59 GFLOP/s , 173590.2 tokens/s INFO:__main__:2024-11-30 10:38:51 | Epoch: 0 | Step: 351370 | Dataset: 0-10649160 | Loss: 0.718 | 598 ms/step , 115429.73 GFLOP/s , 173523.7 tokens/s INFO:__main__:2024-11-30 10:38:58 | Epoch: 0 | Step: 351380 | Dataset: 0-10651560 | Loss: 0.739 | 598 ms/step , 115482.74 GFLOP/s , 
173561.3 tokens/s INFO:__main__:2024-11-30 10:39:06 | Epoch: 0 | Step: 351390 | Dataset: 0-10653960 | Loss: 0.689 | 599 ms/step , 115205.15 GFLOP/s , 173422.3 tokens/s INFO:__main__:2024-11-30 10:39:13 | Epoch: 0 | Step: 351400 | Dataset: 0-10656360 | Loss: 0.670 | 599 ms/step , 115255.41 GFLOP/s , 173540.4 tokens/s INFO:__main__:2024-11-30 10:39:20 | Epoch: 0 | Step: 351410 | Dataset: 0-10658760 | Loss: 0.655 | 598 ms/step , 115440.31 GFLOP/s , 173510.4 tokens/s INFO:__main__:2024-11-30 10:39:27 | Epoch: 0 | Step: 351420 | Dataset: 0-10661160 | Loss: 0.751 | 597 ms/step , 115507.37 GFLOP/s , 173544.6 tokens/s INFO:__main__:2024-11-30 10:39:34 | Epoch: 0 | Step: 351430 | Dataset: 0-10663560 | Loss: 0.680 | 597 ms/step , 115554.68 GFLOP/s , 173574.1 tokens/s INFO:__main__:2024-11-30 10:39:41 | Epoch: 0 | Step: 351440 | Dataset: 0-10665960 | Loss: 0.697 | 597 ms/step , 115519.95 GFLOP/s , 173392.7 tokens/s INFO:__main__:2024-11-30 10:39:48 | Epoch: 0 | Step: 351450 | Dataset: 0-10668360 | Loss: 0.661 | 597 ms/step , 115576.36 GFLOP/s , 173562.4 tokens/s INFO:__main__:2024-11-30 10:39:55 | Epoch: 0 | Step: 351460 | Dataset: 0-10670760 | Loss: 0.709 | 598 ms/step , 115362.21 GFLOP/s , 173510.6 tokens/s INFO:__main__:2024-11-30 10:40:02 | Epoch: 0 | Step: 351470 | Dataset: 0-10673160 | Loss: 0.716 | 598 ms/step , 115326.46 GFLOP/s , 173466.7 tokens/s INFO:__main__:2024-11-30 10:40:09 | Epoch: 0 | Step: 351480 | Dataset: 0-10675560 | Loss: 0.650 | 599 ms/step , 115290.96 GFLOP/s , 173488.1 tokens/s INFO:__main__:2024-11-30 10:40:16 | Epoch: 0 | Step: 351490 | Dataset: 0-10677960 | Loss: 0.579 | 598 ms/step , 115364.25 GFLOP/s , 173394.9 tokens/s INFO:__main__:2024-11-30 10:40:24 | Validation | Step: 351500 | Val_loss: 0.688 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 10:40:25 | Epoch: 0 | Step: 351500 | Dataset: 0-10680360 | Loss: 0.666 | 597 ms/step , 115546.87 GFLOP/s , 147632.5 tokens/s INFO:__main__:2024-11-30 10:40:32 | Epoch: 0 | Step: 351510 | Dataset: 
0-10682760 | Loss: 0.720 | 598 ms/step , 115389.95 GFLOP/s , 173560.3 tokens/s INFO:__main__:2024-11-30 10:40:39 | Epoch: 0 | Step: 351520 | Dataset: 0-10685160 | Loss: 0.784 | 597 ms/step , 115548.65 GFLOP/s , 173514.2 tokens/s INFO:__main__:2024-11-30 10:40:46 | Epoch: 0 | Step: 351530 | Dataset: 0-10687560 | Loss: 0.658 | 598 ms/step , 115332.23 GFLOP/s , 173435.6 tokens/s INFO:__main__:2024-11-30 10:40:53 | Epoch: 0 | Step: 351540 | Dataset: 0-10689960 | Loss: 0.690 | 599 ms/step , 115199.44 GFLOP/s , 173481.7 tokens/s INFO:__main__:2024-11-30 10:41:00 | Epoch: 0 | Step: 351550 | Dataset: 0-10692360 | Loss: 0.687 | 598 ms/step , 115462.17 GFLOP/s , 173430.9 tokens/s INFO:__main__:2024-11-30 10:41:07 | Epoch: 0 | Step: 351560 | Dataset: 0-10694760 | Loss: 0.561 | 598 ms/step , 115451.41 GFLOP/s , 173492.3 tokens/s INFO:__main__:2024-11-30 10:41:14 | Epoch: 0 | Step: 351570 | Dataset: 0-10697160 | Loss: 0.668 | 598 ms/step , 115382.65 GFLOP/s , 173517.9 tokens/s INFO:__main__:2024-11-30 10:41:21 | Epoch: 0 | Step: 351580 | Dataset: 0-10699560 | Loss: 0.606 | 597 ms/step , 115522.02 GFLOP/s , 173517.7 tokens/s INFO:__main__:2024-11-30 10:41:28 | Epoch: 0 | Step: 351590 | Dataset: 0-10701960 | Loss: 0.757 | 597 ms/step , 115508.40 GFLOP/s , 173532.3 tokens/s INFO:__main__:2024-11-30 10:41:36 | Epoch: 0 | Step: 351600 | Dataset: 0-10704360 | Loss: 0.724 | 598 ms/step , 115356.43 GFLOP/s , 173531.0 tokens/s INFO:__main__:2024-11-30 10:41:43 | Epoch: 0 | Step: 351610 | Dataset: 0-10706760 | Loss: 0.793 | 598 ms/step , 115361.83 GFLOP/s , 173417.1 tokens/s INFO:__main__:2024-11-30 10:41:50 | Epoch: 0 | Step: 351620 | Dataset: 0-10709160 | Loss: 0.736 | 598 ms/step , 115439.20 GFLOP/s , 173391.9 tokens/s INFO:__main__:2024-11-30 10:41:57 | Epoch: 0 | Step: 351630 | Dataset: 0-10711560 | Loss: 0.633 | 598 ms/step , 115425.29 GFLOP/s , 173474.6 tokens/s INFO:__main__:2024-11-30 10:42:04 | Epoch: 0 | Step: 351640 | Dataset: 0-10713960 | Loss: 0.653 | 598 ms/step , 
115471.49 GFLOP/s , 173502.6 tokens/s INFO:__main__:2024-11-30 10:42:11 | Epoch: 0 | Step: 351650 | Dataset: 0-10716360 | Loss: 0.667 | 597 ms/step , 115520.07 GFLOP/s , 173510.8 tokens/s INFO:__main__:2024-11-30 10:42:18 | Epoch: 0 | Step: 351660 | Dataset: 0-10718760 | Loss: 0.666 | 599 ms/step , 115213.46 GFLOP/s , 173437.2 tokens/s INFO:__main__:2024-11-30 10:42:25 | Epoch: 0 | Step: 351670 | Dataset: 0-10721160 | Loss: 0.646 | 598 ms/step , 115378.36 GFLOP/s , 173452.3 tokens/s INFO:__main__:2024-11-30 10:42:32 | Epoch: 0 | Step: 351680 | Dataset: 0-10723560 | Loss: 0.627 | 597 ms/step , 115553.68 GFLOP/s , 173498.4 tokens/s INFO:__main__:2024-11-30 10:42:39 | Epoch: 0 | Step: 351690 | Dataset: 0-10725960 | Loss: 0.599 | 598 ms/step , 115337.23 GFLOP/s , 173455.7 tokens/s INFO:__main__:2024-11-30 10:42:46 | Epoch: 0 | Step: 351700 | Dataset: 0-10728360 | Loss: 0.665 | 597 ms/step , 115533.36 GFLOP/s , 173525.8 tokens/s INFO:__main__:2024-11-30 10:42:53 | Epoch: 0 | Step: 351710 | Dataset: 0-10730760 | Loss: 0.728 | 599 ms/step , 115294.24 GFLOP/s , 173391.2 tokens/s INFO:__main__:2024-11-30 10:43:01 | Epoch: 0 | Step: 351720 | Dataset: 0-10733160 | Loss: 0.608 | 599 ms/step , 115296.33 GFLOP/s , 173424.1 tokens/s INFO:__main__:2024-11-30 10:43:08 | Epoch: 0 | Step: 351730 | Dataset: 0-10735560 | Loss: 0.661 | 597 ms/step , 115527.19 GFLOP/s , 173507.7 tokens/s INFO:__main__:2024-11-30 10:43:15 | Epoch: 0 | Step: 351740 | Dataset: 0-10737960 | Loss: 0.674 | 598 ms/step , 115448.47 GFLOP/s , 173476.7 tokens/s INFO:__main__:2024-11-30 10:43:22 | Epoch: 0 | Step: 351750 | Dataset: 0-10740360 | Loss: 0.747 | 598 ms/step , 115371.05 GFLOP/s , 173503.9 tokens/s INFO:__main__:2024-11-30 10:43:29 | Epoch: 0 | Step: 351760 | Dataset: 0-10742760 | Loss: 0.569 | 598 ms/step , 115420.75 GFLOP/s , 173398.6 tokens/s INFO:__main__:2024-11-30 10:43:36 | Epoch: 0 | Step: 351770 | Dataset: 0-10745160 | Loss: 0.722 | 598 ms/step , 115310.89 GFLOP/s , 173443.1 tokens/s 
INFO:__main__:2024-11-30 10:43:43 | Epoch: 0 | Step: 351780 | Dataset: 0-10747560 | Loss: 0.579 | 598 ms/step , 115376.63 GFLOP/s , 173385.8 tokens/s INFO:__main__:2024-11-30 10:43:50 | Epoch: 0 | Step: 351790 | Dataset: 0-10749960 | Loss: 0.434 | 597 ms/step , 115506.88 GFLOP/s , 173521.0 tokens/s INFO:__main__:2024-11-30 10:43:57 | Epoch: 0 | Step: 351800 | Dataset: 0-10752360 | Loss: 0.432 | 598 ms/step , 115479.28 GFLOP/s , 173476.4 tokens/s INFO:__main__:2024-11-30 10:44:04 | Epoch: 0 | Step: 351810 | Dataset: 0-10754760 | Loss: 0.367 | 597 ms/step , 115560.73 GFLOP/s , 173555.3 tokens/s INFO:__main__:2024-11-30 10:44:11 | Epoch: 0 | Step: 351820 | Dataset: 0-10757160 | Loss: 0.384 | 597 ms/step , 115656.11 GFLOP/s , 173517.8 tokens/s INFO:__main__:2024-11-30 10:44:18 | Epoch: 0 | Step: 351830 | Dataset: 0-10759560 | Loss: 0.407 | 598 ms/step , 115327.08 GFLOP/s , 173403.7 tokens/s INFO:__main__:2024-11-30 10:44:26 | Epoch: 0 | Step: 351840 | Dataset: 0-10761960 | Loss: 0.383 | 597 ms/step , 115588.10 GFLOP/s , 173344.7 tokens/s INFO:__main__:2024-11-30 10:44:33 | Epoch: 0 | Step: 351850 | Dataset: 0-10764360 | Loss: 0.366 | 597 ms/step , 115530.84 GFLOP/s , 173539.9 tokens/s INFO:__main__:2024-11-30 10:44:40 | Epoch: 0 | Step: 351860 | Dataset: 0-10766760 | Loss: 0.425 | 598 ms/step , 115390.89 GFLOP/s , 173541.7 tokens/s INFO:__main__:2024-11-30 10:44:47 | Epoch: 0 | Step: 351870 | Dataset: 0-10769160 | Loss: 0.451 | 597 ms/step , 115551.40 GFLOP/s , 173516.2 tokens/s INFO:__main__:2024-11-30 10:44:54 | Epoch: 0 | Step: 351880 | Dataset: 0-10771560 | Loss: 0.409 | 597 ms/step , 115611.14 GFLOP/s , 173591.9 tokens/s INFO:__main__:2024-11-30 10:45:01 | Epoch: 0 | Step: 351890 | Dataset: 0-10773960 | Loss: 0.409 | 598 ms/step , 115395.78 GFLOP/s , 173546.6 tokens/s INFO:__main__:2024-11-30 10:45:08 | Epoch: 0 | Step: 351900 | Dataset: 0-10776360 | Loss: 0.372 | 597 ms/step , 115601.73 GFLOP/s , 173533.9 tokens/s INFO:__main__:2024-11-30 10:45:15 | Epoch: 0 | 
Step: 351910 | Dataset: 0-10778760 | Loss: 0.435 | 598 ms/step , 115366.50 GFLOP/s , 173433.3 tokens/s INFO:__main__:2024-11-30 10:45:22 | Epoch: 0 | Step: 351920 | Dataset: 0-10781160 | Loss: 0.396 | 598 ms/step , 115354.57 GFLOP/s , 173483.5 tokens/s INFO:__main__:2024-11-30 10:45:29 | Epoch: 0 | Step: 351930 | Dataset: 0-10783560 | Loss: 0.398 | 598 ms/step , 115315.57 GFLOP/s , 173463.7 tokens/s INFO:__main__:2024-11-30 10:45:36 | Epoch: 0 | Step: 351940 | Dataset: 0-10785960 | Loss: 0.419 | 598 ms/step , 115404.07 GFLOP/s , 173510.1 tokens/s INFO:__main__:2024-11-30 10:45:43 | Epoch: 0 | Step: 351950 | Dataset: 0-10788360 | Loss: 0.423 | 597 ms/step , 115542.37 GFLOP/s , 173473.8 tokens/s INFO:__main__:2024-11-30 10:45:51 | Epoch: 0 | Step: 351960 | Dataset: 0-10790760 | Loss: 0.401 | 595 ms/step , 115909.54 GFLOP/s , 173472.1 tokens/s INFO:__main__:2024-11-30 10:45:58 | Epoch: 0 | Step: 351970 | Dataset: 0-10793160 | Loss: 0.396 | 597 ms/step , 115691.49 GFLOP/s , 173422.5 tokens/s INFO:__main__:2024-11-30 10:46:05 | Epoch: 0 | Step: 351980 | Dataset: 0-10795560 | Loss: 0.417 | 595 ms/step , 115900.61 GFLOP/s , 173487.6 tokens/s INFO:__main__:2024-11-30 10:46:12 | Epoch: 0 | Step: 351990 | Dataset: 0-10797960 | Loss: 0.387 | 596 ms/step , 115874.49 GFLOP/s , 173543.7 tokens/s INFO:__main__:2024-11-30 10:46:19 | Validation | Step: 352000 | Val_loss: 0.702 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 10:46:19 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_104619_step_352000.pt` INFO:__main__:2024-11-30 10:46:22 | Epoch: 0 | Step: 352000 | Dataset: 0-10800360 | Loss: 0.396 | 595 ms/step , 115973.47 GFLOP/s , 121288.3 tokens/s INFO:__main__:2024-11-30 10:46:29 | Epoch: 0 | Step: 352010 | Dataset: 0-10802760 | Loss: 0.432 | 597 ms/step , 115528.85 GFLOP/s , 173630.5 tokens/s INFO:__main__:2024-11-30 10:46:36 | Epoch: 0 | Step: 352020 | Dataset: 0-10805160 | Loss: 0.440 | 598 ms/step , 115461.39 GFLOP/s , 173527.5 tokens/s 
INFO:__main__:2024-11-30 10:46:43 | Epoch: 0 | Step: 352030 | Dataset: 0-10807560 | Loss: 0.465 | 597 ms/step , 115511.15 GFLOP/s , 173582.5 tokens/s INFO:__main__:2024-11-30 10:46:50 | Epoch: 0 | Step: 352040 | Dataset: 0-10809960 | Loss: 0.430 | 596 ms/step , 115779.95 GFLOP/s , 173591.7 tokens/s INFO:__main__:2024-11-30 10:46:57 | Epoch: 0 | Step: 352050 | Dataset: 0-10812360 | Loss: 0.386 | 596 ms/step , 115862.75 GFLOP/s , 173591.7 tokens/s INFO:__main__:2024-11-30 10:47:04 | Epoch: 0 | Step: 352060 | Dataset: 0-10814760 | Loss: 0.400 | 596 ms/step , 115889.29 GFLOP/s , 173479.6 tokens/s INFO:__main__:2024-11-30 10:47:11 | Epoch: 0 | Step: 352070 | Dataset: 0-10817160 | Loss: 0.432 | 596 ms/step , 115778.99 GFLOP/s , 173473.2 tokens/s INFO:__main__:2024-11-30 10:47:19 | Epoch: 0 | Step: 352080 | Dataset: 0-10819560 | Loss: 0.363 | 596 ms/step , 115731.88 GFLOP/s , 173563.2 tokens/s INFO:__main__:2024-11-30 10:47:26 | Epoch: 0 | Step: 352090 | Dataset: 0-10821960 | Loss: 0.415 | 597 ms/step , 115673.34 GFLOP/s , 173528.8 tokens/s INFO:__main__:2024-11-30 10:47:33 | Epoch: 0 | Step: 352100 | Dataset: 0-10824360 | Loss: 0.402 | 596 ms/step , 115734.35 GFLOP/s , 173413.1 tokens/s INFO:__main__:2024-11-30 10:47:40 | Epoch: 0 | Step: 352110 | Dataset: 0-10826760 | Loss: 0.373 | 596 ms/step , 115733.07 GFLOP/s , 173522.9 tokens/s INFO:__main__:2024-11-30 10:47:47 | Epoch: 0 | Step: 352120 | Dataset: 0-10829160 | Loss: 0.377 | 597 ms/step , 115651.60 GFLOP/s , 173471.2 tokens/s INFO:__main__:2024-11-30 10:47:54 | Epoch: 0 | Step: 352130 | Dataset: 0-10831560 | Loss: 0.460 | 596 ms/step , 115785.76 GFLOP/s , 173463.0 tokens/s INFO:__main__:2024-11-30 10:48:01 | Epoch: 0 | Step: 352140 | Dataset: 0-10833960 | Loss: 0.436 | 596 ms/step , 115867.60 GFLOP/s , 173492.5 tokens/s INFO:__main__:2024-11-30 10:48:08 | Epoch: 0 | Step: 352150 | Dataset: 0-10836360 | Loss: 0.450 | 596 ms/step , 115855.66 GFLOP/s , 173374.8 tokens/s INFO:__main__:2024-11-30 10:48:15 | Epoch: 0 | 
Step: 352160 | Dataset: 0-10838760 | Loss: 0.451 | 597 ms/step , 115634.64 GFLOP/s , 173503.9 tokens/s INFO:__main__:2024-11-30 10:48:22 | Epoch: 0 | Step: 352170 | Dataset: 0-10841160 | Loss: 0.396 | 596 ms/step , 115825.34 GFLOP/s , 173517.0 tokens/s INFO:__main__:2024-11-30 10:48:29 | Epoch: 0 | Step: 352180 | Dataset: 0-10843560 | Loss: 0.405 | 596 ms/step , 115711.14 GFLOP/s , 173511.8 tokens/s INFO:__main__:2024-11-30 10:48:36 | Epoch: 0 | Step: 352190 | Dataset: 0-10845960 | Loss: 0.390 | 597 ms/step , 115691.55 GFLOP/s , 173520.9 tokens/s INFO:__main__:2024-11-30 10:48:44 | Epoch: 0 | Step: 352200 | Dataset: 0-10848360 | Loss: 0.416 | 596 ms/step , 115876.04 GFLOP/s , 173485.8 tokens/s INFO:__main__:2024-11-30 10:48:51 | Epoch: 0 | Step: 352210 | Dataset: 0-10850760 | Loss: 0.418 | 596 ms/step , 115843.54 GFLOP/s , 173525.0 tokens/s INFO:__main__:2024-11-30 10:48:58 | Epoch: 0 | Step: 352220 | Dataset: 0-10853160 | Loss: 0.387 | 597 ms/step , 115654.32 GFLOP/s , 173512.9 tokens/s INFO:__main__:2024-11-30 10:49:05 | Epoch: 0 | Step: 352230 | Dataset: 0-10855560 | Loss: 0.394 | 596 ms/step , 115753.62 GFLOP/s , 173342.7 tokens/s INFO:__main__:2024-11-30 10:49:12 | Epoch: 0 | Step: 352240 | Dataset: 0-10857960 | Loss: 0.383 | 596 ms/step , 115730.71 GFLOP/s , 173443.0 tokens/s INFO:__main__:2024-11-30 10:49:19 | Epoch: 0 | Step: 352250 | Dataset: 0-10860360 | Loss: 0.399 | 595 ms/step , 115988.33 GFLOP/s , 173429.2 tokens/s INFO:__main__:2024-11-30 10:49:26 | Epoch: 0 | Step: 352260 | Dataset: 0-10862760 | Loss: 0.359 | 595 ms/step , 115942.82 GFLOP/s , 173469.4 tokens/s INFO:__main__:2024-11-30 10:49:33 | Epoch: 0 | Step: 352270 | Dataset: 0-10865160 | Loss: 0.390 | 597 ms/step , 115620.53 GFLOP/s , 173375.1 tokens/s INFO:__main__:2024-11-30 10:49:40 | Epoch: 0 | Step: 352280 | Dataset: 0-10867560 | Loss: 0.409 | 596 ms/step , 115813.91 GFLOP/s , 173384.3 tokens/s INFO:__main__:2024-11-30 10:49:47 | Epoch: 0 | Step: 352290 | Dataset: 0-10869960 | Loss: 0.399 
| 596 ms/step , 115779.56 GFLOP/s , 173498.1 tokens/s INFO:__main__:2024-11-30 10:49:54 | Epoch: 0 | Step: 352300 | Dataset: 0-10872360 | Loss: 0.352 | 595 ms/step , 115956.68 GFLOP/s , 173509.7 tokens/s INFO:__main__:2024-11-30 10:50:01 | Epoch: 0 | Step: 352310 | Dataset: 0-10874760 | Loss: 0.439 | 596 ms/step , 115856.36 GFLOP/s , 173496.4 tokens/s INFO:__main__:2024-11-30 10:50:09 | Epoch: 0 | Step: 352320 | Dataset: 0-10877160 | Loss: 0.439 | 597 ms/step , 115635.39 GFLOP/s , 173384.9 tokens/s INFO:__main__:2024-11-30 10:50:16 | Epoch: 0 | Step: 352330 | Dataset: 0-10879560 | Loss: 0.321 | 596 ms/step , 115795.26 GFLOP/s , 173408.2 tokens/s INFO:__main__:2024-11-30 10:50:23 | Epoch: 0 | Step: 352340 | Dataset: 0-10881960 | Loss: 0.341 | 595 ms/step , 115944.23 GFLOP/s , 173523.0 tokens/s INFO:__main__:2024-11-30 10:50:30 | Epoch: 0 | Step: 352350 | Dataset: 0-10884360 | Loss: 0.383 | 596 ms/step , 115749.96 GFLOP/s , 173478.6 tokens/s INFO:__main__:2024-11-30 10:50:37 | Epoch: 0 | Step: 352360 | Dataset: 0-10886760 | Loss: 0.363 | 596 ms/step , 115800.59 GFLOP/s , 173394.3 tokens/s INFO:__main__:2024-11-30 10:50:44 | Epoch: 0 | Step: 352370 | Dataset: 0-10889160 | Loss: 0.347 | 596 ms/step , 115755.95 GFLOP/s , 173445.7 tokens/s INFO:__main__:2024-11-30 10:50:51 | Epoch: 0 | Step: 352380 | Dataset: 0-10891560 | Loss: 0.353 | 596 ms/step , 115770.76 GFLOP/s , 173424.2 tokens/s INFO:__main__:2024-11-30 10:50:58 | Epoch: 0 | Step: 352390 | Dataset: 0-10893960 | Loss: 0.432 | 595 ms/step , 115913.56 GFLOP/s , 173487.7 tokens/s INFO:__main__:2024-11-30 10:51:05 | Epoch: 0 | Step: 352400 | Dataset: 0-10896360 | Loss: 0.346 | 596 ms/step , 115747.47 GFLOP/s , 173857.5 tokens/s INFO:__main__:2024-11-30 10:51:12 | Epoch: 0 | Step: 352410 | Dataset: 0-10898760 | Loss: 0.322 | 596 ms/step , 115742.75 GFLOP/s , 173934.1 tokens/s INFO:__main__:2024-11-30 10:51:19 | Epoch: 0 | Step: 352420 | Dataset: 0-10901160 | Loss: 0.364 | 596 ms/step , 115830.98 GFLOP/s , 173978.7 
tokens/s INFO:__main__:2024-11-30 10:51:26 | Epoch: 0 | Step: 352430 | Dataset: 0-10903560 | Loss: 0.398 | 597 ms/step , 115586.09 GFLOP/s , 173915.4 tokens/s INFO:__main__:2024-11-30 10:51:33 | Epoch: 0 | Step: 352440 | Dataset: 0-10905960 | Loss: 0.365 | 596 ms/step , 115742.12 GFLOP/s , 173986.5 tokens/s INFO:__main__:2024-11-30 10:51:41 | Epoch: 0 | Step: 352450 | Dataset: 0-10908360 | Loss: 0.374 | 601 ms/step , 114863.39 GFLOP/s , 173936.5 tokens/s INFO:__main__:2024-11-30 10:51:48 | Epoch: 0 | Step: 352460 | Dataset: 0-10910760 | Loss: 0.327 | 596 ms/step , 115875.43 GFLOP/s , 173865.5 tokens/s INFO:__main__:2024-11-30 10:51:55 | Epoch: 0 | Step: 352470 | Dataset: 0-10913160 | Loss: 0.366 | 596 ms/step , 115824.25 GFLOP/s , 173940.4 tokens/s INFO:__main__:2024-11-30 10:52:02 | Epoch: 0 | Step: 352480 | Dataset: 0-10915560 | Loss: 0.314 | 595 ms/step , 115900.18 GFLOP/s , 173904.2 tokens/s INFO:__main__:2024-11-30 10:52:09 | Epoch: 0 | Step: 352490 | Dataset: 0-10917960 | Loss: 0.426 | 597 ms/step , 115671.35 GFLOP/s , 173941.5 tokens/s INFO:__main__:2024-11-30 10:52:16 | Validation | Step: 352500 | Val_loss: 0.683 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 10:52:17 | Epoch: 0 | Step: 352500 | Dataset: 0-10920360 | Loss: 0.368 | 597 ms/step , 115567.52 GFLOP/s , 147923.0 tokens/s INFO:__main__:2024-11-30 10:52:24 | Epoch: 0 | Step: 352510 | Dataset: 0-10922760 | Loss: 0.369 | 596 ms/step , 115824.82 GFLOP/s , 173549.0 tokens/s INFO:__main__:2024-11-30 10:52:31 | Epoch: 0 | Step: 352520 | Dataset: 0-10925160 | Loss: 0.314 | 596 ms/step , 115846.89 GFLOP/s , 173565.0 tokens/s INFO:__main__:2024-11-30 10:52:38 | Epoch: 0 | Step: 352530 | Dataset: 0-10927560 | Loss: 0.325 | 596 ms/step , 115787.30 GFLOP/s , 173516.9 tokens/s INFO:__main__:2024-11-30 10:52:45 | Epoch: 0 | Step: 352540 | Dataset: 0-10929960 | Loss: 0.303 | 596 ms/step , 115807.08 GFLOP/s , 173444.9 tokens/s INFO:__main__:2024-11-30 10:52:53 | Epoch: 0 | Step: 352550 | Dataset: 0-10932360 | 
Loss: 0.358 | 596 ms/step , 115782.05 GFLOP/s , 173363.9 tokens/s INFO:__main__:2024-11-30 10:53:00 | Epoch: 0 | Step: 352560 | Dataset: 0-10934760 | Loss: 0.337 | 596 ms/step , 115827.68 GFLOP/s , 173900.8 tokens/s INFO:__main__:2024-11-30 10:53:07 | Epoch: 0 | Step: 352570 | Dataset: 0-10937160 | Loss: 0.391 | 596 ms/step , 115752.26 GFLOP/s , 174009.8 tokens/s INFO:__main__:2024-11-30 10:53:14 | Epoch: 0 | Step: 352580 | Dataset: 0-10939560 | Loss: 0.387 | 597 ms/step , 115555.57 GFLOP/s , 173934.1 tokens/s INFO:__main__:2024-11-30 10:53:21 | Epoch: 0 | Step: 352590 | Dataset: 0-10941960 | Loss: 0.383 | 595 ms/step , 115926.49 GFLOP/s , 173991.1 tokens/s INFO:__main__:2024-11-30 10:53:28 | Epoch: 0 | Step: 352600 | Dataset: 0-10944360 | Loss: 0.351 | 595 ms/step , 115932.24 GFLOP/s , 174042.2 tokens/s INFO:__main__:2024-11-30 10:53:35 | Epoch: 0 | Step: 352610 | Dataset: 0-10946760 | Loss: 0.386 | 596 ms/step , 115889.03 GFLOP/s , 174090.2 tokens/s INFO:__main__:2024-11-30 10:53:42 | Epoch: 0 | Step: 352620 | Dataset: 0-10949160 | Loss: 0.367 | 596 ms/step , 115862.29 GFLOP/s , 174009.1 tokens/s INFO:__main__:2024-11-30 10:53:49 | Epoch: 0 | Step: 352630 | Dataset: 0-10951560 | Loss: 0.385 | 595 ms/step , 115940.25 GFLOP/s , 174031.1 tokens/s INFO:__main__:2024-11-30 10:53:56 | Epoch: 0 | Step: 352640 | Dataset: 0-10953960 | Loss: 0.370 | 596 ms/step , 115888.52 GFLOP/s , 174063.2 tokens/s INFO:__main__:2024-11-30 10:54:03 | Epoch: 0 | Step: 352650 | Dataset: 0-10956360 | Loss: 0.371 | 595 ms/step , 115982.41 GFLOP/s , 174026.7 tokens/s INFO:__main__:2024-11-30 10:54:10 | Epoch: 0 | Step: 352660 | Dataset: 0-10958760 | Loss: 0.357 | 596 ms/step , 115773.85 GFLOP/s , 174027.5 tokens/s INFO:__main__:2024-11-30 10:54:17 | Epoch: 0 | Step: 352670 | Dataset: 0-10961160 | Loss: 0.357 | 596 ms/step , 115871.14 GFLOP/s , 174038.3 tokens/s INFO:__main__:2024-11-30 10:54:24 | Epoch: 0 | Step: 352680 | Dataset: 0-10963560 | Loss: 0.328 | 597 ms/step , 115609.52 GFLOP/s , 
173974.2 tokens/s INFO:__main__:2024-11-30 10:54:31 | Epoch: 0 | Step: 352690 | Dataset: 0-10965960 | Loss: 0.357 | 596 ms/step , 115779.72 GFLOP/s , 173855.4 tokens/s INFO:__main__:2024-11-30 10:54:38 | Epoch: 0 | Step: 352700 | Dataset: 0-10968360 | Loss: 0.333 | 597 ms/step , 115636.50 GFLOP/s , 173965.3 tokens/s INFO:__main__:2024-11-30 10:54:46 | Epoch: 0 | Step: 352710 | Dataset: 0-10970760 | Loss: 0.369 | 596 ms/step , 115821.77 GFLOP/s , 173889.5 tokens/s INFO:__main__:2024-11-30 10:54:53 | Epoch: 0 | Step: 352720 | Dataset: 0-10973160 | Loss: 0.347 | 596 ms/step , 115725.64 GFLOP/s , 173969.9 tokens/s INFO:__main__:2024-11-30 10:55:00 | Epoch: 0 | Step: 352730 | Dataset: 0-10975560 | Loss: 0.363 | 596 ms/step , 115726.41 GFLOP/s , 173946.9 tokens/s INFO:__main__:2024-11-30 10:55:07 | Epoch: 0 | Step: 352740 | Dataset: 0-10977960 | Loss: 0.343 | 597 ms/step , 115619.41 GFLOP/s , 173894.7 tokens/s INFO:__main__:2024-11-30 10:55:14 | Epoch: 0 | Step: 352750 | Dataset: 0-10980360 | Loss: 0.334 | 597 ms/step , 115662.88 GFLOP/s , 173904.8 tokens/s INFO:__main__:2024-11-30 10:55:21 | Epoch: 0 | Step: 352760 | Dataset: 0-10982760 | Loss: 0.335 | 595 ms/step , 115890.30 GFLOP/s , 173889.8 tokens/s INFO:__main__:2024-11-30 10:55:28 | Epoch: 0 | Step: 352770 | Dataset: 0-10985160 | Loss: 0.347 | 596 ms/step , 115806.81 GFLOP/s , 174006.1 tokens/s INFO:__main__:2024-11-30 10:55:35 | Epoch: 0 | Step: 352780 | Dataset: 0-10987560 | Loss: 0.358 | 596 ms/step , 115873.07 GFLOP/s , 174077.5 tokens/s INFO:__main__:2024-11-30 10:55:42 | Epoch: 0 | Step: 352790 | Dataset: 0-10989960 | Loss: 0.312 | 596 ms/step , 115868.94 GFLOP/s , 173996.2 tokens/s INFO:__main__:2024-11-30 10:55:49 | Epoch: 0 | Step: 352800 | Dataset: 0-10992360 | Loss: 0.331 | 596 ms/step , 115829.82 GFLOP/s , 174047.9 tokens/s INFO:__main__:2024-11-30 10:55:56 | Epoch: 0 | Step: 352810 | Dataset: 0-10994760 | Loss: 0.366 | 595 ms/step , 115895.00 GFLOP/s , 173975.9 tokens/s INFO:__main__:2024-11-30 
10:56:03 | Epoch: 0 | Step: 352820 | Dataset: 0-10997160 | Loss: 0.380 | 596 ms/step , 115806.04 GFLOP/s , 174059.8 tokens/s INFO:__main__:2024-11-30 10:56:10 | Epoch: 0 | Step: 352830 | Dataset: 0-10999560 | Loss: 0.402 | 596 ms/step , 115877.73 GFLOP/s , 173941.3 tokens/s INFO:__main__:2024-11-30 10:56:17 | Epoch: 0 | Step: 352840 | Dataset: 0-11001960 | Loss: 0.347 | 596 ms/step , 115839.20 GFLOP/s , 173963.5 tokens/s INFO:__main__:2024-11-30 10:56:24 | Epoch: 0 | Step: 352850 | Dataset: 0-11004360 | Loss: 0.362 | 596 ms/step , 115791.64 GFLOP/s , 174012.1 tokens/s INFO:__main__:2024-11-30 10:56:31 | Epoch: 0 | Step: 352860 | Dataset: 0-11006760 | Loss: 0.334 | 595 ms/step , 115988.75 GFLOP/s , 173842.4 tokens/s INFO:__main__:2024-11-30 10:56:39 | Epoch: 0 | Step: 352870 | Dataset: 0-11009160 | Loss: 0.351 | 596 ms/step , 115875.16 GFLOP/s , 174038.0 tokens/s INFO:__main__:2024-11-30 10:56:46 | Epoch: 0 | Step: 352880 | Dataset: 0-11011560 | Loss: 0.672 | 597 ms/step , 115643.80 GFLOP/s , 173952.0 tokens/s INFO:__main__:2024-11-30 10:56:53 | Epoch: 0 | Step: 352890 | Dataset: 0-11013960 | Loss: 0.682 | 596 ms/step , 115881.01 GFLOP/s , 173740.0 tokens/s INFO:__main__:2024-11-30 10:57:00 | Epoch: 0 | Step: 352900 | Dataset: 0-11016360 | Loss: 0.615 | 598 ms/step , 115499.22 GFLOP/s , 173894.1 tokens/s INFO:__main__:2024-11-30 10:57:07 | Epoch: 0 | Step: 352910 | Dataset: 0-11018760 | Loss: 0.681 | 596 ms/step , 115813.38 GFLOP/s , 173885.0 tokens/s INFO:__main__:2024-11-30 10:57:14 | Epoch: 0 | Step: 352920 | Dataset: 0-11021160 | Loss: 0.645 | 595 ms/step , 115902.37 GFLOP/s , 173878.0 tokens/s INFO:__main__:2024-11-30 10:57:21 | Epoch: 0 | Step: 352930 | Dataset: 0-11023560 | Loss: 0.746 | 596 ms/step , 115822.48 GFLOP/s , 173809.5 tokens/s INFO:__main__:2024-11-30 10:57:28 | Epoch: 0 | Step: 352940 | Dataset: 0-11025960 | Loss: 0.701 | 596 ms/step , 115731.23 GFLOP/s , 173947.5 tokens/s INFO:__main__:2024-11-30 10:57:35 | Epoch: 0 | Step: 352950 | Dataset: 
0-11028360 | Loss: 0.696 | 595 ms/step , 115901.41 GFLOP/s , 173955.6 tokens/s INFO:__main__:2024-11-30 10:57:42 | Epoch: 0 | Step: 352960 | Dataset: 0-11030760 | Loss: 0.717 | 595 ms/step , 115890.45 GFLOP/s , 173937.2 tokens/s INFO:__main__:2024-11-30 10:57:49 | Epoch: 0 | Step: 352970 | Dataset: 0-11033160 | Loss: 0.600 | 597 ms/step , 115690.28 GFLOP/s , 173932.2 tokens/s INFO:__main__:2024-11-30 10:57:56 | Epoch: 0 | Step: 352980 | Dataset: 0-11035560 | Loss: 0.579 | 595 ms/step , 115952.53 GFLOP/s , 173927.8 tokens/s INFO:__main__:2024-11-30 10:58:03 | Epoch: 0 | Step: 352990 | Dataset: 0-11037960 | Loss: 0.609 | 596 ms/step , 115822.44 GFLOP/s , 173851.3 tokens/s INFO:__main__:2024-11-30 10:58:11 | Validation | Step: 353000 | Val_loss: 0.660 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 10:58:11 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_105811_step_353000.pt` INFO:__main__:2024-11-30 10:58:13 | Epoch: 0 | Step: 353000 | Dataset: 0-11040360 | Loss: 0.736 | 594 ms/step , 116100.80 GFLOP/s , 121992.8 tokens/s INFO:__main__:2024-11-30 10:58:20 | Epoch: 0 | Step: 353010 | Dataset: 0-11042760 | Loss: 0.708 | 597 ms/step , 115523.95 GFLOP/s , 173652.7 tokens/s INFO:__main__:2024-11-30 10:58:28 | Epoch: 0 | Step: 353020 | Dataset: 0-11045160 | Loss: 0.644 | 597 ms/step , 115594.37 GFLOP/s , 173652.1 tokens/s INFO:__main__:2024-11-30 10:58:35 | Epoch: 0 | Step: 353030 | Dataset: 0-11047560 | Loss: 0.814 | 597 ms/step , 115588.53 GFLOP/s , 173647.5 tokens/s INFO:__main__:2024-11-30 10:58:42 | Epoch: 0 | Step: 353040 | Dataset: 0-11049960 | Loss: 0.593 | 597 ms/step , 115523.64 GFLOP/s , 173639.3 tokens/s INFO:__main__:2024-11-30 10:58:49 | Epoch: 0 | Step: 353050 | Dataset: 0-11052360 | Loss: 0.662 | 598 ms/step , 115314.88 GFLOP/s , 173567.9 tokens/s INFO:__main__:2024-11-30 10:58:56 | Epoch: 0 | Step: 353060 | Dataset: 0-11054760 | Loss: 0.640 | 598 ms/step , 115458.43 GFLOP/s , 173604.7 tokens/s INFO:__main__:2024-11-30 
10:59:03 | Epoch: 0 | Step: 353070 | Dataset: 0-11057160 | Loss: 0.520 | 597 ms/step , 115598.53 GFLOP/s , 173572.9 tokens/s INFO:__main__:2024-11-30 10:59:10 | Epoch: 0 | Step: 353080 | Dataset: 0-11059560 | Loss: 0.658 | 598 ms/step , 115388.24 GFLOP/s , 173576.1 tokens/s INFO:__main__:2024-11-30 10:59:17 | Epoch: 0 | Step: 353090 | Dataset: 0-11061960 | Loss: 0.571 | 598 ms/step , 115353.08 GFLOP/s , 173510.1 tokens/s INFO:__main__:2024-11-30 10:59:24 | Epoch: 0 | Step: 353100 | Dataset: 0-11064360 | Loss: 0.634 | 597 ms/step , 115525.40 GFLOP/s , 173596.5 tokens/s INFO:__main__:2024-11-30 10:59:31 | Epoch: 0 | Step: 353110 | Dataset: 0-11066760 | Loss: 0.630 | 599 ms/step , 115272.90 GFLOP/s , 173570.7 tokens/s INFO:__main__:2024-11-30 10:59:38 | Epoch: 0 | Step: 353120 | Dataset: 0-11069160 | Loss: 0.563 | 596 ms/step , 115710.27 GFLOP/s , 173623.6 tokens/s INFO:__main__:2024-11-30 10:59:45 | Epoch: 0 | Step: 353130 | Dataset: 0-11071560 | Loss: 0.621 | 598 ms/step , 115462.07 GFLOP/s , 173488.1 tokens/s INFO:__main__:2024-11-30 10:59:53 | Epoch: 0 | Step: 353140 | Dataset: 0-11073960 | Loss: 0.557 | 598 ms/step , 115431.11 GFLOP/s , 173471.2 tokens/s INFO:__main__:2024-11-30 11:00:00 | Epoch: 0 | Step: 353150 | Dataset: 0-11076360 | Loss: 0.665 | 598 ms/step , 115388.32 GFLOP/s , 173503.9 tokens/s INFO:__main__:2024-11-30 11:00:07 | Epoch: 0 | Step: 353160 | Dataset: 0-11078760 | Loss: 0.635 | 597 ms/step , 115567.64 GFLOP/s , 173471.2 tokens/s INFO:__main__:2024-11-30 11:00:14 | Epoch: 0 | Step: 353170 | Dataset: 0-11081160 | Loss: 0.692 | 598 ms/step , 115411.60 GFLOP/s , 173360.4 tokens/s INFO:__main__:2024-11-30 11:00:21 | Epoch: 0 | Step: 353180 | Dataset: 0-11083560 | Loss: 0.703 | 598 ms/step , 115436.96 GFLOP/s , 173318.8 tokens/s INFO:__main__:2024-11-30 11:00:28 | Epoch: 0 | Step: 353190 | Dataset: 0-11085960 | Loss: 0.590 | 598 ms/step , 115453.52 GFLOP/s , 173451.4 tokens/s INFO:__main__:2024-11-30 11:00:35 | Epoch: 0 | Step: 353200 | Dataset: 
0-11088360 | Loss: 0.629 | 598 ms/step , 115328.16 GFLOP/s , 173416.0 tokens/s INFO:__main__:2024-11-30 11:00:42 | Epoch: 0 | Step: 353210 | Dataset: 0-11090760 | Loss: 0.595 | 598 ms/step , 115344.14 GFLOP/s , 173420.8 tokens/s INFO:__main__:2024-11-30 11:00:49 | Epoch: 0 | Step: 353220 | Dataset: 0-11093160 | Loss: 0.701 | 598 ms/step , 115405.00 GFLOP/s , 173355.5 tokens/s INFO:__main__:2024-11-30 11:00:56 | Epoch: 0 | Step: 353230 | Dataset: 0-11095560 | Loss: 0.631 | 598 ms/step , 115496.33 GFLOP/s , 173431.2 tokens/s INFO:__main__:2024-11-30 11:01:03 | Epoch: 0 | Step: 353240 | Dataset: 0-11097960 | Loss: 0.704 | 598 ms/step , 115419.69 GFLOP/s , 173445.6 tokens/s INFO:__main__:2024-11-30 11:01:10 | Epoch: 0 | Step: 353250 | Dataset: 0-11100360 | Loss: 0.591 | 598 ms/step , 115428.25 GFLOP/s , 173410.7 tokens/s INFO:__main__:2024-11-30 11:01:18 | Epoch: 0 | Step: 353260 | Dataset: 0-11102760 | Loss: 0.659 | 598 ms/step , 115476.94 GFLOP/s , 173457.6 tokens/s INFO:__main__:2024-11-30 11:01:25 | Epoch: 0 | Step: 353270 | Dataset: 0-11105160 | Loss: 0.603 | 597 ms/step , 115613.94 GFLOP/s , 173548.6 tokens/s INFO:__main__:2024-11-30 11:01:32 | Epoch: 0 | Step: 353280 | Dataset: 0-11107560 | Loss: 0.632 | 597 ms/step , 115503.02 GFLOP/s , 173604.4 tokens/s INFO:__main__:2024-11-30 11:01:39 | Epoch: 0 | Step: 353290 | Dataset: 0-11109960 | Loss: 0.551 | 597 ms/step , 115661.48 GFLOP/s , 173546.7 tokens/s INFO:__main__:2024-11-30 11:01:46 | Epoch: 0 | Step: 353300 | Dataset: 0-11112360 | Loss: 0.654 | 598 ms/step , 115429.06 GFLOP/s , 173526.6 tokens/s INFO:__main__:2024-11-30 11:01:53 | Epoch: 0 | Step: 353310 | Dataset: 0-11114760 | Loss: 0.659 | 598 ms/step , 115486.91 GFLOP/s , 173453.4 tokens/s INFO:__main__:2024-11-30 11:02:00 | Epoch: 0 | Step: 353320 | Dataset: 0-11117160 | Loss: 0.653 | 598 ms/step , 115332.27 GFLOP/s , 173450.4 tokens/s INFO:__main__:2024-11-30 11:02:07 | Epoch: 0 | Step: 353330 | Dataset: 0-11119560 | Loss: 0.684 | 597 ms/step , 
115534.82 GFLOP/s , 173491.7 tokens/s INFO:__main__:2024-11-30 11:02:14 | Epoch: 0 | Step: 353340 | Dataset: 0-11121960 | Loss: 0.607 | 597 ms/step , 115546.02 GFLOP/s , 173514.9 tokens/s INFO:__main__:2024-11-30 11:02:21 | Epoch: 0 | Step: 353350 | Dataset: 0-11124360 | Loss: 0.709 | 595 ms/step , 115958.39 GFLOP/s , 173523.7 tokens/s INFO:__main__:2024-11-30 11:02:28 | Epoch: 0 | Step: 353360 | Dataset: 0-11126760 | Loss: 0.666 | 595 ms/step , 115993.15 GFLOP/s , 173585.1 tokens/s INFO:__main__:2024-11-30 11:02:35 | Epoch: 0 | Step: 353370 | Dataset: 0-11129160 | Loss: 0.572 | 595 ms/step , 115929.99 GFLOP/s , 173524.4 tokens/s INFO:__main__:2024-11-30 11:02:43 | Epoch: 0 | Step: 353380 | Dataset: 0-11131560 | Loss: 0.598 | 596 ms/step , 115780.15 GFLOP/s , 173495.0 tokens/s INFO:__main__:2024-11-30 11:02:50 | Epoch: 0 | Step: 353390 | Dataset: 0-11133960 | Loss: 0.681 | 596 ms/step , 115801.46 GFLOP/s , 173423.7 tokens/s INFO:__main__:2024-11-30 11:02:57 | Epoch: 0 | Step: 353400 | Dataset: 0-11136360 | Loss: 0.746 | 597 ms/step , 115544.55 GFLOP/s , 173475.6 tokens/s INFO:__main__:2024-11-30 11:03:04 | Epoch: 0 | Step: 353410 | Dataset: 0-11138760 | Loss: 0.706 | 596 ms/step , 115865.37 GFLOP/s , 173534.2 tokens/s INFO:__main__:2024-11-30 11:03:11 | Epoch: 0 | Step: 353420 | Dataset: 0-11141160 | Loss: 0.619 | 596 ms/step , 115815.41 GFLOP/s , 173452.1 tokens/s INFO:__main__:2024-11-30 11:03:18 | Epoch: 0 | Step: 353430 | Dataset: 0-11143560 | Loss: 0.482 | 595 ms/step , 115987.35 GFLOP/s , 173683.5 tokens/s INFO:__main__:2024-11-30 11:03:25 | Epoch: 0 | Step: 353440 | Dataset: 0-11145960 | Loss: 0.436 | 595 ms/step , 115927.63 GFLOP/s , 173664.2 tokens/s INFO:__main__:2024-11-30 11:03:32 | Epoch: 0 | Step: 353450 | Dataset: 0-11148360 | Loss: 0.447 | 596 ms/step , 115699.42 GFLOP/s , 173638.4 tokens/s INFO:__main__:2024-11-30 11:03:39 | Epoch: 0 | Step: 353460 | Dataset: 0-11150760 | Loss: 0.404 | 595 ms/step , 115907.82 GFLOP/s , 173597.3 tokens/s 
INFO:__main__:2024-11-30 11:03:46 | Epoch: 0 | Step: 353470 | Dataset: 0-11153160 | Loss: 0.458 | 595 ms/step , 115942.13 GFLOP/s , 173558.7 tokens/s INFO:__main__:2024-11-30 11:03:53 | Epoch: 0 | Step: 353480 | Dataset: 0-11155560 | Loss: 0.414 | 596 ms/step , 115814.07 GFLOP/s , 173634.0 tokens/s INFO:__main__:2024-11-30 11:04:00 | Epoch: 0 | Step: 353490 | Dataset: 0-11157960 | Loss: 0.546 | 595 ms/step , 115912.02 GFLOP/s , 173660.1 tokens/s INFO:__main__:2024-11-30 11:04:08 | Validation | Step: 353500 | Val_loss: 0.679 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 11:04:09 | Epoch: 0 | Step: 353500 | Dataset: 0-11160360 | Loss: 0.452 | 593 ms/step , 116362.17 GFLOP/s , 147651.8 tokens/s INFO:__main__:2024-11-30 11:04:16 | Epoch: 0 | Step: 353510 | Dataset: 0-11162760 | Loss: 0.455 | 595 ms/step , 116018.23 GFLOP/s , 173821.7 tokens/s INFO:__main__:2024-11-30 11:04:23 | Epoch: 0 | Step: 353520 | Dataset: 0-11165160 | Loss: 0.479 | 595 ms/step , 115974.95 GFLOP/s , 173713.9 tokens/s INFO:__main__:2024-11-30 11:04:30 | Epoch: 0 | Step: 353530 | Dataset: 0-11167560 | Loss: 0.436 | 595 ms/step , 116019.40 GFLOP/s , 173704.2 tokens/s INFO:__main__:2024-11-30 11:04:37 | Epoch: 0 | Step: 353540 | Dataset: 0-11169960 | Loss: 0.410 | 595 ms/step , 116012.53 GFLOP/s , 173606.4 tokens/s INFO:__main__:2024-11-30 11:04:44 | Epoch: 0 | Step: 353550 | Dataset: 0-11172360 | Loss: 0.391 | 597 ms/step , 115678.11 GFLOP/s , 173580.0 tokens/s INFO:__main__:2024-11-30 11:04:51 | Epoch: 0 | Step: 353560 | Dataset: 0-11174760 | Loss: 0.421 | 596 ms/step , 115868.59 GFLOP/s , 173595.0 tokens/s INFO:__main__:2024-11-30 11:04:58 | Epoch: 0 | Step: 353570 | Dataset: 0-11177160 | Loss: 0.388 | 595 ms/step , 116012.02 GFLOP/s , 173589.1 tokens/s INFO:__main__:2024-11-30 11:05:05 | Epoch: 0 | Step: 353580 | Dataset: 0-11179560 | Loss: 0.387 | 596 ms/step , 115835.28 GFLOP/s , 173670.1 tokens/s INFO:__main__:2024-11-30 11:05:12 | Epoch: 0 | Step: 353590 | Dataset: 0-11181960 | Loss: 0.404 
| 595 ms/step , 116049.86 GFLOP/s , 173702.3 tokens/s INFO:__main__:2024-11-30 11:05:19 | Epoch: 0 | Step: 353600 | Dataset: 0-11184360 | Loss: 0.384 | 595 ms/step , 115975.17 GFLOP/s , 173716.8 tokens/s INFO:__main__:2024-11-30 11:05:27 | Epoch: 0 | Step: 353610 | Dataset: 0-11186760 | Loss: 0.393 | 595 ms/step , 115986.21 GFLOP/s , 173654.3 tokens/s INFO:__main__:2024-11-30 11:05:34 | Epoch: 0 | Step: 353620 | Dataset: 0-11189160 | Loss: 0.392 | 595 ms/step , 116018.42 GFLOP/s , 173637.0 tokens/s INFO:__main__:2024-11-30 11:05:41 | Epoch: 0 | Step: 353630 | Dataset: 0-11191560 | Loss: 0.352 | 595 ms/step , 116079.83 GFLOP/s , 173715.9 tokens/s INFO:__main__:2024-11-30 11:05:48 | Epoch: 0 | Step: 353640 | Dataset: 0-11193960 | Loss: 0.460 | 595 ms/step , 115996.69 GFLOP/s , 173589.1 tokens/s INFO:__main__:2024-11-30 11:05:55 | Epoch: 0 | Step: 353650 | Dataset: 0-11196360 | Loss: 0.389 | 595 ms/step , 115923.89 GFLOP/s , 173673.2 tokens/s INFO:__main__:2024-11-30 11:06:02 | Epoch: 0 | Step: 353660 | Dataset: 0-11198760 | Loss: 0.422 | 595 ms/step , 115977.38 GFLOP/s , 173738.2 tokens/s INFO:__main__:2024-11-30 11:06:09 | Epoch: 0 | Step: 353670 | Dataset: 0-11201160 | Loss: 0.411 | 595 ms/step , 115943.27 GFLOP/s , 173612.2 tokens/s INFO:__main__:2024-11-30 11:06:16 | Epoch: 0 | Step: 353680 | Dataset: 0-11203560 | Loss: 0.399 | 595 ms/step , 115997.23 GFLOP/s , 173756.5 tokens/s INFO:__main__:2024-11-30 11:06:23 | Epoch: 0 | Step: 353690 | Dataset: 0-11205960 | Loss: 0.481 | 596 ms/step , 115764.72 GFLOP/s , 174008.8 tokens/s INFO:__main__:2024-11-30 11:06:30 | Epoch: 0 | Step: 353700 | Dataset: 0-11208360 | Loss: 0.483 | 597 ms/step , 115610.69 GFLOP/s , 173962.8 tokens/s INFO:__main__:2024-11-30 11:06:37 | Epoch: 0 | Step: 353710 | Dataset: 0-11210760 | Loss: 0.482 | 596 ms/step , 115760.16 GFLOP/s , 173992.0 tokens/s INFO:__main__:2024-11-30 11:06:44 | Epoch: 0 | Step: 353720 | Dataset: 0-11213160 | Loss: 0.439 | 596 ms/step , 115757.36 GFLOP/s , 173941.3 
tokens/s INFO:__main__:2024-11-30 11:06:51 | Epoch: 0 | Step: 353730 | Dataset: 0-11215560 | Loss: 0.469 | 596 ms/step , 115761.08 GFLOP/s , 173984.4 tokens/s INFO:__main__:2024-11-30 11:06:58 | Epoch: 0 | Step: 353740 | Dataset: 0-11217960 | Loss: 0.474 | 595 ms/step , 115952.94 GFLOP/s , 174026.7 tokens/s INFO:__main__:2024-11-30 11:07:06 | Epoch: 0 | Step: 353750 | Dataset: 0-11220360 | Loss: 0.390 | 596 ms/step , 115740.05 GFLOP/s , 174005.4 tokens/s INFO:__main__:2024-11-30 11:07:13 | Epoch: 0 | Step: 353760 | Dataset: 0-11222760 | Loss: 0.482 | 595 ms/step , 115925.63 GFLOP/s , 174044.1 tokens/s INFO:__main__:2024-11-30 11:07:20 | Epoch: 0 | Step: 353770 | Dataset: 0-11225160 | Loss: 0.514 | 596 ms/step , 115861.71 GFLOP/s , 174026.3 tokens/s INFO:__main__:2024-11-30 11:07:27 | Epoch: 0 | Step: 353780 | Dataset: 0-11227560 | Loss: 0.437 | 597 ms/step , 115670.20 GFLOP/s , 173913.1 tokens/s INFO:__main__:2024-11-30 11:07:34 | Epoch: 0 | Step: 353790 | Dataset: 0-11229960 | Loss: 0.434 | 597 ms/step , 115686.12 GFLOP/s , 173890.9 tokens/s INFO:__main__:2024-11-30 11:07:41 | Epoch: 0 | Step: 353800 | Dataset: 0-11232360 | Loss: 0.446 | 595 ms/step , 115928.40 GFLOP/s , 174042.8 tokens/s INFO:__main__:2024-11-30 11:07:48 | Epoch: 0 | Step: 353810 | Dataset: 0-11234760 | Loss: 0.495 | 597 ms/step , 115508.75 GFLOP/s , 174006.6 tokens/s INFO:__main__:2024-11-30 11:07:55 | Epoch: 0 | Step: 353820 | Dataset: 0-11237160 | Loss: 0.464 | 596 ms/step , 115830.50 GFLOP/s , 174017.0 tokens/s INFO:__main__:2024-11-30 11:08:02 | Epoch: 0 | Step: 353830 | Dataset: 0-11239560 | Loss: 0.425 | 596 ms/step , 115833.58 GFLOP/s , 173873.3 tokens/s INFO:__main__:2024-11-30 11:08:09 | Epoch: 0 | Step: 353840 | Dataset: 0-11241960 | Loss: 0.438 | 597 ms/step , 115628.42 GFLOP/s , 173932.0 tokens/s INFO:__main__:2024-11-30 11:08:16 | Epoch: 0 | Step: 353850 | Dataset: 0-11244360 | Loss: 0.502 | 596 ms/step , 115863.67 GFLOP/s , 173948.9 tokens/s INFO:__main__:2024-11-30 11:08:23 | 
Epoch: 0 | Step: 353860 | Dataset: 0-11246760 | Loss: 0.462 | 596 ms/step , 115843.50 GFLOP/s , 173935.3 tokens/s INFO:__main__:2024-11-30 11:08:30 | Epoch: 0 | Step: 353870 | Dataset: 0-11249160 | Loss: 0.423 | 596 ms/step , 115831.89 GFLOP/s , 173938.4 tokens/s INFO:__main__:2024-11-30 11:08:37 | Epoch: 0 | Step: 353880 | Dataset: 0-11251560 | Loss: 0.470 | 596 ms/step , 115740.42 GFLOP/s , 173999.7 tokens/s INFO:__main__:2024-11-30 11:08:44 | Epoch: 0 | Step: 353890 | Dataset: 0-11253960 | Loss: 0.477 | 597 ms/step , 115654.19 GFLOP/s , 173952.1 tokens/s INFO:__main__:2024-11-30 11:08:51 | Epoch: 0 | Step: 353900 | Dataset: 0-11256360 | Loss: 0.425 | 597 ms/step , 115670.10 GFLOP/s , 173976.8 tokens/s INFO:__main__:2024-11-30 11:08:59 | Epoch: 0 | Step: 353910 | Dataset: 0-11258760 | Loss: 0.459 | 596 ms/step , 115785.08 GFLOP/s , 173986.5 tokens/s INFO:__main__:2024-11-30 11:09:06 | Epoch: 0 | Step: 353920 | Dataset: 0-11261160 | Loss: 0.508 | 596 ms/step , 115853.95 GFLOP/s , 174034.8 tokens/s INFO:__main__:2024-11-30 11:09:13 | Epoch: 0 | Step: 353930 | Dataset: 0-11263560 | Loss: 0.450 | 596 ms/step , 115866.76 GFLOP/s , 173965.4 tokens/s INFO:__main__:2024-11-30 11:09:20 | Epoch: 0 | Step: 353940 | Dataset: 0-11265960 | Loss: 0.476 | 596 ms/step , 115824.73 GFLOP/s , 174023.8 tokens/s INFO:__main__:2024-11-30 11:09:27 | Epoch: 0 | Step: 353950 | Dataset: 0-11268360 | Loss: 0.442 | 596 ms/step , 115875.56 GFLOP/s , 173925.0 tokens/s INFO:__main__:2024-11-30 11:09:34 | Epoch: 0 | Step: 353960 | Dataset: 0-11270760 | Loss: 0.451 | 596 ms/step , 115769.73 GFLOP/s , 173950.5 tokens/s INFO:__main__:2024-11-30 11:09:41 | Epoch: 0 | Step: 353970 | Dataset: 0-11273160 | Loss: 0.681 | 596 ms/step , 115814.60 GFLOP/s , 173885.3 tokens/s INFO:__main__:2024-11-30 11:09:48 | Epoch: 0 | Step: 353980 | Dataset: 0-11275560 | Loss: 0.710 | 597 ms/step , 115643.75 GFLOP/s , 173838.0 tokens/s INFO:__main__:2024-11-30 11:09:55 | Epoch: 0 | Step: 353990 | Dataset: 0-11277960 | 
Loss: 0.691 | 597 ms/step , 115550.00 GFLOP/s , 173857.2 tokens/s INFO:__main__:2024-11-30 11:10:03 | Validation | Step: 354000 | Val_loss: 0.650 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 11:10:03 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_111003_step_354000.pt` INFO:__main__:2024-11-30 11:10:05 | Epoch: 0 | Step: 354000 | Dataset: 0-11280360 | Loss: 0.650 | 594 ms/step , 116109.01 GFLOP/s , 121384.4 tokens/s INFO:__main__:2024-11-30 11:10:12 | Epoch: 0 | Step: 354010 | Dataset: 0-11282760 | Loss: 0.612 | 597 ms/step , 115589.05 GFLOP/s , 173674.6 tokens/s INFO:__main__:2024-11-30 11:10:19 | Epoch: 0 | Step: 354020 | Dataset: 0-11285160 | Loss: 0.659 | 596 ms/step , 115855.21 GFLOP/s , 173535.2 tokens/s INFO:__main__:2024-11-30 11:10:26 | Epoch: 0 | Step: 354030 | Dataset: 0-11287560 | Loss: 0.670 | 596 ms/step , 115882.67 GFLOP/s , 173606.6 tokens/s INFO:__main__:2024-11-30 11:10:33 | Epoch: 0 | Step: 354040 | Dataset: 0-11289960 | Loss: 0.650 | 596 ms/step , 115790.56 GFLOP/s , 173455.7 tokens/s INFO:__main__:2024-11-30 11:10:41 | Epoch: 0 | Step: 354050 | Dataset: 0-11292360 | Loss: 0.659 | 597 ms/step , 115565.87 GFLOP/s , 173370.1 tokens/s INFO:__main__:2024-11-30 11:10:48 | Epoch: 0 | Step: 354060 | Dataset: 0-11294760 | Loss: 0.626 | 596 ms/step , 115759.53 GFLOP/s , 173440.5 tokens/s INFO:__main__:2024-11-30 11:10:55 | Epoch: 0 | Step: 354070 | Dataset: 0-11297160 | Loss: 0.638 | 598 ms/step , 115486.59 GFLOP/s , 173453.1 tokens/s INFO:__main__:2024-11-30 11:11:02 | Epoch: 0 | Step: 354080 | Dataset: 0-11299560 | Loss: 0.614 | 596 ms/step , 115730.51 GFLOP/s , 173912.6 tokens/s INFO:__main__:2024-11-30 11:11:09 | Epoch: 0 | Step: 354090 | Dataset: 0-11301960 | Loss: 0.667 | 595 ms/step , 115953.89 GFLOP/s , 173889.3 tokens/s INFO:__main__:2024-11-30 11:11:16 | Epoch: 0 | Step: 354100 | Dataset: 0-11304360 | Loss: 0.612 | 596 ms/step , 115756.83 GFLOP/s , 173881.9 tokens/s INFO:__main__:2024-11-30 11:11:23 | 
Epoch: 0 | Step: 354110 | Dataset: 0-11306760 | Loss: 0.625 | 597 ms/step , 115550.84 GFLOP/s , 173833.0 tokens/s INFO:__main__:2024-11-30 11:11:30 | Epoch: 0 | Step: 354120 | Dataset: 0-11309160 | Loss: 0.729 | 596 ms/step , 115797.09 GFLOP/s , 173913.3 tokens/s INFO:__main__:2024-11-30 11:11:37 | Epoch: 0 | Step: 354130 | Dataset: 0-11311560 | Loss: 0.655 | 597 ms/step , 115623.22 GFLOP/s , 173827.8 tokens/s INFO:__main__:2024-11-30 11:11:44 | Epoch: 0 | Step: 354140 | Dataset: 0-11313960 | Loss: 0.661 | 596 ms/step , 115696.95 GFLOP/s , 173843.8 tokens/s INFO:__main__:2024-11-30 11:11:51 | Epoch: 0 | Step: 354150 | Dataset: 0-11316360 | Loss: 0.701 | 597 ms/step , 115643.52 GFLOP/s , 173874.9 tokens/s INFO:__main__:2024-11-30 11:11:58 | Epoch: 0 | Step: 354160 | Dataset: 0-11318760 | Loss: 0.687 | 597 ms/step , 115640.93 GFLOP/s , 173820.5 tokens/s INFO:__main__:2024-11-30 11:12:05 | Epoch: 0 | Step: 354170 | Dataset: 0-11321160 | Loss: 0.666 | 595 ms/step , 116016.15 GFLOP/s , 173915.4 tokens/s INFO:__main__:2024-11-30 11:12:12 | Epoch: 0 | Step: 354180 | Dataset: 0-11323560 | Loss: 0.754 | 597 ms/step , 115688.94 GFLOP/s , 173869.4 tokens/s INFO:__main__:2024-11-30 11:12:20 | Epoch: 0 | Step: 354190 | Dataset: 0-11325960 | Loss: 0.648 | 597 ms/step , 115676.11 GFLOP/s , 173880.3 tokens/s INFO:__main__:2024-11-30 11:12:27 | Epoch: 0 | Step: 354200 | Dataset: 0-11328360 | Loss: 0.639 | 597 ms/step , 115599.27 GFLOP/s , 173896.6 tokens/s INFO:__main__:2024-11-30 11:12:34 | Epoch: 0 | Step: 354210 | Dataset: 0-11330760 | Loss: 0.720 | 597 ms/step , 115648.74 GFLOP/s , 173735.2 tokens/s INFO:__main__:2024-11-30 11:12:41 | Epoch: 0 | Step: 354220 | Dataset: 0-11333160 | Loss: 0.595 | 596 ms/step , 115729.90 GFLOP/s , 173805.8 tokens/s INFO:__main__:2024-11-30 11:12:48 | Epoch: 0 | Step: 354230 | Dataset: 0-11335560 | Loss: 0.687 | 597 ms/step , 115538.37 GFLOP/s , 173806.2 tokens/s INFO:__main__:2024-11-30 11:12:55 | Epoch: 0 | Step: 354240 | Dataset: 0-11337960 | 
Loss: 0.692 | 596 ms/step , 115756.09 GFLOP/s , 173969.3 tokens/s INFO:__main__:2024-11-30 11:13:02 | Epoch: 0 | Step: 354250 | Dataset: 0-11340360 | Loss: 0.664 | 597 ms/step , 115689.52 GFLOP/s , 173908.8 tokens/s INFO:__main__:2024-11-30 11:13:09 | Epoch: 0 | Step: 354260 | Dataset: 0-11342760 | Loss: 0.645 | 597 ms/step , 115649.15 GFLOP/s , 173973.8 tokens/s INFO:__main__:2024-11-30 11:13:16 | Epoch: 0 | Step: 354270 | Dataset: 0-11345160 | Loss: 0.729 | 596 ms/step , 115710.38 GFLOP/s , 173825.7 tokens/s INFO:__main__:2024-11-30 11:13:23 | Epoch: 0 | Step: 354280 | Dataset: 0-11347560 | Loss: 0.663 | 596 ms/step , 115839.06 GFLOP/s , 173942.4 tokens/s INFO:__main__:2024-11-30 11:13:30 | Epoch: 0 | Step: 354290 | Dataset: 0-11349960 | Loss: 0.743 | 597 ms/step , 115668.97 GFLOP/s , 173884.9 tokens/s INFO:__main__:2024-11-30 11:13:37 | Epoch: 0 | Step: 354300 | Dataset: 0-11352360 | Loss: 0.659 | 596 ms/step , 115775.04 GFLOP/s , 174026.9 tokens/s INFO:__main__:2024-11-30 11:13:44 | Epoch: 0 | Step: 354310 | Dataset: 0-11354760 | Loss: 0.685 | 596 ms/step , 115802.37 GFLOP/s , 173879.4 tokens/s INFO:__main__:2024-11-30 11:13:51 | Epoch: 0 | Step: 354320 | Dataset: 0-11357160 | Loss: 0.656 | 595 ms/step , 115946.15 GFLOP/s , 173910.8 tokens/s INFO:__main__:2024-11-30 11:13:58 | Epoch: 0 | Step: 354330 | Dataset: 0-11359560 | Loss: 0.640 | 596 ms/step , 115722.75 GFLOP/s , 173950.7 tokens/s INFO:__main__:2024-11-30 11:14:06 | Epoch: 0 | Step: 354340 | Dataset: 0-11361960 | Loss: 0.624 | 596 ms/step , 115830.22 GFLOP/s , 173962.4 tokens/s INFO:__main__:2024-11-30 11:14:13 | Epoch: 0 | Step: 354350 | Dataset: 0-11364360 | Loss: 0.682 | 596 ms/step , 115770.99 GFLOP/s , 173918.4 tokens/s INFO:__main__:2024-11-30 11:14:20 | Epoch: 0 | Step: 354360 | Dataset: 0-11366760 | Loss: 0.650 | 596 ms/step , 115716.94 GFLOP/s , 173974.2 tokens/s INFO:__main__:2024-11-30 11:14:27 | Epoch: 0 | Step: 354370 | Dataset: 0-11369160 | Loss: 0.668 | 596 ms/step , 115779.61 GFLOP/s , 
173968.6 tokens/s INFO:__main__:2024-11-30 11:14:34 | Epoch: 0 | Step: 354380 | Dataset: 0-11371560 | Loss: 0.702 | 597 ms/step , 115567.20 GFLOP/s , 173852.3 tokens/s INFO:__main__:2024-11-30 11:14:41 | Epoch: 0 | Step: 354390 | Dataset: 0-11373960 | Loss: 0.654 | 597 ms/step , 115668.88 GFLOP/s , 173880.7 tokens/s INFO:__main__:2024-11-30 11:14:48 | Epoch: 0 | Step: 354400 | Dataset: 0-11376360 | Loss: 0.667 | 595 ms/step , 115912.38 GFLOP/s , 173858.1 tokens/s INFO:__main__:2024-11-30 11:14:55 | Epoch: 0 | Step: 354410 | Dataset: 0-11378760 | Loss: 0.740 | 598 ms/step , 115488.89 GFLOP/s , 173909.9 tokens/s INFO:__main__:2024-11-30 11:15:02 | Epoch: 0 | Step: 354420 | Dataset: 0-11381160 | Loss: 0.596 | 597 ms/step , 115614.87 GFLOP/s , 173880.1 tokens/s INFO:__main__:2024-11-30 11:15:09 | Epoch: 0 | Step: 354430 | Dataset: 0-11383560 | Loss: 0.717 | 596 ms/step , 115821.95 GFLOP/s , 173869.3 tokens/s INFO:__main__:2024-11-30 11:15:16 | Epoch: 0 | Step: 354440 | Dataset: 0-11385960 | Loss: 0.699 | 596 ms/step , 115773.03 GFLOP/s , 173948.0 tokens/s INFO:__main__:2024-11-30 11:15:23 | Epoch: 0 | Step: 354450 | Dataset: 0-11388360 | Loss: 0.727 | 596 ms/step , 115867.39 GFLOP/s , 173929.0 tokens/s INFO:__main__:2024-11-30 11:15:30 | Epoch: 0 | Step: 354460 | Dataset: 0-11390760 | Loss: 0.626 | 597 ms/step , 115683.72 GFLOP/s , 173981.8 tokens/s INFO:__main__:2024-11-30 11:15:37 | Epoch: 0 | Step: 354470 | Dataset: 0-11393160 | Loss: 0.718 | 595 ms/step , 115906.35 GFLOP/s , 173897.9 tokens/s INFO:__main__:2024-11-30 11:15:44 | Epoch: 0 | Step: 354480 | Dataset: 0-11395560 | Loss: 0.781 | 597 ms/step , 115688.40 GFLOP/s , 173989.3 tokens/s INFO:__main__:2024-11-30 11:15:52 | Epoch: 0 | Step: 354490 | Dataset: 0-11397960 | Loss: 0.739 | 597 ms/step , 115688.18 GFLOP/s , 173928.2 tokens/s INFO:__main__:2024-11-30 11:15:59 | Validation | Step: 354500 | Val_loss: 0.660 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 11:16:00 | Epoch: 0 | Step: 354500 | Dataset: 
0-11400360 | Loss: 0.623 | 596 ms/step , 115865.41 GFLOP/s , 148049.3 tokens/s INFO:__main__:2024-11-30 11:16:07 | Epoch: 0 | Step: 354510 | Dataset: 0-11402760 | Loss: 0.719 | 596 ms/step , 115792.56 GFLOP/s , 174036.0 tokens/s INFO:__main__:2024-11-30 11:16:14 | Epoch: 0 | Step: 354520 | Dataset: 0-11405160 | Loss: 0.598 | 595 ms/step , 115893.05 GFLOP/s , 174049.8 tokens/s INFO:__main__:2024-11-30 11:16:21 | Epoch: 0 | Step: 354530 | Dataset: 0-11407560 | Loss: 0.702 | 596 ms/step , 115816.90 GFLOP/s , 173972.7 tokens/s INFO:__main__:2024-11-30 11:16:28 | Epoch: 0 | Step: 354540 | Dataset: 0-11409960 | Loss: 0.723 | 596 ms/step , 115791.68 GFLOP/s , 173976.9 tokens/s INFO:__main__:2024-11-30 11:16:35 | Epoch: 0 | Step: 354550 | Dataset: 0-11412360 | Loss: 0.696 | 596 ms/step , 115797.08 GFLOP/s , 173920.1 tokens/s INFO:__main__:2024-11-30 11:16:42 | Epoch: 0 | Step: 354560 | Dataset: 0-11414760 | Loss: 0.637 | 597 ms/step , 115643.62 GFLOP/s , 173902.8 tokens/s INFO:__main__:2024-11-30 11:16:49 | Epoch: 0 | Step: 354570 | Dataset: 0-11417160 | Loss: 0.685 | 597 ms/step , 115656.48 GFLOP/s , 173966.0 tokens/s INFO:__main__:2024-11-30 11:16:56 | Epoch: 0 | Step: 354580 | Dataset: 0-11419560 | Loss: 0.658 | 596 ms/step , 115710.38 GFLOP/s , 173971.3 tokens/s INFO:__main__:2024-11-30 11:17:03 | Epoch: 0 | Step: 354590 | Dataset: 0-11421960 | Loss: 0.684 | 595 ms/step , 115897.61 GFLOP/s , 173898.2 tokens/s INFO:__main__:2024-11-30 11:17:10 | Epoch: 0 | Step: 354600 | Dataset: 0-11424360 | Loss: 0.675 | 596 ms/step , 115698.35 GFLOP/s , 173914.8 tokens/s INFO:__main__:2024-11-30 11:17:18 | Epoch: 0 | Step: 354610 | Dataset: 0-11426760 | Loss: 0.640 | 596 ms/step , 115836.97 GFLOP/s , 173917.6 tokens/s INFO:__main__:2024-11-30 11:17:25 | Epoch: 0 | Step: 354620 | Dataset: 0-11429160 | Loss: 0.696 | 596 ms/step , 115699.13 GFLOP/s , 173965.9 tokens/s INFO:__main__:2024-11-30 11:17:32 | Epoch: 0 | Step: 354630 | Dataset: 0-11431560 | Loss: 0.671 | 595 ms/step , 
115913.45 GFLOP/s , 173979.3 tokens/s INFO:__main__:2024-11-30 11:17:39 | Epoch: 0 | Step: 354640 | Dataset: 0-11433960 | Loss: 0.710 | 596 ms/step , 115831.45 GFLOP/s , 173974.4 tokens/s INFO:__main__:2024-11-30 11:17:46 | Epoch: 0 | Step: 354650 | Dataset: 0-11436360 | Loss: 0.574 | 596 ms/step , 115810.62 GFLOP/s , 173966.5 tokens/s INFO:__main__:2024-11-30 11:17:53 | Epoch: 0 | Step: 354660 | Dataset: 0-11438760 | Loss: 0.592 | 596 ms/step , 115748.90 GFLOP/s , 174021.7 tokens/s INFO:__main__:2024-11-30 11:18:00 | Epoch: 0 | Step: 354670 | Dataset: 0-11441160 | Loss: 0.712 | 596 ms/step , 115792.89 GFLOP/s , 173953.5 tokens/s INFO:__main__:2024-11-30 11:18:07 | Epoch: 0 | Step: 354680 | Dataset: 0-11443560 | Loss: 0.641 | 596 ms/step , 115850.97 GFLOP/s , 174022.2 tokens/s INFO:__main__:2024-11-30 11:18:14 | Epoch: 0 | Step: 354690 | Dataset: 0-11445960 | Loss: 0.708 | 596 ms/step , 115822.84 GFLOP/s , 174056.3 tokens/s INFO:__main__:2024-11-30 11:18:21 | Epoch: 0 | Step: 354700 | Dataset: 0-11448360 | Loss: 0.641 | 595 ms/step , 115975.48 GFLOP/s , 174050.0 tokens/s INFO:__main__:2024-11-30 11:18:28 | Epoch: 0 | Step: 354710 | Dataset: 0-11450760 | Loss: 0.722 | 596 ms/step , 115880.41 GFLOP/s , 174048.6 tokens/s INFO:__main__:2024-11-30 11:18:35 | Epoch: 0 | Step: 354720 | Dataset: 0-11453160 | Loss: 0.647 | 596 ms/step , 115810.47 GFLOP/s , 173981.5 tokens/s INFO:__main__:2024-11-30 11:18:42 | Epoch: 0 | Step: 354730 | Dataset: 0-11455560 | Loss: 0.685 | 596 ms/step , 115885.27 GFLOP/s , 174054.7 tokens/s INFO:__main__:2024-11-30 11:18:49 | Epoch: 0 | Step: 354740 | Dataset: 0-11457960 | Loss: 0.629 | 596 ms/step , 115882.03 GFLOP/s , 174022.0 tokens/s INFO:__main__:2024-11-30 11:18:56 | Epoch: 0 | Step: 354750 | Dataset: 0-11460360 | Loss: 0.649 | 596 ms/step , 115889.64 GFLOP/s , 173984.0 tokens/s INFO:__main__:2024-11-30 11:19:03 | Epoch: 0 | Step: 354760 | Dataset: 0-11462760 | Loss: 0.698 | 595 ms/step , 115914.57 GFLOP/s , 173984.4 tokens/s 
INFO:__main__:2024-11-30 11:19:11 | Epoch: 0 | Step: 354770 | Dataset: 0-11465160 | Loss: 0.532 | 597 ms/step , 115680.77 GFLOP/s , 173957.0 tokens/s INFO:__main__:2024-11-30 11:19:18 | Epoch: 0 | Step: 354780 | Dataset: 0-11467560 | Loss: 0.562 | 596 ms/step , 115882.07 GFLOP/s , 174067.1 tokens/s INFO:__main__:2024-11-30 11:19:25 | Epoch: 0 | Step: 354790 | Dataset: 0-11469960 | Loss: 0.686 | 596 ms/step , 115748.96 GFLOP/s , 174006.1 tokens/s INFO:__main__:2024-11-30 11:19:32 | Epoch: 0 | Step: 354800 | Dataset: 0-11472360 | Loss: 0.668 | 596 ms/step , 115784.46 GFLOP/s , 174066.1 tokens/s INFO:__main__:2024-11-30 11:19:39 | Epoch: 0 | Step: 354810 | Dataset: 0-11474760 | Loss: 0.625 | 595 ms/step , 115909.79 GFLOP/s , 174050.5 tokens/s INFO:__main__:2024-11-30 11:19:46 | Epoch: 0 | Step: 354820 | Dataset: 0-11477160 | Loss: 0.582 | 595 ms/step , 115914.22 GFLOP/s , 174008.2 tokens/s INFO:__main__:2024-11-30 11:19:53 | Epoch: 0 | Step: 354830 | Dataset: 0-11479560 | Loss: 0.692 | 596 ms/step , 115832.81 GFLOP/s , 173956.5 tokens/s INFO:__main__:2024-11-30 11:20:00 | Epoch: 0 | Step: 354840 | Dataset: 0-11481960 | Loss: 0.682 | 596 ms/step , 115758.69 GFLOP/s , 173940.0 tokens/s INFO:__main__:2024-11-30 11:20:07 | Epoch: 0 | Step: 354850 | Dataset: 0-11484360 | Loss: 0.663 | 595 ms/step , 115994.06 GFLOP/s , 173966.8 tokens/s INFO:__main__:2024-11-30 11:20:14 | Epoch: 0 | Step: 354860 | Dataset: 0-11486760 | Loss: 0.646 | 596 ms/step , 115788.84 GFLOP/s , 174048.6 tokens/s INFO:__main__:2024-11-30 11:20:21 | Epoch: 0 | Step: 354870 | Dataset: 0-11489160 | Loss: 0.649 | 596 ms/step , 115821.53 GFLOP/s , 173955.4 tokens/s INFO:__main__:2024-11-30 11:20:28 | Epoch: 0 | Step: 354880 | Dataset: 0-11491560 | Loss: 0.696 | 596 ms/step , 115780.35 GFLOP/s , 173895.1 tokens/s INFO:__main__:2024-11-30 11:20:35 | Epoch: 0 | Step: 354890 | Dataset: 0-11493960 | Loss: 0.669 | 596 ms/step , 115823.72 GFLOP/s , 174027.8 tokens/s INFO:__main__:2024-11-30 11:20:42 | Epoch: 0 | 
Step: 354900 | Dataset: 0-11496360 | Loss: 0.542 | 595 ms/step , 116036.18 GFLOP/s , 174086.7 tokens/s INFO:__main__:2024-11-30 11:20:49 | Epoch: 0 | Step: 354910 | Dataset: 0-11498760 | Loss: 0.616 | 596 ms/step , 115875.73 GFLOP/s , 174047.0 tokens/s INFO:__main__:2024-11-30 11:20:56 | Epoch: 0 | Step: 354920 | Dataset: 0-11501160 | Loss: 0.660 | 596 ms/step , 115761.31 GFLOP/s , 173956.1 tokens/s INFO:__main__:2024-11-30 11:21:04 | Epoch: 0 | Step: 354930 | Dataset: 0-11503560 | Loss: 0.590 | 595 ms/step , 115895.46 GFLOP/s , 174075.8 tokens/s INFO:__main__:2024-11-30 11:21:11 | Epoch: 0 | Step: 354940 | Dataset: 0-11505960 | Loss: 0.688 | 596 ms/step , 115843.77 GFLOP/s , 174023.8 tokens/s INFO:__main__:2024-11-30 11:21:18 | Epoch: 0 | Step: 354950 | Dataset: 0-11508360 | Loss: 0.652 | 595 ms/step , 115898.72 GFLOP/s , 174161.7 tokens/s INFO:__main__:2024-11-30 11:21:25 | Epoch: 0 | Step: 354960 | Dataset: 0-11510760 | Loss: 0.670 | 596 ms/step , 115874.56 GFLOP/s , 174067.3 tokens/s INFO:__main__:2024-11-30 11:21:32 | Epoch: 0 | Step: 354970 | Dataset: 0-11513160 | Loss: 0.651 | 595 ms/step , 115894.76 GFLOP/s , 174026.9 tokens/s INFO:__main__:2024-11-30 11:21:39 | Epoch: 0 | Step: 354980 | Dataset: 0-11515560 | Loss: 0.725 | 595 ms/step , 115978.52 GFLOP/s , 174111.8 tokens/s INFO:__main__:2024-11-30 11:21:46 | Epoch: 0 | Step: 354990 | Dataset: 0-11517960 | Loss: 0.658 | 597 ms/step , 115643.32 GFLOP/s , 174072.9 tokens/s INFO:__main__:2024-11-30 11:21:53 | Validation | Step: 355000 | Val_loss: 0.702 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 11:21:53 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_112153_step_355000.pt` INFO:__main__:2024-11-30 11:21:56 | Epoch: 0 | Step: 355000 | Dataset: 0-11520360 | Loss: 0.670 | 595 ms/step , 116006.57 GFLOP/s , 121660.7 tokens/s INFO:__main__:2024-11-30 11:22:03 | Epoch: 0 | Step: 355010 | Dataset: 0-11522760 | Loss: 0.721 | 597 ms/step , 115631.01 GFLOP/s , 173634.3 tokens/s 
INFO:__main__:2024-11-30 11:22:10 | Epoch: 0 | Step: 355020 | Dataset: 0-11525160 | Loss: 0.636 | 597 ms/step , 115564.16 GFLOP/s , 173703.9 tokens/s INFO:__main__:2024-11-30 11:22:17 | Epoch: 0 | Step: 355030 | Dataset: 0-11527560 | Loss: 0.736 | 598 ms/step , 115449.12 GFLOP/s , 173635.1 tokens/s INFO:__main__:2024-11-30 11:22:24 | Epoch: 0 | Step: 355040 | Dataset: 0-11529960 | Loss: 0.675 | 598 ms/step , 115482.02 GFLOP/s , 173630.8 tokens/s INFO:__main__:2024-11-30 11:22:31 | Epoch: 0 | Step: 355050 | Dataset: 0-11532360 | Loss: 0.704 | 597 ms/step , 115632.91 GFLOP/s , 173628.5 tokens/s INFO:__main__:2024-11-30 11:22:38 | Epoch: 0 | Step: 355060 | Dataset: 0-11534760 | Loss: 0.696 | 597 ms/step , 115539.92 GFLOP/s , 173661.8 tokens/s INFO:__main__:2024-11-30 11:22:46 | Epoch: 0 | Step: 355070 | Dataset: 0-11537160 | Loss: 0.628 | 597 ms/step , 115586.92 GFLOP/s , 173487.9 tokens/s INFO:__main__:2024-11-30 11:22:53 | Epoch: 0 | Step: 355080 | Dataset: 0-11539560 | Loss: 0.637 | 597 ms/step , 115562.86 GFLOP/s , 173602.7 tokens/s INFO:__main__:2024-11-30 11:23:00 | Epoch: 0 | Step: 355090 | Dataset: 0-11541960 | Loss: 0.605 | 597 ms/step , 115667.27 GFLOP/s , 173644.9 tokens/s INFO:__main__:2024-11-30 11:23:07 | Epoch: 0 | Step: 355100 | Dataset: 0-11544360 | Loss: 0.677 | 596 ms/step , 115726.52 GFLOP/s , 173599.8 tokens/s INFO:__main__:2024-11-30 11:23:14 | Epoch: 0 | Step: 355110 | Dataset: 0-11546760 | Loss: 0.656 | 597 ms/step , 115599.42 GFLOP/s , 173509.4 tokens/s INFO:__main__:2024-11-30 11:23:21 | Epoch: 0 | Step: 355120 | Dataset: 0-11549160 | Loss: 0.627 | 597 ms/step , 115521.35 GFLOP/s , 173592.6 tokens/s INFO:__main__:2024-11-30 11:23:28 | Epoch: 0 | Step: 355130 | Dataset: 0-11551560 | Loss: 0.648 | 597 ms/step , 115654.52 GFLOP/s , 173534.6 tokens/s INFO:__main__:2024-11-30 11:23:35 | Epoch: 0 | Step: 355140 | Dataset: 0-11553960 | Loss: 0.685 | 598 ms/step , 115458.80 GFLOP/s , 173523.2 tokens/s INFO:__main__:2024-11-30 11:23:42 | Epoch: 0 | 
Step: 355150 | Dataset: 0-11556360 | Loss: 0.630 | 597 ms/step , 115628.17 GFLOP/s , 173446.8 tokens/s INFO:__main__:2024-11-30 11:23:49 | Epoch: 0 | Step: 355160 | Dataset: 0-11558760 | Loss: 0.666 | 597 ms/step , 115587.70 GFLOP/s , 173569.4 tokens/s INFO:__main__:2024-11-30 11:23:56 | Epoch: 0 | Step: 355170 | Dataset: 0-11561160 | Loss: 0.622 | 598 ms/step , 115435.27 GFLOP/s , 173576.9 tokens/s INFO:__main__:2024-11-30 11:24:03 | Epoch: 0 | Step: 355180 | Dataset: 0-11563560 | Loss: 0.725 | 597 ms/step , 115612.58 GFLOP/s , 173579.5 tokens/s INFO:__main__:2024-11-30 11:24:10 | Epoch: 0 | Step: 355190 | Dataset: 0-11565960 | Loss: 0.667 | 597 ms/step , 115531.26 GFLOP/s , 173546.9 tokens/s INFO:__main__:2024-11-30 11:24:18 | Epoch: 0 | Step: 355200 | Dataset: 0-11568360 | Loss: 0.698 | 597 ms/step , 115629.09 GFLOP/s , 173609.0 tokens/s INFO:__main__:2024-11-30 11:24:25 | Epoch: 0 | Step: 355210 | Dataset: 0-11570760 | Loss: 0.770 | 597 ms/step , 115589.41 GFLOP/s , 173610.9 tokens/s INFO:__main__:2024-11-30 11:24:32 | Epoch: 0 | Step: 355220 | Dataset: 0-11573160 | Loss: 0.619 | 597 ms/step , 115580.77 GFLOP/s , 173598.0 tokens/s INFO:__main__:2024-11-30 11:24:39 | Epoch: 0 | Step: 355230 | Dataset: 0-11575560 | Loss: 0.699 | 597 ms/step , 115569.30 GFLOP/s , 173557.8 tokens/s INFO:__main__:2024-11-30 11:24:46 | Epoch: 0 | Step: 355240 | Dataset: 0-11577960 | Loss: 0.633 | 597 ms/step , 115665.81 GFLOP/s , 173596.9 tokens/s INFO:__main__:2024-11-30 11:24:53 | Epoch: 0 | Step: 355250 | Dataset: 0-11580360 | Loss: 0.755 | 597 ms/step , 115585.38 GFLOP/s , 173521.9 tokens/s INFO:__main__:2024-11-30 11:25:00 | Epoch: 0 | Step: 355260 | Dataset: 0-11582760 | Loss: 0.664 | 597 ms/step , 115604.83 GFLOP/s , 173625.2 tokens/s INFO:__main__:2024-11-30 11:25:07 | Epoch: 0 | Step: 355270 | Dataset: 0-11585160 | Loss: 0.713 | 597 ms/step , 115661.58 GFLOP/s , 173554.2 tokens/s INFO:__main__:2024-11-30 11:25:14 | Epoch: 0 | Step: 355280 | Dataset: 0-11587560 | Loss: 0.622 
| 597 ms/step , 115620.60 GFLOP/s , 173556.2 tokens/s INFO:__main__:2024-11-30 11:25:21 | Epoch: 0 | Step: 355290 | Dataset: 0-11589960 | Loss: 0.676 | 597 ms/step , 115690.17 GFLOP/s , 173619.9 tokens/s INFO:__main__:2024-11-30 11:25:28 | Epoch: 0 | Step: 355300 | Dataset: 0-11592360 | Loss: 0.661 | 598 ms/step , 115475.48 GFLOP/s , 173483.9 tokens/s INFO:__main__:2024-11-30 11:25:35 | Epoch: 0 | Step: 355310 | Dataset: 0-11594760 | Loss: 0.711 | 598 ms/step , 115449.25 GFLOP/s , 173521.2 tokens/s INFO:__main__:2024-11-30 11:25:43 | Epoch: 0 | Step: 355320 | Dataset: 0-11597160 | Loss: 0.687 | 597 ms/step , 115534.34 GFLOP/s , 173439.7 tokens/s INFO:__main__:2024-11-30 11:25:50 | Epoch: 0 | Step: 355330 | Dataset: 0-11599560 | Loss: 0.677 | 596 ms/step , 115840.29 GFLOP/s , 173491.9 tokens/s INFO:__main__:2024-11-30 11:25:57 | Epoch: 0 | Step: 355340 | Dataset: 0-11601960 | Loss: 0.677 | 597 ms/step , 115667.55 GFLOP/s , 173541.4 tokens/s INFO:__main__:2024-11-30 11:26:04 | Epoch: 0 | Step: 355350 | Dataset: 0-11604360 | Loss: 0.774 | 596 ms/step , 115760.10 GFLOP/s , 173583.8 tokens/s INFO:__main__:2024-11-30 11:26:11 | Epoch: 0 | Step: 355360 | Dataset: 0-11606760 | Loss: 0.653 | 595 ms/step , 115929.76 GFLOP/s , 173595.8 tokens/s INFO:__main__:2024-11-30 11:26:18 | Epoch: 0 | Step: 355370 | Dataset: 0-11609160 | Loss: 0.742 | 595 ms/step , 115963.45 GFLOP/s , 173620.2 tokens/s INFO:__main__:2024-11-30 11:26:25 | Epoch: 0 | Step: 355380 | Dataset: 0-11611560 | Loss: 0.752 | 595 ms/step , 115944.68 GFLOP/s , 173518.5 tokens/s INFO:__main__:2024-11-30 11:26:32 | Epoch: 0 | Step: 355390 | Dataset: 0-11613960 | Loss: 0.668 | 595 ms/step , 115932.02 GFLOP/s , 173580.5 tokens/s INFO:__main__:2024-11-30 11:26:39 | Epoch: 0 | Step: 355400 | Dataset: 0-11616360 | Loss: 0.721 | 596 ms/step , 115869.97 GFLOP/s , 173508.2 tokens/s INFO:__main__:2024-11-30 11:26:46 | Epoch: 0 | Step: 355410 | Dataset: 0-11618760 | Loss: 0.570 | 596 ms/step , 115753.38 GFLOP/s , 173493.0 
tokens/s INFO:__main__:2024-11-30 11:26:53 | Epoch: 0 | Step: 355420 | Dataset: 0-11621160 | Loss: 0.730 | 597 ms/step , 115650.52 GFLOP/s , 173539.1 tokens/s INFO:__main__:2024-11-30 11:27:00 | Epoch: 0 | Step: 355430 | Dataset: 0-11623560 | Loss: 0.647 | 596 ms/step , 115847.92 GFLOP/s , 173494.8 tokens/s INFO:__main__:2024-11-30 11:27:07 | Epoch: 0 | Step: 355440 | Dataset: 0-11625960 | Loss: 0.725 | 596 ms/step , 115793.89 GFLOP/s , 173577.9 tokens/s INFO:__main__:2024-11-30 11:27:15 | Epoch: 0 | Step: 355450 | Dataset: 0-11628360 | Loss: 0.727 | 600 ms/step , 115027.95 GFLOP/s , 173408.3 tokens/s INFO:__main__:2024-11-30 11:27:22 | Epoch: 0 | Step: 355460 | Dataset: 0-11630760 | Loss: 0.626 | 595 ms/step , 115927.83 GFLOP/s , 173517.1 tokens/s INFO:__main__:2024-11-30 11:27:29 | Epoch: 0 | Step: 355470 | Dataset: 0-11633160 | Loss: 0.642 | 595 ms/step , 115907.68 GFLOP/s , 173560.8 tokens/s INFO:__main__:2024-11-30 11:27:36 | Epoch: 0 | Step: 355480 | Dataset: 0-11635560 | Loss: 0.732 | 596 ms/step , 115713.04 GFLOP/s , 173444.0 tokens/s INFO:__main__:2024-11-30 11:27:43 | Epoch: 0 | Step: 355490 | Dataset: 0-11637960 | Loss: 0.743 | 596 ms/step , 115737.08 GFLOP/s , 173496.7 tokens/s INFO:__main__:2024-11-30 11:27:51 | Validation | Step: 355500 | Val_loss: 0.660 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 11:27:51 | Epoch: 0 | Step: 355500 | Dataset: 0-11640360 | Loss: 0.649 | 595 ms/step , 115966.34 GFLOP/s , 147707.0 tokens/s INFO:__main__:2024-11-30 11:27:58 | Epoch: 0 | Step: 355510 | Dataset: 0-11642760 | Loss: 0.611 | 597 ms/step , 115600.09 GFLOP/s , 173524.0 tokens/s INFO:__main__:2024-11-30 11:28:05 | Epoch: 0 | Step: 355520 | Dataset: 0-11645160 | Loss: 0.637 | 597 ms/step , 115631.05 GFLOP/s , 173551.0 tokens/s INFO:__main__:2024-11-30 11:28:12 | Epoch: 0 | Step: 355530 | Dataset: 0-11647560 | Loss: 0.664 | 597 ms/step , 115612.71 GFLOP/s , 173447.3 tokens/s INFO:__main__:2024-11-30 11:28:20 | Epoch: 0 | Step: 355540 | Dataset: 0-11649960 | 
Loss: 0.698 | 596 ms/step , 115803.96 GFLOP/s , 173439.0 tokens/s INFO:__main__:2024-11-30 11:28:27 | Epoch: 0 | Step: 355550 | Dataset: 0-11652360 | Loss: 0.685 | 597 ms/step , 115690.65 GFLOP/s , 173481.4 tokens/s INFO:__main__:2024-11-30 11:28:34 | Epoch: 0 | Step: 355560 | Dataset: 0-11654760 | Loss: 0.660 | 596 ms/step , 115775.13 GFLOP/s , 173442.5 tokens/s INFO:__main__:2024-11-30 11:28:41 | Epoch: 0 | Step: 355570 | Dataset: 0-11657160 | Loss: 0.730 | 596 ms/step , 115702.64 GFLOP/s , 173421.5 tokens/s INFO:__main__:2024-11-30 11:28:48 | Epoch: 0 | Step: 355580 | Dataset: 0-11659560 | Loss: 0.740 | 596 ms/step , 115827.25 GFLOP/s , 173401.5 tokens/s INFO:__main__:2024-11-30 11:28:55 | Epoch: 0 | Step: 355590 | Dataset: 0-11661960 | Loss: 0.755 | 596 ms/step , 115736.46 GFLOP/s , 173443.8 tokens/s INFO:__main__:2024-11-30 11:29:02 | Epoch: 0 | Step: 355600 | Dataset: 0-11664360 | Loss: 0.699 | 598 ms/step , 115486.14 GFLOP/s , 173394.1 tokens/s INFO:__main__:2024-11-30 11:29:09 | Epoch: 0 | Step: 355610 | Dataset: 0-11666760 | Loss: 0.472 | 596 ms/step , 115699.72 GFLOP/s , 173558.8 tokens/s INFO:__main__:2024-11-30 11:29:16 | Epoch: 0 | Step: 355620 | Dataset: 0-11669160 | Loss: 0.470 | 596 ms/step , 115725.72 GFLOP/s , 173886.5 tokens/s INFO:__main__:2024-11-30 11:29:23 | Epoch: 0 | Step: 355630 | Dataset: 0-11671560 | Loss: 0.447 | 596 ms/step , 115785.98 GFLOP/s , 173847.1 tokens/s INFO:__main__:2024-11-30 11:29:30 | Epoch: 0 | Step: 355640 | Dataset: 0-11673960 | Loss: 0.383 | 596 ms/step , 115816.31 GFLOP/s , 173889.8 tokens/s INFO:__main__:2024-11-30 11:29:37 | Epoch: 0 | Step: 355650 | Dataset: 0-11676360 | Loss: 0.431 | 597 ms/step , 115562.42 GFLOP/s , 173891.7 tokens/s INFO:__main__:2024-11-30 11:29:44 | Epoch: 0 | Step: 355660 | Dataset: 0-11678760 | Loss: 0.421 | 596 ms/step , 115766.30 GFLOP/s , 173899.7 tokens/s INFO:__main__:2024-11-30 11:29:52 | Epoch: 0 | Step: 355670 | Dataset: 0-11681160 | Loss: 0.404 | 596 ms/step , 115834.59 GFLOP/s , 
173854.7 tokens/s INFO:__main__:2024-11-30 11:29:59 | Epoch: 0 | Step: 355680 | Dataset: 0-11683560 | Loss: 0.392 | 596 ms/step , 115780.25 GFLOP/s , 173795.2 tokens/s INFO:__main__:2024-11-30 11:30:06 | Epoch: 0 | Step: 355690 | Dataset: 0-11685960 | Loss: 0.423 | 596 ms/step , 115775.60 GFLOP/s , 173862.5 tokens/s INFO:__main__:2024-11-30 11:30:13 | Epoch: 0 | Step: 355700 | Dataset: 0-11688360 | Loss: 0.395 | 596 ms/step , 115831.44 GFLOP/s , 173927.3 tokens/s INFO:__main__:2024-11-30 11:30:20 | Epoch: 0 | Step: 355710 | Dataset: 0-11690760 | Loss: 0.438 | 596 ms/step , 115824.93 GFLOP/s , 173916.8 tokens/s INFO:__main__:2024-11-30 11:30:27 | Epoch: 0 | Step: 355720 | Dataset: 0-11693160 | Loss: 0.431 | 596 ms/step , 115718.65 GFLOP/s , 173871.7 tokens/s INFO:__main__:2024-11-30 11:30:34 | Epoch: 0 | Step: 355730 | Dataset: 0-11695560 | Loss: 0.397 | 596 ms/step , 115793.67 GFLOP/s , 173933.6 tokens/s INFO:__main__:2024-11-30 11:30:41 | Epoch: 0 | Step: 355740 | Dataset: 0-11697960 | Loss: 0.404 | 598 ms/step , 115464.58 GFLOP/s , 173897.5 tokens/s INFO:__main__:2024-11-30 11:30:48 | Epoch: 0 | Step: 355750 | Dataset: 0-11700360 | Loss: 0.397 | 601 ms/step , 114801.10 GFLOP/s , 173886.1 tokens/s INFO:__main__:2024-11-30 11:30:55 | Epoch: 0 | Step: 355760 | Dataset: 0-11702760 | Loss: 0.403 | 596 ms/step , 115783.20 GFLOP/s , 173888.6 tokens/s INFO:__main__:2024-11-30 11:31:02 | Epoch: 0 | Step: 355770 | Dataset: 0-11705160 | Loss: 1.134 | 597 ms/step , 115613.34 GFLOP/s , 173877.3 tokens/s INFO:__main__:2024-11-30 11:31:09 | Epoch: 0 | Step: 355780 | Dataset: 0-11707560 | Loss: 0.576 | 595 ms/step , 115981.76 GFLOP/s , 173978.0 tokens/s INFO:__main__:2024-11-30 11:31:16 | Epoch: 0 | Step: 355790 | Dataset: 0-11709960 | Loss: 0.553 | 596 ms/step , 115813.96 GFLOP/s , 174060.0 tokens/s INFO:__main__:2024-11-30 11:31:23 | Epoch: 0 | Step: 355800 | Dataset: 0-11712360 | Loss: 0.527 | 595 ms/step , 115910.50 GFLOP/s , 174100.3 tokens/s INFO:__main__:2024-11-30 
11:31:30 | Epoch: 0 | Step: 355810 | Dataset: 0-11714760 | Loss: 0.580 | 596 ms/step , 115851.06 GFLOP/s , 174111.8 tokens/s INFO:__main__:2024-11-30 11:31:38 | Epoch: 0 | Step: 355820 | Dataset: 0-11717160 | Loss: 0.502 | 596 ms/step , 115833.90 GFLOP/s , 174012.6 tokens/s INFO:__main__:2024-11-30 11:31:45 | Epoch: 0 | Step: 355830 | Dataset: 0-11719560 | Loss: 0.550 | 596 ms/step , 115720.68 GFLOP/s , 174091.0 tokens/s INFO:__main__:2024-11-30 11:31:52 | Epoch: 0 | Step: 355840 | Dataset: 0-11721960 | Loss: 0.554 | 596 ms/step , 115881.90 GFLOP/s , 174096.7 tokens/s INFO:__main__:2024-11-30 11:31:59 | Epoch: 0 | Step: 355850 | Dataset: 0-11724360 | Loss: 0.563 | 597 ms/step , 115513.18 GFLOP/s , 174035.6 tokens/s INFO:__main__:2024-11-30 11:32:06 | Epoch: 0 | Step: 355860 | Dataset: 0-11726760 | Loss: 0.582 | 596 ms/step , 115784.03 GFLOP/s , 174057.1 tokens/s INFO:__main__:2024-11-30 11:32:13 | Epoch: 0 | Step: 355870 | Dataset: 0-11729160 | Loss: 0.489 | 597 ms/step , 115660.44 GFLOP/s , 174010.8 tokens/s INFO:__main__:2024-11-30 11:32:20 | Epoch: 0 | Step: 355880 | Dataset: 0-11731560 | Loss: 0.495 | 596 ms/step , 115777.29 GFLOP/s , 174070.7 tokens/s INFO:__main__:2024-11-30 11:32:27 | Epoch: 0 | Step: 355890 | Dataset: 0-11733960 | Loss: 0.569 | 596 ms/step , 115836.98 GFLOP/s , 174067.9 tokens/s INFO:__main__:2024-11-30 11:32:34 | Epoch: 0 | Step: 355900 | Dataset: 0-11736360 | Loss: 0.506 | 595 ms/step , 115943.88 GFLOP/s , 174109.9 tokens/s INFO:__main__:2024-11-30 11:32:41 | Epoch: 0 | Step: 355910 | Dataset: 0-11738760 | Loss: 0.538 | 595 ms/step , 115926.46 GFLOP/s , 174027.4 tokens/s INFO:__main__:2024-11-30 11:32:48 | Epoch: 0 | Step: 355920 | Dataset: 0-11741160 | Loss: 0.501 | 596 ms/step , 115794.82 GFLOP/s , 174079.1 tokens/s INFO:__main__:2024-11-30 11:32:55 | Epoch: 0 | Step: 355930 | Dataset: 0-11743560 | Loss: 0.619 | 596 ms/step , 115851.96 GFLOP/s , 174061.5 tokens/s INFO:__main__:2024-11-30 11:33:02 | Epoch: 0 | Step: 355940 | Dataset: 
0-11745960 | Loss: 0.496 | 595 ms/step , 116049.80 GFLOP/s , 174154.6 tokens/s INFO:__main__:2024-11-30 11:33:09 | Epoch: 0 | Step: 355950 | Dataset: 0-11748360 | Loss: 0.573 | 596 ms/step , 115826.34 GFLOP/s , 174126.4 tokens/s INFO:__main__:2024-11-30 11:33:16 | Epoch: 0 | Step: 355960 | Dataset: 0-11750760 | Loss: 0.560 | 595 ms/step , 116014.82 GFLOP/s , 173880.6 tokens/s INFO:__main__:2024-11-30 11:33:23 | Epoch: 0 | Step: 355970 | Dataset: 0-11753160 | Loss: 0.459 | 596 ms/step , 115841.12 GFLOP/s , 173983.0 tokens/s INFO:__main__:2024-11-30 11:33:30 | Epoch: 0 | Step: 355980 | Dataset: 0-11755560 | Loss: 0.508 | 595 ms/step , 115919.45 GFLOP/s , 174138.4 tokens/s INFO:__main__:2024-11-30 11:33:38 | Epoch: 0 | Step: 355990 | Dataset: 0-11757960 | Loss: 0.489 | 595 ms/step , 115899.63 GFLOP/s , 174120.1 tokens/s INFO:__main__:2024-11-30 11:33:45 | Validation | Step: 356000 | Val_loss: 0.700 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 11:33:45 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_113345_step_356000.pt` INFO:__main__:2024-11-30 11:33:48 | Epoch: 0 | Step: 356000 | Dataset: 0-11760360 | Loss: 0.518 | 594 ms/step , 116218.45 GFLOP/s , 121926.3 tokens/s INFO:__main__:2024-11-30 11:33:55 | Epoch: 0 | Step: 356010 | Dataset: 0-11762760 | Loss: 0.524 | 596 ms/step , 115837.28 GFLOP/s , 173760.9 tokens/s INFO:__main__:2024-11-30 11:34:02 | Epoch: 0 | Step: 356020 | Dataset: 0-11765160 | Loss: 0.467 | 596 ms/step , 115885.93 GFLOP/s , 173551.4 tokens/s INFO:__main__:2024-11-30 11:34:09 | Epoch: 0 | Step: 356030 | Dataset: 0-11767560 | Loss: 0.482 | 595 ms/step , 115999.65 GFLOP/s , 173702.7 tokens/s INFO:__main__:2024-11-30 11:34:16 | Epoch: 0 | Step: 356040 | Dataset: 0-11769960 | Loss: 0.473 | 595 ms/step , 115957.61 GFLOP/s , 173699.0 tokens/s INFO:__main__:2024-11-30 11:34:23 | Epoch: 0 | Step: 356050 | Dataset: 0-11772360 | Loss: 0.497 | 596 ms/step , 115852.79 GFLOP/s , 173771.2 tokens/s INFO:__main__:2024-11-30 
11:34:30 | Epoch: 0 | Step: 356060 | Dataset: 0-11774760 | Loss: 0.465 | 595 ms/step , 116004.92 GFLOP/s , 173701.6 tokens/s INFO:__main__:2024-11-30 11:34:37 | Epoch: 0 | Step: 356070 | Dataset: 0-11777160 | Loss: 0.552 | 595 ms/step , 116034.11 GFLOP/s , 173647.1 tokens/s INFO:__main__:2024-11-30 11:34:44 | Epoch: 0 | Step: 356080 | Dataset: 0-11779560 | Loss: 0.555 | 596 ms/step , 115791.46 GFLOP/s , 173608.8 tokens/s INFO:__main__:2024-11-30 11:34:51 | Epoch: 0 | Step: 356090 | Dataset: 0-11781960 | Loss: 0.522 | 595 ms/step , 115939.80 GFLOP/s , 173632.7 tokens/s INFO:__main__:2024-11-30 11:34:58 | Epoch: 0 | Step: 356100 | Dataset: 0-11784360 | Loss: 0.629 | 595 ms/step , 115890.77 GFLOP/s , 173757.2 tokens/s INFO:__main__:2024-11-30 11:35:05 | Epoch: 0 | Step: 356110 | Dataset: 0-11786760 | Loss: 0.607 | 595 ms/step , 116071.83 GFLOP/s , 173717.6 tokens/s INFO:__main__:2024-11-30 11:35:12 | Epoch: 0 | Step: 356120 | Dataset: 0-11789160 | Loss: 0.522 | 595 ms/step , 116021.85 GFLOP/s , 173661.7 tokens/s INFO:__main__:2024-11-30 11:35:20 | Epoch: 0 | Step: 356130 | Dataset: 0-11791560 | Loss: 0.509 | 595 ms/step , 115956.16 GFLOP/s , 173674.6 tokens/s INFO:__main__:2024-11-30 11:35:27 | Epoch: 0 | Step: 356140 | Dataset: 0-11793960 | Loss: 0.590 | 594 ms/step , 116154.57 GFLOP/s , 173710.2 tokens/s INFO:__main__:2024-11-30 11:35:34 | Epoch: 0 | Step: 356150 | Dataset: 0-11796360 | Loss: 0.590 | 594 ms/step , 116178.22 GFLOP/s , 173599.4 tokens/s INFO:__main__:2024-11-30 11:35:41 | Epoch: 0 | Step: 356160 | Dataset: 0-11798760 | Loss: 0.695 | 596 ms/step , 115881.02 GFLOP/s , 173471.5 tokens/s INFO:__main__:2024-11-30 11:35:48 | Epoch: 0 | Step: 356170 | Dataset: 0-11801160 | Loss: 0.662 | 597 ms/step , 115694.10 GFLOP/s , 173464.7 tokens/s INFO:__main__:2024-11-30 11:35:55 | Epoch: 0 | Step: 356180 | Dataset: 0-11803560 | Loss: 0.722 | 596 ms/step , 115700.34 GFLOP/s , 173436.9 tokens/s INFO:__main__:2024-11-30 11:36:02 | Epoch: 0 | Step: 356190 | Dataset: 
0-11805960 | Loss: 0.644 | 596 ms/step , 115781.53 GFLOP/s , 173514.0 tokens/s INFO:__main__:2024-11-30 11:36:09 | Epoch: 0 | Step: 356200 | Dataset: 0-11808360 | Loss: 0.688 | 597 ms/step , 115695.28 GFLOP/s , 173569.5 tokens/s INFO:__main__:2024-11-30 11:36:16 | Epoch: 0 | Step: 356210 | Dataset: 0-11810760 | Loss: 0.622 | 595 ms/step , 115988.53 GFLOP/s , 173480.0 tokens/s INFO:__main__:2024-11-30 11:36:23 | Epoch: 0 | Step: 356220 | Dataset: 0-11813160 | Loss: 0.680 | 595 ms/step , 116033.69 GFLOP/s , 173525.9 tokens/s INFO:__main__:2024-11-30 11:36:30 | Epoch: 0 | Step: 356230 | Dataset: 0-11815560 | Loss: 0.673 | 596 ms/step , 115834.98 GFLOP/s , 173483.5 tokens/s INFO:__main__:2024-11-30 11:36:37 | Epoch: 0 | Step: 356240 | Dataset: 0-11817960 | Loss: 0.735 | 595 ms/step , 115905.14 GFLOP/s , 173535.3 tokens/s INFO:__main__:2024-11-30 11:36:45 | Epoch: 0 | Step: 356250 | Dataset: 0-11820360 | Loss: 0.637 | 595 ms/step , 115891.90 GFLOP/s , 173462.9 tokens/s INFO:__main__:2024-11-30 11:36:52 | Epoch: 0 | Step: 356260 | Dataset: 0-11822760 | Loss: 0.705 | 596 ms/step , 115807.40 GFLOP/s , 173527.0 tokens/s INFO:__main__:2024-11-30 11:36:59 | Epoch: 0 | Step: 356270 | Dataset: 0-11825160 | Loss: 0.659 | 596 ms/step , 115786.02 GFLOP/s , 173462.8 tokens/s INFO:__main__:2024-11-30 11:37:06 | Epoch: 0 | Step: 356280 | Dataset: 0-11827560 | Loss: 0.705 | 596 ms/step , 115833.37 GFLOP/s , 173525.0 tokens/s INFO:__main__:2024-11-30 11:37:13 | Epoch: 0 | Step: 356290 | Dataset: 0-11829960 | Loss: 0.700 | 596 ms/step , 115735.69 GFLOP/s , 173560.2 tokens/s INFO:__main__:2024-11-30 11:37:20 | Epoch: 0 | Step: 356300 | Dataset: 0-11832360 | Loss: 0.719 | 597 ms/step , 115671.77 GFLOP/s , 173508.6 tokens/s INFO:__main__:2024-11-30 11:37:27 | Epoch: 0 | Step: 356310 | Dataset: 0-11834760 | Loss: 0.556 | 595 ms/step , 115920.88 GFLOP/s , 173548.9 tokens/s INFO:__main__:2024-11-30 11:37:34 | Epoch: 0 | Step: 356320 | Dataset: 0-11837160 | Loss: 0.632 | 595 ms/step , 
115985.12 GFLOP/s , 173562.3 tokens/s INFO:__main__:2024-11-30 11:37:41 | Epoch: 0 | Step: 356330 | Dataset: 0-11839560 | Loss: 0.618 | 595 ms/step , 115958.04 GFLOP/s , 173541.2 tokens/s INFO:__main__:2024-11-30 11:37:48 | Epoch: 0 | Step: 356340 | Dataset: 0-11841960 | Loss: 0.673 | 595 ms/step , 115947.27 GFLOP/s , 173608.5 tokens/s INFO:__main__:2024-11-30 11:37:55 | Epoch: 0 | Step: 356350 | Dataset: 0-11844360 | Loss: 0.684 | 596 ms/step , 115724.66 GFLOP/s , 173574.3 tokens/s INFO:__main__:2024-11-30 11:38:02 | Epoch: 0 | Step: 356360 | Dataset: 0-11846760 | Loss: 0.575 | 596 ms/step , 115818.14 GFLOP/s , 173591.9 tokens/s INFO:__main__:2024-11-30 11:38:10 | Epoch: 0 | Step: 356370 | Dataset: 0-11849160 | Loss: 0.605 | 595 ms/step , 115968.67 GFLOP/s , 173645.0 tokens/s INFO:__main__:2024-11-30 11:38:17 | Epoch: 0 | Step: 356380 | Dataset: 0-11851560 | Loss: 0.597 | 595 ms/step , 115891.14 GFLOP/s , 173594.7 tokens/s INFO:__main__:2024-11-30 11:38:24 | Epoch: 0 | Step: 356390 | Dataset: 0-11853960 | Loss: 0.650 | 596 ms/step , 115873.43 GFLOP/s , 173614.4 tokens/s INFO:__main__:2024-11-30 11:38:31 | Epoch: 0 | Step: 356400 | Dataset: 0-11856360 | Loss: 0.715 | 595 ms/step , 115907.50 GFLOP/s , 173615.0 tokens/s INFO:__main__:2024-11-30 11:38:38 | Epoch: 0 | Step: 356410 | Dataset: 0-11858760 | Loss: 0.704 | 595 ms/step , 115958.03 GFLOP/s , 173624.0 tokens/s INFO:__main__:2024-11-30 11:38:45 | Epoch: 0 | Step: 356420 | Dataset: 0-11861160 | Loss: 0.653 | 595 ms/step , 116037.43 GFLOP/s , 173563.3 tokens/s INFO:__main__:2024-11-30 11:38:52 | Epoch: 0 | Step: 356430 | Dataset: 0-11863560 | Loss: 0.751 | 595 ms/step , 116078.02 GFLOP/s , 173625.9 tokens/s INFO:__main__:2024-11-30 11:38:59 | Epoch: 0 | Step: 356440 | Dataset: 0-11865960 | Loss: 0.769 | 595 ms/step , 115994.70 GFLOP/s , 173559.5 tokens/s INFO:__main__:2024-11-30 11:39:06 | Epoch: 0 | Step: 356450 | Dataset: 0-11868360 | Loss: 0.661 | 595 ms/step , 115933.82 GFLOP/s , 173588.7 tokens/s 
INFO:__main__:2024-11-30 11:39:13 | Epoch: 0 | Step: 356460 | Dataset: 0-11870760 | Loss: 0.693 | 595 ms/step , 115911.84 GFLOP/s , 173535.9 tokens/s INFO:__main__:2024-11-30 11:39:20 | Epoch: 0 | Step: 356470 | Dataset: 0-11873160 | Loss: 0.555 | 596 ms/step , 115832.55 GFLOP/s , 173437.5 tokens/s INFO:__main__:2024-11-30 11:39:27 | Epoch: 0 | Step: 356480 | Dataset: 0-11875560 | Loss: 0.689 | 595 ms/step , 115946.07 GFLOP/s , 173618.6 tokens/s INFO:__main__:2024-11-30 11:39:34 | Epoch: 0 | Step: 356490 | Dataset: 0-11877960 | Loss: 0.642 | 596 ms/step , 115862.03 GFLOP/s , 173553.8 tokens/s INFO:__main__:2024-11-30 11:39:42 | Validation | Step: 356500 | Val_loss: 0.643 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 11:39:43 | Epoch: 0 | Step: 356500 | Dataset: 0-11880360 | Loss: 0.656 | 595 ms/step , 116016.63 GFLOP/s , 147876.5 tokens/s INFO:__main__:2024-11-30 11:39:50 | Epoch: 0 | Step: 356510 | Dataset: 0-11882760 | Loss: 0.755 | 595 ms/step , 115890.09 GFLOP/s , 173815.3 tokens/s INFO:__main__:2024-11-30 11:39:57 | Epoch: 0 | Step: 356520 | Dataset: 0-11885160 | Loss: 0.763 | 596 ms/step , 115887.26 GFLOP/s , 173621.3 tokens/s INFO:__main__:2024-11-30 11:40:04 | Epoch: 0 | Step: 356530 | Dataset: 0-11887560 | Loss: 0.630 | 596 ms/step , 115863.20 GFLOP/s , 173519.1 tokens/s INFO:__main__:2024-11-30 11:40:11 | Epoch: 0 | Step: 356540 | Dataset: 0-11889960 | Loss: 0.717 | 595 ms/step , 115915.11 GFLOP/s , 173587.9 tokens/s INFO:__main__:2024-11-30 11:40:18 | Epoch: 0 | Step: 356550 | Dataset: 0-11892360 | Loss: 0.671 | 595 ms/step , 115956.22 GFLOP/s , 173563.1 tokens/s INFO:__main__:2024-11-30 11:40:25 | Epoch: 0 | Step: 356560 | Dataset: 0-11894760 | Loss: 0.680 | 596 ms/step , 115749.54 GFLOP/s , 173658.3 tokens/s INFO:__main__:2024-11-30 11:40:32 | Epoch: 0 | Step: 356570 | Dataset: 0-11897160 | Loss: 0.631 | 598 ms/step , 115409.32 GFLOP/s , 173508.3 tokens/s INFO:__main__:2024-11-30 11:40:39 | Epoch: 0 | Step: 356580 | Dataset: 0-11899560 | Loss: 0.542 
| 595 ms/step , 116041.02 GFLOP/s , 173552.4 tokens/s INFO:__main__:2024-11-30 11:40:46 | Epoch: 0 | Step: 356590 | Dataset: 0-11901960 | Loss: 0.702 | 596 ms/step , 115716.17 GFLOP/s , 173596.4 tokens/s INFO:__main__:2024-11-30 11:40:54 | Epoch: 0 | Step: 356600 | Dataset: 0-11904360 | Loss: 0.667 | 595 ms/step , 115911.84 GFLOP/s , 173539.8 tokens/s INFO:__main__:2024-11-30 11:41:01 | Epoch: 0 | Step: 356610 | Dataset: 0-11906760 | Loss: 0.722 | 596 ms/step , 115861.93 GFLOP/s , 173536.7 tokens/s INFO:__main__:2024-11-30 11:41:08 | Epoch: 0 | Step: 356620 | Dataset: 0-11909160 | Loss: 0.698 | 596 ms/step , 115858.51 GFLOP/s , 173521.1 tokens/s INFO:__main__:2024-11-30 11:41:15 | Epoch: 0 | Step: 356630 | Dataset: 0-11911560 | Loss: 0.737 | 595 ms/step , 115946.46 GFLOP/s , 173455.4 tokens/s INFO:__main__:2024-11-30 11:41:22 | Epoch: 0 | Step: 356640 | Dataset: 0-11913960 | Loss: 0.635 | 595 ms/step , 115901.70 GFLOP/s , 173577.0 tokens/s INFO:__main__:2024-11-30 11:41:29 | Epoch: 0 | Step: 356650 | Dataset: 0-11916360 | Loss: 0.656 | 596 ms/step , 115775.14 GFLOP/s , 173498.7 tokens/s INFO:__main__:2024-11-30 11:41:36 | Epoch: 0 | Step: 356660 | Dataset: 0-11918760 | Loss: 0.661 | 596 ms/step , 115792.12 GFLOP/s , 173458.3 tokens/s INFO:__main__:2024-11-30 11:41:43 | Epoch: 0 | Step: 356670 | Dataset: 0-11921160 | Loss: 0.748 | 596 ms/step , 115807.23 GFLOP/s , 173493.9 tokens/s INFO:__main__:2024-11-30 11:41:50 | Epoch: 0 | Step: 356680 | Dataset: 0-11923560 | Loss: 0.711 | 597 ms/step , 115566.33 GFLOP/s , 173553.2 tokens/s INFO:__main__:2024-11-30 11:41:57 | Epoch: 0 | Step: 356690 | Dataset: 0-11925960 | Loss: 0.608 | 596 ms/step , 115789.81 GFLOP/s , 173510.1 tokens/s INFO:__main__:2024-11-30 11:42:04 | Epoch: 0 | Step: 356700 | Dataset: 0-11928360 | Loss: 0.786 | 596 ms/step , 115810.64 GFLOP/s , 173628.3 tokens/s INFO:__main__:2024-11-30 11:42:11 | Epoch: 0 | Step: 356710 | Dataset: 0-11930760 | Loss: 0.658 | 596 ms/step , 115803.91 GFLOP/s , 173424.5 
tokens/s INFO:__main__:2024-11-30 11:42:19 | Epoch: 0 | Step: 356720 | Dataset: 0-11933160 | Loss: 0.762 | 596 ms/step , 115802.24 GFLOP/s , 173560.5 tokens/s INFO:__main__:2024-11-30 11:42:26 | Epoch: 0 | Step: 356730 | Dataset: 0-11935560 | Loss: 0.708 | 596 ms/step , 115705.11 GFLOP/s , 173511.0 tokens/s INFO:__main__:2024-11-30 11:42:33 | Epoch: 0 | Step: 356740 | Dataset: 0-11937960 | Loss: 0.696 | 595 ms/step , 115952.36 GFLOP/s , 173436.2 tokens/s INFO:__main__:2024-11-30 11:42:40 | Epoch: 0 | Step: 356750 | Dataset: 0-11940360 | Loss: 0.734 | 596 ms/step , 115796.31 GFLOP/s , 173492.4 tokens/s INFO:__main__:2024-11-30 11:42:47 | Epoch: 0 | Step: 356760 | Dataset: 0-11942760 | Loss: 0.679 | 596 ms/step , 115724.07 GFLOP/s , 173468.7 tokens/s INFO:__main__:2024-11-30 11:42:54 | Epoch: 0 | Step: 356770 | Dataset: 0-11945160 | Loss: 0.645 | 596 ms/step , 115825.40 GFLOP/s , 173494.2 tokens/s INFO:__main__:2024-11-30 11:43:01 | Epoch: 0 | Step: 356780 | Dataset: 0-11947560 | Loss: 0.632 | 596 ms/step , 115831.75 GFLOP/s , 173416.9 tokens/s INFO:__main__:2024-11-30 11:43:08 | Epoch: 0 | Step: 356790 | Dataset: 0-11949960 | Loss: 0.738 | 596 ms/step , 115703.08 GFLOP/s , 173456.9 tokens/s INFO:__main__:2024-11-30 11:43:15 | Epoch: 0 | Step: 356800 | Dataset: 0-11952360 | Loss: 0.642 | 596 ms/step , 115746.89 GFLOP/s , 173505.2 tokens/s INFO:__main__:2024-11-30 11:43:22 | Epoch: 0 | Step: 356810 | Dataset: 0-11954760 | Loss: 0.629 | 596 ms/step , 115752.56 GFLOP/s , 173469.3 tokens/s INFO:__main__:2024-11-30 11:43:29 | Epoch: 0 | Step: 356820 | Dataset: 0-11957160 | Loss: 0.707 | 597 ms/step , 115622.94 GFLOP/s , 173510.5 tokens/s INFO:__main__:2024-11-30 11:43:36 | Epoch: 0 | Step: 356830 | Dataset: 0-11959560 | Loss: 0.582 | 596 ms/step , 115808.35 GFLOP/s , 173379.5 tokens/s INFO:__main__:2024-11-30 11:43:44 | Epoch: 0 | Step: 356840 | Dataset: 0-11961960 | Loss: 0.651 | 595 ms/step , 115993.40 GFLOP/s , 173473.7 tokens/s INFO:__main__:2024-11-30 11:43:51 | 
Epoch: 0 | Step: 356850 | Dataset: 0-11964360 | Loss: 0.690 | 598 ms/step , 115319.38 GFLOP/s , 173562.0 tokens/s INFO:__main__:2024-11-30 11:43:58 | Epoch: 0 | Step: 356860 | Dataset: 0-11966760 | Loss: 0.596 | 596 ms/step , 115732.61 GFLOP/s , 173500.1 tokens/s INFO:__main__:2024-11-30 11:44:05 | Epoch: 0 | Step: 356870 | Dataset: 0-11969160 | Loss: 0.655 | 596 ms/step , 115836.81 GFLOP/s , 173537.3 tokens/s INFO:__main__:2024-11-30 11:44:12 | Epoch: 0 | Step: 356880 | Dataset: 0-11971560 | Loss: 0.701 | 597 ms/step , 115689.65 GFLOP/s , 173429.5 tokens/s INFO:__main__:2024-11-30 11:44:19 | Epoch: 0 | Step: 356890 | Dataset: 0-11973960 | Loss: 0.653 | 596 ms/step , 115834.96 GFLOP/s , 173409.7 tokens/s INFO:__main__:2024-11-30 11:44:26 | Epoch: 0 | Step: 356900 | Dataset: 0-11976360 | Loss: 0.691 | 596 ms/step , 115776.07 GFLOP/s , 173500.1 tokens/s INFO:__main__:2024-11-30 11:44:33 | Epoch: 0 | Step: 356910 | Dataset: 0-11978760 | Loss: 0.720 | 596 ms/step , 115763.13 GFLOP/s , 173444.4 tokens/s INFO:__main__:2024-11-30 11:44:40 | Epoch: 0 | Step: 356920 | Dataset: 0-11981160 | Loss: 0.652 | 596 ms/step , 115779.02 GFLOP/s , 173439.1 tokens/s INFO:__main__:2024-11-30 11:44:47 | Epoch: 0 | Step: 356930 | Dataset: 0-11983560 | Loss: 0.624 | 596 ms/step , 115783.68 GFLOP/s , 173465.3 tokens/s INFO:__main__:2024-11-30 11:44:54 | Epoch: 0 | Step: 356940 | Dataset: 0-11985960 | Loss: 0.614 | 597 ms/step , 115654.84 GFLOP/s , 173520.5 tokens/s INFO:__main__:2024-11-30 11:45:01 | Epoch: 0 | Step: 356950 | Dataset: 0-11988360 | Loss: 0.643 | 596 ms/step , 115785.65 GFLOP/s , 173535.6 tokens/s INFO:__main__:2024-11-30 11:45:09 | Epoch: 0 | Step: 356960 | Dataset: 0-11990760 | Loss: 0.638 | 596 ms/step , 115850.25 GFLOP/s , 173505.4 tokens/s INFO:__main__:2024-11-30 11:45:16 | Epoch: 0 | Step: 356970 | Dataset: 0-11993160 | Loss: 0.706 | 596 ms/step , 115720.97 GFLOP/s , 173486.1 tokens/s INFO:__main__:2024-11-30 11:45:23 | Epoch: 0 | Step: 356980 | Dataset: 0-11995560 | 
Loss: 0.657 | 596 ms/step , 115799.34 GFLOP/s , 173496.3 tokens/s INFO:__main__:2024-11-30 11:45:30 | Epoch: 0 | Step: 356990 | Dataset: 0-11997960 | Loss: 0.730 | 597 ms/step , 115672.61 GFLOP/s , 173475.6 tokens/s INFO:__main__:2024-11-30 11:45:37 | Validation | Step: 357000 | Val_loss: 0.687 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 11:45:37 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_114537_step_357000.pt` INFO:__main__:2024-11-30 11:45:40 | Epoch: 0 | Step: 357000 | Dataset: 0-12000360 | Loss: 0.650 | 595 ms/step , 116027.23 GFLOP/s , 121332.8 tokens/s INFO:__main__:2024-11-30 11:45:47 | Epoch: 0 | Step: 357010 | Dataset: 0-12002760 | Loss: 0.703 | 599 ms/step , 115256.85 GFLOP/s , 173498.4 tokens/s INFO:__main__:2024-11-30 11:45:54 | Epoch: 0 | Step: 357020 | Dataset: 0-12005160 | Loss: 0.626 | 598 ms/step , 115495.92 GFLOP/s , 173502.0 tokens/s INFO:__main__:2024-11-30 11:46:01 | Epoch: 0 | Step: 357030 | Dataset: 0-12007560 | Loss: 0.734 | 598 ms/step , 115380.30 GFLOP/s , 173531.5 tokens/s INFO:__main__:2024-11-30 11:46:08 | Epoch: 0 | Step: 357040 | Dataset: 0-12009960 | Loss: 0.594 | 597 ms/step , 115516.59 GFLOP/s , 173514.3 tokens/s INFO:__main__:2024-11-30 11:46:15 | Epoch: 0 | Step: 357050 | Dataset: 0-12012360 | Loss: 0.676 | 598 ms/step , 115497.19 GFLOP/s , 173530.0 tokens/s INFO:__main__:2024-11-30 11:46:22 | Epoch: 0 | Step: 357060 | Dataset: 0-12014760 | Loss: 0.692 | 596 ms/step , 115742.24 GFLOP/s , 173513.6 tokens/s INFO:__main__:2024-11-30 11:46:29 | Epoch: 0 | Step: 357070 | Dataset: 0-12017160 | Loss: 0.680 | 596 ms/step , 115702.79 GFLOP/s , 173572.6 tokens/s INFO:__main__:2024-11-30 11:46:37 | Epoch: 0 | Step: 357080 | Dataset: 0-12019560 | Loss: 0.611 | 597 ms/step , 115660.16 GFLOP/s , 173535.9 tokens/s INFO:__main__:2024-11-30 11:46:44 | Epoch: 0 | Step: 357090 | Dataset: 0-12021960 | Loss: 0.656 | 597 ms/step , 115651.60 GFLOP/s , 173518.0 tokens/s INFO:__main__:2024-11-30 11:46:51 | 
Epoch: 0 | Step: 357100 | Dataset: 0-12024360 | Loss: 0.698 | 596 ms/step , 115763.50 GFLOP/s , 173449.6 tokens/s INFO:__main__:2024-11-30 11:46:58 | Epoch: 0 | Step: 357110 | Dataset: 0-12026760 | Loss: 0.658 | 596 ms/step , 115715.07 GFLOP/s , 173526.8 tokens/s INFO:__main__:2024-11-30 11:47:05 | Epoch: 0 | Step: 357120 | Dataset: 0-12029160 | Loss: 0.600 | 596 ms/step , 115832.80 GFLOP/s , 173481.7 tokens/s INFO:__main__:2024-11-30 11:47:12 | Epoch: 0 | Step: 357130 | Dataset: 0-12031560 | Loss: 0.643 | 596 ms/step , 115844.35 GFLOP/s , 173467.4 tokens/s INFO:__main__:2024-11-30 11:47:19 | Epoch: 0 | Step: 357140 | Dataset: 0-12033960 | Loss: 0.666 | 596 ms/step , 115763.12 GFLOP/s , 173475.8 tokens/s INFO:__main__:2024-11-30 11:47:26 | Epoch: 0 | Step: 357150 | Dataset: 0-12036360 | Loss: 0.777 | 596 ms/step , 115734.77 GFLOP/s , 173441.5 tokens/s INFO:__main__:2024-11-30 11:47:33 | Epoch: 0 | Step: 357160 | Dataset: 0-12038760 | Loss: 0.758 | 597 ms/step , 115674.67 GFLOP/s , 173510.7 tokens/s INFO:__main__:2024-11-30 11:47:40 | Epoch: 0 | Step: 357170 | Dataset: 0-12041160 | Loss: 0.609 | 596 ms/step , 115707.58 GFLOP/s , 173493.1 tokens/s INFO:__main__:2024-11-30 11:47:47 | Epoch: 0 | Step: 357180 | Dataset: 0-12043560 | Loss: 0.683 | 597 ms/step , 115673.82 GFLOP/s , 173490.7 tokens/s INFO:__main__:2024-11-30 11:47:54 | Epoch: 0 | Step: 357190 | Dataset: 0-12045960 | Loss: 0.614 | 596 ms/step , 115847.33 GFLOP/s , 173483.3 tokens/s INFO:__main__:2024-11-30 11:48:02 | Epoch: 0 | Step: 357200 | Dataset: 0-12048360 | Loss: 0.640 | 595 ms/step , 115946.62 GFLOP/s , 173556.7 tokens/s INFO:__main__:2024-11-30 11:48:09 | Epoch: 0 | Step: 357210 | Dataset: 0-12050760 | Loss: 0.699 | 596 ms/step , 115857.25 GFLOP/s , 173589.1 tokens/s INFO:__main__:2024-11-30 11:48:16 | Epoch: 0 | Step: 357220 | Dataset: 0-12053160 | Loss: 0.650 | 596 ms/step , 115888.50 GFLOP/s , 173467.0 tokens/s INFO:__main__:2024-11-30 11:48:23 | Epoch: 0 | Step: 357230 | Dataset: 0-12055560 | 
Loss: 0.690 | 596 ms/step , 115831.02 GFLOP/s , 173519.4 tokens/s INFO:__main__:2024-11-30 11:48:30 | Epoch: 0 | Step: 357240 | Dataset: 0-12057960 | Loss: 0.584 | 595 ms/step , 115908.34 GFLOP/s , 173528.8 tokens/s INFO:__main__:2024-11-30 11:48:37 | Epoch: 0 | Step: 357250 | Dataset: 0-12060360 | Loss: 0.646 | 596 ms/step , 115778.18 GFLOP/s , 173577.4 tokens/s INFO:__main__:2024-11-30 11:48:44 | Epoch: 0 | Step: 357260 | Dataset: 0-12062760 | Loss: 0.628 | 596 ms/step , 115837.36 GFLOP/s , 173584.4 tokens/s INFO:__main__:2024-11-30 11:48:51 | Epoch: 0 | Step: 357270 | Dataset: 0-12065160 | Loss: 0.721 | 596 ms/step , 115867.27 GFLOP/s , 173588.2 tokens/s INFO:__main__:2024-11-30 11:48:58 | Epoch: 0 | Step: 357280 | Dataset: 0-12067560 | Loss: 0.763 | 595 ms/step , 115922.15 GFLOP/s , 173545.5 tokens/s INFO:__main__:2024-11-30 11:49:05 | Epoch: 0 | Step: 357290 | Dataset: 0-12069960 | Loss: 0.657 | 596 ms/step , 115877.14 GFLOP/s , 173591.5 tokens/s INFO:__main__:2024-11-30 11:49:12 | Epoch: 0 | Step: 357300 | Dataset: 0-12072360 | Loss: 0.593 | 595 ms/step , 115962.99 GFLOP/s , 173613.9 tokens/s INFO:__main__:2024-11-30 11:49:19 | Epoch: 0 | Step: 357310 | Dataset: 0-12074760 | Loss: 0.646 | 595 ms/step , 115978.81 GFLOP/s , 173568.6 tokens/s INFO:__main__:2024-11-30 11:49:27 | Epoch: 0 | Step: 357320 | Dataset: 0-12077160 | Loss: 0.725 | 595 ms/step , 116076.79 GFLOP/s , 173574.5 tokens/s INFO:__main__:2024-11-30 11:49:34 | Epoch: 0 | Step: 357330 | Dataset: 0-12079560 | Loss: 1.099 | 597 ms/step , 115552.44 GFLOP/s , 173527.9 tokens/s INFO:__main__:2024-11-30 11:49:41 | Epoch: 0 | Step: 357340 | Dataset: 0-12081960 | Loss: 1.047 | 596 ms/step , 115762.62 GFLOP/s , 173429.6 tokens/s INFO:__main__:2024-11-30 11:49:48 | Epoch: 0 | Step: 357350 | Dataset: 0-12084360 | Loss: 1.057 | 596 ms/step , 115828.13 GFLOP/s , 173413.1 tokens/s INFO:__main__:2024-11-30 11:49:55 | Epoch: 0 | Step: 357360 | Dataset: 0-12086760 | Loss: 1.018 | 596 ms/step , 115730.57 GFLOP/s , 
173407.6 tokens/s INFO:__main__:2024-11-30 11:50:02 | Epoch: 0 | Step: 357370 | Dataset: 0-12089160 | Loss: 1.052 | 596 ms/step , 115749.32 GFLOP/s , 173443.5 tokens/s INFO:__main__:2024-11-30 11:50:09 | Epoch: 0 | Step: 357380 | Dataset: 0-12091560 | Loss: 1.050 | 596 ms/step , 115731.50 GFLOP/s , 173420.3 tokens/s INFO:__main__:2024-11-30 11:50:16 | Epoch: 0 | Step: 357390 | Dataset: 0-12093960 | Loss: 1.079 | 597 ms/step , 115594.75 GFLOP/s , 173330.0 tokens/s INFO:__main__:2024-11-30 11:50:23 | Epoch: 0 | Step: 357400 | Dataset: 0-12096360 | Loss: 1.059 | 597 ms/step , 115595.89 GFLOP/s , 173471.1 tokens/s INFO:__main__:2024-11-30 11:50:30 | Epoch: 0 | Step: 357410 | Dataset: 0-12098760 | Loss: 1.071 | 597 ms/step , 115655.26 GFLOP/s , 173354.6 tokens/s INFO:__main__:2024-11-30 11:50:37 | Epoch: 0 | Step: 357420 | Dataset: 0-12101160 | Loss: 1.025 | 597 ms/step , 115560.88 GFLOP/s , 173403.2 tokens/s INFO:__main__:2024-11-30 11:50:44 | Epoch: 0 | Step: 357430 | Dataset: 0-12103560 | Loss: 1.056 | 597 ms/step , 115661.77 GFLOP/s , 173518.4 tokens/s INFO:__main__:2024-11-30 11:50:52 | Epoch: 0 | Step: 357440 | Dataset: 0-12105960 | Loss: 1.025 | 596 ms/step , 115820.57 GFLOP/s , 173437.3 tokens/s INFO:__main__:2024-11-30 11:50:59 | Epoch: 0 | Step: 357450 | Dataset: 0-12108360 | Loss: 1.063 | 596 ms/step , 115790.27 GFLOP/s , 173342.5 tokens/s INFO:__main__:2024-11-30 11:51:06 | Epoch: 0 | Step: 357460 | Dataset: 0-12110760 | Loss: 1.041 | 596 ms/step , 115714.75 GFLOP/s , 173494.5 tokens/s INFO:__main__:2024-11-30 11:51:13 | Epoch: 0 | Step: 357470 | Dataset: 0-12113160 | Loss: 1.040 | 596 ms/step , 115814.76 GFLOP/s , 173417.3 tokens/s INFO:__main__:2024-11-30 11:51:20 | Epoch: 0 | Step: 357480 | Dataset: 0-12115560 | Loss: 1.083 | 596 ms/step , 115774.75 GFLOP/s , 173495.4 tokens/s INFO:__main__:2024-11-30 11:51:27 | Epoch: 0 | Step: 357490 | Dataset: 0-12117960 | Loss: 1.029 | 596 ms/step , 115763.19 GFLOP/s , 173436.4 tokens/s INFO:__main__:2024-11-30 
11:51:35 | Validation | Step: 357500 | Val_loss: 0.692 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 11:51:35 | Epoch: 0 | Step: 357500 | Dataset: 0-12120360 | Loss: 1.018 | 595 ms/step , 115996.19 GFLOP/s , 147697.0 tokens/s INFO:__main__:2024-11-30 11:51:42 | Epoch: 0 | Step: 357510 | Dataset: 0-12122760 | Loss: 0.904 | 596 ms/step , 115853.10 GFLOP/s , 174002.1 tokens/s INFO:__main__:2024-11-30 11:51:49 | Epoch: 0 | Step: 357520 | Dataset: 0-12125160 | Loss: 1.503 | 596 ms/step , 115876.79 GFLOP/s , 173631.8 tokens/s INFO:__main__:2024-11-30 11:51:57 | Epoch: 0 | Step: 357530 | Dataset: 0-12127560 | Loss: 1.529 | 596 ms/step , 115877.60 GFLOP/s , 173568.2 tokens/s INFO:__main__:2024-11-30 11:52:04 | Epoch: 0 | Step: 357540 | Dataset: 0-12129960 | Loss: 1.425 | 596 ms/step , 115837.52 GFLOP/s , 173508.3 tokens/s INFO:__main__:2024-11-30 11:52:11 | Epoch: 0 | Step: 357550 | Dataset: 0-12132360 | Loss: 1.510 | 596 ms/step , 115718.10 GFLOP/s , 173539.9 tokens/s INFO:__main__:2024-11-30 11:52:18 | Epoch: 0 | Step: 357560 | Dataset: 0-12134760 | Loss: 1.345 | 595 ms/step , 115976.05 GFLOP/s , 173592.0 tokens/s INFO:__main__:2024-11-30 11:52:25 | Epoch: 0 | Step: 357570 | Dataset: 0-12137160 | Loss: 1.408 | 596 ms/step , 115779.31 GFLOP/s , 173535.3 tokens/s INFO:__main__:2024-11-30 11:52:32 | Epoch: 0 | Step: 357580 | Dataset: 0-12139560 | Loss: 1.380 | 597 ms/step , 115685.36 GFLOP/s , 173494.6 tokens/s INFO:__main__:2024-11-30 11:52:39 | Epoch: 0 | Step: 357590 | Dataset: 0-12141960 | Loss: 1.386 | 595 ms/step , 115893.53 GFLOP/s , 173471.6 tokens/s INFO:__main__:2024-11-30 11:52:46 | Epoch: 0 | Step: 357600 | Dataset: 0-12144360 | Loss: 1.402 | 597 ms/step , 115646.85 GFLOP/s , 173621.4 tokens/s INFO:__main__:2024-11-30 11:52:53 | Epoch: 0 | Step: 357610 | Dataset: 0-12146760 | Loss: 1.454 | 596 ms/step , 115813.79 GFLOP/s , 173550.6 tokens/s INFO:__main__:2024-11-30 11:53:00 | Epoch: 0 | Step: 357620 | Dataset: 0-12149160 | Loss: 1.501 | 595 ms/step , 
115923.21 GFLOP/s , 173606.3 tokens/s INFO:__main__:2024-11-30 11:53:07 | Epoch: 0 | Step: 357630 | Dataset: 0-12151560 | Loss: 1.348 | 595 ms/step , 115924.44 GFLOP/s , 173623.2 tokens/s INFO:__main__:2024-11-30 11:53:14 | Epoch: 0 | Step: 357640 | Dataset: 0-12153960 | Loss: 1.358 | 595 ms/step , 115918.05 GFLOP/s , 173625.5 tokens/s INFO:__main__:2024-11-30 11:53:21 | Epoch: 0 | Step: 357650 | Dataset: 0-12156360 | Loss: 1.319 | 596 ms/step , 115856.83 GFLOP/s , 173647.9 tokens/s INFO:__main__:2024-11-30 11:53:29 | Epoch: 0 | Step: 357660 | Dataset: 0-12158760 | Loss: 1.326 | 595 ms/step , 116002.45 GFLOP/s , 173631.3 tokens/s INFO:__main__:2024-11-30 11:53:36 | Epoch: 0 | Step: 357670 | Dataset: 0-12161160 | Loss: 0.865 | 595 ms/step , 115917.09 GFLOP/s , 173636.3 tokens/s INFO:__main__:2024-11-30 11:53:43 | Epoch: 0 | Step: 357680 | Dataset: 0-12163560 | Loss: 0.857 | 594 ms/step , 116110.05 GFLOP/s , 173794.4 tokens/s INFO:__main__:2024-11-30 11:53:50 | Epoch: 0 | Step: 357690 | Dataset: 0-12165960 | Loss: 0.822 | 595 ms/step , 116050.61 GFLOP/s , 173896.1 tokens/s INFO:__main__:2024-11-30 11:53:57 | Epoch: 0 | Step: 357700 | Dataset: 0-12168360 | Loss: 0.765 | 595 ms/step , 116010.03 GFLOP/s , 173717.8 tokens/s INFO:__main__:2024-11-30 11:54:04 | Epoch: 0 | Step: 357710 | Dataset: 0-12170760 | Loss: 0.799 | 595 ms/step , 115987.53 GFLOP/s , 173756.4 tokens/s INFO:__main__:2024-11-30 11:54:11 | Epoch: 0 | Step: 357720 | Dataset: 0-12173160 | Loss: 0.717 | 596 ms/step , 115806.96 GFLOP/s , 173733.6 tokens/s INFO:__main__:2024-11-30 11:54:18 | Epoch: 0 | Step: 357730 | Dataset: 0-12175560 | Loss: 0.749 | 595 ms/step , 115925.94 GFLOP/s , 173671.9 tokens/s INFO:__main__:2024-11-30 11:54:25 | Epoch: 0 | Step: 357740 | Dataset: 0-12177960 | Loss: 0.715 | 596 ms/step , 115847.37 GFLOP/s , 173802.7 tokens/s INFO:__main__:2024-11-30 11:54:32 | Epoch: 0 | Step: 357750 | Dataset: 0-12180360 | Loss: 0.683 | 594 ms/step , 116138.51 GFLOP/s , 173663.4 tokens/s 
INFO:__main__:2024-11-30 11:54:39 | Epoch: 0 | Step: 357760 | Dataset: 0-12182760 | Loss: 0.668 | 594 ms/step , 116125.06 GFLOP/s , 173718.9 tokens/s INFO:__main__:2024-11-30 11:54:46 | Epoch: 0 | Step: 357770 | Dataset: 0-12185160 | Loss: 0.673 | 595 ms/step , 115908.14 GFLOP/s , 173765.0 tokens/s INFO:__main__:2024-11-30 11:54:53 | Epoch: 0 | Step: 357780 | Dataset: 0-12187560 | Loss: 0.646 | 594 ms/step , 116149.15 GFLOP/s , 173749.4 tokens/s INFO:__main__:2024-11-30 11:55:00 | Epoch: 0 | Step: 357790 | Dataset: 0-12189960 | Loss: 0.657 | 596 ms/step , 115838.59 GFLOP/s , 173691.8 tokens/s INFO:__main__:2024-11-30 11:55:08 | Epoch: 0 | Step: 357800 | Dataset: 0-12192360 | Loss: 0.423 | 594 ms/step , 116145.21 GFLOP/s , 173835.4 tokens/s INFO:__main__:2024-11-30 11:55:15 | Epoch: 0 | Step: 357810 | Dataset: 0-12194760 | Loss: 0.476 | 596 ms/step , 115840.19 GFLOP/s , 173792.4 tokens/s INFO:__main__:2024-11-30 11:55:22 | Epoch: 0 | Step: 357820 | Dataset: 0-12197160 | Loss: 0.418 | 595 ms/step , 115985.60 GFLOP/s , 173835.2 tokens/s INFO:__main__:2024-11-30 11:55:29 | Epoch: 0 | Step: 357830 | Dataset: 0-12199560 | Loss: 0.369 | 595 ms/step , 115934.70 GFLOP/s , 173750.9 tokens/s INFO:__main__:2024-11-30 11:55:36 | Epoch: 0 | Step: 357840 | Dataset: 0-12201960 | Loss: 1.030 | 595 ms/step , 116004.36 GFLOP/s , 173739.0 tokens/s INFO:__main__:2024-11-30 11:55:43 | Epoch: 0 | Step: 357850 | Dataset: 0-12204360 | Loss: 1.082 | 595 ms/step , 115906.39 GFLOP/s , 173465.4 tokens/s INFO:__main__:2024-11-30 11:55:50 | Epoch: 0 | Step: 357860 | Dataset: 0-12206760 | Loss: 0.952 | 596 ms/step , 115822.28 GFLOP/s , 173620.2 tokens/s INFO:__main__:2024-11-30 11:55:57 | Epoch: 0 | Step: 357870 | Dataset: 0-12209160 | Loss: 1.013 | 595 ms/step , 115959.21 GFLOP/s , 173563.6 tokens/s INFO:__main__:2024-11-30 11:56:04 | Epoch: 0 | Step: 357880 | Dataset: 0-12211560 | Loss: 0.915 | 595 ms/step , 115903.24 GFLOP/s , 173565.8 tokens/s INFO:__main__:2024-11-30 11:56:11 | Epoch: 0 | 
Step: 357890 | Dataset: 0-12213960 | Loss: 1.111 | 596 ms/step , 115849.49 GFLOP/s , 173528.2 tokens/s INFO:__main__:2024-11-30 11:56:18 | Epoch: 0 | Step: 357900 | Dataset: 0-12216360 | Loss: 1.041 | 596 ms/step , 115749.17 GFLOP/s , 173586.8 tokens/s INFO:__main__:2024-11-30 11:56:25 | Epoch: 0 | Step: 357910 | Dataset: 0-12218760 | Loss: 0.925 | 596 ms/step , 115861.14 GFLOP/s , 173564.9 tokens/s INFO:__main__:2024-11-30 11:56:32 | Epoch: 0 | Step: 357920 | Dataset: 0-12221160 | Loss: 0.950 | 595 ms/step , 115992.15 GFLOP/s , 173659.9 tokens/s INFO:__main__:2024-11-30 11:56:40 | Epoch: 0 | Step: 357930 | Dataset: 0-12223560 | Loss: 1.037 | 596 ms/step , 115846.82 GFLOP/s , 173644.1 tokens/s INFO:__main__:2024-11-30 11:56:47 | Epoch: 0 | Step: 357940 | Dataset: 0-12225960 | Loss: 1.173 | 595 ms/step , 115958.92 GFLOP/s , 173685.9 tokens/s INFO:__main__:2024-11-30 11:56:54 | Epoch: 0 | Step: 357950 | Dataset: 0-12228360 | Loss: 0.948 | 596 ms/step , 115868.39 GFLOP/s , 173670.5 tokens/s INFO:__main__:2024-11-30 11:57:01 | Epoch: 0 | Step: 357960 | Dataset: 0-12230760 | Loss: 1.155 | 596 ms/step , 115817.85 GFLOP/s , 173680.5 tokens/s INFO:__main__:2024-11-30 11:57:08 | Epoch: 0 | Step: 357970 | Dataset: 0-12233160 | Loss: 0.925 | 595 ms/step , 115994.51 GFLOP/s , 173679.3 tokens/s INFO:__main__:2024-11-30 11:57:15 | Epoch: 0 | Step: 357980 | Dataset: 0-12235560 | Loss: 1.149 | 596 ms/step , 115759.42 GFLOP/s , 173481.3 tokens/s INFO:__main__:2024-11-30 11:57:22 | Epoch: 0 | Step: 357990 | Dataset: 0-12237960 | Loss: 1.053 | 596 ms/step , 115887.51 GFLOP/s , 173632.9 tokens/s INFO:__main__:2024-11-30 11:57:30 | Validation | Step: 358000 | Val_loss: 0.659 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 11:57:30 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_115730_step_358000.pt` INFO:__main__:2024-11-30 11:57:32 | Epoch: 0 | Step: 358000 | Dataset: 0-12240360 | Loss: 0.904 | 594 ms/step , 116153.41 GFLOP/s , 121095.2 tokens/s 
INFO:__main__:2024-11-30 11:57:39 | Epoch: 0 | Step: 358010 | Dataset: 0-12242760 | Loss: 1.112 | 598 ms/step , 115366.21 GFLOP/s , 173730.2 tokens/s INFO:__main__:2024-11-30 11:57:46 | Epoch: 0 | Step: 358020 | Dataset: 0-12245160 | Loss: 1.062 | 597 ms/step , 115546.18 GFLOP/s , 173658.0 tokens/s INFO:__main__:2024-11-30 11:57:53 | Epoch: 0 | Step: 358030 | Dataset: 0-12247560 | Loss: 0.896 | 597 ms/step , 115551.36 GFLOP/s , 173669.9 tokens/s INFO:__main__:2024-11-30 11:58:00 | Epoch: 0 | Step: 358040 | Dataset: 0-12249960 | Loss: 1.024 | 597 ms/step , 115536.69 GFLOP/s , 173616.3 tokens/s INFO:__main__:2024-11-30 11:58:08 | Epoch: 0 | Step: 358050 | Dataset: 0-12252360 | Loss: 0.916 | 597 ms/step , 115515.62 GFLOP/s , 173614.6 tokens/s INFO:__main__:2024-11-30 11:58:15 | Epoch: 0 | Step: 358060 | Dataset: 0-12254760 | Loss: 0.987 | 597 ms/step , 115678.20 GFLOP/s , 173545.2 tokens/s INFO:__main__:2024-11-30 11:58:22 | Epoch: 0 | Step: 358070 | Dataset: 0-12257160 | Loss: 0.943 | 597 ms/step , 115596.98 GFLOP/s , 173559.2 tokens/s INFO:__main__:2024-11-30 11:58:29 | Epoch: 0 | Step: 358080 | Dataset: 0-12259560 | Loss: 1.095 | 597 ms/step , 115617.96 GFLOP/s , 173546.2 tokens/s INFO:__main__:2024-11-30 11:58:36 | Epoch: 0 | Step: 358090 | Dataset: 0-12261960 | Loss: 0.932 | 597 ms/step , 115601.98 GFLOP/s , 173632.0 tokens/s INFO:__main__:2024-11-30 11:58:43 | Epoch: 0 | Step: 358100 | Dataset: 0-12264360 | Loss: 0.927 | 597 ms/step , 115568.62 GFLOP/s , 173547.2 tokens/s INFO:__main__:2024-11-30 11:58:50 | Epoch: 0 | Step: 358110 | Dataset: 0-12266760 | Loss: 1.031 | 598 ms/step , 115493.19 GFLOP/s , 173589.4 tokens/s INFO:__main__:2024-11-30 11:58:57 | Epoch: 0 | Step: 358120 | Dataset: 0-12269160 | Loss: 1.130 | 597 ms/step , 115535.29 GFLOP/s , 173577.6 tokens/s INFO:__main__:2024-11-30 11:59:04 | Epoch: 0 | Step: 358130 | Dataset: 0-12271560 | Loss: 0.943 | 597 ms/step , 115546.88 GFLOP/s , 173564.7 tokens/s INFO:__main__:2024-11-30 11:59:11 | Epoch: 0 | 
Step: 358140 | Dataset: 0-12273960 | Loss: 1.061 | 598 ms/step , 115359.15 GFLOP/s , 173632.2 tokens/s INFO:__main__:2024-11-30 11:59:18 | Epoch: 0 | Step: 358150 | Dataset: 0-12276360 | Loss: 1.050 | 597 ms/step , 115568.86 GFLOP/s , 173609.9 tokens/s INFO:__main__:2024-11-30 11:59:25 | Epoch: 0 | Step: 358160 | Dataset: 0-12278760 | Loss: 0.974 | 598 ms/step , 115468.76 GFLOP/s , 173574.7 tokens/s INFO:__main__:2024-11-30 11:59:32 | Epoch: 0 | Step: 358170 | Dataset: 0-12281160 | Loss: 1.068 | 598 ms/step , 115388.95 GFLOP/s , 173638.9 tokens/s INFO:__main__:2024-11-30 11:59:40 | Epoch: 0 | Step: 358180 | Dataset: 0-12283560 | Loss: 0.837 | 597 ms/step , 115646.45 GFLOP/s , 173532.2 tokens/s INFO:__main__:2024-11-30 11:59:47 | Epoch: 0 | Step: 358190 | Dataset: 0-12285960 | Loss: 0.924 | 598 ms/step , 115434.08 GFLOP/s , 173584.8 tokens/s INFO:__main__:2024-11-30 11:59:54 | Epoch: 0 | Step: 358200 | Dataset: 0-12288360 | Loss: 0.797 | 598 ms/step , 115396.51 GFLOP/s , 173575.7 tokens/s INFO:__main__:2024-11-30 12:00:01 | Epoch: 0 | Step: 358210 | Dataset: 0-12290760 | Loss: 0.872 | 597 ms/step , 115522.99 GFLOP/s , 173533.8 tokens/s INFO:__main__:2024-11-30 12:00:08 | Epoch: 0 | Step: 358220 | Dataset: 0-12293160 | Loss: 0.906 | 598 ms/step , 115324.21 GFLOP/s , 173562.0 tokens/s INFO:__main__:2024-11-30 12:00:15 | Epoch: 0 | Step: 358230 | Dataset: 0-12295560 | Loss: 0.893 | 598 ms/step , 115400.17 GFLOP/s , 173409.4 tokens/s INFO:__main__:2024-11-30 12:00:22 | Epoch: 0 | Step: 358240 | Dataset: 0-12297960 | Loss: 0.647 | 597 ms/step , 115591.08 GFLOP/s , 173637.4 tokens/s INFO:__main__:2024-11-30 12:00:29 | Epoch: 0 | Step: 358250 | Dataset: 0-12300360 | Loss: 0.883 | 597 ms/step , 115547.03 GFLOP/s , 173547.7 tokens/s INFO:__main__:2024-11-30 12:00:36 | Epoch: 0 | Step: 358260 | Dataset: 0-12302760 | Loss: 1.116 | 597 ms/step , 115598.23 GFLOP/s , 173457.8 tokens/s INFO:__main__:2024-11-30 12:00:43 | Epoch: 0 | Step: 358270 | Dataset: 0-12305160 | Loss: 0.934 
| 597 ms/step , 115620.78 GFLOP/s , 173505.6 tokens/s INFO:__main__:2024-11-30 12:00:50 | Epoch: 0 | Step: 358280 | Dataset: 0-12307560 | Loss: 0.847 | 598 ms/step , 115399.46 GFLOP/s , 173529.2 tokens/s INFO:__main__:2024-11-30 12:00:57 | Epoch: 0 | Step: 358290 | Dataset: 0-12309960 | Loss: 0.872 | 597 ms/step , 115583.06 GFLOP/s , 173566.2 tokens/s INFO:__main__:2024-11-30 12:01:05 | Epoch: 0 | Step: 358300 | Dataset: 0-12312360 | Loss: 1.086 | 597 ms/step , 115512.42 GFLOP/s , 173510.3 tokens/s INFO:__main__:2024-11-30 12:01:12 | Epoch: 0 | Step: 358310 | Dataset: 0-12314760 | Loss: 0.902 | 598 ms/step , 115489.65 GFLOP/s , 173581.7 tokens/s INFO:__main__:2024-11-30 12:01:19 | Epoch: 0 | Step: 358320 | Dataset: 0-12317160 | Loss: 1.002 | 597 ms/step , 115550.43 GFLOP/s , 173650.7 tokens/s INFO:__main__:2024-11-30 12:01:26 | Epoch: 0 | Step: 358330 | Dataset: 0-12319560 | Loss: 0.897 | 597 ms/step , 115507.33 GFLOP/s , 173611.3 tokens/s INFO:__main__:2024-11-30 12:01:33 | Epoch: 0 | Step: 358340 | Dataset: 0-12321960 | Loss: 0.389 | 596 ms/step , 115747.05 GFLOP/s , 173637.3 tokens/s INFO:__main__:2024-11-30 12:01:40 | Epoch: 0 | Step: 358350 | Dataset: 0-12324360 | Loss: 0.356 | 596 ms/step , 115808.94 GFLOP/s , 173836.4 tokens/s INFO:__main__:2024-11-30 12:01:47 | Epoch: 0 | Step: 358360 | Dataset: 0-12326760 | Loss: 0.338 | 597 ms/step , 115627.48 GFLOP/s , 173822.4 tokens/s INFO:__main__:2024-11-30 12:01:54 | Epoch: 0 | Step: 358370 | Dataset: 0-12329160 | Loss: 0.332 | 597 ms/step , 115648.52 GFLOP/s , 173800.6 tokens/s INFO:__main__:2024-11-30 12:02:01 | Epoch: 0 | Step: 358380 | Dataset: 0-12331560 | Loss: 0.298 | 596 ms/step , 115758.28 GFLOP/s , 173847.3 tokens/s INFO:__main__:2024-11-30 12:02:08 | Epoch: 0 | Step: 358390 | Dataset: 0-12333960 | Loss: 0.395 | 597 ms/step , 115626.46 GFLOP/s , 173760.4 tokens/s INFO:__main__:2024-11-30 12:02:15 | Epoch: 0 | Step: 358400 | Dataset: 0-12336360 | Loss: 0.373 | 597 ms/step , 115673.09 GFLOP/s , 173817.7 
tokens/s INFO:__main__:2024-11-30 12:02:22 | Epoch: 0 | Step: 358410 | Dataset: 0-12338760 | Loss: 0.317 | 596 ms/step , 115695.92 GFLOP/s , 173775.3 tokens/s INFO:__main__:2024-11-30 12:02:29 | Epoch: 0 | Step: 358420 | Dataset: 0-12341160 | Loss: 0.370 | 597 ms/step , 115688.47 GFLOP/s , 173828.5 tokens/s INFO:__main__:2024-11-30 12:02:36 | Epoch: 0 | Step: 358430 | Dataset: 0-12343560 | Loss: 0.385 | 597 ms/step , 115658.88 GFLOP/s , 173769.4 tokens/s INFO:__main__:2024-11-30 12:02:44 | Epoch: 0 | Step: 358440 | Dataset: 0-12345960 | Loss: 0.316 | 596 ms/step , 115781.03 GFLOP/s , 173711.9 tokens/s INFO:__main__:2024-11-30 12:02:51 | Epoch: 0 | Step: 358450 | Dataset: 0-12348360 | Loss: 0.357 | 597 ms/step , 115653.68 GFLOP/s , 173821.2 tokens/s INFO:__main__:2024-11-30 12:02:58 | Epoch: 0 | Step: 358460 | Dataset: 0-12350760 | Loss: 0.323 | 597 ms/step , 115668.74 GFLOP/s , 173796.0 tokens/s INFO:__main__:2024-11-30 12:03:05 | Epoch: 0 | Step: 358470 | Dataset: 0-12353160 | Loss: 0.350 | 597 ms/step , 115580.04 GFLOP/s , 173793.2 tokens/s INFO:__main__:2024-11-30 12:03:12 | Epoch: 0 | Step: 358480 | Dataset: 0-12355560 | Loss: 0.355 | 597 ms/step , 115518.26 GFLOP/s , 173689.8 tokens/s INFO:__main__:2024-11-30 12:03:19 | Epoch: 0 | Step: 358490 | Dataset: 0-12357960 | Loss: 0.682 | 597 ms/step , 115647.08 GFLOP/s , 173452.6 tokens/s INFO:__main__:2024-11-30 12:03:27 | Validation | Step: 358500 | Val_loss: 0.679 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 12:03:27 | Epoch: 0 | Step: 358500 | Dataset: 0-12360360 | Loss: 0.661 | 597 ms/step , 115679.59 GFLOP/s , 147846.2 tokens/s INFO:__main__:2024-11-30 12:03:34 | Epoch: 0 | Step: 358510 | Dataset: 0-12362760 | Loss: 0.637 | 597 ms/step , 115654.11 GFLOP/s , 173774.2 tokens/s INFO:__main__:2024-11-30 12:03:41 | Epoch: 0 | Step: 358520 | Dataset: 0-12365160 | Loss: 0.623 | 597 ms/step , 115568.07 GFLOP/s , 173771.2 tokens/s INFO:__main__:2024-11-30 12:03:48 | Epoch: 0 | Step: 358530 | Dataset: 0-12367560 | 
Loss: 0.606 | 597 ms/step , 115634.93 GFLOP/s , 173717.6 tokens/s INFO:__main__:2024-11-30 12:03:56 | Epoch: 0 | Step: 358540 | Dataset: 0-12369960 | Loss: 0.592 | 596 ms/step , 115711.98 GFLOP/s , 173752.6 tokens/s INFO:__main__:2024-11-30 12:04:03 | Epoch: 0 | Step: 358550 | Dataset: 0-12372360 | Loss: 0.565 | 597 ms/step , 115527.48 GFLOP/s , 173734.6 tokens/s INFO:__main__:2024-11-30 12:04:10 | Epoch: 0 | Step: 358560 | Dataset: 0-12374760 | Loss: 0.556 | 597 ms/step , 115612.44 GFLOP/s , 173682.1 tokens/s INFO:__main__:2024-11-30 12:04:17 | Epoch: 0 | Step: 358570 | Dataset: 0-12377160 | Loss: 0.577 | 598 ms/step , 115501.02 GFLOP/s , 173668.3 tokens/s INFO:__main__:2024-11-30 12:04:24 | Epoch: 0 | Step: 358580 | Dataset: 0-12379560 | Loss: 0.548 | 597 ms/step , 115595.23 GFLOP/s , 173644.5 tokens/s INFO:__main__:2024-11-30 12:04:31 | Epoch: 0 | Step: 358590 | Dataset: 0-12381960 | Loss: 0.561 | 597 ms/step , 115543.51 GFLOP/s , 173666.9 tokens/s INFO:__main__:2024-11-30 12:04:38 | Epoch: 0 | Step: 358600 | Dataset: 0-12384360 | Loss: 0.500 | 597 ms/step , 115549.97 GFLOP/s , 173631.8 tokens/s INFO:__main__:2024-11-30 12:04:45 | Epoch: 0 | Step: 358610 | Dataset: 0-12386760 | Loss: 0.507 | 597 ms/step , 115624.78 GFLOP/s , 173596.3 tokens/s INFO:__main__:2024-11-30 12:04:52 | Epoch: 0 | Step: 358620 | Dataset: 0-12389160 | Loss: 0.524 | 597 ms/step , 115637.67 GFLOP/s , 173538.8 tokens/s INFO:__main__:2024-11-30 12:04:59 | Epoch: 0 | Step: 358630 | Dataset: 0-12391560 | Loss: 0.510 | 597 ms/step , 115634.01 GFLOP/s , 173624.9 tokens/s INFO:__main__:2024-11-30 12:05:06 | Epoch: 0 | Step: 358640 | Dataset: 0-12393960 | Loss: 2.129 | 599 ms/step , 115282.35 GFLOP/s , 173418.8 tokens/s INFO:__main__:2024-11-30 12:05:13 | Epoch: 0 | Step: 358650 | Dataset: 0-12396360 | Loss: 2.090 | 600 ms/step , 115080.85 GFLOP/s , 173324.0 tokens/s INFO:__main__:2024-11-30 12:05:20 | Epoch: 0 | Step: 358660 | Dataset: 0-12398760 | Loss: 2.092 | 599 ms/step , 115161.09 GFLOP/s , 
173321.6 tokens/s INFO:__main__:2024-11-30 12:05:28 | Epoch: 0 | Step: 358670 | Dataset: 0-12401160 | Loss: 2.011 | 599 ms/step , 115196.48 GFLOP/s , 173315.1 tokens/s INFO:__main__:2024-11-30 12:05:35 | Epoch: 0 | Step: 358680 | Dataset: 0-12403560 | Loss: 2.076 | 599 ms/step , 115300.54 GFLOP/s , 173288.2 tokens/s INFO:__main__:2024-11-30 12:05:42 | Epoch: 0 | Step: 358690 | Dataset: 0-12405960 | Loss: 0.566 | 596 ms/step , 115879.48 GFLOP/s , 173446.5 tokens/s INFO:__main__:2024-11-30 12:05:49 | Epoch: 0 | Step: 358700 | Dataset: 0-12408360 | Loss: 0.502 | 595 ms/step , 116001.46 GFLOP/s , 173652.6 tokens/s INFO:__main__:2024-11-30 12:05:56 | Epoch: 0 | Step: 358710 | Dataset: 0-12410760 | Loss: 0.526 | 595 ms/step , 115917.94 GFLOP/s , 173573.7 tokens/s INFO:__main__:2024-11-30 12:06:03 | Epoch: 0 | Step: 358720 | Dataset: 0-12413160 | Loss: 0.508 | 596 ms/step , 115763.46 GFLOP/s , 173546.8 tokens/s INFO:__main__:2024-11-30 12:06:10 | Epoch: 0 | Step: 358730 | Dataset: 0-12415560 | Loss: 0.459 | 596 ms/step , 115754.29 GFLOP/s , 173410.3 tokens/s INFO:__main__:2024-11-30 12:06:17 | Epoch: 0 | Step: 358740 | Dataset: 0-12417960 | Loss: 0.464 | 596 ms/step , 115798.97 GFLOP/s , 173391.0 tokens/s INFO:__main__:2024-11-30 12:06:24 | Epoch: 0 | Step: 358750 | Dataset: 0-12420360 | Loss: 0.431 | 596 ms/step , 115701.97 GFLOP/s , 173509.2 tokens/s INFO:__main__:2024-11-30 12:06:31 | Epoch: 0 | Step: 358760 | Dataset: 0-12422760 | Loss: 0.442 | 595 ms/step , 115919.64 GFLOP/s , 173436.8 tokens/s INFO:__main__:2024-11-30 12:06:38 | Epoch: 0 | Step: 358770 | Dataset: 0-12425160 | Loss: 0.422 | 596 ms/step , 115846.71 GFLOP/s , 173402.4 tokens/s INFO:__main__:2024-11-30 12:06:45 | Epoch: 0 | Step: 358780 | Dataset: 0-12427560 | Loss: 0.379 | 597 ms/step , 115647.44 GFLOP/s , 173476.7 tokens/s INFO:__main__:2024-11-30 12:06:53 | Epoch: 0 | Step: 358790 | Dataset: 0-12429960 | Loss: 0.445 | 596 ms/step , 115849.17 GFLOP/s , 173434.6 tokens/s INFO:__main__:2024-11-30 
12:07:00 | Epoch: 0 | Step: 358800 | Dataset: 0-12432360 | Loss: 0.472 | 596 ms/step , 115811.11 GFLOP/s , 173453.0 tokens/s INFO:__main__:2024-11-30 12:07:07 | Epoch: 0 | Step: 358810 | Dataset: 0-12434760 | Loss: 0.419 | 596 ms/step , 115852.14 GFLOP/s , 173384.8 tokens/s INFO:__main__:2024-11-30 12:07:14 | Epoch: 0 | Step: 358820 | Dataset: 0-12437160 | Loss: 0.386 | 597 ms/step , 115669.62 GFLOP/s , 173397.5 tokens/s INFO:__main__:2024-11-30 12:07:21 | Epoch: 0 | Step: 358830 | Dataset: 0-12439560 | Loss: 0.413 | 597 ms/step , 115590.17 GFLOP/s , 173262.2 tokens/s INFO:__main__:2024-11-30 12:07:28 | Epoch: 0 | Step: 358840 | Dataset: 0-12441960 | Loss: 0.432 | 596 ms/step , 115789.95 GFLOP/s , 173364.9 tokens/s INFO:__main__:2024-11-30 12:07:35 | Epoch: 0 | Step: 358850 | Dataset: 0-12444360 | Loss: 0.467 | 596 ms/step , 115878.29 GFLOP/s , 173387.5 tokens/s INFO:__main__:2024-11-30 12:07:42 | Epoch: 0 | Step: 358860 | Dataset: 0-12446760 | Loss: 0.514 | 597 ms/step , 115639.76 GFLOP/s , 173265.3 tokens/s INFO:__main__:2024-11-30 12:07:49 | Epoch: 0 | Step: 358870 | Dataset: 0-12449160 | Loss: 0.433 | 596 ms/step , 115799.37 GFLOP/s , 173331.2 tokens/s INFO:__main__:2024-11-30 12:07:56 | Epoch: 0 | Step: 358880 | Dataset: 0-12451560 | Loss: 0.420 | 596 ms/step , 115784.69 GFLOP/s , 173363.0 tokens/s INFO:__main__:2024-11-30 12:08:03 | Epoch: 0 | Step: 358890 | Dataset: 0-12453960 | Loss: 0.367 | 596 ms/step , 115721.40 GFLOP/s , 173365.0 tokens/s INFO:__main__:2024-11-30 12:08:11 | Epoch: 0 | Step: 358900 | Dataset: 0-12456360 | Loss: 0.443 | 596 ms/step , 115797.55 GFLOP/s , 173371.3 tokens/s INFO:__main__:2024-11-30 12:08:18 | Epoch: 0 | Step: 358910 | Dataset: 0-12458760 | Loss: 0.425 | 596 ms/step , 115706.14 GFLOP/s , 173480.7 tokens/s INFO:__main__:2024-11-30 12:08:25 | Epoch: 0 | Step: 358920 | Dataset: 0-12461160 | Loss: 0.419 | 597 ms/step , 115667.52 GFLOP/s , 173396.1 tokens/s INFO:__main__:2024-11-30 12:08:32 | Epoch: 0 | Step: 358930 | Dataset: 
0-12463560 | Loss: 0.396 | 596 ms/step , 115848.05 GFLOP/s , 173451.6 tokens/s INFO:__main__:2024-11-30 12:08:39 | Epoch: 0 | Step: 358940 | Dataset: 0-12465960 | Loss: 0.384 | 597 ms/step , 115691.61 GFLOP/s , 173436.5 tokens/s INFO:__main__:2024-11-30 12:08:46 | Epoch: 0 | Step: 358950 | Dataset: 0-12468360 | Loss: 0.433 | 596 ms/step , 115836.60 GFLOP/s , 173556.5 tokens/s INFO:__main__:2024-11-30 12:08:53 | Epoch: 0 | Step: 358960 | Dataset: 0-12470760 | Loss: 0.359 | 596 ms/step , 115813.49 GFLOP/s , 173607.9 tokens/s INFO:__main__:2024-11-30 12:09:00 | Epoch: 0 | Step: 358970 | Dataset: 0-12473160 | Loss: 0.396 | 596 ms/step , 115704.22 GFLOP/s , 173436.0 tokens/s INFO:__main__:2024-11-30 12:09:07 | Epoch: 0 | Step: 358980 | Dataset: 0-12475560 | Loss: 0.394 | 596 ms/step , 115851.51 GFLOP/s , 173480.8 tokens/s INFO:__main__:2024-11-30 12:09:14 | Epoch: 0 | Step: 358990 | Dataset: 0-12477960 | Loss: 0.375 | 598 ms/step , 115317.87 GFLOP/s , 173484.8 tokens/s INFO:__main__:2024-11-30 12:09:22 | Validation | Step: 359000 | Val_loss: 0.690 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 12:09:22 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_120922_step_359000.pt` INFO:__main__:2024-11-30 12:09:24 | Epoch: 0 | Step: 359000 | Dataset: 0-12480360 | Loss: 0.388 | 594 ms/step , 116101.34 GFLOP/s , 121611.3 tokens/s INFO:__main__:2024-11-30 12:09:31 | Epoch: 0 | Step: 359010 | Dataset: 0-12482760 | Loss: 0.397 | 597 ms/step , 115519.22 GFLOP/s , 173503.7 tokens/s INFO:__main__:2024-11-30 12:09:39 | Epoch: 0 | Step: 359020 | Dataset: 0-12485160 | Loss: 0.385 | 596 ms/step , 115828.72 GFLOP/s , 173521.1 tokens/s INFO:__main__:2024-11-30 12:09:46 | Epoch: 0 | Step: 359030 | Dataset: 0-12487560 | Loss: 0.386 | 596 ms/step , 115825.08 GFLOP/s , 173519.4 tokens/s INFO:__main__:2024-11-30 12:09:53 | Epoch: 0 | Step: 359040 | Dataset: 0-12489960 | Loss: 0.792 | 597 ms/step , 115611.65 GFLOP/s , 173364.2 tokens/s INFO:__main__:2024-11-30 
12:10:00 | Epoch: 0 | Step: 359050 | Dataset: 0-12492360 | Loss: 0.725 | 597 ms/step , 115682.25 GFLOP/s , 173336.8 tokens/s INFO:__main__:2024-11-30 12:10:07 | Epoch: 0 | Step: 359060 | Dataset: 0-12494760 | Loss: 0.755 | 596 ms/step , 115712.33 GFLOP/s , 173266.9 tokens/s INFO:__main__:2024-11-30 12:10:14 | Epoch: 0 | Step: 359070 | Dataset: 0-12497160 | Loss: 0.774 | 596 ms/step , 115761.96 GFLOP/s , 173307.2 tokens/s INFO:__main__:2024-11-30 12:10:21 | Epoch: 0 | Step: 359080 | Dataset: 0-12499560 | Loss: 0.735 | 598 ms/step , 115432.51 GFLOP/s , 173316.1 tokens/s INFO:__main__:2024-11-30 12:10:28 | Epoch: 0 | Step: 359090 | Dataset: 0-12501960 | Loss: 0.759 | 602 ms/step , 114703.82 GFLOP/s , 173209.4 tokens/s INFO:__main__:2024-11-30 12:10:35 | Epoch: 0 | Step: 359100 | Dataset: 0-12504360 | Loss: 0.713 | 597 ms/step , 115654.79 GFLOP/s , 173386.3 tokens/s INFO:__main__:2024-11-30 12:10:42 | Epoch: 0 | Step: 359110 | Dataset: 0-12506760 | Loss: 0.671 | 597 ms/step , 115545.44 GFLOP/s , 173337.0 tokens/s INFO:__main__:2024-11-30 12:10:49 | Epoch: 0 | Step: 359120 | Dataset: 0-12509160 | Loss: 0.674 | 597 ms/step , 115594.36 GFLOP/s , 173227.6 tokens/s INFO:__main__:2024-11-30 12:10:57 | Epoch: 0 | Step: 359130 | Dataset: 0-12511560 | Loss: 0.701 | 597 ms/step , 115660.36 GFLOP/s , 173229.4 tokens/s INFO:__main__:2024-11-30 12:11:04 | Epoch: 0 | Step: 359140 | Dataset: 0-12513960 | Loss: 0.700 | 597 ms/step , 115606.71 GFLOP/s , 173327.9 tokens/s INFO:__main__:2024-11-30 12:11:11 | Epoch: 0 | Step: 359150 | Dataset: 0-12516360 | Loss: 0.711 | 597 ms/step , 115624.35 GFLOP/s , 173086.3 tokens/s INFO:__main__:2024-11-30 12:11:18 | Epoch: 0 | Step: 359160 | Dataset: 0-12518760 | Loss: 0.654 | 597 ms/step , 115603.45 GFLOP/s , 173247.8 tokens/s INFO:__main__:2024-11-30 12:11:25 | Epoch: 0 | Step: 359170 | Dataset: 0-12521160 | Loss: 0.735 | 596 ms/step , 115764.80 GFLOP/s , 173319.5 tokens/s INFO:__main__:2024-11-30 12:11:32 | Epoch: 0 | Step: 359180 | Dataset: 
0-12523560 | Loss: 0.703 | 597 ms/step , 115553.59 GFLOP/s , 173728.5 tokens/s INFO:__main__:2024-11-30 12:11:39 | Epoch: 0 | Step: 359190 | Dataset: 0-12525960 | Loss: 0.584 | 597 ms/step , 115611.84 GFLOP/s , 173669.5 tokens/s INFO:__main__:2024-11-30 12:11:46 | Epoch: 0 | Step: 359200 | Dataset: 0-12528360 | Loss: 0.750 | 597 ms/step , 115679.58 GFLOP/s , 173636.0 tokens/s INFO:__main__:2024-11-30 12:11:53 | Epoch: 0 | Step: 359210 | Dataset: 0-12530760 | Loss: 0.750 | 597 ms/step , 115604.32 GFLOP/s , 173756.9 tokens/s INFO:__main__:2024-11-30 12:12:00 | Epoch: 0 | Step: 359220 | Dataset: 0-12533160 | Loss: 0.713 | 597 ms/step , 115508.49 GFLOP/s , 173689.0 tokens/s INFO:__main__:2024-11-30 12:12:07 | Epoch: 0 | Step: 359230 | Dataset: 0-12535560 | Loss: 0.710 | 597 ms/step , 115565.72 GFLOP/s , 173675.2 tokens/s INFO:__main__:2024-11-30 12:12:14 | Epoch: 0 | Step: 359240 | Dataset: 0-12537960 | Loss: 0.698 | 597 ms/step , 115632.94 GFLOP/s , 173792.4 tokens/s INFO:__main__:2024-11-30 12:12:22 | Epoch: 0 | Step: 359250 | Dataset: 0-12540360 | Loss: 0.620 | 598 ms/step , 115497.03 GFLOP/s , 173661.2 tokens/s INFO:__main__:2024-11-30 12:12:29 | Epoch: 0 | Step: 359260 | Dataset: 0-12542760 | Loss: 0.712 | 597 ms/step , 115528.57 GFLOP/s , 173793.1 tokens/s INFO:__main__:2024-11-30 12:12:36 | Epoch: 0 | Step: 359270 | Dataset: 0-12545160 | Loss: 0.716 | 596 ms/step , 115768.53 GFLOP/s , 173790.6 tokens/s INFO:__main__:2024-11-30 12:12:43 | Epoch: 0 | Step: 359280 | Dataset: 0-12547560 | Loss: 0.688 | 597 ms/step , 115534.62 GFLOP/s , 173796.4 tokens/s INFO:__main__:2024-11-30 12:12:50 | Epoch: 0 | Step: 359290 | Dataset: 0-12549960 | Loss: 0.718 | 596 ms/step , 115757.06 GFLOP/s , 173800.7 tokens/s INFO:__main__:2024-11-30 12:12:57 | Epoch: 0 | Step: 359300 | Dataset: 0-12552360 | Loss: 0.753 | 597 ms/step , 115619.44 GFLOP/s , 173829.0 tokens/s INFO:__main__:2024-11-30 12:13:04 | Epoch: 0 | Step: 359310 | Dataset: 0-12554760 | Loss: 0.691 | 597 ms/step , 
115677.50 GFLOP/s , 173805.4 tokens/s INFO:__main__:2024-11-30 12:13:11 | Epoch: 0 | Step: 359320 | Dataset: 0-12557160 | Loss: 0.716 | 597 ms/step , 115527.69 GFLOP/s , 173622.3 tokens/s INFO:__main__:2024-11-30 12:13:18 | Epoch: 0 | Step: 359330 | Dataset: 0-12559560 | Loss: 0.698 | 597 ms/step , 115686.61 GFLOP/s , 173799.1 tokens/s INFO:__main__:2024-11-30 12:13:25 | Epoch: 0 | Step: 359340 | Dataset: 0-12561960 | Loss: 0.738 | 597 ms/step , 115613.07 GFLOP/s , 173755.4 tokens/s INFO:__main__:2024-11-30 12:13:32 | Epoch: 0 | Step: 359350 | Dataset: 0-12564360 | Loss: 0.626 | 597 ms/step , 115604.53 GFLOP/s , 173776.0 tokens/s INFO:__main__:2024-11-30 12:13:39 | Epoch: 0 | Step: 359360 | Dataset: 0-12566760 | Loss: 0.695 | 596 ms/step , 115796.01 GFLOP/s , 173795.4 tokens/s INFO:__main__:2024-11-30 12:13:46 | Epoch: 0 | Step: 359370 | Dataset: 0-12569160 | Loss: 0.657 | 596 ms/step , 115741.53 GFLOP/s , 173805.5 tokens/s INFO:__main__:2024-11-30 12:13:53 | Epoch: 0 | Step: 359380 | Dataset: 0-12571560 | Loss: 0.686 | 597 ms/step , 115618.45 GFLOP/s , 173833.2 tokens/s INFO:__main__:2024-11-30 12:14:00 | Epoch: 0 | Step: 359390 | Dataset: 0-12573960 | Loss: 0.650 | 597 ms/step , 115635.10 GFLOP/s , 173809.6 tokens/s INFO:__main__:2024-11-30 12:14:08 | Epoch: 0 | Step: 359400 | Dataset: 0-12576360 | Loss: 0.657 | 597 ms/step , 115641.85 GFLOP/s , 173797.7 tokens/s INFO:__main__:2024-11-30 12:14:15 | Epoch: 0 | Step: 359410 | Dataset: 0-12578760 | Loss: 0.718 | 597 ms/step , 115600.99 GFLOP/s , 173851.2 tokens/s INFO:__main__:2024-11-30 12:14:22 | Epoch: 0 | Step: 359420 | Dataset: 0-12581160 | Loss: 0.654 | 597 ms/step , 115671.99 GFLOP/s , 173799.1 tokens/s INFO:__main__:2024-11-30 12:14:29 | Epoch: 0 | Step: 359430 | Dataset: 0-12583560 | Loss: 0.671 | 597 ms/step , 115536.39 GFLOP/s , 173774.2 tokens/s INFO:__main__:2024-11-30 12:14:36 | Epoch: 0 | Step: 359440 | Dataset: 0-12585960 | Loss: 0.726 | 596 ms/step , 115779.27 GFLOP/s , 173750.7 tokens/s 
INFO:__main__:2024-11-30 12:14:43 | Epoch: 0 | Step: 359450 | Dataset: 0-12588360 | Loss: 0.683 | 596 ms/step , 115727.24 GFLOP/s , 173746.4 tokens/s INFO:__main__:2024-11-30 12:14:50 | Epoch: 0 | Step: 359460 | Dataset: 0-12590760 | Loss: 0.691 | 596 ms/step , 115754.98 GFLOP/s , 173908.0 tokens/s INFO:__main__:2024-11-30 12:14:57 | Epoch: 0 | Step: 359470 | Dataset: 0-12593160 | Loss: 0.624 | 596 ms/step , 115879.35 GFLOP/s , 173893.2 tokens/s INFO:__main__:2024-11-30 12:15:04 | Epoch: 0 | Step: 359480 | Dataset: 0-12595560 | Loss: 0.370 | 595 ms/step , 115970.93 GFLOP/s , 174162.1 tokens/s INFO:__main__:2024-11-30 12:15:11 | Epoch: 0 | Step: 359490 | Dataset: 0-12597960 | Loss: 0.386 | 596 ms/step , 115860.96 GFLOP/s , 174155.1 tokens/s INFO:__main__:2024-11-30 12:15:19 | Validation | Step: 359500 | Val_loss: 0.679 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 12:15:19 | Epoch: 0 | Step: 359500 | Dataset: 0-12600360 | Loss: 0.343 | 594 ms/step , 116156.45 GFLOP/s , 148237.0 tokens/s INFO:__main__:2024-11-30 12:15:27 | Epoch: 0 | Step: 359510 | Dataset: 0-12602760 | Loss: 0.355 | 594 ms/step , 116132.19 GFLOP/s , 173803.0 tokens/s INFO:__main__:2024-11-30 12:15:34 | Epoch: 0 | Step: 359520 | Dataset: 0-12605160 | Loss: 0.321 | 595 ms/step , 115933.02 GFLOP/s , 173744.3 tokens/s INFO:__main__:2024-11-30 12:15:41 | Epoch: 0 | Step: 359530 | Dataset: 0-12607560 | Loss: 0.363 | 595 ms/step , 115983.95 GFLOP/s , 173657.1 tokens/s INFO:__main__:2024-11-30 12:15:48 | Epoch: 0 | Step: 359540 | Dataset: 0-12609960 | Loss: 0.361 | 595 ms/step , 115916.54 GFLOP/s , 173602.5 tokens/s INFO:__main__:2024-11-30 12:15:55 | Epoch: 0 | Step: 359550 | Dataset: 0-12612360 | Loss: 0.296 | 595 ms/step , 116040.34 GFLOP/s , 173619.7 tokens/s INFO:__main__:2024-11-30 12:16:02 | Epoch: 0 | Step: 359560 | Dataset: 0-12614760 | Loss: 0.320 | 595 ms/step , 115982.49 GFLOP/s , 173685.7 tokens/s INFO:__main__:2024-11-30 12:16:09 | Epoch: 0 | Step: 359570 | Dataset: 0-12617160 | Loss: 0.350 
| 595 ms/step , 115934.10 GFLOP/s , 173720.2 tokens/s INFO:__main__:2024-11-30 12:16:16 | Epoch: 0 | Step: 359580 | Dataset: 0-12619560 | Loss: 0.355 | 594 ms/step , 116108.81 GFLOP/s , 173581.8 tokens/s INFO:__main__:2024-11-30 12:16:23 | Epoch: 0 | Step: 359590 | Dataset: 0-12621960 | Loss: 0.675 | 595 ms/step , 115939.67 GFLOP/s , 173557.3 tokens/s INFO:__main__:2024-11-30 12:16:30 | Epoch: 0 | Step: 359600 | Dataset: 0-12624360 | Loss: 0.631 | 597 ms/step , 115672.85 GFLOP/s , 173460.2 tokens/s INFO:__main__:2024-11-30 12:16:37 | Epoch: 0 | Step: 359610 | Dataset: 0-12626760 | Loss: 0.629 | 596 ms/step , 115833.62 GFLOP/s , 173590.8 tokens/s INFO:__main__:2024-11-30 12:16:44 | Epoch: 0 | Step: 359620 | Dataset: 0-12629160 | Loss: 0.676 | 599 ms/step , 115237.14 GFLOP/s , 173507.5 tokens/s INFO:__main__:2024-11-30 12:16:51 | Epoch: 0 | Step: 359630 | Dataset: 0-12631560 | Loss: 0.694 | 596 ms/step , 115782.75 GFLOP/s , 173550.1 tokens/s INFO:__main__:2024-11-30 12:16:59 | Epoch: 0 | Step: 359640 | Dataset: 0-12633960 | Loss: 0.761 | 596 ms/step , 115783.12 GFLOP/s , 173541.1 tokens/s INFO:__main__:2024-11-30 12:17:06 | Epoch: 0 | Step: 359650 | Dataset: 0-12636360 | Loss: 0.755 | 596 ms/step , 115715.25 GFLOP/s , 173514.4 tokens/s INFO:__main__:2024-11-30 12:17:13 | Epoch: 0 | Step: 359660 | Dataset: 0-12638760 | Loss: 0.625 | 595 ms/step , 115968.60 GFLOP/s , 173356.7 tokens/s INFO:__main__:2024-11-30 12:17:20 | Epoch: 0 | Step: 359670 | Dataset: 0-12641160 | Loss: 0.653 | 595 ms/step , 115894.35 GFLOP/s , 173531.1 tokens/s INFO:__main__:2024-11-30 12:17:27 | Epoch: 0 | Step: 359680 | Dataset: 0-12643560 | Loss: 0.695 | 595 ms/step , 115929.33 GFLOP/s , 173488.9 tokens/s INFO:__main__:2024-11-30 12:17:34 | Epoch: 0 | Step: 359690 | Dataset: 0-12645960 | Loss: 0.648 | 596 ms/step , 115798.56 GFLOP/s , 173400.2 tokens/s INFO:__main__:2024-11-30 12:17:41 | Epoch: 0 | Step: 359700 | Dataset: 0-12648360 | Loss: 0.709 | 595 ms/step , 115920.38 GFLOP/s , 173531.1 
tokens/s INFO:__main__:2024-11-30 12:17:48 | Epoch: 0 | Step: 359710 | Dataset: 0-12650760 | Loss: 0.706 | 596 ms/step , 115734.51 GFLOP/s , 173527.1 tokens/s INFO:__main__:2024-11-30 12:17:55 | Epoch: 0 | Step: 359720 | Dataset: 0-12653160 | Loss: 0.597 | 596 ms/step , 115766.93 GFLOP/s , 173574.6 tokens/s INFO:__main__:2024-11-30 12:18:02 | Epoch: 0 | Step: 359730 | Dataset: 0-12655560 | Loss: 0.751 | 596 ms/step , 115875.42 GFLOP/s , 173524.8 tokens/s INFO:__main__:2024-11-30 12:18:09 | Epoch: 0 | Step: 359740 | Dataset: 0-12657960 | Loss: 0.689 | 596 ms/step , 115795.27 GFLOP/s , 173561.1 tokens/s INFO:__main__:2024-11-30 12:18:16 | Epoch: 0 | Step: 359750 | Dataset: 0-12660360 | Loss: 0.579 | 595 ms/step , 115906.05 GFLOP/s , 173570.8 tokens/s INFO:__main__:2024-11-30 12:18:24 | Epoch: 0 | Step: 359760 | Dataset: 0-12662760 | Loss: 0.693 | 596 ms/step , 115753.22 GFLOP/s , 173553.2 tokens/s INFO:__main__:2024-11-30 12:18:31 | Epoch: 0 | Step: 359770 | Dataset: 0-12665160 | Loss: 0.695 | 596 ms/step , 115785.06 GFLOP/s , 173695.6 tokens/s INFO:__main__:2024-11-30 12:18:38 | Epoch: 0 | Step: 359780 | Dataset: 0-12667560 | Loss: 0.593 | 596 ms/step , 115850.65 GFLOP/s , 173578.5 tokens/s INFO:__main__:2024-11-30 12:18:45 | Epoch: 0 | Step: 359790 | Dataset: 0-12669960 | Loss: 0.638 | 595 ms/step , 115990.26 GFLOP/s , 173589.7 tokens/s INFO:__main__:2024-11-30 12:18:52 | Epoch: 0 | Step: 359800 | Dataset: 0-12672360 | Loss: 0.731 | 596 ms/step , 115870.26 GFLOP/s , 173530.9 tokens/s INFO:__main__:2024-11-30 12:18:59 | Epoch: 0 | Step: 359810 | Dataset: 0-12674760 | Loss: 0.702 | 595 ms/step , 116062.35 GFLOP/s , 173615.2 tokens/s INFO:__main__:2024-11-30 12:19:06 | Epoch: 0 | Step: 359820 | Dataset: 0-12677160 | Loss: 0.656 | 595 ms/step , 115943.63 GFLOP/s , 173601.3 tokens/s INFO:__main__:2024-11-30 12:19:13 | Epoch: 0 | Step: 359830 | Dataset: 0-12679560 | Loss: 0.614 | 595 ms/step , 115919.61 GFLOP/s , 173603.0 tokens/s INFO:__main__:2024-11-30 12:19:20 | 
Epoch: 0 | Step: 359840 | Dataset: 0-12681960 | Loss: 0.600 | 596 ms/step , 115749.54 GFLOP/s , 173542.6 tokens/s INFO:__main__:2024-11-30 12:19:27 | Epoch: 0 | Step: 359850 | Dataset: 0-12684360 | Loss: 0.741 | 595 ms/step , 115929.98 GFLOP/s , 173562.5 tokens/s INFO:__main__:2024-11-30 12:19:34 | Epoch: 0 | Step: 359860 | Dataset: 0-12686760 | Loss: 0.591 | 596 ms/step , 115786.92 GFLOP/s , 173643.4 tokens/s INFO:__main__:2024-11-30 12:19:41 | Epoch: 0 | Step: 359870 | Dataset: 0-12689160 | Loss: 0.593 | 596 ms/step , 115785.40 GFLOP/s , 173577.8 tokens/s INFO:__main__:2024-11-30 12:19:48 | Epoch: 0 | Step: 359880 | Dataset: 0-12691560 | Loss: 0.726 | 596 ms/step , 115770.62 GFLOP/s , 173594.4 tokens/s INFO:__main__:2024-11-30 12:19:56 | Epoch: 0 | Step: 359890 | Dataset: 0-12693960 | Loss: 0.694 | 596 ms/step , 115856.19 GFLOP/s , 173512.2 tokens/s INFO:__main__:2024-11-30 12:20:03 | Epoch: 0 | Step: 359900 | Dataset: 0-12696360 | Loss: 0.686 | 595 ms/step , 115951.96 GFLOP/s , 173475.0 tokens/s INFO:__main__:2024-11-30 12:20:10 | Epoch: 0 | Step: 359910 | Dataset: 0-12698760 | Loss: 0.618 | 595 ms/step , 115932.06 GFLOP/s , 173560.0 tokens/s INFO:__main__:2024-11-30 12:20:17 | Epoch: 0 | Step: 359920 | Dataset: 0-12701160 | Loss: 0.692 | 595 ms/step , 115893.53 GFLOP/s , 173617.5 tokens/s INFO:__main__:2024-11-30 12:20:24 | Epoch: 0 | Step: 359930 | Dataset: 0-12703560 | Loss: 0.618 | 596 ms/step , 115759.89 GFLOP/s , 173574.3 tokens/s INFO:__main__:2024-11-30 12:20:31 | Epoch: 0 | Step: 359940 | Dataset: 0-12705960 | Loss: 0.716 | 595 ms/step , 115961.04 GFLOP/s , 173500.0 tokens/s INFO:__main__:2024-11-30 12:20:38 | Epoch: 0 | Step: 359950 | Dataset: 0-12708360 | Loss: 0.714 | 595 ms/step , 115944.54 GFLOP/s , 173625.4 tokens/s INFO:__main__:2024-11-30 12:20:45 | Epoch: 0 | Step: 359960 | Dataset: 0-12710760 | Loss: 0.689 | 595 ms/step , 115916.80 GFLOP/s , 173457.5 tokens/s INFO:__main__:2024-11-30 12:20:52 | Epoch: 0 | Step: 359970 | Dataset: 0-12713160 | 
Loss: 0.619 | 595 ms/step , 116048.68 GFLOP/s , 173604.8 tokens/s
INFO:__main__:2024-11-30 12:20:59 | Epoch: 0 | Step: 359980 | Dataset: 0-12715560 | Loss: 0.711 | 595 ms/step , 115897.24 GFLOP/s , 173574.9 tokens/s
INFO:__main__:2024-11-30 12:21:06 | Epoch: 0 | Step: 359990 | Dataset: 0-12717960 | Loss: 0.636 | 595 ms/step , 115905.84 GFLOP/s , 173598.0 tokens/s
INFO:__main__:2024-11-30 12:21:14 | Validation | Step: 360000 | Val_loss: 0.672 | Best_val_loss: 0.3487
INFO:__main__:2024-11-30 12:21:14 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_122114_step_360000.pt`
INFO:__main__:2024-11-30 12:21:16 | Epoch: 0 | Step: 360000 | Dataset: 0-12720360 | Loss: 0.599 | 594 ms/step , 116264.18 GFLOP/s , 121630.2 tokens/s
INFO:__main__:2024-11-30 12:21:24 | Epoch: 0 | Step: 360010 | Dataset: 0-12722760 | Loss: 0.702 | 597 ms/step , 115513.52 GFLOP/s , 173744.4 tokens/s
INFO:__main__:2024-11-30 12:21:31 | Epoch: 0 | Step: 360020 | Dataset: 0-12725160 | Loss: 0.635 | 597 ms/step , 115641.25 GFLOP/s , 173746.4 tokens/s
INFO:__main__:2024-11-30 12:21:38 | Epoch: 0 | Step: 360030 | Dataset: 0-12727560 | Loss: 0.643 | 597 ms/step , 115660.54 GFLOP/s , 173657.4 tokens/s
INFO:__main__:2024-11-30 12:21:45 | Epoch: 0 | Step: 360040 | Dataset: 0-12729960 | Loss: 0.688 | 598 ms/step , 115498.23 GFLOP/s , 173621.4 tokens/s
INFO:__main__:2024-11-30 12:21:52 | Epoch: 0 | Step: 360050 | Dataset: 0-12732360 | Loss: 0.632 | 598 ms/step , 115442.27 GFLOP/s , 173564.5 tokens/s
INFO:__main__:2024-11-30 12:21:59 | Epoch: 0 | Step: 360060 | Dataset: 0-12734760 | Loss: 0.646 | 597 ms/step , 115535.14 GFLOP/s , 173537.0 tokens/s
INFO:__main__:2024-11-30 12:22:06 | Epoch: 0 | Step: 360070 | Dataset: 0-12737160 | Loss: 0.658 | 598 ms/step , 115433.48 GFLOP/s , 173572.8 tokens/s
INFO:__main__:2024-11-30 12:22:13 | Epoch: 0 | Step: 360080 | Dataset: 0-12739560 | Loss: 0.647 | 598 ms/step , 115428.86 GFLOP/s , 173629.4 tokens/s
INFO:__main__:2024-11-30 12:22:20 | Epoch: 0 | Step: 360090 | Dataset: 0-12741960 | Loss: 0.637 | 598 ms/step , 115467.93 GFLOP/s , 173594.0 tokens/s
INFO:__main__:2024-11-30 12:22:27 | Epoch: 0 | Step: 360100 | Dataset: 0-12744360 | Loss: 0.689 | 597 ms/step , 115509.00 GFLOP/s , 173629.9 tokens/s
INFO:__main__:2024-11-30 12:22:34 | Epoch: 0 | Step: 360110 | Dataset: 0-12746760 | Loss: 0.716 | 598 ms/step , 115461.45 GFLOP/s , 173614.5 tokens/s
INFO:__main__:2024-11-30 12:22:41 | Epoch: 0 | Step: 360120 | Dataset: 0-12749160 | Loss: 0.621 | 597 ms/step , 115580.91 GFLOP/s , 173570.6 tokens/s
INFO:__main__:2024-11-30 12:22:48 | Epoch: 0 | Step: 360130 | Dataset: 0-12751560 | Loss: 0.833 | 598 ms/step , 115364.45 GFLOP/s , 173565.7 tokens/s
INFO:__main__:2024-11-30 12:22:56 | Epoch: 0 | Step: 360140 | Dataset: 0-12753960 | Loss: 0.776 | 598 ms/step , 115405.21 GFLOP/s , 173444.8 tokens/s
INFO:__main__:2024-11-30 12:23:03 | Epoch: 0 | Step: 360150 | Dataset: 0-12756360 | Loss: 0.750 | 598 ms/step , 115359.78 GFLOP/s , 173459.6 tokens/s
INFO:__main__:2024-11-30 12:23:10 | Epoch: 0 | Step: 360160 | Dataset: 0-12758760 | Loss: 0.792 | 598 ms/step , 115387.09 GFLOP/s , 173365.3 tokens/s
INFO:__main__:2024-11-30 12:23:17 | Epoch: 0 | Step: 360170 | Dataset: 0-12761160 | Loss: 0.765 | 598 ms/step , 115357.98 GFLOP/s , 173423.9 tokens/s
INFO:__main__:2024-11-30 12:23:24 | Epoch: 0 | Step: 360180 | Dataset: 0-12763560 | Loss: 0.725 | 598 ms/step , 115451.39 GFLOP/s , 173451.7 tokens/s
INFO:__main__:2024-11-30 12:23:31 | Epoch: 0 | Step: 360190 | Dataset: 0-12765960 | Loss: 0.823 | 598 ms/step , 115432.92 GFLOP/s , 173426.9 tokens/s
INFO:__main__:2024-11-30 12:23:38 | Epoch: 0 | Step: 360200 | Dataset: 0-12768360 | Loss: 0.762 | 598 ms/step , 115391.60 GFLOP/s , 173427.4 tokens/s
INFO:__main__:2024-11-30 12:23:45 | Epoch: 0 | Step: 360210 | Dataset: 0-12770760 | Loss: 0.822 | 598 ms/step , 115418.42 GFLOP/s , 173344.6 tokens/s
INFO:__main__:2024-11-30 12:23:52 | Epoch: 0 | Step: 360220 | Dataset: 0-12773160 | Loss: 0.782 | 598 ms/step , 115409.77 GFLOP/s , 173431.8 tokens/s
INFO:__main__:2024-11-30 12:23:59 | Epoch: 0 | Step: 360230 | Dataset: 0-12775560 | Loss: 0.746 | 598 ms/step , 115421.26 GFLOP/s , 173416.1 tokens/s
INFO:__main__:2024-11-30 12:24:06 | Epoch: 0 | Step: 360240 | Dataset: 0-12777960 | Loss: 0.760 | 598 ms/step , 115472.89 GFLOP/s , 173430.2 tokens/s
INFO:__main__:2024-11-30 12:24:13 | Epoch: 0 | Step: 360250 | Dataset: 0-12780360 | Loss: 0.735 | 598 ms/step , 115323.68 GFLOP/s , 173403.2 tokens/s
INFO:__main__:2024-11-30 12:24:21 | Epoch: 0 | Step: 360260 | Dataset: 0-12782760 | Loss: 0.753 | 598 ms/step , 115360.99 GFLOP/s , 173475.9 tokens/s
INFO:__main__:2024-11-30 12:24:28 | Epoch: 0 | Step: 360270 | Dataset: 0-12785160 | Loss: 0.702 | 598 ms/step , 115499.01 GFLOP/s , 173513.5 tokens/s
INFO:__main__:2024-11-30 12:24:35 | Epoch: 0 | Step: 360280 | Dataset: 0-12787560 | Loss: 0.762 | 597 ms/step , 115558.97 GFLOP/s , 173526.4 tokens/s
INFO:__main__:2024-11-30 12:24:42 | Epoch: 0 | Step: 360290 | Dataset: 0-12789960 | Loss: 0.777 | 598 ms/step , 115468.59 GFLOP/s , 173417.5 tokens/s
INFO:__main__:2024-11-30 12:24:49 | Epoch: 0 | Step: 360300 | Dataset: 0-12792360 | Loss: 0.754 | 597 ms/step , 115525.20 GFLOP/s , 173469.0 tokens/s
INFO:__main__:2024-11-30 12:24:56 | Epoch: 0 | Step: 360310 | Dataset: 0-12794760 | Loss: 0.775 | 598 ms/step , 115327.18 GFLOP/s , 173484.6 tokens/s
INFO:__main__:2024-11-30 12:25:03 | Epoch: 0 | Step: 360320 | Dataset: 0-12797160 | Loss: 0.738 | 598 ms/step , 115450.73 GFLOP/s , 173353.5 tokens/s
INFO:__main__:2024-11-30 12:25:10 | Epoch: 0 | Step: 360330 | Dataset: 0-12799560 | Loss: 0.787 | 598 ms/step , 115435.76 GFLOP/s , 173499.0 tokens/s
INFO:__main__:2024-11-30 12:25:17 | Epoch: 0 | Step: 360340 | Dataset: 0-12801960 | Loss: 0.722 | 599 ms/step , 115249.71 GFLOP/s , 173389.4 tokens/s
INFO:__main__:2024-11-30 12:25:24 | Epoch: 0 | Step: 360350 | Dataset: 0-12804360 | Loss: 0.681 | 597 ms/step , 115531.75 GFLOP/s , 173463.7 tokens/s
INFO:__main__:2024-11-30 12:25:31 | Epoch: 0 | Step: 360360 | Dataset: 0-12806760 | Loss: 0.758 | 597 ms/step , 115534.46 GFLOP/s , 173440.2 tokens/s
INFO:__main__:2024-11-30 12:25:39 | Epoch: 0 | Step: 360370 | Dataset: 0-12809160 | Loss: 0.741 | 599 ms/step , 115308.25 GFLOP/s , 173436.5 tokens/s
INFO:__main__:2024-11-30 12:25:46 | Epoch: 0 | Step: 360380 | Dataset: 0-12811560 | Loss: 0.743 | 597 ms/step , 115569.95 GFLOP/s , 173466.5 tokens/s
INFO:__main__:2024-11-30 12:25:53 | Epoch: 0 | Step: 360390 | Dataset: 0-12813960 | Loss: 0.729 | 599 ms/step , 115282.53 GFLOP/s , 173425.6 tokens/s
INFO:__main__:2024-11-30 12:26:00 | Epoch: 0 | Step: 360400 | Dataset: 0-12816360 | Loss: 0.760 | 598 ms/step , 115315.05 GFLOP/s , 173517.2 tokens/s
INFO:__main__:2024-11-30 12:26:07 | Epoch: 0 | Step: 360410 | Dataset: 0-12818760 | Loss: 0.732 | 597 ms/step , 115553.24 GFLOP/s , 173490.5 tokens/s
INFO:__main__:2024-11-30 12:26:14 | Epoch: 0 | Step: 360420 | Dataset: 0-12821160 | Loss: 0.746 | 598 ms/step , 115403.07 GFLOP/s , 173305.5 tokens/s
INFO:__main__:2024-11-30 12:26:21 | Epoch: 0 | Step: 360430 | Dataset: 0-12823560 | Loss: 0.769 | 598 ms/step , 115370.61 GFLOP/s , 173439.5 tokens/s
INFO:__main__:2024-11-30 12:26:28 | Epoch: 0 | Step: 360440 | Dataset: 0-12825960 | Loss: 0.745 | 598 ms/step , 115480.06 GFLOP/s , 173431.2 tokens/s
INFO:__main__:2024-11-30 12:26:35 | Epoch: 0 | Step: 360450 | Dataset: 0-12828360 | Loss: 0.760 | 598 ms/step , 115381.51 GFLOP/s , 173386.2 tokens/s
INFO:__main__:2024-11-30 12:26:42 | Epoch: 0 | Step: 360460 | Dataset: 0-12830760 | Loss: 0.777 | 598 ms/step , 115423.39 GFLOP/s , 173485.5 tokens/s
INFO:__main__:2024-11-30 12:26:49 | Epoch: 0 | Step: 360470 | Dataset: 0-12833160 | Loss: 0.788 | 598 ms/step , 115448.91 GFLOP/s , 173476.1 tokens/s
INFO:__main__:2024-11-30 12:26:56 | Epoch: 0 | Step: 360480 | Dataset: 0-12835560 | Loss: 0.708 | 597 ms/step , 115532.15 GFLOP/s , 173475.5 tokens/s
INFO:__main__:2024-11-30 12:27:04 | Epoch: 0 | Step: 360490 | Dataset: 0-12837960 | Loss: 0.674 | 598 ms/step , 115446.86 GFLOP/s , 173443.0 tokens/s
INFO:__main__:2024-11-30 12:27:11 | Validation | Step: 360500 | Val_loss: 0.683 | Best_val_loss: 0.3487
INFO:__main__:2024-11-30 12:27:12 | Epoch: 0 | Step: 360500 | Dataset: 0-12840360 | Loss: 0.739 | 596 ms/step , 115844.05 GFLOP/s , 147610.3 tokens/s
INFO:__main__:2024-11-30 12:27:19 | Epoch: 0 | Step: 360510 | Dataset: 0-12842760 | Loss: 0.781 | 598 ms/step , 115367.21 GFLOP/s , 173525.8 tokens/s
INFO:__main__:2024-11-30 12:27:26 | Epoch: 0 | Step: 360520 | Dataset: 0-12845160 | Loss: 0.707 | 598 ms/step , 115373.60 GFLOP/s , 173560.2 tokens/s
INFO:__main__:2024-11-30 12:27:33 | Epoch: 0 | Step: 360530 | Dataset: 0-12847560 | Loss: 0.735 | 598 ms/step , 115322.74 GFLOP/s , 173559.9 tokens/s
INFO:__main__:2024-11-30 12:27:40 | Epoch: 0 | Step: 360540 | Dataset: 0-12849960 | Loss: 0.689 | 599 ms/step , 115295.35 GFLOP/s , 173505.3 tokens/s
INFO:__main__:2024-11-30 12:27:47 | Epoch: 0 | Step: 360550 | Dataset: 0-12852360 | Loss: 0.758 | 598 ms/step , 115463.98 GFLOP/s , 173444.8 tokens/s
INFO:__main__:2024-11-30 12:27:54 | Epoch: 0 | Step: 360560 | Dataset: 0-12854760 | Loss: 0.750 | 598 ms/step , 115444.64 GFLOP/s , 173486.0 tokens/s
INFO:__main__:2024-11-30 12:28:01 | Epoch: 0 | Step: 360570 | Dataset: 0-12857160 | Loss: 0.793 | 598 ms/step , 115388.90 GFLOP/s , 173501.1 tokens/s
INFO:__main__:2024-11-30 12:28:08 | Epoch: 0 | Step: 360580 | Dataset: 0-12859560 | Loss: 0.765 | 597 ms/step , 115570.38 GFLOP/s , 173576.0 tokens/s
INFO:__main__:2024-11-30 12:28:16 | Epoch: 0 | Step: 360590 | Dataset: 0-12861960 | Loss: 0.618 | 597 ms/step , 115510.45 GFLOP/s , 173521.7 tokens/s
INFO:__main__:2024-11-30 12:28:23 | Epoch: 0 | Step: 360600 | Dataset: 0-12864360 | Loss: 0.738 | 598 ms/step , 115439.07 GFLOP/s , 173453.1 tokens/s
INFO:__main__:2024-11-30 12:28:30 | Epoch: 0 | Step: 360610 | Dataset: 0-12866760 | Loss: 0.768 | 598 ms/step , 115337.94 GFLOP/s , 173448.8 tokens/s
INFO:__main__:2024-11-30 12:28:37 | Epoch: 0 | Step: 360620 | Dataset: 0-12869160 | Loss: 0.773 | 598 ms/step , 115387.12 GFLOP/s , 173483.2 tokens/s
INFO:__main__:2024-11-30 12:28:44 | Epoch: 0 | Step: 360630 | Dataset: 0-12871560 | Loss: 0.730 | 598 ms/step , 115467.30 GFLOP/s , 173444.0 tokens/s
INFO:__main__:2024-11-30 12:28:51 | Epoch: 0 | Step: 360640 | Dataset: 0-12873960 | Loss: 0.779 | 598 ms/step , 115406.30 GFLOP/s , 173456.2 tokens/s
INFO:__main__:2024-11-30 12:28:58 | Epoch: 0 | Step: 360650 | Dataset: 0-12876360 | Loss: 0.732 | 598 ms/step , 115443.07 GFLOP/s , 173474.9 tokens/s
INFO:__main__:2024-11-30 12:29:05 | Epoch: 0 | Step: 360660 | Dataset: 0-12878760 | Loss: 0.785 | 598 ms/step , 115460.39 GFLOP/s , 173407.2 tokens/s
INFO:__main__:2024-11-30 12:29:12 | Epoch: 0 | Step: 360670 | Dataset: 0-12881160 | Loss: 0.729 | 598 ms/step , 115405.91 GFLOP/s , 173401.3 tokens/s
INFO:__main__:2024-11-30 12:29:19 | Epoch: 1 | Step: 360680 | Dataset: 0-1898 | Loss: 0.290 | 596 ms/step , 115770.32 GFLOP/s , 173670.0 tokens/s
INFO:__main__:2024-11-30 12:29:26 | Epoch: 1 | Step: 360690 | Dataset: 0-4298 | Loss: 0.503 | 597 ms/step , 115589.26 GFLOP/s , 173884.4 tokens/s
INFO:__main__:2024-11-30 12:29:33 | Epoch: 1 | Step: 360700 | Dataset: 0-6698 | Loss: 0.386 | 597 ms/step , 115674.36 GFLOP/s , 173841.6 tokens/s
INFO:__main__:2024-11-30 12:29:41 | Epoch: 1 | Step: 360710 | Dataset: 0-9098 | Loss: 0.693 | 598 ms/step , 115398.91 GFLOP/s , 173607.4 tokens/s
INFO:__main__:2024-11-30 12:29:48 | Epoch: 1 | Step: 360720 | Dataset: 0-11498 | Loss: 0.666 | 598 ms/step , 115414.67 GFLOP/s , 173537.4 tokens/s
INFO:__main__:2024-11-30 12:29:55 | Epoch: 1 | Step: 360730 | Dataset: 0-13898 | Loss: 0.557 | 597 ms/step , 115549.59 GFLOP/s , 173543.9 tokens/s
INFO:__main__:2024-11-30 12:30:02 | Epoch: 1 | Step: 360740 | Dataset: 0-16298 | Loss: 0.765 | 599 ms/step , 115265.08 GFLOP/s , 173527.5 tokens/s
INFO:__main__:2024-11-30 12:30:09 | Epoch: 1 | Step: 360750 | Dataset: 0-18698 | Loss: 0.636 | 598 ms/step , 115339.76 GFLOP/s , 173545.5 tokens/s
INFO:__main__:2024-11-30 12:30:16 | Epoch: 1 | Step: 360760 | Dataset: 0-21098 | Loss: 0.645 | 597 ms/step , 115532.40 GFLOP/s , 173515.7 tokens/s
INFO:__main__:2024-11-30 12:30:23 | Epoch: 1 | Step: 360770 | Dataset: 0-23498 | Loss: 0.622 | 598 ms/step , 115382.45 GFLOP/s , 173538.5 tokens/s
INFO:__main__:2024-11-30 12:30:30 | Epoch: 1 | Step: 360780 | Dataset: 0-25898 | Loss: 0.584 | 598 ms/step , 115395.64 GFLOP/s , 173450.8 tokens/s
INFO:__main__:2024-11-30 12:30:37 | Epoch: 1 | Step: 360790 | Dataset: 0-28298 | Loss: 0.588 | 597 ms/step , 115524.76 GFLOP/s , 173475.7 tokens/s
INFO:__main__:2024-11-30 12:30:44 | Epoch: 1 | Step: 360800 | Dataset: 0-30698 | Loss: 0.669 | 598 ms/step , 115408.35 GFLOP/s , 173440.4 tokens/s
INFO:__main__:2024-11-30 12:30:51 | Epoch: 1 | Step: 360810 | Dataset: 0-33098 | Loss: 0.600 | 597 ms/step , 115526.00 GFLOP/s , 173462.8 tokens/s
INFO:__main__:2024-11-30 12:30:58 | Epoch: 1 | Step: 360820 | Dataset: 0-35498 | Loss: 0.593 | 598 ms/step , 115491.33 GFLOP/s , 173532.7 tokens/s
INFO:__main__:2024-11-30 12:31:06 | Epoch: 1 | Step: 360830 | Dataset: 0-37898 | Loss: 0.721 | 603 ms/step , 114465.67 GFLOP/s , 173361.1 tokens/s
INFO:__main__:2024-11-30 12:31:13 | Epoch: 1 | Step: 360840 | Dataset: 0-40298 | Loss: 0.684 | 597 ms/step , 115532.45 GFLOP/s , 173515.5 tokens/s
INFO:__main__:2024-11-30 12:31:20 | Epoch: 1 | Step: 360850 | Dataset: 0-42698 | Loss: 0.593 | 596 ms/step , 115783.85 GFLOP/s , 173537.0 tokens/s
INFO:__main__:2024-11-30 12:31:27 | Epoch: 1 | Step: 360860 | Dataset: 0-45098 | Loss: 0.557 | 595 ms/step , 115891.88 GFLOP/s , 173479.1 tokens/s
INFO:__main__:2024-11-30 12:31:34 | Epoch: 1 | Step: 360870 | Dataset: 0-47498 | Loss: 0.665 | 596 ms/step , 115761.19 GFLOP/s , 173503.3 tokens/s
INFO:__main__:2024-11-30 12:31:41 | Epoch: 1 | Step: 360880 | Dataset: 0-49898 | Loss: 0.739 | 596 ms/step , 115734.95 GFLOP/s , 173474.2 tokens/s
INFO:__main__:2024-11-30 12:31:48 | Epoch: 1 | Step: 360890 | Dataset: 0-52298 | Loss: 0.663 | 595 ms/step , 115893.54 GFLOP/s , 173499.6 tokens/s
INFO:__main__:2024-11-30 12:31:55 | Epoch: 1 | Step: 360900 | Dataset: 0-54698 | Loss: 0.665 | 596 ms/step , 115756.00 GFLOP/s , 173488.8 tokens/s
INFO:__main__:2024-11-30 12:32:02 | Epoch: 1 | Step: 360910 | Dataset: 0-57098 | Loss: 0.627 | 596 ms/step , 115742.63 GFLOP/s , 173376.3 tokens/s
INFO:__main__:2024-11-30 12:32:09 | Epoch: 1 | Step: 360920 | Dataset: 0-59498 | Loss: 0.638 | 596 ms/step , 115857.24 GFLOP/s , 173552.9 tokens/s
INFO:__main__:2024-11-30 12:32:16 | Epoch: 1 | Step: 360930 | Dataset: 0-61898 | Loss: 0.711 | 596 ms/step , 115734.17 GFLOP/s , 173473.5 tokens/s
INFO:__main__:2024-11-30 12:32:23 | Epoch: 1 | Step: 360940 | Dataset: 0-64298 | Loss: 0.634 | 596 ms/step , 115799.64 GFLOP/s , 173434.0 tokens/s
INFO:__main__:2024-11-30 12:32:31 | Epoch: 1 | Step: 360950 | Dataset: 0-66698 | Loss: 0.559 | 596 ms/step , 115746.02 GFLOP/s , 173493.4 tokens/s
INFO:__main__:2024-11-30 12:32:38 | Epoch: 1 | Step: 360960 | Dataset: 0-69098 | Loss: 0.580 | 596 ms/step , 115743.62 GFLOP/s , 173550.8 tokens/s
INFO:__main__:2024-11-30 12:32:45 | Epoch: 1 | Step: 360970 | Dataset: 0-71498 | Loss: 0.670 | 596 ms/step , 115776.79 GFLOP/s , 173426.7 tokens/s
INFO:__main__:2024-11-30 12:32:52 | Epoch: 1 | Step: 360980 | Dataset: 0-73898 | Loss: 0.575 | 596 ms/step , 115844.90 GFLOP/s , 173467.2 tokens/s
INFO:__main__:2024-11-30 12:32:59 | Epoch: 1 | Step: 360990 | Dataset: 0-76298 | Loss: 0.603 | 596 ms/step , 115863.67 GFLOP/s , 173453.5 tokens/s
INFO:__main__:2024-11-30 12:33:06 | Validation | Step: 361000 | Val_loss: 0.651 | Best_val_loss: 0.3487
INFO:__main__:2024-11-30 12:33:06 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_123306_step_361000.pt`
INFO:__main__:2024-11-30 12:33:09 | Epoch: 1 | Step: 361000 | Dataset: 0-78698 | Loss: 0.580 | 595 ms/step , 116049.44 GFLOP/s , 121476.8 tokens/s
INFO:__main__:2024-11-30 12:33:16 | Epoch: 1 | Step: 361010 | Dataset: 0-81098 | Loss: 0.636 | 598 ms/step , 115418.77 GFLOP/s , 173510.6 tokens/s
INFO:__main__:2024-11-30 12:33:23 | Epoch: 1 | Step: 361020 | Dataset: 0-83498 | Loss: 0.651 | 596 ms/step , 115737.28 GFLOP/s , 173533.7 tokens/s
INFO:__main__:2024-11-30 12:33:30 | Epoch: 1 | Step: 361030 | Dataset: 0-85898 | Loss: 0.652 | 597 ms/step , 115651.30 GFLOP/s , 173654.7 tokens/s
INFO:__main__:2024-11-30 12:33:37 | Epoch: 1 | Step: 361040 | Dataset: 0-88298 | Loss: 0.651 | 596 ms/step , 115845.00 GFLOP/s , 173598.4 tokens/s
INFO:__main__:2024-11-30 12:33:44 | Epoch: 1 | Step: 361050 | Dataset: 0-90698 | Loss: 0.566 | 596 ms/step , 115709.79 GFLOP/s , 173565.3 tokens/s
INFO:__main__:2024-11-30 12:33:51 | Epoch: 1 | Step: 361060 | Dataset: 0-93098 | Loss: 0.505 | 596 ms/step , 115727.88 GFLOP/s , 173577.9 tokens/s
INFO:__main__:2024-11-30 12:33:59 | Epoch: 1 | Step: 361070 | Dataset: 0-95498 | Loss: 0.675 | 596 ms/step , 115856.74 GFLOP/s , 173500.2 tokens/s
INFO:__main__:2024-11-30 12:34:06 | Epoch: 1 | Step: 361080 | Dataset: 0-97898 | Loss: 0.719 | 597 ms/step , 115667.03 GFLOP/s , 173582.0 tokens/s
INFO:__main__:2024-11-30 12:34:13 | Epoch: 1 | Step: 361090 | Dataset: 0-100298 | Loss: 0.637 | 596 ms/step , 115862.36 GFLOP/s , 173627.6 tokens/s
INFO:__main__:2024-11-30 12:34:20 | Epoch: 1 | Step: 361100 | Dataset: 0-102698 | Loss: 0.620 | 597 ms/step , 115653.06 GFLOP/s , 173497.4 tokens/s
INFO:__main__:2024-11-30 12:34:27 | Epoch: 1 | Step: 361110 | Dataset: 0-105098 | Loss: 0.561 | 596 ms/step , 115791.22 GFLOP/s , 173546.6 tokens/s
INFO:__main__:2024-11-30 12:34:34 | Epoch: 1 | Step: 361120 | Dataset: 0-107498 | Loss: 0.600 | 596 ms/step , 115767.65 GFLOP/s , 173512.0 tokens/s
INFO:__main__:2024-11-30 12:34:41 | Epoch: 1 | Step: 361130 | Dataset: 0-109898 | Loss: 0.706 | 596 ms/step , 115876.14 GFLOP/s , 173400.6 tokens/s
INFO:__main__:2024-11-30 12:34:48 | Epoch: 1 | Step: 361140 | Dataset: 0-112298 | Loss: 0.723 | 599 ms/step , 115306.98 GFLOP/s , 173470.2 tokens/s
INFO:__main__:2024-11-30 12:34:55 | Epoch: 1 | Step: 361150 | Dataset: 0-114698 | Loss: 0.651 | 596 ms/step , 115817.78 GFLOP/s , 173488.2 tokens/s
INFO:__main__:2024-11-30 12:35:02 | Epoch: 1 | Step: 361160 | Dataset: 0-117098 | Loss: 0.658 | 596 ms/step , 115815.07 GFLOP/s , 173527.9 tokens/s
INFO:__main__:2024-11-30 12:35:09 | Epoch: 1 | Step: 361170 | Dataset: 0-119498 | Loss: 0.623 | 596 ms/step , 115780.36 GFLOP/s , 173441.4 tokens/s
INFO:__main__:2024-11-30 12:35:16 | Epoch: 1 | Step: 361180 | Dataset: 0-121898 | Loss: 0.658 | 595 ms/step , 115906.21 GFLOP/s , 173443.2 tokens/s
INFO:__main__:2024-11-30 12:35:24 | Epoch: 1 | Step: 361190 | Dataset: 0-124298 | Loss: 0.523 | 596 ms/step , 115849.92 GFLOP/s , 173428.1 tokens/s
INFO:__main__:2024-11-30 12:35:31 | Epoch: 1 | Step: 361200 | Dataset: 0-126698 | Loss: 0.695 | 596 ms/step , 115696.27 GFLOP/s , 173527.6 tokens/s
INFO:__main__:2024-11-30 12:35:38 | Epoch: 1 | Step: 361210 | Dataset: 0-129098 | Loss: 0.621 | 596 ms/step , 115705.53 GFLOP/s , 173508.2 tokens/s
INFO:__main__:2024-11-30 12:35:45 | Epoch: 1 | Step: 361220 | Dataset: 0-131498 | Loss: 0.634 | 597 ms/step , 115626.52 GFLOP/s , 173444.9 tokens/s
INFO:__main__:2024-11-30 12:35:52 | Epoch: 1 | Step: 361230 | Dataset: 0-133898 | Loss: 0.547 | 596 ms/step , 115831.32 GFLOP/s , 173481.9 tokens/s
INFO:__main__:2024-11-30 12:35:59 | Epoch: 1 | Step: 361240 | Dataset: 0-136298 | Loss: 0.606 | 596 ms/step , 115701.63 GFLOP/s , 173399.1 tokens/s
INFO:__main__:2024-11-30 12:36:06 | Epoch: 1 | Step: 361250 | Dataset: 0-138698 | Loss: 0.618 | 596 ms/step , 115872.08 GFLOP/s , 173404.2 tokens/s
INFO:__main__:2024-11-30 12:36:13 | Epoch: 1 | Step: 361260 | Dataset: 0-141098 | Loss: 0.574 | 595 ms/step , 115925.34 GFLOP/s , 173395.2 tokens/s
INFO:__main__:2024-11-30 12:36:20 | Epoch: 1 | Step: 361270 | Dataset: 0-143498 | Loss: 0.535 | 595 ms/step , 115977.24 GFLOP/s , 173428.4 tokens/s
INFO:__main__:2024-11-30 12:36:27 | Epoch: 1 | Step: 361280 | Dataset: 0-145898 | Loss: 0.681 | 597 ms/step , 115603.44 GFLOP/s , 173489.8 tokens/s
INFO:__main__:2024-11-30 12:36:34 | Epoch: 1 | Step: 361290 | Dataset: 0-148298 | Loss: 0.593 | 596 ms/step , 115796.04 GFLOP/s , 173940.8 tokens/s
INFO:__main__:2024-11-30 12:36:41 | Epoch: 1 | Step: 361300 | Dataset: 0-150698 | Loss: 0.573 | 596 ms/step , 115742.96 GFLOP/s , 173710.9 tokens/s
INFO:__main__:2024-11-30 12:36:48 | Epoch: 1 | Step: 361310 | Dataset: 0-153098 | Loss: 0.522 | 596 ms/step , 115833.21 GFLOP/s , 173968.1 tokens/s
INFO:__main__:2024-11-30 12:36:56 | Epoch: 1 | Step: 361320 | Dataset: 0-155498 | Loss: 0.584 | 597 ms/step , 115635.63 GFLOP/s , 173900.6 tokens/s
INFO:__main__:2024-11-30 12:37:03 | Epoch: 1 | Step: 361330 | Dataset: 0-157898 | Loss: 0.549 | 597 ms/step , 115611.58 GFLOP/s , 173884.1 tokens/s
INFO:__main__:2024-11-30 12:37:10 | Epoch: 1 | Step: 361340 | Dataset: 0-160298 | Loss: 0.560 | 596 ms/step , 115850.87 GFLOP/s , 173891.0 tokens/s
INFO:__main__:2024-11-30 12:37:17 | Epoch: 1 | Step: 361350 | Dataset: 0-162698 | Loss: 0.553 | 596 ms/step , 115751.66 GFLOP/s , 173912.0 tokens/s
INFO:__main__:2024-11-30 12:37:24 | Epoch: 1 | Step: 361360 | Dataset: 0-165098 | Loss: 0.506 | 596 ms/step , 115864.57 GFLOP/s , 173918.6 tokens/s
INFO:__main__:2024-11-30 12:37:31 | Epoch: 1 | Step: 361370 | Dataset: 0-167498 | Loss: 0.532 | 596 ms/step , 115808.56 GFLOP/s , 173868.8 tokens/s
INFO:__main__:2024-11-30 12:37:38 | Epoch: 1 | Step: 361380 | Dataset: 0-169898 | Loss: 0.570 | 596 ms/step , 115713.91 GFLOP/s , 173912.0 tokens/s
INFO:__main__:2024-11-30 12:37:45 | Epoch: 1 | Step: 361390 | Dataset: 0-172298 | Loss: 0.560 | 596 ms/step , 115729.53 GFLOP/s , 173902.6 tokens/s
INFO:__main__:2024-11-30 12:37:52 | Epoch: 1 | Step: 361400 | Dataset: 0-174698 | Loss: 0.592 | 596 ms/step , 115764.21 GFLOP/s , 173822.9 tokens/s
INFO:__main__:2024-11-30 12:37:59 | Epoch: 1 | Step: 361410 | Dataset: 0-177098 | Loss: 0.569 | 597 ms/step , 115673.64 GFLOP/s , 173808.5 tokens/s
INFO:__main__:2024-11-30 12:38:06 | Epoch: 1 | Step: 361420 | Dataset: 0-179498 | Loss: 0.537 | 597 ms/step , 115657.13 GFLOP/s , 173868.3 tokens/s
INFO:__main__:2024-11-30 12:38:13 | Epoch: 1 | Step: 361430 | Dataset: 0-181898 | Loss: 0.523 | 597 ms/step , 115683.52 GFLOP/s , 173829.0 tokens/s
INFO:__main__:2024-11-30 12:38:20 | Epoch: 1 | Step: 361440 | Dataset: 0-184298 | Loss: 0.570 | 596 ms/step , 115820.66 GFLOP/s , 173953.2 tokens/s
INFO:__main__:2024-11-30 12:38:27 | Epoch: 1 | Step: 361450 | Dataset: 0-186698 | Loss: 0.525 | 597 ms/step , 115602.32 GFLOP/s , 173906.8 tokens/s
INFO:__main__:2024-11-30 12:38:34 | Epoch: 1 | Step: 361460 | Dataset: 0-189098 | Loss: 0.677 | 596 ms/step , 115753.18 GFLOP/s , 173882.2 tokens/s
INFO:__main__:2024-11-30 12:38:42 | Epoch: 1 | Step: 361470 | Dataset: 0-191498 | Loss: 0.567 | 595 ms/step , 115907.48 GFLOP/s , 173903.3 tokens/s
INFO:__main__:2024-11-30 12:38:49 | Epoch: 1 | Step: 361480 | Dataset: 0-193898 | Loss: 0.519 | 596 ms/step , 115802.97 GFLOP/s , 173971.0 tokens/s
INFO:__main__:2024-11-30 12:38:56 | Epoch: 1 | Step: 361490 | Dataset: 0-196298 | Loss: 0.641 | 597 ms/step , 115693.77 GFLOP/s , 173928.6 tokens/s
INFO:__main__:2024-11-30 12:39:03 | Validation | Step: 361500 | Val_loss: 0.605 | Best_val_loss: 0.3487
INFO:__main__:2024-11-30 12:39:04 | Epoch: 1 | Step: 361500 | Dataset: 0-198698 | Loss: 0.509 | 596 ms/step , 115731.83 GFLOP/s , 148158.4 tokens/s
INFO:__main__:2024-11-30 12:39:11 | Epoch: 1 | Step: 361510 | Dataset: 0-201098 | Loss: 0.514 | 596 ms/step , 115884.06 GFLOP/s , 173490.9 tokens/s
INFO:__main__:2024-11-30 12:39:18 | Epoch: 1 | Step: 361520 | Dataset: 0-203498 | Loss: 0.533 | 598 ms/step , 115447.58 GFLOP/s , 173407.8 tokens/s
INFO:__main__:2024-11-30 12:39:25 | Epoch: 1 | Step: 361530 | Dataset: 0-205898 | Loss: 0.505 | 596 ms/step , 115718.19 GFLOP/s , 173465.0 tokens/s
INFO:__main__:2024-11-30 12:39:32 | Epoch: 1 | Step: 361540 | Dataset: 0-208298 | Loss: 0.470 | 596 ms/step , 115881.49 GFLOP/s , 173433.1 tokens/s
INFO:__main__:2024-11-30 12:39:39 | Epoch: 1 | Step: 361550 | Dataset: 0-210698 | Loss: 0.538 | 595 ms/step , 116008.44 GFLOP/s , 173443.8 tokens/s
INFO:__main__:2024-11-30 12:39:46 | Epoch: 1 | Step: 361560 | Dataset: 0-213098 | Loss: 0.544 | 596 ms/step , 115720.03 GFLOP/s , 173408.3 tokens/s
INFO:__main__:2024-11-30 12:39:54 | Epoch: 1 | Step: 361570 | Dataset: 0-215498 | Loss: 0.602 | 596 ms/step , 115744.48 GFLOP/s , 173352.0 tokens/s
INFO:__main__:2024-11-30 12:40:01 | Epoch: 1 | Step: 361580 | Dataset: 0-217898 | Loss: 0.537 | 596 ms/step , 115742.62 GFLOP/s , 173365.5 tokens/s
INFO:__main__:2024-11-30 12:40:08 | Epoch: 1 | Step: 361590 | Dataset: 0-220298 | Loss: 0.546 | 596 ms/step , 115747.36 GFLOP/s , 173470.5 tokens/s
INFO:__main__:2024-11-30 12:40:15 | Epoch: 1 | Step: 361600 | Dataset: 0-222698 | Loss: 0.525 | 598 ms/step , 115481.58 GFLOP/s , 173477.5 tokens/s
INFO:__main__:2024-11-30 12:40:22 | Epoch: 1 | Step: 361610 | Dataset: 0-225098 | Loss: 0.500 | 596 ms/step , 115781.17 GFLOP/s , 173552.5 tokens/s
INFO:__main__:2024-11-30 12:40:29 | Epoch: 1 | Step: 361620 | Dataset: 0-227498 | Loss: 0.613 | 595 ms/step , 115928.27 GFLOP/s , 173361.1 tokens/s
INFO:__main__:2024-11-30 12:40:36 | Epoch: 1 | Step: 361630 | Dataset: 0-229898 | Loss: 0.506 | 596 ms/step , 115868.51 GFLOP/s , 173459.0 tokens/s
INFO:__main__:2024-11-30 12:40:43 | Epoch: 1 | Step: 361640 | Dataset: 0-232298 | Loss: 0.537 | 596 ms/step , 115845.17 GFLOP/s , 173419.7 tokens/s
INFO:__main__:2024-11-30 12:40:50 | Epoch: 1 | Step: 361650 | Dataset: 0-234698 | Loss: 0.477 | 596 ms/step , 115846.88 GFLOP/s , 173488.9 tokens/s
INFO:__main__:2024-11-30 12:40:57 | Epoch: 1 | Step: 361660 | Dataset: 0-237098 | Loss: 0.567 | 595 ms/step , 115915.22 GFLOP/s , 173325.6 tokens/s
INFO:__main__:2024-11-30 12:41:04 | Epoch: 1 | Step: 361670 | Dataset: 0-239498 | Loss: 0.542 | 595 ms/step , 116037.39 GFLOP/s , 173537.3 tokens/s
INFO:__main__:2024-11-30 12:41:12 | Epoch: 1 | Step: 361680 | Dataset: 0-241898 | Loss: 0.588 | 596 ms/step , 115815.52 GFLOP/s , 173509.1 tokens/s
INFO:__main__:2024-11-30 12:41:19 | Epoch: 1 | Step: 361690 | Dataset: 0-244298 | Loss: 0.470 | 596 ms/step , 115726.70 GFLOP/s , 173544.7 tokens/s
INFO:__main__:2024-11-30 12:41:26 | Epoch: 1 | Step: 361700 | Dataset: 0-246698 | Loss: 0.513 | 596 ms/step , 115866.37 GFLOP/s , 173472.6 tokens/s
INFO:__main__:2024-11-30 12:41:33 | Epoch: 1 | Step: 361710 | Dataset: 0-249098 | Loss: 0.548 | 596 ms/step , 115822.90 GFLOP/s , 173529.5 tokens/s
INFO:__main__:2024-11-30 12:41:40 | Epoch: 1 | Step: 361720 | Dataset: 0-251498 | Loss: 0.551 | 596 ms/step , 115825.18 GFLOP/s , 173511.5 tokens/s
INFO:__main__:2024-11-30 12:41:47 | Epoch: 1 | Step: 361730 | Dataset: 0-253898 | Loss: 0.617 | 596 ms/step , 115700.11 GFLOP/s , 173537.0 tokens/s
INFO:__main__:2024-11-30 12:41:54 | Epoch: 1 | Step: 361740 | Dataset: 0-256298 | Loss: 0.523 | 595 ms/step , 115950.93 GFLOP/s , 173398.4 tokens/s
INFO:__main__:2024-11-30 12:42:01 | Epoch: 1 | Step: 361750 | Dataset: 0-258698 | Loss: 0.554 | 596 ms/step , 115858.69 GFLOP/s , 173476.3 tokens/s
INFO:__main__:2024-11-30 12:42:08 | Epoch: 1 | Step: 361760 | Dataset: 0-261098 | Loss: 0.586 | 596 ms/step , 115887.43 GFLOP/s , 173653.5 tokens/s
INFO:__main__:2024-11-30 12:42:15 | Epoch: 1 | Step: 361770 | Dataset: 0-263498 | Loss: 1.366 | 595 ms/step , 115953.49 GFLOP/s , 173551.1 tokens/s
INFO:__main__:2024-11-30 12:42:22 | Epoch: 1 | Step: 361780 | Dataset: 0-265898 | Loss: 1.297 | 595 ms/step , 115958.90 GFLOP/s , 173451.0 tokens/s
INFO:__main__:2024-11-30 12:42:29 | Epoch: 1 | Step: 361790 | Dataset: 0-268298 | Loss: 1.848 | 600 ms/step , 114964.79 GFLOP/s , 173391.5 tokens/s
INFO:__main__:2024-11-30 12:42:36 | Epoch: 1 | Step: 361800 | Dataset: 0-270698 | Loss: 1.507 | 595 ms/step , 115953.90 GFLOP/s , 173405.7 tokens/s
INFO:__main__:2024-11-30 12:42:44 | Epoch: 1 | Step: 361810 | Dataset: 0-273098 | Loss: 1.418 | 596 ms/step , 115804.82 GFLOP/s , 173526.7 tokens/s
INFO:__main__:2024-11-30 12:42:51 | Epoch: 1 | Step: 361820 | Dataset: 0-275498 | Loss: 1.760 | 596 ms/step , 115853.64 GFLOP/s , 173482.0 tokens/s
INFO:__main__:2024-11-30 12:42:58 | Epoch: 1 | Step: 361830 | Dataset: 0-277898 | Loss: 1.805 | 596 ms/step , 115714.96 GFLOP/s , 173490.5 tokens/s
INFO:__main__:2024-11-30 12:43:05 | Epoch: 1 | Step: 361840 | Dataset: 0-280298 | Loss: 2.552 | 596 ms/step , 115733.29 GFLOP/s , 173505.2 tokens/s
INFO:__main__:2024-11-30 12:43:12 | Epoch: 1 | Step: 361850 | Dataset: 0-282698 | Loss: 1.845 | 595 ms/step , 115913.96 GFLOP/s , 173433.0 tokens/s
INFO:__main__:2024-11-30 12:43:19 | Epoch: 1 | Step: 361860 | Dataset: 0-285098 | Loss: 1.954 | 596 ms/step , 115854.87 GFLOP/s , 173551.3 tokens/s
INFO:__main__:2024-11-30 12:43:26 | Epoch: 1 | Step: 361870 | Dataset: 0-287498 | Loss: 0.442 | 595 ms/step , 115953.11 GFLOP/s , 173558.9 tokens/s
INFO:__main__:2024-11-30 12:43:33 | Epoch: 1 | Step: 361880 | Dataset: 0-289898 | Loss: 0.206 | 595 ms/step , 116074.38 GFLOP/s , 173705.6 tokens/s
INFO:__main__:2024-11-30 12:43:40 | Epoch: 1 | Step: 361890 | Dataset: 0-292298 | Loss: 0.457 | 596 ms/step , 115836.47 GFLOP/s , 173522.0 tokens/s
INFO:__main__:2024-11-30 12:43:47 | Epoch: 1 | Step: 361900 | Dataset: 0-294698 | Loss: 1.350 | 596 ms/step , 115710.19 GFLOP/s , 173507.4 tokens/s
INFO:__main__:2024-11-30 12:43:54 | Epoch: 1 | Step: 361910 | Dataset: 0-297098 | Loss: 1.056 | 595 ms/step , 116028.63 GFLOP/s , 173618.0 tokens/s
INFO:__main__:2024-11-30 12:44:01 | Epoch: 1 | Step: 361920 | Dataset: 0-299498 | Loss: 1.017 | 596 ms/step , 115818.81 GFLOP/s , 173599.8 tokens/s
INFO:__main__:2024-11-30 12:44:09 | Epoch: 1 | Step: 361930 | Dataset: 0-301898 | Loss: 1.209 | 596 ms/step , 115763.78 GFLOP/s , 173547.6 tokens/s
INFO:__main__:2024-11-30 12:44:16 | Epoch: 1 | Step: 361940 | Dataset: 0-304298 | Loss: 0.981 | 595 ms/step , 115978.98 GFLOP/s , 173578.7 tokens/s
INFO:__main__:2024-11-30 12:44:23 | Epoch: 1 | Step: 361950 | Dataset: 0-306698 | Loss: 1.312 | 596 ms/step , 115818.59 GFLOP/s , 173511.7 tokens/s
INFO:__main__:2024-11-30 12:44:30 | Epoch: 1 | Step: 361960 | Dataset: 0-309098 | Loss: 1.279 | 595 ms/step , 115902.33 GFLOP/s , 173552.9 tokens/s
INFO:__main__:2024-11-30 12:44:37 | Epoch: 1 | Step: 361970 | Dataset: 0-311498 | Loss: 1.808 | 596 ms/step , 115844.29 GFLOP/s , 173592.7 tokens/s
INFO:__main__:2024-11-30 12:44:44 | Epoch: 1 | Step: 361980 | Dataset: 0-313898 | Loss: 0.883 | 595 ms/step , 115897.22 GFLOP/s , 173660.2 tokens/s
INFO:__main__:2024-11-30 12:44:51 | Epoch: 1 | Step: 361990 | Dataset: 0-316298 | Loss: 0.773 | 595 ms/step , 116069.09 GFLOP/s , 173673.0 tokens/s
INFO:__main__:2024-11-30 12:44:59 | Validation | Step: 362000 | Val_loss: 0.580 | Best_val_loss: 0.3487
INFO:__main__:2024-11-30 12:44:59 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_124459_step_362000.pt`
INFO:__main__:2024-11-30 12:45:01 | Epoch: 1 | Step: 362000 | Dataset: 0-318698 | Loss: 0.212 | 593 ms/step , 116407.98 GFLOP/s , 122227.7 tokens/s
INFO:__main__:2024-11-30 12:45:08 | Epoch: 1 | Step: 362010 | Dataset: 0-321098 | Loss: 0.219 | 595 ms/step , 116044.93 GFLOP/s , 174212.1 tokens/s
INFO:__main__:2024-11-30 12:45:15 | Epoch: 1 | Step: 362020 | Dataset: 0-323498 | Loss: 0.355 | 596 ms/step , 115762.14 GFLOP/s , 174104.1 tokens/s
INFO:__main__:2024-11-30 12:45:22 | Epoch: 1 | Step: 362030 | Dataset: 0-325898 | Loss: 0.406 | 597 ms/step , 115609.49 GFLOP/s , 174084.3 tokens/s
INFO:__main__:2024-11-30 12:45:29 | Epoch: 1 | Step: 362040 | Dataset: 0-328298 | Loss: 0.311 | 596 ms/step , 115820.43 GFLOP/s , 174067.7 tokens/s
INFO:__main__:2024-11-30 12:45:36 | Epoch: 1 | Step: 362050 | Dataset: 0-330698 | Loss: 0.385 | 596 ms/step , 115701.00 GFLOP/s , 173998.5 tokens/s
INFO:__main__:2024-11-30 12:45:43 | Epoch: 1 | Step: 362060 | Dataset: 0-333098 | Loss: 0.384 | 596 ms/step , 115775.43 GFLOP/s , 173953.4 tokens/s
INFO:__main__:2024-11-30 12:45:50 | Epoch: 1 | Step: 362070 | Dataset: 0-335498 | Loss: 0.222 | 597 ms/step , 115545.05 GFLOP/s , 173881.6 tokens/s
INFO:__main__:2024-11-30 12:45:58 | Epoch: 1 | Step: 362080 | Dataset: 0-337898 | Loss: 0.086 | 596 ms/step , 115821.09 GFLOP/s , 173951.5 tokens/s
INFO:__main__:2024-11-30 12:46:05 | Epoch: 1 | Step: 362090 | Dataset: 0-340298 | Loss: 0.377 | 597 ms/step , 115590.49 GFLOP/s , 173957.7 tokens/s
INFO:__main__:2024-11-30 12:46:12 | Epoch: 1 | Step: 362100 | Dataset: 0-342698 | Loss: 0.368 | 595 ms/step , 115910.51 GFLOP/s , 173982.0 tokens/s
INFO:__main__:2024-11-30 12:46:19 | Epoch: 1 | Step: 362110 | Dataset: 0-345098 | Loss: 0.256 | 595 ms/step , 116040.71 GFLOP/s , 174034.5 tokens/s
INFO:__main__:2024-11-30 12:46:26 | Epoch: 1 | Step: 362120 | Dataset: 0-347498 | Loss: 0.175 | 596 ms/step , 115834.68 GFLOP/s , 174055.9 tokens/s
INFO:__main__:2024-11-30 12:46:33 | Epoch: 1 | Step: 362130 | Dataset: 0-349898 | Loss: 0.143 | 596 ms/step , 115752.34 GFLOP/s , 173966.4 tokens/s
INFO:__main__:2024-11-30 12:46:40 | Epoch: 1 | Step: 362140 | Dataset: 0-352298 | Loss: 0.362 | 596 ms/step , 115750.67 GFLOP/s , 173939.5 tokens/s
INFO:__main__:2024-11-30 12:46:47 | Epoch: 1 | Step: 362150 | Dataset: 0-354698 | Loss: 0.194 | 596 ms/step , 115843.83 GFLOP/s , 173980.2 tokens/s
INFO:__main__:2024-11-30 12:46:54 | Epoch: 1 | Step: 362160 | Dataset: 0-357098 | Loss: 0.376 | 597 ms/step , 115661.32 GFLOP/s , 173898.2 tokens/s
INFO:__main__:2024-11-30 12:47:01 | Epoch: 1 | Step: 362170 | Dataset: 0-359498 | Loss: 0.239 | 596 ms/step , 115885.25 GFLOP/s , 173893.7 tokens/s
INFO:__main__:2024-11-30 12:47:08 | Epoch: 1 | Step: 362180 | Dataset: 0-361898 | Loss: 0.266 | 596 ms/step , 115776.98 GFLOP/s , 173851.9 tokens/s
INFO:__main__:2024-11-30 12:47:15 | Epoch: 1 | Step: 362190 | Dataset: 0-364298 | Loss: 0.226 | 596 ms/step , 115731.96 GFLOP/s , 173922.6 tokens/s
INFO:__main__:2024-11-30 12:47:22 | Epoch: 1 | Step: 362200 | Dataset: 0-366698 | Loss: 0.167 | 596 ms/step , 115834.33 GFLOP/s , 174026.5 tokens/s
INFO:__main__:2024-11-30 12:47:29 | Epoch: 1 | Step: 362210 | Dataset: 0-369098 | Loss: 0.298 | 595 ms/step , 115910.89 GFLOP/s , 173966.8 tokens/s
INFO:__main__:2024-11-30 12:47:36 | Epoch: 1 | Step: 362220 | Dataset: 0-371498 | Loss: 0.404 | 597 ms/step , 115629.78 GFLOP/s , 173863.0 tokens/s
INFO:__main__:2024-11-30 12:47:44 | Epoch: 1 | Step: 362230 | Dataset: 0-373898 | Loss: 0.493 | 597 ms/step , 115636.32 GFLOP/s , 173787.6 tokens/s
INFO:__main__:2024-11-30 12:47:51 | Epoch: 1 | Step: 362240 | Dataset: 0-376298 | Loss: 0.616 | 597 ms/step , 115686.26 GFLOP/s , 173679.0 tokens/s
INFO:__main__:2024-11-30 12:47:58 | Epoch: 1 | Step: 362250 | Dataset: 0-378698 | Loss: 0.506 | 597 ms/step , 115640.04 GFLOP/s , 173794.2 tokens/s
INFO:__main__:2024-11-30 12:48:05 | Epoch: 1 | Step: 362260 | Dataset: 0-381098 | Loss: 0.242 | 597 ms/step , 115622.90 GFLOP/s , 173812.5 tokens/s
INFO:__main__:2024-11-30 12:48:12 | Epoch: 1 | Step: 362270 | Dataset: 0-383498 | Loss: 0.632 | 597 ms/step , 115618.25 GFLOP/s , 173769.8 tokens/s
INFO:__main__:2024-11-30 12:48:19 | Epoch: 1 | Step: 362280 | Dataset: 0-385898 | Loss: 1.093 | 598 ms/step , 115435.42 GFLOP/s , 173721.7 tokens/s
INFO:__main__:2024-11-30 12:48:26 | Epoch: 1 | Step: 362290 | Dataset: 0-388298 | Loss: 1.263 | 597 ms/step , 115545.88 GFLOP/s , 173528.9 tokens/s
INFO:__main__:2024-11-30 12:48:33 | Epoch: 1 | Step: 362300 | Dataset: 0-390698 | Loss: 0.277 | 596 ms/step , 115767.67 GFLOP/s , 173578.4 tokens/s
INFO:__main__:2024-11-30 12:48:40 | Epoch: 1 | Step: 362310 | Dataset: 0-393098 | Loss: 0.570 | 597 ms/step , 115585.66 GFLOP/s , 173702.4 tokens/s
INFO:__main__:2024-11-30 12:48:47 | Epoch: 1 | Step: 362320 | Dataset: 0-395498 | Loss: 0.985 | 597 ms/step , 115597.33 GFLOP/s , 173626.0 tokens/s
INFO:__main__:2024-11-30 12:48:54 | Epoch: 1 | Step: 362330 | Dataset: 0-397898 | Loss: 0.605 | 597 ms/step , 115677.87 GFLOP/s , 173633.7 tokens/s
INFO:__main__:2024-11-30 12:49:01 | Epoch: 1 | Step: 362340 | Dataset: 0-400298 | Loss: 0.241 | 597 ms/step , 115638.82 GFLOP/s , 173784.5 tokens/s
INFO:__main__:2024-11-30 12:49:08 | Epoch: 1 | Step: 362350 | Dataset: 0-402698 | Loss: 0.250 | 597 ms/step , 115648.73 GFLOP/s , 173718.5 tokens/s
INFO:__main__:2024-11-30 12:49:15 | Epoch: 1 | Step: 362360 | Dataset: 0-405098 | Loss: 0.252 | 596 ms/step , 115771.29 GFLOP/s , 173975.0 tokens/s
INFO:__main__:2024-11-30 12:49:23 | Epoch: 1 | Step: 362370 | Dataset: 0-407498 | Loss: 0.228 | 596 ms/step , 115697.61 GFLOP/s , 173997.9 tokens/s
INFO:__main__:2024-11-30 12:49:30 | Epoch: 1 | Step: 362380 | Dataset: 0-409898 | Loss: 0.115 | 596 ms/step , 115848.12 GFLOP/s , 173904.9 tokens/s
INFO:__main__:2024-11-30 12:49:37 | Epoch: 1 | Step: 362390 | Dataset: 0-412298 | Loss: 0.631 | 597 ms/step , 115662.43 GFLOP/s , 173740.0 tokens/s
INFO:__main__:2024-11-30 12:49:44 | Epoch: 1 | Step: 362400 | Dataset: 0-414698 | Loss: 0.217 | 596 ms/step , 115710.53 GFLOP/s , 173834.6 tokens/s
INFO:__main__:2024-11-30 12:49:51 | Epoch: 1 | Step: 362410 | Dataset: 0-417098 | Loss: 0.266 | 596 ms/step , 115783.67 GFLOP/s , 173721.0 tokens/s
INFO:__main__:2024-11-30 12:49:58 | Epoch: 1 | Step: 362420 | Dataset: 0-419498 | Loss: 0.447 | 595 ms/step , 115898.72 GFLOP/s , 173856.5 tokens/s
INFO:__main__:2024-11-30 12:50:05 | Epoch: 1 | Step: 362430 | Dataset: 0-421898 | Loss: 0.548 | 597 ms/step , 115691.51 GFLOP/s , 173704.1 tokens/s
INFO:__main__:2024-11-30 12:50:12 | Epoch: 1 | Step: 362440 | Dataset: 0-424298 | Loss: 0.588 | 596 ms/step , 115731.99 GFLOP/s , 173750.7 tokens/s
INFO:__main__:2024-11-30 12:50:19 | Epoch: 1 | Step: 362450 | Dataset: 0-426698 | Loss: 0.471 | 597 ms/step , 115671.68 GFLOP/s , 173706.4 tokens/s
INFO:__main__:2024-11-30 12:50:26 | Epoch: 1 | Step: 362460 | Dataset: 0-429098 | Loss: 0.297 | 596 ms/step , 115769.93 GFLOP/s , 173865.3 tokens/s
INFO:__main__:2024-11-30 12:50:33 | Epoch: 1 | Step: 362470 | Dataset: 0-431498 | Loss: 0.284 | 599 ms/step , 115259.91 GFLOP/s , 173990.6 tokens/s
INFO:__main__:2024-11-30 12:50:40 | Epoch: 1 | Step: 362480 | Dataset: 0-433898 | Loss: 0.946 | 597 ms/step , 115650.53 GFLOP/s , 173658.4 tokens/s
INFO:__main__:2024-11-30 12:50:47 | Epoch: 1 | Step: 362490 | Dataset: 0-436298 | Loss: 0.302 | 597 ms/step , 115669.53 GFLOP/s , 173854.0 tokens/s
INFO:__main__:2024-11-30 12:50:55 | Validation | Step: 362500 | Val_loss: 0.564 | Best_val_loss: 0.3487
INFO:__main__:2024-11-30 12:50:56 | Epoch: 1 | Step: 362500 | Dataset: 0-438698 | Loss: 0.599 | 596 ms/step , 115815.86 GFLOP/s , 147802.3 tokens/s
INFO:__main__:2024-11-30 12:51:03 | Epoch: 1 | Step: 362510 | Dataset: 0-441098 | Loss: 0.303 | 596 ms/step , 115704.47 GFLOP/s , 173797.4 tokens/s
INFO:__main__:2024-11-30 12:51:10 | Epoch: 1 | Step: 362520 | Dataset: 0-443498 | Loss: 0.628 | 597 ms/step , 115683.83 GFLOP/s , 173890.0 tokens/s
INFO:__main__:2024-11-30 12:51:17 | Epoch: 1 | Step: 362530 | Dataset: 0-445898 | Loss: 0.427 | 597 ms/step , 115599.94 GFLOP/s , 173877.6 tokens/s
INFO:__main__:2024-11-30 12:51:24 | Epoch: 1 | Step: 362540 | Dataset: 0-448298 | Loss: 0.133 | 596 ms/step , 115810.37 GFLOP/s , 173891.5 tokens/s
INFO:__main__:2024-11-30 12:51:31 | Epoch: 1 | Step: 362550 | Dataset: 0-450698 | Loss: 0.230 | 596 ms/step , 115793.37 GFLOP/s , 173998.6 tokens/s
INFO:__main__:2024-11-30 12:51:38 | Epoch: 1 | Step: 362560 | Dataset: 0-453098 | Loss: 0.159 | 595 ms/step , 116042.10 GFLOP/s , 174057.0 tokens/s
INFO:__main__:2024-11-30 12:51:45 | Epoch: 1 | Step: 362570 | Dataset: 0-455498 | Loss: 0.346 | 596 ms/step , 115806.94 GFLOP/s , 174089.8 tokens/s
INFO:__main__:2024-11-30 12:51:52 | Epoch: 1 | Step: 362580 | Dataset: 0-457898 | Loss: 0.217 | 596 ms/step , 115739.02 GFLOP/s , 174013.8 tokens/s
INFO:__main__:2024-11-30 12:51:59 | Epoch: 1 | Step: 362590 | Dataset: 0-460298 |
Loss: 0.186 | 596 ms/step , 115853.68 GFLOP/s , 174029.4 tokens/s INFO:__main__:2024-11-30 12:52:06 | Epoch: 1 | Step: 362600 | Dataset: 0-462698 | Loss: 0.262 | 596 ms/step , 115770.76 GFLOP/s , 173950.9 tokens/s INFO:__main__:2024-11-30 12:52:13 | Epoch: 1 | Step: 362610 | Dataset: 0-465098 | Loss: 0.672 | 598 ms/step , 115483.68 GFLOP/s , 173883.2 tokens/s INFO:__main__:2024-11-30 12:52:20 | Epoch: 1 | Step: 362620 | Dataset: 0-467498 | Loss: 0.498 | 597 ms/step , 115683.52 GFLOP/s , 173872.1 tokens/s INFO:__main__:2024-11-30 12:52:28 | Epoch: 1 | Step: 362630 | Dataset: 0-469898 | Loss: 0.349 | 597 ms/step , 115530.12 GFLOP/s , 173782.5 tokens/s INFO:__main__:2024-11-30 12:52:35 | Epoch: 1 | Step: 362640 | Dataset: 0-472298 | Loss: 0.940 | 598 ms/step , 115411.50 GFLOP/s , 173605.8 tokens/s INFO:__main__:2024-11-30 12:52:42 | Epoch: 1 | Step: 362650 | Dataset: 0-474698 | Loss: 1.008 | 597 ms/step , 115619.61 GFLOP/s , 173661.1 tokens/s INFO:__main__:2024-11-30 12:52:49 | Epoch: 1 | Step: 362660 | Dataset: 0-477098 | Loss: 1.117 | 598 ms/step , 115500.68 GFLOP/s , 173568.5 tokens/s INFO:__main__:2024-11-30 12:52:56 | Epoch: 1 | Step: 362670 | Dataset: 0-479498 | Loss: 0.766 | 597 ms/step , 115657.87 GFLOP/s , 173493.3 tokens/s INFO:__main__:2024-11-30 12:53:03 | Epoch: 1 | Step: 362680 | Dataset: 0-481898 | Loss: 1.020 | 597 ms/step , 115543.47 GFLOP/s , 173625.3 tokens/s INFO:__main__:2024-11-30 12:53:10 | Epoch: 1 | Step: 362690 | Dataset: 0-484298 | Loss: 0.581 | 598 ms/step , 115496.55 GFLOP/s , 173578.6 tokens/s INFO:__main__:2024-11-30 12:53:17 | Epoch: 1 | Step: 362700 | Dataset: 0-486698 | Loss: 1.160 | 597 ms/step , 115603.06 GFLOP/s , 173559.9 tokens/s INFO:__main__:2024-11-30 12:53:24 | Epoch: 1 | Step: 362710 | Dataset: 0-489098 | Loss: 1.087 | 598 ms/step , 115437.51 GFLOP/s , 173610.8 tokens/s INFO:__main__:2024-11-30 12:53:31 | Epoch: 1 | Step: 362720 | Dataset: 0-491498 | Loss: 0.423 | 596 ms/step , 115700.41 GFLOP/s , 173541.9 tokens/s 
INFO:__main__:2024-11-30 12:53:38 | Epoch: 1 | Step: 362730 | Dataset: 0-493898 | Loss: 0.762 | 597 ms/step , 115506.36 GFLOP/s , 173631.9 tokens/s INFO:__main__:2024-11-30 12:53:45 | Epoch: 1 | Step: 362740 | Dataset: 0-496298 | Loss: 1.328 | 599 ms/step , 115222.58 GFLOP/s , 173535.7 tokens/s INFO:__main__:2024-11-30 12:53:52 | Epoch: 1 | Step: 362750 | Dataset: 0-498698 | Loss: 0.587 | 597 ms/step , 115695.14 GFLOP/s , 173540.9 tokens/s INFO:__main__:2024-11-30 12:54:00 | Epoch: 1 | Step: 362760 | Dataset: 0-501098 | Loss: 1.166 | 598 ms/step , 115452.25 GFLOP/s , 173536.2 tokens/s INFO:__main__:2024-11-30 12:54:07 | Epoch: 1 | Step: 362770 | Dataset: 0-503498 | Loss: 0.392 | 596 ms/step , 115843.24 GFLOP/s , 173758.1 tokens/s INFO:__main__:2024-11-30 12:54:14 | Epoch: 1 | Step: 362780 | Dataset: 0-505898 | Loss: 0.507 | 597 ms/step , 115560.19 GFLOP/s , 173929.3 tokens/s INFO:__main__:2024-11-30 12:54:21 | Epoch: 1 | Step: 362790 | Dataset: 0-508298 | Loss: 0.543 | 596 ms/step , 115716.48 GFLOP/s , 173872.6 tokens/s INFO:__main__:2024-11-30 12:54:28 | Epoch: 1 | Step: 362800 | Dataset: 0-510698 | Loss: 1.133 | 598 ms/step , 115363.95 GFLOP/s , 173598.3 tokens/s INFO:__main__:2024-11-30 12:54:35 | Epoch: 1 | Step: 362810 | Dataset: 0-513098 | Loss: 0.824 | 597 ms/step , 115504.72 GFLOP/s , 173699.9 tokens/s INFO:__main__:2024-11-30 12:54:42 | Epoch: 1 | Step: 362820 | Dataset: 0-515498 | Loss: 0.862 | 597 ms/step , 115616.96 GFLOP/s , 173690.6 tokens/s INFO:__main__:2024-11-30 12:54:49 | Epoch: 1 | Step: 362830 | Dataset: 0-517898 | Loss: 0.359 | 596 ms/step , 115784.13 GFLOP/s , 173873.8 tokens/s INFO:__main__:2024-11-30 12:54:56 | Epoch: 1 | Step: 362840 | Dataset: 0-520298 | Loss: 0.267 | 597 ms/step , 115667.50 GFLOP/s , 173640.7 tokens/s INFO:__main__:2024-11-30 12:55:03 | Epoch: 1 | Step: 362850 | Dataset: 0-522698 | Loss: 0.253 | 597 ms/step , 115658.59 GFLOP/s , 173570.6 tokens/s INFO:__main__:2024-11-30 12:55:10 | Epoch: 1 | Step: 362860 | Dataset: 
0-525098 | Loss: 1.087 | 600 ms/step , 115060.71 GFLOP/s , 173668.5 tokens/s INFO:__main__:2024-11-30 12:55:17 | Epoch: 1 | Step: 362870 | Dataset: 0-527498 | Loss: 1.049 | 599 ms/step , 115159.73 GFLOP/s , 173305.9 tokens/s INFO:__main__:2024-11-30 12:55:24 | Epoch: 1 | Step: 362880 | Dataset: 0-529898 | Loss: 1.060 | 598 ms/step , 115385.06 GFLOP/s , 173342.8 tokens/s INFO:__main__:2024-11-30 12:55:32 | Epoch: 1 | Step: 362890 | Dataset: 0-532298 | Loss: 1.060 | 599 ms/step , 115130.67 GFLOP/s , 173218.6 tokens/s INFO:__main__:2024-11-30 12:55:39 | Epoch: 1 | Step: 362900 | Dataset: 0-534698 | Loss: 1.053 | 599 ms/step , 115158.59 GFLOP/s , 173248.7 tokens/s INFO:__main__:2024-11-30 12:55:46 | Epoch: 1 | Step: 362910 | Dataset: 0-537098 | Loss: 1.065 | 598 ms/step , 115480.87 GFLOP/s , 173165.4 tokens/s INFO:__main__:2024-11-30 12:55:53 | Epoch: 1 | Step: 362920 | Dataset: 0-539498 | Loss: 1.056 | 599 ms/step , 115276.93 GFLOP/s , 173352.7 tokens/s INFO:__main__:2024-11-30 12:56:00 | Epoch: 1 | Step: 362930 | Dataset: 0-541898 | Loss: 0.541 | 597 ms/step , 115577.05 GFLOP/s , 173409.0 tokens/s INFO:__main__:2024-11-30 12:56:07 | Epoch: 1 | Step: 362940 | Dataset: 0-544298 | Loss: 0.524 | 597 ms/step , 115528.36 GFLOP/s , 173426.9 tokens/s INFO:__main__:2024-11-30 12:56:14 | Epoch: 1 | Step: 362950 | Dataset: 0-546698 | Loss: 0.467 | 598 ms/step , 115332.56 GFLOP/s , 173476.9 tokens/s INFO:__main__:2024-11-30 12:56:21 | Epoch: 1 | Step: 362960 | Dataset: 0-549098 | Loss: 0.602 | 599 ms/step , 115304.23 GFLOP/s , 173423.0 tokens/s INFO:__main__:2024-11-30 12:56:28 | Epoch: 1 | Step: 362970 | Dataset: 0-551498 | Loss: 0.518 | 597 ms/step , 115551.10 GFLOP/s , 173418.4 tokens/s INFO:__main__:2024-11-30 12:56:35 | Epoch: 1 | Step: 362980 | Dataset: 0-553898 | Loss: 0.552 | 598 ms/step , 115440.91 GFLOP/s , 173570.2 tokens/s INFO:__main__:2024-11-30 12:56:42 | Epoch: 1 | Step: 362990 | Dataset: 0-556298 | Loss: 0.550 | 598 ms/step , 115436.06 GFLOP/s , 173530.5 
tokens/s INFO:__main__:2024-11-30 12:56:50 | Validation | Step: 363000 | Val_loss: 0.581 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 12:56:50 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_125650_step_363000.pt` INFO:__main__:2024-11-30 12:56:53 | Epoch: 1 | Step: 363000 | Dataset: 0-558698 | Loss: 0.530 | 594 ms/step , 116157.15 GFLOP/s , 121881.6 tokens/s INFO:__main__:2024-11-30 12:57:00 | Epoch: 1 | Step: 363010 | Dataset: 0-561098 | Loss: 0.575 | 597 ms/step , 115567.56 GFLOP/s , 173700.6 tokens/s INFO:__main__:2024-11-30 12:57:07 | Epoch: 1 | Step: 363020 | Dataset: 0-563498 | Loss: 0.491 | 597 ms/step , 115635.12 GFLOP/s , 173629.4 tokens/s INFO:__main__:2024-11-30 12:57:14 | Epoch: 1 | Step: 363030 | Dataset: 0-565898 | Loss: 1.411 | 598 ms/step , 115398.33 GFLOP/s , 173651.5 tokens/s INFO:__main__:2024-11-30 12:57:21 | Epoch: 1 | Step: 363040 | Dataset: 0-568298 | Loss: 1.360 | 598 ms/step , 115406.57 GFLOP/s , 173568.2 tokens/s INFO:__main__:2024-11-30 12:57:28 | Epoch: 1 | Step: 363050 | Dataset: 0-570698 | Loss: 1.353 | 598 ms/step , 115355.56 GFLOP/s , 173478.4 tokens/s INFO:__main__:2024-11-30 12:57:35 | Epoch: 1 | Step: 363060 | Dataset: 0-573098 | Loss: 1.195 | 597 ms/step , 115604.54 GFLOP/s , 173530.0 tokens/s INFO:__main__:2024-11-30 12:57:42 | Epoch: 1 | Step: 363070 | Dataset: 0-575498 | Loss: 1.298 | 597 ms/step , 115573.95 GFLOP/s , 173527.2 tokens/s INFO:__main__:2024-11-30 12:57:49 | Epoch: 1 | Step: 363080 | Dataset: 0-577898 | Loss: 1.228 | 598 ms/step , 115455.85 GFLOP/s , 173536.8 tokens/s INFO:__main__:2024-11-30 12:57:56 | Epoch: 1 | Step: 363090 | Dataset: 0-580298 | Loss: 1.307 | 598 ms/step , 115472.16 GFLOP/s , 173466.7 tokens/s INFO:__main__:2024-11-30 12:58:03 | Epoch: 1 | Step: 363100 | Dataset: 0-582698 | Loss: 1.249 | 599 ms/step , 115291.08 GFLOP/s , 173342.4 tokens/s INFO:__main__:2024-11-30 12:58:10 | Epoch: 1 | Step: 363110 | Dataset: 0-585098 | Loss: 1.352 | 598 ms/step , 
115391.42 GFLOP/s , 173453.0 tokens/s INFO:__main__:2024-11-30 12:58:17 | Epoch: 1 | Step: 363120 | Dataset: 0-587498 | Loss: 1.290 | 598 ms/step , 115359.66 GFLOP/s , 173432.5 tokens/s INFO:__main__:2024-11-30 12:58:25 | Epoch: 1 | Step: 363130 | Dataset: 0-589898 | Loss: 1.199 | 597 ms/step , 115584.31 GFLOP/s , 173479.7 tokens/s INFO:__main__:2024-11-30 12:58:32 | Epoch: 1 | Step: 363140 | Dataset: 0-592298 | Loss: 1.340 | 598 ms/step , 115445.85 GFLOP/s , 173446.0 tokens/s INFO:__main__:2024-11-30 12:58:39 | Epoch: 1 | Step: 363150 | Dataset: 0-594698 | Loss: 1.247 | 598 ms/step , 115435.44 GFLOP/s , 173396.5 tokens/s INFO:__main__:2024-11-30 12:58:46 | Epoch: 1 | Step: 363160 | Dataset: 0-597098 | Loss: 1.231 | 598 ms/step , 115363.83 GFLOP/s , 173424.4 tokens/s INFO:__main__:2024-11-30 12:58:53 | Epoch: 1 | Step: 363170 | Dataset: 0-599498 | Loss: 1.292 | 598 ms/step , 115462.78 GFLOP/s , 173431.6 tokens/s INFO:__main__:2024-11-30 12:59:00 | Epoch: 1 | Step: 363180 | Dataset: 0-601898 | Loss: 1.282 | 598 ms/step , 115425.65 GFLOP/s , 173514.2 tokens/s INFO:__main__:2024-11-30 12:59:07 | Epoch: 1 | Step: 363190 | Dataset: 0-604298 | Loss: 1.269 | 598 ms/step , 115395.61 GFLOP/s , 173441.1 tokens/s INFO:__main__:2024-11-30 12:59:14 | Epoch: 1 | Step: 363200 | Dataset: 0-606698 | Loss: 1.321 | 599 ms/step , 115306.02 GFLOP/s , 173488.3 tokens/s INFO:__main__:2024-11-30 12:59:21 | Epoch: 1 | Step: 363210 | Dataset: 0-609098 | Loss: 1.158 | 598 ms/step , 115353.91 GFLOP/s , 173409.2 tokens/s INFO:__main__:2024-11-30 12:59:28 | Epoch: 1 | Step: 363220 | Dataset: 0-611498 | Loss: 1.295 | 598 ms/step , 115379.21 GFLOP/s , 173434.0 tokens/s INFO:__main__:2024-11-30 12:59:35 | Epoch: 1 | Step: 363230 | Dataset: 0-613898 | Loss: 1.220 | 598 ms/step , 115396.62 GFLOP/s , 173429.3 tokens/s INFO:__main__:2024-11-30 12:59:43 | Epoch: 1 | Step: 363240 | Dataset: 0-616298 | Loss: 1.258 | 598 ms/step , 115402.99 GFLOP/s , 173406.7 tokens/s INFO:__main__:2024-11-30 12:59:50 | 
Epoch: 1 | Step: 363250 | Dataset: 0-618698 | Loss: 1.256 | 598 ms/step , 115410.52 GFLOP/s , 173349.2 tokens/s INFO:__main__:2024-11-30 12:59:57 | Epoch: 1 | Step: 363260 | Dataset: 0-621098 | Loss: 1.332 | 598 ms/step , 115340.31 GFLOP/s , 173328.5 tokens/s INFO:__main__:2024-11-30 13:00:04 | Epoch: 1 | Step: 363270 | Dataset: 0-623498 | Loss: 1.307 | 598 ms/step , 115393.71 GFLOP/s , 173268.5 tokens/s INFO:__main__:2024-11-30 13:00:11 | Epoch: 1 | Step: 363280 | Dataset: 0-625898 | Loss: 1.227 | 599 ms/step , 115297.23 GFLOP/s , 173347.1 tokens/s INFO:__main__:2024-11-30 13:00:18 | Epoch: 1 | Step: 363290 | Dataset: 0-628298 | Loss: 1.354 | 598 ms/step , 115360.52 GFLOP/s , 173279.8 tokens/s INFO:__main__:2024-11-30 13:00:25 | Epoch: 1 | Step: 363300 | Dataset: 0-630698 | Loss: 1.214 | 599 ms/step , 115149.28 GFLOP/s , 173137.4 tokens/s INFO:__main__:2024-11-30 13:00:32 | Epoch: 1 | Step: 363310 | Dataset: 0-633098 | Loss: 1.305 | 598 ms/step , 115338.78 GFLOP/s , 173245.9 tokens/s INFO:__main__:2024-11-30 13:00:39 | Epoch: 1 | Step: 363320 | Dataset: 0-635498 | Loss: 1.306 | 599 ms/step , 115227.48 GFLOP/s , 173253.6 tokens/s INFO:__main__:2024-11-30 13:00:46 | Epoch: 1 | Step: 363330 | Dataset: 0-637898 | Loss: 1.242 | 598 ms/step , 115337.75 GFLOP/s , 173274.1 tokens/s INFO:__main__:2024-11-30 13:00:53 | Epoch: 1 | Step: 363340 | Dataset: 0-640298 | Loss: 1.300 | 599 ms/step , 115253.92 GFLOP/s , 173306.2 tokens/s INFO:__main__:2024-11-30 13:01:01 | Epoch: 1 | Step: 363350 | Dataset: 0-642698 | Loss: 1.300 | 599 ms/step , 115271.64 GFLOP/s , 173249.5 tokens/s INFO:__main__:2024-11-30 13:01:08 | Epoch: 1 | Step: 363360 | Dataset: 0-645098 | Loss: 1.318 | 598 ms/step , 115309.21 GFLOP/s , 173288.0 tokens/s INFO:__main__:2024-11-30 13:01:15 | Epoch: 1 | Step: 363370 | Dataset: 0-647498 | Loss: 1.330 | 599 ms/step , 115279.54 GFLOP/s , 173136.4 tokens/s INFO:__main__:2024-11-30 13:01:22 | Epoch: 1 | Step: 363380 | Dataset: 0-649898 | Loss: 1.234 | 599 ms/step , 
115218.63 GFLOP/s , 173248.2 tokens/s INFO:__main__:2024-11-30 13:01:29 | Epoch: 1 | Step: 363390 | Dataset: 0-652298 | Loss: 1.212 | 599 ms/step , 115246.48 GFLOP/s , 173255.5 tokens/s INFO:__main__:2024-11-30 13:01:36 | Epoch: 1 | Step: 363400 | Dataset: 0-654698 | Loss: 1.261 | 599 ms/step , 115219.00 GFLOP/s , 173183.3 tokens/s INFO:__main__:2024-11-30 13:01:43 | Epoch: 1 | Step: 363410 | Dataset: 0-657098 | Loss: 0.450 | 599 ms/step , 115224.94 GFLOP/s , 173352.0 tokens/s INFO:__main__:2024-11-30 13:01:50 | Epoch: 1 | Step: 363420 | Dataset: 0-659498 | Loss: 0.551 | 598 ms/step , 115433.44 GFLOP/s , 173400.8 tokens/s INFO:__main__:2024-11-30 13:01:57 | Epoch: 1 | Step: 363430 | Dataset: 0-661898 | Loss: 0.478 | 598 ms/step , 115495.66 GFLOP/s , 173389.4 tokens/s INFO:__main__:2024-11-30 13:02:04 | Epoch: 1 | Step: 363440 | Dataset: 0-664298 | Loss: 0.464 | 598 ms/step , 115483.55 GFLOP/s , 173514.4 tokens/s INFO:__main__:2024-11-30 13:02:11 | Epoch: 1 | Step: 363450 | Dataset: 0-666698 | Loss: 0.464 | 598 ms/step , 115425.64 GFLOP/s , 173391.9 tokens/s INFO:__main__:2024-11-30 13:02:19 | Epoch: 1 | Step: 363460 | Dataset: 0-669098 | Loss: 0.498 | 598 ms/step , 115420.17 GFLOP/s , 173413.6 tokens/s INFO:__main__:2024-11-30 13:02:26 | Epoch: 1 | Step: 363470 | Dataset: 0-671498 | Loss: 0.499 | 598 ms/step , 115482.05 GFLOP/s , 173484.2 tokens/s INFO:__main__:2024-11-30 13:02:33 | Epoch: 1 | Step: 363480 | Dataset: 0-673898 | Loss: 0.463 | 598 ms/step , 115416.56 GFLOP/s , 173429.6 tokens/s INFO:__main__:2024-11-30 13:02:40 | Epoch: 1 | Step: 363490 | Dataset: 0-676298 | Loss: 0.466 | 598 ms/step , 115362.29 GFLOP/s , 173546.9 tokens/s INFO:__main__:2024-11-30 13:02:47 | Validation | Step: 363500 | Val_loss: 0.560 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 13:02:48 | Epoch: 1 | Step: 363500 | Dataset: 0-678698 | Loss: 0.464 | 596 ms/step , 115752.56 GFLOP/s , 147652.6 tokens/s INFO:__main__:2024-11-30 13:02:55 | Epoch: 1 | Step: 363510 | Dataset: 0-681098 | 
Loss: 0.402 | 598 ms/step , 115431.46 GFLOP/s , 173446.8 tokens/s INFO:__main__:2024-11-30 13:03:02 | Epoch: 1 | Step: 363520 | Dataset: 0-683498 | Loss: 0.424 | 597 ms/step , 115555.79 GFLOP/s , 173486.2 tokens/s INFO:__main__:2024-11-30 13:03:09 | Epoch: 1 | Step: 363530 | Dataset: 0-685898 | Loss: 0.462 | 597 ms/step , 115673.17 GFLOP/s , 173463.2 tokens/s INFO:__main__:2024-11-30 13:03:16 | Epoch: 1 | Step: 363540 | Dataset: 0-688298 | Loss: 0.512 | 597 ms/step , 115506.26 GFLOP/s , 173519.1 tokens/s INFO:__main__:2024-11-30 13:03:23 | Epoch: 1 | Step: 363550 | Dataset: 0-690698 | Loss: 0.455 | 598 ms/step , 115422.93 GFLOP/s , 173382.0 tokens/s INFO:__main__:2024-11-30 13:03:31 | Epoch: 1 | Step: 363560 | Dataset: 0-693098 | Loss: 0.512 | 598 ms/step , 115387.64 GFLOP/s , 173399.3 tokens/s INFO:__main__:2024-11-30 13:03:38 | Epoch: 1 | Step: 363570 | Dataset: 0-695498 | Loss: 0.494 | 598 ms/step , 115358.08 GFLOP/s , 173478.0 tokens/s INFO:__main__:2024-11-30 13:03:45 | Epoch: 1 | Step: 363580 | Dataset: 0-697898 | Loss: 0.416 | 598 ms/step , 115441.52 GFLOP/s , 173379.7 tokens/s INFO:__main__:2024-11-30 13:03:52 | Epoch: 1 | Step: 363590 | Dataset: 0-700298 | Loss: 0.460 | 598 ms/step , 115413.19 GFLOP/s , 173503.8 tokens/s INFO:__main__:2024-11-30 13:03:59 | Epoch: 1 | Step: 363600 | Dataset: 0-702698 | Loss: 0.476 | 598 ms/step , 115393.55 GFLOP/s , 173351.8 tokens/s INFO:__main__:2024-11-30 13:04:06 | Epoch: 1 | Step: 363610 | Dataset: 0-705098 | Loss: 0.459 | 597 ms/step , 115606.83 GFLOP/s , 173483.9 tokens/s INFO:__main__:2024-11-30 13:04:13 | Epoch: 1 | Step: 363620 | Dataset: 0-707498 | Loss: 0.387 | 597 ms/step , 115580.24 GFLOP/s , 173509.1 tokens/s INFO:__main__:2024-11-30 13:04:20 | Epoch: 1 | Step: 363630 | Dataset: 0-709898 | Loss: 0.520 | 598 ms/step , 115404.59 GFLOP/s , 173421.5 tokens/s INFO:__main__:2024-11-30 13:04:27 | Epoch: 1 | Step: 363640 | Dataset: 0-712298 | Loss: 0.457 | 598 ms/step , 115402.35 GFLOP/s , 173519.4 tokens/s 
INFO:__main__:2024-11-30 13:04:34 | Epoch: 1 | Step: 363650 | Dataset: 0-714698 | Loss: 0.462 | 598 ms/step , 115397.01 GFLOP/s , 173427.2 tokens/s INFO:__main__:2024-11-30 13:04:41 | Epoch: 1 | Step: 363660 | Dataset: 0-717098 | Loss: 0.462 | 598 ms/step , 115423.53 GFLOP/s , 173471.3 tokens/s INFO:__main__:2024-11-30 13:04:49 | Epoch: 1 | Step: 363670 | Dataset: 0-719498 | Loss: 0.531 | 597 ms/step , 115541.36 GFLOP/s , 173378.6 tokens/s INFO:__main__:2024-11-30 13:04:56 | Epoch: 1 | Step: 363680 | Dataset: 0-721898 | Loss: 0.395 | 598 ms/step , 115488.47 GFLOP/s , 173400.8 tokens/s INFO:__main__:2024-11-30 13:05:03 | Epoch: 1 | Step: 363690 | Dataset: 0-724298 | Loss: 0.431 | 598 ms/step , 115413.43 GFLOP/s , 173451.5 tokens/s INFO:__main__:2024-11-30 13:05:10 | Epoch: 1 | Step: 363700 | Dataset: 0-726698 | Loss: 0.454 | 597 ms/step , 115550.09 GFLOP/s , 173446.5 tokens/s INFO:__main__:2024-11-30 13:05:17 | Epoch: 1 | Step: 363710 | Dataset: 0-729098 | Loss: 0.483 | 598 ms/step , 115465.40 GFLOP/s , 173461.8 tokens/s INFO:__main__:2024-11-30 13:05:24 | Epoch: 1 | Step: 363720 | Dataset: 0-731498 | Loss: 0.423 | 597 ms/step , 115620.42 GFLOP/s , 173395.9 tokens/s INFO:__main__:2024-11-30 13:05:31 | Epoch: 1 | Step: 363730 | Dataset: 0-733898 | Loss: 0.421 | 598 ms/step , 115409.03 GFLOP/s , 173403.2 tokens/s INFO:__main__:2024-11-30 13:05:38 | Epoch: 1 | Step: 363740 | Dataset: 0-736298 | Loss: 0.518 | 597 ms/step , 115569.87 GFLOP/s , 173487.2 tokens/s INFO:__main__:2024-11-30 13:05:45 | Epoch: 1 | Step: 363750 | Dataset: 0-738698 | Loss: 0.418 | 598 ms/step , 115456.55 GFLOP/s , 173483.7 tokens/s INFO:__main__:2024-11-30 13:05:52 | Epoch: 1 | Step: 363760 | Dataset: 0-741098 | Loss: 0.473 | 598 ms/step , 115459.85 GFLOP/s , 173434.7 tokens/s INFO:__main__:2024-11-30 13:05:59 | Epoch: 1 | Step: 363770 | Dataset: 0-743498 | Loss: 0.492 | 598 ms/step , 115380.41 GFLOP/s , 173313.9 tokens/s INFO:__main__:2024-11-30 13:06:06 | Epoch: 1 | Step: 363780 | Dataset: 
0-745898 | Loss: 0.460 | 597 ms/step , 115537.16 GFLOP/s , 173424.9 tokens/s INFO:__main__:2024-11-30 13:06:14 | Epoch: 1 | Step: 363790 | Dataset: 0-748298 | Loss: 0.461 | 597 ms/step , 115522.28 GFLOP/s , 173429.9 tokens/s INFO:__main__:2024-11-30 13:06:21 | Epoch: 1 | Step: 363800 | Dataset: 0-750698 | Loss: 0.439 | 598 ms/step , 115368.18 GFLOP/s , 173503.8 tokens/s INFO:__main__:2024-11-30 13:06:28 | Epoch: 1 | Step: 363810 | Dataset: 0-753098 | Loss: 0.492 | 598 ms/step , 115394.82 GFLOP/s , 173378.0 tokens/s INFO:__main__:2024-11-30 13:06:35 | Epoch: 1 | Step: 363820 | Dataset: 0-755498 | Loss: 0.467 | 598 ms/step , 115436.19 GFLOP/s , 173439.3 tokens/s INFO:__main__:2024-11-30 13:06:42 | Epoch: 1 | Step: 363830 | Dataset: 0-757898 | Loss: 0.434 | 598 ms/step , 115394.06 GFLOP/s , 173372.0 tokens/s INFO:__main__:2024-11-30 13:06:49 | Epoch: 1 | Step: 363840 | Dataset: 0-760298 | Loss: 0.419 | 598 ms/step , 115361.84 GFLOP/s , 173617.3 tokens/s INFO:__main__:2024-11-30 13:06:56 | Epoch: 1 | Step: 363850 | Dataset: 0-762698 | Loss: 0.433 | 598 ms/step , 115366.30 GFLOP/s , 173677.6 tokens/s INFO:__main__:2024-11-30 13:07:03 | Epoch: 1 | Step: 363860 | Dataset: 0-765098 | Loss: 0.429 | 598 ms/step , 115385.97 GFLOP/s , 173584.8 tokens/s INFO:__main__:2024-11-30 13:07:10 | Epoch: 1 | Step: 363870 | Dataset: 0-767498 | Loss: 0.449 | 598 ms/step , 115336.44 GFLOP/s , 173584.2 tokens/s INFO:__main__:2024-11-30 13:07:17 | Epoch: 1 | Step: 363880 | Dataset: 0-769898 | Loss: 0.485 | 599 ms/step , 115291.12 GFLOP/s , 173577.8 tokens/s INFO:__main__:2024-11-30 13:07:24 | Epoch: 1 | Step: 363890 | Dataset: 0-772298 | Loss: 0.448 | 597 ms/step , 115509.24 GFLOP/s , 173686.5 tokens/s INFO:__main__:2024-11-30 13:07:31 | Epoch: 1 | Step: 363900 | Dataset: 0-774698 | Loss: 0.397 | 597 ms/step , 115571.93 GFLOP/s , 173658.2 tokens/s INFO:__main__:2024-11-30 13:07:38 | Epoch: 1 | Step: 363910 | Dataset: 0-777098 | Loss: 0.417 | 598 ms/step , 115433.23 GFLOP/s , 173600.0 
tokens/s INFO:__main__:2024-11-30 13:07:46 | Epoch: 1 | Step: 363920 | Dataset: 0-779498 | Loss: 0.467 | 597 ms/step , 115564.29 GFLOP/s , 173628.5 tokens/s INFO:__main__:2024-11-30 13:07:53 | Epoch: 1 | Step: 363930 | Dataset: 0-781898 | Loss: 0.429 | 598 ms/step , 115463.30 GFLOP/s , 173634.0 tokens/s INFO:__main__:2024-11-30 13:08:00 | Epoch: 1 | Step: 363940 | Dataset: 0-784298 | Loss: 0.419 | 602 ms/step , 114689.75 GFLOP/s , 173573.8 tokens/s INFO:__main__:2024-11-30 13:08:07 | Epoch: 1 | Step: 363950 | Dataset: 0-786698 | Loss: 0.378 | 598 ms/step , 115393.90 GFLOP/s , 173694.2 tokens/s INFO:__main__:2024-11-30 13:08:14 | Epoch: 1 | Step: 363960 | Dataset: 0-789098 | Loss: 0.591 | 598 ms/step , 115451.43 GFLOP/s , 173577.3 tokens/s INFO:__main__:2024-11-30 13:08:21 | Epoch: 1 | Step: 363970 | Dataset: 0-791498 | Loss: 0.673 | 598 ms/step , 115435.16 GFLOP/s , 173560.0 tokens/s INFO:__main__:2024-11-30 13:08:28 | Epoch: 1 | Step: 363980 | Dataset: 0-793898 | Loss: 0.698 | 598 ms/step , 115395.40 GFLOP/s , 173551.6 tokens/s INFO:__main__:2024-11-30 13:08:35 | Epoch: 1 | Step: 363990 | Dataset: 0-796298 | Loss: 0.592 | 598 ms/step , 115362.11 GFLOP/s , 173555.3 tokens/s INFO:__main__:2024-11-30 13:08:43 | Validation | Step: 364000 | Val_loss: 0.592 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 13:08:43 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_130843_step_364000.pt` INFO:__main__:2024-11-30 13:08:45 | Epoch: 1 | Step: 364000 | Dataset: 0-798698 | Loss: 0.577 | 594 ms/step , 116123.12 GFLOP/s , 121147.4 tokens/s INFO:__main__:2024-11-30 13:08:52 | Epoch: 1 | Step: 364010 | Dataset: 0-801098 | Loss: 0.570 | 598 ms/step , 115370.11 GFLOP/s , 173562.1 tokens/s INFO:__main__:2024-11-30 13:08:59 | Epoch: 1 | Step: 364020 | Dataset: 0-803498 | Loss: 0.658 | 598 ms/step , 115500.55 GFLOP/s , 173473.1 tokens/s INFO:__main__:2024-11-30 13:09:07 | Epoch: 1 | Step: 364030 | Dataset: 0-805898 | Loss: 0.691 | 597 ms/step , 
115578.67 GFLOP/s , 173550.3 tokens/s INFO:__main__:2024-11-30 13:09:14 | Epoch: 1 | Step: 364040 | Dataset: 0-808298 | Loss: 0.685 | 597 ms/step , 115531.65 GFLOP/s , 173503.3 tokens/s INFO:__main__:2024-11-30 13:09:21 | Epoch: 1 | Step: 364050 | Dataset: 0-810698 | Loss: 0.677 | 597 ms/step , 115566.51 GFLOP/s , 173522.0 tokens/s INFO:__main__:2024-11-30 13:09:28 | Epoch: 1 | Step: 364060 | Dataset: 0-813098 | Loss: 0.628 | 598 ms/step , 115413.92 GFLOP/s , 173413.2 tokens/s INFO:__main__:2024-11-30 13:09:35 | Epoch: 1 | Step: 364070 | Dataset: 0-815498 | Loss: 0.684 | 598 ms/step , 115485.35 GFLOP/s , 173472.3 tokens/s INFO:__main__:2024-11-30 13:09:42 | Epoch: 1 | Step: 364080 | Dataset: 0-817898 | Loss: 0.723 | 598 ms/step , 115319.61 GFLOP/s , 173405.6 tokens/s INFO:__main__:2024-11-30 13:09:49 | Epoch: 1 | Step: 364090 | Dataset: 0-820298 | Loss: 0.622 | 598 ms/step , 115497.74 GFLOP/s , 173382.9 tokens/s INFO:__main__:2024-11-30 13:09:56 | Epoch: 1 | Step: 364100 | Dataset: 0-822698 | Loss: 0.650 | 598 ms/step , 115457.45 GFLOP/s , 173434.9 tokens/s INFO:__main__:2024-11-30 13:10:03 | Epoch: 1 | Step: 364110 | Dataset: 0-825098 | Loss: 0.650 | 597 ms/step , 115656.08 GFLOP/s , 173506.7 tokens/s INFO:__main__:2024-11-30 13:10:10 | Epoch: 1 | Step: 364120 | Dataset: 0-827498 | Loss: 0.549 | 597 ms/step , 115648.35 GFLOP/s , 173583.8 tokens/s INFO:__main__:2024-11-30 13:10:17 | Epoch: 1 | Step: 364130 | Dataset: 0-829898 | Loss: 0.673 | 597 ms/step , 115648.65 GFLOP/s , 173476.5 tokens/s INFO:__main__:2024-11-30 13:10:24 | Epoch: 1 | Step: 364140 | Dataset: 0-832298 | Loss: 0.776 | 598 ms/step , 115475.83 GFLOP/s , 173552.4 tokens/s INFO:__main__:2024-11-30 13:10:32 | Epoch: 1 | Step: 364150 | Dataset: 0-834698 | Loss: 0.748 | 599 ms/step , 115191.64 GFLOP/s , 173437.2 tokens/s INFO:__main__:2024-11-30 13:10:39 | Epoch: 1 | Step: 364160 | Dataset: 0-837098 | Loss: 0.599 | 597 ms/step , 115576.81 GFLOP/s , 173357.1 tokens/s INFO:__main__:2024-11-30 13:10:46 | 
Epoch: 1 | Step: 364170 | Dataset: 0-839498 | Loss: 0.651 | 598 ms/step , 115379.13 GFLOP/s , 173525.9 tokens/s INFO:__main__:2024-11-30 13:10:53 | Epoch: 1 | Step: 364180 | Dataset: 0-841898 | Loss: 0.602 | 597 ms/step , 115633.73 GFLOP/s , 173424.2 tokens/s INFO:__main__:2024-11-30 13:11:00 | Epoch: 1 | Step: 364190 | Dataset: 0-844298 | Loss: 0.708 | 598 ms/step , 115366.30 GFLOP/s , 173207.8 tokens/s INFO:__main__:2024-11-30 13:11:07 | Epoch: 1 | Step: 364200 | Dataset: 0-846698 | Loss: 0.650 | 597 ms/step , 115589.32 GFLOP/s , 173396.3 tokens/s INFO:__main__:2024-11-30 13:11:14 | Epoch: 1 | Step: 364210 | Dataset: 0-849098 | Loss: 0.680 | 598 ms/step , 115466.51 GFLOP/s , 173437.8 tokens/s INFO:__main__:2024-11-30 13:11:21 | Epoch: 1 | Step: 364220 | Dataset: 0-851498 | Loss: 0.741 | 597 ms/step , 115628.51 GFLOP/s , 173432.3 tokens/s INFO:__main__:2024-11-30 13:11:28 | Epoch: 1 | Step: 364230 | Dataset: 0-853898 | Loss: 0.707 | 598 ms/step , 115334.21 GFLOP/s , 173403.9 tokens/s INFO:__main__:2024-11-30 13:11:35 | Epoch: 1 | Step: 364240 | Dataset: 0-856298 | Loss: 0.664 | 597 ms/step , 115550.83 GFLOP/s , 173457.6 tokens/s INFO:__main__:2024-11-30 13:11:42 | Epoch: 1 | Step: 364250 | Dataset: 0-858698 | Loss: 0.653 | 598 ms/step , 115412.71 GFLOP/s , 173478.6 tokens/s INFO:__main__:2024-11-30 13:11:49 | Epoch: 1 | Step: 364260 | Dataset: 0-861098 | Loss: 0.782 | 597 ms/step , 115573.11 GFLOP/s , 173436.9 tokens/s INFO:__main__:2024-11-30 13:11:57 | Epoch: 1 | Step: 364270 | Dataset: 0-863498 | Loss: 0.704 | 598 ms/step , 115341.63 GFLOP/s , 173517.2 tokens/s INFO:__main__:2024-11-30 13:12:04 | Epoch: 1 | Step: 364280 | Dataset: 0-865898 | Loss: 0.737 | 597 ms/step , 115580.95 GFLOP/s , 173430.4 tokens/s INFO:__main__:2024-11-30 13:12:11 | Epoch: 1 | Step: 364290 | Dataset: 0-868298 | Loss: 0.593 | 598 ms/step , 115424.80 GFLOP/s , 173539.5 tokens/s INFO:__main__:2024-11-30 13:12:18 | Epoch: 1 | Step: 364300 | Dataset: 0-870698 | Loss: 0.608 | 598 ms/step , 
115422.17 GFLOP/s , 173434.4 tokens/s INFO:__main__:2024-11-30 13:12:25 | Epoch: 1 | Step: 364310 | Dataset: 0-873098 | Loss: 0.632 | 597 ms/step , 115514.90 GFLOP/s , 173435.1 tokens/s INFO:__main__:2024-11-30 13:12:32 | Epoch: 1 | Step: 364320 | Dataset: 0-875498 | Loss: 0.680 | 598 ms/step , 115480.91 GFLOP/s , 173466.4 tokens/s INFO:__main__:2024-11-30 13:12:39 | Epoch: 1 | Step: 364330 | Dataset: 0-877898 | Loss: 0.728 | 598 ms/step , 115394.00 GFLOP/s , 173430.8 tokens/s INFO:__main__:2024-11-30 13:12:46 | Epoch: 1 | Step: 364340 | Dataset: 0-880298 | Loss: 0.709 | 597 ms/step , 115508.98 GFLOP/s , 173503.6 tokens/s INFO:__main__:2024-11-30 13:12:53 | Epoch: 1 | Step: 364350 | Dataset: 0-882698 | Loss: 0.666 | 598 ms/step , 115331.12 GFLOP/s , 173343.7 tokens/s INFO:__main__:2024-11-30 13:13:00 | Epoch: 1 | Step: 364360 | Dataset: 0-885098 | Loss: 0.713 | 598 ms/step , 115332.49 GFLOP/s , 173383.2 tokens/s INFO:__main__:2024-11-30 13:13:07 | Epoch: 1 | Step: 364370 | Dataset: 0-887498 | Loss: 0.665 | 599 ms/step , 115188.99 GFLOP/s , 173331.2 tokens/s INFO:__main__:2024-11-30 13:13:14 | Epoch: 1 | Step: 364380 | Dataset: 0-889898 | Loss: 0.762 | 597 ms/step , 115506.74 GFLOP/s , 173360.4 tokens/s INFO:__main__:2024-11-30 13:13:22 | Epoch: 1 | Step: 364390 | Dataset: 0-892298 | Loss: 0.655 | 598 ms/step , 115393.45 GFLOP/s , 173475.7 tokens/s INFO:__main__:2024-11-30 13:13:29 | Epoch: 1 | Step: 364400 | Dataset: 0-894698 | Loss: 0.649 | 597 ms/step , 115504.75 GFLOP/s , 173391.7 tokens/s INFO:__main__:2024-11-30 13:13:36 | Epoch: 1 | Step: 364410 | Dataset: 0-897098 | Loss: 0.631 | 597 ms/step , 115663.77 GFLOP/s , 173512.3 tokens/s INFO:__main__:2024-11-30 13:13:43 | Epoch: 1 | Step: 364420 | Dataset: 0-899498 | Loss: 0.640 | 596 ms/step , 115719.91 GFLOP/s , 173439.2 tokens/s INFO:__main__:2024-11-30 13:13:50 | Epoch: 1 | Step: 364430 | Dataset: 0-901898 | Loss: 0.648 | 596 ms/step , 115723.31 GFLOP/s , 173290.1 tokens/s INFO:__main__:2024-11-30 13:13:57 | 
Epoch: 1 | Step: 364440 | Dataset: 0-904298 | Loss: 0.619 | 598 ms/step , 115415.49 GFLOP/s , 173469.7 tokens/s INFO:__main__:2024-11-30 13:14:04 | Epoch: 1 | Step: 364450 | Dataset: 0-906698 | Loss: 0.634 | 598 ms/step , 115493.72 GFLOP/s , 173524.9 tokens/s INFO:__main__:2024-11-30 13:14:11 | Epoch: 1 | Step: 364460 | Dataset: 0-909098 | Loss: 0.650 | 597 ms/step , 115635.47 GFLOP/s , 173557.6 tokens/s INFO:__main__:2024-11-30 13:14:18 | Epoch: 1 | Step: 364470 | Dataset: 0-911498 | Loss: 0.742 | 598 ms/step , 115497.21 GFLOP/s , 173549.9 tokens/s INFO:__main__:2024-11-30 13:14:25 | Epoch: 1 | Step: 364480 | Dataset: 0-913898 | Loss: 0.581 | 597 ms/step , 115695.08 GFLOP/s , 173484.4 tokens/s INFO:__main__:2024-11-30 13:14:32 | Epoch: 1 | Step: 364490 | Dataset: 0-916298 | Loss: 0.602 | 598 ms/step , 115490.77 GFLOP/s , 173532.6 tokens/s INFO:__main__:2024-11-30 13:14:40 | Validation | Step: 364500 | Val_loss: 0.567 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 13:14:41 | Epoch: 1 | Step: 364500 | Dataset: 0-918698 | Loss: 0.615 | 596 ms/step , 115750.41 GFLOP/s , 147788.8 tokens/s INFO:__main__:2024-11-30 13:14:48 | Epoch: 1 | Step: 364510 | Dataset: 0-921098 | Loss: 0.694 | 597 ms/step , 115592.08 GFLOP/s , 173576.1 tokens/s INFO:__main__:2024-11-30 13:14:55 | Epoch: 1 | Step: 364520 | Dataset: 0-923498 | Loss: 0.677 | 598 ms/step , 115498.53 GFLOP/s , 173565.0 tokens/s INFO:__main__:2024-11-30 13:15:02 | Epoch: 1 | Step: 364530 | Dataset: 0-925898 | Loss: 0.674 | 598 ms/step , 115457.54 GFLOP/s , 173569.3 tokens/s INFO:__main__:2024-11-30 13:15:09 | Epoch: 1 | Step: 364540 | Dataset: 0-928298 | Loss: 0.706 | 597 ms/step , 115685.70 GFLOP/s , 173588.1 tokens/s INFO:__main__:2024-11-30 13:15:16 | Epoch: 1 | Step: 364550 | Dataset: 0-930698 | Loss: 0.704 | 598 ms/step , 115378.38 GFLOP/s , 173540.8 tokens/s INFO:__main__:2024-11-30 13:15:23 | Epoch: 1 | Step: 364560 | Dataset: 0-933098 | Loss: 0.655 | 596 ms/step , 115729.84 GFLOP/s , 173609.5 tokens/s 
INFO:__main__:2024-11-30 13:15:30 | Epoch: 1 | Step: 364570 | Dataset: 0-935498 | Loss: 0.602 | 597 ms/step , 115615.83 GFLOP/s , 173532.3 tokens/s INFO:__main__:2024-11-30 13:15:37 | Epoch: 1 | Step: 364580 | Dataset: 0-937898 | Loss: 0.677 | 597 ms/step , 115641.07 GFLOP/s , 173481.9 tokens/s INFO:__main__:2024-11-30 13:15:44 | Epoch: 1 | Step: 364590 | Dataset: 0-940298 | Loss: 0.639 | 597 ms/step , 115559.86 GFLOP/s , 173596.2 tokens/s INFO:__main__:2024-11-30 13:15:52 | Epoch: 1 | Step: 364600 | Dataset: 0-942698 | Loss: 0.621 | 597 ms/step , 115557.25 GFLOP/s , 173517.9 tokens/s INFO:__main__:2024-11-30 13:15:59 | Epoch: 1 | Step: 364610 | Dataset: 0-945098 | Loss: 0.642 | 597 ms/step , 115689.10 GFLOP/s , 173395.9 tokens/s INFO:__main__:2024-11-30 13:16:06 | Epoch: 1 | Step: 364620 | Dataset: 0-947498 | Loss: 0.575 | 597 ms/step , 115690.26 GFLOP/s , 173547.9 tokens/s INFO:__main__:2024-11-30 13:16:13 | Epoch: 1 | Step: 364630 | Dataset: 0-949898 | Loss: 0.669 | 596 ms/step , 115701.97 GFLOP/s , 173547.1 tokens/s INFO:__main__:2024-11-30 13:16:20 | Epoch: 1 | Step: 364640 | Dataset: 0-952298 | Loss: 0.712 | 597 ms/step , 115522.38 GFLOP/s , 173419.6 tokens/s INFO:__main__:2024-11-30 13:16:27 | Epoch: 1 | Step: 364650 | Dataset: 0-954698 | Loss: 0.579 | 597 ms/step , 115643.96 GFLOP/s , 173470.2 tokens/s INFO:__main__:2024-11-30 13:16:34 | Epoch: 1 | Step: 364660 | Dataset: 0-957098 | Loss: 0.649 | 597 ms/step , 115508.89 GFLOP/s , 173535.7 tokens/s INFO:__main__:2024-11-30 13:16:41 | Epoch: 1 | Step: 364670 | Dataset: 0-959498 | Loss: 0.721 | 597 ms/step , 115686.44 GFLOP/s , 173555.5 tokens/s INFO:__main__:2024-11-30 13:16:48 | Epoch: 1 | Step: 364680 | Dataset: 0-961898 | Loss: 0.621 | 597 ms/step , 115596.16 GFLOP/s , 173519.5 tokens/s INFO:__main__:2024-11-30 13:16:55 | Epoch: 1 | Step: 364690 | Dataset: 0-964298 | Loss: 0.671 | 597 ms/step , 115591.11 GFLOP/s , 173594.8 tokens/s INFO:__main__:2024-11-30 13:17:02 | Epoch: 1 | Step: 364700 | Dataset: 
0-966698 | Loss: 0.635 | 597 ms/step , 115547.40 GFLOP/s , 173551.4 tokens/s INFO:__main__:2024-11-30 13:17:09 | Epoch: 1 | Step: 364710 | Dataset: 0-969098 | Loss: 0.736 | 596 ms/step , 115779.86 GFLOP/s , 173609.7 tokens/s INFO:__main__:2024-11-30 13:17:16 | Epoch: 1 | Step: 364720 | Dataset: 0-971498 | Loss: 0.637 | 597 ms/step , 115553.52 GFLOP/s , 173580.5 tokens/s INFO:__main__:2024-11-30 13:17:24 | Epoch: 1 | Step: 364730 | Dataset: 0-973898 | Loss: 0.628 | 597 ms/step , 115562.17 GFLOP/s , 173621.9 tokens/s INFO:__main__:2024-11-30 13:17:31 | Epoch: 1 | Step: 364740 | Dataset: 0-976298 | Loss: 0.630 | 597 ms/step , 115554.61 GFLOP/s , 173660.2 tokens/s INFO:__main__:2024-11-30 13:17:38 | Epoch: 1 | Step: 364750 | Dataset: 0-978698 | Loss: 0.660 | 597 ms/step , 115673.40 GFLOP/s , 173630.1 tokens/s INFO:__main__:2024-11-30 13:17:45 | Epoch: 1 | Step: 364760 | Dataset: 0-981098 | Loss: 0.599 | 597 ms/step , 115621.86 GFLOP/s , 173580.3 tokens/s INFO:__main__:2024-11-30 13:17:52 | Epoch: 1 | Step: 364770 | Dataset: 0-983498 | Loss: 0.580 | 600 ms/step , 115066.73 GFLOP/s , 173580.9 tokens/s INFO:__main__:2024-11-30 13:17:59 | Epoch: 1 | Step: 364780 | Dataset: 0-985898 | Loss: 0.658 | 596 ms/step , 115725.16 GFLOP/s , 173570.5 tokens/s INFO:__main__:2024-11-30 13:18:06 | Epoch: 1 | Step: 364790 | Dataset: 0-988298 | Loss: 0.612 | 597 ms/step , 115636.90 GFLOP/s , 173588.7 tokens/s INFO:__main__:2024-11-30 13:18:13 | Epoch: 1 | Step: 364800 | Dataset: 0-990698 | Loss: 0.625 | 597 ms/step , 115552.25 GFLOP/s , 173622.2 tokens/s INFO:__main__:2024-11-30 13:18:20 | Epoch: 1 | Step: 364810 | Dataset: 0-993098 | Loss: 0.659 | 597 ms/step , 115538.97 GFLOP/s , 173673.7 tokens/s INFO:__main__:2024-11-30 13:18:27 | Epoch: 1 | Step: 364820 | Dataset: 0-995498 | Loss: 0.714 | 598 ms/step , 115496.91 GFLOP/s , 173597.4 tokens/s INFO:__main__:2024-11-30 13:18:34 | Epoch: 1 | Step: 364830 | Dataset: 0-997898 | Loss: 0.616 | 597 ms/step , 115588.62 GFLOP/s , 173680.8 
tokens/s INFO:__main__:2024-11-30 13:18:41 | Epoch: 1 | Step: 364840 | Dataset: 0-1000298 | Loss: 0.707 | 596 ms/step , 115711.14 GFLOP/s , 173539.4 tokens/s INFO:__main__:2024-11-30 13:18:49 | Epoch: 1 | Step: 364850 | Dataset: 0-1002698 | Loss: 0.694 | 598 ms/step , 115400.41 GFLOP/s , 173571.8 tokens/s INFO:__main__:2024-11-30 13:18:56 | Epoch: 1 | Step: 364860 | Dataset: 0-1005098 | Loss: 0.709 | 597 ms/step , 115544.44 GFLOP/s , 173392.2 tokens/s INFO:__main__:2024-11-30 13:19:03 | Epoch: 1 | Step: 364870 | Dataset: 0-1007498 | Loss: 0.676 | 597 ms/step , 115549.79 GFLOP/s , 173359.5 tokens/s INFO:__main__:2024-11-30 13:19:10 | Epoch: 1 | Step: 364880 | Dataset: 0-1009898 | Loss: 0.648 | 597 ms/step , 115682.70 GFLOP/s , 173561.9 tokens/s INFO:__main__:2024-11-30 13:19:17 | Epoch: 1 | Step: 364890 | Dataset: 0-1012298 | Loss: 0.673 | 597 ms/step , 115566.78 GFLOP/s , 173518.2 tokens/s INFO:__main__:2024-11-30 13:19:24 | Epoch: 1 | Step: 364900 | Dataset: 0-1014698 | Loss: 0.654 | 599 ms/step , 115215.28 GFLOP/s , 173375.6 tokens/s INFO:__main__:2024-11-30 13:19:31 | Epoch: 1 | Step: 364910 | Dataset: 0-1017098 | Loss: 0.589 | 597 ms/step , 115509.63 GFLOP/s , 173475.1 tokens/s INFO:__main__:2024-11-30 13:19:38 | Epoch: 1 | Step: 364920 | Dataset: 0-1019498 | Loss: 0.580 | 597 ms/step , 115642.10 GFLOP/s , 173596.1 tokens/s INFO:__main__:2024-11-30 13:19:45 | Epoch: 1 | Step: 364930 | Dataset: 0-1021898 | Loss: 0.602 | 598 ms/step , 115444.37 GFLOP/s , 173493.7 tokens/s INFO:__main__:2024-11-30 13:19:52 | Epoch: 1 | Step: 364940 | Dataset: 0-1024298 | Loss: 0.738 | 597 ms/step , 115658.96 GFLOP/s , 173556.2 tokens/s INFO:__main__:2024-11-30 13:19:59 | Epoch: 1 | Step: 364950 | Dataset: 0-1026698 | Loss: 0.622 | 598 ms/step , 115428.06 GFLOP/s , 173498.3 tokens/s INFO:__main__:2024-11-30 13:20:06 | Epoch: 1 | Step: 364960 | Dataset: 0-1029098 | Loss: 0.663 | 597 ms/step , 115512.96 GFLOP/s , 173434.0 tokens/s INFO:__main__:2024-11-30 13:20:14 | Epoch: 1 | Step: 
364970 | Dataset: 0-1031498 | Loss: 0.689 | 597 ms/step , 115516.94 GFLOP/s , 173445.8 tokens/s INFO:__main__:2024-11-30 13:20:21 | Epoch: 1 | Step: 364980 | Dataset: 0-1033898 | Loss: 0.664 | 597 ms/step , 115677.31 GFLOP/s , 173338.9 tokens/s INFO:__main__:2024-11-30 13:20:28 | Epoch: 1 | Step: 364990 | Dataset: 0-1036298 | Loss: 0.643 | 598 ms/step , 115340.31 GFLOP/s , 173618.8 tokens/s INFO:__main__:2024-11-30 13:20:35 | Validation | Step: 365000 | Val_loss: 0.576 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 13:20:35 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_132035_step_365000.pt` INFO:__main__:2024-11-30 13:20:38 | Epoch: 1 | Step: 365000 | Dataset: 0-1038698 | Loss: 0.617 | 594 ms/step , 116157.56 GFLOP/s , 120833.0 tokens/s INFO:__main__:2024-11-30 13:20:45 | Epoch: 1 | Step: 365010 | Dataset: 0-1041098 | Loss: 0.665 | 596 ms/step , 115779.27 GFLOP/s , 173723.9 tokens/s INFO:__main__:2024-11-30 13:20:52 | Epoch: 1 | Step: 365020 | Dataset: 0-1043498 | Loss: 0.623 | 598 ms/step , 115465.89 GFLOP/s , 173702.0 tokens/s INFO:__main__:2024-11-30 13:20:59 | Epoch: 1 | Step: 365030 | Dataset: 0-1045898 | Loss: 0.641 | 598 ms/step , 115483.45 GFLOP/s , 173578.0 tokens/s INFO:__main__:2024-11-30 13:21:06 | Epoch: 1 | Step: 365040 | Dataset: 0-1048298 | Loss: 0.632 | 597 ms/step , 115558.58 GFLOP/s , 173610.1 tokens/s INFO:__main__:2024-11-30 13:21:13 | Epoch: 1 | Step: 365050 | Dataset: 0-1050698 | Loss: 0.712 | 597 ms/step , 115692.34 GFLOP/s , 173666.9 tokens/s INFO:__main__:2024-11-30 13:21:20 | Epoch: 1 | Step: 365060 | Dataset: 0-1053098 | Loss: 0.663 | 598 ms/step , 115440.52 GFLOP/s , 173593.9 tokens/s INFO:__main__:2024-11-30 13:21:27 | Epoch: 1 | Step: 365070 | Dataset: 0-1055498 | Loss: 0.595 | 596 ms/step , 115709.24 GFLOP/s , 173537.7 tokens/s INFO:__main__:2024-11-30 13:21:34 | Epoch: 1 | Step: 365080 | Dataset: 0-1057898 | Loss: 0.689 | 597 ms/step , 115558.82 GFLOP/s , 173561.9 tokens/s 
INFO:__main__:2024-11-30 13:21:42 | Epoch: 1 | Step: 365090 | Dataset: 0-1060298 | Loss: 0.651 | 597 ms/step , 115613.90 GFLOP/s , 173505.6 tokens/s INFO:__main__:2024-11-30 13:21:49 | Epoch: 1 | Step: 365100 | Dataset: 0-1062698 | Loss: 0.643 | 597 ms/step , 115620.39 GFLOP/s , 173608.9 tokens/s INFO:__main__:2024-11-30 13:21:56 | Epoch: 1 | Step: 365110 | Dataset: 0-1065098 | Loss: 0.660 | 598 ms/step , 115464.75 GFLOP/s , 173600.8 tokens/s INFO:__main__:2024-11-30 13:22:03 | Epoch: 1 | Step: 365120 | Dataset: 0-1067498 | Loss: 0.626 | 598 ms/step , 115448.28 GFLOP/s , 173690.3 tokens/s INFO:__main__:2024-11-30 13:22:10 | Epoch: 1 | Step: 365130 | Dataset: 0-1069898 | Loss: 0.618 | 597 ms/step , 115597.86 GFLOP/s , 173660.1 tokens/s INFO:__main__:2024-11-30 13:22:17 | Epoch: 1 | Step: 365140 | Dataset: 0-1072298 | Loss: 0.591 | 597 ms/step , 115569.26 GFLOP/s , 173620.6 tokens/s INFO:__main__:2024-11-30 13:22:24 | Epoch: 1 | Step: 365150 | Dataset: 0-1074698 | Loss: 0.675 | 598 ms/step , 115429.16 GFLOP/s , 173535.1 tokens/s INFO:__main__:2024-11-30 13:22:31 | Epoch: 1 | Step: 365160 | Dataset: 0-1077098 | Loss: 0.720 | 598 ms/step , 115411.13 GFLOP/s , 173522.6 tokens/s INFO:__main__:2024-11-30 13:22:38 | Epoch: 1 | Step: 365170 | Dataset: 0-1079498 | Loss: 0.589 | 598 ms/step , 115469.23 GFLOP/s , 173553.9 tokens/s INFO:__main__:2024-11-30 13:22:45 | Epoch: 1 | Step: 365180 | Dataset: 0-1081898 | Loss: 0.632 | 598 ms/step , 115418.08 GFLOP/s , 173593.2 tokens/s INFO:__main__:2024-11-30 13:22:52 | Epoch: 1 | Step: 365190 | Dataset: 0-1084298 | Loss: 0.608 | 598 ms/step , 115342.71 GFLOP/s , 173492.0 tokens/s INFO:__main__:2024-11-30 13:22:59 | Epoch: 1 | Step: 365200 | Dataset: 0-1086698 | Loss: 0.689 | 598 ms/step , 115449.90 GFLOP/s , 173603.5 tokens/s INFO:__main__:2024-11-30 13:23:06 | Epoch: 1 | Step: 365210 | Dataset: 0-1089098 | Loss: 0.747 | 598 ms/step , 115431.36 GFLOP/s , 173614.0 tokens/s INFO:__main__:2024-11-30 13:23:14 | Epoch: 1 | Step: 365220 | 
Dataset: 0-1091498 | Loss: 0.654 | 597 ms/step , 115546.07 GFLOP/s , 173658.8 tokens/s INFO:__main__:2024-11-30 13:23:21 | Epoch: 1 | Step: 365230 | Dataset: 0-1093898 | Loss: 0.619 | 596 ms/step , 115741.69 GFLOP/s , 173594.5 tokens/s INFO:__main__:2024-11-30 13:23:28 | Epoch: 1 | Step: 365240 | Dataset: 0-1096298 | Loss: 0.576 | 597 ms/step , 115607.45 GFLOP/s , 173598.3 tokens/s INFO:__main__:2024-11-30 13:23:35 | Epoch: 1 | Step: 365250 | Dataset: 0-1098698 | Loss: 0.678 | 597 ms/step , 115537.49 GFLOP/s , 173641.4 tokens/s INFO:__main__:2024-11-30 13:23:42 | Epoch: 1 | Step: 365260 | Dataset: 0-1101098 | Loss: 0.644 | 598 ms/step , 115319.07 GFLOP/s , 173600.4 tokens/s INFO:__main__:2024-11-30 13:23:49 | Epoch: 1 | Step: 365270 | Dataset: 0-1103498 | Loss: 0.664 | 597 ms/step , 115622.58 GFLOP/s , 173557.2 tokens/s INFO:__main__:2024-11-30 13:23:56 | Epoch: 1 | Step: 365280 | Dataset: 0-1105898 | Loss: 0.656 | 597 ms/step , 115566.07 GFLOP/s , 173576.8 tokens/s INFO:__main__:2024-11-30 13:24:03 | Epoch: 1 | Step: 365290 | Dataset: 0-1108298 | Loss: 0.673 | 597 ms/step , 115623.24 GFLOP/s , 173590.5 tokens/s INFO:__main__:2024-11-30 13:24:10 | Epoch: 1 | Step: 365300 | Dataset: 0-1110698 | Loss: 0.762 | 598 ms/step , 115424.54 GFLOP/s , 173559.2 tokens/s INFO:__main__:2024-11-30 13:24:17 | Epoch: 1 | Step: 365310 | Dataset: 0-1113098 | Loss: 0.673 | 597 ms/step , 115596.47 GFLOP/s , 173614.2 tokens/s INFO:__main__:2024-11-30 13:24:24 | Epoch: 1 | Step: 365320 | Dataset: 0-1115498 | Loss: 0.744 | 597 ms/step , 115567.87 GFLOP/s , 173616.5 tokens/s INFO:__main__:2024-11-30 13:24:31 | Epoch: 1 | Step: 365330 | Dataset: 0-1117898 | Loss: 0.604 | 597 ms/step , 115660.34 GFLOP/s , 173628.3 tokens/s INFO:__main__:2024-11-30 13:24:39 | Epoch: 1 | Step: 365340 | Dataset: 0-1120298 | Loss: 0.658 | 598 ms/step , 115484.28 GFLOP/s , 173524.3 tokens/s INFO:__main__:2024-11-30 13:24:46 | Epoch: 1 | Step: 365350 | Dataset: 0-1122698 | Loss: 0.735 | 597 ms/step , 115533.76 
GFLOP/s , 173497.9 tokens/s INFO:__main__:2024-11-30 13:24:53 | Epoch: 1 | Step: 365360 | Dataset: 0-1125098 | Loss: 0.647 | 598 ms/step , 115411.21 GFLOP/s , 173532.8 tokens/s INFO:__main__:2024-11-30 13:25:00 | Epoch: 1 | Step: 365370 | Dataset: 0-1127498 | Loss: 0.686 | 597 ms/step , 115623.40 GFLOP/s , 173681.2 tokens/s INFO:__main__:2024-11-30 13:25:07 | Epoch: 1 | Step: 365380 | Dataset: 0-1129898 | Loss: 0.658 | 597 ms/step , 115580.31 GFLOP/s , 173635.9 tokens/s INFO:__main__:2024-11-30 13:25:14 | Epoch: 1 | Step: 365390 | Dataset: 0-1132298 | Loss: 0.651 | 597 ms/step , 115618.72 GFLOP/s , 173521.9 tokens/s INFO:__main__:2024-11-30 13:25:21 | Epoch: 1 | Step: 365400 | Dataset: 0-1134698 | Loss: 0.669 | 598 ms/step , 115491.25 GFLOP/s , 173468.6 tokens/s INFO:__main__:2024-11-30 13:25:28 | Epoch: 1 | Step: 365410 | Dataset: 0-1137098 | Loss: 0.686 | 597 ms/step , 115622.67 GFLOP/s , 173556.7 tokens/s INFO:__main__:2024-11-30 13:25:35 | Epoch: 1 | Step: 365420 | Dataset: 0-1139498 | Loss: 0.665 | 598 ms/step , 115412.06 GFLOP/s , 173488.5 tokens/s INFO:__main__:2024-11-30 13:25:42 | Epoch: 1 | Step: 365430 | Dataset: 0-1141898 | Loss: 0.670 | 597 ms/step , 115564.33 GFLOP/s , 173577.2 tokens/s INFO:__main__:2024-11-30 13:25:49 | Epoch: 1 | Step: 365440 | Dataset: 0-1144298 | Loss: 0.640 | 598 ms/step , 115482.08 GFLOP/s , 173505.4 tokens/s INFO:__main__:2024-11-30 13:25:56 | Epoch: 1 | Step: 365450 | Dataset: 0-1146698 | Loss: 0.736 | 599 ms/step , 115275.83 GFLOP/s , 173490.5 tokens/s INFO:__main__:2024-11-30 13:26:03 | Epoch: 1 | Step: 365460 | Dataset: 0-1149098 | Loss: 0.584 | 599 ms/step , 115284.86 GFLOP/s , 173413.6 tokens/s INFO:__main__:2024-11-30 13:26:11 | Epoch: 1 | Step: 365470 | Dataset: 0-1151498 | Loss: 0.595 | 598 ms/step , 115360.33 GFLOP/s , 173438.5 tokens/s INFO:__main__:2024-11-30 13:26:18 | Epoch: 1 | Step: 365480 | Dataset: 0-1153898 | Loss: 0.599 | 597 ms/step , 115630.16 GFLOP/s , 173587.6 tokens/s INFO:__main__:2024-11-30 13:26:25 
| Epoch: 1 | Step: 365490 | Dataset: 0-1156298 | Loss: 0.673 | 598 ms/step , 115358.74 GFLOP/s , 173452.6 tokens/s INFO:__main__:2024-11-30 13:26:32 | Validation | Step: 365500 | Val_loss: 0.601 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 13:26:33 | Epoch: 1 | Step: 365500 | Dataset: 0-1158698 | Loss: 0.661 | 597 ms/step , 115638.77 GFLOP/s , 147661.2 tokens/s INFO:__main__:2024-11-30 13:26:40 | Epoch: 1 | Step: 365510 | Dataset: 0-1161098 | Loss: 0.684 | 598 ms/step , 115361.08 GFLOP/s , 173538.1 tokens/s INFO:__main__:2024-11-30 13:26:47 | Epoch: 1 | Step: 365520 | Dataset: 0-1163498 | Loss: 0.628 | 598 ms/step , 115401.15 GFLOP/s , 173505.4 tokens/s INFO:__main__:2024-11-30 13:26:54 | Epoch: 1 | Step: 365530 | Dataset: 0-1165898 | Loss: 0.637 | 598 ms/step , 115363.43 GFLOP/s , 173416.7 tokens/s INFO:__main__:2024-11-30 13:27:01 | Epoch: 1 | Step: 365540 | Dataset: 0-1168298 | Loss: 0.667 | 599 ms/step , 115229.16 GFLOP/s , 173449.6 tokens/s INFO:__main__:2024-11-30 13:27:08 | Epoch: 1 | Step: 365550 | Dataset: 0-1170698 | Loss: 0.639 | 597 ms/step , 115528.39 GFLOP/s , 173421.4 tokens/s INFO:__main__:2024-11-30 13:27:16 | Epoch: 1 | Step: 365560 | Dataset: 0-1173098 | Loss: 0.738 | 598 ms/step , 115446.72 GFLOP/s , 173464.4 tokens/s INFO:__main__:2024-11-30 13:27:23 | Epoch: 1 | Step: 365570 | Dataset: 0-1175498 | Loss: 0.617 | 597 ms/step , 115509.26 GFLOP/s , 173421.3 tokens/s INFO:__main__:2024-11-30 13:27:30 | Epoch: 1 | Step: 365580 | Dataset: 0-1177898 | Loss: 0.688 | 598 ms/step , 115345.25 GFLOP/s , 173502.7 tokens/s INFO:__main__:2024-11-30 13:27:37 | Epoch: 1 | Step: 365590 | Dataset: 0-1180298 | Loss: 0.381 | 597 ms/step , 115673.78 GFLOP/s , 173532.1 tokens/s INFO:__main__:2024-11-30 13:27:44 | Epoch: 1 | Step: 365600 | Dataset: 0-1182698 | Loss: 0.364 | 597 ms/step , 115551.69 GFLOP/s , 173572.4 tokens/s INFO:__main__:2024-11-30 13:27:51 | Epoch: 1 | Step: 365610 | Dataset: 0-1185098 | Loss: 0.387 | 597 ms/step , 115585.51 GFLOP/s , 173595.2 
tokens/s INFO:__main__:2024-11-30 13:27:58 | Epoch: 1 | Step: 365620 | Dataset: 0-1187498 | Loss: 0.349 | 598 ms/step , 115478.20 GFLOP/s , 173590.3 tokens/s INFO:__main__:2024-11-30 13:28:05 | Epoch: 1 | Step: 365630 | Dataset: 0-1189898 | Loss: 0.336 | 597 ms/step , 115510.85 GFLOP/s , 173502.6 tokens/s INFO:__main__:2024-11-30 13:28:12 | Epoch: 1 | Step: 365640 | Dataset: 0-1192298 | Loss: 0.430 | 598 ms/step , 115373.49 GFLOP/s , 173613.3 tokens/s INFO:__main__:2024-11-30 13:28:19 | Epoch: 1 | Step: 365650 | Dataset: 0-1194698 | Loss: 0.380 | 597 ms/step , 115579.21 GFLOP/s , 173552.3 tokens/s INFO:__main__:2024-11-30 13:28:26 | Epoch: 1 | Step: 365660 | Dataset: 0-1197098 | Loss: 0.368 | 598 ms/step , 115394.66 GFLOP/s , 173523.2 tokens/s INFO:__main__:2024-11-30 13:28:33 | Epoch: 1 | Step: 365670 | Dataset: 0-1199498 | Loss: 0.338 | 597 ms/step , 115503.28 GFLOP/s , 173626.9 tokens/s INFO:__main__:2024-11-30 13:28:41 | Epoch: 1 | Step: 365680 | Dataset: 0-1201898 | Loss: 0.346 | 597 ms/step , 115560.41 GFLOP/s , 173597.4 tokens/s INFO:__main__:2024-11-30 13:28:48 | Epoch: 1 | Step: 365690 | Dataset: 0-1204298 | Loss: 0.411 | 597 ms/step , 115518.11 GFLOP/s , 173048.3 tokens/s INFO:__main__:2024-11-30 13:28:55 | Epoch: 1 | Step: 365700 | Dataset: 0-1206698 | Loss: 0.325 | 598 ms/step , 115467.19 GFLOP/s , 173522.4 tokens/s INFO:__main__:2024-11-30 13:29:02 | Epoch: 1 | Step: 365710 | Dataset: 0-1209098 | Loss: 0.391 | 597 ms/step , 115653.66 GFLOP/s , 173403.7 tokens/s INFO:__main__:2024-11-30 13:29:09 | Epoch: 1 | Step: 365720 | Dataset: 0-1211498 | Loss: 0.370 | 598 ms/step , 115480.79 GFLOP/s , 173509.7 tokens/s INFO:__main__:2024-11-30 13:29:16 | Epoch: 1 | Step: 365730 | Dataset: 0-1213898 | Loss: 0.362 | 597 ms/step , 115645.04 GFLOP/s , 173552.4 tokens/s INFO:__main__:2024-11-30 13:29:23 | Epoch: 1 | Step: 365740 | Dataset: 0-1216298 | Loss: 0.340 | 597 ms/step , 115602.00 GFLOP/s , 173523.9 tokens/s INFO:__main__:2024-11-30 13:29:30 | Epoch: 1 | Step: 
365750 | Dataset: 0-1218698 | Loss: 0.376 | 597 ms/step , 115512.09 GFLOP/s , 173399.9 tokens/s INFO:__main__:2024-11-30 13:29:37 | Epoch: 1 | Step: 365760 | Dataset: 0-1221098 | Loss: 0.384 | 599 ms/step , 115306.01 GFLOP/s , 173516.6 tokens/s INFO:__main__:2024-11-30 13:29:44 | Epoch: 1 | Step: 365770 | Dataset: 0-1223498 | Loss: 0.374 | 597 ms/step , 115671.64 GFLOP/s , 173553.0 tokens/s INFO:__main__:2024-11-30 13:29:51 | Epoch: 1 | Step: 365780 | Dataset: 0-1225898 | Loss: 0.395 | 598 ms/step , 115352.58 GFLOP/s , 173499.8 tokens/s INFO:__main__:2024-11-30 13:29:58 | Epoch: 1 | Step: 365790 | Dataset: 0-1228298 | Loss: 0.358 | 598 ms/step , 115483.17 GFLOP/s , 173564.3 tokens/s INFO:__main__:2024-11-30 13:30:06 | Epoch: 1 | Step: 365800 | Dataset: 0-1230698 | Loss: 0.358 | 598 ms/step , 115423.53 GFLOP/s , 173464.7 tokens/s INFO:__main__:2024-11-30 13:30:13 | Epoch: 1 | Step: 365810 | Dataset: 0-1233098 | Loss: 0.370 | 598 ms/step , 115492.27 GFLOP/s , 173650.4 tokens/s INFO:__main__:2024-11-30 13:30:20 | Epoch: 1 | Step: 365820 | Dataset: 0-1235498 | Loss: 0.335 | 598 ms/step , 115432.74 GFLOP/s , 173664.2 tokens/s INFO:__main__:2024-11-30 13:30:27 | Epoch: 1 | Step: 365830 | Dataset: 0-1237898 | Loss: 0.358 | 598 ms/step , 115479.05 GFLOP/s , 173653.3 tokens/s INFO:__main__:2024-11-30 13:30:34 | Epoch: 1 | Step: 365840 | Dataset: 0-1240298 | Loss: 0.397 | 597 ms/step , 115629.81 GFLOP/s , 173635.4 tokens/s INFO:__main__:2024-11-30 13:30:41 | Epoch: 1 | Step: 365850 | Dataset: 0-1242698 | Loss: 0.356 | 598 ms/step , 115458.95 GFLOP/s , 173624.6 tokens/s INFO:__main__:2024-11-30 13:30:48 | Epoch: 1 | Step: 365860 | Dataset: 0-1245098 | Loss: 0.360 | 597 ms/step , 115617.48 GFLOP/s , 173627.3 tokens/s INFO:__main__:2024-11-30 13:30:55 | Epoch: 1 | Step: 365870 | Dataset: 0-1247498 | Loss: 0.319 | 599 ms/step , 115285.46 GFLOP/s , 173642.9 tokens/s INFO:__main__:2024-11-30 13:31:02 | Epoch: 1 | Step: 365880 | Dataset: 0-1249898 | Loss: 0.359 | 597 ms/step , 
115524.32 GFLOP/s , 173634.5 tokens/s INFO:__main__:2024-11-30 13:31:09 | Epoch: 1 | Step: 365890 | Dataset: 0-1252298 | Loss: 0.347 | 597 ms/step , 115608.79 GFLOP/s , 173690.3 tokens/s INFO:__main__:2024-11-30 13:31:16 | Epoch: 1 | Step: 365900 | Dataset: 0-1254698 | Loss: 0.376 | 597 ms/step , 115542.16 GFLOP/s , 173599.2 tokens/s INFO:__main__:2024-11-30 13:31:23 | Epoch: 1 | Step: 365910 | Dataset: 0-1257098 | Loss: 0.357 | 597 ms/step , 115640.66 GFLOP/s , 173623.5 tokens/s INFO:__main__:2024-11-30 13:31:30 | Epoch: 1 | Step: 365920 | Dataset: 0-1259498 | Loss: 0.323 | 597 ms/step , 115530.56 GFLOP/s , 173636.4 tokens/s INFO:__main__:2024-11-30 13:31:38 | Epoch: 1 | Step: 365930 | Dataset: 0-1261898 | Loss: 0.315 | 597 ms/step , 115552.88 GFLOP/s , 173565.9 tokens/s INFO:__main__:2024-11-30 13:31:45 | Epoch: 1 | Step: 365940 | Dataset: 0-1264298 | Loss: 0.349 | 598 ms/step , 115472.04 GFLOP/s , 173599.0 tokens/s INFO:__main__:2024-11-30 13:31:52 | Epoch: 1 | Step: 365950 | Dataset: 0-1266698 | Loss: 0.363 | 597 ms/step , 115542.92 GFLOP/s , 173683.6 tokens/s INFO:__main__:2024-11-30 13:31:59 | Epoch: 1 | Step: 365960 | Dataset: 0-1269098 | Loss: 0.350 | 597 ms/step , 115634.18 GFLOP/s , 173646.3 tokens/s INFO:__main__:2024-11-30 13:32:06 | Epoch: 1 | Step: 365970 | Dataset: 0-1271498 | Loss: 0.400 | 597 ms/step , 115631.65 GFLOP/s , 173641.4 tokens/s INFO:__main__:2024-11-30 13:32:13 | Epoch: 1 | Step: 365980 | Dataset: 0-1273898 | Loss: 0.368 | 597 ms/step , 115617.40 GFLOP/s , 173639.1 tokens/s INFO:__main__:2024-11-30 13:32:20 | Epoch: 1 | Step: 365990 | Dataset: 0-1276298 | Loss: 0.334 | 597 ms/step , 115581.99 GFLOP/s , 173627.8 tokens/s INFO:__main__:2024-11-30 13:32:28 | Validation | Step: 366000 | Val_loss: 0.586 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 13:32:28 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_133228_step_366000.pt` INFO:__main__:2024-11-30 13:32:30 | Epoch: 1 | Step: 366000 | Dataset: 
0-1278698 | Loss: 0.293 | 594 ms/step , 116217.67 GFLOP/s , 120196.2 tokens/s INFO:__main__:2024-11-30 13:32:37 | Epoch: 1 | Step: 366010 | Dataset: 0-1281098 | Loss: 0.332 | 598 ms/step , 115498.91 GFLOP/s , 173559.5 tokens/s INFO:__main__:2024-11-30 13:32:44 | Epoch: 1 | Step: 366020 | Dataset: 0-1283498 | Loss: 0.365 | 598 ms/step , 115481.89 GFLOP/s , 173607.6 tokens/s INFO:__main__:2024-11-30 13:32:51 | Epoch: 1 | Step: 366030 | Dataset: 0-1285898 | Loss: 0.364 | 597 ms/step , 115594.99 GFLOP/s , 173578.8 tokens/s INFO:__main__:2024-11-30 13:32:59 | Epoch: 1 | Step: 366040 | Dataset: 0-1288298 | Loss: 0.393 | 597 ms/step , 115538.33 GFLOP/s , 173556.4 tokens/s INFO:__main__:2024-11-30 13:33:06 | Epoch: 1 | Step: 366050 | Dataset: 0-1290698 | Loss: 0.365 | 596 ms/step , 115767.19 GFLOP/s , 173469.9 tokens/s INFO:__main__:2024-11-30 13:33:13 | Epoch: 1 | Step: 366060 | Dataset: 0-1293098 | Loss: 0.395 | 598 ms/step , 115501.65 GFLOP/s , 173502.9 tokens/s INFO:__main__:2024-11-30 13:33:20 | Epoch: 1 | Step: 366070 | Dataset: 0-1295498 | Loss: 0.344 | 596 ms/step , 115725.17 GFLOP/s , 173394.8 tokens/s INFO:__main__:2024-11-30 13:33:27 | Epoch: 1 | Step: 366080 | Dataset: 0-1297898 | Loss: 0.355 | 598 ms/step , 115470.00 GFLOP/s , 173463.3 tokens/s INFO:__main__:2024-11-30 13:33:34 | Epoch: 1 | Step: 366090 | Dataset: 0-1300298 | Loss: 0.305 | 597 ms/step , 115638.57 GFLOP/s , 173693.3 tokens/s INFO:__main__:2024-11-30 13:33:41 | Epoch: 1 | Step: 366100 | Dataset: 0-1302698 | Loss: 0.323 | 597 ms/step , 115516.94 GFLOP/s , 173640.1 tokens/s INFO:__main__:2024-11-30 13:33:48 | Epoch: 1 | Step: 366110 | Dataset: 0-1305098 | Loss: 0.386 | 597 ms/step , 115651.96 GFLOP/s , 173775.1 tokens/s INFO:__main__:2024-11-30 13:33:55 | Epoch: 1 | Step: 366120 | Dataset: 0-1307498 | Loss: 0.363 | 597 ms/step , 115510.72 GFLOP/s , 173722.8 tokens/s INFO:__main__:2024-11-30 13:34:02 | Epoch: 1 | Step: 366130 | Dataset: 0-1309898 | Loss: 0.345 | 597 ms/step , 115603.78 GFLOP/s , 
173768.2 tokens/s INFO:__main__:2024-11-30 13:34:09 | Epoch: 1 | Step: 366140 | Dataset: 0-1312298 | Loss: 0.817 | 597 ms/step , 115650.66 GFLOP/s , 173699.3 tokens/s INFO:__main__:2024-11-30 13:34:16 | Epoch: 1 | Step: 366150 | Dataset: 0-1314698 | Loss: 0.712 | 597 ms/step , 115577.20 GFLOP/s , 173717.3 tokens/s INFO:__main__:2024-11-30 13:34:23 | Epoch: 1 | Step: 366160 | Dataset: 0-1317098 | Loss: 0.779 | 597 ms/step , 115624.24 GFLOP/s , 173808.1 tokens/s INFO:__main__:2024-11-30 13:34:31 | Epoch: 1 | Step: 366170 | Dataset: 0-1319498 | Loss: 0.794 | 598 ms/step , 115342.00 GFLOP/s , 173786.1 tokens/s INFO:__main__:2024-11-30 13:34:38 | Epoch: 1 | Step: 366180 | Dataset: 0-1321898 | Loss: 0.739 | 596 ms/step , 115742.54 GFLOP/s , 173763.1 tokens/s INFO:__main__:2024-11-30 13:34:45 | Epoch: 1 | Step: 366190 | Dataset: 0-1324298 | Loss: 0.503 | 598 ms/step , 115488.57 GFLOP/s , 173858.4 tokens/s INFO:__main__:2024-11-30 13:34:52 | Epoch: 1 | Step: 366200 | Dataset: 0-1326698 | Loss: 0.698 | 597 ms/step , 115647.18 GFLOP/s , 173809.3 tokens/s INFO:__main__:2024-11-30 13:34:59 | Epoch: 1 | Step: 366210 | Dataset: 0-1329098 | Loss: 0.609 | 597 ms/step , 115611.93 GFLOP/s , 173810.6 tokens/s INFO:__main__:2024-11-30 13:35:06 | Epoch: 1 | Step: 366220 | Dataset: 0-1331498 | Loss: 0.185 | 597 ms/step , 115661.40 GFLOP/s , 173866.1 tokens/s INFO:__main__:2024-11-30 13:35:13 | Epoch: 1 | Step: 366230 | Dataset: 0-1333898 | Loss: 0.467 | 598 ms/step , 115446.05 GFLOP/s , 173775.0 tokens/s INFO:__main__:2024-11-30 13:35:20 | Epoch: 1 | Step: 366240 | Dataset: 0-1336298 | Loss: 0.289 | 597 ms/step , 115689.12 GFLOP/s , 173836.0 tokens/s INFO:__main__:2024-11-30 13:35:27 | Epoch: 1 | Step: 366250 | Dataset: 0-1338698 | Loss: 0.199 | 596 ms/step , 115734.99 GFLOP/s , 173952.4 tokens/s INFO:__main__:2024-11-30 13:35:34 | Epoch: 1 | Step: 366260 | Dataset: 0-1341098 | Loss: 0.145 | 596 ms/step , 115789.85 GFLOP/s , 174007.1 tokens/s INFO:__main__:2024-11-30 13:35:41 | Epoch: 1 
| Step: 366270 | Dataset: 0-1343498 | Loss: 0.457 | 597 ms/step , 115598.26 GFLOP/s , 174009.1 tokens/s INFO:__main__:2024-11-30 13:35:48 | Epoch: 1 | Step: 366280 | Dataset: 0-1345898 | Loss: 0.278 | 596 ms/step , 115796.38 GFLOP/s , 173906.8 tokens/s INFO:__main__:2024-11-30 13:35:55 | Epoch: 1 | Step: 366290 | Dataset: 0-1348298 | Loss: 0.360 | 597 ms/step , 115613.35 GFLOP/s , 173855.9 tokens/s INFO:__main__:2024-11-30 13:36:02 | Epoch: 1 | Step: 366300 | Dataset: 0-1350698 | Loss: 0.926 | 596 ms/step , 115714.52 GFLOP/s , 173780.5 tokens/s INFO:__main__:2024-11-30 13:36:09 | Epoch: 1 | Step: 366310 | Dataset: 0-1353098 | Loss: 1.055 | 598 ms/step , 115380.22 GFLOP/s , 173755.1 tokens/s INFO:__main__:2024-11-30 13:36:17 | Epoch: 1 | Step: 366320 | Dataset: 0-1355498 | Loss: 1.066 | 598 ms/step , 115312.09 GFLOP/s , 173478.4 tokens/s INFO:__main__:2024-11-30 13:36:24 | Epoch: 1 | Step: 366330 | Dataset: 0-1357898 | Loss: 1.021 | 599 ms/step , 115267.60 GFLOP/s , 173502.4 tokens/s INFO:__main__:2024-11-30 13:36:31 | Epoch: 1 | Step: 366340 | Dataset: 0-1360298 | Loss: 1.025 | 599 ms/step , 115288.55 GFLOP/s , 173414.1 tokens/s INFO:__main__:2024-11-30 13:36:38 | Epoch: 1 | Step: 366350 | Dataset: 0-1362698 | Loss: 1.067 | 598 ms/step , 115328.26 GFLOP/s , 173486.2 tokens/s INFO:__main__:2024-11-30 13:36:45 | Epoch: 1 | Step: 366360 | Dataset: 0-1365098 | Loss: 1.058 | 598 ms/step , 115481.62 GFLOP/s , 173538.2 tokens/s INFO:__main__:2024-11-30 13:36:52 | Epoch: 1 | Step: 366370 | Dataset: 0-1367498 | Loss: 1.072 | 601 ms/step , 114868.11 GFLOP/s , 173384.0 tokens/s INFO:__main__:2024-11-30 13:36:59 | Epoch: 1 | Step: 366380 | Dataset: 0-1369898 | Loss: 1.037 | 598 ms/step , 115457.61 GFLOP/s , 173515.8 tokens/s INFO:__main__:2024-11-30 13:37:06 | Epoch: 1 | Step: 366390 | Dataset: 0-1372298 | Loss: 1.029 | 601 ms/step , 114850.17 GFLOP/s , 173507.4 tokens/s INFO:__main__:2024-11-30 13:37:13 | Epoch: 1 | Step: 366400 | Dataset: 0-1374698 | Loss: 1.058 | 598 
ms/step , 115488.60 GFLOP/s , 173595.0 tokens/s INFO:__main__:2024-11-30 13:37:20 | Epoch: 1 | Step: 366410 | Dataset: 0-1377098 | Loss: 1.048 | 599 ms/step , 115257.24 GFLOP/s , 173526.5 tokens/s INFO:__main__:2024-11-30 13:37:27 | Epoch: 1 | Step: 366420 | Dataset: 0-1379498 | Loss: 1.024 | 598 ms/step , 115415.28 GFLOP/s , 173648.0 tokens/s INFO:__main__:2024-11-30 13:37:34 | Epoch: 1 | Step: 366430 | Dataset: 0-1381898 | Loss: 0.995 | 599 ms/step , 115269.47 GFLOP/s , 173595.2 tokens/s INFO:__main__:2024-11-30 13:37:42 | Epoch: 1 | Step: 366440 | Dataset: 0-1384298 | Loss: 1.016 | 598 ms/step , 115398.04 GFLOP/s , 173646.5 tokens/s INFO:__main__:2024-11-30 13:37:49 | Epoch: 1 | Step: 366450 | Dataset: 0-1386698 | Loss: 1.042 | 599 ms/step , 115247.79 GFLOP/s , 173595.3 tokens/s INFO:__main__:2024-11-30 13:37:56 | Epoch: 1 | Step: 366460 | Dataset: 0-1389098 | Loss: 1.022 | 598 ms/step , 115454.89 GFLOP/s , 173618.6 tokens/s INFO:__main__:2024-11-30 13:38:03 | Epoch: 1 | Step: 366470 | Dataset: 0-1391498 | Loss: 1.019 | 599 ms/step , 115246.42 GFLOP/s , 173655.0 tokens/s INFO:__main__:2024-11-30 13:38:10 | Epoch: 1 | Step: 366480 | Dataset: 0-1393898 | Loss: 1.049 | 597 ms/step , 115507.43 GFLOP/s , 173609.2 tokens/s INFO:__main__:2024-11-30 13:38:17 | Epoch: 1 | Step: 366490 | Dataset: 0-1396298 | Loss: 1.043 | 598 ms/step , 115388.19 GFLOP/s , 173637.2 tokens/s INFO:__main__:2024-11-30 13:38:25 | Validation | Step: 366500 | Val_loss: 0.575 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 13:38:25 | Epoch: 1 | Step: 366500 | Dataset: 0-1398698 | Loss: 1.042 | 596 ms/step , 115696.90 GFLOP/s , 147862.0 tokens/s INFO:__main__:2024-11-30 13:38:32 | Epoch: 1 | Step: 366510 | Dataset: 0-1401098 | Loss: 1.017 | 597 ms/step , 115531.73 GFLOP/s , 173516.3 tokens/s INFO:__main__:2024-11-30 13:38:39 | Epoch: 1 | Step: 366520 | Dataset: 0-1403498 | Loss: 1.025 | 597 ms/step , 115561.96 GFLOP/s , 173413.8 tokens/s INFO:__main__:2024-11-30 13:38:46 | Epoch: 1 | Step: 366530 
| Dataset: 0-1405898 | Loss: 1.021 | 597 ms/step , 115536.56 GFLOP/s , 173427.7 tokens/s INFO:__main__:2024-11-30 13:38:54 | Epoch: 1 | Step: 366540 | Dataset: 0-1408298 | Loss: 1.035 | 598 ms/step , 115496.55 GFLOP/s , 173494.7 tokens/s INFO:__main__:2024-11-30 13:39:01 | Epoch: 1 | Step: 366550 | Dataset: 0-1410698 | Loss: 1.063 | 598 ms/step , 115450.06 GFLOP/s , 173482.7 tokens/s INFO:__main__:2024-11-30 13:39:08 | Epoch: 1 | Step: 366560 | Dataset: 0-1413098 | Loss: 1.030 | 597 ms/step , 115602.02 GFLOP/s , 173432.5 tokens/s INFO:__main__:2024-11-30 13:39:15 | Epoch: 1 | Step: 366570 | Dataset: 0-1415498 | Loss: 1.046 | 598 ms/step , 115331.60 GFLOP/s , 173359.9 tokens/s INFO:__main__:2024-11-30 13:39:22 | Epoch: 1 | Step: 366580 | Dataset: 0-1417898 | Loss: 1.046 | 597 ms/step , 115563.15 GFLOP/s , 173409.3 tokens/s INFO:__main__:2024-11-30 13:39:29 | Epoch: 1 | Step: 366590 | Dataset: 0-1420298 | Loss: 1.082 | 597 ms/step , 115559.99 GFLOP/s , 173377.4 tokens/s INFO:__main__:2024-11-30 13:39:36 | Epoch: 1 | Step: 366600 | Dataset: 0-1422698 | Loss: 0.997 | 598 ms/step , 115443.44 GFLOP/s , 173467.6 tokens/s INFO:__main__:2024-11-30 13:39:43 | Epoch: 1 | Step: 366610 | Dataset: 0-1425098 | Loss: 1.001 | 597 ms/step , 115539.46 GFLOP/s , 173407.5 tokens/s INFO:__main__:2024-11-30 13:39:50 | Epoch: 1 | Step: 366620 | Dataset: 0-1427498 | Loss: 1.013 | 597 ms/step , 115555.37 GFLOP/s , 173444.1 tokens/s INFO:__main__:2024-11-30 13:39:57 | Epoch: 1 | Step: 366630 | Dataset: 0-1429898 | Loss: 1.033 | 597 ms/step , 115606.18 GFLOP/s , 173448.1 tokens/s INFO:__main__:2024-11-30 13:40:04 | Epoch: 1 | Step: 366640 | Dataset: 0-1432298 | Loss: 1.030 | 598 ms/step , 115433.25 GFLOP/s , 173431.0 tokens/s INFO:__main__:2024-11-30 13:40:11 | Epoch: 1 | Step: 366650 | Dataset: 0-1434698 | Loss: 1.032 | 598 ms/step , 115450.49 GFLOP/s , 173522.1 tokens/s INFO:__main__:2024-11-30 13:40:19 | Epoch: 1 | Step: 366660 | Dataset: 0-1437098 | Loss: 1.065 | 597 ms/step , 115653.34 
GFLOP/s , 173481.9 tokens/s INFO:__main__:2024-11-30 13:40:26 | Epoch: 1 | Step: 366670 | Dataset: 0-1439498 | Loss: 1.037 | 598 ms/step , 115469.88 GFLOP/s , 173486.9 tokens/s INFO:__main__:2024-11-30 13:40:33 | Epoch: 1 | Step: 366680 | Dataset: 0-1441898 | Loss: 1.080 | 596 ms/step , 115701.66 GFLOP/s , 173477.5 tokens/s INFO:__main__:2024-11-30 13:40:40 | Epoch: 1 | Step: 366690 | Dataset: 0-1444298 | Loss: 0.835 | 597 ms/step , 115619.64 GFLOP/s , 173458.5 tokens/s INFO:__main__:2024-11-30 13:40:47 | Epoch: 1 | Step: 366700 | Dataset: 0-1446698 | Loss: 0.810 | 597 ms/step , 115524.45 GFLOP/s , 173465.8 tokens/s INFO:__main__:2024-11-30 13:40:54 | Epoch: 1 | Step: 366710 | Dataset: 0-1449098 | Loss: 0.736 | 598 ms/step , 115457.98 GFLOP/s , 173545.3 tokens/s INFO:__main__:2024-11-30 13:41:01 | Epoch: 1 | Step: 366720 | Dataset: 0-1451498 | Loss: 0.872 | 601 ms/step , 114879.46 GFLOP/s , 173302.5 tokens/s INFO:__main__:2024-11-30 13:41:08 | Epoch: 1 | Step: 366730 | Dataset: 0-1453898 | Loss: 0.763 | 597 ms/step , 115577.93 GFLOP/s , 173435.4 tokens/s INFO:__main__:2024-11-30 13:41:15 | Epoch: 1 | Step: 366740 | Dataset: 0-1456298 | Loss: 0.765 | 598 ms/step , 115443.67 GFLOP/s , 173517.3 tokens/s INFO:__main__:2024-11-30 13:41:22 | Epoch: 1 | Step: 366750 | Dataset: 0-1458698 | Loss: 0.746 | 597 ms/step , 115619.14 GFLOP/s , 173535.9 tokens/s INFO:__main__:2024-11-30 13:41:29 | Epoch: 1 | Step: 366760 | Dataset: 0-1461098 | Loss: 0.804 | 597 ms/step , 115559.02 GFLOP/s , 173598.3 tokens/s INFO:__main__:2024-11-30 13:41:36 | Epoch: 1 | Step: 366770 | Dataset: 0-1463498 | Loss: 0.747 | 597 ms/step , 115629.93 GFLOP/s , 173499.9 tokens/s INFO:__main__:2024-11-30 13:41:44 | Epoch: 1 | Step: 366780 | Dataset: 0-1465898 | Loss: 0.739 | 597 ms/step , 115537.27 GFLOP/s , 173476.1 tokens/s INFO:__main__:2024-11-30 13:41:51 | Epoch: 1 | Step: 366790 | Dataset: 0-1468298 | Loss: 0.753 | 597 ms/step , 115678.28 GFLOP/s , 173563.6 tokens/s INFO:__main__:2024-11-30 13:41:58 
| Epoch: 1 | Step: 366800 | Dataset: 0-1470698 | Loss: 0.786 | 597 ms/step , 115521.25 GFLOP/s , 173403.7 tokens/s INFO:__main__:2024-11-30 13:42:05 | Epoch: 1 | Step: 366810 | Dataset: 0-1473098 | Loss: 0.670 | 597 ms/step , 115649.00 GFLOP/s , 173611.2 tokens/s INFO:__main__:2024-11-30 13:42:12 | Epoch: 1 | Step: 366820 | Dataset: 0-1475498 | Loss: 0.851 | 597 ms/step , 115506.05 GFLOP/s , 173545.6 tokens/s INFO:__main__:2024-11-30 13:42:19 | Epoch: 1 | Step: 366830 | Dataset: 0-1477898 | Loss: 0.770 | 602 ms/step , 114705.88 GFLOP/s , 173431.9 tokens/s INFO:__main__:2024-11-30 13:42:26 | Epoch: 1 | Step: 366840 | Dataset: 0-1480298 | Loss: 0.617 | 597 ms/step , 115577.81 GFLOP/s , 173495.2 tokens/s INFO:__main__:2024-11-30 13:42:33 | Epoch: 1 | Step: 366850 | Dataset: 0-1482698 | Loss: 0.779 | 597 ms/step , 115636.30 GFLOP/s , 173581.2 tokens/s INFO:__main__:2024-11-30 13:42:40 | Epoch: 1 | Step: 366860 | Dataset: 0-1485098 | Loss: 0.823 | 597 ms/step , 115529.24 GFLOP/s , 173625.0 tokens/s INFO:__main__:2024-11-30 13:42:47 | Epoch: 1 | Step: 366870 | Dataset: 0-1487498 | Loss: 0.748 | 596 ms/step , 115704.16 GFLOP/s , 173558.3 tokens/s INFO:__main__:2024-11-30 13:42:54 | Epoch: 1 | Step: 366880 | Dataset: 0-1489898 | Loss: 0.761 | 597 ms/step , 115511.80 GFLOP/s , 173537.7 tokens/s INFO:__main__:2024-11-30 13:43:01 | Epoch: 1 | Step: 366890 | Dataset: 0-1492298 | Loss: 0.743 | 597 ms/step , 115646.80 GFLOP/s , 173489.2 tokens/s INFO:__main__:2024-11-30 13:43:09 | Epoch: 1 | Step: 366900 | Dataset: 0-1494698 | Loss: 0.777 | 598 ms/step , 115415.42 GFLOP/s , 173588.5 tokens/s INFO:__main__:2024-11-30 13:43:16 | Epoch: 1 | Step: 366910 | Dataset: 0-1497098 | Loss: 0.756 | 597 ms/step , 115563.50 GFLOP/s , 173595.6 tokens/s INFO:__main__:2024-11-30 13:43:23 | Epoch: 1 | Step: 366920 | Dataset: 0-1499498 | Loss: 0.726 | 596 ms/step , 115746.73 GFLOP/s , 173390.3 tokens/s INFO:__main__:2024-11-30 13:43:30 | Epoch: 1 | Step: 366930 | Dataset: 0-1501898 | Loss: 0.811 | 
598 ms/step , 115434.02 GFLOP/s , 173600.7 tokens/s INFO:__main__:2024-11-30 13:43:37 | Epoch: 1 | Step: 366940 | Dataset: 0-1504298 | Loss: 0.767 | 597 ms/step , 115638.93 GFLOP/s , 173563.3 tokens/s INFO:__main__:2024-11-30 13:43:44 | Epoch: 1 | Step: 366950 | Dataset: 0-1506698 | Loss: 0.789 | 597 ms/step , 115693.98 GFLOP/s , 173558.9 tokens/s INFO:__main__:2024-11-30 13:43:51 | Epoch: 1 | Step: 366960 | Dataset: 0-1509098 | Loss: 0.804 | 598 ms/step , 115436.39 GFLOP/s , 173602.8 tokens/s INFO:__main__:2024-11-30 13:43:58 | Epoch: 1 | Step: 366970 | Dataset: 0-1511498 | Loss: 0.796 | 598 ms/step , 115336.99 GFLOP/s , 173589.9 tokens/s INFO:__main__:2024-11-30 13:44:05 | Epoch: 1 | Step: 366980 | Dataset: 0-1513898 | Loss: 0.746 | 597 ms/step , 115564.79 GFLOP/s , 173551.9 tokens/s INFO:__main__:2024-11-30 13:44:12 | Epoch: 1 | Step: 366990 | Dataset: 0-1516298 | Loss: 0.678 | 598 ms/step , 115463.08 GFLOP/s , 173606.2 tokens/s INFO:__main__:2024-11-30 13:44:20 | Validation | Step: 367000 | Val_loss: 0.806 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 13:44:20 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_134420_step_367000.pt` INFO:__main__:2024-11-30 13:44:22 | Epoch: 1 | Step: 367000 | Dataset: 0-1518698 | Loss: 0.789 | 595 ms/step , 116030.29 GFLOP/s , 120325.0 tokens/s INFO:__main__:2024-11-30 13:44:30 | Epoch: 1 | Step: 367010 | Dataset: 0-1521098 | Loss: 0.902 | 596 ms/step , 115757.99 GFLOP/s , 173784.2 tokens/s INFO:__main__:2024-11-30 13:44:37 | Epoch: 1 | Step: 367020 | Dataset: 0-1523498 | Loss: 0.876 | 598 ms/step , 115348.20 GFLOP/s , 173627.3 tokens/s INFO:__main__:2024-11-30 13:44:44 | Epoch: 1 | Step: 367030 | Dataset: 0-1525898 | Loss: 0.848 | 597 ms/step , 115610.60 GFLOP/s , 173579.6 tokens/s INFO:__main__:2024-11-30 13:44:51 | Epoch: 1 | Step: 367040 | Dataset: 0-1528298 | Loss: 0.773 | 598 ms/step , 115383.40 GFLOP/s , 173562.2 tokens/s INFO:__main__:2024-11-30 13:44:58 | Epoch: 1 | Step: 367050 | 
Dataset: 0-1530698 | Loss: 0.865 | 596 ms/step , 115737.08 GFLOP/s , 173465.9 tokens/s INFO:__main__:2024-11-30 13:45:05 | Epoch: 1 | Step: 367060 | Dataset: 0-1533098 | Loss: 0.805 | 597 ms/step , 115504.33 GFLOP/s , 173559.0 tokens/s INFO:__main__:2024-11-30 13:45:12 | Epoch: 1 | Step: 367070 | Dataset: 0-1535498 | Loss: 0.828 | 597 ms/step , 115643.73 GFLOP/s , 173464.9 tokens/s INFO:__main__:2024-11-30 13:45:19 | Epoch: 1 | Step: 367080 | Dataset: 0-1537898 | Loss: 0.807 | 598 ms/step , 115450.69 GFLOP/s , 173550.6 tokens/s INFO:__main__:2024-11-30 13:45:26 | Epoch: 1 | Step: 367090 | Dataset: 0-1540298 | Loss: 0.826 | 597 ms/step , 115625.84 GFLOP/s , 173528.9 tokens/s INFO:__main__:2024-11-30 13:45:33 | Epoch: 1 | Step: 367100 | Dataset: 0-1542698 | Loss: 0.720 | 598 ms/step , 115329.11 GFLOP/s , 173456.3 tokens/s INFO:__main__:2024-11-30 13:45:40 | Epoch: 1 | Step: 367110 | Dataset: 0-1545098 | Loss: 0.805 | 598 ms/step , 115425.99 GFLOP/s , 173454.3 tokens/s INFO:__main__:2024-11-30 13:45:47 | Epoch: 1 | Step: 367120 | Dataset: 0-1547498 | Loss: 0.792 | 598 ms/step , 115416.49 GFLOP/s , 173405.0 tokens/s INFO:__main__:2024-11-30 13:45:55 | Epoch: 1 | Step: 367130 | Dataset: 0-1549898 | Loss: 0.763 | 597 ms/step , 115538.87 GFLOP/s , 173449.6 tokens/s INFO:__main__:2024-11-30 13:46:02 | Epoch: 1 | Step: 367140 | Dataset: 0-1552298 | Loss: 0.788 | 597 ms/step , 115583.42 GFLOP/s , 173519.3 tokens/s INFO:__main__:2024-11-30 13:46:09 | Epoch: 1 | Step: 367150 | Dataset: 0-1554698 | Loss: 0.836 | 598 ms/step , 115422.75 GFLOP/s , 173378.5 tokens/s INFO:__main__:2024-11-30 13:46:16 | Epoch: 1 | Step: 367160 | Dataset: 0-1557098 | Loss: 0.819 | 597 ms/step , 115571.60 GFLOP/s , 173379.7 tokens/s INFO:__main__:2024-11-30 13:46:23 | Epoch: 1 | Step: 367170 | Dataset: 0-1559498 | Loss: 0.854 | 598 ms/step , 115438.27 GFLOP/s , 173436.0 tokens/s INFO:__main__:2024-11-30 13:46:30 | Epoch: 1 | Step: 367180 | Dataset: 0-1561898 | Loss: 0.876 | 597 ms/step , 115539.89 
GFLOP/s , 173478.2 tokens/s INFO:__main__:2024-11-30 13:46:37 | Epoch: 1 | Step: 367190 | Dataset: 0-1564298 | Loss: 0.738 | 599 ms/step , 115305.36 GFLOP/s , 173424.7 tokens/s INFO:__main__:2024-11-30 13:46:44 | Epoch: 1 | Step: 367200 | Dataset: 0-1566698 | Loss: 0.773 | 598 ms/step , 115423.08 GFLOP/s , 173451.4 tokens/s INFO:__main__:2024-11-30 13:46:51 | Epoch: 1 | Step: 367210 | Dataset: 0-1569098 | Loss: 0.780 | 598 ms/step , 115317.76 GFLOP/s , 173328.2 tokens/s INFO:__main__:2024-11-30 13:46:58 | Epoch: 1 | Step: 367220 | Dataset: 0-1571498 | Loss: 0.787 | 602 ms/step , 114691.97 GFLOP/s , 173270.9 tokens/s INFO:__main__:2024-11-30 13:47:05 | Epoch: 1 | Step: 367230 | Dataset: 0-1573898 | Loss: 0.589 | 598 ms/step , 115499.15 GFLOP/s , 173441.6 tokens/s INFO:__main__:2024-11-30 13:47:12 | Epoch: 1 | Step: 367240 | Dataset: 0-1576298 | Loss: 0.607 | 597 ms/step , 115521.85 GFLOP/s , 173483.2 tokens/s INFO:__main__:2024-11-30 13:47:20 | Epoch: 1 | Step: 367250 | Dataset: 0-1578698 | Loss: 0.543 | 598 ms/step , 115431.02 GFLOP/s , 173346.0 tokens/s INFO:__main__:2024-11-30 13:47:27 | Epoch: 1 | Step: 367260 | Dataset: 0-1581098 | Loss: 0.577 | 598 ms/step , 115432.66 GFLOP/s , 173417.3 tokens/s INFO:__main__:2024-11-30 13:47:34 | Epoch: 1 | Step: 367270 | Dataset: 0-1583498 | Loss: 0.533 | 598 ms/step , 115448.90 GFLOP/s , 173436.8 tokens/s INFO:__main__:2024-11-30 13:47:41 | Epoch: 1 | Step: 367280 | Dataset: 0-1585898 | Loss: 0.572 | 598 ms/step , 115481.20 GFLOP/s , 173442.1 tokens/s INFO:__main__:2024-11-30 13:47:48 | Epoch: 1 | Step: 367290 | Dataset: 0-1588298 | Loss: 0.511 | 598 ms/step , 115411.94 GFLOP/s , 173425.9 tokens/s INFO:__main__:2024-11-30 13:47:55 | Epoch: 1 | Step: 367300 | Dataset: 0-1590698 | Loss: 0.559 | 598 ms/step , 115397.77 GFLOP/s , 173352.5 tokens/s INFO:__main__:2024-11-30 13:48:02 | Epoch: 1 | Step: 367310 | Dataset: 0-1593098 | Loss: 0.556 | 597 ms/step , 115503.99 GFLOP/s , 173318.0 tokens/s INFO:__main__:2024-11-30 13:48:09 
| Epoch: 1 | Step: 367320 | Dataset: 0-1595498 | Loss: 0.534 | 597 ms/step , 115514.29 GFLOP/s , 173428.8 tokens/s INFO:__main__:2024-11-30 13:48:16 | Epoch: 1 | Step: 367330 | Dataset: 0-1597898 | Loss: 0.557 | 598 ms/step , 115482.65 GFLOP/s , 173405.2 tokens/s INFO:__main__:2024-11-30 13:48:23 | Epoch: 1 | Step: 367340 | Dataset: 0-1600298 | Loss: 0.551 | 597 ms/step , 115539.50 GFLOP/s , 173307.6 tokens/s INFO:__main__:2024-11-30 13:48:30 | Epoch: 1 | Step: 367350 | Dataset: 0-1602698 | Loss: 0.578 | 597 ms/step , 115539.34 GFLOP/s , 173270.3 tokens/s INFO:__main__:2024-11-30 13:48:38 | Epoch: 1 | Step: 367360 | Dataset: 0-1605098 | Loss: 0.537 | 598 ms/step , 115314.72 GFLOP/s , 173267.1 tokens/s INFO:__main__:2024-11-30 13:48:45 | Epoch: 1 | Step: 367370 | Dataset: 0-1607498 | Loss: 0.544 | 598 ms/step , 115383.06 GFLOP/s , 173305.0 tokens/s INFO:__main__:2024-11-30 13:48:52 | Epoch: 1 | Step: 367380 | Dataset: 0-1609898 | Loss: 0.596 | 598 ms/step , 115396.42 GFLOP/s , 173201.9 tokens/s INFO:__main__:2024-11-30 13:48:59 | Epoch: 1 | Step: 367390 | Dataset: 0-1612298 | Loss: 0.566 | 597 ms/step , 115552.94 GFLOP/s , 173307.6 tokens/s INFO:__main__:2024-11-30 13:49:06 | Epoch: 1 | Step: 367400 | Dataset: 0-1614698 | Loss: 0.504 | 598 ms/step , 115369.35 GFLOP/s , 173288.7 tokens/s INFO:__main__:2024-11-30 13:49:13 | Epoch: 1 | Step: 367410 | Dataset: 0-1617098 | Loss: 0.672 | 598 ms/step , 115459.86 GFLOP/s , 173199.2 tokens/s INFO:__main__:2024-11-30 13:49:20 | Epoch: 1 | Step: 367420 | Dataset: 0-1619498 | Loss: 0.547 | 599 ms/step , 115296.85 GFLOP/s , 173185.7 tokens/s INFO:__main__:2024-11-30 13:49:27 | Epoch: 1 | Step: 367430 | Dataset: 0-1621898 | Loss: 0.516 | 599 ms/step , 115299.69 GFLOP/s , 173253.5 tokens/s INFO:__main__:2024-11-30 13:49:34 | Epoch: 1 | Step: 367440 | Dataset: 0-1624298 | Loss: 0.492 | 599 ms/step , 115173.27 GFLOP/s , 173249.4 tokens/s INFO:__main__:2024-11-30 13:49:41 | Epoch: 1 | Step: 367450 | Dataset: 0-1626698 | Loss: 0.486 | 
598 ms/step , 115484.48 GFLOP/s , 173230.0 tokens/s INFO:__main__:2024-11-30 13:49:48 | Epoch: 1 | Step: 367460 | Dataset: 0-1629098 | Loss: 0.554 | 599 ms/step , 115242.35 GFLOP/s , 173191.2 tokens/s INFO:__main__:2024-11-30 13:49:56 | Epoch: 1 | Step: 367470 | Dataset: 0-1631498 | Loss: 0.559 | 598 ms/step , 115331.41 GFLOP/s , 173134.8 tokens/s INFO:__main__:2024-11-30 13:50:03 | Epoch: 1 | Step: 367480 | Dataset: 0-1633898 | Loss: 0.563 | 599 ms/step , 115295.39 GFLOP/s , 173156.6 tokens/s INFO:__main__:2024-11-30 13:50:10 | Epoch: 1 | Step: 367490 | Dataset: 0-1636298 | Loss: 0.555 | 599 ms/step , 115279.19 GFLOP/s , 172953.8 tokens/s INFO:__main__:2024-11-30 13:50:17 | Validation | Step: 367500 | Val_loss: 0.824 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 13:50:18 | Epoch: 1 | Step: 367500 | Dataset: 0-1638698 | Loss: 0.563 | 599 ms/step , 115277.43 GFLOP/s , 147350.1 tokens/s INFO:__main__:2024-11-30 13:50:25 | Epoch: 1 | Step: 367510 | Dataset: 0-1641098 | Loss: 0.503 | 599 ms/step , 115208.39 GFLOP/s , 173200.8 tokens/s INFO:__main__:2024-11-30 13:50:32 | Epoch: 1 | Step: 367520 | Dataset: 0-1643498 | Loss: 0.586 | 600 ms/step , 115016.24 GFLOP/s , 173124.0 tokens/s INFO:__main__:2024-11-30 13:50:39 | Epoch: 1 | Step: 367530 | Dataset: 0-1645898 | Loss: 0.495 | 599 ms/step , 115220.13 GFLOP/s , 173118.3 tokens/s INFO:__main__:2024-11-30 13:50:46 | Epoch: 1 | Step: 367540 | Dataset: 0-1648298 | Loss: 0.506 | 599 ms/step , 115252.45 GFLOP/s , 173030.4 tokens/s INFO:__main__:2024-11-30 13:50:54 | Epoch: 1 | Step: 367550 | Dataset: 0-1650698 | Loss: 0.586 | 599 ms/step , 115120.93 GFLOP/s , 173093.5 tokens/s INFO:__main__:2024-11-30 13:51:01 | Epoch: 1 | Step: 367560 | Dataset: 0-1653098 | Loss: 0.509 | 599 ms/step , 115236.55 GFLOP/s , 173046.2 tokens/s INFO:__main__:2024-11-30 13:51:08 | Epoch: 1 | Step: 367570 | Dataset: 0-1655498 | Loss: 0.547 | 599 ms/step , 115256.77 GFLOP/s , 172995.0 tokens/s INFO:__main__:2024-11-30 13:51:15 | Epoch: 1 | Step: 
367580 | Dataset: 0-1657898 | Loss: 0.535 | 599 ms/step , 115219.48 GFLOP/s , 173120.0 tokens/s INFO:__main__:2024-11-30 13:51:22 | Epoch: 1 | Step: 367590 | Dataset: 0-1660298 | Loss: 0.547 | 599 ms/step , 115130.36 GFLOP/s , 173123.3 tokens/s INFO:__main__:2024-11-30 13:51:29 | Epoch: 1 | Step: 367600 | Dataset: 0-1662698 | Loss: 0.515 | 598 ms/step , 115382.18 GFLOP/s , 173224.4 tokens/s INFO:__main__:2024-11-30 13:51:36 | Epoch: 1 | Step: 367610 | Dataset: 0-1665098 | Loss: 0.518 | 599 ms/step , 115294.21 GFLOP/s , 173200.8 tokens/s INFO:__main__:2024-11-30 13:51:43 | Epoch: 1 | Step: 367620 | Dataset: 0-1667498 | Loss: 0.528 | 599 ms/step , 115195.41 GFLOP/s , 173100.5 tokens/s INFO:__main__:2024-11-30 13:51:50 | Epoch: 1 | Step: 367630 | Dataset: 0-1669898 | Loss: 0.498 | 599 ms/step , 115152.99 GFLOP/s , 173124.1 tokens/s INFO:__main__:2024-11-30 13:51:57 | Epoch: 1 | Step: 367640 | Dataset: 0-1672298 | Loss: 0.523 | 600 ms/step , 115038.79 GFLOP/s , 173134.8 tokens/s INFO:__main__:2024-11-30 13:52:05 | Epoch: 1 | Step: 367650 | Dataset: 0-1674698 | Loss: 0.565 | 600 ms/step , 115033.45 GFLOP/s , 173096.5 tokens/s INFO:__main__:2024-11-30 13:52:12 | Epoch: 1 | Step: 367660 | Dataset: 0-1677098 | Loss: 0.549 | 600 ms/step , 115071.33 GFLOP/s , 173091.6 tokens/s INFO:__main__:2024-11-30 13:52:19 | Epoch: 1 | Step: 367670 | Dataset: 0-1679498 | Loss: 0.561 | 600 ms/step , 115093.77 GFLOP/s , 172928.4 tokens/s INFO:__main__:2024-11-30 13:52:26 | Epoch: 1 | Step: 367680 | Dataset: 0-1681898 | Loss: 0.572 | 600 ms/step , 115037.82 GFLOP/s , 173022.5 tokens/s INFO:__main__:2024-11-30 13:52:33 | Epoch: 1 | Step: 367690 | Dataset: 0-1684298 | Loss: 0.533 | 600 ms/step , 115026.19 GFLOP/s , 173018.4 tokens/s INFO:__main__:2024-11-30 13:52:40 | Epoch: 1 | Step: 367700 | Dataset: 0-1686698 | Loss: 0.552 | 600 ms/step , 115004.67 GFLOP/s , 173088.9 tokens/s INFO:__main__:2024-11-30 13:52:47 | Epoch: 1 | Step: 367710 | Dataset: 0-1689098 | Loss: 0.515 | 599 ms/step , 
115154.13 GFLOP/s , 172995.5 tokens/s INFO:__main__:2024-11-30 13:52:54 | Epoch: 1 | Step: 367720 | Dataset: 0-1691498 | Loss: 0.548 | 599 ms/step , 115218.65 GFLOP/s , 173062.7 tokens/s INFO:__main__:2024-11-30 13:53:01 | Epoch: 1 | Step: 367730 | Dataset: 0-1693898 | Loss: 0.539 | 600 ms/step , 115114.58 GFLOP/s , 172937.5 tokens/s INFO:__main__:2024-11-30 13:53:08 | Epoch: 1 | Step: 367740 | Dataset: 0-1696298 | Loss: 0.566 | 600 ms/step , 114997.09 GFLOP/s , 172978.9 tokens/s INFO:__main__:2024-11-30 13:53:16 | Epoch: 1 | Step: 367750 | Dataset: 0-1698698 | Loss: 0.558 | 600 ms/step , 114948.38 GFLOP/s , 173042.8 tokens/s INFO:__main__:2024-11-30 13:53:23 | Epoch: 1 | Step: 367760 | Dataset: 0-1701098 | Loss: 0.550 | 600 ms/step , 115064.29 GFLOP/s , 173095.9 tokens/s INFO:__main__:2024-11-30 13:53:30 | Epoch: 1 | Step: 367770 | Dataset: 0-1703498 | Loss: 0.525 | 600 ms/step , 115070.34 GFLOP/s , 173034.0 tokens/s INFO:__main__:2024-11-30 13:53:37 | Epoch: 1 | Step: 367780 | Dataset: 0-1705898 | Loss: 1.305 | 600 ms/step , 114985.45 GFLOP/s , 173032.4 tokens/s INFO:__main__:2024-11-30 13:53:44 | Epoch: 1 | Step: 367790 | Dataset: 0-1708298 | Loss: 1.171 | 599 ms/step , 115168.14 GFLOP/s , 172983.6 tokens/s INFO:__main__:2024-11-30 13:53:51 | Epoch: 1 | Step: 367800 | Dataset: 0-1710698 | Loss: 1.293 | 600 ms/step , 114949.32 GFLOP/s , 172966.2 tokens/s INFO:__main__:2024-11-30 13:53:58 | Epoch: 1 | Step: 367810 | Dataset: 0-1713098 | Loss: 1.272 | 601 ms/step , 114878.61 GFLOP/s , 172777.1 tokens/s INFO:__main__:2024-11-30 13:54:05 | Epoch: 1 | Step: 367820 | Dataset: 0-1715498 | Loss: 1.228 | 600 ms/step , 115001.03 GFLOP/s , 172929.2 tokens/s INFO:__main__:2024-11-30 13:54:12 | Epoch: 1 | Step: 367830 | Dataset: 0-1717898 | Loss: 1.329 | 599 ms/step , 115203.41 GFLOP/s , 173007.9 tokens/s INFO:__main__:2024-11-30 13:54:20 | Epoch: 1 | Step: 367840 | Dataset: 0-1720298 | Loss: 1.233 | 601 ms/step , 114902.40 GFLOP/s , 172957.9 tokens/s INFO:__main__:2024-11-30 
13:54:27 | Epoch: 1 | Step: 367850 | Dataset: 0-1722698 | Loss: 1.238 | 601 ms/step , 114851.86 GFLOP/s , 172939.5 tokens/s INFO:__main__:2024-11-30 13:54:34 | Epoch: 1 | Step: 367860 | Dataset: 0-1725098 | Loss: 0.834 | 601 ms/step , 114895.97 GFLOP/s , 173016.3 tokens/s INFO:__main__:2024-11-30 13:54:41 | Epoch: 1 | Step: 367870 | Dataset: 0-1727498 | Loss: 0.812 | 599 ms/step , 115124.72 GFLOP/s , 172924.9 tokens/s INFO:__main__:2024-11-30 13:54:48 | Epoch: 1 | Step: 367880 | Dataset: 0-1729898 | Loss: 0.799 | 599 ms/step , 115225.57 GFLOP/s , 173056.4 tokens/s INFO:__main__:2024-11-30 13:54:55 | Epoch: 1 | Step: 367890 | Dataset: 0-1732298 | Loss: 0.811 | 600 ms/step , 114951.98 GFLOP/s , 172987.6 tokens/s INFO:__main__:2024-11-30 13:55:02 | Epoch: 1 | Step: 367900 | Dataset: 0-1734698 | Loss: 0.794 | 601 ms/step , 114884.52 GFLOP/s , 172917.0 tokens/s INFO:__main__:2024-11-30 13:55:09 | Epoch: 1 | Step: 367910 | Dataset: 0-1737098 | Loss: 0.782 | 600 ms/step , 115029.65 GFLOP/s , 172907.3 tokens/s INFO:__main__:2024-11-30 13:55:16 | Epoch: 1 | Step: 367920 | Dataset: 0-1739498 | Loss: 0.788 | 598 ms/step , 115335.40 GFLOP/s , 173015.2 tokens/s INFO:__main__:2024-11-30 13:55:23 | Epoch: 1 | Step: 367930 | Dataset: 0-1741898 | Loss: 0.770 | 600 ms/step , 115054.74 GFLOP/s , 172957.2 tokens/s INFO:__main__:2024-11-30 13:55:31 | Epoch: 1 | Step: 367940 | Dataset: 0-1744298 | Loss: 0.762 | 600 ms/step , 115034.17 GFLOP/s , 172948.2 tokens/s INFO:__main__:2024-11-30 13:55:38 | Epoch: 1 | Step: 367950 | Dataset: 0-1746698 | Loss: 0.772 | 600 ms/step , 114997.39 GFLOP/s , 173030.4 tokens/s INFO:__main__:2024-11-30 13:55:45 | Epoch: 1 | Step: 367960 | Dataset: 0-1749098 | Loss: 0.738 | 600 ms/step , 115048.11 GFLOP/s , 173068.2 tokens/s INFO:__main__:2024-11-30 13:55:52 | Epoch: 1 | Step: 367970 | Dataset: 0-1751498 | Loss: 0.765 | 601 ms/step , 114887.89 GFLOP/s , 173025.3 tokens/s INFO:__main__:2024-11-30 13:55:59 | Epoch: 1 | Step: 367980 | Dataset: 0-1753898 | 
Loss: 0.764 | 599 ms/step , 115180.36 GFLOP/s , 172933.8 tokens/s INFO:__main__:2024-11-30 13:56:06 | Epoch: 1 | Step: 367990 | Dataset: 0-1756298 | Loss: 0.744 | 599 ms/step , 115136.68 GFLOP/s , 173029.3 tokens/s INFO:__main__:2024-11-30 13:56:14 | Validation | Step: 368000 | Val_loss: 0.800 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 13:56:14 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_135614_step_368000.pt` INFO:__main__:2024-11-30 13:56:16 | Epoch: 1 | Step: 368000 | Dataset: 0-1758698 | Loss: 0.742 | 597 ms/step , 115658.69 GFLOP/s , 119129.3 tokens/s INFO:__main__:2024-11-30 13:56:23 | Epoch: 1 | Step: 368010 | Dataset: 0-1761098 | Loss: 0.734 | 599 ms/step , 115228.30 GFLOP/s , 173068.0 tokens/s INFO:__main__:2024-11-30 13:56:31 | Epoch: 1 | Step: 368020 | Dataset: 0-1763498 | Loss: 0.749 | 600 ms/step , 115113.60 GFLOP/s , 173044.5 tokens/s INFO:__main__:2024-11-30 13:56:38 | Epoch: 1 | Step: 368030 | Dataset: 0-1765898 | Loss: 0.744 | 599 ms/step , 115192.42 GFLOP/s , 173113.3 tokens/s INFO:__main__:2024-11-30 13:56:45 | Epoch: 1 | Step: 368040 | Dataset: 0-1768298 | Loss: 0.736 | 600 ms/step , 115093.75 GFLOP/s , 173065.3 tokens/s INFO:__main__:2024-11-30 13:56:52 | Epoch: 1 | Step: 368050 | Dataset: 0-1770698 | Loss: 0.738 | 599 ms/step , 115235.09 GFLOP/s , 173021.0 tokens/s INFO:__main__:2024-11-30 13:56:59 | Epoch: 1 | Step: 368060 | Dataset: 0-1773098 | Loss: 0.735 | 599 ms/step , 115197.83 GFLOP/s , 173041.4 tokens/s INFO:__main__:2024-11-30 13:57:06 | Epoch: 1 | Step: 368070 | Dataset: 0-1775498 | Loss: 0.747 | 599 ms/step , 115239.45 GFLOP/s , 173044.6 tokens/s INFO:__main__:2024-11-30 13:57:13 | Epoch: 1 | Step: 368080 | Dataset: 0-1777898 | Loss: 0.736 | 599 ms/step , 115145.14 GFLOP/s , 173071.9 tokens/s INFO:__main__:2024-11-30 13:57:20 | Epoch: 1 | Step: 368090 | Dataset: 0-1780298 | Loss: 0.776 | 599 ms/step , 115143.53 GFLOP/s , 173046.0 tokens/s INFO:__main__:2024-11-30 13:57:27 | Epoch: 1 | 
Step: 368100 | Dataset: 0-1782698 | Loss: 0.735 | 600 ms/step , 115104.79 GFLOP/s , 173118.3 tokens/s INFO:__main__:2024-11-30 13:57:35 | Epoch: 1 | Step: 368110 | Dataset: 0-1785098 | Loss: 0.731 | 598 ms/step , 115320.80 GFLOP/s , 173079.2 tokens/s INFO:__main__:2024-11-30 13:57:42 | Epoch: 1 | Step: 368120 | Dataset: 0-1787498 | Loss: 0.736 | 599 ms/step , 115249.30 GFLOP/s , 173094.5 tokens/s INFO:__main__:2024-11-30 13:57:49 | Epoch: 1 | Step: 368130 | Dataset: 0-1789898 | Loss: 1.100 | 599 ms/step , 115170.10 GFLOP/s , 173041.7 tokens/s INFO:__main__:2024-11-30 13:57:56 | Epoch: 1 | Step: 368140 | Dataset: 0-1792298 | Loss: 1.048 | 599 ms/step , 115231.74 GFLOP/s , 173121.8 tokens/s INFO:__main__:2024-11-30 13:58:03 | Epoch: 1 | Step: 368150 | Dataset: 0-1794698 | Loss: 1.085 | 599 ms/step , 115207.49 GFLOP/s , 173060.1 tokens/s INFO:__main__:2024-11-30 13:58:10 | Epoch: 1 | Step: 368160 | Dataset: 0-1797098 | Loss: 1.064 | 599 ms/step , 115246.45 GFLOP/s , 173070.4 tokens/s INFO:__main__:2024-11-30 13:58:17 | Epoch: 1 | Step: 368170 | Dataset: 0-1799498 | Loss: 1.110 | 600 ms/step , 115106.37 GFLOP/s , 173097.1 tokens/s INFO:__main__:2024-11-30 13:58:24 | Epoch: 1 | Step: 368180 | Dataset: 0-1801898 | Loss: 1.063 | 599 ms/step , 115124.19 GFLOP/s , 173115.9 tokens/s INFO:__main__:2024-11-30 13:58:31 | Epoch: 1 | Step: 368190 | Dataset: 0-1804298 | Loss: 1.057 | 600 ms/step , 114968.61 GFLOP/s , 173078.8 tokens/s INFO:__main__:2024-11-30 13:58:38 | Epoch: 1 | Step: 368200 | Dataset: 0-1806698 | Loss: 1.013 | 599 ms/step , 115172.54 GFLOP/s , 173090.8 tokens/s INFO:__main__:2024-11-30 13:58:45 | Epoch: 1 | Step: 368210 | Dataset: 0-1809098 | Loss: 1.112 | 600 ms/step , 115109.31 GFLOP/s , 173125.5 tokens/s INFO:__main__:2024-11-30 13:58:53 | Epoch: 1 | Step: 368220 | Dataset: 0-1811498 | Loss: 1.036 | 599 ms/step , 115188.63 GFLOP/s , 173195.7 tokens/s INFO:__main__:2024-11-30 13:59:00 | Epoch: 1 | Step: 368230 | Dataset: 0-1813898 | Loss: 0.998 | 600 ms/step 
, 115055.31 GFLOP/s , 173093.4 tokens/s INFO:__main__:2024-11-30 13:59:07 | Epoch: 1 | Step: 368240 | Dataset: 0-1816298 | Loss: 1.022 | 599 ms/step , 115164.78 GFLOP/s , 173161.1 tokens/s INFO:__main__:2024-11-30 13:59:14 | Epoch: 1 | Step: 368250 | Dataset: 0-1818698 | Loss: 1.061 | 600 ms/step , 115043.96 GFLOP/s , 173164.9 tokens/s INFO:__main__:2024-11-30 13:59:21 | Epoch: 1 | Step: 368260 | Dataset: 0-1821098 | Loss: 1.010 | 599 ms/step , 115239.52 GFLOP/s , 173152.3 tokens/s INFO:__main__:2024-11-30 13:59:28 | Epoch: 1 | Step: 368270 | Dataset: 0-1823498 | Loss: 1.028 | 600 ms/step , 115038.31 GFLOP/s , 173171.8 tokens/s INFO:__main__:2024-11-30 13:59:35 | Epoch: 1 | Step: 368280 | Dataset: 0-1825898 | Loss: 0.987 | 599 ms/step , 115279.08 GFLOP/s , 173266.7 tokens/s INFO:__main__:2024-11-30 13:59:42 | Epoch: 1 | Step: 368290 | Dataset: 0-1828298 | Loss: 0.989 | 599 ms/step , 115178.88 GFLOP/s , 173259.5 tokens/s INFO:__main__:2024-11-30 13:59:49 | Epoch: 1 | Step: 368300 | Dataset: 0-1830698 | Loss: 1.032 | 599 ms/step , 115131.63 GFLOP/s , 173236.7 tokens/s INFO:__main__:2024-11-30 13:59:56 | Epoch: 1 | Step: 368310 | Dataset: 0-1833098 | Loss: 1.036 | 600 ms/step , 115092.46 GFLOP/s , 173171.4 tokens/s INFO:__main__:2024-11-30 14:00:04 | Epoch: 1 | Step: 368320 | Dataset: 0-1835498 | Loss: 0.700 | 598 ms/step , 115337.80 GFLOP/s , 173201.2 tokens/s INFO:__main__:2024-11-30 14:00:11 | Epoch: 1 | Step: 368330 | Dataset: 0-1837898 | Loss: 0.721 | 599 ms/step , 115125.01 GFLOP/s , 173231.9 tokens/s INFO:__main__:2024-11-30 14:00:18 | Epoch: 1 | Step: 368340 | Dataset: 0-1840298 | Loss: 0.741 | 599 ms/step , 115260.07 GFLOP/s , 173233.1 tokens/s INFO:__main__:2024-11-30 14:00:25 | Epoch: 1 | Step: 368350 | Dataset: 0-1842698 | Loss: 0.692 | 599 ms/step , 115158.90 GFLOP/s , 173322.8 tokens/s INFO:__main__:2024-11-30 14:00:32 | Epoch: 1 | Step: 368360 | Dataset: 0-1845098 | Loss: 0.661 | 599 ms/step , 115121.39 GFLOP/s , 173285.0 tokens/s 
INFO:__main__:2024-11-30 14:00:39 | Epoch: 1 | Step: 368370 | Dataset: 0-1847498 | Loss: 0.710 | 599 ms/step , 115255.29 GFLOP/s , 173247.5 tokens/s INFO:__main__:2024-11-30 14:00:46 | Epoch: 1 | Step: 368380 | Dataset: 0-1849898 | Loss: 0.707 | 599 ms/step , 115160.87 GFLOP/s , 173238.4 tokens/s INFO:__main__:2024-11-30 14:00:53 | Epoch: 1 | Step: 368390 | Dataset: 0-1852298 | Loss: 0.602 | 599 ms/step , 115259.23 GFLOP/s , 173370.8 tokens/s INFO:__main__:2024-11-30 14:01:00 | Epoch: 1 | Step: 368400 | Dataset: 0-1854698 | Loss: 0.727 | 599 ms/step , 115263.12 GFLOP/s , 173244.7 tokens/s INFO:__main__:2024-11-30 14:01:07 | Epoch: 1 | Step: 368410 | Dataset: 0-1857098 | Loss: 0.642 | 600 ms/step , 115035.63 GFLOP/s , 173165.0 tokens/s INFO:__main__:2024-11-30 14:01:14 | Epoch: 1 | Step: 368420 | Dataset: 0-1859498 | Loss: 0.678 | 598 ms/step , 115346.08 GFLOP/s , 173313.5 tokens/s INFO:__main__:2024-11-30 14:01:22 | Epoch: 1 | Step: 368430 | Dataset: 0-1861898 | Loss: 0.637 | 601 ms/step , 114780.55 GFLOP/s , 173243.2 tokens/s INFO:__main__:2024-11-30 14:01:29 | Epoch: 1 | Step: 368440 | Dataset: 0-1864298 | Loss: 0.759 | 598 ms/step , 115416.01 GFLOP/s , 173376.4 tokens/s INFO:__main__:2024-11-30 14:01:36 | Epoch: 1 | Step: 368450 | Dataset: 0-1866698 | Loss: 0.620 | 599 ms/step , 115272.25 GFLOP/s , 173399.2 tokens/s INFO:__main__:2024-11-30 14:01:43 | Epoch: 1 | Step: 368460 | Dataset: 0-1869098 | Loss: 0.679 | 598 ms/step , 115462.22 GFLOP/s , 173358.4 tokens/s INFO:__main__:2024-11-30 14:01:50 | Epoch: 1 | Step: 368470 | Dataset: 0-1871498 | Loss: 0.667 | 599 ms/step , 115186.80 GFLOP/s , 173397.3 tokens/s INFO:__main__:2024-11-30 14:01:57 | Epoch: 1 | Step: 368480 | Dataset: 0-1873898 | Loss: 0.618 | 598 ms/step , 115314.71 GFLOP/s , 173377.2 tokens/s INFO:__main__:2024-11-30 14:02:04 | Epoch: 1 | Step: 368490 | Dataset: 0-1876298 | Loss: 0.659 | 599 ms/step , 115177.00 GFLOP/s , 173428.7 tokens/s INFO:__main__:2024-11-30 14:02:12 | Validation | Step: 368500 
| Val_loss: 0.784 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 14:02:12 | Epoch: 1 | Step: 368500 | Dataset: 0-1878698 | Loss: 0.631 | 596 ms/step , 115779.85 GFLOP/s , 147613.0 tokens/s INFO:__main__:2024-11-30 14:02:19 | Epoch: 1 | Step: 368510 | Dataset: 0-1881098 | Loss: 0.573 | 598 ms/step , 115333.49 GFLOP/s , 173272.9 tokens/s INFO:__main__:2024-11-30 14:02:27 | Epoch: 1 | Step: 368520 | Dataset: 0-1883498 | Loss: 0.669 | 598 ms/step , 115403.86 GFLOP/s , 173240.0 tokens/s INFO:__main__:2024-11-30 14:02:34 | Epoch: 1 | Step: 368530 | Dataset: 0-1885898 | Loss: 0.616 | 598 ms/step , 115373.91 GFLOP/s , 173097.2 tokens/s INFO:__main__:2024-11-30 14:02:41 | Epoch: 1 | Step: 368540 | Dataset: 0-1888298 | Loss: 0.576 | 598 ms/step , 115439.18 GFLOP/s , 173174.5 tokens/s INFO:__main__:2024-11-30 14:02:48 | Epoch: 1 | Step: 368550 | Dataset: 0-1890698 | Loss: 0.700 | 598 ms/step , 115421.48 GFLOP/s , 173125.0 tokens/s INFO:__main__:2024-11-30 14:02:55 | Epoch: 1 | Step: 368560 | Dataset: 0-1893098 | Loss: 0.715 | 598 ms/step , 115381.24 GFLOP/s , 173207.7 tokens/s INFO:__main__:2024-11-30 14:03:02 | Epoch: 1 | Step: 368570 | Dataset: 0-1895498 | Loss: 0.707 | 598 ms/step , 115336.52 GFLOP/s , 173197.0 tokens/s INFO:__main__:2024-11-30 14:03:09 | Epoch: 1 | Step: 368580 | Dataset: 0-1897898 | Loss: 0.605 | 598 ms/step , 115333.13 GFLOP/s , 173007.3 tokens/s INFO:__main__:2024-11-30 14:03:16 | Epoch: 1 | Step: 368590 | Dataset: 0-1900298 | Loss: 0.664 | 598 ms/step , 115344.81 GFLOP/s , 173150.4 tokens/s INFO:__main__:2024-11-30 14:03:23 | Epoch: 1 | Step: 368600 | Dataset: 0-1902698 | Loss: 0.602 | 597 ms/step , 115595.45 GFLOP/s , 173192.3 tokens/s INFO:__main__:2024-11-30 14:03:30 | Epoch: 1 | Step: 368610 | Dataset: 0-1905098 | Loss: 0.643 | 599 ms/step , 115300.54 GFLOP/s , 173187.8 tokens/s INFO:__main__:2024-11-30 14:03:38 | Epoch: 1 | Step: 368620 | Dataset: 0-1907498 | Loss: 0.639 | 598 ms/step , 115468.18 GFLOP/s , 173110.5 tokens/s 
INFO:__main__:2024-11-30 14:03:45 | Epoch: 1 | Step: 368630 | Dataset: 0-1909898 | Loss: 0.590 | 599 ms/step , 115261.06 GFLOP/s , 173228.0 tokens/s INFO:__main__:2024-11-30 14:03:52 | Epoch: 1 | Step: 368640 | Dataset: 0-1912298 | Loss: 0.609 | 599 ms/step , 115222.03 GFLOP/s , 173140.1 tokens/s INFO:__main__:2024-11-30 14:03:59 | Epoch: 1 | Step: 368650 | Dataset: 0-1914698 | Loss: 0.621 | 598 ms/step , 115376.93 GFLOP/s , 173163.7 tokens/s INFO:__main__:2024-11-30 14:04:06 | Epoch: 1 | Step: 368660 | Dataset: 0-1917098 | Loss: 0.657 | 598 ms/step , 115340.83 GFLOP/s , 173214.3 tokens/s INFO:__main__:2024-11-30 14:04:13 | Epoch: 1 | Step: 368670 | Dataset: 0-1919498 | Loss: 0.630 | 598 ms/step , 115334.42 GFLOP/s , 173281.0 tokens/s INFO:__main__:2024-11-30 14:04:20 | Epoch: 1 | Step: 368680 | Dataset: 0-1921898 | Loss: 0.608 | 598 ms/step , 115358.89 GFLOP/s , 173096.0 tokens/s INFO:__main__:2024-11-30 14:04:27 | Epoch: 1 | Step: 368690 | Dataset: 0-1924298 | Loss: 0.687 | 599 ms/step , 115265.46 GFLOP/s , 173091.7 tokens/s INFO:__main__:2024-11-30 14:04:34 | Epoch: 1 | Step: 368700 | Dataset: 0-1926698 | Loss: 0.684 | 598 ms/step , 115482.51 GFLOP/s , 173125.2 tokens/s INFO:__main__:2024-11-30 14:04:41 | Epoch: 1 | Step: 368710 | Dataset: 0-1929098 | Loss: 0.736 | 598 ms/step , 115317.37 GFLOP/s , 173221.1 tokens/s INFO:__main__:2024-11-30 14:04:49 | Epoch: 1 | Step: 368720 | Dataset: 0-1931498 | Loss: 0.669 | 598 ms/step , 115325.15 GFLOP/s , 173164.3 tokens/s INFO:__main__:2024-11-30 14:04:56 | Epoch: 1 | Step: 368730 | Dataset: 0-1933898 | Loss: 0.664 | 599 ms/step , 115283.37 GFLOP/s , 173230.2 tokens/s INFO:__main__:2024-11-30 14:05:03 | Epoch: 1 | Step: 368740 | Dataset: 0-1936298 | Loss: 0.686 | 598 ms/step , 115395.08 GFLOP/s , 173204.7 tokens/s INFO:__main__:2024-11-30 14:05:10 | Epoch: 1 | Step: 368750 | Dataset: 0-1938698 | Loss: 0.704 | 599 ms/step , 115232.41 GFLOP/s , 173218.6 tokens/s INFO:__main__:2024-11-30 14:05:17 | Epoch: 1 | Step: 368760 | 
Dataset: 0-1941098 | Loss: 0.664 | 598 ms/step , 115420.20 GFLOP/s , 173266.1 tokens/s INFO:__main__:2024-11-30 14:05:24 | Epoch: 1 | Step: 368770 | Dataset: 0-1943498 | Loss: 0.654 | 598 ms/step , 115325.20 GFLOP/s , 173139.5 tokens/s INFO:__main__:2024-11-30 14:05:31 | Epoch: 1 | Step: 368780 | Dataset: 0-1945898 | Loss: 0.773 | 597 ms/step , 115519.85 GFLOP/s , 173210.9 tokens/s INFO:__main__:2024-11-30 14:05:38 | Epoch: 1 | Step: 368790 | Dataset: 0-1948298 | Loss: 0.642 | 598 ms/step , 115389.33 GFLOP/s , 173242.2 tokens/s INFO:__main__:2024-11-30 14:05:45 | Epoch: 1 | Step: 368800 | Dataset: 0-1950698 | Loss: 0.558 | 598 ms/step , 115316.65 GFLOP/s , 173097.6 tokens/s INFO:__main__:2024-11-30 14:05:52 | Epoch: 1 | Step: 368810 | Dataset: 0-1953098 | Loss: 0.608 | 598 ms/step , 115364.94 GFLOP/s , 173202.7 tokens/s INFO:__main__:2024-11-30 14:05:59 | Epoch: 1 | Step: 368820 | Dataset: 0-1955498 | Loss: 0.606 | 598 ms/step , 115438.35 GFLOP/s , 173248.3 tokens/s INFO:__main__:2024-11-30 14:06:07 | Epoch: 1 | Step: 368830 | Dataset: 0-1957898 | Loss: 0.645 | 598 ms/step , 115376.80 GFLOP/s , 173197.5 tokens/s INFO:__main__:2024-11-30 14:06:14 | Epoch: 1 | Step: 368840 | Dataset: 0-1960298 | Loss: 0.643 | 598 ms/step , 115449.91 GFLOP/s , 173227.4 tokens/s INFO:__main__:2024-11-30 14:06:21 | Epoch: 1 | Step: 368850 | Dataset: 0-1962698 | Loss: 0.684 | 598 ms/step , 115398.53 GFLOP/s , 173228.9 tokens/s INFO:__main__:2024-11-30 14:06:28 | Epoch: 1 | Step: 368860 | Dataset: 0-1965098 | Loss: 0.692 | 598 ms/step , 115499.38 GFLOP/s , 173244.9 tokens/s INFO:__main__:2024-11-30 14:06:35 | Epoch: 1 | Step: 368870 | Dataset: 0-1967498 | Loss: 0.529 | 597 ms/step , 115502.82 GFLOP/s , 173304.3 tokens/s INFO:__main__:2024-11-30 14:06:42 | Epoch: 1 | Step: 368880 | Dataset: 0-1969898 | Loss: 0.487 | 598 ms/step , 115478.22 GFLOP/s , 173350.5 tokens/s INFO:__main__:2024-11-30 14:06:49 | Epoch: 1 | Step: 368890 | Dataset: 0-1972298 | Loss: 0.496 | 597 ms/step , 115520.78 
GFLOP/s , 173309.2 tokens/s INFO:__main__:2024-11-30 14:06:56 | Epoch: 1 | Step: 368900 | Dataset: 0-1974698 | Loss: 0.513 | 602 ms/step , 114703.55 GFLOP/s , 173310.9 tokens/s INFO:__main__:2024-11-30 14:07:03 | Epoch: 1 | Step: 368910 | Dataset: 0-1977098 | Loss: 0.498 | 597 ms/step , 115609.76 GFLOP/s , 173349.6 tokens/s INFO:__main__:2024-11-30 14:07:10 | Epoch: 1 | Step: 368920 | Dataset: 0-1979498 | Loss: 0.464 | 598 ms/step , 115499.14 GFLOP/s , 173288.8 tokens/s INFO:__main__:2024-11-30 14:07:17 | Epoch: 1 | Step: 368930 | Dataset: 0-1981898 | Loss: 0.464 | 597 ms/step , 115508.94 GFLOP/s , 173290.2 tokens/s INFO:__main__:2024-11-30 14:07:25 | Epoch: 1 | Step: 368940 | Dataset: 0-1984298 | Loss: 0.467 | 598 ms/step , 115396.65 GFLOP/s , 173197.8 tokens/s INFO:__main__:2024-11-30 14:07:32 | Epoch: 1 | Step: 368950 | Dataset: 0-1986698 | Loss: 0.438 | 598 ms/step , 115360.43 GFLOP/s , 173201.9 tokens/s INFO:__main__:2024-11-30 14:07:39 | Epoch: 1 | Step: 368960 | Dataset: 0-1989098 | Loss: 0.482 | 598 ms/step , 115330.82 GFLOP/s , 173233.8 tokens/s INFO:__main__:2024-11-30 14:07:46 | Epoch: 1 | Step: 368970 | Dataset: 0-1991498 | Loss: 0.234 | 598 ms/step , 115448.59 GFLOP/s , 173158.8 tokens/s INFO:__main__:2024-11-30 14:07:53 | Epoch: 1 | Step: 368980 | Dataset: 0-1993898 | Loss: 0.204 | 598 ms/step , 115425.03 GFLOP/s , 173208.6 tokens/s INFO:__main__:2024-11-30 14:08:00 | Epoch: 1 | Step: 368990 | Dataset: 0-1996298 | Loss: 0.210 | 598 ms/step , 115351.01 GFLOP/s , 173261.2 tokens/s INFO:__main__:2024-11-30 14:08:08 | Validation | Step: 369000 | Val_loss: 0.800 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 14:08:08 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_140808_step_369000.pt` INFO:__main__:2024-11-30 14:08:10 | Epoch: 1 | Step: 369000 | Dataset: 0-1998698 | Loss: 0.179 | 601 ms/step , 114877.17 GFLOP/s , 118874.5 tokens/s INFO:__main__:2024-11-30 14:08:17 | Epoch: 1 | Step: 369010 | Dataset: 0-2001098 | Loss: 
0.159 | 598 ms/step , 115404.97 GFLOP/s , 173471.0 tokens/s INFO:__main__:2024-11-30 14:08:25 | Epoch: 1 | Step: 369020 | Dataset: 0-2003498 | Loss: 0.167 | 598 ms/step , 115327.68 GFLOP/s , 173343.3 tokens/s INFO:__main__:2024-11-30 14:08:32 | Epoch: 1 | Step: 369030 | Dataset: 0-2005898 | Loss: 0.156 | 598 ms/step , 115465.22 GFLOP/s , 173321.0 tokens/s INFO:__main__:2024-11-30 14:08:39 | Epoch: 1 | Step: 369040 | Dataset: 0-2008298 | Loss: 0.151 | 598 ms/step , 115399.66 GFLOP/s , 173302.6 tokens/s INFO:__main__:2024-11-30 14:08:46 | Epoch: 1 | Step: 369050 | Dataset: 0-2010698 | Loss: 0.146 | 598 ms/step , 115376.84 GFLOP/s , 173322.9 tokens/s INFO:__main__:2024-11-30 14:08:53 | Epoch: 1 | Step: 369060 | Dataset: 0-2013098 | Loss: 0.166 | 598 ms/step , 115495.13 GFLOP/s , 173162.2 tokens/s INFO:__main__:2024-11-30 14:09:00 | Epoch: 1 | Step: 369070 | Dataset: 0-2015498 | Loss: 0.149 | 598 ms/step , 115431.63 GFLOP/s , 173248.1 tokens/s INFO:__main__:2024-11-30 14:09:07 | Epoch: 1 | Step: 369080 | Dataset: 0-2017898 | Loss: 0.141 | 598 ms/step , 115385.81 GFLOP/s , 173162.0 tokens/s INFO:__main__:2024-11-30 14:09:14 | Epoch: 1 | Step: 369090 | Dataset: 0-2020298 | Loss: 0.149 | 598 ms/step , 115385.62 GFLOP/s , 173216.7 tokens/s INFO:__main__:2024-11-30 14:09:21 | Epoch: 1 | Step: 369100 | Dataset: 0-2022698 | Loss: 0.139 | 599 ms/step , 115296.13 GFLOP/s , 173235.4 tokens/s INFO:__main__:2024-11-30 14:09:28 | Epoch: 1 | Step: 369110 | Dataset: 0-2025098 | Loss: 0.143 | 598 ms/step , 115488.85 GFLOP/s , 173188.5 tokens/s INFO:__main__:2024-11-30 14:09:35 | Epoch: 1 | Step: 369120 | Dataset: 0-2027498 | Loss: 0.140 | 598 ms/step , 115322.87 GFLOP/s , 173183.9 tokens/s INFO:__main__:2024-11-30 14:09:43 | Epoch: 1 | Step: 369130 | Dataset: 0-2029898 | Loss: 0.138 | 598 ms/step , 115333.22 GFLOP/s , 173213.9 tokens/s INFO:__main__:2024-11-30 14:09:50 | Epoch: 1 | Step: 369140 | Dataset: 0-2032298 | Loss: 0.137 | 598 ms/step , 115458.56 GFLOP/s , 173224.6 tokens/s 
INFO:__main__:2024-11-30 14:09:57 | Epoch: 1 | Step: 369150 | Dataset: 0-2034698 | Loss: 0.136 | 598 ms/step , 115445.38 GFLOP/s , 173204.4 tokens/s INFO:__main__:2024-11-30 14:10:04 | Epoch: 1 | Step: 369160 | Dataset: 0-2037098 | Loss: 0.151 | 599 ms/step , 115308.96 GFLOP/s , 173089.7 tokens/s INFO:__main__:2024-11-30 14:10:11 | Epoch: 1 | Step: 369170 | Dataset: 0-2039498 | Loss: 0.138 | 598 ms/step , 115361.03 GFLOP/s , 173260.8 tokens/s INFO:__main__:2024-11-30 14:10:18 | Epoch: 1 | Step: 369180 | Dataset: 0-2041898 | Loss: 1.235 | 598 ms/step , 115348.03 GFLOP/s , 173261.8 tokens/s INFO:__main__:2024-11-30 14:10:25 | Epoch: 1 | Step: 369190 | Dataset: 0-2044298 | Loss: 1.135 | 598 ms/step , 115410.66 GFLOP/s , 173177.7 tokens/s INFO:__main__:2024-11-30 14:10:32 | Epoch: 1 | Step: 369200 | Dataset: 0-2046698 | Loss: 1.204 | 598 ms/step , 115396.66 GFLOP/s , 173162.7 tokens/s INFO:__main__:2024-11-30 14:10:39 | Epoch: 1 | Step: 369210 | Dataset: 0-2049098 | Loss: 1.140 | 598 ms/step , 115328.34 GFLOP/s , 173251.1 tokens/s INFO:__main__:2024-11-30 14:10:46 | Epoch: 1 | Step: 369220 | Dataset: 0-2051498 | Loss: 1.026 | 598 ms/step , 115358.52 GFLOP/s , 173257.5 tokens/s INFO:__main__:2024-11-30 14:10:54 | Epoch: 1 | Step: 369230 | Dataset: 0-2053898 | Loss: 0.791 | 598 ms/step , 115343.98 GFLOP/s , 173202.1 tokens/s INFO:__main__:2024-11-30 14:11:01 | Epoch: 1 | Step: 369240 | Dataset: 0-2056298 | Loss: 0.799 | 599 ms/step , 115276.10 GFLOP/s , 173173.3 tokens/s INFO:__main__:2024-11-30 14:11:08 | Epoch: 1 | Step: 369250 | Dataset: 0-2058698 | Loss: 0.722 | 598 ms/step , 115329.54 GFLOP/s , 173202.7 tokens/s INFO:__main__:2024-11-30 14:11:15 | Epoch: 1 | Step: 369260 | Dataset: 0-2061098 | Loss: 0.811 | 599 ms/step , 115277.47 GFLOP/s , 173194.5 tokens/s INFO:__main__:2024-11-30 14:11:22 | Epoch: 1 | Step: 369270 | Dataset: 0-2063498 | Loss: 0.795 | 598 ms/step , 115357.30 GFLOP/s , 173222.1 tokens/s INFO:__main__:2024-11-30 14:11:29 | Epoch: 1 | Step: 369280 | 
Dataset: 0-2065898 | Loss: 0.738 | 598 ms/step , 115360.62 GFLOP/s , 173214.9 tokens/s INFO:__main__:2024-11-30 14:11:36 | Epoch: 1 | Step: 369290 | Dataset: 0-2068298 | Loss: 0.783 | 598 ms/step , 115437.62 GFLOP/s , 173197.6 tokens/s INFO:__main__:2024-11-30 14:11:43 | Epoch: 1 | Step: 369300 | Dataset: 0-2070698 | Loss: 0.787 | 598 ms/step , 115397.21 GFLOP/s , 173226.1 tokens/s INFO:__main__:2024-11-30 14:11:50 | Epoch: 1 | Step: 369310 | Dataset: 0-2073098 | Loss: 0.715 | 598 ms/step , 115501.39 GFLOP/s , 173212.9 tokens/s INFO:__main__:2024-11-30 14:11:57 | Epoch: 1 | Step: 369320 | Dataset: 0-2075498 | Loss: 0.757 | 598 ms/step , 115385.88 GFLOP/s , 173228.0 tokens/s INFO:__main__:2024-11-30 14:12:04 | Epoch: 1 | Step: 369330 | Dataset: 0-2077898 | Loss: 0.786 | 600 ms/step , 115048.59 GFLOP/s , 173177.4 tokens/s INFO:__main__:2024-11-30 14:12:12 | Epoch: 1 | Step: 369340 | Dataset: 0-2080298 | Loss: 0.781 | 598 ms/step , 115353.35 GFLOP/s , 173252.0 tokens/s INFO:__main__:2024-11-30 14:12:19 | Epoch: 1 | Step: 369350 | Dataset: 0-2082698 | Loss: 0.738 | 600 ms/step , 115104.96 GFLOP/s , 173243.8 tokens/s INFO:__main__:2024-11-30 14:12:26 | Epoch: 1 | Step: 369360 | Dataset: 0-2085098 | Loss: 0.745 | 599 ms/step , 115306.57 GFLOP/s , 173088.1 tokens/s INFO:__main__:2024-11-30 14:12:33 | Epoch: 1 | Step: 369370 | Dataset: 0-2087498 | Loss: 0.779 | 597 ms/step , 115572.95 GFLOP/s , 173183.1 tokens/s INFO:__main__:2024-11-30 14:12:40 | Epoch: 1 | Step: 369380 | Dataset: 0-2089898 | Loss: 0.776 | 599 ms/step , 115264.29 GFLOP/s , 173217.8 tokens/s INFO:__main__:2024-11-30 14:12:47 | Epoch: 1 | Step: 369390 | Dataset: 0-2092298 | Loss: 0.783 | 598 ms/step , 115432.68 GFLOP/s , 173247.1 tokens/s INFO:__main__:2024-11-30 14:12:54 | Epoch: 1 | Step: 369400 | Dataset: 0-2094698 | Loss: 0.760 | 598 ms/step , 115345.17 GFLOP/s , 173191.6 tokens/s INFO:__main__:2024-11-30 14:13:01 | Epoch: 1 | Step: 369410 | Dataset: 0-2097098 | Loss: 0.754 | 598 ms/step , 115379.97 
GFLOP/s , 173232.6 tokens/s INFO:__main__:2024-11-30 14:13:08 | Epoch: 1 | Step: 369420 | Dataset: 0-2099498 | Loss: 0.477 | 598 ms/step , 115380.53 GFLOP/s , 173238.8 tokens/s INFO:__main__:2024-11-30 14:13:15 | Epoch: 1 | Step: 369430 | Dataset: 0-2101898 | Loss: 0.429 | 599 ms/step , 115283.41 GFLOP/s , 173171.9 tokens/s INFO:__main__:2024-11-30 14:13:22 | Epoch: 1 | Step: 369440 | Dataset: 0-2104298 | Loss: 0.421 | 598 ms/step , 115383.82 GFLOP/s , 173247.6 tokens/s INFO:__main__:2024-11-30 14:13:30 | Epoch: 1 | Step: 369450 | Dataset: 0-2106698 | Loss: 0.471 | 598 ms/step , 115310.73 GFLOP/s , 173125.2 tokens/s INFO:__main__:2024-11-30 14:13:37 | Epoch: 1 | Step: 369460 | Dataset: 0-2109098 | Loss: 0.463 | 599 ms/step , 115305.48 GFLOP/s , 173287.8 tokens/s INFO:__main__:2024-11-30 14:13:44 | Epoch: 1 | Step: 369470 | Dataset: 0-2111498 | Loss: 0.446 | 597 ms/step , 115596.07 GFLOP/s , 173227.3 tokens/s INFO:__main__:2024-11-30 14:13:51 | Epoch: 1 | Step: 369480 | Dataset: 0-2113898 | Loss: 0.411 | 598 ms/step , 115352.12 GFLOP/s , 173255.3 tokens/s INFO:__main__:2024-11-30 14:13:58 | Epoch: 1 | Step: 369490 | Dataset: 0-2116298 | Loss: 0.515 | 598 ms/step , 115458.32 GFLOP/s , 173215.5 tokens/s INFO:__main__:2024-11-30 14:14:06 | Validation | Step: 369500 | Val_loss: 0.774 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 14:14:06 | Epoch: 1 | Step: 369500 | Dataset: 0-2118698 | Loss: 0.481 | 596 ms/step , 115698.34 GFLOP/s , 147476.9 tokens/s INFO:__main__:2024-11-30 14:14:13 | Epoch: 1 | Step: 369510 | Dataset: 0-2121098 | Loss: 0.461 | 598 ms/step , 115364.50 GFLOP/s , 173389.4 tokens/s INFO:__main__:2024-11-30 14:14:20 | Epoch: 1 | Step: 369520 | Dataset: 0-2123498 | Loss: 0.429 | 598 ms/step , 115407.85 GFLOP/s , 173223.9 tokens/s INFO:__main__:2024-11-30 14:14:28 | Epoch: 1 | Step: 369530 | Dataset: 0-2125898 | Loss: 0.447 | 599 ms/step , 115170.27 GFLOP/s , 173295.8 tokens/s INFO:__main__:2024-11-30 14:14:35 | Epoch: 1 | Step: 369540 | Dataset: 
0-2128298 | Loss: 0.464 | 598 ms/step , 115423.76 GFLOP/s , 173222.4 tokens/s INFO:__main__:2024-11-30 14:14:42 | Epoch: 1 | Step: 369550 | Dataset: 0-2130698 | Loss: 0.485 | 598 ms/step , 115437.72 GFLOP/s , 173169.5 tokens/s INFO:__main__:2024-11-30 14:14:49 | Epoch: 1 | Step: 369560 | Dataset: 0-2133098 | Loss: 0.435 | 599 ms/step , 115229.89 GFLOP/s , 173305.6 tokens/s INFO:__main__:2024-11-30 14:14:56 | Epoch: 1 | Step: 369570 | Dataset: 0-2135498 | Loss: 0.494 | 598 ms/step , 115422.71 GFLOP/s , 173196.8 tokens/s INFO:__main__:2024-11-30 14:15:03 | Epoch: 1 | Step: 369580 | Dataset: 0-2137898 | Loss: 0.469 | 598 ms/step , 115453.04 GFLOP/s , 173286.4 tokens/s INFO:__main__:2024-11-30 14:15:10 | Epoch: 1 | Step: 369590 | Dataset: 0-2140298 | Loss: 0.468 | 598 ms/step , 115364.23 GFLOP/s , 173301.0 tokens/s INFO:__main__:2024-11-30 14:15:17 | Epoch: 1 | Step: 369600 | Dataset: 0-2142698 | Loss: 0.427 | 598 ms/step , 115428.29 GFLOP/s , 173236.9 tokens/s INFO:__main__:2024-11-30 14:15:24 | Epoch: 1 | Step: 369610 | Dataset: 0-2145098 | Loss: 0.432 | 598 ms/step , 115491.30 GFLOP/s , 173230.4 tokens/s INFO:__main__:2024-11-30 14:15:31 | Epoch: 1 | Step: 369620 | Dataset: 0-2147498 | Loss: 0.471 | 597 ms/step , 115518.09 GFLOP/s , 173298.4 tokens/s INFO:__main__:2024-11-30 14:15:38 | Epoch: 1 | Step: 369630 | Dataset: 0-2149898 | Loss: 0.465 | 598 ms/step , 115469.72 GFLOP/s , 173240.0 tokens/s INFO:__main__:2024-11-30 14:15:46 | Epoch: 1 | Step: 369640 | Dataset: 0-2152298 | Loss: 0.469 | 598 ms/step , 115405.45 GFLOP/s , 173279.3 tokens/s INFO:__main__:2024-11-30 14:15:53 | Epoch: 1 | Step: 369650 | Dataset: 0-2154698 | Loss: 0.477 | 598 ms/step , 115476.42 GFLOP/s , 173170.2 tokens/s INFO:__main__:2024-11-30 14:16:00 | Epoch: 1 | Step: 369660 | Dataset: 0-2157098 | Loss: 0.475 | 598 ms/step , 115419.41 GFLOP/s , 173269.5 tokens/s INFO:__main__:2024-11-30 14:16:07 | Epoch: 1 | Step: 369670 | Dataset: 0-2159498 | Loss: 0.414 | 597 ms/step , 115530.12 GFLOP/s , 
173145.0 tokens/s INFO:__main__:2024-11-30 14:16:14 | Epoch: 1 | Step: 369680 | Dataset: 0-2161898 | Loss: 0.429 | 599 ms/step , 115304.15 GFLOP/s , 173166.4 tokens/s INFO:__main__:2024-11-30 14:16:21 | Epoch: 1 | Step: 369690 | Dataset: 0-2164298 | Loss: 0.463 | 598 ms/step , 115419.51 GFLOP/s , 173307.0 tokens/s INFO:__main__:2024-11-30 14:16:28 | Epoch: 1 | Step: 369700 | Dataset: 0-2166698 | Loss: 0.423 | 598 ms/step , 115352.74 GFLOP/s , 173207.9 tokens/s INFO:__main__:2024-11-30 14:16:35 | Epoch: 1 | Step: 369710 | Dataset: 0-2169098 | Loss: 0.492 | 598 ms/step , 115312.18 GFLOP/s , 173236.6 tokens/s INFO:__main__:2024-11-30 14:16:42 | Epoch: 1 | Step: 369720 | Dataset: 0-2171498 | Loss: 0.432 | 598 ms/step , 115409.49 GFLOP/s , 173305.3 tokens/s INFO:__main__:2024-11-30 14:16:49 | Epoch: 1 | Step: 369730 | Dataset: 0-2173898 | Loss: 0.421 | 598 ms/step , 115389.04 GFLOP/s , 173255.9 tokens/s INFO:__main__:2024-11-30 14:16:57 | Epoch: 1 | Step: 369740 | Dataset: 0-2176298 | Loss: 0.410 | 598 ms/step , 115315.25 GFLOP/s , 173287.6 tokens/s INFO:__main__:2024-11-30 14:17:04 | Epoch: 1 | Step: 369750 | Dataset: 0-2178698 | Loss: 0.385 | 598 ms/step , 115352.56 GFLOP/s , 173244.5 tokens/s INFO:__main__:2024-11-30 14:17:11 | Epoch: 1 | Step: 369760 | Dataset: 0-2181098 | Loss: 0.471 | 598 ms/step , 115328.67 GFLOP/s , 173195.4 tokens/s INFO:__main__:2024-11-30 14:17:18 | Epoch: 1 | Step: 369770 | Dataset: 0-2183498 | Loss: 0.477 | 598 ms/step , 115365.31 GFLOP/s , 173209.1 tokens/s INFO:__main__:2024-11-30 14:17:25 | Epoch: 1 | Step: 369780 | Dataset: 0-2185898 | Loss: 0.413 | 598 ms/step , 115391.39 GFLOP/s , 173327.5 tokens/s INFO:__main__:2024-11-30 14:17:32 | Epoch: 1 | Step: 369790 | Dataset: 0-2188298 | Loss: 0.430 | 599 ms/step , 115144.00 GFLOP/s , 173131.0 tokens/s INFO:__main__:2024-11-30 14:17:39 | Epoch: 1 | Step: 369800 | Dataset: 0-2190698 | Loss: 0.553 | 598 ms/step , 115392.01 GFLOP/s , 173154.2 tokens/s INFO:__main__:2024-11-30 14:17:46 | Epoch: 1 
| Step: 369810 | Dataset: 0-2193098 | Loss: 0.627 | 599 ms/step , 115300.63 GFLOP/s , 173206.5 tokens/s INFO:__main__:2024-11-30 14:17:53 | Epoch: 1 | Step: 369820 | Dataset: 0-2195498 | Loss: 0.717 | 598 ms/step , 115380.03 GFLOP/s , 173307.8 tokens/s INFO:__main__:2024-11-30 14:18:00 | Epoch: 1 | Step: 369830 | Dataset: 0-2197898 | Loss: 0.672 | 599 ms/step , 115272.32 GFLOP/s , 173257.2 tokens/s INFO:__main__:2024-11-30 14:18:07 | Epoch: 1 | Step: 369840 | Dataset: 0-2200298 | Loss: 0.675 | 598 ms/step , 115386.10 GFLOP/s , 173292.6 tokens/s INFO:__main__:2024-11-30 14:18:15 | Epoch: 1 | Step: 369850 | Dataset: 0-2202698 | Loss: 0.618 | 599 ms/step , 115198.67 GFLOP/s , 173284.8 tokens/s INFO:__main__:2024-11-30 14:18:22 | Epoch: 1 | Step: 369860 | Dataset: 0-2205098 | Loss: 0.607 | 598 ms/step , 115373.64 GFLOP/s , 173255.8 tokens/s INFO:__main__:2024-11-30 14:18:29 | Epoch: 1 | Step: 369870 | Dataset: 0-2207498 | Loss: 0.646 | 598 ms/step , 115341.12 GFLOP/s , 173269.0 tokens/s INFO:__main__:2024-11-30 14:18:36 | Epoch: 1 | Step: 369880 | Dataset: 0-2209898 | Loss: 0.674 | 598 ms/step , 115468.64 GFLOP/s , 173256.2 tokens/s INFO:__main__:2024-11-30 14:18:43 | Epoch: 1 | Step: 369890 | Dataset: 0-2212298 | Loss: 0.679 | 598 ms/step , 115389.74 GFLOP/s , 173274.8 tokens/s INFO:__main__:2024-11-30 14:18:50 | Epoch: 1 | Step: 369900 | Dataset: 0-2214698 | Loss: 0.576 | 598 ms/step , 115460.41 GFLOP/s , 173314.3 tokens/s INFO:__main__:2024-11-30 14:18:57 | Epoch: 1 | Step: 369910 | Dataset: 0-2217098 | Loss: 0.645 | 598 ms/step , 115316.27 GFLOP/s , 173276.4 tokens/s INFO:__main__:2024-11-30 14:19:04 | Epoch: 1 | Step: 369920 | Dataset: 0-2219498 | Loss: 0.722 | 598 ms/step , 115362.27 GFLOP/s , 173238.4 tokens/s INFO:__main__:2024-11-30 14:19:11 | Epoch: 1 | Step: 369930 | Dataset: 0-2221898 | Loss: 0.618 | 598 ms/step , 115368.65 GFLOP/s , 173252.4 tokens/s INFO:__main__:2024-11-30 14:19:18 | Epoch: 1 | Step: 369940 | Dataset: 0-2224298 | Loss: 0.671 | 598 
ms/step , 115382.44 GFLOP/s , 173204.9 tokens/s INFO:__main__:2024-11-30 14:19:25 | Epoch: 1 | Step: 369950 | Dataset: 0-2226698 | Loss: 0.603 | 598 ms/step , 115489.26 GFLOP/s , 173195.2 tokens/s INFO:__main__:2024-11-30 14:19:33 | Epoch: 1 | Step: 369960 | Dataset: 0-2229098 | Loss: 0.588 | 598 ms/step , 115429.66 GFLOP/s , 173076.9 tokens/s INFO:__main__:2024-11-30 14:19:40 | Epoch: 1 | Step: 369970 | Dataset: 0-2231498 | Loss: 0.555 | 597 ms/step , 115548.47 GFLOP/s , 173328.3 tokens/s INFO:__main__:2024-11-30 14:19:47 | Epoch: 1 | Step: 369980 | Dataset: 0-2233898 | Loss: 0.537 | 598 ms/step , 115470.42 GFLOP/s , 173358.5 tokens/s INFO:__main__:2024-11-30 14:19:54 | Epoch: 1 | Step: 369990 | Dataset: 0-2236298 | Loss: 0.546 | 598 ms/step , 115487.97 GFLOP/s , 173248.6 tokens/s INFO:__main__:2024-11-30 14:20:01 | Validation | Step: 370000 | Val_loss: 0.785 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 14:20:01 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_142001_step_370000.pt` INFO:__main__:2024-11-30 14:20:04 | Epoch: 1 | Step: 370000 | Dataset: 0-2238698 | Loss: 0.540 | 595 ms/step , 116065.14 GFLOP/s , 119943.9 tokens/s INFO:__main__:2024-11-30 14:20:11 | Epoch: 1 | Step: 370010 | Dataset: 0-2241098 | Loss: 0.575 | 598 ms/step , 115459.98 GFLOP/s , 173450.8 tokens/s INFO:__main__:2024-11-30 14:20:18 | Epoch: 1 | Step: 370020 | Dataset: 0-2243498 | Loss: 0.539 | 598 ms/step , 115388.30 GFLOP/s , 173482.3 tokens/s INFO:__main__:2024-11-30 14:20:25 | Epoch: 1 | Step: 370030 | Dataset: 0-2245898 | Loss: 0.257 | 597 ms/step , 115530.32 GFLOP/s , 173370.2 tokens/s INFO:__main__:2024-11-30 14:20:32 | Epoch: 1 | Step: 370040 | Dataset: 0-2248298 | Loss: 0.254 | 597 ms/step , 115522.91 GFLOP/s , 173368.8 tokens/s INFO:__main__:2024-11-30 14:20:40 | Epoch: 1 | Step: 370050 | Dataset: 0-2250698 | Loss: 0.241 | 597 ms/step , 115525.59 GFLOP/s , 173496.6 tokens/s INFO:__main__:2024-11-30 14:20:47 | Epoch: 1 | Step: 370060 | 
Dataset: 0-2253098 | Loss: 0.234 | 597 ms/step , 115506.93 GFLOP/s , 173410.2 tokens/s INFO:__main__:2024-11-30 14:20:54 | Epoch: 1 | Step: 370070 | Dataset: 0-2255498 | Loss: 0.220 | 598 ms/step , 115468.55 GFLOP/s , 173355.7 tokens/s INFO:__main__:2024-11-30 14:21:01 | Epoch: 1 | Step: 370080 | Dataset: 0-2257898 | Loss: 0.206 | 598 ms/step , 115475.39 GFLOP/s , 173383.8 tokens/s INFO:__main__:2024-11-30 14:21:08 | Epoch: 1 | Step: 370090 | Dataset: 0-2260298 | Loss: 0.211 | 597 ms/step , 115563.70 GFLOP/s , 173354.3 tokens/s INFO:__main__:2024-11-30 14:21:15 | Epoch: 1 | Step: 370100 | Dataset: 0-2262698 | Loss: 0.205 | 598 ms/step , 115465.22 GFLOP/s , 173364.7 tokens/s INFO:__main__:2024-11-30 14:21:22 | Epoch: 1 | Step: 370110 | Dataset: 0-2265098 | Loss: 0.216 | 598 ms/step , 115432.16 GFLOP/s , 173355.7 tokens/s INFO:__main__:2024-11-30 14:21:29 | Epoch: 1 | Step: 370120 | Dataset: 0-2267498 | Loss: 0.213 | 598 ms/step , 115453.22 GFLOP/s , 173340.5 tokens/s INFO:__main__:2024-11-30 14:21:36 | Epoch: 1 | Step: 370130 | Dataset: 0-2269898 | Loss: 0.206 | 598 ms/step , 115438.19 GFLOP/s , 173374.0 tokens/s INFO:__main__:2024-11-30 14:21:43 | Epoch: 1 | Step: 370140 | Dataset: 0-2272298 | Loss: 0.199 | 598 ms/step , 115452.15 GFLOP/s , 173267.5 tokens/s INFO:__main__:2024-11-30 14:21:50 | Epoch: 1 | Step: 370150 | Dataset: 0-2274698 | Loss: 0.211 | 598 ms/step , 115465.57 GFLOP/s , 173309.9 tokens/s INFO:__main__:2024-11-30 14:21:57 | Epoch: 1 | Step: 370160 | Dataset: 0-2277098 | Loss: 0.206 | 597 ms/step , 115542.08 GFLOP/s , 173266.8 tokens/s INFO:__main__:2024-11-30 14:22:05 | Epoch: 1 | Step: 370170 | Dataset: 0-2279498 | Loss: 0.210 | 598 ms/step , 115442.01 GFLOP/s , 173314.9 tokens/s INFO:__main__:2024-11-30 14:22:12 | Epoch: 1 | Step: 370180 | Dataset: 0-2281898 | Loss: 0.201 | 597 ms/step , 115531.35 GFLOP/s , 173371.2 tokens/s INFO:__main__:2024-11-30 14:22:19 | Epoch: 1 | Step: 370190 | Dataset: 0-2284298 | Loss: 0.208 | 598 ms/step , 115436.38 
GFLOP/s , 173311.9 tokens/s INFO:__main__:2024-11-30 14:22:26 | Epoch: 1 | Step: 370200 | Dataset: 0-2286698 | Loss: 0.204 | 598 ms/step , 115348.48 GFLOP/s , 173349.0 tokens/s INFO:__main__:2024-11-30 14:22:33 | Epoch: 1 | Step: 370210 | Dataset: 0-2289098 | Loss: 0.207 | 597 ms/step , 115522.48 GFLOP/s , 173311.7 tokens/s INFO:__main__:2024-11-30 14:22:40 | Epoch: 1 | Step: 370220 | Dataset: 0-2291498 | Loss: 0.206 | 598 ms/step , 115438.91 GFLOP/s , 173114.0 tokens/s INFO:__main__:2024-11-30 14:22:47 | Epoch: 1 | Step: 370230 | Dataset: 0-2293898 | Loss: 0.199 | 598 ms/step , 115419.03 GFLOP/s , 173338.5 tokens/s INFO:__main__:2024-11-30 14:22:54 | Epoch: 1 | Step: 370240 | Dataset: 0-2296298 | Loss: 1.298 | 598 ms/step , 115349.61 GFLOP/s , 173282.9 tokens/s INFO:__main__:2024-11-30 14:23:01 | Epoch: 1 | Step: 370250 | Dataset: 0-2298698 | Loss: 0.418 | 597 ms/step , 115517.31 GFLOP/s , 173115.7 tokens/s INFO:__main__:2024-11-30 14:23:08 | Epoch: 1 | Step: 370260 | Dataset: 0-2301098 | Loss: 0.413 | 597 ms/step , 115505.58 GFLOP/s , 173275.2 tokens/s INFO:__main__:2024-11-30 14:23:15 | Epoch: 1 | Step: 370270 | Dataset: 0-2303498 | Loss: 0.440 | 598 ms/step , 115407.71 GFLOP/s , 173251.7 tokens/s INFO:__main__:2024-11-30 14:23:23 | Epoch: 1 | Step: 370280 | Dataset: 0-2305898 | Loss: 0.346 | 598 ms/step , 115419.88 GFLOP/s , 173182.6 tokens/s INFO:__main__:2024-11-30 14:23:30 | Epoch: 1 | Step: 370290 | Dataset: 0-2308298 | Loss: 0.421 | 597 ms/step , 115511.94 GFLOP/s , 173286.6 tokens/s INFO:__main__:2024-11-30 14:23:37 | Epoch: 1 | Step: 370300 | Dataset: 0-2310698 | Loss: 0.391 | 598 ms/step , 115363.06 GFLOP/s , 173345.3 tokens/s INFO:__main__:2024-11-30 14:23:44 | Epoch: 1 | Step: 370310 | Dataset: 0-2313098 | Loss: 0.470 | 597 ms/step , 115533.84 GFLOP/s , 173231.5 tokens/s INFO:__main__:2024-11-30 14:23:51 | Epoch: 1 | Step: 370320 | Dataset: 0-2315498 | Loss: 0.404 | 598 ms/step , 115386.02 GFLOP/s , 173265.5 tokens/s INFO:__main__:2024-11-30 14:23:58 
| Epoch: 1 | Step: 370330 | Dataset: 0-2317898 | Loss: 0.413 | 598 ms/step , 115452.40 GFLOP/s , 173266.3 tokens/s INFO:__main__:2024-11-30 14:24:05 | Epoch: 1 | Step: 370340 | Dataset: 0-2320298 | Loss: 0.430 | 598 ms/step , 115337.41 GFLOP/s , 173222.1 tokens/s INFO:__main__:2024-11-30 14:24:12 | Epoch: 1 | Step: 370350 | Dataset: 0-2322698 | Loss: 0.410 | 598 ms/step , 115332.89 GFLOP/s , 173259.6 tokens/s INFO:__main__:2024-11-30 14:24:19 | Epoch: 1 | Step: 370360 | Dataset: 0-2325098 | Loss: 0.399 | 598 ms/step , 115309.46 GFLOP/s , 173287.0 tokens/s INFO:__main__:2024-11-30 14:24:26 | Epoch: 1 | Step: 370370 | Dataset: 0-2327498 | Loss: 0.416 | 598 ms/step , 115383.08 GFLOP/s , 173207.1 tokens/s INFO:__main__:2024-11-30 14:24:34 | Epoch: 1 | Step: 370380 | Dataset: 0-2329898 | Loss: 0.416 | 598 ms/step , 115463.25 GFLOP/s , 173242.8 tokens/s INFO:__main__:2024-11-30 14:24:41 | Epoch: 1 | Step: 370390 | Dataset: 0-2332298 | Loss: 0.367 | 598 ms/step , 115363.74 GFLOP/s , 173273.5 tokens/s INFO:__main__:2024-11-30 14:24:48 | Epoch: 1 | Step: 370400 | Dataset: 0-2334698 | Loss: 0.377 | 598 ms/step , 115375.76 GFLOP/s , 173294.2 tokens/s INFO:__main__:2024-11-30 14:24:55 | Epoch: 1 | Step: 370410 | Dataset: 0-2337098 | Loss: 0.393 | 598 ms/step , 115337.58 GFLOP/s , 173210.7 tokens/s INFO:__main__:2024-11-30 14:25:02 | Epoch: 1 | Step: 370420 | Dataset: 0-2339498 | Loss: 0.344 | 598 ms/step , 115399.95 GFLOP/s , 173343.7 tokens/s INFO:__main__:2024-11-30 14:25:09 | Epoch: 1 | Step: 370430 | Dataset: 0-2341898 | Loss: 0.358 | 598 ms/step , 115466.88 GFLOP/s , 173224.1 tokens/s INFO:__main__:2024-11-30 14:25:16 | Epoch: 1 | Step: 370440 | Dataset: 0-2344298 | Loss: 0.419 | 598 ms/step , 115446.12 GFLOP/s , 173255.6 tokens/s INFO:__main__:2024-11-30 14:25:23 | Epoch: 1 | Step: 370450 | Dataset: 0-2346698 | Loss: 0.382 | 598 ms/step , 115383.83 GFLOP/s , 173229.7 tokens/s INFO:__main__:2024-11-30 14:25:30 | Epoch: 1 | Step: 370460 | Dataset: 0-2349098 | Loss: 0.415 | 
598 ms/step , 115442.24 GFLOP/s , 173223.7 tokens/s INFO:__main__:2024-11-30 14:25:37 | Epoch: 1 | Step: 370470 | Dataset: 0-2351498 | Loss: 0.408 | 598 ms/step , 115448.26 GFLOP/s , 173273.5 tokens/s INFO:__main__:2024-11-30 14:25:44 | Epoch: 1 | Step: 370480 | Dataset: 0-2353898 | Loss: 0.442 | 598 ms/step , 115370.70 GFLOP/s , 173254.4 tokens/s INFO:__main__:2024-11-30 14:25:52 | Epoch: 1 | Step: 370490 | Dataset: 0-2356298 | Loss: 0.451 | 598 ms/step , 115388.65 GFLOP/s , 173275.2 tokens/s INFO:__main__:2024-11-30 14:25:59 | Validation | Step: 370500 | Val_loss: 0.806 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 14:26:00 | Epoch: 1 | Step: 370500 | Dataset: 0-2358698 | Loss: 0.394 | 597 ms/step , 115688.92 GFLOP/s , 147570.2 tokens/s INFO:__main__:2024-11-30 14:26:07 | Epoch: 1 | Step: 370510 | Dataset: 0-2361098 | Loss: 0.793 | 598 ms/step , 115322.38 GFLOP/s , 173404.4 tokens/s INFO:__main__:2024-11-30 14:26:14 | Epoch: 1 | Step: 370520 | Dataset: 0-2363498 | Loss: 0.780 | 598 ms/step , 115326.04 GFLOP/s , 173332.5 tokens/s INFO:__main__:2024-11-30 14:26:21 | Epoch: 1 | Step: 370530 | Dataset: 0-2365898 | Loss: 0.691 | 599 ms/step , 115304.94 GFLOP/s , 173309.0 tokens/s INFO:__main__:2024-11-30 14:26:28 | Epoch: 1 | Step: 370540 | Dataset: 0-2368298 | Loss: 0.685 | 597 ms/step , 115539.34 GFLOP/s , 173274.2 tokens/s INFO:__main__:2024-11-30 14:26:35 | Epoch: 1 | Step: 370550 | Dataset: 0-2370698 | Loss: 0.692 | 598 ms/step , 115324.76 GFLOP/s , 173296.1 tokens/s INFO:__main__:2024-11-30 14:26:42 | Epoch: 1 | Step: 370560 | Dataset: 0-2373098 | Loss: 0.715 | 598 ms/step , 115475.72 GFLOP/s , 173306.1 tokens/s INFO:__main__:2024-11-30 14:26:49 | Epoch: 1 | Step: 370570 | Dataset: 0-2375498 | Loss: 0.636 | 598 ms/step , 115395.06 GFLOP/s , 173267.6 tokens/s INFO:__main__:2024-11-30 14:26:57 | Epoch: 1 | Step: 370580 | Dataset: 0-2377898 | Loss: 0.685 | 599 ms/step , 115225.18 GFLOP/s , 173157.5 tokens/s INFO:__main__:2024-11-30 14:27:04 | Epoch: 1 | Step: 
370590 | Dataset: 0-2380298 | Loss: 0.653 | 598 ms/step , 115428.21 GFLOP/s , 173264.1 tokens/s INFO:__main__:2024-11-30 14:27:11 | Epoch: 1 | Step: 370600 | Dataset: 0-2382698 | Loss: 0.672 | 598 ms/step , 115351.73 GFLOP/s , 173293.6 tokens/s INFO:__main__:2024-11-30 14:27:18 | Epoch: 1 | Step: 370610 | Dataset: 0-2385098 | Loss: 0.717 | 598 ms/step , 115361.91 GFLOP/s , 173268.9 tokens/s INFO:__main__:2024-11-30 14:27:25 | Epoch: 1 | Step: 370620 | Dataset: 0-2387498 | Loss: 0.675 | 599 ms/step , 115274.28 GFLOP/s , 173227.6 tokens/s INFO:__main__:2024-11-30 14:27:32 | Epoch: 1 | Step: 370630 | Dataset: 0-2389898 | Loss: 0.666 | 598 ms/step , 115435.76 GFLOP/s , 173224.7 tokens/s INFO:__main__:2024-11-30 14:27:39 | Epoch: 1 | Step: 370640 | Dataset: 0-2392298 | Loss: 0.671 | 598 ms/step , 115363.13 GFLOP/s , 173277.7 tokens/s INFO:__main__:2024-11-30 14:27:46 | Epoch: 1 | Step: 370650 | Dataset: 0-2394698 | Loss: 0.682 | 598 ms/step , 115372.75 GFLOP/s , 173265.9 tokens/s INFO:__main__:2024-11-30 14:27:53 | Epoch: 1 | Step: 370660 | Dataset: 0-2397098 | Loss: 0.671 | 599 ms/step , 115276.65 GFLOP/s , 173206.9 tokens/s INFO:__main__:2024-11-30 14:28:00 | Epoch: 1 | Step: 370670 | Dataset: 0-2399498 | Loss: 0.663 | 599 ms/step , 115280.35 GFLOP/s , 173258.4 tokens/s INFO:__main__:2024-11-30 14:28:07 | Epoch: 1 | Step: 370680 | Dataset: 0-2401898 | Loss: 0.718 | 599 ms/step , 115161.66 GFLOP/s , 173217.6 tokens/s INFO:__main__:2024-11-30 14:28:15 | Epoch: 1 | Step: 370690 | Dataset: 0-2404298 | Loss: 0.720 | 598 ms/step , 115361.75 GFLOP/s , 173355.7 tokens/s INFO:__main__:2024-11-30 14:28:22 | Epoch: 1 | Step: 370700 | Dataset: 0-2406698 | Loss: 0.719 | 599 ms/step , 115121.66 GFLOP/s , 173324.3 tokens/s INFO:__main__:2024-11-30 14:28:29 | Epoch: 1 | Step: 370710 | Dataset: 0-2409098 | Loss: 0.712 | 598 ms/step , 115316.35 GFLOP/s , 173329.7 tokens/s INFO:__main__:2024-11-30 14:28:36 | Epoch: 1 | Step: 370720 | Dataset: 0-2411498 | Loss: 0.698 | 598 ms/step , 
115314.69 GFLOP/s , 173289.5 tokens/s INFO:__main__:2024-11-30 14:28:43 | Epoch: 1 | Step: 370730 | Dataset: 0-2413898 | Loss: 0.697 | 599 ms/step , 115302.29 GFLOP/s , 173241.0 tokens/s INFO:__main__:2024-11-30 14:28:50 | Epoch: 1 | Step: 370740 | Dataset: 0-2416298 | Loss: 0.726 | 598 ms/step , 115490.25 GFLOP/s , 173257.1 tokens/s INFO:__main__:2024-11-30 14:28:57 | Epoch: 1 | Step: 370750 | Dataset: 0-2418698 | Loss: 0.699 | 598 ms/step , 115393.24 GFLOP/s , 173232.5 tokens/s INFO:__main__:2024-11-30 14:29:04 | Epoch: 1 | Step: 370760 | Dataset: 0-2421098 | Loss: 0.702 | 598 ms/step , 115367.08 GFLOP/s , 173339.3 tokens/s INFO:__main__:2024-11-30 14:29:11 | Epoch: 1 | Step: 370770 | Dataset: 0-2423498 | Loss: 0.708 | 598 ms/step , 115419.48 GFLOP/s , 173334.8 tokens/s INFO:__main__:2024-11-30 14:29:18 | Epoch: 1 | Step: 370780 | Dataset: 0-2425898 | Loss: 0.714 | 599 ms/step , 115216.27 GFLOP/s , 173281.8 tokens/s INFO:__main__:2024-11-30 14:29:25 | Epoch: 1 | Step: 370790 | Dataset: 0-2428298 | Loss: 0.700 | 599 ms/step , 115206.82 GFLOP/s , 173283.9 tokens/s INFO:__main__:2024-11-30 14:29:33 | Epoch: 1 | Step: 370800 | Dataset: 0-2430698 | Loss: 0.678 | 599 ms/step , 115290.26 GFLOP/s , 173273.9 tokens/s INFO:__main__:2024-11-30 14:29:40 | Epoch: 1 | Step: 370810 | Dataset: 0-2433098 | Loss: 0.695 | 598 ms/step , 115330.65 GFLOP/s , 173249.0 tokens/s INFO:__main__:2024-11-30 14:29:47 | Epoch: 1 | Step: 370820 | Dataset: 0-2435498 | Loss: 0.623 | 599 ms/step , 115267.91 GFLOP/s , 173286.2 tokens/s INFO:__main__:2024-11-30 14:29:54 | Epoch: 1 | Step: 370830 | Dataset: 0-2437898 | Loss: 0.692 | 598 ms/step , 115439.25 GFLOP/s , 173268.8 tokens/s INFO:__main__:2024-11-30 14:30:01 | Epoch: 1 | Step: 370840 | Dataset: 0-2440298 | Loss: 0.677 | 598 ms/step , 115358.90 GFLOP/s , 173264.1 tokens/s INFO:__main__:2024-11-30 14:30:08 | Epoch: 1 | Step: 370850 | Dataset: 0-2442698 | Loss: 0.652 | 599 ms/step , 115265.42 GFLOP/s , 173248.2 tokens/s INFO:__main__:2024-11-30 
14:30:15 | Epoch: 1 | Step: 370860 | Dataset: 0-2445098 | Loss: 0.660 | 599 ms/step , 115296.68 GFLOP/s , 173118.4 tokens/s INFO:__main__:2024-11-30 14:30:22 | Epoch: 1 | Step: 370870 | Dataset: 0-2447498 | Loss: 0.648 | 598 ms/step , 115406.82 GFLOP/s , 173319.5 tokens/s INFO:__main__:2024-11-30 14:30:29 | Epoch: 1 | Step: 370880 | Dataset: 0-2449898 | Loss: 0.673 | 598 ms/step , 115364.94 GFLOP/s , 173283.5 tokens/s INFO:__main__:2024-11-30 14:30:36 | Epoch: 1 | Step: 370890 | Dataset: 0-2452298 | Loss: 0.635 | 598 ms/step , 115421.13 GFLOP/s , 173273.5 tokens/s INFO:__main__:2024-11-30 14:30:44 | Epoch: 1 | Step: 370900 | Dataset: 0-2454698 | Loss: 0.635 | 598 ms/step , 115325.47 GFLOP/s , 173262.4 tokens/s INFO:__main__:2024-11-30 14:30:51 | Epoch: 1 | Step: 370910 | Dataset: 0-2457098 | Loss: 0.693 | 598 ms/step , 115327.80 GFLOP/s , 173131.1 tokens/s INFO:__main__:2024-11-30 14:30:58 | Epoch: 1 | Step: 370920 | Dataset: 0-2459498 | Loss: 0.671 | 599 ms/step , 115284.46 GFLOP/s , 173282.4 tokens/s INFO:__main__:2024-11-30 14:31:05 | Epoch: 1 | Step: 370930 | Dataset: 0-2461898 | Loss: 0.681 | 599 ms/step , 115275.08 GFLOP/s , 173262.3 tokens/s INFO:__main__:2024-11-30 14:31:12 | Epoch: 1 | Step: 370940 | Dataset: 0-2464298 | Loss: 0.627 | 598 ms/step , 115373.86 GFLOP/s , 173253.3 tokens/s INFO:__main__:2024-11-30 14:31:19 | Epoch: 1 | Step: 370950 | Dataset: 0-2466698 | Loss: 0.638 | 598 ms/step , 115479.54 GFLOP/s , 173380.8 tokens/s INFO:__main__:2024-11-30 14:31:26 | Epoch: 1 | Step: 370960 | Dataset: 0-2469098 | Loss: 0.714 | 598 ms/step , 115385.82 GFLOP/s , 173248.9 tokens/s INFO:__main__:2024-11-30 14:31:33 | Epoch: 1 | Step: 370970 | Dataset: 0-2471498 | Loss: 0.674 | 599 ms/step , 115241.78 GFLOP/s , 173322.0 tokens/s INFO:__main__:2024-11-30 14:31:40 | Epoch: 1 | Step: 370980 | Dataset: 0-2473898 | Loss: 0.630 | 599 ms/step , 115260.27 GFLOP/s , 173279.1 tokens/s INFO:__main__:2024-11-30 14:31:47 | Epoch: 1 | Step: 370990 | Dataset: 0-2476298 | 
Loss: 0.704 | 598 ms/step , 115453.63 GFLOP/s , 173286.8 tokens/s INFO:__main__:2024-11-30 14:31:55 | Validation | Step: 371000 | Val_loss: 0.851 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 14:31:55 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_143155_step_371000.pt` INFO:__main__:2024-11-30 14:31:58 | Epoch: 1 | Step: 371000 | Dataset: 0-2478698 | Loss: 0.764 | 594 ms/step , 116126.74 GFLOP/s , 119475.6 tokens/s INFO:__main__:2024-11-30 14:32:05 | Epoch: 1 | Step: 371010 | Dataset: 0-2481098 | Loss: 0.677 | 598 ms/step , 115468.31 GFLOP/s , 173499.0 tokens/s INFO:__main__:2024-11-30 14:32:12 | Epoch: 1 | Step: 371020 | Dataset: 0-2483498 | Loss: 0.674 | 598 ms/step , 115379.94 GFLOP/s , 173410.0 tokens/s INFO:__main__:2024-11-30 14:32:19 | Epoch: 1 | Step: 371030 | Dataset: 0-2485898 | Loss: 0.684 | 598 ms/step , 115359.87 GFLOP/s , 173383.5 tokens/s INFO:__main__:2024-11-30 14:32:26 | Epoch: 1 | Step: 371040 | Dataset: 0-2488298 | Loss: 0.679 | 598 ms/step , 115446.35 GFLOP/s , 173344.8 tokens/s INFO:__main__:2024-11-30 14:32:33 | Epoch: 1 | Step: 371050 | Dataset: 0-2490698 | Loss: 0.649 | 598 ms/step , 115330.51 GFLOP/s , 173335.8 tokens/s INFO:__main__:2024-11-30 14:32:40 | Epoch: 1 | Step: 371060 | Dataset: 0-2493098 | Loss: 0.572 | 598 ms/step , 115387.97 GFLOP/s , 173284.6 tokens/s INFO:__main__:2024-11-30 14:32:47 | Epoch: 1 | Step: 371070 | Dataset: 0-2495498 | Loss: 0.564 | 598 ms/step , 115353.03 GFLOP/s , 173360.5 tokens/s INFO:__main__:2024-11-30 14:32:54 | Epoch: 1 | Step: 371080 | Dataset: 0-2497898 | Loss: 0.567 | 598 ms/step , 115413.96 GFLOP/s , 173344.2 tokens/s INFO:__main__:2024-11-30 14:33:01 | Epoch: 1 | Step: 371090 | Dataset: 0-2500298 | Loss: 0.557 | 598 ms/step , 115325.91 GFLOP/s , 173285.7 tokens/s INFO:__main__:2024-11-30 14:33:09 | Epoch: 1 | Step: 371100 | Dataset: 0-2502698 | Loss: 0.509 | 598 ms/step , 115467.32 GFLOP/s , 173370.6 tokens/s INFO:__main__:2024-11-30 14:33:16 | Epoch: 1 | 
Step: 371110 | Dataset: 0-2505098 | Loss: 0.572 | 598 ms/step , 115388.97 GFLOP/s , 173325.9 tokens/s INFO:__main__:2024-11-30 14:33:23 | Epoch: 1 | Step: 371120 | Dataset: 0-2507498 | Loss: 0.513 | 597 ms/step , 115521.36 GFLOP/s , 173278.7 tokens/s INFO:__main__:2024-11-30 14:33:30 | Epoch: 1 | Step: 371130 | Dataset: 0-2509898 | Loss: 0.515 | 598 ms/step , 115353.20 GFLOP/s , 173325.5 tokens/s INFO:__main__:2024-11-30 14:33:37 | Epoch: 1 | Step: 371140 | Dataset: 0-2512298 | Loss: 0.548 | 598 ms/step , 115392.78 GFLOP/s , 173336.0 tokens/s INFO:__main__:2024-11-30 14:33:44 | Epoch: 1 | Step: 371150 | Dataset: 0-2514698 | Loss: 0.640 | 598 ms/step , 115359.39 GFLOP/s , 173347.7 tokens/s INFO:__main__:2024-11-30 14:33:51 | Epoch: 1 | Step: 371160 | Dataset: 0-2517098 | Loss: 0.553 | 602 ms/step , 114698.88 GFLOP/s , 173246.8 tokens/s INFO:__main__:2024-11-30 14:33:58 | Epoch: 1 | Step: 371170 | Dataset: 0-2519498 | Loss: 0.539 | 598 ms/step , 115317.65 GFLOP/s , 173365.4 tokens/s INFO:__main__:2024-11-30 14:34:05 | Epoch: 1 | Step: 371180 | Dataset: 0-2521898 | Loss: 0.520 | 598 ms/step , 115322.48 GFLOP/s , 173345.8 tokens/s INFO:__main__:2024-11-30 14:34:12 | Epoch: 1 | Step: 371190 | Dataset: 0-2524298 | Loss: 0.548 | 599 ms/step , 115272.04 GFLOP/s , 173257.6 tokens/s INFO:__main__:2024-11-30 14:34:19 | Epoch: 1 | Step: 371200 | Dataset: 0-2526698 | Loss: 0.487 | 598 ms/step , 115481.71 GFLOP/s , 173262.4 tokens/s INFO:__main__:2024-11-30 14:34:26 | Epoch: 1 | Step: 371210 | Dataset: 0-2529098 | Loss: 0.527 | 599 ms/step , 115261.80 GFLOP/s , 173300.8 tokens/s INFO:__main__:2024-11-30 14:34:34 | Epoch: 1 | Step: 371220 | Dataset: 0-2531498 | Loss: 0.559 | 597 ms/step , 115538.96 GFLOP/s , 173302.8 tokens/s INFO:__main__:2024-11-30 14:34:41 | Epoch: 1 | Step: 371230 | Dataset: 0-2533898 | Loss: 0.497 | 598 ms/step , 115429.92 GFLOP/s , 173292.2 tokens/s INFO:__main__:2024-11-30 14:34:48 | Epoch: 1 | Step: 371240 | Dataset: 0-2536298 | Loss: 0.545 | 599 ms/step 
, 115289.26 GFLOP/s , 173274.0 tokens/s INFO:__main__:2024-11-30 14:34:55 | Epoch: 1 | Step: 371250 | Dataset: 0-2538698 | Loss: 0.537 | 598 ms/step , 115396.79 GFLOP/s , 173262.7 tokens/s INFO:__main__:2024-11-30 14:35:02 | Epoch: 1 | Step: 371260 | Dataset: 0-2541098 | Loss: 0.556 | 599 ms/step , 115303.99 GFLOP/s , 173311.7 tokens/s INFO:__main__:2024-11-30 14:35:09 | Epoch: 1 | Step: 371270 | Dataset: 0-2543498 | Loss: 0.540 | 598 ms/step , 115407.83 GFLOP/s , 173191.4 tokens/s INFO:__main__:2024-11-30 14:35:16 | Epoch: 1 | Step: 371280 | Dataset: 0-2545898 | Loss: 0.578 | 598 ms/step , 115316.26 GFLOP/s , 173348.1 tokens/s INFO:__main__:2024-11-30 14:35:23 | Epoch: 1 | Step: 371290 | Dataset: 0-2548298 | Loss: 0.578 | 598 ms/step , 115312.40 GFLOP/s , 173285.9 tokens/s INFO:__main__:2024-11-30 14:35:30 | Epoch: 1 | Step: 371300 | Dataset: 0-2550698 | Loss: 0.546 | 597 ms/step , 115533.64 GFLOP/s , 173300.2 tokens/s INFO:__main__:2024-11-30 14:35:37 | Epoch: 1 | Step: 371310 | Dataset: 0-2553098 | Loss: 0.516 | 598 ms/step , 115500.13 GFLOP/s , 173315.2 tokens/s INFO:__main__:2024-11-30 14:35:44 | Epoch: 1 | Step: 371320 | Dataset: 0-2555498 | Loss: 0.553 | 599 ms/step , 115210.63 GFLOP/s , 173194.3 tokens/s INFO:__main__:2024-11-30 14:35:52 | Epoch: 1 | Step: 371330 | Dataset: 0-2557898 | Loss: 0.528 | 598 ms/step , 115430.48 GFLOP/s , 173322.0 tokens/s INFO:__main__:2024-11-30 14:35:59 | Epoch: 1 | Step: 371340 | Dataset: 0-2560298 | Loss: 0.511 | 598 ms/step , 115447.41 GFLOP/s , 173229.1 tokens/s INFO:__main__:2024-11-30 14:36:06 | Epoch: 1 | Step: 371350 | Dataset: 0-2562698 | Loss: 0.559 | 598 ms/step , 115370.89 GFLOP/s , 173204.6 tokens/s INFO:__main__:2024-11-30 14:36:13 | Epoch: 1 | Step: 371360 | Dataset: 0-2565098 | Loss: 0.560 | 598 ms/step , 115311.39 GFLOP/s , 173296.3 tokens/s INFO:__main__:2024-11-30 14:36:20 | Epoch: 1 | Step: 371370 | Dataset: 0-2567498 | Loss: 0.504 | 598 ms/step , 115419.24 GFLOP/s , 173271.7 tokens/s 
INFO:__main__:2024-11-30 14:36:27 | Epoch: 1 | Step: 371380 | Dataset: 0-2569898 | Loss: 0.550 | 599 ms/step , 115296.52 GFLOP/s , 173221.2 tokens/s INFO:__main__:2024-11-30 14:36:34 | Epoch: 1 | Step: 371390 | Dataset: 0-2572298 | Loss: 0.521 | 597 ms/step , 115567.18 GFLOP/s , 173311.6 tokens/s INFO:__main__:2024-11-30 14:36:41 | Epoch: 1 | Step: 371400 | Dataset: 0-2574698 | Loss: 0.529 | 598 ms/step , 115310.56 GFLOP/s , 173324.8 tokens/s INFO:__main__:2024-11-30 14:36:48 | Epoch: 1 | Step: 371410 | Dataset: 0-2577098 | Loss: 0.558 | 598 ms/step , 115440.34 GFLOP/s , 173293.3 tokens/s INFO:__main__:2024-11-30 14:36:55 | Epoch: 1 | Step: 371420 | Dataset: 0-2579498 | Loss: 0.531 | 598 ms/step , 115394.29 GFLOP/s , 173333.2 tokens/s INFO:__main__:2024-11-30 14:37:03 | Epoch: 1 | Step: 371430 | Dataset: 0-2581898 | Loss: 0.515 | 599 ms/step , 115206.39 GFLOP/s , 173324.1 tokens/s INFO:__main__:2024-11-30 14:37:10 | Epoch: 1 | Step: 371440 | Dataset: 0-2584298 | Loss: 0.538 | 598 ms/step , 115399.20 GFLOP/s , 173202.1 tokens/s INFO:__main__:2024-11-30 14:37:17 | Epoch: 1 | Step: 371450 | Dataset: 0-2586698 | Loss: 0.547 | 598 ms/step , 115377.58 GFLOP/s , 173199.4 tokens/s INFO:__main__:2024-11-30 14:37:24 | Epoch: 1 | Step: 371460 | Dataset: 0-2589098 | Loss: 0.554 | 598 ms/step , 115386.54 GFLOP/s , 173192.9 tokens/s INFO:__main__:2024-11-30 14:37:31 | Epoch: 1 | Step: 371470 | Dataset: 0-2591498 | Loss: 0.523 | 599 ms/step , 115284.58 GFLOP/s , 173205.7 tokens/s INFO:__main__:2024-11-30 14:37:38 | Epoch: 1 | Step: 371480 | Dataset: 0-2593898 | Loss: 0.527 | 598 ms/step , 115327.34 GFLOP/s , 173283.1 tokens/s INFO:__main__:2024-11-30 14:37:45 | Epoch: 1 | Step: 371490 | Dataset: 0-2596298 | Loss: 0.586 | 599 ms/step , 115302.39 GFLOP/s , 173266.8 tokens/s INFO:__main__:2024-11-30 14:37:53 | Validation | Step: 371500 | Val_loss: 0.804 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 14:37:53 | Epoch: 1 | Step: 371500 | Dataset: 0-2598698 | Loss: 0.560 | 598 
ms/step , 115366.52 GFLOP/s , 147473.3 tokens/s INFO:__main__:2024-11-30 14:38:00 | Epoch: 1 | Step: 371510 | Dataset: 0-2601098 | Loss: 0.560 | 599 ms/step , 115301.03 GFLOP/s , 173317.0 tokens/s INFO:__main__:2024-11-30 14:38:08 | Epoch: 1 | Step: 371520 | Dataset: 0-2603498 | Loss: 0.541 | 598 ms/step , 115378.42 GFLOP/s , 173270.8 tokens/s INFO:__main__:2024-11-30 14:38:15 | Epoch: 1 | Step: 371530 | Dataset: 0-2605898 | Loss: 0.522 | 598 ms/step , 115336.12 GFLOP/s , 173340.7 tokens/s INFO:__main__:2024-11-30 14:38:22 | Epoch: 1 | Step: 371540 | Dataset: 0-2608298 | Loss: 0.521 | 598 ms/step , 115314.50 GFLOP/s , 173331.5 tokens/s INFO:__main__:2024-11-30 14:38:29 | Epoch: 1 | Step: 371550 | Dataset: 0-2610698 | Loss: 0.563 | 598 ms/step , 115363.02 GFLOP/s , 173308.3 tokens/s INFO:__main__:2024-11-30 14:38:36 | Epoch: 1 | Step: 371560 | Dataset: 0-2613098 | Loss: 0.547 | 598 ms/step , 115417.87 GFLOP/s , 173295.6 tokens/s INFO:__main__:2024-11-30 14:38:43 | Epoch: 1 | Step: 371570 | Dataset: 0-2615498 | Loss: 0.445 | 599 ms/step , 115250.13 GFLOP/s , 173277.5 tokens/s INFO:__main__:2024-11-30 14:38:50 | Epoch: 1 | Step: 371580 | Dataset: 0-2617898 | Loss: 0.531 | 598 ms/step , 115389.54 GFLOP/s , 173251.4 tokens/s INFO:__main__:2024-11-30 14:38:57 | Epoch: 1 | Step: 371590 | Dataset: 0-2620298 | Loss: 0.522 | 601 ms/step , 114831.97 GFLOP/s , 173206.9 tokens/s INFO:__main__:2024-11-30 14:39:04 | Epoch: 1 | Step: 371600 | Dataset: 0-2622698 | Loss: 0.959 | 598 ms/step , 115438.81 GFLOP/s , 173280.0 tokens/s INFO:__main__:2024-11-30 14:39:11 | Epoch: 1 | Step: 371610 | Dataset: 0-2625098 | Loss: 0.866 | 598 ms/step , 115343.15 GFLOP/s , 173289.3 tokens/s INFO:__main__:2024-11-30 14:39:18 | Epoch: 1 | Step: 371620 | Dataset: 0-2627498 | Loss: 0.876 | 598 ms/step , 115463.72 GFLOP/s , 173277.1 tokens/s INFO:__main__:2024-11-30 14:39:26 | Epoch: 1 | Step: 371630 | Dataset: 0-2629898 | Loss: 0.985 | 598 ms/step , 115422.79 GFLOP/s , 173292.9 tokens/s 
INFO:__main__:2024-11-30 14:39:33 | Epoch: 1 | Step: 371640 | Dataset: 0-2632298 | Loss: 0.860 | 598 ms/step , 115362.23 GFLOP/s , 173351.2 tokens/s INFO:__main__:2024-11-30 14:39:40 | Epoch: 1 | Step: 371650 | Dataset: 0-2634698 | Loss: 0.800 | 598 ms/step , 115429.53 GFLOP/s , 173229.1 tokens/s INFO:__main__:2024-11-30 14:39:47 | Epoch: 1 | Step: 371660 | Dataset: 0-2637098 | Loss: 0.870 | 598 ms/step , 115473.93 GFLOP/s , 173255.6 tokens/s INFO:__main__:2024-11-30 14:39:54 | Epoch: 1 | Step: 371670 | Dataset: 0-2639498 | Loss: 0.840 | 598 ms/step , 115376.26 GFLOP/s , 173279.6 tokens/s INFO:__main__:2024-11-30 14:40:01 | Epoch: 1 | Step: 371680 | Dataset: 0-2641898 | Loss: 0.789 | 598 ms/step , 115441.95 GFLOP/s , 173232.2 tokens/s INFO:__main__:2024-11-30 14:40:08 | Epoch: 1 | Step: 371690 | Dataset: 0-2644298 | Loss: 0.881 | 598 ms/step , 115349.41 GFLOP/s , 173248.5 tokens/s INFO:__main__:2024-11-30 14:40:15 | Epoch: 1 | Step: 371700 | Dataset: 0-2646698 | Loss: 0.959 | 598 ms/step , 115455.29 GFLOP/s , 173347.0 tokens/s INFO:__main__:2024-11-30 14:40:22 | Epoch: 1 | Step: 371710 | Dataset: 0-2649098 | Loss: 0.806 | 598 ms/step , 115427.92 GFLOP/s , 173233.2 tokens/s INFO:__main__:2024-11-30 14:40:29 | Epoch: 1 | Step: 371720 | Dataset: 0-2651498 | Loss: 0.873 | 598 ms/step , 115422.36 GFLOP/s , 173327.7 tokens/s INFO:__main__:2024-11-30 14:40:36 | Epoch: 1 | Step: 371730 | Dataset: 0-2653898 | Loss: 0.828 | 598 ms/step , 115436.32 GFLOP/s , 173260.5 tokens/s INFO:__main__:2024-11-30 14:40:44 | Epoch: 1 | Step: 371740 | Dataset: 0-2656298 | Loss: 0.924 | 598 ms/step , 115439.35 GFLOP/s , 173318.2 tokens/s INFO:__main__:2024-11-30 14:40:51 | Epoch: 1 | Step: 371750 | Dataset: 0-2658698 | Loss: 0.825 | 598 ms/step , 115430.17 GFLOP/s , 173319.3 tokens/s INFO:__main__:2024-11-30 14:40:58 | Epoch: 1 | Step: 371760 | Dataset: 0-2661098 | Loss: 0.814 | 598 ms/step , 115490.89 GFLOP/s , 173324.9 tokens/s INFO:__main__:2024-11-30 14:41:05 | Epoch: 1 | Step: 371770 | 
Dataset: 0-2663498 | Loss: 0.846 | 598 ms/step , 115351.40 GFLOP/s , 173274.6 tokens/s INFO:__main__:2024-11-30 14:41:12 | Epoch: 1 | Step: 371780 | Dataset: 0-2665898 | Loss: 0.943 | 598 ms/step , 115376.39 GFLOP/s , 173280.0 tokens/s INFO:__main__:2024-11-30 14:41:19 | Epoch: 1 | Step: 371790 | Dataset: 0-2668298 | Loss: 0.951 | 598 ms/step , 115492.93 GFLOP/s , 173228.0 tokens/s INFO:__main__:2024-11-30 14:41:26 | Epoch: 1 | Step: 371800 | Dataset: 0-2670698 | Loss: 0.860 | 598 ms/step , 115463.75 GFLOP/s , 173353.5 tokens/s INFO:__main__:2024-11-30 14:41:33 | Epoch: 1 | Step: 371810 | Dataset: 0-2673098 | Loss: 0.721 | 600 ms/step , 115082.43 GFLOP/s , 173236.1 tokens/s INFO:__main__:2024-11-30 14:41:40 | Epoch: 1 | Step: 371820 | Dataset: 0-2675498 | Loss: 0.860 | 598 ms/step , 115326.77 GFLOP/s , 173292.8 tokens/s INFO:__main__:2024-11-30 14:41:47 | Epoch: 1 | Step: 371830 | Dataset: 0-2677898 | Loss: 0.818 | 598 ms/step , 115401.43 GFLOP/s , 173292.0 tokens/s INFO:__main__:2024-11-30 14:41:55 | Epoch: 1 | Step: 371840 | Dataset: 0-2680298 | Loss: 0.860 | 598 ms/step , 115387.44 GFLOP/s , 173258.9 tokens/s INFO:__main__:2024-11-30 14:42:02 | Epoch: 1 | Step: 371850 | Dataset: 0-2682698 | Loss: 0.812 | 598 ms/step , 115445.73 GFLOP/s , 173310.0 tokens/s INFO:__main__:2024-11-30 14:42:09 | Epoch: 1 | Step: 371860 | Dataset: 0-2685098 | Loss: 0.933 | 598 ms/step , 115375.21 GFLOP/s , 173260.2 tokens/s INFO:__main__:2024-11-30 14:42:16 | Epoch: 1 | Step: 371870 | Dataset: 0-2687498 | Loss: 0.889 | 598 ms/step , 115465.33 GFLOP/s , 173201.4 tokens/s INFO:__main__:2024-11-30 14:42:23 | Epoch: 1 | Step: 371880 | Dataset: 0-2689898 | Loss: 0.454 | 598 ms/step , 115390.13 GFLOP/s , 173262.3 tokens/s INFO:__main__:2024-11-30 14:42:30 | Epoch: 1 | Step: 371890 | Dataset: 0-2692298 | Loss: 0.456 | 598 ms/step , 115335.89 GFLOP/s , 173286.2 tokens/s INFO:__main__:2024-11-30 14:42:37 | Epoch: 1 | Step: 371900 | Dataset: 0-2694698 | Loss: 0.448 | 598 ms/step , 115431.29 
GFLOP/s , 173282.7 tokens/s INFO:__main__:2024-11-30 14:42:44 | Epoch: 1 | Step: 371910 | Dataset: 0-2697098 | Loss: 0.440 | 598 ms/step , 115420.31 GFLOP/s , 173284.0 tokens/s INFO:__main__:2024-11-30 14:42:51 | Epoch: 1 | Step: 371920 | Dataset: 0-2699498 | Loss: 0.387 | 599 ms/step , 115307.39 GFLOP/s , 173229.8 tokens/s INFO:__main__:2024-11-30 14:42:58 | Epoch: 1 | Step: 371930 | Dataset: 0-2701898 | Loss: 0.421 | 598 ms/step , 115396.47 GFLOP/s , 173263.8 tokens/s INFO:__main__:2024-11-30 14:43:05 | Epoch: 1 | Step: 371940 | Dataset: 0-2704298 | Loss: 0.387 | 598 ms/step , 115353.51 GFLOP/s , 173272.4 tokens/s INFO:__main__:2024-11-30 14:43:13 | Epoch: 1 | Step: 371950 | Dataset: 0-2706698 | Loss: 0.430 | 599 ms/step , 115248.55 GFLOP/s , 173033.8 tokens/s INFO:__main__:2024-11-30 14:43:20 | Epoch: 1 | Step: 371960 | Dataset: 0-2709098 | Loss: 0.388 | 598 ms/step , 115409.45 GFLOP/s , 173272.7 tokens/s INFO:__main__:2024-11-30 14:43:27 | Epoch: 1 | Step: 371970 | Dataset: 0-2711498 | Loss: 0.389 | 598 ms/step , 115395.64 GFLOP/s , 173148.9 tokens/s INFO:__main__:2024-11-30 14:43:34 | Epoch: 1 | Step: 371980 | Dataset: 0-2713898 | Loss: 0.397 | 598 ms/step , 115342.64 GFLOP/s , 173215.3 tokens/s INFO:__main__:2024-11-30 14:43:41 | Epoch: 1 | Step: 371990 | Dataset: 0-2716298 | Loss: 0.415 | 598 ms/step , 115311.31 GFLOP/s , 173166.7 tokens/s INFO:__main__:2024-11-30 14:43:49 | Validation | Step: 372000 | Val_loss: 0.791 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 14:43:49 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_144349_step_372000.pt` INFO:__main__:2024-11-30 14:43:51 | Epoch: 1 | Step: 372000 | Dataset: 0-2718698 | Loss: 0.384 | 595 ms/step , 115996.76 GFLOP/s , 118931.8 tokens/s INFO:__main__:2024-11-30 14:43:58 | Epoch: 1 | Step: 372010 | Dataset: 0-2721098 | Loss: 0.443 | 598 ms/step , 115400.34 GFLOP/s , 173504.1 tokens/s INFO:__main__:2024-11-30 14:44:05 | Epoch: 1 | Step: 372020 | Dataset: 0-2723498 | Loss: 
0.421 | 599 ms/step , 115287.76 GFLOP/s , 173332.7 tokens/s INFO:__main__:2024-11-30 14:44:12 | Epoch: 1 | Step: 372030 | Dataset: 0-2725898 | Loss: 0.399 | 598 ms/step , 115406.46 GFLOP/s , 173279.8 tokens/s INFO:__main__:2024-11-30 14:44:20 | Epoch: 1 | Step: 372040 | Dataset: 0-2728298 | Loss: 0.404 | 598 ms/step , 115432.21 GFLOP/s , 173355.1 tokens/s INFO:__main__:2024-11-30 14:44:27 | Epoch: 1 | Step: 372050 | Dataset: 0-2730698 | Loss: 0.411 | 597 ms/step , 115513.52 GFLOP/s , 173372.7 tokens/s INFO:__main__:2024-11-30 14:44:34 | Epoch: 1 | Step: 372060 | Dataset: 0-2733098 | Loss: 0.417 | 599 ms/step , 115285.29 GFLOP/s , 173330.0 tokens/s INFO:__main__:2024-11-30 14:44:41 | Epoch: 1 | Step: 372070 | Dataset: 0-2735498 | Loss: 0.339 | 597 ms/step , 115546.34 GFLOP/s , 173326.0 tokens/s INFO:__main__:2024-11-30 14:44:48 | Epoch: 1 | Step: 372080 | Dataset: 0-2737898 | Loss: 0.402 | 598 ms/step , 115455.20 GFLOP/s , 173257.6 tokens/s INFO:__main__:2024-11-30 14:44:55 | Epoch: 1 | Step: 372090 | Dataset: 0-2740298 | Loss: 0.396 | 598 ms/step , 115392.53 GFLOP/s , 173307.0 tokens/s INFO:__main__:2024-11-30 14:45:02 | Epoch: 1 | Step: 372100 | Dataset: 0-2742698 | Loss: 0.403 | 598 ms/step , 115362.00 GFLOP/s , 173287.9 tokens/s INFO:__main__:2024-11-30 14:45:09 | Epoch: 1 | Step: 372110 | Dataset: 0-2745098 | Loss: 0.395 | 598 ms/step , 115406.34 GFLOP/s , 173263.1 tokens/s INFO:__main__:2024-11-30 14:45:16 | Epoch: 1 | Step: 372120 | Dataset: 0-2747498 | Loss: 0.411 | 598 ms/step , 115324.95 GFLOP/s , 173210.8 tokens/s INFO:__main__:2024-11-30 14:45:23 | Epoch: 1 | Step: 372130 | Dataset: 0-2749898 | Loss: 0.403 | 598 ms/step , 115313.98 GFLOP/s , 173251.4 tokens/s INFO:__main__:2024-11-30 14:45:30 | Epoch: 1 | Step: 372140 | Dataset: 0-2752298 | Loss: 0.374 | 598 ms/step , 115408.02 GFLOP/s , 173255.1 tokens/s INFO:__main__:2024-11-30 14:45:38 | Epoch: 1 | Step: 372150 | Dataset: 0-2754698 | Loss: 0.718 | 598 ms/step , 115343.48 GFLOP/s , 173288.9 tokens/s 
INFO:__main__:2024-11-30 14:45:45 | Epoch: 1 | Step: 372160 | Dataset: 0-2757098 | Loss: 0.702 | 598 ms/step , 115341.38 GFLOP/s , 173310.2 tokens/s INFO:__main__:2024-11-30 14:45:52 | Epoch: 1 | Step: 372170 | Dataset: 0-2759498 | Loss: 0.728 | 598 ms/step , 115354.43 GFLOP/s , 173180.0 tokens/s INFO:__main__:2024-11-30 14:45:59 | Epoch: 1 | Step: 372180 | Dataset: 0-2761898 | Loss: 0.583 | 599 ms/step , 115241.62 GFLOP/s , 173145.3 tokens/s INFO:__main__:2024-11-30 14:46:06 | Epoch: 1 | Step: 372190 | Dataset: 0-2764298 | Loss: 0.655 | 599 ms/step , 115229.81 GFLOP/s , 173309.2 tokens/s INFO:__main__:2024-11-30 14:46:13 | Epoch: 1 | Step: 372200 | Dataset: 0-2766698 | Loss: 0.633 | 598 ms/step , 115312.78 GFLOP/s , 173217.4 tokens/s INFO:__main__:2024-11-30 14:46:20 | Epoch: 1 | Step: 372210 | Dataset: 0-2769098 | Loss: 0.669 | 598 ms/step , 115378.19 GFLOP/s , 173144.1 tokens/s INFO:__main__:2024-11-30 14:46:27 | Epoch: 1 | Step: 372220 | Dataset: 0-2771498 | Loss: 0.666 | 599 ms/step , 115302.14 GFLOP/s , 173308.4 tokens/s INFO:__main__:2024-11-30 14:46:34 | Epoch: 1 | Step: 372230 | Dataset: 0-2773898 | Loss: 0.725 | 598 ms/step , 115325.79 GFLOP/s , 173282.0 tokens/s INFO:__main__:2024-11-30 14:46:41 | Epoch: 1 | Step: 372240 | Dataset: 0-2776298 | Loss: 0.706 | 599 ms/step , 115278.48 GFLOP/s , 173300.8 tokens/s INFO:__main__:2024-11-30 14:46:49 | Epoch: 1 | Step: 372250 | Dataset: 0-2778698 | Loss: 0.739 | 598 ms/step , 115459.14 GFLOP/s , 173300.3 tokens/s INFO:__main__:2024-11-30 14:46:56 | Epoch: 1 | Step: 372260 | Dataset: 0-2781098 | Loss: 0.579 | 598 ms/step , 115357.89 GFLOP/s , 173251.3 tokens/s INFO:__main__:2024-11-30 14:47:03 | Epoch: 1 | Step: 372270 | Dataset: 0-2783498 | Loss: 0.628 | 598 ms/step , 115359.47 GFLOP/s , 173233.4 tokens/s INFO:__main__:2024-11-30 14:47:10 | Epoch: 1 | Step: 372280 | Dataset: 0-2785898 | Loss: 0.589 | 598 ms/step , 115495.48 GFLOP/s , 173161.1 tokens/s INFO:__main__:2024-11-30 14:47:17 | Epoch: 1 | Step: 372290 | 
Dataset: 0-2788298 | Loss: 0.705 | 598 ms/step , 115403.72 GFLOP/s , 173232.4 tokens/s INFO:__main__:2024-11-30 14:47:24 | Epoch: 1 | Step: 372300 | Dataset: 0-2790698 | Loss: 0.606 | 599 ms/step , 115133.77 GFLOP/s , 173249.5 tokens/s INFO:__main__:2024-11-30 14:47:31 | Epoch: 1 | Step: 372310 | Dataset: 0-2793098 | Loss: 0.582 | 598 ms/step , 115358.53 GFLOP/s , 173144.3 tokens/s INFO:__main__:2024-11-30 14:47:38 | Epoch: 1 | Step: 372320 | Dataset: 0-2795498 | Loss: 0.706 | 599 ms/step , 115253.24 GFLOP/s , 173276.6 tokens/s INFO:__main__:2024-11-30 14:47:45 | Epoch: 1 | Step: 372330 | Dataset: 0-2797898 | Loss: 0.710 | 599 ms/step , 115264.80 GFLOP/s , 173310.7 tokens/s INFO:__main__:2024-11-30 14:47:52 | Epoch: 1 | Step: 372340 | Dataset: 0-2800298 | Loss: 0.650 | 598 ms/step , 115309.21 GFLOP/s , 173259.8 tokens/s INFO:__main__:2024-11-30 14:47:59 | Epoch: 1 | Step: 372350 | Dataset: 0-2802698 | Loss: 0.649 | 599 ms/step , 115237.68 GFLOP/s , 173290.5 tokens/s INFO:__main__:2024-11-30 14:48:07 | Epoch: 1 | Step: 372360 | Dataset: 0-2805098 | Loss: 0.625 | 599 ms/step , 115288.51 GFLOP/s , 173301.7 tokens/s INFO:__main__:2024-11-30 14:48:14 | Epoch: 1 | Step: 372370 | Dataset: 0-2807498 | Loss: 0.635 | 598 ms/step , 115461.60 GFLOP/s , 173372.9 tokens/s INFO:__main__:2024-11-30 14:48:21 | Epoch: 1 | Step: 372380 | Dataset: 0-2809898 | Loss: 0.600 | 598 ms/step , 115385.04 GFLOP/s , 173233.8 tokens/s INFO:__main__:2024-11-30 14:48:28 | Epoch: 1 | Step: 372390 | Dataset: 0-2812298 | Loss: 0.689 | 598 ms/step , 115469.55 GFLOP/s , 173322.3 tokens/s INFO:__main__:2024-11-30 14:48:35 | Epoch: 1 | Step: 372400 | Dataset: 0-2814698 | Loss: 0.638 | 598 ms/step , 115347.09 GFLOP/s , 173302.3 tokens/s INFO:__main__:2024-11-30 14:48:42 | Epoch: 1 | Step: 372410 | Dataset: 0-2817098 | Loss: 0.754 | 598 ms/step , 115453.57 GFLOP/s , 173137.5 tokens/s INFO:__main__:2024-11-30 14:48:49 | Epoch: 1 | Step: 372420 | Dataset: 0-2819498 | Loss: 0.731 | 599 ms/step , 115269.38 
GFLOP/s , 173312.0 tokens/s INFO:__main__:2024-11-30 14:48:56 | Epoch: 1 | Step: 372430 | Dataset: 0-2821898 | Loss: 0.665 | 598 ms/step , 115401.05 GFLOP/s , 173268.9 tokens/s INFO:__main__:2024-11-30 14:49:03 | Epoch: 1 | Step: 372440 | Dataset: 0-2824298 | Loss: 0.674 | 598 ms/step , 115432.58 GFLOP/s , 173184.7 tokens/s INFO:__main__:2024-11-30 14:49:10 | Epoch: 1 | Step: 372450 | Dataset: 0-2826698 | Loss: 0.685 | 598 ms/step , 115498.79 GFLOP/s , 173279.1 tokens/s INFO:__main__:2024-11-30 14:49:17 | Epoch: 1 | Step: 372460 | Dataset: 0-2829098 | Loss: 0.714 | 599 ms/step , 115274.06 GFLOP/s , 173221.3 tokens/s INFO:__main__:2024-11-30 14:49:25 | Epoch: 1 | Step: 372470 | Dataset: 0-2831498 | Loss: 0.667 | 598 ms/step , 115376.53 GFLOP/s , 173237.9 tokens/s INFO:__main__:2024-11-30 14:49:32 | Epoch: 1 | Step: 372480 | Dataset: 0-2833898 | Loss: 0.702 | 598 ms/step , 115349.00 GFLOP/s , 173241.7 tokens/s INFO:__main__:2024-11-30 14:49:39 | Epoch: 1 | Step: 372490 | Dataset: 0-2836298 | Loss: 0.622 | 598 ms/step , 115379.93 GFLOP/s , 173282.1 tokens/s INFO:__main__:2024-11-30 14:49:46 | Validation | Step: 372500 | Val_loss: 0.450 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 14:49:47 | Epoch: 1 | Step: 372500 | Dataset: 0-2838698 | Loss: 0.704 | 597 ms/step , 115564.84 GFLOP/s , 147507.1 tokens/s INFO:__main__:2024-11-30 14:49:54 | Epoch: 1 | Step: 372510 | Dataset: 0-2841098 | Loss: 0.640 | 598 ms/step , 115392.60 GFLOP/s , 173366.6 tokens/s INFO:__main__:2024-11-30 14:50:01 | Epoch: 1 | Step: 372520 | Dataset: 0-2843498 | Loss: 0.666 | 597 ms/step , 115504.01 GFLOP/s , 173290.8 tokens/s INFO:__main__:2024-11-30 14:50:08 | Epoch: 1 | Step: 372530 | Dataset: 0-2845898 | Loss: 0.622 | 598 ms/step , 115362.32 GFLOP/s , 173303.8 tokens/s INFO:__main__:2024-11-30 14:50:15 | Epoch: 1 | Step: 372540 | Dataset: 0-2848298 | Loss: 0.634 | 598 ms/step , 115348.01 GFLOP/s , 173287.8 tokens/s INFO:__main__:2024-11-30 14:50:23 | Epoch: 1 | Step: 372550 | Dataset: 
0-2850698 | Loss: 0.619 | 598 ms/step , 115410.82 GFLOP/s , 173289.4 tokens/s INFO:__main__:2024-11-30 14:50:30 | Epoch: 1 | Step: 372560 | Dataset: 0-2853098 | Loss: 0.644 | 598 ms/step , 115400.84 GFLOP/s , 173285.4 tokens/s INFO:__main__:2024-11-30 14:50:37 | Epoch: 1 | Step: 372570 | Dataset: 0-2855498 | Loss: 0.724 | 597 ms/step , 115568.91 GFLOP/s , 173240.9 tokens/s INFO:__main__:2024-11-30 14:50:44 | Epoch: 1 | Step: 372580 | Dataset: 0-2857898 | Loss: 0.669 | 598 ms/step , 115359.61 GFLOP/s , 173232.3 tokens/s INFO:__main__:2024-11-30 14:50:51 | Epoch: 1 | Step: 372590 | Dataset: 0-2860298 | Loss: 0.724 | 598 ms/step , 115365.85 GFLOP/s , 173283.7 tokens/s INFO:__main__:2024-11-30 14:50:58 | Epoch: 1 | Step: 372600 | Dataset: 0-2862698 | Loss: 0.634 | 598 ms/step , 115374.71 GFLOP/s , 173156.4 tokens/s INFO:__main__:2024-11-30 14:51:05 | Epoch: 1 | Step: 372610 | Dataset: 0-2865098 | Loss: 0.626 | 598 ms/step , 115394.93 GFLOP/s , 173310.6 tokens/s INFO:__main__:2024-11-30 14:51:12 | Epoch: 1 | Step: 372620 | Dataset: 0-2867498 | Loss: 0.509 | 598 ms/step , 115477.79 GFLOP/s , 173216.7 tokens/s INFO:__main__:2024-11-30 14:51:19 | Epoch: 1 | Step: 372630 | Dataset: 0-2869898 | Loss: 0.610 | 599 ms/step , 115231.48 GFLOP/s , 173267.7 tokens/s INFO:__main__:2024-11-30 14:51:26 | Epoch: 1 | Step: 372640 | Dataset: 0-2872298 | Loss: 0.692 | 599 ms/step , 115233.62 GFLOP/s , 173291.0 tokens/s INFO:__main__:2024-11-30 14:51:33 | Epoch: 1 | Step: 372650 | Dataset: 0-2874698 | Loss: 0.662 | 598 ms/step , 115440.33 GFLOP/s , 173268.1 tokens/s INFO:__main__:2024-11-30 14:51:41 | Epoch: 1 | Step: 372660 | Dataset: 0-2877098 | Loss: 0.697 | 598 ms/step , 115447.23 GFLOP/s , 173348.1 tokens/s INFO:__main__:2024-11-30 14:51:48 | Epoch: 1 | Step: 372670 | Dataset: 0-2879498 | Loss: 0.648 | 597 ms/step , 115508.82 GFLOP/s , 173192.4 tokens/s INFO:__main__:2024-11-30 14:51:55 | Epoch: 1 | Step: 372680 | Dataset: 0-2881898 | Loss: 0.642 | 598 ms/step , 115388.37 GFLOP/s , 
173293.6 tokens/s INFO:__main__:2024-11-30 14:52:02 | Epoch: 1 | Step: 372690 | Dataset: 0-2884298 | Loss: 0.152 | 597 ms/step , 115634.16 GFLOP/s , 173306.9 tokens/s INFO:__main__:2024-11-30 14:52:09 | Epoch: 1 | Step: 372700 | Dataset: 0-2886698 | Loss: 0.782 | 599 ms/step , 115239.12 GFLOP/s , 173316.2 tokens/s INFO:__main__:2024-11-30 14:52:16 | Epoch: 1 | Step: 372710 | Dataset: 0-2889098 | Loss: 0.741 | 598 ms/step , 115337.63 GFLOP/s , 173242.4 tokens/s INFO:__main__:2024-11-30 14:52:23 | Epoch: 1 | Step: 372720 | Dataset: 0-2891498 | Loss: 0.765 | 599 ms/step , 115208.50 GFLOP/s , 173256.7 tokens/s INFO:__main__:2024-11-30 14:52:30 | Epoch: 1 | Step: 372730 | Dataset: 0-2893898 | Loss: 0.748 | 598 ms/step , 115361.93 GFLOP/s , 173243.6 tokens/s INFO:__main__:2024-11-30 14:52:37 | Epoch: 1 | Step: 372740 | Dataset: 0-2896298 | Loss: 0.715 | 598 ms/step , 115409.80 GFLOP/s , 173223.3 tokens/s INFO:__main__:2024-11-30 14:52:44 | Epoch: 1 | Step: 372750 | Dataset: 0-2898698 | Loss: 0.741 | 598 ms/step , 115340.14 GFLOP/s , 173263.0 tokens/s INFO:__main__:2024-11-30 14:52:51 | Epoch: 1 | Step: 372760 | Dataset: 0-2901098 | Loss: 0.714 | 599 ms/step , 115227.65 GFLOP/s , 173235.2 tokens/s INFO:__main__:2024-11-30 14:52:59 | Epoch: 1 | Step: 372770 | Dataset: 0-2903498 | Loss: 0.761 | 599 ms/step , 115302.29 GFLOP/s , 173343.1 tokens/s INFO:__main__:2024-11-30 14:53:06 | Epoch: 1 | Step: 372780 | Dataset: 0-2905898 | Loss: 0.741 | 599 ms/step , 115275.82 GFLOP/s , 173231.8 tokens/s INFO:__main__:2024-11-30 14:53:13 | Epoch: 1 | Step: 372790 | Dataset: 0-2908298 | Loss: 0.729 | 599 ms/step , 115272.24 GFLOP/s , 173265.3 tokens/s INFO:__main__:2024-11-30 14:53:20 | Epoch: 1 | Step: 372800 | Dataset: 0-2910698 | Loss: 0.730 | 599 ms/step , 115169.98 GFLOP/s , 173258.5 tokens/s INFO:__main__:2024-11-30 14:53:27 | Epoch: 1 | Step: 372810 | Dataset: 0-2913098 | Loss: 0.732 | 599 ms/step , 115275.09 GFLOP/s , 173252.6 tokens/s INFO:__main__:2024-11-30 14:53:34 | Epoch: 1 
| Step: 372820 | Dataset: 0-2915498 | Loss: 0.726 | 598 ms/step , 115373.68 GFLOP/s , 173247.4 tokens/s INFO:__main__:2024-11-30 14:53:41 | Epoch: 1 | Step: 372830 | Dataset: 0-2917898 | Loss: 0.768 | 598 ms/step , 115405.79 GFLOP/s , 173315.9 tokens/s INFO:__main__:2024-11-30 14:53:48 | Epoch: 1 | Step: 372840 | Dataset: 0-2920298 | Loss: 0.729 | 600 ms/step , 115108.12 GFLOP/s , 173210.3 tokens/s INFO:__main__:2024-11-30 14:53:55 | Epoch: 1 | Step: 372850 | Dataset: 0-2922698 | Loss: 0.705 | 599 ms/step , 115300.44 GFLOP/s , 173231.7 tokens/s INFO:__main__:2024-11-30 14:54:02 | Epoch: 1 | Step: 372860 | Dataset: 0-2925098 | Loss: 0.735 | 599 ms/step , 115193.65 GFLOP/s , 173219.1 tokens/s INFO:__main__:2024-11-30 14:54:09 | Epoch: 1 | Step: 372870 | Dataset: 0-2927498 | Loss: 0.748 | 598 ms/step , 115367.20 GFLOP/s , 173232.3 tokens/s INFO:__main__:2024-11-30 14:54:17 | Epoch: 1 | Step: 372880 | Dataset: 0-2929898 | Loss: 0.750 | 599 ms/step , 115246.08 GFLOP/s , 173262.2 tokens/s INFO:__main__:2024-11-30 14:54:24 | Epoch: 1 | Step: 372890 | Dataset: 0-2932298 | Loss: 0.694 | 598 ms/step , 115343.31 GFLOP/s , 173257.5 tokens/s INFO:__main__:2024-11-30 14:54:31 | Epoch: 1 | Step: 372900 | Dataset: 0-2934698 | Loss: 0.733 | 598 ms/step , 115312.47 GFLOP/s , 173263.6 tokens/s INFO:__main__:2024-11-30 14:54:38 | Epoch: 1 | Step: 372910 | Dataset: 0-2937098 | Loss: 0.730 | 598 ms/step , 115378.34 GFLOP/s , 173256.2 tokens/s INFO:__main__:2024-11-30 14:54:45 | Epoch: 1 | Step: 372920 | Dataset: 0-2939498 | Loss: 0.705 | 599 ms/step , 115221.50 GFLOP/s , 173255.5 tokens/s INFO:__main__:2024-11-30 14:54:52 | Epoch: 1 | Step: 372930 | Dataset: 0-2941898 | Loss: 0.708 | 598 ms/step , 115353.61 GFLOP/s , 173285.7 tokens/s INFO:__main__:2024-11-30 14:54:59 | Epoch: 1 | Step: 372940 | Dataset: 0-2944298 | Loss: 0.699 | 600 ms/step , 115115.34 GFLOP/s , 173245.1 tokens/s INFO:__main__:2024-11-30 14:55:06 | Epoch: 1 | Step: 372950 | Dataset: 0-2946698 | Loss: 0.704 | 598 
ms/step , 115418.98 GFLOP/s , 173279.9 tokens/s INFO:__main__:2024-11-30 14:55:13 | Epoch: 1 | Step: 372960 | Dataset: 0-2949098 | Loss: 0.741 | 599 ms/step , 115249.95 GFLOP/s , 173321.5 tokens/s INFO:__main__:2024-11-30 14:55:20 | Epoch: 1 | Step: 372970 | Dataset: 0-2951498 | Loss: 0.732 | 598 ms/step , 115315.76 GFLOP/s , 173224.2 tokens/s INFO:__main__:2024-11-30 14:55:27 | Epoch: 1 | Step: 372980 | Dataset: 0-2953898 | Loss: 0.712 | 598 ms/step , 115391.11 GFLOP/s , 173273.6 tokens/s INFO:__main__:2024-11-30 14:55:35 | Epoch: 1 | Step: 372990 | Dataset: 0-2956298 | Loss: 0.739 | 599 ms/step , 115272.22 GFLOP/s , 173337.4 tokens/s INFO:__main__:2024-11-30 14:55:42 | Validation | Step: 373000 | Val_loss: 0.439 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 14:55:42 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_145542_step_373000.pt` INFO:__main__:2024-11-30 14:55:45 | Epoch: 1 | Step: 373000 | Dataset: 0-2958698 | Loss: 0.736 | 595 ms/step , 116061.70 GFLOP/s , 119013.3 tokens/s INFO:__main__:2024-11-30 14:55:52 | Epoch: 1 | Step: 373010 | Dataset: 0-2961098 | Loss: 0.728 | 598 ms/step , 115348.42 GFLOP/s , 173505.5 tokens/s INFO:__main__:2024-11-30 14:55:59 | Epoch: 1 | Step: 373020 | Dataset: 0-2963498 | Loss: 0.726 | 598 ms/step , 115393.43 GFLOP/s , 173432.4 tokens/s INFO:__main__:2024-11-30 14:56:06 | Epoch: 1 | Step: 373030 | Dataset: 0-2965898 | Loss: 0.750 | 598 ms/step , 115447.40 GFLOP/s , 173399.3 tokens/s INFO:__main__:2024-11-30 14:56:13 | Epoch: 1 | Step: 373040 | Dataset: 0-2968298 | Loss: 0.710 | 599 ms/step , 115268.55 GFLOP/s , 173327.5 tokens/s INFO:__main__:2024-11-30 14:56:20 | Epoch: 1 | Step: 373050 | Dataset: 0-2970698 | Loss: 0.709 | 598 ms/step , 115360.11 GFLOP/s , 173260.6 tokens/s INFO:__main__:2024-11-30 14:56:27 | Epoch: 1 | Step: 373060 | Dataset: 0-2973098 | Loss: 0.724 | 599 ms/step , 115254.57 GFLOP/s , 173341.7 tokens/s INFO:__main__:2024-11-30 14:56:35 | Epoch: 1 | Step: 373070 | 
Dataset: 0-2975498 | Loss: 0.697 | 598 ms/step , 115409.65 GFLOP/s , 173311.0 tokens/s INFO:__main__:2024-11-30 14:56:42 | Epoch: 1 | Step: 373080 | Dataset: 0-2977898 | Loss: 0.703 | 599 ms/step , 115270.73 GFLOP/s , 173252.9 tokens/s INFO:__main__:2024-11-30 14:56:49 | Epoch: 1 | Step: 373090 | Dataset: 0-2980298 | Loss: 0.739 | 598 ms/step , 115358.86 GFLOP/s , 173334.6 tokens/s INFO:__main__:2024-11-30 14:56:56 | Epoch: 1 | Step: 373100 | Dataset: 0-2982698 | Loss: 0.714 | 599 ms/step , 115262.46 GFLOP/s , 173256.6 tokens/s INFO:__main__:2024-11-30 14:57:03 | Epoch: 1 | Step: 373110 | Dataset: 0-2985098 | Loss: 0.696 | 598 ms/step , 115342.94 GFLOP/s , 173367.8 tokens/s INFO:__main__:2024-11-30 14:57:10 | Epoch: 1 | Step: 373120 | Dataset: 0-2987498 | Loss: 0.764 | 599 ms/step , 115271.06 GFLOP/s , 173337.8 tokens/s INFO:__main__:2024-11-30 14:57:17 | Epoch: 1 | Step: 373130 | Dataset: 0-2989898 | Loss: 0.726 | 598 ms/step , 115425.66 GFLOP/s , 173319.4 tokens/s INFO:__main__:2024-11-30 14:57:24 | Epoch: 1 | Step: 373140 | Dataset: 0-2992298 | Loss: 0.702 | 599 ms/step , 115287.72 GFLOP/s , 173210.4 tokens/s INFO:__main__:2024-11-30 14:57:31 | Epoch: 1 | Step: 373150 | Dataset: 0-2994698 | Loss: 0.698 | 598 ms/step , 115378.20 GFLOP/s , 173372.2 tokens/s INFO:__main__:2024-11-30 14:57:38 | Epoch: 1 | Step: 373160 | Dataset: 0-2997098 | Loss: 0.722 | 599 ms/step , 115166.61 GFLOP/s , 173250.7 tokens/s INFO:__main__:2024-11-30 14:57:45 | Epoch: 1 | Step: 373170 | Dataset: 0-2999498 | Loss: 0.714 | 599 ms/step , 115268.69 GFLOP/s , 173294.0 tokens/s INFO:__main__:2024-11-30 14:57:53 | Epoch: 1 | Step: 373180 | Dataset: 0-3001898 | Loss: 0.689 | 599 ms/step , 115232.24 GFLOP/s , 173303.0 tokens/s INFO:__main__:2024-11-30 14:58:00 | Epoch: 1 | Step: 373190 | Dataset: 0-3004298 | Loss: 0.727 | 598 ms/step , 115329.45 GFLOP/s , 173338.6 tokens/s INFO:__main__:2024-11-30 14:58:07 | Epoch: 1 | Step: 373200 | Dataset: 0-3006698 | Loss: 0.687 | 598 ms/step , 115320.75 
GFLOP/s , 173231.3 tokens/s INFO:__main__:2024-11-30 14:58:14 | Epoch: 1 | Step: 373210 | Dataset: 0-3009098 | Loss: 0.733 | 599 ms/step , 115297.65 GFLOP/s , 173233.3 tokens/s INFO:__main__:2024-11-30 14:58:21 | Epoch: 1 | Step: 373220 | Dataset: 0-3011498 | Loss: 0.705 | 598 ms/step , 115327.44 GFLOP/s , 173305.4 tokens/s INFO:__main__:2024-11-30 14:58:28 | Epoch: 1 | Step: 373230 | Dataset: 0-3013898 | Loss: 0.713 | 598 ms/step , 115324.12 GFLOP/s , 173336.1 tokens/s INFO:__main__:2024-11-30 14:58:35 | Epoch: 1 | Step: 373240 | Dataset: 0-3016298 | Loss: 0.617 | 598 ms/step , 115313.84 GFLOP/s , 173313.9 tokens/s INFO:__main__:2024-11-30 14:58:42 | Epoch: 1 | Step: 373250 | Dataset: 0-3018698 | Loss: 0.670 | 598 ms/step , 115398.47 GFLOP/s , 173221.0 tokens/s INFO:__main__:2024-11-30 14:58:49 | Epoch: 1 | Step: 373260 | Dataset: 0-3021098 | Loss: 0.640 | 598 ms/step , 115330.02 GFLOP/s , 173249.4 tokens/s INFO:__main__:2024-11-30 14:58:56 | Epoch: 1 | Step: 373270 | Dataset: 0-3023498 | Loss: 0.691 | 598 ms/step , 115370.25 GFLOP/s , 173280.5 tokens/s INFO:__main__:2024-11-30 14:59:03 | Epoch: 1 | Step: 373280 | Dataset: 0-3025898 | Loss: 0.689 | 599 ms/step , 115160.44 GFLOP/s , 173394.4 tokens/s INFO:__main__:2024-11-30 14:59:11 | Epoch: 1 | Step: 373290 | Dataset: 0-3028298 | Loss: 0.618 | 597 ms/step , 115532.12 GFLOP/s , 173265.3 tokens/s INFO:__main__:2024-11-30 14:59:18 | Epoch: 1 | Step: 373300 | Dataset: 0-3030698 | Loss: 0.613 | 599 ms/step , 115287.29 GFLOP/s , 173293.8 tokens/s INFO:__main__:2024-11-30 14:59:25 | Epoch: 1 | Step: 373310 | Dataset: 0-3033098 | Loss: 0.673 | 598 ms/step , 115399.68 GFLOP/s , 173295.2 tokens/s INFO:__main__:2024-11-30 14:59:32 | Epoch: 1 | Step: 373320 | Dataset: 0-3035498 | Loss: 0.658 | 600 ms/step , 115050.40 GFLOP/s , 173129.6 tokens/s INFO:__main__:2024-11-30 14:59:39 | Epoch: 1 | Step: 373330 | Dataset: 0-3037898 | Loss: 0.636 | 598 ms/step , 115357.25 GFLOP/s , 173352.0 tokens/s INFO:__main__:2024-11-30 14:59:46 
| Epoch: 1 | Step: 373340 | Dataset: 0-3040298 | Loss: 0.761 | 597 ms/step , 115514.34 GFLOP/s , 173307.1 tokens/s INFO:__main__:2024-11-30 14:59:53 | Epoch: 1 | Step: 373350 | Dataset: 0-3042698 | Loss: 0.756 | 598 ms/step , 115448.98 GFLOP/s , 172933.8 tokens/s INFO:__main__:2024-11-30 15:00:00 | Epoch: 1 | Step: 373360 | Dataset: 0-3045098 | Loss: 0.647 | 599 ms/step , 115308.78 GFLOP/s , 173289.4 tokens/s INFO:__main__:2024-11-30 15:00:07 | Epoch: 1 | Step: 373370 | Dataset: 0-3047498 | Loss: 0.719 | 598 ms/step , 115396.65 GFLOP/s , 173273.8 tokens/s INFO:__main__:2024-11-30 15:00:14 | Epoch: 1 | Step: 373380 | Dataset: 0-3049898 | Loss: 0.679 | 598 ms/step , 115445.31 GFLOP/s , 173330.2 tokens/s INFO:__main__:2024-11-30 15:00:21 | Epoch: 1 | Step: 373390 | Dataset: 0-3052298 | Loss: 0.617 | 597 ms/step , 115527.38 GFLOP/s , 173391.2 tokens/s INFO:__main__:2024-11-30 15:00:29 | Epoch: 1 | Step: 373400 | Dataset: 0-3054698 | Loss: 0.595 | 599 ms/step , 115267.46 GFLOP/s , 173329.7 tokens/s INFO:__main__:2024-11-30 15:00:36 | Epoch: 1 | Step: 373410 | Dataset: 0-3057098 | Loss: 0.645 | 598 ms/step , 115331.30 GFLOP/s , 173187.0 tokens/s INFO:__main__:2024-11-30 15:00:43 | Epoch: 1 | Step: 373420 | Dataset: 0-3059498 | Loss: 0.730 | 598 ms/step , 115494.35 GFLOP/s , 173307.7 tokens/s INFO:__main__:2024-11-30 15:00:50 | Epoch: 1 | Step: 373430 | Dataset: 0-3061898 | Loss: 0.636 | 598 ms/step , 115492.66 GFLOP/s , 173274.7 tokens/s INFO:__main__:2024-11-30 15:00:57 | Epoch: 1 | Step: 373440 | Dataset: 0-3064298 | Loss: 0.664 | 597 ms/step , 115505.64 GFLOP/s , 173293.3 tokens/s INFO:__main__:2024-11-30 15:01:04 | Epoch: 1 | Step: 373450 | Dataset: 0-3066698 | Loss: 0.579 | 598 ms/step , 115334.96 GFLOP/s , 173256.9 tokens/s INFO:__main__:2024-11-30 15:01:11 | Epoch: 1 | Step: 373460 | Dataset: 0-3069098 | Loss: 0.739 | 599 ms/step , 115265.42 GFLOP/s , 173210.3 tokens/s INFO:__main__:2024-11-30 15:01:18 | Epoch: 1 | Step: 373470 | Dataset: 0-3071498 | Loss: 0.656 | 
598 ms/step , 115359.42 GFLOP/s , 173329.7 tokens/s INFO:__main__:2024-11-30 15:01:25 | Epoch: 1 | Step: 373480 | Dataset: 0-3073898 | Loss: 0.706 | 598 ms/step , 115375.15 GFLOP/s , 173364.1 tokens/s INFO:__main__:2024-11-30 15:01:32 | Epoch: 1 | Step: 373490 | Dataset: 0-3076298 | Loss: 0.590 | 598 ms/step , 115381.75 GFLOP/s , 173246.7 tokens/s INFO:__main__:2024-11-30 15:01:40 | Validation | Step: 373500 | Val_loss: 0.421 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 15:01:41 | Epoch: 1 | Step: 373500 | Dataset: 0-3078698 | Loss: 0.627 | 597 ms/step , 115643.31 GFLOP/s , 147608.7 tokens/s INFO:__main__:2024-11-30 15:01:48 | Epoch: 1 | Step: 373510 | Dataset: 0-3081098 | Loss: 0.662 | 598 ms/step , 115344.41 GFLOP/s , 173332.4 tokens/s INFO:__main__:2024-11-30 15:01:55 | Epoch: 1 | Step: 373520 | Dataset: 0-3083498 | Loss: 0.692 | 598 ms/step , 115393.42 GFLOP/s , 173417.2 tokens/s INFO:__main__:2024-11-30 15:02:02 | Epoch: 1 | Step: 373530 | Dataset: 0-3085898 | Loss: 0.617 | 598 ms/step , 115342.38 GFLOP/s , 173335.8 tokens/s INFO:__main__:2024-11-30 15:02:09 | Epoch: 1 | Step: 373540 | Dataset: 0-3088298 | Loss: 0.613 | 599 ms/step , 115199.49 GFLOP/s , 173345.0 tokens/s INFO:__main__:2024-11-30 15:02:16 | Epoch: 1 | Step: 373550 | Dataset: 0-3090698 | Loss: 0.596 | 598 ms/step , 115413.47 GFLOP/s , 173334.4 tokens/s INFO:__main__:2024-11-30 15:02:23 | Epoch: 1 | Step: 373560 | Dataset: 0-3093098 | Loss: 0.716 | 598 ms/step , 115393.46 GFLOP/s , 173302.7 tokens/s INFO:__main__:2024-11-30 15:02:30 | Epoch: 1 | Step: 373570 | Dataset: 0-3095498 | Loss: 0.658 | 598 ms/step , 115334.87 GFLOP/s , 173369.1 tokens/s INFO:__main__:2024-11-30 15:02:37 | Epoch: 1 | Step: 373580 | Dataset: 0-3097898 | Loss: 0.625 | 599 ms/step , 115264.61 GFLOP/s , 173335.8 tokens/s INFO:__main__:2024-11-30 15:02:44 | Epoch: 1 | Step: 373590 | Dataset: 0-3100298 | Loss: 0.716 | 597 ms/step , 115582.22 GFLOP/s , 173327.0 tokens/s INFO:__main__:2024-11-30 15:02:52 | Epoch: 1 | Step: 
373600 | Dataset: 0-3102698 | Loss: 0.670 | 602 ms/step , 114689.07 GFLOP/s , 173165.7 tokens/s INFO:__main__:2024-11-30 15:02:59 | Epoch: 1 | Step: 373610 | Dataset: 0-3105098 | Loss: 0.679 | 598 ms/step , 115395.12 GFLOP/s , 173281.5 tokens/s INFO:__main__:2024-11-30 15:03:06 | Epoch: 1 | Step: 373620 | Dataset: 0-3107498 | Loss: 0.614 | 598 ms/step , 115399.51 GFLOP/s , 173347.0 tokens/s INFO:__main__:2024-11-30 15:03:13 | Epoch: 1 | Step: 373630 | Dataset: 0-3109898 | Loss: 0.680 | 598 ms/step , 115353.79 GFLOP/s , 173366.0 tokens/s INFO:__main__:2024-11-30 15:03:20 | Epoch: 1 | Step: 373640 | Dataset: 0-3112298 | Loss: 0.675 | 598 ms/step , 115344.84 GFLOP/s , 173264.3 tokens/s INFO:__main__:2024-11-30 15:03:27 | Epoch: 1 | Step: 373650 | Dataset: 0-3114698 | Loss: 0.687 | 599 ms/step , 115303.52 GFLOP/s , 173199.1 tokens/s INFO:__main__:2024-11-30 15:03:34 | Epoch: 1 | Step: 373660 | Dataset: 0-3117098 | Loss: 0.683 | 598 ms/step , 115438.41 GFLOP/s , 173325.4 tokens/s INFO:__main__:2024-11-30 15:03:41 | Epoch: 1 | Step: 373670 | Dataset: 0-3119498 | Loss: 0.597 | 598 ms/step , 115360.55 GFLOP/s , 173326.8 tokens/s INFO:__main__:2024-11-30 15:03:48 | Epoch: 1 | Step: 373680 | Dataset: 0-3121898 | Loss: 0.712 | 598 ms/step , 115344.85 GFLOP/s , 173280.7 tokens/s INFO:__main__:2024-11-30 15:03:55 | Epoch: 1 | Step: 373690 | Dataset: 0-3124298 | Loss: 0.675 | 599 ms/step , 115151.08 GFLOP/s , 173250.2 tokens/s INFO:__main__:2024-11-30 15:04:02 | Epoch: 1 | Step: 373700 | Dataset: 0-3126698 | Loss: 0.597 | 598 ms/step , 115465.72 GFLOP/s , 173309.2 tokens/s INFO:__main__:2024-11-30 15:04:10 | Epoch: 1 | Step: 373710 | Dataset: 0-3129098 | Loss: 0.591 | 598 ms/step , 115439.30 GFLOP/s , 173285.2 tokens/s INFO:__main__:2024-11-30 15:04:17 | Epoch: 1 | Step: 373720 | Dataset: 0-3131498 | Loss: 0.720 | 598 ms/step , 115324.71 GFLOP/s , 173315.8 tokens/s INFO:__main__:2024-11-30 15:04:24 | Epoch: 1 | Step: 373730 | Dataset: 0-3133898 | Loss: 0.616 | 598 ms/step , 
115449.12 GFLOP/s , 173273.7 tokens/s INFO:__main__:2024-11-30 15:04:31 | Epoch: 1 | Step: 373740 | Dataset: 0-3136298 | Loss: 0.679 | 599 ms/step , 115288.10 GFLOP/s , 173296.4 tokens/s INFO:__main__:2024-11-30 15:04:38 | Epoch: 1 | Step: 373750 | Dataset: 0-3138698 | Loss: 0.809 | 598 ms/step , 115441.15 GFLOP/s , 173287.2 tokens/s INFO:__main__:2024-11-30 15:04:45 | Epoch: 1 | Step: 373760 | Dataset: 0-3141098 | Loss: 0.652 | 598 ms/step , 115429.80 GFLOP/s , 173193.8 tokens/s INFO:__main__:2024-11-30 15:04:52 | Epoch: 1 | Step: 373770 | Dataset: 0-3143498 | Loss: 0.656 | 598 ms/step , 115334.84 GFLOP/s , 173344.7 tokens/s INFO:__main__:2024-11-30 15:04:59 | Epoch: 1 | Step: 373780 | Dataset: 0-3145898 | Loss: 0.659 | 598 ms/step , 115430.58 GFLOP/s , 173249.4 tokens/s INFO:__main__:2024-11-30 15:05:06 | Epoch: 1 | Step: 373790 | Dataset: 0-3148298 | Loss: 0.343 | 598 ms/step , 115409.35 GFLOP/s , 173322.2 tokens/s INFO:__main__:2024-11-30 15:05:13 | Epoch: 1 | Step: 373800 | Dataset: 0-3150698 | Loss: 0.369 | 597 ms/step , 115548.30 GFLOP/s , 173347.1 tokens/s INFO:__main__:2024-11-30 15:05:20 | Epoch: 1 | Step: 373810 | Dataset: 0-3153098 | Loss: 0.362 | 598 ms/step , 115492.24 GFLOP/s , 173341.3 tokens/s INFO:__main__:2024-11-30 15:05:28 | Epoch: 1 | Step: 373820 | Dataset: 0-3155498 | Loss: 0.387 | 598 ms/step , 115437.28 GFLOP/s , 173338.9 tokens/s INFO:__main__:2024-11-30 15:05:35 | Epoch: 1 | Step: 373830 | Dataset: 0-3157898 | Loss: 0.234 | 597 ms/step , 115549.51 GFLOP/s , 173352.8 tokens/s INFO:__main__:2024-11-30 15:05:42 | Epoch: 1 | Step: 373840 | Dataset: 0-3160298 | Loss: 0.377 | 598 ms/step , 115387.03 GFLOP/s , 173344.6 tokens/s INFO:__main__:2024-11-30 15:05:49 | Epoch: 1 | Step: 373850 | Dataset: 0-3162698 | Loss: 0.388 | 598 ms/step , 115445.25 GFLOP/s , 173318.3 tokens/s INFO:__main__:2024-11-30 15:05:56 | Epoch: 1 | Step: 373860 | Dataset: 0-3165098 | Loss: 0.343 | 598 ms/step , 115336.63 GFLOP/s , 173330.3 tokens/s INFO:__main__:2024-11-30 
15:06:03 | Epoch: 1 | Step: 373870 | Dataset: 0-3167498 | Loss: 0.401 | 598 ms/step , 115457.19 GFLOP/s , 173346.6 tokens/s INFO:__main__:2024-11-30 15:06:10 | Epoch: 1 | Step: 373880 | Dataset: 0-3169898 | Loss: 0.346 | 598 ms/step , 115480.77 GFLOP/s , 173278.1 tokens/s INFO:__main__:2024-11-30 15:06:17 | Epoch: 1 | Step: 373890 | Dataset: 0-3172298 | Loss: 0.401 | 598 ms/step , 115342.08 GFLOP/s , 173305.9 tokens/s INFO:__main__:2024-11-30 15:06:24 | Epoch: 1 | Step: 373900 | Dataset: 0-3174698 | Loss: 0.379 | 597 ms/step , 115502.69 GFLOP/s , 173352.3 tokens/s INFO:__main__:2024-11-30 15:06:31 | Epoch: 1 | Step: 373910 | Dataset: 0-3177098 | Loss: 0.345 | 598 ms/step , 115368.28 GFLOP/s , 173311.5 tokens/s INFO:__main__:2024-11-30 15:06:38 | Epoch: 1 | Step: 373920 | Dataset: 0-3179498 | Loss: 0.382 | 599 ms/step , 115278.40 GFLOP/s , 173353.1 tokens/s INFO:__main__:2024-11-30 15:06:46 | Epoch: 1 | Step: 373930 | Dataset: 0-3181898 | Loss: 0.373 | 598 ms/step , 115440.00 GFLOP/s , 173263.4 tokens/s INFO:__main__:2024-11-30 15:06:53 | Epoch: 1 | Step: 373940 | Dataset: 0-3184298 | Loss: 0.349 | 598 ms/step , 115402.00 GFLOP/s , 173254.0 tokens/s INFO:__main__:2024-11-30 15:07:00 | Epoch: 1 | Step: 373950 | Dataset: 0-3186698 | Loss: 0.354 | 597 ms/step , 115512.29 GFLOP/s , 173348.9 tokens/s INFO:__main__:2024-11-30 15:07:07 | Epoch: 1 | Step: 373960 | Dataset: 0-3189098 | Loss: 0.369 | 598 ms/step , 115387.47 GFLOP/s , 173290.2 tokens/s INFO:__main__:2024-11-30 15:07:14 | Epoch: 1 | Step: 373970 | Dataset: 0-3191498 | Loss: 0.345 | 598 ms/step , 115485.80 GFLOP/s , 173329.1 tokens/s INFO:__main__:2024-11-30 15:07:21 | Epoch: 1 | Step: 373980 | Dataset: 0-3193898 | Loss: 0.327 | 598 ms/step , 115449.34 GFLOP/s , 173392.2 tokens/s INFO:__main__:2024-11-30 15:07:28 | Epoch: 1 | Step: 373990 | Dataset: 0-3196298 | Loss: 0.377 | 598 ms/step , 115444.09 GFLOP/s , 173285.7 tokens/s INFO:__main__:2024-11-30 15:07:36 | Validation | Step: 374000 | Val_loss: 0.429 | 
Best_val_loss: 0.3487 INFO:__main__:2024-11-30 15:07:36 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_150736_step_374000.pt` INFO:__main__:2024-11-30 15:07:39 | Epoch: 1 | Step: 374000 | Dataset: 0-3198698 | Loss: 0.357 | 595 ms/step , 116039.30 GFLOP/s , 117904.1 tokens/s INFO:__main__:2024-11-30 15:07:46 | Epoch: 1 | Step: 374010 | Dataset: 0-3201098 | Loss: 0.347 | 598 ms/step , 115456.62 GFLOP/s , 173557.3 tokens/s INFO:__main__:2024-11-30 15:07:53 | Epoch: 1 | Step: 374020 | Dataset: 0-3203498 | Loss: 0.366 | 597 ms/step , 115571.97 GFLOP/s , 173473.4 tokens/s INFO:__main__:2024-11-30 15:08:00 | Epoch: 1 | Step: 374030 | Dataset: 0-3205898 | Loss: 0.343 | 597 ms/step , 115592.13 GFLOP/s , 173559.2 tokens/s INFO:__main__:2024-11-30 15:08:07 | Epoch: 1 | Step: 374040 | Dataset: 0-3208298 | Loss: 0.334 | 598 ms/step , 115425.36 GFLOP/s , 173411.2 tokens/s INFO:__main__:2024-11-30 15:08:14 | Epoch: 1 | Step: 374050 | Dataset: 0-3210698 | Loss: 0.335 | 598 ms/step , 115453.36 GFLOP/s , 173459.6 tokens/s INFO:__main__:2024-11-30 15:08:21 | Epoch: 1 | Step: 374060 | Dataset: 0-3213098 | Loss: 0.360 | 597 ms/step , 115514.27 GFLOP/s , 173325.4 tokens/s INFO:__main__:2024-11-30 15:08:28 | Epoch: 1 | Step: 374070 | Dataset: 0-3215498 | Loss: 0.370 | 598 ms/step , 115501.35 GFLOP/s , 173470.1 tokens/s INFO:__main__:2024-11-30 15:08:35 | Epoch: 1 | Step: 374080 | Dataset: 0-3217898 | Loss: 0.370 | 599 ms/step , 115285.82 GFLOP/s , 173324.2 tokens/s INFO:__main__:2024-11-30 15:08:42 | Epoch: 1 | Step: 374090 | Dataset: 0-3220298 | Loss: 0.367 | 597 ms/step , 115618.22 GFLOP/s , 173422.6 tokens/s INFO:__main__:2024-11-30 15:08:49 | Epoch: 1 | Step: 374100 | Dataset: 0-3222698 | Loss: 0.375 | 598 ms/step , 115377.74 GFLOP/s , 173367.8 tokens/s INFO:__main__:2024-11-30 15:08:56 | Epoch: 1 | Step: 374110 | Dataset: 0-3225098 | Loss: 0.336 | 598 ms/step , 115499.25 GFLOP/s , 173337.7 tokens/s INFO:__main__:2024-11-30 15:09:04 | Epoch: 1 | 
Step: 374120 | Dataset: 0-3227498 | Loss: 0.324 | 598 ms/step , 115375.86 GFLOP/s , 173413.4 tokens/s INFO:__main__:2024-11-30 15:09:11 | Epoch: 1 | Step: 374130 | Dataset: 0-3229898 | Loss: 0.362 | 598 ms/step , 115413.12 GFLOP/s , 173336.8 tokens/s INFO:__main__:2024-11-30 15:09:18 | Epoch: 1 | Step: 374140 | Dataset: 0-3232298 | Loss: 0.295 | 597 ms/step , 115635.91 GFLOP/s , 173344.7 tokens/s INFO:__main__:2024-11-30 15:09:25 | Epoch: 1 | Step: 374150 | Dataset: 0-3234698 | Loss: 0.382 | 597 ms/step , 115558.48 GFLOP/s , 173359.5 tokens/s INFO:__main__:2024-11-30 15:09:32 | Epoch: 1 | Step: 374160 | Dataset: 0-3237098 | Loss: 0.368 | 600 ms/step , 115013.22 GFLOP/s , 173258.1 tokens/s INFO:__main__:2024-11-30 15:09:39 | Epoch: 1 | Step: 374170 | Dataset: 0-3239498 | Loss: 0.369 | 597 ms/step , 115552.56 GFLOP/s , 173343.0 tokens/s INFO:__main__:2024-11-30 15:09:46 | Epoch: 1 | Step: 374180 | Dataset: 0-3241898 | Loss: 0.375 | 598 ms/step , 115469.89 GFLOP/s , 173286.0 tokens/s INFO:__main__:2024-11-30 15:09:53 | Epoch: 1 | Step: 374190 | Dataset: 0-3244298 | Loss: 0.341 | 597 ms/step , 115502.22 GFLOP/s , 173210.0 tokens/s INFO:__main__:2024-11-30 15:10:00 | Epoch: 1 | Step: 374200 | Dataset: 0-3246698 | Loss: 0.315 | 598 ms/step , 115462.35 GFLOP/s , 173373.8 tokens/s INFO:__main__:2024-11-30 15:10:07 | Epoch: 1 | Step: 374210 | Dataset: 0-3249098 | Loss: 0.370 | 598 ms/step , 115372.67 GFLOP/s , 173272.1 tokens/s INFO:__main__:2024-11-30 15:10:14 | Epoch: 1 | Step: 374220 | Dataset: 0-3251498 | Loss: 0.337 | 598 ms/step , 115500.31 GFLOP/s , 173323.9 tokens/s INFO:__main__:2024-11-30 15:10:22 | Epoch: 1 | Step: 374230 | Dataset: 0-3253898 | Loss: 0.320 | 598 ms/step , 115493.24 GFLOP/s , 173336.8 tokens/s INFO:__main__:2024-11-30 15:10:29 | Epoch: 1 | Step: 374240 | Dataset: 0-3256298 | Loss: 0.334 | 598 ms/step , 115434.20 GFLOP/s , 173337.3 tokens/s INFO:__main__:2024-11-30 15:10:36 | Epoch: 1 | Step: 374250 | Dataset: 0-3258698 | Loss: 0.348 | 598 ms/step 
, 115339.92 GFLOP/s , 173335.9 tokens/s INFO:__main__:2024-11-30 15:10:43 | Epoch: 1 | Step: 374260 | Dataset: 0-3261098 | Loss: 0.368 | 597 ms/step , 115645.78 GFLOP/s , 173343.0 tokens/s INFO:__main__:2024-11-30 15:10:50 | Epoch: 1 | Step: 374270 | Dataset: 0-3263498 | Loss: 0.402 | 598 ms/step , 115426.78 GFLOP/s , 173197.8 tokens/s INFO:__main__:2024-11-30 15:10:57 | Epoch: 1 | Step: 374280 | Dataset: 0-3265898 | Loss: 0.389 | 597 ms/step , 115593.30 GFLOP/s , 173349.8 tokens/s INFO:__main__:2024-11-30 15:11:04 | Epoch: 1 | Step: 374290 | Dataset: 0-3268298 | Loss: 0.352 | 597 ms/step , 115551.08 GFLOP/s , 173267.6 tokens/s INFO:__main__:2024-11-30 15:11:11 | Epoch: 1 | Step: 374300 | Dataset: 0-3270698 | Loss: 0.335 | 597 ms/step , 115596.90 GFLOP/s , 173385.9 tokens/s INFO:__main__:2024-11-30 15:11:18 | Epoch: 1 | Step: 374310 | Dataset: 0-3273098 | Loss: 0.331 | 598 ms/step , 115450.24 GFLOP/s , 173290.8 tokens/s INFO:__main__:2024-11-30 15:11:25 | Epoch: 1 | Step: 374320 | Dataset: 0-3275498 | Loss: 0.400 | 597 ms/step , 115514.64 GFLOP/s , 173371.1 tokens/s INFO:__main__:2024-11-30 15:11:32 | Epoch: 1 | Step: 374330 | Dataset: 0-3277898 | Loss: 0.452 | 598 ms/step , 115379.28 GFLOP/s , 173297.5 tokens/s INFO:__main__:2024-11-30 15:11:40 | Epoch: 1 | Step: 374340 | Dataset: 0-3280298 | Loss: 0.442 | 598 ms/step , 115321.69 GFLOP/s , 173264.5 tokens/s INFO:__main__:2024-11-30 15:11:47 | Epoch: 1 | Step: 374350 | Dataset: 0-3282698 | Loss: 0.401 | 598 ms/step , 115396.59 GFLOP/s , 173330.0 tokens/s INFO:__main__:2024-11-30 15:11:54 | Epoch: 1 | Step: 374360 | Dataset: 0-3285098 | Loss: 0.438 | 598 ms/step , 115453.98 GFLOP/s , 173287.5 tokens/s INFO:__main__:2024-11-30 15:12:01 | Epoch: 1 | Step: 374370 | Dataset: 0-3287498 | Loss: 0.405 | 598 ms/step , 115389.85 GFLOP/s , 173320.9 tokens/s INFO:__main__:2024-11-30 15:12:08 | Epoch: 1 | Step: 374380 | Dataset: 0-3289898 | Loss: 0.384 | 597 ms/step , 115549.52 GFLOP/s , 173318.5 tokens/s 
INFO:__main__:2024-11-30 15:12:15 | Epoch: 1 | Step: 374390 | Dataset: 0-3292298 | Loss: 0.423 | 597 ms/step , 115563.30 GFLOP/s , 173279.7 tokens/s INFO:__main__:2024-11-30 15:12:22 | Epoch: 1 | Step: 374400 | Dataset: 0-3294698 | Loss: 0.404 | 598 ms/step , 115363.36 GFLOP/s , 173278.5 tokens/s INFO:__main__:2024-11-30 15:12:29 | Epoch: 1 | Step: 374410 | Dataset: 0-3297098 | Loss: 0.420 | 598 ms/step , 115441.62 GFLOP/s , 173267.5 tokens/s INFO:__main__:2024-11-30 15:12:36 | Epoch: 1 | Step: 374420 | Dataset: 0-3299498 | Loss: 0.378 | 598 ms/step , 115340.15 GFLOP/s , 173380.6 tokens/s INFO:__main__:2024-11-30 15:12:43 | Epoch: 1 | Step: 374430 | Dataset: 0-3301898 | Loss: 0.362 | 598 ms/step , 115328.55 GFLOP/s , 173287.9 tokens/s INFO:__main__:2024-11-30 15:12:50 | Epoch: 1 | Step: 374440 | Dataset: 0-3304298 | Loss: 0.379 | 598 ms/step , 115395.61 GFLOP/s , 173255.7 tokens/s INFO:__main__:2024-11-30 15:12:58 | Epoch: 1 | Step: 374450 | Dataset: 0-3306698 | Loss: 0.456 | 598 ms/step , 115492.47 GFLOP/s , 173237.0 tokens/s INFO:__main__:2024-11-30 15:13:05 | Epoch: 1 | Step: 374460 | Dataset: 0-3309098 | Loss: 0.414 | 598 ms/step , 115317.68 GFLOP/s , 173315.6 tokens/s INFO:__main__:2024-11-30 15:13:12 | Epoch: 1 | Step: 374470 | Dataset: 0-3311498 | Loss: 0.436 | 598 ms/step , 115347.90 GFLOP/s , 173335.0 tokens/s INFO:__main__:2024-11-30 15:13:19 | Epoch: 1 | Step: 374480 | Dataset: 0-3313898 | Loss: 0.403 | 598 ms/step , 115362.02 GFLOP/s , 173308.1 tokens/s INFO:__main__:2024-11-30 15:13:26 | Epoch: 1 | Step: 374490 | Dataset: 0-3316298 | Loss: 0.416 | 597 ms/step , 115505.51 GFLOP/s , 173257.4 tokens/s INFO:__main__:2024-11-30 15:13:33 | Validation | Step: 374500 | Val_loss: 0.412 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 15:13:34 | Epoch: 1 | Step: 374500 | Dataset: 0-3318698 | Loss: 0.415 | 597 ms/step , 115551.10 GFLOP/s , 147622.4 tokens/s INFO:__main__:2024-11-30 15:13:41 | Epoch: 1 | Step: 374510 | Dataset: 0-3321098 | Loss: 0.428 | 597 
ms/step , 115531.90 GFLOP/s , 173379.3 tokens/s INFO:__main__:2024-11-30 15:13:48 | Epoch: 1 | Step: 374520 | Dataset: 0-3323498 | Loss: 0.409 | 598 ms/step , 115396.00 GFLOP/s , 173361.4 tokens/s INFO:__main__:2024-11-30 15:13:55 | Epoch: 1 | Step: 374530 | Dataset: 0-3325898 | Loss: 0.420 | 598 ms/step , 115360.51 GFLOP/s , 173303.6 tokens/s INFO:__main__:2024-11-30 15:14:03 | Epoch: 1 | Step: 374540 | Dataset: 0-3328298 | Loss: 0.364 | 597 ms/step , 115642.95 GFLOP/s , 173328.4 tokens/s INFO:__main__:2024-11-30 15:14:10 | Epoch: 1 | Step: 374550 | Dataset: 0-3330698 | Loss: 0.409 | 598 ms/step , 115384.17 GFLOP/s , 173251.0 tokens/s INFO:__main__:2024-11-30 15:14:17 | Epoch: 1 | Step: 374560 | Dataset: 0-3333098 | Loss: 0.388 | 597 ms/step , 115508.70 GFLOP/s , 173350.8 tokens/s INFO:__main__:2024-11-30 15:14:24 | Epoch: 1 | Step: 374570 | Dataset: 0-3335498 | Loss: 0.382 | 599 ms/step , 115266.07 GFLOP/s , 173277.9 tokens/s INFO:__main__:2024-11-30 15:14:31 | Epoch: 1 | Step: 374580 | Dataset: 0-3337898 | Loss: 0.407 | 598 ms/step , 115481.05 GFLOP/s , 173340.8 tokens/s INFO:__main__:2024-11-30 15:14:38 | Epoch: 1 | Step: 374590 | Dataset: 0-3340298 | Loss: 0.367 | 598 ms/step , 115391.57 GFLOP/s , 173288.8 tokens/s INFO:__main__:2024-11-30 15:14:45 | Epoch: 1 | Step: 374600 | Dataset: 0-3342698 | Loss: 0.371 | 598 ms/step , 115453.54 GFLOP/s , 173330.3 tokens/s INFO:__main__:2024-11-30 15:14:52 | Epoch: 1 | Step: 374610 | Dataset: 0-3345098 | Loss: 0.399 | 598 ms/step , 115310.22 GFLOP/s , 173308.0 tokens/s INFO:__main__:2024-11-30 15:14:59 | Epoch: 1 | Step: 374620 | Dataset: 0-3347498 | Loss: 0.363 | 597 ms/step , 115626.58 GFLOP/s , 173359.9 tokens/s INFO:__main__:2024-11-30 15:15:06 | Epoch: 1 | Step: 374630 | Dataset: 0-3349898 | Loss: 0.384 | 599 ms/step , 115280.30 GFLOP/s , 173327.4 tokens/s INFO:__main__:2024-11-30 15:15:13 | Epoch: 1 | Step: 374640 | Dataset: 0-3352298 | Loss: 0.431 | 597 ms/step , 115527.55 GFLOP/s , 173307.4 tokens/s 
INFO:__main__:2024-11-30 15:15:21 | Epoch: 1 | Step: 374650 | Dataset: 0-3354698 | Loss: 0.388 | 598 ms/step , 115363.76 GFLOP/s , 173256.3 tokens/s INFO:__main__:2024-11-30 15:15:28 | Epoch: 1 | Step: 374660 | Dataset: 0-3357098 | Loss: 0.393 | 597 ms/step , 115565.40 GFLOP/s , 173360.4 tokens/s INFO:__main__:2024-11-30 15:15:35 | Epoch: 1 | Step: 374670 | Dataset: 0-3359498 | Loss: 0.378 | 598 ms/step , 115386.06 GFLOP/s , 173321.3 tokens/s INFO:__main__:2024-11-30 15:15:42 | Epoch: 1 | Step: 374680 | Dataset: 0-3361898 | Loss: 0.405 | 598 ms/step , 115470.10 GFLOP/s , 173325.4 tokens/s INFO:__main__:2024-11-30 15:15:49 | Epoch: 1 | Step: 374690 | Dataset: 0-3364298 | Loss: 0.399 | 597 ms/step , 115538.87 GFLOP/s , 173282.1 tokens/s INFO:__main__:2024-11-30 15:15:56 | Epoch: 1 | Step: 374700 | Dataset: 0-3366698 | Loss: 0.393 | 597 ms/step , 115522.70 GFLOP/s , 173318.9 tokens/s INFO:__main__:2024-11-30 15:16:03 | Epoch: 1 | Step: 374710 | Dataset: 0-3369098 | Loss: 0.456 | 598 ms/step , 115472.93 GFLOP/s , 173301.9 tokens/s INFO:__main__:2024-11-30 15:16:10 | Epoch: 1 | Step: 374720 | Dataset: 0-3371498 | Loss: 0.424 | 598 ms/step , 115390.46 GFLOP/s , 173308.4 tokens/s INFO:__main__:2024-11-30 15:16:17 | Epoch: 1 | Step: 374730 | Dataset: 0-3373898 | Loss: 0.389 | 598 ms/step , 115494.65 GFLOP/s , 173219.4 tokens/s INFO:__main__:2024-11-30 15:16:24 | Epoch: 1 | Step: 374740 | Dataset: 0-3376298 | Loss: 0.427 | 599 ms/step , 115245.79 GFLOP/s , 173331.5 tokens/s INFO:__main__:2024-11-30 15:16:31 | Epoch: 1 | Step: 374750 | Dataset: 0-3378698 | Loss: 0.358 | 599 ms/step , 115281.69 GFLOP/s , 173304.1 tokens/s INFO:__main__:2024-11-30 15:16:39 | Epoch: 1 | Step: 374760 | Dataset: 0-3381098 | Loss: 0.417 | 599 ms/step , 115194.25 GFLOP/s , 173277.3 tokens/s INFO:__main__:2024-11-30 15:16:46 | Epoch: 1 | Step: 374770 | Dataset: 0-3383498 | Loss: 0.393 | 598 ms/step , 115484.47 GFLOP/s , 173339.6 tokens/s INFO:__main__:2024-11-30 15:16:53 | Epoch: 1 | Step: 374780 | 
Dataset: 0-3385898 | Loss: 0.397 | 597 ms/step , 115538.21 GFLOP/s , 173380.7 tokens/s INFO:__main__:2024-11-30 15:17:00 | Epoch: 1 | Step: 374790 | Dataset: 0-3388298 | Loss: 0.428 | 598 ms/step , 115379.38 GFLOP/s , 173270.2 tokens/s INFO:__main__:2024-11-30 15:17:07 | Epoch: 1 | Step: 374800 | Dataset: 0-3390698 | Loss: 0.443 | 597 ms/step , 115506.51 GFLOP/s , 173340.1 tokens/s INFO:__main__:2024-11-30 15:17:14 | Epoch: 1 | Step: 374810 | Dataset: 0-3393098 | Loss: 1.083 | 598 ms/step , 115473.94 GFLOP/s , 173256.2 tokens/s INFO:__main__:2024-11-30 15:17:21 | Epoch: 1 | Step: 374820 | Dataset: 0-3395498 | Loss: 1.147 | 599 ms/step , 115298.09 GFLOP/s , 173207.7 tokens/s INFO:__main__:2024-11-30 15:17:28 | Epoch: 1 | Step: 374830 | Dataset: 0-3397898 | Loss: 0.802 | 597 ms/step , 115529.37 GFLOP/s , 173249.9 tokens/s INFO:__main__:2024-11-30 15:17:35 | Epoch: 1 | Step: 374840 | Dataset: 0-3400298 | Loss: 0.954 | 599 ms/step , 115300.81 GFLOP/s , 173197.5 tokens/s INFO:__main__:2024-11-30 15:17:42 | Epoch: 1 | Step: 374850 | Dataset: 0-3402698 | Loss: 1.028 | 599 ms/step , 115220.65 GFLOP/s , 173249.8 tokens/s INFO:__main__:2024-11-30 15:17:49 | Epoch: 1 | Step: 374860 | Dataset: 0-3405098 | Loss: 1.261 | 598 ms/step , 115458.73 GFLOP/s , 173269.9 tokens/s INFO:__main__:2024-11-30 15:17:57 | Epoch: 1 | Step: 374870 | Dataset: 0-3407498 | Loss: 0.585 | 598 ms/step , 115370.38 GFLOP/s , 173196.5 tokens/s INFO:__main__:2024-11-30 15:18:04 | Epoch: 1 | Step: 374880 | Dataset: 0-3409898 | Loss: 0.378 | 597 ms/step , 115619.23 GFLOP/s , 173372.6 tokens/s INFO:__main__:2024-11-30 15:18:11 | Epoch: 1 | Step: 374890 | Dataset: 0-3412298 | Loss: 0.391 | 598 ms/step , 115461.56 GFLOP/s , 173269.1 tokens/s INFO:__main__:2024-11-30 15:18:18 | Epoch: 1 | Step: 374900 | Dataset: 0-3414698 | Loss: 0.389 | 598 ms/step , 115424.51 GFLOP/s , 173230.9 tokens/s INFO:__main__:2024-11-30 15:18:25 | Epoch: 1 | Step: 374910 | Dataset: 0-3417098 | Loss: 0.345 | 598 ms/step , 115433.70 
GFLOP/s , 173304.5 tokens/s INFO:__main__:2024-11-30 15:18:32 | Epoch: 1 | Step: 374920 | Dataset: 0-3419498 | Loss: 0.387 | 598 ms/step , 115397.73 GFLOP/s , 173215.9 tokens/s INFO:__main__:2024-11-30 15:18:39 | Epoch: 1 | Step: 374930 | Dataset: 0-3421898 | Loss: 0.363 | 598 ms/step , 115471.93 GFLOP/s , 173251.9 tokens/s INFO:__main__:2024-11-30 15:18:46 | Epoch: 1 | Step: 374940 | Dataset: 0-3424298 | Loss: 0.409 | 599 ms/step , 115235.10 GFLOP/s , 173210.5 tokens/s INFO:__main__:2024-11-30 15:18:53 | Epoch: 1 | Step: 374950 | Dataset: 0-3426698 | Loss: 0.354 | 598 ms/step , 115494.59 GFLOP/s , 173218.1 tokens/s INFO:__main__:2024-11-30 15:19:00 | Epoch: 1 | Step: 374960 | Dataset: 0-3429098 | Loss: 0.335 | 598 ms/step , 115326.87 GFLOP/s , 173267.4 tokens/s INFO:__main__:2024-11-30 15:19:07 | Epoch: 1 | Step: 374970 | Dataset: 0-3431498 | Loss: 0.368 | 598 ms/step , 115488.05 GFLOP/s , 173096.5 tokens/s INFO:__main__:2024-11-30 15:19:15 | Epoch: 1 | Step: 374980 | Dataset: 0-3433898 | Loss: 0.361 | 598 ms/step , 115324.60 GFLOP/s , 173233.5 tokens/s INFO:__main__:2024-11-30 15:19:22 | Epoch: 1 | Step: 374990 | Dataset: 0-3436298 | Loss: 0.397 | 598 ms/step , 115372.45 GFLOP/s , 173349.0 tokens/s INFO:__main__:2024-11-30 15:19:29 | Validation | Step: 375000 | Val_loss: 0.411 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 15:19:29 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_151929_step_375000.pt` INFO:__main__:2024-11-30 15:19:32 | Epoch: 1 | Step: 375000 | Dataset: 0-3438698 | Loss: 0.325 | 595 ms/step , 115973.88 GFLOP/s , 119516.3 tokens/s INFO:__main__:2024-11-30 15:19:39 | Epoch: 1 | Step: 375010 | Dataset: 0-3441098 | Loss: 0.360 | 598 ms/step , 115423.84 GFLOP/s , 173570.2 tokens/s INFO:__main__:2024-11-30 15:19:46 | Epoch: 1 | Step: 375020 | Dataset: 0-3443498 | Loss: 0.327 | 598 ms/step , 115406.40 GFLOP/s , 173447.4 tokens/s INFO:__main__:2024-11-30 15:19:53 | Epoch: 1 | Step: 375030 | Dataset: 0-3445898 | Loss: 
0.368 | 597 ms/step , 115513.38 GFLOP/s , 173436.2 tokens/s INFO:__main__:2024-11-30 15:20:00 | Epoch: 1 | Step: 375040 | Dataset: 0-3448298 | Loss: 0.362 | 599 ms/step , 115245.99 GFLOP/s , 173318.4 tokens/s INFO:__main__:2024-11-30 15:20:07 | Epoch: 1 | Step: 375050 | Dataset: 0-3450698 | Loss: 0.392 | 598 ms/step , 115312.19 GFLOP/s , 173380.2 tokens/s INFO:__main__:2024-11-30 15:20:14 | Epoch: 1 | Step: 375060 | Dataset: 0-3453098 | Loss: 0.328 | 598 ms/step , 115491.71 GFLOP/s , 173333.3 tokens/s INFO:__main__:2024-11-30 15:20:22 | Epoch: 1 | Step: 375070 | Dataset: 0-3455498 | Loss: 0.390 | 597 ms/step , 115532.01 GFLOP/s , 173362.2 tokens/s INFO:__main__:2024-11-30 15:20:29 | Epoch: 1 | Step: 375080 | Dataset: 0-3457898 | Loss: 0.372 | 598 ms/step , 115378.11 GFLOP/s , 173326.3 tokens/s INFO:__main__:2024-11-30 15:20:36 | Epoch: 1 | Step: 375090 | Dataset: 0-3460298 | Loss: 0.416 | 598 ms/step , 115405.77 GFLOP/s , 173314.0 tokens/s INFO:__main__:2024-11-30 15:20:43 | Epoch: 1 | Step: 375100 | Dataset: 0-3462698 | Loss: 0.376 | 598 ms/step , 115418.53 GFLOP/s , 173310.2 tokens/s INFO:__main__:2024-11-30 15:20:50 | Epoch: 1 | Step: 375110 | Dataset: 0-3465098 | Loss: 0.317 | 598 ms/step , 115392.77 GFLOP/s , 173351.8 tokens/s INFO:__main__:2024-11-30 15:20:57 | Epoch: 1 | Step: 375120 | Dataset: 0-3467498 | Loss: 0.368 | 598 ms/step , 115386.76 GFLOP/s , 173297.9 tokens/s INFO:__main__:2024-11-30 15:21:04 | Epoch: 1 | Step: 375130 | Dataset: 0-3469898 | Loss: 0.373 | 598 ms/step , 115431.29 GFLOP/s , 173373.5 tokens/s INFO:__main__:2024-11-30 15:21:11 | Epoch: 1 | Step: 375140 | Dataset: 0-3472298 | Loss: 0.343 | 598 ms/step , 115398.23 GFLOP/s , 173174.1 tokens/s INFO:__main__:2024-11-30 15:21:18 | Epoch: 1 | Step: 375150 | Dataset: 0-3474698 | Loss: 0.320 | 598 ms/step , 115368.75 GFLOP/s , 173336.7 tokens/s INFO:__main__:2024-11-30 15:21:25 | Epoch: 1 | Step: 375160 | Dataset: 0-3477098 | Loss: 0.332 | 597 ms/step , 115513.17 GFLOP/s , 173252.4 tokens/s 
INFO:__main__:2024-11-30 15:21:32 | Epoch: 1 | Step: 375170 | Dataset: 0-3479498 | Loss: 0.357 | 598 ms/step , 115338.58 GFLOP/s , 173290.5 tokens/s INFO:__main__:2024-11-30 15:21:40 | Epoch: 1 | Step: 375180 | Dataset: 0-3481898 | Loss: 0.374 | 597 ms/step , 115565.63 GFLOP/s , 173250.8 tokens/s INFO:__main__:2024-11-30 15:21:47 | Epoch: 1 | Step: 375190 | Dataset: 0-3484298 | Loss: 0.316 | 598 ms/step , 115323.39 GFLOP/s , 173235.2 tokens/s INFO:__main__:2024-11-30 15:21:54 | Epoch: 1 | Step: 375200 | Dataset: 0-3486698 | Loss: 0.346 | 598 ms/step , 115444.77 GFLOP/s , 173286.6 tokens/s INFO:__main__:2024-11-30 15:22:01 | Epoch: 1 | Step: 375210 | Dataset: 0-3489098 | Loss: 0.342 | 599 ms/step , 115243.19 GFLOP/s , 173286.1 tokens/s INFO:__main__:2024-11-30 15:22:08 | Epoch: 1 | Step: 375220 | Dataset: 0-3491498 | Loss: 0.380 | 598 ms/step , 115430.36 GFLOP/s , 173243.9 tokens/s INFO:__main__:2024-11-30 15:22:15 | Epoch: 1 | Step: 375230 | Dataset: 0-3493898 | Loss: 0.306 | 598 ms/step , 115394.52 GFLOP/s , 173294.5 tokens/s INFO:__main__:2024-11-30 15:22:22 | Epoch: 1 | Step: 375240 | Dataset: 0-3496298 | Loss: 0.318 | 599 ms/step , 115221.35 GFLOP/s , 173276.2 tokens/s INFO:__main__:2024-11-30 15:22:29 | Epoch: 1 | Step: 375250 | Dataset: 0-3498698 | Loss: 0.341 | 598 ms/step , 115405.45 GFLOP/s , 173267.3 tokens/s INFO:__main__:2024-11-30 15:22:36 | Epoch: 1 | Step: 375260 | Dataset: 0-3501098 | Loss: 0.350 | 598 ms/step , 115500.21 GFLOP/s , 173267.7 tokens/s INFO:__main__:2024-11-30 15:22:43 | Epoch: 1 | Step: 375270 | Dataset: 0-3503498 | Loss: 0.362 | 598 ms/step , 115470.44 GFLOP/s , 173247.3 tokens/s INFO:__main__:2024-11-30 15:22:50 | Epoch: 1 | Step: 375280 | Dataset: 0-3505898 | Loss: 0.402 | 598 ms/step , 115444.86 GFLOP/s , 173241.9 tokens/s INFO:__main__:2024-11-30 15:22:58 | Epoch: 1 | Step: 375290 | Dataset: 0-3508298 | Loss: 0.348 | 598 ms/step , 115376.71 GFLOP/s , 173225.4 tokens/s INFO:__main__:2024-11-30 15:23:05 | Epoch: 1 | Step: 375300 | 
Dataset: 0-3510698 | Loss: 0.375 | 598 ms/step , 115420.58 GFLOP/s , 173249.5 tokens/s INFO:__main__:2024-11-30 15:23:12 | Epoch: 1 | Step: 375310 | Dataset: 0-3513098 | Loss: 0.322 | 598 ms/step , 115490.32 GFLOP/s , 173252.5 tokens/s INFO:__main__:2024-11-30 15:23:19 | Epoch: 1 | Step: 375320 | Dataset: 0-3515498 | Loss: 0.340 | 598 ms/step , 115354.49 GFLOP/s , 173248.7 tokens/s INFO:__main__:2024-11-30 15:23:26 | Epoch: 1 | Step: 375330 | Dataset: 0-3517898 | Loss: 0.338 | 598 ms/step , 115445.14 GFLOP/s , 173266.8 tokens/s INFO:__main__:2024-11-30 15:23:33 | Epoch: 1 | Step: 375340 | Dataset: 0-3520298 | Loss: 0.372 | 598 ms/step , 115350.95 GFLOP/s , 173197.5 tokens/s INFO:__main__:2024-11-30 15:23:40 | Epoch: 1 | Step: 375350 | Dataset: 0-3522698 | Loss: 0.333 | 598 ms/step , 115361.21 GFLOP/s , 173253.0 tokens/s INFO:__main__:2024-11-30 15:23:47 | Epoch: 1 | Step: 375360 | Dataset: 0-3525098 | Loss: 0.390 | 598 ms/step , 115310.82 GFLOP/s , 173177.1 tokens/s INFO:__main__:2024-11-30 15:23:54 | Epoch: 1 | Step: 375370 | Dataset: 0-3527498 | Loss: 0.381 | 598 ms/step , 115427.61 GFLOP/s , 173238.4 tokens/s INFO:__main__:2024-11-30 15:24:01 | Epoch: 1 | Step: 375380 | Dataset: 0-3529898 | Loss: 0.344 | 599 ms/step , 115300.96 GFLOP/s , 173239.0 tokens/s INFO:__main__:2024-11-30 15:24:08 | Epoch: 1 | Step: 375390 | Dataset: 0-3532298 | Loss: 0.336 | 598 ms/step , 115356.39 GFLOP/s , 173295.6 tokens/s INFO:__main__:2024-11-30 15:24:16 | Epoch: 1 | Step: 375400 | Dataset: 0-3534698 | Loss: 0.378 | 597 ms/step , 115570.28 GFLOP/s , 173258.7 tokens/s INFO:__main__:2024-11-30 15:24:23 | Epoch: 1 | Step: 375410 | Dataset: 0-3537098 | Loss: 0.351 | 598 ms/step , 115344.53 GFLOP/s , 173277.5 tokens/s INFO:__main__:2024-11-30 15:24:30 | Epoch: 1 | Step: 375420 | Dataset: 0-3539498 | Loss: 0.520 | 597 ms/step , 115510.71 GFLOP/s , 173292.6 tokens/s INFO:__main__:2024-11-30 15:24:37 | Epoch: 1 | Step: 375430 | Dataset: 0-3541898 | Loss: 0.550 | 599 ms/step , 115263.85 
GFLOP/s , 173196.0 tokens/s INFO:__main__:2024-11-30 15:24:44 | Epoch: 1 | Step: 375440 | Dataset: 0-3544298 | Loss: 0.541 | 598 ms/step , 115356.04 GFLOP/s , 173176.6 tokens/s INFO:__main__:2024-11-30 15:24:51 | Epoch: 1 | Step: 375450 | Dataset: 0-3546698 | Loss: 0.511 | 598 ms/step , 115309.24 GFLOP/s , 173281.9 tokens/s INFO:__main__:2024-11-30 15:24:58 | Epoch: 1 | Step: 375460 | Dataset: 0-3549098 | Loss: 0.506 | 598 ms/step , 115315.40 GFLOP/s , 173261.3 tokens/s INFO:__main__:2024-11-30 15:25:05 | Epoch: 1 | Step: 375470 | Dataset: 0-3551498 | Loss: 0.502 | 598 ms/step , 115309.23 GFLOP/s , 173228.2 tokens/s INFO:__main__:2024-11-30 15:25:12 | Epoch: 1 | Step: 375480 | Dataset: 0-3553898 | Loss: 0.548 | 598 ms/step , 115397.21 GFLOP/s , 173200.8 tokens/s INFO:__main__:2024-11-30 15:25:19 | Epoch: 1 | Step: 375490 | Dataset: 0-3556298 | Loss: 0.572 | 599 ms/step , 115202.06 GFLOP/s , 173279.7 tokens/s INFO:__main__:2024-11-30 15:25:27 | Validation | Step: 375500 | Val_loss: 0.425 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 15:25:28 | Epoch: 1 | Step: 375500 | Dataset: 0-3558698 | Loss: 0.561 | 597 ms/step , 115674.39 GFLOP/s , 147535.5 tokens/s INFO:__main__:2024-11-30 15:25:35 | Epoch: 1 | Step: 375510 | Dataset: 0-3561098 | Loss: 0.552 | 598 ms/step , 115315.28 GFLOP/s , 173315.6 tokens/s INFO:__main__:2024-11-30 15:25:42 | Epoch: 1 | Step: 375520 | Dataset: 0-3563498 | Loss: 0.537 | 597 ms/step , 115547.03 GFLOP/s , 173272.7 tokens/s INFO:__main__:2024-11-30 15:25:49 | Epoch: 1 | Step: 375530 | Dataset: 0-3565898 | Loss: 0.561 | 598 ms/step , 115403.60 GFLOP/s , 173316.9 tokens/s INFO:__main__:2024-11-30 15:25:56 | Epoch: 1 | Step: 375540 | Dataset: 0-3568298 | Loss: 0.533 | 599 ms/step , 115288.63 GFLOP/s , 173256.5 tokens/s INFO:__main__:2024-11-30 15:26:03 | Epoch: 1 | Step: 375550 | Dataset: 0-3570698 | Loss: 0.499 | 598 ms/step , 115338.62 GFLOP/s , 173247.1 tokens/s INFO:__main__:2024-11-30 15:26:10 | Epoch: 1 | Step: 375560 | Dataset: 
0-3573098 | Loss: 0.506 | 599 ms/step , 115236.10 GFLOP/s , 173293.7 tokens/s INFO:__main__:2024-11-30 15:26:17 | Epoch: 1 | Step: 375570 | Dataset: 0-3575498 | Loss: 0.520 | 597 ms/step , 115526.41 GFLOP/s , 173275.6 tokens/s INFO:__main__:2024-11-30 15:26:24 | Epoch: 1 | Step: 375580 | Dataset: 0-3577898 | Loss: 0.575 | 598 ms/step , 115320.08 GFLOP/s , 173272.6 tokens/s INFO:__main__:2024-11-30 15:26:32 | Epoch: 1 | Step: 375590 | Dataset: 0-3580298 | Loss: 0.545 | 597 ms/step , 115532.25 GFLOP/s , 173271.8 tokens/s INFO:__main__:2024-11-30 15:26:39 | Epoch: 1 | Step: 375600 | Dataset: 0-3582698 | Loss: 0.615 | 599 ms/step , 115221.00 GFLOP/s , 173202.2 tokens/s INFO:__main__:2024-11-30 15:26:46 | Epoch: 1 | Step: 375610 | Dataset: 0-3585098 | Loss: 0.532 | 598 ms/step , 115370.04 GFLOP/s , 173259.2 tokens/s INFO:__main__:2024-11-30 15:26:53 | Epoch: 1 | Step: 375620 | Dataset: 0-3587498 | Loss: 0.547 | 598 ms/step , 115451.98 GFLOP/s , 173221.7 tokens/s INFO:__main__:2024-11-30 15:27:00 | Epoch: 1 | Step: 375630 | Dataset: 0-3589898 | Loss: 0.558 | 598 ms/step , 115339.11 GFLOP/s , 173116.5 tokens/s INFO:__main__:2024-11-30 15:27:07 | Epoch: 1 | Step: 375640 | Dataset: 0-3592298 | Loss: 0.479 | 598 ms/step , 115479.36 GFLOP/s , 173232.1 tokens/s INFO:__main__:2024-11-30 15:27:14 | Epoch: 1 | Step: 375650 | Dataset: 0-3594698 | Loss: 0.501 | 599 ms/step , 115262.33 GFLOP/s , 173311.4 tokens/s INFO:__main__:2024-11-30 15:27:21 | Epoch: 1 | Step: 375660 | Dataset: 0-3597098 | Loss: 0.575 | 598 ms/step , 115460.67 GFLOP/s , 173209.7 tokens/s INFO:__main__:2024-11-30 15:27:28 | Epoch: 1 | Step: 375670 | Dataset: 0-3599498 | Loss: 0.501 | 599 ms/step , 115261.21 GFLOP/s , 173228.9 tokens/s INFO:__main__:2024-11-30 15:27:35 | Epoch: 1 | Step: 375680 | Dataset: 0-3601898 | Loss: 0.528 | 598 ms/step , 115481.64 GFLOP/s , 173254.2 tokens/s INFO:__main__:2024-11-30 15:27:42 | Epoch: 1 | Step: 375690 | Dataset: 0-3604298 | Loss: 0.527 | 598 ms/step , 115312.93 GFLOP/s , 
173269.6 tokens/s INFO:__main__:2024-11-30 15:27:50 | Epoch: 1 | Step: 375700 | Dataset: 0-3606698 | Loss: 0.489 | 598 ms/step , 115394.69 GFLOP/s , 173176.0 tokens/s INFO:__main__:2024-11-30 15:27:57 | Epoch: 1 | Step: 375710 | Dataset: 0-3609098 | Loss: 0.525 | 599 ms/step , 115272.73 GFLOP/s , 173203.2 tokens/s INFO:__main__:2024-11-30 15:28:04 | Epoch: 1 | Step: 375720 | Dataset: 0-3611498 | Loss: 0.520 | 598 ms/step , 115336.57 GFLOP/s , 173237.9 tokens/s INFO:__main__:2024-11-30 15:28:11 | Epoch: 1 | Step: 375730 | Dataset: 0-3613898 | Loss: 0.522 | 599 ms/step , 115262.72 GFLOP/s , 173202.3 tokens/s INFO:__main__:2024-11-30 15:28:18 | Epoch: 1 | Step: 375740 | Dataset: 0-3616298 | Loss: 0.609 | 598 ms/step , 115392.68 GFLOP/s , 173104.3 tokens/s INFO:__main__:2024-11-30 15:28:25 | Epoch: 1 | Step: 375750 | Dataset: 0-3618698 | Loss: 0.565 | 599 ms/step , 115202.97 GFLOP/s , 173115.4 tokens/s INFO:__main__:2024-11-30 15:28:32 | Epoch: 1 | Step: 375760 | Dataset: 0-3621098 | Loss: 0.509 | 598 ms/step , 115452.38 GFLOP/s , 173269.3 tokens/s INFO:__main__:2024-11-30 15:28:39 | Epoch: 1 | Step: 375770 | Dataset: 0-3623498 | Loss: 0.518 | 598 ms/step , 115386.36 GFLOP/s , 173199.1 tokens/s INFO:__main__:2024-11-30 15:28:46 | Epoch: 1 | Step: 375780 | Dataset: 0-3625898 | Loss: 0.574 | 598 ms/step , 115365.31 GFLOP/s , 173159.8 tokens/s INFO:__main__:2024-11-30 15:28:53 | Epoch: 1 | Step: 375790 | Dataset: 0-3628298 | Loss: 0.550 | 598 ms/step , 115351.37 GFLOP/s , 173107.5 tokens/s INFO:__main__:2024-11-30 15:29:01 | Epoch: 1 | Step: 375800 | Dataset: 0-3630698 | Loss: 0.623 | 597 ms/step , 115524.55 GFLOP/s , 173284.9 tokens/s INFO:__main__:2024-11-30 15:29:08 | Epoch: 1 | Step: 375810 | Dataset: 0-3633098 | Loss: 0.541 | 598 ms/step , 115343.67 GFLOP/s , 173251.6 tokens/s INFO:__main__:2024-11-30 15:29:15 | Epoch: 1 | Step: 375820 | Dataset: 0-3635498 | Loss: 0.476 | 598 ms/step , 115404.72 GFLOP/s , 173214.0 tokens/s INFO:__main__:2024-11-30 15:29:22 | Epoch: 1 
| Step: 375830 | Dataset: 0-3637898 | Loss: 0.508 | 598 ms/step , 115398.19 GFLOP/s , 173207.4 tokens/s INFO:__main__:2024-11-30 15:29:29 | Epoch: 1 | Step: 375840 | Dataset: 0-3640298 | Loss: 0.611 | 598 ms/step , 115407.81 GFLOP/s , 173196.0 tokens/s INFO:__main__:2024-11-30 15:29:36 | Epoch: 1 | Step: 375850 | Dataset: 0-3642698 | Loss: 0.502 | 599 ms/step , 115199.55 GFLOP/s , 173162.4 tokens/s INFO:__main__:2024-11-30 15:29:43 | Epoch: 1 | Step: 375860 | Dataset: 0-3645098 | Loss: 0.572 | 598 ms/step , 115373.76 GFLOP/s , 173172.2 tokens/s INFO:__main__:2024-11-30 15:29:50 | Epoch: 1 | Step: 375870 | Dataset: 0-3647498 | Loss: 0.517 | 598 ms/step , 115340.42 GFLOP/s , 173224.4 tokens/s INFO:__main__:2024-11-30 15:29:57 | Epoch: 1 | Step: 375880 | Dataset: 0-3649898 | Loss: 0.566 | 598 ms/step , 115354.36 GFLOP/s , 173187.2 tokens/s INFO:__main__:2024-11-30 15:30:04 | Epoch: 1 | Step: 375890 | Dataset: 0-3652298 | Loss: 0.510 | 599 ms/step , 115221.71 GFLOP/s , 173164.3 tokens/s INFO:__main__:2024-11-30 15:30:12 | Epoch: 1 | Step: 375900 | Dataset: 0-3654698 | Loss: 0.508 | 598 ms/step , 115336.26 GFLOP/s , 173073.7 tokens/s INFO:__main__:2024-11-30 15:30:19 | Epoch: 1 | Step: 375910 | Dataset: 0-3657098 | Loss: 0.472 | 599 ms/step , 115145.08 GFLOP/s , 173129.4 tokens/s INFO:__main__:2024-11-30 15:30:26 | Epoch: 1 | Step: 375920 | Dataset: 0-3659498 | Loss: 0.500 | 598 ms/step , 115378.05 GFLOP/s , 173141.9 tokens/s INFO:__main__:2024-11-30 15:30:33 | Epoch: 1 | Step: 375930 | Dataset: 0-3661898 | Loss: 0.529 | 599 ms/step , 115296.30 GFLOP/s , 173180.5 tokens/s INFO:__main__:2024-11-30 15:30:40 | Epoch: 1 | Step: 375940 | Dataset: 0-3664298 | Loss: 0.494 | 598 ms/step , 115356.67 GFLOP/s , 173102.8 tokens/s INFO:__main__:2024-11-30 15:30:47 | Epoch: 1 | Step: 375950 | Dataset: 0-3666698 | Loss: 0.519 | 599 ms/step , 115187.29 GFLOP/s , 173188.9 tokens/s INFO:__main__:2024-11-30 15:30:54 | Epoch: 1 | Step: 375960 | Dataset: 0-3669098 | Loss: 0.544 | 599 
ms/step , 115238.75 GFLOP/s , 173178.6 tokens/s INFO:__main__:2024-11-30 15:31:01 | Epoch: 1 | Step: 375970 | Dataset: 0-3671498 | Loss: 0.407 | 599 ms/step , 115226.89 GFLOP/s , 173127.2 tokens/s INFO:__main__:2024-11-30 15:31:08 | Epoch: 1 | Step: 375980 | Dataset: 0-3673898 | Loss: 0.401 | 597 ms/step , 115514.16 GFLOP/s , 173277.1 tokens/s INFO:__main__:2024-11-30 15:31:15 | Epoch: 1 | Step: 375990 | Dataset: 0-3676298 | Loss: 0.390 | 599 ms/step , 115277.24 GFLOP/s , 173168.7 tokens/s INFO:__main__:2024-11-30 15:31:23 | Validation | Step: 376000 | Val_loss: 0.378 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 15:31:23 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_153123_step_376000.pt` INFO:__main__:2024-11-30 15:31:26 | Epoch: 1 | Step: 376000 | Dataset: 0-3678698 | Loss: 0.392 | 595 ms/step , 115992.01 GFLOP/s , 119406.9 tokens/s INFO:__main__:2024-11-30 15:31:33 | Epoch: 1 | Step: 376010 | Dataset: 0-3681098 | Loss: 0.397 | 598 ms/step , 115462.23 GFLOP/s , 173382.9 tokens/s INFO:__main__:2024-11-30 15:31:40 | Epoch: 1 | Step: 376020 | Dataset: 0-3683498 | Loss: 0.400 | 598 ms/step , 115466.44 GFLOP/s , 173426.5 tokens/s INFO:__main__:2024-11-30 15:31:47 | Epoch: 1 | Step: 376030 | Dataset: 0-3685898 | Loss: 0.382 | 598 ms/step , 115433.62 GFLOP/s , 173305.0 tokens/s INFO:__main__:2024-11-30 15:31:54 | Epoch: 1 | Step: 376040 | Dataset: 0-3688298 | Loss: 0.430 | 598 ms/step , 115464.86 GFLOP/s , 173299.7 tokens/s INFO:__main__:2024-11-30 15:32:01 | Epoch: 1 | Step: 376050 | Dataset: 0-3690698 | Loss: 0.413 | 598 ms/step , 115328.32 GFLOP/s , 173273.9 tokens/s INFO:__main__:2024-11-30 15:32:08 | Epoch: 1 | Step: 376060 | Dataset: 0-3693098 | Loss: 0.402 | 598 ms/step , 115428.61 GFLOP/s , 173274.7 tokens/s INFO:__main__:2024-11-30 15:32:15 | Epoch: 1 | Step: 376070 | Dataset: 0-3695498 | Loss: 0.384 | 598 ms/step , 115373.70 GFLOP/s , 173122.1 tokens/s INFO:__main__:2024-11-30 15:32:22 | Epoch: 1 | Step: 376080 | 
Dataset: 0-3697898 | Loss: 0.411 | 599 ms/step , 115264.65 GFLOP/s , 173270.6 tokens/s INFO:__main__:2024-11-30 15:32:29 | Epoch: 1 | Step: 376090 | Dataset: 0-3700298 | Loss: 0.401 | 598 ms/step , 115381.31 GFLOP/s , 173209.8 tokens/s INFO:__main__:2024-11-30 15:32:37 | Epoch: 1 | Step: 376100 | Dataset: 0-3702698 | Loss: 0.466 | 597 ms/step , 115554.41 GFLOP/s , 173295.4 tokens/s INFO:__main__:2024-11-30 15:32:44 | Epoch: 1 | Step: 376110 | Dataset: 0-3705098 | Loss: 0.434 | 599 ms/step , 115283.78 GFLOP/s , 173227.4 tokens/s INFO:__main__:2024-11-30 15:32:51 | Epoch: 1 | Step: 376120 | Dataset: 0-3707498 | Loss: 0.400 | 599 ms/step , 115282.60 GFLOP/s , 173232.8 tokens/s INFO:__main__:2024-11-30 15:32:58 | Epoch: 1 | Step: 376130 | Dataset: 0-3709898 | Loss: 0.365 | 603 ms/step , 114526.78 GFLOP/s , 173119.6 tokens/s INFO:__main__:2024-11-30 15:33:05 | Epoch: 1 | Step: 376140 | Dataset: 0-3712298 | Loss: 0.401 | 598 ms/step , 115398.72 GFLOP/s , 173200.8 tokens/s INFO:__main__:2024-11-30 15:33:12 | Epoch: 1 | Step: 376150 | Dataset: 0-3714698 | Loss: 0.443 | 599 ms/step , 115288.25 GFLOP/s , 173250.8 tokens/s INFO:__main__:2024-11-30 15:33:19 | Epoch: 1 | Step: 376160 | Dataset: 0-3717098 | Loss: 0.420 | 598 ms/step , 115468.26 GFLOP/s , 173252.9 tokens/s INFO:__main__:2024-11-30 15:33:26 | Epoch: 1 | Step: 376170 | Dataset: 0-3719498 | Loss: 0.471 | 599 ms/step , 115203.18 GFLOP/s , 173196.8 tokens/s INFO:__main__:2024-11-30 15:33:33 | Epoch: 1 | Step: 376180 | Dataset: 0-3721898 | Loss: 0.424 | 598 ms/step , 115322.03 GFLOP/s , 173212.9 tokens/s INFO:__main__:2024-11-30 15:33:40 | Epoch: 1 | Step: 376190 | Dataset: 0-3724298 | Loss: 0.391 | 599 ms/step , 115192.97 GFLOP/s , 173260.0 tokens/s INFO:__main__:2024-11-30 15:33:48 | Epoch: 1 | Step: 376200 | Dataset: 0-3726698 | Loss: 0.411 | 598 ms/step , 115461.08 GFLOP/s , 173253.8 tokens/s INFO:__main__:2024-11-30 15:33:55 | Epoch: 1 | Step: 376210 | Dataset: 0-3729098 | Loss: 0.415 | 599 ms/step , 115150.28 
GFLOP/s , 173190.6 tokens/s INFO:__main__:2024-11-30 15:34:02 | Epoch: 1 | Step: 376220 | Dataset: 0-3731498 | Loss: 0.370 | 597 ms/step , 115525.58 GFLOP/s , 173262.2 tokens/s INFO:__main__:2024-11-30 15:34:09 | Epoch: 1 | Step: 376230 | Dataset: 0-3733898 | Loss: 0.369 | 597 ms/step , 115532.02 GFLOP/s , 173141.3 tokens/s INFO:__main__:2024-11-30 15:34:16 | Epoch: 1 | Step: 376240 | Dataset: 0-3736298 | Loss: 0.387 | 598 ms/step , 115323.54 GFLOP/s , 173211.3 tokens/s INFO:__main__:2024-11-30 15:34:23 | Epoch: 1 | Step: 376250 | Dataset: 0-3738698 | Loss: 0.393 | 598 ms/step , 115370.34 GFLOP/s , 173166.3 tokens/s INFO:__main__:2024-11-30 15:34:30 | Epoch: 1 | Step: 376260 | Dataset: 0-3741098 | Loss: 0.425 | 598 ms/step , 115459.25 GFLOP/s , 173242.8 tokens/s INFO:__main__:2024-11-30 15:34:37 | Epoch: 1 | Step: 376270 | Dataset: 0-3743498 | Loss: 0.397 | 599 ms/step , 115289.22 GFLOP/s , 173168.7 tokens/s INFO:__main__:2024-11-30 15:34:44 | Epoch: 1 | Step: 376280 | Dataset: 0-3745898 | Loss: 0.387 | 598 ms/step , 115434.84 GFLOP/s , 173261.4 tokens/s INFO:__main__:2024-11-30 15:34:51 | Epoch: 1 | Step: 376290 | Dataset: 0-3748298 | Loss: 0.377 | 599 ms/step , 115307.03 GFLOP/s , 173228.4 tokens/s INFO:__main__:2024-11-30 15:34:58 | Epoch: 1 | Step: 376300 | Dataset: 0-3750698 | Loss: 0.428 | 598 ms/step , 115497.42 GFLOP/s , 173108.5 tokens/s INFO:__main__:2024-11-30 15:35:06 | Epoch: 1 | Step: 376310 | Dataset: 0-3753098 | Loss: 0.417 | 598 ms/step , 115323.79 GFLOP/s , 173131.0 tokens/s INFO:__main__:2024-11-30 15:35:13 | Epoch: 1 | Step: 376320 | Dataset: 0-3755498 | Loss: 0.417 | 598 ms/step , 115369.96 GFLOP/s , 173236.2 tokens/s INFO:__main__:2024-11-30 15:35:20 | Epoch: 1 | Step: 376330 | Dataset: 0-3757898 | Loss: 0.431 | 599 ms/step , 115273.32 GFLOP/s , 173148.4 tokens/s INFO:__main__:2024-11-30 15:35:27 | Epoch: 1 | Step: 376340 | Dataset: 0-3760298 | Loss: 0.380 | 597 ms/step , 115542.37 GFLOP/s , 173268.1 tokens/s INFO:__main__:2024-11-30 15:35:34 
| Epoch: 1 | Step: 376350 | Dataset: 0-3762698 | Loss: 0.370 | 598 ms/step , 115320.33 GFLOP/s , 173252.9 tokens/s INFO:__main__:2024-11-30 15:35:41 | Epoch: 1 | Step: 376360 | Dataset: 0-3765098 | Loss: 0.508 | 599 ms/step , 115275.31 GFLOP/s , 173165.6 tokens/s INFO:__main__:2024-11-30 15:35:48 | Epoch: 1 | Step: 376370 | Dataset: 0-3767498 | Loss: 0.394 | 598 ms/step , 115394.91 GFLOP/s , 173225.8 tokens/s INFO:__main__:2024-11-30 15:35:55 | Epoch: 1 | Step: 376380 | Dataset: 0-3769898 | Loss: 0.369 | 598 ms/step , 115411.13 GFLOP/s , 173193.3 tokens/s INFO:__main__:2024-11-30 15:36:02 | Epoch: 1 | Step: 376390 | Dataset: 0-3772298 | Loss: 0.384 | 599 ms/step , 115264.85 GFLOP/s , 173273.3 tokens/s INFO:__main__:2024-11-30 15:36:09 | Epoch: 1 | Step: 376400 | Dataset: 0-3774698 | Loss: 0.354 | 597 ms/step , 115574.03 GFLOP/s , 173286.3 tokens/s INFO:__main__:2024-11-30 15:36:16 | Epoch: 1 | Step: 376410 | Dataset: 0-3777098 | Loss: 0.403 | 598 ms/step , 115449.28 GFLOP/s , 173216.2 tokens/s INFO:__main__:2024-11-30 15:36:24 | Epoch: 1 | Step: 376420 | Dataset: 0-3779498 | Loss: 0.418 | 598 ms/step , 115327.99 GFLOP/s , 173139.1 tokens/s INFO:__main__:2024-11-30 15:36:31 | Epoch: 1 | Step: 376430 | Dataset: 0-3781898 | Loss: 0.413 | 598 ms/step , 115313.96 GFLOP/s , 173209.2 tokens/s INFO:__main__:2024-11-30 15:36:38 | Epoch: 1 | Step: 376440 | Dataset: 0-3784298 | Loss: 0.407 | 598 ms/step , 115330.12 GFLOP/s , 173238.1 tokens/s INFO:__main__:2024-11-30 15:36:45 | Epoch: 1 | Step: 376450 | Dataset: 0-3786698 | Loss: 0.401 | 599 ms/step , 115280.16 GFLOP/s , 173224.2 tokens/s INFO:__main__:2024-11-30 15:36:52 | Epoch: 1 | Step: 376460 | Dataset: 0-3789098 | Loss: 0.372 | 598 ms/step , 115410.25 GFLOP/s , 173237.5 tokens/s INFO:__main__:2024-11-30 15:36:59 | Epoch: 1 | Step: 376470 | Dataset: 0-3791498 | Loss: 0.399 | 598 ms/step , 115449.37 GFLOP/s , 173231.5 tokens/s INFO:__main__:2024-11-30 15:37:06 | Epoch: 1 | Step: 376480 | Dataset: 0-3793898 | Loss: 0.443 | 
598 ms/step , 115338.35 GFLOP/s , 173230.9 tokens/s INFO:__main__:2024-11-30 15:37:13 | Epoch: 1 | Step: 376490 | Dataset: 0-3796298 | Loss: 0.417 | 598 ms/step , 115388.22 GFLOP/s , 173234.4 tokens/s INFO:__main__:2024-11-30 15:37:21 | Validation | Step: 376500 | Val_loss: 0.374 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 15:37:22 | Epoch: 1 | Step: 376500 | Dataset: 0-3798698 | Loss: 0.395 | 596 ms/step , 115730.32 GFLOP/s , 147506.9 tokens/s INFO:__main__:2024-11-30 15:37:29 | Epoch: 1 | Step: 376510 | Dataset: 0-3801098 | Loss: 0.339 | 599 ms/step , 115303.50 GFLOP/s , 173370.1 tokens/s INFO:__main__:2024-11-30 15:37:36 | Epoch: 1 | Step: 376520 | Dataset: 0-3803498 | Loss: 0.602 | 598 ms/step , 115455.37 GFLOP/s , 173301.3 tokens/s INFO:__main__:2024-11-30 15:37:43 | Epoch: 1 | Step: 376530 | Dataset: 0-3805898 | Loss: 0.617 | 598 ms/step , 115353.07 GFLOP/s , 173354.9 tokens/s INFO:__main__:2024-11-30 15:37:50 | Epoch: 1 | Step: 376540 | Dataset: 0-3808298 | Loss: 0.698 | 598 ms/step , 115454.27 GFLOP/s , 173322.5 tokens/s INFO:__main__:2024-11-30 15:37:57 | Epoch: 1 | Step: 376550 | Dataset: 0-3810698 | Loss: 0.703 | 598 ms/step , 115341.86 GFLOP/s , 173326.9 tokens/s INFO:__main__:2024-11-30 15:38:04 | Epoch: 1 | Step: 376560 | Dataset: 0-3813098 | Loss: 0.572 | 598 ms/step , 115442.56 GFLOP/s , 173250.5 tokens/s INFO:__main__:2024-11-30 15:38:11 | Epoch: 1 | Step: 376570 | Dataset: 0-3815498 | Loss: 0.573 | 599 ms/step , 115309.07 GFLOP/s , 173238.2 tokens/s INFO:__main__:2024-11-30 15:38:18 | Epoch: 1 | Step: 376580 | Dataset: 0-3817898 | Loss: 0.684 | 597 ms/step , 115524.88 GFLOP/s , 173243.0 tokens/s INFO:__main__:2024-11-30 15:38:25 | Epoch: 1 | Step: 376590 | Dataset: 0-3820298 | Loss: 0.648 | 599 ms/step , 115252.19 GFLOP/s , 173284.5 tokens/s INFO:__main__:2024-11-30 15:38:32 | Epoch: 1 | Step: 376600 | Dataset: 0-3822698 | Loss: 0.572 | 602 ms/step , 114662.85 GFLOP/s , 173204.5 tokens/s INFO:__main__:2024-11-30 15:38:40 | Epoch: 1 | Step: 
376610 | Dataset: 0-3825098 | Loss: 0.645 | 598 ms/step , 115377.16 GFLOP/s , 173219.5 tokens/s INFO:__main__:2024-11-30 15:38:47 | Epoch: 1 | Step: 376620 | Dataset: 0-3827498 | Loss: 0.652 | 598 ms/step , 115322.33 GFLOP/s , 173260.7 tokens/s INFO:__main__:2024-11-30 15:38:54 | Epoch: 1 | Step: 376630 | Dataset: 0-3829898 | Loss: 0.645 | 598 ms/step , 115346.12 GFLOP/s , 173241.6 tokens/s INFO:__main__:2024-11-30 15:39:01 | Epoch: 1 | Step: 376640 | Dataset: 0-3832298 | Loss: 0.597 | 598 ms/step , 115486.21 GFLOP/s , 173177.2 tokens/s INFO:__main__:2024-11-30 15:39:08 | Epoch: 1 | Step: 376650 | Dataset: 0-3834698 | Loss: 0.651 | 599 ms/step , 115280.96 GFLOP/s , 173213.3 tokens/s INFO:__main__:2024-11-30 15:39:15 | Epoch: 1 | Step: 376660 | Dataset: 0-3837098 | Loss: 0.593 | 598 ms/step , 115366.08 GFLOP/s , 173227.2 tokens/s INFO:__main__:2024-11-30 15:39:22 | Epoch: 1 | Step: 376670 | Dataset: 0-3839498 | Loss: 0.601 | 598 ms/step , 115492.13 GFLOP/s , 173290.9 tokens/s INFO:__main__:2024-11-30 15:39:29 | Epoch: 1 | Step: 376680 | Dataset: 0-3841898 | Loss: 0.684 | 598 ms/step , 115393.60 GFLOP/s , 173214.1 tokens/s INFO:__main__:2024-11-30 15:39:36 | Epoch: 1 | Step: 376690 | Dataset: 0-3844298 | Loss: 0.629 | 599 ms/step , 115307.59 GFLOP/s , 173173.5 tokens/s INFO:__main__:2024-11-30 15:39:43 | Epoch: 1 | Step: 376700 | Dataset: 0-3846698 | Loss: 0.672 | 600 ms/step , 114956.75 GFLOP/s , 173036.6 tokens/s INFO:__main__:2024-11-30 15:39:51 | Epoch: 1 | Step: 376710 | Dataset: 0-3849098 | Loss: 0.684 | 599 ms/step , 115231.61 GFLOP/s , 173213.0 tokens/s INFO:__main__:2024-11-30 15:39:58 | Epoch: 1 | Step: 376720 | Dataset: 0-3851498 | Loss: 0.555 | 598 ms/step , 115342.64 GFLOP/s , 173187.5 tokens/s INFO:__main__:2024-11-30 15:40:05 | Epoch: 1 | Step: 376730 | Dataset: 0-3853898 | Loss: 0.664 | 598 ms/step , 115331.48 GFLOP/s , 173245.8 tokens/s INFO:__main__:2024-11-30 15:40:12 | Epoch: 1 | Step: 376740 | Dataset: 0-3856298 | Loss: 0.623 | 598 ms/step , 
115360.96 GFLOP/s , 173209.8 tokens/s INFO:__main__:2024-11-30 15:40:19 | Epoch: 1 | Step: 376750 | Dataset: 0-3858698 | Loss: 0.642 | 599 ms/step , 115239.67 GFLOP/s , 173187.1 tokens/s INFO:__main__:2024-11-30 15:40:26 | Epoch: 1 | Step: 376760 | Dataset: 0-3861098 | Loss: 0.637 | 598 ms/step , 115320.88 GFLOP/s , 173297.1 tokens/s INFO:__main__:2024-11-30 15:40:33 | Epoch: 1 | Step: 376770 | Dataset: 0-3863498 | Loss: 0.696 | 599 ms/step , 115197.32 GFLOP/s , 173176.6 tokens/s INFO:__main__:2024-11-30 15:40:40 | Epoch: 1 | Step: 376780 | Dataset: 0-3865898 | Loss: 0.679 | 598 ms/step , 115385.16 GFLOP/s , 173257.5 tokens/s INFO:__main__:2024-11-30 15:40:47 | Epoch: 1 | Step: 376790 | Dataset: 0-3868298 | Loss: 0.628 | 598 ms/step , 115356.12 GFLOP/s , 173198.0 tokens/s INFO:__main__:2024-11-30 15:40:54 | Epoch: 1 | Step: 376800 | Dataset: 0-3870698 | Loss: 0.510 | 598 ms/step , 115318.35 GFLOP/s , 173217.7 tokens/s INFO:__main__:2024-11-30 15:41:01 | Epoch: 1 | Step: 376810 | Dataset: 0-3873098 | Loss: 0.761 | 598 ms/step , 115323.86 GFLOP/s , 173273.3 tokens/s INFO:__main__:2024-11-30 15:41:09 | Epoch: 1 | Step: 376820 | Dataset: 0-3875498 | Loss: 0.644 | 598 ms/step , 115462.08 GFLOP/s , 173193.5 tokens/s INFO:__main__:2024-11-30 15:41:16 | Epoch: 1 | Step: 376830 | Dataset: 0-3877898 | Loss: 0.657 | 599 ms/step , 115268.98 GFLOP/s , 173227.8 tokens/s INFO:__main__:2024-11-30 15:41:23 | Epoch: 1 | Step: 376840 | Dataset: 0-3880298 | Loss: 0.685 | 601 ms/step , 114918.98 GFLOP/s , 173186.6 tokens/s INFO:__main__:2024-11-30 15:41:30 | Epoch: 1 | Step: 376850 | Dataset: 0-3882698 | Loss: 0.671 | 598 ms/step , 115356.70 GFLOP/s , 173132.0 tokens/s INFO:__main__:2024-11-30 15:41:37 | Epoch: 1 | Step: 376860 | Dataset: 0-3885098 | Loss: 0.623 | 598 ms/step , 115323.97 GFLOP/s , 173275.6 tokens/s INFO:__main__:2024-11-30 15:41:44 | Epoch: 1 | Step: 376870 | Dataset: 0-3887498 | Loss: 0.647 | 599 ms/step , 115207.06 GFLOP/s , 173186.0 tokens/s INFO:__main__:2024-11-30 
15:41:51 | Epoch: 1 | Step: 376880 | Dataset: 0-3889898 | Loss: 0.607 | 598 ms/step , 115436.29 GFLOP/s , 173222.7 tokens/s INFO:__main__:2024-11-30 15:41:58 | Epoch: 1 | Step: 376890 | Dataset: 0-3892298 | Loss: 0.619 | 599 ms/step , 115263.71 GFLOP/s , 173180.0 tokens/s INFO:__main__:2024-11-30 15:42:05 | Epoch: 1 | Step: 376900 | Dataset: 0-3894698 | Loss: 0.652 | 598 ms/step , 115331.74 GFLOP/s , 173231.6 tokens/s INFO:__main__:2024-11-30 15:42:12 | Epoch: 1 | Step: 376910 | Dataset: 0-3897098 | Loss: 0.646 | 598 ms/step , 115320.63 GFLOP/s , 173238.1 tokens/s INFO:__main__:2024-11-30 15:42:19 | Epoch: 1 | Step: 376920 | Dataset: 0-3899498 | Loss: 0.660 | 600 ms/step , 115059.36 GFLOP/s , 173241.6 tokens/s INFO:__main__:2024-11-30 15:42:27 | Epoch: 1 | Step: 376930 | Dataset: 0-3901898 | Loss: 0.666 | 599 ms/step , 115194.04 GFLOP/s , 173201.2 tokens/s INFO:__main__:2024-11-30 15:42:34 | Epoch: 1 | Step: 376940 | Dataset: 0-3904298 | Loss: 0.665 | 598 ms/step , 115402.32 GFLOP/s , 173213.2 tokens/s INFO:__main__:2024-11-30 15:42:41 | Epoch: 1 | Step: 376950 | Dataset: 0-3906698 | Loss: 0.621 | 599 ms/step , 115281.51 GFLOP/s , 173135.2 tokens/s INFO:__main__:2024-11-30 15:42:48 | Epoch: 1 | Step: 376960 | Dataset: 0-3909098 | Loss: 0.601 | 598 ms/step , 115438.38 GFLOP/s , 173250.3 tokens/s INFO:__main__:2024-11-30 15:42:55 | Epoch: 1 | Step: 376970 | Dataset: 0-3911498 | Loss: 0.657 | 599 ms/step , 115288.78 GFLOP/s , 173219.6 tokens/s INFO:__main__:2024-11-30 15:43:02 | Epoch: 1 | Step: 376980 | Dataset: 0-3913898 | Loss: 0.636 | 597 ms/step , 115622.11 GFLOP/s , 173242.7 tokens/s INFO:__main__:2024-11-30 15:43:09 | Epoch: 1 | Step: 376990 | Dataset: 0-3916298 | Loss: 0.600 | 599 ms/step , 115258.45 GFLOP/s , 173219.4 tokens/s INFO:__main__:2024-11-30 15:43:17 | Validation | Step: 377000 | Val_loss: 0.395 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 15:43:17 | Saving full-param checkpoint to 
`/root/autodl-tmp/checkpoint/checkpoint_20241130_154317_step_377000.pt` INFO:__main__:2024-11-30 15:43:19 | Epoch: 1 | Step: 377000 | Dataset: 0-3918698 | Loss: 0.674 | 595 ms/step , 115906.25 GFLOP/s , 120010.3 tokens/s INFO:__main__:2024-11-30 15:43:26 | Epoch: 1 | Step: 377010 | Dataset: 0-3921098 | Loss: 0.702 | 598 ms/step , 115368.63 GFLOP/s , 173416.7 tokens/s INFO:__main__:2024-11-30 15:43:34 | Epoch: 1 | Step: 377020 | Dataset: 0-3923498 | Loss: 0.712 | 599 ms/step , 115247.72 GFLOP/s , 173359.8 tokens/s INFO:__main__:2024-11-30 15:43:41 | Epoch: 1 | Step: 377030 | Dataset: 0-3925898 | Loss: 0.723 | 598 ms/step , 115497.15 GFLOP/s , 173392.7 tokens/s INFO:__main__:2024-11-30 15:43:48 | Epoch: 1 | Step: 377040 | Dataset: 0-3928298 | Loss: 0.598 | 598 ms/step , 115348.44 GFLOP/s , 173260.9 tokens/s INFO:__main__:2024-11-30 15:43:55 | Epoch: 1 | Step: 377050 | Dataset: 0-3930698 | Loss: 0.618 | 598 ms/step , 115355.97 GFLOP/s , 173380.1 tokens/s INFO:__main__:2024-11-30 15:44:02 | Epoch: 1 | Step: 377060 | Dataset: 0-3933098 | Loss: 0.647 | 598 ms/step , 115405.13 GFLOP/s , 173147.5 tokens/s INFO:__main__:2024-11-30 15:44:09 | Epoch: 1 | Step: 377070 | Dataset: 0-3935498 | Loss: 0.678 | 599 ms/step , 115300.84 GFLOP/s , 173236.4 tokens/s INFO:__main__:2024-11-30 15:44:16 | Epoch: 1 | Step: 377080 | Dataset: 0-3937898 | Loss: 0.687 | 599 ms/step , 115199.89 GFLOP/s , 173160.2 tokens/s INFO:__main__:2024-11-30 15:44:23 | Epoch: 1 | Step: 377090 | Dataset: 0-3940298 | Loss: 0.675 | 599 ms/step , 115287.23 GFLOP/s , 173251.8 tokens/s INFO:__main__:2024-11-30 15:44:30 | Epoch: 1 | Step: 377100 | Dataset: 0-3942698 | Loss: 0.628 | 599 ms/step , 115246.88 GFLOP/s , 173261.5 tokens/s INFO:__main__:2024-11-30 15:44:37 | Epoch: 1 | Step: 377110 | Dataset: 0-3945098 | Loss: 0.700 | 598 ms/step , 115395.98 GFLOP/s , 173263.7 tokens/s INFO:__main__:2024-11-30 15:44:44 | Epoch: 1 | Step: 377120 | Dataset: 0-3947498 | Loss: 0.662 | 599 ms/step , 115221.64 GFLOP/s , 173277.7 
tokens/s INFO:__main__:2024-11-30 15:44:52 | Epoch: 1 | Step: 377130 | Dataset: 0-3949898 | Loss: 0.671 | 598 ms/step , 115442.71 GFLOP/s , 173250.1 tokens/s INFO:__main__:2024-11-30 15:44:59 | Epoch: 1 | Step: 377140 | Dataset: 0-3952298 | Loss: 0.691 | 599 ms/step , 115180.75 GFLOP/s , 173189.8 tokens/s INFO:__main__:2024-11-30 15:45:06 | Epoch: 1 | Step: 377150 | Dataset: 0-3954698 | Loss: 0.642 | 598 ms/step , 115337.12 GFLOP/s , 173206.5 tokens/s INFO:__main__:2024-11-30 15:45:13 | Epoch: 1 | Step: 377160 | Dataset: 0-3957098 | Loss: 0.689 | 598 ms/step , 115357.93 GFLOP/s , 173229.7 tokens/s INFO:__main__:2024-11-30 15:45:20 | Epoch: 1 | Step: 377170 | Dataset: 0-3959498 | Loss: 0.690 | 598 ms/step , 115475.55 GFLOP/s , 173171.9 tokens/s INFO:__main__:2024-11-30 15:45:27 | Epoch: 1 | Step: 377180 | Dataset: 0-3961898 | Loss: 0.667 | 599 ms/step , 115194.84 GFLOP/s , 173213.7 tokens/s INFO:__main__:2024-11-30 15:45:34 | Epoch: 1 | Step: 377190 | Dataset: 0-3964298 | Loss: 0.672 | 599 ms/step , 115222.00 GFLOP/s , 173142.2 tokens/s INFO:__main__:2024-11-30 15:45:41 | Epoch: 1 | Step: 377200 | Dataset: 0-3966698 | Loss: 0.676 | 599 ms/step , 115282.33 GFLOP/s , 173137.4 tokens/s INFO:__main__:2024-11-30 15:45:48 | Epoch: 1 | Step: 377210 | Dataset: 0-3969098 | Loss: 0.687 | 598 ms/step , 115323.40 GFLOP/s , 173257.7 tokens/s INFO:__main__:2024-11-30 15:45:55 | Epoch: 1 | Step: 377220 | Dataset: 0-3971498 | Loss: 0.699 | 599 ms/step , 115263.87 GFLOP/s , 173208.5 tokens/s INFO:__main__:2024-11-30 15:46:03 | Epoch: 1 | Step: 377230 | Dataset: 0-3973898 | Loss: 0.712 | 599 ms/step , 115285.00 GFLOP/s , 173212.0 tokens/s INFO:__main__:2024-11-30 15:46:10 | Epoch: 1 | Step: 377240 | Dataset: 0-3976298 | Loss: 0.715 | 598 ms/step , 115339.75 GFLOP/s , 173059.2 tokens/s INFO:__main__:2024-11-30 15:46:17 | Epoch: 1 | Step: 377250 | Dataset: 0-3978698 | Loss: 0.640 | 598 ms/step , 115481.69 GFLOP/s , 173228.5 tokens/s INFO:__main__:2024-11-30 15:46:24 | Epoch: 1 | Step: 
377260 | Dataset: 0-3981098 | Loss: 0.643 | 599 ms/step , 115274.01 GFLOP/s , 173048.6 tokens/s INFO:__main__:2024-11-30 15:46:31 | Epoch: 1 | Step: 377270 | Dataset: 0-3983498 | Loss: 0.664 | 598 ms/step , 115405.56 GFLOP/s , 173174.7 tokens/s INFO:__main__:2024-11-30 15:46:38 | Epoch: 1 | Step: 377280 | Dataset: 0-3985898 | Loss: 0.709 | 599 ms/step , 115161.26 GFLOP/s , 173225.5 tokens/s INFO:__main__:2024-11-30 15:46:45 | Epoch: 1 | Step: 377290 | Dataset: 0-3988298 | Loss: 0.705 | 600 ms/step , 115037.85 GFLOP/s , 173168.2 tokens/s INFO:__main__:2024-11-30 15:46:52 | Epoch: 1 | Step: 377300 | Dataset: 0-3990698 | Loss: 0.747 | 598 ms/step , 115364.53 GFLOP/s , 173159.7 tokens/s INFO:__main__:2024-11-30 15:46:59 | Epoch: 1 | Step: 377310 | Dataset: 0-3993098 | Loss: 0.595 | 598 ms/step , 115470.97 GFLOP/s , 173137.3 tokens/s INFO:__main__:2024-11-30 15:47:06 | Epoch: 1 | Step: 377320 | Dataset: 0-3995498 | Loss: 0.689 | 598 ms/step , 115351.96 GFLOP/s , 173190.6 tokens/s INFO:__main__:2024-11-30 15:47:13 | Epoch: 1 | Step: 377330 | Dataset: 0-3997898 | Loss: 0.712 | 599 ms/step , 115197.21 GFLOP/s , 173232.2 tokens/s INFO:__main__:2024-11-30 15:47:21 | Epoch: 1 | Step: 377340 | Dataset: 0-4000298 | Loss: 0.655 | 599 ms/step , 115136.58 GFLOP/s , 173251.7 tokens/s INFO:__main__:2024-11-30 15:47:28 | Epoch: 1 | Step: 377350 | Dataset: 0-4002698 | Loss: 0.695 | 598 ms/step , 115413.65 GFLOP/s , 173067.9 tokens/s INFO:__main__:2024-11-30 15:47:35 | Epoch: 1 | Step: 377360 | Dataset: 0-4005098 | Loss: 0.654 | 599 ms/step , 115299.83 GFLOP/s , 173164.5 tokens/s INFO:__main__:2024-11-30 15:47:42 | Epoch: 1 | Step: 377370 | Dataset: 0-4007498 | Loss: 0.682 | 598 ms/step , 115357.35 GFLOP/s , 173278.8 tokens/s INFO:__main__:2024-11-30 15:47:49 | Epoch: 1 | Step: 377380 | Dataset: 0-4009898 | Loss: 0.697 | 599 ms/step , 115289.51 GFLOP/s , 173246.0 tokens/s INFO:__main__:2024-11-30 15:47:56 | Epoch: 1 | Step: 377390 | Dataset: 0-4012298 | Loss: 0.666 | 597 ms/step , 
115511.01 GFLOP/s , 173065.0 tokens/s INFO:__main__:2024-11-30 15:48:03 | Epoch: 1 | Step: 377400 | Dataset: 0-4014698 | Loss: 0.644 | 598 ms/step , 115328.45 GFLOP/s , 173216.4 tokens/s INFO:__main__:2024-11-30 15:48:10 | Epoch: 1 | Step: 377410 | Dataset: 0-4017098 | Loss: 0.714 | 599 ms/step , 115303.94 GFLOP/s , 173210.7 tokens/s INFO:__main__:2024-11-30 15:48:17 | Epoch: 1 | Step: 377420 | Dataset: 0-4019498 | Loss: 0.699 | 599 ms/step , 115294.03 GFLOP/s , 173231.2 tokens/s INFO:__main__:2024-11-30 15:48:24 | Epoch: 1 | Step: 377430 | Dataset: 0-4021898 | Loss: 0.683 | 598 ms/step , 115326.85 GFLOP/s , 173192.6 tokens/s INFO:__main__:2024-11-30 15:48:32 | Epoch: 1 | Step: 377440 | Dataset: 0-4024298 | Loss: 0.652 | 599 ms/step , 115243.29 GFLOP/s , 173239.3 tokens/s INFO:__main__:2024-11-30 15:48:39 | Epoch: 1 | Step: 377450 | Dataset: 0-4026698 | Loss: 0.700 | 598 ms/step , 115402.07 GFLOP/s , 173127.1 tokens/s INFO:__main__:2024-11-30 15:48:46 | Epoch: 1 | Step: 377460 | Dataset: 0-4029098 | Loss: 0.660 | 599 ms/step , 115179.33 GFLOP/s , 173291.1 tokens/s INFO:__main__:2024-11-30 15:48:53 | Epoch: 1 | Step: 377470 | Dataset: 0-4031498 | Loss: 0.661 | 598 ms/step , 115339.28 GFLOP/s , 173221.6 tokens/s INFO:__main__:2024-11-30 15:49:00 | Epoch: 1 | Step: 377480 | Dataset: 0-4033898 | Loss: 0.712 | 599 ms/step , 115148.93 GFLOP/s , 173257.4 tokens/s INFO:__main__:2024-11-30 15:49:07 | Epoch: 1 | Step: 377490 | Dataset: 0-4036298 | Loss: 0.633 | 598 ms/step , 115457.78 GFLOP/s , 173151.6 tokens/s INFO:__main__:2024-11-30 15:49:15 | Validation | Step: 377500 | Val_loss: 0.436 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 15:49:15 | Epoch: 1 | Step: 377500 | Dataset: 0-4038698 | Loss: 0.697 | 598 ms/step , 115493.56 GFLOP/s , 147455.3 tokens/s INFO:__main__:2024-11-30 15:49:22 | Epoch: 1 | Step: 377510 | Dataset: 0-4041098 | Loss: 0.635 | 599 ms/step , 115254.96 GFLOP/s , 173305.2 tokens/s INFO:__main__:2024-11-30 15:49:30 | Epoch: 1 | Step: 377520 | 
Dataset: 0-4043498 | Loss: 0.677 | 599 ms/step , 115292.66 GFLOP/s , 173247.0 tokens/s INFO:__main__:2024-11-30 15:49:37 | Epoch: 1 | Step: 377530 | Dataset: 0-4045898 | Loss: 0.647 | 599 ms/step , 115200.70 GFLOP/s , 173303.7 tokens/s INFO:__main__:2024-11-30 15:49:44 | Epoch: 1 | Step: 377540 | Dataset: 0-4048298 | Loss: 0.700 | 598 ms/step , 115347.15 GFLOP/s , 173149.1 tokens/s INFO:__main__:2024-11-30 15:49:51 | Epoch: 1 | Step: 377550 | Dataset: 0-4050698 | Loss: 0.596 | 598 ms/step , 115366.47 GFLOP/s , 173274.4 tokens/s INFO:__main__:2024-11-30 15:49:58 | Epoch: 1 | Step: 377560 | Dataset: 0-4053098 | Loss: 0.672 | 598 ms/step , 115347.57 GFLOP/s , 173232.9 tokens/s INFO:__main__:2024-11-30 15:50:05 | Epoch: 1 | Step: 377570 | Dataset: 0-4055498 | Loss: 0.713 | 599 ms/step , 115260.28 GFLOP/s , 173175.2 tokens/s INFO:__main__:2024-11-30 15:50:12 | Epoch: 1 | Step: 377580 | Dataset: 0-4057898 | Loss: 0.628 | 598 ms/step , 115328.98 GFLOP/s , 173171.8 tokens/s INFO:__main__:2024-11-30 15:50:19 | Epoch: 1 | Step: 377590 | Dataset: 0-4060298 | Loss: 0.677 | 599 ms/step , 115231.11 GFLOP/s , 173185.6 tokens/s INFO:__main__:2024-11-30 15:50:26 | Epoch: 1 | Step: 377600 | Dataset: 0-4062698 | Loss: 0.596 | 599 ms/step , 115224.48 GFLOP/s , 173183.9 tokens/s INFO:__main__:2024-11-30 15:50:33 | Epoch: 1 | Step: 377610 | Dataset: 0-4065098 | Loss: 0.837 | 598 ms/step , 115364.65 GFLOP/s , 173276.5 tokens/s INFO:__main__:2024-11-30 15:50:40 | Epoch: 1 | Step: 377620 | Dataset: 0-4067498 | Loss: 0.788 | 599 ms/step , 115210.80 GFLOP/s , 173209.3 tokens/s INFO:__main__:2024-11-30 15:50:48 | Epoch: 1 | Step: 377630 | Dataset: 0-4069898 | Loss: 0.805 | 599 ms/step , 115276.59 GFLOP/s , 173169.3 tokens/s INFO:__main__:2024-11-30 15:50:55 | Epoch: 1 | Step: 377640 | Dataset: 0-4072298 | Loss: 0.827 | 599 ms/step , 115170.28 GFLOP/s , 173161.6 tokens/s INFO:__main__:2024-11-30 15:51:02 | Epoch: 1 | Step: 377650 | Dataset: 0-4074698 | Loss: 0.786 | 598 ms/step , 115353.83 
GFLOP/s , 173265.2 tokens/s INFO:__main__:2024-11-30 15:51:09 | Epoch: 1 | Step: 377660 | Dataset: 0-4077098 | Loss: 0.793 | 599 ms/step , 115255.40 GFLOP/s , 173158.5 tokens/s INFO:__main__:2024-11-30 15:51:16 | Epoch: 1 | Step: 377670 | Dataset: 0-4079498 | Loss: 0.823 | 599 ms/step , 115291.62 GFLOP/s , 173212.0 tokens/s INFO:__main__:2024-11-30 15:51:23 | Epoch: 1 | Step: 377680 | Dataset: 0-4081898 | Loss: 0.772 | 599 ms/step , 115268.55 GFLOP/s , 173190.2 tokens/s INFO:__main__:2024-11-30 15:51:30 | Epoch: 1 | Step: 377690 | Dataset: 0-4084298 | Loss: 0.764 | 598 ms/step , 115360.81 GFLOP/s , 173199.1 tokens/s INFO:__main__:2024-11-30 15:51:37 | Epoch: 1 | Step: 377700 | Dataset: 0-4086698 | Loss: 0.756 | 599 ms/step , 115152.29 GFLOP/s , 173202.3 tokens/s INFO:__main__:2024-11-30 15:51:44 | Epoch: 1 | Step: 377710 | Dataset: 0-4089098 | Loss: 0.680 | 598 ms/step , 115329.13 GFLOP/s , 173222.8 tokens/s INFO:__main__:2024-11-30 15:51:51 | Epoch: 1 | Step: 377720 | Dataset: 0-4091498 | Loss: 0.651 | 598 ms/step , 115327.26 GFLOP/s , 173227.0 tokens/s INFO:__main__:2024-11-30 15:51:59 | Epoch: 1 | Step: 377730 | Dataset: 0-4093898 | Loss: 0.656 | 598 ms/step , 115440.02 GFLOP/s , 173070.0 tokens/s INFO:__main__:2024-11-30 15:52:06 | Epoch: 1 | Step: 377740 | Dataset: 0-4096298 | Loss: 0.621 | 599 ms/step , 115163.70 GFLOP/s , 173115.9 tokens/s INFO:__main__:2024-11-30 15:52:13 | Epoch: 1 | Step: 377750 | Dataset: 0-4098698 | Loss: 0.609 | 598 ms/step , 115484.69 GFLOP/s , 173238.2 tokens/s INFO:__main__:2024-11-30 15:52:20 | Epoch: 1 | Step: 377760 | Dataset: 0-4101098 | Loss: 0.620 | 599 ms/step , 115230.64 GFLOP/s , 173174.6 tokens/s INFO:__main__:2024-11-30 15:52:27 | Epoch: 1 | Step: 377770 | Dataset: 0-4103498 | Loss: 0.608 | 597 ms/step , 115556.11 GFLOP/s , 173260.8 tokens/s INFO:__main__:2024-11-30 15:52:34 | Epoch: 1 | Step: 377780 | Dataset: 0-4105898 | Loss: 0.545 | 599 ms/step , 115287.46 GFLOP/s , 173211.7 tokens/s INFO:__main__:2024-11-30 15:52:41 
| Epoch: 1 | Step: 377790 | Dataset: 0-4108298 | Loss: 0.624 | 599 ms/step , 115189.08 GFLOP/s , 173255.2 tokens/s INFO:__main__:2024-11-30 15:52:48 | Epoch: 1 | Step: 377800 | Dataset: 0-4110698 | Loss: 0.609 | 598 ms/step , 115314.55 GFLOP/s , 173291.6 tokens/s INFO:__main__:2024-11-30 15:52:55 | Epoch: 1 | Step: 377810 | Dataset: 0-4113098 | Loss: 0.587 | 599 ms/step , 115231.49 GFLOP/s , 173236.5 tokens/s INFO:__main__:2024-11-30 15:53:02 | Epoch: 1 | Step: 377820 | Dataset: 0-4115498 | Loss: 0.611 | 599 ms/step , 115232.07 GFLOP/s , 173242.9 tokens/s INFO:__main__:2024-11-30 15:53:09 | Epoch: 1 | Step: 377830 | Dataset: 0-4117898 | Loss: 0.621 | 598 ms/step , 115321.35 GFLOP/s , 173224.2 tokens/s INFO:__main__:2024-11-30 15:53:17 | Epoch: 1 | Step: 377840 | Dataset: 0-4120298 | Loss: 0.634 | 599 ms/step , 115307.29 GFLOP/s , 173173.3 tokens/s INFO:__main__:2024-11-30 15:53:24 | Epoch: 1 | Step: 377850 | Dataset: 0-4122698 | Loss: 0.589 | 599 ms/step , 115308.32 GFLOP/s , 173227.1 tokens/s INFO:__main__:2024-11-30 15:53:31 | Epoch: 1 | Step: 377860 | Dataset: 0-4125098 | Loss: 0.618 | 599 ms/step , 115289.96 GFLOP/s , 173280.4 tokens/s INFO:__main__:2024-11-30 15:53:38 | Epoch: 1 | Step: 377870 | Dataset: 0-4127498 | Loss: 0.575 | 599 ms/step , 115254.56 GFLOP/s , 173223.9 tokens/s INFO:__main__:2024-11-30 15:53:45 | Epoch: 1 | Step: 377880 | Dataset: 0-4129898 | Loss: 0.557 | 599 ms/step , 115198.43 GFLOP/s , 173162.0 tokens/s INFO:__main__:2024-11-30 15:53:52 | Epoch: 1 | Step: 377890 | Dataset: 0-4132298 | Loss: 0.578 | 599 ms/step , 115150.55 GFLOP/s , 173221.0 tokens/s INFO:__main__:2024-11-30 15:53:59 | Epoch: 1 | Step: 377900 | Dataset: 0-4134698 | Loss: 0.597 | 599 ms/step , 115229.63 GFLOP/s , 173214.2 tokens/s INFO:__main__:2024-11-30 15:54:06 | Epoch: 1 | Step: 377910 | Dataset: 0-4137098 | Loss: 0.619 | 598 ms/step , 115383.61 GFLOP/s , 173251.0 tokens/s INFO:__main__:2024-11-30 15:54:13 | Epoch: 1 | Step: 377920 | Dataset: 0-4139498 | Loss: 0.578 | 
599 ms/step , 115219.54 GFLOP/s , 173249.0 tokens/s INFO:__main__:2024-11-30 15:54:20 | Epoch: 1 | Step: 377930 | Dataset: 0-4141898 | Loss: 0.577 | 598 ms/step , 115376.63 GFLOP/s , 173157.3 tokens/s INFO:__main__:2024-11-30 15:54:27 | Epoch: 1 | Step: 377940 | Dataset: 0-4144298 | Loss: 0.600 | 599 ms/step , 115276.27 GFLOP/s , 173158.1 tokens/s INFO:__main__:2024-11-30 15:54:35 | Epoch: 1 | Step: 377950 | Dataset: 0-4146698 | Loss: 0.597 | 598 ms/step , 115313.94 GFLOP/s , 173079.5 tokens/s INFO:__main__:2024-11-30 15:54:42 | Epoch: 1 | Step: 377960 | Dataset: 0-4149098 | Loss: 0.580 | 599 ms/step , 115194.18 GFLOP/s , 173225.2 tokens/s INFO:__main__:2024-11-30 15:54:49 | Epoch: 1 | Step: 377970 | Dataset: 0-4151498 | Loss: 0.563 | 598 ms/step , 115420.83 GFLOP/s , 173229.5 tokens/s INFO:__main__:2024-11-30 15:54:56 | Epoch: 1 | Step: 377980 | Dataset: 0-4153898 | Loss: 0.605 | 598 ms/step , 115383.26 GFLOP/s , 173223.9 tokens/s INFO:__main__:2024-11-30 15:55:03 | Epoch: 1 | Step: 377990 | Dataset: 0-4156298 | Loss: 0.576 | 599 ms/step , 115286.11 GFLOP/s , 173101.5 tokens/s INFO:__main__:2024-11-30 15:55:11 | Validation | Step: 378000 | Val_loss: 0.331 | Best_val_loss: 0.3487 INFO:__main__:2024-11-30 15:55:11 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_155511_step_378000.pt` INFO:__main__:2024-11-30 15:55:13 | Epoch: 1 | Step: 378000 | Dataset: 0-4158698 | Loss: 0.564 | 595 ms/step , 116040.71 GFLOP/s , 120525.5 tokens/s INFO:__main__:2024-11-30 15:55:20 | Epoch: 1 | Step: 378010 | Dataset: 0-4161098 | Loss: 0.552 | 598 ms/step , 115339.19 GFLOP/s , 173456.9 tokens/s INFO:__main__:2024-11-30 15:55:27 | Epoch: 1 | Step: 378020 | Dataset: 0-4163498 | Loss: 0.588 | 598 ms/step , 115438.19 GFLOP/s , 173403.5 tokens/s INFO:__main__:2024-11-30 15:55:34 | Epoch: 1 | Step: 378030 | Dataset: 0-4165898 | Loss: 0.580 | 598 ms/step , 115313.81 GFLOP/s , 173262.4 tokens/s INFO:__main__:2024-11-30 15:55:42 | Epoch: 1 | Step: 378040 | 
Dataset: 0-4168298 | Loss: 0.556 | 598 ms/step , 115497.80 GFLOP/s , 173430.2 tokens/s INFO:__main__:2024-11-30 15:55:49 | Epoch: 1 | Step: 378050 | Dataset: 0-4170698 | Loss: 0.597 | 599 ms/step , 115193.19 GFLOP/s , 173287.1 tokens/s INFO:__main__:2024-11-30 15:55:56 | Epoch: 1 | Step: 378060 | Dataset: 0-4173098 | Loss: 0.628 | 599 ms/step , 115241.41 GFLOP/s , 173290.5 tokens/s INFO:__main__:2024-11-30 15:56:03 | Epoch: 1 | Step: 378070 | Dataset: 0-4175498 | Loss: 0.603 | 599 ms/step , 115238.32 GFLOP/s , 173278.6 tokens/s INFO:__main__:2024-11-30 15:56:10 | Epoch: 1 | Step: 378080 | Dataset: 0-4177898 | Loss: 0.591 | 597 ms/step , 115515.20 GFLOP/s , 173278.8 tokens/s INFO:__main__:2024-11-30 15:56:17 | Epoch: 1 | Step: 378090 | Dataset: 0-4180298 | Loss: 0.579 | 599 ms/step , 115191.40 GFLOP/s , 173210.5 tokens/s INFO:__main__:2024-11-30 15:56:24 | Epoch: 1 | Step: 378100 | Dataset: 0-4182698 | Loss: 0.539 | 598 ms/step , 115370.24 GFLOP/s , 173214.7 tokens/s INFO:__main__:2024-11-30 15:56:31 | Epoch: 1 | Step: 378110 | Dataset: 0-4185098 | Loss: 0.619 | 598 ms/step , 115370.32 GFLOP/s , 173228.7 tokens/s INFO:__main__:2024-11-30 15:56:38 | Epoch: 1 | Step: 378120 | Dataset: 0-4187498 | Loss: 0.590 | 599 ms/step , 115284.49 GFLOP/s , 173291.4 tokens/s INFO:__main__:2024-11-30 15:56:45 | Epoch: 1 | Step: 378130 | Dataset: 0-4189898 | Loss: 0.567 | 598 ms/step , 115345.28 GFLOP/s , 173247.8 tokens/s INFO:__main__:2024-11-30 15:56:52 | Epoch: 1 | Step: 378140 | Dataset: 0-4192298 | Loss: 0.563 | 598 ms/step , 115358.87 GFLOP/s , 173232.2 tokens/s INFO:__main__:2024-11-30 15:57:00 | Epoch: 1 | Step: 378150 | Dataset: 0-4194698 | Loss: 0.524 | 598 ms/step , 115351.33 GFLOP/s , 173323.6 tokens/s INFO:__main__:2024-11-30 15:57:07 | Epoch: 1 | Step: 378160 | Dataset: 0-4197098 | Loss: 0.443 | 599 ms/step , 115243.49 GFLOP/s , 173350.4 tokens/s INFO:__main__:2024-11-30 15:57:14 | Epoch: 1 | Step: 378170 | Dataset: 0-4199498 | Loss: 0.441 | 599 ms/step , 115298.28 
GFLOP/s , 173227.8 tokens/s INFO:__main__:2024-11-30 15:57:21 | Epoch: 1 | Step: 378180 | Dataset: 0-4201898 | Loss: 0.435 | 598 ms/step , 115470.43 GFLOP/s , 173227.6 tokens/s INFO:__main__:2024-11-30 15:57:28 | Epoch: 1 | Step: 378190 | Dataset: 0-4204298 | Loss: 0.447 | 599 ms/step , 115249.71 GFLOP/s , 173131.3 tokens/s INFO:__main__:2024-11-30 15:57:35 | Epoch: 1 | Step: 378200 | Dataset: 0-4206698 | Loss: 0.442 | 598 ms/step , 115353.41 GFLOP/s , 173215.3 tokens/s INFO:__main__:2024-11-30 15:57:42 | Epoch: 1 | Step: 378210 | Dataset: 0-4209098 | Loss: 0.447 | 598 ms/step , 115411.29 GFLOP/s , 173179.4 tokens/s INFO:__main__:2024-11-30 15:57:49 | Epoch: 1 | Step: 378220 | Dataset: 0-4211498 | Loss: 0.455 | 597 ms/step , 115641.58 GFLOP/s , 173246.5 tokens/s INFO:__main__:2024-11-30 15:57:56 | Epoch: 1 | Step: 378230 | Dataset: 0-4213898 | Loss: 0.426 | 598 ms/step , 115373.94 GFLOP/s , 173266.7 tokens/s INFO:__main__:2024-11-30 15:58:03 | Epoch: 1 | Step: 378240 | Dataset: 0-4216298 | Loss: 0.427 | 598 ms/step , 115378.04 GFLOP/s , 173225.9 tokens/s INFO:__main__:2024-11-30 15:58:10 | Epoch: 1 | Step: 378250 | Dataset: 0-4218698 | Loss: 0.457 | 598 ms/step , 115457.32 GFLOP/s , 173234.7 tokens/s INFO:__main__:2024-11-30 15:58:18 | Epoch: 1 | Step: 378260 | Dataset: 0-4221098 | Loss: 0.414 | 598 ms/step , 115414.83 GFLOP/s , 173227.9 tokens/s INFO:__main__:2024-11-30 15:58:25 | Epoch: 1 | Step: 378270 | Dataset: 0-4223498 | Loss: 0.468 | 599 ms/step , 115304.76 GFLOP/s , 173167.0 tokens/s INFO:__main__:2024-11-30 15:58:32 | Epoch: 1 | Step: 378280 | Dataset: 0-4225898 | Loss: 0.458 | 598 ms/step , 115423.64 GFLOP/s , 173159.9 tokens/s INFO:__main__:2024-11-30 15:58:39 | Epoch: 1 | Step: 378290 | Dataset: 0-4228298 | Loss: 0.447 | 599 ms/step , 115257.95 GFLOP/s , 173251.7 tokens/s INFO:__main__:2024-11-30 15:58:46 | Epoch: 1 | Step: 378300 | Dataset: 0-4230698 | Loss: 0.421 | 599 ms/step , 115290.89 GFLOP/s , 173237.0 tokens/s INFO:__main__:2024-11-30 15:58:53 
| Epoch: 1 | Step: 378310 | Dataset: 0-4233098 | Loss: 0.375 | 597 ms/step , 115567.13 GFLOP/s , 173269.2 tokens/s INFO:__main__:2024-11-30 15:59:00 | Epoch: 1 | Step: 378320 | Dataset: 0-4235498 | Loss: 0.421 | 598 ms/step , 115458.10 GFLOP/s , 173261.8 tokens/s INFO:__main__:2024-11-30 15:59:07 | Epoch: 1 | Step: 378330 | Dataset: 0-4237898 | Loss: 0.403 | 598 ms/step , 115328.63 GFLOP/s , 173189.3 tokens/s INFO:__main__:2024-11-30 15:59:14 | Epoch: 1 | Step: 378340 | Dataset: 0-4240298 | Loss: 0.431 | 598 ms/step , 115360.91 GFLOP/s , 173185.9 tokens/s INFO:__main__:2024-11-30 15:59:21 | Epoch: 1 | Step: 378350 | Dataset: 0-4242698 | Loss: 0.382 | 599 ms/step , 115226.28 GFLOP/s , 173247.7 tokens/s INFO:__main__:2024-11-30 15:59:28 | Epoch: 1 | Step: 378360 | Dataset: 0-4245098 | Loss: 0.428 | 598 ms/step , 115347.24 GFLOP/s , 173077.7 tokens/s INFO:__main__:2024-11-30 15:59:36 | Epoch: 1 | Step: 378370 | Dataset: 0-4247498 | Loss: 0.460 | 598 ms/step , 115347.93 GFLOP/s , 173262.4 tokens/s INFO:__main__:2024-11-30 15:59:43 | Epoch: 1 | Step: 378380 | Dataset: 0-4249898 | Loss: 0.419 | 599 ms/step , 115199.24 GFLOP/s , 173173.2 tokens/s INFO:__main__:2024-11-30 15:59:50 | Epoch: 1 | Step: 378390 | Dataset: 0-4252298 | Loss: 0.450 | 598 ms/step , 115445.69 GFLOP/s , 173247.8 tokens/s INFO:__main__:2024-11-30 15:59:57 | Epoch: 1 | Step: 378400 | Dataset: 0-4254698 | Loss: 0.415 | 599 ms/step , 115270.42 GFLOP/s , 173243.6 tokens/s INFO:__main__:2024-11-30 16:00:04 | Epoch: 1 | Step: 378410 | Dataset: 0-4257098 | Loss: 0.425 | 598 ms/step , 115476.29 GFLOP/s , 173248.5 tokens/s INFO:__main__:2024-11-30 16:00:11 | Epoch: 1 | Step: 378420 | Dataset: 0-4259498 | Loss: 0.369 | 598 ms/step , 115315.63 GFLOP/s , 173230.0 tokens/s INFO:__main__:2024-11-30 16:00:18 | Epoch: 1 | Step: 378430 | Dataset: 0-4261898 | Loss: 0.523 | 599 ms/step , 115233.38 GFLOP/s , 173275.8 tokens/s INFO:__main__:2024-11-30 16:00:25 | Epoch: 1 | Step: 378440 | Dataset: 0-4264298 | Loss: 0.431 | 
598 ms/step , 115393.98 GFLOP/s , 173225.8 tokens/s INFO:__main__:2024-11-30 16:00:32 | Epoch: 1 | Step: 378450 | Dataset: 0-4266698 | Loss: 0.444 | 597 ms/step , 115512.51 GFLOP/s , 173268.8 tokens/s INFO:__main__:2024-11-30 16:00:39 | Epoch: 1 | Step: 378460 | Dataset: 0-4269098 | Loss: 0.399 | 598 ms/step , 115359.42 GFLOP/s , 173102.7 tokens/s INFO:__main__:2024-11-30 16:00:47 | Epoch: 1 | Step: 378470 | Dataset: 0-4271498 | Loss: 0.438 | 599 ms/step , 115278.11 GFLOP/s , 173117.8 tokens/s INFO:__main__:2024-11-30 16:00:54 | Epoch: 1 | Step: 378480 | Dataset: 0-4273898 | Loss: 0.422 | 598 ms/step , 115401.05 GFLOP/s , 173251.3 tokens/s INFO:__main__:2024-11-30 16:01:01 | Epoch: 1 | Step: 378490 | Dataset: 0-4276298 | Loss: 0.399 | 598 ms/step , 115320.96 GFLOP/s , 173171.9 tokens/s INFO:__main__:2024-11-30 16:01:08 | Validation | Step: 378500 | Val_loss: 0.328 | Best_val_loss: 0.3307 INFO:__main__:2024-11-30 16:01:08 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_160108_step_378500.pt` INFO:__main__:2024-11-30 16:01:11 | Epoch: 1 | Step: 378500 | Dataset: 0-4278698 | Loss: 0.440 | 596 ms/step , 115871.58 GFLOP/s , 120134.2 tokens/s INFO:__main__:2024-11-30 16:01:18 | Epoch: 1 | Step: 378510 | Dataset: 0-4281098 | Loss: 0.449 | 598 ms/step , 115407.24 GFLOP/s , 173405.9 tokens/s INFO:__main__:2024-11-30 16:01:25 | Epoch: 1 | Step: 378520 | Dataset: 0-4283498 | Loss: 0.494 | 597 ms/step , 115513.39 GFLOP/s , 173422.8 tokens/s INFO:__main__:2024-11-30 16:01:32 | Epoch: 1 | Step: 378530 | Dataset: 0-4285898 | Loss: 0.429 | 598 ms/step , 115312.88 GFLOP/s , 173303.7 tokens/s INFO:__main__:2024-11-30 16:01:39 | Epoch: 1 | Step: 378540 | Dataset: 0-4288298 | Loss: 0.404 | 598 ms/step , 115363.38 GFLOP/s , 173195.4 tokens/s INFO:__main__:2024-11-30 16:01:46 | Epoch: 1 | Step: 378550 | Dataset: 0-4290698 | Loss: 0.420 | 598 ms/step , 115442.86 GFLOP/s , 173276.7 tokens/s INFO:__main__:2024-11-30 16:01:53 | Epoch: 1 | Step: 378560 | 
Dataset: 0-4293098 | Loss: 0.449 | 598 ms/step , 115386.08 GFLOP/s , 173364.0 tokens/s INFO:__main__:2024-11-30 16:02:01 | Epoch: 1 | Step: 378570 | Dataset: 0-4295498 | Loss: 0.453 | 599 ms/step , 115297.78 GFLOP/s , 173247.0 tokens/s INFO:__main__:2024-11-30 16:02:08 | Epoch: 1 | Step: 378580 | Dataset: 0-4297898 | Loss: 0.487 | 598 ms/step , 115497.87 GFLOP/s , 173327.2 tokens/s INFO:__main__:2024-11-30 16:02:15 | Epoch: 1 | Step: 378590 | Dataset: 0-4300298 | Loss: 0.451 | 598 ms/step , 115325.40 GFLOP/s , 173212.7 tokens/s INFO:__main__:2024-11-30 16:02:22 | Epoch: 1 | Step: 378600 | Dataset: 0-4302698 | Loss: 0.450 | 598 ms/step , 115371.04 GFLOP/s , 173253.5 tokens/s INFO:__main__:2024-11-30 16:02:29 | Epoch: 1 | Step: 378610 | Dataset: 0-4305098 | Loss: 0.419 | 598 ms/step , 115404.25 GFLOP/s , 173307.1 tokens/s INFO:__main__:2024-11-30 16:02:36 | Epoch: 1 | Step: 378620 | Dataset: 0-4307498 | Loss: 0.397 | 598 ms/step , 115432.00 GFLOP/s , 173190.2 tokens/s INFO:__main__:2024-11-30 16:02:43 | Epoch: 1 | Step: 378630 | Dataset: 0-4309898 | Loss: 0.441 | 599 ms/step , 115262.69 GFLOP/s , 173261.3 tokens/s INFO:__main__:2024-11-30 16:02:50 | Epoch: 1 | Step: 378640 | Dataset: 0-4312298 | Loss: 0.453 | 598 ms/step , 115498.67 GFLOP/s , 173267.1 tokens/s INFO:__main__:2024-11-30 16:02:57 | Epoch: 1 | Step: 378650 | Dataset: 0-4314698 | Loss: 0.449 | 598 ms/step , 115324.88 GFLOP/s , 173232.6 tokens/s INFO:__main__:2024-11-30 16:03:04 | Epoch: 1 | Step: 378660 | Dataset: 0-4317098 | Loss: 0.421 | 598 ms/step , 115488.77 GFLOP/s , 173311.5 tokens/s INFO:__main__:2024-11-30 16:03:11 | Epoch: 1 | Step: 378670 | Dataset: 0-4319498 | Loss: 0.450 | 599 ms/step , 115229.32 GFLOP/s , 173198.5 tokens/s INFO:__main__:2024-11-30 16:03:19 | Epoch: 1 | Step: 378680 | Dataset: 0-4321898 | Loss: 0.366 | 599 ms/step , 115247.92 GFLOP/s , 173209.5 tokens/s INFO:__main__:2024-11-30 16:03:26 | Epoch: 1 | Step: 378690 | Dataset: 0-4324298 | Loss: 1.755 | 599 ms/step , 115190.55 
GFLOP/s , 173185.0 tokens/s INFO:__main__:2024-11-30 16:03:33 | Epoch: 1 | Step: 378700 | Dataset: 0-4326698 | Loss: 0.597 | 597 ms/step , 115529.67 GFLOP/s , 173261.0 tokens/s INFO:__main__:2024-11-30 16:03:40 | Epoch: 1 | Step: 378710 | Dataset: 0-4329098 | Loss: 0.549 | 599 ms/step , 115298.53 GFLOP/s , 173194.0 tokens/s INFO:__main__:2024-11-30 16:03:47 | Epoch: 1 | Step: 378720 | Dataset: 0-4331498 | Loss: 0.535 | 598 ms/step , 115309.75 GFLOP/s , 173219.2 tokens/s INFO:__main__:2024-11-30 16:03:54 | Epoch: 1 | Step: 378730 | Dataset: 0-4333898 | Loss: 0.570 | 599 ms/step , 115281.93 GFLOP/s , 173189.8 tokens/s INFO:__main__:2024-11-30 16:04:01 | Epoch: 1 | Step: 378740 | Dataset: 0-4336298 | Loss: 0.557 | 598 ms/step , 115373.72 GFLOP/s , 173195.2 tokens/s INFO:__main__:2024-11-30 16:04:08 | Epoch: 1 | Step: 378750 | Dataset: 0-4338698 | Loss: 0.564 | 599 ms/step , 115306.17 GFLOP/s , 173229.7 tokens/s INFO:__main__:2024-11-30 16:04:15 | Epoch: 1 | Step: 378760 | Dataset: 0-4341098 | Loss: 0.550 | 598 ms/step , 115408.49 GFLOP/s , 173254.5 tokens/s INFO:__main__:2024-11-30 16:04:22 | Epoch: 1 | Step: 378770 | Dataset: 0-4343498 | Loss: 0.587 | 598 ms/step , 115347.30 GFLOP/s , 173243.3 tokens/s INFO:__main__:2024-11-30 16:04:30 | Epoch: 1 | Step: 378780 | Dataset: 0-4345898 | Loss: 0.529 | 598 ms/step , 115402.29 GFLOP/s , 173208.4 tokens/s INFO:__main__:2024-11-30 16:04:37 | Epoch: 1 | Step: 378790 | Dataset: 0-4348298 | Loss: 0.497 | 598 ms/step , 115363.41 GFLOP/s , 173280.3 tokens/s INFO:__main__:2024-11-30 16:04:44 | Epoch: 1 | Step: 378800 | Dataset: 0-4350698 | Loss: 0.501 | 598 ms/step , 115436.94 GFLOP/s , 173286.1 tokens/s INFO:__main__:2024-11-30 16:04:51 | Epoch: 1 | Step: 378810 | Dataset: 0-4353098 | Loss: 0.462 | 599 ms/step , 115266.96 GFLOP/s , 173250.6 tokens/s INFO:__main__:2024-11-30 16:04:58 | Epoch: 1 | Step: 378820 | Dataset: 0-4355498 | Loss: 0.577 | 598 ms/step , 115339.62 GFLOP/s , 173080.2 tokens/s INFO:__main__:2024-11-30 16:05:05 
| Epoch: 1 | Step: 378830 | Dataset: 0-4357898 | Loss: 0.565 | 598 ms/step , 115325.56 GFLOP/s , 173199.1 tokens/s INFO:__main__:2024-11-30 16:05:12 | Epoch: 1 | Step: 378840 | Dataset: 0-4360298 | Loss: 0.407 | 598 ms/step , 115420.21 GFLOP/s , 173143.8 tokens/s INFO:__main__:2024-11-30 16:05:19 | Epoch: 1 | Step: 378850 | Dataset: 0-4362698 | Loss: 0.418 | 599 ms/step , 115155.96 GFLOP/s , 173279.8 tokens/s INFO:__main__:2024-11-30 16:05:26 | Epoch: 1 | Step: 378860 | Dataset: 0-4365098 | Loss: 0.359 | 598 ms/step , 115414.88 GFLOP/s , 173255.4 tokens/s INFO:__main__:2024-11-30 16:05:33 | Epoch: 1 | Step: 378870 | Dataset: 0-4367498 | Loss: 0.400 | 599 ms/step , 115304.55 GFLOP/s , 173305.9 tokens/s INFO:__main__:2024-11-30 16:05:40 | Epoch: 1 | Step: 378880 | Dataset: 0-4369898 | Loss: 0.374 | 598 ms/step , 115311.97 GFLOP/s , 173175.6 tokens/s INFO:__main__:2024-11-30 16:05:48 | Epoch: 1 | Step: 378890 | Dataset: 0-4372298 | Loss: 0.404 | 598 ms/step , 115388.96 GFLOP/s , 173215.6 tokens/s INFO:__main__:2024-11-30 16:05:55 | Epoch: 1 | Step: 378900 | Dataset: 0-4374698 | Loss: 0.442 | 598 ms/step , 115500.20 GFLOP/s , 173160.8 tokens/s INFO:__main__:2024-11-30 16:06:02 | Epoch: 1 | Step: 378910 | Dataset: 0-4377098 | Loss: 0.389 | 599 ms/step , 115297.32 GFLOP/s , 173279.2 tokens/s INFO:__main__:2024-11-30 16:06:09 | Epoch: 1 | Step: 378920 | Dataset: 0-4379498 | Loss: 0.350 | 599 ms/step , 115294.48 GFLOP/s , 173268.7 tokens/s INFO:__main__:2024-11-30 16:06:16 | Epoch: 1 | Step: 378930 | Dataset: 0-4381898 | Loss: 0.378 | 599 ms/step , 115300.36 GFLOP/s , 173158.0 tokens/s INFO:__main__:2024-11-30 16:06:23 | Epoch: 1 | Step: 378940 | Dataset: 0-4384298 | Loss: 0.390 | 598 ms/step , 115469.78 GFLOP/s , 173287.2 tokens/s INFO:__main__:2024-11-30 16:06:30 | Epoch: 1 | Step: 378950 | Dataset: 0-4386698 | Loss: 0.399 | 598 ms/step , 115439.79 GFLOP/s , 173204.9 tokens/s INFO:__main__:2024-11-30 16:06:37 | Epoch: 1 | Step: 378960 | Dataset: 0-4389098 | Loss: 0.391 | 
598 ms/step , 115449.91 GFLOP/s , 173234.1 tokens/s INFO:__main__:2024-11-30 16:06:44 | Epoch: 1 | Step: 378970 | Dataset: 0-4391498 | Loss: 0.389 | 599 ms/step , 115284.24 GFLOP/s , 173286.7 tokens/s INFO:__main__:2024-11-30 16:06:51 | Epoch: 1 | Step: 378980 | Dataset: 0-4393898 | Loss: 0.351 | 598 ms/step , 115335.51 GFLOP/s , 173086.6 tokens/s INFO:__main__:2024-11-30 16:06:58 | Epoch: 1 | Step: 378990 | Dataset: 0-4396298 | Loss: 0.455 | 599 ms/step , 115282.84 GFLOP/s , 173207.7 tokens/s INFO:__main__:2024-11-30 16:07:06 | Validation | Step: 379000 | Val_loss: 0.319 | Best_val_loss: 0.3283 INFO:__main__:2024-11-30 16:07:06 | Saving full-param checkpoint to `/root/autodl-tmp/checkpoint/checkpoint_20241130_160706_step_379000.pt` INFO:__main__:2024-11-30 16:07:09 | Epoch: 1 | Step: 379000 | Dataset: 0-4398698 | Loss: 0.357 | 595 ms/step , 116065.62 GFLOP/s , 120054.0 tokens/s INFO:__main__:2024-11-30 16:07:16 | Epoch: 1 | Step: 379010 | Dataset: 0-4401098 | Loss: 0.403 | 598 ms/step , 115387.45 GFLOP/s , 173490.6 tokens/s