[2026-04-21 20:28:38] CUDA_VISIBLE_DEVICES: 0,1
[2026-04-21 20:28:38] Number of processes: 2
[2026-04-21 20:28:38] Process index: 0
[2026-04-21 20:28:38] Mixed precision: bf16
[2026-04-21 20:28:38] ============================================================
[2026-04-21 20:28:38] Pythia Training Pipeline (Hydra + Trackio + Accelerate)
[2026-04-21 20:28:38] ============================================================
[2026-04-21 20:28:38] Config:
model:
  name: EleutherAI/pythia-1b
  checkpoint_path: null
  from_scratch: false
training:
  epochs: 3
  batch_size: 4
  eval_batch_size: 12
  gradient_accumulation_steps: 4
  lr: 2.0e-05
  weight_decay: 0.1
  betas:
  - 0.9
  - 0.95
  eps: 1.0e-08
  lr_scheduler: wsd
  warmup_ratio: 0.1
  decay_ratio: 0.2
  warmup_steps: 100
  min_lr_ratio: 0.1
  max_grad_norm: 1.0
  use_amp: true
  resume: false
  resume_checkpoint: null
data:
  path: /home/test/byte-llms-code/code_completion_exp/datasets/data_V5_full
  max_context_len: 4096
  max_target_len: 256
  num_workers: 4
  pin_memory: true
logging:
  log_interval: 10
  save_interval: 3000
  eval_interval: 1000
  save_every_epoch: true
tracking:
  enabled: true
  backend: wandb
  project: code-completion-full-docstring
  run_name: pythia_train
  entity: null
  base_url: https://wandb.platun0v.ru
  local_dir: outputs/2026-04-21/20-28-37
paths:
  output_dir: outputs/2026-04-21/20-28-37
seed: 42
device: cuda

[2026-04-21 20:28:40] Initializing tokenizer...
[2026-04-21 20:28:40] Loading model...
[2026-04-21 20:28:44] Loaded pretrained: EleutherAI/pythia-1b
[2026-04-21 20:28:44] Total params: 1,011,781,632
[2026-04-21 20:28:44] Trainable params: 1,011,781,632
[2026-04-21 20:28:44] Creating dataloaders...
[2026-04-21 20:28:44] Train dataset size: 338932
[2026-04-21 20:28:44] Train batches per epoch (before DDP split): 84733
[2026-04-21 20:28:44] Validation dataset size: 37592
[2026-04-21 20:28:44] Validation batches: 3133
[2026-04-21 20:28:44] Creating optimizer...
[2026-04-21 20:28:44] Total steps: 31775, Steps per epoch: 42367 [2026-04-21 20:28:44] Preparing model, optimizer, and dataloaders with Accelerate... [2026-04-21 20:28:45] Train batches per epoch (after DDP split): 42367 [2026-04-21 20:28:45] Starting training... [2026-04-21 20:28:45] ============================================================ [2026-04-21 20:28:45] EPOCH 1/3 [2026-04-21 20:28:45] ============================================================ [2026-04-21 20:28:52] Epoch 1 | Step 10 | Loss: 2.5546 | LR: 2.11e-06 [2026-04-21 20:28:57] Epoch 1 | Step 20 | Loss: 2.5085 | LR: 2.23e-06 [2026-04-21 20:29:02] Epoch 1 | Step 30 | Loss: 2.4054 | LR: 2.34e-06 [2026-04-21 20:29:08] Epoch 1 | Step 40 | Loss: 2.3876 | LR: 2.45e-06 [2026-04-21 20:29:13] Epoch 1 | Step 50 | Loss: 2.2760 | LR: 2.57e-06 [2026-04-21 20:29:18] Epoch 1 | Step 60 | Loss: 2.1904 | LR: 2.68e-06 [2026-04-21 20:29:23] Epoch 1 | Step 70 | Loss: 2.0895 | LR: 2.79e-06 [2026-04-21 20:29:29] Epoch 1 | Step 80 | Loss: 2.0168 | LR: 2.91e-06 [2026-04-21 20:29:34] Epoch 1 | Step 90 | Loss: 1.9734 | LR: 3.02e-06 [2026-04-21 20:29:39] Epoch 1 | Step 100 | Loss: 1.9267 | LR: 3.13e-06 [2026-04-21 20:29:44] Epoch 1 | Step 110 | Loss: 1.8862 | LR: 3.25e-06 [2026-04-21 20:29:49] Epoch 1 | Step 120 | Loss: 1.8307 | LR: 3.36e-06 [2026-04-21 20:29:54] Epoch 1 | Step 130 | Loss: 1.7898 | LR: 3.47e-06 [2026-04-21 20:29:59] Epoch 1 | Step 140 | Loss: 1.7454 | LR: 3.59e-06 [2026-04-21 20:30:04] Epoch 1 | Step 150 | Loss: 1.7093 | LR: 3.70e-06 [2026-04-21 20:30:09] Epoch 1 | Step 160 | Loss: 1.6776 | LR: 3.81e-06 [2026-04-21 20:30:15] Epoch 1 | Step 170 | Loss: 1.6486 | LR: 3.93e-06 [2026-04-21 20:30:20] Epoch 1 | Step 180 | Loss: 1.6184 | LR: 4.04e-06 [2026-04-21 20:30:25] Epoch 1 | Step 190 | Loss: 1.5929 | LR: 4.15e-06 [2026-04-21 20:30:32] Epoch 1 | Step 200 | Loss: 1.5810 | LR: 4.27e-06 [2026-04-21 20:30:37] Epoch 1 | Step 210 | Loss: 1.5594 | LR: 4.38e-06 [2026-04-21 20:30:42] Epoch 1 | Step 220 | Loss: 1.5452 
| LR: 4.49e-06 [2026-04-21 20:30:47] Epoch 1 | Step 230 | Loss: 1.5327 | LR: 4.61e-06 [2026-04-21 20:30:52] Epoch 1 | Step 240 | Loss: 1.5171 | LR: 4.72e-06 [2026-04-21 20:30:57] Epoch 1 | Step 250 | Loss: 1.5077 | LR: 4.83e-06 [2026-04-21 20:31:03] Epoch 1 | Step 260 | Loss: 1.4969 | LR: 4.95e-06 [2026-04-21 20:31:08] Epoch 1 | Step 270 | Loss: 1.4847 | LR: 5.06e-06 [2026-04-21 20:31:13] Epoch 1 | Step 280 | Loss: 1.4741 | LR: 5.17e-06 [2026-04-21 20:31:19] Epoch 1 | Step 290 | Loss: 1.4624 | LR: 5.29e-06 [2026-04-21 20:31:24] Epoch 1 | Step 300 | Loss: 1.4509 | LR: 5.40e-06 [2026-04-21 20:31:29] Epoch 1 | Step 310 | Loss: 1.4492 | LR: 5.51e-06 [2026-04-21 20:31:35] Epoch 1 | Step 320 | Loss: 1.4423 | LR: 5.63e-06 [2026-04-21 20:31:40] Epoch 1 | Step 330 | Loss: 1.4348 | LR: 5.74e-06 [2026-04-21 20:31:45] Epoch 1 | Step 340 | Loss: 1.4270 | LR: 5.85e-06 [2026-04-21 20:31:50] Epoch 1 | Step 350 | Loss: 1.4185 | LR: 5.97e-06 [2026-04-21 20:31:55] Epoch 1 | Step 360 | Loss: 1.4117 | LR: 6.08e-06 [2026-04-21 20:32:00] Epoch 1 | Step 370 | Loss: 1.4017 | LR: 6.19e-06 [2026-04-21 20:32:06] Epoch 1 | Step 380 | Loss: 1.3923 | LR: 6.31e-06 [2026-04-21 20:32:12] Epoch 1 | Step 390 | Loss: 1.3889 | LR: 6.42e-06 [2026-04-21 20:32:17] Epoch 1 | Step 400 | Loss: 1.3828 | LR: 6.53e-06 [2026-04-21 20:32:22] Epoch 1 | Step 410 | Loss: 1.3732 | LR: 6.65e-06 [2026-04-21 20:32:27] Epoch 1 | Step 420 | Loss: 1.3689 | LR: 6.76e-06 [2026-04-21 20:32:32] Epoch 1 | Step 430 | Loss: 1.3621 | LR: 6.87e-06 [2026-04-21 20:32:37] Epoch 1 | Step 440 | Loss: 1.3552 | LR: 6.99e-06 [2026-04-21 20:32:43] Epoch 1 | Step 450 | Loss: 1.3514 | LR: 7.10e-06 [2026-04-21 20:32:48] Epoch 1 | Step 460 | Loss: 1.3462 | LR: 7.21e-06 [2026-04-21 20:32:53] Epoch 1 | Step 470 | Loss: 1.3405 | LR: 7.33e-06 [2026-04-21 20:32:58] Epoch 1 | Step 480 | Loss: 1.3348 | LR: 7.44e-06 [2026-04-21 20:33:03] Epoch 1 | Step 490 | Loss: 1.3318 | LR: 7.55e-06 [2026-04-21 20:33:08] Epoch 1 | Step 500 | Loss: 1.3305 | LR: 
7.67e-06 [2026-04-21 20:33:14] Epoch 1 | Step 510 | Loss: 1.3274 | LR: 7.78e-06 [2026-04-21 20:33:19] Epoch 1 | Step 520 | Loss: 1.3249 | LR: 7.89e-06 [2026-04-21 20:33:24] Epoch 1 | Step 530 | Loss: 1.3216 | LR: 8.01e-06 [2026-04-21 20:33:29] Epoch 1 | Step 540 | Loss: 1.3157 | LR: 8.12e-06 [2026-04-21 20:33:34] Epoch 1 | Step 550 | Loss: 1.3099 | LR: 8.23e-06 [2026-04-21 20:33:40] Epoch 1 | Step 560 | Loss: 1.3061 | LR: 8.35e-06 [2026-04-21 20:33:45] Epoch 1 | Step 570 | Loss: 1.3021 | LR: 8.46e-06 [2026-04-21 20:33:49] Epoch 1 | Step 580 | Loss: 1.2992 | LR: 8.57e-06 [2026-04-21 20:33:55] Epoch 1 | Step 590 | Loss: 1.2976 | LR: 8.69e-06 [2026-04-21 20:34:01] Epoch 1 | Step 600 | Loss: 1.2948 | LR: 8.80e-06 [2026-04-21 20:34:05] Epoch 1 | Step 610 | Loss: 1.2929 | LR: 8.91e-06 [2026-04-21 20:34:11] Epoch 1 | Step 620 | Loss: 1.2885 | LR: 9.03e-06 [2026-04-21 20:34:16] Epoch 1 | Step 630 | Loss: 1.2850 | LR: 9.14e-06 [2026-04-21 20:34:21] Epoch 1 | Step 640 | Loss: 1.2805 | LR: 9.25e-06 [2026-04-21 20:34:27] Epoch 1 | Step 650 | Loss: 1.2774 | LR: 9.37e-06 [2026-04-21 20:34:33] Epoch 1 | Step 660 | Loss: 1.2761 | LR: 9.48e-06 [2026-04-21 20:34:39] Epoch 1 | Step 670 | Loss: 1.2733 | LR: 9.59e-06 [2026-04-21 20:34:43] Epoch 1 | Step 680 | Loss: 1.2702 | LR: 9.71e-06 [2026-04-21 20:34:49] Epoch 1 | Step 690 | Loss: 1.2677 | LR: 9.82e-06 [2026-04-21 20:34:54] Epoch 1 | Step 700 | Loss: 1.2673 | LR: 9.93e-06 [2026-04-21 20:35:00] Epoch 1 | Step 710 | Loss: 1.2667 | LR: 1.00e-05 [2026-04-21 20:35:04] Epoch 1 | Step 720 | Loss: 1.2644 | LR: 1.02e-05 [2026-04-21 20:35:10] Epoch 1 | Step 730 | Loss: 1.2624 | LR: 1.03e-05 [2026-04-21 20:35:15] Epoch 1 | Step 740 | Loss: 1.2606 | LR: 1.04e-05 [2026-04-21 20:35:21] Epoch 1 | Step 750 | Loss: 1.2583 | LR: 1.05e-05 [2026-04-21 20:35:26] Epoch 1 | Step 760 | Loss: 1.2562 | LR: 1.06e-05 [2026-04-21 20:35:32] Epoch 1 | Step 770 | Loss: 1.2550 | LR: 1.07e-05 [2026-04-21 20:35:36] Epoch 1 | Step 780 | Loss: 1.2535 | LR: 1.08e-05 
[2026-04-21 20:35:42] Epoch 1 | Step 790 | Loss: 1.2515 | LR: 1.10e-05 [2026-04-21 20:35:47] Epoch 1 | Step 800 | Loss: 1.2504 | LR: 1.11e-05 [2026-04-21 20:35:52] Epoch 1 | Step 810 | Loss: 1.2511 | LR: 1.12e-05 [2026-04-21 20:35:57] Epoch 1 | Step 820 | Loss: 1.2494 | LR: 1.13e-05 [2026-04-21 20:36:03] Epoch 1 | Step 830 | Loss: 1.2492 | LR: 1.14e-05 [2026-04-21 20:36:08] Epoch 1 | Step 840 | Loss: 1.2465 | LR: 1.15e-05 [2026-04-21 20:36:13] Epoch 1 | Step 850 | Loss: 1.2455 | LR: 1.16e-05 [2026-04-21 20:36:18] Epoch 1 | Step 860 | Loss: 1.2465 | LR: 1.17e-05 [2026-04-21 20:36:24] Epoch 1 | Step 870 | Loss: 1.2445 | LR: 1.19e-05 [2026-04-21 20:36:29] Epoch 1 | Step 880 | Loss: 1.2431 | LR: 1.20e-05 [2026-04-21 20:36:35] Epoch 1 | Step 890 | Loss: 1.2415 | LR: 1.21e-05 [2026-04-21 20:36:40] Epoch 1 | Step 900 | Loss: 1.2408 | LR: 1.22e-05 [2026-04-21 20:36:45] Epoch 1 | Step 910 | Loss: 1.2382 | LR: 1.23e-05 [2026-04-21 20:36:51] Epoch 1 | Step 920 | Loss: 1.2367 | LR: 1.24e-05 [2026-04-21 20:36:56] Epoch 1 | Step 930 | Loss: 1.2338 | LR: 1.25e-05 [2026-04-21 20:37:02] Epoch 1 | Step 940 | Loss: 1.2316 | LR: 1.27e-05 [2026-04-21 20:37:06] Epoch 1 | Step 950 | Loss: 1.2297 | LR: 1.28e-05 [2026-04-21 20:37:12] Epoch 1 | Step 960 | Loss: 1.2298 | LR: 1.29e-05 [2026-04-21 20:37:18] Epoch 1 | Step 970 | Loss: 1.2280 | LR: 1.30e-05 [2026-04-21 20:37:23] Epoch 1 | Step 980 | Loss: 1.2269 | LR: 1.31e-05 [2026-04-21 20:37:29] Epoch 1 | Step 990 | Loss: 1.2262 | LR: 1.32e-05 [2026-04-21 20:37:34] Epoch 1 | Step 1000 | Loss: 1.2247 | LR: 1.33e-05 [2026-04-21 20:37:36] Validation | Batch 10/1567 | Loss: 1.0750 [2026-04-21 20:37:37] Validation | Batch 20/1567 | Loss: 1.1627 [2026-04-21 20:37:38] Validation | Batch 30/1567 | Loss: 1.1306 [2026-04-21 20:37:40] Validation | Batch 40/1567 | Loss: 1.1529 [2026-04-21 20:37:41] Validation | Batch 50/1567 | Loss: 1.1330 [2026-04-21 20:37:42] Validation | Batch 60/1567 | Loss: 1.1233 [2026-04-21 20:37:43] Validation | Batch 70/1567 | 
Loss: 1.1130 [2026-04-21 20:37:45] Validation | Batch 80/1567 | Loss: 1.1363 [2026-04-21 20:37:46] Validation | Batch 90/1567 | Loss: 1.1278 [2026-04-21 20:37:48] Validation | Batch 100/1567 | Loss: 1.1068 [2026-04-21 20:37:49] Validation | Batch 110/1567 | Loss: 1.0971 [2026-04-21 20:37:50] Validation | Batch 120/1567 | Loss: 1.0899 [2026-04-21 20:37:52] Validation | Batch 130/1567 | Loss: 1.0819 [2026-04-21 20:37:53] Validation | Batch 140/1567 | Loss: 1.0927 [2026-04-21 20:37:54] Validation | Batch 150/1567 | Loss: 1.1014 [2026-04-21 20:37:55] Validation | Batch 160/1567 | Loss: 1.1004 [2026-04-21 20:37:56] Validation | Batch 170/1567 | Loss: 1.0931 [2026-04-21 20:37:57] Validation | Batch 180/1567 | Loss: 1.0954 [2026-04-21 20:37:58] Validation | Batch 190/1567 | Loss: 1.1004 [2026-04-21 20:38:00] Validation | Batch 200/1567 | Loss: 1.1040 [2026-04-21 20:38:01] Validation | Batch 210/1567 | Loss: 1.1046 [2026-04-21 20:38:02] Validation | Batch 220/1567 | Loss: 1.1088 [2026-04-21 20:38:04] Validation | Batch 230/1567 | Loss: 1.1122 [2026-04-21 20:38:05] Validation | Batch 240/1567 | Loss: 1.1166 [2026-04-21 20:38:06] Validation | Batch 250/1567 | Loss: 1.1203 [2026-04-21 20:38:07] Validation | Batch 260/1567 | Loss: 1.1226 [2026-04-21 20:38:08] Validation | Batch 270/1567 | Loss: 1.1265 [2026-04-21 20:38:10] Validation | Batch 280/1567 | Loss: 1.1312 [2026-04-21 20:38:12] Validation | Batch 290/1567 | Loss: 1.1293 [2026-04-21 20:38:13] Validation | Batch 300/1567 | Loss: 1.1282 [2026-04-21 20:38:14] Validation | Batch 310/1567 | Loss: 1.1249 [2026-04-21 20:38:15] Validation | Batch 320/1567 | Loss: 1.1263 [2026-04-21 20:38:16] Validation | Batch 330/1567 | Loss: 1.1257 [2026-04-21 20:38:18] Validation | Batch 340/1567 | Loss: 1.1241 [2026-04-21 20:38:19] Validation | Batch 350/1567 | Loss: 1.1217 [2026-04-21 20:38:20] Validation | Batch 360/1567 | Loss: 1.1158 [2026-04-21 20:38:21] Validation | Batch 370/1567 | Loss: 1.1154 [2026-04-21 20:38:23] Validation | 
Batch 380/1567 | Loss: 1.1192 [2026-04-21 20:38:24] Validation | Batch 390/1567 | Loss: 1.1177 [2026-04-21 20:38:25] Validation | Batch 400/1567 | Loss: 1.1188 [2026-04-21 20:38:26] Validation | Batch 410/1567 | Loss: 1.1153 [2026-04-21 20:38:27] Validation | Batch 420/1567 | Loss: 1.1135 [2026-04-21 20:38:29] Validation | Batch 430/1567 | Loss: 1.1170 [2026-04-21 20:38:30] Validation | Batch 440/1567 | Loss: 1.1168 [2026-04-21 20:38:31] Validation | Batch 450/1567 | Loss: 1.1199 [2026-04-21 20:38:32] Validation | Batch 460/1567 | Loss: 1.1229 [2026-04-21 20:38:33] Validation | Batch 470/1567 | Loss: 1.1276 [2026-04-21 20:38:34] Validation | Batch 480/1567 | Loss: 1.1255 [2026-04-21 20:38:36] Validation | Batch 490/1567 | Loss: 1.1232 [2026-04-21 20:38:36] Validation | Batch 500/1567 | Loss: 1.1244 [2026-04-21 20:38:38] Validation | Batch 510/1567 | Loss: 1.1242 [2026-04-21 20:38:39] Validation | Batch 520/1567 | Loss: 1.1250 [2026-04-21 20:38:40] Validation | Batch 530/1567 | Loss: 1.1234 [2026-04-21 20:38:41] Validation | Batch 540/1567 | Loss: 1.1204 [2026-04-21 20:38:43] Validation | Batch 550/1567 | Loss: 1.1219 [2026-04-21 20:38:44] Validation | Batch 560/1567 | Loss: 1.1209 [2026-04-21 20:38:45] Validation | Batch 570/1567 | Loss: 1.1163 [2026-04-21 20:38:47] Validation | Batch 580/1567 | Loss: 1.1179 [2026-04-21 20:38:48] Validation | Batch 590/1567 | Loss: 1.1171 [2026-04-21 20:38:49] Validation | Batch 600/1567 | Loss: 1.1153 [2026-04-21 20:38:50] Validation | Batch 610/1567 | Loss: 1.1173 [2026-04-21 20:38:52] Validation | Batch 620/1567 | Loss: 1.1148 [2026-04-21 20:38:53] Validation | Batch 630/1567 | Loss: 1.1169 [2026-04-21 20:38:54] Validation | Batch 640/1567 | Loss: 1.1174 [2026-04-21 20:38:56] Validation | Batch 650/1567 | Loss: 1.1206 [2026-04-21 20:38:57] Validation | Batch 660/1567 | Loss: 1.1218 [2026-04-21 20:38:58] Validation | Batch 670/1567 | Loss: 1.1204 [2026-04-21 20:38:59] Validation | Batch 680/1567 | Loss: 1.1188 [2026-04-21 
20:39:00] Validation | Batch 690/1567 | Loss: 1.1172 [2026-04-21 20:39:01] Validation | Batch 700/1567 | Loss: 1.1170 [2026-04-21 20:39:03] Validation | Batch 710/1567 | Loss: 1.1163 [2026-04-21 20:39:04] Validation | Batch 720/1567 | Loss: 1.1131 [2026-04-21 20:39:05] Validation | Batch 730/1567 | Loss: 1.1133 [2026-04-21 20:39:06] Validation | Batch 740/1567 | Loss: 1.1140 [2026-04-21 20:39:07] Validation | Batch 750/1567 | Loss: 1.1132 [2026-04-21 20:39:08] Validation | Batch 760/1567 | Loss: 1.1149 [2026-04-21 20:39:10] Validation | Batch 770/1567 | Loss: 1.1143 [2026-04-21 20:39:11] Validation | Batch 780/1567 | Loss: 1.1151 [2026-04-21 20:39:12] Validation | Batch 790/1567 | Loss: 1.1135 [2026-04-21 20:39:13] Validation | Batch 800/1567 | Loss: 1.1116 [2026-04-21 20:39:14] Validation | Batch 810/1567 | Loss: 1.1119 [2026-04-21 20:39:15] Validation | Batch 820/1567 | Loss: 1.1111 [2026-04-21 20:39:16] Validation | Batch 830/1567 | Loss: 1.1112 [2026-04-21 20:39:17] Validation | Batch 840/1567 | Loss: 1.1111 [2026-04-21 20:39:18] Validation | Batch 850/1567 | Loss: 1.1118 [2026-04-21 20:39:19] Validation | Batch 860/1567 | Loss: 1.1123 [2026-04-21 20:39:20] Validation | Batch 870/1567 | Loss: 1.1127 [2026-04-21 20:39:21] Validation | Batch 880/1567 | Loss: 1.1124 [2026-04-21 20:39:23] Validation | Batch 890/1567 | Loss: 1.1139 [2026-04-21 20:39:24] Validation | Batch 900/1567 | Loss: 1.1134 [2026-04-21 20:39:25] Validation | Batch 910/1567 | Loss: 1.1130 [2026-04-21 20:39:26] Validation | Batch 920/1567 | Loss: 1.1146 [2026-04-21 20:39:27] Validation | Batch 930/1567 | Loss: 1.1139 [2026-04-21 20:39:28] Validation | Batch 940/1567 | Loss: 1.1139 [2026-04-21 20:39:29] Validation | Batch 950/1567 | Loss: 1.1127 [2026-04-21 20:39:30] Validation | Batch 960/1567 | Loss: 1.1131 [2026-04-21 20:39:31] Validation | Batch 970/1567 | Loss: 1.1139 [2026-04-21 20:39:32] Validation | Batch 980/1567 | Loss: 1.1132 [2026-04-21 20:39:33] Validation | Batch 990/1567 | Loss: 
1.1147 [2026-04-21 20:39:35] Validation | Batch 1000/1567 | Loss: 1.1150 [2026-04-21 20:39:36] Validation | Batch 1010/1567 | Loss: 1.1138 [2026-04-21 20:39:37] Validation | Batch 1020/1567 | Loss: 1.1147 [2026-04-21 20:39:38] Validation | Batch 1030/1567 | Loss: 1.1151 [2026-04-21 20:39:40] Validation | Batch 1040/1567 | Loss: 1.1147 [2026-04-21 20:39:41] Validation | Batch 1050/1567 | Loss: 1.1139 [2026-04-21 20:39:42] Validation | Batch 1060/1567 | Loss: 1.1151 [2026-04-21 20:39:43] Validation | Batch 1070/1567 | Loss: 1.1158 [2026-04-21 20:39:44] Validation | Batch 1080/1567 | Loss: 1.1173 [2026-04-21 20:39:46] Validation | Batch 1090/1567 | Loss: 1.1199 [2026-04-21 20:39:47] Validation | Batch 1100/1567 | Loss: 1.1212 [2026-04-21 20:39:48] Validation | Batch 1110/1567 | Loss: 1.1206 [2026-04-21 20:39:49] Validation | Batch 1120/1567 | Loss: 1.1207 [2026-04-21 20:39:50] Validation | Batch 1130/1567 | Loss: 1.1190 [2026-04-21 20:39:51] Validation | Batch 1140/1567 | Loss: 1.1194 [2026-04-21 20:39:53] Validation | Batch 1150/1567 | Loss: 1.1180 [2026-04-21 20:39:53] Validation | Batch 1160/1567 | Loss: 1.1175 [2026-04-21 20:39:55] Validation | Batch 1170/1567 | Loss: 1.1174 [2026-04-21 20:39:56] Validation | Batch 1180/1567 | Loss: 1.1179 [2026-04-21 20:39:57] Validation | Batch 1190/1567 | Loss: 1.1182 [2026-04-21 20:39:58] Validation | Batch 1200/1567 | Loss: 1.1171 [2026-04-21 20:40:00] Validation | Batch 1210/1567 | Loss: 1.1161 [2026-04-21 20:40:00] Validation | Batch 1220/1567 | Loss: 1.1171 [2026-04-21 20:40:02] Validation | Batch 1230/1567 | Loss: 1.1176 [2026-04-21 20:40:03] Validation | Batch 1240/1567 | Loss: 1.1176 [2026-04-21 20:40:04] Validation | Batch 1250/1567 | Loss: 1.1179 [2026-04-21 20:40:05] Validation | Batch 1260/1567 | Loss: 1.1173 [2026-04-21 20:40:07] Validation | Batch 1270/1567 | Loss: 1.1158 [2026-04-21 20:40:08] Validation | Batch 1280/1567 | Loss: 1.1157 [2026-04-21 20:40:10] Validation | Batch 1290/1567 | Loss: 1.1155 [2026-04-21 
20:40:11] Validation | Batch 1300/1567 | Loss: 1.1161 [2026-04-21 20:40:12] Validation | Batch 1310/1567 | Loss: 1.1166 [2026-04-21 20:40:13] Validation | Batch 1320/1567 | Loss: 1.1171 [2026-04-21 20:40:14] Validation | Batch 1330/1567 | Loss: 1.1182 [2026-04-21 20:40:15] Validation | Batch 1340/1567 | Loss: 1.1179 [2026-04-21 20:40:16] Validation | Batch 1350/1567 | Loss: 1.1180 [2026-04-21 20:40:17] Validation | Batch 1360/1567 | Loss: 1.1171 [2026-04-21 20:40:19] Validation | Batch 1370/1567 | Loss: 1.1167 [2026-04-21 20:40:20] Validation | Batch 1380/1567 | Loss: 1.1166 [2026-04-21 20:40:21] Validation | Batch 1390/1567 | Loss: 1.1158 [2026-04-21 20:40:22] Validation | Batch 1400/1567 | Loss: 1.1160 [2026-04-21 20:40:23] Validation | Batch 1410/1567 | Loss: 1.1165 [2026-04-21 20:40:24] Validation | Batch 1420/1567 | Loss: 1.1165 [2026-04-21 20:40:25] Validation | Batch 1430/1567 | Loss: 1.1170 [2026-04-21 20:40:27] Validation | Batch 1440/1567 | Loss: 1.1177 [2026-04-21 20:40:27] Validation | Batch 1450/1567 | Loss: 1.1177 [2026-04-21 20:40:28] Validation | Batch 1460/1567 | Loss: 1.1170 [2026-04-21 20:40:29] Validation | Batch 1470/1567 | Loss: 1.1167 [2026-04-21 20:40:30] Validation | Batch 1480/1567 | Loss: 1.1163 [2026-04-21 20:40:31] Validation | Batch 1490/1567 | Loss: 1.1156 [2026-04-21 20:40:33] Validation | Batch 1500/1567 | Loss: 1.1155 [2026-04-21 20:40:34] Validation | Batch 1510/1567 | Loss: 1.1146 [2026-04-21 20:40:34] Validation | Batch 1520/1567 | Loss: 1.1145 [2026-04-21 20:40:35] Validation | Batch 1530/1567 | Loss: 1.1144 [2026-04-21 20:40:37] Validation | Batch 1540/1567 | Loss: 1.1147 [2026-04-21 20:40:38] Validation | Batch 1550/1567 | Loss: 1.1159 [2026-04-21 20:40:39] Validation | Batch 1560/1567 | Loss: 1.1164 [2026-04-21 20:40:40] Validation | Batch 1567/1567 | Loss: 1.1166 [2026-04-21 20:40:40] Validation | Loss: 1.1166 | PPL: 3.13 | Time: 185.59s [2026-04-21 20:40:43] New best model saved! 
Val loss: 1.1166 [2026-04-21 20:40:49] Epoch 1 | Step 1010 | Loss: 1.2233 | LR: 1.34e-05 [2026-04-21 20:40:54] Epoch 1 | Step 1020 | Loss: 1.2215 | LR: 1.36e-05 [2026-04-21 20:40:59] Epoch 1 | Step 1030 | Loss: 1.2197 | LR: 1.37e-05 [2026-04-21 20:41:04] Epoch 1 | Step 1040 | Loss: 1.2176 | LR: 1.38e-05 [2026-04-21 20:41:10] Epoch 1 | Step 1050 | Loss: 1.2156 | LR: 1.39e-05 [2026-04-21 20:41:15] Epoch 1 | Step 1060 | Loss: 1.2149 | LR: 1.40e-05 [2026-04-21 20:41:20] Epoch 1 | Step 1070 | Loss: 1.2140 | LR: 1.41e-05 [2026-04-21 20:41:25] Epoch 1 | Step 1080 | Loss: 1.2132 | LR: 1.42e-05 [2026-04-21 20:41:30] Epoch 1 | Step 1090 | Loss: 1.2129 | LR: 1.44e-05 [2026-04-21 20:41:35] Epoch 1 | Step 1100 | Loss: 1.2103 | LR: 1.45e-05 [2026-04-21 20:41:40] Epoch 1 | Step 1110 | Loss: 1.2092 | LR: 1.46e-05 [2026-04-21 20:41:45] Epoch 1 | Step 1120 | Loss: 1.2072 | LR: 1.47e-05 [2026-04-21 20:41:50] Epoch 1 | Step 1130 | Loss: 1.2069 | LR: 1.48e-05 [2026-04-21 20:41:55] Epoch 1 | Step 1140 | Loss: 1.2059 | LR: 1.49e-05 [2026-04-21 20:42:01] Epoch 1 | Step 1150 | Loss: 1.2052 | LR: 1.50e-05 [2026-04-21 20:42:06] Epoch 1 | Step 1160 | Loss: 1.2044 | LR: 1.51e-05 [2026-04-21 20:42:11] Epoch 1 | Step 1170 | Loss: 1.2029 | LR: 1.53e-05 [2026-04-21 20:42:17] Epoch 1 | Step 1180 | Loss: 1.2027 | LR: 1.54e-05 [2026-04-21 20:42:22] Epoch 1 | Step 1190 | Loss: 1.2013 | LR: 1.55e-05 [2026-04-21 20:42:27] Epoch 1 | Step 1200 | Loss: 1.1998 | LR: 1.56e-05 [2026-04-21 20:42:32] Epoch 1 | Step 1210 | Loss: 1.1990 | LR: 1.57e-05 [2026-04-21 20:42:37] Epoch 1 | Step 1220 | Loss: 1.1978 | LR: 1.58e-05 [2026-04-21 20:42:42] Epoch 1 | Step 1230 | Loss: 1.1967 | LR: 1.59e-05 [2026-04-21 20:42:48] Epoch 1 | Step 1240 | Loss: 1.1961 | LR: 1.61e-05 [2026-04-21 20:42:52] Epoch 1 | Step 1250 | Loss: 1.1979 | LR: 1.62e-05 [2026-04-21 20:42:57] Epoch 1 | Step 1260 | Loss: 1.1954 | LR: 1.63e-05 [2026-04-21 20:43:03] Epoch 1 | Step 1270 | Loss: 1.1935 | LR: 1.64e-05 [2026-04-21 20:43:07] Epoch 1 | Step 
1280 | Loss: 1.1920 | LR: 1.65e-05 [2026-04-21 20:43:12] Epoch 1 | Step 1290 | Loss: 1.1904 | LR: 1.66e-05 [2026-04-21 20:43:18] Epoch 1 | Step 1300 | Loss: 1.1897 | LR: 1.67e-05 [2026-04-21 20:43:23] Epoch 1 | Step 1310 | Loss: 1.1877 | LR: 1.68e-05 [2026-04-21 20:43:28] Epoch 1 | Step 1320 | Loss: 1.1869 | LR: 1.70e-05 [2026-04-21 20:43:33] Epoch 1 | Step 1330 | Loss: 1.1852 | LR: 1.71e-05 [2026-04-21 20:43:39] Epoch 1 | Step 1340 | Loss: 1.1851 | LR: 1.72e-05 [2026-04-21 20:43:45] Epoch 1 | Step 1350 | Loss: 1.1849 | LR: 1.73e-05 [2026-04-21 20:43:50] Epoch 1 | Step 1360 | Loss: 1.1827 | LR: 1.74e-05 [2026-04-21 20:43:55] Epoch 1 | Step 1370 | Loss: 1.1809 | LR: 1.75e-05 [2026-04-21 20:44:01] Epoch 1 | Step 1380 | Loss: 1.1806 | LR: 1.76e-05 [2026-04-21 20:44:06] Epoch 1 | Step 1390 | Loss: 1.1810 | LR: 1.78e-05 [2026-04-21 20:44:12] Epoch 1 | Step 1400 | Loss: 1.1799 | LR: 1.79e-05 [2026-04-21 20:44:18] Epoch 1 | Step 1410 | Loss: 1.1797 | LR: 1.80e-05 [2026-04-21 20:44:22] Epoch 1 | Step 1420 | Loss: 1.1783 | LR: 1.81e-05 [2026-04-21 20:44:28] Epoch 1 | Step 1430 | Loss: 1.1781 | LR: 1.82e-05 [2026-04-21 20:44:33] Epoch 1 | Step 1440 | Loss: 1.1769 | LR: 1.83e-05 [2026-04-21 20:44:39] Epoch 1 | Step 1450 | Loss: 1.1751 | LR: 1.84e-05 [2026-04-21 20:44:44] Epoch 1 | Step 1460 | Loss: 1.1741 | LR: 1.85e-05 [2026-04-21 20:44:48] Epoch 1 | Step 1470 | Loss: 1.1735 | LR: 1.87e-05 [2026-04-21 20:44:53] Epoch 1 | Step 1480 | Loss: 1.1723 | LR: 1.88e-05 [2026-04-21 20:44:58] Epoch 1 | Step 1490 | Loss: 1.1725 | LR: 1.89e-05 [2026-04-21 20:45:03] Epoch 1 | Step 1500 | Loss: 1.1727 | LR: 1.90e-05 [2026-04-21 20:45:08] Epoch 1 | Step 1510 | Loss: 1.1722 | LR: 1.91e-05 [2026-04-21 20:45:14] Epoch 1 | Step 1520 | Loss: 1.1720 | LR: 1.92e-05 [2026-04-21 20:45:19] Epoch 1 | Step 1530 | Loss: 1.1713 | LR: 1.93e-05 [2026-04-21 20:45:24] Epoch 1 | Step 1540 | Loss: 1.1715 | LR: 1.95e-05 [2026-04-21 20:45:29] Epoch 1 | Step 1550 | Loss: 1.1704 | LR: 1.96e-05 [2026-04-21 
20:45:34] Epoch 1 | Step 1560 | Loss: 1.1707 | LR: 1.97e-05 [2026-04-21 20:45:40] Epoch 1 | Step 1570 | Loss: 1.1698 | LR: 1.98e-05 [2026-04-21 20:45:46] Epoch 1 | Step 1580 | Loss: 1.1687 | LR: 1.99e-05 [2026-04-21 20:45:51] Epoch 1 | Step 1590 | Loss: 1.1680 | LR: 2.00e-05 [2026-04-21 20:45:57] Epoch 1 | Step 1600 | Loss: 1.1675 | LR: 2.00e-05 [2026-04-21 20:46:02] Epoch 1 | Step 1610 | Loss: 1.1663 | LR: 2.00e-05 [2026-04-21 20:46:07] Epoch 1 | Step 1620 | Loss: 1.1648 | LR: 2.00e-05 [2026-04-21 20:46:13] Epoch 1 | Step 1630 | Loss: 1.1643 | LR: 2.00e-05 [2026-04-21 20:46:18] Epoch 1 | Step 1640 | Loss: 1.1643 | LR: 2.00e-05 [2026-04-21 20:46:23] Epoch 1 | Step 1650 | Loss: 1.1641 | LR: 2.00e-05 [2026-04-21 20:46:28] Epoch 1 | Step 1660 | Loss: 1.1638 | LR: 2.00e-05 [2026-04-21 20:46:33] Epoch 1 | Step 1670 | Loss: 1.1626 | LR: 2.00e-05 [2026-04-21 20:46:39] Epoch 1 | Step 1680 | Loss: 1.1631 | LR: 2.00e-05 [2026-04-21 20:46:45] Epoch 1 | Step 1690 | Loss: 1.1632 | LR: 2.00e-05 [2026-04-21 20:46:50] Epoch 1 | Step 1700 | Loss: 1.1630 | LR: 2.00e-05 [2026-04-21 20:46:56] Epoch 1 | Step 1710 | Loss: 1.1623 | LR: 2.00e-05 [2026-04-21 20:47:01] Epoch 1 | Step 1720 | Loss: 1.1623 | LR: 2.00e-05 [2026-04-21 20:47:06] Epoch 1 | Step 1730 | Loss: 1.1618 | LR: 2.00e-05 [2026-04-21 20:47:11] Epoch 1 | Step 1740 | Loss: 1.1621 | LR: 2.00e-05 [2026-04-21 20:47:17] Epoch 1 | Step 1750 | Loss: 1.1618 | LR: 2.00e-05 [2026-04-21 20:47:22] Epoch 1 | Step 1760 | Loss: 1.1616 | LR: 2.00e-05 [2026-04-21 20:47:27] Epoch 1 | Step 1770 | Loss: 1.1606 | LR: 2.00e-05 [2026-04-21 20:47:32] Epoch 1 | Step 1780 | Loss: 1.1607 | LR: 2.00e-05 [2026-04-21 20:47:37] Epoch 1 | Step 1790 | Loss: 1.1608 | LR: 2.00e-05 [2026-04-21 20:47:43] Epoch 1 | Step 1800 | Loss: 1.1606 | LR: 2.00e-05 [2026-04-21 20:47:48] Epoch 1 | Step 1810 | Loss: 1.1606 | LR: 2.00e-05 [2026-04-21 20:47:54] Epoch 1 | Step 1820 | Loss: 1.1602 | LR: 2.00e-05 [2026-04-21 20:47:59] Epoch 1 | Step 1830 | Loss: 1.1595 | LR: 
2.00e-05 [2026-04-21 20:48:04] Epoch 1 | Step 1840 | Loss: 1.1589 | LR: 2.00e-05 [2026-04-21 20:48:09] Epoch 1 | Step 1850 | Loss: 1.1586 | LR: 2.00e-05 [2026-04-21 20:48:14] Epoch 1 | Step 1860 | Loss: 1.1589 | LR: 2.00e-05 [2026-04-21 20:48:20] Epoch 1 | Step 1870 | Loss: 1.1584 | LR: 2.00e-05 [2026-04-21 20:48:26] Epoch 1 | Step 1880 | Loss: 1.1591 | LR: 2.00e-05 [2026-04-21 20:48:32] Epoch 1 | Step 1890 | Loss: 1.1595 | LR: 2.00e-05 [2026-04-21 20:48:37] Epoch 1 | Step 1900 | Loss: 1.1592 | LR: 2.00e-05 [2026-04-21 20:48:42] Epoch 1 | Step 1910 | Loss: 1.1592 | LR: 2.00e-05 [2026-04-21 20:48:48] Epoch 1 | Step 1920 | Loss: 1.1579 | LR: 2.00e-05 [2026-04-21 20:48:53] Epoch 1 | Step 1930 | Loss: 1.1579 | LR: 2.00e-05 [2026-04-21 20:48:59] Epoch 1 | Step 1940 | Loss: 1.1570 | LR: 2.00e-05 [2026-04-21 20:49:05] Epoch 1 | Step 1950 | Loss: 1.1567 | LR: 2.00e-05 [2026-04-21 20:49:10] Epoch 1 | Step 1960 | Loss: 1.1575 | LR: 2.00e-05 [2026-04-21 20:49:15] Epoch 1 | Step 1970 | Loss: 1.1566 | LR: 2.00e-05 [2026-04-21 20:49:20] Epoch 1 | Step 1980 | Loss: 1.1564 | LR: 2.00e-05 [2026-04-21 20:49:25] Epoch 1 | Step 1990 | Loss: 1.1562 | LR: 2.00e-05 [2026-04-21 20:49:30] Epoch 1 | Step 2000 | Loss: 1.1558 | LR: 2.00e-05 [2026-04-21 20:49:32] Validation | Batch 10/1567 | Loss: 1.0736 [2026-04-21 20:49:33] Validation | Batch 20/1567 | Loss: 1.1529 [2026-04-21 20:49:34] Validation | Batch 30/1567 | Loss: 1.1169 [2026-04-21 20:49:36] Validation | Batch 40/1567 | Loss: 1.1436 [2026-04-21 20:49:36] Validation | Batch 50/1567 | Loss: 1.1254 [2026-04-21 20:49:38] Validation | Batch 60/1567 | Loss: 1.1166 [2026-04-21 20:49:39] Validation | Batch 70/1567 | Loss: 1.1070 [2026-04-21 20:49:41] Validation | Batch 80/1567 | Loss: 1.1238 [2026-04-21 20:49:42] Validation | Batch 90/1567 | Loss: 1.1155 [2026-04-21 20:49:43] Validation | Batch 100/1567 | Loss: 1.0934 [2026-04-21 20:49:44] Validation | Batch 110/1567 | Loss: 1.0857 [2026-04-21 20:49:45] Validation | Batch 120/1567 | Loss: 
1.0778 [2026-04-21 20:49:47] Validation | Batch 130/1567 | Loss: 1.0709 [2026-04-21 20:49:48] Validation | Batch 140/1567 | Loss: 1.0811 [2026-04-21 20:49:49] Validation | Batch 150/1567 | Loss: 1.0914 [2026-04-21 20:49:50] Validation | Batch 160/1567 | Loss: 1.0893 [2026-04-21 20:49:51] Validation | Batch 170/1567 | Loss: 1.0822 [2026-04-21 20:49:52] Validation | Batch 180/1567 | Loss: 1.0832 [2026-04-21 20:49:53] Validation | Batch 190/1567 | Loss: 1.0879 [2026-04-21 20:49:55] Validation | Batch 200/1567 | Loss: 1.0924 [2026-04-21 20:49:56] Validation | Batch 210/1567 | Loss: 1.0922 [2026-04-21 20:49:57] Validation | Batch 220/1567 | Loss: 1.0964 [2026-04-21 20:49:59] Validation | Batch 230/1567 | Loss: 1.1002 [2026-04-21 20:50:00] Validation | Batch 240/1567 | Loss: 1.1054 [2026-04-21 20:50:01] Validation | Batch 250/1567 | Loss: 1.1093 [2026-04-21 20:50:02] Validation | Batch 260/1567 | Loss: 1.1116 [2026-04-21 20:50:03] Validation | Batch 270/1567 | Loss: 1.1153 [2026-04-21 20:50:05] Validation | Batch 280/1567 | Loss: 1.1191 [2026-04-21 20:50:07] Validation | Batch 290/1567 | Loss: 1.1169 [2026-04-21 20:50:08] Validation | Batch 300/1567 | Loss: 1.1160 [2026-04-21 20:50:09] Validation | Batch 310/1567 | Loss: 1.1125 [2026-04-21 20:50:10] Validation | Batch 320/1567 | Loss: 1.1153 [2026-04-21 20:50:11] Validation | Batch 330/1567 | Loss: 1.1149 [2026-04-21 20:50:13] Validation | Batch 340/1567 | Loss: 1.1140 [2026-04-21 20:50:14] Validation | Batch 350/1567 | Loss: 1.1125 [2026-04-21 20:50:15] Validation | Batch 360/1567 | Loss: 1.1061 [2026-04-21 20:50:16] Validation | Batch 370/1567 | Loss: 1.1063 [2026-04-21 20:50:17] Validation | Batch 380/1567 | Loss: 1.1104 [2026-04-21 20:50:19] Validation | Batch 390/1567 | Loss: 1.1090 [2026-04-21 20:50:20] Validation | Batch 400/1567 | Loss: 1.1104 [2026-04-21 20:50:21] Validation | Batch 410/1567 | Loss: 1.1068 [2026-04-21 20:50:22] Validation | Batch 420/1567 | Loss: 1.1049 [2026-04-21 20:50:23] Validation | Batch 
430/1567 | Loss: 1.1083 [2026-04-21 20:50:25] Validation | Batch 440/1567 | Loss: 1.1082 [2026-04-21 20:50:26] Validation | Batch 450/1567 | Loss: 1.1112 [2026-04-21 20:50:27] Validation | Batch 460/1567 | Loss: 1.1143 [2026-04-21 20:50:28] Validation | Batch 470/1567 | Loss: 1.1188 [2026-04-21 20:50:29] Validation | Batch 480/1567 | Loss: 1.1167 [2026-04-21 20:50:30] Validation | Batch 490/1567 | Loss: 1.1143 [2026-04-21 20:50:31] Validation | Batch 500/1567 | Loss: 1.1151 [2026-04-21 20:50:33] Validation | Batch 510/1567 | Loss: 1.1150 [2026-04-21 20:50:34] Validation | Batch 520/1567 | Loss: 1.1160 [2026-04-21 20:50:35] Validation | Batch 530/1567 | Loss: 1.1144 [2026-04-21 20:50:36] Validation | Batch 540/1567 | Loss: 1.1115 [2026-04-21 20:50:38] Validation | Batch 550/1567 | Loss: 1.1128 [2026-04-21 20:50:39] Validation | Batch 560/1567 | Loss: 1.1119 [2026-04-21 20:50:40] Validation | Batch 570/1567 | Loss: 1.1075 [2026-04-21 20:50:41] Validation | Batch 580/1567 | Loss: 1.1091 [2026-04-21 20:50:43] Validation | Batch 590/1567 | Loss: 1.1082 [2026-04-21 20:50:44] Validation | Batch 600/1567 | Loss: 1.1067 [2026-04-21 20:50:45] Validation | Batch 610/1567 | Loss: 1.1085 [2026-04-21 20:50:46] Validation | Batch 620/1567 | Loss: 1.1061 [2026-04-21 20:50:48] Validation | Batch 630/1567 | Loss: 1.1079 [2026-04-21 20:50:49] Validation | Batch 640/1567 | Loss: 1.1085 [2026-04-21 20:50:51] Validation | Batch 650/1567 | Loss: 1.1116 [2026-04-21 20:50:52] Validation | Batch 660/1567 | Loss: 1.1129 [2026-04-21 20:50:53] Validation | Batch 670/1567 | Loss: 1.1114 [2026-04-21 20:50:54] Validation | Batch 680/1567 | Loss: 1.1099 [2026-04-21 20:50:55] Validation | Batch 690/1567 | Loss: 1.1084 [2026-04-21 20:50:56] Validation | Batch 700/1567 | Loss: 1.1081 [2026-04-21 20:50:58] Validation | Batch 710/1567 | Loss: 1.1071 [2026-04-21 20:50:59] Validation | Batch 720/1567 | Loss: 1.1037 [2026-04-21 20:51:00] Validation | Batch 730/1567 | Loss: 1.1042 [2026-04-21 20:51:01] 
Validation | Batch 740/1567 | Loss: 1.1048 [2026-04-21 20:51:02] Validation | Batch 750/1567 | Loss: 1.1042 [2026-04-21 20:51:03] Validation | Batch 760/1567 | Loss: 1.1060 [2026-04-21 20:51:05] Validation | Batch 770/1567 | Loss: 1.1055 [2026-04-21 20:51:06] Validation | Batch 780/1567 | Loss: 1.1064 [2026-04-21 20:51:07] Validation | Batch 790/1567 | Loss: 1.1046 [2026-04-21 20:51:08] Validation | Batch 800/1567 | Loss: 1.1026 [2026-04-21 20:51:09] Validation | Batch 810/1567 | Loss: 1.1032 [2026-04-21 20:51:10] Validation | Batch 820/1567 | Loss: 1.1028 [2026-04-21 20:51:11] Validation | Batch 830/1567 | Loss: 1.1028 [2026-04-21 20:51:12] Validation | Batch 840/1567 | Loss: 1.1029 [2026-04-21 20:51:13] Validation | Batch 850/1567 | Loss: 1.1035 [2026-04-21 20:51:14] Validation | Batch 860/1567 | Loss: 1.1042 [2026-04-21 20:51:15] Validation | Batch 870/1567 | Loss: 1.1046 [2026-04-21 20:51:16] Validation | Batch 880/1567 | Loss: 1.1045 [2026-04-21 20:51:17] Validation | Batch 890/1567 | Loss: 1.1058 [2026-04-21 20:51:19] Validation | Batch 900/1567 | Loss: 1.1054 [2026-04-21 20:51:20] Validation | Batch 910/1567 | Loss: 1.1050 [2026-04-21 20:51:21] Validation | Batch 920/1567 | Loss: 1.1069 [2026-04-21 20:51:22] Validation | Batch 930/1567 | Loss: 1.1065 [2026-04-21 20:51:23] Validation | Batch 940/1567 | Loss: 1.1064 [2026-04-21 20:51:24] Validation | Batch 950/1567 | Loss: 1.1054 [2026-04-21 20:51:25] Validation | Batch 960/1567 | Loss: 1.1061 [2026-04-21 20:51:26] Validation | Batch 970/1567 | Loss: 1.1070 [2026-04-21 20:51:27] Validation | Batch 980/1567 | Loss: 1.1063 [2026-04-21 20:51:28] Validation | Batch 990/1567 | Loss: 1.1077 [2026-04-21 20:51:29] Validation | Batch 1000/1567 | Loss: 1.1081 [2026-04-21 20:51:31] Validation | Batch 1010/1567 | Loss: 1.1069 [2026-04-21 20:51:32] Validation | Batch 1020/1567 | Loss: 1.1078 [2026-04-21 20:51:33] Validation | Batch 1030/1567 | Loss: 1.1085 [2026-04-21 20:51:34] Validation | Batch 1040/1567 | Loss: 1.1081 
[2026-04-21 20:51:35] Validation | Batch 1050/1567 | Loss: 1.1072 [2026-04-21 20:51:37] Validation | Batch 1060/1567 | Loss: 1.1084 [2026-04-21 20:51:38] Validation | Batch 1070/1567 | Loss: 1.1087 [2026-04-21 20:51:39] Validation | Batch 1080/1567 | Loss: 1.1099 [2026-04-21 20:51:40] Validation | Batch 1090/1567 | Loss: 1.1125 [2026-04-21 20:51:42] Validation | Batch 1100/1567 | Loss: 1.1139 [2026-04-21 20:51:43] Validation | Batch 1110/1567 | Loss: 1.1133 [2026-04-21 20:51:44] Validation | Batch 1120/1567 | Loss: 1.1136 [2026-04-21 20:51:45] Validation | Batch 1130/1567 | Loss: 1.1120 [2026-04-21 20:51:46] Validation | Batch 1140/1567 | Loss: 1.1125 [2026-04-21 20:51:48] Validation | Batch 1150/1567 | Loss: 1.1112 [2026-04-21 20:51:48] Validation | Batch 1160/1567 | Loss: 1.1105 [2026-04-21 20:51:49] Validation | Batch 1170/1567 | Loss: 1.1105 [2026-04-21 20:51:51] Validation | Batch 1180/1567 | Loss: 1.1108 [2026-04-21 20:51:52] Validation | Batch 1190/1567 | Loss: 1.1113 [2026-04-21 20:51:53] Validation | Batch 1200/1567 | Loss: 1.1102 [2026-04-21 20:51:54] Validation | Batch 1210/1567 | Loss: 1.1091 [2026-04-21 20:51:55] Validation | Batch 1220/1567 | Loss: 1.1103 [2026-04-21 20:51:57] Validation | Batch 1230/1567 | Loss: 1.1107 [2026-04-21 20:51:58] Validation | Batch 1240/1567 | Loss: 1.1107 [2026-04-21 20:51:59] Validation | Batch 1250/1567 | Loss: 1.1110 [2026-04-21 20:52:00] Validation | Batch 1260/1567 | Loss: 1.1106 [2026-04-21 20:52:02] Validation | Batch 1270/1567 | Loss: 1.1091 [2026-04-21 20:52:03] Validation | Batch 1280/1567 | Loss: 1.1090 [2026-04-21 20:52:05] Validation | Batch 1290/1567 | Loss: 1.1089 [2026-04-21 20:52:06] Validation | Batch 1300/1567 | Loss: 1.1094 [2026-04-21 20:52:07] Validation | Batch 1310/1567 | Loss: 1.1100 [2026-04-21 20:52:08] Validation | Batch 1320/1567 | Loss: 1.1108 [2026-04-21 20:52:09] Validation | Batch 1330/1567 | Loss: 1.1121 [2026-04-21 20:52:10] Validation | Batch 1340/1567 | Loss: 1.1118 [2026-04-21 
20:52:11] Validation | Batch 1350/1567 | Loss: 1.1120 [2026-04-21 20:52:12] Validation | Batch 1360/1567 | Loss: 1.1111 [2026-04-21 20:52:13] Validation | Batch 1370/1567 | Loss: 1.1108 [2026-04-21 20:52:15] Validation | Batch 1380/1567 | Loss: 1.1108 [2026-04-21 20:52:16] Validation | Batch 1390/1567 | Loss: 1.1100 [2026-04-21 20:52:17] Validation | Batch 1400/1567 | Loss: 1.1101 [2026-04-21 20:52:18] Validation | Batch 1410/1567 | Loss: 1.1105 [2026-04-21 20:52:19] Validation | Batch 1420/1567 | Loss: 1.1104 [2026-04-21 20:52:20] Validation | Batch 1430/1567 | Loss: 1.1107 [2026-04-21 20:52:21] Validation | Batch 1440/1567 | Loss: 1.1114 [2026-04-21 20:52:22] Validation | Batch 1450/1567 | Loss: 1.1114 [2026-04-21 20:52:23] Validation | Batch 1460/1567 | Loss: 1.1109 [2026-04-21 20:52:24] Validation | Batch 1470/1567 | Loss: 1.1107 [2026-04-21 20:52:25] Validation | Batch 1480/1567 | Loss: 1.1103 [2026-04-21 20:52:26] Validation | Batch 1490/1567 | Loss: 1.1097 [2026-04-21 20:52:27] Validation | Batch 1500/1567 | Loss: 1.1095 [2026-04-21 20:52:28] Validation | Batch 1510/1567 | Loss: 1.1087 [2026-04-21 20:52:29] Validation | Batch 1520/1567 | Loss: 1.1085 [2026-04-21 20:52:30] Validation | Batch 1530/1567 | Loss: 1.1083 [2026-04-21 20:52:32] Validation | Batch 1540/1567 | Loss: 1.1087 [2026-04-21 20:52:33] Validation | Batch 1550/1567 | Loss: 1.1099 [2026-04-21 20:52:34] Validation | Batch 1560/1567 | Loss: 1.1102 [2026-04-21 20:52:35] Validation | Batch 1567/1567 | Loss: 1.1103 [2026-04-21 20:52:35] Validation | Loss: 1.1103 | PPL: 3.11 | Time: 184.49s [2026-04-21 20:52:52] New best model saved! 
Val loss: 1.1103 [2026-04-21 20:52:58] Epoch 1 | Step 2010 | Loss: 1.1551 | LR: 2.00e-05 [2026-04-21 20:53:03] Epoch 1 | Step 2020 | Loss: 1.1548 | LR: 2.00e-05 [2026-04-21 20:53:08] Epoch 1 | Step 2030 | Loss: 1.1551 | LR: 2.00e-05 [2026-04-21 20:53:13] Epoch 1 | Step 2040 | Loss: 1.1540 | LR: 2.00e-05 [2026-04-21 20:53:19] Epoch 1 | Step 2050 | Loss: 1.1534 | LR: 2.00e-05 [2026-04-21 20:53:24] Epoch 1 | Step 2060 | Loss: 1.1527 | LR: 2.00e-05 [2026-04-21 20:53:29] Epoch 1 | Step 2070 | Loss: 1.1525 | LR: 2.00e-05 [2026-04-21 20:53:35] Epoch 1 | Step 2080 | Loss: 1.1517 | LR: 2.00e-05 [2026-04-21 20:53:40] Epoch 1 | Step 2090 | Loss: 1.1506 | LR: 2.00e-05 [2026-04-21 20:53:45] Epoch 1 | Step 2100 | Loss: 1.1507 | LR: 2.00e-05 [2026-04-21 20:53:50] Epoch 1 | Step 2110 | Loss: 1.1507 | LR: 2.00e-05 [2026-04-21 20:53:55] Epoch 1 | Step 2120 | Loss: 1.1506 | LR: 2.00e-05 [2026-04-21 20:54:00] Epoch 1 | Step 2130 | Loss: 1.1505 | LR: 2.00e-05 [2026-04-21 20:54:06] Epoch 1 | Step 2140 | Loss: 1.1503 | LR: 2.00e-05 [2026-04-21 20:54:11] Epoch 1 | Step 2150 | Loss: 1.1508 | LR: 2.00e-05 [2026-04-21 20:54:17] Epoch 1 | Step 2160 | Loss: 1.1503 | LR: 2.00e-05 [2026-04-21 20:54:23] Epoch 1 | Step 2170 | Loss: 1.1511 | LR: 2.00e-05 [2026-04-21 20:54:28] Epoch 1 | Step 2180 | Loss: 1.1506 | LR: 2.00e-05 [2026-04-21 20:54:33] Epoch 1 | Step 2190 | Loss: 1.1504 | LR: 2.00e-05 [2026-04-21 20:54:39] Epoch 1 | Step 2200 | Loss: 1.1503 | LR: 2.00e-05 [2026-04-21 20:54:44] Epoch 1 | Step 2210 | Loss: 1.1500 | LR: 2.00e-05 [2026-04-21 20:54:50] Epoch 1 | Step 2220 | Loss: 1.1498 | LR: 2.00e-05 [2026-04-21 20:54:55] Epoch 1 | Step 2230 | Loss: 1.1505 | LR: 2.00e-05 [2026-04-21 20:55:00] Epoch 1 | Step 2240 | Loss: 1.1497 | LR: 2.00e-05 [2026-04-21 20:55:06] Epoch 1 | Step 2250 | Loss: 1.1492 | LR: 2.00e-05 [2026-04-21 20:55:12] Epoch 1 | Step 2260 | Loss: 1.1493 | LR: 2.00e-05 [2026-04-21 20:55:18] Epoch 1 | Step 2270 | Loss: 1.1492 | LR: 2.00e-05 [2026-04-21 20:55:23] Epoch 1 | Step 
2280 | Loss: 1.1489 | LR: 2.00e-05 [2026-04-21 20:55:28] Epoch 1 | Step 2290 | Loss: 1.1482 | LR: 2.00e-05 [2026-04-21 20:55:33] Epoch 1 | Step 2300 | Loss: 1.1474 | LR: 2.00e-05 [2026-04-21 20:55:38] Epoch 1 | Step 2310 | Loss: 1.1468 | LR: 2.00e-05 [2026-04-21 20:55:43] Epoch 1 | Step 2320 | Loss: 1.1469 | LR: 2.00e-05 [2026-04-21 20:55:48] Epoch 1 | Step 2330 | Loss: 1.1464 | LR: 2.00e-05 [2026-04-21 20:55:53] Epoch 1 | Step 2340 | Loss: 1.1462 | LR: 2.00e-05 [2026-04-21 20:55:59] Epoch 1 | Step 2350 | Loss: 1.1459 | LR: 2.00e-05 [2026-04-21 20:56:04] Epoch 1 | Step 2360 | Loss: 1.1458 | LR: 2.00e-05 [2026-04-21 20:56:09] Epoch 1 | Step 2370 | Loss: 1.1458 | LR: 2.00e-05 [2026-04-21 20:56:14] Epoch 1 | Step 2380 | Loss: 1.1457 | LR: 2.00e-05 [2026-04-21 20:56:20] Epoch 1 | Step 2390 | Loss: 1.1451 | LR: 2.00e-05 [2026-04-21 20:56:25] Epoch 1 | Step 2400 | Loss: 1.1448 | LR: 2.00e-05 [2026-04-21 20:56:30] Epoch 1 | Step 2410 | Loss: 1.1446 | LR: 2.00e-05 [2026-04-21 20:56:36] Epoch 1 | Step 2420 | Loss: 1.1446 | LR: 2.00e-05 [2026-04-21 20:56:41] Epoch 1 | Step 2430 | Loss: 1.1448 | LR: 2.00e-05 [2026-04-21 20:56:46] Epoch 1 | Step 2440 | Loss: 1.1447 | LR: 2.00e-05 [2026-04-21 20:56:51] Epoch 1 | Step 2450 | Loss: 1.1440 | LR: 2.00e-05 [2026-04-21 20:56:57] Epoch 1 | Step 2460 | Loss: 1.1438 | LR: 2.00e-05 [2026-04-21 20:57:03] Epoch 1 | Step 2470 | Loss: 1.1433 | LR: 2.00e-05 [2026-04-21 20:57:07] Epoch 1 | Step 2480 | Loss: 1.1434 | LR: 2.00e-05 [2026-04-21 20:57:12] Epoch 1 | Step 2490 | Loss: 1.1435 | LR: 2.00e-05 [2026-04-21 20:57:17] Epoch 1 | Step 2500 | Loss: 1.1430 | LR: 2.00e-05 [2026-04-21 20:57:22] Epoch 1 | Step 2510 | Loss: 1.1430 | LR: 2.00e-05 [2026-04-21 20:57:28] Epoch 1 | Step 2520 | Loss: 1.1426 | LR: 2.00e-05 [2026-04-21 20:57:32] Epoch 1 | Step 2530 | Loss: 1.1434 | LR: 2.00e-05 [2026-04-21 20:57:38] Epoch 1 | Step 2540 | Loss: 1.1432 | LR: 2.00e-05 [2026-04-21 20:57:43] Epoch 1 | Step 2550 | Loss: 1.1431 | LR: 2.00e-05 [2026-04-21 
20:57:48] Epoch 1 | Step 2560 | Loss: 1.1429 | LR: 2.00e-05 [2026-04-21 20:57:52] Epoch 1 | Step 2570 | Loss: 1.1428 | LR: 2.00e-05 [2026-04-21 20:57:58] Epoch 1 | Step 2580 | Loss: 1.1423 | LR: 2.00e-05 [2026-04-21 20:58:03] Epoch 1 | Step 2590 | Loss: 1.1418 | LR: 2.00e-05 [2026-04-21 20:58:07] Epoch 1 | Step 2600 | Loss: 1.1411 | LR: 2.00e-05 [2026-04-21 20:58:13] Epoch 1 | Step 2610 | Loss: 1.1409 | LR: 2.00e-05 [2026-04-21 20:58:19] Epoch 1 | Step 2620 | Loss: 1.1408 | LR: 2.00e-05 [2026-04-21 20:58:24] Epoch 1 | Step 2630 | Loss: 1.1404 | LR: 2.00e-05 [2026-04-21 20:58:29] Epoch 1 | Step 2640 | Loss: 1.1404 | LR: 2.00e-05 [2026-04-21 20:58:35] Epoch 1 | Step 2650 | Loss: 1.1397 | LR: 2.00e-05 [2026-04-21 20:58:40] Epoch 1 | Step 2660 | Loss: 1.1390 | LR: 2.00e-05 [2026-04-21 20:58:45] Epoch 1 | Step 2670 | Loss: 1.1385 | LR: 2.00e-05 [2026-04-21 20:58:51] Epoch 1 | Step 2680 | Loss: 1.1384 | LR: 2.00e-05 [2026-04-21 20:58:56] Epoch 1 | Step 2690 | Loss: 1.1382 | LR: 2.00e-05 [2026-04-21 20:59:02] Epoch 1 | Step 2700 | Loss: 1.1376 | LR: 2.00e-05 [2026-04-21 20:59:07] Epoch 1 | Step 2710 | Loss: 1.1372 | LR: 2.00e-05 [2026-04-21 20:59:12] Epoch 1 | Step 2720 | Loss: 1.1369 | LR: 2.00e-05 [2026-04-21 20:59:17] Epoch 1 | Step 2730 | Loss: 1.1366 | LR: 2.00e-05 [2026-04-21 20:59:22] Epoch 1 | Step 2740 | Loss: 1.1364 | LR: 2.00e-05 [2026-04-21 20:59:28] Epoch 1 | Step 2750 | Loss: 1.1365 | LR: 2.00e-05 [2026-04-21 20:59:34] Epoch 1 | Step 2760 | Loss: 1.1368 | LR: 2.00e-05 [2026-04-21 20:59:39] Epoch 1 | Step 2770 | Loss: 1.1367 | LR: 2.00e-05 [2026-04-21 20:59:44] Epoch 1 | Step 2780 | Loss: 1.1368 | LR: 2.00e-05 [2026-04-21 20:59:50] Epoch 1 | Step 2790 | Loss: 1.1367 | LR: 2.00e-05 [2026-04-21 20:59:55] Epoch 1 | Step 2800 | Loss: 1.1363 | LR: 2.00e-05 [2026-04-21 21:00:01] Epoch 1 | Step 2810 | Loss: 1.1361 | LR: 2.00e-05 [2026-04-21 21:00:06] Epoch 1 | Step 2820 | Loss: 1.1357 | LR: 2.00e-05 [2026-04-21 21:00:12] Epoch 1 | Step 2830 | Loss: 1.1353 | LR: 
2.00e-05 [2026-04-21 21:00:17] Epoch 1 | Step 2840 | Loss: 1.1352 | LR: 2.00e-05 [2026-04-21 21:00:21] Epoch 1 | Step 2850 | Loss: 1.1355 | LR: 2.00e-05 [2026-04-21 21:00:26] Epoch 1 | Step 2860 | Loss: 1.1353 | LR: 2.00e-05 [2026-04-21 21:00:32] Epoch 1 | Step 2870 | Loss: 1.1356 | LR: 2.00e-05 [2026-04-21 21:00:38] Epoch 1 | Step 2880 | Loss: 1.1351 | LR: 2.00e-05 [2026-04-21 21:00:42] Epoch 1 | Step 2890 | Loss: 1.1344 | LR: 2.00e-05 [2026-04-21 21:00:48] Epoch 1 | Step 2900 | Loss: 1.1340 | LR: 2.00e-05 [2026-04-21 21:00:54] Epoch 1 | Step 2910 | Loss: 1.1336 | LR: 2.00e-05 [2026-04-21 21:00:58] Epoch 1 | Step 2920 | Loss: 1.1332 | LR: 2.00e-05 [2026-04-21 21:01:04] Epoch 1 | Step 2930 | Loss: 1.1327 | LR: 2.00e-05 [2026-04-21 21:01:09] Epoch 1 | Step 2940 | Loss: 1.1325 | LR: 2.00e-05 [2026-04-21 21:01:15] Epoch 1 | Step 2950 | Loss: 1.1325 | LR: 2.00e-05 [2026-04-21 21:01:20] Epoch 1 | Step 2960 | Loss: 1.1321 | LR: 2.00e-05 [2026-04-21 21:01:25] Epoch 1 | Step 2970 | Loss: 1.1318 | LR: 2.00e-05 [2026-04-21 21:01:30] Epoch 1 | Step 2980 | Loss: 1.1318 | LR: 2.00e-05 [2026-04-21 21:01:36] Epoch 1 | Step 2990 | Loss: 1.1315 | LR: 2.00e-05 [2026-04-21 21:01:42] Epoch 1 | Step 3000 | Loss: 1.1315 | LR: 2.00e-05 [2026-04-21 21:01:52] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_3000.pt [2026-04-21 21:02:04] Validation | Batch 10/1567 | Loss: 1.0717 [2026-04-21 21:02:05] Validation | Batch 20/1567 | Loss: 1.1566 [2026-04-21 21:02:07] Validation | Batch 30/1567 | Loss: 1.1190 [2026-04-21 21:02:08] Validation | Batch 40/1567 | Loss: 1.1487 [2026-04-21 21:02:09] Validation | Batch 50/1567 | Loss: 1.1283 [2026-04-21 21:02:10] Validation | Batch 60/1567 | Loss: 1.1174 [2026-04-21 21:02:11] Validation | Batch 70/1567 | Loss: 1.1043 [2026-04-21 21:02:13] Validation | Batch 80/1567 | Loss: 1.1186 [2026-04-21 21:02:14] Validation | Batch 90/1567 | Loss: 1.1113 [2026-04-21 21:02:16] Validation | Batch 100/1567 | Loss: 1.0880 [2026-04-21 21:02:17] 
Validation | Batch 110/1567 | Loss: 1.0784 [2026-04-21 21:02:18] Validation | Batch 120/1567 | Loss: 1.0698 [2026-04-21 21:02:19] Validation | Batch 130/1567 | Loss: 1.0635 [2026-04-21 21:02:20] Validation | Batch 140/1567 | Loss: 1.0749 [2026-04-21 21:02:21] Validation | Batch 150/1567 | Loss: 1.0852 [2026-04-21 21:02:23] Validation | Batch 160/1567 | Loss: 1.0834 [2026-04-21 21:02:24] Validation | Batch 170/1567 | Loss: 1.0754 [2026-04-21 21:02:25] Validation | Batch 180/1567 | Loss: 1.0780 [2026-04-21 21:02:26] Validation | Batch 190/1567 | Loss: 1.0835 [2026-04-21 21:02:28] Validation | Batch 200/1567 | Loss: 1.0874 [2026-04-21 21:02:29] Validation | Batch 210/1567 | Loss: 1.0878 [2026-04-21 21:02:30] Validation | Batch 220/1567 | Loss: 1.0921 [2026-04-21 21:02:32] Validation | Batch 230/1567 | Loss: 1.0964 [2026-04-21 21:02:33] Validation | Batch 240/1567 | Loss: 1.1009 [2026-04-21 21:02:34] Validation | Batch 250/1567 | Loss: 1.1046 [2026-04-21 21:02:35] Validation | Batch 260/1567 | Loss: 1.1065 [2026-04-21 21:02:36] Validation | Batch 270/1567 | Loss: 1.1106 [2026-04-21 21:02:38] Validation | Batch 280/1567 | Loss: 1.1137 [2026-04-21 21:02:40] Validation | Batch 290/1567 | Loss: 1.1106 [2026-04-21 21:02:41] Validation | Batch 300/1567 | Loss: 1.1091 [2026-04-21 21:02:42] Validation | Batch 310/1567 | Loss: 1.1056 [2026-04-21 21:02:43] Validation | Batch 320/1567 | Loss: 1.1078 [2026-04-21 21:02:45] Validation | Batch 330/1567 | Loss: 1.1079 [2026-04-21 21:02:46] Validation | Batch 340/1567 | Loss: 1.1070 [2026-04-21 21:02:47] Validation | Batch 350/1567 | Loss: 1.1048 [2026-04-21 21:02:48] Validation | Batch 360/1567 | Loss: 1.0986 [2026-04-21 21:02:50] Validation | Batch 370/1567 | Loss: 1.0994 [2026-04-21 21:02:51] Validation | Batch 380/1567 | Loss: 1.1035 [2026-04-21 21:02:52] Validation | Batch 390/1567 | Loss: 1.1025 [2026-04-21 21:02:53] Validation | Batch 400/1567 | Loss: 1.1038 [2026-04-21 21:02:55] Validation | Batch 410/1567 | Loss: 1.1001 
[2026-04-21 21:02:56] Validation | Batch 420/1567 | Loss: 1.0985 [2026-04-21 21:02:57] Validation | Batch 430/1567 | Loss: 1.1028 [2026-04-21 21:02:58] Validation | Batch 440/1567 | Loss: 1.1029 [2026-04-21 21:02:59] Validation | Batch 450/1567 | Loss: 1.1058 [2026-04-21 21:03:00] Validation | Batch 460/1567 | Loss: 1.1093 [2026-04-21 21:03:01] Validation | Batch 470/1567 | Loss: 1.1136 [2026-04-21 21:03:03] Validation | Batch 480/1567 | Loss: 1.1110 [2026-04-21 21:03:04] Validation | Batch 490/1567 | Loss: 1.1089 [2026-04-21 21:03:05] Validation | Batch 500/1567 | Loss: 1.1101 [2026-04-21 21:03:06] Validation | Batch 510/1567 | Loss: 1.1100 [2026-04-21 21:03:07] Validation | Batch 520/1567 | Loss: 1.1113 [2026-04-21 21:03:08] Validation | Batch 530/1567 | Loss: 1.1094 [2026-04-21 21:03:10] Validation | Batch 540/1567 | Loss: 1.1065 [2026-04-21 21:03:11] Validation | Batch 550/1567 | Loss: 1.1080 [2026-04-21 21:03:12] Validation | Batch 560/1567 | Loss: 1.1071 [2026-04-21 21:03:13] Validation | Batch 570/1567 | Loss: 1.1028 [2026-04-21 21:03:15] Validation | Batch 580/1567 | Loss: 1.1045 [2026-04-21 21:03:16] Validation | Batch 590/1567 | Loss: 1.1040 [2026-04-21 21:03:17] Validation | Batch 600/1567 | Loss: 1.1025 [2026-04-21 21:03:18] Validation | Batch 610/1567 | Loss: 1.1045 [2026-04-21 21:03:20] Validation | Batch 620/1567 | Loss: 1.1021 [2026-04-21 21:03:21] Validation | Batch 630/1567 | Loss: 1.1037 [2026-04-21 21:03:23] Validation | Batch 640/1567 | Loss: 1.1041 [2026-04-21 21:03:25] Validation | Batch 650/1567 | Loss: 1.1073 [2026-04-21 21:03:26] Validation | Batch 660/1567 | Loss: 1.1089 [2026-04-21 21:03:27] Validation | Batch 670/1567 | Loss: 1.1074 [2026-04-21 21:03:28] Validation | Batch 680/1567 | Loss: 1.1056 [2026-04-21 21:03:29] Validation | Batch 690/1567 | Loss: 1.1039 [2026-04-21 21:03:31] Validation | Batch 700/1567 | Loss: 1.1035 [2026-04-21 21:03:32] Validation | Batch 710/1567 | Loss: 1.1028 [2026-04-21 21:03:33] Validation | Batch 720/1567 
| Loss: 1.0997 [2026-04-21 21:03:34] Validation | Batch 730/1567 | Loss: 1.1002 [2026-04-21 21:03:35] Validation | Batch 740/1567 | Loss: 1.1007 [2026-04-21 21:03:36] Validation | Batch 750/1567 | Loss: 1.1000 [2026-04-21 21:03:37] Validation | Batch 760/1567 | Loss: 1.1015 [2026-04-21 21:03:39] Validation | Batch 770/1567 | Loss: 1.1010 [2026-04-21 21:03:40] Validation | Batch 780/1567 | Loss: 1.1021 [2026-04-21 21:03:41] Validation | Batch 790/1567 | Loss: 1.1003 [2026-04-21 21:03:42] Validation | Batch 800/1567 | Loss: 1.0985 [2026-04-21 21:03:43] Validation | Batch 810/1567 | Loss: 1.0986 [2026-04-21 21:03:44] Validation | Batch 820/1567 | Loss: 1.0982 [2026-04-21 21:03:45] Validation | Batch 830/1567 | Loss: 1.0981 [2026-04-21 21:03:46] Validation | Batch 840/1567 | Loss: 1.0982 [2026-04-21 21:03:47] Validation | Batch 850/1567 | Loss: 1.0989 [2026-04-21 21:03:48] Validation | Batch 860/1567 | Loss: 1.0995 [2026-04-21 21:03:49] Validation | Batch 870/1567 | Loss: 1.1001 [2026-04-21 21:03:50] Validation | Batch 880/1567 | Loss: 1.1000 [2026-04-21 21:03:52] Validation | Batch 890/1567 | Loss: 1.1011 [2026-04-21 21:03:53] Validation | Batch 900/1567 | Loss: 1.1005 [2026-04-21 21:03:54] Validation | Batch 910/1567 | Loss: 1.1000 [2026-04-21 21:03:55] Validation | Batch 920/1567 | Loss: 1.1014 [2026-04-21 21:03:56] Validation | Batch 930/1567 | Loss: 1.1012 [2026-04-21 21:03:57] Validation | Batch 940/1567 | Loss: 1.1012 [2026-04-21 21:03:59] Validation | Batch 950/1567 | Loss: 1.1003 [2026-04-21 21:04:00] Validation | Batch 960/1567 | Loss: 1.1009 [2026-04-21 21:04:01] Validation | Batch 970/1567 | Loss: 1.1016 [2026-04-21 21:04:02] Validation | Batch 980/1567 | Loss: 1.1011 [2026-04-21 21:04:02] Validation | Batch 990/1567 | Loss: 1.1022 [2026-04-21 21:04:04] Validation | Batch 1000/1567 | Loss: 1.1028 [2026-04-21 21:04:05] Validation | Batch 1010/1567 | Loss: 1.1018 [2026-04-21 21:04:06] Validation | Batch 1020/1567 | Loss: 1.1030 [2026-04-21 21:04:07] 
Validation | Batch 1030/1567 | Loss: 1.1036 [2026-04-21 21:04:09] Validation | Batch 1040/1567 | Loss: 1.1031 [2026-04-21 21:04:10] Validation | Batch 1050/1567 | Loss: 1.1023 [2026-04-21 21:04:11] Validation | Batch 1060/1567 | Loss: 1.1033 [2026-04-21 21:04:12] Validation | Batch 1070/1567 | Loss: 1.1036 [2026-04-21 21:04:14] Validation | Batch 1080/1567 | Loss: 1.1049 [2026-04-21 21:04:15] Validation | Batch 1090/1567 | Loss: 1.1075 [2026-04-21 21:04:16] Validation | Batch 1100/1567 | Loss: 1.1090 [2026-04-21 21:04:17] Validation | Batch 1110/1567 | Loss: 1.1083 [2026-04-21 21:04:18] Validation | Batch 1120/1567 | Loss: 1.1085 [2026-04-21 21:04:19] Validation | Batch 1130/1567 | Loss: 1.1069 [2026-04-21 21:04:20] Validation | Batch 1140/1567 | Loss: 1.1073 [2026-04-21 21:04:22] Validation | Batch 1150/1567 | Loss: 1.1060 [2026-04-21 21:04:23] Validation | Batch 1160/1567 | Loss: 1.1053 [2026-04-21 21:04:24] Validation | Batch 1170/1567 | Loss: 1.1054 [2026-04-21 21:04:25] Validation | Batch 1180/1567 | Loss: 1.1057 [2026-04-21 21:04:26] Validation | Batch 1190/1567 | Loss: 1.1061 [2026-04-21 21:04:28] Validation | Batch 1200/1567 | Loss: 1.1049 [2026-04-21 21:04:29] Validation | Batch 1210/1567 | Loss: 1.1039 [2026-04-21 21:04:30] Validation | Batch 1220/1567 | Loss: 1.1049 [2026-04-21 21:04:31] Validation | Batch 1230/1567 | Loss: 1.1054 [2026-04-21 21:04:32] Validation | Batch 1240/1567 | Loss: 1.1053 [2026-04-21 21:04:33] Validation | Batch 1250/1567 | Loss: 1.1056 [2026-04-21 21:04:35] Validation | Batch 1260/1567 | Loss: 1.1051 [2026-04-21 21:04:36] Validation | Batch 1270/1567 | Loss: 1.1036 [2026-04-21 21:04:37] Validation | Batch 1280/1567 | Loss: 1.1036 [2026-04-21 21:04:39] Validation | Batch 1290/1567 | Loss: 1.1035 [2026-04-21 21:04:40] Validation | Batch 1300/1567 | Loss: 1.1039 [2026-04-21 21:04:41] Validation | Batch 1310/1567 | Loss: 1.1046 [2026-04-21 21:04:42] Validation | Batch 1320/1567 | Loss: 1.1053 [2026-04-21 21:04:43] Validation | Batch 
1330/1567 | Loss: 1.1065 [2026-04-21 21:04:44] Validation | Batch 1340/1567 | Loss: 1.1062 [2026-04-21 21:04:45] Validation | Batch 1350/1567 | Loss: 1.1065 [2026-04-21 21:04:46] Validation | Batch 1360/1567 | Loss: 1.1056 [2026-04-21 21:04:48] Validation | Batch 1370/1567 | Loss: 1.1052 [2026-04-21 21:04:49] Validation | Batch 1380/1567 | Loss: 1.1052 [2026-04-21 21:04:50] Validation | Batch 1390/1567 | Loss: 1.1043 [2026-04-21 21:04:51] Validation | Batch 1400/1567 | Loss: 1.1043 [2026-04-21 21:04:52] Validation | Batch 1410/1567 | Loss: 1.1047 [2026-04-21 21:04:53] Validation | Batch 1420/1567 | Loss: 1.1043 [2026-04-21 21:04:54] Validation | Batch 1430/1567 | Loss: 1.1047 [2026-04-21 21:04:56] Validation | Batch 1440/1567 | Loss: 1.1054 [2026-04-21 21:04:57] Validation | Batch 1450/1567 | Loss: 1.1055 [2026-04-21 21:04:57] Validation | Batch 1460/1567 | Loss: 1.1048 [2026-04-21 21:04:58] Validation | Batch 1470/1567 | Loss: 1.1045 [2026-04-21 21:04:59] Validation | Batch 1480/1567 | Loss: 1.1041 [2026-04-21 21:05:00] Validation | Batch 1490/1567 | Loss: 1.1035 [2026-04-21 21:05:02] Validation | Batch 1500/1567 | Loss: 1.1033 [2026-04-21 21:05:03] Validation | Batch 1510/1567 | Loss: 1.1023 [2026-04-21 21:05:04] Validation | Batch 1520/1567 | Loss: 1.1022 [2026-04-21 21:05:04] Validation | Batch 1530/1567 | Loss: 1.1021 [2026-04-21 21:05:06] Validation | Batch 1540/1567 | Loss: 1.1026 [2026-04-21 21:05:07] Validation | Batch 1550/1567 | Loss: 1.1037 [2026-04-21 21:05:08] Validation | Batch 1560/1567 | Loss: 1.1039 [2026-04-21 21:05:09] Validation | Batch 1567/1567 | Loss: 1.1039 [2026-04-21 21:05:09] Validation | Loss: 1.1039 | PPL: 3.09 | Time: 186.14s [2026-04-21 21:05:26] New best model saved! 
Val loss: 1.1039 [2026-04-21 21:05:32] Epoch 1 | Step 3010 | Loss: 1.1311 | LR: 2.00e-05 [2026-04-21 21:05:37] Epoch 1 | Step 3020 | Loss: 1.1308 | LR: 2.00e-05 [2026-04-21 21:05:42] Epoch 1 | Step 3030 | Loss: 1.1306 | LR: 2.00e-05 [2026-04-21 21:05:48] Epoch 1 | Step 3040 | Loss: 1.1304 | LR: 2.00e-05 [2026-04-21 21:05:53] Epoch 1 | Step 3050 | Loss: 1.1299 | LR: 2.00e-05 [2026-04-21 21:05:58] Epoch 1 | Step 3060 | Loss: 1.1295 | LR: 2.00e-05 [2026-04-21 21:06:03] Epoch 1 | Step 3070 | Loss: 1.1292 | LR: 2.00e-05 [2026-04-21 21:06:09] Epoch 1 | Step 3080 | Loss: 1.1292 | LR: 2.00e-05 [2026-04-21 21:06:14] Epoch 1 | Step 3090 | Loss: 1.1290 | LR: 2.00e-05 [2026-04-21 21:06:20] Epoch 1 | Step 3100 | Loss: 1.1289 | LR: 2.00e-05 [2026-04-21 21:06:26] Epoch 1 | Step 3110 | Loss: 1.1286 | LR: 2.00e-05 [2026-04-21 21:06:32] Epoch 1 | Step 3120 | Loss: 1.1280 | LR: 2.00e-05 [2026-04-21 21:06:37] Epoch 1 | Step 3130 | Loss: 1.1280 | LR: 2.00e-05 [2026-04-21 21:06:42] Epoch 1 | Step 3140 | Loss: 1.1276 | LR: 2.00e-05 [2026-04-21 21:06:48] Epoch 1 | Step 3150 | Loss: 1.1274 | LR: 2.00e-05 [2026-04-21 21:06:53] Epoch 1 | Step 3160 | Loss: 1.1273 | LR: 2.00e-05 [2026-04-21 21:06:59] Epoch 1 | Step 3170 | Loss: 1.1269 | LR: 2.00e-05 [2026-04-21 21:07:04] Epoch 1 | Step 3180 | Loss: 1.1263 | LR: 2.00e-05 [2026-04-21 21:07:09] Epoch 1 | Step 3190 | Loss: 1.1259 | LR: 2.00e-05 [2026-04-21 21:07:14] Epoch 1 | Step 3200 | Loss: 1.1259 | LR: 2.00e-05 [2026-04-21 21:07:19] Epoch 1 | Step 3210 | Loss: 1.1259 | LR: 2.00e-05 [2026-04-21 21:07:24] Epoch 1 | Step 3220 | Loss: 1.1257 | LR: 2.00e-05 [2026-04-21 21:07:30] Epoch 1 | Step 3230 | Loss: 1.1255 | LR: 2.00e-05 [2026-04-21 21:07:36] Epoch 1 | Step 3240 | Loss: 1.1250 | LR: 2.00e-05 [2026-04-21 21:07:42] Epoch 1 | Step 3250 | Loss: 1.1251 | LR: 2.00e-05 [2026-04-21 21:07:47] Epoch 1 | Step 3260 | Loss: 1.1248 | LR: 2.00e-05 [2026-04-21 21:07:52] Epoch 1 | Step 3270 | Loss: 1.1249 | LR: 2.00e-05 [2026-04-21 21:07:57] Epoch 1 | Step 
3280 | Loss: 1.1244 | LR: 2.00e-05 [2026-04-21 21:08:03] Epoch 1 | Step 3290 | Loss: 1.1241 | LR: 2.00e-05 [2026-04-21 21:08:07] Epoch 1 | Step 3300 | Loss: 1.1240 | LR: 2.00e-05 [2026-04-21 21:08:12] Epoch 1 | Step 3310 | Loss: 1.1240 | LR: 2.00e-05 [2026-04-21 21:08:17] Epoch 1 | Step 3320 | Loss: 1.1236 | LR: 2.00e-05 [2026-04-21 21:08:22] Epoch 1 | Step 3330 | Loss: 1.1236 | LR: 2.00e-05 [2026-04-21 21:08:28] Epoch 1 | Step 3340 | Loss: 1.1234 | LR: 2.00e-05 [2026-04-21 21:08:33] Epoch 1 | Step 3350 | Loss: 1.1233 | LR: 2.00e-05 [2026-04-21 21:08:39] Epoch 1 | Step 3360 | Loss: 1.1230 | LR: 2.00e-05 [2026-04-21 21:08:43] Epoch 1 | Step 3370 | Loss: 1.1229 | LR: 2.00e-05 [2026-04-21 21:08:48] Epoch 1 | Step 3380 | Loss: 1.1230 | LR: 2.00e-05 [2026-04-21 21:08:54] Epoch 1 | Step 3390 | Loss: 1.1231 | LR: 2.00e-05 [2026-04-21 21:08:59] Epoch 1 | Step 3400 | Loss: 1.1227 | LR: 2.00e-05 [2026-04-21 21:09:04] Epoch 1 | Step 3410 | Loss: 1.1218 | LR: 2.00e-05 [2026-04-21 21:09:09] Epoch 1 | Step 3420 | Loss: 1.1220 | LR: 2.00e-05 [2026-04-21 21:09:14] Epoch 1 | Step 3430 | Loss: 1.1216 | LR: 2.00e-05 [2026-04-21 21:09:20] Epoch 1 | Step 3440 | Loss: 1.1217 | LR: 2.00e-05 [2026-04-21 21:09:25] Epoch 1 | Step 3450 | Loss: 1.1218 | LR: 2.00e-05 [2026-04-21 21:09:30] Epoch 1 | Step 3460 | Loss: 1.1220 | LR: 2.00e-05 [2026-04-21 21:09:36] Epoch 1 | Step 3470 | Loss: 1.1220 | LR: 2.00e-05 [2026-04-21 21:09:41] Epoch 1 | Step 3480 | Loss: 1.1219 | LR: 2.00e-05 [2026-04-21 21:09:46] Epoch 1 | Step 3490 | Loss: 1.1216 | LR: 2.00e-05 [2026-04-21 21:09:51] Epoch 1 | Step 3500 | Loss: 1.1218 | LR: 2.00e-05 [2026-04-21 21:09:56] Epoch 1 | Step 3510 | Loss: 1.1218 | LR: 2.00e-05 [2026-04-21 21:10:01] Epoch 1 | Step 3520 | Loss: 1.1219 | LR: 2.00e-05 [2026-04-21 21:10:06] Epoch 1 | Step 3530 | Loss: 1.1214 | LR: 2.00e-05 [2026-04-21 21:10:12] Epoch 1 | Step 3540 | Loss: 1.1216 | LR: 2.00e-05 [2026-04-21 21:10:17] Epoch 1 | Step 3550 | Loss: 1.1217 | LR: 2.00e-05 [2026-04-21 
21:10:22] Epoch 1 | Step 3560 | Loss: 1.1216 | LR: 2.00e-05 [2026-04-21 21:10:27] Epoch 1 | Step 3570 | Loss: 1.1215 | LR: 2.00e-05 [2026-04-21 21:10:32] Epoch 1 | Step 3580 | Loss: 1.1215 | LR: 2.00e-05 [2026-04-21 21:10:38] Epoch 1 | Step 3590 | Loss: 1.1215 | LR: 2.00e-05 [2026-04-21 21:10:43] Epoch 1 | Step 3600 | Loss: 1.1214 | LR: 2.00e-05 [2026-04-21 21:10:49] Epoch 1 | Step 3610 | Loss: 1.1214 | LR: 2.00e-05 [2026-04-21 21:10:54] Epoch 1 | Step 3620 | Loss: 1.1212 | LR: 2.00e-05 [2026-04-21 21:10:59] Epoch 1 | Step 3630 | Loss: 1.1212 | LR: 2.00e-05 [2026-04-21 21:11:05] Epoch 1 | Step 3640 | Loss: 1.1208 | LR: 2.00e-05 [2026-04-21 21:11:10] Epoch 1 | Step 3650 | Loss: 1.1208 | LR: 2.00e-05 [2026-04-21 21:11:15] Epoch 1 | Step 3660 | Loss: 1.1208 | LR: 2.00e-05 [2026-04-21 21:11:20] Epoch 1 | Step 3670 | Loss: 1.1207 | LR: 2.00e-05 [2026-04-21 21:11:25] Epoch 1 | Step 3680 | Loss: 1.1206 | LR: 2.00e-05 [2026-04-21 21:11:32] Epoch 1 | Step 3690 | Loss: 1.1206 | LR: 2.00e-05 [2026-04-21 21:11:38] Epoch 1 | Step 3700 | Loss: 1.1206 | LR: 2.00e-05 [2026-04-21 21:11:44] Epoch 1 | Step 3710 | Loss: 1.1204 | LR: 2.00e-05 [2026-04-21 21:11:49] Epoch 1 | Step 3720 | Loss: 1.1203 | LR: 2.00e-05 [2026-04-21 21:11:54] Epoch 1 | Step 3730 | Loss: 1.1202 | LR: 2.00e-05 [2026-04-21 21:12:00] Epoch 1 | Step 3740 | Loss: 1.1199 | LR: 2.00e-05 [2026-04-21 21:12:05] Epoch 1 | Step 3750 | Loss: 1.1197 | LR: 2.00e-05 [2026-04-21 21:12:12] Epoch 1 | Step 3760 | Loss: 1.1194 | LR: 2.00e-05 [2026-04-21 21:12:17] Epoch 1 | Step 3770 | Loss: 1.1196 | LR: 2.00e-05 [2026-04-21 21:12:22] Epoch 1 | Step 3780 | Loss: 1.1198 | LR: 2.00e-05 [2026-04-21 21:12:27] Epoch 1 | Step 3790 | Loss: 1.1195 | LR: 2.00e-05 [2026-04-21 21:12:32] Epoch 1 | Step 3800 | Loss: 1.1193 | LR: 2.00e-05 [2026-04-21 21:12:38] Epoch 1 | Step 3810 | Loss: 1.1190 | LR: 2.00e-05 [2026-04-21 21:12:43] Epoch 1 | Step 3820 | Loss: 1.1190 | LR: 2.00e-05 [2026-04-21 21:12:48] Epoch 1 | Step 3830 | Loss: 1.1190 | LR: 
2.00e-05 [2026-04-21 21:12:54] Epoch 1 | Step 3840 | Loss: 1.1191 | LR: 2.00e-05 [2026-04-21 21:12:59] Epoch 1 | Step 3850 | Loss: 1.1191 | LR: 2.00e-05 [2026-04-21 21:13:05] Epoch 1 | Step 3860 | Loss: 1.1189 | LR: 2.00e-05 [2026-04-21 21:13:10] Epoch 1 | Step 3870 | Loss: 1.1192 | LR: 2.00e-05 [2026-04-21 21:13:16] Epoch 1 | Step 3880 | Loss: 1.1192 | LR: 2.00e-05 [2026-04-21 21:13:22] Epoch 1 | Step 3890 | Loss: 1.1191 | LR: 2.00e-05 [2026-04-21 21:13:27] Epoch 1 | Step 3900 | Loss: 1.1188 | LR: 2.00e-05 [2026-04-21 21:13:33] Epoch 1 | Step 3910 | Loss: 1.1186 | LR: 2.00e-05 [2026-04-21 21:13:38] Epoch 1 | Step 3920 | Loss: 1.1184 | LR: 2.00e-05 [2026-04-21 21:13:44] Epoch 1 | Step 3930 | Loss: 1.1185 | LR: 2.00e-05 [2026-04-21 21:13:49] Epoch 1 | Step 3940 | Loss: 1.1179 | LR: 2.00e-05 [2026-04-21 21:13:54] Epoch 1 | Step 3950 | Loss: 1.1180 | LR: 2.00e-05 [2026-04-21 21:13:59] Epoch 1 | Step 3960 | Loss: 1.1180 | LR: 2.00e-05 [2026-04-21 21:14:04] Epoch 1 | Step 3970 | Loss: 1.1178 | LR: 2.00e-05 [2026-04-21 21:14:10] Epoch 1 | Step 3980 | Loss: 1.1179 | LR: 2.00e-05 [2026-04-21 21:14:16] Epoch 1 | Step 3990 | Loss: 1.1179 | LR: 2.00e-05 [2026-04-21 21:14:21] Epoch 1 | Step 4000 | Loss: 1.1179 | LR: 2.00e-05 [2026-04-21 21:14:22] Validation | Batch 10/1567 | Loss: 1.0634 [2026-04-21 21:14:23] Validation | Batch 20/1567 | Loss: 1.1468 [2026-04-21 21:14:25] Validation | Batch 30/1567 | Loss: 1.1154 [2026-04-21 21:14:26] Validation | Batch 40/1567 | Loss: 1.1360 [2026-04-21 21:14:27] Validation | Batch 50/1567 | Loss: 1.1167 [2026-04-21 21:14:28] Validation | Batch 60/1567 | Loss: 1.1058 [2026-04-21 21:14:30] Validation | Batch 70/1567 | Loss: 1.0929 [2026-04-21 21:14:31] Validation | Batch 80/1567 | Loss: 1.1066 [2026-04-21 21:14:33] Validation | Batch 90/1567 | Loss: 1.0989 [2026-04-21 21:14:34] Validation | Batch 100/1567 | Loss: 1.0765 [2026-04-21 21:14:35] Validation | Batch 110/1567 | Loss: 1.0675 [2026-04-21 21:14:36] Validation | Batch 120/1567 | Loss: 
1.0607 [2026-04-21 21:14:38] Validation | Batch 130/1567 | Loss: 1.0566 [2026-04-21 21:14:39] Validation | Batch 140/1567 | Loss: 1.0673 [2026-04-21 21:14:40] Validation | Batch 150/1567 | Loss: 1.0775 [2026-04-21 21:14:41] Validation | Batch 160/1567 | Loss: 1.0757 [2026-04-21 21:14:42] Validation | Batch 170/1567 | Loss: 1.0691 [2026-04-21 21:14:43] Validation | Batch 180/1567 | Loss: 1.0715 [2026-04-21 21:14:44] Validation | Batch 190/1567 | Loss: 1.0761 [2026-04-21 21:14:45] Validation | Batch 200/1567 | Loss: 1.0800 [2026-04-21 21:14:47] Validation | Batch 210/1567 | Loss: 1.0806 [2026-04-21 21:14:48] Validation | Batch 220/1567 | Loss: 1.0849 [2026-04-21 21:14:49] Validation | Batch 230/1567 | Loss: 1.0875 [2026-04-21 21:14:51] Validation | Batch 240/1567 | Loss: 1.0916 [2026-04-21 21:14:52] Validation | Batch 250/1567 | Loss: 1.0965 [2026-04-21 21:14:53] Validation | Batch 260/1567 | Loss: 1.0988 [2026-04-21 21:14:54] Validation | Batch 270/1567 | Loss: 1.1030 [2026-04-21 21:14:56] Validation | Batch 280/1567 | Loss: 1.1067 [2026-04-21 21:14:57] Validation | Batch 290/1567 | Loss: 1.1031 [2026-04-21 21:14:59] Validation | Batch 300/1567 | Loss: 1.1017 [2026-04-21 21:15:00] Validation | Batch 310/1567 | Loss: 1.0988 [2026-04-21 21:15:01] Validation | Batch 320/1567 | Loss: 1.1011 [2026-04-21 21:15:02] Validation | Batch 330/1567 | Loss: 1.1009 [2026-04-21 21:15:04] Validation | Batch 340/1567 | Loss: 1.1001 [2026-04-21 21:15:05] Validation | Batch 350/1567 | Loss: 1.0976 [2026-04-21 21:15:06] Validation | Batch 360/1567 | Loss: 1.0912 [2026-04-21 21:15:07] Validation | Batch 370/1567 | Loss: 1.0912 [2026-04-21 21:15:08] Validation | Batch 380/1567 | Loss: 1.0955 [2026-04-21 21:15:10] Validation | Batch 390/1567 | Loss: 1.0946 [2026-04-21 21:15:11] Validation | Batch 400/1567 | Loss: 1.0962 [2026-04-21 21:15:12] Validation | Batch 410/1567 | Loss: 1.0923 [2026-04-21 21:15:13] Validation | Batch 420/1567 | Loss: 1.0905 [2026-04-21 21:15:14] Validation | Batch 
430/1567 | Loss: 1.0942 [2026-04-21 21:15:16] Validation | Batch 440/1567 | Loss: 1.0944 [2026-04-21 21:15:17] Validation | Batch 450/1567 | Loss: 1.0970 [2026-04-21 21:15:18] Validation | Batch 460/1567 | Loss: 1.1001 [2026-04-21 21:15:19] Validation | Batch 470/1567 | Loss: 1.1047 [2026-04-21 21:15:20] Validation | Batch 480/1567 | Loss: 1.1023 [2026-04-21 21:15:21] Validation | Batch 490/1567 | Loss: 1.0999 [2026-04-21 21:15:22] Validation | Batch 500/1567 | Loss: 1.1012 [2026-04-21 21:15:24] Validation | Batch 510/1567 | Loss: 1.1009 [2026-04-21 21:15:24] Validation | Batch 520/1567 | Loss: 1.1021 [2026-04-21 21:15:26] Validation | Batch 530/1567 | Loss: 1.1006 [2026-04-21 21:15:27] Validation | Batch 540/1567 | Loss: 1.0979 [2026-04-21 21:15:29] Validation | Batch 550/1567 | Loss: 1.0992 [2026-04-21 21:15:30] Validation | Batch 560/1567 | Loss: 1.0982 [2026-04-21 21:15:31] Validation | Batch 570/1567 | Loss: 1.0943 [2026-04-21 21:15:32] Validation | Batch 580/1567 | Loss: 1.0958 [2026-04-21 21:15:34] Validation | Batch 590/1567 | Loss: 1.0952 [2026-04-21 21:15:35] Validation | Batch 600/1567 | Loss: 1.0939 [2026-04-21 21:15:36] Validation | Batch 610/1567 | Loss: 1.0959 [2026-04-21 21:15:37] Validation | Batch 620/1567 | Loss: 1.0936 [2026-04-21 21:15:39] Validation | Batch 630/1567 | Loss: 1.0946 [2026-04-21 21:15:40] Validation | Batch 640/1567 | Loss: 1.0954 [2026-04-21 21:15:42] Validation | Batch 650/1567 | Loss: 1.0986 [2026-04-21 21:15:43] Validation | Batch 660/1567 | Loss: 1.0999 [2026-04-21 21:15:44] Validation | Batch 670/1567 | Loss: 1.0986 [2026-04-21 21:15:45] Validation | Batch 680/1567 | Loss: 1.0972 [2026-04-21 21:15:46] Validation | Batch 690/1567 | Loss: 1.0954 [2026-04-21 21:15:47] Validation | Batch 700/1567 | Loss: 1.0953 [2026-04-21 21:15:49] Validation | Batch 710/1567 | Loss: 1.0946 [2026-04-21 21:15:50] Validation | Batch 720/1567 | Loss: 1.0915 [2026-04-21 21:15:51] Validation | Batch 730/1567 | Loss: 1.0922 [2026-04-21 21:15:51] 
Validation | Batch 740/1567 | Loss: 1.0926 [2026-04-21 21:15:53] Validation | Batch 750/1567 | Loss: 1.0920 [2026-04-21 21:15:54] Validation | Batch 760/1567 | Loss: 1.0940 [2026-04-21 21:15:55] Validation | Batch 770/1567 | Loss: 1.0936 [2026-04-21 21:15:57] Validation | Batch 780/1567 | Loss: 1.0944 [2026-04-21 21:15:58] Validation | Batch 790/1567 | Loss: 1.0929 [2026-04-21 21:15:59] Validation | Batch 800/1567 | Loss: 1.0911 [2026-04-21 21:16:00] Validation | Batch 810/1567 | Loss: 1.0914 [2026-04-21 21:16:01] Validation | Batch 820/1567 | Loss: 1.0910 [2026-04-21 21:16:02] Validation | Batch 830/1567 | Loss: 1.0908 [2026-04-21 21:16:03] Validation | Batch 840/1567 | Loss: 1.0910 [2026-04-21 21:16:04] Validation | Batch 850/1567 | Loss: 1.0917 [2026-04-21 21:16:05] Validation | Batch 860/1567 | Loss: 1.0923 [2026-04-21 21:16:06] Validation | Batch 870/1567 | Loss: 1.0929 [2026-04-21 21:16:07] Validation | Batch 880/1567 | Loss: 1.0928 [2026-04-21 21:16:08] Validation | Batch 890/1567 | Loss: 1.0932 [2026-04-21 21:16:10] Validation | Batch 900/1567 | Loss: 1.0927 [2026-04-21 21:16:11] Validation | Batch 910/1567 | Loss: 1.0924 [2026-04-21 21:16:12] Validation | Batch 920/1567 | Loss: 1.0940 [2026-04-21 21:16:13] Validation | Batch 930/1567 | Loss: 1.0938 [2026-04-21 21:16:14] Validation | Batch 940/1567 | Loss: 1.0938 [2026-04-21 21:16:15] Validation | Batch 950/1567 | Loss: 1.0930 [2026-04-21 21:16:16] Validation | Batch 960/1567 | Loss: 1.0937 [2026-04-21 21:16:17] Validation | Batch 970/1567 | Loss: 1.0942 [2026-04-21 21:16:18] Validation | Batch 980/1567 | Loss: 1.0939 [2026-04-21 21:16:19] Validation | Batch 990/1567 | Loss: 1.0951 [2026-04-21 21:16:20] Validation | Batch 1000/1567 | Loss: 1.0958 [2026-04-21 21:16:21] Validation | Batch 1010/1567 | Loss: 1.0947 [2026-04-21 21:16:23] Validation | Batch 1020/1567 | Loss: 1.0957 [2026-04-21 21:16:24] Validation | Batch 1030/1567 | Loss: 1.0964 [2026-04-21 21:16:25] Validation | Batch 1040/1567 | Loss: 1.0956 
[2026-04-21 21:16:26] Validation | Batch 1050/1567 | Loss: 1.0947 [2026-04-21 21:16:27] Validation | Batch 1060/1567 | Loss: 1.0956 [2026-04-21 21:16:29] Validation | Batch 1070/1567 | Loss: 1.0957 [2026-04-21 21:16:30] Validation | Batch 1080/1567 | Loss: 1.0971 [2026-04-21 21:16:31] Validation | Batch 1090/1567 | Loss: 1.1000 [2026-04-21 21:16:33] Validation | Batch 1100/1567 | Loss: 1.1015 [2026-04-21 21:16:34] Validation | Batch 1110/1567 | Loss: 1.1007 [2026-04-21 21:16:35] Validation | Batch 1120/1567 | Loss: 1.1008 [2026-04-21 21:16:36] Validation | Batch 1130/1567 | Loss: 1.0991 [2026-04-21 21:16:37] Validation | Batch 1140/1567 | Loss: 1.0995 [2026-04-21 21:16:38] Validation | Batch 1150/1567 | Loss: 1.0982 [2026-04-21 21:16:39] Validation | Batch 1160/1567 | Loss: 1.0976 [2026-04-21 21:16:40] Validation | Batch 1170/1567 | Loss: 1.0976 [2026-04-21 21:16:42] Validation | Batch 1180/1567 | Loss: 1.0979 [2026-04-21 21:16:43] Validation | Batch 1190/1567 | Loss: 1.0984 [2026-04-21 21:16:44] Validation | Batch 1200/1567 | Loss: 1.0972 [2026-04-21 21:16:45] Validation | Batch 1210/1567 | Loss: 1.0962 [2026-04-21 21:16:46] Validation | Batch 1220/1567 | Loss: 1.0970 [2026-04-21 21:16:48] Validation | Batch 1230/1567 | Loss: 1.0976 [2026-04-21 21:16:49] Validation | Batch 1240/1567 | Loss: 1.0975 [2026-04-21 21:16:50] Validation | Batch 1250/1567 | Loss: 1.0976 [2026-04-21 21:16:51] Validation | Batch 1260/1567 | Loss: 1.0971 [2026-04-21 21:16:53] Validation | Batch 1270/1567 | Loss: 1.0956 [2026-04-21 21:16:54] Validation | Batch 1280/1567 | Loss: 1.0955 [2026-04-21 21:16:55] Validation | Batch 1290/1567 | Loss: 1.0954 [2026-04-21 21:16:57] Validation | Batch 1300/1567 | Loss: 1.0958 [2026-04-21 21:16:58] Validation | Batch 1310/1567 | Loss: 1.0965 [2026-04-21 21:16:59] Validation | Batch 1320/1567 | Loss: 1.0970 [2026-04-21 21:17:00] Validation | Batch 1330/1567 | Loss: 1.0983 [2026-04-21 21:17:01] Validation | Batch 1340/1567 | Loss: 1.0981 [2026-04-21 
21:17:02] Validation | Batch 1350/1567 | Loss: 1.0984 [2026-04-21 21:17:03] Validation | Batch 1360/1567 | Loss: 1.0975 [2026-04-21 21:17:04] Validation | Batch 1370/1567 | Loss: 1.0971 [2026-04-21 21:17:06] Validation | Batch 1380/1567 | Loss: 1.0971 [2026-04-21 21:17:07] Validation | Batch 1390/1567 | Loss: 1.0963 [2026-04-21 21:17:08] Validation | Batch 1400/1567 | Loss: 1.0963 [2026-04-21 21:17:09] Validation | Batch 1410/1567 | Loss: 1.0966 [2026-04-21 21:17:10] Validation | Batch 1420/1567 | Loss: 1.0963 [2026-04-21 21:17:11] Validation | Batch 1430/1567 | Loss: 1.0967 [2026-04-21 21:17:12] Validation | Batch 1440/1567 | Loss: 1.0974 [2026-04-21 21:17:13] Validation | Batch 1450/1567 | Loss: 1.0975 [2026-04-21 21:17:14] Validation | Batch 1460/1567 | Loss: 1.0969 [2026-04-21 21:17:15] Validation | Batch 1470/1567 | Loss: 1.0967 [2026-04-21 21:17:16] Validation | Batch 1480/1567 | Loss: 1.0963 [2026-04-21 21:17:17] Validation | Batch 1490/1567 | Loss: 1.0958 [2026-04-21 21:17:18] Validation | Batch 1500/1567 | Loss: 1.0956 [2026-04-21 21:17:19] Validation | Batch 1510/1567 | Loss: 1.0947 [2026-04-21 21:17:20] Validation | Batch 1520/1567 | Loss: 1.0946 [2026-04-21 21:17:21] Validation | Batch 1530/1567 | Loss: 1.0945 [2026-04-21 21:17:23] Validation | Batch 1540/1567 | Loss: 1.0949 [2026-04-21 21:17:23] Validation | Batch 1550/1567 | Loss: 1.0961 [2026-04-21 21:17:25] Validation | Batch 1560/1567 | Loss: 1.0961 [2026-04-21 21:17:26] Validation | Batch 1567/1567 | Loss: 1.0962 [2026-04-21 21:17:26] Validation | Loss: 1.0962 | PPL: 3.06 | Time: 184.63s [2026-04-21 21:17:44] New best model saved! 
Val loss: 1.0962 [2026-04-21 21:17:49] Epoch 1 | Step 4010 | Loss: 1.1179 | LR: 2.00e-05 [2026-04-21 21:17:55] Epoch 1 | Step 4020 | Loss: 1.1180 | LR: 2.00e-05 [2026-04-21 21:18:00] Epoch 1 | Step 4030 | Loss: 1.1177 | LR: 2.00e-05 [2026-04-21 21:18:06] Epoch 1 | Step 4040 | Loss: 1.1175 | LR: 2.00e-05 [2026-04-21 21:18:11] Epoch 1 | Step 4050 | Loss: 1.1174 | LR: 2.00e-05 [2026-04-21 21:18:16] Epoch 1 | Step 4060 | Loss: 1.1171 | LR: 2.00e-05 [2026-04-21 21:18:21] Epoch 1 | Step 4070 | Loss: 1.1167 | LR: 2.00e-05 [2026-04-21 21:18:27] Epoch 1 | Step 4080 | Loss: 1.1167 | LR: 2.00e-05 [2026-04-21 21:18:32] Epoch 1 | Step 4090 | Loss: 1.1165 | LR: 2.00e-05 [2026-04-21 21:18:37] Epoch 1 | Step 4100 | Loss: 1.1165 | LR: 2.00e-05 [2026-04-21 21:18:43] Epoch 1 | Step 4110 | Loss: 1.1166 | LR: 2.00e-05 [2026-04-21 21:18:48] Epoch 1 | Step 4120 | Loss: 1.1163 | LR: 2.00e-05 [2026-04-21 21:18:53] Epoch 1 | Step 4130 | Loss: 1.1161 | LR: 2.00e-05 [2026-04-21 21:18:59] Epoch 1 | Step 4140 | Loss: 1.1161 | LR: 2.00e-05 [2026-04-21 21:19:04] Epoch 1 | Step 4150 | Loss: 1.1156 | LR: 2.00e-05 [2026-04-21 21:19:09] Epoch 1 | Step 4160 | Loss: 1.1157 | LR: 2.00e-05 [2026-04-21 21:19:15] Epoch 1 | Step 4170 | Loss: 1.1157 | LR: 2.00e-05 [2026-04-21 21:19:21] Epoch 1 | Step 4180 | Loss: 1.1157 | LR: 2.00e-05 [2026-04-21 21:19:26] Epoch 1 | Step 4190 | Loss: 1.1158 | LR: 2.00e-05 [2026-04-21 21:19:31] Epoch 1 | Step 4200 | Loss: 1.1160 | LR: 2.00e-05 [2026-04-21 21:19:37] Epoch 1 | Step 4210 | Loss: 1.1159 | LR: 2.00e-05 [2026-04-21 21:19:42] Epoch 1 | Step 4220 | Loss: 1.1156 | LR: 2.00e-05 [2026-04-21 21:19:47] Epoch 1 | Step 4230 | Loss: 1.1152 | LR: 2.00e-05 [2026-04-21 21:19:52] Epoch 1 | Step 4240 | Loss: 1.1151 | LR: 2.00e-05 [2026-04-21 21:19:57] Epoch 1 | Step 4250 | Loss: 1.1149 | LR: 2.00e-05 [2026-04-21 21:20:02] Epoch 1 | Step 4260 | Loss: 1.1149 | LR: 2.00e-05 [2026-04-21 21:20:08] Epoch 1 | Step 4270 | Loss: 1.1148 | LR: 2.00e-05 [2026-04-21 21:20:13] Epoch 1 | Step 
4280 | Loss: 1.1148 | LR: 2.00e-05 [2026-04-21 21:20:19] Epoch 1 | Step 4290 | Loss: 1.1144 | LR: 2.00e-05 [2026-04-21 21:20:23] Epoch 1 | Step 4300 | Loss: 1.1141 | LR: 2.00e-05 [2026-04-21 21:20:29] Epoch 1 | Step 4310 | Loss: 1.1138 | LR: 2.00e-05 [2026-04-21 21:20:34] Epoch 1 | Step 4320 | Loss: 1.1137 | LR: 2.00e-05 [2026-04-21 21:20:40] Epoch 1 | Step 4330 | Loss: 1.1136 | LR: 2.00e-05 [2026-04-21 21:20:45] Epoch 1 | Step 4340 | Loss: 1.1135 | LR: 2.00e-05 [2026-04-21 21:20:50] Epoch 1 | Step 4350 | Loss: 1.1134 | LR: 2.00e-05 [2026-04-21 21:20:55] Epoch 1 | Step 4360 | Loss: 1.1133 | LR: 2.00e-05 [2026-04-21 21:21:01] Epoch 1 | Step 4370 | Loss: 1.1128 | LR: 2.00e-05 [2026-04-21 21:21:06] Epoch 1 | Step 4380 | Loss: 1.1125 | LR: 2.00e-05 [2026-04-21 21:21:11] Epoch 1 | Step 4390 | Loss: 1.1124 | LR: 2.00e-05 [2026-04-21 21:21:17] Epoch 1 | Step 4400 | Loss: 1.1122 | LR: 2.00e-05 [2026-04-21 21:21:21] Epoch 1 | Step 4410 | Loss: 1.1122 | LR: 2.00e-05 [2026-04-21 21:21:27] Epoch 1 | Step 4420 | Loss: 1.1122 | LR: 2.00e-05 [2026-04-21 21:21:32] Epoch 1 | Step 4430 | Loss: 1.1121 | LR: 2.00e-05 [2026-04-21 21:21:38] Epoch 1 | Step 4440 | Loss: 1.1119 | LR: 2.00e-05 [2026-04-21 21:21:43] Epoch 1 | Step 4450 | Loss: 1.1118 | LR: 2.00e-05 [2026-04-21 21:21:48] Epoch 1 | Step 4460 | Loss: 1.1116 | LR: 2.00e-05 [2026-04-21 21:21:53] Epoch 1 | Step 4470 | Loss: 1.1114 | LR: 2.00e-05 [2026-04-21 21:21:58] Epoch 1 | Step 4480 | Loss: 1.1112 | LR: 2.00e-05 [2026-04-21 21:22:04] Epoch 1 | Step 4490 | Loss: 1.1115 | LR: 2.00e-05 [2026-04-21 21:22:09] Epoch 1 | Step 4500 | Loss: 1.1114 | LR: 2.00e-05 [2026-04-21 21:22:14] Epoch 1 | Step 4510 | Loss: 1.1111 | LR: 2.00e-05 [2026-04-21 21:22:19] Epoch 1 | Step 4520 | Loss: 1.1112 | LR: 2.00e-05 [2026-04-21 21:22:24] Epoch 1 | Step 4530 | Loss: 1.1111 | LR: 2.00e-05 [2026-04-21 21:22:29] Epoch 1 | Step 4540 | Loss: 1.1111 | LR: 2.00e-05 [2026-04-21 21:22:34] Epoch 1 | Step 4550 | Loss: 1.1110 | LR: 2.00e-05 [2026-04-21 
21:22:39] Epoch 1 | Step 4560 | Loss: 1.1109 | LR: 2.00e-05 [2026-04-21 21:22:44] Epoch 1 | Step 4570 | Loss: 1.1107 | LR: 2.00e-05 [2026-04-21 21:22:49] Epoch 1 | Step 4580 | Loss: 1.1104 | LR: 2.00e-05 [2026-04-21 21:22:54] Epoch 1 | Step 4590 | Loss: 1.1103 | LR: 2.00e-05 [2026-04-21 21:23:00] Epoch 1 | Step 4600 | Loss: 1.1103 | LR: 2.00e-05 [2026-04-21 21:23:05] Epoch 1 | Step 4610 | Loss: 1.1104 | LR: 2.00e-05 [2026-04-21 21:23:10] Epoch 1 | Step 4620 | Loss: 1.1104 | LR: 2.00e-05 [2026-04-21 21:23:15] Epoch 1 | Step 4630 | Loss: 1.1106 | LR: 2.00e-05 [2026-04-21 21:23:20] Epoch 1 | Step 4640 | Loss: 1.1103 | LR: 2.00e-05 [2026-04-21 21:23:25] Epoch 1 | Step 4650 | Loss: 1.1105 | LR: 2.00e-05 [2026-04-21 21:23:30] Epoch 1 | Step 4660 | Loss: 1.1105 | LR: 2.00e-05 [2026-04-21 21:23:35] Epoch 1 | Step 4670 | Loss: 1.1104 | LR: 2.00e-05 [2026-04-21 21:23:40] Epoch 1 | Step 4680 | Loss: 1.1102 | LR: 2.00e-05 [2026-04-21 21:23:45] Epoch 1 | Step 4690 | Loss: 1.1102 | LR: 2.00e-05 [2026-04-21 21:23:50] Epoch 1 | Step 4700 | Loss: 1.1100 | LR: 2.00e-05 [2026-04-21 21:23:55] Epoch 1 | Step 4710 | Loss: 1.1098 | LR: 2.00e-05 [2026-04-21 21:24:01] Epoch 1 | Step 4720 | Loss: 1.1096 | LR: 2.00e-05 [2026-04-21 21:24:06] Epoch 1 | Step 4730 | Loss: 1.1096 | LR: 2.00e-05 [2026-04-21 21:24:12] Epoch 1 | Step 4740 | Loss: 1.1096 | LR: 2.00e-05 [2026-04-21 21:24:17] Epoch 1 | Step 4750 | Loss: 1.1096 | LR: 2.00e-05 [2026-04-21 21:24:21] Epoch 1 | Step 4760 | Loss: 1.1094 | LR: 2.00e-05 [2026-04-21 21:24:27] Epoch 1 | Step 4770 | Loss: 1.1094 | LR: 2.00e-05 [2026-04-21 21:24:32] Epoch 1 | Step 4780 | Loss: 1.1092 | LR: 2.00e-05 [2026-04-21 21:24:37] Epoch 1 | Step 4790 | Loss: 1.1092 | LR: 2.00e-05 [2026-04-21 21:24:43] Epoch 1 | Step 4800 | Loss: 1.1094 | LR: 2.00e-05 [2026-04-21 21:24:49] Epoch 1 | Step 4810 | Loss: 1.1095 | LR: 2.00e-05 [2026-04-21 21:24:54] Epoch 1 | Step 4820 | Loss: 1.1096 | LR: 2.00e-05 [2026-04-21 21:24:59] Epoch 1 | Step 4830 | Loss: 1.1097 | LR: 
2.00e-05 [2026-04-21 21:25:04] Epoch 1 | Step 4840 | Loss: 1.1098 | LR: 2.00e-05 [2026-04-21 21:25:10] Epoch 1 | Step 4850 | Loss: 1.1096 | LR: 2.00e-05 [2026-04-21 21:25:15] Epoch 1 | Step 4860 | Loss: 1.1094 | LR: 2.00e-05 [2026-04-21 21:25:20] Epoch 1 | Step 4870 | Loss: 1.1092 | LR: 2.00e-05 [2026-04-21 21:25:25] Epoch 1 | Step 4880 | Loss: 1.1093 | LR: 2.00e-05 [2026-04-21 21:25:31] Epoch 1 | Step 4890 | Loss: 1.1091 | LR: 2.00e-05 [2026-04-21 21:25:37] Epoch 1 | Step 4900 | Loss: 1.1091 | LR: 2.00e-05 [2026-04-21 21:25:42] Epoch 1 | Step 4910 | Loss: 1.1089 | LR: 2.00e-05 [2026-04-21 21:25:47] Epoch 1 | Step 4920 | Loss: 1.1088 | LR: 2.00e-05 [2026-04-21 21:25:53] Epoch 1 | Step 4930 | Loss: 1.1087 | LR: 2.00e-05 [2026-04-21 21:25:58] Epoch 1 | Step 4940 | Loss: 1.1088 | LR: 2.00e-05 [2026-04-21 21:26:04] Epoch 1 | Step 4950 | Loss: 1.1087 | LR: 2.00e-05 [2026-04-21 21:26:09] Epoch 1 | Step 4960 | Loss: 1.1085 | LR: 2.00e-05 [2026-04-21 21:26:13] Epoch 1 | Step 4970 | Loss: 1.1083 | LR: 2.00e-05 [2026-04-21 21:26:20] Epoch 1 | Step 4980 | Loss: 1.1083 | LR: 2.00e-05 [2026-04-21 21:26:25] Epoch 1 | Step 4990 | Loss: 1.1080 | LR: 2.00e-05 [2026-04-21 21:26:31] Epoch 1 | Step 5000 | Loss: 1.1078 | LR: 2.00e-05 [2026-04-21 21:26:33] Validation | Batch 10/1567 | Loss: 1.0695 [2026-04-21 21:26:34] Validation | Batch 20/1567 | Loss: 1.1462 [2026-04-21 21:26:35] Validation | Batch 30/1567 | Loss: 1.1090 [2026-04-21 21:26:37] Validation | Batch 40/1567 | Loss: 1.1260 [2026-04-21 21:26:37] Validation | Batch 50/1567 | Loss: 1.1070 [2026-04-21 21:26:39] Validation | Batch 60/1567 | Loss: 1.0961 [2026-04-21 21:26:40] Validation | Batch 70/1567 | Loss: 1.0865 [2026-04-21 21:26:42] Validation | Batch 80/1567 | Loss: 1.0993 [2026-04-21 21:26:43] Validation | Batch 90/1567 | Loss: 1.0923 [2026-04-21 21:26:44] Validation | Batch 100/1567 | Loss: 1.0743 [2026-04-21 21:26:45] Validation | Batch 110/1567 | Loss: 1.0658 [2026-04-21 21:26:46] Validation | Batch 120/1567 | Loss: 
1.0585 [2026-04-21 21:26:48] Validation | Batch 130/1567 | Loss: 1.0540 [2026-04-21 21:26:49] Validation | Batch 140/1567 | Loss: 1.0642 [2026-04-21 21:26:50] Validation | Batch 150/1567 | Loss: 1.0737 [2026-04-21 21:26:51] Validation | Batch 160/1567 | Loss: 1.0714 [2026-04-21 21:26:52] Validation | Batch 170/1567 | Loss: 1.0640 [2026-04-21 21:26:53] Validation | Batch 180/1567 | Loss: 1.0671 [2026-04-21 21:26:54] Validation | Batch 190/1567 | Loss: 1.0738 [2026-04-21 21:26:56] Validation | Batch 200/1567 | Loss: 1.0777 [2026-04-21 21:26:57] Validation | Batch 210/1567 | Loss: 1.0768 [2026-04-21 21:26:58] Validation | Batch 220/1567 | Loss: 1.0809 [2026-04-21 21:27:00] Validation | Batch 230/1567 | Loss: 1.0845 [2026-04-21 21:27:01] Validation | Batch 240/1567 | Loss: 1.0888 [2026-04-21 21:27:02] Validation | Batch 250/1567 | Loss: 1.0927 [2026-04-21 21:27:03] Validation | Batch 260/1567 | Loss: 1.0949 [2026-04-21 21:27:04] Validation | Batch 270/1567 | Loss: 1.0995 [2026-04-21 21:27:06] Validation | Batch 280/1567 | Loss: 1.1022 [2026-04-21 21:27:08] Validation | Batch 290/1567 | Loss: 1.0980 [2026-04-21 21:27:09] Validation | Batch 300/1567 | Loss: 1.0967 [2026-04-21 21:27:10] Validation | Batch 310/1567 | Loss: 1.0936 [2026-04-21 21:27:11] Validation | Batch 320/1567 | Loss: 1.0961 [2026-04-21 21:27:12] Validation | Batch 330/1567 | Loss: 1.0957 [2026-04-21 21:27:14] Validation | Batch 340/1567 | Loss: 1.0948 [2026-04-21 21:27:15] Validation | Batch 350/1567 | Loss: 1.0923 [2026-04-21 21:27:16] Validation | Batch 360/1567 | Loss: 1.0867 [2026-04-21 21:27:17] Validation | Batch 370/1567 | Loss: 1.0871 [2026-04-21 21:27:18] Validation | Batch 380/1567 | Loss: 1.0912 [2026-04-21 21:27:20] Validation | Batch 390/1567 | Loss: 1.0899 [2026-04-21 21:27:21] Validation | Batch 400/1567 | Loss: 1.0913 [2026-04-21 21:27:22] Validation | Batch 410/1567 | Loss: 1.0872 [2026-04-21 21:27:23] Validation | Batch 420/1567 | Loss: 1.0854 [2026-04-21 21:27:24] Validation | Batch 
430/1567 | Loss: 1.0886 [2026-04-21 21:27:26] Validation | Batch 440/1567 | Loss: 1.0892 [2026-04-21 21:27:27] Validation | Batch 450/1567 | Loss: 1.0918 [2026-04-21 21:27:28] Validation | Batch 460/1567 | Loss: 1.0951 [2026-04-21 21:27:29] Validation | Batch 470/1567 | Loss: 1.0998 [2026-04-21 21:27:30] Validation | Batch 480/1567 | Loss: 1.0972 [2026-04-21 21:27:31] Validation | Batch 490/1567 | Loss: 1.0946 [2026-04-21 21:27:32] Validation | Batch 500/1567 | Loss: 1.0958 [2026-04-21 21:27:34] Validation | Batch 510/1567 | Loss: 1.0959 [2026-04-21 21:27:35] Validation | Batch 520/1567 | Loss: 1.0971 [2026-04-21 21:27:36] Validation | Batch 530/1567 | Loss: 1.0955 [2026-04-21 21:27:37] Validation | Batch 540/1567 | Loss: 1.0925 [2026-04-21 21:27:39] Validation | Batch 550/1567 | Loss: 1.0937 [2026-04-21 21:27:40] Validation | Batch 560/1567 | Loss: 1.0928 [2026-04-21 21:27:41] Validation | Batch 570/1567 | Loss: 1.0884 [2026-04-21 21:27:43] Validation | Batch 580/1567 | Loss: 1.0903 [2026-04-21 21:27:44] Validation | Batch 590/1567 | Loss: 1.0898 [2026-04-21 21:27:45] Validation | Batch 600/1567 | Loss: 1.0885 [2026-04-21 21:27:46] Validation | Batch 610/1567 | Loss: 1.0907 [2026-04-21 21:27:47] Validation | Batch 620/1567 | Loss: 1.0886 [2026-04-21 21:27:49] Validation | Batch 630/1567 | Loss: 1.0889 [2026-04-21 21:27:50] Validation | Batch 640/1567 | Loss: 1.0894 [2026-04-21 21:27:52] Validation | Batch 650/1567 | Loss: 1.0925 [2026-04-21 21:27:53] Validation | Batch 660/1567 | Loss: 1.0937 [2026-04-21 21:27:54] Validation | Batch 670/1567 | Loss: 1.0921 [2026-04-21 21:27:55] Validation | Batch 680/1567 | Loss: 1.0908 [2026-04-21 21:27:56] Validation | Batch 690/1567 | Loss: 1.0892 [2026-04-21 21:27:57] Validation | Batch 700/1567 | Loss: 1.0893 [2026-04-21 21:27:59] Validation | Batch 710/1567 | Loss: 1.0885 [2026-04-21 21:28:00] Validation | Batch 720/1567 | Loss: 1.0853 [2026-04-21 21:28:01] Validation | Batch 730/1567 | Loss: 1.0858 [2026-04-21 21:28:02] 
Validation | Batch 740/1567 | Loss: 1.0864 [2026-04-21 21:28:03] Validation | Batch 750/1567 | Loss: 1.0859 [2026-04-21 21:28:04] Validation | Batch 760/1567 | Loss: 1.0876 [2026-04-21 21:28:06] Validation | Batch 770/1567 | Loss: 1.0872 [2026-04-21 21:28:07] Validation | Batch 780/1567 | Loss: 1.0883 [2026-04-21 21:28:08] Validation | Batch 790/1567 | Loss: 1.0866 [2026-04-21 21:28:09] Validation | Batch 800/1567 | Loss: 1.0848 [2026-04-21 21:28:10] Validation | Batch 810/1567 | Loss: 1.0853 [2026-04-21 21:28:11] Validation | Batch 820/1567 | Loss: 1.0846 [2026-04-21 21:28:12] Validation | Batch 830/1567 | Loss: 1.0840 [2026-04-21 21:28:13] Validation | Batch 840/1567 | Loss: 1.0843 [2026-04-21 21:28:14] Validation | Batch 850/1567 | Loss: 1.0850 [2026-04-21 21:28:15] Validation | Batch 860/1567 | Loss: 1.0857 [2026-04-21 21:28:16] Validation | Batch 870/1567 | Loss: 1.0863 [2026-04-21 21:28:17] Validation | Batch 880/1567 | Loss: 1.0863 [2026-04-21 21:28:19] Validation | Batch 890/1567 | Loss: 1.0864 [2026-04-21 21:28:20] Validation | Batch 900/1567 | Loss: 1.0860 [2026-04-21 21:28:21] Validation | Batch 910/1567 | Loss: 1.0855 [2026-04-21 21:28:22] Validation | Batch 920/1567 | Loss: 1.0873 [2026-04-21 21:28:23] Validation | Batch 930/1567 | Loss: 1.0871 [2026-04-21 21:28:24] Validation | Batch 940/1567 | Loss: 1.0871 [2026-04-21 21:28:25] Validation | Batch 950/1567 | Loss: 1.0864 [2026-04-21 21:28:26] Validation | Batch 960/1567 | Loss: 1.0870 [2026-04-21 21:28:27] Validation | Batch 970/1567 | Loss: 1.0876 [2026-04-21 21:28:28] Validation | Batch 980/1567 | Loss: 1.0871 [2026-04-21 21:28:29] Validation | Batch 990/1567 | Loss: 1.0882 [2026-04-21 21:28:31] Validation | Batch 1000/1567 | Loss: 1.0889 [2026-04-21 21:28:32] Validation | Batch 1010/1567 | Loss: 1.0879 [2026-04-21 21:28:33] Validation | Batch 1020/1567 | Loss: 1.0889 [2026-04-21 21:28:34] Validation | Batch 1030/1567 | Loss: 1.0894 [2026-04-21 21:28:36] Validation | Batch 1040/1567 | Loss: 1.0887 
[2026-04-21 21:28:37] Validation | Batch 1050/1567 | Loss: 1.0879 [2026-04-21 21:28:38] Validation | Batch 1060/1567 | Loss: 1.0890 [2026-04-21 21:28:39] Validation | Batch 1070/1567 | Loss: 1.0890 [2026-04-21 21:28:40] Validation | Batch 1080/1567 | Loss: 1.0904 [2026-04-21 21:28:42] Validation | Batch 1090/1567 | Loss: 1.0932 [2026-04-21 21:28:43] Validation | Batch 1100/1567 | Loss: 1.0948 [2026-04-21 21:28:44] Validation | Batch 1110/1567 | Loss: 1.0941 [2026-04-21 21:28:45] Validation | Batch 1120/1567 | Loss: 1.0941 [2026-04-21 21:28:46] Validation | Batch 1130/1567 | Loss: 1.0924 [2026-04-21 21:28:47] Validation | Batch 1140/1567 | Loss: 1.0928 [2026-04-21 21:28:49] Validation | Batch 1150/1567 | Loss: 1.0916 [2026-04-21 21:28:49] Validation | Batch 1160/1567 | Loss: 1.0909 [2026-04-21 21:28:51] Validation | Batch 1170/1567 | Loss: 1.0909 [2026-04-21 21:28:52] Validation | Batch 1180/1567 | Loss: 1.0911 [2026-04-21 21:28:53] Validation | Batch 1190/1567 | Loss: 1.0915 [2026-04-21 21:28:54] Validation | Batch 1200/1567 | Loss: 1.0903 [2026-04-21 21:28:56] Validation | Batch 1210/1567 | Loss: 1.0893 [2026-04-21 21:28:56] Validation | Batch 1220/1567 | Loss: 1.0901 [2026-04-21 21:28:58] Validation | Batch 1230/1567 | Loss: 1.0907 [2026-04-21 21:28:59] Validation | Batch 1240/1567 | Loss: 1.0905 [2026-04-21 21:29:00] Validation | Batch 1250/1567 | Loss: 1.0907 [2026-04-21 21:29:01] Validation | Batch 1260/1567 | Loss: 1.0903 [2026-04-21 21:29:03] Validation | Batch 1270/1567 | Loss: 1.0887 [2026-04-21 21:29:04] Validation | Batch 1280/1567 | Loss: 1.0888 [2026-04-21 21:29:06] Validation | Batch 1290/1567 | Loss: 1.0886 [2026-04-21 21:29:07] Validation | Batch 1300/1567 | Loss: 1.0890 [2026-04-21 21:29:08] Validation | Batch 1310/1567 | Loss: 1.0896 [2026-04-21 21:29:09] Validation | Batch 1320/1567 | Loss: 1.0903 [2026-04-21 21:29:10] Validation | Batch 1330/1567 | Loss: 1.0917 [2026-04-21 21:29:11] Validation | Batch 1340/1567 | Loss: 1.0915 [2026-04-21 
21:29:12] Validation | Batch 1350/1567 | Loss: 1.0918 [2026-04-21 21:29:13] Validation | Batch 1360/1567 | Loss: 1.0910 [2026-04-21 21:29:15] Validation | Batch 1370/1567 | Loss: 1.0907 [2026-04-21 21:29:16] Validation | Batch 1380/1567 | Loss: 1.0908 [2026-04-21 21:29:17] Validation | Batch 1390/1567 | Loss: 1.0901 [2026-04-21 21:29:18] Validation | Batch 1400/1567 | Loss: 1.0899 [2026-04-21 21:29:19] Validation | Batch 1410/1567 | Loss: 1.0905 [2026-04-21 21:29:20] Validation | Batch 1420/1567 | Loss: 1.0902 [2026-04-21 21:29:21] Validation | Batch 1430/1567 | Loss: 1.0904 [2026-04-21 21:29:23] Validation | Batch 1440/1567 | Loss: 1.0910 [2026-04-21 21:29:23] Validation | Batch 1450/1567 | Loss: 1.0911 [2026-04-21 21:29:24] Validation | Batch 1460/1567 | Loss: 1.0905 [2026-04-21 21:29:25] Validation | Batch 1470/1567 | Loss: 1.0903 [2026-04-21 21:29:26] Validation | Batch 1480/1567 | Loss: 1.0898 [2026-04-21 21:29:27] Validation | Batch 1490/1567 | Loss: 1.0892 [2026-04-21 21:29:29] Validation | Batch 1500/1567 | Loss: 1.0890 [2026-04-21 21:29:30] Validation | Batch 1510/1567 | Loss: 1.0881 [2026-04-21 21:29:30] Validation | Batch 1520/1567 | Loss: 1.0879 [2026-04-21 21:29:31] Validation | Batch 1530/1567 | Loss: 1.0878 [2026-04-21 21:29:33] Validation | Batch 1540/1567 | Loss: 1.0883 [2026-04-21 21:29:34] Validation | Batch 1550/1567 | Loss: 1.0895 [2026-04-21 21:29:35] Validation | Batch 1560/1567 | Loss: 1.0892 [2026-04-21 21:29:36] Validation | Batch 1567/1567 | Loss: 1.0893 [2026-04-21 21:29:36] Validation | Loss: 1.0893 | PPL: 3.03 | Time: 184.65s [2026-04-21 21:29:54] New best model saved! 
Val loss: 1.0893 [2026-04-21 21:29:59] Epoch 1 | Step 5010 | Loss: 1.1077 | LR: 2.00e-05 [2026-04-21 21:30:05] Epoch 1 | Step 5020 | Loss: 1.1074 | LR: 2.00e-05 [2026-04-21 21:30:11] Epoch 1 | Step 5030 | Loss: 1.1072 | LR: 2.00e-05 [2026-04-21 21:30:16] Epoch 1 | Step 5040 | Loss: 1.1071 | LR: 2.00e-05 [2026-04-21 21:30:21] Epoch 1 | Step 5050 | Loss: 1.1067 | LR: 2.00e-05 [2026-04-21 21:30:26] Epoch 1 | Step 5060 | Loss: 1.1065 | LR: 2.00e-05 [2026-04-21 21:30:32] Epoch 1 | Step 5070 | Loss: 1.1062 | LR: 2.00e-05 [2026-04-21 21:30:37] Epoch 1 | Step 5080 | Loss: 1.1060 | LR: 2.00e-05 [2026-04-21 21:30:42] Epoch 1 | Step 5090 | Loss: 1.1060 | LR: 2.00e-05 [2026-04-21 21:30:47] Epoch 1 | Step 5100 | Loss: 1.1061 | LR: 2.00e-05 [2026-04-21 21:30:53] Epoch 1 | Step 5110 | Loss: 1.1061 | LR: 2.00e-05 [2026-04-21 21:30:58] Epoch 1 | Step 5120 | Loss: 1.1060 | LR: 2.00e-05 [2026-04-21 21:31:03] Epoch 1 | Step 5130 | Loss: 1.1057 | LR: 2.00e-05 [2026-04-21 21:31:08] Epoch 1 | Step 5140 | Loss: 1.1055 | LR: 2.00e-05 [2026-04-21 21:31:14] Epoch 1 | Step 5150 | Loss: 1.1054 | LR: 2.00e-05 [2026-04-21 21:31:19] Epoch 1 | Step 5160 | Loss: 1.1053 | LR: 2.00e-05 [2026-04-21 21:31:24] Epoch 1 | Step 5170 | Loss: 1.1053 | LR: 2.00e-05 [2026-04-21 21:31:30] Epoch 1 | Step 5180 | Loss: 1.1050 | LR: 2.00e-05 [2026-04-21 21:31:35] Epoch 1 | Step 5190 | Loss: 1.1050 | LR: 2.00e-05 [2026-04-21 21:31:40] Epoch 1 | Step 5200 | Loss: 1.1046 | LR: 2.00e-05 [2026-04-21 21:31:45] Epoch 1 | Step 5210 | Loss: 1.1044 | LR: 2.00e-05 [2026-04-21 21:31:51] Epoch 1 | Step 5220 | Loss: 1.1044 | LR: 2.00e-05 [2026-04-21 21:31:57] Epoch 1 | Step 5230 | Loss: 1.1047 | LR: 2.00e-05 [2026-04-21 21:32:02] Epoch 1 | Step 5240 | Loss: 1.1049 | LR: 2.00e-05 [2026-04-21 21:32:07] Epoch 1 | Step 5250 | Loss: 1.1048 | LR: 2.00e-05 [2026-04-21 21:32:12] Epoch 1 | Step 5260 | Loss: 1.1046 | LR: 2.00e-05 [2026-04-21 21:32:17] Epoch 1 | Step 5270 | Loss: 1.1044 | LR: 2.00e-05 [2026-04-21 21:32:22] Epoch 1 | Step 
5280 | Loss: 1.1046 | LR: 2.00e-05 [2026-04-21 21:32:27] Epoch 1 | Step 5290 | Loss: 1.1046 | LR: 2.00e-05 [2026-04-21 21:32:33] Epoch 1 | Step 5300 | Loss: 1.1044 | LR: 2.00e-05 [2026-04-21 21:32:38] Epoch 1 | Step 5310 | Loss: 1.1043 | LR: 2.00e-05 [2026-04-21 21:32:43] Epoch 1 | Step 5320 | Loss: 1.1043 | LR: 2.00e-05 [2026-04-21 21:32:49] Epoch 1 | Step 5330 | Loss: 1.1042 | LR: 2.00e-05 [2026-04-21 21:32:54] Epoch 1 | Step 5340 | Loss: 1.1040 | LR: 2.00e-05 [2026-04-21 21:32:59] Epoch 1 | Step 5350 | Loss: 1.1039 | LR: 2.00e-05 [2026-04-21 21:33:05] Epoch 1 | Step 5360 | Loss: 1.1041 | LR: 2.00e-05 [2026-04-21 21:33:10] Epoch 1 | Step 5370 | Loss: 1.1041 | LR: 2.00e-05 [2026-04-21 21:33:15] Epoch 1 | Step 5380 | Loss: 1.1039 | LR: 2.00e-05 [2026-04-21 21:33:21] Epoch 1 | Step 5390 | Loss: 1.1038 | LR: 2.00e-05 [2026-04-21 21:33:26] Epoch 1 | Step 5400 | Loss: 1.1036 | LR: 2.00e-05 [2026-04-21 21:33:31] Epoch 1 | Step 5410 | Loss: 1.1037 | LR: 2.00e-05 [2026-04-21 21:33:36] Epoch 1 | Step 5420 | Loss: 1.1035 | LR: 2.00e-05 [2026-04-21 21:33:42] Epoch 1 | Step 5430 | Loss: 1.1037 | LR: 2.00e-05 [2026-04-21 21:33:47] Epoch 1 | Step 5440 | Loss: 1.1034 | LR: 2.00e-05 [2026-04-21 21:33:54] Epoch 1 | Step 5450 | Loss: 1.1034 | LR: 2.00e-05 [2026-04-21 21:33:59] Epoch 1 | Step 5460 | Loss: 1.1033 | LR: 2.00e-05 [2026-04-21 21:34:04] Epoch 1 | Step 5470 | Loss: 1.1032 | LR: 2.00e-05 [2026-04-21 21:34:09] Epoch 1 | Step 5480 | Loss: 1.1029 | LR: 2.00e-05 [2026-04-21 21:34:15] Epoch 1 | Step 5490 | Loss: 1.1027 | LR: 2.00e-05 [2026-04-21 21:34:21] Epoch 1 | Step 5500 | Loss: 1.1026 | LR: 2.00e-05 [2026-04-21 21:34:26] Epoch 1 | Step 5510 | Loss: 1.1025 | LR: 2.00e-05 [2026-04-21 21:34:30] Epoch 1 | Step 5520 | Loss: 1.1026 | LR: 2.00e-05 [2026-04-21 21:34:35] Epoch 1 | Step 5530 | Loss: 1.1026 | LR: 2.00e-05 [2026-04-21 21:34:41] Epoch 1 | Step 5540 | Loss: 1.1026 | LR: 2.00e-05 [2026-04-21 21:34:46] Epoch 1 | Step 5550 | Loss: 1.1024 | LR: 2.00e-05 [2026-04-21 
21:34:51] Epoch 1 | Step 5560 | Loss: 1.1022 | LR: 2.00e-05 [2026-04-21 21:34:57] Epoch 1 | Step 5570 | Loss: 1.1022 | LR: 2.00e-05 [2026-04-21 21:35:03] Epoch 1 | Step 5580 | Loss: 1.1022 | LR: 2.00e-05 [2026-04-21 21:35:09] Epoch 1 | Step 5590 | Loss: 1.1021 | LR: 2.00e-05 [2026-04-21 21:35:14] Epoch 1 | Step 5600 | Loss: 1.1018 | LR: 2.00e-05 [2026-04-21 21:35:19] Epoch 1 | Step 5610 | Loss: 1.1018 | LR: 2.00e-05 [2026-04-21 21:35:25] Epoch 1 | Step 5620 | Loss: 1.1018 | LR: 2.00e-05 [2026-04-21 21:35:31] Epoch 1 | Step 5630 | Loss: 1.1020 | LR: 2.00e-05 [2026-04-21 21:35:37] Epoch 1 | Step 5640 | Loss: 1.1020 | LR: 2.00e-05 [2026-04-21 21:35:42] Epoch 1 | Step 5650 | Loss: 1.1019 | LR: 2.00e-05 [2026-04-21 21:35:48] Epoch 1 | Step 5660 | Loss: 1.1018 | LR: 2.00e-05 [2026-04-21 21:35:53] Epoch 1 | Step 5670 | Loss: 1.1019 | LR: 2.00e-05 [2026-04-21 21:35:59] Epoch 1 | Step 5680 | Loss: 1.1022 | LR: 2.00e-05 [2026-04-21 21:36:04] Epoch 1 | Step 5690 | Loss: 1.1020 | LR: 2.00e-05 [2026-04-21 21:36:10] Epoch 1 | Step 5700 | Loss: 1.1019 | LR: 2.00e-05 [2026-04-21 21:36:15] Epoch 1 | Step 5710 | Loss: 1.1016 | LR: 2.00e-05 [2026-04-21 21:36:20] Epoch 1 | Step 5720 | Loss: 1.1015 | LR: 2.00e-05 [2026-04-21 21:36:25] Epoch 1 | Step 5730 | Loss: 1.1012 | LR: 2.00e-05 [2026-04-21 21:36:31] Epoch 1 | Step 5740 | Loss: 1.1011 | LR: 2.00e-05 [2026-04-21 21:36:36] Epoch 1 | Step 5750 | Loss: 1.1014 | LR: 2.00e-05 [2026-04-21 21:36:41] Epoch 1 | Step 5760 | Loss: 1.1013 | LR: 2.00e-05 [2026-04-21 21:36:47] Epoch 1 | Step 5770 | Loss: 1.1012 | LR: 2.00e-05 [2026-04-21 21:36:52] Epoch 1 | Step 5780 | Loss: 1.1013 | LR: 2.00e-05 [2026-04-21 21:36:57] Epoch 1 | Step 5790 | Loss: 1.1012 | LR: 2.00e-05 [2026-04-21 21:37:02] Epoch 1 | Step 5800 | Loss: 1.1010 | LR: 2.00e-05 [2026-04-21 21:37:08] Epoch 1 | Step 5810 | Loss: 1.1006 | LR: 2.00e-05 [2026-04-21 21:37:13] Epoch 1 | Step 5820 | Loss: 1.1004 | LR: 2.00e-05 [2026-04-21 21:37:19] Epoch 1 | Step 5830 | Loss: 1.1003 | LR: 
2.00e-05 [2026-04-21 21:37:25] Epoch 1 | Step 5840 | Loss: 1.1001 | LR: 2.00e-05 [2026-04-21 21:37:30] Epoch 1 | Step 5850 | Loss: 1.1003 | LR: 2.00e-05 [2026-04-21 21:37:35] Epoch 1 | Step 5860 | Loss: 1.1003 | LR: 2.00e-05 [2026-04-21 21:37:40] Epoch 1 | Step 5870 | Loss: 1.0999 | LR: 2.00e-05 [2026-04-21 21:37:45] Epoch 1 | Step 5880 | Loss: 1.0998 | LR: 2.00e-05 [2026-04-21 21:37:51] Epoch 1 | Step 5890 | Loss: 1.0997 | LR: 2.00e-05 [2026-04-21 21:37:56] Epoch 1 | Step 5900 | Loss: 1.0996 | LR: 2.00e-05 [2026-04-21 21:38:02] Epoch 1 | Step 5910 | Loss: 1.0995 | LR: 2.00e-05 [2026-04-21 21:38:07] Epoch 1 | Step 5920 | Loss: 1.0991 | LR: 2.00e-05 [2026-04-21 21:38:13] Epoch 1 | Step 5930 | Loss: 1.0990 | LR: 2.00e-05 [2026-04-21 21:38:18] Epoch 1 | Step 5940 | Loss: 1.0991 | LR: 2.00e-05 [2026-04-21 21:38:23] Epoch 1 | Step 5950 | Loss: 1.0992 | LR: 2.00e-05 [2026-04-21 21:38:28] Epoch 1 | Step 5960 | Loss: 1.0992 | LR: 2.00e-05 [2026-04-21 21:38:33] Epoch 1 | Step 5970 | Loss: 1.0991 | LR: 2.00e-05 [2026-04-21 21:38:38] Epoch 1 | Step 5980 | Loss: 1.0992 | LR: 2.00e-05 [2026-04-21 21:38:43] Epoch 1 | Step 5990 | Loss: 1.0991 | LR: 2.00e-05 [2026-04-21 21:38:48] Epoch 1 | Step 6000 | Loss: 1.0990 | LR: 2.00e-05 [2026-04-21 21:38:59] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_6000.pt [2026-04-21 21:40:15] Validation | Batch 10/1567 | Loss: 1.0678 [2026-04-21 21:40:16] Validation | Batch 20/1567 | Loss: 1.1421 [2026-04-21 21:40:17] Validation | Batch 30/1567 | Loss: 1.1067 [2026-04-21 21:40:19] Validation | Batch 40/1567 | Loss: 1.1265 [2026-04-21 21:40:19] Validation | Batch 50/1567 | Loss: 1.1097 [2026-04-21 21:40:21] Validation | Batch 60/1567 | Loss: 1.0983 [2026-04-21 21:40:22] Validation | Batch 70/1567 | Loss: 1.0880 [2026-04-21 21:40:24] Validation | Batch 80/1567 | Loss: 1.0982 [2026-04-21 21:40:25] Validation | Batch 90/1567 | Loss: 1.0889 [2026-04-21 21:40:26] Validation | Batch 100/1567 | Loss: 1.0685 [2026-04-21 21:40:27] 
Validation | Batch 110/1567 | Loss: 1.0598 [2026-04-21 21:40:28] Validation | Batch 120/1567 | Loss: 1.0532 [2026-04-21 21:40:30] Validation | Batch 130/1567 | Loss: 1.0491 [2026-04-21 21:40:31] Validation | Batch 140/1567 | Loss: 1.0594 [2026-04-21 21:40:32] Validation | Batch 150/1567 | Loss: 1.0707 [2026-04-21 21:40:33] Validation | Batch 160/1567 | Loss: 1.0688 [2026-04-21 21:40:34] Validation | Batch 170/1567 | Loss: 1.0608 [2026-04-21 21:40:35] Validation | Batch 180/1567 | Loss: 1.0632 [2026-04-21 21:40:36] Validation | Batch 190/1567 | Loss: 1.0684 [2026-04-21 21:40:38] Validation | Batch 200/1567 | Loss: 1.0725 [2026-04-21 21:40:39] Validation | Batch 210/1567 | Loss: 1.0718 [2026-04-21 21:40:40] Validation | Batch 220/1567 | Loss: 1.0761 [2026-04-21 21:40:42] Validation | Batch 230/1567 | Loss: 1.0793 [2026-04-21 21:40:43] Validation | Batch 240/1567 | Loss: 1.0833 [2026-04-21 21:40:44] Validation | Batch 250/1567 | Loss: 1.0875 [2026-04-21 21:40:45] Validation | Batch 260/1567 | Loss: 1.0899 [2026-04-21 21:40:46] Validation | Batch 270/1567 | Loss: 1.0945 [2026-04-21 21:40:48] Validation | Batch 280/1567 | Loss: 1.0978 [2026-04-21 21:40:50] Validation | Batch 290/1567 | Loss: 1.0934 [2026-04-21 21:40:51] Validation | Batch 300/1567 | Loss: 1.0922 [2026-04-21 21:40:52] Validation | Batch 310/1567 | Loss: 1.0892 [2026-04-21 21:40:53] Validation | Batch 320/1567 | Loss: 1.0918 [2026-04-21 21:40:54] Validation | Batch 330/1567 | Loss: 1.0912 [2026-04-21 21:40:56] Validation | Batch 340/1567 | Loss: 1.0895 [2026-04-21 21:40:57] Validation | Batch 350/1567 | Loss: 1.0870 [2026-04-21 21:40:58] Validation | Batch 360/1567 | Loss: 1.0812 [2026-04-21 21:40:59] Validation | Batch 370/1567 | Loss: 1.0817 [2026-04-21 21:41:01] Validation | Batch 380/1567 | Loss: 1.0858 [2026-04-21 21:41:02] Validation | Batch 390/1567 | Loss: 1.0850 [2026-04-21 21:41:03] Validation | Batch 400/1567 | Loss: 1.0860 [2026-04-21 21:41:04] Validation | Batch 410/1567 | Loss: 1.0819 
[2026-04-21 21:41:05] Validation | Batch 420/1567 | Loss: 1.0808 [2026-04-21 21:41:07] Validation | Batch 430/1567 | Loss: 1.0839 [2026-04-21 21:41:08] Validation | Batch 440/1567 | Loss: 1.0846 [2026-04-21 21:41:09] Validation | Batch 450/1567 | Loss: 1.0873 [2026-04-21 21:41:10] Validation | Batch 460/1567 | Loss: 1.0906 [2026-04-21 21:41:11] Validation | Batch 470/1567 | Loss: 1.0953 [2026-04-21 21:41:12] Validation | Batch 480/1567 | Loss: 1.0930 [2026-04-21 21:41:14] Validation | Batch 490/1567 | Loss: 1.0906 [2026-04-21 21:41:14] Validation | Batch 500/1567 | Loss: 1.0918 [2026-04-21 21:41:16] Validation | Batch 510/1567 | Loss: 1.0916 [2026-04-21 21:41:17] Validation | Batch 520/1567 | Loss: 1.0930 [2026-04-21 21:41:18] Validation | Batch 530/1567 | Loss: 1.0913 [2026-04-21 21:41:19] Validation | Batch 540/1567 | Loss: 1.0884 [2026-04-21 21:41:21] Validation | Batch 550/1567 | Loss: 1.0897 [2026-04-21 21:41:22] Validation | Batch 560/1567 | Loss: 1.0890 [2026-04-21 21:41:23] Validation | Batch 570/1567 | Loss: 1.0850 [2026-04-21 21:41:25] Validation | Batch 580/1567 | Loss: 1.0867 [2026-04-21 21:41:26] Validation | Batch 590/1567 | Loss: 1.0860 [2026-04-21 21:41:27] Validation | Batch 600/1567 | Loss: 1.0847 [2026-04-21 21:41:28] Validation | Batch 610/1567 | Loss: 1.0864 [2026-04-21 21:41:30] Validation | Batch 620/1567 | Loss: 1.0842 [2026-04-21 21:41:31] Validation | Batch 630/1567 | Loss: 1.0842 [2026-04-21 21:41:32] Validation | Batch 640/1567 | Loss: 1.0849 [2026-04-21 21:41:34] Validation | Batch 650/1567 | Loss: 1.0880 [2026-04-21 21:41:35] Validation | Batch 660/1567 | Loss: 1.0890 [2026-04-21 21:41:36] Validation | Batch 670/1567 | Loss: 1.0874 [2026-04-21 21:41:37] Validation | Batch 680/1567 | Loss: 1.0861 [2026-04-21 21:41:38] Validation | Batch 690/1567 | Loss: 1.0846 [2026-04-21 21:41:39] Validation | Batch 700/1567 | Loss: 1.0844 [2026-04-21 21:41:41] Validation | Batch 710/1567 | Loss: 1.0835 [2026-04-21 21:41:42] Validation | Batch 720/1567 
| Loss: 1.0802 [2026-04-21 21:41:43] Validation | Batch 730/1567 | Loss: 1.0809 [2026-04-21 21:41:44] Validation | Batch 740/1567 | Loss: 1.0814 [2026-04-21 21:41:45] Validation | Batch 750/1567 | Loss: 1.0811 [2026-04-21 21:41:46] Validation | Batch 760/1567 | Loss: 1.0828 [2026-04-21 21:41:48] Validation | Batch 770/1567 | Loss: 1.0825 [2026-04-21 21:41:49] Validation | Batch 780/1567 | Loss: 1.0836 [2026-04-21 21:41:50] Validation | Batch 790/1567 | Loss: 1.0820 [2026-04-21 21:41:51] Validation | Batch 800/1567 | Loss: 1.0800 [2026-04-21 21:41:52] Validation | Batch 810/1567 | Loss: 1.0805 [2026-04-21 21:41:53] Validation | Batch 820/1567 | Loss: 1.0800 [2026-04-21 21:41:54] Validation | Batch 830/1567 | Loss: 1.0792 [2026-04-21 21:41:55] Validation | Batch 840/1567 | Loss: 1.0795 [2026-04-21 21:41:56] Validation | Batch 850/1567 | Loss: 1.0803 [2026-04-21 21:41:57] Validation | Batch 860/1567 | Loss: 1.0810 [2026-04-21 21:41:58] Validation | Batch 870/1567 | Loss: 1.0816 [2026-04-21 21:41:59] Validation | Batch 880/1567 | Loss: 1.0817 [2026-04-21 21:42:01] Validation | Batch 890/1567 | Loss: 1.0815 [2026-04-21 21:42:02] Validation | Batch 900/1567 | Loss: 1.0811 [2026-04-21 21:42:03] Validation | Batch 910/1567 | Loss: 1.0807 [2026-04-21 21:42:04] Validation | Batch 920/1567 | Loss: 1.0824 [2026-04-21 21:42:05] Validation | Batch 930/1567 | Loss: 1.0822 [2026-04-21 21:42:06] Validation | Batch 940/1567 | Loss: 1.0822 [2026-04-21 21:42:08] Validation | Batch 950/1567 | Loss: 1.0815 [2026-04-21 21:42:08] Validation | Batch 960/1567 | Loss: 1.0822 [2026-04-21 21:42:10] Validation | Batch 970/1567 | Loss: 1.0829 [2026-04-21 21:42:11] Validation | Batch 980/1567 | Loss: 1.0827 [2026-04-21 21:42:11] Validation | Batch 990/1567 | Loss: 1.0836 [2026-04-21 21:42:13] Validation | Batch 1000/1567 | Loss: 1.0843 [2026-04-21 21:42:14] Validation | Batch 1010/1567 | Loss: 1.0833 [2026-04-21 21:42:15] Validation | Batch 1020/1567 | Loss: 1.0844 [2026-04-21 21:42:16] 
Validation | Batch 1030/1567 | Loss: 1.0849 [2026-04-21 21:42:18] Validation | Batch 1040/1567 | Loss: 1.0840 [2026-04-21 21:42:19] Validation | Batch 1050/1567 | Loss: 1.0832 [2026-04-21 21:42:20] Validation | Batch 1060/1567 | Loss: 1.0843 [2026-04-21 21:42:21] Validation | Batch 1070/1567 | Loss: 1.0844 [2026-04-21 21:42:23] Validation | Batch 1080/1567 | Loss: 1.0859 [2026-04-21 21:42:24] Validation | Batch 1090/1567 | Loss: 1.0887 [2026-04-21 21:42:25] Validation | Batch 1100/1567 | Loss: 1.0902 [2026-04-21 21:42:26] Validation | Batch 1110/1567 | Loss: 1.0893 [2026-04-21 21:42:27] Validation | Batch 1120/1567 | Loss: 1.0895 [2026-04-21 21:42:28] Validation | Batch 1130/1567 | Loss: 1.0878 [2026-04-21 21:42:29] Validation | Batch 1140/1567 | Loss: 1.0882 [2026-04-21 21:42:31] Validation | Batch 1150/1567 | Loss: 1.0870 [2026-04-21 21:42:32] Validation | Batch 1160/1567 | Loss: 1.0864 [2026-04-21 21:42:33] Validation | Batch 1170/1567 | Loss: 1.0864 [2026-04-21 21:42:34] Validation | Batch 1180/1567 | Loss: 1.0867 [2026-04-21 21:42:35] Validation | Batch 1190/1567 | Loss: 1.0870 [2026-04-21 21:42:36] Validation | Batch 1200/1567 | Loss: 1.0858 [2026-04-21 21:42:38] Validation | Batch 1210/1567 | Loss: 1.0849 [2026-04-21 21:42:39] Validation | Batch 1220/1567 | Loss: 1.0858 [2026-04-21 21:42:40] Validation | Batch 1230/1567 | Loss: 1.0864 [2026-04-21 21:42:41] Validation | Batch 1240/1567 | Loss: 1.0863 [2026-04-21 21:42:42] Validation | Batch 1250/1567 | Loss: 1.0865 [2026-04-21 21:42:44] Validation | Batch 1260/1567 | Loss: 1.0862 [2026-04-21 21:42:45] Validation | Batch 1270/1567 | Loss: 1.0846 [2026-04-21 21:42:46] Validation | Batch 1280/1567 | Loss: 1.0847 [2026-04-21 21:42:48] Validation | Batch 1290/1567 | Loss: 1.0847 [2026-04-21 21:42:49] Validation | Batch 1300/1567 | Loss: 1.0851 [2026-04-21 21:42:50] Validation | Batch 1310/1567 | Loss: 1.0858 [2026-04-21 21:42:51] Validation | Batch 1320/1567 | Loss: 1.0863 [2026-04-21 21:42:52] Validation | Batch 
1330/1567 | Loss: 1.0877 [2026-04-21 21:42:53] Validation | Batch 1340/1567 | Loss: 1.0873 [2026-04-21 21:42:54] Validation | Batch 1350/1567 | Loss: 1.0877 [2026-04-21 21:42:55] Validation | Batch 1360/1567 | Loss: 1.0869 [2026-04-21 21:42:57] Validation | Batch 1370/1567 | Loss: 1.0865 [2026-04-21 21:42:58] Validation | Batch 1380/1567 | Loss: 1.0866 [2026-04-21 21:42:59] Validation | Batch 1390/1567 | Loss: 1.0859 [2026-04-21 21:43:00] Validation | Batch 1400/1567 | Loss: 1.0856 [2026-04-21 21:43:01] Validation | Batch 1410/1567 | Loss: 1.0860 [2026-04-21 21:43:02] Validation | Batch 1420/1567 | Loss: 1.0858 [2026-04-21 21:43:03] Validation | Batch 1430/1567 | Loss: 1.0861 [2026-04-21 21:43:05] Validation | Batch 1440/1567 | Loss: 1.0867 [2026-04-21 21:43:06] Validation | Batch 1450/1567 | Loss: 1.0869 [2026-04-21 21:43:06] Validation | Batch 1460/1567 | Loss: 1.0864 [2026-04-21 21:43:07] Validation | Batch 1470/1567 | Loss: 1.0861 [2026-04-21 21:43:09] Validation | Batch 1480/1567 | Loss: 1.0857 [2026-04-21 21:43:09] Validation | Batch 1490/1567 | Loss: 1.0850 [2026-04-21 21:43:11] Validation | Batch 1500/1567 | Loss: 1.0849 [2026-04-21 21:43:12] Validation | Batch 1510/1567 | Loss: 1.0839 [2026-04-21 21:43:13] Validation | Batch 1520/1567 | Loss: 1.0838 [2026-04-21 21:43:13] Validation | Batch 1530/1567 | Loss: 1.0837 [2026-04-21 21:43:15] Validation | Batch 1540/1567 | Loss: 1.0843 [2026-04-21 21:43:16] Validation | Batch 1550/1567 | Loss: 1.0854 [2026-04-21 21:43:17] Validation | Batch 1560/1567 | Loss: 1.0851 [2026-04-21 21:43:18] Validation | Batch 1567/1567 | Loss: 1.0852 [2026-04-21 21:43:18] Validation | Loss: 1.0852 | PPL: 3.02 | Time: 184.75s [2026-04-21 21:43:36] New best model saved! 
Val loss: 1.0852 [2026-04-21 21:43:40] Epoch 1 | Step 6010 | Loss: 1.0990 | LR: 2.00e-05 [2026-04-21 21:43:45] Epoch 1 | Step 6020 | Loss: 1.0988 | LR: 2.00e-05 [2026-04-21 21:43:50] Epoch 1 | Step 6030 | Loss: 1.0989 | LR: 2.00e-05 [2026-04-21 21:43:57] Epoch 1 | Step 6040 | Loss: 1.0988 | LR: 2.00e-05 [2026-04-21 21:44:02] Epoch 1 | Step 6050 | Loss: 1.0988 | LR: 2.00e-05 [2026-04-21 21:44:07] Epoch 1 | Step 6060 | Loss: 1.0988 | LR: 2.00e-05 [2026-04-21 21:44:12] Epoch 1 | Step 6070 | Loss: 1.0988 | LR: 2.00e-05 [2026-04-21 21:44:18] Epoch 1 | Step 6080 | Loss: 1.0986 | LR: 2.00e-05 [2026-04-21 21:44:23] Epoch 1 | Step 6090 | Loss: 1.0985 | LR: 2.00e-05 [2026-04-21 21:44:28] Epoch 1 | Step 6100 | Loss: 1.0982 | LR: 2.00e-05 [2026-04-21 21:44:32] Epoch 1 | Step 6110 | Loss: 1.0983 | LR: 2.00e-05 [2026-04-21 21:44:38] Epoch 1 | Step 6120 | Loss: 1.0981 | LR: 2.00e-05 [2026-04-21 21:44:44] Epoch 1 | Step 6130 | Loss: 1.0982 | LR: 2.00e-05 [2026-04-21 21:44:49] Epoch 1 | Step 6140 | Loss: 1.0980 | LR: 2.00e-05 [2026-04-21 21:44:55] Epoch 1 | Step 6150 | Loss: 1.0977 | LR: 2.00e-05 [2026-04-21 21:45:01] Epoch 1 | Step 6160 | Loss: 1.0976 | LR: 2.00e-05 [2026-04-21 21:45:06] Epoch 1 | Step 6170 | Loss: 1.0977 | LR: 2.00e-05 [2026-04-21 21:45:11] Epoch 1 | Step 6180 | Loss: 1.0978 | LR: 2.00e-05 [2026-04-21 21:45:17] Epoch 1 | Step 6190 | Loss: 1.0977 | LR: 2.00e-05 [2026-04-21 21:45:22] Epoch 1 | Step 6200 | Loss: 1.0975 | LR: 2.00e-05 [2026-04-21 21:45:27] Epoch 1 | Step 6210 | Loss: 1.0974 | LR: 2.00e-05 [2026-04-21 21:45:32] Epoch 1 | Step 6220 | Loss: 1.0974 | LR: 2.00e-05 [2026-04-21 21:45:37] Epoch 1 | Step 6230 | Loss: 1.0975 | LR: 2.00e-05 [2026-04-21 21:45:42] Epoch 1 | Step 6240 | Loss: 1.0978 | LR: 2.00e-05 [2026-04-21 21:45:47] Epoch 1 | Step 6250 | Loss: 1.0975 | LR: 2.00e-05 [2026-04-21 21:45:52] Epoch 1 | Step 6260 | Loss: 1.0974 | LR: 2.00e-05 [2026-04-21 21:45:58] Epoch 1 | Step 6270 | Loss: 1.0972 | LR: 2.00e-05 [2026-04-21 21:46:04] Epoch 1 | Step 
6280 | Loss: 1.0972 | LR: 2.00e-05 [2026-04-21 21:46:09] Epoch 1 | Step 6290 | Loss: 1.0973 | LR: 2.00e-05 [2026-04-21 21:46:14] Epoch 1 | Step 6300 | Loss: 1.0973 | LR: 2.00e-05 [2026-04-21 21:46:19] Epoch 1 | Step 6310 | Loss: 1.0972 | LR: 2.00e-05 [2026-04-21 21:46:25] Epoch 1 | Step 6320 | Loss: 1.0973 | LR: 2.00e-05 [2026-04-21 21:46:30] Epoch 1 | Step 6330 | Loss: 1.0974 | LR: 2.00e-05 [2026-04-21 21:46:35] Epoch 1 | Step 6340 | Loss: 1.0974 | LR: 2.00e-05 [2026-04-21 21:46:40] Epoch 1 | Step 6350 | Loss: 1.0975 | LR: 2.00e-05 [2026-04-21 21:46:46] Epoch 1 | Step 6360 | Loss: 1.0974 | LR: 2.00e-05 [2026-04-21 21:46:50] Epoch 1 | Step 6370 | Loss: 1.0974 | LR: 2.00e-05 [2026-04-21 21:46:55] Epoch 1 | Step 6380 | Loss: 1.0973 | LR: 2.00e-05 [2026-04-21 21:47:00] Epoch 1 | Step 6390 | Loss: 1.0973 | LR: 2.00e-05 [2026-04-21 21:47:05] Epoch 1 | Step 6400 | Loss: 1.0972 | LR: 2.00e-05 [2026-04-21 21:47:11] Epoch 1 | Step 6410 | Loss: 1.0971 | LR: 2.00e-05 [2026-04-21 21:47:16] Epoch 1 | Step 6420 | Loss: 1.0969 | LR: 2.00e-05 [2026-04-21 21:47:22] Epoch 1 | Step 6430 | Loss: 1.0966 | LR: 2.00e-05 [2026-04-21 21:47:27] Epoch 1 | Step 6440 | Loss: 1.0965 | LR: 2.00e-05 [2026-04-21 21:47:32] Epoch 1 | Step 6450 | Loss: 1.0965 | LR: 2.00e-05 [2026-04-21 21:47:38] Epoch 1 | Step 6460 | Loss: 1.0966 | LR: 2.00e-05 [2026-04-21 21:47:44] Epoch 1 | Step 6470 | Loss: 1.0965 | LR: 2.00e-05 [2026-04-21 21:47:50] Epoch 1 | Step 6480 | Loss: 1.0965 | LR: 2.00e-05 [2026-04-21 21:47:55] Epoch 1 | Step 6490 | Loss: 1.0965 | LR: 2.00e-05 [2026-04-21 21:48:00] Epoch 1 | Step 6500 | Loss: 1.0965 | LR: 2.00e-05 [2026-04-21 21:48:05] Epoch 1 | Step 6510 | Loss: 1.0964 | LR: 2.00e-05 [2026-04-21 21:48:10] Epoch 1 | Step 6520 | Loss: 1.0964 | LR: 2.00e-05 [2026-04-21 21:48:15] Epoch 1 | Step 6530 | Loss: 1.0964 | LR: 2.00e-05 [2026-04-21 21:48:21] Epoch 1 | Step 6540 | Loss: 1.0964 | LR: 2.00e-05 [2026-04-21 21:48:26] Epoch 1 | Step 6550 | Loss: 1.0962 | LR: 2.00e-05 [2026-04-21 
21:48:31] Epoch 1 | Step 6560 | Loss: 1.0963 | LR: 2.00e-05 [2026-04-21 21:48:37] Epoch 1 | Step 6570 | Loss: 1.0964 | LR: 2.00e-05 [2026-04-21 21:48:42] Epoch 1 | Step 6580 | Loss: 1.0965 | LR: 2.00e-05 [2026-04-21 21:48:49] Epoch 1 | Step 6590 | Loss: 1.0965 | LR: 2.00e-05 [2026-04-21 21:48:54] Epoch 1 | Step 6600 | Loss: 1.0963 | LR: 2.00e-05 [2026-04-21 21:48:59] Epoch 1 | Step 6610 | Loss: 1.0964 | LR: 2.00e-05 [2026-04-21 21:49:05] Epoch 1 | Step 6620 | Loss: 1.0965 | LR: 2.00e-05 [2026-04-21 21:49:10] Epoch 1 | Step 6630 | Loss: 1.0964 | LR: 2.00e-05 [2026-04-21 21:49:16] Epoch 1 | Step 6640 | Loss: 1.0964 | LR: 2.00e-05 [2026-04-21 21:49:21] Epoch 1 | Step 6650 | Loss: 1.0963 | LR: 2.00e-05 [2026-04-21 21:49:27] Epoch 1 | Step 6660 | Loss: 1.0963 | LR: 2.00e-05 [2026-04-21 21:49:32] Epoch 1 | Step 6670 | Loss: 1.0963 | LR: 2.00e-05 [2026-04-21 21:49:37] Epoch 1 | Step 6680 | Loss: 1.0961 | LR: 2.00e-05 [2026-04-21 21:49:42] Epoch 1 | Step 6690 | Loss: 1.0961 | LR: 2.00e-05 [2026-04-21 21:49:47] Epoch 1 | Step 6700 | Loss: 1.0959 | LR: 2.00e-05 [2026-04-21 21:49:52] Epoch 1 | Step 6710 | Loss: 1.0958 | LR: 2.00e-05 [2026-04-21 21:49:57] Epoch 1 | Step 6720 | Loss: 1.0957 | LR: 2.00e-05 [2026-04-21 21:50:03] Epoch 1 | Step 6730 | Loss: 1.0956 | LR: 2.00e-05 [2026-04-21 21:50:08] Epoch 1 | Step 6740 | Loss: 1.0957 | LR: 2.00e-05 [2026-04-21 21:50:14] Epoch 1 | Step 6750 | Loss: 1.0956 | LR: 2.00e-05 [2026-04-21 21:50:20] Epoch 1 | Step 6760 | Loss: 1.0954 | LR: 2.00e-05 [2026-04-21 21:50:25] Epoch 1 | Step 6770 | Loss: 1.0955 | LR: 2.00e-05 [2026-04-21 21:50:31] Epoch 1 | Step 6780 | Loss: 1.0955 | LR: 2.00e-05 [2026-04-21 21:50:36] Epoch 1 | Step 6790 | Loss: 1.0953 | LR: 2.00e-05 [2026-04-21 21:50:41] Epoch 1 | Step 6800 | Loss: 1.0952 | LR: 2.00e-05 [2026-04-21 21:50:46] Epoch 1 | Step 6810 | Loss: 1.0952 | LR: 2.00e-05 [2026-04-21 21:50:51] Epoch 1 | Step 6820 | Loss: 1.0951 | LR: 2.00e-05 [2026-04-21 21:50:56] Epoch 1 | Step 6830 | Loss: 1.0951 | LR: 
2.00e-05 [2026-04-21 21:51:00] Epoch 1 | Step 6840 | Loss: 1.0951 | LR: 2.00e-05 [2026-04-21 21:51:07] Epoch 1 | Step 6850 | Loss: 1.0951 | LR: 2.00e-05 [2026-04-21 21:51:12] Epoch 1 | Step 6860 | Loss: 1.0951 | LR: 2.00e-05 [2026-04-21 21:51:18] Epoch 1 | Step 6870 | Loss: 1.0951 | LR: 2.00e-05 [2026-04-21 21:51:23] Epoch 1 | Step 6880 | Loss: 1.0950 | LR: 2.00e-05 [2026-04-21 21:51:28] Epoch 1 | Step 6890 | Loss: 1.0951 | LR: 2.00e-05 [2026-04-21 21:51:34] Epoch 1 | Step 6900 | Loss: 1.0949 | LR: 2.00e-05 [2026-04-21 21:51:40] Epoch 1 | Step 6910 | Loss: 1.0950 | LR: 2.00e-05 [2026-04-21 21:51:45] Epoch 1 | Step 6920 | Loss: 1.0951 | LR: 2.00e-05 [2026-04-21 21:51:50] Epoch 1 | Step 6930 | Loss: 1.0949 | LR: 2.00e-05 [2026-04-21 21:51:56] Epoch 1 | Step 6940 | Loss: 1.0949 | LR: 2.00e-05 [2026-04-21 21:52:00] Epoch 1 | Step 6950 | Loss: 1.0948 | LR: 2.00e-05 [2026-04-21 21:52:05] Epoch 1 | Step 6960 | Loss: 1.0948 | LR: 2.00e-05 [2026-04-21 21:52:11] Epoch 1 | Step 6970 | Loss: 1.0945 | LR: 2.00e-05 [2026-04-21 21:52:16] Epoch 1 | Step 6980 | Loss: 1.0944 | LR: 2.00e-05 [2026-04-21 21:52:21] Epoch 1 | Step 6990 | Loss: 1.0945 | LR: 2.00e-05 [2026-04-21 21:52:26] Epoch 1 | Step 7000 | Loss: 1.0944 | LR: 2.00e-05 [2026-04-21 21:52:27] Validation | Batch 10/1567 | Loss: 1.0685 [2026-04-21 21:52:29] Validation | Batch 20/1567 | Loss: 1.1402 [2026-04-21 21:52:30] Validation | Batch 30/1567 | Loss: 1.1033 [2026-04-21 21:52:31] Validation | Batch 40/1567 | Loss: 1.1202 [2026-04-21 21:52:32] Validation | Batch 50/1567 | Loss: 1.1009 [2026-04-21 21:52:33] Validation | Batch 60/1567 | Loss: 1.0914 [2026-04-21 21:52:35] Validation | Batch 70/1567 | Loss: 1.0796 [2026-04-21 21:52:37] Validation | Batch 80/1567 | Loss: 1.0906 [2026-04-21 21:52:38] Validation | Batch 90/1567 | Loss: 1.0843 [2026-04-21 21:52:39] Validation | Batch 100/1567 | Loss: 1.0646 [2026-04-21 21:52:40] Validation | Batch 110/1567 | Loss: 1.0559 [2026-04-21 21:52:41] Validation | Batch 120/1567 | Loss: 
1.0497 [2026-04-21 21:52:43] Validation | Batch 130/1567 | Loss: 1.0450 [2026-04-21 21:52:44] Validation | Batch 140/1567 | Loss: 1.0551 [2026-04-21 21:52:45] Validation | Batch 150/1567 | Loss: 1.0636 [2026-04-21 21:52:46] Validation | Batch 160/1567 | Loss: 1.0626 [2026-04-21 21:52:47] Validation | Batch 170/1567 | Loss: 1.0557 [2026-04-21 21:52:48] Validation | Batch 180/1567 | Loss: 1.0579 [2026-04-21 21:52:49] Validation | Batch 190/1567 | Loss: 1.0623 [2026-04-21 21:52:50] Validation | Batch 200/1567 | Loss: 1.0668 [2026-04-21 21:52:52] Validation | Batch 210/1567 | Loss: 1.0654 [2026-04-21 21:52:53] Validation | Batch 220/1567 | Loss: 1.0689 [2026-04-21 21:52:55] Validation | Batch 230/1567 | Loss: 1.0720 [2026-04-21 21:52:56] Validation | Batch 240/1567 | Loss: 1.0752 [2026-04-21 21:52:57] Validation | Batch 250/1567 | Loss: 1.0793 [2026-04-21 21:52:58] Validation | Batch 260/1567 | Loss: 1.0820 [2026-04-21 21:52:59] Validation | Batch 270/1567 | Loss: 1.0872 [2026-04-21 21:53:01] Validation | Batch 280/1567 | Loss: 1.0901 [2026-04-21 21:53:02] Validation | Batch 290/1567 | Loss: 1.0859 [2026-04-21 21:53:04] Validation | Batch 300/1567 | Loss: 1.0849 [2026-04-21 21:53:05] Validation | Batch 310/1567 | Loss: 1.0818 [2026-04-21 21:53:06] Validation | Batch 320/1567 | Loss: 1.0842 [2026-04-21 21:53:07] Validation | Batch 330/1567 | Loss: 1.0841 [2026-04-21 21:53:09] Validation | Batch 340/1567 | Loss: 1.0834 [2026-04-21 21:53:10] Validation | Batch 350/1567 | Loss: 1.0812 [2026-04-21 21:53:11] Validation | Batch 360/1567 | Loss: 1.0752 [2026-04-21 21:53:12] Validation | Batch 370/1567 | Loss: 1.0751 [2026-04-21 21:53:13] Validation | Batch 380/1567 | Loss: 1.0787 [2026-04-21 21:53:15] Validation | Batch 390/1567 | Loss: 1.0779 [2026-04-21 21:53:16] Validation | Batch 400/1567 | Loss: 1.0788 [2026-04-21 21:53:17] Validation | Batch 410/1567 | Loss: 1.0751 [2026-04-21 21:53:18] Validation | Batch 420/1567 | Loss: 1.0734 [2026-04-21 21:53:19] Validation | Batch 
430/1567 | Loss: 1.0768 [2026-04-21 21:53:21] Validation | Batch 440/1567 | Loss: 1.0771 [2026-04-21 21:53:22] Validation | Batch 450/1567 | Loss: 1.0797 [2026-04-21 21:53:23] Validation | Batch 460/1567 | Loss: 1.0829 [2026-04-21 21:53:24] Validation | Batch 470/1567 | Loss: 1.0879 [2026-04-21 21:53:25] Validation | Batch 480/1567 | Loss: 1.0853 [2026-04-21 21:53:26] Validation | Batch 490/1567 | Loss: 1.0829 [2026-04-21 21:53:27] Validation | Batch 500/1567 | Loss: 1.0838 [2026-04-21 21:53:29] Validation | Batch 510/1567 | Loss: 1.0838 [2026-04-21 21:53:29] Validation | Batch 520/1567 | Loss: 1.0850 [2026-04-21 21:53:31] Validation | Batch 530/1567 | Loss: 1.0836 [2026-04-21 21:53:32] Validation | Batch 540/1567 | Loss: 1.0806 [2026-04-21 21:53:34] Validation | Batch 550/1567 | Loss: 1.0820 [2026-04-21 21:53:35] Validation | Batch 560/1567 | Loss: 1.0810 [2026-04-21 21:53:36] Validation | Batch 570/1567 | Loss: 1.0771 [2026-04-21 21:53:37] Validation | Batch 580/1567 | Loss: 1.0787 [2026-04-21 21:53:39] Validation | Batch 590/1567 | Loss: 1.0784 [2026-04-21 21:53:40] Validation | Batch 600/1567 | Loss: 1.0773 [2026-04-21 21:53:41] Validation | Batch 610/1567 | Loss: 1.0795 [2026-04-21 21:53:42] Validation | Batch 620/1567 | Loss: 1.0770 [2026-04-21 21:53:44] Validation | Batch 630/1567 | Loss: 1.0770 [2026-04-21 21:53:45] Validation | Batch 640/1567 | Loss: 1.0777 [2026-04-21 21:53:47] Validation | Batch 650/1567 | Loss: 1.0806 [2026-04-21 21:53:48] Validation | Batch 660/1567 | Loss: 1.0820 [2026-04-21 21:53:49] Validation | Batch 670/1567 | Loss: 1.0805 [2026-04-21 21:53:50] Validation | Batch 680/1567 | Loss: 1.0792 [2026-04-21 21:53:51] Validation | Batch 690/1567 | Loss: 1.0776 [2026-04-21 21:53:52] Validation | Batch 700/1567 | Loss: 1.0776 [2026-04-21 21:53:54] Validation | Batch 710/1567 | Loss: 1.0768 [2026-04-21 21:53:55] Validation | Batch 720/1567 | Loss: 1.0738 [2026-04-21 21:53:56] Validation | Batch 730/1567 | Loss: 1.0744 [2026-04-21 21:53:56] 
Validation | Batch 740/1567 | Loss: 1.0750 [2026-04-21 21:53:58] Validation | Batch 750/1567 | Loss: 1.0745 [2026-04-21 21:53:59] Validation | Batch 760/1567 | Loss: 1.0760 [2026-04-21 21:54:00] Validation | Batch 770/1567 | Loss: 1.0757 [2026-04-21 21:54:02] Validation | Batch 780/1567 | Loss: 1.0769 [2026-04-21 21:54:03] Validation | Batch 790/1567 | Loss: 1.0753 [2026-04-21 21:54:04] Validation | Batch 800/1567 | Loss: 1.0735 [2026-04-21 21:54:05] Validation | Batch 810/1567 | Loss: 1.0741 [2026-04-21 21:54:06] Validation | Batch 820/1567 | Loss: 1.0736 [2026-04-21 21:54:07] Validation | Batch 830/1567 | Loss: 1.0727 [2026-04-21 21:54:08] Validation | Batch 840/1567 | Loss: 1.0731 [2026-04-21 21:54:09] Validation | Batch 850/1567 | Loss: 1.0741 [2026-04-21 21:54:10] Validation | Batch 860/1567 | Loss: 1.0747 [2026-04-21 21:54:11] Validation | Batch 870/1567 | Loss: 1.0752 [2026-04-21 21:54:12] Validation | Batch 880/1567 | Loss: 1.0752 [2026-04-21 21:54:13] Validation | Batch 890/1567 | Loss: 1.0749 [2026-04-21 21:54:15] Validation | Batch 900/1567 | Loss: 1.0746 [2026-04-21 21:54:16] Validation | Batch 910/1567 | Loss: 1.0741 [2026-04-21 21:54:17] Validation | Batch 920/1567 | Loss: 1.0758 [2026-04-21 21:54:18] Validation | Batch 930/1567 | Loss: 1.0756 [2026-04-21 21:54:19] Validation | Batch 940/1567 | Loss: 1.0754 [2026-04-21 21:54:20] Validation | Batch 950/1567 | Loss: 1.0746 [2026-04-21 21:54:21] Validation | Batch 960/1567 | Loss: 1.0752 [2026-04-21 21:54:22] Validation | Batch 970/1567 | Loss: 1.0756 [2026-04-21 21:54:23] Validation | Batch 980/1567 | Loss: 1.0753 [2026-04-21 21:54:24] Validation | Batch 990/1567 | Loss: 1.0762 [2026-04-21 21:54:25] Validation | Batch 1000/1567 | Loss: 1.0769 [2026-04-21 21:54:27] Validation | Batch 1010/1567 | Loss: 1.0759 [2026-04-21 21:54:28] Validation | Batch 1020/1567 | Loss: 1.0772 [2026-04-21 21:54:29] Validation | Batch 1030/1567 | Loss: 1.0777 [2026-04-21 21:54:30] Validation | Batch 1040/1567 | Loss: 1.0770 
[2026-04-21 21:54:31] Validation | Batch 1050/1567 | Loss: 1.0761 [2026-04-21 21:54:32] Validation | Batch 1060/1567 | Loss: 1.0772 [2026-04-21 21:54:34] Validation | Batch 1070/1567 | Loss: 1.0771 [2026-04-21 21:54:35] Validation | Batch 1080/1567 | Loss: 1.0786 [2026-04-21 21:54:36] Validation | Batch 1090/1567 | Loss: 1.0813 [2026-04-21 21:54:38] Validation | Batch 1100/1567 | Loss: 1.0828 [2026-04-21 21:54:39] Validation | Batch 1110/1567 | Loss: 1.0819 [2026-04-21 21:54:40] Validation | Batch 1120/1567 | Loss: 1.0822 [2026-04-21 21:54:41] Validation | Batch 1130/1567 | Loss: 1.0804 [2026-04-21 21:54:42] Validation | Batch 1140/1567 | Loss: 1.0809 [2026-04-21 21:54:43] Validation | Batch 1150/1567 | Loss: 1.0797 [2026-04-21 21:54:44] Validation | Batch 1160/1567 | Loss: 1.0790 [2026-04-21 21:54:45] Validation | Batch 1170/1567 | Loss: 1.0789 [2026-04-21 21:54:47] Validation | Batch 1180/1567 | Loss: 1.0792 [2026-04-21 21:54:48] Validation | Batch 1190/1567 | Loss: 1.0795 [2026-04-21 21:54:49] Validation | Batch 1200/1567 | Loss: 1.0783 [2026-04-21 21:54:50] Validation | Batch 1210/1567 | Loss: 1.0775 [2026-04-21 21:54:51] Validation | Batch 1220/1567 | Loss: 1.0785 [2026-04-21 21:54:53] Validation | Batch 1230/1567 | Loss: 1.0791 [2026-04-21 21:54:54] Validation | Batch 1240/1567 | Loss: 1.0790 [2026-04-21 21:54:55] Validation | Batch 1250/1567 | Loss: 1.0792 [2026-04-21 21:54:56] Validation | Batch 1260/1567 | Loss: 1.0790 [2026-04-21 21:54:58] Validation | Batch 1270/1567 | Loss: 1.0774 [2026-04-21 21:54:59] Validation | Batch 1280/1567 | Loss: 1.0775 [2026-04-21 21:55:00] Validation | Batch 1290/1567 | Loss: 1.0775 [2026-04-21 21:55:02] Validation | Batch 1300/1567 | Loss: 1.0780 [2026-04-21 21:55:03] Validation | Batch 1310/1567 | Loss: 1.0787 [2026-04-21 21:55:04] Validation | Batch 1320/1567 | Loss: 1.0793 [2026-04-21 21:55:05] Validation | Batch 1330/1567 | Loss: 1.0807 [2026-04-21 21:55:06] Validation | Batch 1340/1567 | Loss: 1.0803 [2026-04-21 
21:55:07] Validation | Batch 1350/1567 | Loss: 1.0806 [2026-04-21 21:55:08] Validation | Batch 1360/1567 | Loss: 1.0798 [2026-04-21 21:55:09] Validation | Batch 1370/1567 | Loss: 1.0794 [2026-04-21 21:55:11] Validation | Batch 1380/1567 | Loss: 1.0794 [2026-04-21 21:55:12] Validation | Batch 1390/1567 | Loss: 1.0786 [2026-04-21 21:55:13] Validation | Batch 1400/1567 | Loss: 1.0783 [2026-04-21 21:55:14] Validation | Batch 1410/1567 | Loss: 1.0787 [2026-04-21 21:55:15] Validation | Batch 1420/1567 | Loss: 1.0785 [2026-04-21 21:55:16] Validation | Batch 1430/1567 | Loss: 1.0788 [2026-04-21 21:55:17] Validation | Batch 1440/1567 | Loss: 1.0794 [2026-04-21 21:55:18] Validation | Batch 1450/1567 | Loss: 1.0794 [2026-04-21 21:55:19] Validation | Batch 1460/1567 | Loss: 1.0790 [2026-04-21 21:55:20] Validation | Batch 1470/1567 | Loss: 1.0788 [2026-04-21 21:55:21] Validation | Batch 1480/1567 | Loss: 1.0784 [2026-04-21 21:55:22] Validation | Batch 1490/1567 | Loss: 1.0778 [2026-04-21 21:55:23] Validation | Batch 1500/1567 | Loss: 1.0774 [2026-04-21 21:55:24] Validation | Batch 1510/1567 | Loss: 1.0765 [2026-04-21 21:55:25] Validation | Batch 1520/1567 | Loss: 1.0763 [2026-04-21 21:55:26] Validation | Batch 1530/1567 | Loss: 1.0762 [2026-04-21 21:55:28] Validation | Batch 1540/1567 | Loss: 1.0766 [2026-04-21 21:55:28] Validation | Batch 1550/1567 | Loss: 1.0777 [2026-04-21 21:55:30] Validation | Batch 1560/1567 | Loss: 1.0774 [2026-04-21 21:55:31] Validation | Batch 1567/1567 | Loss: 1.0773 [2026-04-21 21:55:31] Validation | Loss: 1.0773 | PPL: 2.99 | Time: 184.48s [2026-04-21 21:55:48] New best model saved! 
Val loss: 1.0773 [2026-04-21 21:55:53] Epoch 1 | Step 7010 | Loss: 1.0943 | LR: 2.00e-05 [2026-04-21 21:55:59] Epoch 1 | Step 7020 | Loss: 1.0945 | LR: 2.00e-05 [2026-04-21 21:56:05] Epoch 1 | Step 7030 | Loss: 1.0945 | LR: 2.00e-05 [2026-04-21 21:56:10] Epoch 1 | Step 7040 | Loss: 1.0946 | LR: 2.00e-05 [2026-04-21 21:56:15] Epoch 1 | Step 7050 | Loss: 1.0946 | LR: 2.00e-05 [2026-04-21 21:56:20] Epoch 1 | Step 7060 | Loss: 1.0944 | LR: 2.00e-05 [2026-04-21 21:56:25] Epoch 1 | Step 7070 | Loss: 1.0944 | LR: 2.00e-05 [2026-04-21 21:56:30] Epoch 1 | Step 7080 | Loss: 1.0945 | LR: 2.00e-05 [2026-04-21 21:56:36] Epoch 1 | Step 7090 | Loss: 1.0944 | LR: 2.00e-05 [2026-04-21 21:56:40] Epoch 1 | Step 7100 | Loss: 1.0942 | LR: 2.00e-05 [2026-04-21 21:56:45] Epoch 1 | Step 7110 | Loss: 1.0942 | LR: 2.00e-05 [2026-04-21 21:56:51] Epoch 1 | Step 7120 | Loss: 1.0941 | LR: 2.00e-05 [2026-04-21 21:56:56] Epoch 1 | Step 7130 | Loss: 1.0941 | LR: 2.00e-05 [2026-04-21 21:57:02] Epoch 1 | Step 7140 | Loss: 1.0940 | LR: 2.00e-05 [2026-04-21 21:57:07] Epoch 1 | Step 7150 | Loss: 1.0941 | LR: 2.00e-05 [2026-04-21 21:57:13] Epoch 1 | Step 7160 | Loss: 1.0939 | LR: 2.00e-05 [2026-04-21 21:57:17] Epoch 1 | Step 7170 | Loss: 1.0938 | LR: 2.00e-05 [2026-04-21 21:57:22] Epoch 1 | Step 7180 | Loss: 1.0937 | LR: 2.00e-05 [2026-04-21 21:57:28] Epoch 1 | Step 7190 | Loss: 1.0937 | LR: 2.00e-05 [2026-04-21 21:57:33] Epoch 1 | Step 7200 | Loss: 1.0933 | LR: 2.00e-05 [2026-04-21 21:57:38] Epoch 1 | Step 7210 | Loss: 1.0934 | LR: 2.00e-05 [2026-04-21 21:57:43] Epoch 1 | Step 7220 | Loss: 1.0932 | LR: 2.00e-05 [2026-04-21 21:57:49] Epoch 1 | Step 7230 | Loss: 1.0933 | LR: 2.00e-05 [2026-04-21 21:57:54] Epoch 1 | Step 7240 | Loss: 1.0934 | LR: 2.00e-05 [2026-04-21 21:57:59] Epoch 1 | Step 7250 | Loss: 1.0934 | LR: 2.00e-05 [2026-04-21 21:58:04] Epoch 1 | Step 7260 | Loss: 1.0934 | LR: 2.00e-05 [2026-04-21 21:58:10] Epoch 1 | Step 7270 | Loss: 1.0934 | LR: 2.00e-05 [2026-04-21 21:58:15] Epoch 1 | Step 
7280 | Loss: 1.0935 | LR: 2.00e-05 [2026-04-21 21:58:20] Epoch 1 | Step 7290 | Loss: 1.0933 | LR: 2.00e-05 [2026-04-21 21:58:25] Epoch 1 | Step 7300 | Loss: 1.0932 | LR: 2.00e-05 [2026-04-21 21:58:31] Epoch 1 | Step 7310 | Loss: 1.0931 | LR: 2.00e-05 [2026-04-21 21:58:36] Epoch 1 | Step 7320 | Loss: 1.0930 | LR: 2.00e-05 [2026-04-21 21:58:41] Epoch 1 | Step 7330 | Loss: 1.0929 | LR: 2.00e-05 [2026-04-21 21:58:47] Epoch 1 | Step 7340 | Loss: 1.0930 | LR: 2.00e-05 [2026-04-21 21:58:52] Epoch 1 | Step 7350 | Loss: 1.0928 | LR: 2.00e-05 [2026-04-21 21:58:57] Epoch 1 | Step 7360 | Loss: 1.0930 | LR: 2.00e-05 [2026-04-21 21:59:02] Epoch 1 | Step 7370 | Loss: 1.0929 | LR: 2.00e-05 [2026-04-21 21:59:07] Epoch 1 | Step 7380 | Loss: 1.0929 | LR: 2.00e-05 [2026-04-21 21:59:12] Epoch 1 | Step 7390 | Loss: 1.0929 | LR: 2.00e-05 [2026-04-21 21:59:18] Epoch 1 | Step 7400 | Loss: 1.0927 | LR: 2.00e-05 [2026-04-21 21:59:23] Epoch 1 | Step 7410 | Loss: 1.0926 | LR: 2.00e-05 [2026-04-21 21:59:27] Epoch 1 | Step 7420 | Loss: 1.0927 | LR: 2.00e-05 [2026-04-21 21:59:32] Epoch 1 | Step 7430 | Loss: 1.0925 | LR: 2.00e-05 [2026-04-21 21:59:37] Epoch 1 | Step 7440 | Loss: 1.0926 | LR: 2.00e-05 [2026-04-21 21:59:42] Epoch 1 | Step 7450 | Loss: 1.0923 | LR: 2.00e-05 [2026-04-21 21:59:47] Epoch 1 | Step 7460 | Loss: 1.0922 | LR: 2.00e-05 [2026-04-21 21:59:53] Epoch 1 | Step 7470 | Loss: 1.0922 | LR: 2.00e-05 [2026-04-21 21:59:58] Epoch 1 | Step 7480 | Loss: 1.0921 | LR: 2.00e-05 [2026-04-21 22:00:03] Epoch 1 | Step 7490 | Loss: 1.0920 | LR: 2.00e-05 [2026-04-21 22:00:08] Epoch 1 | Step 7500 | Loss: 1.0921 | LR: 2.00e-05 [2026-04-21 22:00:12] Epoch 1 | Step 7510 | Loss: 1.0922 | LR: 2.00e-05 [2026-04-21 22:00:17] Epoch 1 | Step 7520 | Loss: 1.0923 | LR: 2.00e-05 [2026-04-21 22:00:23] Epoch 1 | Step 7530 | Loss: 1.0921 | LR: 2.00e-05 [2026-04-21 22:00:27] Epoch 1 | Step 7540 | Loss: 1.0921 | LR: 2.00e-05 [2026-04-21 22:00:33] Epoch 1 | Step 7550 | Loss: 1.0920 | LR: 2.00e-05 [2026-04-21 
22:00:38] Epoch 1 | Step 7560 | Loss: 1.0920 | LR: 2.00e-05 [2026-04-21 22:00:44] Epoch 1 | Step 7570 | Loss: 1.0921 | LR: 2.00e-05 [2026-04-21 22:00:49] Epoch 1 | Step 7580 | Loss: 1.0921 | LR: 2.00e-05 [2026-04-21 22:00:54] Epoch 1 | Step 7590 | Loss: 1.0920 | LR: 2.00e-05 [2026-04-21 22:00:59] Epoch 1 | Step 7600 | Loss: 1.0919 | LR: 2.00e-05 [2026-04-21 22:01:05] Epoch 1 | Step 7610 | Loss: 1.0919 | LR: 2.00e-05 [2026-04-21 22:01:10] Epoch 1 | Step 7620 | Loss: 1.0918 | LR: 2.00e-05 [2026-04-21 22:01:16] Epoch 1 | Step 7630 | Loss: 1.0918 | LR: 2.00e-05 [2026-04-21 22:01:21] Epoch 1 | Step 7640 | Loss: 1.0917 | LR: 2.00e-05 [2026-04-21 22:01:27] Epoch 1 | Step 7650 | Loss: 1.0915 | LR: 2.00e-05 [2026-04-21 22:01:33] Epoch 1 | Step 7660 | Loss: 1.0915 | LR: 2.00e-05 [2026-04-21 22:01:38] Epoch 1 | Step 7670 | Loss: 1.0915 | LR: 2.00e-05 [2026-04-21 22:01:43] Epoch 1 | Step 7680 | Loss: 1.0914 | LR: 2.00e-05 [2026-04-21 22:01:49] Epoch 1 | Step 7690 | Loss: 1.0916 | LR: 2.00e-05 [2026-04-21 22:01:54] Epoch 1 | Step 7700 | Loss: 1.0914 | LR: 2.00e-05 [2026-04-21 22:02:00] Epoch 1 | Step 7710 | Loss: 1.0912 | LR: 2.00e-05 [2026-04-21 22:02:06] Epoch 1 | Step 7720 | Loss: 1.0913 | LR: 2.00e-05 [2026-04-21 22:02:11] Epoch 1 | Step 7730 | Loss: 1.0911 | LR: 2.00e-05 [2026-04-21 22:02:16] Epoch 1 | Step 7740 | Loss: 1.0912 | LR: 2.00e-05 [2026-04-21 22:02:22] Epoch 1 | Step 7750 | Loss: 1.0911 | LR: 2.00e-05 [2026-04-21 22:02:27] Epoch 1 | Step 7760 | Loss: 1.0911 | LR: 2.00e-05 [2026-04-21 22:02:32] Epoch 1 | Step 7770 | Loss: 1.0908 | LR: 2.00e-05 [2026-04-21 22:02:38] Epoch 1 | Step 7780 | Loss: 1.0908 | LR: 2.00e-05 [2026-04-21 22:02:43] Epoch 1 | Step 7790 | Loss: 1.0908 | LR: 2.00e-05 [2026-04-21 22:02:48] Epoch 1 | Step 7800 | Loss: 1.0911 | LR: 2.00e-05 [2026-04-21 22:02:53] Epoch 1 | Step 7810 | Loss: 1.0909 | LR: 2.00e-05 [2026-04-21 22:02:58] Epoch 1 | Step 7820 | Loss: 1.0908 | LR: 2.00e-05 [2026-04-21 22:03:03] Epoch 1 | Step 7830 | Loss: 1.0909 | LR: 
2.00e-05 [2026-04-21 22:03:10] Epoch 1 | Step 7840 | Loss: 1.0908 | LR: 2.00e-05 [2026-04-21 22:03:15] Epoch 1 | Step 7850 | Loss: 1.0908 | LR: 2.00e-05 [2026-04-21 22:03:21] Epoch 1 | Step 7860 | Loss: 1.0910 | LR: 2.00e-05 [2026-04-21 22:03:26] Epoch 1 | Step 7870 | Loss: 1.0909 | LR: 2.00e-05 [2026-04-21 22:03:31] Epoch 1 | Step 7880 | Loss: 1.0908 | LR: 2.00e-05 [2026-04-21 22:03:36] Epoch 1 | Step 7890 | Loss: 1.0908 | LR: 2.00e-05 [2026-04-21 22:03:41] Epoch 1 | Step 7900 | Loss: 1.0906 | LR: 2.00e-05 [2026-04-21 22:03:46] Epoch 1 | Step 7910 | Loss: 1.0907 | LR: 2.00e-05 [2026-04-21 22:03:52] Epoch 1 | Step 7920 | Loss: 1.0906 | LR: 2.00e-05 [2026-04-21 22:03:58] Epoch 1 | Step 7930 | Loss: 1.0905 | LR: 2.00e-05 [2026-04-21 22:04:03] Epoch 1 | Step 7940 | Loss: 1.0902 | LR: 2.00e-05 [2026-04-21 22:04:08] Epoch 1 | Step 7950 | Loss: 1.0902 | LR: 2.00e-05 [2026-04-21 22:04:14] Epoch 1 | Step 7960 | Loss: 1.0902 | LR: 2.00e-05 [2026-04-21 22:04:20] Epoch 1 | Step 7970 | Loss: 1.0903 | LR: 2.00e-05 [2026-04-21 22:04:25] Epoch 1 | Step 7980 | Loss: 1.0902 | LR: 2.00e-05 [2026-04-21 22:04:31] Epoch 1 | Step 7990 | Loss: 1.0902 | LR: 2.00e-05 [2026-04-21 22:04:36] Epoch 1 | Step 8000 | Loss: 1.0899 | LR: 2.00e-05 [2026-04-21 22:04:37] Validation | Batch 10/1567 | Loss: 1.0454 [2026-04-21 22:04:38] Validation | Batch 20/1567 | Loss: 1.1295 [2026-04-21 22:04:39] Validation | Batch 30/1567 | Loss: 1.0966 [2026-04-21 22:04:41] Validation | Batch 40/1567 | Loss: 1.1209 [2026-04-21 22:04:42] Validation | Batch 50/1567 | Loss: 1.0972 [2026-04-21 22:04:43] Validation | Batch 60/1567 | Loss: 1.0878 [2026-04-21 22:04:44] Validation | Batch 70/1567 | Loss: 1.0792 [2026-04-21 22:04:46] Validation | Batch 80/1567 | Loss: 1.0909 [2026-04-21 22:04:47] Validation | Batch 90/1567 | Loss: 1.0845 [2026-04-21 22:04:49] Validation | Batch 100/1567 | Loss: 1.0640 [2026-04-21 22:04:50] Validation | Batch 110/1567 | Loss: 1.0553 [2026-04-21 22:04:51] Validation | Batch 120/1567 | Loss: 
1.0482 [2026-04-21 22:04:52] Validation | Batch 130/1567 | Loss: 1.0414 [2026-04-21 22:04:53] Validation | Batch 140/1567 | Loss: 1.0511 [2026-04-21 22:04:54] Validation | Batch 150/1567 | Loss: 1.0614 [2026-04-21 22:04:55] Validation | Batch 160/1567 | Loss: 1.0599 [2026-04-21 22:04:56] Validation | Batch 170/1567 | Loss: 1.0523 [2026-04-21 22:04:57] Validation | Batch 180/1567 | Loss: 1.0544 [2026-04-21 22:04:59] Validation | Batch 190/1567 | Loss: 1.0596 [2026-04-21 22:05:00] Validation | Batch 200/1567 | Loss: 1.0631 [2026-04-21 22:05:01] Validation | Batch 210/1567 | Loss: 1.0624 [2026-04-21 22:05:02] Validation | Batch 220/1567 | Loss: 1.0664 [2026-04-21 22:05:04] Validation | Batch 230/1567 | Loss: 1.0696 [2026-04-21 22:05:05] Validation | Batch 240/1567 | Loss: 1.0724 [2026-04-21 22:05:06] Validation | Batch 250/1567 | Loss: 1.0768 [2026-04-21 22:05:07] Validation | Batch 260/1567 | Loss: 1.0796 [2026-04-21 22:05:09] Validation | Batch 270/1567 | Loss: 1.0843 [2026-04-21 22:05:10] Validation | Batch 280/1567 | Loss: 1.0868 [2026-04-21 22:05:12] Validation | Batch 290/1567 | Loss: 1.0826 [2026-04-21 22:05:13] Validation | Batch 300/1567 | Loss: 1.0822 [2026-04-21 22:05:15] Validation | Batch 310/1567 | Loss: 1.0800 [2026-04-21 22:05:15] Validation | Batch 320/1567 | Loss: 1.0827 [2026-04-21 22:05:17] Validation | Batch 330/1567 | Loss: 1.0823 [2026-04-21 22:05:18] Validation | Batch 340/1567 | Loss: 1.0817 [2026-04-21 22:05:19] Validation | Batch 350/1567 | Loss: 1.0796 [2026-04-21 22:05:20] Validation | Batch 360/1567 | Loss: 1.0735 [2026-04-21 22:05:22] Validation | Batch 370/1567 | Loss: 1.0737 [2026-04-21 22:05:23] Validation | Batch 380/1567 | Loss: 1.0776 [2026-04-21 22:05:24] Validation | Batch 390/1567 | Loss: 1.0767 [2026-04-21 22:05:25] Validation | Batch 400/1567 | Loss: 1.0776 [2026-04-21 22:05:27] Validation | Batch 410/1567 | Loss: 1.0738 [2026-04-21 22:05:28] Validation | Batch 420/1567 | Loss: 1.0724 [2026-04-21 22:05:29] Validation | Batch 
430/1567 | Loss: 1.0757 [2026-04-21 22:05:30] Validation | Batch 440/1567 | Loss: 1.0764 [2026-04-21 22:05:31] Validation | Batch 450/1567 | Loss: 1.0787 [2026-04-21 22:05:33] Validation | Batch 460/1567 | Loss: 1.0816 [2026-04-21 22:05:34] Validation | Batch 470/1567 | Loss: 1.0862 [2026-04-21 22:05:35] Validation | Batch 480/1567 | Loss: 1.0838 [2026-04-21 22:05:36] Validation | Batch 490/1567 | Loss: 1.0817 [2026-04-21 22:05:37] Validation | Batch 500/1567 | Loss: 1.0829 [2026-04-21 22:05:38] Validation | Batch 510/1567 | Loss: 1.0831 [2026-04-21 22:05:39] Validation | Batch 520/1567 | Loss: 1.0840 [2026-04-21 22:05:40] Validation | Batch 530/1567 | Loss: 1.0826 [2026-04-21 22:05:42] Validation | Batch 540/1567 | Loss: 1.0797 [2026-04-21 22:05:43] Validation | Batch 550/1567 | Loss: 1.0806 [2026-04-21 22:05:44] Validation | Batch 560/1567 | Loss: 1.0797 [2026-04-21 22:05:46] Validation | Batch 570/1567 | Loss: 1.0758 [2026-04-21 22:05:47] Validation | Batch 580/1567 | Loss: 1.0775 [2026-04-21 22:05:48] Validation | Batch 590/1567 | Loss: 1.0772 [2026-04-21 22:05:49] Validation | Batch 600/1567 | Loss: 1.0762 [2026-04-21 22:05:51] Validation | Batch 610/1567 | Loss: 1.0782 [2026-04-21 22:05:52] Validation | Batch 620/1567 | Loss: 1.0759 [2026-04-21 22:05:54] Validation | Batch 630/1567 | Loss: 1.0762 [2026-04-21 22:05:55] Validation | Batch 640/1567 | Loss: 1.0770 [2026-04-21 22:05:56] Validation | Batch 650/1567 | Loss: 1.0800 [2026-04-21 22:05:57] Validation | Batch 660/1567 | Loss: 1.0812 [2026-04-21 22:05:58] Validation | Batch 670/1567 | Loss: 1.0796 [2026-04-21 22:05:59] Validation | Batch 680/1567 | Loss: 1.0785 [2026-04-21 22:06:01] Validation | Batch 690/1567 | Loss: 1.0769 [2026-04-21 22:06:02] Validation | Batch 700/1567 | Loss: 1.0770 [2026-04-21 22:06:03] Validation | Batch 710/1567 | Loss: 1.0763 [2026-04-21 22:06:04] Validation | Batch 720/1567 | Loss: 1.0734 [2026-04-21 22:06:05] Validation | Batch 730/1567 | Loss: 1.0739 [2026-04-21 22:06:06] 
Validation | Batch 740/1567 | Loss: 1.0745 [2026-04-21 22:06:07] Validation | Batch 750/1567 | Loss: 1.0741 [2026-04-21 22:06:08] Validation | Batch 760/1567 | Loss: 1.0757 [2026-04-21 22:06:10] Validation | Batch 770/1567 | Loss: 1.0753 [2026-04-21 22:06:11] Validation | Batch 780/1567 | Loss: 1.0760 [2026-04-21 22:06:12] Validation | Batch 790/1567 | Loss: 1.0745 [2026-04-21 22:06:13] Validation | Batch 800/1567 | Loss: 1.0724 [2026-04-21 22:06:14] Validation | Batch 810/1567 | Loss: 1.0729 [2026-04-21 22:06:16] Validation | Batch 820/1567 | Loss: 1.0721 [2026-04-21 22:06:17] Validation | Batch 830/1567 | Loss: 1.0713 [2026-04-21 22:06:18] Validation | Batch 840/1567 | Loss: 1.0716 [2026-04-21 22:06:19] Validation | Batch 850/1567 | Loss: 1.0726 [2026-04-21 22:06:19] Validation | Batch 860/1567 | Loss: 1.0733 [2026-04-21 22:06:20] Validation | Batch 870/1567 | Loss: 1.0738 [2026-04-21 22:06:22] Validation | Batch 880/1567 | Loss: 1.0739 [2026-04-21 22:06:23] Validation | Batch 890/1567 | Loss: 1.0737 [2026-04-21 22:06:25] Validation | Batch 900/1567 | Loss: 1.0732 [2026-04-21 22:06:26] Validation | Batch 910/1567 | Loss: 1.0729 [2026-04-21 22:06:27] Validation | Batch 920/1567 | Loss: 1.0747 [2026-04-21 22:06:28] Validation | Batch 930/1567 | Loss: 1.0746 [2026-04-21 22:06:29] Validation | Batch 940/1567 | Loss: 1.0746 [2026-04-21 22:06:30] Validation | Batch 950/1567 | Loss: 1.0738 [2026-04-21 22:06:31] Validation | Batch 960/1567 | Loss: 1.0745 [2026-04-21 22:06:32] Validation | Batch 970/1567 | Loss: 1.0750 [2026-04-21 22:06:33] Validation | Batch 980/1567 | Loss: 1.0746 [2026-04-21 22:06:34] Validation | Batch 990/1567 | Loss: 1.0756 [2026-04-21 22:06:35] Validation | Batch 1000/1567 | Loss: 1.0761 [2026-04-21 22:06:36] Validation | Batch 1010/1567 | Loss: 1.0751 [2026-04-21 22:06:38] Validation | Batch 1020/1567 | Loss: 1.0762 [2026-04-21 22:06:39] Validation | Batch 1030/1567 | Loss: 1.0768 [2026-04-21 22:06:40] Validation | Batch 1040/1567 | Loss: 1.0761 
[2026-04-21 22:06:41] Validation | Batch 1050/1567 | Loss: 1.0753 [2026-04-21 22:06:42] Validation | Batch 1060/1567 | Loss: 1.0764 [2026-04-21 22:06:44] Validation | Batch 1070/1567 | Loss: 1.0763 [2026-04-21 22:06:45] Validation | Batch 1080/1567 | Loss: 1.0778 [2026-04-21 22:06:46] Validation | Batch 1090/1567 | Loss: 1.0806 [2026-04-21 22:06:47] Validation | Batch 1100/1567 | Loss: 1.0822 [2026-04-21 22:06:48] Validation | Batch 1110/1567 | Loss: 1.0813 [2026-04-21 22:06:49] Validation | Batch 1120/1567 | Loss: 1.0815 [2026-04-21 22:06:51] Validation | Batch 1130/1567 | Loss: 1.0797 [2026-04-21 22:06:52] Validation | Batch 1140/1567 | Loss: 1.0801 [2026-04-21 22:06:53] Validation | Batch 1150/1567 | Loss: 1.0789 [2026-04-21 22:06:54] Validation | Batch 1160/1567 | Loss: 1.0782 [2026-04-21 22:06:55] Validation | Batch 1170/1567 | Loss: 1.0783 [2026-04-21 22:06:56] Validation | Batch 1180/1567 | Loss: 1.0788 [2026-04-21 22:06:58] Validation | Batch 1190/1567 | Loss: 1.0792 [2026-04-21 22:06:59] Validation | Batch 1200/1567 | Loss: 1.0780 [2026-04-21 22:07:00] Validation | Batch 1210/1567 | Loss: 1.0771 [2026-04-21 22:07:01] Validation | Batch 1220/1567 | Loss: 1.0780 [2026-04-21 22:07:02] Validation | Batch 1230/1567 | Loss: 1.0785 [2026-04-21 22:07:03] Validation | Batch 1240/1567 | Loss: 1.0782 [2026-04-21 22:07:05] Validation | Batch 1250/1567 | Loss: 1.0783 [2026-04-21 22:07:06] Validation | Batch 1260/1567 | Loss: 1.0780 [2026-04-21 22:07:07] Validation | Batch 1270/1567 | Loss: 1.0763 [2026-04-21 22:07:08] Validation | Batch 1280/1567 | Loss: 1.0763 [2026-04-21 22:07:10] Validation | Batch 1290/1567 | Loss: 1.0765 [2026-04-21 22:07:11] Validation | Batch 1300/1567 | Loss: 1.0769 [2026-04-21 22:07:12] Validation | Batch 1310/1567 | Loss: 1.0776 [2026-04-21 22:07:14] Validation | Batch 1320/1567 | Loss: 1.0782 [2026-04-21 22:07:15] Validation | Batch 1330/1567 | Loss: 1.0796 [2026-04-21 22:07:16] Validation | Batch 1340/1567 | Loss: 1.0791 [2026-04-21 
22:07:17] Validation | Batch 1350/1567 | Loss: 1.0794 [2026-04-21 22:07:18] Validation | Batch 1360/1567 | Loss: 1.0785 [2026-04-21 22:07:19] Validation | Batch 1370/1567 | Loss: 1.0780 [2026-04-21 22:07:20] Validation | Batch 1380/1567 | Loss: 1.0781 [2026-04-21 22:07:21] Validation | Batch 1390/1567 | Loss: 1.0775 [2026-04-21 22:07:22] Validation | Batch 1400/1567 | Loss: 1.0771 [2026-04-21 22:07:24] Validation | Batch 1410/1567 | Loss: 1.0777 [2026-04-21 22:07:25] Validation | Batch 1420/1567 | Loss: 1.0776 [2026-04-21 22:07:26] Validation | Batch 1430/1567 | Loss: 1.0780 [2026-04-21 22:07:27] Validation | Batch 1440/1567 | Loss: 1.0785 [2026-04-21 22:07:28] Validation | Batch 1450/1567 | Loss: 1.0786 [2026-04-21 22:07:29] Validation | Batch 1460/1567 | Loss: 1.0780 [2026-04-21 22:07:30] Validation | Batch 1470/1567 | Loss: 1.0779 [2026-04-21 22:07:31] Validation | Batch 1480/1567 | Loss: 1.0775 [2026-04-21 22:07:32] Validation | Batch 1490/1567 | Loss: 1.0768 [2026-04-21 22:07:33] Validation | Batch 1500/1567 | Loss: 1.0765 [2026-04-21 22:07:34] Validation | Batch 1510/1567 | Loss: 1.0757 [2026-04-21 22:07:35] Validation | Batch 1520/1567 | Loss: 1.0756 [2026-04-21 22:07:36] Validation | Batch 1530/1567 | Loss: 1.0755 [2026-04-21 22:07:37] Validation | Batch 1540/1567 | Loss: 1.0761 [2026-04-21 22:07:38] Validation | Batch 1550/1567 | Loss: 1.0771 [2026-04-21 22:07:39] Validation | Batch 1560/1567 | Loss: 1.0768 [2026-04-21 22:07:40] Validation | Batch 1567/1567 | Loss: 1.0769 [2026-04-21 22:07:40] Validation | Loss: 1.0769 | PPL: 2.99 | Time: 184.62s [2026-04-21 22:07:58] New best model saved! 
Val loss: 1.0769 [2026-04-21 22:08:03] Epoch 1 | Step 8010 | Loss: 1.0898 | LR: 2.00e-05 [2026-04-21 22:08:09] Epoch 1 | Step 8020 | Loss: 1.0897 | LR: 2.00e-05 [2026-04-21 22:08:15] Epoch 1 | Step 8030 | Loss: 1.0898 | LR: 2.00e-05 [2026-04-21 22:08:20] Epoch 1 | Step 8040 | Loss: 1.0894 | LR: 2.00e-05 [2026-04-21 22:08:25] Epoch 1 | Step 8050 | Loss: 1.0895 | LR: 2.00e-05 [2026-04-21 22:08:30] Epoch 1 | Step 8060 | Loss: 1.0893 | LR: 2.00e-05 [2026-04-21 22:08:36] Epoch 1 | Step 8070 | Loss: 1.0894 | LR: 2.00e-05 [2026-04-21 22:08:41] Epoch 1 | Step 8080 | Loss: 1.0894 | LR: 2.00e-05 [2026-04-21 22:08:46] Epoch 1 | Step 8090 | Loss: 1.0893 | LR: 2.00e-05 [2026-04-21 22:08:52] Epoch 1 | Step 8100 | Loss: 1.0894 | LR: 2.00e-05 [2026-04-21 22:08:57] Epoch 1 | Step 8110 | Loss: 1.0893 | LR: 2.00e-05 [2026-04-21 22:09:01] Epoch 1 | Step 8120 | Loss: 1.0893 | LR: 2.00e-05 [2026-04-21 22:09:07] Epoch 1 | Step 8130 | Loss: 1.0891 | LR: 2.00e-05 [2026-04-21 22:09:12] Epoch 1 | Step 8140 | Loss: 1.0892 | LR: 2.00e-05 [2026-04-21 22:09:17] Epoch 1 | Step 8150 | Loss: 1.0891 | LR: 2.00e-05 [2026-04-21 22:09:23] Epoch 1 | Step 8160 | Loss: 1.0890 | LR: 2.00e-05 [2026-04-21 22:09:29] Epoch 1 | Step 8170 | Loss: 1.0889 | LR: 2.00e-05 [2026-04-21 22:09:35] Epoch 1 | Step 8180 | Loss: 1.0890 | LR: 2.00e-05 [2026-04-21 22:09:40] Epoch 1 | Step 8190 | Loss: 1.0888 | LR: 2.00e-05 [2026-04-21 22:09:46] Epoch 1 | Step 8200 | Loss: 1.0888 | LR: 2.00e-05 [2026-04-21 22:09:51] Epoch 1 | Step 8210 | Loss: 1.0888 | LR: 2.00e-05 [2026-04-21 22:09:57] Epoch 1 | Step 8220 | Loss: 1.0887 | LR: 2.00e-05 [2026-04-21 22:10:03] Epoch 1 | Step 8230 | Loss: 1.0888 | LR: 2.00e-05 [2026-04-21 22:10:08] Epoch 1 | Step 8240 | Loss: 1.0888 | LR: 2.00e-05 [2026-04-21 22:10:13] Epoch 1 | Step 8250 | Loss: 1.0887 | LR: 2.00e-05 [2026-04-21 22:10:19] Epoch 1 | Step 8260 | Loss: 1.0888 | LR: 2.00e-05 [2026-04-21 22:10:24] Epoch 1 | Step 8270 | Loss: 1.0888 | LR: 2.00e-05 [2026-04-21 22:10:30] Epoch 1 | Step 
8280 | Loss: 1.0889 | LR: 2.00e-05 [2026-04-21 22:10:34] Epoch 1 | Step 8290 | Loss: 1.0886 | LR: 2.00e-05 [2026-04-21 22:10:40] Epoch 1 | Step 8300 | Loss: 1.0888 | LR: 2.00e-05 [2026-04-21 22:10:45] Epoch 1 | Step 8310 | Loss: 1.0886 | LR: 2.00e-05 [2026-04-21 22:10:51] Epoch 1 | Step 8320 | Loss: 1.0886 | LR: 2.00e-05 [2026-04-21 22:10:56] Epoch 1 | Step 8330 | Loss: 1.0886 | LR: 2.00e-05 [2026-04-21 22:11:02] Epoch 1 | Step 8340 | Loss: 1.0885 | LR: 2.00e-05 [2026-04-21 22:11:07] Epoch 1 | Step 8350 | Loss: 1.0885 | LR: 2.00e-05 [2026-04-21 22:11:13] Epoch 1 | Step 8360 | Loss: 1.0884 | LR: 2.00e-05 [2026-04-21 22:11:18] Epoch 1 | Step 8370 | Loss: 1.0884 | LR: 2.00e-05 [2026-04-21 22:11:24] Epoch 1 | Step 8380 | Loss: 1.0884 | LR: 2.00e-05 [2026-04-21 22:11:29] Epoch 1 | Step 8390 | Loss: 1.0884 | LR: 2.00e-05 [2026-04-21 22:11:35] Epoch 1 | Step 8400 | Loss: 1.0884 | LR: 2.00e-05 [2026-04-21 22:11:41] Epoch 1 | Step 8410 | Loss: 1.0884 | LR: 2.00e-05 [2026-04-21 22:11:46] Epoch 1 | Step 8420 | Loss: 1.0883 | LR: 2.00e-05 [2026-04-21 22:11:51] Epoch 1 | Step 8430 | Loss: 1.0882 | LR: 2.00e-05 [2026-04-21 22:11:56] Epoch 1 | Step 8440 | Loss: 1.0882 | LR: 2.00e-05 [2026-04-21 22:12:01] Epoch 1 | Step 8450 | Loss: 1.0881 | LR: 2.00e-05 [2026-04-21 22:12:07] Epoch 1 | Step 8460 | Loss: 1.0881 | LR: 2.00e-05 [2026-04-21 22:12:12] Epoch 1 | Step 8470 | Loss: 1.0878 | LR: 2.00e-05 [2026-04-21 22:12:17] Epoch 1 | Step 8480 | Loss: 1.0877 | LR: 2.00e-05 [2026-04-21 22:12:22] Epoch 1 | Step 8490 | Loss: 1.0876 | LR: 2.00e-05 [2026-04-21 22:12:28] Epoch 1 | Step 8500 | Loss: 1.0876 | LR: 2.00e-05 [2026-04-21 22:12:33] Epoch 1 | Step 8510 | Loss: 1.0876 | LR: 2.00e-05 [2026-04-21 22:12:38] Epoch 1 | Step 8520 | Loss: 1.0877 | LR: 2.00e-05 [2026-04-21 22:12:43] Epoch 1 | Step 8530 | Loss: 1.0877 | LR: 2.00e-05 [2026-04-21 22:12:48] Epoch 1 | Step 8540 | Loss: 1.0877 | LR: 2.00e-05 [2026-04-21 22:12:53] Epoch 1 | Step 8550 | Loss: 1.0877 | LR: 2.00e-05 [2026-04-21 
22:12:57] Epoch 1 | Step 8560 | Loss: 1.0875 | LR: 2.00e-05 [2026-04-21 22:13:02] Epoch 1 | Step 8570 | Loss: 1.0873 | LR: 2.00e-05 [2026-04-21 22:13:07] Epoch 1 | Step 8580 | Loss: 1.0872 | LR: 2.00e-05 [2026-04-21 22:13:12] Epoch 1 | Step 8590 | Loss: 1.0873 | LR: 2.00e-05 [2026-04-21 22:13:18] Epoch 1 | Step 8600 | Loss: 1.0873 | LR: 2.00e-05 [2026-04-21 22:13:23] Epoch 1 | Step 8610 | Loss: 1.0872 | LR: 2.00e-05 [2026-04-21 22:13:28] Epoch 1 | Step 8620 | Loss: 1.0870 | LR: 2.00e-05 [2026-04-21 22:13:34] Epoch 1 | Step 8630 | Loss: 1.0869 | LR: 2.00e-05 [2026-04-21 22:13:39] Epoch 1 | Step 8640 | Loss: 1.0869 | LR: 2.00e-05 [2026-04-21 22:13:45] Epoch 1 | Step 8650 | Loss: 1.0867 | LR: 2.00e-05 [2026-04-21 22:13:50] Epoch 1 | Step 8660 | Loss: 1.0867 | LR: 2.00e-05 [2026-04-21 22:13:55] Epoch 1 | Step 8670 | Loss: 1.0867 | LR: 2.00e-05 [2026-04-21 22:14:00] Epoch 1 | Step 8680 | Loss: 1.0867 | LR: 2.00e-05 [2026-04-21 22:14:06] Epoch 1 | Step 8690 | Loss: 1.0866 | LR: 2.00e-05 [2026-04-21 22:14:12] Epoch 1 | Step 8700 | Loss: 1.0867 | LR: 2.00e-05 [2026-04-21 22:14:17] Epoch 1 | Step 8710 | Loss: 1.0867 | LR: 2.00e-05 [2026-04-21 22:14:23] Epoch 1 | Step 8720 | Loss: 1.0868 | LR: 2.00e-05 [2026-04-21 22:14:28] Epoch 1 | Step 8730 | Loss: 1.0868 | LR: 2.00e-05 [2026-04-21 22:14:34] Epoch 1 | Step 8740 | Loss: 1.0869 | LR: 2.00e-05 [2026-04-21 22:14:40] Epoch 1 | Step 8750 | Loss: 1.0869 | LR: 2.00e-05 [2026-04-21 22:14:45] Epoch 1 | Step 8760 | Loss: 1.0869 | LR: 2.00e-05 [2026-04-21 22:14:50] Epoch 1 | Step 8770 | Loss: 1.0868 | LR: 2.00e-05 [2026-04-21 22:14:55] Epoch 1 | Step 8780 | Loss: 1.0868 | LR: 2.00e-05 [2026-04-21 22:15:00] Epoch 1 | Step 8790 | Loss: 1.0868 | LR: 2.00e-05 [2026-04-21 22:15:05] Epoch 1 | Step 8800 | Loss: 1.0866 | LR: 2.00e-05 [2026-04-21 22:15:10] Epoch 1 | Step 8810 | Loss: 1.0866 | LR: 2.00e-05 [2026-04-21 22:15:16] Epoch 1 | Step 8820 | Loss: 1.0866 | LR: 2.00e-05 [2026-04-21 22:15:22] Epoch 1 | Step 8830 | Loss: 1.0866 | LR: 
2.00e-05 [2026-04-21 22:15:27] Epoch 1 | Step 8840 | Loss: 1.0865 | LR: 2.00e-05 [2026-04-21 22:15:32] Epoch 1 | Step 8850 | Loss: 1.0865 | LR: 2.00e-05 [2026-04-21 22:15:37] Epoch 1 | Step 8860 | Loss: 1.0865 | LR: 2.00e-05 [2026-04-21 22:15:42] Epoch 1 | Step 8870 | Loss: 1.0863 | LR: 2.00e-05 [2026-04-21 22:15:47] Epoch 1 | Step 8880 | Loss: 1.0864 | LR: 2.00e-05 [2026-04-21 22:15:53] Epoch 1 | Step 8890 | Loss: 1.0863 | LR: 2.00e-05 [2026-04-21 22:15:58] Epoch 1 | Step 8900 | Loss: 1.0862 | LR: 2.00e-05 [2026-04-21 22:16:03] Epoch 1 | Step 8910 | Loss: 1.0861 | LR: 2.00e-05 [2026-04-21 22:16:09] Epoch 1 | Step 8920 | Loss: 1.0860 | LR: 2.00e-05 [2026-04-21 22:16:14] Epoch 1 | Step 8930 | Loss: 1.0858 | LR: 2.00e-05 [2026-04-21 22:16:19] Epoch 1 | Step 8940 | Loss: 1.0857 | LR: 2.00e-05 [2026-04-21 22:16:24] Epoch 1 | Step 8950 | Loss: 1.0857 | LR: 2.00e-05 [2026-04-21 22:16:30] Epoch 1 | Step 8960 | Loss: 1.0857 | LR: 2.00e-05 [2026-04-21 22:16:35] Epoch 1 | Step 8970 | Loss: 1.0855 | LR: 2.00e-05 [2026-04-21 22:16:40] Epoch 1 | Step 8980 | Loss: 1.0853 | LR: 2.00e-05 [2026-04-21 22:16:46] Epoch 1 | Step 8990 | Loss: 1.0853 | LR: 2.00e-05 [2026-04-21 22:16:52] Epoch 1 | Step 9000 | Loss: 1.0853 | LR: 2.00e-05 [2026-04-21 22:17:03] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_9000.pt [2026-04-21 22:18:19] Validation | Batch 10/1567 | Loss: 1.0569 [2026-04-21 22:18:20] Validation | Batch 20/1567 | Loss: 1.1350 [2026-04-21 22:18:21] Validation | Batch 30/1567 | Loss: 1.1001 [2026-04-21 22:18:23] Validation | Batch 40/1567 | Loss: 1.1213 [2026-04-21 22:18:24] Validation | Batch 50/1567 | Loss: 1.1032 [2026-04-21 22:18:25] Validation | Batch 60/1567 | Loss: 1.0920 [2026-04-21 22:18:26] Validation | Batch 70/1567 | Loss: 1.0792 [2026-04-21 22:18:28] Validation | Batch 80/1567 | Loss: 1.0893 [2026-04-21 22:18:29] Validation | Batch 90/1567 | Loss: 1.0835 [2026-04-21 22:18:31] Validation | Batch 100/1567 | Loss: 1.0619 [2026-04-21 22:18:32] 
Validation | Batch 110/1567 | Loss: 1.0543 [2026-04-21 22:18:33] Validation | Batch 120/1567 | Loss: 1.0473 [2026-04-21 22:18:34] Validation | Batch 130/1567 | Loss: 1.0434 [2026-04-21 22:18:35] Validation | Batch 140/1567 | Loss: 1.0526 [2026-04-21 22:18:36] Validation | Batch 150/1567 | Loss: 1.0628 [2026-04-21 22:18:37] Validation | Batch 160/1567 | Loss: 1.0607 [2026-04-21 22:18:38] Validation | Batch 170/1567 | Loss: 1.0529 [2026-04-21 22:18:39] Validation | Batch 180/1567 | Loss: 1.0544 [2026-04-21 22:18:41] Validation | Batch 190/1567 | Loss: 1.0595 [2026-04-21 22:18:42] Validation | Batch 200/1567 | Loss: 1.0632 [2026-04-21 22:18:43] Validation | Batch 210/1567 | Loss: 1.0620 [2026-04-21 22:18:44] Validation | Batch 220/1567 | Loss: 1.0671 [2026-04-21 22:18:46] Validation | Batch 230/1567 | Loss: 1.0703 [2026-04-21 22:18:47] Validation | Batch 240/1567 | Loss: 1.0722 [2026-04-21 22:18:48] Validation | Batch 250/1567 | Loss: 1.0762 [2026-04-21 22:18:49] Validation | Batch 260/1567 | Loss: 1.0785 [2026-04-21 22:18:51] Validation | Batch 270/1567 | Loss: 1.0829 [2026-04-21 22:18:52] Validation | Batch 280/1567 | Loss: 1.0847 [2026-04-21 22:18:54] Validation | Batch 290/1567 | Loss: 1.0804 [2026-04-21 22:18:55] Validation | Batch 300/1567 | Loss: 1.0800 [2026-04-21 22:18:57] Validation | Batch 310/1567 | Loss: 1.0769 [2026-04-21 22:18:57] Validation | Batch 320/1567 | Loss: 1.0798 [2026-04-21 22:18:59] Validation | Batch 330/1567 | Loss: 1.0794 [2026-04-21 22:19:00] Validation | Batch 340/1567 | Loss: 1.0783 [2026-04-21 22:19:01] Validation | Batch 350/1567 | Loss: 1.0763 [2026-04-21 22:19:02] Validation | Batch 360/1567 | Loss: 1.0703 [2026-04-21 22:19:04] Validation | Batch 370/1567 | Loss: 1.0700 [2026-04-21 22:19:05] Validation | Batch 380/1567 | Loss: 1.0739 [2026-04-21 22:19:06] Validation | Batch 390/1567 | Loss: 1.0730 [2026-04-21 22:19:07] Validation | Batch 400/1567 | Loss: 1.0737 [2026-04-21 22:19:09] Validation | Batch 410/1567 | Loss: 1.0699 
[2026-04-21 22:19:10] Validation | Batch 420/1567 | Loss: 1.0682 [2026-04-21 22:19:11] Validation | Batch 430/1567 | Loss: 1.0709 [2026-04-21 22:19:12] Validation | Batch 440/1567 | Loss: 1.0714 [2026-04-21 22:19:13] Validation | Batch 450/1567 | Loss: 1.0737 [2026-04-21 22:19:15] Validation | Batch 460/1567 | Loss: 1.0767 [2026-04-21 22:19:16] Validation | Batch 470/1567 | Loss: 1.0816 [2026-04-21 22:19:17] Validation | Batch 480/1567 | Loss: 1.0794 [2026-04-21 22:19:18] Validation | Batch 490/1567 | Loss: 1.0773 [2026-04-21 22:19:19] Validation | Batch 500/1567 | Loss: 1.0784 [2026-04-21 22:19:20] Validation | Batch 510/1567 | Loss: 1.0785 [2026-04-21 22:19:21] Validation | Batch 520/1567 | Loss: 1.0798 [2026-04-21 22:19:22] Validation | Batch 530/1567 | Loss: 1.0783 [2026-04-21 22:19:24] Validation | Batch 540/1567 | Loss: 1.0757 [2026-04-21 22:19:25] Validation | Batch 550/1567 | Loss: 1.0769 [2026-04-21 22:19:26] Validation | Batch 560/1567 | Loss: 1.0762 [2026-04-21 22:19:28] Validation | Batch 570/1567 | Loss: 1.0724 [2026-04-21 22:19:29] Validation | Batch 580/1567 | Loss: 1.0735 [2026-04-21 22:19:30] Validation | Batch 590/1567 | Loss: 1.0730 [2026-04-21 22:19:31] Validation | Batch 600/1567 | Loss: 1.0722 [2026-04-21 22:19:33] Validation | Batch 610/1567 | Loss: 1.0742 [2026-04-21 22:19:34] Validation | Batch 620/1567 | Loss: 1.0719 [2026-04-21 22:19:36] Validation | Batch 630/1567 | Loss: 1.0719 [2026-04-21 22:19:37] Validation | Batch 640/1567 | Loss: 1.0726 [2026-04-21 22:19:38] Validation | Batch 650/1567 | Loss: 1.0757 [2026-04-21 22:19:39] Validation | Batch 660/1567 | Loss: 1.0770 [2026-04-21 22:19:40] Validation | Batch 670/1567 | Loss: 1.0755 [2026-04-21 22:19:41] Validation | Batch 680/1567 | Loss: 1.0743 [2026-04-21 22:19:43] Validation | Batch 690/1567 | Loss: 1.0728 [2026-04-21 22:19:44] Validation | Batch 700/1567 | Loss: 1.0726 [2026-04-21 22:19:45] Validation | Batch 710/1567 | Loss: 1.0719 [2026-04-21 22:19:46] Validation | Batch 720/1567 
| Loss: 1.0689 [2026-04-21 22:19:47] Validation | Batch 730/1567 | Loss: 1.0694 [2026-04-21 22:19:48] Validation | Batch 740/1567 | Loss: 1.0699 [2026-04-21 22:19:49] Validation | Batch 750/1567 | Loss: 1.0697 [2026-04-21 22:19:50] Validation | Batch 760/1567 | Loss: 1.0712 [2026-04-21 22:19:52] Validation | Batch 770/1567 | Loss: 1.0707 [2026-04-21 22:19:53] Validation | Batch 780/1567 | Loss: 1.0716 [2026-04-21 22:19:54] Validation | Batch 790/1567 | Loss: 1.0701 [2026-04-21 22:19:55] Validation | Batch 800/1567 | Loss: 1.0681 [2026-04-21 22:19:56] Validation | Batch 810/1567 | Loss: 1.0684 [2026-04-21 22:19:58] Validation | Batch 820/1567 | Loss: 1.0675 [2026-04-21 22:19:59] Validation | Batch 830/1567 | Loss: 1.0667 [2026-04-21 22:20:00] Validation | Batch 840/1567 | Loss: 1.0670 [2026-04-21 22:20:01] Validation | Batch 850/1567 | Loss: 1.0679 [2026-04-21 22:20:01] Validation | Batch 860/1567 | Loss: 1.0688 [2026-04-21 22:20:02] Validation | Batch 870/1567 | Loss: 1.0694 [2026-04-21 22:20:04] Validation | Batch 880/1567 | Loss: 1.0693 [2026-04-21 22:20:05] Validation | Batch 890/1567 | Loss: 1.0689 [2026-04-21 22:20:07] Validation | Batch 900/1567 | Loss: 1.0683 [2026-04-21 22:20:08] Validation | Batch 910/1567 | Loss: 1.0680 [2026-04-21 22:20:09] Validation | Batch 920/1567 | Loss: 1.0699 [2026-04-21 22:20:10] Validation | Batch 930/1567 | Loss: 1.0697 [2026-04-21 22:20:11] Validation | Batch 940/1567 | Loss: 1.0697 [2026-04-21 22:20:12] Validation | Batch 950/1567 | Loss: 1.0691 [2026-04-21 22:20:13] Validation | Batch 960/1567 | Loss: 1.0696 [2026-04-21 22:20:14] Validation | Batch 970/1567 | Loss: 1.0701 [2026-04-21 22:20:15] Validation | Batch 980/1567 | Loss: 1.0698 [2026-04-21 22:20:16] Validation | Batch 990/1567 | Loss: 1.0708 [2026-04-21 22:20:17] Validation | Batch 1000/1567 | Loss: 1.0712 [2026-04-21 22:20:18] Validation | Batch 1010/1567 | Loss: 1.0702 [2026-04-21 22:20:20] Validation | Batch 1020/1567 | Loss: 1.0714 [2026-04-21 22:20:21] 
Validation | Batch 1030/1567 | Loss: 1.0719 [2026-04-21 22:20:22] Validation | Batch 1040/1567 | Loss: 1.0710 [2026-04-21 22:20:23] Validation | Batch 1050/1567 | Loss: 1.0702 [2026-04-21 22:20:24] Validation | Batch 1060/1567 | Loss: 1.0713 [2026-04-21 22:20:26] Validation | Batch 1070/1567 | Loss: 1.0713 [2026-04-21 22:20:27] Validation | Batch 1080/1567 | Loss: 1.0726 [2026-04-21 22:20:28] Validation | Batch 1090/1567 | Loss: 1.0753 [2026-04-21 22:20:29] Validation | Batch 1100/1567 | Loss: 1.0769 [2026-04-21 22:20:30] Validation | Batch 1110/1567 | Loss: 1.0760 [2026-04-21 22:20:31] Validation | Batch 1120/1567 | Loss: 1.0761 [2026-04-21 22:20:33] Validation | Batch 1130/1567 | Loss: 1.0744 [2026-04-21 22:20:34] Validation | Batch 1140/1567 | Loss: 1.0747 [2026-04-21 22:20:35] Validation | Batch 1150/1567 | Loss: 1.0733 [2026-04-21 22:20:36] Validation | Batch 1160/1567 | Loss: 1.0726 [2026-04-21 22:20:37] Validation | Batch 1170/1567 | Loss: 1.0727 [2026-04-21 22:20:38] Validation | Batch 1180/1567 | Loss: 1.0732 [2026-04-21 22:20:40] Validation | Batch 1190/1567 | Loss: 1.0735 [2026-04-21 22:20:41] Validation | Batch 1200/1567 | Loss: 1.0723 [2026-04-21 22:20:42] Validation | Batch 1210/1567 | Loss: 1.0714 [2026-04-21 22:20:43] Validation | Batch 1220/1567 | Loss: 1.0724 [2026-04-21 22:20:44] Validation | Batch 1230/1567 | Loss: 1.0731 [2026-04-21 22:20:45] Validation | Batch 1240/1567 | Loss: 1.0728 [2026-04-21 22:20:47] Validation | Batch 1250/1567 | Loss: 1.0730 [2026-04-21 22:20:48] Validation | Batch 1260/1567 | Loss: 1.0727 [2026-04-21 22:20:49] Validation | Batch 1270/1567 | Loss: 1.0711 [2026-04-21 22:20:50] Validation | Batch 1280/1567 | Loss: 1.0712 [2026-04-21 22:20:52] Validation | Batch 1290/1567 | Loss: 1.0713 [2026-04-21 22:20:54] Validation | Batch 1300/1567 | Loss: 1.0716 [2026-04-21 22:20:54] Validation | Batch 1310/1567 | Loss: 1.0725 [2026-04-21 22:20:56] Validation | Batch 1320/1567 | Loss: 1.0732 [2026-04-21 22:20:57] Validation | Batch 
1330/1567 | Loss: 1.0746 [2026-04-21 22:20:58] Validation | Batch 1340/1567 | Loss: 1.0743 [2026-04-21 22:20:59] Validation | Batch 1350/1567 | Loss: 1.0746 [2026-04-21 22:21:00] Validation | Batch 1360/1567 | Loss: 1.0737 [2026-04-21 22:21:01] Validation | Batch 1370/1567 | Loss: 1.0733 [2026-04-21 22:21:02] Validation | Batch 1380/1567 | Loss: 1.0735 [2026-04-21 22:21:03] Validation | Batch 1390/1567 | Loss: 1.0728 [2026-04-21 22:21:04] Validation | Batch 1400/1567 | Loss: 1.0726 [2026-04-21 22:21:06] Validation | Batch 1410/1567 | Loss: 1.0730 [2026-04-21 22:21:07] Validation | Batch 1420/1567 | Loss: 1.0729 [2026-04-21 22:21:08] Validation | Batch 1430/1567 | Loss: 1.0734 [2026-04-21 22:21:09] Validation | Batch 1440/1567 | Loss: 1.0739 [2026-04-21 22:21:10] Validation | Batch 1450/1567 | Loss: 1.0739 [2026-04-21 22:21:11] Validation | Batch 1460/1567 | Loss: 1.0733 [2026-04-21 22:21:12] Validation | Batch 1470/1567 | Loss: 1.0731 [2026-04-21 22:21:13] Validation | Batch 1480/1567 | Loss: 1.0729 [2026-04-21 22:21:14] Validation | Batch 1490/1567 | Loss: 1.0724 [2026-04-21 22:21:15] Validation | Batch 1500/1567 | Loss: 1.0722 [2026-04-21 22:21:16] Validation | Batch 1510/1567 | Loss: 1.0712 [2026-04-21 22:21:17] Validation | Batch 1520/1567 | Loss: 1.0711 [2026-04-21 22:21:18] Validation | Batch 1530/1567 | Loss: 1.0710 [2026-04-21 22:21:19] Validation | Batch 1540/1567 | Loss: 1.0716 [2026-04-21 22:21:20] Validation | Batch 1550/1567 | Loss: 1.0727 [2026-04-21 22:21:22] Validation | Batch 1560/1567 | Loss: 1.0722 [2026-04-21 22:21:22] Validation | Batch 1567/1567 | Loss: 1.0723 [2026-04-21 22:21:22] Validation | Loss: 1.0723 | PPL: 2.98 | Time: 184.80s [2026-04-21 22:21:40] New best model saved! 
Val loss: 1.0723 [2026-04-21 22:21:47] Epoch 1 | Step 9010 | Loss: 1.0853 | LR: 2.00e-05 [2026-04-21 22:21:52] Epoch 1 | Step 9020 | Loss: 1.0853 | LR: 2.00e-05 [2026-04-21 22:21:57] Epoch 1 | Step 9030 | Loss: 1.0852 | LR: 2.00e-05 [2026-04-21 22:22:02] Epoch 1 | Step 9040 | Loss: 1.0852 | LR: 2.00e-05 [2026-04-21 22:22:07] Epoch 1 | Step 9050 | Loss: 1.0851 | LR: 2.00e-05 [2026-04-21 22:22:12] Epoch 1 | Step 9060 | Loss: 1.0851 | LR: 2.00e-05 [2026-04-21 22:22:17] Epoch 1 | Step 9070 | Loss: 1.0850 | LR: 2.00e-05 [2026-04-21 22:22:22] Epoch 1 | Step 9080 | Loss: 1.0849 | LR: 2.00e-05 [2026-04-21 22:22:27] Epoch 1 | Step 9090 | Loss: 1.0849 | LR: 2.00e-05 [2026-04-21 22:22:33] Epoch 1 | Step 9100 | Loss: 1.0848 | LR: 2.00e-05 [2026-04-21 22:22:38] Epoch 1 | Step 9110 | Loss: 1.0848 | LR: 2.00e-05 [2026-04-21 22:22:43] Epoch 1 | Step 9120 | Loss: 1.0847 | LR: 2.00e-05 [2026-04-21 22:22:48] Epoch 1 | Step 9130 | Loss: 1.0847 | LR: 2.00e-05 [2026-04-21 22:22:53] Epoch 1 | Step 9140 | Loss: 1.0846 | LR: 2.00e-05 [2026-04-21 22:22:59] Epoch 1 | Step 9150 | Loss: 1.0846 | LR: 2.00e-05 [2026-04-21 22:23:05] Epoch 1 | Step 9160 | Loss: 1.0846 | LR: 2.00e-05 [2026-04-21 22:23:09] Epoch 1 | Step 9170 | Loss: 1.0845 | LR: 2.00e-05 [2026-04-21 22:23:15] Epoch 1 | Step 9180 | Loss: 1.0845 | LR: 2.00e-05 [2026-04-21 22:23:20] Epoch 1 | Step 9190 | Loss: 1.0844 | LR: 2.00e-05 [2026-04-21 22:23:25] Epoch 1 | Step 9200 | Loss: 1.0844 | LR: 2.00e-05 [2026-04-21 22:23:30] Epoch 1 | Step 9210 | Loss: 1.0845 | LR: 2.00e-05 [2026-04-21 22:23:36] Epoch 1 | Step 9220 | Loss: 1.0845 | LR: 2.00e-05 [2026-04-21 22:23:42] Epoch 1 | Step 9230 | Loss: 1.0845 | LR: 2.00e-05 [2026-04-21 22:23:47] Epoch 1 | Step 9240 | Loss: 1.0844 | LR: 2.00e-05 [2026-04-21 22:23:52] Epoch 1 | Step 9250 | Loss: 1.0845 | LR: 2.00e-05 [2026-04-21 22:23:57] Epoch 1 | Step 9260 | Loss: 1.0844 | LR: 2.00e-05 [2026-04-21 22:24:02] Epoch 1 | Step 9270 | Loss: 1.0843 | LR: 2.00e-05 [2026-04-21 22:24:07] Epoch 1 | Step 
9280 | Loss: 1.0842 | LR: 2.00e-05 [2026-04-21 22:24:13] Epoch 1 | Step 9290 | Loss: 1.0841 | LR: 2.00e-05 [2026-04-21 22:24:18] Epoch 1 | Step 9300 | Loss: 1.0841 | LR: 2.00e-05 [2026-04-21 22:24:23] Epoch 1 | Step 9310 | Loss: 1.0840 | LR: 2.00e-05 [2026-04-21 22:24:28] Epoch 1 | Step 9320 | Loss: 1.0840 | LR: 2.00e-05 [2026-04-21 22:24:34] Epoch 1 | Step 9330 | Loss: 1.0842 | LR: 2.00e-05 [2026-04-21 22:24:39] Epoch 1 | Step 9340 | Loss: 1.0840 | LR: 2.00e-05 [2026-04-21 22:24:44] Epoch 1 | Step 9350 | Loss: 1.0839 | LR: 2.00e-05 [2026-04-21 22:24:49] Epoch 1 | Step 9360 | Loss: 1.0839 | LR: 2.00e-05 [2026-04-21 22:24:54] Epoch 1 | Step 9370 | Loss: 1.0839 | LR: 2.00e-05 [2026-04-21 22:25:00] Epoch 1 | Step 9380 | Loss: 1.0839 | LR: 2.00e-05 [2026-04-21 22:25:06] Epoch 1 | Step 9390 | Loss: 1.0837 | LR: 2.00e-05 [2026-04-21 22:25:11] Epoch 1 | Step 9400 | Loss: 1.0837 | LR: 2.00e-05 [2026-04-21 22:25:16] Epoch 1 | Step 9410 | Loss: 1.0836 | LR: 2.00e-05 [2026-04-21 22:25:21] Epoch 1 | Step 9420 | Loss: 1.0836 | LR: 2.00e-05 [2026-04-21 22:25:27] Epoch 1 | Step 9430 | Loss: 1.0836 | LR: 2.00e-05 [2026-04-21 22:25:34] Epoch 1 | Step 9440 | Loss: 1.0835 | LR: 2.00e-05 [2026-04-21 22:25:39] Epoch 1 | Step 9450 | Loss: 1.0835 | LR: 2.00e-05 [2026-04-21 22:25:44] Epoch 1 | Step 9460 | Loss: 1.0834 | LR: 2.00e-05 [2026-04-21 22:25:50] Epoch 1 | Step 9470 | Loss: 1.0831 | LR: 2.00e-05 [2026-04-21 22:25:55] Epoch 1 | Step 9480 | Loss: 1.0832 | LR: 2.00e-05 [2026-04-21 22:26:01] Epoch 1 | Step 9490 | Loss: 1.0831 | LR: 2.00e-05 [2026-04-21 22:26:06] Epoch 1 | Step 9500 | Loss: 1.0831 | LR: 2.00e-05 [2026-04-21 22:26:11] Epoch 1 | Step 9510 | Loss: 1.0832 | LR: 2.00e-05 [2026-04-21 22:26:17] Epoch 1 | Step 9520 | Loss: 1.0833 | LR: 2.00e-05 [2026-04-21 22:26:22] Epoch 1 | Step 9530 | Loss: 1.0833 | LR: 2.00e-05 [2026-04-21 22:26:27] Epoch 1 | Step 9540 | Loss: 1.0834 | LR: 2.00e-05 [2026-04-21 22:26:33] Epoch 1 | Step 9550 | Loss: 1.0834 | LR: 2.00e-05 [2026-04-21 
22:26:38] Epoch 1 | Step 9560 | Loss: 1.0834 | LR: 2.00e-05 [2026-04-21 22:26:44] Epoch 1 | Step 9570 | Loss: 1.0835 | LR: 2.00e-05 [2026-04-21 22:26:50] Epoch 1 | Step 9580 | Loss: 1.0835 | LR: 2.00e-05 [2026-04-21 22:26:55] Epoch 1 | Step 9590 | Loss: 1.0834 | LR: 2.00e-05 [2026-04-21 22:27:00] Epoch 1 | Step 9600 | Loss: 1.0834 | LR: 2.00e-05 [2026-04-21 22:27:07] Epoch 1 | Step 9610 | Loss: 1.0835 | LR: 2.00e-05 [2026-04-21 22:27:11] Epoch 1 | Step 9620 | Loss: 1.0836 | LR: 2.00e-05 [2026-04-21 22:27:17] Epoch 1 | Step 9630 | Loss: 1.0835 | LR: 2.00e-05 [2026-04-21 22:27:22] Epoch 1 | Step 9640 | Loss: 1.0833 | LR: 2.00e-05 [2026-04-21 22:27:28] Epoch 1 | Step 9650 | Loss: 1.0834 | LR: 2.00e-05 [2026-04-21 22:27:33] Epoch 1 | Step 9660 | Loss: 1.0834 | LR: 2.00e-05 [2026-04-21 22:27:39] Epoch 1 | Step 9670 | Loss: 1.0834 | LR: 2.00e-05 [2026-04-21 22:27:44] Epoch 1 | Step 9680 | Loss: 1.0832 | LR: 2.00e-05 [2026-04-21 22:27:49] Epoch 1 | Step 9690 | Loss: 1.0832 | LR: 2.00e-05 [2026-04-21 22:27:54] Epoch 1 | Step 9700 | Loss: 1.0831 | LR: 2.00e-05 [2026-04-21 22:27:59] Epoch 1 | Step 9710 | Loss: 1.0830 | LR: 2.00e-05 [2026-04-21 22:28:05] Epoch 1 | Step 9720 | Loss: 1.0830 | LR: 2.00e-05 [2026-04-21 22:28:10] Epoch 1 | Step 9730 | Loss: 1.0830 | LR: 2.00e-05 [2026-04-21 22:28:15] Epoch 1 | Step 9740 | Loss: 1.0828 | LR: 2.00e-05 [2026-04-21 22:28:21] Epoch 1 | Step 9750 | Loss: 1.0827 | LR: 2.00e-05 [2026-04-21 22:28:26] Epoch 1 | Step 9760 | Loss: 1.0827 | LR: 2.00e-05 [2026-04-21 22:28:31] Epoch 1 | Step 9770 | Loss: 1.0827 | LR: 2.00e-05 [2026-04-21 22:28:36] Epoch 1 | Step 9780 | Loss: 1.0827 | LR: 2.00e-05 [2026-04-21 22:28:41] Epoch 1 | Step 9790 | Loss: 1.0826 | LR: 2.00e-05 [2026-04-21 22:28:47] Epoch 1 | Step 9800 | Loss: 1.0825 | LR: 2.00e-05 [2026-04-21 22:28:52] Epoch 1 | Step 9810 | Loss: 1.0824 | LR: 2.00e-05 [2026-04-21 22:28:58] Epoch 1 | Step 9820 | Loss: 1.0822 | LR: 2.00e-05 [2026-04-21 22:29:03] Epoch 1 | Step 9830 | Loss: 1.0822 | LR: 
2.00e-05 [2026-04-21 22:29:08] Epoch 1 | Step 9840 | Loss: 1.0823 | LR: 2.00e-05 [2026-04-21 22:29:14] Epoch 1 | Step 9850 | Loss: 1.0823 | LR: 2.00e-05 [2026-04-21 22:29:19] Epoch 1 | Step 9860 | Loss: 1.0823 | LR: 2.00e-05 [2026-04-21 22:29:24] Epoch 1 | Step 9870 | Loss: 1.0823 | LR: 2.00e-05 [2026-04-21 22:29:29] Epoch 1 | Step 9880 | Loss: 1.0822 | LR: 2.00e-05 [2026-04-21 22:29:34] Epoch 1 | Step 9890 | Loss: 1.0823 | LR: 2.00e-05 [2026-04-21 22:29:39] Epoch 1 | Step 9900 | Loss: 1.0822 | LR: 2.00e-05 [2026-04-21 22:29:44] Epoch 1 | Step 9910 | Loss: 1.0821 | LR: 2.00e-05 [2026-04-21 22:29:50] Epoch 1 | Step 9920 | Loss: 1.0821 | LR: 2.00e-05 [2026-04-21 22:29:55] Epoch 1 | Step 9930 | Loss: 1.0820 | LR: 2.00e-05 [2026-04-21 22:30:00] Epoch 1 | Step 9940 | Loss: 1.0820 | LR: 2.00e-05 [2026-04-21 22:30:05] Epoch 1 | Step 9950 | Loss: 1.0820 | LR: 2.00e-05 [2026-04-21 22:30:10] Epoch 1 | Step 9960 | Loss: 1.0820 | LR: 2.00e-05 [2026-04-21 22:30:15] Epoch 1 | Step 9970 | Loss: 1.0819 | LR: 2.00e-05 [2026-04-21 22:30:20] Epoch 1 | Step 9980 | Loss: 1.0819 | LR: 2.00e-05 [2026-04-21 22:30:25] Epoch 1 | Step 9990 | Loss: 1.0819 | LR: 2.00e-05 [2026-04-21 22:30:31] Epoch 1 | Step 10000 | Loss: 1.0818 | LR: 2.00e-05 [2026-04-21 22:30:32] Validation | Batch 10/1567 | Loss: 1.0543 [2026-04-21 22:30:33] Validation | Batch 20/1567 | Loss: 1.1348 [2026-04-21 22:30:34] Validation | Batch 30/1567 | Loss: 1.0957 [2026-04-21 22:30:36] Validation | Batch 40/1567 | Loss: 1.1173 [2026-04-21 22:30:37] Validation | Batch 50/1567 | Loss: 1.0937 [2026-04-21 22:30:38] Validation | Batch 60/1567 | Loss: 1.0797 [2026-04-21 22:30:39] Validation | Batch 70/1567 | Loss: 1.0687 [2026-04-21 22:30:41] Validation | Batch 80/1567 | Loss: 1.0774 [2026-04-21 22:30:42] Validation | Batch 90/1567 | Loss: 1.0725 [2026-04-21 22:30:43] Validation | Batch 100/1567 | Loss: 1.0511 [2026-04-21 22:30:44] Validation | Batch 110/1567 | Loss: 1.0433 [2026-04-21 22:30:46] Validation | Batch 120/1567 | Loss: 
1.0372 [2026-04-21 22:30:47] Validation | Batch 130/1567 | Loss: 1.0325 [2026-04-21 22:30:48] Validation | Batch 140/1567 | Loss: 1.0418 [2026-04-21 22:30:49] Validation | Batch 150/1567 | Loss: 1.0528 [2026-04-21 22:30:50] Validation | Batch 160/1567 | Loss: 1.0515 [2026-04-21 22:30:51] Validation | Batch 170/1567 | Loss: 1.0442 [2026-04-21 22:30:52] Validation | Batch 180/1567 | Loss: 1.0466 [2026-04-21 22:30:53] Validation | Batch 190/1567 | Loss: 1.0519 [2026-04-21 22:30:55] Validation | Batch 200/1567 | Loss: 1.0558 [2026-04-21 22:30:56] Validation | Batch 210/1567 | Loss: 1.0540 [2026-04-21 22:30:57] Validation | Batch 220/1567 | Loss: 1.0588 [2026-04-21 22:30:59] Validation | Batch 230/1567 | Loss: 1.0624 [2026-04-21 22:31:00] Validation | Batch 240/1567 | Loss: 1.0652 [2026-04-21 22:31:01] Validation | Batch 250/1567 | Loss: 1.0688 [2026-04-21 22:31:02] Validation | Batch 260/1567 | Loss: 1.0721 [2026-04-21 22:31:03] Validation | Batch 270/1567 | Loss: 1.0769 [2026-04-21 22:31:05] Validation | Batch 280/1567 | Loss: 1.0794 [2026-04-21 22:31:07] Validation | Batch 290/1567 | Loss: 1.0749 [2026-04-21 22:31:08] Validation | Batch 300/1567 | Loss: 1.0744 [2026-04-21 22:31:09] Validation | Batch 310/1567 | Loss: 1.0709 [2026-04-21 22:31:10] Validation | Batch 320/1567 | Loss: 1.0735 [2026-04-21 22:31:12] Validation | Batch 330/1567 | Loss: 1.0734 [2026-04-21 22:31:13] Validation | Batch 340/1567 | Loss: 1.0726 [2026-04-21 22:31:14] Validation | Batch 350/1567 | Loss: 1.0701 [2026-04-21 22:31:15] Validation | Batch 360/1567 | Loss: 1.0642 [2026-04-21 22:31:17] Validation | Batch 370/1567 | Loss: 1.0640 [2026-04-21 22:31:18] Validation | Batch 380/1567 | Loss: 1.0678 [2026-04-21 22:31:19] Validation | Batch 390/1567 | Loss: 1.0670 [2026-04-21 22:31:20] Validation | Batch 400/1567 | Loss: 1.0677 [2026-04-21 22:31:22] Validation | Batch 410/1567 | Loss: 1.0640 [2026-04-21 22:31:23] Validation | Batch 420/1567 | Loss: 1.0624 [2026-04-21 22:31:24] Validation | Batch 
430/1567 | Loss: 1.0649 [2026-04-21 22:31:25] Validation | Batch 440/1567 | Loss: 1.0654 [2026-04-21 22:31:26] Validation | Batch 450/1567 | Loss: 1.0681 [2026-04-21 22:31:28] Validation | Batch 460/1567 | Loss: 1.0708 [2026-04-21 22:31:28] Validation | Batch 470/1567 | Loss: 1.0755 [2026-04-21 22:31:30] Validation | Batch 480/1567 | Loss: 1.0732 [2026-04-21 22:31:31] Validation | Batch 490/1567 | Loss: 1.0707 [2026-04-21 22:31:32] Validation | Batch 500/1567 | Loss: 1.0717 [2026-04-21 22:31:33] Validation | Batch 510/1567 | Loss: 1.0718 [2026-04-21 22:31:34] Validation | Batch 520/1567 | Loss: 1.0733 [2026-04-21 22:31:35] Validation | Batch 530/1567 | Loss: 1.0717 [2026-04-21 22:31:37] Validation | Batch 540/1567 | Loss: 1.0688 [2026-04-21 22:31:38] Validation | Batch 550/1567 | Loss: 1.0701 [2026-04-21 22:31:39] Validation | Batch 560/1567 | Loss: 1.0692 [2026-04-21 22:31:41] Validation | Batch 570/1567 | Loss: 1.0653 [2026-04-21 22:31:42] Validation | Batch 580/1567 | Loss: 1.0668 [2026-04-21 22:31:43] Validation | Batch 590/1567 | Loss: 1.0663 [2026-04-21 22:31:44] Validation | Batch 600/1567 | Loss: 1.0652 [2026-04-21 22:31:46] Validation | Batch 610/1567 | Loss: 1.0676 [2026-04-21 22:31:47] Validation | Batch 620/1567 | Loss: 1.0654 [2026-04-21 22:31:48] Validation | Batch 630/1567 | Loss: 1.0656 [2026-04-21 22:31:50] Validation | Batch 640/1567 | Loss: 1.0666 [2026-04-21 22:31:51] Validation | Batch 650/1567 | Loss: 1.0696 [2026-04-21 22:31:52] Validation | Batch 660/1567 | Loss: 1.0709 [2026-04-21 22:31:53] Validation | Batch 670/1567 | Loss: 1.0695 [2026-04-21 22:31:54] Validation | Batch 680/1567 | Loss: 1.0683 [2026-04-21 22:31:55] Validation | Batch 690/1567 | Loss: 1.0669 [2026-04-21 22:31:57] Validation | Batch 700/1567 | Loss: 1.0669 [2026-04-21 22:31:58] Validation | Batch 710/1567 | Loss: 1.0663 [2026-04-21 22:31:59] Validation | Batch 720/1567 | Loss: 1.0634 [2026-04-21 22:32:00] Validation | Batch 730/1567 | Loss: 1.0636 [2026-04-21 22:32:01] 
Validation | Batch 740/1567 | Loss: 1.0642 [2026-04-21 22:32:02] Validation | Batch 750/1567 | Loss: 1.0639 [2026-04-21 22:32:03] Validation | Batch 760/1567 | Loss: 1.0656 [2026-04-21 22:32:05] Validation | Batch 770/1567 | Loss: 1.0649 [2026-04-21 22:32:06] Validation | Batch 780/1567 | Loss: 1.0659 [2026-04-21 22:32:07] Validation | Batch 790/1567 | Loss: 1.0644 [2026-04-21 22:32:08] Validation | Batch 800/1567 | Loss: 1.0625 [2026-04-21 22:32:09] Validation | Batch 810/1567 | Loss: 1.0631 [2026-04-21 22:32:10] Validation | Batch 820/1567 | Loss: 1.0624 [2026-04-21 22:32:11] Validation | Batch 830/1567 | Loss: 1.0616 [2026-04-21 22:32:12] Validation | Batch 840/1567 | Loss: 1.0620 [2026-04-21 22:32:13] Validation | Batch 850/1567 | Loss: 1.0631 [2026-04-21 22:32:14] Validation | Batch 860/1567 | Loss: 1.0639 [2026-04-21 22:32:15] Validation | Batch 870/1567 | Loss: 1.0646 [2026-04-21 22:32:17] Validation | Batch 880/1567 | Loss: 1.0647 [2026-04-21 22:32:18] Validation | Batch 890/1567 | Loss: 1.0645 [2026-04-21 22:32:19] Validation | Batch 900/1567 | Loss: 1.0639 [2026-04-21 22:32:20] Validation | Batch 910/1567 | Loss: 1.0636 [2026-04-21 22:32:22] Validation | Batch 920/1567 | Loss: 1.0655 [2026-04-21 22:32:22] Validation | Batch 930/1567 | Loss: 1.0652 [2026-04-21 22:32:24] Validation | Batch 940/1567 | Loss: 1.0653 [2026-04-21 22:32:25] Validation | Batch 950/1567 | Loss: 1.0647 [2026-04-21 22:32:26] Validation | Batch 960/1567 | Loss: 1.0652 [2026-04-21 22:32:27] Validation | Batch 970/1567 | Loss: 1.0657 [2026-04-21 22:32:28] Validation | Batch 980/1567 | Loss: 1.0654 [2026-04-21 22:32:29] Validation | Batch 990/1567 | Loss: 1.0664 [2026-04-21 22:32:30] Validation | Batch 1000/1567 | Loss: 1.0668 [2026-04-21 22:32:31] Validation | Batch 1010/1567 | Loss: 1.0659 [2026-04-21 22:32:32] Validation | Batch 1020/1567 | Loss: 1.0670 [2026-04-21 22:32:33] Validation | Batch 1030/1567 | Loss: 1.0675 [2026-04-21 22:32:35] Validation | Batch 1040/1567 | Loss: 1.0668 
[2026-04-21 22:32:36] Validation | Batch 1050/1567 | Loss: 1.0659 [2026-04-21 22:32:37] Validation | Batch 1060/1567 | Loss: 1.0670 [2026-04-21 22:32:39] Validation | Batch 1070/1567 | Loss: 1.0669 [2026-04-21 22:32:40] Validation | Batch 1080/1567 | Loss: 1.0683 [2026-04-21 22:32:41] Validation | Batch 1090/1567 | Loss: 1.0709 [2026-04-21 22:32:42] Validation | Batch 1100/1567 | Loss: 1.0723 [2026-04-21 22:32:43] Validation | Batch 1110/1567 | Loss: 1.0714 [2026-04-21 22:32:44] Validation | Batch 1120/1567 | Loss: 1.0717 [2026-04-21 22:32:46] Validation | Batch 1130/1567 | Loss: 1.0699 [2026-04-21 22:32:47] Validation | Batch 1140/1567 | Loss: 1.0702 [2026-04-21 22:32:48] Validation | Batch 1150/1567 | Loss: 1.0690 [2026-04-21 22:32:49] Validation | Batch 1160/1567 | Loss: 1.0683 [2026-04-21 22:32:50] Validation | Batch 1170/1567 | Loss: 1.0685 [2026-04-21 22:32:51] Validation | Batch 1180/1567 | Loss: 1.0688 [2026-04-21 22:32:52] Validation | Batch 1190/1567 | Loss: 1.0690 [2026-04-21 22:32:54] Validation | Batch 1200/1567 | Loss: 1.0678 [2026-04-21 22:32:55] Validation | Batch 1210/1567 | Loss: 1.0670 [2026-04-21 22:32:56] Validation | Batch 1220/1567 | Loss: 1.0679 [2026-04-21 22:32:57] Validation | Batch 1230/1567 | Loss: 1.0685 [2026-04-21 22:32:58] Validation | Batch 1240/1567 | Loss: 1.0684 [2026-04-21 22:32:59] Validation | Batch 1250/1567 | Loss: 1.0686 [2026-04-21 22:33:01] Validation | Batch 1260/1567 | Loss: 1.0684 [2026-04-21 22:33:02] Validation | Batch 1270/1567 | Loss: 1.0669 [2026-04-21 22:33:03] Validation | Batch 1280/1567 | Loss: 1.0670 [2026-04-21 22:33:05] Validation | Batch 1290/1567 | Loss: 1.0671 [2026-04-21 22:33:06] Validation | Batch 1300/1567 | Loss: 1.0675 [2026-04-21 22:33:07] Validation | Batch 1310/1567 | Loss: 1.0681 [2026-04-21 22:33:08] Validation | Batch 1320/1567 | Loss: 1.0688 [2026-04-21 22:33:09] Validation | Batch 1330/1567 | Loss: 1.0700 [2026-04-21 22:33:11] Validation | Batch 1340/1567 | Loss: 1.0697 [2026-04-21 
22:33:11] Validation | Batch 1350/1567 | Loss: 1.0700 [2026-04-21 22:33:13] Validation | Batch 1360/1567 | Loss: 1.0692 [2026-04-21 22:33:14] Validation | Batch 1370/1567 | Loss: 1.0687 [2026-04-21 22:33:15] Validation | Batch 1380/1567 | Loss: 1.0687 [2026-04-21 22:33:16] Validation | Batch 1390/1567 | Loss: 1.0680 [2026-04-21 22:33:17] Validation | Batch 1400/1567 | Loss: 1.0676 [2026-04-21 22:33:18] Validation | Batch 1410/1567 | Loss: 1.0683 [2026-04-21 22:33:19] Validation | Batch 1420/1567 | Loss: 1.0682 [2026-04-21 22:33:20] Validation | Batch 1430/1567 | Loss: 1.0686 [2026-04-21 22:33:22] Validation | Batch 1440/1567 | Loss: 1.0692 [2026-04-21 22:33:23] Validation | Batch 1450/1567 | Loss: 1.0693 [2026-04-21 22:33:24] Validation | Batch 1460/1567 | Loss: 1.0688 [2026-04-21 22:33:25] Validation | Batch 1470/1567 | Loss: 1.0686 [2026-04-21 22:33:26] Validation | Batch 1480/1567 | Loss: 1.0683 [2026-04-21 22:33:26] Validation | Batch 1490/1567 | Loss: 1.0677 [2026-04-21 22:33:28] Validation | Batch 1500/1567 | Loss: 1.0675 [2026-04-21 22:33:29] Validation | Batch 1510/1567 | Loss: 1.0665 [2026-04-21 22:33:30] Validation | Batch 1520/1567 | Loss: 1.0665 [2026-04-21 22:33:31] Validation | Batch 1530/1567 | Loss: 1.0664 [2026-04-21 22:33:32] Validation | Batch 1540/1567 | Loss: 1.0670 [2026-04-21 22:33:33] Validation | Batch 1550/1567 | Loss: 1.0682 [2026-04-21 22:33:34] Validation | Batch 1560/1567 | Loss: 1.0679 [2026-04-21 22:33:35] Validation | Batch 1567/1567 | Loss: 1.0680 [2026-04-21 22:33:35] Validation | Loss: 1.0680 | PPL: 2.96 | Time: 184.60s [2026-04-21 22:33:53] New best model saved! 
Val loss: 1.0680 [2026-04-21 22:33:58] Epoch 1 | Step 10010 | Loss: 1.0820 | LR: 2.00e-05 [2026-04-21 22:34:03] Epoch 1 | Step 10020 | Loss: 1.0820 | LR: 2.00e-05 [2026-04-21 22:34:08] Epoch 1 | Step 10030 | Loss: 1.0820 | LR: 2.00e-05 [2026-04-21 22:34:13] Epoch 1 | Step 10040 | Loss: 1.0818 | LR: 2.00e-05 [2026-04-21 22:34:18] Epoch 1 | Step 10050 | Loss: 1.0818 | LR: 2.00e-05 [2026-04-21 22:34:24] Epoch 1 | Step 10060 | Loss: 1.0818 | LR: 2.00e-05 [2026-04-21 22:34:29] Epoch 1 | Step 10070 | Loss: 1.0818 | LR: 2.00e-05 [2026-04-21 22:34:34] Epoch 1 | Step 10080 | Loss: 1.0818 | LR: 2.00e-05 [2026-04-21 22:34:39] Epoch 1 | Step 10090 | Loss: 1.0818 | LR: 2.00e-05 [2026-04-21 22:34:45] Epoch 1 | Step 10100 | Loss: 1.0817 | LR: 2.00e-05 [2026-04-21 22:34:50] Epoch 1 | Step 10110 | Loss: 1.0817 | LR: 2.00e-05 [2026-04-21 22:34:56] Epoch 1 | Step 10120 | Loss: 1.0816 | LR: 2.00e-05 [2026-04-21 22:35:01] Epoch 1 | Step 10130 | Loss: 1.0816 | LR: 2.00e-05 [2026-04-21 22:35:06] Epoch 1 | Step 10140 | Loss: 1.0816 | LR: 2.00e-05 [2026-04-21 22:35:12] Epoch 1 | Step 10150 | Loss: 1.0815 | LR: 2.00e-05 [2026-04-21 22:35:17] Epoch 1 | Step 10160 | Loss: 1.0816 | LR: 2.00e-05 [2026-04-21 22:35:23] Epoch 1 | Step 10170 | Loss: 1.0814 | LR: 2.00e-05 [2026-04-21 22:35:27] Epoch 1 | Step 10180 | Loss: 1.0813 | LR: 2.00e-05 [2026-04-21 22:35:32] Epoch 1 | Step 10190 | Loss: 1.0813 | LR: 2.00e-05 [2026-04-21 22:35:37] Epoch 1 | Step 10200 | Loss: 1.0813 | LR: 2.00e-05 [2026-04-21 22:35:43] Epoch 1 | Step 10210 | Loss: 1.0813 | LR: 2.00e-05 [2026-04-21 22:35:48] Epoch 1 | Step 10220 | Loss: 1.0812 | LR: 2.00e-05 [2026-04-21 22:35:53] Epoch 1 | Step 10230 | Loss: 1.0812 | LR: 2.00e-05 [2026-04-21 22:35:58] Epoch 1 | Step 10240 | Loss: 1.0812 | LR: 2.00e-05 [2026-04-21 22:36:04] Epoch 1 | Step 10250 | Loss: 1.0812 | LR: 2.00e-05 [2026-04-21 22:36:09] Epoch 1 | Step 10260 | Loss: 1.0811 | LR: 2.00e-05 [2026-04-21 22:36:14] Epoch 1 | Step 10270 | Loss: 1.0810 | LR: 2.00e-05 [2026-04-21 
22:36:20] Epoch 1 | Step 10280 | Loss: 1.0811 | LR: 2.00e-05 [2026-04-21 22:36:25] Epoch 1 | Step 10290 | Loss: 1.0811 | LR: 2.00e-05 [2026-04-21 22:36:30] Epoch 1 | Step 10300 | Loss: 1.0810 | LR: 2.00e-05 [2026-04-21 22:36:34] Epoch 1 | Step 10310 | Loss: 1.0809 | LR: 2.00e-05 [2026-04-21 22:36:40] Epoch 1 | Step 10320 | Loss: 1.0808 | LR: 2.00e-05 [2026-04-21 22:36:46] Epoch 1 | Step 10330 | Loss: 1.0807 | LR: 2.00e-05 [2026-04-21 22:36:51] Epoch 1 | Step 10340 | Loss: 1.0808 | LR: 2.00e-05 [2026-04-21 22:36:56] Epoch 1 | Step 10350 | Loss: 1.0808 | LR: 2.00e-05 [2026-04-21 22:37:01] Epoch 1 | Step 10360 | Loss: 1.0809 | LR: 2.00e-05 [2026-04-21 22:37:06] Epoch 1 | Step 10370 | Loss: 1.0808 | LR: 2.00e-05 [2026-04-21 22:37:11] Epoch 1 | Step 10380 | Loss: 1.0809 | LR: 2.00e-05 [2026-04-21 22:37:16] Epoch 1 | Step 10390 | Loss: 1.0809 | LR: 2.00e-05 [2026-04-21 22:37:21] Epoch 1 | Step 10400 | Loss: 1.0809 | LR: 2.00e-05 [2026-04-21 22:37:27] Epoch 1 | Step 10410 | Loss: 1.0809 | LR: 2.00e-05 [2026-04-21 22:37:31] Epoch 1 | Step 10420 | Loss: 1.0808 | LR: 2.00e-05 [2026-04-21 22:37:37] Epoch 1 | Step 10430 | Loss: 1.0806 | LR: 2.00e-05 [2026-04-21 22:37:42] Epoch 1 | Step 10440 | Loss: 1.0806 | LR: 2.00e-05 [2026-04-21 22:37:47] Epoch 1 | Step 10450 | Loss: 1.0807 | LR: 2.00e-05 [2026-04-21 22:37:52] Epoch 1 | Step 10460 | Loss: 1.0808 | LR: 2.00e-05 [2026-04-21 22:37:58] Epoch 1 | Step 10470 | Loss: 1.0807 | LR: 2.00e-05 [2026-04-21 22:38:03] Epoch 1 | Step 10480 | Loss: 1.0808 | LR: 2.00e-05 [2026-04-21 22:38:08] Epoch 1 | Step 10490 | Loss: 1.0807 | LR: 2.00e-05 [2026-04-21 22:38:14] Epoch 1 | Step 10500 | Loss: 1.0806 | LR: 2.00e-05 [2026-04-21 22:38:20] Epoch 1 | Step 10510 | Loss: 1.0804 | LR: 2.00e-05 [2026-04-21 22:38:26] Epoch 1 | Step 10520 | Loss: 1.0804 | LR: 2.00e-05 [2026-04-21 22:38:32] Epoch 1 | Step 10530 | Loss: 1.0803 | LR: 2.00e-05 [2026-04-21 22:38:37] Epoch 1 | Step 10540 | Loss: 1.0803 | LR: 2.00e-05 [2026-04-21 22:38:43] Epoch 1 | Step 
10550 | Loss: 1.0802 | LR: 2.00e-05 [2026-04-21 22:38:48] Epoch 1 | Step 10560 | Loss: 1.0802 | LR: 2.00e-05 [2026-04-21 22:38:53] Epoch 1 | Step 10570 | Loss: 1.0801 | LR: 2.00e-05 [2026-04-21 22:38:59] Epoch 1 | Step 10580 | Loss: 1.0801 | LR: 2.00e-05 [2026-04-21 22:39:04] Epoch 1 | Step 10590 | Loss: 1.0801 | LR: 2.00e-05 [2026-04-21 22:39:05] Epoch 1 completed in 7819.67s | Loss: 1.0801 [2026-04-21 22:39:15] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_10591.pt [2026-04-21 22:40:29] ============================================================ [2026-04-21 22:40:29] EPOCH 2/3 [2026-04-21 22:40:29] ============================================================ [2026-04-21 22:40:33] Epoch 2 | Step 10600 | Loss: 0.8839 | LR: 2.00e-05 [2026-04-21 22:40:38] Epoch 2 | Step 10610 | Loss: 0.8746 | LR: 2.00e-05 [2026-04-21 22:40:44] Epoch 2 | Step 10620 | Loss: 0.8551 | LR: 2.00e-05 [2026-04-21 22:40:49] Epoch 2 | Step 10630 | Loss: 0.8581 | LR: 2.00e-05 [2026-04-21 22:40:54] Epoch 2 | Step 10640 | Loss: 0.8580 | LR: 2.00e-05 [2026-04-21 22:40:59] Epoch 2 | Step 10650 | Loss: 0.8686 | LR: 2.00e-05 [2026-04-21 22:41:04] Epoch 2 | Step 10660 | Loss: 0.8703 | LR: 2.00e-05 [2026-04-21 22:41:09] Epoch 2 | Step 10670 | Loss: 0.8982 | LR: 2.00e-05 [2026-04-21 22:41:15] Epoch 2 | Step 10680 | Loss: 0.8991 | LR: 2.00e-05 [2026-04-21 22:41:20] Epoch 2 | Step 10690 | Loss: 0.9010 | LR: 2.00e-05 [2026-04-21 22:41:26] Epoch 2 | Step 10700 | Loss: 0.8967 | LR: 2.00e-05 [2026-04-21 22:41:31] Epoch 2 | Step 10710 | Loss: 0.9014 | LR: 2.00e-05 [2026-04-21 22:41:36] Epoch 2 | Step 10720 | Loss: 0.9087 | LR: 2.00e-05 [2026-04-21 22:41:42] Epoch 2 | Step 10730 | Loss: 0.9050 | LR: 2.00e-05 [2026-04-21 22:41:47] Epoch 2 | Step 10740 | Loss: 0.9039 | LR: 2.00e-05 [2026-04-21 22:41:52] Epoch 2 | Step 10750 | Loss: 0.9075 | LR: 2.00e-05 [2026-04-21 22:41:58] Epoch 2 | Step 10760 | Loss: 0.9090 | LR: 2.00e-05 [2026-04-21 22:42:03] Epoch 2 | Step 10770 | Loss: 0.9075 | 
LR: 2.00e-05 [2026-04-21 22:42:08] Epoch 2 | Step 10780 | Loss: 0.9043 | LR: 2.00e-05 [2026-04-21 22:42:14] Epoch 2 | Step 10790 | Loss: 0.9015 | LR: 2.00e-05 [2026-04-21 22:42:20] Epoch 2 | Step 10800 | Loss: 0.9038 | LR: 2.00e-05 [2026-04-21 22:42:25] Epoch 2 | Step 10810 | Loss: 0.9067 | LR: 2.00e-05 [2026-04-21 22:42:30] Epoch 2 | Step 10820 | Loss: 0.9109 | LR: 2.00e-05 [2026-04-21 22:42:36] Epoch 2 | Step 10830 | Loss: 0.9066 | LR: 2.00e-05 [2026-04-21 22:42:41] Epoch 2 | Step 10840 | Loss: 0.9108 | LR: 2.00e-05 [2026-04-21 22:42:46] Epoch 2 | Step 10850 | Loss: 0.9059 | LR: 2.00e-05 [2026-04-21 22:42:52] Epoch 2 | Step 10860 | Loss: 0.9036 | LR: 2.00e-05 [2026-04-21 22:42:57] Epoch 2 | Step 10870 | Loss: 0.9065 | LR: 2.00e-05 [2026-04-21 22:43:03] Epoch 2 | Step 10880 | Loss: 0.9035 | LR: 2.00e-05 [2026-04-21 22:43:08] Epoch 2 | Step 10890 | Loss: 0.9057 | LR: 2.00e-05 [2026-04-21 22:43:14] Epoch 2 | Step 10900 | Loss: 0.9047 | LR: 2.00e-05 [2026-04-21 22:43:20] Epoch 2 | Step 10910 | Loss: 0.9020 | LR: 2.00e-05 [2026-04-21 22:43:26] Epoch 2 | Step 10920 | Loss: 0.9042 | LR: 2.00e-05 [2026-04-21 22:43:31] Epoch 2 | Step 10930 | Loss: 0.9029 | LR: 2.00e-05 [2026-04-21 22:43:35] Epoch 2 | Step 10940 | Loss: 0.9026 | LR: 2.00e-05 [2026-04-21 22:43:40] Epoch 2 | Step 10950 | Loss: 0.9040 | LR: 2.00e-05 [2026-04-21 22:43:45] Epoch 2 | Step 10960 | Loss: 0.9028 | LR: 2.00e-05 [2026-04-21 22:43:51] Epoch 2 | Step 10970 | Loss: 0.9040 | LR: 2.00e-05 [2026-04-21 22:43:56] Epoch 2 | Step 10980 | Loss: 0.9032 | LR: 2.00e-05 [2026-04-21 22:44:01] Epoch 2 | Step 10990 | Loss: 0.9021 | LR: 2.00e-05 [2026-04-21 22:44:07] Epoch 2 | Step 11000 | Loss: 0.9040 | LR: 2.00e-05 [2026-04-21 22:44:08] Validation | Batch 10/1567 | Loss: 1.0741 [2026-04-21 22:44:09] Validation | Batch 20/1567 | Loss: 1.1528 [2026-04-21 22:44:10] Validation | Batch 30/1567 | Loss: 1.1141 [2026-04-21 22:44:12] Validation | Batch 40/1567 | Loss: 1.1354 [2026-04-21 22:44:13] Validation | Batch 50/1567 | 
Loss: 1.1100 [2026-04-21 22:44:14] Validation | Batch 60/1567 | Loss: 1.0982 [2026-04-21 22:44:15] Validation | Batch 70/1567 | Loss: 1.0887 [2026-04-21 22:44:17] Validation | Batch 80/1567 | Loss: 1.0955 [2026-04-21 22:44:18] Validation | Batch 90/1567 | Loss: 1.0901 [2026-04-21 22:44:19] Validation | Batch 100/1567 | Loss: 1.0680 [2026-04-21 22:44:20] Validation | Batch 110/1567 | Loss: 1.0573 [2026-04-21 22:44:22] Validation | Batch 120/1567 | Loss: 1.0502 [2026-04-21 22:44:23] Validation | Batch 130/1567 | Loss: 1.0454 [2026-04-21 22:44:24] Validation | Batch 140/1567 | Loss: 1.0550 [2026-04-21 22:44:25] Validation | Batch 150/1567 | Loss: 1.0656 [2026-04-21 22:44:26] Validation | Batch 160/1567 | Loss: 1.0644 [2026-04-21 22:44:27] Validation | Batch 170/1567 | Loss: 1.0555 [2026-04-21 22:44:28] Validation | Batch 180/1567 | Loss: 1.0568 [2026-04-21 22:44:29] Validation | Batch 190/1567 | Loss: 1.0618 [2026-04-21 22:44:31] Validation | Batch 200/1567 | Loss: 1.0661 [2026-04-21 22:44:32] Validation | Batch 210/1567 | Loss: 1.0642 [2026-04-21 22:44:33] Validation | Batch 220/1567 | Loss: 1.0688 [2026-04-21 22:44:35] Validation | Batch 230/1567 | Loss: 1.0726 [2026-04-21 22:44:36] Validation | Batch 240/1567 | Loss: 1.0748 [2026-04-21 22:44:37] Validation | Batch 250/1567 | Loss: 1.0791 [2026-04-21 22:44:38] Validation | Batch 260/1567 | Loss: 1.0821 [2026-04-21 22:44:39] Validation | Batch 270/1567 | Loss: 1.0867 [2026-04-21 22:44:41] Validation | Batch 280/1567 | Loss: 1.0895 [2026-04-21 22:44:43] Validation | Batch 290/1567 | Loss: 1.0852 [2026-04-21 22:44:44] Validation | Batch 300/1567 | Loss: 1.0842 [2026-04-21 22:44:45] Validation | Batch 310/1567 | Loss: 1.0813 [2026-04-21 22:44:46] Validation | Batch 320/1567 | Loss: 1.0839 [2026-04-21 22:44:48] Validation | Batch 330/1567 | Loss: 1.0841 [2026-04-21 22:44:49] Validation | Batch 340/1567 | Loss: 1.0835 [2026-04-21 22:44:50] Validation | Batch 350/1567 | Loss: 1.0809 [2026-04-21 22:44:51] Validation | Batch 
360/1567 | Loss: 1.0745 [2026-04-21 22:44:53] Validation | Batch 370/1567 | Loss: 1.0748 [2026-04-21 22:44:54] Validation | Batch 380/1567 | Loss: 1.0784 [2026-04-21 22:44:55] Validation | Batch 390/1567 | Loss: 1.0774 [2026-04-21 22:44:56] Validation | Batch 400/1567 | Loss: 1.0787 [2026-04-21 22:44:58] Validation | Batch 410/1567 | Loss: 1.0748 [2026-04-21 22:44:59] Validation | Batch 420/1567 | Loss: 1.0737 [2026-04-21 22:45:00] Validation | Batch 430/1567 | Loss: 1.0761 [2026-04-21 22:45:01] Validation | Batch 440/1567 | Loss: 1.0764 [2026-04-21 22:45:02] Validation | Batch 450/1567 | Loss: 1.0789 [2026-04-21 22:45:04] Validation | Batch 460/1567 | Loss: 1.0816 [2026-04-21 22:45:04] Validation | Batch 470/1567 | Loss: 1.0863 [2026-04-21 22:45:06] Validation | Batch 480/1567 | Loss: 1.0837 [2026-04-21 22:45:07] Validation | Batch 490/1567 | Loss: 1.0813 [2026-04-21 22:45:08] Validation | Batch 500/1567 | Loss: 1.0821 [2026-04-21 22:45:09] Validation | Batch 510/1567 | Loss: 1.0822 [2026-04-21 22:45:10] Validation | Batch 520/1567 | Loss: 1.0834 [2026-04-21 22:45:11] Validation | Batch 530/1567 | Loss: 1.0816 [2026-04-21 22:45:13] Validation | Batch 540/1567 | Loss: 1.0789 [2026-04-21 22:45:14] Validation | Batch 550/1567 | Loss: 1.0800 [2026-04-21 22:45:15] Validation | Batch 560/1567 | Loss: 1.0792 [2026-04-21 22:45:17] Validation | Batch 570/1567 | Loss: 1.0751 [2026-04-21 22:45:18] Validation | Batch 580/1567 | Loss: 1.0768 [2026-04-21 22:45:19] Validation | Batch 590/1567 | Loss: 1.0764 [2026-04-21 22:45:20] Validation | Batch 600/1567 | Loss: 1.0752 [2026-04-21 22:45:22] Validation | Batch 610/1567 | Loss: 1.0773 [2026-04-21 22:45:23] Validation | Batch 620/1567 | Loss: 1.0753 [2026-04-21 22:45:24] Validation | Batch 630/1567 | Loss: 1.0752 [2026-04-21 22:45:26] Validation | Batch 640/1567 | Loss: 1.0762 [2026-04-21 22:45:27] Validation | Batch 650/1567 | Loss: 1.0789 [2026-04-21 22:45:28] Validation | Batch 660/1567 | Loss: 1.0803 [2026-04-21 22:45:29] 
Validation | Batch 670/1567 | Loss: 1.0787 [2026-04-21 22:45:30] Validation | Batch 680/1567 | Loss: 1.0774 [2026-04-21 22:45:31] Validation | Batch 690/1567 | Loss: 1.0759 [2026-04-21 22:45:33] Validation | Batch 700/1567 | Loss: 1.0760 [2026-04-21 22:45:34] Validation | Batch 710/1567 | Loss: 1.0752 [2026-04-21 22:45:35] Validation | Batch 720/1567 | Loss: 1.0722 [2026-04-21 22:45:36] Validation | Batch 730/1567 | Loss: 1.0726 [2026-04-21 22:45:37] Validation | Batch 740/1567 | Loss: 1.0730 [2026-04-21 22:45:38] Validation | Batch 750/1567 | Loss: 1.0725 [2026-04-21 22:45:39] Validation | Batch 760/1567 | Loss: 1.0741 [2026-04-21 22:45:41] Validation | Batch 770/1567 | Loss: 1.0737 [2026-04-21 22:45:42] Validation | Batch 780/1567 | Loss: 1.0747 [2026-04-21 22:45:43] Validation | Batch 790/1567 | Loss: 1.0731 [2026-04-21 22:45:44] Validation | Batch 800/1567 | Loss: 1.0711 [2026-04-21 22:45:45] Validation | Batch 810/1567 | Loss: 1.0716 [2026-04-21 22:45:46] Validation | Batch 820/1567 | Loss: 1.0709 [2026-04-21 22:45:47] Validation | Batch 830/1567 | Loss: 1.0700 [2026-04-21 22:45:48] Validation | Batch 840/1567 | Loss: 1.0705 [2026-04-21 22:45:50] Validation | Batch 850/1567 | Loss: 1.0715 [2026-04-21 22:45:50] Validation | Batch 860/1567 | Loss: 1.0725 [2026-04-21 22:45:51] Validation | Batch 870/1567 | Loss: 1.0730 [2026-04-21 22:45:53] Validation | Batch 880/1567 | Loss: 1.0728 [2026-04-21 22:45:54] Validation | Batch 890/1567 | Loss: 1.0726 [2026-04-21 22:45:55] Validation | Batch 900/1567 | Loss: 1.0721 [2026-04-21 22:45:56] Validation | Batch 910/1567 | Loss: 1.0717 [2026-04-21 22:45:58] Validation | Batch 920/1567 | Loss: 1.0737 [2026-04-21 22:45:59] Validation | Batch 930/1567 | Loss: 1.0735 [2026-04-21 22:46:00] Validation | Batch 940/1567 | Loss: 1.0736 [2026-04-21 22:46:01] Validation | Batch 950/1567 | Loss: 1.0731 [2026-04-21 22:46:02] Validation | Batch 960/1567 | Loss: 1.0735 [2026-04-21 22:46:03] Validation | Batch 970/1567 | Loss: 1.0739 
[2026-04-21 22:46:04] Validation | Batch 980/1567 | Loss: 1.0735 [2026-04-21 22:46:05] Validation | Batch 990/1567 | Loss: 1.0745 [2026-04-21 22:46:06] Validation | Batch 1000/1567 | Loss: 1.0749 [2026-04-21 22:46:07] Validation | Batch 1010/1567 | Loss: 1.0740 [2026-04-21 22:46:08] Validation | Batch 1020/1567 | Loss: 1.0751 [2026-04-21 22:46:09] Validation | Batch 1030/1567 | Loss: 1.0757 [2026-04-21 22:46:11] Validation | Batch 1040/1567 | Loss: 1.0749 [2026-04-21 22:46:12] Validation | Batch 1050/1567 | Loss: 1.0739 [2026-04-21 22:46:13] Validation | Batch 1060/1567 | Loss: 1.0752 [2026-04-21 22:46:15] Validation | Batch 1070/1567 | Loss: 1.0752 [2026-04-21 22:46:16] Validation | Batch 1080/1567 | Loss: 1.0767 [2026-04-21 22:46:17] Validation | Batch 1090/1567 | Loss: 1.0794 [2026-04-21 22:46:18] Validation | Batch 1100/1567 | Loss: 1.0809 [2026-04-21 22:46:19] Validation | Batch 1110/1567 | Loss: 1.0798 [2026-04-21 22:46:20] Validation | Batch 1120/1567 | Loss: 1.0800 [2026-04-21 22:46:22] Validation | Batch 1130/1567 | Loss: 1.0782 [2026-04-21 22:46:23] Validation | Batch 1140/1567 | Loss: 1.0786 [2026-04-21 22:46:24] Validation | Batch 1150/1567 | Loss: 1.0774 [2026-04-21 22:46:25] Validation | Batch 1160/1567 | Loss: 1.0768 [2026-04-21 22:46:26] Validation | Batch 1170/1567 | Loss: 1.0772 [2026-04-21 22:46:27] Validation | Batch 1180/1567 | Loss: 1.0773 [2026-04-21 22:46:28] Validation | Batch 1190/1567 | Loss: 1.0777 [2026-04-21 22:46:30] Validation | Batch 1200/1567 | Loss: 1.0765 [2026-04-21 22:46:31] Validation | Batch 1210/1567 | Loss: 1.0758 [2026-04-21 22:46:32] Validation | Batch 1220/1567 | Loss: 1.0766 [2026-04-21 22:46:33] Validation | Batch 1230/1567 | Loss: 1.0772 [2026-04-21 22:46:34] Validation | Batch 1240/1567 | Loss: 1.0771 [2026-04-21 22:46:35] Validation | Batch 1250/1567 | Loss: 1.0773 [2026-04-21 22:46:37] Validation | Batch 1260/1567 | Loss: 1.0769 [2026-04-21 22:46:38] Validation | Batch 1270/1567 | Loss: 1.0752 [2026-04-21 22:46:39] 
Validation | Batch 1280/1567 | Loss: 1.0754 [2026-04-21 22:46:41] Validation | Batch 1290/1567 | Loss: 1.0757 [2026-04-21 22:46:42] Validation | Batch 1300/1567 | Loss: 1.0760 [2026-04-21 22:46:43] Validation | Batch 1310/1567 | Loss: 1.0766 [2026-04-21 22:46:44] Validation | Batch 1320/1567 | Loss: 1.0773 [2026-04-21 22:46:45] Validation | Batch 1330/1567 | Loss: 1.0788 [2026-04-21 22:46:47] Validation | Batch 1340/1567 | Loss: 1.0783 [2026-04-21 22:46:47] Validation | Batch 1350/1567 | Loss: 1.0786 [2026-04-21 22:46:49] Validation | Batch 1360/1567 | Loss: 1.0778 [2026-04-21 22:46:50] Validation | Batch 1370/1567 | Loss: 1.0774 [2026-04-21 22:46:51] Validation | Batch 1380/1567 | Loss: 1.0775 [2026-04-21 22:46:52] Validation | Batch 1390/1567 | Loss: 1.0769 [2026-04-21 22:46:53] Validation | Batch 1400/1567 | Loss: 1.0765 [2026-04-21 22:46:54] Validation | Batch 1410/1567 | Loss: 1.0770 [2026-04-21 22:46:55] Validation | Batch 1420/1567 | Loss: 1.0770 [2026-04-21 22:46:56] Validation | Batch 1430/1567 | Loss: 1.0773 [2026-04-21 22:46:58] Validation | Batch 1440/1567 | Loss: 1.0779 [2026-04-21 22:46:59] Validation | Batch 1450/1567 | Loss: 1.0781 [2026-04-21 22:47:00] Validation | Batch 1460/1567 | Loss: 1.0774 [2026-04-21 22:47:01] Validation | Batch 1470/1567 | Loss: 1.0772 [2026-04-21 22:47:02] Validation | Batch 1480/1567 | Loss: 1.0770 [2026-04-21 22:47:02] Validation | Batch 1490/1567 | Loss: 1.0764 [2026-04-21 22:47:04] Validation | Batch 1500/1567 | Loss: 1.0762 [2026-04-21 22:47:05] Validation | Batch 1510/1567 | Loss: 1.0753 [2026-04-21 22:47:06] Validation | Batch 1520/1567 | Loss: 1.0752 [2026-04-21 22:47:07] Validation | Batch 1530/1567 | Loss: 1.0753 [2026-04-21 22:47:08] Validation | Batch 1540/1567 | Loss: 1.0759 [2026-04-21 22:47:09] Validation | Batch 1550/1567 | Loss: 1.0771 [2026-04-21 22:47:10] Validation | Batch 1560/1567 | Loss: 1.0767 [2026-04-21 22:47:11] Validation | Batch 1567/1567 | Loss: 1.0768 [2026-04-21 22:47:11] Validation | Loss: 
1.0768 | PPL: 2.99 | Time: 184.61s [2026-04-21 22:47:16] Epoch 2 | Step 11010 | Loss: 0.9032 | LR: 2.00e-05 [2026-04-21 22:47:21] Epoch 2 | Step 11020 | Loss: 0.9019 | LR: 2.00e-05 [2026-04-21 22:47:27] Epoch 2 | Step 11030 | Loss: 0.9015 | LR: 2.00e-05 [2026-04-21 22:47:31] Epoch 2 | Step 11040 | Loss: 0.9024 | LR: 2.00e-05 [2026-04-21 22:47:36] Epoch 2 | Step 11050 | Loss: 0.9008 | LR: 2.00e-05 [2026-04-21 22:47:42] Epoch 2 | Step 11060 | Loss: 0.8994 | LR: 2.00e-05 [2026-04-21 22:47:47] Epoch 2 | Step 11070 | Loss: 0.9009 | LR: 2.00e-05 [2026-04-21 22:47:52] Epoch 2 | Step 11080 | Loss: 0.9010 | LR: 2.00e-05 [2026-04-21 22:47:57] Epoch 2 | Step 11090 | Loss: 0.8993 | LR: 2.00e-05 [2026-04-21 22:48:02] Epoch 2 | Step 11100 | Loss: 0.8989 | LR: 2.00e-05 [2026-04-21 22:48:07] Epoch 2 | Step 11110 | Loss: 0.9021 | LR: 2.00e-05 [2026-04-21 22:48:13] Epoch 2 | Step 11120 | Loss: 0.9017 | LR: 2.00e-05 [2026-04-21 22:48:19] Epoch 2 | Step 11130 | Loss: 0.9010 | LR: 2.00e-05 [2026-04-21 22:48:23] Epoch 2 | Step 11140 | Loss: 0.8996 | LR: 2.00e-05 [2026-04-21 22:48:28] Epoch 2 | Step 11150 | Loss: 0.9017 | LR: 2.00e-05 [2026-04-21 22:48:33] Epoch 2 | Step 11160 | Loss: 0.9002 | LR: 2.00e-05 [2026-04-21 22:48:38] Epoch 2 | Step 11170 | Loss: 0.9004 | LR: 2.00e-05 [2026-04-21 22:48:45] Epoch 2 | Step 11180 | Loss: 0.8992 | LR: 2.00e-05 [2026-04-21 22:48:50] Epoch 2 | Step 11190 | Loss: 0.8982 | LR: 2.00e-05 [2026-04-21 22:48:56] Epoch 2 | Step 11200 | Loss: 0.8965 | LR: 2.00e-05 [2026-04-21 22:49:01] Epoch 2 | Step 11210 | Loss: 0.8973 | LR: 2.00e-05 [2026-04-21 22:49:06] Epoch 2 | Step 11220 | Loss: 0.8993 | LR: 2.00e-05 [2026-04-21 22:49:12] Epoch 2 | Step 11230 | Loss: 0.8993 | LR: 2.00e-05 [2026-04-21 22:49:17] Epoch 2 | Step 11240 | Loss: 0.8999 | LR: 2.00e-05 [2026-04-21 22:49:22] Epoch 2 | Step 11250 | Loss: 0.8983 | LR: 2.00e-05 [2026-04-21 22:49:27] Epoch 2 | Step 11260 | Loss: 0.8995 | LR: 2.00e-05 [2026-04-21 22:49:33] Epoch 2 | Step 11270 | Loss: 0.9004 | LR: 
2.00e-05 [2026-04-21 22:49:38] Epoch 2 | Step 11280 | Loss: 0.8977 | LR: 2.00e-05 [2026-04-21 22:49:43] Epoch 2 | Step 11290 | Loss: 0.8978 | LR: 2.00e-05 [2026-04-21 22:49:49] Epoch 2 | Step 11300 | Loss: 0.8978 | LR: 2.00e-05 [2026-04-21 22:49:54] Epoch 2 | Step 11310 | Loss: 0.8984 | LR: 2.00e-05 [2026-04-21 22:49:59] Epoch 2 | Step 11320 | Loss: 0.8973 | LR: 2.00e-05 [2026-04-21 22:50:05] Epoch 2 | Step 11330 | Loss: 0.8986 | LR: 2.00e-05 [2026-04-21 22:50:10] Epoch 2 | Step 11340 | Loss: 0.8983 | LR: 2.00e-05 [2026-04-21 22:50:15] Epoch 2 | Step 11350 | Loss: 0.9008 | LR: 2.00e-05 [2026-04-21 22:50:20] Epoch 2 | Step 11360 | Loss: 0.8988 | LR: 2.00e-05 [2026-04-21 22:50:26] Epoch 2 | Step 11370 | Loss: 0.8985 | LR: 2.00e-05 [2026-04-21 22:50:31] Epoch 2 | Step 11380 | Loss: 0.8988 | LR: 2.00e-05 [2026-04-21 22:50:36] Epoch 2 | Step 11390 | Loss: 0.8999 | LR: 2.00e-05 [2026-04-21 22:50:42] Epoch 2 | Step 11400 | Loss: 0.9002 | LR: 2.00e-05 [2026-04-21 22:50:48] Epoch 2 | Step 11410 | Loss: 0.9003 | LR: 2.00e-05 [2026-04-21 22:50:53] Epoch 2 | Step 11420 | Loss: 0.8999 | LR: 2.00e-05 [2026-04-21 22:50:57] Epoch 2 | Step 11430 | Loss: 0.9003 | LR: 2.00e-05 [2026-04-21 22:51:03] Epoch 2 | Step 11440 | Loss: 0.9014 | LR: 2.00e-05 [2026-04-21 22:51:08] Epoch 2 | Step 11450 | Loss: 0.9030 | LR: 2.00e-05 [2026-04-21 22:51:12] Epoch 2 | Step 11460 | Loss: 0.9040 | LR: 2.00e-05 [2026-04-21 22:51:18] Epoch 2 | Step 11470 | Loss: 0.9034 | LR: 2.00e-05 [2026-04-21 22:51:23] Epoch 2 | Step 11480 | Loss: 0.9044 | LR: 2.00e-05 [2026-04-21 22:51:28] Epoch 2 | Step 11490 | Loss: 0.9032 | LR: 2.00e-05 [2026-04-21 22:51:33] Epoch 2 | Step 11500 | Loss: 0.9031 | LR: 2.00e-05 [2026-04-21 22:51:38] Epoch 2 | Step 11510 | Loss: 0.9025 | LR: 2.00e-05 [2026-04-21 22:51:43] Epoch 2 | Step 11520 | Loss: 0.9044 | LR: 2.00e-05 [2026-04-21 22:51:49] Epoch 2 | Step 11530 | Loss: 0.9034 | LR: 2.00e-05 [2026-04-21 22:51:54] Epoch 2 | Step 11540 | Loss: 0.9033 | LR: 2.00e-05 [2026-04-21 
22:51:59] Epoch 2 | Step 11550 | Loss: 0.9043 | LR: 2.00e-05 [2026-04-21 22:52:03] Epoch 2 | Step 11560 | Loss: 0.9040 | LR: 2.00e-05 [2026-04-21 22:52:09] Epoch 2 | Step 11570 | Loss: 0.9035 | LR: 2.00e-05 [2026-04-21 22:52:14] Epoch 2 | Step 11580 | Loss: 0.9032 | LR: 2.00e-05 [2026-04-21 22:52:19] Epoch 2 | Step 11590 | Loss: 0.9023 | LR: 2.00e-05 [2026-04-21 22:52:25] Epoch 2 | Step 11600 | Loss: 0.9039 | LR: 2.00e-05 [2026-04-21 22:52:30] Epoch 2 | Step 11610 | Loss: 0.9045 | LR: 2.00e-05 [2026-04-21 22:52:35] Epoch 2 | Step 11620 | Loss: 0.9030 | LR: 2.00e-05 [2026-04-21 22:52:41] Epoch 2 | Step 11630 | Loss: 0.9024 | LR: 2.00e-05 [2026-04-21 22:52:46] Epoch 2 | Step 11640 | Loss: 0.9009 | LR: 2.00e-05 [2026-04-21 22:52:53] Epoch 2 | Step 11650 | Loss: 0.9001 | LR: 2.00e-05 [2026-04-21 22:52:58] Epoch 2 | Step 11660 | Loss: 0.8999 | LR: 2.00e-05 [2026-04-21 22:53:04] Epoch 2 | Step 11670 | Loss: 0.9009 | LR: 2.00e-05 [2026-04-21 22:53:09] Epoch 2 | Step 11680 | Loss: 0.9009 | LR: 2.00e-05 [2026-04-21 22:53:14] Epoch 2 | Step 11690 | Loss: 0.9010 | LR: 2.00e-05 [2026-04-21 22:53:20] Epoch 2 | Step 11700 | Loss: 0.9012 | LR: 2.00e-05 [2026-04-21 22:53:25] Epoch 2 | Step 11710 | Loss: 0.9016 | LR: 2.00e-05 [2026-04-21 22:53:30] Epoch 2 | Step 11720 | Loss: 0.9022 | LR: 2.00e-05 [2026-04-21 22:53:35] Epoch 2 | Step 11730 | Loss: 0.9020 | LR: 2.00e-05 [2026-04-21 22:53:40] Epoch 2 | Step 11740 | Loss: 0.9028 | LR: 2.00e-05 [2026-04-21 22:53:46] Epoch 2 | Step 11750 | Loss: 0.9023 | LR: 2.00e-05 [2026-04-21 22:53:50] Epoch 2 | Step 11760 | Loss: 0.9028 | LR: 2.00e-05 [2026-04-21 22:53:56] Epoch 2 | Step 11770 | Loss: 0.9025 | LR: 2.00e-05 [2026-04-21 22:54:01] Epoch 2 | Step 11780 | Loss: 0.9031 | LR: 2.00e-05 [2026-04-21 22:54:07] Epoch 2 | Step 11790 | Loss: 0.9034 | LR: 2.00e-05 [2026-04-21 22:54:12] Epoch 2 | Step 11800 | Loss: 0.9032 | LR: 2.00e-05 [2026-04-21 22:54:17] Epoch 2 | Step 11810 | Loss: 0.9030 | LR: 2.00e-05 [2026-04-21 22:54:24] Epoch 2 | Step 
11820 | Loss: 0.9035 | LR: 2.00e-05 [2026-04-21 22:54:30] Epoch 2 | Step 11830 | Loss: 0.9036 | LR: 2.00e-05 [2026-04-21 22:54:36] Epoch 2 | Step 11840 | Loss: 0.9033 | LR: 2.00e-05 [2026-04-21 22:54:41] Epoch 2 | Step 11850 | Loss: 0.9026 | LR: 2.00e-05 [2026-04-21 22:54:47] Epoch 2 | Step 11860 | Loss: 0.9016 | LR: 2.00e-05 [2026-04-21 22:54:53] Epoch 2 | Step 11870 | Loss: 0.9019 | LR: 2.00e-05 [2026-04-21 22:54:58] Epoch 2 | Step 11880 | Loss: 0.9018 | LR: 2.00e-05 [2026-04-21 22:55:03] Epoch 2 | Step 11890 | Loss: 0.9016 | LR: 2.00e-05 [2026-04-21 22:55:08] Epoch 2 | Step 11900 | Loss: 0.9016 | LR: 2.00e-05 [2026-04-21 22:55:14] Epoch 2 | Step 11910 | Loss: 0.9017 | LR: 2.00e-05 [2026-04-21 22:55:18] Epoch 2 | Step 11920 | Loss: 0.9020 | LR: 2.00e-05 [2026-04-21 22:55:24] Epoch 2 | Step 11930 | Loss: 0.9021 | LR: 2.00e-05 [2026-04-21 22:55:29] Epoch 2 | Step 11940 | Loss: 0.9011 | LR: 2.00e-05 [2026-04-21 22:55:35] Epoch 2 | Step 11950 | Loss: 0.9019 | LR: 2.00e-05 [2026-04-21 22:55:40] Epoch 2 | Step 11960 | Loss: 0.9026 | LR: 2.00e-05 [2026-04-21 22:55:45] Epoch 2 | Step 11970 | Loss: 0.9018 | LR: 2.00e-05 [2026-04-21 22:55:51] Epoch 2 | Step 11980 | Loss: 0.9021 | LR: 2.00e-05 [2026-04-21 22:55:56] Epoch 2 | Step 11990 | Loss: 0.9026 | LR: 2.00e-05 [2026-04-21 22:56:02] Epoch 2 | Step 12000 | Loss: 0.9036 | LR: 2.00e-05 [2026-04-21 22:56:13] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_12000.pt [2026-04-21 22:57:28] Validation | Batch 10/1567 | Loss: 1.0656 [2026-04-21 22:57:29] Validation | Batch 20/1567 | Loss: 1.1495 [2026-04-21 22:57:31] Validation | Batch 30/1567 | Loss: 1.1079 [2026-04-21 22:57:32] Validation | Batch 40/1567 | Loss: 1.1284 [2026-04-21 22:57:33] Validation | Batch 50/1567 | Loss: 1.1051 [2026-04-21 22:57:34] Validation | Batch 60/1567 | Loss: 1.0930 [2026-04-21 22:57:35] Validation | Batch 70/1567 | Loss: 1.0815 [2026-04-21 22:57:37] Validation | Batch 80/1567 | Loss: 1.0882 [2026-04-21 22:57:39] Validation 
| Batch 90/1567 | Loss: 1.0834 [2026-04-21 22:57:40] Validation | Batch 100/1567 | Loss: 1.0614 [2026-04-21 22:57:41] Validation | Batch 110/1567 | Loss: 1.0508 [2026-04-21 22:57:42] Validation | Batch 120/1567 | Loss: 1.0442 [2026-04-21 22:57:44] Validation | Batch 130/1567 | Loss: 1.0403 [2026-04-21 22:57:45] Validation | Batch 140/1567 | Loss: 1.0522 [2026-04-21 22:57:46] Validation | Batch 150/1567 | Loss: 1.0628 [2026-04-21 22:57:47] Validation | Batch 160/1567 | Loss: 1.0623 [2026-04-21 22:57:48] Validation | Batch 170/1567 | Loss: 1.0539 [2026-04-21 22:57:49] Validation | Batch 180/1567 | Loss: 1.0564 [2026-04-21 22:57:50] Validation | Batch 190/1567 | Loss: 1.0610 [2026-04-21 22:57:51] Validation | Batch 200/1567 | Loss: 1.0641 [2026-04-21 22:57:53] Validation | Batch 210/1567 | Loss: 1.0616 [2026-04-21 22:57:54] Validation | Batch 220/1567 | Loss: 1.0665 [2026-04-21 22:57:56] Validation | Batch 230/1567 | Loss: 1.0696 [2026-04-21 22:57:57] Validation | Batch 240/1567 | Loss: 1.0732 [2026-04-21 22:57:58] Validation | Batch 250/1567 | Loss: 1.0773 [2026-04-21 22:57:59] Validation | Batch 260/1567 | Loss: 1.0806 [2026-04-21 22:58:00] Validation | Batch 270/1567 | Loss: 1.0853 [2026-04-21 22:58:02] Validation | Batch 280/1567 | Loss: 1.0881 [2026-04-21 22:58:04] Validation | Batch 290/1567 | Loss: 1.0838 [2026-04-21 22:58:05] Validation | Batch 300/1567 | Loss: 1.0831 [2026-04-21 22:58:06] Validation | Batch 310/1567 | Loss: 1.0802 [2026-04-21 22:58:07] Validation | Batch 320/1567 | Loss: 1.0830 [2026-04-21 22:58:08] Validation | Batch 330/1567 | Loss: 1.0827 [2026-04-21 22:58:10] Validation | Batch 340/1567 | Loss: 1.0823 [2026-04-21 22:58:11] Validation | Batch 350/1567 | Loss: 1.0791 [2026-04-21 22:58:12] Validation | Batch 360/1567 | Loss: 1.0733 [2026-04-21 22:58:13] Validation | Batch 370/1567 | Loss: 1.0738 [2026-04-21 22:58:14] Validation | Batch 380/1567 | Loss: 1.0775 [2026-04-21 22:58:16] Validation | Batch 390/1567 | Loss: 1.0762 [2026-04-21 
22:58:17] Validation | Batch 400/1567 | Loss: 1.0776 [2026-04-21 22:58:18] Validation | Batch 410/1567 | Loss: 1.0735 [2026-04-21 22:58:19] Validation | Batch 420/1567 | Loss: 1.0718 [2026-04-21 22:58:20] Validation | Batch 430/1567 | Loss: 1.0744 [2026-04-21 22:58:22] Validation | Batch 440/1567 | Loss: 1.0746 [2026-04-21 22:58:23] Validation | Batch 450/1567 | Loss: 1.0768 [2026-04-21 22:58:24] Validation | Batch 460/1567 | Loss: 1.0799 [2026-04-21 22:58:25] Validation | Batch 470/1567 | Loss: 1.0848 [2026-04-21 22:58:26] Validation | Batch 480/1567 | Loss: 1.0823 [2026-04-21 22:58:27] Validation | Batch 490/1567 | Loss: 1.0801 [2026-04-21 22:58:28] Validation | Batch 500/1567 | Loss: 1.0813 [2026-04-21 22:58:30] Validation | Batch 510/1567 | Loss: 1.0813 [2026-04-21 22:58:31] Validation | Batch 520/1567 | Loss: 1.0825 [2026-04-21 22:58:32] Validation | Batch 530/1567 | Loss: 1.0808 [2026-04-21 22:58:33] Validation | Batch 540/1567 | Loss: 1.0783 [2026-04-21 22:58:35] Validation | Batch 550/1567 | Loss: 1.0798 [2026-04-21 22:58:36] Validation | Batch 560/1567 | Loss: 1.0789 [2026-04-21 22:58:37] Validation | Batch 570/1567 | Loss: 1.0749 [2026-04-21 22:58:38] Validation | Batch 580/1567 | Loss: 1.0765 [2026-04-21 22:58:40] Validation | Batch 590/1567 | Loss: 1.0764 [2026-04-21 22:58:41] Validation | Batch 600/1567 | Loss: 1.0753 [2026-04-21 22:58:42] Validation | Batch 610/1567 | Loss: 1.0775 [2026-04-21 22:58:43] Validation | Batch 620/1567 | Loss: 1.0757 [2026-04-21 22:58:45] Validation | Batch 630/1567 | Loss: 1.0758 [2026-04-21 22:58:46] Validation | Batch 640/1567 | Loss: 1.0765 [2026-04-21 22:58:48] Validation | Batch 650/1567 | Loss: 1.0793 [2026-04-21 22:58:49] Validation | Batch 660/1567 | Loss: 1.0807 [2026-04-21 22:58:50] Validation | Batch 670/1567 | Loss: 1.0790 [2026-04-21 22:58:51] Validation | Batch 680/1567 | Loss: 1.0778 [2026-04-21 22:58:52] Validation | Batch 690/1567 | Loss: 1.0764 [2026-04-21 22:58:53] Validation | Batch 700/1567 | Loss: 
1.0764 [2026-04-21 22:58:55] Validation | Batch 710/1567 | Loss: 1.0756 [2026-04-21 22:58:56] Validation | Batch 720/1567 | Loss: 1.0723 [2026-04-21 22:58:57] Validation | Batch 730/1567 | Loss: 1.0726 [2026-04-21 22:58:58] Validation | Batch 740/1567 | Loss: 1.0732 [2026-04-21 22:58:59] Validation | Batch 750/1567 | Loss: 1.0727 [2026-04-21 22:59:00] Validation | Batch 760/1567 | Loss: 1.0741 [2026-04-21 22:59:02] Validation | Batch 770/1567 | Loss: 1.0737 [2026-04-21 22:59:03] Validation | Batch 780/1567 | Loss: 1.0747 [2026-04-21 22:59:04] Validation | Batch 790/1567 | Loss: 1.0732 [2026-04-21 22:59:05] Validation | Batch 800/1567 | Loss: 1.0713 [2026-04-21 22:59:06] Validation | Batch 810/1567 | Loss: 1.0719 [2026-04-21 22:59:07] Validation | Batch 820/1567 | Loss: 1.0713 [2026-04-21 22:59:08] Validation | Batch 830/1567 | Loss: 1.0705 [2026-04-21 22:59:09] Validation | Batch 840/1567 | Loss: 1.0709 [2026-04-21 22:59:10] Validation | Batch 850/1567 | Loss: 1.0719 [2026-04-21 22:59:11] Validation | Batch 860/1567 | Loss: 1.0726 [2026-04-21 22:59:12] Validation | Batch 870/1567 | Loss: 1.0735 [2026-04-21 22:59:13] Validation | Batch 880/1567 | Loss: 1.0735 [2026-04-21 22:59:15] Validation | Batch 890/1567 | Loss: 1.0731 [2026-04-21 22:59:16] Validation | Batch 900/1567 | Loss: 1.0727 [2026-04-21 22:59:17] Validation | Batch 910/1567 | Loss: 1.0723 [2026-04-21 22:59:18] Validation | Batch 920/1567 | Loss: 1.0742 [2026-04-21 22:59:19] Validation | Batch 930/1567 | Loss: 1.0741 [2026-04-21 22:59:20] Validation | Batch 940/1567 | Loss: 1.0740 [2026-04-21 22:59:21] Validation | Batch 950/1567 | Loss: 1.0736 [2026-04-21 22:59:22] Validation | Batch 960/1567 | Loss: 1.0741 [2026-04-21 22:59:23] Validation | Batch 970/1567 | Loss: 1.0747 [2026-04-21 22:59:24] Validation | Batch 980/1567 | Loss: 1.0743 [2026-04-21 22:59:25] Validation | Batch 990/1567 | Loss: 1.0754 [2026-04-21 22:59:27] Validation | Batch 1000/1567 | Loss: 1.0758 [2026-04-21 22:59:28] Validation | Batch 
1010/1567 | Loss: 1.0748 [2026-04-21 22:59:29] Validation | Batch 1020/1567 | Loss: 1.0758 [2026-04-21 22:59:30] Validation | Batch 1030/1567 | Loss: 1.0762 [2026-04-21 22:59:32] Validation | Batch 1040/1567 | Loss: 1.0753 [2026-04-21 22:59:33] Validation | Batch 1050/1567 | Loss: 1.0743 [2026-04-21 22:59:34] Validation | Batch 1060/1567 | Loss: 1.0754 [2026-04-21 22:59:35] Validation | Batch 1070/1567 | Loss: 1.0754 [2026-04-21 22:59:36] Validation | Batch 1080/1567 | Loss: 1.0770 [2026-04-21 22:59:38] Validation | Batch 1090/1567 | Loss: 1.0799 [2026-04-21 22:59:39] Validation | Batch 1100/1567 | Loss: 1.0815 [2026-04-21 22:59:40] Validation | Batch 1110/1567 | Loss: 1.0804 [2026-04-21 22:59:41] Validation | Batch 1120/1567 | Loss: 1.0806 [2026-04-21 22:59:42] Validation | Batch 1130/1567 | Loss: 1.0789 [2026-04-21 22:59:43] Validation | Batch 1140/1567 | Loss: 1.0792 [2026-04-21 22:59:45] Validation | Batch 1150/1567 | Loss: 1.0779 [2026-04-21 22:59:45] Validation | Batch 1160/1567 | Loss: 1.0772 [2026-04-21 22:59:47] Validation | Batch 1170/1567 | Loss: 1.0775 [2026-04-21 22:59:48] Validation | Batch 1180/1567 | Loss: 1.0780 [2026-04-21 22:59:49] Validation | Batch 1190/1567 | Loss: 1.0783 [2026-04-21 22:59:50] Validation | Batch 1200/1567 | Loss: 1.0771 [2026-04-21 22:59:52] Validation | Batch 1210/1567 | Loss: 1.0764 [2026-04-21 22:59:52] Validation | Batch 1220/1567 | Loss: 1.0773 [2026-04-21 22:59:54] Validation | Batch 1230/1567 | Loss: 1.0778 [2026-04-21 22:59:55] Validation | Batch 1240/1567 | Loss: 1.0776 [2026-04-21 22:59:56] Validation | Batch 1250/1567 | Loss: 1.0779 [2026-04-21 22:59:57] Validation | Batch 1260/1567 | Loss: 1.0777 [2026-04-21 22:59:59] Validation | Batch 1270/1567 | Loss: 1.0759 [2026-04-21 23:00:00] Validation | Batch 1280/1567 | Loss: 1.0761 [2026-04-21 23:00:02] Validation | Batch 1290/1567 | Loss: 1.0761 [2026-04-21 23:00:03] Validation | Batch 1300/1567 | Loss: 1.0765 [2026-04-21 23:00:04] Validation | Batch 1310/1567 | Loss: 
1.0773 [2026-04-21 23:00:05] Validation | Batch 1320/1567 | Loss: 1.0780 [2026-04-21 23:00:06] Validation | Batch 1330/1567 | Loss: 1.0795 [2026-04-21 23:00:07] Validation | Batch 1340/1567 | Loss: 1.0791 [2026-04-21 23:00:08] Validation | Batch 1350/1567 | Loss: 1.0794 [2026-04-21 23:00:09] Validation | Batch 1360/1567 | Loss: 1.0786 [2026-04-21 23:00:11] Validation | Batch 1370/1567 | Loss: 1.0783 [2026-04-21 23:00:12] Validation | Batch 1380/1567 | Loss: 1.0784 [2026-04-21 23:00:13] Validation | Batch 1390/1567 | Loss: 1.0776 [2026-04-21 23:00:14] Validation | Batch 1400/1567 | Loss: 1.0772 [2026-04-21 23:00:15] Validation | Batch 1410/1567 | Loss: 1.0778 [2026-04-21 23:00:16] Validation | Batch 1420/1567 | Loss: 1.0778 [2026-04-21 23:00:17] Validation | Batch 1430/1567 | Loss: 1.0781 [2026-04-21 23:00:19] Validation | Batch 1440/1567 | Loss: 1.0789 [2026-04-21 23:00:19] Validation | Batch 1450/1567 | Loss: 1.0790 [2026-04-21 23:00:20] Validation | Batch 1460/1567 | Loss: 1.0784 [2026-04-21 23:00:21] Validation | Batch 1470/1567 | Loss: 1.0783 [2026-04-21 23:00:22] Validation | Batch 1480/1567 | Loss: 1.0779 [2026-04-21 23:00:23] Validation | Batch 1490/1567 | Loss: 1.0773 [2026-04-21 23:00:25] Validation | Batch 1500/1567 | Loss: 1.0769 [2026-04-21 23:00:26] Validation | Batch 1510/1567 | Loss: 1.0761 [2026-04-21 23:00:26] Validation | Batch 1520/1567 | Loss: 1.0760 [2026-04-21 23:00:27] Validation | Batch 1530/1567 | Loss: 1.0759 [2026-04-21 23:00:29] Validation | Batch 1540/1567 | Loss: 1.0767 [2026-04-21 23:00:30] Validation | Batch 1550/1567 | Loss: 1.0780 [2026-04-21 23:00:31] Validation | Batch 1560/1567 | Loss: 1.0776 [2026-04-21 23:00:32] Validation | Batch 1567/1567 | Loss: 1.0777 [2026-04-21 23:00:32] Validation | Loss: 1.0777 | PPL: 2.99 | Time: 185.04s [2026-04-21 23:00:37] Epoch 2 | Step 12010 | Loss: 0.9041 | LR: 2.00e-05 [2026-04-21 23:00:43] Epoch 2 | Step 12020 | Loss: 0.9038 | LR: 2.00e-05 [2026-04-21 23:00:48] Epoch 2 | Step 12030 | Loss: 
0.9034 | LR: 2.00e-05 [2026-04-21 23:00:53] Epoch 2 | Step 12040 | Loss: 0.9031 | LR: 2.00e-05 [2026-04-21 23:00:58] Epoch 2 | Step 12050 | Loss: 0.9035 | LR: 2.00e-05 [2026-04-21 23:01:04] Epoch 2 | Step 12060 | Loss: 0.9036 | LR: 2.00e-05 [2026-04-21 23:01:10] Epoch 2 | Step 12070 | Loss: 0.9042 | LR: 2.00e-05 [2026-04-21 23:01:15] Epoch 2 | Step 12080 | Loss: 0.9036 | LR: 2.00e-05 [2026-04-21 23:01:20] Epoch 2 | Step 12090 | Loss: 0.9041 | LR: 2.00e-05 [2026-04-21 23:01:25] Epoch 2 | Step 12100 | Loss: 0.9036 | LR: 2.00e-05 [2026-04-21 23:01:31] Epoch 2 | Step 12110 | Loss: 0.9027 | LR: 2.00e-05 [2026-04-21 23:01:36] Epoch 2 | Step 12120 | Loss: 0.9029 | LR: 2.00e-05 [2026-04-21 23:01:41] Epoch 2 | Step 12130 | Loss: 0.9037 | LR: 2.00e-05 [2026-04-21 23:01:47] Epoch 2 | Step 12140 | Loss: 0.9034 | LR: 2.00e-05 [2026-04-21 23:01:52] Epoch 2 | Step 12150 | Loss: 0.9032 | LR: 2.00e-05 [2026-04-21 23:01:58] Epoch 2 | Step 12160 | Loss: 0.9032 | LR: 2.00e-05 [2026-04-21 23:02:04] Epoch 2 | Step 12170 | Loss: 0.9030 | LR: 2.00e-05 [2026-04-21 23:02:09] Epoch 2 | Step 12180 | Loss: 0.9028 | LR: 2.00e-05 [2026-04-21 23:02:15] Epoch 2 | Step 12190 | Loss: 0.9030 | LR: 2.00e-05 [2026-04-21 23:02:20] Epoch 2 | Step 12200 | Loss: 0.9028 | LR: 2.00e-05 [2026-04-21 23:02:25] Epoch 2 | Step 12210 | Loss: 0.9028 | LR: 2.00e-05 [2026-04-21 23:02:30] Epoch 2 | Step 12220 | Loss: 0.9021 | LR: 2.00e-05 [2026-04-21 23:02:35] Epoch 2 | Step 12230 | Loss: 0.9022 | LR: 2.00e-05 [2026-04-21 23:02:40] Epoch 2 | Step 12240 | Loss: 0.9022 | LR: 2.00e-05 [2026-04-21 23:02:46] Epoch 2 | Step 12250 | Loss: 0.9018 | LR: 2.00e-05 [2026-04-21 23:02:51] Epoch 2 | Step 12260 | Loss: 0.9020 | LR: 2.00e-05 [2026-04-21 23:02:56] Epoch 2 | Step 12270 | Loss: 0.9020 | LR: 2.00e-05 [2026-04-21 23:03:02] Epoch 2 | Step 12280 | Loss: 0.9027 | LR: 2.00e-05 [2026-04-21 23:03:07] Epoch 2 | Step 12290 | Loss: 0.9027 | LR: 2.00e-05 [2026-04-21 23:03:13] Epoch 2 | Step 12300 | Loss: 0.9025 | LR: 2.00e-05 
[2026-04-21 23:03:19] Epoch 2 | Step 12310 | Loss: 0.9024 | LR: 2.00e-05 [2026-04-21 23:03:24] Epoch 2 | Step 12320 | Loss: 0.9019 | LR: 2.00e-05 [2026-04-21 23:03:30] Epoch 2 | Step 12330 | Loss: 0.9019 | LR: 2.00e-05 [2026-04-21 23:03:36] Epoch 2 | Step 12340 | Loss: 0.9019 | LR: 2.00e-05 [2026-04-21 23:03:41] Epoch 2 | Step 12350 | Loss: 0.9020 | LR: 2.00e-05 [2026-04-21 23:03:47] Epoch 2 | Step 12360 | Loss: 0.9024 | LR: 2.00e-05 [2026-04-21 23:03:52] Epoch 2 | Step 12370 | Loss: 0.9017 | LR: 2.00e-05 [2026-04-21 23:03:57] Epoch 2 | Step 12380 | Loss: 0.9016 | LR: 2.00e-05 [2026-04-21 23:04:02] Epoch 2 | Step 12390 | Loss: 0.9015 | LR: 2.00e-05 [2026-04-21 23:04:08] Epoch 2 | Step 12400 | Loss: 0.9015 | LR: 2.00e-05 [2026-04-21 23:04:13] Epoch 2 | Step 12410 | Loss: 0.9015 | LR: 2.00e-05 [2026-04-21 23:04:18] Epoch 2 | Step 12420 | Loss: 0.9013 | LR: 2.00e-05 [2026-04-21 23:04:23] Epoch 2 | Step 12430 | Loss: 0.9012 | LR: 2.00e-05 [2026-04-21 23:04:28] Epoch 2 | Step 12440 | Loss: 0.9009 | LR: 2.00e-05 [2026-04-21 23:04:33] Epoch 2 | Step 12450 | Loss: 0.9011 | LR: 2.00e-05 [2026-04-21 23:04:39] Epoch 2 | Step 12460 | Loss: 0.9010 | LR: 2.00e-05 [2026-04-21 23:04:44] Epoch 2 | Step 12470 | Loss: 0.9015 | LR: 2.00e-05 [2026-04-21 23:04:50] Epoch 2 | Step 12480 | Loss: 0.9014 | LR: 2.00e-05 [2026-04-21 23:04:55] Epoch 2 | Step 12490 | Loss: 0.9011 | LR: 2.00e-05 [2026-04-21 23:05:00] Epoch 2 | Step 12500 | Loss: 0.9021 | LR: 2.00e-05 [2026-04-21 23:05:05] Epoch 2 | Step 12510 | Loss: 0.9022 | LR: 2.00e-05 [2026-04-21 23:05:10] Epoch 2 | Step 12520 | Loss: 0.9023 | LR: 2.00e-05 [2026-04-21 23:05:15] Epoch 2 | Step 12530 | Loss: 0.9023 | LR: 2.00e-05 [2026-04-21 23:05:20] Epoch 2 | Step 12540 | Loss: 0.9016 | LR: 2.00e-05 [2026-04-21 23:05:25] Epoch 2 | Step 12550 | Loss: 0.9010 | LR: 2.00e-05 [2026-04-21 23:05:30] Epoch 2 | Step 12560 | Loss: 0.9009 | LR: 2.00e-05 [2026-04-21 23:05:35] Epoch 2 | Step 12570 | Loss: 0.9014 | LR: 2.00e-05 [2026-04-21 23:05:40] Epoch 
2 | Step 12580 | Loss: 0.9011 | LR: 2.00e-05 [2026-04-21 23:05:45] Epoch 2 | Step 12590 | Loss: 0.9005 | LR: 2.00e-05 [2026-04-21 23:05:50] Epoch 2 | Step 12600 | Loss: 0.9008 | LR: 2.00e-05 [2026-04-21 23:05:55] Epoch 2 | Step 12610 | Loss: 0.9006 | LR: 2.00e-05 [2026-04-21 23:06:00] Epoch 2 | Step 12620 | Loss: 0.9002 | LR: 2.00e-05 [2026-04-21 23:06:06] Epoch 2 | Step 12630 | Loss: 0.9003 | LR: 2.00e-05 [2026-04-21 23:06:11] Epoch 2 | Step 12640 | Loss: 0.9001 | LR: 2.00e-05 [2026-04-21 23:06:16] Epoch 2 | Step 12650 | Loss: 0.9010 | LR: 2.00e-05 [2026-04-21 23:06:22] Epoch 2 | Step 12660 | Loss: 0.9012 | LR: 2.00e-05 [2026-04-21 23:06:28] Epoch 2 | Step 12670 | Loss: 0.9010 | LR: 2.00e-05 [2026-04-21 23:06:33] Epoch 2 | Step 12680 | Loss: 0.9017 | LR: 2.00e-05 [2026-04-21 23:06:39] Epoch 2 | Step 12690 | Loss: 0.9016 | LR: 2.00e-05 [2026-04-21 23:06:44] Epoch 2 | Step 12700 | Loss: 0.9018 | LR: 2.00e-05 [2026-04-21 23:06:49] Epoch 2 | Step 12710 | Loss: 0.9016 | LR: 2.00e-05 [2026-04-21 23:06:54] Epoch 2 | Step 12720 | Loss: 0.9013 | LR: 2.00e-05 [2026-04-21 23:06:59] Epoch 2 | Step 12730 | Loss: 0.9012 | LR: 2.00e-05 [2026-04-21 23:07:05] Epoch 2 | Step 12740 | Loss: 0.9009 | LR: 2.00e-05 [2026-04-21 23:07:10] Epoch 2 | Step 12750 | Loss: 0.9009 | LR: 2.00e-05 [2026-04-21 23:07:16] Epoch 2 | Step 12760 | Loss: 0.9015 | LR: 2.00e-05 [2026-04-21 23:07:21] Epoch 2 | Step 12770 | Loss: 0.9009 | LR: 2.00e-05 [2026-04-21 23:07:26] Epoch 2 | Step 12780 | Loss: 0.9011 | LR: 2.00e-05 [2026-04-21 23:07:32] Epoch 2 | Step 12790 | Loss: 0.9012 | LR: 2.00e-05 [2026-04-21 23:07:37] Epoch 2 | Step 12800 | Loss: 0.9015 | LR: 2.00e-05 [2026-04-21 23:07:43] Epoch 2 | Step 12810 | Loss: 0.9017 | LR: 2.00e-05 [2026-04-21 23:07:47] Epoch 2 | Step 12820 | Loss: 0.9017 | LR: 1.99e-05 [2026-04-21 23:07:52] Epoch 2 | Step 12830 | Loss: 0.9024 | LR: 1.99e-05 [2026-04-21 23:07:58] Epoch 2 | Step 12840 | Loss: 0.9026 | LR: 1.99e-05 [2026-04-21 23:08:03] Epoch 2 | Step 12850 | Loss: 
0.9022 | LR: 1.99e-05 [2026-04-21 23:08:08] Epoch 2 | Step 12860 | Loss: 0.9022 | LR: 1.99e-05 [2026-04-21 23:08:14] Epoch 2 | Step 12870 | Loss: 0.9024 | LR: 1.99e-05 [2026-04-21 23:08:19] Epoch 2 | Step 12880 | Loss: 0.9026 | LR: 1.99e-05 [2026-04-21 23:08:25] Epoch 2 | Step 12890 | Loss: 0.9029 | LR: 1.99e-05 [2026-04-21 23:08:30] Epoch 2 | Step 12900 | Loss: 0.9031 | LR: 1.98e-05 [2026-04-21 23:08:35] Epoch 2 | Step 12910 | Loss: 0.9032 | LR: 1.98e-05 [2026-04-21 23:08:41] Epoch 2 | Step 12920 | Loss: 0.9033 | LR: 1.98e-05 [2026-04-21 23:08:45] Epoch 2 | Step 12930 | Loss: 0.9032 | LR: 1.98e-05 [2026-04-21 23:08:50] Epoch 2 | Step 12940 | Loss: 0.9033 | LR: 1.98e-05 [2026-04-21 23:08:55] Epoch 2 | Step 12950 | Loss: 0.9035 | LR: 1.97e-05 [2026-04-21 23:09:00] Epoch 2 | Step 12960 | Loss: 0.9032 | LR: 1.97e-05 [2026-04-21 23:09:06] Epoch 2 | Step 12970 | Loss: 0.9030 | LR: 1.97e-05 [2026-04-21 23:09:11] Epoch 2 | Step 12980 | Loss: 0.9022 | LR: 1.97e-05 [2026-04-21 23:09:16] Epoch 2 | Step 12990 | Loss: 0.9026 | LR: 1.97e-05 [2026-04-21 23:09:21] Epoch 2 | Step 13000 | Loss: 0.9029 | LR: 1.96e-05 [2026-04-21 23:09:23] Validation | Batch 10/1567 | Loss: 1.0766 [2026-04-21 23:09:24] Validation | Batch 20/1567 | Loss: 1.1609 [2026-04-21 23:09:25] Validation | Batch 30/1567 | Loss: 1.1121 [2026-04-21 23:09:27] Validation | Batch 40/1567 | Loss: 1.1289 [2026-04-21 23:09:27] Validation | Batch 50/1567 | Loss: 1.1048 [2026-04-21 23:09:29] Validation | Batch 60/1567 | Loss: 1.0930 [2026-04-21 23:09:30] Validation | Batch 70/1567 | Loss: 1.0836 [2026-04-21 23:09:32] Validation | Batch 80/1567 | Loss: 1.0905 [2026-04-21 23:09:33] Validation | Batch 90/1567 | Loss: 1.0845 [2026-04-21 23:09:34] Validation | Batch 100/1567 | Loss: 1.0620 [2026-04-21 23:09:35] Validation | Batch 110/1567 | Loss: 1.0524 [2026-04-21 23:09:36] Validation | Batch 120/1567 | Loss: 1.0458 [2026-04-21 23:09:38] Validation | Batch 130/1567 | Loss: 1.0412 [2026-04-21 23:09:39] Validation | Batch 
140/1567 | Loss: 1.0522 [2026-04-21 23:09:40] Validation | Batch 150/1567 | Loss: 1.0631 [2026-04-21 23:09:41] Validation | Batch 160/1567 | Loss: 1.0614 [2026-04-21 23:09:42] Validation | Batch 170/1567 | Loss: 1.0535 [2026-04-21 23:09:43] Validation | Batch 180/1567 | Loss: 1.0565 [2026-04-21 23:09:44] Validation | Batch 190/1567 | Loss: 1.0610 [2026-04-21 23:09:46] Validation | Batch 200/1567 | Loss: 1.0644 [2026-04-21 23:09:47] Validation | Batch 210/1567 | Loss: 1.0625 [2026-04-21 23:09:48] Validation | Batch 220/1567 | Loss: 1.0669 [2026-04-21 23:09:50] Validation | Batch 230/1567 | Loss: 1.0704 [2026-04-21 23:09:51] Validation | Batch 240/1567 | Loss: 1.0732 [2026-04-21 23:09:52] Validation | Batch 250/1567 | Loss: 1.0773 [2026-04-21 23:09:53] Validation | Batch 260/1567 | Loss: 1.0805 [2026-04-21 23:09:54] Validation | Batch 270/1567 | Loss: 1.0849 [2026-04-21 23:09:56] Validation | Batch 280/1567 | Loss: 1.0878 [2026-04-21 23:09:58] Validation | Batch 290/1567 | Loss: 1.0833 [2026-04-21 23:09:59] Validation | Batch 300/1567 | Loss: 1.0826 [2026-04-21 23:10:00] Validation | Batch 310/1567 | Loss: 1.0792 [2026-04-21 23:10:01] Validation | Batch 320/1567 | Loss: 1.0824 [2026-04-21 23:10:02] Validation | Batch 330/1567 | Loss: 1.0826 [2026-04-21 23:10:04] Validation | Batch 340/1567 | Loss: 1.0819 [2026-04-21 23:10:05] Validation | Batch 350/1567 | Loss: 1.0792 [2026-04-21 23:10:06] Validation | Batch 360/1567 | Loss: 1.0735 [2026-04-21 23:10:07] Validation | Batch 370/1567 | Loss: 1.0737 [2026-04-21 23:10:08] Validation | Batch 380/1567 | Loss: 1.0784 [2026-04-21 23:10:10] Validation | Batch 390/1567 | Loss: 1.0770 [2026-04-21 23:10:11] Validation | Batch 400/1567 | Loss: 1.0782 [2026-04-21 23:10:12] Validation | Batch 410/1567 | Loss: 1.0742 [2026-04-21 23:10:13] Validation | Batch 420/1567 | Loss: 1.0724 [2026-04-21 23:10:15] Validation | Batch 430/1567 | Loss: 1.0750 [2026-04-21 23:10:16] Validation | Batch 440/1567 | Loss: 1.0748 [2026-04-21 23:10:17] 
Validation | Batch 450/1567 | Loss: 1.0768 [2026-04-21 23:10:18] Validation | Batch 460/1567 | Loss: 1.0795 [2026-04-21 23:10:19] Validation | Batch 470/1567 | Loss: 1.0843 [2026-04-21 23:10:20] Validation | Batch 480/1567 | Loss: 1.0819 [2026-04-21 23:10:22] Validation | Batch 490/1567 | Loss: 1.0796 [2026-04-21 23:10:22] Validation | Batch 500/1567 | Loss: 1.0808 [2026-04-21 23:10:24] Validation | Batch 510/1567 | Loss: 1.0805 [2026-04-21 23:10:25] Validation | Batch 520/1567 | Loss: 1.0817 [2026-04-21 23:10:26] Validation | Batch 530/1567 | Loss: 1.0800 [2026-04-21 23:10:27] Validation | Batch 540/1567 | Loss: 1.0770 [2026-04-21 23:10:29] Validation | Batch 550/1567 | Loss: 1.0781 [2026-04-21 23:10:30] Validation | Batch 560/1567 | Loss: 1.0773 [2026-04-21 23:10:31] Validation | Batch 570/1567 | Loss: 1.0733 [2026-04-21 23:10:33] Validation | Batch 580/1567 | Loss: 1.0751 [2026-04-21 23:10:34] Validation | Batch 590/1567 | Loss: 1.0749 [2026-04-21 23:10:35] Validation | Batch 600/1567 | Loss: 1.0739 [2026-04-21 23:10:36] Validation | Batch 610/1567 | Loss: 1.0759 [2026-04-21 23:10:38] Validation | Batch 620/1567 | Loss: 1.0737 [2026-04-21 23:10:39] Validation | Batch 630/1567 | Loss: 1.0737 [2026-04-21 23:10:40] Validation | Batch 640/1567 | Loss: 1.0744 [2026-04-21 23:10:42] Validation | Batch 650/1567 | Loss: 1.0772 [2026-04-21 23:10:43] Validation | Batch 660/1567 | Loss: 1.0785 [2026-04-21 23:10:44] Validation | Batch 670/1567 | Loss: 1.0769 [2026-04-21 23:10:45] Validation | Batch 680/1567 | Loss: 1.0757 [2026-04-21 23:10:46] Validation | Batch 690/1567 | Loss: 1.0741 [2026-04-21 23:10:47] Validation | Batch 700/1567 | Loss: 1.0742 [2026-04-21 23:10:49] Validation | Batch 710/1567 | Loss: 1.0736 [2026-04-21 23:10:50] Validation | Batch 720/1567 | Loss: 1.0704 [2026-04-21 23:10:51] Validation | Batch 730/1567 | Loss: 1.0710 [2026-04-21 23:10:52] Validation | Batch 740/1567 | Loss: 1.0717 [2026-04-21 23:10:53] Validation | Batch 750/1567 | Loss: 1.0713 
[2026-04-21 23:10:54] Validation | Batch 760/1567 | Loss: 1.0726 [2026-04-21 23:10:56] Validation | Batch 770/1567 | Loss: 1.0720 [2026-04-21 23:10:57] Validation | Batch 780/1567 | Loss: 1.0729 [2026-04-21 23:10:58] Validation | Batch 790/1567 | Loss: 1.0715 [2026-04-21 23:10:59] Validation | Batch 800/1567 | Loss: 1.0695 [2026-04-21 23:11:00] Validation | Batch 810/1567 | Loss: 1.0701 [2026-04-21 23:11:01] Validation | Batch 820/1567 | Loss: 1.0693 [2026-04-21 23:11:02] Validation | Batch 830/1567 | Loss: 1.0685 [2026-04-21 23:11:03] Validation | Batch 840/1567 | Loss: 1.0692 [2026-04-21 23:11:04] Validation | Batch 850/1567 | Loss: 1.0702 [2026-04-21 23:11:05] Validation | Batch 860/1567 | Loss: 1.0710 [2026-04-21 23:11:06] Validation | Batch 870/1567 | Loss: 1.0720 [2026-04-21 23:11:07] Validation | Batch 880/1567 | Loss: 1.0718 [2026-04-21 23:11:09] Validation | Batch 890/1567 | Loss: 1.0713 [2026-04-21 23:11:10] Validation | Batch 900/1567 | Loss: 1.0709 [2026-04-21 23:11:11] Validation | Batch 910/1567 | Loss: 1.0707 [2026-04-21 23:11:12] Validation | Batch 920/1567 | Loss: 1.0725 [2026-04-21 23:11:13] Validation | Batch 930/1567 | Loss: 1.0725 [2026-04-21 23:11:14] Validation | Batch 940/1567 | Loss: 1.0723 [2026-04-21 23:11:16] Validation | Batch 950/1567 | Loss: 1.0719 [2026-04-21 23:11:16] Validation | Batch 960/1567 | Loss: 1.0722 [2026-04-21 23:11:17] Validation | Batch 970/1567 | Loss: 1.0727 [2026-04-21 23:11:19] Validation | Batch 980/1567 | Loss: 1.0723 [2026-04-21 23:11:19] Validation | Batch 990/1567 | Loss: 1.0733 [2026-04-21 23:11:21] Validation | Batch 1000/1567 | Loss: 1.0735 [2026-04-21 23:11:22] Validation | Batch 1010/1567 | Loss: 1.0725 [2026-04-21 23:11:23] Validation | Batch 1020/1567 | Loss: 1.0738 [2026-04-21 23:11:24] Validation | Batch 1030/1567 | Loss: 1.0743 [2026-04-21 23:11:26] Validation | Batch 1040/1567 | Loss: 1.0734 [2026-04-21 23:11:27] Validation | Batch 1050/1567 | Loss: 1.0723 [2026-04-21 23:11:28] Validation | Batch 
1060/1567 | Loss: 1.0736 [2026-04-21 23:11:29] Validation | Batch 1070/1567 | Loss: 1.0734 [2026-04-21 23:11:31] Validation | Batch 1080/1567 | Loss: 1.0748 [2026-04-21 23:11:32] Validation | Batch 1090/1567 | Loss: 1.0774 [2026-04-21 23:11:33] Validation | Batch 1100/1567 | Loss: 1.0790 [2026-04-21 23:11:34] Validation | Batch 1110/1567 | Loss: 1.0779 [2026-04-21 23:11:35] Validation | Batch 1120/1567 | Loss: 1.0781 [2026-04-21 23:11:36] Validation | Batch 1130/1567 | Loss: 1.0764 [2026-04-21 23:11:37] Validation | Batch 1140/1567 | Loss: 1.0768 [2026-04-21 23:11:39] Validation | Batch 1150/1567 | Loss: 1.0754 [2026-04-21 23:11:40] Validation | Batch 1160/1567 | Loss: 1.0747 [2026-04-21 23:11:41] Validation | Batch 1170/1567 | Loss: 1.0749 [2026-04-21 23:11:42] Validation | Batch 1180/1567 | Loss: 1.0753 [2026-04-21 23:11:43] Validation | Batch 1190/1567 | Loss: 1.0756 [2026-04-21 23:11:44] Validation | Batch 1200/1567 | Loss: 1.0744 [2026-04-21 23:11:46] Validation | Batch 1210/1567 | Loss: 1.0737 [2026-04-21 23:11:47] Validation | Batch 1220/1567 | Loss: 1.0747 [2026-04-21 23:11:48] Validation | Batch 1230/1567 | Loss: 1.0752 [2026-04-21 23:11:49] Validation | Batch 1240/1567 | Loss: 1.0749 [2026-04-21 23:11:50] Validation | Batch 1250/1567 | Loss: 1.0752 [2026-04-21 23:11:51] Validation | Batch 1260/1567 | Loss: 1.0750 [2026-04-21 23:11:53] Validation | Batch 1270/1567 | Loss: 1.0733 [2026-04-21 23:11:54] Validation | Batch 1280/1567 | Loss: 1.0735 [2026-04-21 23:11:56] Validation | Batch 1290/1567 | Loss: 1.0736 [2026-04-21 23:11:57] Validation | Batch 1300/1567 | Loss: 1.0739 [2026-04-21 23:11:58] Validation | Batch 1310/1567 | Loss: 1.0746 [2026-04-21 23:11:59] Validation | Batch 1320/1567 | Loss: 1.0753 [2026-04-21 23:12:00] Validation | Batch 1330/1567 | Loss: 1.0767 [2026-04-21 23:12:01] Validation | Batch 1340/1567 | Loss: 1.0763 [2026-04-21 23:12:02] Validation | Batch 1350/1567 | Loss: 1.0767 [2026-04-21 23:12:03] Validation | Batch 1360/1567 | Loss: 
1.0758 [2026-04-21 23:12:05] Validation | Batch 1370/1567 | Loss: 1.0755 [2026-04-21 23:12:06] Validation | Batch 1380/1567 | Loss: 1.0756 [2026-04-21 23:12:07] Validation | Batch 1390/1567 | Loss: 1.0750 [2026-04-21 23:12:08] Validation | Batch 1400/1567 | Loss: 1.0746 [2026-04-21 23:12:09] Validation | Batch 1410/1567 | Loss: 1.0751 [2026-04-21 23:12:10] Validation | Batch 1420/1567 | Loss: 1.0752 [2026-04-21 23:12:11] Validation | Batch 1430/1567 | Loss: 1.0754 [2026-04-21 23:12:13] Validation | Batch 1440/1567 | Loss: 1.0762 [2026-04-21 23:12:14] Validation | Batch 1450/1567 | Loss: 1.0762 [2026-04-21 23:12:14] Validation | Batch 1460/1567 | Loss: 1.0756 [2026-04-21 23:12:15] Validation | Batch 1470/1567 | Loss: 1.0753 [2026-04-21 23:12:17] Validation | Batch 1480/1567 | Loss: 1.0750 [2026-04-21 23:12:17] Validation | Batch 1490/1567 | Loss: 1.0744 [2026-04-21 23:12:19] Validation | Batch 1500/1567 | Loss: 1.0741 [2026-04-21 23:12:20] Validation | Batch 1510/1567 | Loss: 1.0732 [2026-04-21 23:12:21] Validation | Batch 1520/1567 | Loss: 1.0731 [2026-04-21 23:12:21] Validation | Batch 1530/1567 | Loss: 1.0731 [2026-04-21 23:12:23] Validation | Batch 1540/1567 | Loss: 1.0737 [2026-04-21 23:12:24] Validation | Batch 1550/1567 | Loss: 1.0750 [2026-04-21 23:12:25] Validation | Batch 1560/1567 | Loss: 1.0747 [2026-04-21 23:12:26] Validation | Batch 1567/1567 | Loss: 1.0748 [2026-04-21 23:12:26] Validation | Loss: 1.0748 | PPL: 2.98 | Time: 184.77s [2026-04-21 23:12:30] Epoch 2 | Step 13010 | Loss: 0.9028 | LR: 1.96e-05 [2026-04-21 23:12:36] Epoch 2 | Step 13020 | Loss: 0.9028 | LR: 1.96e-05 [2026-04-21 23:12:40] Epoch 2 | Step 13030 | Loss: 0.9030 | LR: 1.96e-05 [2026-04-21 23:12:46] Epoch 2 | Step 13040 | Loss: 0.9033 | LR: 1.95e-05 [2026-04-21 23:12:51] Epoch 2 | Step 13050 | Loss: 0.9032 | LR: 1.95e-05 [2026-04-21 23:12:56] Epoch 2 | Step 13060 | Loss: 0.9034 | LR: 1.95e-05 [2026-04-21 23:13:02] Epoch 2 | Step 13070 | Loss: 0.9033 | LR: 1.94e-05 [2026-04-21 
23:13:07] Epoch 2 | Step 13080 | Loss: 0.9032 | LR: 1.94e-05 [2026-04-21 23:13:12] Epoch 2 | Step 13090 | Loss: 0.9036 | LR: 1.94e-05 [2026-04-21 23:13:17] Epoch 2 | Step 13100 | Loss: 0.9040 | LR: 1.93e-05 [2026-04-21 23:13:23] Epoch 2 | Step 13110 | Loss: 0.9040 | LR: 1.93e-05 [2026-04-21 23:13:28] Epoch 2 | Step 13120 | Loss: 0.9037 | LR: 1.93e-05 [2026-04-21 23:13:33] Epoch 2 | Step 13130 | Loss: 0.9035 | LR: 1.92e-05 [2026-04-21 23:13:39] Epoch 2 | Step 13140 | Loss: 0.9033 | LR: 1.92e-05 [2026-04-21 23:13:44] Epoch 2 | Step 13150 | Loss: 0.9032 | LR: 1.92e-05 [2026-04-21 23:13:50] Epoch 2 | Step 13160 | Loss: 0.9037 | LR: 1.91e-05 [2026-04-21 23:13:55] Epoch 2 | Step 13170 | Loss: 0.9034 | LR: 1.91e-05 [2026-04-21 23:14:01] Epoch 2 | Step 13180 | Loss: 0.9037 | LR: 1.90e-05 [2026-04-21 23:14:07] Epoch 2 | Step 13190 | Loss: 0.9034 | LR: 1.90e-05 [2026-04-21 23:14:12] Epoch 2 | Step 13200 | Loss: 0.9032 | LR: 1.90e-05 [2026-04-21 23:14:17] Epoch 2 | Step 13210 | Loss: 0.9035 | LR: 1.89e-05 [2026-04-21 23:14:23] Epoch 2 | Step 13220 | Loss: 0.9035 | LR: 1.89e-05 [2026-04-21 23:14:28] Epoch 2 | Step 13230 | Loss: 0.9039 | LR: 1.88e-05 [2026-04-21 23:14:33] Epoch 2 | Step 13240 | Loss: 0.9034 | LR: 1.88e-05 [2026-04-21 23:14:38] Epoch 2 | Step 13250 | Loss: 0.9034 | LR: 1.87e-05 [2026-04-21 23:14:43] Epoch 2 | Step 13260 | Loss: 0.9033 | LR: 1.87e-05 [2026-04-21 23:14:48] Epoch 2 | Step 13270 | Loss: 0.9033 | LR: 1.87e-05 [2026-04-21 23:14:54] Epoch 2 | Step 13280 | Loss: 0.9033 | LR: 1.86e-05 [2026-04-21 23:15:00] Epoch 2 | Step 13290 | Loss: 0.9030 | LR: 1.86e-05 [2026-04-21 23:15:04] Epoch 2 | Step 13300 | Loss: 0.9027 | LR: 1.85e-05 [2026-04-21 23:15:09] Epoch 2 | Step 13310 | Loss: 0.9027 | LR: 1.85e-05 [2026-04-21 23:15:14] Epoch 2 | Step 13320 | Loss: 0.9030 | LR: 1.84e-05 [2026-04-21 23:15:20] Epoch 2 | Step 13330 | Loss: 0.9029 | LR: 1.84e-05 [2026-04-21 23:15:24] Epoch 2 | Step 13340 | Loss: 0.9025 | LR: 1.83e-05 [2026-04-21 23:15:29] Epoch 2 | Step 
13350 | Loss: 0.9027 | LR: 1.83e-05 [2026-04-21 23:15:34] Epoch 2 | Step 13360 | Loss: 0.9024 | LR: 1.82e-05 [2026-04-21 23:15:39] Epoch 2 | Step 13370 | Loss: 0.9021 | LR: 1.82e-05 [2026-04-21 23:15:44] Epoch 2 | Step 13380 | Loss: 0.9020 | LR: 1.81e-05 [2026-04-21 23:15:49] Epoch 2 | Step 13390 | Loss: 0.9026 | LR: 1.80e-05 [2026-04-21 23:15:54] Epoch 2 | Step 13400 | Loss: 0.9029 | LR: 1.80e-05 [2026-04-21 23:15:59] Epoch 2 | Step 13410 | Loss: 0.9034 | LR: 1.79e-05 [2026-04-21 23:16:04] Epoch 2 | Step 13420 | Loss: 0.9035 | LR: 1.79e-05 [2026-04-21 23:16:09] Epoch 2 | Step 13430 | Loss: 0.9035 | LR: 1.78e-05 [2026-04-21 23:16:14] Epoch 2 | Step 13440 | Loss: 0.9036 | LR: 1.78e-05 [2026-04-21 23:16:18] Epoch 2 | Step 13450 | Loss: 0.9036 | LR: 1.77e-05 [2026-04-21 23:16:24] Epoch 2 | Step 13460 | Loss: 0.9033 | LR: 1.76e-05 [2026-04-21 23:16:29] Epoch 2 | Step 13470 | Loss: 0.9032 | LR: 1.76e-05 [2026-04-21 23:16:34] Epoch 2 | Step 13480 | Loss: 0.9031 | LR: 1.75e-05 [2026-04-21 23:16:40] Epoch 2 | Step 13490 | Loss: 0.9028 | LR: 1.75e-05 [2026-04-21 23:16:45] Epoch 2 | Step 13500 | Loss: 0.9026 | LR: 1.74e-05 [2026-04-21 23:16:50] Epoch 2 | Step 13510 | Loss: 0.9027 | LR: 1.73e-05 [2026-04-21 23:16:55] Epoch 2 | Step 13520 | Loss: 0.9030 | LR: 1.73e-05 [2026-04-21 23:17:00] Epoch 2 | Step 13530 | Loss: 0.9028 | LR: 1.72e-05 [2026-04-21 23:17:06] Epoch 2 | Step 13540 | Loss: 0.9031 | LR: 1.71e-05 [2026-04-21 23:17:11] Epoch 2 | Step 13550 | Loss: 0.9027 | LR: 1.71e-05 [2026-04-21 23:17:17] Epoch 2 | Step 13560 | Loss: 0.9025 | LR: 1.70e-05 [2026-04-21 23:17:21] Epoch 2 | Step 13570 | Loss: 0.9022 | LR: 1.69e-05 [2026-04-21 23:17:26] Epoch 2 | Step 13580 | Loss: 0.9024 | LR: 1.69e-05 [2026-04-21 23:17:31] Epoch 2 | Step 13590 | Loss: 0.9022 | LR: 1.68e-05 [2026-04-21 23:17:36] Epoch 2 | Step 13600 | Loss: 0.9021 | LR: 1.67e-05 [2026-04-21 23:17:42] Epoch 2 | Step 13610 | Loss: 0.9018 | LR: 1.67e-05 [2026-04-21 23:17:48] Epoch 2 | Step 13620 | Loss: 0.9018 | LR: 
1.66e-05 [2026-04-21 23:17:53] Epoch 2 | Step 13630 | Loss: 0.9016 | LR: 1.65e-05 [2026-04-21 23:17:58] Epoch 2 | Step 13640 | Loss: 0.9017 | LR: 1.65e-05 [2026-04-21 23:18:03] Epoch 2 | Step 13650 | Loss: 0.9015 | LR: 1.64e-05 [2026-04-21 23:18:08] Epoch 2 | Step 13660 | Loss: 0.9017 | LR: 1.63e-05 [2026-04-21 23:18:13] Epoch 2 | Step 13670 | Loss: 0.9016 | LR: 1.62e-05 [2026-04-21 23:18:19] Epoch 2 | Step 13680 | Loss: 0.9013 | LR: 1.62e-05 [2026-04-21 23:18:24] Epoch 2 | Step 13690 | Loss: 0.9015 | LR: 1.61e-05 [2026-04-21 23:18:29] Epoch 2 | Step 13700 | Loss: 0.9017 | LR: 1.60e-05 [2026-04-21 23:18:35] Epoch 2 | Step 13710 | Loss: 0.9019 | LR: 1.59e-05 [2026-04-21 23:18:40] Epoch 2 | Step 13720 | Loss: 0.9019 | LR: 1.59e-05 [2026-04-21 23:18:45] Epoch 2 | Step 13730 | Loss: 0.9018 | LR: 1.58e-05 [2026-04-21 23:18:50] Epoch 2 | Step 13740 | Loss: 0.9020 | LR: 1.57e-05 [2026-04-21 23:18:55] Epoch 2 | Step 13750 | Loss: 0.9023 | LR: 1.56e-05 [2026-04-21 23:19:01] Epoch 2 | Step 13760 | Loss: 0.9020 | LR: 1.56e-05 [2026-04-21 23:19:06] Epoch 2 | Step 13770 | Loss: 0.9021 | LR: 1.55e-05 [2026-04-21 23:19:10] Epoch 2 | Step 13780 | Loss: 0.9018 | LR: 1.54e-05 [2026-04-21 23:19:16] Epoch 2 | Step 13790 | Loss: 0.9019 | LR: 1.53e-05 [2026-04-21 23:19:21] Epoch 2 | Step 13800 | Loss: 0.9016 | LR: 1.53e-05 [2026-04-21 23:19:26] Epoch 2 | Step 13810 | Loss: 0.9018 | LR: 1.52e-05 [2026-04-21 23:19:32] Epoch 2 | Step 13820 | Loss: 0.9015 | LR: 1.51e-05 [2026-04-21 23:19:36] Epoch 2 | Step 13830 | Loss: 0.9018 | LR: 1.50e-05 [2026-04-21 23:19:42] Epoch 2 | Step 13840 | Loss: 0.9018 | LR: 1.49e-05 [2026-04-21 23:19:47] Epoch 2 | Step 13850 | Loss: 0.9022 | LR: 1.49e-05 [2026-04-21 23:19:52] Epoch 2 | Step 13860 | Loss: 0.9022 | LR: 1.48e-05 [2026-04-21 23:19:57] Epoch 2 | Step 13870 | Loss: 0.9020 | LR: 1.47e-05 [2026-04-21 23:20:03] Epoch 2 | Step 13880 | Loss: 0.9023 | LR: 1.46e-05 [2026-04-21 23:20:08] Epoch 2 | Step 13890 | Loss: 0.9024 | LR: 1.45e-05 [2026-04-21 
23:20:13] Epoch 2 | Step 13900 | Loss: 0.9026 | LR: 1.45e-05 [2026-04-21 23:20:18] Epoch 2 | Step 13910 | Loss: 0.9022 | LR: 1.44e-05 [2026-04-21 23:20:24] Epoch 2 | Step 13920 | Loss: 0.9024 | LR: 1.43e-05 [2026-04-21 23:20:29] Epoch 2 | Step 13930 | Loss: 0.9026 | LR: 1.42e-05 [2026-04-21 23:20:34] Epoch 2 | Step 13940 | Loss: 0.9027 | LR: 1.41e-05 [2026-04-21 23:20:40] Epoch 2 | Step 13950 | Loss: 0.9027 | LR: 1.40e-05 [2026-04-21 23:20:45] Epoch 2 | Step 13960 | Loss: 0.9026 | LR: 1.40e-05 [2026-04-21 23:20:50] Epoch 2 | Step 13970 | Loss: 0.9027 | LR: 1.39e-05 [2026-04-21 23:20:55] Epoch 2 | Step 13980 | Loss: 0.9026 | LR: 1.38e-05 [2026-04-21 23:21:01] Epoch 2 | Step 13990 | Loss: 0.9027 | LR: 1.37e-05 [2026-04-21 23:21:06] Epoch 2 | Step 14000 | Loss: 0.9030 | LR: 1.36e-05 [2026-04-21 23:21:08] Validation | Batch 10/1567 | Loss: 1.0445 [2026-04-21 23:21:09] Validation | Batch 20/1567 | Loss: 1.1312 [2026-04-21 23:21:10] Validation | Batch 30/1567 | Loss: 1.0851 [2026-04-21 23:21:12] Validation | Batch 40/1567 | Loss: 1.1060 [2026-04-21 23:21:13] Validation | Batch 50/1567 | Loss: 1.0835 [2026-04-21 23:21:14] Validation | Batch 60/1567 | Loss: 1.0717 [2026-04-21 23:21:15] Validation | Batch 70/1567 | Loss: 1.0630 [2026-04-21 23:21:17] Validation | Batch 80/1567 | Loss: 1.0710 [2026-04-21 23:21:18] Validation | Batch 90/1567 | Loss: 1.0657 [2026-04-21 23:21:19] Validation | Batch 100/1567 | Loss: 1.0449 [2026-04-21 23:21:20] Validation | Batch 110/1567 | Loss: 1.0359 [2026-04-21 23:21:22] Validation | Batch 120/1567 | Loss: 1.0311 [2026-04-21 23:21:23] Validation | Batch 130/1567 | Loss: 1.0263 [2026-04-21 23:21:24] Validation | Batch 140/1567 | Loss: 1.0366 [2026-04-21 23:21:25] Validation | Batch 150/1567 | Loss: 1.0475 [2026-04-21 23:21:26] Validation | Batch 160/1567 | Loss: 1.0462 [2026-04-21 23:21:27] Validation | Batch 170/1567 | Loss: 1.0384 [2026-04-21 23:21:28] Validation | Batch 180/1567 | Loss: 1.0413 [2026-04-21 23:21:29] Validation | Batch 
190/1567 | Loss: 1.0456 [2026-04-21 23:21:31] Validation | Batch 200/1567 | Loss: 1.0497 [2026-04-21 23:21:32] Validation | Batch 210/1567 | Loss: 1.0482 [2026-04-21 23:21:33] Validation | Batch 220/1567 | Loss: 1.0534 [2026-04-21 23:21:35] Validation | Batch 230/1567 | Loss: 1.0574 [2026-04-21 23:21:36] Validation | Batch 240/1567 | Loss: 1.0595 [2026-04-21 23:21:37] Validation | Batch 250/1567 | Loss: 1.0637 [2026-04-21 23:21:38] Validation | Batch 260/1567 | Loss: 1.0667 [2026-04-21 23:21:39] Validation | Batch 270/1567 | Loss: 1.0714 [2026-04-21 23:21:41] Validation | Batch 280/1567 | Loss: 1.0748 [2026-04-21 23:21:43] Validation | Batch 290/1567 | Loss: 1.0700 [2026-04-21 23:21:44] Validation | Batch 300/1567 | Loss: 1.0693 [2026-04-21 23:21:45] Validation | Batch 310/1567 | Loss: 1.0659 [2026-04-21 23:21:46] Validation | Batch 320/1567 | Loss: 1.0689 [2026-04-21 23:21:48] Validation | Batch 330/1567 | Loss: 1.0689 [2026-04-21 23:21:49] Validation | Batch 340/1567 | Loss: 1.0682 [2026-04-21 23:21:50] Validation | Batch 350/1567 | Loss: 1.0659 [2026-04-21 23:21:51] Validation | Batch 360/1567 | Loss: 1.0596 [2026-04-21 23:21:53] Validation | Batch 370/1567 | Loss: 1.0598 [2026-04-21 23:21:54] Validation | Batch 380/1567 | Loss: 1.0641 [2026-04-21 23:21:55] Validation | Batch 390/1567 | Loss: 1.0631 [2026-04-21 23:21:56] Validation | Batch 400/1567 | Loss: 1.0640 [2026-04-21 23:21:58] Validation | Batch 410/1567 | Loss: 1.0600 [2026-04-21 23:21:59] Validation | Batch 420/1567 | Loss: 1.0583 [2026-04-21 23:22:00] Validation | Batch 430/1567 | Loss: 1.0607 [2026-04-21 23:22:01] Validation | Batch 440/1567 | Loss: 1.0607 [2026-04-21 23:22:02] Validation | Batch 450/1567 | Loss: 1.0625 [2026-04-21 23:22:04] Validation | Batch 460/1567 | Loss: 1.0651 [2026-04-21 23:22:04] Validation | Batch 470/1567 | Loss: 1.0700 [2026-04-21 23:22:06] Validation | Batch 480/1567 | Loss: 1.0675 [2026-04-21 23:22:07] Validation | Batch 490/1567 | Loss: 1.0652 [2026-04-21 23:22:08] 
Validation | Batch 500/1567 | Loss: 1.0663 [2026-04-21 23:22:09] Validation | Batch 510/1567 | Loss: 1.0662 [2026-04-21 23:22:10] Validation | Batch 520/1567 | Loss: 1.0675 [2026-04-21 23:22:11] Validation | Batch 530/1567 | Loss: 1.0659 [2026-04-21 23:22:13] Validation | Batch 540/1567 | Loss: 1.0632 [2026-04-21 23:22:14] Validation | Batch 550/1567 | Loss: 1.0646 [2026-04-21 23:22:15] Validation | Batch 560/1567 | Loss: 1.0637 [2026-04-21 23:22:17] Validation | Batch 570/1567 | Loss: 1.0596 [2026-04-21 23:22:18] Validation | Batch 580/1567 | Loss: 1.0616 [2026-04-21 23:22:19] Validation | Batch 590/1567 | Loss: 1.0612 [2026-04-21 23:22:20] Validation | Batch 600/1567 | Loss: 1.0601 [2026-04-21 23:22:22] Validation | Batch 610/1567 | Loss: 1.0622 [2026-04-21 23:22:23] Validation | Batch 620/1567 | Loss: 1.0600 [2026-04-21 23:22:24] Validation | Batch 630/1567 | Loss: 1.0602 [2026-04-21 23:22:26] Validation | Batch 640/1567 | Loss: 1.0607 [2026-04-21 23:22:27] Validation | Batch 650/1567 | Loss: 1.0636 [2026-04-21 23:22:28] Validation | Batch 660/1567 | Loss: 1.0649 [2026-04-21 23:22:29] Validation | Batch 670/1567 | Loss: 1.0632 [2026-04-21 23:22:30] Validation | Batch 680/1567 | Loss: 1.0621 [2026-04-21 23:22:31] Validation | Batch 690/1567 | Loss: 1.0605 [2026-04-21 23:22:33] Validation | Batch 700/1567 | Loss: 1.0606 [2026-04-21 23:22:34] Validation | Batch 710/1567 | Loss: 1.0598 [2026-04-21 23:22:35] Validation | Batch 720/1567 | Loss: 1.0567 [2026-04-21 23:22:36] Validation | Batch 730/1567 | Loss: 1.0572 [2026-04-21 23:22:37] Validation | Batch 740/1567 | Loss: 1.0578 [2026-04-21 23:22:38] Validation | Batch 750/1567 | Loss: 1.0574 [2026-04-21 23:22:39] Validation | Batch 760/1567 | Loss: 1.0587 [2026-04-21 23:22:41] Validation | Batch 770/1567 | Loss: 1.0580 [2026-04-21 23:22:42] Validation | Batch 780/1567 | Loss: 1.0591 [2026-04-21 23:22:43] Validation | Batch 790/1567 | Loss: 1.0576 [2026-04-21 23:22:44] Validation | Batch 800/1567 | Loss: 1.0559 
[2026-04-21 23:22:45] Validation | Batch 810/1567 | Loss: 1.0565 [2026-04-21 23:22:46] Validation | Batch 820/1567 | Loss: 1.0558 [2026-04-21 23:22:47] Validation | Batch 830/1567 | Loss: 1.0551 [2026-04-21 23:22:48] Validation | Batch 840/1567 | Loss: 1.0557 [2026-04-21 23:22:49] Validation | Batch 850/1567 | Loss: 1.0568 [2026-04-21 23:22:50] Validation | Batch 860/1567 | Loss: 1.0575 [2026-04-21 23:22:51] Validation | Batch 870/1567 | Loss: 1.0584 [2026-04-21 23:22:53] Validation | Batch 880/1567 | Loss: 1.0584 [2026-04-21 23:22:54] Validation | Batch 890/1567 | Loss: 1.0580 [2026-04-21 23:22:55] Validation | Batch 900/1567 | Loss: 1.0575 [2026-04-21 23:22:56] Validation | Batch 910/1567 | Loss: 1.0571 [2026-04-21 23:22:58] Validation | Batch 920/1567 | Loss: 1.0591 [2026-04-21 23:22:59] Validation | Batch 930/1567 | Loss: 1.0591 [2026-04-21 23:23:00] Validation | Batch 940/1567 | Loss: 1.0590 [2026-04-21 23:23:01] Validation | Batch 950/1567 | Loss: 1.0585 [2026-04-21 23:23:02] Validation | Batch 960/1567 | Loss: 1.0588 [2026-04-21 23:23:03] Validation | Batch 970/1567 | Loss: 1.0595 [2026-04-21 23:23:04] Validation | Batch 980/1567 | Loss: 1.0591 [2026-04-21 23:23:05] Validation | Batch 990/1567 | Loss: 1.0600 [2026-04-21 23:23:06] Validation | Batch 1000/1567 | Loss: 1.0604 [2026-04-21 23:23:07] Validation | Batch 1010/1567 | Loss: 1.0596 [2026-04-21 23:23:08] Validation | Batch 1020/1567 | Loss: 1.0608 [2026-04-21 23:23:09] Validation | Batch 1030/1567 | Loss: 1.0612 [2026-04-21 23:23:11] Validation | Batch 1040/1567 | Loss: 1.0603 [2026-04-21 23:23:12] Validation | Batch 1050/1567 | Loss: 1.0594 [2026-04-21 23:23:13] Validation | Batch 1060/1567 | Loss: 1.0606 [2026-04-21 23:23:15] Validation | Batch 1070/1567 | Loss: 1.0604 [2026-04-21 23:23:16] Validation | Batch 1080/1567 | Loss: 1.0617 [2026-04-21 23:23:17] Validation | Batch 1090/1567 | Loss: 1.0643 [2026-04-21 23:23:18] Validation | Batch 1100/1567 | Loss: 1.0659 [2026-04-21 23:23:19] Validation | 
Batch 1110/1567 | Loss: 1.0649 [2026-04-21 23:23:20] Validation | Batch 1120/1567 | Loss: 1.0651 [2026-04-21 23:23:22] Validation | Batch 1130/1567 | Loss: 1.0633 [2026-04-21 23:23:23] Validation | Batch 1140/1567 | Loss: 1.0638 [2026-04-21 23:23:24] Validation | Batch 1150/1567 | Loss: 1.0624 [2026-04-21 23:23:25] Validation | Batch 1160/1567 | Loss: 1.0617 [2026-04-21 23:23:26] Validation | Batch 1170/1567 | Loss: 1.0620 [2026-04-21 23:23:27] Validation | Batch 1180/1567 | Loss: 1.0622 [2026-04-21 23:23:29] Validation | Batch 1190/1567 | Loss: 1.0625 [2026-04-21 23:23:30] Validation | Batch 1200/1567 | Loss: 1.0613 [2026-04-21 23:23:31] Validation | Batch 1210/1567 | Loss: 1.0606 [2026-04-21 23:23:32] Validation | Batch 1220/1567 | Loss: 1.0615 [2026-04-21 23:23:33] Validation | Batch 1230/1567 | Loss: 1.0620 [2026-04-21 23:23:34] Validation | Batch 1240/1567 | Loss: 1.0619 [2026-04-21 23:23:35] Validation | Batch 1250/1567 | Loss: 1.0623 [2026-04-21 23:23:37] Validation | Batch 1260/1567 | Loss: 1.0621 [2026-04-21 23:23:38] Validation | Batch 1270/1567 | Loss: 1.0603 [2026-04-21 23:23:39] Validation | Batch 1280/1567 | Loss: 1.0605 [2026-04-21 23:23:41] Validation | Batch 1290/1567 | Loss: 1.0606 [2026-04-21 23:23:42] Validation | Batch 1300/1567 | Loss: 1.0609 [2026-04-21 23:23:43] Validation | Batch 1310/1567 | Loss: 1.0617 [2026-04-21 23:23:45] Validation | Batch 1320/1567 | Loss: 1.0622 [2026-04-21 23:23:46] Validation | Batch 1330/1567 | Loss: 1.0637 [2026-04-21 23:23:47] Validation | Batch 1340/1567 | Loss: 1.0635 [2026-04-21 23:23:48] Validation | Batch 1350/1567 | Loss: 1.0638 [2026-04-21 23:23:49] Validation | Batch 1360/1567 | Loss: 1.0629 [2026-04-21 23:23:50] Validation | Batch 1370/1567 | Loss: 1.0626 [2026-04-21 23:23:51] Validation | Batch 1380/1567 | Loss: 1.0626 [2026-04-21 23:23:52] Validation | Batch 1390/1567 | Loss: 1.0619 [2026-04-21 23:23:53] Validation | Batch 1400/1567 | Loss: 1.0615 [2026-04-21 23:23:54] Validation | Batch 1410/1567 | 
Loss: 1.0621 [2026-04-21 23:23:56] Validation | Batch 1420/1567 | Loss: 1.0621 [2026-04-21 23:23:57] Validation | Batch 1430/1567 | Loss: 1.0625 [2026-04-21 23:23:58] Validation | Batch 1440/1567 | Loss: 1.0633 [2026-04-21 23:23:59] Validation | Batch 1450/1567 | Loss: 1.0635 [2026-04-21 23:24:00] Validation | Batch 1460/1567 | Loss: 1.0629 [2026-04-21 23:24:01] Validation | Batch 1470/1567 | Loss: 1.0627 [2026-04-21 23:24:02] Validation | Batch 1480/1567 | Loss: 1.0624 [2026-04-21 23:24:03] Validation | Batch 1490/1567 | Loss: 1.0619 [2026-04-21 23:24:04] Validation | Batch 1500/1567 | Loss: 1.0616 [2026-04-21 23:24:05] Validation | Batch 1510/1567 | Loss: 1.0607 [2026-04-21 23:24:06] Validation | Batch 1520/1567 | Loss: 1.0606 [2026-04-21 23:24:07] Validation | Batch 1530/1567 | Loss: 1.0607 [2026-04-21 23:24:08] Validation | Batch 1540/1567 | Loss: 1.0612 [2026-04-21 23:24:09] Validation | Batch 1550/1567 | Loss: 1.0625 [2026-04-21 23:24:10] Validation | Batch 1560/1567 | Loss: 1.0622 [2026-04-21 23:24:11] Validation | Batch 1567/1567 | Loss: 1.0622 [2026-04-21 23:24:11] Validation | Loss: 1.0622 | PPL: 2.94 | Time: 184.82s [2026-04-21 23:24:29] New best model saved! 
Val loss: 1.0622 [2026-04-21 23:24:35] Epoch 2 | Step 14010 | Loss: 0.9028 | LR: 1.35e-05 [2026-04-21 23:24:41] Epoch 2 | Step 14020 | Loss: 0.9026 | LR: 1.34e-05 [2026-04-21 23:24:45] Epoch 2 | Step 14030 | Loss: 0.9025 | LR: 1.34e-05 [2026-04-21 23:24:51] Epoch 2 | Step 14040 | Loss: 0.9022 | LR: 1.33e-05 [2026-04-21 23:24:56] Epoch 2 | Step 14050 | Loss: 0.9025 | LR: 1.32e-05 [2026-04-21 23:25:01] Epoch 2 | Step 14060 | Loss: 0.9022 | LR: 1.31e-05 [2026-04-21 23:25:06] Epoch 2 | Step 14070 | Loss: 0.9023 | LR: 1.30e-05 [2026-04-21 23:25:12] Epoch 2 | Step 14080 | Loss: 0.9021 | LR: 1.29e-05 [2026-04-21 23:25:17] Epoch 2 | Step 14090 | Loss: 0.9026 | LR: 1.28e-05 [2026-04-21 23:25:23] Epoch 2 | Step 14100 | Loss: 0.9028 | LR: 1.28e-05 [2026-04-21 23:25:28] Epoch 2 | Step 14110 | Loss: 0.9024 | LR: 1.27e-05 [2026-04-21 23:25:33] Epoch 2 | Step 14120 | Loss: 0.9024 | LR: 1.26e-05 [2026-04-21 23:25:39] Epoch 2 | Step 14130 | Loss: 0.9028 | LR: 1.25e-05 [2026-04-21 23:25:44] Epoch 2 | Step 14140 | Loss: 0.9027 | LR: 1.24e-05 [2026-04-21 23:25:49] Epoch 2 | Step 14150 | Loss: 0.9027 | LR: 1.23e-05 [2026-04-21 23:25:55] Epoch 2 | Step 14160 | Loss: 0.9029 | LR: 1.22e-05 [2026-04-21 23:25:59] Epoch 2 | Step 14170 | Loss: 0.9028 | LR: 1.21e-05 [2026-04-21 23:26:04] Epoch 2 | Step 14180 | Loss: 0.9031 | LR: 1.21e-05 [2026-04-21 23:26:10] Epoch 2 | Step 14190 | Loss: 0.9033 | LR: 1.20e-05 [2026-04-21 23:26:15] Epoch 2 | Step 14200 | Loss: 0.9030 | LR: 1.19e-05 [2026-04-21 23:26:20] Epoch 2 | Step 14210 | Loss: 0.9029 | LR: 1.18e-05 [2026-04-21 23:26:25] Epoch 2 | Step 14220 | Loss: 0.9027 | LR: 1.17e-05 [2026-04-21 23:26:30] Epoch 2 | Step 14230 | Loss: 0.9029 | LR: 1.16e-05 [2026-04-21 23:26:36] Epoch 2 | Step 14240 | Loss: 0.9026 | LR: 1.15e-05 [2026-04-21 23:26:41] Epoch 2 | Step 14250 | Loss: 0.9027 | LR: 1.14e-05 [2026-04-21 23:26:47] Epoch 2 | Step 14260 | Loss: 0.9025 | LR: 1.13e-05 [2026-04-21 23:26:52] Epoch 2 | Step 14270 | Loss: 0.9027 | LR: 1.13e-05 [2026-04-21 
23:26:57] Epoch 2 | Step 14280 | Loss: 0.9028 | LR: 1.12e-05 [2026-04-21 23:27:02] Epoch 2 | Step 14290 | Loss: 0.9030 | LR: 1.11e-05 [2026-04-21 23:27:07] Epoch 2 | Step 14300 | Loss: 0.9029 | LR: 1.10e-05 [2026-04-21 23:27:12] Epoch 2 | Step 14310 | Loss: 0.9027 | LR: 1.09e-05 [2026-04-21 23:27:18] Epoch 2 | Step 14320 | Loss: 0.9027 | LR: 1.08e-05 [2026-04-21 23:27:23] Epoch 2 | Step 14330 | Loss: 0.9026 | LR: 1.07e-05 [2026-04-21 23:27:30] Epoch 2 | Step 14340 | Loss: 0.9027 | LR: 1.06e-05 [2026-04-21 23:27:34] Epoch 2 | Step 14350 | Loss: 0.9028 | LR: 1.05e-05 [2026-04-21 23:27:40] Epoch 2 | Step 14360 | Loss: 0.9028 | LR: 1.05e-05 [2026-04-21 23:27:46] Epoch 2 | Step 14370 | Loss: 0.9026 | LR: 1.04e-05 [2026-04-21 23:27:51] Epoch 2 | Step 14380 | Loss: 0.9026 | LR: 1.03e-05 [2026-04-21 23:27:56] Epoch 2 | Step 14390 | Loss: 0.9024 | LR: 1.02e-05 [2026-04-21 23:28:01] Epoch 2 | Step 14400 | Loss: 0.9024 | LR: 1.01e-05 [2026-04-21 23:28:06] Epoch 2 | Step 14410 | Loss: 0.9022 | LR: 1.00e-05 [2026-04-21 23:28:11] Epoch 2 | Step 14420 | Loss: 0.9021 | LR: 9.92e-06 [2026-04-21 23:28:16] Epoch 2 | Step 14430 | Loss: 0.9020 | LR: 9.84e-06 [2026-04-21 23:28:21] Epoch 2 | Step 14440 | Loss: 0.9018 | LR: 9.75e-06 [2026-04-21 23:28:26] Epoch 2 | Step 14450 | Loss: 0.9017 | LR: 9.66e-06 [2026-04-21 23:28:31] Epoch 2 | Step 14460 | Loss: 0.9018 | LR: 9.57e-06 [2026-04-21 23:28:37] Epoch 2 | Step 14470 | Loss: 0.9022 | LR: 9.48e-06 [2026-04-21 23:28:42] Epoch 2 | Step 14480 | Loss: 0.9022 | LR: 9.40e-06 [2026-04-21 23:28:47] Epoch 2 | Step 14490 | Loss: 0.9024 | LR: 9.31e-06 [2026-04-21 23:28:53] Epoch 2 | Step 14500 | Loss: 0.9024 | LR: 9.22e-06 [2026-04-21 23:28:58] Epoch 2 | Step 14510 | Loss: 0.9023 | LR: 9.13e-06 [2026-04-21 23:29:03] Epoch 2 | Step 14520 | Loss: 0.9022 | LR: 9.05e-06 [2026-04-21 23:29:08] Epoch 2 | Step 14530 | Loss: 0.9022 | LR: 8.96e-06 [2026-04-21 23:29:13] Epoch 2 | Step 14540 | Loss: 0.9023 | LR: 8.87e-06 [2026-04-21 23:29:18] Epoch 2 | Step 
14550 | Loss: 0.9025 | LR: 8.79e-06 [2026-04-21 23:29:24] Epoch 2 | Step 14560 | Loss: 0.9024 | LR: 8.70e-06 [2026-04-21 23:29:29] Epoch 2 | Step 14570 | Loss: 0.9026 | LR: 8.62e-06 [2026-04-21 23:29:35] Epoch 2 | Step 14580 | Loss: 0.9024 | LR: 8.53e-06 [2026-04-21 23:29:40] Epoch 2 | Step 14590 | Loss: 0.9023 | LR: 8.44e-06 [2026-04-21 23:29:45] Epoch 2 | Step 14600 | Loss: 0.9020 | LR: 8.36e-06 [2026-04-21 23:29:50] Epoch 2 | Step 14610 | Loss: 0.9019 | LR: 8.27e-06 [2026-04-21 23:29:55] Epoch 2 | Step 14620 | Loss: 0.9016 | LR: 8.19e-06 [2026-04-21 23:30:00] Epoch 2 | Step 14630 | Loss: 0.9017 | LR: 8.10e-06 [2026-04-21 23:30:06] Epoch 2 | Step 14640 | Loss: 0.9021 | LR: 8.02e-06 [2026-04-21 23:30:13] Epoch 2 | Step 14650 | Loss: 0.9022 | LR: 7.94e-06 [2026-04-21 23:30:18] Epoch 2 | Step 14660 | Loss: 0.9022 | LR: 7.85e-06 [2026-04-21 23:30:24] Epoch 2 | Step 14670 | Loss: 0.9025 | LR: 7.77e-06 [2026-04-21 23:30:29] Epoch 2 | Step 14680 | Loss: 0.9025 | LR: 7.69e-06 [2026-04-21 23:30:35] Epoch 2 | Step 14690 | Loss: 0.9022 | LR: 7.60e-06 [2026-04-21 23:30:40] Epoch 2 | Step 14700 | Loss: 0.9022 | LR: 7.52e-06 [2026-04-21 23:30:44] Epoch 2 | Step 14710 | Loss: 0.9024 | LR: 7.44e-06 [2026-04-21 23:30:50] Epoch 2 | Step 14720 | Loss: 0.9022 | LR: 7.36e-06 [2026-04-21 23:30:55] Epoch 2 | Step 14730 | Loss: 0.9019 | LR: 7.28e-06 [2026-04-21 23:31:00] Epoch 2 | Step 14740 | Loss: 0.9019 | LR: 7.20e-06 [2026-04-21 23:31:05] Epoch 2 | Step 14750 | Loss: 0.9020 | LR: 7.12e-06 [2026-04-21 23:31:11] Epoch 2 | Step 14760 | Loss: 0.9018 | LR: 7.04e-06 [2026-04-21 23:31:17] Epoch 2 | Step 14770 | Loss: 0.9016 | LR: 6.96e-06 [2026-04-21 23:31:21] Epoch 2 | Step 14780 | Loss: 0.9019 | LR: 6.88e-06 [2026-04-21 23:31:27] Epoch 2 | Step 14790 | Loss: 0.9020 | LR: 6.80e-06 [2026-04-21 23:31:32] Epoch 2 | Step 14800 | Loss: 0.9019 | LR: 6.72e-06 [2026-04-21 23:31:38] Epoch 2 | Step 14810 | Loss: 0.9018 | LR: 6.64e-06 [2026-04-21 23:31:43] Epoch 2 | Step 14820 | Loss: 0.9018 | LR: 
6.56e-06 [2026-04-21 23:31:48] Epoch 2 | Step 14830 | Loss: 0.9017 | LR: 6.49e-06 [2026-04-21 23:31:53] Epoch 2 | Step 14840 | Loss: 0.9018 | LR: 6.41e-06 [2026-04-21 23:31:58] Epoch 2 | Step 14850 | Loss: 0.9016 | LR: 6.33e-06 [2026-04-21 23:32:03] Epoch 2 | Step 14860 | Loss: 0.9016 | LR: 6.26e-06 [2026-04-21 23:32:08] Epoch 2 | Step 14870 | Loss: 0.9016 | LR: 6.18e-06 [2026-04-21 23:32:14] Epoch 2 | Step 14880 | Loss: 0.9017 | LR: 6.11e-06 [2026-04-21 23:32:20] Epoch 2 | Step 14890 | Loss: 0.9014 | LR: 6.03e-06 [2026-04-21 23:32:25] Epoch 2 | Step 14900 | Loss: 0.9013 | LR: 5.96e-06 [2026-04-21 23:32:30] Epoch 2 | Step 14910 | Loss: 0.9014 | LR: 5.89e-06 [2026-04-21 23:32:35] Epoch 2 | Step 14920 | Loss: 0.9014 | LR: 5.81e-06 [2026-04-21 23:32:41] Epoch 2 | Step 14930 | Loss: 0.9012 | LR: 5.74e-06 [2026-04-21 23:32:46] Epoch 2 | Step 14940 | Loss: 0.9010 | LR: 5.67e-06 [2026-04-21 23:32:51] Epoch 2 | Step 14950 | Loss: 0.9012 | LR: 5.60e-06 [2026-04-21 23:32:57] Epoch 2 | Step 14960 | Loss: 0.9011 | LR: 5.53e-06 [2026-04-21 23:33:02] Epoch 2 | Step 14970 | Loss: 0.9011 | LR: 5.46e-06 [2026-04-21 23:33:08] Epoch 2 | Step 14980 | Loss: 0.9010 | LR: 5.39e-06 [2026-04-21 23:33:13] Epoch 2 | Step 14990 | Loss: 0.9009 | LR: 5.32e-06 [2026-04-21 23:33:18] Epoch 2 | Step 15000 | Loss: 0.9011 | LR: 5.25e-06 [2026-04-21 23:33:29] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_15000.pt [2026-04-21 23:34:44] Validation | Batch 10/1567 | Loss: 1.0426 [2026-04-21 23:34:45] Validation | Batch 20/1567 | Loss: 1.1266 [2026-04-21 23:34:46] Validation | Batch 30/1567 | Loss: 1.0829 [2026-04-21 23:34:48] Validation | Batch 40/1567 | Loss: 1.1028 [2026-04-21 23:34:49] Validation | Batch 50/1567 | Loss: 1.0788 [2026-04-21 23:34:50] Validation | Batch 60/1567 | Loss: 1.0674 [2026-04-21 23:34:51] Validation | Batch 70/1567 | Loss: 1.0584 [2026-04-21 23:34:53] Validation | Batch 80/1567 | Loss: 1.0667 [2026-04-21 23:34:54] Validation | Batch 90/1567 | Loss: 
1.0612 [2026-04-21 23:34:56] Validation | Batch 100/1567 | Loss: 1.0401 [2026-04-21 23:34:57] Validation | Batch 110/1567 | Loss: 1.0305 [2026-04-21 23:34:58] Validation | Batch 120/1567 | Loss: 1.0252 [2026-04-21 23:34:59] Validation | Batch 130/1567 | Loss: 1.0201 [2026-04-21 23:35:00] Validation | Batch 140/1567 | Loss: 1.0303 [2026-04-21 23:35:01] Validation | Batch 150/1567 | Loss: 1.0411 [2026-04-21 23:35:02] Validation | Batch 160/1567 | Loss: 1.0400 [2026-04-21 23:35:03] Validation | Batch 170/1567 | Loss: 1.0322 [2026-04-21 23:35:04] Validation | Batch 180/1567 | Loss: 1.0349 [2026-04-21 23:35:06] Validation | Batch 190/1567 | Loss: 1.0395 [2026-04-21 23:35:07] Validation | Batch 200/1567 | Loss: 1.0431 [2026-04-21 23:35:08] Validation | Batch 210/1567 | Loss: 1.0411 [2026-04-21 23:35:10] Validation | Batch 220/1567 | Loss: 1.0465 [2026-04-21 23:35:11] Validation | Batch 230/1567 | Loss: 1.0501 [2026-04-21 23:35:12] Validation | Batch 240/1567 | Loss: 1.0519 [2026-04-21 23:35:13] Validation | Batch 250/1567 | Loss: 1.0561 [2026-04-21 23:35:14] Validation | Batch 260/1567 | Loss: 1.0591 [2026-04-21 23:35:16] Validation | Batch 270/1567 | Loss: 1.0636 [2026-04-21 23:35:17] Validation | Batch 280/1567 | Loss: 1.0671 [2026-04-21 23:35:19] Validation | Batch 290/1567 | Loss: 1.0624 [2026-04-21 23:35:20] Validation | Batch 300/1567 | Loss: 1.0618 [2026-04-21 23:35:22] Validation | Batch 310/1567 | Loss: 1.0584 [2026-04-21 23:35:22] Validation | Batch 320/1567 | Loss: 1.0614 [2026-04-21 23:35:24] Validation | Batch 330/1567 | Loss: 1.0615 [2026-04-21 23:35:25] Validation | Batch 340/1567 | Loss: 1.0608 [2026-04-21 23:35:26] Validation | Batch 350/1567 | Loss: 1.0583 [2026-04-21 23:35:28] Validation | Batch 360/1567 | Loss: 1.0522 [2026-04-21 23:35:29] Validation | Batch 370/1567 | Loss: 1.0523 [2026-04-21 23:35:30] Validation | Batch 380/1567 | Loss: 1.0564 [2026-04-21 23:35:31] Validation | Batch 390/1567 | Loss: 1.0554 [2026-04-21 23:35:32] Validation | Batch 
400/1567 | Loss: 1.0561 [2026-04-21 23:35:34] Validation | Batch 410/1567 | Loss: 1.0522 [2026-04-21 23:35:35] Validation | Batch 420/1567 | Loss: 1.0506 [2026-04-21 23:35:36] Validation | Batch 430/1567 | Loss: 1.0532 [2026-04-21 23:35:37] Validation | Batch 440/1567 | Loss: 1.0533 [2026-04-21 23:35:38] Validation | Batch 450/1567 | Loss: 1.0553 [2026-04-21 23:35:40] Validation | Batch 460/1567 | Loss: 1.0579 [2026-04-21 23:35:41] Validation | Batch 470/1567 | Loss: 1.0628 [2026-04-21 23:35:42] Validation | Batch 480/1567 | Loss: 1.0604 [2026-04-21 23:35:43] Validation | Batch 490/1567 | Loss: 1.0582 [2026-04-21 23:35:44] Validation | Batch 500/1567 | Loss: 1.0592 [2026-04-21 23:35:45] Validation | Batch 510/1567 | Loss: 1.0590 [2026-04-21 23:35:46] Validation | Batch 520/1567 | Loss: 1.0604 [2026-04-21 23:35:47] Validation | Batch 530/1567 | Loss: 1.0588 [2026-04-21 23:35:49] Validation | Batch 540/1567 | Loss: 1.0561 [2026-04-21 23:35:50] Validation | Batch 550/1567 | Loss: 1.0573 [2026-04-21 23:35:51] Validation | Batch 560/1567 | Loss: 1.0564 [2026-04-21 23:35:53] Validation | Batch 570/1567 | Loss: 1.0524 [2026-04-21 23:35:54] Validation | Batch 580/1567 | Loss: 1.0543 [2026-04-21 23:35:55] Validation | Batch 590/1567 | Loss: 1.0541 [2026-04-21 23:35:56] Validation | Batch 600/1567 | Loss: 1.0530 [2026-04-21 23:35:58] Validation | Batch 610/1567 | Loss: 1.0550 [2026-04-21 23:35:59] Validation | Batch 620/1567 | Loss: 1.0529 [2026-04-21 23:36:01] Validation | Batch 630/1567 | Loss: 1.0531 [2026-04-21 23:36:02] Validation | Batch 640/1567 | Loss: 1.0538 [2026-04-21 23:36:03] Validation | Batch 650/1567 | Loss: 1.0566 [2026-04-21 23:36:04] Validation | Batch 660/1567 | Loss: 1.0580 [2026-04-21 23:36:05] Validation | Batch 670/1567 | Loss: 1.0562 [2026-04-21 23:36:06] Validation | Batch 680/1567 | Loss: 1.0550 [2026-04-21 23:36:08] Validation | Batch 690/1567 | Loss: 1.0535 [2026-04-21 23:36:09] Validation | Batch 700/1567 | Loss: 1.0536 [2026-04-21 23:36:10] 
Validation | Batch 710/1567 | Loss: 1.0528 [2026-04-21 23:36:11] Validation | Batch 720/1567 | Loss: 1.0497 [2026-04-21 23:36:12] Validation | Batch 730/1567 | Loss: 1.0502 [2026-04-21 23:36:13] Validation | Batch 740/1567 | Loss: 1.0508 [2026-04-21 23:36:14] Validation | Batch 750/1567 | Loss: 1.0504 [2026-04-21 23:36:15] Validation | Batch 760/1567 | Loss: 1.0517 [2026-04-21 23:36:17] Validation | Batch 770/1567 | Loss: 1.0512 [2026-04-21 23:36:18] Validation | Batch 780/1567 | Loss: 1.0523 [2026-04-21 23:36:19] Validation | Batch 790/1567 | Loss: 1.0508 [2026-04-21 23:36:20] Validation | Batch 800/1567 | Loss: 1.0490 [2026-04-21 23:36:21] Validation | Batch 810/1567 | Loss: 1.0495 [2026-04-21 23:36:23] Validation | Batch 820/1567 | Loss: 1.0488 [2026-04-21 23:36:24] Validation | Batch 830/1567 | Loss: 1.0480 [2026-04-21 23:36:25] Validation | Batch 840/1567 | Loss: 1.0487 [2026-04-21 23:36:26] Validation | Batch 850/1567 | Loss: 1.0498 [2026-04-21 23:36:26] Validation | Batch 860/1567 | Loss: 1.0506 [2026-04-21 23:36:27] Validation | Batch 870/1567 | Loss: 1.0514 [2026-04-21 23:36:29] Validation | Batch 880/1567 | Loss: 1.0512 [2026-04-21 23:36:30] Validation | Batch 890/1567 | Loss: 1.0508 [2026-04-21 23:36:32] Validation | Batch 900/1567 | Loss: 1.0505 [2026-04-21 23:36:33] Validation | Batch 910/1567 | Loss: 1.0502 [2026-04-21 23:36:34] Validation | Batch 920/1567 | Loss: 1.0521 [2026-04-21 23:36:35] Validation | Batch 930/1567 | Loss: 1.0520 [2026-04-21 23:36:36] Validation | Batch 940/1567 | Loss: 1.0519 [2026-04-21 23:36:37] Validation | Batch 950/1567 | Loss: 1.0515 [2026-04-21 23:36:38] Validation | Batch 960/1567 | Loss: 1.0517 [2026-04-21 23:36:39] Validation | Batch 970/1567 | Loss: 1.0523 [2026-04-21 23:36:40] Validation | Batch 980/1567 | Loss: 1.0519 [2026-04-21 23:36:41] Validation | Batch 990/1567 | Loss: 1.0528 [2026-04-21 23:36:42] Validation | Batch 1000/1567 | Loss: 1.0532 [2026-04-21 23:36:43] Validation | Batch 1010/1567 | Loss: 1.0524 
[2026-04-21 23:36:45] Validation | Batch 1020/1567 | Loss: 1.0536 [2026-04-21 23:36:46] Validation | Batch 1030/1567 | Loss: 1.0541 [2026-04-21 23:36:47] Validation | Batch 1040/1567 | Loss: 1.0532 [2026-04-21 23:36:48] Validation | Batch 1050/1567 | Loss: 1.0522 [2026-04-21 23:36:49] Validation | Batch 1060/1567 | Loss: 1.0534 [2026-04-21 23:36:51] Validation | Batch 1070/1567 | Loss: 1.0532 [2026-04-21 23:36:52] Validation | Batch 1080/1567 | Loss: 1.0546 [2026-04-21 23:36:53] Validation | Batch 1090/1567 | Loss: 1.0571 [2026-04-21 23:36:54] Validation | Batch 1100/1567 | Loss: 1.0587 [2026-04-21 23:36:55] Validation | Batch 1110/1567 | Loss: 1.0577 [2026-04-21 23:36:56] Validation | Batch 1120/1567 | Loss: 1.0578 [2026-04-21 23:36:58] Validation | Batch 1130/1567 | Loss: 1.0561 [2026-04-21 23:36:59] Validation | Batch 1140/1567 | Loss: 1.0565 [2026-04-21 23:37:00] Validation | Batch 1150/1567 | Loss: 1.0552 [2026-04-21 23:37:01] Validation | Batch 1160/1567 | Loss: 1.0546 [2026-04-21 23:37:02] Validation | Batch 1170/1567 | Loss: 1.0549 [2026-04-21 23:37:03] Validation | Batch 1180/1567 | Loss: 1.0551 [2026-04-21 23:37:05] Validation | Batch 1190/1567 | Loss: 1.0553 [2026-04-21 23:37:06] Validation | Batch 1200/1567 | Loss: 1.0541 [2026-04-21 23:37:07] Validation | Batch 1210/1567 | Loss: 1.0534 [2026-04-21 23:37:08] Validation | Batch 1220/1567 | Loss: 1.0543 [2026-04-21 23:37:09] Validation | Batch 1230/1567 | Loss: 1.0548 [2026-04-21 23:37:10] Validation | Batch 1240/1567 | Loss: 1.0547 [2026-04-21 23:37:12] Validation | Batch 1250/1567 | Loss: 1.0550 [2026-04-21 23:37:13] Validation | Batch 1260/1567 | Loss: 1.0547 [2026-04-21 23:37:14] Validation | Batch 1270/1567 | Loss: 1.0530 [2026-04-21 23:37:15] Validation | Batch 1280/1567 | Loss: 1.0532 [2026-04-21 23:37:17] Validation | Batch 1290/1567 | Loss: 1.0533 [2026-04-21 23:37:18] Validation | Batch 1300/1567 | Loss: 1.0537 [2026-04-21 23:37:19] Validation | Batch 1310/1567 | Loss: 1.0544 [2026-04-21 
23:37:21] Validation | Batch 1320/1567 | Loss: 1.0550 [2026-04-21 23:37:22] Validation | Batch 1330/1567 | Loss: 1.0564 [2026-04-21 23:37:23] Validation | Batch 1340/1567 | Loss: 1.0561 [2026-04-21 23:37:24] Validation | Batch 1350/1567 | Loss: 1.0564 [2026-04-21 23:37:25] Validation | Batch 1360/1567 | Loss: 1.0555 [2026-04-21 23:37:26] Validation | Batch 1370/1567 | Loss: 1.0552 [2026-04-21 23:37:27] Validation | Batch 1380/1567 | Loss: 1.0552 [2026-04-21 23:37:28] Validation | Batch 1390/1567 | Loss: 1.0544 [2026-04-21 23:37:29] Validation | Batch 1400/1567 | Loss: 1.0540 [2026-04-21 23:37:31] Validation | Batch 1410/1567 | Loss: 1.0546 [2026-04-21 23:37:32] Validation | Batch 1420/1567 | Loss: 1.0546 [2026-04-21 23:37:33] Validation | Batch 1430/1567 | Loss: 1.0549 [2026-04-21 23:37:34] Validation | Batch 1440/1567 | Loss: 1.0557 [2026-04-21 23:37:35] Validation | Batch 1450/1567 | Loss: 1.0558 [2026-04-21 23:37:36] Validation | Batch 1460/1567 | Loss: 1.0551 [2026-04-21 23:37:37] Validation | Batch 1470/1567 | Loss: 1.0550 [2026-04-21 23:37:38] Validation | Batch 1480/1567 | Loss: 1.0547 [2026-04-21 23:37:39] Validation | Batch 1490/1567 | Loss: 1.0542 [2026-04-21 23:37:40] Validation | Batch 1500/1567 | Loss: 1.0539 [2026-04-21 23:37:41] Validation | Batch 1510/1567 | Loss: 1.0530 [2026-04-21 23:37:42] Validation | Batch 1520/1567 | Loss: 1.0529 [2026-04-21 23:37:43] Validation | Batch 1530/1567 | Loss: 1.0529 [2026-04-21 23:37:44] Validation | Batch 1540/1567 | Loss: 1.0535 [2026-04-21 23:37:45] Validation | Batch 1550/1567 | Loss: 1.0548 [2026-04-21 23:37:47] Validation | Batch 1560/1567 | Loss: 1.0544 [2026-04-21 23:37:47] Validation | Batch 1567/1567 | Loss: 1.0544 [2026-04-21 23:37:47] Validation | Loss: 1.0544 | PPL: 2.92 | Time: 184.70s [2026-04-21 23:38:05] New best model saved! 
Val loss: 1.0544 [2026-04-21 23:38:10] Epoch 2 | Step 15010 | Loss: 0.9012 | LR: 5.18e-06 [2026-04-21 23:38:16] Epoch 2 | Step 15020 | Loss: 0.9011 | LR: 5.11e-06 [2026-04-21 23:38:22] Epoch 2 | Step 15030 | Loss: 0.9012 | LR: 5.05e-06 [2026-04-21 23:38:27] Epoch 2 | Step 15040 | Loss: 0.9012 | LR: 4.98e-06 [2026-04-21 23:38:32] Epoch 2 | Step 15050 | Loss: 0.9011 | LR: 4.91e-06 [2026-04-21 23:38:37] Epoch 2 | Step 15060 | Loss: 0.9014 | LR: 4.85e-06 [2026-04-21 23:38:43] Epoch 2 | Step 15070 | Loss: 0.9013 | LR: 4.78e-06 [2026-04-21 23:38:48] Epoch 2 | Step 15080 | Loss: 0.9013 | LR: 4.72e-06 [2026-04-21 23:38:54] Epoch 2 | Step 15090 | Loss: 0.9015 | LR: 4.66e-06 [2026-04-21 23:38:59] Epoch 2 | Step 15100 | Loss: 0.9015 | LR: 4.59e-06 [2026-04-21 23:39:04] Epoch 2 | Step 15110 | Loss: 0.9017 | LR: 4.53e-06 [2026-04-21 23:39:10] Epoch 2 | Step 15120 | Loss: 0.9020 | LR: 4.47e-06 [2026-04-21 23:39:15] Epoch 2 | Step 15130 | Loss: 0.9022 | LR: 4.41e-06 [2026-04-21 23:39:21] Epoch 2 | Step 15140 | Loss: 0.9025 | LR: 4.35e-06 [2026-04-21 23:39:27] Epoch 2 | Step 15150 | Loss: 0.9023 | LR: 4.29e-06 [2026-04-21 23:39:33] Epoch 2 | Step 15160 | Loss: 0.9020 | LR: 4.23e-06 [2026-04-21 23:39:38] Epoch 2 | Step 15170 | Loss: 0.9023 | LR: 4.17e-06 [2026-04-21 23:39:43] Epoch 2 | Step 15180 | Loss: 0.9020 | LR: 4.11e-06 [2026-04-21 23:39:48] Epoch 2 | Step 15190 | Loss: 0.9021 | LR: 4.06e-06 [2026-04-21 23:39:53] Epoch 2 | Step 15200 | Loss: 0.9022 | LR: 4.00e-06 [2026-04-21 23:39:58] Epoch 2 | Step 15210 | Loss: 0.9021 | LR: 3.94e-06 [2026-04-21 23:40:03] Epoch 2 | Step 15220 | Loss: 0.9018 | LR: 3.89e-06 [2026-04-21 23:40:08] Epoch 2 | Step 15230 | Loss: 0.9019 | LR: 3.84e-06 [2026-04-21 23:40:13] Epoch 2 | Step 15240 | Loss: 0.9019 | LR: 3.78e-06 [2026-04-21 23:40:19] Epoch 2 | Step 15250 | Loss: 0.9018 | LR: 3.73e-06 [2026-04-21 23:40:24] Epoch 2 | Step 15260 | Loss: 0.9017 | LR: 3.68e-06 [2026-04-21 23:40:29] Epoch 2 | Step 15270 | Loss: 0.9018 | LR: 3.63e-06 [2026-04-21 
23:40:35] Epoch 2 | Step 15280 | Loss: 0.9016 | LR: 3.58e-06 [2026-04-21 23:40:41] Epoch 2 | Step 15290 | Loss: 0.9018 | LR: 3.53e-06 [2026-04-21 23:40:46] Epoch 2 | Step 15300 | Loss: 0.9017 | LR: 3.48e-06 [2026-04-21 23:40:52] Epoch 2 | Step 15310 | Loss: 0.9019 | LR: 3.43e-06 [2026-04-21 23:40:57] Epoch 2 | Step 15320 | Loss: 0.9020 | LR: 3.38e-06 [2026-04-21 23:41:02] Epoch 2 | Step 15330 | Loss: 0.9021 | LR: 3.33e-06 [2026-04-21 23:41:07] Epoch 2 | Step 15340 | Loss: 0.9022 | LR: 3.29e-06 [2026-04-21 23:41:13] Epoch 2 | Step 15350 | Loss: 0.9019 | LR: 3.24e-06 [2026-04-21 23:41:17] Epoch 2 | Step 15360 | Loss: 0.9019 | LR: 3.20e-06 [2026-04-21 23:41:22] Epoch 2 | Step 15370 | Loss: 0.9020 | LR: 3.15e-06 [2026-04-21 23:41:27] Epoch 2 | Step 15380 | Loss: 0.9022 | LR: 3.11e-06 [2026-04-21 23:41:33] Epoch 2 | Step 15390 | Loss: 0.9023 | LR: 3.07e-06 [2026-04-21 23:41:38] Epoch 2 | Step 15400 | Loss: 0.9023 | LR: 3.03e-06 [2026-04-21 23:41:44] Epoch 2 | Step 15410 | Loss: 0.9024 | LR: 2.98e-06 [2026-04-21 23:41:49] Epoch 2 | Step 15420 | Loss: 0.9021 | LR: 2.94e-06 [2026-04-21 23:41:54] Epoch 2 | Step 15430 | Loss: 0.9021 | LR: 2.91e-06 [2026-04-21 23:41:59] Epoch 2 | Step 15440 | Loss: 0.9023 | LR: 2.87e-06 [2026-04-21 23:42:05] Epoch 2 | Step 15450 | Loss: 0.9021 | LR: 2.83e-06 [2026-04-21 23:42:10] Epoch 2 | Step 15460 | Loss: 0.9022 | LR: 2.79e-06 [2026-04-21 23:42:16] Epoch 2 | Step 15470 | Loss: 0.9023 | LR: 2.76e-06 [2026-04-21 23:42:21] Epoch 2 | Step 15480 | Loss: 0.9019 | LR: 2.72e-06 [2026-04-21 23:42:27] Epoch 2 | Step 15490 | Loss: 0.9018 | LR: 2.69e-06 [2026-04-21 23:42:32] Epoch 2 | Step 15500 | Loss: 0.9019 | LR: 2.65e-06 [2026-04-21 23:42:37] Epoch 2 | Step 15510 | Loss: 0.9018 | LR: 2.62e-06 [2026-04-21 23:42:43] Epoch 2 | Step 15520 | Loss: 0.9018 | LR: 2.59e-06 [2026-04-21 23:42:48] Epoch 2 | Step 15530 | Loss: 0.9016 | LR: 2.56e-06 [2026-04-21 23:42:54] Epoch 2 | Step 15540 | Loss: 0.9016 | LR: 2.53e-06 [2026-04-21 23:42:59] Epoch 2 | Step 
15550 | Loss: 0.9015 | LR: 2.50e-06 [2026-04-21 23:43:05] Epoch 2 | Step 15560 | Loss: 0.9014 | LR: 2.47e-06 [2026-04-21 23:43:10] Epoch 2 | Step 15570 | Loss: 0.9012 | LR: 2.44e-06 [2026-04-21 23:43:15] Epoch 2 | Step 15580 | Loss: 0.9016 | LR: 2.41e-06 [2026-04-21 23:43:21] Epoch 2 | Step 15590 | Loss: 0.9013 | LR: 2.39e-06 [2026-04-21 23:43:26] Epoch 2 | Step 15600 | Loss: 0.9015 | LR: 2.36e-06 [2026-04-21 23:43:31] Epoch 2 | Step 15610 | Loss: 0.9012 | LR: 2.34e-06 [2026-04-21 23:43:36] Epoch 2 | Step 15620 | Loss: 0.9011 | LR: 2.31e-06 [2026-04-21 23:43:42] Epoch 2 | Step 15630 | Loss: 0.9009 | LR: 2.29e-06 [2026-04-21 23:43:48] Epoch 2 | Step 15640 | Loss: 0.9010 | LR: 2.27e-06 [2026-04-21 23:43:52] Epoch 2 | Step 15650 | Loss: 0.9011 | LR: 2.25e-06 [2026-04-21 23:43:58] Epoch 2 | Step 15660 | Loss: 0.9011 | LR: 2.23e-06 [2026-04-21 23:44:03] Epoch 2 | Step 15670 | Loss: 0.9008 | LR: 2.21e-06 [2026-04-21 23:44:08] Epoch 2 | Step 15680 | Loss: 0.9006 | LR: 2.19e-06 [2026-04-21 23:44:13] Epoch 2 | Step 15690 | Loss: 0.9007 | LR: 2.17e-06 [2026-04-21 23:44:19] Epoch 2 | Step 15700 | Loss: 0.9006 | LR: 2.15e-06 [2026-04-21 23:44:24] Epoch 2 | Step 15710 | Loss: 0.9006 | LR: 2.14e-06 [2026-04-21 23:44:30] Epoch 2 | Step 15720 | Loss: 0.9006 | LR: 2.12e-06 [2026-04-21 23:44:36] Epoch 2 | Step 15730 | Loss: 0.9008 | LR: 2.11e-06 [2026-04-21 23:44:41] Epoch 2 | Step 15740 | Loss: 0.9012 | LR: 2.10e-06 [2026-04-21 23:44:46] Epoch 2 | Step 15750 | Loss: 0.9011 | LR: 2.08e-06 [2026-04-21 23:44:51] Epoch 2 | Step 15760 | Loss: 0.9009 | LR: 2.07e-06 [2026-04-21 23:44:56] Epoch 2 | Step 15770 | Loss: 0.9008 | LR: 2.06e-06 [2026-04-21 23:45:02] Epoch 2 | Step 15780 | Loss: 0.9008 | LR: 2.05e-06 [2026-04-21 23:45:07] Epoch 2 | Step 15790 | Loss: 0.9009 | LR: 2.04e-06 [2026-04-21 23:45:12] Epoch 2 | Step 15800 | Loss: 0.9008 | LR: 2.03e-06 [2026-04-21 23:45:18] Epoch 2 | Step 15810 | Loss: 0.9008 | LR: 2.03e-06 [2026-04-21 23:45:23] Epoch 2 | Step 15820 | Loss: 0.9006 | LR: 
2.02e-06 [2026-04-21 23:45:28] Epoch 2 | Step 15830 | Loss: 0.9005 | LR: 2.01e-06 [2026-04-21 23:45:33] Epoch 2 | Step 15840 | Loss: 0.9006 | LR: 2.01e-06 [2026-04-21 23:45:38] Epoch 2 | Step 15850 | Loss: 0.9005 | LR: 2.01e-06 [2026-04-21 23:45:44] Epoch 2 | Step 15860 | Loss: 0.9006 | LR: 2.00e-06 [2026-04-21 23:45:49] Epoch 2 | Step 15870 | Loss: 0.9005 | LR: 2.00e-06 [2026-04-21 23:45:54] Epoch 2 | Step 15880 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:46:00] Epoch 2 | Step 15890 | Loss: 0.9004 | LR: 2.00e-06 [2026-04-21 23:46:05] Epoch 2 | Step 15900 | Loss: 0.9005 | LR: 2.00e-06 [2026-04-21 23:46:10] Epoch 2 | Step 15910 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:46:16] Epoch 2 | Step 15920 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:46:22] Epoch 2 | Step 15930 | Loss: 0.9004 | LR: 2.00e-06 [2026-04-21 23:46:28] Epoch 2 | Step 15940 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:46:32] Epoch 2 | Step 15950 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:46:38] Epoch 2 | Step 15960 | Loss: 0.9001 | LR: 2.00e-06 [2026-04-21 23:46:43] Epoch 2 | Step 15970 | Loss: 0.9001 | LR: 2.00e-06 [2026-04-21 23:46:50] Epoch 2 | Step 15980 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:46:54] Epoch 2 | Step 15990 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:47:00] Epoch 2 | Step 16000 | Loss: 0.9002 | LR: 2.00e-06 [2026-04-21 23:47:01] Validation | Batch 10/1567 | Loss: 1.0403 [2026-04-21 23:47:02] Validation | Batch 20/1567 | Loss: 1.1253 [2026-04-21 23:47:03] Validation | Batch 30/1567 | Loss: 1.0820 [2026-04-21 23:47:05] Validation | Batch 40/1567 | Loss: 1.1027 [2026-04-21 23:47:06] Validation | Batch 50/1567 | Loss: 1.0787 [2026-04-21 23:47:07] Validation | Batch 60/1567 | Loss: 1.0670 [2026-04-21 23:47:08] Validation | Batch 70/1567 | Loss: 1.0582 [2026-04-21 23:47:10] Validation | Batch 80/1567 | Loss: 1.0663 [2026-04-21 23:47:11] Validation | Batch 90/1567 | Loss: 1.0608 [2026-04-21 23:47:12] Validation | Batch 100/1567 | Loss: 1.0398 [2026-04-21 23:47:13] Validation | 
Batch 110/1567 | Loss: 1.0303 [2026-04-21 23:47:15] Validation | Batch 120/1567 | Loss: 1.0250 [2026-04-21 23:47:16] Validation | Batch 130/1567 | Loss: 1.0197 [2026-04-21 23:47:17] Validation | Batch 140/1567 | Loss: 1.0299 [2026-04-21 23:47:18] Validation | Batch 150/1567 | Loss: 1.0408 [2026-04-21 23:47:19] Validation | Batch 160/1567 | Loss: 1.0397 [2026-04-21 23:47:20] Validation | Batch 170/1567 | Loss: 1.0319 [2026-04-21 23:47:21] Validation | Batch 180/1567 | Loss: 1.0348 [2026-04-21 23:47:22] Validation | Batch 190/1567 | Loss: 1.0393 [2026-04-21 23:47:24] Validation | Batch 200/1567 | Loss: 1.0428 [2026-04-21 23:47:25] Validation | Batch 210/1567 | Loss: 1.0408 [2026-04-21 23:47:26] Validation | Batch 220/1567 | Loss: 1.0461 [2026-04-21 23:47:28] Validation | Batch 230/1567 | Loss: 1.0498 [2026-04-21 23:47:29] Validation | Batch 240/1567 | Loss: 1.0517 [2026-04-21 23:47:30] Validation | Batch 250/1567 | Loss: 1.0559 [2026-04-21 23:47:31] Validation | Batch 260/1567 | Loss: 1.0589 [2026-04-21 23:47:32] Validation | Batch 270/1567 | Loss: 1.0634 [2026-04-21 23:47:34] Validation | Batch 280/1567 | Loss: 1.0668 [2026-04-21 23:47:36] Validation | Batch 290/1567 | Loss: 1.0621 [2026-04-21 23:47:37] Validation | Batch 300/1567 | Loss: 1.0615 [2026-04-21 23:47:38] Validation | Batch 310/1567 | Loss: 1.0581 [2026-04-21 23:47:39] Validation | Batch 320/1567 | Loss: 1.0610 [2026-04-21 23:47:41] Validation | Batch 330/1567 | Loss: 1.0611 [2026-04-21 23:47:42] Validation | Batch 340/1567 | Loss: 1.0603 [2026-04-21 23:47:43] Validation | Batch 350/1567 | Loss: 1.0578 [2026-04-21 23:47:44] Validation | Batch 360/1567 | Loss: 1.0517 [2026-04-21 23:47:46] Validation | Batch 370/1567 | Loss: 1.0517 [2026-04-21 23:47:47] Validation | Batch 380/1567 | Loss: 1.0558 [2026-04-21 23:47:48] Validation | Batch 390/1567 | Loss: 1.0549 [2026-04-21 23:47:49] Validation | Batch 400/1567 | Loss: 1.0556 [2026-04-21 23:47:51] Validation | Batch 410/1567 | Loss: 1.0517 [2026-04-21 
23:47:52] Validation | Batch 420/1567 | Loss: 1.0501 [2026-04-21 23:47:53] Validation | Batch 430/1567 | Loss: 1.0527 [2026-04-21 23:47:54] Validation | Batch 440/1567 | Loss: 1.0528 [2026-04-21 23:47:55] Validation | Batch 450/1567 | Loss: 1.0548 [2026-04-21 23:47:57] Validation | Batch 460/1567 | Loss: 1.0574 [2026-04-21 23:47:57] Validation | Batch 470/1567 | Loss: 1.0623 [2026-04-21 23:47:59] Validation | Batch 480/1567 | Loss: 1.0599 [2026-04-21 23:48:00] Validation | Batch 490/1567 | Loss: 1.0576 [2026-04-21 23:48:01] Validation | Batch 500/1567 | Loss: 1.0587 [2026-04-21 23:48:02] Validation | Batch 510/1567 | Loss: 1.0585 [2026-04-21 23:48:03] Validation | Batch 520/1567 | Loss: 1.0599 [2026-04-21 23:48:04] Validation | Batch 530/1567 | Loss: 1.0584 [2026-04-21 23:48:06] Validation | Batch 540/1567 | Loss: 1.0557 [2026-04-21 23:48:07] Validation | Batch 550/1567 | Loss: 1.0568 [2026-04-21 23:48:08] Validation | Batch 560/1567 | Loss: 1.0560 [2026-04-21 23:48:10] Validation | Batch 570/1567 | Loss: 1.0518 [2026-04-21 23:48:11] Validation | Batch 580/1567 | Loss: 1.0537 [2026-04-21 23:48:12] Validation | Batch 590/1567 | Loss: 1.0535 [2026-04-21 23:48:13] Validation | Batch 600/1567 | Loss: 1.0524 [2026-04-21 23:48:15] Validation | Batch 610/1567 | Loss: 1.0544 [2026-04-21 23:48:16] Validation | Batch 620/1567 | Loss: 1.0523 [2026-04-21 23:48:17] Validation | Batch 630/1567 | Loss: 1.0526 [2026-04-21 23:48:19] Validation | Batch 640/1567 | Loss: 1.0532 [2026-04-21 23:48:20] Validation | Batch 650/1567 | Loss: 1.0561 [2026-04-21 23:48:21] Validation | Batch 660/1567 | Loss: 1.0574 [2026-04-21 23:48:22] Validation | Batch 670/1567 | Loss: 1.0556 [2026-04-21 23:48:23] Validation | Batch 680/1567 | Loss: 1.0544 [2026-04-21 23:48:24] Validation | Batch 690/1567 | Loss: 1.0530 [2026-04-21 23:48:26] Validation | Batch 700/1567 | Loss: 1.0530 [2026-04-21 23:48:27] Validation | Batch 710/1567 | Loss: 1.0522 [2026-04-21 23:48:28] Validation | Batch 720/1567 | Loss: 
1.0491 [2026-04-21 23:48:29] Validation | Batch 730/1567 | Loss: 1.0496 [2026-04-21 23:48:30] Validation | Batch 740/1567 | Loss: 1.0502 [2026-04-21 23:48:31] Validation | Batch 750/1567 | Loss: 1.0499 [2026-04-21 23:48:32] Validation | Batch 760/1567 | Loss: 1.0512 [2026-04-21 23:48:34] Validation | Batch 770/1567 | Loss: 1.0507 [2026-04-21 23:48:35] Validation | Batch 780/1567 | Loss: 1.0517 [2026-04-21 23:48:36] Validation | Batch 790/1567 | Loss: 1.0502 [2026-04-21 23:48:37] Validation | Batch 800/1567 | Loss: 1.0485 [2026-04-21 23:48:38] Validation | Batch 810/1567 | Loss: 1.0491 [2026-04-21 23:48:39] Validation | Batch 820/1567 | Loss: 1.0483 [2026-04-21 23:48:41] Validation | Batch 830/1567 | Loss: 1.0475 [2026-04-21 23:48:41] Validation | Batch 840/1567 | Loss: 1.0482 [2026-04-21 23:48:43] Validation | Batch 850/1567 | Loss: 1.0493 [2026-04-21 23:48:43] Validation | Batch 860/1567 | Loss: 1.0500 [2026-04-21 23:48:44] Validation | Batch 870/1567 | Loss: 1.0508 [2026-04-21 23:48:46] Validation | Batch 880/1567 | Loss: 1.0507 [2026-04-21 23:48:47] Validation | Batch 890/1567 | Loss: 1.0503 [2026-04-21 23:48:48] Validation | Batch 900/1567 | Loss: 1.0499 [2026-04-21 23:48:50] Validation | Batch 910/1567 | Loss: 1.0497 [2026-04-21 23:48:51] Validation | Batch 920/1567 | Loss: 1.0515 [2026-04-21 23:48:52] Validation | Batch 930/1567 | Loss: 1.0514 [2026-04-21 23:48:53] Validation | Batch 940/1567 | Loss: 1.0514 [2026-04-21 23:48:54] Validation | Batch 950/1567 | Loss: 1.0509 [2026-04-21 23:48:55] Validation | Batch 960/1567 | Loss: 1.0512 [2026-04-21 23:48:56] Validation | Batch 970/1567 | Loss: 1.0518 [2026-04-21 23:48:57] Validation | Batch 980/1567 | Loss: 1.0514 [2026-04-21 23:48:58] Validation | Batch 990/1567 | Loss: 1.0524 [2026-04-21 23:48:59] Validation | Batch 1000/1567 | Loss: 1.0527 [2026-04-21 23:49:00] Validation | Batch 1010/1567 | Loss: 1.0519 [2026-04-21 23:49:01] Validation | Batch 1020/1567 | Loss: 1.0531 [2026-04-21 23:49:02] Validation | 
Batch 1030/1567 | Loss: 1.0535 [2026-04-21 23:49:04] Validation | Batch 1040/1567 | Loss: 1.0527 [2026-04-21 23:49:05] Validation | Batch 1050/1567 | Loss: 1.0517 [2026-04-21 23:49:06] Validation | Batch 1060/1567 | Loss: 1.0529 [2026-04-21 23:49:08] Validation | Batch 1070/1567 | Loss: 1.0527 [2026-04-21 23:49:09] Validation | Batch 1080/1567 | Loss: 1.0540 [2026-04-21 23:49:10] Validation | Batch 1090/1567 | Loss: 1.0566 [2026-04-21 23:49:11] Validation | Batch 1100/1567 | Loss: 1.0582 [2026-04-21 23:49:12] Validation | Batch 1110/1567 | Loss: 1.0572 [2026-04-21 23:49:13] Validation | Batch 1120/1567 | Loss: 1.0573 [2026-04-21 23:49:15] Validation | Batch 1130/1567 | Loss: 1.0556 [2026-04-21 23:49:16] Validation | Batch 1140/1567 | Loss: 1.0560 [2026-04-21 23:49:17] Validation | Batch 1150/1567 | Loss: 1.0547 [2026-04-21 23:49:18] Validation | Batch 1160/1567 | Loss: 1.0541 [2026-04-21 23:49:19] Validation | Batch 1170/1567 | Loss: 1.0543 [2026-04-21 23:49:20] Validation | Batch 1180/1567 | Loss: 1.0546 [2026-04-21 23:49:22] Validation | Batch 1190/1567 | Loss: 1.0548 [2026-04-21 23:49:23] Validation | Batch 1200/1567 | Loss: 1.0536 [2026-04-21 23:49:24] Validation | Batch 1210/1567 | Loss: 1.0529 [2026-04-21 23:49:25] Validation | Batch 1220/1567 | Loss: 1.0538 [2026-04-21 23:49:26] Validation | Batch 1230/1567 | Loss: 1.0543 [2026-04-21 23:49:27] Validation | Batch 1240/1567 | Loss: 1.0541 [2026-04-21 23:49:28] Validation | Batch 1250/1567 | Loss: 1.0544 [2026-04-21 23:49:30] Validation | Batch 1260/1567 | Loss: 1.0542 [2026-04-21 23:49:31] Validation | Batch 1270/1567 | Loss: 1.0525 [2026-04-21 23:49:32] Validation | Batch 1280/1567 | Loss: 1.0526 [2026-04-21 23:49:34] Validation | Batch 1290/1567 | Loss: 1.0528 [2026-04-21 23:49:35] Validation | Batch 1300/1567 | Loss: 1.0531 [2026-04-21 23:49:36] Validation | Batch 1310/1567 | Loss: 1.0539 [2026-04-21 23:49:38] Validation | Batch 1320/1567 | Loss: 1.0544 [2026-04-21 23:49:39] Validation | Batch 1330/1567 | 
Loss: 1.0559 [2026-04-21 23:49:40] Validation | Batch 1340/1567 | Loss: 1.0556 [2026-04-21 23:49:41] Validation | Batch 1350/1567 | Loss: 1.0559 [2026-04-21 23:49:42] Validation | Batch 1360/1567 | Loss: 1.0550 [2026-04-21 23:49:43] Validation | Batch 1370/1567 | Loss: 1.0546 [2026-04-21 23:49:44] Validation | Batch 1380/1567 | Loss: 1.0547 [2026-04-21 23:49:45] Validation | Batch 1390/1567 | Loss: 1.0539 [2026-04-21 23:49:46] Validation | Batch 1400/1567 | Loss: 1.0535 [2026-04-21 23:49:47] Validation | Batch 1410/1567 | Loss: 1.0541 [2026-04-21 23:49:49] Validation | Batch 1420/1567 | Loss: 1.0541 [2026-04-21 23:49:50] Validation | Batch 1430/1567 | Loss: 1.0544 [2026-04-21 23:49:51] Validation | Batch 1440/1567 | Loss: 1.0552 [2026-04-21 23:49:52] Validation | Batch 1450/1567 | Loss: 1.0553 [2026-04-21 23:49:53] Validation | Batch 1460/1567 | Loss: 1.0546 [2026-04-21 23:49:54] Validation | Batch 1470/1567 | Loss: 1.0545 [2026-04-21 23:49:55] Validation | Batch 1480/1567 | Loss: 1.0542 [2026-04-21 23:49:56] Validation | Batch 1490/1567 | Loss: 1.0537 [2026-04-21 23:49:57] Validation | Batch 1500/1567 | Loss: 1.0534 [2026-04-21 23:49:58] Validation | Batch 1510/1567 | Loss: 1.0525 [2026-04-21 23:49:59] Validation | Batch 1520/1567 | Loss: 1.0524 [2026-04-21 23:50:00] Validation | Batch 1530/1567 | Loss: 1.0524 [2026-04-21 23:50:01] Validation | Batch 1540/1567 | Loss: 1.0530 [2026-04-21 23:50:02] Validation | Batch 1550/1567 | Loss: 1.0543 [2026-04-21 23:50:03] Validation | Batch 1560/1567 | Loss: 1.0539 [2026-04-21 23:50:04] Validation | Batch 1567/1567 | Loss: 1.0540 [2026-04-21 23:50:04] Validation | Loss: 1.0540 | PPL: 2.92 | Time: 184.73s [2026-04-21 23:50:22] New best model saved! 
Val loss: 1.0540 [2026-04-21 23:50:27] Epoch 2 | Step 16010 | Loss: 0.9005 | LR: 2.00e-06 [2026-04-21 23:50:33] Epoch 2 | Step 16020 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:50:38] Epoch 2 | Step 16030 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:50:43] Epoch 2 | Step 16040 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:50:48] Epoch 2 | Step 16050 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:50:54] Epoch 2 | Step 16060 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:50:59] Epoch 2 | Step 16070 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:51:04] Epoch 2 | Step 16080 | Loss: 0.9004 | LR: 2.00e-06 [2026-04-21 23:51:10] Epoch 2 | Step 16090 | Loss: 0.9002 | LR: 2.00e-06 [2026-04-21 23:51:15] Epoch 2 | Step 16100 | Loss: 0.9002 | LR: 2.00e-06 [2026-04-21 23:51:20] Epoch 2 | Step 16110 | Loss: 0.9001 | LR: 2.00e-06 [2026-04-21 23:51:27] Epoch 2 | Step 16120 | Loss: 0.9001 | LR: 2.00e-06 [2026-04-21 23:51:32] Epoch 2 | Step 16130 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:51:38] Epoch 2 | Step 16140 | Loss: 0.8999 | LR: 2.00e-06 [2026-04-21 23:51:43] Epoch 2 | Step 16150 | Loss: 0.8999 | LR: 2.00e-06 [2026-04-21 23:51:49] Epoch 2 | Step 16160 | Loss: 0.8999 | LR: 2.00e-06 [2026-04-21 23:51:54] Epoch 2 | Step 16170 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:51:59] Epoch 2 | Step 16180 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:52:04] Epoch 2 | Step 16190 | Loss: 0.8999 | LR: 2.00e-06 [2026-04-21 23:52:08] Epoch 2 | Step 16200 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:52:13] Epoch 2 | Step 16210 | Loss: 0.8996 | LR: 2.00e-06 [2026-04-21 23:52:18] Epoch 2 | Step 16220 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:52:24] Epoch 2 | Step 16230 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:52:29] Epoch 2 | Step 16240 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:52:36] Epoch 2 | Step 16250 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:52:42] Epoch 2 | Step 16260 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:52:47] Epoch 2 | Step 16270 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 
23:52:52] Epoch 2 | Step 16280 | Loss: 0.8993 | LR: 2.00e-06 [2026-04-21 23:52:57] Epoch 2 | Step 16290 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-21 23:53:02] Epoch 2 | Step 16300 | Loss: 0.8994 | LR: 2.00e-06 [2026-04-21 23:53:08] Epoch 2 | Step 16310 | Loss: 0.8996 | LR: 2.00e-06 [2026-04-21 23:53:13] Epoch 2 | Step 16320 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:53:19] Epoch 2 | Step 16330 | Loss: 0.8996 | LR: 2.00e-06 [2026-04-21 23:53:24] Epoch 2 | Step 16340 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:53:29] Epoch 2 | Step 16350 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:53:34] Epoch 2 | Step 16360 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:53:40] Epoch 2 | Step 16370 | Loss: 0.8996 | LR: 2.00e-06 [2026-04-21 23:53:45] Epoch 2 | Step 16380 | Loss: 0.8996 | LR: 2.00e-06 [2026-04-21 23:53:50] Epoch 2 | Step 16390 | Loss: 0.8996 | LR: 2.00e-06 [2026-04-21 23:53:55] Epoch 2 | Step 16400 | Loss: 0.8999 | LR: 2.00e-06 [2026-04-21 23:54:00] Epoch 2 | Step 16410 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:54:05] Epoch 2 | Step 16420 | Loss: 0.8999 | LR: 2.00e-06 [2026-04-21 23:54:10] Epoch 2 | Step 16430 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:54:15] Epoch 2 | Step 16440 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:54:21] Epoch 2 | Step 16450 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:54:26] Epoch 2 | Step 16460 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:54:31] Epoch 2 | Step 16470 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:54:37] Epoch 2 | Step 16480 | Loss: 0.8996 | LR: 2.00e-06 [2026-04-21 23:54:43] Epoch 2 | Step 16490 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:54:48] Epoch 2 | Step 16500 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:54:54] Epoch 2 | Step 16510 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:54:59] Epoch 2 | Step 16520 | Loss: 0.8996 | LR: 2.00e-06 [2026-04-21 23:55:05] Epoch 2 | Step 16530 | Loss: 0.8996 | LR: 2.00e-06 [2026-04-21 23:55:09] Epoch 2 | Step 16540 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:55:14] Epoch 2 | Step 
16550 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:55:20] Epoch 2 | Step 16560 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:55:26] Epoch 2 | Step 16570 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:55:31] Epoch 2 | Step 16580 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:55:36] Epoch 2 | Step 16590 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:55:42] Epoch 2 | Step 16600 | Loss: 0.9002 | LR: 2.00e-06 [2026-04-21 23:55:46] Epoch 2 | Step 16610 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:55:52] Epoch 2 | Step 16620 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:55:57] Epoch 2 | Step 16630 | Loss: 0.8999 | LR: 2.00e-06 [2026-04-21 23:56:03] Epoch 2 | Step 16640 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:56:09] Epoch 2 | Step 16650 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:56:13] Epoch 2 | Step 16660 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:56:19] Epoch 2 | Step 16670 | Loss: 0.8999 | LR: 2.00e-06 [2026-04-21 23:56:24] Epoch 2 | Step 16680 | Loss: 0.9001 | LR: 2.00e-06 [2026-04-21 23:56:30] Epoch 2 | Step 16690 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:56:34] Epoch 2 | Step 16700 | Loss: 0.9001 | LR: 2.00e-06 [2026-04-21 23:56:39] Epoch 2 | Step 16710 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:56:44] Epoch 2 | Step 16720 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:56:49] Epoch 2 | Step 16730 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:56:54] Epoch 2 | Step 16740 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:56:59] Epoch 2 | Step 16750 | Loss: 0.9002 | LR: 2.00e-06 [2026-04-21 23:57:05] Epoch 2 | Step 16760 | Loss: 0.9002 | LR: 2.00e-06 [2026-04-21 23:57:11] Epoch 2 | Step 16770 | Loss: 0.9002 | LR: 2.00e-06 [2026-04-21 23:57:17] Epoch 2 | Step 16780 | Loss: 0.9001 | LR: 2.00e-06 [2026-04-21 23:57:22] Epoch 2 | Step 16790 | Loss: 0.9002 | LR: 2.00e-06 [2026-04-21 23:57:27] Epoch 2 | Step 16800 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:57:32] Epoch 2 | Step 16810 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:57:37] Epoch 2 | Step 16820 | Loss: 0.9000 | LR: 
2.00e-06 [2026-04-21 23:57:42] Epoch 2 | Step 16830 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:57:47] Epoch 2 | Step 16840 | Loss: 0.9000 | LR: 2.00e-06 [2026-04-21 23:57:52] Epoch 2 | Step 16850 | Loss: 0.9002 | LR: 2.00e-06 [2026-04-21 23:57:58] Epoch 2 | Step 16860 | Loss: 0.9003 | LR: 2.00e-06 [2026-04-21 23:58:04] Epoch 2 | Step 16870 | Loss: 0.9001 | LR: 2.00e-06 [2026-04-21 23:58:10] Epoch 2 | Step 16880 | Loss: 0.8999 | LR: 2.00e-06 [2026-04-21 23:58:15] Epoch 2 | Step 16890 | Loss: 0.8998 | LR: 2.00e-06 [2026-04-21 23:58:20] Epoch 2 | Step 16900 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:58:25] Epoch 2 | Step 16910 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-21 23:58:31] Epoch 2 | Step 16920 | Loss: 0.8995 | LR: 2.00e-06 [2026-04-21 23:58:37] Epoch 2 | Step 16930 | Loss: 0.8995 | LR: 2.00e-06 [2026-04-21 23:58:43] Epoch 2 | Step 16940 | Loss: 0.8995 | LR: 2.00e-06 [2026-04-21 23:58:50] Epoch 2 | Step 16950 | Loss: 0.8996 | LR: 2.00e-06 [2026-04-21 23:58:55] Epoch 2 | Step 16960 | Loss: 0.8994 | LR: 2.00e-06 [2026-04-21 23:59:00] Epoch 2 | Step 16970 | Loss: 0.8994 | LR: 2.00e-06 [2026-04-21 23:59:06] Epoch 2 | Step 16980 | Loss: 0.8994 | LR: 2.00e-06 [2026-04-21 23:59:11] Epoch 2 | Step 16990 | Loss: 0.8993 | LR: 2.00e-06 [2026-04-21 23:59:16] Epoch 2 | Step 17000 | Loss: 0.8995 | LR: 2.00e-06 [2026-04-21 23:59:17] Validation | Batch 10/1567 | Loss: 1.0396 [2026-04-21 23:59:19] Validation | Batch 20/1567 | Loss: 1.1242 [2026-04-21 23:59:20] Validation | Batch 30/1567 | Loss: 1.0810 [2026-04-21 23:59:21] Validation | Batch 40/1567 | Loss: 1.1016 [2026-04-21 23:59:22] Validation | Batch 50/1567 | Loss: 1.0779 [2026-04-21 23:59:23] Validation | Batch 60/1567 | Loss: 1.0663 [2026-04-21 23:59:25] Validation | Batch 70/1567 | Loss: 1.0575 [2026-04-21 23:59:27] Validation | Batch 80/1567 | Loss: 1.0657 [2026-04-21 23:59:28] Validation | Batch 90/1567 | Loss: 1.0603 [2026-04-21 23:59:29] Validation | Batch 100/1567 | Loss: 1.0393 [2026-04-21 23:59:30] Validation | 
Batch 110/1567 | Loss: 1.0297 [2026-04-21 23:59:31] Validation | Batch 120/1567 | Loss: 1.0245 [2026-04-21 23:59:33] Validation | Batch 130/1567 | Loss: 1.0192 [2026-04-21 23:59:34] Validation | Batch 140/1567 | Loss: 1.0295 [2026-04-21 23:59:35] Validation | Batch 150/1567 | Loss: 1.0403 [2026-04-21 23:59:36] Validation | Batch 160/1567 | Loss: 1.0391 [2026-04-21 23:59:37] Validation | Batch 170/1567 | Loss: 1.0313 [2026-04-21 23:59:38] Validation | Batch 180/1567 | Loss: 1.0341 [2026-04-21 23:59:39] Validation | Batch 190/1567 | Loss: 1.0387 [2026-04-21 23:59:40] Validation | Batch 200/1567 | Loss: 1.0421 [2026-04-21 23:59:42] Validation | Batch 210/1567 | Loss: 1.0401 [2026-04-21 23:59:43] Validation | Batch 220/1567 | Loss: 1.0454 [2026-04-21 23:59:44] Validation | Batch 230/1567 | Loss: 1.0491 [2026-04-21 23:59:46] Validation | Batch 240/1567 | Loss: 1.0510 [2026-04-21 23:59:47] Validation | Batch 250/1567 | Loss: 1.0552 [2026-04-21 23:59:48] Validation | Batch 260/1567 | Loss: 1.0581 [2026-04-21 23:59:49] Validation | Batch 270/1567 | Loss: 1.0626 [2026-04-21 23:59:51] Validation | Batch 280/1567 | Loss: 1.0661 [2026-04-21 23:59:52] Validation | Batch 290/1567 | Loss: 1.0614 [2026-04-21 23:59:54] Validation | Batch 300/1567 | Loss: 1.0607 [2026-04-21 23:59:55] Validation | Batch 310/1567 | Loss: 1.0574 [2026-04-21 23:59:56] Validation | Batch 320/1567 | Loss: 1.0603 [2026-04-21 23:59:57] Validation | Batch 330/1567 | Loss: 1.0605 [2026-04-21 23:59:59] Validation | Batch 340/1567 | Loss: 1.0597 [2026-04-22 00:00:00] Validation | Batch 350/1567 | Loss: 1.0572 [2026-04-22 00:00:01] Validation | Batch 360/1567 | Loss: 1.0511 [2026-04-22 00:00:02] Validation | Batch 370/1567 | Loss: 1.0511 [2026-04-22 00:00:03] Validation | Batch 380/1567 | Loss: 1.0553 [2026-04-22 00:00:05] Validation | Batch 390/1567 | Loss: 1.0544 [2026-04-22 00:00:06] Validation | Batch 400/1567 | Loss: 1.0551 [2026-04-22 00:00:07] Validation | Batch 410/1567 | Loss: 1.0512 [2026-04-22 
00:00:08] Validation | Batch 420/1567 | Loss: 1.0496 [2026-04-22 00:00:09] Validation | Batch 430/1567 | Loss: 1.0521 [2026-04-22 00:00:11] Validation | Batch 440/1567 | Loss: 1.0523 [2026-04-22 00:00:12] Validation | Batch 450/1567 | Loss: 1.0543 [2026-04-22 00:00:13] Validation | Batch 460/1567 | Loss: 1.0569 [2026-04-22 00:00:14] Validation | Batch 470/1567 | Loss: 1.0618 [2026-04-22 00:00:15] Validation | Batch 480/1567 | Loss: 1.0594 [2026-04-22 00:00:16] Validation | Batch 490/1567 | Loss: 1.0571 [2026-04-22 00:00:17] Validation | Batch 500/1567 | Loss: 1.0582 [2026-04-22 00:00:19] Validation | Batch 510/1567 | Loss: 1.0581 [2026-04-22 00:00:19] Validation | Batch 520/1567 | Loss: 1.0594 [2026-04-22 00:00:21] Validation | Batch 530/1567 | Loss: 1.0579 [2026-04-22 00:00:22] Validation | Batch 540/1567 | Loss: 1.0552 [2026-04-22 00:00:24] Validation | Batch 550/1567 | Loss: 1.0564 [2026-04-22 00:00:25] Validation | Batch 560/1567 | Loss: 1.0556 [2026-04-22 00:00:26] Validation | Batch 570/1567 | Loss: 1.0514 [2026-04-22 00:00:27] Validation | Batch 580/1567 | Loss: 1.0533 [2026-04-22 00:00:29] Validation | Batch 590/1567 | Loss: 1.0531 [2026-04-22 00:00:30] Validation | Batch 600/1567 | Loss: 1.0520 [2026-04-22 00:00:31] Validation | Batch 610/1567 | Loss: 1.0540 [2026-04-22 00:00:32] Validation | Batch 620/1567 | Loss: 1.0519 [2026-04-22 00:00:34] Validation | Batch 630/1567 | Loss: 1.0522 [2026-04-22 00:00:35] Validation | Batch 640/1567 | Loss: 1.0528 [2026-04-22 00:00:37] Validation | Batch 650/1567 | Loss: 1.0556 [2026-04-22 00:00:38] Validation | Batch 660/1567 | Loss: 1.0570 [2026-04-22 00:00:39] Validation | Batch 670/1567 | Loss: 1.0552 [2026-04-22 00:00:40] Validation | Batch 680/1567 | Loss: 1.0540 [2026-04-22 00:00:41] Validation | Batch 690/1567 | Loss: 1.0525 [2026-04-22 00:00:42] Validation | Batch 700/1567 | Loss: 1.0526 [2026-04-22 00:00:44] Validation | Batch 710/1567 | Loss: 1.0518 [2026-04-22 00:00:45] Validation | Batch 720/1567 | Loss: 
1.0487 [2026-04-22 00:00:46] Validation | Batch 730/1567 | Loss: 1.0492 [2026-04-22 00:00:46] Validation | Batch 740/1567 | Loss: 1.0498 [2026-04-22 00:00:48] Validation | Batch 750/1567 | Loss: 1.0494 [2026-04-22 00:00:49] Validation | Batch 760/1567 | Loss: 1.0507 [2026-04-22 00:00:50] Validation | Batch 770/1567 | Loss: 1.0502 [2026-04-22 00:00:52] Validation | Batch 780/1567 | Loss: 1.0513 [2026-04-22 00:00:53] Validation | Batch 790/1567 | Loss: 1.0498 [2026-04-22 00:00:54] Validation | Batch 800/1567 | Loss: 1.0480 [2026-04-22 00:00:55] Validation | Batch 810/1567 | Loss: 1.0486 [2026-04-22 00:00:56] Validation | Batch 820/1567 | Loss: 1.0478 [2026-04-22 00:00:57] Validation | Batch 830/1567 | Loss: 1.0471 [2026-04-22 00:00:58] Validation | Batch 840/1567 | Loss: 1.0477 [2026-04-22 00:00:59] Validation | Batch 850/1567 | Loss: 1.0488 [2026-04-22 00:01:00] Validation | Batch 860/1567 | Loss: 1.0496 [2026-04-22 00:01:01] Validation | Batch 870/1567 | Loss: 1.0504 [2026-04-22 00:01:02] Validation | Batch 880/1567 | Loss: 1.0502 [2026-04-22 00:01:03] Validation | Batch 890/1567 | Loss: 1.0498 [2026-04-22 00:01:05] Validation | Batch 900/1567 | Loss: 1.0495 [2026-04-22 00:01:06] Validation | Batch 910/1567 | Loss: 1.0492 [2026-04-22 00:01:07] Validation | Batch 920/1567 | Loss: 1.0510 [2026-04-22 00:01:08] Validation | Batch 930/1567 | Loss: 1.0509 [2026-04-22 00:01:09] Validation | Batch 940/1567 | Loss: 1.0509 [2026-04-22 00:01:10] Validation | Batch 950/1567 | Loss: 1.0504 [2026-04-22 00:01:11] Validation | Batch 960/1567 | Loss: 1.0507 [2026-04-22 00:01:12] Validation | Batch 970/1567 | Loss: 1.0512 [2026-04-22 00:01:13] Validation | Batch 980/1567 | Loss: 1.0509 [2026-04-22 00:01:14] Validation | Batch 990/1567 | Loss: 1.0518 [2026-04-22 00:01:15] Validation | Batch 1000/1567 | Loss: 1.0522 [2026-04-22 00:01:17] Validation | Batch 1010/1567 | Loss: 1.0514 [2026-04-22 00:01:18] Validation | Batch 1020/1567 | Loss: 1.0525 [2026-04-22 00:01:19] Validation | 
Batch 1030/1567 | Loss: 1.0530 [2026-04-22 00:01:20] Validation | Batch 1040/1567 | Loss: 1.0522 [2026-04-22 00:01:21] Validation | Batch 1050/1567 | Loss: 1.0511 [2026-04-22 00:01:23] Validation | Batch 1060/1567 | Loss: 1.0523 [2026-04-22 00:01:24] Validation | Batch 1070/1567 | Loss: 1.0522 [2026-04-22 00:01:25] Validation | Batch 1080/1567 | Loss: 1.0535 [2026-04-22 00:01:26] Validation | Batch 1090/1567 | Loss: 1.0561 [2026-04-22 00:01:28] Validation | Batch 1100/1567 | Loss: 1.0577 [2026-04-22 00:01:29] Validation | Batch 1110/1567 | Loss: 1.0566 [2026-04-22 00:01:30] Validation | Batch 1120/1567 | Loss: 1.0568 [2026-04-22 00:01:31] Validation | Batch 1130/1567 | Loss: 1.0550 [2026-04-22 00:01:32] Validation | Batch 1140/1567 | Loss: 1.0555 [2026-04-22 00:01:34] Validation | Batch 1150/1567 | Loss: 1.0542 [2026-04-22 00:01:34] Validation | Batch 1160/1567 | Loss: 1.0536 [2026-04-22 00:01:35] Validation | Batch 1170/1567 | Loss: 1.0538 [2026-04-22 00:01:37] Validation | Batch 1180/1567 | Loss: 1.0541 [2026-04-22 00:01:38] Validation | Batch 1190/1567 | Loss: 1.0543 [2026-04-22 00:01:39] Validation | Batch 1200/1567 | Loss: 1.0531 [2026-04-22 00:01:40] Validation | Batch 1210/1567 | Loss: 1.0524 [2026-04-22 00:01:41] Validation | Batch 1220/1567 | Loss: 1.0532 [2026-04-22 00:01:43] Validation | Batch 1230/1567 | Loss: 1.0537 [2026-04-22 00:01:44] Validation | Batch 1240/1567 | Loss: 1.0536 [2026-04-22 00:01:45] Validation | Batch 1250/1567 | Loss: 1.0539 [2026-04-22 00:01:46] Validation | Batch 1260/1567 | Loss: 1.0537 [2026-04-22 00:01:48] Validation | Batch 1270/1567 | Loss: 1.0519 [2026-04-22 00:01:49] Validation | Batch 1280/1567 | Loss: 1.0521 [2026-04-22 00:01:51] Validation | Batch 1290/1567 | Loss: 1.0522 [2026-04-22 00:01:52] Validation | Batch 1300/1567 | Loss: 1.0526 [2026-04-22 00:01:53] Validation | Batch 1310/1567 | Loss: 1.0534 [2026-04-22 00:01:54] Validation | Batch 1320/1567 | Loss: 1.0539 [2026-04-22 00:01:55] Validation | Batch 1330/1567 | 
Loss: 1.0554 [2026-04-22 00:01:56] Validation | Batch 1340/1567 | Loss: 1.0551 [2026-04-22 00:01:57] Validation | Batch 1350/1567 | Loss: 1.0554 [2026-04-22 00:01:58] Validation | Batch 1360/1567 | Loss: 1.0545 [2026-04-22 00:02:00] Validation | Batch 1370/1567 | Loss: 1.0541 [2026-04-22 00:02:01] Validation | Batch 1380/1567 | Loss: 1.0541 [2026-04-22 00:02:02] Validation | Batch 1390/1567 | Loss: 1.0534 [2026-04-22 00:02:03] Validation | Batch 1400/1567 | Loss: 1.0530 [2026-04-22 00:02:04] Validation | Batch 1410/1567 | Loss: 1.0536 [2026-04-22 00:02:05] Validation | Batch 1420/1567 | Loss: 1.0535 [2026-04-22 00:02:06] Validation | Batch 1430/1567 | Loss: 1.0539 [2026-04-22 00:02:07] Validation | Batch 1440/1567 | Loss: 1.0546 [2026-04-22 00:02:08] Validation | Batch 1450/1567 | Loss: 1.0547 [2026-04-22 00:02:09] Validation | Batch 1460/1567 | Loss: 1.0541 [2026-04-22 00:02:10] Validation | Batch 1470/1567 | Loss: 1.0539 [2026-04-22 00:02:11] Validation | Batch 1480/1567 | Loss: 1.0537 [2026-04-22 00:02:12] Validation | Batch 1490/1567 | Loss: 1.0532 [2026-04-22 00:02:14] Validation | Batch 1500/1567 | Loss: 1.0529 [2026-04-22 00:02:15] Validation | Batch 1510/1567 | Loss: 1.0519 [2026-04-22 00:02:15] Validation | Batch 1520/1567 | Loss: 1.0519 [2026-04-22 00:02:16] Validation | Batch 1530/1567 | Loss: 1.0519 [2026-04-22 00:02:18] Validation | Batch 1540/1567 | Loss: 1.0525 [2026-04-22 00:02:19] Validation | Batch 1550/1567 | Loss: 1.0538 [2026-04-22 00:02:20] Validation | Batch 1560/1567 | Loss: 1.0534 [2026-04-22 00:02:21] Validation | Batch 1567/1567 | Loss: 1.0534 [2026-04-22 00:02:21] Validation | Loss: 1.0534 | PPL: 2.92 | Time: 184.75s [2026-04-22 00:02:39] New best model saved! 
Val loss: 1.0534 [2026-04-22 00:02:44] Epoch 2 | Step 17010 | Loss: 0.8997 | LR: 2.00e-06 [2026-04-22 00:02:50] Epoch 2 | Step 17020 | Loss: 0.8996 | LR: 2.00e-06 [2026-04-22 00:02:55] Epoch 2 | Step 17030 | Loss: 0.8994 | LR: 2.00e-06 [2026-04-22 00:02:59] Epoch 2 | Step 17040 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-22 00:03:05] Epoch 2 | Step 17050 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:03:11] Epoch 2 | Step 17060 | Loss: 0.8987 | LR: 2.00e-06 [2026-04-22 00:03:16] Epoch 2 | Step 17070 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:03:21] Epoch 2 | Step 17080 | Loss: 0.8986 | LR: 2.00e-06 [2026-04-22 00:03:26] Epoch 2 | Step 17090 | Loss: 0.8986 | LR: 2.00e-06 [2026-04-22 00:03:32] Epoch 2 | Step 17100 | Loss: 0.8986 | LR: 2.00e-06 [2026-04-22 00:03:36] Epoch 2 | Step 17110 | Loss: 0.8987 | LR: 2.00e-06 [2026-04-22 00:03:41] Epoch 2 | Step 17120 | Loss: 0.8988 | LR: 2.00e-06 [2026-04-22 00:03:46] Epoch 2 | Step 17130 | Loss: 0.8987 | LR: 2.00e-06 [2026-04-22 00:03:52] Epoch 2 | Step 17140 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:03:57] Epoch 2 | Step 17150 | Loss: 0.8991 | LR: 2.00e-06 [2026-04-22 00:04:03] Epoch 2 | Step 17160 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:04:09] Epoch 2 | Step 17170 | Loss: 0.8990 | LR: 2.00e-06 [2026-04-22 00:04:14] Epoch 2 | Step 17180 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-22 00:04:20] Epoch 2 | Step 17190 | Loss: 0.8993 | LR: 2.00e-06 [2026-04-22 00:04:25] Epoch 2 | Step 17200 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-22 00:04:31] Epoch 2 | Step 17210 | Loss: 0.8994 | LR: 2.00e-06 [2026-04-22 00:04:36] Epoch 2 | Step 17220 | Loss: 0.8995 | LR: 2.00e-06 [2026-04-22 00:04:42] Epoch 2 | Step 17230 | Loss: 0.8995 | LR: 2.00e-06 [2026-04-22 00:04:47] Epoch 2 | Step 17240 | Loss: 0.8995 | LR: 2.00e-06 [2026-04-22 00:04:53] Epoch 2 | Step 17250 | Loss: 0.8994 | LR: 2.00e-06 [2026-04-22 00:04:59] Epoch 2 | Step 17260 | Loss: 0.8994 | LR: 2.00e-06 [2026-04-22 00:05:04] Epoch 2 | Step 17270 | Loss: 0.8994 | LR: 2.00e-06 [2026-04-22 
00:05:10] Epoch 2 | Step 17280 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-22 00:05:15] Epoch 2 | Step 17290 | Loss: 0.8994 | LR: 2.00e-06 [2026-04-22 00:05:20] Epoch 2 | Step 17300 | Loss: 0.8994 | LR: 2.00e-06 [2026-04-22 00:05:25] Epoch 2 | Step 17310 | Loss: 0.8993 | LR: 2.00e-06 [2026-04-22 00:05:31] Epoch 2 | Step 17320 | Loss: 0.8993 | LR: 2.00e-06 [2026-04-22 00:05:36] Epoch 2 | Step 17330 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-22 00:05:42] Epoch 2 | Step 17340 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-22 00:05:48] Epoch 2 | Step 17350 | Loss: 0.8991 | LR: 2.00e-06 [2026-04-22 00:05:53] Epoch 2 | Step 17360 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-22 00:05:58] Epoch 2 | Step 17370 | Loss: 0.8991 | LR: 2.00e-06 [2026-04-22 00:06:04] Epoch 2 | Step 17380 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:06:09] Epoch 2 | Step 17390 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:06:15] Epoch 2 | Step 17400 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:06:20] Epoch 2 | Step 17410 | Loss: 0.8990 | LR: 2.00e-06 [2026-04-22 00:06:25] Epoch 2 | Step 17420 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:06:31] Epoch 2 | Step 17430 | Loss: 0.8991 | LR: 2.00e-06 [2026-04-22 00:06:36] Epoch 2 | Step 17440 | Loss: 0.8990 | LR: 2.00e-06 [2026-04-22 00:06:41] Epoch 2 | Step 17450 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-22 00:06:46] Epoch 2 | Step 17460 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-22 00:06:51] Epoch 2 | Step 17470 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-22 00:06:57] Epoch 2 | Step 17480 | Loss: 0.8993 | LR: 2.00e-06 [2026-04-22 00:07:02] Epoch 2 | Step 17490 | Loss: 0.8991 | LR: 2.00e-06 [2026-04-22 00:07:08] Epoch 2 | Step 17500 | Loss: 0.8992 | LR: 2.00e-06 [2026-04-22 00:07:13] Epoch 2 | Step 17510 | Loss: 0.8991 | LR: 2.00e-06 [2026-04-22 00:07:18] Epoch 2 | Step 17520 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:07:24] Epoch 2 | Step 17530 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:07:29] Epoch 2 | Step 17540 | Loss: 0.8988 | LR: 2.00e-06 [2026-04-22 00:07:34] Epoch 2 | Step 
17550 | Loss: 0.8988 | LR: 2.00e-06 [2026-04-22 00:07:39] Epoch 2 | Step 17560 | Loss: 0.8988 | LR: 2.00e-06 [2026-04-22 00:07:45] Epoch 2 | Step 17570 | Loss: 0.8988 | LR: 2.00e-06 [2026-04-22 00:07:50] Epoch 2 | Step 17580 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:07:56] Epoch 2 | Step 17590 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:08:01] Epoch 2 | Step 17600 | Loss: 0.8991 | LR: 2.00e-06 [2026-04-22 00:08:06] Epoch 2 | Step 17610 | Loss: 0.8990 | LR: 2.00e-06 [2026-04-22 00:08:11] Epoch 2 | Step 17620 | Loss: 0.8988 | LR: 2.00e-06 [2026-04-22 00:08:17] Epoch 2 | Step 17630 | Loss: 0.8987 | LR: 2.00e-06 [2026-04-22 00:08:22] Epoch 2 | Step 17640 | Loss: 0.8985 | LR: 2.00e-06 [2026-04-22 00:08:27] Epoch 2 | Step 17650 | Loss: 0.8986 | LR: 2.00e-06 [2026-04-22 00:08:33] Epoch 2 | Step 17660 | Loss: 0.8986 | LR: 2.00e-06 [2026-04-22 00:08:38] Epoch 2 | Step 17670 | Loss: 0.8985 | LR: 2.00e-06 [2026-04-22 00:08:43] Epoch 2 | Step 17680 | Loss: 0.8986 | LR: 2.00e-06 [2026-04-22 00:08:49] Epoch 2 | Step 17690 | Loss: 0.8988 | LR: 2.00e-06 [2026-04-22 00:08:55] Epoch 2 | Step 17700 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:09:01] Epoch 2 | Step 17710 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:09:06] Epoch 2 | Step 17720 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:09:11] Epoch 2 | Step 17730 | Loss: 0.8990 | LR: 2.00e-06 [2026-04-22 00:09:16] Epoch 2 | Step 17740 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:09:22] Epoch 2 | Step 17750 | Loss: 0.8988 | LR: 2.00e-06 [2026-04-22 00:09:28] Epoch 2 | Step 17760 | Loss: 0.8987 | LR: 2.00e-06 [2026-04-22 00:09:33] Epoch 2 | Step 17770 | Loss: 0.8988 | LR: 2.00e-06 [2026-04-22 00:09:38] Epoch 2 | Step 17780 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:09:43] Epoch 2 | Step 17790 | Loss: 0.8990 | LR: 2.00e-06 [2026-04-22 00:09:47] Epoch 2 | Step 17800 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:09:53] Epoch 2 | Step 17810 | Loss: 0.8989 | LR: 2.00e-06 [2026-04-22 00:09:58] Epoch 2 | Step 17820 | Loss: 0.8988 | LR: 
2.00e-06 [2026-04-22 00:10:02] Epoch 2 | Step 17830 | Loss: 0.8987 | LR: 2.00e-06 [2026-04-22 00:10:08] Epoch 2 | Step 17840 | Loss: 0.8986 | LR: 2.00e-06 [2026-04-22 00:10:13] Epoch 2 | Step 17850 | Loss: 0.8985 | LR: 2.00e-06 [2026-04-22 00:10:18] Epoch 2 | Step 17860 | Loss: 0.8984 | LR: 2.00e-06 [2026-04-22 00:10:23] Epoch 2 | Step 17870 | Loss: 0.8982 | LR: 2.00e-06 [2026-04-22 00:10:28] Epoch 2 | Step 17880 | Loss: 0.8983 | LR: 2.00e-06 [2026-04-22 00:10:33] Epoch 2 | Step 17890 | Loss: 0.8981 | LR: 2.00e-06 [2026-04-22 00:10:38] Epoch 2 | Step 17900 | Loss: 0.8980 | LR: 2.00e-06 [2026-04-22 00:10:43] Epoch 2 | Step 17910 | Loss: 0.8978 | LR: 2.00e-06 [2026-04-22 00:10:49] Epoch 2 | Step 17920 | Loss: 0.8978 | LR: 2.00e-06 [2026-04-22 00:10:54] Epoch 2 | Step 17930 | Loss: 0.8978 | LR: 2.00e-06 [2026-04-22 00:11:00] Epoch 2 | Step 17940 | Loss: 0.8977 | LR: 2.00e-06 [2026-04-22 00:11:05] Epoch 2 | Step 17950 | Loss: 0.8978 | LR: 2.00e-06 [2026-04-22 00:11:10] Epoch 2 | Step 17960 | Loss: 0.8977 | LR: 2.00e-06 [2026-04-22 00:11:16] Epoch 2 | Step 17970 | Loss: 0.8976 | LR: 2.00e-06 [2026-04-22 00:11:21] Epoch 2 | Step 17980 | Loss: 0.8975 | LR: 2.00e-06 [2026-04-22 00:11:26] Epoch 2 | Step 17990 | Loss: 0.8976 | LR: 2.00e-06 [2026-04-22 00:11:31] Epoch 2 | Step 18000 | Loss: 0.8976 | LR: 2.00e-06 [2026-04-22 00:11:42] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_18000.pt [2026-04-22 00:12:58] Validation | Batch 10/1567 | Loss: 1.0391 [2026-04-22 00:12:59] Validation | Batch 20/1567 | Loss: 1.1240 [2026-04-22 00:13:00] Validation | Batch 30/1567 | Loss: 1.0804 [2026-04-22 00:13:02] Validation | Batch 40/1567 | Loss: 1.1011 [2026-04-22 00:13:03] Validation | Batch 50/1567 | Loss: 1.0773 [2026-04-22 00:13:04] Validation | Batch 60/1567 | Loss: 1.0659 [2026-04-22 00:13:05] Validation | Batch 70/1567 | Loss: 1.0571 [2026-04-22 00:13:07] Validation | Batch 80/1567 | Loss: 1.0654 [2026-04-22 00:13:08] Validation | Batch 90/1567 | Loss: 
1.0600 [2026-04-22 00:13:09] Validation | Batch 100/1567 | Loss: 1.0390 [2026-04-22 00:13:10] Validation | Batch 110/1567 | Loss: 1.0293 [2026-04-22 00:13:12] Validation | Batch 120/1567 | Loss: 1.0240 [2026-04-22 00:13:13] Validation | Batch 130/1567 | Loss: 1.0188 [2026-04-22 00:13:14] Validation | Batch 140/1567 | Loss: 1.0290 [2026-04-22 00:13:15] Validation | Batch 150/1567 | Loss: 1.0398 [2026-04-22 00:13:16] Validation | Batch 160/1567 | Loss: 1.0387 [2026-04-22 00:13:17] Validation | Batch 170/1567 | Loss: 1.0310 [2026-04-22 00:13:18] Validation | Batch 180/1567 | Loss: 1.0338 [2026-04-22 00:13:20] Validation | Batch 190/1567 | Loss: 1.0383 [2026-04-22 00:13:21] Validation | Batch 200/1567 | Loss: 1.0418 [2026-04-22 00:13:22] Validation | Batch 210/1567 | Loss: 1.0398 [2026-04-22 00:13:23] Validation | Batch 220/1567 | Loss: 1.0451 [2026-04-22 00:13:25] Validation | Batch 230/1567 | Loss: 1.0489 [2026-04-22 00:13:26] Validation | Batch 240/1567 | Loss: 1.0508 [2026-04-22 00:13:27] Validation | Batch 250/1567 | Loss: 1.0550 [2026-04-22 00:13:29] Validation | Batch 260/1567 | Loss: 1.0579 [2026-04-22 00:13:30] Validation | Batch 270/1567 | Loss: 1.0624 [2026-04-22 00:13:32] Validation | Batch 280/1567 | Loss: 1.0659 [2026-04-22 00:13:34] Validation | Batch 290/1567 | Loss: 1.0612 [2026-04-22 00:13:35] Validation | Batch 300/1567 | Loss: 1.0606 [2026-04-22 00:13:36] Validation | Batch 310/1567 | Loss: 1.0572 [2026-04-22 00:13:37] Validation | Batch 320/1567 | Loss: 1.0601 [2026-04-22 00:13:39] Validation | Batch 330/1567 | Loss: 1.0602 [2026-04-22 00:13:40] Validation | Batch 340/1567 | Loss: 1.0594 [2026-04-22 00:13:41] Validation | Batch 350/1567 | Loss: 1.0570 [2026-04-22 00:13:42] Validation | Batch 360/1567 | Loss: 1.0509 [2026-04-22 00:13:44] Validation | Batch 370/1567 | Loss: 1.0509 [2026-04-22 00:13:45] Validation | Batch 380/1567 | Loss: 1.0551 [2026-04-22 00:13:46] Validation | Batch 390/1567 | Loss: 1.0541 [2026-04-22 00:13:47] Validation | Batch 
400/1567 | Loss: 1.0548 [2026-04-22 00:13:48] Validation | Batch 410/1567 | Loss: 1.0509 [2026-04-22 00:13:49] Validation | Batch 420/1567 | Loss: 1.0493 [2026-04-22 00:13:51] Validation | Batch 430/1567 | Loss: 1.0519 [2026-04-22 00:13:52] Validation | Batch 440/1567 | Loss: 1.0521 [2026-04-22 00:13:53] Validation | Batch 450/1567 | Loss: 1.0541 [2026-04-22 00:13:54] Validation | Batch 460/1567 | Loss: 1.0567 [2026-04-22 00:13:55] Validation | Batch 470/1567 | Loss: 1.0616 [2026-04-22 00:13:57] Validation | Batch 480/1567 | Loss: 1.0592 [2026-04-22 00:13:58] Validation | Batch 490/1567 | Loss: 1.0569 [2026-04-22 00:13:59] Validation | Batch 500/1567 | Loss: 1.0580 [2026-04-22 00:14:00] Validation | Batch 510/1567 | Loss: 1.0578 [2026-04-22 00:14:01] Validation | Batch 520/1567 | Loss: 1.0591 [2026-04-22 00:14:02] Validation | Batch 530/1567 | Loss: 1.0576 [2026-04-22 00:14:03] Validation | Batch 540/1567 | Loss: 1.0549 [2026-04-22 00:14:05] Validation | Batch 550/1567 | Loss: 1.0561 [2026-04-22 00:14:06] Validation | Batch 560/1567 | Loss: 1.0552 [2026-04-22 00:14:07] Validation | Batch 570/1567 | Loss: 1.0511 [2026-04-22 00:14:09] Validation | Batch 580/1567 | Loss: 1.0530 [2026-04-22 00:14:10] Validation | Batch 590/1567 | Loss: 1.0527 [2026-04-22 00:14:11] Validation | Batch 600/1567 | Loss: 1.0516 [2026-04-22 00:14:12] Validation | Batch 610/1567 | Loss: 1.0537 [2026-04-22 00:14:14] Validation | Batch 620/1567 | Loss: 1.0516 [2026-04-22 00:14:15] Validation | Batch 630/1567 | Loss: 1.0518 [2026-04-22 00:14:17] Validation | Batch 640/1567 | Loss: 1.0525 [2026-04-22 00:14:18] Validation | Batch 650/1567 | Loss: 1.0553 [2026-04-22 00:14:19] Validation | Batch 660/1567 | Loss: 1.0566 [2026-04-22 00:14:20] Validation | Batch 670/1567 | Loss: 1.0548 [2026-04-22 00:14:21] Validation | Batch 680/1567 | Loss: 1.0536 [2026-04-22 00:14:22] Validation | Batch 690/1567 | Loss: 1.0522 [2026-04-22 00:14:24] Validation | Batch 700/1567 | Loss: 1.0522 [2026-04-22 00:14:25] 
Validation | Batch 710/1567 | Loss: 1.0514 [2026-04-22 00:14:26] Validation | Batch 720/1567 | Loss: 1.0483 [2026-04-22 00:14:27] Validation | Batch 730/1567 | Loss: 1.0488 [2026-04-22 00:14:28] Validation | Batch 740/1567 | Loss: 1.0495 [2026-04-22 00:14:29] Validation | Batch 750/1567 | Loss: 1.0491 [2026-04-22 00:14:30] Validation | Batch 760/1567 | Loss: 1.0504 [2026-04-22 00:14:32] Validation | Batch 770/1567 | Loss: 1.0499 [2026-04-22 00:14:33] Validation | Batch 780/1567 | Loss: 1.0510 [2026-04-22 00:14:34] Validation | Batch 790/1567 | Loss: 1.0494 [2026-04-22 00:14:35] Validation | Batch 800/1567 | Loss: 1.0477 [2026-04-22 00:14:36] Validation | Batch 810/1567 | Loss: 1.0483 [2026-04-22 00:14:37] Validation | Batch 820/1567 | Loss: 1.0476 [2026-04-22 00:14:38] Validation | Batch 830/1567 | Loss: 1.0468 [2026-04-22 00:14:39] Validation | Batch 840/1567 | Loss: 1.0475 [2026-04-22 00:14:40] Validation | Batch 850/1567 | Loss: 1.0486 [2026-04-22 00:14:41] Validation | Batch 860/1567 | Loss: 1.0493 [2026-04-22 00:14:42] Validation | Batch 870/1567 | Loss: 1.0501 [2026-04-22 00:14:43] Validation | Batch 880/1567 | Loss: 1.0500 [2026-04-22 00:14:45] Validation | Batch 890/1567 | Loss: 1.0495 [2026-04-22 00:14:46] Validation | Batch 900/1567 | Loss: 1.0492 [2026-04-22 00:14:47] Validation | Batch 910/1567 | Loss: 1.0489 [2026-04-22 00:14:48] Validation | Batch 920/1567 | Loss: 1.0508 [2026-04-22 00:14:49] Validation | Batch 930/1567 | Loss: 1.0507 [2026-04-22 00:14:51] Validation | Batch 940/1567 | Loss: 1.0506 [2026-04-22 00:14:52] Validation | Batch 950/1567 | Loss: 1.0502 [2026-04-22 00:14:53] Validation | Batch 960/1567 | Loss: 1.0505 [2026-04-22 00:14:54] Validation | Batch 970/1567 | Loss: 1.0510 [2026-04-22 00:14:55] Validation | Batch 980/1567 | Loss: 1.0507 [2026-04-22 00:14:55] Validation | Batch 990/1567 | Loss: 1.0516 [2026-04-22 00:14:57] Validation | Batch 1000/1567 | Loss: 1.0520 [2026-04-22 00:14:58] Validation | Batch 1010/1567 | Loss: 1.0511 
[2026-04-22 00:14:59] Validation | Batch 1020/1567 | Loss: 1.0523 [2026-04-22 00:15:00] Validation | Batch 1030/1567 | Loss: 1.0528 [2026-04-22 00:15:02] Validation | Batch 1040/1567 | Loss: 1.0519 [2026-04-22 00:15:03] Validation | Batch 1050/1567 | Loss: 1.0509 [2026-04-22 00:15:04] Validation | Batch 1060/1567 | Loss: 1.0521 [2026-04-22 00:15:06] Validation | Batch 1070/1567 | Loss: 1.0519 [2026-04-22 00:15:07] Validation | Batch 1080/1567 | Loss: 1.0533 [2026-04-22 00:15:08] Validation | Batch 1090/1567 | Loss: 1.0559 [2026-04-22 00:15:09] Validation | Batch 1100/1567 | Loss: 1.0574 [2026-04-22 00:15:10] Validation | Batch 1110/1567 | Loss: 1.0564 [2026-04-22 00:15:11] Validation | Batch 1120/1567 | Loss: 1.0565 [2026-04-22 00:15:12] Validation | Batch 1130/1567 | Loss: 1.0548 [2026-04-22 00:15:14] Validation | Batch 1140/1567 | Loss: 1.0552 [2026-04-22 00:15:15] Validation | Batch 1150/1567 | Loss: 1.0540 [2026-04-22 00:15:16] Validation | Batch 1160/1567 | Loss: 1.0533 [2026-04-22 00:15:17] Validation | Batch 1170/1567 | Loss: 1.0536 [2026-04-22 00:15:18] Validation | Batch 1180/1567 | Loss: 1.0538 [2026-04-22 00:15:19] Validation | Batch 1190/1567 | Loss: 1.0541 [2026-04-22 00:15:21] Validation | Batch 1200/1567 | Loss: 1.0528 [2026-04-22 00:15:22] Validation | Batch 1210/1567 | Loss: 1.0521 [2026-04-22 00:15:23] Validation | Batch 1220/1567 | Loss: 1.0530 [2026-04-22 00:15:24] Validation | Batch 1230/1567 | Loss: 1.0535 [2026-04-22 00:15:25] Validation | Batch 1240/1567 | Loss: 1.0534 [2026-04-22 00:15:26] Validation | Batch 1250/1567 | Loss: 1.0537 [2026-04-22 00:15:28] Validation | Batch 1260/1567 | Loss: 1.0535 [2026-04-22 00:15:29] Validation | Batch 1270/1567 | Loss: 1.0517 [2026-04-22 00:15:30] Validation | Batch 1280/1567 | Loss: 1.0519 [2026-04-22 00:15:32] Validation | Batch 1290/1567 | Loss: 1.0520 [2026-04-22 00:15:33] Validation | Batch 1300/1567 | Loss: 1.0524 [2026-04-22 00:15:34] Validation | Batch 1310/1567 | Loss: 1.0531 [2026-04-22 
00:15:35] Validation | Batch 1320/1567 | Loss: 1.0537 [2026-04-22 00:15:36] Validation | Batch 1330/1567 | Loss: 1.0552 [2026-04-22 00:15:38] Validation | Batch 1340/1567 | Loss: 1.0549 [2026-04-22 00:15:38] Validation | Batch 1350/1567 | Loss: 1.0552 [2026-04-22 00:15:40] Validation | Batch 1360/1567 | Loss: 1.0543 [2026-04-22 00:15:41] Validation | Batch 1370/1567 | Loss: 1.0539 [2026-04-22 00:15:42] Validation | Batch 1380/1567 | Loss: 1.0539 [2026-04-22 00:15:43] Validation | Batch 1390/1567 | Loss: 1.0532 [2026-04-22 00:15:44] Validation | Batch 1400/1567 | Loss: 1.0528 [2026-04-22 00:15:45] Validation | Batch 1410/1567 | Loss: 1.0534 [2026-04-22 00:15:46] Validation | Batch 1420/1567 | Loss: 1.0533 [2026-04-22 00:15:47] Validation | Batch 1430/1567 | Loss: 1.0537 [2026-04-22 00:15:49] Validation | Batch 1440/1567 | Loss: 1.0544 [2026-04-22 00:15:50] Validation | Batch 1450/1567 | Loss: 1.0545 [2026-04-22 00:15:50] Validation | Batch 1460/1567 | Loss: 1.0539 [2026-04-22 00:15:52] Validation | Batch 1470/1567 | Loss: 1.0537 [2026-04-22 00:15:53] Validation | Batch 1480/1567 | Loss: 1.0535 [2026-04-22 00:15:53] Validation | Batch 1490/1567 | Loss: 1.0530 [2026-04-22 00:15:55] Validation | Batch 1500/1567 | Loss: 1.0527 [2026-04-22 00:15:56] Validation | Batch 1510/1567 | Loss: 1.0518 [2026-04-22 00:15:57] Validation | Batch 1520/1567 | Loss: 1.0517 [2026-04-22 00:15:58] Validation | Batch 1530/1567 | Loss: 1.0517 [2026-04-22 00:15:59] Validation | Batch 1540/1567 | Loss: 1.0523 [2026-04-22 00:16:00] Validation | Batch 1550/1567 | Loss: 1.0536 [2026-04-22 00:16:01] Validation | Batch 1560/1567 | Loss: 1.0532 [2026-04-22 00:16:02] Validation | Batch 1567/1567 | Loss: 1.0533 [2026-04-22 00:16:02] Validation | Loss: 1.0533 | PPL: 2.92 | Time: 185.52s [2026-04-22 00:16:20] New best model saved! 
Val loss: 1.0533 [2026-04-22 00:16:25] Epoch 2 | Step 18010 | Loss: 0.8976 | LR: 2.00e-06 [2026-04-22 00:16:31] Epoch 2 | Step 18020 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:16:35] Epoch 2 | Step 18030 | Loss: 0.8975 | LR: 2.00e-06 [2026-04-22 00:16:41] Epoch 2 | Step 18040 | Loss: 0.8975 | LR: 2.00e-06 [2026-04-22 00:16:47] Epoch 2 | Step 18050 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:16:51] Epoch 2 | Step 18060 | Loss: 0.8976 | LR: 2.00e-06 [2026-04-22 00:16:57] Epoch 2 | Step 18070 | Loss: 0.8976 | LR: 2.00e-06 [2026-04-22 00:17:02] Epoch 2 | Step 18080 | Loss: 0.8977 | LR: 2.00e-06 [2026-04-22 00:17:07] Epoch 2 | Step 18090 | Loss: 0.8977 | LR: 2.00e-06 [2026-04-22 00:17:13] Epoch 2 | Step 18100 | Loss: 0.8976 | LR: 2.00e-06 [2026-04-22 00:17:18] Epoch 2 | Step 18110 | Loss: 0.8977 | LR: 2.00e-06 [2026-04-22 00:17:23] Epoch 2 | Step 18120 | Loss: 0.8976 | LR: 2.00e-06 [2026-04-22 00:17:27] Epoch 2 | Step 18130 | Loss: 0.8975 | LR: 2.00e-06 [2026-04-22 00:17:32] Epoch 2 | Step 18140 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:17:38] Epoch 2 | Step 18150 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:17:43] Epoch 2 | Step 18160 | Loss: 0.8975 | LR: 2.00e-06 [2026-04-22 00:17:49] Epoch 2 | Step 18170 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:17:54] Epoch 2 | Step 18180 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:17:59] Epoch 2 | Step 18190 | Loss: 0.8971 | LR: 2.00e-06 [2026-04-22 00:18:04] Epoch 2 | Step 18200 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:18:10] Epoch 2 | Step 18210 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:18:16] Epoch 2 | Step 18220 | Loss: 0.8971 | LR: 2.00e-06 [2026-04-22 00:18:21] Epoch 2 | Step 18230 | Loss: 0.8971 | LR: 2.00e-06 [2026-04-22 00:18:26] Epoch 2 | Step 18240 | Loss: 0.8971 | LR: 2.00e-06 [2026-04-22 00:18:32] Epoch 2 | Step 18250 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:18:37] Epoch 2 | Step 18260 | Loss: 0.8971 | LR: 2.00e-06 [2026-04-22 00:18:42] Epoch 2 | Step 18270 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 
00:18:48] Epoch 2 | Step 18280 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:18:53] Epoch 2 | Step 18290 | Loss: 0.8971 | LR: 2.00e-06 [2026-04-22 00:18:58] Epoch 2 | Step 18300 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:19:04] Epoch 2 | Step 18310 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:19:09] Epoch 2 | Step 18320 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:19:15] Epoch 2 | Step 18330 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:19:20] Epoch 2 | Step 18340 | Loss: 0.8971 | LR: 2.00e-06 [2026-04-22 00:19:25] Epoch 2 | Step 18350 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:19:30] Epoch 2 | Step 18360 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:19:37] Epoch 2 | Step 18370 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:19:42] Epoch 2 | Step 18380 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:19:47] Epoch 2 | Step 18390 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:19:54] Epoch 2 | Step 18400 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:19:59] Epoch 2 | Step 18410 | Loss: 0.8971 | LR: 2.00e-06 [2026-04-22 00:20:04] Epoch 2 | Step 18420 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:20:09] Epoch 2 | Step 18430 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:20:14] Epoch 2 | Step 18440 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:20:19] Epoch 2 | Step 18450 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:20:25] Epoch 2 | Step 18460 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:20:30] Epoch 2 | Step 18470 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:20:35] Epoch 2 | Step 18480 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:20:41] Epoch 2 | Step 18490 | Loss: 0.8975 | LR: 2.00e-06 [2026-04-22 00:20:45] Epoch 2 | Step 18500 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:20:51] Epoch 2 | Step 18510 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:20:56] Epoch 2 | Step 18520 | Loss: 0.8975 | LR: 2.00e-06 [2026-04-22 00:21:01] Epoch 2 | Step 18530 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:21:06] Epoch 2 | Step 18540 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:21:11] Epoch 2 | Step 
18550 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:21:16] Epoch 2 | Step 18560 | Loss: 0.8976 | LR: 2.00e-06 [2026-04-22 00:21:22] Epoch 2 | Step 18570 | Loss: 0.8976 | LR: 2.00e-06 [2026-04-22 00:21:28] Epoch 2 | Step 18580 | Loss: 0.8976 | LR: 2.00e-06 [2026-04-22 00:21:34] Epoch 2 | Step 18590 | Loss: 0.8975 | LR: 2.00e-06 [2026-04-22 00:21:39] Epoch 2 | Step 18600 | Loss: 0.8974 | LR: 2.00e-06 [2026-04-22 00:21:44] Epoch 2 | Step 18610 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:21:49] Epoch 2 | Step 18620 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:21:54] Epoch 2 | Step 18630 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:22:00] Epoch 2 | Step 18640 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:22:05] Epoch 2 | Step 18650 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:22:10] Epoch 2 | Step 18660 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:22:16] Epoch 2 | Step 18670 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:22:22] Epoch 2 | Step 18680 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:22:28] Epoch 2 | Step 18690 | Loss: 0.8973 | LR: 2.00e-06 [2026-04-22 00:22:32] Epoch 2 | Step 18700 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:22:38] Epoch 2 | Step 18710 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:22:43] Epoch 2 | Step 18720 | Loss: 0.8971 | LR: 2.00e-06 [2026-04-22 00:22:49] Epoch 2 | Step 18730 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:22:54] Epoch 2 | Step 18740 | Loss: 0.8968 | LR: 2.00e-06 [2026-04-22 00:22:59] Epoch 2 | Step 18750 | Loss: 0.8968 | LR: 2.00e-06 [2026-04-22 00:23:05] Epoch 2 | Step 18760 | Loss: 0.8966 | LR: 2.00e-06 [2026-04-22 00:23:10] Epoch 2 | Step 18770 | Loss: 0.8967 | LR: 2.00e-06 [2026-04-22 00:23:15] Epoch 2 | Step 18780 | Loss: 0.8968 | LR: 2.00e-06 [2026-04-22 00:23:20] Epoch 2 | Step 18790 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:23:25] Epoch 2 | Step 18800 | Loss: 0.8968 | LR: 2.00e-06 [2026-04-22 00:23:31] Epoch 2 | Step 18810 | Loss: 0.8968 | LR: 2.00e-06 [2026-04-22 00:23:36] Epoch 2 | Step 18820 | Loss: 0.8968 | LR: 
2.00e-06 [2026-04-22 00:23:42] Epoch 2 | Step 18830 | Loss: 0.8968 | LR: 2.00e-06 [2026-04-22 00:23:47] Epoch 2 | Step 18840 | Loss: 0.8968 | LR: 2.00e-06 [2026-04-22 00:23:53] Epoch 2 | Step 18850 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:23:58] Epoch 2 | Step 18860 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:24:03] Epoch 2 | Step 18870 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:24:09] Epoch 2 | Step 18880 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:24:13] Epoch 2 | Step 18890 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:24:19] Epoch 2 | Step 18900 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:24:25] Epoch 2 | Step 18910 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:24:31] Epoch 2 | Step 18920 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:24:36] Epoch 2 | Step 18930 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:24:42] Epoch 2 | Step 18940 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:24:47] Epoch 2 | Step 18950 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:24:52] Epoch 2 | Step 18960 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:24:57] Epoch 2 | Step 18970 | Loss: 0.8968 | LR: 2.00e-06 [2026-04-22 00:25:03] Epoch 2 | Step 18980 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:25:08] Epoch 2 | Step 18990 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:25:13] Epoch 2 | Step 19000 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:25:15] Validation | Batch 10/1567 | Loss: 1.0389 [2026-04-22 00:25:16] Validation | Batch 20/1567 | Loss: 1.1237 [2026-04-22 00:25:17] Validation | Batch 30/1567 | Loss: 1.0805 [2026-04-22 00:25:19] Validation | Batch 40/1567 | Loss: 1.1012 [2026-04-22 00:25:19] Validation | Batch 50/1567 | Loss: 1.0773 [2026-04-22 00:25:21] Validation | Batch 60/1567 | Loss: 1.0658 [2026-04-22 00:25:22] Validation | Batch 70/1567 | Loss: 1.0568 [2026-04-22 00:25:24] Validation | Batch 80/1567 | Loss: 1.0650 [2026-04-22 00:25:25] Validation | Batch 90/1567 | Loss: 1.0596 [2026-04-22 00:25:26] Validation | Batch 100/1567 | Loss: 1.0385 [2026-04-22 00:25:27] Validation | 
Batch 110/1567 | Loss: 1.0290 [2026-04-22 00:25:29] Validation | Batch 120/1567 | Loss: 1.0236 [2026-04-22 00:25:30] Validation | Batch 130/1567 | Loss: 1.0185 [2026-04-22 00:25:31] Validation | Batch 140/1567 | Loss: 1.0286 [2026-04-22 00:25:32] Validation | Batch 150/1567 | Loss: 1.0393 [2026-04-22 00:25:33] Validation | Batch 160/1567 | Loss: 1.0383 [2026-04-22 00:25:34] Validation | Batch 170/1567 | Loss: 1.0305 [2026-04-22 00:25:35] Validation | Batch 180/1567 | Loss: 1.0334 [2026-04-22 00:25:36] Validation | Batch 190/1567 | Loss: 1.0378 [2026-04-22 00:25:38] Validation | Batch 200/1567 | Loss: 1.0413 [2026-04-22 00:25:39] Validation | Batch 210/1567 | Loss: 1.0394 [2026-04-22 00:25:40] Validation | Batch 220/1567 | Loss: 1.0447 [2026-04-22 00:25:42] Validation | Batch 230/1567 | Loss: 1.0485 [2026-04-22 00:25:43] Validation | Batch 240/1567 | Loss: 1.0504 [2026-04-22 00:25:44] Validation | Batch 250/1567 | Loss: 1.0545 [2026-04-22 00:25:45] Validation | Batch 260/1567 | Loss: 1.0575 [2026-04-22 00:25:46] Validation | Batch 270/1567 | Loss: 1.0620 [2026-04-22 00:25:48] Validation | Batch 280/1567 | Loss: 1.0655 [2026-04-22 00:25:50] Validation | Batch 290/1567 | Loss: 1.0608 [2026-04-22 00:25:51] Validation | Batch 300/1567 | Loss: 1.0601 [2026-04-22 00:25:52] Validation | Batch 310/1567 | Loss: 1.0569 [2026-04-22 00:25:53] Validation | Batch 320/1567 | Loss: 1.0597 [2026-04-22 00:25:55] Validation | Batch 330/1567 | Loss: 1.0598 [2026-04-22 00:25:56] Validation | Batch 340/1567 | Loss: 1.0590 [2026-04-22 00:25:57] Validation | Batch 350/1567 | Loss: 1.0565 [2026-04-22 00:25:58] Validation | Batch 360/1567 | Loss: 1.0504 [2026-04-22 00:26:00] Validation | Batch 370/1567 | Loss: 1.0505 [2026-04-22 00:26:01] Validation | Batch 380/1567 | Loss: 1.0546 [2026-04-22 00:26:02] Validation | Batch 390/1567 | Loss: 1.0537 [2026-04-22 00:26:03] Validation | Batch 400/1567 | Loss: 1.0544 [2026-04-22 00:26:04] Validation | Batch 410/1567 | Loss: 1.0505 [2026-04-22 
00:26:05] Validation | Batch 420/1567 | Loss: 1.0490 [2026-04-22 00:26:07] Validation | Batch 430/1567 | Loss: 1.0515 [2026-04-22 00:26:08] Validation | Batch 440/1567 | Loss: 1.0517 [2026-04-22 00:26:09] Validation | Batch 450/1567 | Loss: 1.0537 [2026-04-22 00:26:10] Validation | Batch 460/1567 | Loss: 1.0562 [2026-04-22 00:26:11] Validation | Batch 470/1567 | Loss: 1.0611 [2026-04-22 00:26:13] Validation | Batch 480/1567 | Loss: 1.0588 [2026-04-22 00:26:14] Validation | Batch 490/1567 | Loss: 1.0565 [2026-04-22 00:26:15] Validation | Batch 500/1567 | Loss: 1.0575 [2026-04-22 00:26:16] Validation | Batch 510/1567 | Loss: 1.0573 [2026-04-22 00:26:17] Validation | Batch 520/1567 | Loss: 1.0587 [2026-04-22 00:26:18] Validation | Batch 530/1567 | Loss: 1.0572 [2026-04-22 00:26:19] Validation | Batch 540/1567 | Loss: 1.0545 [2026-04-22 00:26:21] Validation | Batch 550/1567 | Loss: 1.0556 [2026-04-22 00:26:22] Validation | Batch 560/1567 | Loss: 1.0548 [2026-04-22 00:26:23] Validation | Batch 570/1567 | Loss: 1.0507 [2026-04-22 00:26:25] Validation | Batch 580/1567 | Loss: 1.0525 [2026-04-22 00:26:26] Validation | Batch 590/1567 | Loss: 1.0523 [2026-04-22 00:26:27] Validation | Batch 600/1567 | Loss: 1.0512 [2026-04-22 00:26:28] Validation | Batch 610/1567 | Loss: 1.0532 [2026-04-22 00:26:30] Validation | Batch 620/1567 | Loss: 1.0512 [2026-04-22 00:26:31] Validation | Batch 630/1567 | Loss: 1.0514 [2026-04-22 00:26:33] Validation | Batch 640/1567 | Loss: 1.0520 [2026-04-22 00:26:34] Validation | Batch 650/1567 | Loss: 1.0549 [2026-04-22 00:26:35] Validation | Batch 660/1567 | Loss: 1.0562 [2026-04-22 00:26:36] Validation | Batch 670/1567 | Loss: 1.0545 [2026-04-22 00:26:37] Validation | Batch 680/1567 | Loss: 1.0533 [2026-04-22 00:26:38] Validation | Batch 690/1567 | Loss: 1.0518 [2026-04-22 00:26:40] Validation | Batch 700/1567 | Loss: 1.0518 [2026-04-22 00:26:41] Validation | Batch 710/1567 | Loss: 1.0511 [2026-04-22 00:26:42] Validation | Batch 720/1567 | Loss: 
1.0479 [2026-04-22 00:26:43] Validation | Batch 730/1567 | Loss: 1.0485 [2026-04-22 00:26:44] Validation | Batch 740/1567 | Loss: 1.0491 [2026-04-22 00:26:45] Validation | Batch 750/1567 | Loss: 1.0487 [2026-04-22 00:26:46] Validation | Batch 760/1567 | Loss: 1.0500 [2026-04-22 00:26:48] Validation | Batch 770/1567 | Loss: 1.0495 [2026-04-22 00:26:49] Validation | Batch 780/1567 | Loss: 1.0506 [2026-04-22 00:26:50] Validation | Batch 790/1567 | Loss: 1.0491 [2026-04-22 00:26:51] Validation | Batch 800/1567 | Loss: 1.0474 [2026-04-22 00:26:52] Validation | Batch 810/1567 | Loss: 1.0480 [2026-04-22 00:26:53] Validation | Batch 820/1567 | Loss: 1.0472 [2026-04-22 00:26:54] Validation | Batch 830/1567 | Loss: 1.0464 [2026-04-22 00:26:55] Validation | Batch 840/1567 | Loss: 1.0471 [2026-04-22 00:26:56] Validation | Batch 850/1567 | Loss: 1.0482 [2026-04-22 00:26:57] Validation | Batch 860/1567 | Loss: 1.0489 [2026-04-22 00:26:58] Validation | Batch 870/1567 | Loss: 1.0498 [2026-04-22 00:26:59] Validation | Batch 880/1567 | Loss: 1.0496 [2026-04-22 00:27:01] Validation | Batch 890/1567 | Loss: 1.0492 [2026-04-22 00:27:02] Validation | Batch 900/1567 | Loss: 1.0489 [2026-04-22 00:27:03] Validation | Batch 910/1567 | Loss: 1.0486 [2026-04-22 00:27:04] Validation | Batch 920/1567 | Loss: 1.0504 [2026-04-22 00:27:05] Validation | Batch 930/1567 | Loss: 1.0503 [2026-04-22 00:27:07] Validation | Batch 940/1567 | Loss: 1.0502 [2026-04-22 00:27:08] Validation | Batch 950/1567 | Loss: 1.0498 [2026-04-22 00:27:09] Validation | Batch 960/1567 | Loss: 1.0501 [2026-04-22 00:27:10] Validation | Batch 970/1567 | Loss: 1.0506 [2026-04-22 00:27:11] Validation | Batch 980/1567 | Loss: 1.0503 [2026-04-22 00:27:12] Validation | Batch 990/1567 | Loss: 1.0512 [2026-04-22 00:27:13] Validation | Batch 1000/1567 | Loss: 1.0516 [2026-04-22 00:27:14] Validation | Batch 1010/1567 | Loss: 1.0508 [2026-04-22 00:27:15] Validation | Batch 1020/1567 | Loss: 1.0519 [2026-04-22 00:27:16] Validation | 
Batch 1030/1567 | Loss: 1.0524 [2026-04-22 00:27:18] Validation | Batch 1040/1567 | Loss: 1.0516 [2026-04-22 00:27:19] Validation | Batch 1050/1567 | Loss: 1.0505 [2026-04-22 00:27:20] Validation | Batch 1060/1567 | Loss: 1.0517 [2026-04-22 00:27:22] Validation | Batch 1070/1567 | Loss: 1.0516 [2026-04-22 00:27:23] Validation | Batch 1080/1567 | Loss: 1.0529 [2026-04-22 00:27:24] Validation | Batch 1090/1567 | Loss: 1.0555 [2026-04-22 00:27:25] Validation | Batch 1100/1567 | Loss: 1.0571 [2026-04-22 00:27:26] Validation | Batch 1110/1567 | Loss: 1.0560 [2026-04-22 00:27:27] Validation | Batch 1120/1567 | Loss: 1.0562 [2026-04-22 00:27:29] Validation | Batch 1130/1567 | Loss: 1.0544 [2026-04-22 00:27:30] Validation | Batch 1140/1567 | Loss: 1.0549 [2026-04-22 00:27:31] Validation | Batch 1150/1567 | Loss: 1.0536 [2026-04-22 00:27:32] Validation | Batch 1160/1567 | Loss: 1.0530 [2026-04-22 00:27:33] Validation | Batch 1170/1567 | Loss: 1.0532 [2026-04-22 00:27:34] Validation | Batch 1180/1567 | Loss: 1.0535 [2026-04-22 00:27:35] Validation | Batch 1190/1567 | Loss: 1.0537 [2026-04-22 00:27:37] Validation | Batch 1200/1567 | Loss: 1.0524 [2026-04-22 00:27:38] Validation | Batch 1210/1567 | Loss: 1.0518 [2026-04-22 00:27:39] Validation | Batch 1220/1567 | Loss: 1.0527 [2026-04-22 00:27:40] Validation | Batch 1230/1567 | Loss: 1.0532 [2026-04-22 00:27:41] Validation | Batch 1240/1567 | Loss: 1.0530 [2026-04-22 00:27:42] Validation | Batch 1250/1567 | Loss: 1.0534 [2026-04-22 00:27:44] Validation | Batch 1260/1567 | Loss: 1.0531 [2026-04-22 00:27:45] Validation | Batch 1270/1567 | Loss: 1.0514 [2026-04-22 00:27:46] Validation | Batch 1280/1567 | Loss: 1.0515 [2026-04-22 00:27:48] Validation | Batch 1290/1567 | Loss: 1.0516 [2026-04-22 00:27:49] Validation | Batch 1300/1567 | Loss: 1.0520 [2026-04-22 00:27:50] Validation | Batch 1310/1567 | Loss: 1.0527 [2026-04-22 00:27:51] Validation | Batch 1320/1567 | Loss: 1.0533 [2026-04-22 00:27:53] Validation | Batch 1330/1567 | 
Loss: 1.0548 [2026-04-22 00:27:54] Validation | Batch 1340/1567 | Loss: 1.0544 [2026-04-22 00:27:55] Validation | Batch 1350/1567 | Loss: 1.0547 [2026-04-22 00:27:56] Validation | Batch 1360/1567 | Loss: 1.0538 [2026-04-22 00:27:57] Validation | Batch 1370/1567 | Loss: 1.0535 [2026-04-22 00:27:58] Validation | Batch 1380/1567 | Loss: 1.0535 [2026-04-22 00:27:59] Validation | Batch 1390/1567 | Loss: 1.0528 [2026-04-22 00:28:00] Validation | Batch 1400/1567 | Loss: 1.0524 [2026-04-22 00:28:01] Validation | Batch 1410/1567 | Loss: 1.0529 [2026-04-22 00:28:02] Validation | Batch 1420/1567 | Loss: 1.0529 [2026-04-22 00:28:04] Validation | Batch 1430/1567 | Loss: 1.0532 [2026-04-22 00:28:05] Validation | Batch 1440/1567 | Loss: 1.0540 [2026-04-22 00:28:06] Validation | Batch 1450/1567 | Loss: 1.0541 [2026-04-22 00:28:07] Validation | Batch 1460/1567 | Loss: 1.0535 [2026-04-22 00:28:08] Validation | Batch 1470/1567 | Loss: 1.0533 [2026-04-22 00:28:09] Validation | Batch 1480/1567 | Loss: 1.0530 [2026-04-22 00:28:10] Validation | Batch 1490/1567 | Loss: 1.0525 [2026-04-22 00:28:11] Validation | Batch 1500/1567 | Loss: 1.0523 [2026-04-22 00:28:12] Validation | Batch 1510/1567 | Loss: 1.0513 [2026-04-22 00:28:13] Validation | Batch 1520/1567 | Loss: 1.0512 [2026-04-22 00:28:14] Validation | Batch 1530/1567 | Loss: 1.0512 [2026-04-22 00:28:15] Validation | Batch 1540/1567 | Loss: 1.0519 [2026-04-22 00:28:16] Validation | Batch 1550/1567 | Loss: 1.0532 [2026-04-22 00:28:17] Validation | Batch 1560/1567 | Loss: 1.0528 [2026-04-22 00:28:18] Validation | Batch 1567/1567 | Loss: 1.0528 [2026-04-22 00:28:18] Validation | Loss: 1.0528 | PPL: 2.92 | Time: 184.83s [2026-04-22 00:28:36] New best model saved! 
Val loss: 1.0528 [2026-04-22 00:28:42] Epoch 2 | Step 19010 | Loss: 0.8972 | LR: 2.00e-06 [2026-04-22 00:28:46] Epoch 2 | Step 19020 | Loss: 0.8971 | LR: 2.00e-06 [2026-04-22 00:28:52] Epoch 2 | Step 19030 | Loss: 0.8971 | LR: 2.00e-06 [2026-04-22 00:28:57] Epoch 2 | Step 19040 | Loss: 0.8970 | LR: 2.00e-06 [2026-04-22 00:29:03] Epoch 2 | Step 19050 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:29:08] Epoch 2 | Step 19060 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:29:14] Epoch 2 | Step 19070 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:29:20] Epoch 2 | Step 19080 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:29:25] Epoch 2 | Step 19090 | Loss: 0.8969 | LR: 2.00e-06 [2026-04-22 00:29:30] Epoch 2 | Step 19100 | Loss: 0.8968 | LR: 2.00e-06 [2026-04-22 00:29:35] Epoch 2 | Step 19110 | Loss: 0.8967 | LR: 2.00e-06 [2026-04-22 00:29:40] Epoch 2 | Step 19120 | Loss: 0.8966 | LR: 2.00e-06 [2026-04-22 00:29:46] Epoch 2 | Step 19130 | Loss: 0.8966 | LR: 2.00e-06 [2026-04-22 00:29:51] Epoch 2 | Step 19140 | Loss: 0.8965 | LR: 2.00e-06 [2026-04-22 00:29:56] Epoch 2 | Step 19150 | Loss: 0.8964 | LR: 2.00e-06 [2026-04-22 00:30:01] Epoch 2 | Step 19160 | Loss: 0.8965 | LR: 2.00e-06 [2026-04-22 00:30:06] Epoch 2 | Step 19170 | Loss: 0.8966 | LR: 2.00e-06 [2026-04-22 00:30:12] Epoch 2 | Step 19180 | Loss: 0.8965 | LR: 2.00e-06 [2026-04-22 00:30:17] Epoch 2 | Step 19190 | Loss: 0.8965 | LR: 2.00e-06 [2026-04-22 00:30:22] Epoch 2 | Step 19200 | Loss: 0.8966 | LR: 2.00e-06 [2026-04-22 00:30:27] Epoch 2 | Step 19210 | Loss: 0.8966 | LR: 2.00e-06 [2026-04-22 00:30:33] Epoch 2 | Step 19220 | Loss: 0.8966 | LR: 2.00e-06 [2026-04-22 00:30:39] Epoch 2 | Step 19230 | Loss: 0.8966 | LR: 2.00e-06 [2026-04-22 00:30:44] Epoch 2 | Step 19240 | Loss: 0.8965 | LR: 2.00e-06 [2026-04-22 00:30:50] Epoch 2 | Step 19250 | Loss: 0.8966 | LR: 2.00e-06 [2026-04-22 00:30:55] Epoch 2 | Step 19260 | Loss: 0.8966 | LR: 2.00e-06 [2026-04-22 00:31:00] Epoch 2 | Step 19270 | Loss: 0.8965 | LR: 2.00e-06 [2026-04-22 
00:31:05] Epoch 2 | Step 19280 | Loss: 0.8965 | LR: 2.00e-06 [2026-04-22 00:31:10] Epoch 2 | Step 19290 | Loss: 0.8963 | LR: 2.00e-06 [2026-04-22 00:31:15] Epoch 2 | Step 19300 | Loss: 0.8964 | LR: 2.00e-06 [2026-04-22 00:31:21] Epoch 2 | Step 19310 | Loss: 0.8964 | LR: 2.00e-06 [2026-04-22 00:31:26] Epoch 2 | Step 19320 | Loss: 0.8963 | LR: 2.00e-06 [2026-04-22 00:31:31] Epoch 2 | Step 19330 | Loss: 0.8963 | LR: 2.00e-06 [2026-04-22 00:31:36] Epoch 2 | Step 19340 | Loss: 0.8963 | LR: 2.00e-06 [2026-04-22 00:31:42] Epoch 2 | Step 19350 | Loss: 0.8963 | LR: 2.00e-06 [2026-04-22 00:31:47] Epoch 2 | Step 19360 | Loss: 0.8962 | LR: 2.00e-06 [2026-04-22 00:31:52] Epoch 2 | Step 19370 | Loss: 0.8962 | LR: 2.00e-06 [2026-04-22 00:31:57] Epoch 2 | Step 19380 | Loss: 0.8963 | LR: 2.00e-06 [2026-04-22 00:32:02] Epoch 2 | Step 19390 | Loss: 0.8963 | LR: 2.00e-06 [2026-04-22 00:32:08] Epoch 2 | Step 19400 | Loss: 0.8963 | LR: 2.00e-06 [2026-04-22 00:32:13] Epoch 2 | Step 19410 | Loss: 0.8963 | LR: 2.00e-06 [2026-04-22 00:32:18] Epoch 2 | Step 19420 | Loss: 0.8961 | LR: 2.00e-06 [2026-04-22 00:32:23] Epoch 2 | Step 19430 | Loss: 0.8962 | LR: 2.00e-06 [2026-04-22 00:32:27] Epoch 2 | Step 19440 | Loss: 0.8961 | LR: 2.00e-06 [2026-04-22 00:32:33] Epoch 2 | Step 19450 | Loss: 0.8961 | LR: 2.00e-06 [2026-04-22 00:32:38] Epoch 2 | Step 19460 | Loss: 0.8961 | LR: 2.00e-06 [2026-04-22 00:32:43] Epoch 2 | Step 19470 | Loss: 0.8962 | LR: 2.00e-06 [2026-04-22 00:32:48] Epoch 2 | Step 19480 | Loss: 0.8961 | LR: 2.00e-06 [2026-04-22 00:32:54] Epoch 2 | Step 19490 | Loss: 0.8960 | LR: 2.00e-06 [2026-04-22 00:32:59] Epoch 2 | Step 19500 | Loss: 0.8961 | LR: 2.00e-06 [2026-04-22 00:33:04] Epoch 2 | Step 19510 | Loss: 0.8960 | LR: 2.00e-06 [2026-04-22 00:33:09] Epoch 2 | Step 19520 | Loss: 0.8959 | LR: 2.00e-06 [2026-04-22 00:33:14] Epoch 2 | Step 19530 | Loss: 0.8960 | LR: 2.00e-06 [2026-04-22 00:33:20] Epoch 2 | Step 19540 | Loss: 0.8960 | LR: 2.00e-06 [2026-04-22 00:33:24] Epoch 2 | Step 
19550 | Loss: 0.8961 | LR: 2.00e-06 [2026-04-22 00:33:31] Epoch 2 | Step 19560 | Loss: 0.8962 | LR: 2.00e-06 [2026-04-22 00:33:36] Epoch 2 | Step 19570 | Loss: 0.8962 | LR: 2.00e-06 [2026-04-22 00:33:41] Epoch 2 | Step 19580 | Loss: 0.8962 | LR: 2.00e-06 [2026-04-22 00:33:47] Epoch 2 | Step 19590 | Loss: 0.8962 | LR: 2.00e-06 [2026-04-22 00:33:53] Epoch 2 | Step 19600 | Loss: 0.8962 | LR: 2.00e-06 [2026-04-22 00:33:58] Epoch 2 | Step 19610 | Loss: 0.8961 | LR: 2.00e-06 [2026-04-22 00:34:03] Epoch 2 | Step 19620 | Loss: 0.8962 | LR: 2.00e-06 [2026-04-22 00:34:09] Epoch 2 | Step 19630 | Loss: 0.8961 | LR: 2.00e-06 [2026-04-22 00:34:14] Epoch 2 | Step 19640 | Loss: 0.8960 | LR: 2.00e-06 [2026-04-22 00:34:19] Epoch 2 | Step 19650 | Loss: 0.8959 | LR: 2.00e-06 [2026-04-22 00:34:24] Epoch 2 | Step 19660 | Loss: 0.8959 | LR: 2.00e-06 [2026-04-22 00:34:30] Epoch 2 | Step 19670 | Loss: 0.8959 | LR: 2.00e-06 [2026-04-22 00:34:35] Epoch 2 | Step 19680 | Loss: 0.8959 | LR: 2.00e-06 [2026-04-22 00:34:41] Epoch 2 | Step 19690 | Loss: 0.8959 | LR: 2.00e-06 [2026-04-22 00:34:46] Epoch 2 | Step 19700 | Loss: 0.8958 | LR: 2.00e-06 [2026-04-22 00:34:52] Epoch 2 | Step 19710 | Loss: 0.8959 | LR: 2.00e-06 [2026-04-22 00:34:57] Epoch 2 | Step 19720 | Loss: 0.8959 | LR: 2.00e-06 [2026-04-22 00:35:02] Epoch 2 | Step 19730 | Loss: 0.8958 | LR: 2.00e-06 [2026-04-22 00:35:08] Epoch 2 | Step 19740 | Loss: 0.8957 | LR: 2.00e-06 [2026-04-22 00:35:14] Epoch 2 | Step 19750 | Loss: 0.8957 | LR: 2.00e-06 [2026-04-22 00:35:20] Epoch 2 | Step 19760 | Loss: 0.8957 | LR: 2.00e-06 [2026-04-22 00:35:25] Epoch 2 | Step 19770 | Loss: 0.8957 | LR: 2.00e-06 [2026-04-22 00:35:30] Epoch 2 | Step 19780 | Loss: 0.8957 | LR: 2.00e-06 [2026-04-22 00:35:36] Epoch 2 | Step 19790 | Loss: 0.8957 | LR: 2.00e-06 [2026-04-22 00:35:41] Epoch 2 | Step 19800 | Loss: 0.8958 | LR: 2.00e-06 [2026-04-22 00:35:47] Epoch 2 | Step 19810 | Loss: 0.8957 | LR: 2.00e-06 [2026-04-22 00:35:52] Epoch 2 | Step 19820 | Loss: 0.8956 | LR: 
2.00e-06 [2026-04-22 00:35:57] Epoch 2 | Step 19830 | Loss: 0.8955 | LR: 2.00e-06 [2026-04-22 00:36:02] Epoch 2 | Step 19840 | Loss: 0.8956 | LR: 2.00e-06 [2026-04-22 00:36:08] Epoch 2 | Step 19850 | Loss: 0.8956 | LR: 2.00e-06 [2026-04-22 00:36:13] Epoch 2 | Step 19860 | Loss: 0.8955 | LR: 2.00e-06 [2026-04-22 00:36:18] Epoch 2 | Step 19870 | Loss: 0.8954 | LR: 2.00e-06 [2026-04-22 00:36:24] Epoch 2 | Step 19880 | Loss: 0.8955 | LR: 2.00e-06 [2026-04-22 00:36:29] Epoch 2 | Step 19890 | Loss: 0.8955 | LR: 2.00e-06 [2026-04-22 00:36:35] Epoch 2 | Step 19900 | Loss: 0.8954 | LR: 2.00e-06 [2026-04-22 00:36:41] Epoch 2 | Step 19910 | Loss: 0.8954 | LR: 2.00e-06 [2026-04-22 00:36:46] Epoch 2 | Step 19920 | Loss: 0.8954 | LR: 2.00e-06 [2026-04-22 00:36:52] Epoch 2 | Step 19930 | Loss: 0.8953 | LR: 2.00e-06 [2026-04-22 00:36:57] Epoch 2 | Step 19940 | Loss: 0.8954 | LR: 2.00e-06 [2026-04-22 00:37:02] Epoch 2 | Step 19950 | Loss: 0.8955 | LR: 2.00e-06 [2026-04-22 00:37:07] Epoch 2 | Step 19960 | Loss: 0.8955 | LR: 2.00e-06 [2026-04-22 00:37:13] Epoch 2 | Step 19970 | Loss: 0.8955 | LR: 2.00e-06 [2026-04-22 00:37:18] Epoch 2 | Step 19980 | Loss: 0.8955 | LR: 2.00e-06 [2026-04-22 00:37:23] Epoch 2 | Step 19990 | Loss: 0.8956 | LR: 2.00e-06 [2026-04-22 00:37:29] Epoch 2 | Step 20000 | Loss: 0.8955 | LR: 2.00e-06 [2026-04-22 00:37:30] Validation | Batch 10/1567 | Loss: 1.0386 [2026-04-22 00:37:31] Validation | Batch 20/1567 | Loss: 1.1231 [2026-04-22 00:37:33] Validation | Batch 30/1567 | Loss: 1.0797 [2026-04-22 00:37:34] Validation | Batch 40/1567 | Loss: 1.1004 [2026-04-22 00:37:35] Validation | Batch 50/1567 | Loss: 1.0767 [2026-04-22 00:37:36] Validation | Batch 60/1567 | Loss: 1.0652 [2026-04-22 00:37:38] Validation | Batch 70/1567 | Loss: 1.0564 [2026-04-22 00:37:39] Validation | Batch 80/1567 | Loss: 1.0646 [2026-04-22 00:37:40] Validation | Batch 90/1567 | Loss: 1.0592 [2026-04-22 00:37:42] Validation | Batch 100/1567 | Loss: 1.0382 [2026-04-22 00:37:43] Validation | 
Batch 110/1567 | Loss: 1.0285 [2026-04-22 00:37:44] Validation | Batch 120/1567 | Loss: 1.0233 [2026-04-22 00:37:45] Validation | Batch 130/1567 | Loss: 1.0180 [2026-04-22 00:37:47] Validation | Batch 140/1567 | Loss: 1.0282 [2026-04-22 00:37:48] Validation | Batch 150/1567 | Loss: 1.0390 [2026-04-22 00:37:49] Validation | Batch 160/1567 | Loss: 1.0379 [2026-04-22 00:37:50] Validation | Batch 170/1567 | Loss: 1.0301 [2026-04-22 00:37:51] Validation | Batch 180/1567 | Loss: 1.0330 [2026-04-22 00:37:52] Validation | Batch 190/1567 | Loss: 1.0374 [2026-04-22 00:37:53] Validation | Batch 200/1567 | Loss: 1.0409 [2026-04-22 00:37:55] Validation | Batch 210/1567 | Loss: 1.0389 [2026-04-22 00:37:56] Validation | Batch 220/1567 | Loss: 1.0442 [2026-04-22 00:37:57] Validation | Batch 230/1567 | Loss: 1.0480 [2026-04-22 00:37:59] Validation | Batch 240/1567 | Loss: 1.0499 [2026-04-22 00:38:00] Validation | Batch 250/1567 | Loss: 1.0542 [2026-04-22 00:38:01] Validation | Batch 260/1567 | Loss: 1.0571 [2026-04-22 00:38:02] Validation | Batch 270/1567 | Loss: 1.0616 [2026-04-22 00:38:04] Validation | Batch 280/1567 | Loss: 1.0649 [2026-04-22 00:38:05] Validation | Batch 290/1567 | Loss: 1.0603 [2026-04-22 00:38:07] Validation | Batch 300/1567 | Loss: 1.0596 [2026-04-22 00:38:08] Validation | Batch 310/1567 | Loss: 1.0563 [2026-04-22 00:38:09] Validation | Batch 320/1567 | Loss: 1.0592 [2026-04-22 00:38:10] Validation | Batch 330/1567 | Loss: 1.0593 [2026-04-22 00:38:12] Validation | Batch 340/1567 | Loss: 1.0586 [2026-04-22 00:38:13] Validation | Batch 350/1567 | Loss: 1.0561 [2026-04-22 00:38:14] Validation | Batch 360/1567 | Loss: 1.0499 [2026-04-22 00:38:15] Validation | Batch 370/1567 | Loss: 1.0500 [2026-04-22 00:38:16] Validation | Batch 380/1567 | Loss: 1.0541 [2026-04-22 00:38:18] Validation | Batch 390/1567 | Loss: 1.0531 [2026-04-22 00:38:18] Validation | Batch 400/1567 | Loss: 1.0539 [2026-04-22 00:38:20] Validation | Batch 410/1567 | Loss: 1.0500 [2026-04-22 
00:38:21] Validation | Batch 420/1567 | Loss: 1.0484 [2026-04-22 00:38:22] Validation | Batch 430/1567 | Loss: 1.0510 [2026-04-22 00:38:24] Validation | Batch 440/1567 | Loss: 1.0511 [2026-04-22 00:38:25] Validation | Batch 450/1567 | Loss: 1.0531 [2026-04-22 00:38:26] Validation | Batch 460/1567 | Loss: 1.0557 [2026-04-22 00:38:27] Validation | Batch 470/1567 | Loss: 1.0606 [2026-04-22 00:38:28] Validation | Batch 480/1567 | Loss: 1.0582 [2026-04-22 00:38:29] Validation | Batch 490/1567 | Loss: 1.0559 [2026-04-22 00:38:30] Validation | Batch 500/1567 | Loss: 1.0570 [2026-04-22 00:38:32] Validation | Batch 510/1567 | Loss: 1.0568 [2026-04-22 00:38:32] Validation | Batch 520/1567 | Loss: 1.0581 [2026-04-22 00:38:34] Validation | Batch 530/1567 | Loss: 1.0566 [2026-04-22 00:38:35] Validation | Batch 540/1567 | Loss: 1.0539 [2026-04-22 00:38:37] Validation | Batch 550/1567 | Loss: 1.0550 [2026-04-22 00:38:38] Validation | Batch 560/1567 | Loss: 1.0542 [2026-04-22 00:38:39] Validation | Batch 570/1567 | Loss: 1.0501 [2026-04-22 00:38:40] Validation | Batch 580/1567 | Loss: 1.0519 [2026-04-22 00:38:42] Validation | Batch 590/1567 | Loss: 1.0517 [2026-04-22 00:38:43] Validation | Batch 600/1567 | Loss: 1.0506 [2026-04-22 00:38:44] Validation | Batch 610/1567 | Loss: 1.0527 [2026-04-22 00:38:45] Validation | Batch 620/1567 | Loss: 1.0506 [2026-04-22 00:38:47] Validation | Batch 630/1567 | Loss: 1.0508 [2026-04-22 00:38:48] Validation | Batch 640/1567 | Loss: 1.0515 [2026-04-22 00:38:50] Validation | Batch 650/1567 | Loss: 1.0543 [2026-04-22 00:38:51] Validation | Batch 660/1567 | Loss: 1.0556 [2026-04-22 00:38:52] Validation | Batch 670/1567 | Loss: 1.0539 [2026-04-22 00:38:53] Validation | Batch 680/1567 | Loss: 1.0527 [2026-04-22 00:38:54] Validation | Batch 690/1567 | Loss: 1.0512 [2026-04-22 00:38:55] Validation | Batch 700/1567 | Loss: 1.0513 [2026-04-22 00:38:57] Validation | Batch 710/1567 | Loss: 1.0505 [2026-04-22 00:38:57] Validation | Batch 720/1567 | Loss: 
1.0474 [2026-04-22 00:38:58] Validation | Batch 730/1567 | Loss: 1.0479 [2026-04-22 00:38:59] Validation | Batch 740/1567 | Loss: 1.0486 [2026-04-22 00:39:01] Validation | Batch 750/1567 | Loss: 1.0482 [2026-04-22 00:39:02] Validation | Batch 760/1567 | Loss: 1.0495 [2026-04-22 00:39:03] Validation | Batch 770/1567 | Loss: 1.0490 [2026-04-22 00:39:05] Validation | Batch 780/1567 | Loss: 1.0501 [2026-04-22 00:39:06] Validation | Batch 790/1567 | Loss: 1.0486 [2026-04-22 00:39:07] Validation | Batch 800/1567 | Loss: 1.0468 [2026-04-22 00:39:08] Validation | Batch 810/1567 | Loss: 1.0474 [2026-04-22 00:39:09] Validation | Batch 820/1567 | Loss: 1.0467 [2026-04-22 00:39:10] Validation | Batch 830/1567 | Loss: 1.0459 [2026-04-22 00:39:11] Validation | Batch 840/1567 | Loss: 1.0466 [2026-04-22 00:39:12] Validation | Batch 850/1567 | Loss: 1.0477 [2026-04-22 00:39:13] Validation | Batch 860/1567 | Loss: 1.0484 [2026-04-22 00:39:14] Validation | Batch 870/1567 | Loss: 1.0492 [2026-04-22 00:39:15] Validation | Batch 880/1567 | Loss: 1.0491 [2026-04-22 00:39:16] Validation | Batch 890/1567 | Loss: 1.0487 [2026-04-22 00:39:18] Validation | Batch 900/1567 | Loss: 1.0483 [2026-04-22 00:39:19] Validation | Batch 910/1567 | Loss: 1.0480 [2026-04-22 00:39:20] Validation | Batch 920/1567 | Loss: 1.0499 [2026-04-22 00:39:21] Validation | Batch 930/1567 | Loss: 1.0498 [2026-04-22 00:39:22] Validation | Batch 940/1567 | Loss: 1.0497 [2026-04-22 00:39:23] Validation | Batch 950/1567 | Loss: 1.0493 [2026-04-22 00:39:24] Validation | Batch 960/1567 | Loss: 1.0496 [2026-04-22 00:39:25] Validation | Batch 970/1567 | Loss: 1.0501 [2026-04-22 00:39:26] Validation | Batch 980/1567 | Loss: 1.0498 [2026-04-22 00:39:27] Validation | Batch 990/1567 | Loss: 1.0507 [2026-04-22 00:39:28] Validation | Batch 1000/1567 | Loss: 1.0511 [2026-04-22 00:39:29] Validation | Batch 1010/1567 | Loss: 1.0502 [2026-04-22 00:39:31] Validation | Batch 1020/1567 | Loss: 1.0514 [2026-04-22 00:39:32] Validation | 
Batch 1030/1567 | Loss: 1.0519 [2026-04-22 00:39:33] Validation | Batch 1040/1567 | Loss: 1.0510 [2026-04-22 00:39:34] Validation | Batch 1050/1567 | Loss: 1.0500 [2026-04-22 00:39:36] Validation | Batch 1060/1567 | Loss: 1.0512 [2026-04-22 00:39:37] Validation | Batch 1070/1567 | Loss: 1.0510 [2026-04-22 00:39:38] Validation | Batch 1080/1567 | Loss: 1.0523 [2026-04-22 00:39:39] Validation | Batch 1090/1567 | Loss: 1.0549 [2026-04-22 00:39:41] Validation | Batch 1100/1567 | Loss: 1.0565 [2026-04-22 00:39:42] Validation | Batch 1110/1567 | Loss: 1.0555 [2026-04-22 00:39:43] Validation | Batch 1120/1567 | Loss: 1.0556 [2026-04-22 00:39:44] Validation | Batch 1130/1567 | Loss: 1.0539 [2026-04-22 00:39:45] Validation | Batch 1140/1567 | Loss: 1.0543 [2026-04-22 00:39:47] Validation | Batch 1150/1567 | Loss: 1.0531 [2026-04-22 00:39:47] Validation | Batch 1160/1567 | Loss: 1.0524 [2026-04-22 00:39:48] Validation | Batch 1170/1567 | Loss: 1.0527 [2026-04-22 00:39:50] Validation | Batch 1180/1567 | Loss: 1.0529 [2026-04-22 00:39:51] Validation | Batch 1190/1567 | Loss: 1.0532 [2026-04-22 00:39:52] Validation | Batch 1200/1567 | Loss: 1.0519 [2026-04-22 00:39:53] Validation | Batch 1210/1567 | Loss: 1.0512 [2026-04-22 00:39:54] Validation | Batch 1220/1567 | Loss: 1.0521 [2026-04-22 00:39:56] Validation | Batch 1230/1567 | Loss: 1.0526 [2026-04-22 00:39:57] Validation | Batch 1240/1567 | Loss: 1.0525 [2026-04-22 00:39:58] Validation | Batch 1250/1567 | Loss: 1.0528 [2026-04-22 00:39:59] Validation | Batch 1260/1567 | Loss: 1.0526 [2026-04-22 00:40:01] Validation | Batch 1270/1567 | Loss: 1.0508 [2026-04-22 00:40:02] Validation | Batch 1280/1567 | Loss: 1.0510 [2026-04-22 00:40:04] Validation | Batch 1290/1567 | Loss: 1.0511 [2026-04-22 00:40:05] Validation | Batch 1300/1567 | Loss: 1.0515 [2026-04-22 00:40:06] Validation | Batch 1310/1567 | Loss: 1.0522 [2026-04-22 00:40:07] Validation | Batch 1320/1567 | Loss: 1.0528 [2026-04-22 00:40:08] Validation | Batch 1330/1567 | 
Loss: 1.0542 [2026-04-22 00:40:09] Validation | Batch 1340/1567 | Loss: 1.0539 [2026-04-22 00:40:10] Validation | Batch 1350/1567 | Loss: 1.0542 [2026-04-22 00:40:11] Validation | Batch 1360/1567 | Loss: 1.0533 [2026-04-22 00:40:12] Validation | Batch 1370/1567 | Loss: 1.0530 [2026-04-22 00:40:14] Validation | Batch 1380/1567 | Loss: 1.0530 [2026-04-22 00:40:15] Validation | Batch 1390/1567 | Loss: 1.0522 [2026-04-22 00:40:16] Validation | Batch 1400/1567 | Loss: 1.0519 [2026-04-22 00:40:17] Validation | Batch 1410/1567 | Loss: 1.0524 [2026-04-22 00:40:18] Validation | Batch 1420/1567 | Loss: 1.0524 [2026-04-22 00:40:19] Validation | Batch 1430/1567 | Loss: 1.0527 [2026-04-22 00:40:20] Validation | Batch 1440/1567 | Loss: 1.0535 [2026-04-22 00:40:21] Validation | Batch 1450/1567 | Loss: 1.0536 [2026-04-22 00:40:22] Validation | Batch 1460/1567 | Loss: 1.0530 [2026-04-22 00:40:23] Validation | Batch 1470/1567 | Loss: 1.0528 [2026-04-22 00:40:24] Validation | Batch 1480/1567 | Loss: 1.0525 [2026-04-22 00:40:25] Validation | Batch 1490/1567 | Loss: 1.0520 [2026-04-22 00:40:26] Validation | Batch 1500/1567 | Loss: 1.0518 [2026-04-22 00:40:27] Validation | Batch 1510/1567 | Loss: 1.0508 [2026-04-22 00:40:28] Validation | Batch 1520/1567 | Loss: 1.0508 [2026-04-22 00:40:29] Validation | Batch 1530/1567 | Loss: 1.0508 [2026-04-22 00:40:31] Validation | Batch 1540/1567 | Loss: 1.0514 [2026-04-22 00:40:32] Validation | Batch 1550/1567 | Loss: 1.0527 [2026-04-22 00:40:33] Validation | Batch 1560/1567 | Loss: 1.0523 [2026-04-22 00:40:34] Validation | Batch 1567/1567 | Loss: 1.0523 [2026-04-22 00:40:34] Validation | Loss: 1.0523 | PPL: 2.91 | Time: 184.72s [2026-04-22 00:40:52] New best model saved! 
Val loss: 1.0523 [2026-04-22 00:40:57] Epoch 2 | Step 20010 | Loss: 0.8954 | LR: 2.00e-06 [2026-04-22 00:41:02] Epoch 2 | Step 20020 | Loss: 0.8954 | LR: 2.00e-06 [2026-04-22 00:41:08] Epoch 2 | Step 20030 | Loss: 0.8954 | LR: 2.00e-06 [2026-04-22 00:41:13] Epoch 2 | Step 20040 | Loss: 0.8953 | LR: 2.00e-06 [2026-04-22 00:41:19] Epoch 2 | Step 20050 | Loss: 0.8953 | LR: 2.00e-06 [2026-04-22 00:41:23] Epoch 2 | Step 20060 | Loss: 0.8952 | LR: 2.00e-06 [2026-04-22 00:41:29] Epoch 2 | Step 20070 | Loss: 0.8952 | LR: 2.00e-06 [2026-04-22 00:41:34] Epoch 2 | Step 20080 | Loss: 0.8951 | LR: 2.00e-06 [2026-04-22 00:41:39] Epoch 2 | Step 20090 | Loss: 0.8951 | LR: 2.00e-06 [2026-04-22 00:41:44] Epoch 2 | Step 20100 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:41:50] Epoch 2 | Step 20110 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:41:56] Epoch 2 | Step 20120 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:42:01] Epoch 2 | Step 20130 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:42:06] Epoch 2 | Step 20140 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:42:11] Epoch 2 | Step 20150 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:42:17] Epoch 2 | Step 20160 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:42:22] Epoch 2 | Step 20170 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:42:28] Epoch 2 | Step 20180 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:42:34] Epoch 2 | Step 20190 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:42:39] Epoch 2 | Step 20200 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:42:45] Epoch 2 | Step 20210 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:42:51] Epoch 2 | Step 20220 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:42:56] Epoch 2 | Step 20230 | Loss: 0.8951 | LR: 2.00e-06 [2026-04-22 00:43:01] Epoch 2 | Step 20240 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:43:06] Epoch 2 | Step 20250 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:43:12] Epoch 2 | Step 20260 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:43:16] Epoch 2 | Step 20270 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 
00:43:22] Epoch 2 | Step 20280 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:43:26] Epoch 2 | Step 20290 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:43:32] Epoch 2 | Step 20300 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:43:38] Epoch 2 | Step 20310 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:43:43] Epoch 2 | Step 20320 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:43:49] Epoch 2 | Step 20330 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:43:54] Epoch 2 | Step 20340 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:43:59] Epoch 2 | Step 20350 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:44:05] Epoch 2 | Step 20360 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:44:12] Epoch 2 | Step 20370 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:44:17] Epoch 2 | Step 20380 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:44:22] Epoch 2 | Step 20390 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:44:27] Epoch 2 | Step 20400 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:44:31] Epoch 2 | Step 20410 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:44:37] Epoch 2 | Step 20420 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:44:41] Epoch 2 | Step 20430 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:44:46] Epoch 2 | Step 20440 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:44:52] Epoch 2 | Step 20450 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:44:57] Epoch 2 | Step 20460 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:45:03] Epoch 2 | Step 20470 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:45:08] Epoch 2 | Step 20480 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:45:13] Epoch 2 | Step 20490 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:45:18] Epoch 2 | Step 20500 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:45:23] Epoch 2 | Step 20510 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:45:28] Epoch 2 | Step 20520 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:45:33] Epoch 2 | Step 20530 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:45:37] Epoch 2 | Step 20540 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:45:42] Epoch 2 | Step 
20550 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:45:48] Epoch 2 | Step 20560 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:45:54] Epoch 2 | Step 20570 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:46:00] Epoch 2 | Step 20580 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:46:05] Epoch 2 | Step 20590 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:46:10] Epoch 2 | Step 20600 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:46:16] Epoch 2 | Step 20610 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:46:21] Epoch 2 | Step 20620 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:46:27] Epoch 2 | Step 20630 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:46:32] Epoch 2 | Step 20640 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:46:38] Epoch 2 | Step 20650 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:46:43] Epoch 2 | Step 20660 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:46:48] Epoch 2 | Step 20670 | Loss: 0.8950 | LR: 2.00e-06 [2026-04-22 00:46:53] Epoch 2 | Step 20680 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:46:58] Epoch 2 | Step 20690 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:47:04] Epoch 2 | Step 20700 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:47:09] Epoch 2 | Step 20710 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:47:14] Epoch 2 | Step 20720 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:47:19] Epoch 2 | Step 20730 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:47:24] Epoch 2 | Step 20740 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:47:29] Epoch 2 | Step 20750 | Loss: 0.8949 | LR: 2.00e-06 [2026-04-22 00:47:35] Epoch 2 | Step 20760 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:47:40] Epoch 2 | Step 20770 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:47:46] Epoch 2 | Step 20780 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:47:51] Epoch 2 | Step 20790 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:47:55] Epoch 2 | Step 20800 | Loss: 0.8948 | LR: 2.00e-06 [2026-04-22 00:48:01] Epoch 2 | Step 20810 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:48:06] Epoch 2 | Step 20820 | Loss: 0.8946 | LR: 
2.00e-06 [2026-04-22 00:48:12] Epoch 2 | Step 20830 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:48:17] Epoch 2 | Step 20840 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:48:22] Epoch 2 | Step 20850 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:48:27] Epoch 2 | Step 20860 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:48:32] Epoch 2 | Step 20870 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:48:38] Epoch 2 | Step 20880 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:48:44] Epoch 2 | Step 20890 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:48:48] Epoch 2 | Step 20900 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:48:53] Epoch 2 | Step 20910 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:48:58] Epoch 2 | Step 20920 | Loss: 0.8945 | LR: 2.00e-06 [2026-04-22 00:49:04] Epoch 2 | Step 20930 | Loss: 0.8944 | LR: 2.00e-06 [2026-04-22 00:49:09] Epoch 2 | Step 20940 | Loss: 0.8947 | LR: 2.00e-06 [2026-04-22 00:49:14] Epoch 2 | Step 20950 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:49:20] Epoch 2 | Step 20960 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:49:26] Epoch 2 | Step 20970 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:49:31] Epoch 2 | Step 20980 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:49:37] Epoch 2 | Step 20990 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:49:42] Epoch 2 | Step 21000 | Loss: 0.8946 | LR: 2.00e-06 [2026-04-22 00:49:53] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_21000.pt [2026-04-22 00:51:09] Validation | Batch 10/1567 | Loss: 1.0380 [2026-04-22 00:51:10] Validation | Batch 20/1567 | Loss: 1.1226 [2026-04-22 00:51:11] Validation | Batch 30/1567 | Loss: 1.0790 [2026-04-22 00:51:13] Validation | Batch 40/1567 | Loss: 1.1000 [2026-04-22 00:51:13] Validation | Batch 50/1567 | Loss: 1.0761 [2026-04-22 00:51:15] Validation | Batch 60/1567 | Loss: 1.0649 [2026-04-22 00:51:17] Validation | Batch 70/1567 | Loss: 1.0560 [2026-04-22 00:51:18] Validation | Batch 80/1567 | Loss: 1.0641 [2026-04-22 00:51:20] Validation | Batch 90/1567 | Loss: 
1.0588 [2026-04-22 00:51:21] Validation | Batch 100/1567 | Loss: 1.0379 [2026-04-22 00:51:22] Validation | Batch 110/1567 | Loss: 1.0282 [2026-04-22 00:51:23] Validation | Batch 120/1567 | Loss: 1.0228 [2026-04-22 00:51:25] Validation | Batch 130/1567 | Loss: 1.0176 [2026-04-22 00:51:26] Validation | Batch 140/1567 | Loss: 1.0279 [2026-04-22 00:51:27] Validation | Batch 150/1567 | Loss: 1.0387 [2026-04-22 00:51:28] Validation | Batch 160/1567 | Loss: 1.0377 [2026-04-22 00:51:29] Validation | Batch 170/1567 | Loss: 1.0299 [2026-04-22 00:51:30] Validation | Batch 180/1567 | Loss: 1.0327 [2026-04-22 00:51:31] Validation | Batch 190/1567 | Loss: 1.0371 [2026-04-22 00:51:32] Validation | Batch 200/1567 | Loss: 1.0406 [2026-04-22 00:51:34] Validation | Batch 210/1567 | Loss: 1.0387 [2026-04-22 00:51:35] Validation | Batch 220/1567 | Loss: 1.0440 [2026-04-22 00:51:36] Validation | Batch 230/1567 | Loss: 1.0478 [2026-04-22 00:51:38] Validation | Batch 240/1567 | Loss: 1.0497 [2026-04-22 00:51:39] Validation | Batch 250/1567 | Loss: 1.0539 [2026-04-22 00:51:40] Validation | Batch 260/1567 | Loss: 1.0568 [2026-04-22 00:51:41] Validation | Batch 270/1567 | Loss: 1.0613 [2026-04-22 00:51:43] Validation | Batch 280/1567 | Loss: 1.0647 [2026-04-22 00:51:44] Validation | Batch 290/1567 | Loss: 1.0600 [2026-04-22 00:51:46] Validation | Batch 300/1567 | Loss: 1.0593 [2026-04-22 00:51:47] Validation | Batch 310/1567 | Loss: 1.0560 [2026-04-22 00:51:48] Validation | Batch 320/1567 | Loss: 1.0588 [2026-04-22 00:51:49] Validation | Batch 330/1567 | Loss: 1.0589 [2026-04-22 00:51:51] Validation | Batch 340/1567 | Loss: 1.0582 [2026-04-22 00:51:52] Validation | Batch 350/1567 | Loss: 1.0557 [2026-04-22 00:51:53] Validation | Batch 360/1567 | Loss: 1.0496 [2026-04-22 00:51:54] Validation | Batch 370/1567 | Loss: 1.0496 [2026-04-22 00:51:55] Validation | Batch 380/1567 | Loss: 1.0538 [2026-04-22 00:51:57] Validation | Batch 390/1567 | Loss: 1.0528 [2026-04-22 00:51:58] Validation | Batch 
400/1567 | Loss: 1.0536 [2026-04-22 00:51:59] Validation | Batch 410/1567 | Loss: 1.0496 [2026-04-22 00:52:00] Validation | Batch 420/1567 | Loss: 1.0480 [2026-04-22 00:52:01] Validation | Batch 430/1567 | Loss: 1.0506 [2026-04-22 00:52:03] Validation | Batch 440/1567 | Loss: 1.0508 [2026-04-22 00:52:04] Validation | Batch 450/1567 | Loss: 1.0528 [2026-04-22 00:52:05] Validation | Batch 460/1567 | Loss: 1.0554 [2026-04-22 00:52:06] Validation | Batch 470/1567 | Loss: 1.0603 [2026-04-22 00:52:07] Validation | Batch 480/1567 | Loss: 1.0579 [2026-04-22 00:52:08] Validation | Batch 490/1567 | Loss: 1.0556 [2026-04-22 00:52:09] Validation | Batch 500/1567 | Loss: 1.0567 [2026-04-22 00:52:11] Validation | Batch 510/1567 | Loss: 1.0565 [2026-04-22 00:52:11] Validation | Batch 520/1567 | Loss: 1.0578 [2026-04-22 00:52:13] Validation | Batch 530/1567 | Loss: 1.0563 [2026-04-22 00:52:14] Validation | Batch 540/1567 | Loss: 1.0536 [2026-04-22 00:52:16] Validation | Batch 550/1567 | Loss: 1.0547 [2026-04-22 00:52:17] Validation | Batch 560/1567 | Loss: 1.0539 [2026-04-22 00:52:18] Validation | Batch 570/1567 | Loss: 1.0498 [2026-04-22 00:52:19] Validation | Batch 580/1567 | Loss: 1.0516 [2026-04-22 00:52:21] Validation | Batch 590/1567 | Loss: 1.0514 [2026-04-22 00:52:22] Validation | Batch 600/1567 | Loss: 1.0503 [2026-04-22 00:52:23] Validation | Batch 610/1567 | Loss: 1.0523 [2026-04-22 00:52:24] Validation | Batch 620/1567 | Loss: 1.0503 [2026-04-22 00:52:26] Validation | Batch 630/1567 | Loss: 1.0505 [2026-04-22 00:52:27] Validation | Batch 640/1567 | Loss: 1.0511 [2026-04-22 00:52:29] Validation | Batch 650/1567 | Loss: 1.0539 [2026-04-22 00:52:30] Validation | Batch 660/1567 | Loss: 1.0552 [2026-04-22 00:52:31] Validation | Batch 670/1567 | Loss: 1.0535 [2026-04-22 00:52:32] Validation | Batch 680/1567 | Loss: 1.0523 [2026-04-22 00:52:33] Validation | Batch 690/1567 | Loss: 1.0508 [2026-04-22 00:52:34] Validation | Batch 700/1567 | Loss: 1.0509 [2026-04-22 00:52:36] 
Validation | Batch 710/1567 | Loss: 1.0501 [2026-04-22 00:52:37] Validation | Batch 720/1567 | Loss: 1.0470 [2026-04-22 00:52:38] Validation | Batch 730/1567 | Loss: 1.0476 [2026-04-22 00:52:39] Validation | Batch 740/1567 | Loss: 1.0482 [2026-04-22 00:52:40] Validation | Batch 750/1567 | Loss: 1.0478 [2026-04-22 00:52:41] Validation | Batch 760/1567 | Loss: 1.0491 [2026-04-22 00:52:43] Validation | Batch 770/1567 | Loss: 1.0486 [2026-04-22 00:52:44] Validation | Batch 780/1567 | Loss: 1.0497 [2026-04-22 00:52:45] Validation | Batch 790/1567 | Loss: 1.0482 [2026-04-22 00:52:46] Validation | Batch 800/1567 | Loss: 1.0464 [2026-04-22 00:52:47] Validation | Batch 810/1567 | Loss: 1.0470 [2026-04-22 00:52:48] Validation | Batch 820/1567 | Loss: 1.0463 [2026-04-22 00:52:49] Validation | Batch 830/1567 | Loss: 1.0455 [2026-04-22 00:52:50] Validation | Batch 840/1567 | Loss: 1.0462 [2026-04-22 00:52:51] Validation | Batch 850/1567 | Loss: 1.0473 [2026-04-22 00:52:52] Validation | Batch 860/1567 | Loss: 1.0480 [2026-04-22 00:52:53] Validation | Batch 870/1567 | Loss: 1.0488 [2026-04-22 00:52:54] Validation | Batch 880/1567 | Loss: 1.0487 [2026-04-22 00:52:55] Validation | Batch 890/1567 | Loss: 1.0483 [2026-04-22 00:52:57] Validation | Batch 900/1567 | Loss: 1.0479 [2026-04-22 00:52:58] Validation | Batch 910/1567 | Loss: 1.0477 [2026-04-22 00:52:59] Validation | Batch 920/1567 | Loss: 1.0495 [2026-04-22 00:53:00] Validation | Batch 930/1567 | Loss: 1.0494 [2026-04-22 00:53:01] Validation | Batch 940/1567 | Loss: 1.0493 [2026-04-22 00:53:02] Validation | Batch 950/1567 | Loss: 1.0489 [2026-04-22 00:53:03] Validation | Batch 960/1567 | Loss: 1.0492 [2026-04-22 00:53:04] Validation | Batch 970/1567 | Loss: 1.0497 [2026-04-22 00:53:05] Validation | Batch 980/1567 | Loss: 1.0494 [2026-04-22 00:53:06] Validation | Batch 990/1567 | Loss: 1.0503 [2026-04-22 00:53:07] Validation | Batch 1000/1567 | Loss: 1.0507 [2026-04-22 00:53:09] Validation | Batch 1010/1567 | Loss: 1.0499 
[2026-04-22 00:53:10] Validation | Batch 1020/1567 | Loss: 1.0510 [2026-04-22 00:53:11] Validation | Batch 1030/1567 | Loss: 1.0515 [2026-04-22 00:53:12] Validation | Batch 1040/1567 | Loss: 1.0506 [2026-04-22 00:53:14] Validation | Batch 1050/1567 | Loss: 1.0496 [2026-04-22 00:53:15] Validation | Batch 1060/1567 | Loss: 1.0508 [2026-04-22 00:53:16] Validation | Batch 1070/1567 | Loss: 1.0506 [2026-04-22 00:53:17] Validation | Batch 1080/1567 | Loss: 1.0519 [2026-04-22 00:53:18] Validation | Batch 1090/1567 | Loss: 1.0545 [2026-04-22 00:53:20] Validation | Batch 1100/1567 | Loss: 1.0561 [2026-04-22 00:53:21] Validation | Batch 1110/1567 | Loss: 1.0551 [2026-04-22 00:53:22] Validation | Batch 1120/1567 | Loss: 1.0552 [2026-04-22 00:53:23] Validation | Batch 1130/1567 | Loss: 1.0535 [2026-04-22 00:53:24] Validation | Batch 1140/1567 | Loss: 1.0539 [2026-04-22 00:53:26] Validation | Batch 1150/1567 | Loss: 1.0526 [2026-04-22 00:53:26] Validation | Batch 1160/1567 | Loss: 1.0520 [2026-04-22 00:53:27] Validation | Batch 1170/1567 | Loss: 1.0523 [2026-04-22 00:53:29] Validation | Batch 1180/1567 | Loss: 1.0525 [2026-04-22 00:53:30] Validation | Batch 1190/1567 | Loss: 1.0528 [2026-04-22 00:53:31] Validation | Batch 1200/1567 | Loss: 1.0515 [2026-04-22 00:53:32] Validation | Batch 1210/1567 | Loss: 1.0508 [2026-04-22 00:53:33] Validation | Batch 1220/1567 | Loss: 1.0517 [2026-04-22 00:53:35] Validation | Batch 1230/1567 | Loss: 1.0522 [2026-04-22 00:53:36] Validation | Batch 1240/1567 | Loss: 1.0521 [2026-04-22 00:53:37] Validation | Batch 1250/1567 | Loss: 1.0524 [2026-04-22 00:53:38] Validation | Batch 1260/1567 | Loss: 1.0522 [2026-04-22 00:53:40] Validation | Batch 1270/1567 | Loss: 1.0504 [2026-04-22 00:53:41] Validation | Batch 1280/1567 | Loss: 1.0506 [2026-04-22 00:53:43] Validation | Batch 1290/1567 | Loss: 1.0507 [2026-04-22 00:53:44] Validation | Batch 1300/1567 | Loss: 1.0511 [2026-04-22 00:53:45] Validation | Batch 1310/1567 | Loss: 1.0518 [2026-04-22 
00:53:46] Validation | Batch 1320/1567 | Loss: 1.0524
[2026-04-22 00:53:47] Validation | Batch 1330/1567 | Loss: 1.0539
[2026-04-22 00:53:48] Validation | Batch 1340/1567 | Loss: 1.0535
[2026-04-22 00:53:49] Validation | Batch 1350/1567 | Loss: 1.0538
[2026-04-22 00:53:50] Validation | Batch 1360/1567 | Loss: 1.0529
[2026-04-22 00:53:52] Validation | Batch 1370/1567 | Loss: 1.0526
[2026-04-22 00:53:53] Validation | Batch 1380/1567 | Loss: 1.0526
[2026-04-22 00:53:54] Validation | Batch 1390/1567 | Loss: 1.0519
[2026-04-22 00:53:55] Validation | Batch 1400/1567 | Loss: 1.0515
[2026-04-22 00:53:56] Validation | Batch 1410/1567 | Loss: 1.0521
[2026-04-22 00:53:57] Validation | Batch 1420/1567 | Loss: 1.0521
[2026-04-22 00:53:58] Validation | Batch 1430/1567 | Loss: 1.0524
[2026-04-22 00:53:59] Validation | Batch 1440/1567 | Loss: 1.0531
[2026-04-22 00:54:00] Validation | Batch 1450/1567 | Loss: 1.0533
[2026-04-22 00:54:01] Validation | Batch 1460/1567 | Loss: 1.0526
[2026-04-22 00:54:02] Validation | Batch 1470/1567 | Loss: 1.0525
[2026-04-22 00:54:03] Validation | Batch 1480/1567 | Loss: 1.0522
[2026-04-22 00:54:04] Validation | Batch 1490/1567 | Loss: 1.0517
[2026-04-22 00:54:06] Validation | Batch 1500/1567 | Loss: 1.0514
[2026-04-22 00:54:07] Validation | Batch 1510/1567 | Loss: 1.0505
[2026-04-22 00:54:07] Validation | Batch 1520/1567 | Loss: 1.0504
[2026-04-22 00:54:08] Validation | Batch 1530/1567 | Loss: 1.0504
[2026-04-22 00:54:10] Validation | Batch 1540/1567 | Loss: 1.0510
[2026-04-22 00:54:11] Validation | Batch 1550/1567 | Loss: 1.0523
[2026-04-22 00:54:12] Validation | Batch 1560/1567 | Loss: 1.0519
[2026-04-22 00:54:13] Validation | Batch 1567/1567 | Loss: 1.0520
[2026-04-22 00:54:13] Validation | Loss: 1.0520 | PPL: 2.91 | Time: 185.39s
[2026-04-22 00:54:31] New best model saved! Val loss: 1.0520
[2026-04-22 00:54:36] Epoch 2 | Step 21010 | Loss: 0.8946 | LR: 2.00e-06
[2026-04-22 00:54:41] Epoch 2 | Step 21020 | Loss: 0.8945 | LR: 2.00e-06
[2026-04-22 00:54:46] Epoch 2 | Step 21030 | Loss: 0.8945 | LR: 2.00e-06
[2026-04-22 00:54:52] Epoch 2 | Step 21040 | Loss: 0.8945 | LR: 2.00e-06
[2026-04-22 00:54:57] Epoch 2 | Step 21050 | Loss: 0.8944 | LR: 2.00e-06
[2026-04-22 00:55:01] Epoch 2 | Step 21060 | Loss: 0.8944 | LR: 2.00e-06
[2026-04-22 00:55:06] Epoch 2 | Step 21070 | Loss: 0.8944 | LR: 2.00e-06
[2026-04-22 00:55:11] Epoch 2 | Step 21080 | Loss: 0.8945 | LR: 2.00e-06
[2026-04-22 00:55:16] Epoch 2 | Step 21090 | Loss: 0.8944 | LR: 2.00e-06
[2026-04-22 00:55:21] Epoch 2 | Step 21100 | Loss: 0.8945 | LR: 2.00e-06
[2026-04-22 00:55:26] Epoch 2 | Step 21110 | Loss: 0.8945 | LR: 2.00e-06
[2026-04-22 00:55:32] Epoch 2 | Step 21120 | Loss: 0.8946 | LR: 2.00e-06
[2026-04-22 00:55:37] Epoch 2 | Step 21130 | Loss: 0.8945 | LR: 2.00e-06
[2026-04-22 00:55:42] Epoch 2 | Step 21140 | Loss: 0.8944 | LR: 2.00e-06
[2026-04-22 00:55:47] Epoch 2 | Step 21150 | Loss: 0.8944 | LR: 2.00e-06
[2026-04-22 00:55:52] Epoch 2 | Step 21160 | Loss: 0.8944 | LR: 2.00e-06
[2026-04-22 00:55:57] Epoch 2 | Step 21170 | Loss: 0.8944 | LR: 2.00e-06
[2026-04-22 00:56:02] Epoch 2 | Step 21180 | Loss: 0.8945 | LR: 2.00e-06
[2026-04-22 00:56:04] Epoch 2 completed in 8135.12s | Loss: 0.8944
[2026-04-22 00:56:14] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_21182.pt
[2026-04-22 00:57:27] ============================================================
[2026-04-22 00:57:27] EPOCH 3/3
[2026-04-22 00:57:27] ============================================================
[2026-04-22 00:57:31] Epoch 3 | Step 21190 | Loss: 0.7798 | LR: 2.00e-06
[2026-04-22 00:57:36] Epoch 3 | Step 21200 | Loss: 0.8079 | LR: 2.00e-06
[2026-04-22 00:57:42] Epoch 3 | Step 21210 | Loss: 0.8052 | LR: 2.00e-06
[2026-04-22 00:57:47] Epoch 3 | Step 21220 | Loss: 0.8094 | LR: 2.00e-06
[2026-04-22 00:57:52] Epoch 3 | Step 21230 | Loss: 0.7976 | LR: 2.00e-06 [2026-04-22 00:57:58] Epoch 3 | Step 21240 | Loss: 0.8309 | LR: 2.00e-06 [2026-04-22 00:58:03] Epoch 3 | Step 21250 | Loss: 0.8211 | LR: 2.00e-06 [2026-04-22 00:58:08] Epoch 3 | Step 21260 | Loss: 0.8296 | LR: 2.00e-06 [2026-04-22 00:58:13] Epoch 3 | Step 21270 | Loss: 0.8501 | LR: 2.00e-06 [2026-04-22 00:58:19] Epoch 3 | Step 21280 | Loss: 0.8385 | LR: 2.00e-06 [2026-04-22 00:58:24] Epoch 3 | Step 21290 | Loss: 0.8373 | LR: 2.00e-06 [2026-04-22 00:58:29] Epoch 3 | Step 21300 | Loss: 0.8340 | LR: 2.00e-06 [2026-04-22 00:58:34] Epoch 3 | Step 21310 | Loss: 0.8300 | LR: 2.00e-06 [2026-04-22 00:58:40] Epoch 3 | Step 21320 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 00:58:45] Epoch 3 | Step 21330 | Loss: 0.8269 | LR: 2.00e-06 [2026-04-22 00:58:51] Epoch 3 | Step 21340 | Loss: 0.8264 | LR: 2.00e-06 [2026-04-22 00:58:56] Epoch 3 | Step 21350 | Loss: 0.8267 | LR: 2.00e-06 [2026-04-22 00:59:02] Epoch 3 | Step 21360 | Loss: 0.8275 | LR: 2.00e-06 [2026-04-22 00:59:07] Epoch 3 | Step 21370 | Loss: 0.8305 | LR: 2.00e-06 [2026-04-22 00:59:12] Epoch 3 | Step 21380 | Loss: 0.8312 | LR: 2.00e-06 [2026-04-22 00:59:18] Epoch 3 | Step 21390 | Loss: 0.8333 | LR: 2.00e-06 [2026-04-22 00:59:23] Epoch 3 | Step 21400 | Loss: 0.8313 | LR: 2.00e-06 [2026-04-22 00:59:28] Epoch 3 | Step 21410 | Loss: 0.8312 | LR: 2.00e-06 [2026-04-22 00:59:33] Epoch 3 | Step 21420 | Loss: 0.8328 | LR: 2.00e-06 [2026-04-22 00:59:39] Epoch 3 | Step 21430 | Loss: 0.8310 | LR: 2.00e-06 [2026-04-22 00:59:45] Epoch 3 | Step 21440 | Loss: 0.8336 | LR: 2.00e-06 [2026-04-22 00:59:50] Epoch 3 | Step 21450 | Loss: 0.8306 | LR: 2.00e-06 [2026-04-22 00:59:55] Epoch 3 | Step 21460 | Loss: 0.8290 | LR: 2.00e-06 [2026-04-22 01:00:00] Epoch 3 | Step 21470 | Loss: 0.8295 | LR: 2.00e-06 [2026-04-22 01:00:05] Epoch 3 | Step 21480 | Loss: 0.8268 | LR: 2.00e-06 [2026-04-22 01:00:10] Epoch 3 | Step 21490 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:00:14] Epoch 
3 | Step 21500 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 01:00:20] Epoch 3 | Step 21510 | Loss: 0.8287 | LR: 2.00e-06 [2026-04-22 01:00:25] Epoch 3 | Step 21520 | Loss: 0.8273 | LR: 2.00e-06 [2026-04-22 01:00:30] Epoch 3 | Step 21530 | Loss: 0.8253 | LR: 2.00e-06 [2026-04-22 01:00:36] Epoch 3 | Step 21540 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 01:00:42] Epoch 3 | Step 21550 | Loss: 0.8256 | LR: 2.00e-06 [2026-04-22 01:00:47] Epoch 3 | Step 21560 | Loss: 0.8256 | LR: 2.00e-06 [2026-04-22 01:00:52] Epoch 3 | Step 21570 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:00:57] Epoch 3 | Step 21580 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:01:02] Epoch 3 | Step 21590 | Loss: 0.8246 | LR: 2.00e-06 [2026-04-22 01:01:07] Epoch 3 | Step 21600 | Loss: 0.8235 | LR: 2.00e-06 [2026-04-22 01:01:13] Epoch 3 | Step 21610 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:01:18] Epoch 3 | Step 21620 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 01:01:23] Epoch 3 | Step 21630 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 01:01:29] Epoch 3 | Step 21640 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:01:34] Epoch 3 | Step 21650 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:01:39] Epoch 3 | Step 21660 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 01:01:44] Epoch 3 | Step 21670 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:01:50] Epoch 3 | Step 21680 | Loss: 0.8235 | LR: 2.00e-06 [2026-04-22 01:01:56] Epoch 3 | Step 21690 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 01:02:01] Epoch 3 | Step 21700 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:02:06] Epoch 3 | Step 21710 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:02:11] Epoch 3 | Step 21720 | Loss: 0.8210 | LR: 2.00e-06 [2026-04-22 01:02:16] Epoch 3 | Step 21730 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 01:02:22] Epoch 3 | Step 21740 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:02:28] Epoch 3 | Step 21750 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:02:33] Epoch 3 | Step 21760 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:02:38] Epoch 3 | Step 21770 | Loss: 
0.8209 | LR: 2.00e-06 [2026-04-22 01:02:43] Epoch 3 | Step 21780 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 01:02:49] Epoch 3 | Step 21790 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 01:02:55] Epoch 3 | Step 21800 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:03:00] Epoch 3 | Step 21810 | Loss: 0.8233 | LR: 2.00e-06 [2026-04-22 01:03:05] Epoch 3 | Step 21820 | Loss: 0.8223 | LR: 2.00e-06 [2026-04-22 01:03:11] Epoch 3 | Step 21830 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:03:16] Epoch 3 | Step 21840 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:03:21] Epoch 3 | Step 21850 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 01:03:26] Epoch 3 | Step 21860 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:03:32] Epoch 3 | Step 21870 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 01:03:37] Epoch 3 | Step 21880 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 01:03:43] Epoch 3 | Step 21890 | Loss: 0.8265 | LR: 2.00e-06 [2026-04-22 01:03:48] Epoch 3 | Step 21900 | Loss: 0.8249 | LR: 2.00e-06 [2026-04-22 01:03:54] Epoch 3 | Step 21910 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:03:59] Epoch 3 | Step 21920 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 01:04:05] Epoch 3 | Step 21930 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:04:11] Epoch 3 | Step 21940 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:04:17] Epoch 3 | Step 21950 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:04:22] Epoch 3 | Step 21960 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:04:27] Epoch 3 | Step 21970 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:04:33] Epoch 3 | Step 21980 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 01:04:38] Epoch 3 | Step 21990 | Loss: 0.8245 | LR: 2.00e-06 [2026-04-22 01:04:42] Epoch 3 | Step 22000 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 01:04:44] Validation | Batch 10/1567 | Loss: 1.0434 [2026-04-22 01:04:45] Validation | Batch 20/1567 | Loss: 1.1299 [2026-04-22 01:04:46] Validation | Batch 30/1567 | Loss: 1.0858 [2026-04-22 01:04:48] Validation | Batch 40/1567 | Loss: 1.1068 [2026-04-22 01:04:49] Validation | Batch 
50/1567 | Loss: 1.0824 [2026-04-22 01:04:50] Validation | Batch 60/1567 | Loss: 1.0709 [2026-04-22 01:04:51] Validation | Batch 70/1567 | Loss: 1.0624 [2026-04-22 01:04:53] Validation | Batch 80/1567 | Loss: 1.0709 [2026-04-22 01:04:54] Validation | Batch 90/1567 | Loss: 1.0654 [2026-04-22 01:04:55] Validation | Batch 100/1567 | Loss: 1.0442 [2026-04-22 01:04:56] Validation | Batch 110/1567 | Loss: 1.0341 [2026-04-22 01:04:58] Validation | Batch 120/1567 | Loss: 1.0288 [2026-04-22 01:04:59] Validation | Batch 130/1567 | Loss: 1.0234 [2026-04-22 01:05:00] Validation | Batch 140/1567 | Loss: 1.0339 [2026-04-22 01:05:01] Validation | Batch 150/1567 | Loss: 1.0447 [2026-04-22 01:05:02] Validation | Batch 160/1567 | Loss: 1.0436 [2026-04-22 01:05:03] Validation | Batch 170/1567 | Loss: 1.0358 [2026-04-22 01:05:04] Validation | Batch 180/1567 | Loss: 1.0388 [2026-04-22 01:05:05] Validation | Batch 190/1567 | Loss: 1.0432 [2026-04-22 01:05:07] Validation | Batch 200/1567 | Loss: 1.0466 [2026-04-22 01:05:08] Validation | Batch 210/1567 | Loss: 1.0446 [2026-04-22 01:05:09] Validation | Batch 220/1567 | Loss: 1.0499 [2026-04-22 01:05:11] Validation | Batch 230/1567 | Loss: 1.0538 [2026-04-22 01:05:12] Validation | Batch 240/1567 | Loss: 1.0559 [2026-04-22 01:05:13] Validation | Batch 250/1567 | Loss: 1.0601 [2026-04-22 01:05:14] Validation | Batch 260/1567 | Loss: 1.0630 [2026-04-22 01:05:15] Validation | Batch 270/1567 | Loss: 1.0675 [2026-04-22 01:05:17] Validation | Batch 280/1567 | Loss: 1.0709 [2026-04-22 01:05:19] Validation | Batch 290/1567 | Loss: 1.0662 [2026-04-22 01:05:20] Validation | Batch 300/1567 | Loss: 1.0655 [2026-04-22 01:05:21] Validation | Batch 310/1567 | Loss: 1.0622 [2026-04-22 01:05:22] Validation | Batch 320/1567 | Loss: 1.0651 [2026-04-22 01:05:24] Validation | Batch 330/1567 | Loss: 1.0652 [2026-04-22 01:05:25] Validation | Batch 340/1567 | Loss: 1.0645 [2026-04-22 01:05:26] Validation | Batch 350/1567 | Loss: 1.0619 [2026-04-22 01:05:27] 
Validation | Batch 360/1567 | Loss: 1.0557 [2026-04-22 01:05:29] Validation | Batch 370/1567 | Loss: 1.0557 [2026-04-22 01:05:30] Validation | Batch 380/1567 | Loss: 1.0600 [2026-04-22 01:05:31] Validation | Batch 390/1567 | Loss: 1.0590 [2026-04-22 01:05:32] Validation | Batch 400/1567 | Loss: 1.0598 [2026-04-22 01:05:34] Validation | Batch 410/1567 | Loss: 1.0558 [2026-04-22 01:05:35] Validation | Batch 420/1567 | Loss: 1.0541 [2026-04-22 01:05:36] Validation | Batch 430/1567 | Loss: 1.0567 [2026-04-22 01:05:37] Validation | Batch 440/1567 | Loss: 1.0569 [2026-04-22 01:05:38] Validation | Batch 450/1567 | Loss: 1.0589 [2026-04-22 01:05:39] Validation | Batch 460/1567 | Loss: 1.0615 [2026-04-22 01:05:40] Validation | Batch 470/1567 | Loss: 1.0664 [2026-04-22 01:05:42] Validation | Batch 480/1567 | Loss: 1.0640 [2026-04-22 01:05:43] Validation | Batch 490/1567 | Loss: 1.0617 [2026-04-22 01:05:44] Validation | Batch 500/1567 | Loss: 1.0628 [2026-04-22 01:05:45] Validation | Batch 510/1567 | Loss: 1.0626 [2026-04-22 01:05:46] Validation | Batch 520/1567 | Loss: 1.0640 [2026-04-22 01:05:47] Validation | Batch 530/1567 | Loss: 1.0625 [2026-04-22 01:05:49] Validation | Batch 540/1567 | Loss: 1.0597 [2026-04-22 01:05:50] Validation | Batch 550/1567 | Loss: 1.0609 [2026-04-22 01:05:51] Validation | Batch 560/1567 | Loss: 1.0600 [2026-04-22 01:05:52] Validation | Batch 570/1567 | Loss: 1.0558 [2026-04-22 01:05:54] Validation | Batch 580/1567 | Loss: 1.0577 [2026-04-22 01:05:55] Validation | Batch 590/1567 | Loss: 1.0575 [2026-04-22 01:05:56] Validation | Batch 600/1567 | Loss: 1.0564 [2026-04-22 01:05:57] Validation | Batch 610/1567 | Loss: 1.0585 [2026-04-22 01:05:59] Validation | Batch 620/1567 | Loss: 1.0564 [2026-04-22 01:06:00] Validation | Batch 630/1567 | Loss: 1.0567 [2026-04-22 01:06:02] Validation | Batch 640/1567 | Loss: 1.0573 [2026-04-22 01:06:03] Validation | Batch 650/1567 | Loss: 1.0602 [2026-04-22 01:06:04] Validation | Batch 660/1567 | Loss: 1.0615 
[2026-04-22 01:06:05] Validation | Batch 670/1567 | Loss: 1.0597 [2026-04-22 01:06:06] Validation | Batch 680/1567 | Loss: 1.0585 [2026-04-22 01:06:07] Validation | Batch 690/1567 | Loss: 1.0570 [2026-04-22 01:06:09] Validation | Batch 700/1567 | Loss: 1.0571 [2026-04-22 01:06:10] Validation | Batch 710/1567 | Loss: 1.0563 [2026-04-22 01:06:11] Validation | Batch 720/1567 | Loss: 1.0532 [2026-04-22 01:06:12] Validation | Batch 730/1567 | Loss: 1.0538 [2026-04-22 01:06:13] Validation | Batch 740/1567 | Loss: 1.0544 [2026-04-22 01:06:14] Validation | Batch 750/1567 | Loss: 1.0540 [2026-04-22 01:06:15] Validation | Batch 760/1567 | Loss: 1.0553 [2026-04-22 01:06:17] Validation | Batch 770/1567 | Loss: 1.0548 [2026-04-22 01:06:18] Validation | Batch 780/1567 | Loss: 1.0559 [2026-04-22 01:06:19] Validation | Batch 790/1567 | Loss: 1.0544 [2026-04-22 01:06:20] Validation | Batch 800/1567 | Loss: 1.0526 [2026-04-22 01:06:21] Validation | Batch 810/1567 | Loss: 1.0533 [2026-04-22 01:06:22] Validation | Batch 820/1567 | Loss: 1.0525 [2026-04-22 01:06:23] Validation | Batch 830/1567 | Loss: 1.0517 [2026-04-22 01:06:24] Validation | Batch 840/1567 | Loss: 1.0524 [2026-04-22 01:06:25] Validation | Batch 850/1567 | Loss: 1.0536 [2026-04-22 01:06:26] Validation | Batch 860/1567 | Loss: 1.0543 [2026-04-22 01:06:27] Validation | Batch 870/1567 | Loss: 1.0551 [2026-04-22 01:06:28] Validation | Batch 880/1567 | Loss: 1.0550 [2026-04-22 01:06:30] Validation | Batch 890/1567 | Loss: 1.0545 [2026-04-22 01:06:31] Validation | Batch 900/1567 | Loss: 1.0542 [2026-04-22 01:06:32] Validation | Batch 910/1567 | Loss: 1.0539 [2026-04-22 01:06:33] Validation | Batch 920/1567 | Loss: 1.0558 [2026-04-22 01:06:34] Validation | Batch 930/1567 | Loss: 1.0557 [2026-04-22 01:06:35] Validation | Batch 940/1567 | Loss: 1.0556 [2026-04-22 01:06:37] Validation | Batch 950/1567 | Loss: 1.0552 [2026-04-22 01:06:38] Validation | Batch 960/1567 | Loss: 1.0555 [2026-04-22 01:06:39] Validation | Batch 970/1567 
| Loss: 1.0561 [2026-04-22 01:06:40] Validation | Batch 980/1567 | Loss: 1.0557 [2026-04-22 01:06:40] Validation | Batch 990/1567 | Loss: 1.0566 [2026-04-22 01:06:42] Validation | Batch 1000/1567 | Loss: 1.0571 [2026-04-22 01:06:43] Validation | Batch 1010/1567 | Loss: 1.0562 [2026-04-22 01:06:44] Validation | Batch 1020/1567 | Loss: 1.0574 [2026-04-22 01:06:45] Validation | Batch 1030/1567 | Loss: 1.0578 [2026-04-22 01:06:47] Validation | Batch 1040/1567 | Loss: 1.0570 [2026-04-22 01:06:48] Validation | Batch 1050/1567 | Loss: 1.0559 [2026-04-22 01:06:49] Validation | Batch 1060/1567 | Loss: 1.0571 [2026-04-22 01:06:51] Validation | Batch 1070/1567 | Loss: 1.0570 [2026-04-22 01:06:52] Validation | Batch 1080/1567 | Loss: 1.0583 [2026-04-22 01:06:53] Validation | Batch 1090/1567 | Loss: 1.0609 [2026-04-22 01:06:54] Validation | Batch 1100/1567 | Loss: 1.0625 [2026-04-22 01:06:55] Validation | Batch 1110/1567 | Loss: 1.0615 [2026-04-22 01:06:56] Validation | Batch 1120/1567 | Loss: 1.0616 [2026-04-22 01:06:57] Validation | Batch 1130/1567 | Loss: 1.0599 [2026-04-22 01:06:59] Validation | Batch 1140/1567 | Loss: 1.0603 [2026-04-22 01:07:00] Validation | Batch 1150/1567 | Loss: 1.0590 [2026-04-22 01:07:01] Validation | Batch 1160/1567 | Loss: 1.0584 [2026-04-22 01:07:02] Validation | Batch 1170/1567 | Loss: 1.0587 [2026-04-22 01:07:03] Validation | Batch 1180/1567 | Loss: 1.0589 [2026-04-22 01:07:04] Validation | Batch 1190/1567 | Loss: 1.0592 [2026-04-22 01:07:06] Validation | Batch 1200/1567 | Loss: 1.0579 [2026-04-22 01:07:07] Validation | Batch 1210/1567 | Loss: 1.0573 [2026-04-22 01:07:08] Validation | Batch 1220/1567 | Loss: 1.0582 [2026-04-22 01:07:09] Validation | Batch 1230/1567 | Loss: 1.0587 [2026-04-22 01:07:10] Validation | Batch 1240/1567 | Loss: 1.0585 [2026-04-22 01:07:11] Validation | Batch 1250/1567 | Loss: 1.0589 [2026-04-22 01:07:13] Validation | Batch 1260/1567 | Loss: 1.0586 [2026-04-22 01:07:14] Validation | Batch 1270/1567 | Loss: 1.0569 
[2026-04-22 01:07:15] Validation | Batch 1280/1567 | Loss: 1.0570 [2026-04-22 01:07:17] Validation | Batch 1290/1567 | Loss: 1.0571 [2026-04-22 01:07:18] Validation | Batch 1300/1567 | Loss: 1.0575 [2026-04-22 01:07:19] Validation | Batch 1310/1567 | Loss: 1.0582 [2026-04-22 01:07:20] Validation | Batch 1320/1567 | Loss: 1.0588 [2026-04-22 01:07:21] Validation | Batch 1330/1567 | Loss: 1.0603 [2026-04-22 01:07:23] Validation | Batch 1340/1567 | Loss: 1.0599 [2026-04-22 01:07:23] Validation | Batch 1350/1567 | Loss: 1.0603 [2026-04-22 01:07:25] Validation | Batch 1360/1567 | Loss: 1.0593 [2026-04-22 01:07:26] Validation | Batch 1370/1567 | Loss: 1.0590 [2026-04-22 01:07:27] Validation | Batch 1380/1567 | Loss: 1.0590 [2026-04-22 01:07:28] Validation | Batch 1390/1567 | Loss: 1.0582 [2026-04-22 01:07:29] Validation | Batch 1400/1567 | Loss: 1.0579 [2026-04-22 01:07:30] Validation | Batch 1410/1567 | Loss: 1.0584 [2026-04-22 01:07:31] Validation | Batch 1420/1567 | Loss: 1.0584 [2026-04-22 01:07:32] Validation | Batch 1430/1567 | Loss: 1.0587 [2026-04-22 01:07:34] Validation | Batch 1440/1567 | Loss: 1.0595 [2026-04-22 01:07:35] Validation | Batch 1450/1567 | Loss: 1.0597 [2026-04-22 01:07:36] Validation | Batch 1460/1567 | Loss: 1.0590 [2026-04-22 01:07:37] Validation | Batch 1470/1567 | Loss: 1.0588 [2026-04-22 01:07:38] Validation | Batch 1480/1567 | Loss: 1.0586 [2026-04-22 01:07:38] Validation | Batch 1490/1567 | Loss: 1.0581 [2026-04-22 01:07:40] Validation | Batch 1500/1567 | Loss: 1.0578 [2026-04-22 01:07:41] Validation | Batch 1510/1567 | Loss: 1.0569 [2026-04-22 01:07:42] Validation | Batch 1520/1567 | Loss: 1.0568 [2026-04-22 01:07:43] Validation | Batch 1530/1567 | Loss: 1.0568 [2026-04-22 01:07:44] Validation | Batch 1540/1567 | Loss: 1.0574 [2026-04-22 01:07:45] Validation | Batch 1550/1567 | Loss: 1.0587 [2026-04-22 01:07:46] Validation | Batch 1560/1567 | Loss: 1.0583 [2026-04-22 01:07:47] Validation | Batch 1567/1567 | Loss: 1.0584 [2026-04-22 
01:07:47] Validation | Loss: 1.0584 | PPL: 2.93 | Time: 184.62s [2026-04-22 01:07:52] Epoch 3 | Step 22010 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 01:07:57] Epoch 3 | Step 22020 | Loss: 0.8243 | LR: 2.00e-06 [2026-04-22 01:08:02] Epoch 3 | Step 22030 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 01:08:07] Epoch 3 | Step 22040 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 01:08:13] Epoch 3 | Step 22050 | Loss: 0.8233 | LR: 2.00e-06 [2026-04-22 01:08:19] Epoch 3 | Step 22060 | Loss: 0.8245 | LR: 2.00e-06 [2026-04-22 01:08:24] Epoch 3 | Step 22070 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:08:30] Epoch 3 | Step 22080 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 01:08:35] Epoch 3 | Step 22090 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:08:41] Epoch 3 | Step 22100 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 01:08:46] Epoch 3 | Step 22110 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 01:08:51] Epoch 3 | Step 22120 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:08:56] Epoch 3 | Step 22130 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 01:09:02] Epoch 3 | Step 22140 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 01:09:07] Epoch 3 | Step 22150 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 01:09:12] Epoch 3 | Step 22160 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:09:17] Epoch 3 | Step 22170 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 01:09:23] Epoch 3 | Step 22180 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 01:09:28] Epoch 3 | Step 22190 | Loss: 0.8209 | LR: 2.00e-06 [2026-04-22 01:09:34] Epoch 3 | Step 22200 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 01:09:38] Epoch 3 | Step 22210 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 01:09:44] Epoch 3 | Step 22220 | Loss: 0.8204 | LR: 2.00e-06 [2026-04-22 01:09:49] Epoch 3 | Step 22230 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 01:09:54] Epoch 3 | Step 22240 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 01:10:00] Epoch 3 | Step 22250 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 01:10:04] Epoch 3 | Step 22260 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 01:10:09] Epoch 3 | Step 
22270 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 01:10:15] Epoch 3 | Step 22280 | Loss: 0.8188 | LR: 2.00e-06 [2026-04-22 01:10:20] Epoch 3 | Step 22290 | Loss: 0.8184 | LR: 2.00e-06 [2026-04-22 01:10:26] Epoch 3 | Step 22300 | Loss: 0.8183 | LR: 2.00e-06 [2026-04-22 01:10:30] Epoch 3 | Step 22310 | Loss: 0.8183 | LR: 2.00e-06 [2026-04-22 01:10:35] Epoch 3 | Step 22320 | Loss: 0.8178 | LR: 2.00e-06 [2026-04-22 01:10:40] Epoch 3 | Step 22330 | Loss: 0.8173 | LR: 2.00e-06 [2026-04-22 01:10:45] Epoch 3 | Step 22340 | Loss: 0.8174 | LR: 2.00e-06 [2026-04-22 01:10:50] Epoch 3 | Step 22350 | Loss: 0.8178 | LR: 2.00e-06 [2026-04-22 01:10:55] Epoch 3 | Step 22360 | Loss: 0.8176 | LR: 2.00e-06 [2026-04-22 01:11:00] Epoch 3 | Step 22370 | Loss: 0.8168 | LR: 2.00e-06 [2026-04-22 01:11:05] Epoch 3 | Step 22380 | Loss: 0.8177 | LR: 2.00e-06 [2026-04-22 01:11:10] Epoch 3 | Step 22390 | Loss: 0.8178 | LR: 2.00e-06 [2026-04-22 01:11:16] Epoch 3 | Step 22400 | Loss: 0.8178 | LR: 2.00e-06 [2026-04-22 01:11:21] Epoch 3 | Step 22410 | Loss: 0.8174 | LR: 2.00e-06 [2026-04-22 01:11:26] Epoch 3 | Step 22420 | Loss: 0.8181 | LR: 2.00e-06 [2026-04-22 01:11:32] Epoch 3 | Step 22430 | Loss: 0.8187 | LR: 2.00e-06 [2026-04-22 01:11:37] Epoch 3 | Step 22440 | Loss: 0.8183 | LR: 2.00e-06 [2026-04-22 01:11:41] Epoch 3 | Step 22450 | Loss: 0.8180 | LR: 2.00e-06 [2026-04-22 01:11:47] Epoch 3 | Step 22460 | Loss: 0.8178 | LR: 2.00e-06 [2026-04-22 01:11:52] Epoch 3 | Step 22470 | Loss: 0.8175 | LR: 2.00e-06 [2026-04-22 01:11:57] Epoch 3 | Step 22480 | Loss: 0.8186 | LR: 2.00e-06 [2026-04-22 01:12:03] Epoch 3 | Step 22490 | Loss: 0.8188 | LR: 2.00e-06 [2026-04-22 01:12:07] Epoch 3 | Step 22500 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 01:12:12] Epoch 3 | Step 22510 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 01:12:18] Epoch 3 | Step 22520 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 01:12:23] Epoch 3 | Step 22530 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 01:12:28] Epoch 3 | Step 22540 | Loss: 0.8203 | LR: 
2.00e-06 [2026-04-22 01:12:33] Epoch 3 | Step 22550 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 01:12:38] Epoch 3 | Step 22560 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 01:12:43] Epoch 3 | Step 22570 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 01:12:48] Epoch 3 | Step 22580 | Loss: 0.8187 | LR: 2.00e-06 [2026-04-22 01:12:55] Epoch 3 | Step 22590 | Loss: 0.8190 | LR: 2.00e-06 [2026-04-22 01:13:01] Epoch 3 | Step 22600 | Loss: 0.8195 | LR: 2.00e-06 [2026-04-22 01:13:06] Epoch 3 | Step 22610 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 01:13:11] Epoch 3 | Step 22620 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 01:13:17] Epoch 3 | Step 22630 | Loss: 0.8197 | LR: 2.00e-06 [2026-04-22 01:13:23] Epoch 3 | Step 22640 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 01:13:28] Epoch 3 | Step 22650 | Loss: 0.8196 | LR: 2.00e-06 [2026-04-22 01:13:33] Epoch 3 | Step 22660 | Loss: 0.8197 | LR: 2.00e-06 [2026-04-22 01:13:39] Epoch 3 | Step 22670 | Loss: 0.8186 | LR: 2.00e-06 [2026-04-22 01:13:44] Epoch 3 | Step 22680 | Loss: 0.8189 | LR: 2.00e-06 [2026-04-22 01:13:48] Epoch 3 | Step 22690 | Loss: 0.8191 | LR: 2.00e-06 [2026-04-22 01:13:53] Epoch 3 | Step 22700 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 01:13:58] Epoch 3 | Step 22710 | Loss: 0.8191 | LR: 2.00e-06 [2026-04-22 01:14:05] Epoch 3 | Step 22720 | Loss: 0.8195 | LR: 2.00e-06 [2026-04-22 01:14:10] Epoch 3 | Step 22730 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 01:14:16] Epoch 3 | Step 22740 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 01:14:21] Epoch 3 | Step 22750 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 01:14:26] Epoch 3 | Step 22760 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 01:14:31] Epoch 3 | Step 22770 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 01:14:36] Epoch 3 | Step 22780 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 01:14:41] Epoch 3 | Step 22790 | Loss: 0.8210 | LR: 2.00e-06 [2026-04-22 01:14:46] Epoch 3 | Step 22800 | Loss: 0.8215 | LR: 2.00e-06 [2026-04-22 01:14:51] Epoch 3 | Step 22810 | Loss: 0.8213 | LR: 2.00e-06 [2026-04-22 
01:14:56] Epoch 3 | Step 22820 | Loss: 0.8211 | LR: 2.00e-06 [2026-04-22 01:15:01] Epoch 3 | Step 22830 | Loss: 0.8215 | LR: 2.00e-06 [2026-04-22 01:15:06] Epoch 3 | Step 22840 | Loss: 0.8212 | LR: 2.00e-06 [2026-04-22 01:15:12] Epoch 3 | Step 22850 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 01:15:17] Epoch 3 | Step 22860 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 01:15:23] Epoch 3 | Step 22870 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 01:15:29] Epoch 3 | Step 22880 | Loss: 0.8210 | LR: 2.00e-06 [2026-04-22 01:15:34] Epoch 3 | Step 22890 | Loss: 0.8210 | LR: 2.00e-06 [2026-04-22 01:15:40] Epoch 3 | Step 22900 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:15:45] Epoch 3 | Step 22910 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:15:51] Epoch 3 | Step 22920 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:15:56] Epoch 3 | Step 22930 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:16:01] Epoch 3 | Step 22940 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:16:06] Epoch 3 | Step 22950 | Loss: 0.8223 | LR: 2.00e-06 [2026-04-22 01:16:12] Epoch 3 | Step 22960 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:16:16] Epoch 3 | Step 22970 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:16:21] Epoch 3 | Step 22980 | Loss: 0.8219 | LR: 2.00e-06 [2026-04-22 01:16:26] Epoch 3 | Step 22990 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:16:32] Epoch 3 | Step 23000 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:16:33] Validation | Batch 10/1567 | Loss: 1.0461 [2026-04-22 01:16:34] Validation | Batch 20/1567 | Loss: 1.1337 [2026-04-22 01:16:36] Validation | Batch 30/1567 | Loss: 1.0894 [2026-04-22 01:16:37] Validation | Batch 40/1567 | Loss: 1.1101 [2026-04-22 01:16:38] Validation | Batch 50/1567 | Loss: 1.0855 [2026-04-22 01:16:39] Validation | Batch 60/1567 | Loss: 1.0736 [2026-04-22 01:16:41] Validation | Batch 70/1567 | Loss: 1.0654 [2026-04-22 01:16:42] Validation | Batch 80/1567 | Loss: 1.0741 [2026-04-22 01:16:44] Validation | Batch 90/1567 | Loss: 1.0687 [2026-04-22 01:16:45] Validation | Batch 
100/1567 | Loss: 1.0475 [2026-04-22 01:16:46] Validation | Batch 110/1567 | Loss: 1.0371 [2026-04-22 01:16:47] Validation | Batch 120/1567 | Loss: 1.0317 [2026-04-22 01:16:49] Validation | Batch 130/1567 | Loss: 1.0264 [2026-04-22 01:16:50] Validation | Batch 140/1567 | Loss: 1.0368 [2026-04-22 01:16:51] Validation | Batch 150/1567 | Loss: 1.0476 [2026-04-22 01:16:52] Validation | Batch 160/1567 | Loss: 1.0466 [2026-04-22 01:16:53] Validation | Batch 170/1567 | Loss: 1.0388 [2026-04-22 01:16:54] Validation | Batch 180/1567 | Loss: 1.0418 [2026-04-22 01:16:55] Validation | Batch 190/1567 | Loss: 1.0464 [2026-04-22 01:16:56] Validation | Batch 200/1567 | Loss: 1.0498 [2026-04-22 01:16:58] Validation | Batch 210/1567 | Loss: 1.0477 [2026-04-22 01:16:59] Validation | Batch 220/1567 | Loss: 1.0531 [2026-04-22 01:17:00] Validation | Batch 230/1567 | Loss: 1.0571 [2026-04-22 01:17:02] Validation | Batch 240/1567 | Loss: 1.0593 [2026-04-22 01:17:03] Validation | Batch 250/1567 | Loss: 1.0635 [2026-04-22 01:17:04] Validation | Batch 260/1567 | Loss: 1.0665 [2026-04-22 01:17:05] Validation | Batch 270/1567 | Loss: 1.0709 [2026-04-22 01:17:07] Validation | Batch 280/1567 | Loss: 1.0744 [2026-04-22 01:17:08] Validation | Batch 290/1567 | Loss: 1.0696 [2026-04-22 01:17:10] Validation | Batch 300/1567 | Loss: 1.0689 [2026-04-22 01:17:11] Validation | Batch 310/1567 | Loss: 1.0657 [2026-04-22 01:17:12] Validation | Batch 320/1567 | Loss: 1.0684 [2026-04-22 01:17:13] Validation | Batch 330/1567 | Loss: 1.0685 [2026-04-22 01:17:15] Validation | Batch 340/1567 | Loss: 1.0678 [2026-04-22 01:17:16] Validation | Batch 350/1567 | Loss: 1.0652 [2026-04-22 01:17:17] Validation | Batch 360/1567 | Loss: 1.0590 [2026-04-22 01:17:18] Validation | Batch 370/1567 | Loss: 1.0590 [2026-04-22 01:17:19] Validation | Batch 380/1567 | Loss: 1.0633 [2026-04-22 01:17:21] Validation | Batch 390/1567 | Loss: 1.0624 [2026-04-22 01:17:22] Validation | Batch 400/1567 | Loss: 1.0632 [2026-04-22 01:17:23] 
Validation | Batch 410/1567 | Loss: 1.0592 [2026-04-22 01:17:24] Validation | Batch 420/1567 | Loss: 1.0575 [2026-04-22 01:17:25] Validation | Batch 430/1567 | Loss: 1.0601 [2026-04-22 01:17:27] Validation | Batch 440/1567 | Loss: 1.0603 [2026-04-22 01:17:28] Validation | Batch 450/1567 | Loss: 1.0624 [2026-04-22 01:17:29] Validation | Batch 460/1567 | Loss: 1.0649 [2026-04-22 01:17:30] Validation | Batch 470/1567 | Loss: 1.0699 [2026-04-22 01:17:31] Validation | Batch 480/1567 | Loss: 1.0674 [2026-04-22 01:17:32] Validation | Batch 490/1567 | Loss: 1.0651 [2026-04-22 01:17:33] Validation | Batch 500/1567 | Loss: 1.0662 [2026-04-22 01:17:35] Validation | Batch 510/1567 | Loss: 1.0660 [2026-04-22 01:17:35] Validation | Batch 520/1567 | Loss: 1.0674 [2026-04-22 01:17:37] Validation | Batch 530/1567 | Loss: 1.0659 [2026-04-22 01:17:38] Validation | Batch 540/1567 | Loss: 1.0631 [2026-04-22 01:17:40] Validation | Batch 550/1567 | Loss: 1.0643 [2026-04-22 01:17:41] Validation | Batch 560/1567 | Loss: 1.0634 [2026-04-22 01:17:42] Validation | Batch 570/1567 | Loss: 1.0592 [2026-04-22 01:17:43] Validation | Batch 580/1567 | Loss: 1.0611 [2026-04-22 01:17:45] Validation | Batch 590/1567 | Loss: 1.0609 [2026-04-22 01:17:46] Validation | Batch 600/1567 | Loss: 1.0598 [2026-04-22 01:17:47] Validation | Batch 610/1567 | Loss: 1.0618 [2026-04-22 01:17:48] Validation | Batch 620/1567 | Loss: 1.0598 [2026-04-22 01:17:50] Validation | Batch 630/1567 | Loss: 1.0601 [2026-04-22 01:17:51] Validation | Batch 640/1567 | Loss: 1.0607 [2026-04-22 01:17:53] Validation | Batch 650/1567 | Loss: 1.0635 [2026-04-22 01:17:54] Validation | Batch 660/1567 | Loss: 1.0648 [2026-04-22 01:17:55] Validation | Batch 670/1567 | Loss: 1.0630 [2026-04-22 01:17:56] Validation | Batch 680/1567 | Loss: 1.0618 [2026-04-22 01:17:57] Validation | Batch 690/1567 | Loss: 1.0603 [2026-04-22 01:17:58] Validation | Batch 700/1567 | Loss: 1.0604 [2026-04-22 01:18:00] Validation | Batch 710/1567 | Loss: 1.0596 
[2026-04-22 01:18:01] Validation | Batch 720/1567 | Loss: 1.0565 [2026-04-22 01:18:02] Validation | Batch 730/1567 | Loss: 1.0570 [2026-04-22 01:18:02] Validation | Batch 740/1567 | Loss: 1.0577 [2026-04-22 01:18:04] Validation | Batch 750/1567 | Loss: 1.0573 [2026-04-22 01:18:05] Validation | Batch 760/1567 | Loss: 1.0586 [2026-04-22 01:18:06] Validation | Batch 770/1567 | Loss: 1.0581 [2026-04-22 01:18:08] Validation | Batch 780/1567 | Loss: 1.0591 [2026-04-22 01:18:09] Validation | Batch 790/1567 | Loss: 1.0576 [2026-04-22 01:18:10] Validation | Batch 800/1567 | Loss: 1.0558 [2026-04-22 01:18:11] Validation | Batch 810/1567 | Loss: 1.0565 [2026-04-22 01:18:12] Validation | Batch 820/1567 | Loss: 1.0557 [2026-04-22 01:18:13] Validation | Batch 830/1567 | Loss: 1.0550 [2026-04-22 01:18:14] Validation | Batch 840/1567 | Loss: 1.0557 [2026-04-22 01:18:15] Validation | Batch 850/1567 | Loss: 1.0568 [2026-04-22 01:18:16] Validation | Batch 860/1567 | Loss: 1.0575 [2026-04-22 01:18:17] Validation | Batch 870/1567 | Loss: 1.0583 [2026-04-22 01:18:18] Validation | Batch 880/1567 | Loss: 1.0582 [2026-04-22 01:18:19] Validation | Batch 890/1567 | Loss: 1.0577 [2026-04-22 01:18:21] Validation | Batch 900/1567 | Loss: 1.0574 [2026-04-22 01:18:22] Validation | Batch 910/1567 | Loss: 1.0571 [2026-04-22 01:18:23] Validation | Batch 920/1567 | Loss: 1.0590 [2026-04-22 01:18:24] Validation | Batch 930/1567 | Loss: 1.0589 [2026-04-22 01:18:25] Validation | Batch 940/1567 | Loss: 1.0588 [2026-04-22 01:18:26] Validation | Batch 950/1567 | Loss: 1.0584 [2026-04-22 01:18:27] Validation | Batch 960/1567 | Loss: 1.0587 [2026-04-22 01:18:28] Validation | Batch 970/1567 | Loss: 1.0593 [2026-04-22 01:18:29] Validation | Batch 980/1567 | Loss: 1.0589 [2026-04-22 01:18:30] Validation | Batch 990/1567 | Loss: 1.0599 [2026-04-22 01:18:31] Validation | Batch 1000/1567 | Loss: 1.0603 [2026-04-22 01:18:33] Validation | Batch 1010/1567 | Loss: 1.0594 [2026-04-22 01:18:34] Validation | Batch 
1020/1567 | Loss: 1.0606 [2026-04-22 01:18:35] Validation | Batch 1030/1567 | Loss: 1.0610 [2026-04-22 01:18:36] Validation | Batch 1040/1567 | Loss: 1.0602 [2026-04-22 01:18:37] Validation | Batch 1050/1567 | Loss: 1.0591 [2026-04-22 01:18:39] Validation | Batch 1060/1567 | Loss: 1.0603 [2026-04-22 01:18:40] Validation | Batch 1070/1567 | Loss: 1.0602 [2026-04-22 01:18:41] Validation | Batch 1080/1567 | Loss: 1.0615 [2026-04-22 01:18:42] Validation | Batch 1090/1567 | Loss: 1.0641 [2026-04-22 01:18:44] Validation | Batch 1100/1567 | Loss: 1.0657 [2026-04-22 01:18:45] Validation | Batch 1110/1567 | Loss: 1.0647 [2026-04-22 01:18:46] Validation | Batch 1120/1567 | Loss: 1.0649 [2026-04-22 01:18:47] Validation | Batch 1130/1567 | Loss: 1.0631 [2026-04-22 01:18:48] Validation | Batch 1140/1567 | Loss: 1.0635 [2026-04-22 01:18:50] Validation | Batch 1150/1567 | Loss: 1.0622 [2026-04-22 01:18:50] Validation | Batch 1160/1567 | Loss: 1.0616 [2026-04-22 01:18:51] Validation | Batch 1170/1567 | Loss: 1.0619 [2026-04-22 01:18:53] Validation | Batch 1180/1567 | Loss: 1.0621 [2026-04-22 01:18:54] Validation | Batch 1190/1567 | Loss: 1.0624 [2026-04-22 01:18:55] Validation | Batch 1200/1567 | Loss: 1.0611 [2026-04-22 01:18:56] Validation | Batch 1210/1567 | Loss: 1.0604 [2026-04-22 01:18:57] Validation | Batch 1220/1567 | Loss: 1.0613 [2026-04-22 01:18:59] Validation | Batch 1230/1567 | Loss: 1.0619 [2026-04-22 01:19:00] Validation | Batch 1240/1567 | Loss: 1.0617 [2026-04-22 01:19:01] Validation | Batch 1250/1567 | Loss: 1.0621 [2026-04-22 01:19:02] Validation | Batch 1260/1567 | Loss: 1.0618 [2026-04-22 01:19:04] Validation | Batch 1270/1567 | Loss: 1.0600 [2026-04-22 01:19:05] Validation | Batch 1280/1567 | Loss: 1.0602 [2026-04-22 01:19:07] Validation | Batch 1290/1567 | Loss: 1.0603 [2026-04-22 01:19:08] Validation | Batch 1300/1567 | Loss: 1.0606 [2026-04-22 01:19:09] Validation | Batch 1310/1567 | Loss: 1.0614 [2026-04-22 01:19:10] Validation | Batch 1320/1567 | Loss: 
1.0619 [2026-04-22 01:19:11] Validation | Batch 1330/1567 | Loss: 1.0634 [2026-04-22 01:19:12] Validation | Batch 1340/1567 | Loss: 1.0631 [2026-04-22 01:19:13] Validation | Batch 1350/1567 | Loss: 1.0634 [2026-04-22 01:19:14] Validation | Batch 1360/1567 | Loss: 1.0625 [2026-04-22 01:19:15] Validation | Batch 1370/1567 | Loss: 1.0621 [2026-04-22 01:19:17] Validation | Batch 1380/1567 | Loss: 1.0621 [2026-04-22 01:19:18] Validation | Batch 1390/1567 | Loss: 1.0614 [2026-04-22 01:19:19] Validation | Batch 1400/1567 | Loss: 1.0610 [2026-04-22 01:19:20] Validation | Batch 1410/1567 | Loss: 1.0616 [2026-04-22 01:19:21] Validation | Batch 1420/1567 | Loss: 1.0616 [2026-04-22 01:19:22] Validation | Batch 1430/1567 | Loss: 1.0619 [2026-04-22 01:19:23] Validation | Batch 1440/1567 | Loss: 1.0627 [2026-04-22 01:19:24] Validation | Batch 1450/1567 | Loss: 1.0629 [2026-04-22 01:19:25] Validation | Batch 1460/1567 | Loss: 1.0622 [2026-04-22 01:19:26] Validation | Batch 1470/1567 | Loss: 1.0620 [2026-04-22 01:19:27] Validation | Batch 1480/1567 | Loss: 1.0618 [2026-04-22 01:19:28] Validation | Batch 1490/1567 | Loss: 1.0613 [2026-04-22 01:19:29] Validation | Batch 1500/1567 | Loss: 1.0610 [2026-04-22 01:19:30] Validation | Batch 1510/1567 | Loss: 1.0601 [2026-04-22 01:19:31] Validation | Batch 1520/1567 | Loss: 1.0600 [2026-04-22 01:19:32] Validation | Batch 1530/1567 | Loss: 1.0600 [2026-04-22 01:19:34] Validation | Batch 1540/1567 | Loss: 1.0606 [2026-04-22 01:19:35] Validation | Batch 1550/1567 | Loss: 1.0619 [2026-04-22 01:19:36] Validation | Batch 1560/1567 | Loss: 1.0615 [2026-04-22 01:19:37] Validation | Batch 1567/1567 | Loss: 1.0615 [2026-04-22 01:19:37] Validation | Loss: 1.0615 | PPL: 2.94 | Time: 184.69s [2026-04-22 01:19:42] Epoch 3 | Step 23010 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:19:48] Epoch 3 | Step 23020 | Loss: 0.8215 | LR: 2.00e-06 [2026-04-22 01:19:53] Epoch 3 | Step 23030 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:19:58] Epoch 3 | Step 23040 | 
Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:20:04] Epoch 3 | Step 23050 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 01:20:09] Epoch 3 | Step 23060 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 01:20:14] Epoch 3 | Step 23070 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 01:20:19] Epoch 3 | Step 23080 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 01:20:24] Epoch 3 | Step 23090 | Loss: 0.8222 | LR: 2.00e-06 [2026-04-22 01:20:30] Epoch 3 | Step 23100 | Loss: 0.8222 | LR: 2.00e-06 [2026-04-22 01:20:35] Epoch 3 | Step 23110 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:20:41] Epoch 3 | Step 23120 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 01:20:46] Epoch 3 | Step 23130 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 01:20:51] Epoch 3 | Step 23140 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:20:57] Epoch 3 | Step 23150 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 01:21:01] Epoch 3 | Step 23160 | Loss: 0.8223 | LR: 2.00e-06 [2026-04-22 01:21:07] Epoch 3 | Step 23170 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 01:21:13] Epoch 3 | Step 23180 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 01:21:18] Epoch 3 | Step 23190 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 01:21:23] Epoch 3 | Step 23200 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 01:21:28] Epoch 3 | Step 23210 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 01:21:33] Epoch 3 | Step 23220 | Loss: 0.8222 | LR: 2.00e-06 [2026-04-22 01:21:38] Epoch 3 | Step 23230 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 01:21:44] Epoch 3 | Step 23240 | Loss: 0.8219 | LR: 2.00e-06 [2026-04-22 01:21:49] Epoch 3 | Step 23250 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:21:55] Epoch 3 | Step 23260 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:22:00] Epoch 3 | Step 23270 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:22:06] Epoch 3 | Step 23280 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:22:11] Epoch 3 | Step 23290 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:22:17] Epoch 3 | Step 23300 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 01:22:23] Epoch 3 | Step 23310 | Loss: 0.8229 | LR: 2.00e-06 
[2026-04-22 01:22:28] Epoch 3 | Step 23320 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:22:34] Epoch 3 | Step 23330 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:22:39] Epoch 3 | Step 23340 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 01:22:44] Epoch 3 | Step 23350 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:22:50] Epoch 3 | Step 23360 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:22:55] Epoch 3 | Step 23370 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:23:00] Epoch 3 | Step 23380 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:23:05] Epoch 3 | Step 23390 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:23:10] Epoch 3 | Step 23400 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:23:16] Epoch 3 | Step 23410 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:23:22] Epoch 3 | Step 23420 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:23:27] Epoch 3 | Step 23430 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:23:32] Epoch 3 | Step 23440 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:23:37] Epoch 3 | Step 23450 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:23:42] Epoch 3 | Step 23460 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:23:48] Epoch 3 | Step 23470 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:23:53] Epoch 3 | Step 23480 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:23:59] Epoch 3 | Step 23490 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:24:03] Epoch 3 | Step 23500 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:24:09] Epoch 3 | Step 23510 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 01:24:14] Epoch 3 | Step 23520 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:24:19] Epoch 3 | Step 23530 | Loss: 0.8245 | LR: 2.00e-06 [2026-04-22 01:24:24] Epoch 3 | Step 23540 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:24:29] Epoch 3 | Step 23550 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:24:34] Epoch 3 | Step 23560 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 01:24:39] Epoch 3 | Step 23570 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 01:24:44] Epoch 3 | Step 23580 | Loss: 0.8233 | LR: 2.00e-06 [2026-04-22 01:24:49] Epoch 
3 | Step 23590 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 01:24:54] Epoch 3 | Step 23600 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:24:58] Epoch 3 | Step 23610 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:25:03] Epoch 3 | Step 23620 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:25:09] Epoch 3 | Step 23630 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:25:14] Epoch 3 | Step 23640 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:25:19] Epoch 3 | Step 23650 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:25:25] Epoch 3 | Step 23660 | Loss: 0.8233 | LR: 2.00e-06 [2026-04-22 01:25:30] Epoch 3 | Step 23670 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 01:25:35] Epoch 3 | Step 23680 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:25:40] Epoch 3 | Step 23690 | Loss: 0.8235 | LR: 2.00e-06 [2026-04-22 01:25:45] Epoch 3 | Step 23700 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:25:50] Epoch 3 | Step 23710 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:25:57] Epoch 3 | Step 23720 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:26:01] Epoch 3 | Step 23730 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:26:06] Epoch 3 | Step 23740 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:26:12] Epoch 3 | Step 23750 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:26:17] Epoch 3 | Step 23760 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:26:22] Epoch 3 | Step 23770 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:26:28] Epoch 3 | Step 23780 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:26:33] Epoch 3 | Step 23790 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 01:26:38] Epoch 3 | Step 23800 | Loss: 0.8233 | LR: 2.00e-06 [2026-04-22 01:26:43] Epoch 3 | Step 23810 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:26:49] Epoch 3 | Step 23820 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:26:54] Epoch 3 | Step 23830 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:27:01] Epoch 3 | Step 23840 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:27:07] Epoch 3 | Step 23850 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 01:27:11] Epoch 3 | Step 23860 | Loss: 
0.8234 | LR: 2.00e-06 [2026-04-22 01:27:18] Epoch 3 | Step 23870 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:27:22] Epoch 3 | Step 23880 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 01:27:28] Epoch 3 | Step 23890 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 01:27:33] Epoch 3 | Step 23900 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 01:27:38] Epoch 3 | Step 23910 | Loss: 0.8222 | LR: 2.00e-06 [2026-04-22 01:27:43] Epoch 3 | Step 23920 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 01:27:49] Epoch 3 | Step 23930 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:27:54] Epoch 3 | Step 23940 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:28:00] Epoch 3 | Step 23950 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 01:28:05] Epoch 3 | Step 23960 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:28:10] Epoch 3 | Step 23970 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:28:15] Epoch 3 | Step 23980 | Loss: 0.8233 | LR: 2.00e-06 [2026-04-22 01:28:21] Epoch 3 | Step 23990 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 01:28:26] Epoch 3 | Step 24000 | Loss: 0.8233 | LR: 2.00e-06 [2026-04-22 01:28:37] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_24000.pt [2026-04-22 01:29:50] Validation | Batch 10/1567 | Loss: 1.0495 [2026-04-22 01:29:51] Validation | Batch 20/1567 | Loss: 1.1358 [2026-04-22 01:29:53] Validation | Batch 30/1567 | Loss: 1.0913 [2026-04-22 01:29:54] Validation | Batch 40/1567 | Loss: 1.1114 [2026-04-22 01:29:55] Validation | Batch 50/1567 | Loss: 1.0868 [2026-04-22 01:29:56] Validation | Batch 60/1567 | Loss: 1.0747 [2026-04-22 01:29:58] Validation | Batch 70/1567 | Loss: 1.0667 [2026-04-22 01:29:59] Validation | Batch 80/1567 | Loss: 1.0753 [2026-04-22 01:30:00] Validation | Batch 90/1567 | Loss: 1.0698 [2026-04-22 01:30:02] Validation | Batch 100/1567 | Loss: 1.0486 [2026-04-22 01:30:03] Validation | Batch 110/1567 | Loss: 1.0384 [2026-04-22 01:30:04] Validation | Batch 120/1567 | Loss: 1.0329 [2026-04-22 01:30:05] Validation | Batch 130/1567 | Loss: 1.0276 [2026-04-22 
01:30:07] Validation | Batch 140/1567 | Loss: 1.0380 [2026-04-22 01:30:08] Validation | Batch 150/1567 | Loss: 1.0488 [2026-04-22 01:30:09] Validation | Batch 160/1567 | Loss: 1.0478 [2026-04-22 01:30:10] Validation | Batch 170/1567 | Loss: 1.0399 [2026-04-22 01:30:11] Validation | Batch 180/1567 | Loss: 1.0429 [2026-04-22 01:30:12] Validation | Batch 190/1567 | Loss: 1.0475 [2026-04-22 01:30:14] Validation | Batch 200/1567 | Loss: 1.0509 [2026-04-22 01:30:15] Validation | Batch 210/1567 | Loss: 1.0489 [2026-04-22 01:30:16] Validation | Batch 220/1567 | Loss: 1.0542 [2026-04-22 01:30:18] Validation | Batch 230/1567 | Loss: 1.0583 [2026-04-22 01:30:19] Validation | Batch 240/1567 | Loss: 1.0604 [2026-04-22 01:30:20] Validation | Batch 250/1567 | Loss: 1.0646 [2026-04-22 01:30:21] Validation | Batch 260/1567 | Loss: 1.0676 [2026-04-22 01:30:22] Validation | Batch 270/1567 | Loss: 1.0720 [2026-04-22 01:30:24] Validation | Batch 280/1567 | Loss: 1.0755 [2026-04-22 01:30:26] Validation | Batch 290/1567 | Loss: 1.0707 [2026-04-22 01:30:27] Validation | Batch 300/1567 | Loss: 1.0699 [2026-04-22 01:30:28] Validation | Batch 310/1567 | Loss: 1.0667 [2026-04-22 01:30:29] Validation | Batch 320/1567 | Loss: 1.0695 [2026-04-22 01:30:31] Validation | Batch 330/1567 | Loss: 1.0696 [2026-04-22 01:30:32] Validation | Batch 340/1567 | Loss: 1.0690 [2026-04-22 01:30:33] Validation | Batch 350/1567 | Loss: 1.0664 [2026-04-22 01:30:34] Validation | Batch 360/1567 | Loss: 1.0601 [2026-04-22 01:30:36] Validation | Batch 370/1567 | Loss: 1.0602 [2026-04-22 01:30:37] Validation | Batch 380/1567 | Loss: 1.0645 [2026-04-22 01:30:38] Validation | Batch 390/1567 | Loss: 1.0636 [2026-04-22 01:30:39] Validation | Batch 400/1567 | Loss: 1.0643 [2026-04-22 01:30:40] Validation | Batch 410/1567 | Loss: 1.0603 [2026-04-22 01:30:41] Validation | Batch 420/1567 | Loss: 1.0587 [2026-04-22 01:30:43] Validation | Batch 430/1567 | Loss: 1.0612 [2026-04-22 01:30:44] Validation | Batch 440/1567 | Loss: 
1.0615 [2026-04-22 01:30:45] Validation | Batch 450/1567 | Loss: 1.0635 [2026-04-22 01:30:46] Validation | Batch 460/1567 | Loss: 1.0660 [2026-04-22 01:30:47] Validation | Batch 470/1567 | Loss: 1.0710 [2026-04-22 01:30:49] Validation | Batch 480/1567 | Loss: 1.0685 [2026-04-22 01:30:50] Validation | Batch 490/1567 | Loss: 1.0662 [2026-04-22 01:30:51] Validation | Batch 500/1567 | Loss: 1.0673 [2026-04-22 01:30:52] Validation | Batch 510/1567 | Loss: 1.0671 [2026-04-22 01:30:53] Validation | Batch 520/1567 | Loss: 1.0685 [2026-04-22 01:30:54] Validation | Batch 530/1567 | Loss: 1.0670 [2026-04-22 01:30:55] Validation | Batch 540/1567 | Loss: 1.0643 [2026-04-22 01:30:57] Validation | Batch 550/1567 | Loss: 1.0654 [2026-04-22 01:30:58] Validation | Batch 560/1567 | Loss: 1.0645 [2026-04-22 01:30:59] Validation | Batch 570/1567 | Loss: 1.0603 [2026-04-22 01:31:01] Validation | Batch 580/1567 | Loss: 1.0623 [2026-04-22 01:31:02] Validation | Batch 590/1567 | Loss: 1.0620 [2026-04-22 01:31:03] Validation | Batch 600/1567 | Loss: 1.0609 [2026-04-22 01:31:04] Validation | Batch 610/1567 | Loss: 1.0630 [2026-04-22 01:31:06] Validation | Batch 620/1567 | Loss: 1.0610 [2026-04-22 01:31:07] Validation | Batch 630/1567 | Loss: 1.0612 [2026-04-22 01:31:09] Validation | Batch 640/1567 | Loss: 1.0619 [2026-04-22 01:31:10] Validation | Batch 650/1567 | Loss: 1.0648 [2026-04-22 01:31:11] Validation | Batch 660/1567 | Loss: 1.0661 [2026-04-22 01:31:12] Validation | Batch 670/1567 | Loss: 1.0643 [2026-04-22 01:31:13] Validation | Batch 680/1567 | Loss: 1.0631 [2026-04-22 01:31:14] Validation | Batch 690/1567 | Loss: 1.0616 [2026-04-22 01:31:16] Validation | Batch 700/1567 | Loss: 1.0617 [2026-04-22 01:31:17] Validation | Batch 710/1567 | Loss: 1.0609 [2026-04-22 01:31:18] Validation | Batch 720/1567 | Loss: 1.0578 [2026-04-22 01:31:19] Validation | Batch 730/1567 | Loss: 1.0583 [2026-04-22 01:31:20] Validation | Batch 740/1567 | Loss: 1.0589 [2026-04-22 01:31:21] Validation | Batch 
750/1567 | Loss: 1.0585 [2026-04-22 01:31:22] Validation | Batch 760/1567 | Loss: 1.0598 [2026-04-22 01:31:24] Validation | Batch 770/1567 | Loss: 1.0593 [2026-04-22 01:31:25] Validation | Batch 780/1567 | Loss: 1.0604 [2026-04-22 01:31:26] Validation | Batch 790/1567 | Loss: 1.0589 [2026-04-22 01:31:27] Validation | Batch 800/1567 | Loss: 1.0570 [2026-04-22 01:31:28] Validation | Batch 810/1567 | Loss: 1.0578 [2026-04-22 01:31:29] Validation | Batch 820/1567 | Loss: 1.0570 [2026-04-22 01:31:30] Validation | Batch 830/1567 | Loss: 1.0562 [2026-04-22 01:31:31] Validation | Batch 840/1567 | Loss: 1.0569 [2026-04-22 01:31:32] Validation | Batch 850/1567 | Loss: 1.0581 [2026-04-22 01:31:33] Validation | Batch 860/1567 | Loss: 1.0588 [2026-04-22 01:31:34] Validation | Batch 870/1567 | Loss: 1.0596 [2026-04-22 01:31:35] Validation | Batch 880/1567 | Loss: 1.0594 [2026-04-22 01:31:37] Validation | Batch 890/1567 | Loss: 1.0590 [2026-04-22 01:31:38] Validation | Batch 900/1567 | Loss: 1.0587 [2026-04-22 01:31:39] Validation | Batch 910/1567 | Loss: 1.0584 [2026-04-22 01:31:40] Validation | Batch 920/1567 | Loss: 1.0603 [2026-04-22 01:31:41] Validation | Batch 930/1567 | Loss: 1.0602 [2026-04-22 01:31:42] Validation | Batch 940/1567 | Loss: 1.0601 [2026-04-22 01:31:44] Validation | Batch 950/1567 | Loss: 1.0598 [2026-04-22 01:31:45] Validation | Batch 960/1567 | Loss: 1.0601 [2026-04-22 01:31:46] Validation | Batch 970/1567 | Loss: 1.0606 [2026-04-22 01:31:47] Validation | Batch 980/1567 | Loss: 1.0603 [2026-04-22 01:31:47] Validation | Batch 990/1567 | Loss: 1.0612 [2026-04-22 01:31:49] Validation | Batch 1000/1567 | Loss: 1.0617 [2026-04-22 01:31:50] Validation | Batch 1010/1567 | Loss: 1.0608 [2026-04-22 01:31:51] Validation | Batch 1020/1567 | Loss: 1.0620 [2026-04-22 01:31:52] Validation | Batch 1030/1567 | Loss: 1.0624 [2026-04-22 01:31:54] Validation | Batch 1040/1567 | Loss: 1.0616 [2026-04-22 01:31:55] Validation | Batch 1050/1567 | Loss: 1.0605 [2026-04-22 
01:31:56] Validation | Batch 1060/1567 | Loss: 1.0617 [2026-04-22 01:31:57] Validation | Batch 1070/1567 | Loss: 1.0615 [2026-04-22 01:31:59] Validation | Batch 1080/1567 | Loss: 1.0629 [2026-04-22 01:32:00] Validation | Batch 1090/1567 | Loss: 1.0655 [2026-04-22 01:32:01] Validation | Batch 1100/1567 | Loss: 1.0671 [2026-04-22 01:32:02] Validation | Batch 1110/1567 | Loss: 1.0660 [2026-04-22 01:32:03] Validation | Batch 1120/1567 | Loss: 1.0662 [2026-04-22 01:32:04] Validation | Batch 1130/1567 | Loss: 1.0644 [2026-04-22 01:32:06] Validation | Batch 1140/1567 | Loss: 1.0649 [2026-04-22 01:32:07] Validation | Batch 1150/1567 | Loss: 1.0636 [2026-04-22 01:32:08] Validation | Batch 1160/1567 | Loss: 1.0629 [2026-04-22 01:32:09] Validation | Batch 1170/1567 | Loss: 1.0632 [2026-04-22 01:32:10] Validation | Batch 1180/1567 | Loss: 1.0635 [2026-04-22 01:32:11] Validation | Batch 1190/1567 | Loss: 1.0637 [2026-04-22 01:32:13] Validation | Batch 1200/1567 | Loss: 1.0624 [2026-04-22 01:32:14] Validation | Batch 1210/1567 | Loss: 1.0618 [2026-04-22 01:32:15] Validation | Batch 1220/1567 | Loss: 1.0627 [2026-04-22 01:32:16] Validation | Batch 1230/1567 | Loss: 1.0632 [2026-04-22 01:32:17] Validation | Batch 1240/1567 | Loss: 1.0631 [2026-04-22 01:32:18] Validation | Batch 1250/1567 | Loss: 1.0634 [2026-04-22 01:32:20] Validation | Batch 1260/1567 | Loss: 1.0632 [2026-04-22 01:32:21] Validation | Batch 1270/1567 | Loss: 1.0614 [2026-04-22 01:32:22] Validation | Batch 1280/1567 | Loss: 1.0616 [2026-04-22 01:32:24] Validation | Batch 1290/1567 | Loss: 1.0617 [2026-04-22 01:32:25] Validation | Batch 1300/1567 | Loss: 1.0620 [2026-04-22 01:32:26] Validation | Batch 1310/1567 | Loss: 1.0628 [2026-04-22 01:32:27] Validation | Batch 1320/1567 | Loss: 1.0633 [2026-04-22 01:32:28] Validation | Batch 1330/1567 | Loss: 1.0648 [2026-04-22 01:32:30] Validation | Batch 1340/1567 | Loss: 1.0645 [2026-04-22 01:32:30] Validation | Batch 1350/1567 | Loss: 1.0648 [2026-04-22 01:32:31] 
Validation | Batch 1360/1567 | Loss: 1.0639 [2026-04-22 01:32:33] Validation | Batch 1370/1567 | Loss: 1.0635 [2026-04-22 01:32:34] Validation | Batch 1380/1567 | Loss: 1.0635 [2026-04-22 01:32:35] Validation | Batch 1390/1567 | Loss: 1.0628 [2026-04-22 01:32:36] Validation | Batch 1400/1567 | Loss: 1.0624 [2026-04-22 01:32:37] Validation | Batch 1410/1567 | Loss: 1.0630 [2026-04-22 01:32:38] Validation | Batch 1420/1567 | Loss: 1.0630 [2026-04-22 01:32:39] Validation | Batch 1430/1567 | Loss: 1.0633 [2026-04-22 01:32:41] Validation | Batch 1440/1567 | Loss: 1.0641 [2026-04-22 01:32:42] Validation | Batch 1450/1567 | Loss: 1.0643 [2026-04-22 01:32:42] Validation | Batch 1460/1567 | Loss: 1.0636 [2026-04-22 01:32:44] Validation | Batch 1470/1567 | Loss: 1.0634 [2026-04-22 01:32:45] Validation | Batch 1480/1567 | Loss: 1.0632 [2026-04-22 01:32:45] Validation | Batch 1490/1567 | Loss: 1.0627 [2026-04-22 01:32:47] Validation | Batch 1500/1567 | Loss: 1.0624 [2026-04-22 01:32:48] Validation | Batch 1510/1567 | Loss: 1.0615 [2026-04-22 01:32:49] Validation | Batch 1520/1567 | Loss: 1.0614 [2026-04-22 01:32:50] Validation | Batch 1530/1567 | Loss: 1.0614 [2026-04-22 01:32:51] Validation | Batch 1540/1567 | Loss: 1.0620 [2026-04-22 01:32:52] Validation | Batch 1550/1567 | Loss: 1.0634 [2026-04-22 01:32:53] Validation | Batch 1560/1567 | Loss: 1.0629 [2026-04-22 01:32:54] Validation | Batch 1567/1567 | Loss: 1.0630 [2026-04-22 01:32:54] Validation | Loss: 1.0630 | PPL: 2.95 | Time: 185.10s [2026-04-22 01:33:00] Epoch 3 | Step 24010 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 01:33:06] Epoch 3 | Step 24020 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:33:11] Epoch 3 | Step 24030 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:33:17] Epoch 3 | Step 24040 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:33:22] Epoch 3 | Step 24050 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:33:27] Epoch 3 | Step 24060 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:33:32] Epoch 3 | Step 24070 | Loss: 
0.8232 | LR: 2.00e-06 [2026-04-22 01:33:37] Epoch 3 | Step 24080 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:33:42] Epoch 3 | Step 24090 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 01:33:48] Epoch 3 | Step 24100 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:33:53] Epoch 3 | Step 24110 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:33:58] Epoch 3 | Step 24120 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 01:34:03] Epoch 3 | Step 24130 | Loss: 0.8235 | LR: 2.00e-06 [2026-04-22 01:34:09] Epoch 3 | Step 24140 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:34:15] Epoch 3 | Step 24150 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:34:21] Epoch 3 | Step 24160 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:34:26] Epoch 3 | Step 24170 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:34:31] Epoch 3 | Step 24180 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:34:36] Epoch 3 | Step 24190 | Loss: 0.8243 | LR: 2.00e-06 [2026-04-22 01:34:42] Epoch 3 | Step 24200 | Loss: 0.8243 | LR: 2.00e-06 [2026-04-22 01:34:47] Epoch 3 | Step 24210 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 01:34:52] Epoch 3 | Step 24220 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 01:34:57] Epoch 3 | Step 24230 | Loss: 0.8247 | LR: 2.00e-06 [2026-04-22 01:35:02] Epoch 3 | Step 24240 | Loss: 0.8247 | LR: 2.00e-06 [2026-04-22 01:35:07] Epoch 3 | Step 24250 | Loss: 0.8246 | LR: 2.00e-06 [2026-04-22 01:35:13] Epoch 3 | Step 24260 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 01:35:19] Epoch 3 | Step 24270 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 01:35:24] Epoch 3 | Step 24280 | Loss: 0.8245 | LR: 2.00e-06 [2026-04-22 01:35:30] Epoch 3 | Step 24290 | Loss: 0.8246 | LR: 2.00e-06 [2026-04-22 01:35:36] Epoch 3 | Step 24300 | Loss: 0.8245 | LR: 2.00e-06 [2026-04-22 01:35:42] Epoch 3 | Step 24310 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 01:35:48] Epoch 3 | Step 24320 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 01:35:53] Epoch 3 | Step 24330 | Loss: 0.8243 | LR: 2.00e-06 [2026-04-22 01:35:58] Epoch 3 | Step 24340 | Loss: 0.8242 | LR: 2.00e-06 
[2026-04-22 01:36:03] Epoch 3 | Step 24350 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 01:36:08] Epoch 3 | Step 24360 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:36:13] Epoch 3 | Step 24370 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:36:19] Epoch 3 | Step 24380 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:36:24] Epoch 3 | Step 24390 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:36:29] Epoch 3 | Step 24400 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 01:36:33] Epoch 3 | Step 24410 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:36:39] Epoch 3 | Step 24420 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:36:44] Epoch 3 | Step 24430 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:36:50] Epoch 3 | Step 24440 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 01:36:55] Epoch 3 | Step 24450 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 01:37:00] Epoch 3 | Step 24460 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:37:06] Epoch 3 | Step 24470 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 01:37:11] Epoch 3 | Step 24480 | Loss: 0.8219 | LR: 2.00e-06 [2026-04-22 01:37:16] Epoch 3 | Step 24490 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:37:22] Epoch 3 | Step 24500 | Loss: 0.8213 | LR: 2.00e-06 [2026-04-22 01:37:27] Epoch 3 | Step 24510 | Loss: 0.8213 | LR: 2.00e-06 [2026-04-22 01:37:32] Epoch 3 | Step 24520 | Loss: 0.8213 | LR: 2.00e-06 [2026-04-22 01:37:37] Epoch 3 | Step 24530 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:37:43] Epoch 3 | Step 24540 | Loss: 0.8213 | LR: 2.00e-06 [2026-04-22 01:37:49] Epoch 3 | Step 24550 | Loss: 0.8212 | LR: 2.00e-06 [2026-04-22 01:37:54] Epoch 3 | Step 24560 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:37:59] Epoch 3 | Step 24570 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:38:05] Epoch 3 | Step 24580 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:38:10] Epoch 3 | Step 24590 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:38:16] Epoch 3 | Step 24600 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:38:20] Epoch 3 | Step 24610 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:38:25] Epoch 
3 | Step 24620 | Loss: 0.8219 | LR: 2.00e-06 [2026-04-22 01:38:31] Epoch 3 | Step 24630 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:38:36] Epoch 3 | Step 24640 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:38:41] Epoch 3 | Step 24650 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:38:46] Epoch 3 | Step 24660 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:38:51] Epoch 3 | Step 24670 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:38:57] Epoch 3 | Step 24680 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:39:02] Epoch 3 | Step 24690 | Loss: 0.8215 | LR: 2.00e-06 [2026-04-22 01:39:07] Epoch 3 | Step 24700 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:39:12] Epoch 3 | Step 24710 | Loss: 0.8212 | LR: 2.00e-06 [2026-04-22 01:39:19] Epoch 3 | Step 24720 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:39:24] Epoch 3 | Step 24730 | Loss: 0.8212 | LR: 2.00e-06 [2026-04-22 01:39:29] Epoch 3 | Step 24740 | Loss: 0.8211 | LR: 2.00e-06 [2026-04-22 01:39:34] Epoch 3 | Step 24750 | Loss: 0.8209 | LR: 2.00e-06 [2026-04-22 01:39:39] Epoch 3 | Step 24760 | Loss: 0.8210 | LR: 2.00e-06 [2026-04-22 01:39:44] Epoch 3 | Step 24770 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:39:49] Epoch 3 | Step 24780 | Loss: 0.8212 | LR: 2.00e-06 [2026-04-22 01:39:55] Epoch 3 | Step 24790 | Loss: 0.8212 | LR: 2.00e-06 [2026-04-22 01:40:00] Epoch 3 | Step 24800 | Loss: 0.8212 | LR: 2.00e-06 [2026-04-22 01:40:06] Epoch 3 | Step 24810 | Loss: 0.8213 | LR: 2.00e-06 [2026-04-22 01:40:11] Epoch 3 | Step 24820 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:40:16] Epoch 3 | Step 24830 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:40:21] Epoch 3 | Step 24840 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:40:28] Epoch 3 | Step 24850 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:40:34] Epoch 3 | Step 24860 | Loss: 0.8219 | LR: 2.00e-06 [2026-04-22 01:40:39] Epoch 3 | Step 24870 | Loss: 0.8223 | LR: 2.00e-06 [2026-04-22 01:40:45] Epoch 3 | Step 24880 | Loss: 0.8222 | LR: 2.00e-06 [2026-04-22 01:40:50] Epoch 3 | Step 24890 | Loss: 
0.8220 | LR: 2.00e-06 [2026-04-22 01:40:56] Epoch 3 | Step 24900 | Loss: 0.8222 | LR: 2.00e-06 [2026-04-22 01:41:02] Epoch 3 | Step 24910 | Loss: 0.8222 | LR: 2.00e-06 [2026-04-22 01:41:07] Epoch 3 | Step 24920 | Loss: 0.8223 | LR: 2.00e-06 [2026-04-22 01:41:13] Epoch 3 | Step 24930 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 01:41:18] Epoch 3 | Step 24940 | Loss: 0.8223 | LR: 2.00e-06 [2026-04-22 01:41:24] Epoch 3 | Step 24950 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 01:41:29] Epoch 3 | Step 24960 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:41:33] Epoch 3 | Step 24970 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:41:39] Epoch 3 | Step 24980 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:41:44] Epoch 3 | Step 24990 | Loss: 0.8215 | LR: 2.00e-06 [2026-04-22 01:41:50] Epoch 3 | Step 25000 | Loss: 0.8215 | LR: 2.00e-06 [2026-04-22 01:41:52] Validation | Batch 10/1567 | Loss: 1.0507 [2026-04-22 01:41:53] Validation | Batch 20/1567 | Loss: 1.1363 [2026-04-22 01:41:54] Validation | Batch 30/1567 | Loss: 1.0919 [2026-04-22 01:41:56] Validation | Batch 40/1567 | Loss: 1.1126 [2026-04-22 01:41:56] Validation | Batch 50/1567 | Loss: 1.0875 [2026-04-22 01:41:58] Validation | Batch 60/1567 | Loss: 1.0759 [2026-04-22 01:41:59] Validation | Batch 70/1567 | Loss: 1.0677 [2026-04-22 01:42:01] Validation | Batch 80/1567 | Loss: 1.0764 [2026-04-22 01:42:02] Validation | Batch 90/1567 | Loss: 1.0710 [2026-04-22 01:42:03] Validation | Batch 100/1567 | Loss: 1.0496 [2026-04-22 01:42:04] Validation | Batch 110/1567 | Loss: 1.0395 [2026-04-22 01:42:06] Validation | Batch 120/1567 | Loss: 1.0341 [2026-04-22 01:42:07] Validation | Batch 130/1567 | Loss: 1.0286 [2026-04-22 01:42:08] Validation | Batch 140/1567 | Loss: 1.0390 [2026-04-22 01:42:09] Validation | Batch 150/1567 | Loss: 1.0499 [2026-04-22 01:42:10] Validation | Batch 160/1567 | Loss: 1.0488 [2026-04-22 01:42:11] Validation | Batch 170/1567 | Loss: 1.0409 [2026-04-22 01:42:12] Validation | Batch 180/1567 | Loss: 1.0439 [2026-04-22 
01:42:13] Validation | Batch 190/1567 | Loss: 1.0484 [2026-04-22 01:42:15] Validation | Batch 200/1567 | Loss: 1.0517 [2026-04-22 01:42:16] Validation | Batch 210/1567 | Loss: 1.0496 [2026-04-22 01:42:17] Validation | Batch 220/1567 | Loss: 1.0550 [2026-04-22 01:42:19] Validation | Batch 230/1567 | Loss: 1.0591 [2026-04-22 01:42:20] Validation | Batch 240/1567 | Loss: 1.0612 [2026-04-22 01:42:21] Validation | Batch 250/1567 | Loss: 1.0655 [2026-04-22 01:42:22] Validation | Batch 260/1567 | Loss: 1.0686 [2026-04-22 01:42:23] Validation | Batch 270/1567 | Loss: 1.0730 [2026-04-22 01:42:25] Validation | Batch 280/1567 | Loss: 1.0765 [2026-04-22 01:42:27] Validation | Batch 290/1567 | Loss: 1.0718 [2026-04-22 01:42:28] Validation | Batch 300/1567 | Loss: 1.0711 [2026-04-22 01:42:29] Validation | Batch 310/1567 | Loss: 1.0679 [2026-04-22 01:42:30] Validation | Batch 320/1567 | Loss: 1.0707 [2026-04-22 01:42:31] Validation | Batch 330/1567 | Loss: 1.0708 [2026-04-22 01:42:33] Validation | Batch 340/1567 | Loss: 1.0702 [2026-04-22 01:42:34] Validation | Batch 350/1567 | Loss: 1.0675 [2026-04-22 01:42:35] Validation | Batch 360/1567 | Loss: 1.0612 [2026-04-22 01:42:36] Validation | Batch 370/1567 | Loss: 1.0613 [2026-04-22 01:42:38] Validation | Batch 380/1567 | Loss: 1.0656 [2026-04-22 01:42:39] Validation | Batch 390/1567 | Loss: 1.0646 [2026-04-22 01:42:40] Validation | Batch 400/1567 | Loss: 1.0654 [2026-04-22 01:42:41] Validation | Batch 410/1567 | Loss: 1.0615 [2026-04-22 01:42:42] Validation | Batch 420/1567 | Loss: 1.0598 [2026-04-22 01:42:44] Validation | Batch 430/1567 | Loss: 1.0623 [2026-04-22 01:42:45] Validation | Batch 440/1567 | Loss: 1.0625 [2026-04-22 01:42:46] Validation | Batch 450/1567 | Loss: 1.0645 [2026-04-22 01:42:47] Validation | Batch 460/1567 | Loss: 1.0671 [2026-04-22 01:42:48] Validation | Batch 470/1567 | Loss: 1.0720 [2026-04-22 01:42:49] Validation | Batch 480/1567 | Loss: 1.0696 [2026-04-22 01:42:51] Validation | Batch 490/1567 | Loss: 
1.0673 [2026-04-22 01:42:52] Validation | Batch 500/1567 | Loss: 1.0684 [2026-04-22 01:42:53] Validation | Batch 510/1567 | Loss: 1.0683 [2026-04-22 01:42:54] Validation | Batch 520/1567 | Loss: 1.0696 [2026-04-22 01:42:55] Validation | Batch 530/1567 | Loss: 1.0681 [2026-04-22 01:42:56] Validation | Batch 540/1567 | Loss: 1.0654 [2026-04-22 01:42:58] Validation | Batch 550/1567 | Loss: 1.0665 [2026-04-22 01:42:59] Validation | Batch 560/1567 | Loss: 1.0656 [2026-04-22 01:43:00] Validation | Batch 570/1567 | Loss: 1.0614 [2026-04-22 01:43:02] Validation | Batch 580/1567 | Loss: 1.0633 [2026-04-22 01:43:03] Validation | Batch 590/1567 | Loss: 1.0631 [2026-04-22 01:43:04] Validation | Batch 600/1567 | Loss: 1.0620 [2026-04-22 01:43:05] Validation | Batch 610/1567 | Loss: 1.0640 [2026-04-22 01:43:07] Validation | Batch 620/1567 | Loss: 1.0620 [2026-04-22 01:43:08] Validation | Batch 630/1567 | Loss: 1.0623 [2026-04-22 01:43:10] Validation | Batch 640/1567 | Loss: 1.0629 [2026-04-22 01:43:11] Validation | Batch 650/1567 | Loss: 1.0658 [2026-04-22 01:43:12] Validation | Batch 660/1567 | Loss: 1.0671 [2026-04-22 01:43:13] Validation | Batch 670/1567 | Loss: 1.0652 [2026-04-22 01:43:14] Validation | Batch 680/1567 | Loss: 1.0640 [2026-04-22 01:43:15] Validation | Batch 690/1567 | Loss: 1.0625 [2026-04-22 01:43:17] Validation | Batch 700/1567 | Loss: 1.0627 [2026-04-22 01:43:18] Validation | Batch 710/1567 | Loss: 1.0619 [2026-04-22 01:43:19] Validation | Batch 720/1567 | Loss: 1.0588 [2026-04-22 01:43:20] Validation | Batch 730/1567 | Loss: 1.0593 [2026-04-22 01:43:21] Validation | Batch 740/1567 | Loss: 1.0600 [2026-04-22 01:43:22] Validation | Batch 750/1567 | Loss: 1.0596 [2026-04-22 01:43:23] Validation | Batch 760/1567 | Loss: 1.0608 [2026-04-22 01:43:25] Validation | Batch 770/1567 | Loss: 1.0604 [2026-04-22 01:43:26] Validation | Batch 780/1567 | Loss: 1.0614 [2026-04-22 01:43:27] Validation | Batch 790/1567 | Loss: 1.0599 [2026-04-22 01:43:28] Validation | Batch 
800/1567 | Loss: 1.0581 [2026-04-22 01:43:29] Validation | Batch 810/1567 | Loss: 1.0588 [2026-04-22 01:43:30] Validation | Batch 820/1567 | Loss: 1.0580 [2026-04-22 01:43:31] Validation | Batch 830/1567 | Loss: 1.0572 [2026-04-22 01:43:32] Validation | Batch 840/1567 | Loss: 1.0579 [2026-04-22 01:43:33] Validation | Batch 850/1567 | Loss: 1.0591 [2026-04-22 01:43:34] Validation | Batch 860/1567 | Loss: 1.0598 [2026-04-22 01:43:35] Validation | Batch 870/1567 | Loss: 1.0606 [2026-04-22 01:43:36] Validation | Batch 880/1567 | Loss: 1.0605 [2026-04-22 01:43:38] Validation | Batch 890/1567 | Loss: 1.0600 [2026-04-22 01:43:39] Validation | Batch 900/1567 | Loss: 1.0597 [2026-04-22 01:43:40] Validation | Batch 910/1567 | Loss: 1.0594 [2026-04-22 01:43:41] Validation | Batch 920/1567 | Loss: 1.0613 [2026-04-22 01:43:42] Validation | Batch 930/1567 | Loss: 1.0612 [2026-04-22 01:43:43] Validation | Batch 940/1567 | Loss: 1.0611 [2026-04-22 01:43:45] Validation | Batch 950/1567 | Loss: 1.0607 [2026-04-22 01:43:46] Validation | Batch 960/1567 | Loss: 1.0610 [2026-04-22 01:43:47] Validation | Batch 970/1567 | Loss: 1.0616 [2026-04-22 01:43:48] Validation | Batch 980/1567 | Loss: 1.0613 [2026-04-22 01:43:48] Validation | Batch 990/1567 | Loss: 1.0622 [2026-04-22 01:43:50] Validation | Batch 1000/1567 | Loss: 1.0626 [2026-04-22 01:43:51] Validation | Batch 1010/1567 | Loss: 1.0617 [2026-04-22 01:43:52] Validation | Batch 1020/1567 | Loss: 1.0629 [2026-04-22 01:43:53] Validation | Batch 1030/1567 | Loss: 1.0634 [2026-04-22 01:43:55] Validation | Batch 1040/1567 | Loss: 1.0625 [2026-04-22 01:43:56] Validation | Batch 1050/1567 | Loss: 1.0615 [2026-04-22 01:43:57] Validation | Batch 1060/1567 | Loss: 1.0626 [2026-04-22 01:43:58] Validation | Batch 1070/1567 | Loss: 1.0625 [2026-04-22 01:44:00] Validation | Batch 1080/1567 | Loss: 1.0638 [2026-04-22 01:44:01] Validation | Batch 1090/1567 | Loss: 1.0665 [2026-04-22 01:44:02] Validation | Batch 1100/1567 | Loss: 1.0681 [2026-04-22 
01:44:03] Validation | Batch 1110/1567 | Loss: 1.0670 [2026-04-22 01:44:04] Validation | Batch 1120/1567 | Loss: 1.0672 [2026-04-22 01:44:05] Validation | Batch 1130/1567 | Loss: 1.0654 [2026-04-22 01:44:06] Validation | Batch 1140/1567 | Loss: 1.0659 [2026-04-22 01:44:08] Validation | Batch 1150/1567 | Loss: 1.0646 [2026-04-22 01:44:09] Validation | Batch 1160/1567 | Loss: 1.0639 [2026-04-22 01:44:10] Validation | Batch 1170/1567 | Loss: 1.0642 [2026-04-22 01:44:11] Validation | Batch 1180/1567 | Loss: 1.0645 [2026-04-22 01:44:12] Validation | Batch 1190/1567 | Loss: 1.0647 [2026-04-22 01:44:13] Validation | Batch 1200/1567 | Loss: 1.0634 [2026-04-22 01:44:15] Validation | Batch 1210/1567 | Loss: 1.0628 [2026-04-22 01:44:16] Validation | Batch 1220/1567 | Loss: 1.0637 [2026-04-22 01:44:17] Validation | Batch 1230/1567 | Loss: 1.0642 [2026-04-22 01:44:18] Validation | Batch 1240/1567 | Loss: 1.0641 [2026-04-22 01:44:19] Validation | Batch 1250/1567 | Loss: 1.0645 [2026-04-22 01:44:21] Validation | Batch 1260/1567 | Loss: 1.0642 [2026-04-22 01:44:22] Validation | Batch 1270/1567 | Loss: 1.0624 [2026-04-22 01:44:23] Validation | Batch 1280/1567 | Loss: 1.0626 [2026-04-22 01:44:25] Validation | Batch 1290/1567 | Loss: 1.0627 [2026-04-22 01:44:26] Validation | Batch 1300/1567 | Loss: 1.0630 [2026-04-22 01:44:27] Validation | Batch 1310/1567 | Loss: 1.0638 [2026-04-22 01:44:28] Validation | Batch 1320/1567 | Loss: 1.0644 [2026-04-22 01:44:29] Validation | Batch 1330/1567 | Loss: 1.0658 [2026-04-22 01:44:30] Validation | Batch 1340/1567 | Loss: 1.0655 [2026-04-22 01:44:31] Validation | Batch 1350/1567 | Loss: 1.0658 [2026-04-22 01:44:32] Validation | Batch 1360/1567 | Loss: 1.0649 [2026-04-22 01:44:34] Validation | Batch 1370/1567 | Loss: 1.0645 [2026-04-22 01:44:35] Validation | Batch 1380/1567 | Loss: 1.0645 [2026-04-22 01:44:36] Validation | Batch 1390/1567 | Loss: 1.0638 [2026-04-22 01:44:37] Validation | Batch 1400/1567 | Loss: 1.0634 [2026-04-22 01:44:38] 
Validation | Batch 1410/1567 | Loss: 1.0640 [2026-04-22 01:44:39] Validation | Batch 1420/1567 | Loss: 1.0640 [2026-04-22 01:44:40] Validation | Batch 1430/1567 | Loss: 1.0643 [2026-04-22 01:44:42] Validation | Batch 1440/1567 | Loss: 1.0651 [2026-04-22 01:44:43] Validation | Batch 1450/1567 | Loss: 1.0653 [2026-04-22 01:44:43] Validation | Batch 1460/1567 | Loss: 1.0646 [2026-04-22 01:44:44] Validation | Batch 1470/1567 | Loss: 1.0644 [2026-04-22 01:44:45] Validation | Batch 1480/1567 | Loss: 1.0641 [2026-04-22 01:44:46] Validation | Batch 1490/1567 | Loss: 1.0636 [2026-04-22 01:44:48] Validation | Batch 1500/1567 | Loss: 1.0633 [2026-04-22 01:44:49] Validation | Batch 1510/1567 | Loss: 1.0624 [2026-04-22 01:44:50] Validation | Batch 1520/1567 | Loss: 1.0623 [2026-04-22 01:44:50] Validation | Batch 1530/1567 | Loss: 1.0623 [2026-04-22 01:44:52] Validation | Batch 1540/1567 | Loss: 1.0629 [2026-04-22 01:44:53] Validation | Batch 1550/1567 | Loss: 1.0643 [2026-04-22 01:44:54] Validation | Batch 1560/1567 | Loss: 1.0639 [2026-04-22 01:44:55] Validation | Batch 1567/1567 | Loss: 1.0639 [2026-04-22 01:44:55] Validation | Loss: 1.0639 | PPL: 2.95 | Time: 184.54s [2026-04-22 01:45:01] Epoch 3 | Step 25010 | Loss: 0.8219 | LR: 2.00e-06 [2026-04-22 01:45:06] Epoch 3 | Step 25020 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:45:11] Epoch 3 | Step 25030 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:45:16] Epoch 3 | Step 25040 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:45:22] Epoch 3 | Step 25050 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:45:27] Epoch 3 | Step 25060 | Loss: 0.8219 | LR: 2.00e-06 [2026-04-22 01:45:33] Epoch 3 | Step 25070 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:45:38] Epoch 3 | Step 25080 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:45:43] Epoch 3 | Step 25090 | Loss: 0.8219 | LR: 2.00e-06 [2026-04-22 01:45:49] Epoch 3 | Step 25100 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 01:45:54] Epoch 3 | Step 25110 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 
01:46:00] Epoch 3 | Step 25120 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:46:06] Epoch 3 | Step 25130 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:46:11] Epoch 3 | Step 25140 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:46:17] Epoch 3 | Step 25150 | Loss: 0.8215 | LR: 2.00e-06 [2026-04-22 01:46:23] Epoch 3 | Step 25160 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:46:28] Epoch 3 | Step 25170 | Loss: 0.8216 | LR: 2.00e-06 [2026-04-22 01:46:33] Epoch 3 | Step 25180 | Loss: 0.8214 | LR: 2.00e-06 [2026-04-22 01:46:38] Epoch 3 | Step 25190 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:46:44] Epoch 3 | Step 25200 | Loss: 0.8219 | LR: 2.00e-06 [2026-04-22 01:46:50] Epoch 3 | Step 25210 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:46:55] Epoch 3 | Step 25220 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:47:01] Epoch 3 | Step 25230 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:47:06] Epoch 3 | Step 25240 | Loss: 0.8218 | LR: 2.00e-06 [2026-04-22 01:47:11] Epoch 3 | Step 25250 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 01:47:16] Epoch 3 | Step 25260 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 01:47:22] Epoch 3 | Step 25270 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 01:47:27] Epoch 3 | Step 25280 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 01:47:32] Epoch 3 | Step 25290 | Loss: 0.8223 | LR: 2.00e-06 [2026-04-22 01:47:37] Epoch 3 | Step 25300 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:47:42] Epoch 3 | Step 25310 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 01:47:47] Epoch 3 | Step 25320 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:47:52] Epoch 3 | Step 25330 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 01:47:59] Epoch 3 | Step 25340 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 01:48:04] Epoch 3 | Step 25350 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 01:48:09] Epoch 3 | Step 25360 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 01:48:14] Epoch 3 | Step 25370 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 01:48:20] Epoch 3 | Step 25380 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:48:25] Epoch 3 | Step 
25390 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:48:31] Epoch 3 | Step 25400 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 01:48:36] Epoch 3 | Step 25410 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:48:41] Epoch 3 | Step 25420 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:48:46] Epoch 3 | Step 25430 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 01:48:51] Epoch 3 | Step 25440 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 01:48:57] Epoch 3 | Step 25450 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:49:03] Epoch 3 | Step 25460 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:49:08] Epoch 3 | Step 25470 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:49:14] Epoch 3 | Step 25480 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:49:19] Epoch 3 | Step 25490 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:49:25] Epoch 3 | Step 25500 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:49:30] Epoch 3 | Step 25510 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:49:35] Epoch 3 | Step 25520 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:49:41] Epoch 3 | Step 25530 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:49:48] Epoch 3 | Step 25540 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:49:53] Epoch 3 | Step 25550 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:49:59] Epoch 3 | Step 25560 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 01:50:04] Epoch 3 | Step 25570 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:50:08] Epoch 3 | Step 25580 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:50:14] Epoch 3 | Step 25590 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:50:19] Epoch 3 | Step 25600 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 01:50:24] Epoch 3 | Step 25610 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:50:30] Epoch 3 | Step 25620 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:50:35] Epoch 3 | Step 25630 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:50:41] Epoch 3 | Step 25640 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:50:46] Epoch 3 | Step 25650 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:50:52] Epoch 3 | Step 25660 | Loss: 0.8230 | LR: 
2.00e-06 [2026-04-22 01:50:57] Epoch 3 | Step 25670 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:51:02] Epoch 3 | Step 25680 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 01:51:08] Epoch 3 | Step 25690 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:51:13] Epoch 3 | Step 25700 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:51:18] Epoch 3 | Step 25710 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:51:23] Epoch 3 | Step 25720 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:51:28] Epoch 3 | Step 25730 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:51:33] Epoch 3 | Step 25740 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 01:51:38] Epoch 3 | Step 25750 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:51:43] Epoch 3 | Step 25760 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:51:48] Epoch 3 | Step 25770 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:51:53] Epoch 3 | Step 25780 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:51:59] Epoch 3 | Step 25790 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 01:52:05] Epoch 3 | Step 25800 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:52:10] Epoch 3 | Step 25810 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:52:15] Epoch 3 | Step 25820 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 01:52:20] Epoch 3 | Step 25830 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 01:52:26] Epoch 3 | Step 25840 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:52:31] Epoch 3 | Step 25850 | Loss: 0.8233 | LR: 2.00e-06 [2026-04-22 01:52:37] Epoch 3 | Step 25860 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:52:42] Epoch 3 | Step 25870 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 01:52:48] Epoch 3 | Step 25880 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 01:52:52] Epoch 3 | Step 25890 | Loss: 0.8235 | LR: 2.00e-06 [2026-04-22 01:52:57] Epoch 3 | Step 25900 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:53:03] Epoch 3 | Step 25910 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:53:08] Epoch 3 | Step 25920 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:53:13] Epoch 3 | Step 25930 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 
01:53:19] Epoch 3 | Step 25940 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:53:24] Epoch 3 | Step 25950 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:53:29] Epoch 3 | Step 25960 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:53:35] Epoch 3 | Step 25970 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:53:40] Epoch 3 | Step 25980 | Loss: 0.8243 | LR: 2.00e-06 [2026-04-22 01:53:46] Epoch 3 | Step 25990 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:53:52] Epoch 3 | Step 26000 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:53:54] Validation | Batch 10/1567 | Loss: 1.0496 [2026-04-22 01:53:55] Validation | Batch 20/1567 | Loss: 1.1364 [2026-04-22 01:53:56] Validation | Batch 30/1567 | Loss: 1.0921 [2026-04-22 01:53:58] Validation | Batch 40/1567 | Loss: 1.1129 [2026-04-22 01:53:58] Validation | Batch 50/1567 | Loss: 1.0881 [2026-04-22 01:54:00] Validation | Batch 60/1567 | Loss: 1.0763 [2026-04-22 01:54:01] Validation | Batch 70/1567 | Loss: 1.0680 [2026-04-22 01:54:03] Validation | Batch 80/1567 | Loss: 1.0767 [2026-04-22 01:54:04] Validation | Batch 90/1567 | Loss: 1.0712 [2026-04-22 01:54:05] Validation | Batch 100/1567 | Loss: 1.0499 [2026-04-22 01:54:06] Validation | Batch 110/1567 | Loss: 1.0396 [2026-04-22 01:54:08] Validation | Batch 120/1567 | Loss: 1.0341 [2026-04-22 01:54:09] Validation | Batch 130/1567 | Loss: 1.0286 [2026-04-22 01:54:10] Validation | Batch 140/1567 | Loss: 1.0390 [2026-04-22 01:54:11] Validation | Batch 150/1567 | Loss: 1.0499 [2026-04-22 01:54:12] Validation | Batch 160/1567 | Loss: 1.0488 [2026-04-22 01:54:13] Validation | Batch 170/1567 | Loss: 1.0410 [2026-04-22 01:54:14] Validation | Batch 180/1567 | Loss: 1.0440 [2026-04-22 01:54:15] Validation | Batch 190/1567 | Loss: 1.0485 [2026-04-22 01:54:17] Validation | Batch 200/1567 | Loss: 1.0518 [2026-04-22 01:54:18] Validation | Batch 210/1567 | Loss: 1.0497 [2026-04-22 01:54:19] Validation | Batch 220/1567 | Loss: 1.0551 [2026-04-22 01:54:21] Validation | Batch 230/1567 | Loss: 1.0592 [2026-04-22 
01:54:22] Validation | Batch 240/1567 | Loss: 1.0613 [2026-04-22 01:54:23] Validation | Batch 250/1567 | Loss: 1.0655 [2026-04-22 01:54:24] Validation | Batch 260/1567 | Loss: 1.0686 [2026-04-22 01:54:25] Validation | Batch 270/1567 | Loss: 1.0729 [2026-04-22 01:54:27] Validation | Batch 280/1567 | Loss: 1.0765 [2026-04-22 01:54:29] Validation | Batch 290/1567 | Loss: 1.0717 [2026-04-22 01:54:30] Validation | Batch 300/1567 | Loss: 1.0710 [2026-04-22 01:54:31] Validation | Batch 310/1567 | Loss: 1.0678 [2026-04-22 01:54:32] Validation | Batch 320/1567 | Loss: 1.0707 [2026-04-22 01:54:34] Validation | Batch 330/1567 | Loss: 1.0708 [2026-04-22 01:54:35] Validation | Batch 340/1567 | Loss: 1.0701 [2026-04-22 01:54:36] Validation | Batch 350/1567 | Loss: 1.0675 [2026-04-22 01:54:37] Validation | Batch 360/1567 | Loss: 1.0612 [2026-04-22 01:54:39] Validation | Batch 370/1567 | Loss: 1.0613 [2026-04-22 01:54:40] Validation | Batch 380/1567 | Loss: 1.0656 [2026-04-22 01:54:41] Validation | Batch 390/1567 | Loss: 1.0646 [2026-04-22 01:54:42] Validation | Batch 400/1567 | Loss: 1.0654 [2026-04-22 01:54:43] Validation | Batch 410/1567 | Loss: 1.0614 [2026-04-22 01:54:44] Validation | Batch 420/1567 | Loss: 1.0597 [2026-04-22 01:54:46] Validation | Batch 430/1567 | Loss: 1.0623 [2026-04-22 01:54:47] Validation | Batch 440/1567 | Loss: 1.0625 [2026-04-22 01:54:48] Validation | Batch 450/1567 | Loss: 1.0645 [2026-04-22 01:54:49] Validation | Batch 460/1567 | Loss: 1.0671 [2026-04-22 01:54:50] Validation | Batch 470/1567 | Loss: 1.0720 [2026-04-22 01:54:52] Validation | Batch 480/1567 | Loss: 1.0696 [2026-04-22 01:54:53] Validation | Batch 490/1567 | Loss: 1.0673 [2026-04-22 01:54:54] Validation | Batch 500/1567 | Loss: 1.0684 [2026-04-22 01:54:55] Validation | Batch 510/1567 | Loss: 1.0682 [2026-04-22 01:54:56] Validation | Batch 520/1567 | Loss: 1.0695 [2026-04-22 01:54:57] Validation | Batch 530/1567 | Loss: 1.0680 [2026-04-22 01:54:58] Validation | Batch 540/1567 | Loss: 
1.0653 [2026-04-22 01:55:00] Validation | Batch 550/1567 | Loss: 1.0664 [2026-04-22 01:55:01] Validation | Batch 560/1567 | Loss: 1.0655 [2026-04-22 01:55:02] Validation | Batch 570/1567 | Loss: 1.0613 [2026-04-22 01:55:04] Validation | Batch 580/1567 | Loss: 1.0632 [2026-04-22 01:55:05] Validation | Batch 590/1567 | Loss: 1.0630 [2026-04-22 01:55:06] Validation | Batch 600/1567 | Loss: 1.0619 [2026-04-22 01:55:07] Validation | Batch 610/1567 | Loss: 1.0640 [2026-04-22 01:55:09] Validation | Batch 620/1567 | Loss: 1.0619 [2026-04-22 01:55:10] Validation | Batch 630/1567 | Loss: 1.0623 [2026-04-22 01:55:12] Validation | Batch 640/1567 | Loss: 1.0629 [2026-04-22 01:55:13] Validation | Batch 650/1567 | Loss: 1.0657 [2026-04-22 01:55:14] Validation | Batch 660/1567 | Loss: 1.0670 [2026-04-22 01:55:15] Validation | Batch 670/1567 | Loss: 1.0652 [2026-04-22 01:55:16] Validation | Batch 680/1567 | Loss: 1.0640 [2026-04-22 01:55:17] Validation | Batch 690/1567 | Loss: 1.0625 [2026-04-22 01:55:19] Validation | Batch 700/1567 | Loss: 1.0626 [2026-04-22 01:55:20] Validation | Batch 710/1567 | Loss: 1.0619 [2026-04-22 01:55:21] Validation | Batch 720/1567 | Loss: 1.0587 [2026-04-22 01:55:22] Validation | Batch 730/1567 | Loss: 1.0593 [2026-04-22 01:55:23] Validation | Batch 740/1567 | Loss: 1.0599 [2026-04-22 01:55:24] Validation | Batch 750/1567 | Loss: 1.0595 [2026-04-22 01:55:25] Validation | Batch 760/1567 | Loss: 1.0608 [2026-04-22 01:55:27] Validation | Batch 770/1567 | Loss: 1.0603 [2026-04-22 01:55:28] Validation | Batch 780/1567 | Loss: 1.0614 [2026-04-22 01:55:29] Validation | Batch 790/1567 | Loss: 1.0599 [2026-04-22 01:55:30] Validation | Batch 800/1567 | Loss: 1.0581 [2026-04-22 01:55:31] Validation | Batch 810/1567 | Loss: 1.0588 [2026-04-22 01:55:32] Validation | Batch 820/1567 | Loss: 1.0580 [2026-04-22 01:55:33] Validation | Batch 830/1567 | Loss: 1.0572 [2026-04-22 01:55:34] Validation | Batch 840/1567 | Loss: 1.0580 [2026-04-22 01:55:35] Validation | Batch 
850/1567 | Loss: 1.0591 [2026-04-22 01:55:36] Validation | Batch 860/1567 | Loss: 1.0598 [2026-04-22 01:55:37] Validation | Batch 870/1567 | Loss: 1.0606 [2026-04-22 01:55:38] Validation | Batch 880/1567 | Loss: 1.0605 [2026-04-22 01:55:40] Validation | Batch 890/1567 | Loss: 1.0600 [2026-04-22 01:55:41] Validation | Batch 900/1567 | Loss: 1.0597 [2026-04-22 01:55:42] Validation | Batch 910/1567 | Loss: 1.0594 [2026-04-22 01:55:43] Validation | Batch 920/1567 | Loss: 1.0613 [2026-04-22 01:55:44] Validation | Batch 930/1567 | Loss: 1.0613 [2026-04-22 01:55:45] Validation | Batch 940/1567 | Loss: 1.0612 [2026-04-22 01:55:47] Validation | Batch 950/1567 | Loss: 1.0608 [2026-04-22 01:55:48] Validation | Batch 960/1567 | Loss: 1.0611 [2026-04-22 01:55:49] Validation | Batch 970/1567 | Loss: 1.0616 [2026-04-22 01:55:50] Validation | Batch 980/1567 | Loss: 1.0613 [2026-04-22 01:55:50] Validation | Batch 990/1567 | Loss: 1.0622 [2026-04-22 01:55:52] Validation | Batch 1000/1567 | Loss: 1.0627 [2026-04-22 01:55:53] Validation | Batch 1010/1567 | Loss: 1.0618 [2026-04-22 01:55:54] Validation | Batch 1020/1567 | Loss: 1.0630 [2026-04-22 01:55:55] Validation | Batch 1030/1567 | Loss: 1.0634 [2026-04-22 01:55:57] Validation | Batch 1040/1567 | Loss: 1.0626 [2026-04-22 01:55:58] Validation | Batch 1050/1567 | Loss: 1.0615 [2026-04-22 01:55:59] Validation | Batch 1060/1567 | Loss: 1.0627 [2026-04-22 01:56:00] Validation | Batch 1070/1567 | Loss: 1.0625 [2026-04-22 01:56:02] Validation | Batch 1080/1567 | Loss: 1.0639 [2026-04-22 01:56:03] Validation | Batch 1090/1567 | Loss: 1.0666 [2026-04-22 01:56:04] Validation | Batch 1100/1567 | Loss: 1.0681 [2026-04-22 01:56:05] Validation | Batch 1110/1567 | Loss: 1.0671 [2026-04-22 01:56:06] Validation | Batch 1120/1567 | Loss: 1.0673 [2026-04-22 01:56:07] Validation | Batch 1130/1567 | Loss: 1.0655 [2026-04-22 01:56:08] Validation | Batch 1140/1567 | Loss: 1.0659 [2026-04-22 01:56:10] Validation | Batch 1150/1567 | Loss: 1.0646 
[2026-04-22 01:56:11] Validation | Batch 1160/1567 | Loss: 1.0640 [2026-04-22 01:56:12] Validation | Batch 1170/1567 | Loss: 1.0643 [2026-04-22 01:56:13] Validation | Batch 1180/1567 | Loss: 1.0646 [2026-04-22 01:56:14] Validation | Batch 1190/1567 | Loss: 1.0648 [2026-04-22 01:56:16] Validation | Batch 1200/1567 | Loss: 1.0635 [2026-04-22 01:56:17] Validation | Batch 1210/1567 | Loss: 1.0629 [2026-04-22 01:56:18] Validation | Batch 1220/1567 | Loss: 1.0638 [2026-04-22 01:56:19] Validation | Batch 1230/1567 | Loss: 1.0643 [2026-04-22 01:56:20] Validation | Batch 1240/1567 | Loss: 1.0642 [2026-04-22 01:56:21] Validation | Batch 1250/1567 | Loss: 1.0645 [2026-04-22 01:56:23] Validation | Batch 1260/1567 | Loss: 1.0643 [2026-04-22 01:56:24] Validation | Batch 1270/1567 | Loss: 1.0625 [2026-04-22 01:56:25] Validation | Batch 1280/1567 | Loss: 1.0627 [2026-04-22 01:56:27] Validation | Batch 1290/1567 | Loss: 1.0628 [2026-04-22 01:56:28] Validation | Batch 1300/1567 | Loss: 1.0631 [2026-04-22 01:56:29] Validation | Batch 1310/1567 | Loss: 1.0638 [2026-04-22 01:56:30] Validation | Batch 1320/1567 | Loss: 1.0644 [2026-04-22 01:56:31] Validation | Batch 1330/1567 | Loss: 1.0659 [2026-04-22 01:56:33] Validation | Batch 1340/1567 | Loss: 1.0656 [2026-04-22 01:56:33] Validation | Batch 1350/1567 | Loss: 1.0659 [2026-04-22 01:56:34] Validation | Batch 1360/1567 | Loss: 1.0649 [2026-04-22 01:56:36] Validation | Batch 1370/1567 | Loss: 1.0646 [2026-04-22 01:56:37] Validation | Batch 1380/1567 | Loss: 1.0646 [2026-04-22 01:56:38] Validation | Batch 1390/1567 | Loss: 1.0639 [2026-04-22 01:56:39] Validation | Batch 1400/1567 | Loss: 1.0635 [2026-04-22 01:56:40] Validation | Batch 1410/1567 | Loss: 1.0641 [2026-04-22 01:56:41] Validation | Batch 1420/1567 | Loss: 1.0641 [2026-04-22 01:56:42] Validation | Batch 1430/1567 | Loss: 1.0644 [2026-04-22 01:56:44] Validation | Batch 1440/1567 | Loss: 1.0652 [2026-04-22 01:56:45] Validation | Batch 1450/1567 | Loss: 1.0653 [2026-04-22 
01:56:45] Validation | Batch 1460/1567 | Loss: 1.0647 [2026-04-22 01:56:47] Validation | Batch 1470/1567 | Loss: 1.0645 [2026-04-22 01:56:48] Validation | Batch 1480/1567 | Loss: 1.0642 [2026-04-22 01:56:48] Validation | Batch 1490/1567 | Loss: 1.0637 [2026-04-22 01:56:50] Validation | Batch 1500/1567 | Loss: 1.0634 [2026-04-22 01:56:51] Validation | Batch 1510/1567 | Loss: 1.0625 [2026-04-22 01:56:52] Validation | Batch 1520/1567 | Loss: 1.0624 [2026-04-22 01:56:53] Validation | Batch 1530/1567 | Loss: 1.0624 [2026-04-22 01:56:54] Validation | Batch 1540/1567 | Loss: 1.0630 [2026-04-22 01:56:55] Validation | Batch 1550/1567 | Loss: 1.0644 [2026-04-22 01:56:56] Validation | Batch 1560/1567 | Loss: 1.0640 [2026-04-22 01:56:57] Validation | Batch 1567/1567 | Loss: 1.0640 [2026-04-22 01:56:57] Validation | Loss: 1.0640 | PPL: 2.95 | Time: 184.65s [2026-04-22 01:57:02] Epoch 3 | Step 26010 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:57:07] Epoch 3 | Step 26020 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:57:12] Epoch 3 | Step 26030 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 01:57:17] Epoch 3 | Step 26040 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:57:21] Epoch 3 | Step 26050 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:57:27] Epoch 3 | Step 26060 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:57:32] Epoch 3 | Step 26070 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:57:37] Epoch 3 | Step 26080 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 01:57:42] Epoch 3 | Step 26090 | Loss: 0.8243 | LR: 2.00e-06 [2026-04-22 01:57:47] Epoch 3 | Step 26100 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 01:57:53] Epoch 3 | Step 26110 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:57:58] Epoch 3 | Step 26120 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:58:03] Epoch 3 | Step 26130 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:58:08] Epoch 3 | Step 26140 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:58:13] Epoch 3 | Step 26150 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:58:18] Epoch 3 | Step 26160 | 
Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 01:58:23] Epoch 3 | Step 26170 | Loss: 0.8235 | LR: 2.00e-06 [2026-04-22 01:58:28] Epoch 3 | Step 26180 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:58:33] Epoch 3 | Step 26190 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:58:39] Epoch 3 | Step 26200 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:58:44] Epoch 3 | Step 26210 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:58:49] Epoch 3 | Step 26220 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:58:54] Epoch 3 | Step 26230 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 01:58:59] Epoch 3 | Step 26240 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:59:05] Epoch 3 | Step 26250 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:59:11] Epoch 3 | Step 26260 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:59:16] Epoch 3 | Step 26270 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 01:59:22] Epoch 3 | Step 26280 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:59:27] Epoch 3 | Step 26290 | Loss: 0.8241 | LR: 2.00e-06 [2026-04-22 01:59:32] Epoch 3 | Step 26300 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 01:59:37] Epoch 3 | Step 26310 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:59:43] Epoch 3 | Step 26320 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:59:49] Epoch 3 | Step 26330 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 01:59:55] Epoch 3 | Step 26340 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 01:59:59] Epoch 3 | Step 26350 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 02:00:04] Epoch 3 | Step 26360 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:00:09] Epoch 3 | Step 26370 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:00:15] Epoch 3 | Step 26380 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:00:20] Epoch 3 | Step 26390 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 02:00:26] Epoch 3 | Step 26400 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 02:00:31] Epoch 3 | Step 26410 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 02:00:37] Epoch 3 | Step 26420 | Loss: 0.8235 | LR: 2.00e-06 [2026-04-22 02:00:41] Epoch 3 | Step 26430 | Loss: 0.8236 | LR: 2.00e-06 
[2026-04-22 02:00:47] Epoch 3 | Step 26440 | Loss: 0.8235 | LR: 2.00e-06 [2026-04-22 02:00:54] Epoch 3 | Step 26450 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 02:00:59] Epoch 3 | Step 26460 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:01:04] Epoch 3 | Step 26470 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:01:09] Epoch 3 | Step 26480 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 02:01:14] Epoch 3 | Step 26490 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:01:19] Epoch 3 | Step 26500 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 02:01:25] Epoch 3 | Step 26510 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:01:30] Epoch 3 | Step 26520 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 02:01:34] Epoch 3 | Step 26530 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 02:01:40] Epoch 3 | Step 26540 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:01:45] Epoch 3 | Step 26550 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:01:50] Epoch 3 | Step 26560 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 02:01:55] Epoch 3 | Step 26570 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 02:02:00] Epoch 3 | Step 26580 | Loss: 0.8239 | LR: 2.00e-06 [2026-04-22 02:02:05] Epoch 3 | Step 26590 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 02:02:11] Epoch 3 | Step 26600 | Loss: 0.8243 | LR: 2.00e-06 [2026-04-22 02:02:16] Epoch 3 | Step 26610 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 02:02:22] Epoch 3 | Step 26620 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 02:02:28] Epoch 3 | Step 26630 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 02:02:34] Epoch 3 | Step 26640 | Loss: 0.8246 | LR: 2.00e-06 [2026-04-22 02:02:39] Epoch 3 | Step 26650 | Loss: 0.8247 | LR: 2.00e-06 [2026-04-22 02:02:44] Epoch 3 | Step 26660 | Loss: 0.8247 | LR: 2.00e-06 [2026-04-22 02:02:51] Epoch 3 | Step 26670 | Loss: 0.8247 | LR: 2.00e-06 [2026-04-22 02:02:56] Epoch 3 | Step 26680 | Loss: 0.8248 | LR: 2.00e-06 [2026-04-22 02:03:01] Epoch 3 | Step 26690 | Loss: 0.8248 | LR: 2.00e-06 [2026-04-22 02:03:07] Epoch 3 | Step 26700 | Loss: 0.8249 | LR: 2.00e-06 [2026-04-22 02:03:12] Epoch 
3 | Step 26710 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:03:17] Epoch 3 | Step 26720 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:03:22] Epoch 3 | Step 26730 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:03:27] Epoch 3 | Step 26740 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:03:32] Epoch 3 | Step 26750 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:03:37] Epoch 3 | Step 26760 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:03:43] Epoch 3 | Step 26770 | Loss: 0.8249 | LR: 2.00e-06 [2026-04-22 02:03:47] Epoch 3 | Step 26780 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:03:52] Epoch 3 | Step 26790 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:03:58] Epoch 3 | Step 26800 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:04:03] Epoch 3 | Step 26810 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:04:08] Epoch 3 | Step 26820 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:04:14] Epoch 3 | Step 26830 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:04:20] Epoch 3 | Step 26840 | Loss: 0.8253 | LR: 2.00e-06 [2026-04-22 02:04:25] Epoch 3 | Step 26850 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:04:30] Epoch 3 | Step 26860 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:04:35] Epoch 3 | Step 26870 | Loss: 0.8253 | LR: 2.00e-06 [2026-04-22 02:04:41] Epoch 3 | Step 26880 | Loss: 0.8254 | LR: 2.00e-06 [2026-04-22 02:04:47] Epoch 3 | Step 26890 | Loss: 0.8255 | LR: 2.00e-06 [2026-04-22 02:04:53] Epoch 3 | Step 26900 | Loss: 0.8256 | LR: 2.00e-06 [2026-04-22 02:04:57] Epoch 3 | Step 26910 | Loss: 0.8254 | LR: 2.00e-06 [2026-04-22 02:05:03] Epoch 3 | Step 26920 | Loss: 0.8254 | LR: 2.00e-06 [2026-04-22 02:05:08] Epoch 3 | Step 26930 | Loss: 0.8254 | LR: 2.00e-06 [2026-04-22 02:05:13] Epoch 3 | Step 26940 | Loss: 0.8256 | LR: 2.00e-06 [2026-04-22 02:05:18] Epoch 3 | Step 26950 | Loss: 0.8256 | LR: 2.00e-06 [2026-04-22 02:05:24] Epoch 3 | Step 26960 | Loss: 0.8258 | LR: 2.00e-06 [2026-04-22 02:05:30] Epoch 3 | Step 26970 | Loss: 0.8259 | LR: 2.00e-06 [2026-04-22 02:05:35] Epoch 3 | Step 26980 | Loss: 
0.8258 | LR: 2.00e-06 [2026-04-22 02:05:40] Epoch 3 | Step 26990 | Loss: 0.8257 | LR: 2.00e-06 [2026-04-22 02:05:45] Epoch 3 | Step 27000 | Loss: 0.8255 | LR: 2.00e-06 [2026-04-22 02:05:56] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_27000.pt [2026-04-22 02:07:12] Validation | Batch 10/1567 | Loss: 1.0499 [2026-04-22 02:07:13] Validation | Batch 20/1567 | Loss: 1.1373 [2026-04-22 02:07:14] Validation | Batch 30/1567 | Loss: 1.0927 [2026-04-22 02:07:16] Validation | Batch 40/1567 | Loss: 1.1136 [2026-04-22 02:07:17] Validation | Batch 50/1567 | Loss: 1.0887 [2026-04-22 02:07:18] Validation | Batch 60/1567 | Loss: 1.0768 [2026-04-22 02:07:19] Validation | Batch 70/1567 | Loss: 1.0686 [2026-04-22 02:07:21] Validation | Batch 80/1567 | Loss: 1.0774 [2026-04-22 02:07:22] Validation | Batch 90/1567 | Loss: 1.0719 [2026-04-22 02:07:24] Validation | Batch 100/1567 | Loss: 1.0505 [2026-04-22 02:07:25] Validation | Batch 110/1567 | Loss: 1.0401 [2026-04-22 02:07:27] Validation | Batch 120/1567 | Loss: 1.0345 [2026-04-22 02:07:28] Validation | Batch 130/1567 | Loss: 1.0291 [2026-04-22 02:07:29] Validation | Batch 140/1567 | Loss: 1.0396 [2026-04-22 02:07:30] Validation | Batch 150/1567 | Loss: 1.0504 [2026-04-22 02:07:31] Validation | Batch 160/1567 | Loss: 1.0494 [2026-04-22 02:07:32] Validation | Batch 170/1567 | Loss: 1.0415 [2026-04-22 02:07:33] Validation | Batch 180/1567 | Loss: 1.0444 [2026-04-22 02:07:34] Validation | Batch 190/1567 | Loss: 1.0490 [2026-04-22 02:07:36] Validation | Batch 200/1567 | Loss: 1.0522 [2026-04-22 02:07:37] Validation | Batch 210/1567 | Loss: 1.0501 [2026-04-22 02:07:38] Validation | Batch 220/1567 | Loss: 1.0556 [2026-04-22 02:07:40] Validation | Batch 230/1567 | Loss: 1.0597 [2026-04-22 02:07:41] Validation | Batch 240/1567 | Loss: 1.0617 [2026-04-22 02:07:42] Validation | Batch 250/1567 | Loss: 1.0660 [2026-04-22 02:07:43] Validation | Batch 260/1567 | Loss: 1.0691 [2026-04-22 02:07:44] Validation | Batch 
270/1567 | Loss: 1.0735 [2026-04-22 02:07:46] Validation | Batch 280/1567 | Loss: 1.0770 [2026-04-22 02:07:48] Validation | Batch 290/1567 | Loss: 1.0723 [2026-04-22 02:07:49] Validation | Batch 300/1567 | Loss: 1.0715 [2026-04-22 02:07:50] Validation | Batch 310/1567 | Loss: 1.0683 [2026-04-22 02:07:51] Validation | Batch 320/1567 | Loss: 1.0711 [2026-04-22 02:07:53] Validation | Batch 330/1567 | Loss: 1.0712 [2026-04-22 02:07:54] Validation | Batch 340/1567 | Loss: 1.0706 [2026-04-22 02:07:55] Validation | Batch 350/1567 | Loss: 1.0680 [2026-04-22 02:07:56] Validation | Batch 360/1567 | Loss: 1.0617 [2026-04-22 02:07:58] Validation | Batch 370/1567 | Loss: 1.0618 [2026-04-22 02:07:59] Validation | Batch 380/1567 | Loss: 1.0661 [2026-04-22 02:08:00] Validation | Batch 390/1567 | Loss: 1.0652 [2026-04-22 02:08:01] Validation | Batch 400/1567 | Loss: 1.0659 [2026-04-22 02:08:03] Validation | Batch 410/1567 | Loss: 1.0620 [2026-04-22 02:08:04] Validation | Batch 420/1567 | Loss: 1.0603 [2026-04-22 02:08:05] Validation | Batch 430/1567 | Loss: 1.0629 [2026-04-22 02:08:06] Validation | Batch 440/1567 | Loss: 1.0631 [2026-04-22 02:08:07] Validation | Batch 450/1567 | Loss: 1.0651 [2026-04-22 02:08:09] Validation | Batch 460/1567 | Loss: 1.0677 [2026-04-22 02:08:10] Validation | Batch 470/1567 | Loss: 1.0726 [2026-04-22 02:08:11] Validation | Batch 480/1567 | Loss: 1.0702 [2026-04-22 02:08:12] Validation | Batch 490/1567 | Loss: 1.0679 [2026-04-22 02:08:13] Validation | Batch 500/1567 | Loss: 1.0690 [2026-04-22 02:08:14] Validation | Batch 510/1567 | Loss: 1.0688 [2026-04-22 02:08:15] Validation | Batch 520/1567 | Loss: 1.0701 [2026-04-22 02:08:16] Validation | Batch 530/1567 | Loss: 1.0686 [2026-04-22 02:08:18] Validation | Batch 540/1567 | Loss: 1.0659 [2026-04-22 02:08:19] Validation | Batch 550/1567 | Loss: 1.0670 [2026-04-22 02:08:20] Validation | Batch 560/1567 | Loss: 1.0661 [2026-04-22 02:08:22] Validation | Batch 570/1567 | Loss: 1.0619 [2026-04-22 02:08:23] 
Validation | Batch 580/1567 | Loss: 1.0638 [2026-04-22 02:08:24] Validation | Batch 590/1567 | Loss: 1.0635 [2026-04-22 02:08:25] Validation | Batch 600/1567 | Loss: 1.0624 [2026-04-22 02:08:27] Validation | Batch 610/1567 | Loss: 1.0645 [2026-04-22 02:08:28] Validation | Batch 620/1567 | Loss: 1.0624 [2026-04-22 02:08:29] Validation | Batch 630/1567 | Loss: 1.0628 [2026-04-22 02:08:31] Validation | Batch 640/1567 | Loss: 1.0634 [2026-04-22 02:08:32] Validation | Batch 650/1567 | Loss: 1.0663 [2026-04-22 02:08:33] Validation | Batch 660/1567 | Loss: 1.0676 [2026-04-22 02:08:34] Validation | Batch 670/1567 | Loss: 1.0658 [2026-04-22 02:08:35] Validation | Batch 680/1567 | Loss: 1.0646 [2026-04-22 02:08:36] Validation | Batch 690/1567 | Loss: 1.0631 [2026-04-22 02:08:38] Validation | Batch 700/1567 | Loss: 1.0632 [2026-04-22 02:08:39] Validation | Batch 710/1567 | Loss: 1.0625 [2026-04-22 02:08:40] Validation | Batch 720/1567 | Loss: 1.0593 [2026-04-22 02:08:41] Validation | Batch 730/1567 | Loss: 1.0598 [2026-04-22 02:08:42] Validation | Batch 740/1567 | Loss: 1.0605 [2026-04-22 02:08:43] Validation | Batch 750/1567 | Loss: 1.0601 [2026-04-22 02:08:44] Validation | Batch 760/1567 | Loss: 1.0614 [2026-04-22 02:08:46] Validation | Batch 770/1567 | Loss: 1.0609 [2026-04-22 02:08:47] Validation | Batch 780/1567 | Loss: 1.0619 [2026-04-22 02:08:48] Validation | Batch 790/1567 | Loss: 1.0604 [2026-04-22 02:08:49] Validation | Batch 800/1567 | Loss: 1.0586 [2026-04-22 02:08:50] Validation | Batch 810/1567 | Loss: 1.0593 [2026-04-22 02:08:51] Validation | Batch 820/1567 | Loss: 1.0585 [2026-04-22 02:08:53] Validation | Batch 830/1567 | Loss: 1.0577 [2026-04-22 02:08:53] Validation | Batch 840/1567 | Loss: 1.0585 [2026-04-22 02:08:55] Validation | Batch 850/1567 | Loss: 1.0596 [2026-04-22 02:08:55] Validation | Batch 860/1567 | Loss: 1.0603 [2026-04-22 02:08:56] Validation | Batch 870/1567 | Loss: 1.0611 [2026-04-22 02:08:58] Validation | Batch 880/1567 | Loss: 1.0610 
[2026-04-22 02:08:59] Validation | Batch 890/1567 | Loss: 1.0606 [2026-04-22 02:09:00] Validation | Batch 900/1567 | Loss: 1.0602 [2026-04-22 02:09:02] Validation | Batch 910/1567 | Loss: 1.0600 [2026-04-22 02:09:03] Validation | Batch 920/1567 | Loss: 1.0619 [2026-04-22 02:09:04] Validation | Batch 930/1567 | Loss: 1.0618 [2026-04-22 02:09:05] Validation | Batch 940/1567 | Loss: 1.0617 [2026-04-22 02:09:06] Validation | Batch 950/1567 | Loss: 1.0613 [2026-04-22 02:09:07] Validation | Batch 960/1567 | Loss: 1.0616 [2026-04-22 02:09:08] Validation | Batch 970/1567 | Loss: 1.0622 [2026-04-22 02:09:09] Validation | Batch 980/1567 | Loss: 1.0618 [2026-04-22 02:09:10] Validation | Batch 990/1567 | Loss: 1.0628 [2026-04-22 02:09:11] Validation | Batch 1000/1567 | Loss: 1.0632 [2026-04-22 02:09:12] Validation | Batch 1010/1567 | Loss: 1.0623 [2026-04-22 02:09:13] Validation | Batch 1020/1567 | Loss: 1.0635 [2026-04-22 02:09:15] Validation | Batch 1030/1567 | Loss: 1.0640 [2026-04-22 02:09:16] Validation | Batch 1040/1567 | Loss: 1.0631 [2026-04-22 02:09:17] Validation | Batch 1050/1567 | Loss: 1.0620 [2026-04-22 02:09:18] Validation | Batch 1060/1567 | Loss: 1.0632 [2026-04-22 02:09:20] Validation | Batch 1070/1567 | Loss: 1.0631 [2026-04-22 02:09:21] Validation | Batch 1080/1567 | Loss: 1.0644 [2026-04-22 02:09:22] Validation | Batch 1090/1567 | Loss: 1.0671 [2026-04-22 02:09:23] Validation | Batch 1100/1567 | Loss: 1.0686 [2026-04-22 02:09:24] Validation | Batch 1110/1567 | Loss: 1.0676 [2026-04-22 02:09:25] Validation | Batch 1120/1567 | Loss: 1.0678 [2026-04-22 02:09:27] Validation | Batch 1130/1567 | Loss: 1.0660 [2026-04-22 02:09:28] Validation | Batch 1140/1567 | Loss: 1.0664 [2026-04-22 02:09:29] Validation | Batch 1150/1567 | Loss: 1.0651 [2026-04-22 02:09:30] Validation | Batch 1160/1567 | Loss: 1.0645 [2026-04-22 02:09:31] Validation | Batch 1170/1567 | Loss: 1.0648 [2026-04-22 02:09:32] Validation | Batch 1180/1567 | Loss: 1.0651 [2026-04-22 02:09:34] 
Validation | Batch 1190/1567 | Loss: 1.0653 [2026-04-22 02:09:35] Validation | Batch 1200/1567 | Loss: 1.0640 [2026-04-22 02:09:36] Validation | Batch 1210/1567 | Loss: 1.0634 [2026-04-22 02:09:37] Validation | Batch 1220/1567 | Loss: 1.0643 [2026-04-22 02:09:38] Validation | Batch 1230/1567 | Loss: 1.0648 [2026-04-22 02:09:39] Validation | Batch 1240/1567 | Loss: 1.0646 [2026-04-22 02:09:40] Validation | Batch 1250/1567 | Loss: 1.0650 [2026-04-22 02:09:42] Validation | Batch 1260/1567 | Loss: 1.0648 [2026-04-22 02:09:43] Validation | Batch 1270/1567 | Loss: 1.0630 [2026-04-22 02:09:44] Validation | Batch 1280/1567 | Loss: 1.0631 [2026-04-22 02:09:46] Validation | Batch 1290/1567 | Loss: 1.0633 [2026-04-22 02:09:47] Validation | Batch 1300/1567 | Loss: 1.0635 [2026-04-22 02:09:48] Validation | Batch 1310/1567 | Loss: 1.0643 [2026-04-22 02:09:50] Validation | Batch 1320/1567 | Loss: 1.0649 [2026-04-22 02:09:51] Validation | Batch 1330/1567 | Loss: 1.0664 [2026-04-22 02:09:52] Validation | Batch 1340/1567 | Loss: 1.0660 [2026-04-22 02:09:53] Validation | Batch 1350/1567 | Loss: 1.0664 [2026-04-22 02:09:54] Validation | Batch 1360/1567 | Loss: 1.0654 [2026-04-22 02:09:55] Validation | Batch 1370/1567 | Loss: 1.0651 [2026-04-22 02:09:56] Validation | Batch 1380/1567 | Loss: 1.0651 [2026-04-22 02:09:57] Validation | Batch 1390/1567 | Loss: 1.0643 [2026-04-22 02:09:58] Validation | Batch 1400/1567 | Loss: 1.0640 [2026-04-22 02:09:59] Validation | Batch 1410/1567 | Loss: 1.0645 [2026-04-22 02:10:01] Validation | Batch 1420/1567 | Loss: 1.0645 [2026-04-22 02:10:02] Validation | Batch 1430/1567 | Loss: 1.0648 [2026-04-22 02:10:03] Validation | Batch 1440/1567 | Loss: 1.0656 [2026-04-22 02:10:04] Validation | Batch 1450/1567 | Loss: 1.0658 [2026-04-22 02:10:05] Validation | Batch 1460/1567 | Loss: 1.0651 [2026-04-22 02:10:06] Validation | Batch 1470/1567 | Loss: 1.0649 [2026-04-22 02:10:07] Validation | Batch 1480/1567 | Loss: 1.0646 [2026-04-22 02:10:08] Validation | Batch 
1490/1567 | Loss: 1.0641 [2026-04-22 02:10:09] Validation | Batch 1500/1567 | Loss: 1.0638 [2026-04-22 02:10:10] Validation | Batch 1510/1567 | Loss: 1.0629 [2026-04-22 02:10:11] Validation | Batch 1520/1567 | Loss: 1.0628 [2026-04-22 02:10:12] Validation | Batch 1530/1567 | Loss: 1.0628 [2026-04-22 02:10:13] Validation | Batch 1540/1567 | Loss: 1.0635 [2026-04-22 02:10:14] Validation | Batch 1550/1567 | Loss: 1.0648 [2026-04-22 02:10:15] Validation | Batch 1560/1567 | Loss: 1.0644 [2026-04-22 02:10:16] Validation | Batch 1567/1567 | Loss: 1.0644 [2026-04-22 02:10:16] Validation | Loss: 1.0644 | PPL: 2.95 | Time: 185.59s [2026-04-22 02:10:22] Epoch 3 | Step 27010 | Loss: 0.8254 | LR: 2.00e-06 [2026-04-22 02:10:27] Epoch 3 | Step 27020 | Loss: 0.8254 | LR: 2.00e-06 [2026-04-22 02:10:32] Epoch 3 | Step 27030 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:10:39] Epoch 3 | Step 27040 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:10:44] Epoch 3 | Step 27050 | Loss: 0.8253 | LR: 2.00e-06 [2026-04-22 02:10:49] Epoch 3 | Step 27060 | Loss: 0.8254 | LR: 2.00e-06 [2026-04-22 02:10:53] Epoch 3 | Step 27070 | Loss: 0.8254 | LR: 2.00e-06 [2026-04-22 02:10:59] Epoch 3 | Step 27080 | Loss: 0.8254 | LR: 2.00e-06 [2026-04-22 02:11:04] Epoch 3 | Step 27090 | Loss: 0.8255 | LR: 2.00e-06 [2026-04-22 02:11:10] Epoch 3 | Step 27100 | Loss: 0.8255 | LR: 2.00e-06 [2026-04-22 02:11:15] Epoch 3 | Step 27110 | Loss: 0.8254 | LR: 2.00e-06 [2026-04-22 02:11:20] Epoch 3 | Step 27120 | Loss: 0.8255 | LR: 2.00e-06 [2026-04-22 02:11:25] Epoch 3 | Step 27130 | Loss: 0.8255 | LR: 2.00e-06 [2026-04-22 02:11:30] Epoch 3 | Step 27140 | Loss: 0.8256 | LR: 2.00e-06 [2026-04-22 02:11:36] Epoch 3 | Step 27150 | Loss: 0.8254 | LR: 2.00e-06 [2026-04-22 02:11:41] Epoch 3 | Step 27160 | Loss: 0.8253 | LR: 2.00e-06 [2026-04-22 02:11:46] Epoch 3 | Step 27170 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:11:52] Epoch 3 | Step 27180 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:11:57] Epoch 3 | Step 27190 | Loss: 
0.8249 | LR: 2.00e-06 [2026-04-22 02:12:01] Epoch 3 | Step 27200 | Loss: 0.8249 | LR: 2.00e-06 [2026-04-22 02:12:07] Epoch 3 | Step 27210 | Loss: 0.8248 | LR: 2.00e-06 [2026-04-22 02:12:12] Epoch 3 | Step 27220 | Loss: 0.8247 | LR: 2.00e-06 [2026-04-22 02:12:17] Epoch 3 | Step 27230 | Loss: 0.8248 | LR: 2.00e-06 [2026-04-22 02:12:23] Epoch 3 | Step 27240 | Loss: 0.8248 | LR: 2.00e-06 [2026-04-22 02:12:28] Epoch 3 | Step 27250 | Loss: 0.8249 | LR: 2.00e-06 [2026-04-22 02:12:33] Epoch 3 | Step 27260 | Loss: 0.8247 | LR: 2.00e-06 [2026-04-22 02:12:38] Epoch 3 | Step 27270 | Loss: 0.8247 | LR: 2.00e-06 [2026-04-22 02:12:44] Epoch 3 | Step 27280 | Loss: 0.8247 | LR: 2.00e-06 [2026-04-22 02:12:49] Epoch 3 | Step 27290 | Loss: 0.8246 | LR: 2.00e-06 [2026-04-22 02:12:54] Epoch 3 | Step 27300 | Loss: 0.8245 | LR: 2.00e-06 [2026-04-22 02:12:59] Epoch 3 | Step 27310 | Loss: 0.8246 | LR: 2.00e-06 [2026-04-22 02:13:04] Epoch 3 | Step 27320 | Loss: 0.8246 | LR: 2.00e-06 [2026-04-22 02:13:10] Epoch 3 | Step 27330 | Loss: 0.8247 | LR: 2.00e-06 [2026-04-22 02:13:15] Epoch 3 | Step 27340 | Loss: 0.8248 | LR: 2.00e-06 [2026-04-22 02:13:21] Epoch 3 | Step 27350 | Loss: 0.8248 | LR: 2.00e-06 [2026-04-22 02:13:27] Epoch 3 | Step 27360 | Loss: 0.8248 | LR: 2.00e-06 [2026-04-22 02:13:33] Epoch 3 | Step 27370 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:13:39] Epoch 3 | Step 27380 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:13:44] Epoch 3 | Step 27390 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:13:49] Epoch 3 | Step 27400 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:13:54] Epoch 3 | Step 27410 | Loss: 0.8249 | LR: 2.00e-06 [2026-04-22 02:13:59] Epoch 3 | Step 27420 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:14:04] Epoch 3 | Step 27430 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:14:09] Epoch 3 | Step 27440 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:14:14] Epoch 3 | Step 27450 | Loss: 0.8249 | LR: 2.00e-06 [2026-04-22 02:14:20] Epoch 3 | Step 27460 | Loss: 0.8248 | LR: 2.00e-06 
[2026-04-22 02:14:25] Epoch 3 | Step 27470 | Loss: 0.8249 | LR: 2.00e-06 [2026-04-22 02:14:30] Epoch 3 | Step 27480 | Loss: 0.8249 | LR: 2.00e-06 [2026-04-22 02:14:35] Epoch 3 | Step 27490 | Loss: 0.8248 | LR: 2.00e-06 [2026-04-22 02:14:41] Epoch 3 | Step 27500 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:14:45] Epoch 3 | Step 27510 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:14:50] Epoch 3 | Step 27520 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:14:56] Epoch 3 | Step 27530 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:15:01] Epoch 3 | Step 27540 | Loss: 0.8253 | LR: 2.00e-06 [2026-04-22 02:15:05] Epoch 3 | Step 27550 | Loss: 0.8253 | LR: 2.00e-06 [2026-04-22 02:15:11] Epoch 3 | Step 27560 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:15:16] Epoch 3 | Step 27570 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:15:22] Epoch 3 | Step 27580 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:15:27] Epoch 3 | Step 27590 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:15:32] Epoch 3 | Step 27600 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:15:38] Epoch 3 | Step 27610 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:15:43] Epoch 3 | Step 27620 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:15:48] Epoch 3 | Step 27630 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:15:54] Epoch 3 | Step 27640 | Loss: 0.8252 | LR: 2.00e-06 [2026-04-22 02:15:59] Epoch 3 | Step 27650 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:16:04] Epoch 3 | Step 27660 | Loss: 0.8251 | LR: 2.00e-06 [2026-04-22 02:16:09] Epoch 3 | Step 27670 | Loss: 0.8250 | LR: 2.00e-06 [2026-04-22 02:16:15] Epoch 3 | Step 27680 | Loss: 0.8249 | LR: 2.00e-06 [2026-04-22 02:16:20] Epoch 3 | Step 27690 | Loss: 0.8249 | LR: 2.00e-06 [2026-04-22 02:16:25] Epoch 3 | Step 27700 | Loss: 0.8247 | LR: 2.00e-06 [2026-04-22 02:16:30] Epoch 3 | Step 27710 | Loss: 0.8245 | LR: 2.00e-06 [2026-04-22 02:16:36] Epoch 3 | Step 27720 | Loss: 0.8245 | LR: 2.00e-06 [2026-04-22 02:16:41] Epoch 3 | Step 27730 | Loss: 0.8244 | LR: 2.00e-06 [2026-04-22 02:16:46] Epoch 
3 | Step 27740 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 02:16:51] Epoch 3 | Step 27750 | Loss: 0.8242 | LR: 2.00e-06 [2026-04-22 02:16:56] Epoch 3 | Step 27760 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 02:17:02] Epoch 3 | Step 27770 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 02:17:08] Epoch 3 | Step 27780 | Loss: 0.8240 | LR: 2.00e-06 [2026-04-22 02:17:14] Epoch 3 | Step 27790 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:17:20] Epoch 3 | Step 27800 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:17:25] Epoch 3 | Step 27810 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 02:17:30] Epoch 3 | Step 27820 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:17:35] Epoch 3 | Step 27830 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:17:41] Epoch 3 | Step 27840 | Loss: 0.8238 | LR: 2.00e-06 [2026-04-22 02:17:45] Epoch 3 | Step 27850 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 02:17:51] Epoch 3 | Step 27860 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 02:17:56] Epoch 3 | Step 27870 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 02:18:01] Epoch 3 | Step 27880 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 02:18:06] Epoch 3 | Step 27890 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 02:18:11] Epoch 3 | Step 27900 | Loss: 0.8237 | LR: 2.00e-06 [2026-04-22 02:18:17] Epoch 3 | Step 27910 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 02:18:22] Epoch 3 | Step 27920 | Loss: 0.8235 | LR: 2.00e-06 [2026-04-22 02:18:27] Epoch 3 | Step 27930 | Loss: 0.8236 | LR: 2.00e-06 [2026-04-22 02:18:32] Epoch 3 | Step 27940 | Loss: 0.8234 | LR: 2.00e-06 [2026-04-22 02:18:37] Epoch 3 | Step 27950 | Loss: 0.8233 | LR: 2.00e-06 [2026-04-22 02:18:43] Epoch 3 | Step 27960 | Loss: 0.8233 | LR: 2.00e-06 [2026-04-22 02:18:48] Epoch 3 | Step 27970 | Loss: 0.8233 | LR: 2.00e-06 [2026-04-22 02:18:53] Epoch 3 | Step 27980 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 02:18:59] Epoch 3 | Step 27990 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 02:19:03] Epoch 3 | Step 28000 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 02:19:05] Validation | Batch 10/1567 | Loss: 
1.0500 [2026-04-22 02:19:06] Validation | Batch 20/1567 | Loss: 1.1372 [2026-04-22 02:19:07] Validation | Batch 30/1567 | Loss: 1.0929 [2026-04-22 02:19:09] Validation | Batch 40/1567 | Loss: 1.1139 [2026-04-22 02:19:10] Validation | Batch 50/1567 | Loss: 1.0889 [2026-04-22 02:19:11] Validation | Batch 60/1567 | Loss: 1.0771 [2026-04-22 02:19:12] Validation | Batch 70/1567 | Loss: 1.0687 [2026-04-22 02:19:14] Validation | Batch 80/1567 | Loss: 1.0775 [2026-04-22 02:19:15] Validation | Batch 90/1567 | Loss: 1.0720 [2026-04-22 02:19:16] Validation | Batch 100/1567 | Loss: 1.0508 [2026-04-22 02:19:17] Validation | Batch 110/1567 | Loss: 1.0405 [2026-04-22 02:19:19] Validation | Batch 120/1567 | Loss: 1.0350 [2026-04-22 02:19:20] Validation | Batch 130/1567 | Loss: 1.0296 [2026-04-22 02:19:21] Validation | Batch 140/1567 | Loss: 1.0400 [2026-04-22 02:19:22] Validation | Batch 150/1567 | Loss: 1.0510 [2026-04-22 02:19:23] Validation | Batch 160/1567 | Loss: 1.0499 [2026-04-22 02:19:24] Validation | Batch 170/1567 | Loss: 1.0420 [2026-04-22 02:19:25] Validation | Batch 180/1567 | Loss: 1.0451 [2026-04-22 02:19:26] Validation | Batch 190/1567 | Loss: 1.0496 [2026-04-22 02:19:28] Validation | Batch 200/1567 | Loss: 1.0529 [2026-04-22 02:19:29] Validation | Batch 210/1567 | Loss: 1.0508 [2026-04-22 02:19:30] Validation | Batch 220/1567 | Loss: 1.0562 [2026-04-22 02:19:32] Validation | Batch 230/1567 | Loss: 1.0602 [2026-04-22 02:19:33] Validation | Batch 240/1567 | Loss: 1.0624 [2026-04-22 02:19:34] Validation | Batch 250/1567 | Loss: 1.0666 [2026-04-22 02:19:35] Validation | Batch 260/1567 | Loss: 1.0697 [2026-04-22 02:19:36] Validation | Batch 270/1567 | Loss: 1.0740 [2026-04-22 02:19:38] Validation | Batch 280/1567 | Loss: 1.0776 [2026-04-22 02:19:40] Validation | Batch 290/1567 | Loss: 1.0728 [2026-04-22 02:19:41] Validation | Batch 300/1567 | Loss: 1.0720 [2026-04-22 02:19:42] Validation | Batch 310/1567 | Loss: 1.0689 [2026-04-22 02:19:43] Validation | Batch 320/1567 
| Loss: 1.0717 [2026-04-22 02:19:45] Validation | Batch 330/1567 | Loss: 1.0718 [2026-04-22 02:19:46] Validation | Batch 340/1567 | Loss: 1.0712 [2026-04-22 02:19:47] Validation | Batch 350/1567 | Loss: 1.0686 [2026-04-22 02:19:48] Validation | Batch 360/1567 | Loss: 1.0623 [2026-04-22 02:19:50] Validation | Batch 370/1567 | Loss: 1.0624 [2026-04-22 02:19:51] Validation | Batch 380/1567 | Loss: 1.0667 [2026-04-22 02:19:52] Validation | Batch 390/1567 | Loss: 1.0658 [2026-04-22 02:19:53] Validation | Batch 400/1567 | Loss: 1.0665 [2026-04-22 02:19:54] Validation | Batch 410/1567 | Loss: 1.0626 [2026-04-22 02:19:55] Validation | Batch 420/1567 | Loss: 1.0609 [2026-04-22 02:19:57] Validation | Batch 430/1567 | Loss: 1.0635 [2026-04-22 02:19:58] Validation | Batch 440/1567 | Loss: 1.0636 [2026-04-22 02:19:59] Validation | Batch 450/1567 | Loss: 1.0657 [2026-04-22 02:20:00] Validation | Batch 460/1567 | Loss: 1.0683 [2026-04-22 02:20:01] Validation | Batch 470/1567 | Loss: 1.0732 [2026-04-22 02:20:03] Validation | Batch 480/1567 | Loss: 1.0708 [2026-04-22 02:20:04] Validation | Batch 490/1567 | Loss: 1.0685 [2026-04-22 02:20:05] Validation | Batch 500/1567 | Loss: 1.0696 [2026-04-22 02:20:06] Validation | Batch 510/1567 | Loss: 1.0694 [2026-04-22 02:20:07] Validation | Batch 520/1567 | Loss: 1.0708 [2026-04-22 02:20:08] Validation | Batch 530/1567 | Loss: 1.0693 [2026-04-22 02:20:10] Validation | Batch 540/1567 | Loss: 1.0665 [2026-04-22 02:20:11] Validation | Batch 550/1567 | Loss: 1.0677 [2026-04-22 02:20:12] Validation | Batch 560/1567 | Loss: 1.0667 [2026-04-22 02:20:13] Validation | Batch 570/1567 | Loss: 1.0625 [2026-04-22 02:20:15] Validation | Batch 580/1567 | Loss: 1.0644 [2026-04-22 02:20:16] Validation | Batch 590/1567 | Loss: 1.0642 [2026-04-22 02:20:17] Validation | Batch 600/1567 | Loss: 1.0631 [2026-04-22 02:20:18] Validation | Batch 610/1567 | Loss: 1.0652 [2026-04-22 02:20:20] Validation | Batch 620/1567 | Loss: 1.0631 [2026-04-22 02:20:21] Validation | 
Batch 630/1567 | Loss: 1.0634 [2026-04-22 02:20:23] Validation | Batch 640/1567 | Loss: 1.0640 [2026-04-22 02:20:24] Validation | Batch 650/1567 | Loss: 1.0669 [2026-04-22 02:20:25] Validation | Batch 660/1567 | Loss: 1.0682 [2026-04-22 02:20:26] Validation | Batch 670/1567 | Loss: 1.0664 [2026-04-22 02:20:27] Validation | Batch 680/1567 | Loss: 1.0652 [2026-04-22 02:20:28] Validation | Batch 690/1567 | Loss: 1.0636 [2026-04-22 02:20:30] Validation | Batch 700/1567 | Loss: 1.0638 [2026-04-22 02:20:31] Validation | Batch 710/1567 | Loss: 1.0630 [2026-04-22 02:20:32] Validation | Batch 720/1567 | Loss: 1.0599 [2026-04-22 02:20:33] Validation | Batch 730/1567 | Loss: 1.0604 [2026-04-22 02:20:34] Validation | Batch 740/1567 | Loss: 1.0611 [2026-04-22 02:20:35] Validation | Batch 750/1567 | Loss: 1.0607 [2026-04-22 02:20:36] Validation | Batch 760/1567 | Loss: 1.0620 [2026-04-22 02:20:38] Validation | Batch 770/1567 | Loss: 1.0615 [2026-04-22 02:20:39] Validation | Batch 780/1567 | Loss: 1.0625 [2026-04-22 02:20:40] Validation | Batch 790/1567 | Loss: 1.0610 [2026-04-22 02:20:41] Validation | Batch 800/1567 | Loss: 1.0592 [2026-04-22 02:20:42] Validation | Batch 810/1567 | Loss: 1.0599 [2026-04-22 02:20:43] Validation | Batch 820/1567 | Loss: 1.0591 [2026-04-22 02:20:44] Validation | Batch 830/1567 | Loss: 1.0584 [2026-04-22 02:20:45] Validation | Batch 840/1567 | Loss: 1.0591 [2026-04-22 02:20:46] Validation | Batch 850/1567 | Loss: 1.0602 [2026-04-22 02:20:47] Validation | Batch 860/1567 | Loss: 1.0610 [2026-04-22 02:20:48] Validation | Batch 870/1567 | Loss: 1.0618 [2026-04-22 02:20:49] Validation | Batch 880/1567 | Loss: 1.0616 [2026-04-22 02:20:51] Validation | Batch 890/1567 | Loss: 1.0612 [2026-04-22 02:20:52] Validation | Batch 900/1567 | Loss: 1.0609 [2026-04-22 02:20:53] Validation | Batch 910/1567 | Loss: 1.0606 [2026-04-22 02:20:54] Validation | Batch 920/1567 | Loss: 1.0625 [2026-04-22 02:20:55] Validation | Batch 930/1567 | Loss: 1.0624 [2026-04-22 
02:20:56] Validation | Batch 940/1567 | Loss: 1.0623 [2026-04-22 02:20:58] Validation | Batch 950/1567 | Loss: 1.0619 [2026-04-22 02:20:59] Validation | Batch 960/1567 | Loss: 1.0623 [2026-04-22 02:21:00] Validation | Batch 970/1567 | Loss: 1.0628 [2026-04-22 02:21:01] Validation | Batch 980/1567 | Loss: 1.0624 [2026-04-22 02:21:01] Validation | Batch 990/1567 | Loss: 1.0634 [2026-04-22 02:21:03] Validation | Batch 1000/1567 | Loss: 1.0638 [2026-04-22 02:21:04] Validation | Batch 1010/1567 | Loss: 1.0629 [2026-04-22 02:21:05] Validation | Batch 1020/1567 | Loss: 1.0641 [2026-04-22 02:21:06] Validation | Batch 1030/1567 | Loss: 1.0646 [2026-04-22 02:21:08] Validation | Batch 1040/1567 | Loss: 1.0637 [2026-04-22 02:21:09] Validation | Batch 1050/1567 | Loss: 1.0626 [2026-04-22 02:21:10] Validation | Batch 1060/1567 | Loss: 1.0638 [2026-04-22 02:21:12] Validation | Batch 1070/1567 | Loss: 1.0637 [2026-04-22 02:21:13] Validation | Batch 1080/1567 | Loss: 1.0650 [2026-04-22 02:21:14] Validation | Batch 1090/1567 | Loss: 1.0677 [2026-04-22 02:21:15] Validation | Batch 1100/1567 | Loss: 1.0692 [2026-04-22 02:21:16] Validation | Batch 1110/1567 | Loss: 1.0682 [2026-04-22 02:21:17] Validation | Batch 1120/1567 | Loss: 1.0683 [2026-04-22 02:21:18] Validation | Batch 1130/1567 | Loss: 1.0666 [2026-04-22 02:21:20] Validation | Batch 1140/1567 | Loss: 1.0670 [2026-04-22 02:21:21] Validation | Batch 1150/1567 | Loss: 1.0657 [2026-04-22 02:21:22] Validation | Batch 1160/1567 | Loss: 1.0651 [2026-04-22 02:21:23] Validation | Batch 1170/1567 | Loss: 1.0654 [2026-04-22 02:21:24] Validation | Batch 1180/1567 | Loss: 1.0656 [2026-04-22 02:21:25] Validation | Batch 1190/1567 | Loss: 1.0658 [2026-04-22 02:21:27] Validation | Batch 1200/1567 | Loss: 1.0645 [2026-04-22 02:21:28] Validation | Batch 1210/1567 | Loss: 1.0639 [2026-04-22 02:21:29] Validation | Batch 1220/1567 | Loss: 1.0648 [2026-04-22 02:21:30] Validation | Batch 1230/1567 | Loss: 1.0653 [2026-04-22 02:21:31] Validation | 
Batch 1240/1567 | Loss: 1.0652 [2026-04-22 02:21:32] Validation | Batch 1250/1567 | Loss: 1.0655 [2026-04-22 02:21:34] Validation | Batch 1260/1567 | Loss: 1.0653 [2026-04-22 02:21:35] Validation | Batch 1270/1567 | Loss: 1.0635 [2026-04-22 02:21:36] Validation | Batch 1280/1567 | Loss: 1.0637 [2026-04-22 02:21:38] Validation | Batch 1290/1567 | Loss: 1.0638 [2026-04-22 02:21:39] Validation | Batch 1300/1567 | Loss: 1.0641 [2026-04-22 02:21:40] Validation | Batch 1310/1567 | Loss: 1.0648 [2026-04-22 02:21:41] Validation | Batch 1320/1567 | Loss: 1.0654 [2026-04-22 02:21:42] Validation | Batch 1330/1567 | Loss: 1.0669 [2026-04-22 02:21:44] Validation | Batch 1340/1567 | Loss: 1.0666 [2026-04-22 02:21:44] Validation | Batch 1350/1567 | Loss: 1.0669 [2026-04-22 02:21:46] Validation | Batch 1360/1567 | Loss: 1.0659 [2026-04-22 02:21:47] Validation | Batch 1370/1567 | Loss: 1.0655 [2026-04-22 02:21:48] Validation | Batch 1380/1567 | Loss: 1.0656 [2026-04-22 02:21:49] Validation | Batch 1390/1567 | Loss: 1.0648 [2026-04-22 02:21:50] Validation | Batch 1400/1567 | Loss: 1.0645 [2026-04-22 02:21:51] Validation | Batch 1410/1567 | Loss: 1.0650 [2026-04-22 02:21:52] Validation | Batch 1420/1567 | Loss: 1.0650 [2026-04-22 02:21:53] Validation | Batch 1430/1567 | Loss: 1.0654 [2026-04-22 02:21:55] Validation | Batch 1440/1567 | Loss: 1.0662 [2026-04-22 02:21:56] Validation | Batch 1450/1567 | Loss: 1.0663 [2026-04-22 02:21:57] Validation | Batch 1460/1567 | Loss: 1.0656 [2026-04-22 02:21:58] Validation | Batch 1470/1567 | Loss: 1.0655 [2026-04-22 02:21:59] Validation | Batch 1480/1567 | Loss: 1.0652 [2026-04-22 02:21:59] Validation | Batch 1490/1567 | Loss: 1.0647 [2026-04-22 02:22:01] Validation | Batch 1500/1567 | Loss: 1.0644 [2026-04-22 02:22:02] Validation | Batch 1510/1567 | Loss: 1.0635 [2026-04-22 02:22:03] Validation | Batch 1520/1567 | Loss: 1.0633 [2026-04-22 02:22:04] Validation | Batch 1530/1567 | Loss: 1.0634 [2026-04-22 02:22:05] Validation | Batch 1540/1567 | 
Loss: 1.0640 [2026-04-22 02:22:06] Validation | Batch 1550/1567 | Loss: 1.0653 [2026-04-22 02:22:07] Validation | Batch 1560/1567 | Loss: 1.0649 [2026-04-22 02:22:08] Validation | Batch 1567/1567 | Loss: 1.0650 [2026-04-22 02:22:08] Validation | Loss: 1.0650 | PPL: 2.95 | Time: 184.72s [2026-04-22 02:22:14] Epoch 3 | Step 28010 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 02:22:19] Epoch 3 | Step 28020 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 02:22:24] Epoch 3 | Step 28030 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 02:22:30] Epoch 3 | Step 28040 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 02:22:35] Epoch 3 | Step 28050 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 02:22:41] Epoch 3 | Step 28060 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:22:46] Epoch 3 | Step 28070 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:22:52] Epoch 3 | Step 28080 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 02:22:57] Epoch 3 | Step 28090 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 02:23:02] Epoch 3 | Step 28100 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 02:23:07] Epoch 3 | Step 28110 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 02:23:13] Epoch 3 | Step 28120 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 02:23:18] Epoch 3 | Step 28130 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 02:23:23] Epoch 3 | Step 28140 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 02:23:29] Epoch 3 | Step 28150 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 02:23:35] Epoch 3 | Step 28160 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 02:23:40] Epoch 3 | Step 28170 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:23:46] Epoch 3 | Step 28180 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 02:23:51] Epoch 3 | Step 28190 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 02:23:57] Epoch 3 | Step 28200 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 02:24:01] Epoch 3 | Step 28210 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 02:24:07] Epoch 3 | Step 28220 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 02:24:13] Epoch 3 | Step 28230 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 02:24:19] Epoch 3 | 
Step 28240 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 02:24:23] Epoch 3 | Step 28250 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 02:24:29] Epoch 3 | Step 28260 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 02:24:35] Epoch 3 | Step 28270 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 02:24:40] Epoch 3 | Step 28280 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 02:24:45] Epoch 3 | Step 28290 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 02:24:50] Epoch 3 | Step 28300 | Loss: 0.8232 | LR: 2.00e-06 [2026-04-22 02:24:55] Epoch 3 | Step 28310 | Loss: 0.8231 | LR: 2.00e-06 [2026-04-22 02:25:00] Epoch 3 | Step 28320 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 02:25:05] Epoch 3 | Step 28330 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 02:25:10] Epoch 3 | Step 28340 | Loss: 0.8230 | LR: 2.00e-06 [2026-04-22 02:25:16] Epoch 3 | Step 28350 | Loss: 0.8229 | LR: 2.00e-06 [2026-04-22 02:25:21] Epoch 3 | Step 28360 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:25:27] Epoch 3 | Step 28370 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:25:32] Epoch 3 | Step 28380 | Loss: 0.8223 | LR: 2.00e-06 [2026-04-22 02:25:38] Epoch 3 | Step 28390 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 02:25:43] Epoch 3 | Step 28400 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 02:25:48] Epoch 3 | Step 28410 | Loss: 0.8223 | LR: 2.00e-06 [2026-04-22 02:25:53] Epoch 3 | Step 28420 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 02:25:57] Epoch 3 | Step 28430 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 02:26:02] Epoch 3 | Step 28440 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:26:08] Epoch 3 | Step 28450 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 02:26:13] Epoch 3 | Step 28460 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:26:18] Epoch 3 | Step 28470 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 02:26:23] Epoch 3 | Step 28480 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:26:28] Epoch 3 | Step 28490 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:26:33] Epoch 3 | Step 28500 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:26:38] Epoch 3 | Step 28510 | Loss: 0.8226 | 
LR: 2.00e-06 [2026-04-22 02:26:43] Epoch 3 | Step 28520 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:26:48] Epoch 3 | Step 28530 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:26:54] Epoch 3 | Step 28540 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:26:59] Epoch 3 | Step 28550 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:27:04] Epoch 3 | Step 28560 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:27:09] Epoch 3 | Step 28570 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:27:14] Epoch 3 | Step 28580 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:27:20] Epoch 3 | Step 28590 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:27:25] Epoch 3 | Step 28600 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:27:30] Epoch 3 | Step 28610 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:27:35] Epoch 3 | Step 28620 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:27:41] Epoch 3 | Step 28630 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:27:46] Epoch 3 | Step 28640 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:27:51] Epoch 3 | Step 28650 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:27:56] Epoch 3 | Step 28660 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:28:02] Epoch 3 | Step 28670 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:28:07] Epoch 3 | Step 28680 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:28:12] Epoch 3 | Step 28690 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:28:18] Epoch 3 | Step 28700 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:28:23] Epoch 3 | Step 28710 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:28:28] Epoch 3 | Step 28720 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:28:33] Epoch 3 | Step 28730 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:28:39] Epoch 3 | Step 28740 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 02:28:44] Epoch 3 | Step 28750 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:28:49] Epoch 3 | Step 28760 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:28:56] Epoch 3 | Step 28770 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:29:01] Epoch 3 | Step 28780 | Loss: 0.8228 | LR: 2.00e-06 [2026-04-22 
02:29:06] Epoch 3 | Step 28790 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:29:12] Epoch 3 | Step 28800 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:29:17] Epoch 3 | Step 28810 | Loss: 0.8227 | LR: 2.00e-06 [2026-04-22 02:29:22] Epoch 3 | Step 28820 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:29:27] Epoch 3 | Step 28830 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:29:33] Epoch 3 | Step 28840 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:29:38] Epoch 3 | Step 28850 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:29:43] Epoch 3 | Step 28860 | Loss: 0.8226 | LR: 2.00e-06 [2026-04-22 02:29:48] Epoch 3 | Step 28870 | Loss: 0.8225 | LR: 2.00e-06 [2026-04-22 02:29:53] Epoch 3 | Step 28880 | Loss: 0.8224 | LR: 2.00e-06 [2026-04-22 02:29:58] Epoch 3 | Step 28890 | Loss: 0.8223 | LR: 2.00e-06 [2026-04-22 02:30:03] Epoch 3 | Step 28900 | Loss: 0.8222 | LR: 2.00e-06 [2026-04-22 02:30:09] Epoch 3 | Step 28910 | Loss: 0.8223 | LR: 2.00e-06 [2026-04-22 02:30:15] Epoch 3 | Step 28920 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 02:30:20] Epoch 3 | Step 28930 | Loss: 0.8222 | LR: 2.00e-06 [2026-04-22 02:30:25] Epoch 3 | Step 28940 | Loss: 0.8222 | LR: 2.00e-06 [2026-04-22 02:30:29] Epoch 3 | Step 28950 | Loss: 0.8221 | LR: 2.00e-06 [2026-04-22 02:30:34] Epoch 3 | Step 28960 | Loss: 0.8220 | LR: 2.00e-06 [2026-04-22 02:30:39] Epoch 3 | Step 28970 | Loss: 0.8219 | LR: 2.00e-06 [2026-04-22 02:30:46] Epoch 3 | Step 28980 | Loss: 0.8219 | LR: 2.00e-06 [2026-04-22 02:30:51] Epoch 3 | Step 28990 | Loss: 0.8217 | LR: 2.00e-06 [2026-04-22 02:30:56] Epoch 3 | Step 29000 | Loss: 0.8215 | LR: 2.00e-06 [2026-04-22 02:30:58] Validation | Batch 10/1567 | Loss: 1.0505 [2026-04-22 02:30:59] Validation | Batch 20/1567 | Loss: 1.1374 [2026-04-22 02:31:00] Validation | Batch 30/1567 | Loss: 1.0931 [2026-04-22 02:31:02] Validation | Batch 40/1567 | Loss: 1.1138 [2026-04-22 02:31:02] Validation | Batch 50/1567 | Loss: 1.0886 [2026-04-22 02:31:04] Validation | Batch 60/1567 | Loss: 1.0769 [2026-04-22 02:31:05] 
Validation | Batch 70/1567 | Loss: 1.0689 [2026-04-22 02:31:07] Validation | Batch 80/1567 | Loss: 1.0777 [2026-04-22 02:31:08] Validation | Batch 90/1567 | Loss: 1.0723 [2026-04-22 02:31:09] Validation | Batch 100/1567 | Loss: 1.0510 [2026-04-22 02:31:10] Validation | Batch 110/1567 | Loss: 1.0407 [2026-04-22 02:31:12] Validation | Batch 120/1567 | Loss: 1.0352 [2026-04-22 02:31:13] Validation | Batch 130/1567 | Loss: 1.0298 [2026-04-22 02:31:14] Validation | Batch 140/1567 | Loss: 1.0403 [2026-04-22 02:31:15] Validation | Batch 150/1567 | Loss: 1.0512 [2026-04-22 02:31:16] Validation | Batch 160/1567 | Loss: 1.0502 [2026-04-22 02:31:17] Validation | Batch 170/1567 | Loss: 1.0423 [2026-04-22 02:31:18] Validation | Batch 180/1567 | Loss: 1.0453 [2026-04-22 02:31:19] Validation | Batch 190/1567 | Loss: 1.0499 [2026-04-22 02:31:21] Validation | Batch 200/1567 | Loss: 1.0532 [2026-04-22 02:31:22] Validation | Batch 210/1567 | Loss: 1.0511 [2026-04-22 02:31:23] Validation | Batch 220/1567 | Loss: 1.0566 [2026-04-22 02:31:25] Validation | Batch 230/1567 | Loss: 1.0607 [2026-04-22 02:31:26] Validation | Batch 240/1567 | Loss: 1.0628 [2026-04-22 02:31:27] Validation | Batch 250/1567 | Loss: 1.0670 [2026-04-22 02:31:28] Validation | Batch 260/1567 | Loss: 1.0701 [2026-04-22 02:31:29] Validation | Batch 270/1567 | Loss: 1.0744 [2026-04-22 02:31:31] Validation | Batch 280/1567 | Loss: 1.0779 [2026-04-22 02:31:33] Validation | Batch 290/1567 | Loss: 1.0731 [2026-04-22 02:31:34] Validation | Batch 300/1567 | Loss: 1.0724 [2026-04-22 02:31:35] Validation | Batch 310/1567 | Loss: 1.0692 [2026-04-22 02:31:36] Validation | Batch 320/1567 | Loss: 1.0720 [2026-04-22 02:31:38] Validation | Batch 330/1567 | Loss: 1.0721 [2026-04-22 02:31:39] Validation | Batch 340/1567 | Loss: 1.0715 [2026-04-22 02:31:40] Validation | Batch 350/1567 | Loss: 1.0688 [2026-04-22 02:31:41] Validation | Batch 360/1567 | Loss: 1.0626 [2026-04-22 02:31:43] Validation | Batch 370/1567 | Loss: 1.0626 
[2026-04-22 02:31:44] Validation | Batch 380/1567 | Loss: 1.0670 [2026-04-22 02:31:45] Validation | Batch 390/1567 | Loss: 1.0661 [2026-04-22 02:31:46] Validation | Batch 400/1567 | Loss: 1.0668 [2026-04-22 02:31:47] Validation | Batch 410/1567 | Loss: 1.0629 [2026-04-22 02:31:48] Validation | Batch 420/1567 | Loss: 1.0612 [2026-04-22 02:31:50] Validation | Batch 430/1567 | Loss: 1.0638 [2026-04-22 02:31:51] Validation | Batch 440/1567 | Loss: 1.0640 [2026-04-22 02:31:52] Validation | Batch 450/1567 | Loss: 1.0660 [2026-04-22 02:31:53] Validation | Batch 460/1567 | Loss: 1.0686 [2026-04-22 02:31:54] Validation | Batch 470/1567 | Loss: 1.0736 [2026-04-22 02:31:56] Validation | Batch 480/1567 | Loss: 1.0711 [2026-04-22 02:31:57] Validation | Batch 490/1567 | Loss: 1.0688 [2026-04-22 02:31:58] Validation | Batch 500/1567 | Loss: 1.0699 [2026-04-22 02:31:59] Validation | Batch 510/1567 | Loss: 1.0698 [2026-04-22 02:32:00] Validation | Batch 520/1567 | Loss: 1.0711 [2026-04-22 02:32:01] Validation | Batch 530/1567 | Loss: 1.0696 [2026-04-22 02:32:02] Validation | Batch 540/1567 | Loss: 1.0668 [2026-04-22 02:32:04] Validation | Batch 550/1567 | Loss: 1.0679 [2026-04-22 02:32:05] Validation | Batch 560/1567 | Loss: 1.0670 [2026-04-22 02:32:06] Validation | Batch 570/1567 | Loss: 1.0628 [2026-04-22 02:32:08] Validation | Batch 580/1567 | Loss: 1.0647 [2026-04-22 02:32:09] Validation | Batch 590/1567 | Loss: 1.0645 [2026-04-22 02:32:10] Validation | Batch 600/1567 | Loss: 1.0634 [2026-04-22 02:32:11] Validation | Batch 610/1567 | Loss: 1.0655 [2026-04-22 02:32:13] Validation | Batch 620/1567 | Loss: 1.0635 [2026-04-22 02:32:14] Validation | Batch 630/1567 | Loss: 1.0638 [2026-04-22 02:32:16] Validation | Batch 640/1567 | Loss: 1.0644 [2026-04-22 02:32:17] Validation | Batch 650/1567 | Loss: 1.0673 [2026-04-22 02:32:18] Validation | Batch 660/1567 | Loss: 1.0686 [2026-04-22 02:32:19] Validation | Batch 670/1567 | Loss: 1.0667 [2026-04-22 02:32:20] Validation | Batch 680/1567 
| Loss: 1.0656 [2026-04-22 02:32:21] Validation | Batch 690/1567 | Loss: 1.0640 [2026-04-22 02:32:23] Validation | Batch 700/1567 | Loss: 1.0642 [2026-04-22 02:32:24] Validation | Batch 710/1567 | Loss: 1.0634 [2026-04-22 02:32:25] Validation | Batch 720/1567 | Loss: 1.0602 [2026-04-22 02:32:26] Validation | Batch 730/1567 | Loss: 1.0608 [2026-04-22 02:32:27] Validation | Batch 740/1567 | Loss: 1.0615 [2026-04-22 02:32:28] Validation | Batch 750/1567 | Loss: 1.0611 [2026-04-22 02:32:29] Validation | Batch 760/1567 | Loss: 1.0623 [2026-04-22 02:32:31] Validation | Batch 770/1567 | Loss: 1.0619 [2026-04-22 02:32:32] Validation | Batch 780/1567 | Loss: 1.0629 [2026-04-22 02:32:33] Validation | Batch 790/1567 | Loss: 1.0614 [2026-04-22 02:32:34] Validation | Batch 800/1567 | Loss: 1.0596 [2026-04-22 02:32:35] Validation | Batch 810/1567 | Loss: 1.0603 [2026-04-22 02:32:36] Validation | Batch 820/1567 | Loss: 1.0595 [2026-04-22 02:32:37] Validation | Batch 830/1567 | Loss: 1.0587 [2026-04-22 02:32:38] Validation | Batch 840/1567 | Loss: 1.0594 [2026-04-22 02:32:39] Validation | Batch 850/1567 | Loss: 1.0606 [2026-04-22 02:32:40] Validation | Batch 860/1567 | Loss: 1.0613 [2026-04-22 02:32:41] Validation | Batch 870/1567 | Loss: 1.0621 [2026-04-22 02:32:42] Validation | Batch 880/1567 | Loss: 1.0620 [2026-04-22 02:32:44] Validation | Batch 890/1567 | Loss: 1.0615 [2026-04-22 02:32:45] Validation | Batch 900/1567 | Loss: 1.0612 [2026-04-22 02:32:46] Validation | Batch 910/1567 | Loss: 1.0609 [2026-04-22 02:32:47] Validation | Batch 920/1567 | Loss: 1.0628 [2026-04-22 02:32:48] Validation | Batch 930/1567 | Loss: 1.0627 [2026-04-22 02:32:49] Validation | Batch 940/1567 | Loss: 1.0627 [2026-04-22 02:32:51] Validation | Batch 950/1567 | Loss: 1.0623 [2026-04-22 02:32:52] Validation | Batch 960/1567 | Loss: 1.0626 [2026-04-22 02:32:53] Validation | Batch 970/1567 | Loss: 1.0631 [2026-04-22 02:32:54] Validation | Batch 980/1567 | Loss: 1.0628 [2026-04-22 02:32:54] Validation | 
Batch 990/1567 | Loss: 1.0637 [2026-04-22 02:32:56] Validation | Batch 1000/1567 | Loss: 1.0642 [2026-04-22 02:32:57] Validation | Batch 1010/1567 | Loss: 1.0633 [2026-04-22 02:32:58] Validation | Batch 1020/1567 | Loss: 1.0645 [2026-04-22 02:32:59] Validation | Batch 1030/1567 | Loss: 1.0650 [2026-04-22 02:33:01] Validation | Batch 1040/1567 | Loss: 1.0641 [2026-04-22 02:33:02] Validation | Batch 1050/1567 | Loss: 1.0630 [2026-04-22 02:33:03] Validation | Batch 1060/1567 | Loss: 1.0642 [2026-04-22 02:33:04] Validation | Batch 1070/1567 | Loss: 1.0640 [2026-04-22 02:33:06] Validation | Batch 1080/1567 | Loss: 1.0654 [2026-04-22 02:33:07] Validation | Batch 1090/1567 | Loss: 1.0681 [2026-04-22 02:33:08] Validation | Batch 1100/1567 | Loss: 1.0696 [2026-04-22 02:33:09] Validation | Batch 1110/1567 | Loss: 1.0685 [2026-04-22 02:33:10] Validation | Batch 1120/1567 | Loss: 1.0687 [2026-04-22 02:33:11] Validation | Batch 1130/1567 | Loss: 1.0669 [2026-04-22 02:33:13] Validation | Batch 1140/1567 | Loss: 1.0674 [2026-04-22 02:33:14] Validation | Batch 1150/1567 | Loss: 1.0661 [2026-04-22 02:33:15] Validation | Batch 1160/1567 | Loss: 1.0654 [2026-04-22 02:33:16] Validation | Batch 1170/1567 | Loss: 1.0657 [2026-04-22 02:33:17] Validation | Batch 1180/1567 | Loss: 1.0660 [2026-04-22 02:33:18] Validation | Batch 1190/1567 | Loss: 1.0662 [2026-04-22 02:33:20] Validation | Batch 1200/1567 | Loss: 1.0649 [2026-04-22 02:33:21] Validation | Batch 1210/1567 | Loss: 1.0643 [2026-04-22 02:33:22] Validation | Batch 1220/1567 | Loss: 1.0652 [2026-04-22 02:33:23] Validation | Batch 1230/1567 | Loss: 1.0658 [2026-04-22 02:33:24] Validation | Batch 1240/1567 | Loss: 1.0656 [2026-04-22 02:33:25] Validation | Batch 1250/1567 | Loss: 1.0660 [2026-04-22 02:33:27] Validation | Batch 1260/1567 | Loss: 1.0657 [2026-04-22 02:33:28] Validation | Batch 1270/1567 | Loss: 1.0639 [2026-04-22 02:33:29] Validation | Batch 1280/1567 | Loss: 1.0641 [2026-04-22 02:33:31] Validation | Batch 1290/1567 | 
Loss: 1.0642 [2026-04-22 02:33:32] Validation | Batch 1300/1567 | Loss: 1.0645 [2026-04-22 02:33:33] Validation | Batch 1310/1567 | Loss: 1.0653 [2026-04-22 02:33:34] Validation | Batch 1320/1567 | Loss: 1.0658 [2026-04-22 02:33:35] Validation | Batch 1330/1567 | Loss: 1.0673 [2026-04-22 02:33:37] Validation | Batch 1340/1567 | Loss: 1.0670 [2026-04-22 02:33:37] Validation | Batch 1350/1567 | Loss: 1.0673 [2026-04-22 02:33:38] Validation | Batch 1360/1567 | Loss: 1.0663 [2026-04-22 02:33:40] Validation | Batch 1370/1567 | Loss: 1.0660 [2026-04-22 02:33:41] Validation | Batch 1380/1567 | Loss: 1.0660 [2026-04-22 02:33:42] Validation | Batch 1390/1567 | Loss: 1.0653 [2026-04-22 02:33:43] Validation | Batch 1400/1567 | Loss: 1.0649 [2026-04-22 02:33:44] Validation | Batch 1410/1567 | Loss: 1.0654 [2026-04-22 02:33:45] Validation | Batch 1420/1567 | Loss: 1.0655 [2026-04-22 02:33:46] Validation | Batch 1430/1567 | Loss: 1.0658 [2026-04-22 02:33:48] Validation | Batch 1440/1567 | Loss: 1.0666 [2026-04-22 02:33:49] Validation | Batch 1450/1567 | Loss: 1.0668 [2026-04-22 02:33:49] Validation | Batch 1460/1567 | Loss: 1.0661 [2026-04-22 02:33:51] Validation | Batch 1470/1567 | Loss: 1.0659 [2026-04-22 02:33:52] Validation | Batch 1480/1567 | Loss: 1.0656 [2026-04-22 02:33:52] Validation | Batch 1490/1567 | Loss: 1.0651 [2026-04-22 02:33:54] Validation | Batch 1500/1567 | Loss: 1.0648 [2026-04-22 02:33:55] Validation | Batch 1510/1567 | Loss: 1.0639 [2026-04-22 02:33:56] Validation | Batch 1520/1567 | Loss: 1.0638 [2026-04-22 02:33:56] Validation | Batch 1530/1567 | Loss: 1.0638 [2026-04-22 02:33:58] Validation | Batch 1540/1567 | Loss: 1.0644 [2026-04-22 02:33:59] Validation | Batch 1550/1567 | Loss: 1.0658 [2026-04-22 02:34:00] Validation | Batch 1560/1567 | Loss: 1.0653 [2026-04-22 02:34:01] Validation | Batch 1567/1567 | Loss: 1.0654 [2026-04-22 02:34:01] Validation | Loss: 1.0654 | PPL: 2.95 | Time: 184.66s [2026-04-22 02:34:07] Epoch 3 | Step 29010 | Loss: 0.8213 | 
LR: 2.00e-06 [2026-04-22 02:34:12] Epoch 3 | Step 29020 | Loss: 0.8213 | LR: 2.00e-06 [2026-04-22 02:34:16] Epoch 3 | Step 29030 | Loss: 0.8213 | LR: 2.00e-06 [2026-04-22 02:34:22] Epoch 3 | Step 29040 | Loss: 0.8213 | LR: 2.00e-06 [2026-04-22 02:34:27] Epoch 3 | Step 29050 | Loss: 0.8212 | LR: 2.00e-06 [2026-04-22 02:34:32] Epoch 3 | Step 29060 | Loss: 0.8212 | LR: 2.00e-06 [2026-04-22 02:34:37] Epoch 3 | Step 29070 | Loss: 0.8212 | LR: 2.00e-06 [2026-04-22 02:34:44] Epoch 3 | Step 29080 | Loss: 0.8211 | LR: 2.00e-06 [2026-04-22 02:34:49] Epoch 3 | Step 29090 | Loss: 0.8211 | LR: 2.00e-06 [2026-04-22 02:34:56] Epoch 3 | Step 29100 | Loss: 0.8212 | LR: 2.00e-06 [2026-04-22 02:35:01] Epoch 3 | Step 29110 | Loss: 0.8211 | LR: 2.00e-06 [2026-04-22 02:35:07] Epoch 3 | Step 29120 | Loss: 0.8210 | LR: 2.00e-06 [2026-04-22 02:35:12] Epoch 3 | Step 29130 | Loss: 0.8209 | LR: 2.00e-06 [2026-04-22 02:35:17] Epoch 3 | Step 29140 | Loss: 0.8209 | LR: 2.00e-06 [2026-04-22 02:35:22] Epoch 3 | Step 29150 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 02:35:28] Epoch 3 | Step 29160 | Loss: 0.8209 | LR: 2.00e-06 [2026-04-22 02:35:32] Epoch 3 | Step 29170 | Loss: 0.8209 | LR: 2.00e-06 [2026-04-22 02:35:38] Epoch 3 | Step 29180 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 02:35:42] Epoch 3 | Step 29190 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 02:35:48] Epoch 3 | Step 29200 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 02:35:54] Epoch 3 | Step 29210 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 02:35:59] Epoch 3 | Step 29220 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 02:36:04] Epoch 3 | Step 29230 | Loss: 0.8209 | LR: 2.00e-06 [2026-04-22 02:36:10] Epoch 3 | Step 29240 | Loss: 0.8209 | LR: 2.00e-06 [2026-04-22 02:36:16] Epoch 3 | Step 29250 | Loss: 0.8209 | LR: 2.00e-06 [2026-04-22 02:36:21] Epoch 3 | Step 29260 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 02:36:27] Epoch 3 | Step 29270 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 02:36:32] Epoch 3 | Step 29280 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 
02:36:37] Epoch 3 | Step 29290 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 02:36:42] Epoch 3 | Step 29300 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 02:36:47] Epoch 3 | Step 29310 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 02:36:52] Epoch 3 | Step 29320 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 02:36:57] Epoch 3 | Step 29330 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 02:37:01] Epoch 3 | Step 29340 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 02:37:07] Epoch 3 | Step 29350 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:37:12] Epoch 3 | Step 29360 | Loss: 0.8204 | LR: 2.00e-06 [2026-04-22 02:37:17] Epoch 3 | Step 29370 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:37:23] Epoch 3 | Step 29380 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:37:28] Epoch 3 | Step 29390 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 02:37:34] Epoch 3 | Step 29400 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 02:37:40] Epoch 3 | Step 29410 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:37:44] Epoch 3 | Step 29420 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 02:37:49] Epoch 3 | Step 29430 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 02:37:55] Epoch 3 | Step 29440 | Loss: 0.8209 | LR: 2.00e-06 [2026-04-22 02:38:00] Epoch 3 | Step 29450 | Loss: 0.8208 | LR: 2.00e-06 [2026-04-22 02:38:06] Epoch 3 | Step 29460 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 02:38:11] Epoch 3 | Step 29470 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:38:17] Epoch 3 | Step 29480 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:38:22] Epoch 3 | Step 29490 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:38:27] Epoch 3 | Step 29500 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:38:32] Epoch 3 | Step 29510 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:38:37] Epoch 3 | Step 29520 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:38:42] Epoch 3 | Step 29530 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:38:48] Epoch 3 | Step 29540 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:38:53] Epoch 3 | Step 29550 | Loss: 0.8207 | LR: 2.00e-06 [2026-04-22 02:38:59] Epoch 3 | Step 
29560 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:39:04] Epoch 3 | Step 29570 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:39:09] Epoch 3 | Step 29580 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:39:14] Epoch 3 | Step 29590 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:39:20] Epoch 3 | Step 29600 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:39:25] Epoch 3 | Step 29610 | Loss: 0.8204 | LR: 2.00e-06 [2026-04-22 02:39:30] Epoch 3 | Step 29620 | Loss: 0.8204 | LR: 2.00e-06 [2026-04-22 02:39:35] Epoch 3 | Step 29630 | Loss: 0.8204 | LR: 2.00e-06 [2026-04-22 02:39:40] Epoch 3 | Step 29640 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:39:45] Epoch 3 | Step 29650 | Loss: 0.8204 | LR: 2.00e-06 [2026-04-22 02:39:51] Epoch 3 | Step 29660 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:39:56] Epoch 3 | Step 29670 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:40:02] Epoch 3 | Step 29680 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:40:07] Epoch 3 | Step 29690 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:40:12] Epoch 3 | Step 29700 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:40:19] Epoch 3 | Step 29710 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:40:24] Epoch 3 | Step 29720 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:40:29] Epoch 3 | Step 29730 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:40:35] Epoch 3 | Step 29740 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:40:40] Epoch 3 | Step 29750 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:40:46] Epoch 3 | Step 29760 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:40:51] Epoch 3 | Step 29770 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:40:55] Epoch 3 | Step 29780 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:41:00] Epoch 3 | Step 29790 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:41:05] Epoch 3 | Step 29800 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:41:10] Epoch 3 | Step 29810 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:41:16] Epoch 3 | Step 29820 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:41:21] Epoch 3 | Step 29830 | Loss: 0.8202 | LR: 
2.00e-06 [2026-04-22 02:41:27] Epoch 3 | Step 29840 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:41:31] Epoch 3 | Step 29850 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:41:38] Epoch 3 | Step 29860 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:41:43] Epoch 3 | Step 29870 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:41:49] Epoch 3 | Step 29880 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:41:54] Epoch 3 | Step 29890 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:42:00] Epoch 3 | Step 29900 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:42:05] Epoch 3 | Step 29910 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:42:11] Epoch 3 | Step 29920 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:42:16] Epoch 3 | Step 29930 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:42:21] Epoch 3 | Step 29940 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:42:27] Epoch 3 | Step 29950 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:42:33] Epoch 3 | Step 29960 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:42:38] Epoch 3 | Step 29970 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:42:43] Epoch 3 | Step 29980 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:42:48] Epoch 3 | Step 29990 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:42:54] Epoch 3 | Step 30000 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:43:04] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_30000.pt [2026-04-22 02:44:21] Validation | Batch 10/1567 | Loss: 1.0505 [2026-04-22 02:44:22] Validation | Batch 20/1567 | Loss: 1.1377 [2026-04-22 02:44:23] Validation | Batch 30/1567 | Loss: 1.0933 [2026-04-22 02:44:25] Validation | Batch 40/1567 | Loss: 1.1140 [2026-04-22 02:44:25] Validation | Batch 50/1567 | Loss: 1.0891 [2026-04-22 02:44:27] Validation | Batch 60/1567 | Loss: 1.0773 [2026-04-22 02:44:28] Validation | Batch 70/1567 | Loss: 1.0693 [2026-04-22 02:44:30] Validation | Batch 80/1567 | Loss: 1.0782 [2026-04-22 02:44:31] Validation | Batch 90/1567 | Loss: 1.0727 [2026-04-22 02:44:32] Validation | Batch 100/1567 | Loss: 1.0514 
[2026-04-22 02:44:33] Validation | Batch 110/1567 | Loss: 1.0412 [2026-04-22 02:44:35] Validation | Batch 120/1567 | Loss: 1.0356 [2026-04-22 02:44:36] Validation | Batch 130/1567 | Loss: 1.0302 [2026-04-22 02:44:37] Validation | Batch 140/1567 | Loss: 1.0407 [2026-04-22 02:44:38] Validation | Batch 150/1567 | Loss: 1.0517 [2026-04-22 02:44:40] Validation | Batch 160/1567 | Loss: 1.0506 [2026-04-22 02:44:41] Validation | Batch 170/1567 | Loss: 1.0427 [2026-04-22 02:44:42] Validation | Batch 180/1567 | Loss: 1.0458 [2026-04-22 02:44:43] Validation | Batch 190/1567 | Loss: 1.0503 [2026-04-22 02:44:45] Validation | Batch 200/1567 | Loss: 1.0536 [2026-04-22 02:44:46] Validation | Batch 210/1567 | Loss: 1.0515 [2026-04-22 02:44:47] Validation | Batch 220/1567 | Loss: 1.0569 [2026-04-22 02:44:49] Validation | Batch 230/1567 | Loss: 1.0610 [2026-04-22 02:44:50] Validation | Batch 240/1567 | Loss: 1.0631 [2026-04-22 02:44:51] Validation | Batch 250/1567 | Loss: 1.0674 [2026-04-22 02:44:52] Validation | Batch 260/1567 | Loss: 1.0705 [2026-04-22 02:44:53] Validation | Batch 270/1567 | Loss: 1.0748 [2026-04-22 02:44:55] Validation | Batch 280/1567 | Loss: 1.0784 [2026-04-22 02:44:57] Validation | Batch 290/1567 | Loss: 1.0736 [2026-04-22 02:44:58] Validation | Batch 300/1567 | Loss: 1.0729 [2026-04-22 02:44:59] Validation | Batch 310/1567 | Loss: 1.0696 [2026-04-22 02:45:00] Validation | Batch 320/1567 | Loss: 1.0724 [2026-04-22 02:45:01] Validation | Batch 330/1567 | Loss: 1.0725 [2026-04-22 02:45:03] Validation | Batch 340/1567 | Loss: 1.0719 [2026-04-22 02:45:04] Validation | Batch 350/1567 | Loss: 1.0692 [2026-04-22 02:45:05] Validation | Batch 360/1567 | Loss: 1.0629 [2026-04-22 02:45:06] Validation | Batch 370/1567 | Loss: 1.0629 [2026-04-22 02:45:07] Validation | Batch 380/1567 | Loss: 1.0673 [2026-04-22 02:45:09] Validation | Batch 390/1567 | Loss: 1.0664 [2026-04-22 02:45:10] Validation | Batch 400/1567 | Loss: 1.0671 [2026-04-22 02:45:11] Validation | Batch 410/1567 
| Loss: 1.0632 [2026-04-22 02:45:12] Validation | Batch 420/1567 | Loss: 1.0614 [2026-04-22 02:45:14] Validation | Batch 430/1567 | Loss: 1.0640 [2026-04-22 02:45:15] Validation | Batch 440/1567 | Loss: 1.0641 [2026-04-22 02:45:16] Validation | Batch 450/1567 | Loss: 1.0662 [2026-04-22 02:45:17] Validation | Batch 460/1567 | Loss: 1.0688 [2026-04-22 02:45:18] Validation | Batch 470/1567 | Loss: 1.0737 [2026-04-22 02:45:19] Validation | Batch 480/1567 | Loss: 1.0713 [2026-04-22 02:45:20] Validation | Batch 490/1567 | Loss: 1.0690 [2026-04-22 02:45:21] Validation | Batch 500/1567 | Loss: 1.0700 [2026-04-22 02:45:23] Validation | Batch 510/1567 | Loss: 1.0699 [2026-04-22 02:45:24] Validation | Batch 520/1567 | Loss: 1.0712 [2026-04-22 02:45:25] Validation | Batch 530/1567 | Loss: 1.0696 [2026-04-22 02:45:26] Validation | Batch 540/1567 | Loss: 1.0669 [2026-04-22 02:45:28] Validation | Batch 550/1567 | Loss: 1.0680 [2026-04-22 02:45:29] Validation | Batch 560/1567 | Loss: 1.0671 [2026-04-22 02:45:30] Validation | Batch 570/1567 | Loss: 1.0629 [2026-04-22 02:45:32] Validation | Batch 580/1567 | Loss: 1.0648 [2026-04-22 02:45:33] Validation | Batch 590/1567 | Loss: 1.0646 [2026-04-22 02:45:34] Validation | Batch 600/1567 | Loss: 1.0635 [2026-04-22 02:45:35] Validation | Batch 610/1567 | Loss: 1.0655 [2026-04-22 02:45:36] Validation | Batch 620/1567 | Loss: 1.0635 [2026-04-22 02:45:38] Validation | Batch 630/1567 | Loss: 1.0638 [2026-04-22 02:45:39] Validation | Batch 640/1567 | Loss: 1.0644 [2026-04-22 02:45:41] Validation | Batch 650/1567 | Loss: 1.0673 [2026-04-22 02:45:42] Validation | Batch 660/1567 | Loss: 1.0686 [2026-04-22 02:45:43] Validation | Batch 670/1567 | Loss: 1.0668 [2026-04-22 02:45:44] Validation | Batch 680/1567 | Loss: 1.0656 [2026-04-22 02:45:45] Validation | Batch 690/1567 | Loss: 1.0640 [2026-04-22 02:45:46] Validation | Batch 700/1567 | Loss: 1.0642 [2026-04-22 02:45:48] Validation | Batch 710/1567 | Loss: 1.0634 [2026-04-22 02:45:49] Validation | 
Batch 720/1567 | Loss: 1.0602 [2026-04-22 02:45:50] Validation | Batch 730/1567 | Loss: 1.0608 [2026-04-22 02:45:51] Validation | Batch 740/1567 | Loss: 1.0615 [2026-04-22 02:45:52] Validation | Batch 750/1567 | Loss: 1.0611 [2026-04-22 02:45:53] Validation | Batch 760/1567 | Loss: 1.0624 [2026-04-22 02:45:55] Validation | Batch 770/1567 | Loss: 1.0619 [2026-04-22 02:45:56] Validation | Batch 780/1567 | Loss: 1.0629 [2026-04-22 02:45:57] Validation | Batch 790/1567 | Loss: 1.0614 [2026-04-22 02:45:58] Validation | Batch 800/1567 | Loss: 1.0596 [2026-04-22 02:45:59] Validation | Batch 810/1567 | Loss: 1.0603 [2026-04-22 02:46:00] Validation | Batch 820/1567 | Loss: 1.0595 [2026-04-22 02:46:01] Validation | Batch 830/1567 | Loss: 1.0588 [2026-04-22 02:46:02] Validation | Batch 840/1567 | Loss: 1.0595 [2026-04-22 02:46:03] Validation | Batch 850/1567 | Loss: 1.0606 [2026-04-22 02:46:04] Validation | Batch 860/1567 | Loss: 1.0614 [2026-04-22 02:46:05] Validation | Batch 870/1567 | Loss: 1.0622 [2026-04-22 02:46:06] Validation | Batch 880/1567 | Loss: 1.0620 [2026-04-22 02:46:08] Validation | Batch 890/1567 | Loss: 1.0616 [2026-04-22 02:46:09] Validation | Batch 900/1567 | Loss: 1.0613 [2026-04-22 02:46:10] Validation | Batch 910/1567 | Loss: 1.0610 [2026-04-22 02:46:11] Validation | Batch 920/1567 | Loss: 1.0629 [2026-04-22 02:46:12] Validation | Batch 930/1567 | Loss: 1.0628 [2026-04-22 02:46:13] Validation | Batch 940/1567 | Loss: 1.0627 [2026-04-22 02:46:14] Validation | Batch 950/1567 | Loss: 1.0623 [2026-04-22 02:46:15] Validation | Batch 960/1567 | Loss: 1.0627 [2026-04-22 02:46:16] Validation | Batch 970/1567 | Loss: 1.0632 [2026-04-22 02:46:17] Validation | Batch 980/1567 | Loss: 1.0628 [2026-04-22 02:46:18] Validation | Batch 990/1567 | Loss: 1.0638 [2026-04-22 02:46:19] Validation | Batch 1000/1567 | Loss: 1.0642 [2026-04-22 02:46:21] Validation | Batch 1010/1567 | Loss: 1.0633 [2026-04-22 02:46:22] Validation | Batch 1020/1567 | Loss: 1.0645 [2026-04-22 
02:46:23] Validation | Batch 1030/1567 | Loss: 1.0650 [2026-04-22 02:46:25] Validation | Batch 1040/1567 | Loss: 1.0641 [2026-04-22 02:46:26] Validation | Batch 1050/1567 | Loss: 1.0630 [2026-04-22 02:46:27] Validation | Batch 1060/1567 | Loss: 1.0642 [2026-04-22 02:46:28] Validation | Batch 1070/1567 | Loss: 1.0640 [2026-04-22 02:46:29] Validation | Batch 1080/1567 | Loss: 1.0654 [2026-04-22 02:46:31] Validation | Batch 1090/1567 | Loss: 1.0681 [2026-04-22 02:46:32] Validation | Batch 1100/1567 | Loss: 1.0696 [2026-04-22 02:46:33] Validation | Batch 1110/1567 | Loss: 1.0686 [2026-04-22 02:46:34] Validation | Batch 1120/1567 | Loss: 1.0688 [2026-04-22 02:46:35] Validation | Batch 1130/1567 | Loss: 1.0669 [2026-04-22 02:46:36] Validation | Batch 1140/1567 | Loss: 1.0674 [2026-04-22 02:46:38] Validation | Batch 1150/1567 | Loss: 1.0661 [2026-04-22 02:46:38] Validation | Batch 1160/1567 | Loss: 1.0654 [2026-04-22 02:46:40] Validation | Batch 1170/1567 | Loss: 1.0658 [2026-04-22 02:46:41] Validation | Batch 1180/1567 | Loss: 1.0660 [2026-04-22 02:46:42] Validation | Batch 1190/1567 | Loss: 1.0662 [2026-04-22 02:46:43] Validation | Batch 1200/1567 | Loss: 1.0649 [2026-04-22 02:46:45] Validation | Batch 1210/1567 | Loss: 1.0643 [2026-04-22 02:46:45] Validation | Batch 1220/1567 | Loss: 1.0652 [2026-04-22 02:46:47] Validation | Batch 1230/1567 | Loss: 1.0658 [2026-04-22 02:46:48] Validation | Batch 1240/1567 | Loss: 1.0656 [2026-04-22 02:46:49] Validation | Batch 1250/1567 | Loss: 1.0660 [2026-04-22 02:46:50] Validation | Batch 1260/1567 | Loss: 1.0657 [2026-04-22 02:46:52] Validation | Batch 1270/1567 | Loss: 1.0640 [2026-04-22 02:46:53] Validation | Batch 1280/1567 | Loss: 1.0641 [2026-04-22 02:46:55] Validation | Batch 1290/1567 | Loss: 1.0642 [2026-04-22 02:46:56] Validation | Batch 1300/1567 | Loss: 1.0645 [2026-04-22 02:46:57] Validation | Batch 1310/1567 | Loss: 1.0653 [2026-04-22 02:46:58] Validation | Batch 1320/1567 | Loss: 1.0659 [2026-04-22 02:46:59] 
Validation | Batch 1330/1567 | Loss: 1.0673 [2026-04-22 02:47:00] Validation | Batch 1340/1567 | Loss: 1.0670 [2026-04-22 02:47:01] Validation | Batch 1350/1567 | Loss: 1.0673 [2026-04-22 02:47:02] Validation | Batch 1360/1567 | Loss: 1.0664 [2026-04-22 02:47:04] Validation | Batch 1370/1567 | Loss: 1.0660 [2026-04-22 02:47:05] Validation | Batch 1380/1567 | Loss: 1.0660 [2026-04-22 02:47:06] Validation | Batch 1390/1567 | Loss: 1.0653 [2026-04-22 02:47:07] Validation | Batch 1400/1567 | Loss: 1.0649 [2026-04-22 02:47:08] Validation | Batch 1410/1567 | Loss: 1.0655 [2026-04-22 02:47:09] Validation | Batch 1420/1567 | Loss: 1.0655 [2026-04-22 02:47:10] Validation | Batch 1430/1567 | Loss: 1.0658 [2026-04-22 02:47:11] Validation | Batch 1440/1567 | Loss: 1.0666 [2026-04-22 02:47:12] Validation | Batch 1450/1567 | Loss: 1.0668 [2026-04-22 02:47:13] Validation | Batch 1460/1567 | Loss: 1.0661 [2026-04-22 02:47:14] Validation | Batch 1470/1567 | Loss: 1.0659 [2026-04-22 02:47:15] Validation | Batch 1480/1567 | Loss: 1.0657 [2026-04-22 02:47:16] Validation | Batch 1490/1567 | Loss: 1.0652 [2026-04-22 02:47:18] Validation | Batch 1500/1567 | Loss: 1.0649 [2026-04-22 02:47:19] Validation | Batch 1510/1567 | Loss: 1.0639 [2026-04-22 02:47:19] Validation | Batch 1520/1567 | Loss: 1.0638 [2026-04-22 02:47:20] Validation | Batch 1530/1567 | Loss: 1.0638 [2026-04-22 02:47:22] Validation | Batch 1540/1567 | Loss: 1.0645 [2026-04-22 02:47:23] Validation | Batch 1550/1567 | Loss: 1.0658 [2026-04-22 02:47:24] Validation | Batch 1560/1567 | Loss: 1.0654 [2026-04-22 02:47:25] Validation | Batch 1567/1567 | Loss: 1.0654 [2026-04-22 02:47:25] Validation | Loss: 1.0654 | PPL: 2.95 | Time: 185.43s [2026-04-22 02:47:30] Epoch 3 | Step 30010 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:47:35] Epoch 3 | Step 30020 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:47:40] Epoch 3 | Step 30030 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:47:46] Epoch 3 | Step 30040 | Loss: 0.8201 | LR: 2.00e-06 
[2026-04-22 02:47:52] Epoch 3 | Step 30050 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:47:57] Epoch 3 | Step 30060 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:48:02] Epoch 3 | Step 30070 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:48:07] Epoch 3 | Step 30080 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:48:12] Epoch 3 | Step 30090 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:48:17] Epoch 3 | Step 30100 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:48:22] Epoch 3 | Step 30110 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:48:28] Epoch 3 | Step 30120 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:48:33] Epoch 3 | Step 30130 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:48:38] Epoch 3 | Step 30140 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:48:43] Epoch 3 | Step 30150 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 02:48:49] Epoch 3 | Step 30160 | Loss: 0.8197 | LR: 2.00e-06 [2026-04-22 02:48:54] Epoch 3 | Step 30170 | Loss: 0.8197 | LR: 2.00e-06 [2026-04-22 02:48:59] Epoch 3 | Step 30180 | Loss: 0.8196 | LR: 2.00e-06 [2026-04-22 02:49:05] Epoch 3 | Step 30190 | Loss: 0.8196 | LR: 2.00e-06 [2026-04-22 02:49:10] Epoch 3 | Step 30200 | Loss: 0.8197 | LR: 2.00e-06 [2026-04-22 02:49:16] Epoch 3 | Step 30210 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 02:49:21] Epoch 3 | Step 30220 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:49:27] Epoch 3 | Step 30230 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:49:32] Epoch 3 | Step 30240 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:49:38] Epoch 3 | Step 30250 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:49:43] Epoch 3 | Step 30260 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:49:48] Epoch 3 | Step 30270 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:49:53] Epoch 3 | Step 30280 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:49:58] Epoch 3 | Step 30290 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:50:04] Epoch 3 | Step 30300 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:50:09] Epoch 3 | Step 30310 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:50:15] Epoch 
3 | Step 30320 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:50:20] Epoch 3 | Step 30330 | Loss: 0.8197 | LR: 2.00e-06 [2026-04-22 02:50:25] Epoch 3 | Step 30340 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:50:31] Epoch 3 | Step 30350 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 02:50:37] Epoch 3 | Step 30360 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 02:50:42] Epoch 3 | Step 30370 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 02:50:48] Epoch 3 | Step 30380 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:50:53] Epoch 3 | Step 30390 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 02:50:58] Epoch 3 | Step 30400 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:51:03] Epoch 3 | Step 30410 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 02:51:08] Epoch 3 | Step 30420 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 02:51:13] Epoch 3 | Step 30430 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 02:51:18] Epoch 3 | Step 30440 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:51:23] Epoch 3 | Step 30450 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 02:51:28] Epoch 3 | Step 30460 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:51:33] Epoch 3 | Step 30470 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 02:51:39] Epoch 3 | Step 30480 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:51:44] Epoch 3 | Step 30490 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:51:49] Epoch 3 | Step 30500 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:51:54] Epoch 3 | Step 30510 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:52:00] Epoch 3 | Step 30520 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:52:06] Epoch 3 | Step 30530 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:52:12] Epoch 3 | Step 30540 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:52:18] Epoch 3 | Step 30550 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:52:23] Epoch 3 | Step 30560 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:52:28] Epoch 3 | Step 30570 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:52:33] Epoch 3 | Step 30580 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:52:38] Epoch 3 | Step 30590 | Loss: 
0.8202 | LR: 2.00e-06 [2026-04-22 02:52:44] Epoch 3 | Step 30600 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:52:48] Epoch 3 | Step 30610 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:52:54] Epoch 3 | Step 30620 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:52:59] Epoch 3 | Step 30630 | Loss: 0.8204 | LR: 2.00e-06 [2026-04-22 02:53:04] Epoch 3 | Step 30640 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:53:09] Epoch 3 | Step 30650 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:53:15] Epoch 3 | Step 30660 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:53:20] Epoch 3 | Step 30670 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:53:25] Epoch 3 | Step 30680 | Loss: 0.8206 | LR: 2.00e-06 [2026-04-22 02:53:31] Epoch 3 | Step 30690 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:53:36] Epoch 3 | Step 30700 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:53:42] Epoch 3 | Step 30710 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:53:47] Epoch 3 | Step 30720 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:53:52] Epoch 3 | Step 30730 | Loss: 0.8204 | LR: 2.00e-06 [2026-04-22 02:53:57] Epoch 3 | Step 30740 | Loss: 0.8204 | LR: 2.00e-06 [2026-04-22 02:54:02] Epoch 3 | Step 30750 | Loss: 0.8205 | LR: 2.00e-06 [2026-04-22 02:54:07] Epoch 3 | Step 30760 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:54:12] Epoch 3 | Step 30770 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:54:17] Epoch 3 | Step 30780 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:54:22] Epoch 3 | Step 30790 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:54:28] Epoch 3 | Step 30800 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:54:33] Epoch 3 | Step 30810 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:54:38] Epoch 3 | Step 30820 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:54:42] Epoch 3 | Step 30830 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:54:47] Epoch 3 | Step 30840 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:54:52] Epoch 3 | Step 30850 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:54:58] Epoch 3 | Step 30860 | Loss: 0.8203 | LR: 2.00e-06 
[2026-04-22 02:55:03] Epoch 3 | Step 30870 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:55:09] Epoch 3 | Step 30880 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:55:14] Epoch 3 | Step 30890 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:55:20] Epoch 3 | Step 30900 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:55:25] Epoch 3 | Step 30910 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:55:30] Epoch 3 | Step 30920 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:55:35] Epoch 3 | Step 30930 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:55:40] Epoch 3 | Step 30940 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:55:45] Epoch 3 | Step 30950 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:55:51] Epoch 3 | Step 30960 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:55:56] Epoch 3 | Step 30970 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:56:01] Epoch 3 | Step 30980 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:56:07] Epoch 3 | Step 30990 | Loss: 0.8204 | LR: 2.00e-06 [2026-04-22 02:56:12] Epoch 3 | Step 31000 | Loss: 0.8204 | LR: 2.00e-06 [2026-04-22 02:56:13] Validation | Batch 10/1567 | Loss: 1.0498 [2026-04-22 02:56:14] Validation | Batch 20/1567 | Loss: 1.1378 [2026-04-22 02:56:16] Validation | Batch 30/1567 | Loss: 1.0935 [2026-04-22 02:56:17] Validation | Batch 40/1567 | Loss: 1.1148 [2026-04-22 02:56:18] Validation | Batch 50/1567 | Loss: 1.0899 [2026-04-22 02:56:19] Validation | Batch 60/1567 | Loss: 1.0781 [2026-04-22 02:56:20] Validation | Batch 70/1567 | Loss: 1.0699 [2026-04-22 02:56:22] Validation | Batch 80/1567 | Loss: 1.0787 [2026-04-22 02:56:23] Validation | Batch 90/1567 | Loss: 1.0731 [2026-04-22 02:56:25] Validation | Batch 100/1567 | Loss: 1.0517 [2026-04-22 02:56:26] Validation | Batch 110/1567 | Loss: 1.0413 [2026-04-22 02:56:27] Validation | Batch 120/1567 | Loss: 1.0357 [2026-04-22 02:56:28] Validation | Batch 130/1567 | Loss: 1.0304 [2026-04-22 02:56:29] Validation | Batch 140/1567 | Loss: 1.0410 [2026-04-22 02:56:31] Validation | Batch 150/1567 | Loss: 1.0520 [2026-04-22 
02:56:31] Validation | Batch 160/1567 | Loss: 1.0509 [2026-04-22 02:56:33] Validation | Batch 170/1567 | Loss: 1.0429 [2026-04-22 02:56:34] Validation | Batch 180/1567 | Loss: 1.0459 [2026-04-22 02:56:35] Validation | Batch 190/1567 | Loss: 1.0505 [2026-04-22 02:56:36] Validation | Batch 200/1567 | Loss: 1.0538 [2026-04-22 02:56:38] Validation | Batch 210/1567 | Loss: 1.0517 [2026-04-22 02:56:39] Validation | Batch 220/1567 | Loss: 1.0571 [2026-04-22 02:56:40] Validation | Batch 230/1567 | Loss: 1.0612 [2026-04-22 02:56:41] Validation | Batch 240/1567 | Loss: 1.0634 [2026-04-22 02:56:42] Validation | Batch 250/1567 | Loss: 1.0676 [2026-04-22 02:56:43] Validation | Batch 260/1567 | Loss: 1.0707 [2026-04-22 02:56:45] Validation | Batch 270/1567 | Loss: 1.0750 [2026-04-22 02:56:46] Validation | Batch 280/1567 | Loss: 1.0786 [2026-04-22 02:56:48] Validation | Batch 290/1567 | Loss: 1.0739 [2026-04-22 02:56:50] Validation | Batch 300/1567 | Loss: 1.0731 [2026-04-22 02:56:51] Validation | Batch 310/1567 | Loss: 1.0699 [2026-04-22 02:56:52] Validation | Batch 320/1567 | Loss: 1.0727 [2026-04-22 02:56:53] Validation | Batch 330/1567 | Loss: 1.0728 [2026-04-22 02:56:54] Validation | Batch 340/1567 | Loss: 1.0722 [2026-04-22 02:56:56] Validation | Batch 350/1567 | Loss: 1.0695 [2026-04-22 02:56:57] Validation | Batch 360/1567 | Loss: 1.0632 [2026-04-22 02:56:58] Validation | Batch 370/1567 | Loss: 1.0633 [2026-04-22 02:56:59] Validation | Batch 380/1567 | Loss: 1.0676 [2026-04-22 02:57:00] Validation | Batch 390/1567 | Loss: 1.0667 [2026-04-22 02:57:01] Validation | Batch 400/1567 | Loss: 1.0674 [2026-04-22 02:57:03] Validation | Batch 410/1567 | Loss: 1.0635 [2026-04-22 02:57:04] Validation | Batch 420/1567 | Loss: 1.0618 [2026-04-22 02:57:05] Validation | Batch 430/1567 | Loss: 1.0643 [2026-04-22 02:57:07] Validation | Batch 440/1567 | Loss: 1.0645 [2026-04-22 02:57:08] Validation | Batch 450/1567 | Loss: 1.0665 [2026-04-22 02:57:09] Validation | Batch 460/1567 | Loss: 
1.0691 [2026-04-22 02:57:10] Validation | Batch 470/1567 | Loss: 1.0741 [2026-04-22 02:57:11] Validation | Batch 480/1567 | Loss: 1.0716 [2026-04-22 02:57:12] Validation | Batch 490/1567 | Loss: 1.0693 [2026-04-22 02:57:13] Validation | Batch 500/1567 | Loss: 1.0704 [2026-04-22 02:57:14] Validation | Batch 510/1567 | Loss: 1.0702 [2026-04-22 02:57:15] Validation | Batch 520/1567 | Loss: 1.0716 [2026-04-22 02:57:17] Validation | Batch 530/1567 | Loss: 1.0701 [2026-04-22 02:57:18] Validation | Batch 540/1567 | Loss: 1.0674 [2026-04-22 02:57:19] Validation | Batch 550/1567 | Loss: 1.0685 [2026-04-22 02:57:21] Validation | Batch 560/1567 | Loss: 1.0676 [2026-04-22 02:57:22] Validation | Batch 570/1567 | Loss: 1.0634 [2026-04-22 02:57:23] Validation | Batch 580/1567 | Loss: 1.0653 [2026-04-22 02:57:24] Validation | Batch 590/1567 | Loss: 1.0650 [2026-04-22 02:57:26] Validation | Batch 600/1567 | Loss: 1.0639 [2026-04-22 02:57:27] Validation | Batch 610/1567 | Loss: 1.0660 [2026-04-22 02:57:28] Validation | Batch 620/1567 | Loss: 1.0640 [2026-04-22 02:57:30] Validation | Batch 630/1567 | Loss: 1.0643 [2026-04-22 02:57:31] Validation | Batch 640/1567 | Loss: 1.0649 [2026-04-22 02:57:33] Validation | Batch 650/1567 | Loss: 1.0678 [2026-04-22 02:57:33] Validation | Batch 660/1567 | Loss: 1.0691 [2026-04-22 02:57:35] Validation | Batch 670/1567 | Loss: 1.0672 [2026-04-22 02:57:36] Validation | Batch 680/1567 | Loss: 1.0661 [2026-04-22 02:57:37] Validation | Batch 690/1567 | Loss: 1.0645 [2026-04-22 02:57:38] Validation | Batch 700/1567 | Loss: 1.0646 [2026-04-22 02:57:39] Validation | Batch 710/1567 | Loss: 1.0639 [2026-04-22 02:57:40] Validation | Batch 720/1567 | Loss: 1.0607 [2026-04-22 02:57:41] Validation | Batch 730/1567 | Loss: 1.0613 [2026-04-22 02:57:42] Validation | Batch 740/1567 | Loss: 1.0619 [2026-04-22 02:57:44] Validation | Batch 750/1567 | Loss: 1.0615 [2026-04-22 02:57:45] Validation | Batch 760/1567 | Loss: 1.0628 [2026-04-22 02:57:46] Validation | Batch 
770/1567 | Loss: 1.0623 [2026-04-22 02:57:48] Validation | Batch 780/1567 | Loss: 1.0633 [2026-04-22 02:57:48] Validation | Batch 790/1567 | Loss: 1.0619 [2026-04-22 02:57:49] Validation | Batch 800/1567 | Loss: 1.0600 [2026-04-22 02:57:51] Validation | Batch 810/1567 | Loss: 1.0607 [2026-04-22 02:57:52] Validation | Batch 820/1567 | Loss: 1.0600 [2026-04-22 02:57:53] Validation | Batch 830/1567 | Loss: 1.0592 [2026-04-22 02:57:54] Validation | Batch 840/1567 | Loss: 1.0599 [2026-04-22 02:57:55] Validation | Batch 850/1567 | Loss: 1.0611 [2026-04-22 02:57:56] Validation | Batch 860/1567 | Loss: 1.0618 [2026-04-22 02:57:57] Validation | Batch 870/1567 | Loss: 1.0626 [2026-04-22 02:57:58] Validation | Batch 880/1567 | Loss: 1.0625 [2026-04-22 02:57:59] Validation | Batch 890/1567 | Loss: 1.0620 [2026-04-22 02:58:01] Validation | Batch 900/1567 | Loss: 1.0617 [2026-04-22 02:58:02] Validation | Batch 910/1567 | Loss: 1.0615 [2026-04-22 02:58:03] Validation | Batch 920/1567 | Loss: 1.0634 [2026-04-22 02:58:04] Validation | Batch 930/1567 | Loss: 1.0633 [2026-04-22 02:58:05] Validation | Batch 940/1567 | Loss: 1.0632 [2026-04-22 02:58:06] Validation | Batch 950/1567 | Loss: 1.0628 [2026-04-22 02:58:07] Validation | Batch 960/1567 | Loss: 1.0631 [2026-04-22 02:58:08] Validation | Batch 970/1567 | Loss: 1.0637 [2026-04-22 02:58:09] Validation | Batch 980/1567 | Loss: 1.0633 [2026-04-22 02:58:10] Validation | Batch 990/1567 | Loss: 1.0642 [2026-04-22 02:58:11] Validation | Batch 1000/1567 | Loss: 1.0647 [2026-04-22 02:58:12] Validation | Batch 1010/1567 | Loss: 1.0638 [2026-04-22 02:58:14] Validation | Batch 1020/1567 | Loss: 1.0650 [2026-04-22 02:58:15] Validation | Batch 1030/1567 | Loss: 1.0655 [2026-04-22 02:58:16] Validation | Batch 1040/1567 | Loss: 1.0646 [2026-04-22 02:58:17] Validation | Batch 1050/1567 | Loss: 1.0635 [2026-04-22 02:58:18] Validation | Batch 1060/1567 | Loss: 1.0647 [2026-04-22 02:58:20] Validation | Batch 1070/1567 | Loss: 1.0645 [2026-04-22 
02:58:21] Validation | Batch 1080/1567 | Loss: 1.0659 [2026-04-22 02:58:22] Validation | Batch 1090/1567 | Loss: 1.0686 [2026-04-22 02:58:23] Validation | Batch 1100/1567 | Loss: 1.0701 [2026-04-22 02:58:25] Validation | Batch 1110/1567 | Loss: 1.0691 [2026-04-22 02:58:26] Validation | Batch 1120/1567 | Loss: 1.0692 [2026-04-22 02:58:27] Validation | Batch 1130/1567 | Loss: 1.0674 [2026-04-22 02:58:28] Validation | Batch 1140/1567 | Loss: 1.0679 [2026-04-22 02:58:29] Validation | Batch 1150/1567 | Loss: 1.0666 [2026-04-22 02:58:30] Validation | Batch 1160/1567 | Loss: 1.0660 [2026-04-22 02:58:31] Validation | Batch 1170/1567 | Loss: 1.0663 [2026-04-22 02:58:33] Validation | Batch 1180/1567 | Loss: 1.0665 [2026-04-22 02:58:34] Validation | Batch 1190/1567 | Loss: 1.0668 [2026-04-22 02:58:35] Validation | Batch 1200/1567 | Loss: 1.0655 [2026-04-22 02:58:36] Validation | Batch 1210/1567 | Loss: 1.0648 [2026-04-22 02:58:37] Validation | Batch 1220/1567 | Loss: 1.0658 [2026-04-22 02:58:38] Validation | Batch 1230/1567 | Loss: 1.0663 [2026-04-22 02:58:40] Validation | Batch 1240/1567 | Loss: 1.0661 [2026-04-22 02:58:41] Validation | Batch 1250/1567 | Loss: 1.0665 [2026-04-22 02:58:42] Validation | Batch 1260/1567 | Loss: 1.0662 [2026-04-22 02:58:43] Validation | Batch 1270/1567 | Loss: 1.0645 [2026-04-22 02:58:45] Validation | Batch 1280/1567 | Loss: 1.0646 [2026-04-22 02:58:46] Validation | Batch 1290/1567 | Loss: 1.0647 [2026-04-22 02:58:48] Validation | Batch 1300/1567 | Loss: 1.0650 [2026-04-22 02:58:48] Validation | Batch 1310/1567 | Loss: 1.0658 [2026-04-22 02:58:50] Validation | Batch 1320/1567 | Loss: 1.0664 [2026-04-22 02:58:51] Validation | Batch 1330/1567 | Loss: 1.0678 [2026-04-22 02:58:52] Validation | Batch 1340/1567 | Loss: 1.0675 [2026-04-22 02:58:53] Validation | Batch 1350/1567 | Loss: 1.0678 [2026-04-22 02:58:54] Validation | Batch 1360/1567 | Loss: 1.0669 [2026-04-22 02:58:55] Validation | Batch 1370/1567 | Loss: 1.0665 [2026-04-22 02:58:57] 
Validation | Batch 1380/1567 | Loss: 1.0666 [2026-04-22 02:58:58] Validation | Batch 1390/1567 | Loss: 1.0658 [2026-04-22 02:58:59] Validation | Batch 1400/1567 | Loss: 1.0655 [2026-04-22 02:59:00] Validation | Batch 1410/1567 | Loss: 1.0660 [2026-04-22 02:59:01] Validation | Batch 1420/1567 | Loss: 1.0660 [2026-04-22 02:59:02] Validation | Batch 1430/1567 | Loss: 1.0664 [2026-04-22 02:59:03] Validation | Batch 1440/1567 | Loss: 1.0672 [2026-04-22 02:59:04] Validation | Batch 1450/1567 | Loss: 1.0673 [2026-04-22 02:59:05] Validation | Batch 1460/1567 | Loss: 1.0666 [2026-04-22 02:59:06] Validation | Batch 1470/1567 | Loss: 1.0664 [2026-04-22 02:59:07] Validation | Batch 1480/1567 | Loss: 1.0662 [2026-04-22 02:59:08] Validation | Batch 1490/1567 | Loss: 1.0656 [2026-04-22 02:59:09] Validation | Batch 1500/1567 | Loss: 1.0654 [2026-04-22 02:59:10] Validation | Batch 1510/1567 | Loss: 1.0644 [2026-04-22 02:59:11] Validation | Batch 1520/1567 | Loss: 1.0643 [2026-04-22 02:59:12] Validation | Batch 1530/1567 | Loss: 1.0643 [2026-04-22 02:59:13] Validation | Batch 1540/1567 | Loss: 1.0650 [2026-04-22 02:59:14] Validation | Batch 1550/1567 | Loss: 1.0663 [2026-04-22 02:59:16] Validation | Batch 1560/1567 | Loss: 1.0659 [2026-04-22 02:59:17] Validation | Batch 1567/1567 | Loss: 1.0659 [2026-04-22 02:59:17] Validation | Loss: 1.0659 | PPL: 2.95 | Time: 184.68s [2026-04-22 02:59:21] Epoch 3 | Step 31010 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:59:26] Epoch 3 | Step 31020 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:59:32] Epoch 3 | Step 31030 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:59:38] Epoch 3 | Step 31040 | Loss: 0.8202 | LR: 2.00e-06 [2026-04-22 02:59:43] Epoch 3 | Step 31050 | Loss: 0.8203 | LR: 2.00e-06 [2026-04-22 02:59:49] Epoch 3 | Step 31060 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:59:53] Epoch 3 | Step 31070 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 02:59:59] Epoch 3 | Step 31080 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:00:04] Epoch 3 | Step 
31090 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:00:10] Epoch 3 | Step 31100 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:00:16] Epoch 3 | Step 31110 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:00:22] Epoch 3 | Step 31120 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:00:26] Epoch 3 | Step 31130 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:00:32] Epoch 3 | Step 31140 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:00:36] Epoch 3 | Step 31150 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:00:41] Epoch 3 | Step 31160 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:00:46] Epoch 3 | Step 31170 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:00:51] Epoch 3 | Step 31180 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:00:56] Epoch 3 | Step 31190 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:01:01] Epoch 3 | Step 31200 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:01:06] Epoch 3 | Step 31210 | Loss: 0.8201 | LR: 2.00e-06 [2026-04-22 03:01:11] Epoch 3 | Step 31220 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:01:17] Epoch 3 | Step 31230 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:01:22] Epoch 3 | Step 31240 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:01:27] Epoch 3 | Step 31250 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:01:32] Epoch 3 | Step 31260 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:01:38] Epoch 3 | Step 31270 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:01:42] Epoch 3 | Step 31280 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:01:47] Epoch 3 | Step 31290 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:01:52] Epoch 3 | Step 31300 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:01:58] Epoch 3 | Step 31310 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:02:03] Epoch 3 | Step 31320 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:02:08] Epoch 3 | Step 31330 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:02:14] Epoch 3 | Step 31340 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:02:19] Epoch 3 | Step 31350 | Loss: 0.8200 | LR: 2.00e-06 [2026-04-22 03:02:25] Epoch 3 | Step 31360 | Loss: 0.8200 | LR: 
2.00e-06 [2026-04-22 03:02:30] Epoch 3 | Step 31370 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:02:35] Epoch 3 | Step 31380 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 03:02:41] Epoch 3 | Step 31390 | Loss: 0.8198 | LR: 2.00e-06 [2026-04-22 03:02:46] Epoch 3 | Step 31400 | Loss: 0.8199 | LR: 2.00e-06 [2026-04-22 03:02:51] Epoch 3 | Step 31410 | Loss: 0.8197 | LR: 2.00e-06 [2026-04-22 03:02:57] Epoch 3 | Step 31420 | Loss: 0.8196 | LR: 2.00e-06 [2026-04-22 03:03:02] Epoch 3 | Step 31430 | Loss: 0.8195 | LR: 2.00e-06 [2026-04-22 03:03:08] Epoch 3 | Step 31440 | Loss: 0.8195 | LR: 2.00e-06 [2026-04-22 03:03:13] Epoch 3 | Step 31450 | Loss: 0.8195 | LR: 2.00e-06 [2026-04-22 03:03:19] Epoch 3 | Step 31460 | Loss: 0.8196 | LR: 2.00e-06 [2026-04-22 03:03:24] Epoch 3 | Step 31470 | Loss: 0.8195 | LR: 2.00e-06 [2026-04-22 03:03:29] Epoch 3 | Step 31480 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 03:03:35] Epoch 3 | Step 31490 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 03:03:40] Epoch 3 | Step 31500 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:03:44] Epoch 3 | Step 31510 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:03:49] Epoch 3 | Step 31520 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:03:55] Epoch 3 | Step 31530 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 03:04:00] Epoch 3 | Step 31540 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:04:07] Epoch 3 | Step 31550 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:04:12] Epoch 3 | Step 31560 | Loss: 0.8192 | LR: 2.00e-06 [2026-04-22 03:04:18] Epoch 3 | Step 31570 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:04:23] Epoch 3 | Step 31580 | Loss: 0.8192 | LR: 2.00e-06 [2026-04-22 03:04:28] Epoch 3 | Step 31590 | Loss: 0.8192 | LR: 2.00e-06 [2026-04-22 03:04:34] Epoch 3 | Step 31600 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:04:40] Epoch 3 | Step 31610 | Loss: 0.8192 | LR: 2.00e-06 [2026-04-22 03:04:45] Epoch 3 | Step 31620 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:04:51] Epoch 3 | Step 31630 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 
03:04:57] Epoch 3 | Step 31640 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:05:02] Epoch 3 | Step 31650 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 03:05:07] Epoch 3 | Step 31660 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 03:05:13] Epoch 3 | Step 31670 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:05:18] Epoch 3 | Step 31680 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 03:05:24] Epoch 3 | Step 31690 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 03:05:29] Epoch 3 | Step 31700 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 03:05:35] Epoch 3 | Step 31710 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:05:40] Epoch 3 | Step 31720 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:05:45] Epoch 3 | Step 31730 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:05:51] Epoch 3 | Step 31740 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:05:56] Epoch 3 | Step 31750 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:06:02] Epoch 3 | Step 31760 | Loss: 0.8194 | LR: 2.00e-06 [2026-04-22 03:06:07] Epoch 3 | Step 31770 | Loss: 0.8193 | LR: 2.00e-06 [2026-04-22 03:06:10] Epoch 3 completed in 7722.56s | Loss: 0.8194 [2026-04-22 03:06:20] Checkpoint saved: outputs/2026-04-21/20-28-37/checkpoints/checkpoint_step_31773.pt [2026-04-22 03:07:33] Training completed! [2026-04-22 03:07:36] Final model: outputs/2026-04-21/20-28-37/model_final.pt