[2026-04-18 12:19:15] CUDA_VISIBLE_DEVICES: 0,1
[2026-04-18 12:19:15] Number of processes: 2
[2026-04-18 12:19:15] Process index: 0
[2026-04-18 12:19:15] Mixed precision: bf16
[2026-04-18 12:19:15] ============================================================
[2026-04-18 12:19:15] Pythia Training Pipeline (Hydra + Trackio + Accelerate)
[2026-04-18 12:19:15] ============================================================
[2026-04-18 12:19:15] Config:
model:
  name: EleutherAI/pythia-1.4b
  checkpoint_path: null
  from_scratch: false
training:
  epochs: 3
  batch_size: 4
  eval_batch_size: 12
  gradient_accumulation_steps: 4
  lr: 2.0e-05
  weight_decay: 0.1
  betas:
  - 0.9
  - 0.95
  eps: 1.0e-08
  lr_scheduler: wsd
  warmup_ratio: 0.1
  decay_ratio: 0.2
  warmup_steps: 100
  min_lr_ratio: 0.1
  max_grad_norm: 1.0
  use_amp: true
  resume: false
  resume_checkpoint: null
data:
  path: /workspace/byte-llms-code/code_completion_exp/datasets/data_V4_full
  max_context_len: 4096
  max_target_len: 256
  num_workers: 4
  pin_memory: true
logging:
  log_interval: 10
  save_interval: 3000
  eval_interval: 1000
  save_every_epoch: true
tracking:
  enabled: true
  backend: wandb
  project: code-completion_full
  run_name: pythia_1_4b_v4_lr_2e-5
  entity: null
  base_url: https://wandb.platun0v.ru
  local_dir: outputs/2026-04-18/12-19-14
paths:
  output_dir: outputs/2026-04-18/12-19-14
seed: 42
device: cuda
[2026-04-18 12:19:17] Initializing tokenizer...
[2026-04-18 12:19:18] Loading model...
[2026-04-18 12:19:21] Loaded pretrained: EleutherAI/pythia-1.4b
[2026-04-18 12:19:21] Total params: 1,414,647,808
[2026-04-18 12:19:21] Trainable params: 1,414,647,808
[2026-04-18 12:19:21] Creating dataloaders...
[2026-04-18 12:19:21] Train dataset size: 316397
[2026-04-18 12:19:21] Train batches per epoch (before DDP split): 79100
[2026-04-18 12:19:21] Validation dataset size: 37592
[2026-04-18 12:19:21] Validation batches: 3133
[2026-04-18 12:19:21] Creating optimizer...
[2026-04-18 12:19:21] Total steps: 29662, Steps per epoch: 39550 [2026-04-18 12:19:21] Preparing model, optimizer, and dataloaders with Accelerate... [2026-04-18 12:19:23] Train batches per epoch (after DDP split): 39550 [2026-04-18 12:19:23] Starting training... [2026-04-18 12:19:23] ============================================================ [2026-04-18 12:19:23] EPOCH 1/3 [2026-04-18 12:19:23] ============================================================ [2026-04-18 12:19:27] Epoch 1 | Step 10 | Loss: 2.4238 | LR: 2.12e-06 [2026-04-18 12:19:30] Epoch 1 | Step 20 | Loss: 2.3757 | LR: 2.24e-06 [2026-04-18 12:19:34] Epoch 1 | Step 30 | Loss: 2.2496 | LR: 2.36e-06 [2026-04-18 12:19:38] Epoch 1 | Step 40 | Loss: 2.1505 | LR: 2.49e-06 [2026-04-18 12:19:42] Epoch 1 | Step 50 | Loss: 2.0414 | LR: 2.61e-06 [2026-04-18 12:19:45] Epoch 1 | Step 60 | Loss: 1.9492 | LR: 2.73e-06 [2026-04-18 12:19:49] Epoch 1 | Step 70 | Loss: 1.8644 | LR: 2.85e-06 [2026-04-18 12:19:53] Epoch 1 | Step 80 | Loss: 1.8147 | LR: 2.97e-06 [2026-04-18 12:19:56] Epoch 1 | Step 90 | Loss: 1.7557 | LR: 3.09e-06 [2026-04-18 12:20:00] Epoch 1 | Step 100 | Loss: 1.7045 | LR: 3.21e-06 [2026-04-18 12:20:03] Epoch 1 | Step 110 | Loss: 1.6767 | LR: 3.34e-06 [2026-04-18 12:20:06] Epoch 1 | Step 120 | Loss: 1.6454 | LR: 3.46e-06 [2026-04-18 12:20:10] Epoch 1 | Step 130 | Loss: 1.6209 | LR: 3.58e-06 [2026-04-18 12:20:13] Epoch 1 | Step 140 | Loss: 1.5994 | LR: 3.70e-06 [2026-04-18 12:20:17] Epoch 1 | Step 150 | Loss: 1.5644 | LR: 3.82e-06 [2026-04-18 12:20:20] Epoch 1 | Step 160 | Loss: 1.5346 | LR: 3.94e-06 [2026-04-18 12:20:24] Epoch 1 | Step 170 | Loss: 1.5116 | LR: 4.06e-06 [2026-04-18 12:20:27] Epoch 1 | Step 180 | Loss: 1.4839 | LR: 4.18e-06 [2026-04-18 12:20:31] Epoch 1 | Step 190 | Loss: 1.4663 | LR: 4.31e-06 [2026-04-18 12:20:35] Epoch 1 | Step 200 | Loss: 1.4501 | LR: 4.43e-06 [2026-04-18 12:20:38] Epoch 1 | Step 210 | Loss: 1.4425 | LR: 4.55e-06 [2026-04-18 12:20:42] Epoch 1 | Step 220 | Loss: 1.4280 
| LR: 4.67e-06 [2026-04-18 12:20:45] Epoch 1 | Step 230 | Loss: 1.4074 | LR: 4.79e-06 [2026-04-18 12:20:49] Epoch 1 | Step 240 | Loss: 1.3914 | LR: 4.91e-06 [2026-04-18 12:20:52] Epoch 1 | Step 250 | Loss: 1.3797 | LR: 5.03e-06 [2026-04-18 12:20:56] Epoch 1 | Step 260 | Loss: 1.3743 | LR: 5.16e-06 [2026-04-18 12:20:59] Epoch 1 | Step 270 | Loss: 1.3614 | LR: 5.28e-06 [2026-04-18 12:21:03] Epoch 1 | Step 280 | Loss: 1.3473 | LR: 5.40e-06 [2026-04-18 12:21:06] Epoch 1 | Step 290 | Loss: 1.3371 | LR: 5.52e-06 [2026-04-18 12:21:09] Epoch 1 | Step 300 | Loss: 1.3282 | LR: 5.64e-06 [2026-04-18 12:21:13] Epoch 1 | Step 310 | Loss: 1.3176 | LR: 5.76e-06 [2026-04-18 12:21:16] Epoch 1 | Step 320 | Loss: 1.3071 | LR: 5.88e-06 [2026-04-18 12:21:20] Epoch 1 | Step 330 | Loss: 1.2966 | LR: 6.01e-06 [2026-04-18 12:21:24] Epoch 1 | Step 340 | Loss: 1.2876 | LR: 6.13e-06 [2026-04-18 12:21:27] Epoch 1 | Step 350 | Loss: 1.2811 | LR: 6.25e-06 [2026-04-18 12:21:31] Epoch 1 | Step 360 | Loss: 1.2712 | LR: 6.37e-06 [2026-04-18 12:21:34] Epoch 1 | Step 370 | Loss: 1.2615 | LR: 6.49e-06 [2026-04-18 12:21:38] Epoch 1 | Step 380 | Loss: 1.2539 | LR: 6.61e-06 [2026-04-18 12:21:42] Epoch 1 | Step 390 | Loss: 1.2471 | LR: 6.73e-06 [2026-04-18 12:21:46] Epoch 1 | Step 400 | Loss: 1.2404 | LR: 6.86e-06 [2026-04-18 12:21:49] Epoch 1 | Step 410 | Loss: 1.2351 | LR: 6.98e-06 [2026-04-18 12:21:53] Epoch 1 | Step 420 | Loss: 1.2299 | LR: 7.10e-06 [2026-04-18 12:21:57] Epoch 1 | Step 430 | Loss: 1.2279 | LR: 7.22e-06 [2026-04-18 12:22:00] Epoch 1 | Step 440 | Loss: 1.2200 | LR: 7.34e-06 [2026-04-18 12:22:04] Epoch 1 | Step 450 | Loss: 1.2150 | LR: 7.46e-06 [2026-04-18 12:22:08] Epoch 1 | Step 460 | Loss: 1.2110 | LR: 7.58e-06 [2026-04-18 12:22:11] Epoch 1 | Step 470 | Loss: 1.2055 | LR: 7.70e-06 [2026-04-18 12:22:14] Epoch 1 | Step 480 | Loss: 1.2020 | LR: 7.83e-06 [2026-04-18 12:22:18] Epoch 1 | Step 490 | Loss: 1.1960 | LR: 7.95e-06 [2026-04-18 12:22:21] Epoch 1 | Step 500 | Loss: 1.1907 | LR: 
8.07e-06 [2026-04-18 12:22:24] Epoch 1 | Step 510 | Loss: 1.1859 | LR: 8.19e-06 [2026-04-18 12:22:28] Epoch 1 | Step 520 | Loss: 1.1833 | LR: 8.31e-06 [2026-04-18 12:22:31] Epoch 1 | Step 530 | Loss: 1.1785 | LR: 8.43e-06 [2026-04-18 12:22:35] Epoch 1 | Step 540 | Loss: 1.1726 | LR: 8.55e-06 [2026-04-18 12:22:39] Epoch 1 | Step 550 | Loss: 1.1677 | LR: 8.68e-06 [2026-04-18 12:22:42] Epoch 1 | Step 560 | Loss: 1.1635 | LR: 8.80e-06 [2026-04-18 12:22:45] Epoch 1 | Step 570 | Loss: 1.1604 | LR: 8.92e-06 [2026-04-18 12:22:48] Epoch 1 | Step 580 | Loss: 1.1580 | LR: 9.04e-06 [2026-04-18 12:22:52] Epoch 1 | Step 590 | Loss: 1.1561 | LR: 9.16e-06 [2026-04-18 12:22:55] Epoch 1 | Step 600 | Loss: 1.1538 | LR: 9.28e-06 [2026-04-18 12:22:59] Epoch 1 | Step 610 | Loss: 1.1523 | LR: 9.40e-06 [2026-04-18 12:23:02] Epoch 1 | Step 620 | Loss: 1.1508 | LR: 9.53e-06 [2026-04-18 12:23:06] Epoch 1 | Step 630 | Loss: 1.1478 | LR: 9.65e-06 [2026-04-18 12:23:09] Epoch 1 | Step 640 | Loss: 1.1465 | LR: 9.77e-06 [2026-04-18 12:23:13] Epoch 1 | Step 650 | Loss: 1.1435 | LR: 9.89e-06 [2026-04-18 12:23:16] Epoch 1 | Step 660 | Loss: 1.1430 | LR: 1.00e-05 [2026-04-18 12:23:20] Epoch 1 | Step 670 | Loss: 1.1401 | LR: 1.01e-05 [2026-04-18 12:23:24] Epoch 1 | Step 680 | Loss: 1.1389 | LR: 1.03e-05 [2026-04-18 12:23:27] Epoch 1 | Step 690 | Loss: 1.1360 | LR: 1.04e-05 [2026-04-18 12:23:31] Epoch 1 | Step 700 | Loss: 1.1356 | LR: 1.05e-05 [2026-04-18 12:23:34] Epoch 1 | Step 710 | Loss: 1.1331 | LR: 1.06e-05 [2026-04-18 12:23:38] Epoch 1 | Step 720 | Loss: 1.1311 | LR: 1.07e-05 [2026-04-18 12:23:41] Epoch 1 | Step 730 | Loss: 1.1294 | LR: 1.09e-05 [2026-04-18 12:23:45] Epoch 1 | Step 740 | Loss: 1.1274 | LR: 1.10e-05 [2026-04-18 12:23:49] Epoch 1 | Step 750 | Loss: 1.1248 | LR: 1.11e-05 [2026-04-18 12:23:52] Epoch 1 | Step 760 | Loss: 1.1245 | LR: 1.12e-05 [2026-04-18 12:23:56] Epoch 1 | Step 770 | Loss: 1.1243 | LR: 1.13e-05 [2026-04-18 12:24:00] Epoch 1 | Step 780 | Loss: 1.1225 | LR: 1.15e-05 
[2026-04-18 12:24:03] Epoch 1 | Step 790 | Loss: 1.1210 | LR: 1.16e-05 [2026-04-18 12:24:07] Epoch 1 | Step 800 | Loss: 1.1187 | LR: 1.17e-05 [2026-04-18 12:24:10] Epoch 1 | Step 810 | Loss: 1.1172 | LR: 1.18e-05 [2026-04-18 12:24:14] Epoch 1 | Step 820 | Loss: 1.1153 | LR: 1.20e-05 [2026-04-18 12:24:18] Epoch 1 | Step 830 | Loss: 1.1143 | LR: 1.21e-05 [2026-04-18 12:24:21] Epoch 1 | Step 840 | Loss: 1.1123 | LR: 1.22e-05 [2026-04-18 12:24:25] Epoch 1 | Step 850 | Loss: 1.1095 | LR: 1.23e-05 [2026-04-18 12:24:28] Epoch 1 | Step 860 | Loss: 1.1087 | LR: 1.24e-05 [2026-04-18 12:24:32] Epoch 1 | Step 870 | Loss: 1.1086 | LR: 1.26e-05 [2026-04-18 12:24:35] Epoch 1 | Step 880 | Loss: 1.1080 | LR: 1.27e-05 [2026-04-18 12:24:38] Epoch 1 | Step 890 | Loss: 1.1073 | LR: 1.28e-05 [2026-04-18 12:24:42] Epoch 1 | Step 900 | Loss: 1.1058 | LR: 1.29e-05 [2026-04-18 12:24:45] Epoch 1 | Step 910 | Loss: 1.1054 | LR: 1.30e-05 [2026-04-18 12:24:49] Epoch 1 | Step 920 | Loss: 1.1054 | LR: 1.32e-05 [2026-04-18 12:24:53] Epoch 1 | Step 930 | Loss: 1.1043 | LR: 1.33e-05 [2026-04-18 12:24:57] Epoch 1 | Step 940 | Loss: 1.1038 | LR: 1.34e-05 [2026-04-18 12:25:00] Epoch 1 | Step 950 | Loss: 1.1017 | LR: 1.35e-05 [2026-04-18 12:25:05] Epoch 1 | Step 960 | Loss: 1.1001 | LR: 1.37e-05 [2026-04-18 12:25:08] Epoch 1 | Step 970 | Loss: 1.0992 | LR: 1.38e-05 [2026-04-18 12:25:12] Epoch 1 | Step 980 | Loss: 1.0978 | LR: 1.39e-05 [2026-04-18 12:25:16] Epoch 1 | Step 990 | Loss: 1.0956 | LR: 1.40e-05 [2026-04-18 12:25:19] Epoch 1 | Step 1000 | Loss: 1.0947 | LR: 1.41e-05 [2026-04-18 12:25:20] Validation | Batch 10/1567 | Loss: 0.9659 [2026-04-18 12:25:21] Validation | Batch 20/1567 | Loss: 1.0161 [2026-04-18 12:25:22] Validation | Batch 30/1567 | Loss: 1.0658 [2026-04-18 12:25:23] Validation | Batch 40/1567 | Loss: 1.0936 [2026-04-18 12:25:24] Validation | Batch 50/1567 | Loss: 1.0759 [2026-04-18 12:25:25] Validation | Batch 60/1567 | Loss: 1.0613 [2026-04-18 12:25:25] Validation | Batch 70/1567 | 
Loss: 1.0453 [2026-04-18 12:25:27] Validation | Batch 80/1567 | Loss: 1.0592 [2026-04-18 12:25:27] Validation | Batch 90/1567 | Loss: 1.0658 [2026-04-18 12:25:28] Validation | Batch 100/1567 | Loss: 1.0704 [2026-04-18 12:25:29] Validation | Batch 110/1567 | Loss: 1.0619 [2026-04-18 12:25:30] Validation | Batch 120/1567 | Loss: 1.0722 [2026-04-18 12:25:31] Validation | Batch 130/1567 | Loss: 1.0744 [2026-04-18 12:25:32] Validation | Batch 140/1567 | Loss: 1.0797 [2026-04-18 12:25:32] Validation | Batch 150/1567 | Loss: 1.0852 [2026-04-18 12:25:33] Validation | Batch 160/1567 | Loss: 1.0871 [2026-04-18 12:25:34] Validation | Batch 170/1567 | Loss: 1.0745 [2026-04-18 12:25:35] Validation | Batch 180/1567 | Loss: 1.0754 [2026-04-18 12:25:36] Validation | Batch 190/1567 | Loss: 1.0736 [2026-04-18 12:25:37] Validation | Batch 200/1567 | Loss: 1.0760 [2026-04-18 12:25:37] Validation | Batch 210/1567 | Loss: 1.0773 [2026-04-18 12:25:38] Validation | Batch 220/1567 | Loss: 1.0803 [2026-04-18 12:25:39] Validation | Batch 230/1567 | Loss: 1.0851 [2026-04-18 12:25:40] Validation | Batch 240/1567 | Loss: 1.0826 [2026-04-18 12:25:41] Validation | Batch 250/1567 | Loss: 1.0768 [2026-04-18 12:25:41] Validation | Batch 260/1567 | Loss: 1.0720 [2026-04-18 12:25:42] Validation | Batch 270/1567 | Loss: 1.0679 [2026-04-18 12:25:43] Validation | Batch 280/1567 | Loss: 1.0690 [2026-04-18 12:25:44] Validation | Batch 290/1567 | Loss: 1.0748 [2026-04-18 12:25:45] Validation | Batch 300/1567 | Loss: 1.0798 [2026-04-18 12:25:46] Validation | Batch 310/1567 | Loss: 1.0794 [2026-04-18 12:25:46] Validation | Batch 320/1567 | Loss: 1.0788 [2026-04-18 12:25:48] Validation | Batch 330/1567 | Loss: 1.0765 [2026-04-18 12:25:48] Validation | Batch 340/1567 | Loss: 1.0797 [2026-04-18 12:25:49] Validation | Batch 350/1567 | Loss: 1.0785 [2026-04-18 12:25:50] Validation | Batch 360/1567 | Loss: 1.0768 [2026-04-18 12:25:51] Validation | Batch 370/1567 | Loss: 1.0738 [2026-04-18 12:25:52] Validation | 
Batch 380/1567 | Loss: 1.0766 [2026-04-18 12:25:52] Validation | Batch 390/1567 | Loss: 1.0773 [2026-04-18 12:25:53] Validation | Batch 400/1567 | Loss: 1.0785 [2026-04-18 12:25:54] Validation | Batch 410/1567 | Loss: 1.0785 [2026-04-18 12:25:55] Validation | Batch 420/1567 | Loss: 1.0788 [2026-04-18 12:25:56] Validation | Batch 430/1567 | Loss: 1.0785 [2026-04-18 12:25:57] Validation | Batch 440/1567 | Loss: 1.0771 [2026-04-18 12:25:57] Validation | Batch 450/1567 | Loss: 1.0774 [2026-04-18 12:25:58] Validation | Batch 460/1567 | Loss: 1.0760 [2026-04-18 12:25:59] Validation | Batch 470/1567 | Loss: 1.0743 [2026-04-18 12:26:00] Validation | Batch 480/1567 | Loss: 1.0721 [2026-04-18 12:26:01] Validation | Batch 490/1567 | Loss: 1.0717 [2026-04-18 12:26:01] Validation | Batch 500/1567 | Loss: 1.0729 [2026-04-18 12:26:02] Validation | Batch 510/1567 | Loss: 1.0759 [2026-04-18 12:26:03] Validation | Batch 520/1567 | Loss: 1.0773 [2026-04-18 12:26:04] Validation | Batch 530/1567 | Loss: 1.0770 [2026-04-18 12:26:05] Validation | Batch 540/1567 | Loss: 1.0790 [2026-04-18 12:26:06] Validation | Batch 550/1567 | Loss: 1.0820 [2026-04-18 12:26:06] Validation | Batch 560/1567 | Loss: 1.0815 [2026-04-18 12:26:07] Validation | Batch 570/1567 | Loss: 1.0804 [2026-04-18 12:26:08] Validation | Batch 580/1567 | Loss: 1.0794 [2026-04-18 12:26:09] Validation | Batch 590/1567 | Loss: 1.0787 [2026-04-18 12:26:10] Validation | Batch 600/1567 | Loss: 1.0766 [2026-04-18 12:26:11] Validation | Batch 610/1567 | Loss: 1.0764 [2026-04-18 12:26:12] Validation | Batch 620/1567 | Loss: 1.0777 [2026-04-18 12:26:13] Validation | Batch 630/1567 | Loss: 1.0756 [2026-04-18 12:26:13] Validation | Batch 640/1567 | Loss: 1.0766 [2026-04-18 12:26:15] Validation | Batch 650/1567 | Loss: 1.0759 [2026-04-18 12:26:15] Validation | Batch 660/1567 | Loss: 1.0744 [2026-04-18 12:26:16] Validation | Batch 670/1567 | Loss: 1.0719 [2026-04-18 12:26:17] Validation | Batch 680/1567 | Loss: 1.0713 [2026-04-18 
12:26:17] Validation | Batch 690/1567 | Loss: 1.0723 [2026-04-18 12:26:18] Validation | Batch 700/1567 | Loss: 1.0707 [2026-04-18 12:26:19] Validation | Batch 710/1567 | Loss: 1.0719 [2026-04-18 12:26:20] Validation | Batch 720/1567 | Loss: 1.0716 [2026-04-18 12:26:21] Validation | Batch 730/1567 | Loss: 1.0727 [2026-04-18 12:26:21] Validation | Batch 740/1567 | Loss: 1.0732 [2026-04-18 12:26:22] Validation | Batch 750/1567 | Loss: 1.0729 [2026-04-18 12:26:23] Validation | Batch 760/1567 | Loss: 1.0729 [2026-04-18 12:26:24] Validation | Batch 770/1567 | Loss: 1.0748 [2026-04-18 12:26:25] Validation | Batch 780/1567 | Loss: 1.0759 [2026-04-18 12:26:26] Validation | Batch 790/1567 | Loss: 1.0753 [2026-04-18 12:26:26] Validation | Batch 800/1567 | Loss: 1.0773 [2026-04-18 12:26:27] Validation | Batch 810/1567 | Loss: 1.0774 [2026-04-18 12:26:28] Validation | Batch 820/1567 | Loss: 1.0775 [2026-04-18 12:26:29] Validation | Batch 830/1567 | Loss: 1.0757 [2026-04-18 12:26:29] Validation | Batch 840/1567 | Loss: 1.0759 [2026-04-18 12:26:30] Validation | Batch 850/1567 | Loss: 1.0745 [2026-04-18 12:26:31] Validation | Batch 860/1567 | Loss: 1.0759 [2026-04-18 12:26:31] Validation | Batch 870/1567 | Loss: 1.0767 [2026-04-18 12:26:32] Validation | Batch 880/1567 | Loss: 1.0776 [2026-04-18 12:26:33] Validation | Batch 890/1567 | Loss: 1.0778 [2026-04-18 12:26:34] Validation | Batch 900/1567 | Loss: 1.0797 [2026-04-18 12:26:34] Validation | Batch 910/1567 | Loss: 1.0803 [2026-04-18 12:26:35] Validation | Batch 920/1567 | Loss: 1.0818 [2026-04-18 12:26:36] Validation | Batch 930/1567 | Loss: 1.0794 [2026-04-18 12:26:37] Validation | Batch 940/1567 | Loss: 1.0790 [2026-04-18 12:26:37] Validation | Batch 950/1567 | Loss: 1.0781 [2026-04-18 12:26:38] Validation | Batch 960/1567 | Loss: 1.0770 [2026-04-18 12:26:39] Validation | Batch 970/1567 | Loss: 1.0781 [2026-04-18 12:26:40] Validation | Batch 980/1567 | Loss: 1.0785 [2026-04-18 12:26:40] Validation | Batch 990/1567 | Loss: 
1.0778 [2026-04-18 12:26:41] Validation | Batch 1000/1567 | Loss: 1.0779 [2026-04-18 12:26:42] Validation | Batch 1010/1567 | Loss: 1.0761 [2026-04-18 12:26:42] Validation | Batch 1020/1567 | Loss: 1.0764 [2026-04-18 12:26:43] Validation | Batch 1030/1567 | Loss: 1.0780 [2026-04-18 12:26:44] Validation | Batch 1040/1567 | Loss: 1.0778 [2026-04-18 12:26:45] Validation | Batch 1050/1567 | Loss: 1.0790 [2026-04-18 12:26:46] Validation | Batch 1060/1567 | Loss: 1.0788 [2026-04-18 12:26:47] Validation | Batch 1070/1567 | Loss: 1.0782 [2026-04-18 12:26:47] Validation | Batch 1080/1567 | Loss: 1.0789 [2026-04-18 12:26:48] Validation | Batch 1090/1567 | Loss: 1.0788 [2026-04-18 12:26:49] Validation | Batch 1100/1567 | Loss: 1.0794 [2026-04-18 12:26:49] Validation | Batch 1110/1567 | Loss: 1.0791 [2026-04-18 12:26:50] Validation | Batch 1120/1567 | Loss: 1.0793 [2026-04-18 12:26:51] Validation | Batch 1130/1567 | Loss: 1.0795 [2026-04-18 12:26:52] Validation | Batch 1140/1567 | Loss: 1.0802 [2026-04-18 12:26:53] Validation | Batch 1150/1567 | Loss: 1.0802 [2026-04-18 12:26:53] Validation | Batch 1160/1567 | Loss: 1.0811 [2026-04-18 12:26:54] Validation | Batch 1170/1567 | Loss: 1.0807 [2026-04-18 12:26:55] Validation | Batch 1180/1567 | Loss: 1.0809 [2026-04-18 12:26:56] Validation | Batch 1190/1567 | Loss: 1.0816 [2026-04-18 12:26:57] Validation | Batch 1200/1567 | Loss: 1.0809 [2026-04-18 12:26:58] Validation | Batch 1210/1567 | Loss: 1.0798 [2026-04-18 12:26:58] Validation | Batch 1220/1567 | Loss: 1.0801 [2026-04-18 12:26:59] Validation | Batch 1230/1567 | Loss: 1.0817 [2026-04-18 12:27:00] Validation | Batch 1240/1567 | Loss: 1.0809 [2026-04-18 12:27:01] Validation | Batch 1250/1567 | Loss: 1.0806 [2026-04-18 12:27:01] Validation | Batch 1260/1567 | Loss: 1.0815 [2026-04-18 12:27:03] Validation | Batch 1270/1567 | Loss: 1.0817 [2026-04-18 12:27:03] Validation | Batch 1280/1567 | Loss: 1.0809 [2026-04-18 12:27:05] Validation | Batch 1290/1567 | Loss: 1.0811 [2026-04-18 
12:27:05] Validation | Batch 1300/1567 | Loss: 1.0813 [2026-04-18 12:27:06] Validation | Batch 1310/1567 | Loss: 1.0818 [2026-04-18 12:27:07] Validation | Batch 1320/1567 | Loss: 1.0808 [2026-04-18 12:27:08] Validation | Batch 1330/1567 | Loss: 1.0808 [2026-04-18 12:27:09] Validation | Batch 1340/1567 | Loss: 1.0807 [2026-04-18 12:27:09] Validation | Batch 1350/1567 | Loss: 1.0811 [2026-04-18 12:27:10] Validation | Batch 1360/1567 | Loss: 1.0805 [2026-04-18 12:27:11] Validation | Batch 1370/1567 | Loss: 1.0807 [2026-04-18 12:27:12] Validation | Batch 1380/1567 | Loss: 1.0817 [2026-04-18 12:27:12] Validation | Batch 1390/1567 | Loss: 1.0818 [2026-04-18 12:27:13] Validation | Batch 1400/1567 | Loss: 1.0821 [2026-04-18 12:27:14] Validation | Batch 1410/1567 | Loss: 1.0818 [2026-04-18 12:27:14] Validation | Batch 1420/1567 | Loss: 1.0821 [2026-04-18 12:27:15] Validation | Batch 1430/1567 | Loss: 1.0817 [2026-04-18 12:27:16] Validation | Batch 1440/1567 | Loss: 1.0822 [2026-04-18 12:27:17] Validation | Batch 1450/1567 | Loss: 1.0814 [2026-04-18 12:27:17] Validation | Batch 1460/1567 | Loss: 1.0812 [2026-04-18 12:27:18] Validation | Batch 1470/1567 | Loss: 1.0805 [2026-04-18 12:27:19] Validation | Batch 1480/1567 | Loss: 1.0790 [2026-04-18 12:27:19] Validation | Batch 1490/1567 | Loss: 1.0792 [2026-04-18 12:27:20] Validation | Batch 1500/1567 | Loss: 1.0794 [2026-04-18 12:27:21] Validation | Batch 1510/1567 | Loss: 1.0790 [2026-04-18 12:27:22] Validation | Batch 1520/1567 | Loss: 1.0781 [2026-04-18 12:27:22] Validation | Batch 1530/1567 | Loss: 1.0791 [2026-04-18 12:27:24] Validation | Batch 1540/1567 | Loss: 1.0800 [2026-04-18 12:27:24] Validation | Batch 1550/1567 | Loss: 1.0801 [2026-04-18 12:27:25] Validation | Batch 1560/1567 | Loss: 1.0792 [2026-04-18 12:27:26] Validation | Batch 1567/1567 | Loss: 1.0795 [2026-04-18 12:27:26] Validation | Loss: 1.0795 | PPL: 2.97 | Time: 126.50s [2026-04-18 12:27:29] New best model saved! 
Val loss: 1.0795 [2026-04-18 12:27:32] Epoch 1 | Step 1010 | Loss: 1.0947 | LR: 1.43e-05 [2026-04-18 12:27:35] Epoch 1 | Step 1020 | Loss: 1.0939 | LR: 1.44e-05 [2026-04-18 12:27:39] Epoch 1 | Step 1030 | Loss: 1.0938 | LR: 1.45e-05 [2026-04-18 12:27:42] Epoch 1 | Step 1040 | Loss: 1.0919 | LR: 1.46e-05 [2026-04-18 12:27:46] Epoch 1 | Step 1050 | Loss: 1.0903 | LR: 1.47e-05 [2026-04-18 12:27:49] Epoch 1 | Step 1060 | Loss: 1.0876 | LR: 1.49e-05 [2026-04-18 12:27:53] Epoch 1 | Step 1070 | Loss: 1.0863 | LR: 1.50e-05 [2026-04-18 12:27:56] Epoch 1 | Step 1080 | Loss: 1.0865 | LR: 1.51e-05 [2026-04-18 12:28:00] Epoch 1 | Step 1090 | Loss: 1.0876 | LR: 1.52e-05 [2026-04-18 12:28:03] Epoch 1 | Step 1100 | Loss: 1.0867 | LR: 1.54e-05 [2026-04-18 12:28:08] Epoch 1 | Step 1110 | Loss: 1.0867 | LR: 1.55e-05 [2026-04-18 12:28:11] Epoch 1 | Step 1120 | Loss: 1.0868 | LR: 1.56e-05 [2026-04-18 12:28:14] Epoch 1 | Step 1130 | Loss: 1.0863 | LR: 1.57e-05 [2026-04-18 12:28:18] Epoch 1 | Step 1140 | Loss: 1.0856 | LR: 1.58e-05 [2026-04-18 12:28:22] Epoch 1 | Step 1150 | Loss: 1.0833 | LR: 1.60e-05 [2026-04-18 12:28:26] Epoch 1 | Step 1160 | Loss: 1.0834 | LR: 1.61e-05 [2026-04-18 12:28:29] Epoch 1 | Step 1170 | Loss: 1.0836 | LR: 1.62e-05 [2026-04-18 12:28:33] Epoch 1 | Step 1180 | Loss: 1.0830 | LR: 1.63e-05 [2026-04-18 12:28:37] Epoch 1 | Step 1190 | Loss: 1.0827 | LR: 1.64e-05 [2026-04-18 12:28:40] Epoch 1 | Step 1200 | Loss: 1.0814 | LR: 1.66e-05 [2026-04-18 12:28:43] Epoch 1 | Step 1210 | Loss: 1.0795 | LR: 1.67e-05 [2026-04-18 12:28:47] Epoch 1 | Step 1220 | Loss: 1.0778 | LR: 1.68e-05 [2026-04-18 12:28:50] Epoch 1 | Step 1230 | Loss: 1.0777 | LR: 1.69e-05 [2026-04-18 12:28:54] Epoch 1 | Step 1240 | Loss: 1.0772 | LR: 1.71e-05 [2026-04-18 12:28:58] Epoch 1 | Step 1250 | Loss: 1.0756 | LR: 1.72e-05 [2026-04-18 12:29:01] Epoch 1 | Step 1260 | Loss: 1.0749 | LR: 1.73e-05 [2026-04-18 12:29:04] Epoch 1 | Step 1270 | Loss: 1.0725 | LR: 1.74e-05 [2026-04-18 12:29:08] Epoch 1 | Step 
1280 | Loss: 1.0727 | LR: 1.75e-05 [2026-04-18 12:29:10] Epoch 1 | Step 1290 | Loss: 1.0726 | LR: 1.77e-05 [2026-04-18 12:29:14] Epoch 1 | Step 1300 | Loss: 1.0714 | LR: 1.78e-05 [2026-04-18 12:29:17] Epoch 1 | Step 1310 | Loss: 1.0708 | LR: 1.79e-05 [2026-04-18 12:29:21] Epoch 1 | Step 1320 | Loss: 1.0707 | LR: 1.80e-05 [2026-04-18 12:29:24] Epoch 1 | Step 1330 | Loss: 1.0696 | LR: 1.81e-05 [2026-04-18 12:29:28] Epoch 1 | Step 1340 | Loss: 1.0695 | LR: 1.83e-05 [2026-04-18 12:29:31] Epoch 1 | Step 1350 | Loss: 1.0696 | LR: 1.84e-05 [2026-04-18 12:29:35] Epoch 1 | Step 1360 | Loss: 1.0690 | LR: 1.85e-05 [2026-04-18 12:29:38] Epoch 1 | Step 1370 | Loss: 1.0684 | LR: 1.86e-05 [2026-04-18 12:29:42] Epoch 1 | Step 1380 | Loss: 1.0688 | LR: 1.87e-05 [2026-04-18 12:29:45] Epoch 1 | Step 1390 | Loss: 1.0687 | LR: 1.89e-05 [2026-04-18 12:29:49] Epoch 1 | Step 1400 | Loss: 1.0685 | LR: 1.90e-05 [2026-04-18 12:29:53] Epoch 1 | Step 1410 | Loss: 1.0673 | LR: 1.91e-05 [2026-04-18 12:29:56] Epoch 1 | Step 1420 | Loss: 1.0668 | LR: 1.92e-05 [2026-04-18 12:30:00] Epoch 1 | Step 1430 | Loss: 1.0659 | LR: 1.94e-05 [2026-04-18 12:30:03] Epoch 1 | Step 1440 | Loss: 1.0650 | LR: 1.95e-05 [2026-04-18 12:30:07] Epoch 1 | Step 1450 | Loss: 1.0646 | LR: 1.96e-05 [2026-04-18 12:30:10] Epoch 1 | Step 1460 | Loss: 1.0631 | LR: 1.97e-05 [2026-04-18 12:30:15] Epoch 1 | Step 1470 | Loss: 1.0630 | LR: 1.98e-05 [2026-04-18 12:30:19] Epoch 1 | Step 1480 | Loss: 1.0628 | LR: 2.00e-05 [2026-04-18 12:30:22] Epoch 1 | Step 1490 | Loss: 1.0627 | LR: 2.00e-05 [2026-04-18 12:30:25] Epoch 1 | Step 1500 | Loss: 1.0619 | LR: 2.00e-05 [2026-04-18 12:30:29] Epoch 1 | Step 1510 | Loss: 1.0620 | LR: 2.00e-05 [2026-04-18 12:30:33] Epoch 1 | Step 1520 | Loss: 1.0619 | LR: 2.00e-05 [2026-04-18 12:30:36] Epoch 1 | Step 1530 | Loss: 1.0615 | LR: 2.00e-05 [2026-04-18 12:30:39] Epoch 1 | Step 1540 | Loss: 1.0617 | LR: 2.00e-05 [2026-04-18 12:30:43] Epoch 1 | Step 1550 | Loss: 1.0615 | LR: 2.00e-05 [2026-04-18 
12:30:46] Epoch 1 | Step 1560 | Loss: 1.0607 | LR: 2.00e-05 [2026-04-18 12:30:50] Epoch 1 | Step 1570 | Loss: 1.0611 | LR: 2.00e-05 [2026-04-18 12:30:54] Epoch 1 | Step 1580 | Loss: 1.0602 | LR: 2.00e-05 [2026-04-18 12:30:57] Epoch 1 | Step 1590 | Loss: 1.0600 | LR: 2.00e-05 [2026-04-18 12:31:01] Epoch 1 | Step 1600 | Loss: 1.0598 | LR: 2.00e-05 [2026-04-18 12:31:04] Epoch 1 | Step 1610 | Loss: 1.0584 | LR: 2.00e-05 [2026-04-18 12:31:08] Epoch 1 | Step 1620 | Loss: 1.0574 | LR: 2.00e-05 [2026-04-18 12:31:11] Epoch 1 | Step 1630 | Loss: 1.0576 | LR: 2.00e-05 [2026-04-18 12:31:14] Epoch 1 | Step 1640 | Loss: 1.0572 | LR: 2.00e-05 [2026-04-18 12:31:18] Epoch 1 | Step 1650 | Loss: 1.0562 | LR: 2.00e-05 [2026-04-18 12:31:22] Epoch 1 | Step 1660 | Loss: 1.0555 | LR: 2.00e-05 [2026-04-18 12:31:25] Epoch 1 | Step 1670 | Loss: 1.0559 | LR: 2.00e-05 [2026-04-18 12:31:29] Epoch 1 | Step 1680 | Loss: 1.0559 | LR: 2.00e-05 [2026-04-18 12:31:33] Epoch 1 | Step 1690 | Loss: 1.0554 | LR: 2.00e-05 [2026-04-18 12:31:36] Epoch 1 | Step 1700 | Loss: 1.0541 | LR: 2.00e-05 [2026-04-18 12:31:40] Epoch 1 | Step 1710 | Loss: 1.0536 | LR: 2.00e-05 [2026-04-18 12:31:43] Epoch 1 | Step 1720 | Loss: 1.0529 | LR: 2.00e-05 [2026-04-18 12:31:46] Epoch 1 | Step 1730 | Loss: 1.0525 | LR: 2.00e-05 [2026-04-18 12:31:50] Epoch 1 | Step 1740 | Loss: 1.0523 | LR: 2.00e-05 [2026-04-18 12:31:54] Epoch 1 | Step 1750 | Loss: 1.0528 | LR: 2.00e-05 [2026-04-18 12:31:57] Epoch 1 | Step 1760 | Loss: 1.0520 | LR: 2.00e-05 [2026-04-18 12:32:01] Epoch 1 | Step 1770 | Loss: 1.0520 | LR: 2.00e-05 [2026-04-18 12:32:04] Epoch 1 | Step 1780 | Loss: 1.0517 | LR: 2.00e-05 [2026-04-18 12:32:08] Epoch 1 | Step 1790 | Loss: 1.0515 | LR: 2.00e-05 [2026-04-18 12:32:12] Epoch 1 | Step 1800 | Loss: 1.0508 | LR: 2.00e-05 [2026-04-18 12:32:15] Epoch 1 | Step 1810 | Loss: 1.0506 | LR: 2.00e-05 [2026-04-18 12:32:19] Epoch 1 | Step 1820 | Loss: 1.0509 | LR: 2.00e-05 [2026-04-18 12:32:22] Epoch 1 | Step 1830 | Loss: 1.0508 | LR: 
2.00e-05 [2026-04-18 12:32:27] Epoch 1 | Step 1840 | Loss: 1.0506 | LR: 2.00e-05 [2026-04-18 12:32:31] Epoch 1 | Step 1850 | Loss: 1.0500 | LR: 2.00e-05 [2026-04-18 12:32:34] Epoch 1 | Step 1860 | Loss: 1.0498 | LR: 2.00e-05 [2026-04-18 12:32:38] Epoch 1 | Step 1870 | Loss: 1.0493 | LR: 2.00e-05 [2026-04-18 12:32:41] Epoch 1 | Step 1880 | Loss: 1.0488 | LR: 2.00e-05 [2026-04-18 12:32:45] Epoch 1 | Step 1890 | Loss: 1.0490 | LR: 2.00e-05 [2026-04-18 12:32:49] Epoch 1 | Step 1900 | Loss: 1.0485 | LR: 2.00e-05 [2026-04-18 12:32:53] Epoch 1 | Step 1910 | Loss: 1.0486 | LR: 2.00e-05 [2026-04-18 12:32:56] Epoch 1 | Step 1920 | Loss: 1.0488 | LR: 2.00e-05 [2026-04-18 12:32:59] Epoch 1 | Step 1930 | Loss: 1.0487 | LR: 2.00e-05 [2026-04-18 12:33:03] Epoch 1 | Step 1940 | Loss: 1.0482 | LR: 2.00e-05 [2026-04-18 12:33:07] Epoch 1 | Step 1950 | Loss: 1.0476 | LR: 2.00e-05 [2026-04-18 12:33:10] Epoch 1 | Step 1960 | Loss: 1.0476 | LR: 2.00e-05 [2026-04-18 12:33:14] Epoch 1 | Step 1970 | Loss: 1.0474 | LR: 2.00e-05 [2026-04-18 12:33:17] Epoch 1 | Step 1980 | Loss: 1.0477 | LR: 2.00e-05 [2026-04-18 12:33:20] Epoch 1 | Step 1990 | Loss: 1.0477 | LR: 2.00e-05 [2026-04-18 12:33:23] Epoch 1 | Step 2000 | Loss: 1.0474 | LR: 2.00e-05 [2026-04-18 12:33:24] Validation | Batch 10/1567 | Loss: 0.9653 [2026-04-18 12:33:25] Validation | Batch 20/1567 | Loss: 1.0305 [2026-04-18 12:33:26] Validation | Batch 30/1567 | Loss: 1.0818 [2026-04-18 12:33:27] Validation | Batch 40/1567 | Loss: 1.1024 [2026-04-18 12:33:27] Validation | Batch 50/1567 | Loss: 1.0830 [2026-04-18 12:33:29] Validation | Batch 60/1567 | Loss: 1.0710 [2026-04-18 12:33:30] Validation | Batch 70/1567 | Loss: 1.0558 [2026-04-18 12:33:31] Validation | Batch 80/1567 | Loss: 1.0690 [2026-04-18 12:33:32] Validation | Batch 90/1567 | Loss: 1.0777 [2026-04-18 12:33:33] Validation | Batch 100/1567 | Loss: 1.0825 [2026-04-18 12:33:33] Validation | Batch 110/1567 | Loss: 1.0716 [2026-04-18 12:33:34] Validation | Batch 120/1567 | Loss: 
1.0840 [2026-04-18 12:33:35] Validation | Batch 130/1567 | Loss: 1.0853 [2026-04-18 12:33:36] Validation | Batch 140/1567 | Loss: 1.0891 [2026-04-18 12:33:37] Validation | Batch 150/1567 | Loss: 1.0966 [2026-04-18 12:33:37] Validation | Batch 160/1567 | Loss: 1.0975 [2026-04-18 12:33:38] Validation | Batch 170/1567 | Loss: 1.0832 [2026-04-18 12:33:39] Validation | Batch 180/1567 | Loss: 1.0839 [2026-04-18 12:33:40] Validation | Batch 190/1567 | Loss: 1.0821 [2026-04-18 12:33:41] Validation | Batch 200/1567 | Loss: 1.0853 [2026-04-18 12:33:42] Validation | Batch 210/1567 | Loss: 1.0874 [2026-04-18 12:33:42] Validation | Batch 220/1567 | Loss: 1.0900 [2026-04-18 12:33:44] Validation | Batch 230/1567 | Loss: 1.0941 [2026-04-18 12:33:44] Validation | Batch 240/1567 | Loss: 1.0921 [2026-04-18 12:33:45] Validation | Batch 250/1567 | Loss: 1.0865 [2026-04-18 12:33:46] Validation | Batch 260/1567 | Loss: 1.0813 [2026-04-18 12:33:46] Validation | Batch 270/1567 | Loss: 1.0776 [2026-04-18 12:33:47] Validation | Batch 280/1567 | Loss: 1.0791 [2026-04-18 12:33:48] Validation | Batch 290/1567 | Loss: 1.0846 [2026-04-18 12:33:49] Validation | Batch 300/1567 | Loss: 1.0895 [2026-04-18 12:33:50] Validation | Batch 310/1567 | Loss: 1.0889 [2026-04-18 12:33:50] Validation | Batch 320/1567 | Loss: 1.0883 [2026-04-18 12:33:52] Validation | Batch 330/1567 | Loss: 1.0863 [2026-04-18 12:33:52] Validation | Batch 340/1567 | Loss: 1.0892 [2026-04-18 12:33:53] Validation | Batch 350/1567 | Loss: 1.0885 [2026-04-18 12:33:54] Validation | Batch 360/1567 | Loss: 1.0866 [2026-04-18 12:33:55] Validation | Batch 370/1567 | Loss: 1.0840 [2026-04-18 12:33:56] Validation | Batch 380/1567 | Loss: 1.0871 [2026-04-18 12:33:56] Validation | Batch 390/1567 | Loss: 1.0882 [2026-04-18 12:33:57] Validation | Batch 400/1567 | Loss: 1.0893 [2026-04-18 12:33:58] Validation | Batch 410/1567 | Loss: 1.0891 [2026-04-18 12:33:59] Validation | Batch 420/1567 | Loss: 1.0894 [2026-04-18 12:34:00] Validation | Batch 
430/1567 | Loss: 1.0892 [2026-04-18 12:34:01] Validation | Batch 440/1567 | Loss: 1.0882 [2026-04-18 12:34:01] Validation | Batch 450/1567 | Loss: 1.0884 [2026-04-18 12:34:02] Validation | Batch 460/1567 | Loss: 1.0872 [2026-04-18 12:34:02] Validation | Batch 470/1567 | Loss: 1.0855 [2026-04-18 12:34:03] Validation | Batch 480/1567 | Loss: 1.0832 [2026-04-18 12:34:04] Validation | Batch 490/1567 | Loss: 1.0830 [2026-04-18 12:34:05] Validation | Batch 500/1567 | Loss: 1.0839 [2026-04-18 12:34:06] Validation | Batch 510/1567 | Loss: 1.0862 [2026-04-18 12:34:06] Validation | Batch 520/1567 | Loss: 1.0874 [2026-04-18 12:34:07] Validation | Batch 530/1567 | Loss: 1.0872 [2026-04-18 12:34:08] Validation | Batch 540/1567 | Loss: 1.0889 [2026-04-18 12:34:09] Validation | Batch 550/1567 | Loss: 1.0918 [2026-04-18 12:34:10] Validation | Batch 560/1567 | Loss: 1.0912 [2026-04-18 12:34:11] Validation | Batch 570/1567 | Loss: 1.0903 [2026-04-18 12:34:12] Validation | Batch 580/1567 | Loss: 1.0892 [2026-04-18 12:34:12] Validation | Batch 590/1567 | Loss: 1.0882 [2026-04-18 12:34:13] Validation | Batch 600/1567 | Loss: 1.0865 [2026-04-18 12:34:14] Validation | Batch 610/1567 | Loss: 1.0856 [2026-04-18 12:34:15] Validation | Batch 620/1567 | Loss: 1.0869 [2026-04-18 12:34:16] Validation | Batch 630/1567 | Loss: 1.0848 [2026-04-18 12:34:17] Validation | Batch 640/1567 | Loss: 1.0858 [2026-04-18 12:34:18] Validation | Batch 650/1567 | Loss: 1.0852 [2026-04-18 12:34:18] Validation | Batch 660/1567 | Loss: 1.0840 [2026-04-18 12:34:19] Validation | Batch 670/1567 | Loss: 1.0817 [2026-04-18 12:34:20] Validation | Batch 680/1567 | Loss: 1.0810 [2026-04-18 12:34:20] Validation | Batch 690/1567 | Loss: 1.0817 [2026-04-18 12:34:21] Validation | Batch 700/1567 | Loss: 1.0800 [2026-04-18 12:34:22] Validation | Batch 710/1567 | Loss: 1.0813 [2026-04-18 12:34:23] Validation | Batch 720/1567 | Loss: 1.0810 [2026-04-18 12:34:24] Validation | Batch 730/1567 | Loss: 1.0826 [2026-04-18 12:34:25] 
Validation | Batch 740/1567 | Loss: 1.0832 [2026-04-18 12:34:25] Validation | Batch 750/1567 | Loss: 1.0832 [2026-04-18 12:34:26] Validation | Batch 760/1567 | Loss: 1.0830 [2026-04-18 12:34:27] Validation | Batch 770/1567 | Loss: 1.0851 [2026-04-18 12:34:28] Validation | Batch 780/1567 | Loss: 1.0860 [2026-04-18 12:34:29] Validation | Batch 790/1567 | Loss: 1.0854 [2026-04-18 12:34:29] Validation | Batch 800/1567 | Loss: 1.0873 [2026-04-18 12:34:30] Validation | Batch 810/1567 | Loss: 1.0875 [2026-04-18 12:34:31] Validation | Batch 820/1567 | Loss: 1.0873 [2026-04-18 12:34:32] Validation | Batch 830/1567 | Loss: 1.0858 [2026-04-18 12:34:32] Validation | Batch 840/1567 | Loss: 1.0863 [2026-04-18 12:34:33] Validation | Batch 850/1567 | Loss: 1.0849 [2026-04-18 12:34:34] Validation | Batch 860/1567 | Loss: 1.0861 [2026-04-18 12:34:35] Validation | Batch 870/1567 | Loss: 1.0869 [2026-04-18 12:34:36] Validation | Batch 880/1567 | Loss: 1.0877 [2026-04-18 12:34:37] Validation | Batch 890/1567 | Loss: 1.0880 [2026-04-18 12:34:37] Validation | Batch 900/1567 | Loss: 1.0899 [2026-04-18 12:34:38] Validation | Batch 910/1567 | Loss: 1.0904 [2026-04-18 12:34:39] Validation | Batch 920/1567 | Loss: 1.0921 [2026-04-18 12:34:40] Validation | Batch 930/1567 | Loss: 1.0899 [2026-04-18 12:34:40] Validation | Batch 940/1567 | Loss: 1.0897 [2026-04-18 12:34:41] Validation | Batch 950/1567 | Loss: 1.0887 [2026-04-18 12:34:42] Validation | Batch 960/1567 | Loss: 1.0876 [2026-04-18 12:34:42] Validation | Batch 970/1567 | Loss: 1.0887 [2026-04-18 12:34:43] Validation | Batch 980/1567 | Loss: 1.0891 [2026-04-18 12:34:44] Validation | Batch 990/1567 | Loss: 1.0885 [2026-04-18 12:34:45] Validation | Batch 1000/1567 | Loss: 1.0885 [2026-04-18 12:34:45] Validation | Batch 1010/1567 | Loss: 1.0864 [2026-04-18 12:34:46] Validation | Batch 1020/1567 | Loss: 1.0868 [2026-04-18 12:34:47] Validation | Batch 1030/1567 | Loss: 1.0884 [2026-04-18 12:34:48] Validation | Batch 1040/1567 | Loss: 1.0882 
[2026-04-18 12:34:49] Validation | Batch 1050/1567 | Loss: 1.0895 [2026-04-18 12:34:50] Validation | Batch 1060/1567 | Loss: 1.0891 [2026-04-18 12:34:50] Validation | Batch 1070/1567 | Loss: 1.0884 [2026-04-18 12:34:51] Validation | Batch 1080/1567 | Loss: 1.0892 [2026-04-18 12:34:52] Validation | Batch 1090/1567 | Loss: 1.0892 [2026-04-18 12:34:52] Validation | Batch 1100/1567 | Loss: 1.0899 [2026-04-18 12:34:53] Validation | Batch 1110/1567 | Loss: 1.0895 [2026-04-18 12:34:54] Validation | Batch 1120/1567 | Loss: 1.0898 [2026-04-18 12:34:55] Validation | Batch 1130/1567 | Loss: 1.0901 [2026-04-18 12:34:56] Validation | Batch 1140/1567 | Loss: 1.0907 [2026-04-18 12:34:57] Validation | Batch 1150/1567 | Loss: 1.0909 [2026-04-18 12:34:57] Validation | Batch 1160/1567 | Loss: 1.0917 [2026-04-18 12:34:58] Validation | Batch 1170/1567 | Loss: 1.0915 [2026-04-18 12:34:59] Validation | Batch 1180/1567 | Loss: 1.0914 [2026-04-18 12:35:00] Validation | Batch 1190/1567 | Loss: 1.0920 [2026-04-18 12:35:01] Validation | Batch 1200/1567 | Loss: 1.0912 [2026-04-18 12:35:01] Validation | Batch 1210/1567 | Loss: 1.0901 [2026-04-18 12:35:02] Validation | Batch 1220/1567 | Loss: 1.0904 [2026-04-18 12:35:03] Validation | Batch 1230/1567 | Loss: 1.0921 [2026-04-18 12:35:04] Validation | Batch 1240/1567 | Loss: 1.0910 [2026-04-18 12:35:04] Validation | Batch 1250/1567 | Loss: 1.0907 [2026-04-18 12:35:05] Validation | Batch 1260/1567 | Loss: 1.0918 [2026-04-18 12:35:05] Validation | Batch 1270/1567 | Loss: 1.0918 [2026-04-18 12:35:06] Validation | Batch 1280/1567 | Loss: 1.0911 [2026-04-18 12:35:07] Validation | Batch 1290/1567 | Loss: 1.0914 [2026-04-18 12:35:08] Validation | Batch 1300/1567 | Loss: 1.0918 [2026-04-18 12:35:09] Validation | Batch 1310/1567 | Loss: 1.0922 [2026-04-18 12:35:10] Validation | Batch 1320/1567 | Loss: 1.0913 [2026-04-18 12:35:10] Validation | Batch 1330/1567 | Loss: 1.0909 [2026-04-18 12:35:11] Validation | Batch 1340/1567 | Loss: 1.0905 [2026-04-18 
12:35:12] Validation | Batch 1350/1567 | Loss: 1.0911 [2026-04-18 12:35:13] Validation | Batch 1360/1567 | Loss: 1.0905 [2026-04-18 12:35:13] Validation | Batch 1370/1567 | Loss: 1.0908 [2026-04-18 12:35:14] Validation | Batch 1380/1567 | Loss: 1.0919 [2026-04-18 12:35:15] Validation | Batch 1390/1567 | Loss: 1.0919 [2026-04-18 12:35:16] Validation | Batch 1400/1567 | Loss: 1.0921 [2026-04-18 12:35:16] Validation | Batch 1410/1567 | Loss: 1.0919 [2026-04-18 12:35:17] Validation | Batch 1420/1567 | Loss: 1.0923 [2026-04-18 12:35:18] Validation | Batch 1430/1567 | Loss: 1.0919 [2026-04-18 12:35:19] Validation | Batch 1440/1567 | Loss: 1.0924 [2026-04-18 12:35:19] Validation | Batch 1450/1567 | Loss: 1.0916 [2026-04-18 12:35:20] Validation | Batch 1460/1567 | Loss: 1.0914 [2026-04-18 12:35:21] Validation | Batch 1470/1567 | Loss: 1.0907 [2026-04-18 12:35:22] Validation | Batch 1480/1567 | Loss: 1.0891 [2026-04-18 12:35:22] Validation | Batch 1490/1567 | Loss: 1.0893 [2026-04-18 12:35:23] Validation | Batch 1500/1567 | Loss: 1.0894 [2026-04-18 12:35:24] Validation | Batch 1510/1567 | Loss: 1.0890 [2026-04-18 12:35:24] Validation | Batch 1520/1567 | Loss: 1.0881 [2026-04-18 12:35:25] Validation | Batch 1530/1567 | Loss: 1.0890 [2026-04-18 12:35:26] Validation | Batch 1540/1567 | Loss: 1.0900 [2026-04-18 12:35:27] Validation | Batch 1550/1567 | Loss: 1.0902 [2026-04-18 12:35:28] Validation | Batch 1560/1567 | Loss: 1.0892 [2026-04-18 12:35:29] Validation | Batch 1567/1567 | Loss: 1.0895 [2026-04-18 12:35:29] Validation | Loss: 1.0895 | PPL: 3.00 | Time: 125.26s [2026-04-18 12:35:32] Epoch 1 | Step 2010 | Loss: 1.0473 | LR: 2.00e-05 [2026-04-18 12:35:36] Epoch 1 | Step 2020 | Loss: 1.0471 | LR: 2.00e-05 [2026-04-18 12:35:38] Epoch 1 | Step 2030 | Loss: 1.0471 | LR: 2.00e-05 [2026-04-18 12:35:42] Epoch 1 | Step 2040 | Loss: 1.0470 | LR: 2.00e-05 [2026-04-18 12:35:45] Epoch 1 | Step 2050 | Loss: 1.0469 | LR: 2.00e-05 [2026-04-18 12:35:49] Epoch 1 | Step 2060 | Loss: 1.0465 
| LR: 2.00e-05 [2026-04-18 12:35:53] Epoch 1 | Step 2070 | Loss: 1.0454 | LR: 2.00e-05 [2026-04-18 12:35:56] Epoch 1 | Step 2080 | Loss: 1.0449 | LR: 2.00e-05 [2026-04-18 12:36:00] Epoch 1 | Step 2090 | Loss: 1.0450 | LR: 2.00e-05 [2026-04-18 12:36:03] Epoch 1 | Step 2100 | Loss: 1.0450 | LR: 2.00e-05 [2026-04-18 12:36:07] Epoch 1 | Step 2110 | Loss: 1.0448 | LR: 2.00e-05 [2026-04-18 12:36:12] Epoch 1 | Step 2120 | Loss: 1.0442 | LR: 2.00e-05 [2026-04-18 12:36:16] Epoch 1 | Step 2130 | Loss: 1.0442 | LR: 2.00e-05 [2026-04-18 12:36:19] Epoch 1 | Step 2140 | Loss: 1.0438 | LR: 2.00e-05 [2026-04-18 12:36:22] Epoch 1 | Step 2150 | Loss: 1.0437 | LR: 2.00e-05 [2026-04-18 12:36:26] Epoch 1 | Step 2160 | Loss: 1.0438 | LR: 2.00e-05 [2026-04-18 12:36:29] Epoch 1 | Step 2170 | Loss: 1.0434 | LR: 2.00e-05 [2026-04-18 12:36:33] Epoch 1 | Step 2180 | Loss: 1.0429 | LR: 2.00e-05 [2026-04-18 12:36:36] Epoch 1 | Step 2190 | Loss: 1.0428 | LR: 2.00e-05 [2026-04-18 12:36:40] Epoch 1 | Step 2200 | Loss: 1.0426 | LR: 2.00e-05 [2026-04-18 12:36:43] Epoch 1 | Step 2210 | Loss: 1.0424 | LR: 2.00e-05 [2026-04-18 12:36:47] Epoch 1 | Step 2220 | Loss: 1.0429 | LR: 2.00e-05 [2026-04-18 12:36:50] Epoch 1 | Step 2230 | Loss: 1.0435 | LR: 2.00e-05 [2026-04-18 12:36:54] Epoch 1 | Step 2240 | Loss: 1.0439 | LR: 2.00e-05 [2026-04-18 12:36:57] Epoch 1 | Step 2250 | Loss: 1.0443 | LR: 2.00e-05 [2026-04-18 12:37:01] Epoch 1 | Step 2260 | Loss: 1.0441 | LR: 2.00e-05 [2026-04-18 12:37:04] Epoch 1 | Step 2270 | Loss: 1.0440 | LR: 2.00e-05 [2026-04-18 12:37:08] Epoch 1 | Step 2280 | Loss: 1.0440 | LR: 2.00e-05 [2026-04-18 12:37:11] Epoch 1 | Step 2290 | Loss: 1.0447 | LR: 2.00e-05 [2026-04-18 12:37:15] Epoch 1 | Step 2300 | Loss: 1.0446 | LR: 2.00e-05 [2026-04-18 12:37:18] Epoch 1 | Step 2310 | Loss: 1.0442 | LR: 2.00e-05 [2026-04-18 12:37:22] Epoch 1 | Step 2320 | Loss: 1.0442 | LR: 2.00e-05 [2026-04-18 12:37:25] Epoch 1 | Step 2330 | Loss: 1.0440 | LR: 2.00e-05 [2026-04-18 12:37:28] Epoch 1 | Step 
2340 | Loss: 1.0436 | LR: 2.00e-05 [2026-04-18 12:37:32] Epoch 1 | Step 2350 | Loss: 1.0433 | LR: 2.00e-05 [2026-04-18 12:37:36] Epoch 1 | Step 2360 | Loss: 1.0434 | LR: 2.00e-05 [2026-04-18 12:37:39] Epoch 1 | Step 2370 | Loss: 1.0433 | LR: 2.00e-05 [2026-04-18 12:37:43] Epoch 1 | Step 2380 | Loss: 1.0429 | LR: 2.00e-05 [2026-04-18 12:37:46] Epoch 1 | Step 2390 | Loss: 1.0428 | LR: 2.00e-05 [2026-04-18 12:37:50] Epoch 1 | Step 2400 | Loss: 1.0423 | LR: 2.00e-05 [2026-04-18 12:37:54] Epoch 1 | Step 2410 | Loss: 1.0425 | LR: 2.00e-05 [2026-04-18 12:37:58] Epoch 1 | Step 2420 | Loss: 1.0423 | LR: 2.00e-05 [2026-04-18 12:38:01] Epoch 1 | Step 2430 | Loss: 1.0424 | LR: 2.00e-05 [2026-04-18 12:38:05] Epoch 1 | Step 2440 | Loss: 1.0419 | LR: 2.00e-05 [2026-04-18 12:38:08] Epoch 1 | Step 2450 | Loss: 1.0414 | LR: 2.00e-05 [2026-04-18 12:38:12] Epoch 1 | Step 2460 | Loss: 1.0413 | LR: 2.00e-05 [2026-04-18 12:38:16] Epoch 1 | Step 2470 | Loss: 1.0414 | LR: 2.00e-05 [2026-04-18 12:38:19] Epoch 1 | Step 2480 | Loss: 1.0413 | LR: 2.00e-05 [2026-04-18 12:38:23] Epoch 1 | Step 2490 | Loss: 1.0408 | LR: 2.00e-05 [2026-04-18 12:38:27] Epoch 1 | Step 2500 | Loss: 1.0404 | LR: 2.00e-05 [2026-04-18 12:38:30] Epoch 1 | Step 2510 | Loss: 1.0406 | LR: 2.00e-05 [2026-04-18 12:38:33] Epoch 1 | Step 2520 | Loss: 1.0399 | LR: 2.00e-05 [2026-04-18 12:38:37] Epoch 1 | Step 2530 | Loss: 1.0395 | LR: 2.00e-05 [2026-04-18 12:38:40] Epoch 1 | Step 2540 | Loss: 1.0393 | LR: 2.00e-05 [2026-04-18 12:38:44] Epoch 1 | Step 2550 | Loss: 1.0386 | LR: 2.00e-05 [2026-04-18 12:38:47] Epoch 1 | Step 2560 | Loss: 1.0386 | LR: 2.00e-05 [2026-04-18 12:38:51] Epoch 1 | Step 2570 | Loss: 1.0389 | LR: 2.00e-05 [2026-04-18 12:38:55] Epoch 1 | Step 2580 | Loss: 1.0391 | LR: 2.00e-05 [2026-04-18 12:38:59] Epoch 1 | Step 2590 | Loss: 1.0393 | LR: 2.00e-05 [2026-04-18 12:39:02] Epoch 1 | Step 2600 | Loss: 1.0393 | LR: 2.00e-05 [2026-04-18 12:39:05] Epoch 1 | Step 2610 | Loss: 1.0392 | LR: 2.00e-05 [2026-04-18 
12:39:08] Epoch 1 | Step 2620 | Loss: 1.0387 | LR: 2.00e-05 [2026-04-18 12:39:12] Epoch 1 | Step 2630 | Loss: 1.0384 | LR: 2.00e-05 [2026-04-18 12:39:16] Epoch 1 | Step 2640 | Loss: 1.0384 | LR: 2.00e-05 [2026-04-18 12:39:19] Epoch 1 | Step 2650 | Loss: 1.0381 | LR: 2.00e-05 [2026-04-18 12:39:23] Epoch 1 | Step 2660 | Loss: 1.0381 | LR: 2.00e-05 [2026-04-18 12:39:26] Epoch 1 | Step 2670 | Loss: 1.0377 | LR: 2.00e-05 [2026-04-18 12:39:30] Epoch 1 | Step 2680 | Loss: 1.0373 | LR: 2.00e-05 [2026-04-18 12:39:34] Epoch 1 | Step 2690 | Loss: 1.0372 | LR: 2.00e-05 [2026-04-18 12:39:37] Epoch 1 | Step 2700 | Loss: 1.0368 | LR: 2.00e-05 [2026-04-18 12:39:40] Epoch 1 | Step 2710 | Loss: 1.0361 | LR: 2.00e-05 [2026-04-18 12:39:44] Epoch 1 | Step 2720 | Loss: 1.0360 | LR: 2.00e-05 [2026-04-18 12:39:48] Epoch 1 | Step 2730 | Loss: 1.0358 | LR: 2.00e-05 [2026-04-18 12:39:51] Epoch 1 | Step 2740 | Loss: 1.0362 | LR: 2.00e-05 [2026-04-18 12:39:55] Epoch 1 | Step 2750 | Loss: 1.0363 | LR: 2.00e-05 [2026-04-18 12:39:57] Epoch 1 | Step 2760 | Loss: 1.0359 | LR: 2.00e-05 [2026-04-18 12:40:01] Epoch 1 | Step 2770 | Loss: 1.0356 | LR: 2.00e-05 [2026-04-18 12:40:04] Epoch 1 | Step 2780 | Loss: 1.0359 | LR: 2.00e-05 [2026-04-18 12:40:08] Epoch 1 | Step 2790 | Loss: 1.0358 | LR: 2.00e-05 [2026-04-18 12:40:11] Epoch 1 | Step 2800 | Loss: 1.0355 | LR: 2.00e-05 [2026-04-18 12:40:15] Epoch 1 | Step 2810 | Loss: 1.0355 | LR: 2.00e-05 [2026-04-18 12:40:18] Epoch 1 | Step 2820 | Loss: 1.0354 | LR: 2.00e-05 [2026-04-18 12:40:21] Epoch 1 | Step 2830 | Loss: 1.0350 | LR: 2.00e-05 [2026-04-18 12:40:25] Epoch 1 | Step 2840 | Loss: 1.0356 | LR: 2.00e-05 [2026-04-18 12:40:29] Epoch 1 | Step 2850 | Loss: 1.0356 | LR: 2.00e-05 [2026-04-18 12:40:33] Epoch 1 | Step 2860 | Loss: 1.0354 | LR: 2.00e-05 [2026-04-18 12:40:37] Epoch 1 | Step 2870 | Loss: 1.0354 | LR: 2.00e-05 [2026-04-18 12:40:41] Epoch 1 | Step 2880 | Loss: 1.0352 | LR: 2.00e-05 [2026-04-18 12:40:44] Epoch 1 | Step 2890 | Loss: 1.0349 | LR: 
2.00e-05 [2026-04-18 12:40:48] Epoch 1 | Step 2900 | Loss: 1.0344 | LR: 2.00e-05 [2026-04-18 12:40:52] Epoch 1 | Step 2910 | Loss: 1.0343 | LR: 2.00e-05 [2026-04-18 12:40:56] Epoch 1 | Step 2920 | Loss: 1.0347 | LR: 2.00e-05 [2026-04-18 12:40:59] Epoch 1 | Step 2930 | Loss: 1.0344 | LR: 2.00e-05 [2026-04-18 12:41:02] Epoch 1 | Step 2940 | Loss: 1.0340 | LR: 2.00e-05 [2026-04-18 12:41:06] Epoch 1 | Step 2950 | Loss: 1.0343 | LR: 2.00e-05 [2026-04-18 12:41:09] Epoch 1 | Step 2960 | Loss: 1.0342 | LR: 2.00e-05 [2026-04-18 12:41:13] Epoch 1 | Step 2970 | Loss: 1.0343 | LR: 2.00e-05 [2026-04-18 12:41:16] Epoch 1 | Step 2980 | Loss: 1.0340 | LR: 2.00e-05 [2026-04-18 12:41:20] Epoch 1 | Step 2990 | Loss: 1.0342 | LR: 2.00e-05 [2026-04-18 12:41:24] Epoch 1 | Step 3000 | Loss: 1.0341 | LR: 2.00e-05 [2026-04-18 12:41:35] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_3000.pt [2026-04-18 12:41:46] Validation | Batch 10/1567 | Loss: 0.9670 [2026-04-18 12:41:47] Validation | Batch 20/1567 | Loss: 1.0277 [2026-04-18 12:41:48] Validation | Batch 30/1567 | Loss: 1.0821 [2026-04-18 12:41:49] Validation | Batch 40/1567 | Loss: 1.1025 [2026-04-18 12:41:49] Validation | Batch 50/1567 | Loss: 1.0775 [2026-04-18 12:41:50] Validation | Batch 60/1567 | Loss: 1.0661 [2026-04-18 12:41:51] Validation | Batch 70/1567 | Loss: 1.0485 [2026-04-18 12:41:52] Validation | Batch 80/1567 | Loss: 1.0631 [2026-04-18 12:41:53] Validation | Batch 90/1567 | Loss: 1.0694 [2026-04-18 12:41:54] Validation | Batch 100/1567 | Loss: 1.0766 [2026-04-18 12:41:55] Validation | Batch 110/1567 | Loss: 1.0671 [2026-04-18 12:41:56] Validation | Batch 120/1567 | Loss: 1.0790 [2026-04-18 12:41:57] Validation | Batch 130/1567 | Loss: 1.0811 [2026-04-18 12:41:57] Validation | Batch 140/1567 | Loss: 1.0832 [2026-04-18 12:41:58] Validation | Batch 150/1567 | Loss: 1.0914 [2026-04-18 12:41:59] Validation | Batch 160/1567 | Loss: 1.0925 [2026-04-18 12:42:00] Validation | Batch 170/1567 | Loss: 
1.0773 [2026-04-18 12:42:00] Validation | Batch 180/1567 | Loss: 1.0776 [2026-04-18 12:42:01] Validation | Batch 190/1567 | Loss: 1.0753 [2026-04-18 12:42:02] Validation | Batch 200/1567 | Loss: 1.0785 [2026-04-18 12:42:03] Validation | Batch 210/1567 | Loss: 1.0810 [2026-04-18 12:42:04] Validation | Batch 220/1567 | Loss: 1.0843 [2026-04-18 12:42:05] Validation | Batch 230/1567 | Loss: 1.0868 [2026-04-18 12:42:06] Validation | Batch 240/1567 | Loss: 1.0845 [2026-04-18 12:42:06] Validation | Batch 250/1567 | Loss: 1.0793 [2026-04-18 12:42:07] Validation | Batch 260/1567 | Loss: 1.0744 [2026-04-18 12:42:08] Validation | Batch 270/1567 | Loss: 1.0712 [2026-04-18 12:42:09] Validation | Batch 280/1567 | Loss: 1.0722 [2026-04-18 12:42:10] Validation | Batch 290/1567 | Loss: 1.0775 [2026-04-18 12:42:10] Validation | Batch 300/1567 | Loss: 1.0832 [2026-04-18 12:42:11] Validation | Batch 310/1567 | Loss: 1.0826 [2026-04-18 12:42:12] Validation | Batch 320/1567 | Loss: 1.0821 [2026-04-18 12:42:13] Validation | Batch 330/1567 | Loss: 1.0796 [2026-04-18 12:42:14] Validation | Batch 340/1567 | Loss: 1.0833 [2026-04-18 12:42:15] Validation | Batch 350/1567 | Loss: 1.0820 [2026-04-18 12:42:15] Validation | Batch 360/1567 | Loss: 1.0799 [2026-04-18 12:42:16] Validation | Batch 370/1567 | Loss: 1.0776 [2026-04-18 12:42:17] Validation | Batch 380/1567 | Loss: 1.0805 [2026-04-18 12:42:18] Validation | Batch 390/1567 | Loss: 1.0813 [2026-04-18 12:42:18] Validation | Batch 400/1567 | Loss: 1.0827 [2026-04-18 12:42:19] Validation | Batch 410/1567 | Loss: 1.0820 [2026-04-18 12:42:20] Validation | Batch 420/1567 | Loss: 1.0824 [2026-04-18 12:42:21] Validation | Batch 430/1567 | Loss: 1.0821 [2026-04-18 12:42:22] Validation | Batch 440/1567 | Loss: 1.0813 [2026-04-18 12:42:23] Validation | Batch 450/1567 | Loss: 1.0813 [2026-04-18 12:42:24] Validation | Batch 460/1567 | Loss: 1.0803 [2026-04-18 12:42:24] Validation | Batch 470/1567 | Loss: 1.0785 [2026-04-18 12:42:25] Validation | Batch 
480/1567 | Loss: 1.0767 [2026-04-18 12:42:26] Validation | Batch 490/1567 | Loss: 1.0770 [2026-04-18 12:42:27] Validation | Batch 500/1567 | Loss: 1.0780 [2026-04-18 12:42:27] Validation | Batch 510/1567 | Loss: 1.0803 [2026-04-18 12:42:28] Validation | Batch 520/1567 | Loss: 1.0819 [2026-04-18 12:42:29] Validation | Batch 530/1567 | Loss: 1.0817 [2026-04-18 12:42:30] Validation | Batch 540/1567 | Loss: 1.0838 [2026-04-18 12:42:31] Validation | Batch 550/1567 | Loss: 1.0867 [2026-04-18 12:42:32] Validation | Batch 560/1567 | Loss: 1.0862 [2026-04-18 12:42:33] Validation | Batch 570/1567 | Loss: 1.0857 [2026-04-18 12:42:33] Validation | Batch 580/1567 | Loss: 1.0844 [2026-04-18 12:42:34] Validation | Batch 590/1567 | Loss: 1.0834 [2026-04-18 12:42:35] Validation | Batch 600/1567 | Loss: 1.0814 [2026-04-18 12:42:36] Validation | Batch 610/1567 | Loss: 1.0805 [2026-04-18 12:42:37] Validation | Batch 620/1567 | Loss: 1.0817 [2026-04-18 12:42:38] Validation | Batch 630/1567 | Loss: 1.0798 [2026-04-18 12:42:39] Validation | Batch 640/1567 | Loss: 1.0811 [2026-04-18 12:42:40] Validation | Batch 650/1567 | Loss: 1.0805 [2026-04-18 12:42:40] Validation | Batch 660/1567 | Loss: 1.0791 [2026-04-18 12:42:41] Validation | Batch 670/1567 | Loss: 1.0768 [2026-04-18 12:42:42] Validation | Batch 680/1567 | Loss: 1.0761 [2026-04-18 12:42:42] Validation | Batch 690/1567 | Loss: 1.0771 [2026-04-18 12:42:43] Validation | Batch 700/1567 | Loss: 1.0758 [2026-04-18 12:42:44] Validation | Batch 710/1567 | Loss: 1.0770 [2026-04-18 12:42:45] Validation | Batch 720/1567 | Loss: 1.0767 [2026-04-18 12:42:46] Validation | Batch 730/1567 | Loss: 1.0780 [2026-04-18 12:42:46] Validation | Batch 740/1567 | Loss: 1.0788 [2026-04-18 12:42:47] Validation | Batch 750/1567 | Loss: 1.0787 [2026-04-18 12:42:48] Validation | Batch 760/1567 | Loss: 1.0788 [2026-04-18 12:42:49] Validation | Batch 770/1567 | Loss: 1.0808 [2026-04-18 12:42:50] Validation | Batch 780/1567 | Loss: 1.0820 [2026-04-18 12:42:51] 
Validation | Batch 790/1567 | Loss: 1.0816 [2026-04-18 12:42:51] Validation | Batch 800/1567 | Loss: 1.0832 [2026-04-18 12:42:52] Validation | Batch 810/1567 | Loss: 1.0833 [2026-04-18 12:42:53] Validation | Batch 820/1567 | Loss: 1.0831 [2026-04-18 12:42:54] Validation | Batch 830/1567 | Loss: 1.0815 [2026-04-18 12:42:54] Validation | Batch 840/1567 | Loss: 1.0818 [2026-04-18 12:42:55] Validation | Batch 850/1567 | Loss: 1.0803 [2026-04-18 12:42:56] Validation | Batch 860/1567 | Loss: 1.0815 [2026-04-18 12:42:56] Validation | Batch 870/1567 | Loss: 1.0821 [2026-04-18 12:42:57] Validation | Batch 880/1567 | Loss: 1.0831 [2026-04-18 12:42:58] Validation | Batch 890/1567 | Loss: 1.0832 [2026-04-18 12:42:59] Validation | Batch 900/1567 | Loss: 1.0850 [2026-04-18 12:42:59] Validation | Batch 910/1567 | Loss: 1.0854 [2026-04-18 12:43:00] Validation | Batch 920/1567 | Loss: 1.0873 [2026-04-18 12:43:01] Validation | Batch 930/1567 | Loss: 1.0850 [2026-04-18 12:43:02] Validation | Batch 940/1567 | Loss: 1.0847 [2026-04-18 12:43:03] Validation | Batch 950/1567 | Loss: 1.0836 [2026-04-18 12:43:03] Validation | Batch 960/1567 | Loss: 1.0824 [2026-04-18 12:43:04] Validation | Batch 970/1567 | Loss: 1.0837 [2026-04-18 12:43:05] Validation | Batch 980/1567 | Loss: 1.0842 [2026-04-18 12:43:05] Validation | Batch 990/1567 | Loss: 1.0838 [2026-04-18 12:43:06] Validation | Batch 1000/1567 | Loss: 1.0837 [2026-04-18 12:43:07] Validation | Batch 1010/1567 | Loss: 1.0816 [2026-04-18 12:43:08] Validation | Batch 1020/1567 | Loss: 1.0819 [2026-04-18 12:43:09] Validation | Batch 1030/1567 | Loss: 1.0832 [2026-04-18 12:43:09] Validation | Batch 1040/1567 | Loss: 1.0829 [2026-04-18 12:43:10] Validation | Batch 1050/1567 | Loss: 1.0841 [2026-04-18 12:43:11] Validation | Batch 1060/1567 | Loss: 1.0837 [2026-04-18 12:43:12] Validation | Batch 1070/1567 | Loss: 1.0832 [2026-04-18 12:43:13] Validation | Batch 1080/1567 | Loss: 1.0841 [2026-04-18 12:43:13] Validation | Batch 1090/1567 | Loss: 
1.0841 [2026-04-18 12:43:14] Validation | Batch 1100/1567 | Loss: 1.0845 [2026-04-18 12:43:15] Validation | Batch 1110/1567 | Loss: 1.0842 [2026-04-18 12:43:15] Validation | Batch 1120/1567 | Loss: 1.0847 [2026-04-18 12:43:16] Validation | Batch 1130/1567 | Loss: 1.0849 [2026-04-18 12:43:17] Validation | Batch 1140/1567 | Loss: 1.0856 [2026-04-18 12:43:18] Validation | Batch 1150/1567 | Loss: 1.0856 [2026-04-18 12:43:19] Validation | Batch 1160/1567 | Loss: 1.0864 [2026-04-18 12:43:20] Validation | Batch 1170/1567 | Loss: 1.0861 [2026-04-18 12:43:21] Validation | Batch 1180/1567 | Loss: 1.0860 [2026-04-18 12:43:21] Validation | Batch 1190/1567 | Loss: 1.0866 [2026-04-18 12:43:22] Validation | Batch 1200/1567 | Loss: 1.0859 [2026-04-18 12:43:23] Validation | Batch 1210/1567 | Loss: 1.0847 [2026-04-18 12:43:24] Validation | Batch 1220/1567 | Loss: 1.0851 [2026-04-18 12:43:25] Validation | Batch 1230/1567 | Loss: 1.0871 [2026-04-18 12:43:25] Validation | Batch 1240/1567 | Loss: 1.0861 [2026-04-18 12:43:26] Validation | Batch 1250/1567 | Loss: 1.0857 [2026-04-18 12:43:27] Validation | Batch 1260/1567 | Loss: 1.0866 [2026-04-18 12:43:28] Validation | Batch 1270/1567 | Loss: 1.0867 [2026-04-18 12:43:29] Validation | Batch 1280/1567 | Loss: 1.0858 [2026-04-18 12:43:30] Validation | Batch 1290/1567 | Loss: 1.0861 [2026-04-18 12:43:31] Validation | Batch 1300/1567 | Loss: 1.0864 [2026-04-18 12:43:31] Validation | Batch 1310/1567 | Loss: 1.0868 [2026-04-18 12:43:32] Validation | Batch 1320/1567 | Loss: 1.0860 [2026-04-18 12:43:33] Validation | Batch 1330/1567 | Loss: 1.0856 [2026-04-18 12:43:34] Validation | Batch 1340/1567 | Loss: 1.0853 [2026-04-18 12:43:35] Validation | Batch 1350/1567 | Loss: 1.0859 [2026-04-18 12:43:35] Validation | Batch 1360/1567 | Loss: 1.0853 [2026-04-18 12:43:36] Validation | Batch 1370/1567 | Loss: 1.0855 [2026-04-18 12:43:37] Validation | Batch 1380/1567 | Loss: 1.0866 [2026-04-18 12:43:38] Validation | Batch 1390/1567 | Loss: 1.0865 [2026-04-18 
12:43:38] Validation | Batch 1400/1567 | Loss: 1.0868 [2026-04-18 12:43:39] Validation | Batch 1410/1567 | Loss: 1.0864 [2026-04-18 12:43:40] Validation | Batch 1420/1567 | Loss: 1.0868 [2026-04-18 12:43:40] Validation | Batch 1430/1567 | Loss: 1.0864 [2026-04-18 12:43:41] Validation | Batch 1440/1567 | Loss: 1.0868 [2026-04-18 12:43:42] Validation | Batch 1450/1567 | Loss: 1.0862 [2026-04-18 12:43:43] Validation | Batch 1460/1567 | Loss: 1.0858 [2026-04-18 12:43:43] Validation | Batch 1470/1567 | Loss: 1.0850 [2026-04-18 12:43:44] Validation | Batch 1480/1567 | Loss: 1.0833 [2026-04-18 12:43:45] Validation | Batch 1490/1567 | Loss: 1.0833 [2026-04-18 12:43:46] Validation | Batch 1500/1567 | Loss: 1.0833 [2026-04-18 12:43:47] Validation | Batch 1510/1567 | Loss: 1.0829 [2026-04-18 12:43:47] Validation | Batch 1520/1567 | Loss: 1.0821 [2026-04-18 12:43:48] Validation | Batch 1530/1567 | Loss: 1.0831 [2026-04-18 12:43:49] Validation | Batch 1540/1567 | Loss: 1.0841 [2026-04-18 12:43:50] Validation | Batch 1550/1567 | Loss: 1.0842 [2026-04-18 12:43:51] Validation | Batch 1560/1567 | Loss: 1.0833 [2026-04-18 12:43:51] Validation | Batch 1567/1567 | Loss: 1.0836 [2026-04-18 12:43:51] Validation | Loss: 1.0836 | PPL: 2.98 | Time: 126.17s [2026-04-18 12:43:55] Epoch 1 | Step 3010 | Loss: 1.0341 | LR: 2.00e-05 [2026-04-18 12:43:59] Epoch 1 | Step 3020 | Loss: 1.0338 | LR: 2.00e-05 [2026-04-18 12:44:02] Epoch 1 | Step 3030 | Loss: 1.0336 | LR: 2.00e-05 [2026-04-18 12:44:06] Epoch 1 | Step 3040 | Loss: 1.0330 | LR: 2.00e-05 [2026-04-18 12:44:09] Epoch 1 | Step 3050 | Loss: 1.0326 | LR: 2.00e-05 [2026-04-18 12:44:13] Epoch 1 | Step 3060 | Loss: 1.0324 | LR: 2.00e-05 [2026-04-18 12:44:16] Epoch 1 | Step 3070 | Loss: 1.0321 | LR: 2.00e-05 [2026-04-18 12:44:20] Epoch 1 | Step 3080 | Loss: 1.0322 | LR: 2.00e-05 [2026-04-18 12:44:24] Epoch 1 | Step 3090 | Loss: 1.0318 | LR: 2.00e-05 [2026-04-18 12:44:27] Epoch 1 | Step 3100 | Loss: 1.0316 | LR: 2.00e-05 [2026-04-18 12:44:31] Epoch 
1 | Step 3110 | Loss: 1.0313 | LR: 2.00e-05 [2026-04-18 12:44:34] Epoch 1 | Step 3120 | Loss: 1.0316 | LR: 2.00e-05 [2026-04-18 12:44:38] Epoch 1 | Step 3130 | Loss: 1.0314 | LR: 2.00e-05 [2026-04-18 12:44:41] Epoch 1 | Step 3140 | Loss: 1.0314 | LR: 2.00e-05 [2026-04-18 12:44:45] Epoch 1 | Step 3150 | Loss: 1.0316 | LR: 2.00e-05 [2026-04-18 12:44:49] Epoch 1 | Step 3160 | Loss: 1.0316 | LR: 2.00e-05 [2026-04-18 12:44:52] Epoch 1 | Step 3170 | Loss: 1.0316 | LR: 2.00e-05 [2026-04-18 12:44:56] Epoch 1 | Step 3180 | Loss: 1.0316 | LR: 2.00e-05 [2026-04-18 12:44:59] Epoch 1 | Step 3190 | Loss: 1.0311 | LR: 2.00e-05 [2026-04-18 12:45:03] Epoch 1 | Step 3200 | Loss: 1.0309 | LR: 2.00e-05 [2026-04-18 12:45:06] Epoch 1 | Step 3210 | Loss: 1.0306 | LR: 2.00e-05 [2026-04-18 12:45:10] Epoch 1 | Step 3220 | Loss: 1.0302 | LR: 2.00e-05 [2026-04-18 12:45:13] Epoch 1 | Step 3230 | Loss: 1.0305 | LR: 2.00e-05 [2026-04-18 12:45:16] Epoch 1 | Step 3240 | Loss: 1.0303 | LR: 2.00e-05 [2026-04-18 12:45:20] Epoch 1 | Step 3250 | Loss: 1.0304 | LR: 2.00e-05 [2026-04-18 12:45:24] Epoch 1 | Step 3260 | Loss: 1.0302 | LR: 2.00e-05 [2026-04-18 12:45:27] Epoch 1 | Step 3270 | Loss: 1.0301 | LR: 2.00e-05 [2026-04-18 12:45:31] Epoch 1 | Step 3280 | Loss: 1.0296 | LR: 2.00e-05 [2026-04-18 12:45:35] Epoch 1 | Step 3290 | Loss: 1.0295 | LR: 2.00e-05 [2026-04-18 12:45:38] Epoch 1 | Step 3300 | Loss: 1.0295 | LR: 2.00e-05 [2026-04-18 12:45:42] Epoch 1 | Step 3310 | Loss: 1.0293 | LR: 2.00e-05 [2026-04-18 12:45:46] Epoch 1 | Step 3320 | Loss: 1.0292 | LR: 2.00e-05 [2026-04-18 12:45:50] Epoch 1 | Step 3330 | Loss: 1.0290 | LR: 2.00e-05 [2026-04-18 12:45:53] Epoch 1 | Step 3340 | Loss: 1.0290 | LR: 2.00e-05 [2026-04-18 12:45:56] Epoch 1 | Step 3350 | Loss: 1.0286 | LR: 2.00e-05 [2026-04-18 12:46:00] Epoch 1 | Step 3360 | Loss: 1.0284 | LR: 2.00e-05 [2026-04-18 12:46:03] Epoch 1 | Step 3370 | Loss: 1.0285 | LR: 2.00e-05 [2026-04-18 12:46:07] Epoch 1 | Step 3380 | Loss: 1.0283 | LR: 2.00e-05 [2026-04-18 
12:46:11] Epoch 1 | Step 3390 | Loss: 1.0284 | LR: 2.00e-05 [2026-04-18 12:46:15] Epoch 1 | Step 3400 | Loss: 1.0289 | LR: 2.00e-05 [2026-04-18 12:46:19] Epoch 1 | Step 3410 | Loss: 1.0286 | LR: 2.00e-05 [2026-04-18 12:46:23] Epoch 1 | Step 3420 | Loss: 1.0282 | LR: 2.00e-05 [2026-04-18 12:46:27] Epoch 1 | Step 3430 | Loss: 1.0282 | LR: 2.00e-05 [2026-04-18 12:46:30] Epoch 1 | Step 3440 | Loss: 1.0283 | LR: 2.00e-05 [2026-04-18 12:46:34] Epoch 1 | Step 3450 | Loss: 1.0282 | LR: 2.00e-05 [2026-04-18 12:46:38] Epoch 1 | Step 3460 | Loss: 1.0281 | LR: 2.00e-05 [2026-04-18 12:46:41] Epoch 1 | Step 3470 | Loss: 1.0280 | LR: 2.00e-05 [2026-04-18 12:46:45] Epoch 1 | Step 3480 | Loss: 1.0280 | LR: 2.00e-05 [2026-04-18 12:46:48] Epoch 1 | Step 3490 | Loss: 1.0277 | LR: 2.00e-05 [2026-04-18 12:46:52] Epoch 1 | Step 3500 | Loss: 1.0275 | LR: 2.00e-05 [2026-04-18 12:46:55] Epoch 1 | Step 3510 | Loss: 1.0277 | LR: 2.00e-05 [2026-04-18 12:46:59] Epoch 1 | Step 3520 | Loss: 1.0273 | LR: 2.00e-05 [2026-04-18 12:47:02] Epoch 1 | Step 3530 | Loss: 1.0275 | LR: 2.00e-05 [2026-04-18 12:47:06] Epoch 1 | Step 3540 | Loss: 1.0272 | LR: 2.00e-05 [2026-04-18 12:47:09] Epoch 1 | Step 3550 | Loss: 1.0270 | LR: 2.00e-05 [2026-04-18 12:47:13] Epoch 1 | Step 3560 | Loss: 1.0270 | LR: 2.00e-05 [2026-04-18 12:47:17] Epoch 1 | Step 3570 | Loss: 1.0269 | LR: 2.00e-05 [2026-04-18 12:47:20] Epoch 1 | Step 3580 | Loss: 1.0268 | LR: 2.00e-05 [2026-04-18 12:47:24] Epoch 1 | Step 3590 | Loss: 1.0268 | LR: 2.00e-05 [2026-04-18 12:47:27] Epoch 1 | Step 3600 | Loss: 1.0265 | LR: 2.00e-05 [2026-04-18 12:47:30] Epoch 1 | Step 3610 | Loss: 1.0263 | LR: 2.00e-05 [2026-04-18 12:47:33] Epoch 1 | Step 3620 | Loss: 1.0260 | LR: 2.00e-05 [2026-04-18 12:47:37] Epoch 1 | Step 3630 | Loss: 1.0263 | LR: 2.00e-05 [2026-04-18 12:47:40] Epoch 1 | Step 3640 | Loss: 1.0264 | LR: 2.00e-05 [2026-04-18 12:47:44] Epoch 1 | Step 3650 | Loss: 1.0264 | LR: 2.00e-05 [2026-04-18 12:47:48] Epoch 1 | Step 3660 | Loss: 1.0262 | LR: 
2.00e-05 [2026-04-18 12:47:51] Epoch 1 | Step 3670 | Loss: 1.0259 | LR: 2.00e-05 [2026-04-18 12:47:55] Epoch 1 | Step 3680 | Loss: 1.0259 | LR: 2.00e-05 [2026-04-18 12:47:58] Epoch 1 | Step 3690 | Loss: 1.0257 | LR: 2.00e-05 [2026-04-18 12:48:02] Epoch 1 | Step 3700 | Loss: 1.0255 | LR: 2.00e-05 [2026-04-18 12:48:05] Epoch 1 | Step 3710 | Loss: 1.0252 | LR: 2.00e-05 [2026-04-18 12:48:09] Epoch 1 | Step 3720 | Loss: 1.0251 | LR: 2.00e-05 [2026-04-18 12:48:12] Epoch 1 | Step 3730 | Loss: 1.0251 | LR: 2.00e-05 [2026-04-18 12:48:16] Epoch 1 | Step 3740 | Loss: 1.0252 | LR: 2.00e-05 [2026-04-18 12:48:20] Epoch 1 | Step 3750 | Loss: 1.0249 | LR: 2.00e-05 [2026-04-18 12:48:24] Epoch 1 | Step 3760 | Loss: 1.0249 | LR: 2.00e-05 [2026-04-18 12:48:27] Epoch 1 | Step 3770 | Loss: 1.0250 | LR: 2.00e-05 [2026-04-18 12:48:31] Epoch 1 | Step 3780 | Loss: 1.0250 | LR: 2.00e-05 [2026-04-18 12:48:34] Epoch 1 | Step 3790 | Loss: 1.0250 | LR: 2.00e-05 [2026-04-18 12:48:37] Epoch 1 | Step 3800 | Loss: 1.0251 | LR: 2.00e-05 [2026-04-18 12:48:41] Epoch 1 | Step 3810 | Loss: 1.0245 | LR: 2.00e-05 [2026-04-18 12:48:45] Epoch 1 | Step 3820 | Loss: 1.0243 | LR: 2.00e-05 [2026-04-18 12:48:48] Epoch 1 | Step 3830 | Loss: 1.0241 | LR: 2.00e-05 [2026-04-18 12:48:52] Epoch 1 | Step 3840 | Loss: 1.0242 | LR: 2.00e-05 [2026-04-18 12:48:55] Epoch 1 | Step 3850 | Loss: 1.0238 | LR: 2.00e-05 [2026-04-18 12:48:59] Epoch 1 | Step 3860 | Loss: 1.0237 | LR: 2.00e-05 [2026-04-18 12:49:03] Epoch 1 | Step 3870 | Loss: 1.0236 | LR: 2.00e-05 [2026-04-18 12:49:06] Epoch 1 | Step 3880 | Loss: 1.0231 | LR: 2.00e-05 [2026-04-18 12:49:10] Epoch 1 | Step 3890 | Loss: 1.0227 | LR: 2.00e-05 [2026-04-18 12:49:14] Epoch 1 | Step 3900 | Loss: 1.0228 | LR: 2.00e-05 [2026-04-18 12:49:17] Epoch 1 | Step 3910 | Loss: 1.0229 | LR: 2.00e-05 [2026-04-18 12:49:21] Epoch 1 | Step 3920 | Loss: 1.0230 | LR: 2.00e-05 [2026-04-18 12:49:24] Epoch 1 | Step 3930 | Loss: 1.0228 | LR: 2.00e-05 [2026-04-18 12:49:28] Epoch 1 | Step 3940 | 
Loss: 1.0228 | LR: 2.00e-05 [2026-04-18 12:49:32] Epoch 1 | Step 3950 | Loss: 1.0226 | LR: 2.00e-05 [2026-04-18 12:49:35] Epoch 1 | Step 3960 | Loss: 1.0226 | LR: 2.00e-05 [2026-04-18 12:49:39] Epoch 1 | Step 3970 | Loss: 1.0224 | LR: 2.00e-05 [2026-04-18 12:49:43] Epoch 1 | Step 3980 | Loss: 1.0223 | LR: 2.00e-05 [2026-04-18 12:49:46] Epoch 1 | Step 3990 | Loss: 1.0221 | LR: 2.00e-05 [2026-04-18 12:49:50] Epoch 1 | Step 4000 | Loss: 1.0222 | LR: 2.00e-05 [2026-04-18 12:49:50] Validation | Batch 10/1567 | Loss: 0.9620 [2026-04-18 12:49:51] Validation | Batch 20/1567 | Loss: 1.0251 [2026-04-18 12:49:52] Validation | Batch 30/1567 | Loss: 1.0691 [2026-04-18 12:49:53] Validation | Batch 40/1567 | Loss: 1.0930 [2026-04-18 12:49:54] Validation | Batch 50/1567 | Loss: 1.0691 [2026-04-18 12:49:55] Validation | Batch 60/1567 | Loss: 1.0571 [2026-04-18 12:49:56] Validation | Batch 70/1567 | Loss: 1.0399 [2026-04-18 12:49:57] Validation | Batch 80/1567 | Loss: 1.0570 [2026-04-18 12:49:57] Validation | Batch 90/1567 | Loss: 1.0636 [2026-04-18 12:49:58] Validation | Batch 100/1567 | Loss: 1.0716 [2026-04-18 12:49:59] Validation | Batch 110/1567 | Loss: 1.0610 [2026-04-18 12:50:00] Validation | Batch 120/1567 | Loss: 1.0715 [2026-04-18 12:50:01] Validation | Batch 130/1567 | Loss: 1.0747 [2026-04-18 12:50:02] Validation | Batch 140/1567 | Loss: 1.0786 [2026-04-18 12:50:02] Validation | Batch 150/1567 | Loss: 1.0864 [2026-04-18 12:50:03] Validation | Batch 160/1567 | Loss: 1.0865 [2026-04-18 12:50:04] Validation | Batch 170/1567 | Loss: 1.0714 [2026-04-18 12:50:05] Validation | Batch 180/1567 | Loss: 1.0715 [2026-04-18 12:50:06] Validation | Batch 190/1567 | Loss: 1.0685 [2026-04-18 12:50:07] Validation | Batch 200/1567 | Loss: 1.0717 [2026-04-18 12:50:07] Validation | Batch 210/1567 | Loss: 1.0734 [2026-04-18 12:50:08] Validation | Batch 220/1567 | Loss: 1.0769 [2026-04-18 12:50:09] Validation | Batch 230/1567 | Loss: 1.0810 [2026-04-18 12:50:10] Validation | Batch 240/1567 | 
Loss: 1.0797 [2026-04-18 12:50:11] Validation | Batch 250/1567 | Loss: 1.0742 [2026-04-18 12:50:11] Validation | Batch 260/1567 | Loss: 1.0691 [2026-04-18 12:50:12] Validation | Batch 270/1567 | Loss: 1.0659 [2026-04-18 12:50:13] Validation | Batch 280/1567 | Loss: 1.0666 [2026-04-18 12:50:14] Validation | Batch 290/1567 | Loss: 1.0725 [2026-04-18 12:50:15] Validation | Batch 300/1567 | Loss: 1.0775 [2026-04-18 12:50:16] Validation | Batch 310/1567 | Loss: 1.0770 [2026-04-18 12:50:16] Validation | Batch 320/1567 | Loss: 1.0773 [2026-04-18 12:50:17] Validation | Batch 330/1567 | Loss: 1.0745 [2026-04-18 12:50:18] Validation | Batch 340/1567 | Loss: 1.0783 [2026-04-18 12:50:19] Validation | Batch 350/1567 | Loss: 1.0773 [2026-04-18 12:50:20] Validation | Batch 360/1567 | Loss: 1.0757 [2026-04-18 12:50:21] Validation | Batch 370/1567 | Loss: 1.0733 [2026-04-18 12:50:21] Validation | Batch 380/1567 | Loss: 1.0771 [2026-04-18 12:50:22] Validation | Batch 390/1567 | Loss: 1.0785 [2026-04-18 12:50:23] Validation | Batch 400/1567 | Loss: 1.0795 [2026-04-18 12:50:24] Validation | Batch 410/1567 | Loss: 1.0791 [2026-04-18 12:50:24] Validation | Batch 420/1567 | Loss: 1.0792 [2026-04-18 12:50:25] Validation | Batch 430/1567 | Loss: 1.0790 [2026-04-18 12:50:26] Validation | Batch 440/1567 | Loss: 1.0784 [2026-04-18 12:50:27] Validation | Batch 450/1567 | Loss: 1.0785 [2026-04-18 12:50:28] Validation | Batch 460/1567 | Loss: 1.0771 [2026-04-18 12:50:29] Validation | Batch 470/1567 | Loss: 1.0756 [2026-04-18 12:50:29] Validation | Batch 480/1567 | Loss: 1.0731 [2026-04-18 12:50:30] Validation | Batch 490/1567 | Loss: 1.0734 [2026-04-18 12:50:31] Validation | Batch 500/1567 | Loss: 1.0741 [2026-04-18 12:50:32] Validation | Batch 510/1567 | Loss: 1.0765 [2026-04-18 12:50:32] Validation | Batch 520/1567 | Loss: 1.0784 [2026-04-18 12:50:33] Validation | Batch 530/1567 | Loss: 1.0781 [2026-04-18 12:50:34] Validation | Batch 540/1567 | Loss: 1.0801 [2026-04-18 12:50:35] Validation | 
Batch 550/1567 | Loss: 1.0830 [2026-04-18 12:50:36] Validation | Batch 560/1567 | Loss: 1.0829 [2026-04-18 12:50:37] Validation | Batch 570/1567 | Loss: 1.0830 [2026-04-18 12:50:38] Validation | Batch 580/1567 | Loss: 1.0817 [2026-04-18 12:50:39] Validation | Batch 590/1567 | Loss: 1.0805 [2026-04-18 12:50:38] Validation | Batch 600/1567 | Loss: 1.0783 [2026-04-18 12:50:39] Validation | Batch 610/1567 | Loss: 1.0771 [2026-04-18 12:50:40] Validation | Batch 620/1567 | Loss: 1.0782 [2026-04-18 12:50:41] Validation | Batch 630/1567 | Loss: 1.0763 [2026-04-18 12:50:42] Validation | Batch 640/1567 | Loss: 1.0773 [2026-04-18 12:50:43] Validation | Batch 650/1567 | Loss: 1.0766 [2026-04-18 12:50:43] Validation | Batch 660/1567 | Loss: 1.0752 [2026-04-18 12:50:44] Validation | Batch 670/1567 | Loss: 1.0730 [2026-04-18 12:50:45] Validation | Batch 680/1567 | Loss: 1.0722 [2026-04-18 12:50:45] Validation | Batch 690/1567 | Loss: 1.0731 [2026-04-18 12:50:46] Validation | Batch 700/1567 | Loss: 1.0716 [2026-04-18 12:50:47] Validation | Batch 710/1567 | Loss: 1.0727 [2026-04-18 12:50:48] Validation | Batch 720/1567 | Loss: 1.0724 [2026-04-18 12:50:49] Validation | Batch 730/1567 | Loss: 1.0734 [2026-04-18 12:50:49] Validation | Batch 740/1567 | Loss: 1.0739 [2026-04-18 12:50:50] Validation | Batch 750/1567 | Loss: 1.0739 [2026-04-18 12:50:51] Validation | Batch 760/1567 | Loss: 1.0740 [2026-04-18 12:50:52] Validation | Batch 770/1567 | Loss: 1.0758 [2026-04-18 12:50:53] Validation | Batch 780/1567 | Loss: 1.0770 [2026-04-18 12:50:54] Validation | Batch 790/1567 | Loss: 1.0766 [2026-04-18 12:50:54] Validation | Batch 800/1567 | Loss: 1.0782 [2026-04-18 12:50:55] Validation | Batch 810/1567 | Loss: 1.0785 [2026-04-18 12:50:56] Validation | Batch 820/1567 | Loss: 1.0784 [2026-04-18 12:50:57] Validation | Batch 830/1567 | Loss: 1.0767 [2026-04-18 12:50:57] Validation | Batch 840/1567 | Loss: 1.0770 [2026-04-18 12:50:58] Validation | Batch 850/1567 | Loss: 1.0758 [2026-04-18 
12:50:59] Validation | Batch 860/1567 | Loss: 1.0772 [2026-04-18 12:50:59] Validation | Batch 870/1567 | Loss: 1.0779 [2026-04-18 12:51:00] Validation | Batch 880/1567 | Loss: 1.0789 [2026-04-18 12:51:01] Validation | Batch 890/1567 | Loss: 1.0791 [2026-04-18 12:51:02] Validation | Batch 900/1567 | Loss: 1.0809 [2026-04-18 12:51:02] Validation | Batch 910/1567 | Loss: 1.0815 [2026-04-18 12:51:03] Validation | Batch 920/1567 | Loss: 1.0832 [2026-04-18 12:51:04] Validation | Batch 930/1567 | Loss: 1.0809 [2026-04-18 12:51:05] Validation | Batch 940/1567 | Loss: 1.0807 [2026-04-18 12:51:05] Validation | Batch 950/1567 | Loss: 1.0797 [2026-04-18 12:51:06] Validation | Batch 960/1567 | Loss: 1.0784 [2026-04-18 12:51:07] Validation | Batch 970/1567 | Loss: 1.0796 [2026-04-18 12:51:08] Validation | Batch 980/1567 | Loss: 1.0800 [2026-04-18 12:51:08] Validation | Batch 990/1567 | Loss: 1.0792 [2026-04-18 12:51:09] Validation | Batch 1000/1567 | Loss: 1.0793 [2026-04-18 12:51:10] Validation | Batch 1010/1567 | Loss: 1.0772 [2026-04-18 12:51:11] Validation | Batch 1020/1567 | Loss: 1.0775 [2026-04-18 12:51:12] Validation | Batch 1030/1567 | Loss: 1.0788 [2026-04-18 12:51:13] Validation | Batch 1040/1567 | Loss: 1.0784 [2026-04-18 12:51:13] Validation | Batch 1050/1567 | Loss: 1.0795 [2026-04-18 12:51:14] Validation | Batch 1060/1567 | Loss: 1.0789 [2026-04-18 12:51:15] Validation | Batch 1070/1567 | Loss: 1.0781 [2026-04-18 12:51:16] Validation | Batch 1080/1567 | Loss: 1.0791 [2026-04-18 12:51:16] Validation | Batch 1090/1567 | Loss: 1.0790 [2026-04-18 12:51:17] Validation | Batch 1100/1567 | Loss: 1.0794 [2026-04-18 12:51:18] Validation | Batch 1110/1567 | Loss: 1.0792 [2026-04-18 12:51:18] Validation | Batch 1120/1567 | Loss: 1.0795 [2026-04-18 12:51:19] Validation | Batch 1130/1567 | Loss: 1.0797 [2026-04-18 12:51:20] Validation | Batch 1140/1567 | Loss: 1.0804 [2026-04-18 12:51:21] Validation | Batch 1150/1567 | Loss: 1.0805 [2026-04-18 12:51:22] Validation | Batch 
1160/1567 | Loss: 1.0813 [2026-04-18 12:51:23] Validation | Batch 1170/1567 | Loss: 1.0810 [2026-04-18 12:51:24] Validation | Batch 1180/1567 | Loss: 1.0808 [2026-04-18 12:51:24] Validation | Batch 1190/1567 | Loss: 1.0815 [2026-04-18 12:51:25] Validation | Batch 1200/1567 | Loss: 1.0809 [2026-04-18 12:51:26] Validation | Batch 1210/1567 | Loss: 1.0800 [2026-04-18 12:51:27] Validation | Batch 1220/1567 | Loss: 1.0802 [2026-04-18 12:51:28] Validation | Batch 1230/1567 | Loss: 1.0820 [2026-04-18 12:51:28] Validation | Batch 1240/1567 | Loss: 1.0811 [2026-04-18 12:51:29] Validation | Batch 1250/1567 | Loss: 1.0809 [2026-04-18 12:51:30] Validation | Batch 1260/1567 | Loss: 1.0817 [2026-04-18 12:51:31] Validation | Batch 1270/1567 | Loss: 1.0817 [2026-04-18 12:51:32] Validation | Batch 1280/1567 | Loss: 1.0810 [2026-04-18 12:51:33] Validation | Batch 1290/1567 | Loss: 1.0814 [2026-04-18 12:51:34] Validation | Batch 1300/1567 | Loss: 1.0817 [2026-04-18 12:51:34] Validation | Batch 1310/1567 | Loss: 1.0821 [2026-04-18 12:51:35] Validation | Batch 1320/1567 | Loss: 1.0811 [2026-04-18 12:51:36] Validation | Batch 1330/1567 | Loss: 1.0807 [2026-04-18 12:51:37] Validation | Batch 1340/1567 | Loss: 1.0803 [2026-04-18 12:51:37] Validation | Batch 1350/1567 | Loss: 1.0808 [2026-04-18 12:51:38] Validation | Batch 1360/1567 | Loss: 1.0803 [2026-04-18 12:51:39] Validation | Batch 1370/1567 | Loss: 1.0805 [2026-04-18 12:51:40] Validation | Batch 1380/1567 | Loss: 1.0815 [2026-04-18 12:51:41] Validation | Batch 1390/1567 | Loss: 1.0815 [2026-04-18 12:51:41] Validation | Batch 1400/1567 | Loss: 1.0816 [2026-04-18 12:51:42] Validation | Batch 1410/1567 | Loss: 1.0813 [2026-04-18 12:51:42] Validation | Batch 1420/1567 | Loss: 1.0818 [2026-04-18 12:51:43] Validation | Batch 1430/1567 | Loss: 1.0813 [2026-04-18 12:51:44] Validation | Batch 1440/1567 | Loss: 1.0816 [2026-04-18 12:51:45] Validation | Batch 1450/1567 | Loss: 1.0810 [2026-04-18 12:51:46] Validation | Batch 1460/1567 | Loss: 
1.0807 [2026-04-18 12:51:46] Validation | Batch 1470/1567 | Loss: 1.0798 [2026-04-18 12:51:47] Validation | Batch 1480/1567 | Loss: 1.0782 [2026-04-18 12:51:48] Validation | Batch 1490/1567 | Loss: 1.0783 [2026-04-18 12:51:49] Validation | Batch 1500/1567 | Loss: 1.0784 [2026-04-18 12:51:49] Validation | Batch 1510/1567 | Loss: 1.0780 [2026-04-18 12:51:50] Validation | Batch 1520/1567 | Loss: 1.0772 [2026-04-18 12:51:51] Validation | Batch 1530/1567 | Loss: 1.0781 [2026-04-18 12:51:52] Validation | Batch 1540/1567 | Loss: 1.0792 [2026-04-18 12:51:53] Validation | Batch 1550/1567 | Loss: 1.0794 [2026-04-18 12:51:54] Validation | Batch 1560/1567 | Loss: 1.0783 [2026-04-18 12:51:54] Validation | Batch 1567/1567 | Loss: 1.0787 [2026-04-18 12:51:54] Validation | Loss: 1.0787 | PPL: 2.97 | Time: 124.60s [2026-04-18 12:51:58] New best model saved! Val loss: 1.0787 [2026-04-18 12:52:01] Epoch 1 | Step 4010 | Loss: 1.0220 | LR: 2.00e-05 [2026-04-18 12:52:05] Epoch 1 | Step 4020 | Loss: 1.0222 | LR: 2.00e-05 [2026-04-18 12:52:08] Epoch 1 | Step 4030 | Loss: 1.0218 | LR: 2.00e-05 [2026-04-18 12:52:12] Epoch 1 | Step 4040 | Loss: 1.0214 | LR: 2.00e-05 [2026-04-18 12:52:16] Epoch 1 | Step 4050 | Loss: 1.0213 | LR: 2.00e-05 [2026-04-18 12:52:19] Epoch 1 | Step 4060 | Loss: 1.0207 | LR: 2.00e-05 [2026-04-18 12:52:23] Epoch 1 | Step 4070 | Loss: 1.0207 | LR: 2.00e-05 [2026-04-18 12:52:26] Epoch 1 | Step 4080 | Loss: 1.0207 | LR: 2.00e-05 [2026-04-18 12:52:30] Epoch 1 | Step 4090 | Loss: 1.0208 | LR: 2.00e-05 [2026-04-18 12:52:33] Epoch 1 | Step 4100 | Loss: 1.0208 | LR: 2.00e-05 [2026-04-18 12:52:37] Epoch 1 | Step 4110 | Loss: 1.0207 | LR: 2.00e-05 [2026-04-18 12:52:41] Epoch 1 | Step 4120 | Loss: 1.0210 | LR: 2.00e-05 [2026-04-18 12:52:44] Epoch 1 | Step 4130 | Loss: 1.0208 | LR: 2.00e-05 [2026-04-18 12:52:48] Epoch 1 | Step 4140 | Loss: 1.0210 | LR: 2.00e-05 [2026-04-18 12:52:52] Epoch 1 | Step 4150 | Loss: 1.0215 | LR: 2.00e-05 [2026-04-18 12:52:56] Epoch 1 | Step 4160 | Loss: 
1.0217 | LR: 2.00e-05 [2026-04-18 12:52:59] Epoch 1 | Step 4170 | Loss: 1.0215 | LR: 2.00e-05 [2026-04-18 12:53:03] Epoch 1 | Step 4180 | Loss: 1.0214 | LR: 2.00e-05 [2026-04-18 12:53:07] Epoch 1 | Step 4190 | Loss: 1.0212 | LR: 2.00e-05 [2026-04-18 12:53:11] Epoch 1 | Step 4200 | Loss: 1.0216 | LR: 2.00e-05 [2026-04-18 12:53:15] Epoch 1 | Step 4210 | Loss: 1.0214 | LR: 2.00e-05 [2026-04-18 12:53:19] Epoch 1 | Step 4220 | Loss: 1.0218 | LR: 2.00e-05 [2026-04-18 12:53:22] Epoch 1 | Step 4230 | Loss: 1.0219 | LR: 2.00e-05 [2026-04-18 12:53:26] Epoch 1 | Step 4240 | Loss: 1.0220 | LR: 2.00e-05 [2026-04-18 12:53:30] Epoch 1 | Step 4250 | Loss: 1.0219 | LR: 2.00e-05 [2026-04-18 12:53:33] Epoch 1 | Step 4260 | Loss: 1.0216 | LR: 2.00e-05 [2026-04-18 12:53:36] Epoch 1 | Step 4270 | Loss: 1.0218 | LR: 2.00e-05 [2026-04-18 12:53:40] Epoch 1 | Step 4280 | Loss: 1.0216 | LR: 2.00e-05 [2026-04-18 12:53:44] Epoch 1 | Step 4290 | Loss: 1.0214 | LR: 2.00e-05 [2026-04-18 12:53:47] Epoch 1 | Step 4300 | Loss: 1.0214 | LR: 2.00e-05 [2026-04-18 12:53:51] Epoch 1 | Step 4310 | Loss: 1.0215 | LR: 2.00e-05 [2026-04-18 12:53:53] Epoch 1 | Step 4320 | Loss: 1.0216 | LR: 2.00e-05 [2026-04-18 12:53:57] Epoch 1 | Step 4330 | Loss: 1.0215 | LR: 2.00e-05 [2026-04-18 12:54:00] Epoch 1 | Step 4340 | Loss: 1.0213 | LR: 2.00e-05 [2026-04-18 12:54:04] Epoch 1 | Step 4350 | Loss: 1.0211 | LR: 2.00e-05 [2026-04-18 12:54:08] Epoch 1 | Step 4360 | Loss: 1.0211 | LR: 2.00e-05 [2026-04-18 12:54:11] Epoch 1 | Step 4370 | Loss: 1.0211 | LR: 2.00e-05 [2026-04-18 12:54:15] Epoch 1 | Step 4380 | Loss: 1.0209 | LR: 2.00e-05 [2026-04-18 12:54:18] Epoch 1 | Step 4390 | Loss: 1.0209 | LR: 2.00e-05 [2026-04-18 12:54:22] Epoch 1 | Step 4400 | Loss: 1.0207 | LR: 2.00e-05 [2026-04-18 12:54:25] Epoch 1 | Step 4410 | Loss: 1.0203 | LR: 2.00e-05 [2026-04-18 12:54:29] Epoch 1 | Step 4420 | Loss: 1.0206 | LR: 2.00e-05 [2026-04-18 12:54:32] Epoch 1 | Step 4430 | Loss: 1.0204 | LR: 2.00e-05 [2026-04-18 12:54:36] Epoch 1 | 
Step 4440 | Loss: 1.0208 | LR: 2.00e-05 [2026-04-18 12:54:40] Epoch 1 | Step 4450 | Loss: 1.0207 | LR: 2.00e-05 [2026-04-18 12:54:43] Epoch 1 | Step 4460 | Loss: 1.0209 | LR: 2.00e-05 [2026-04-18 12:54:47] Epoch 1 | Step 4470 | Loss: 1.0207 | LR: 2.00e-05 [2026-04-18 12:54:50] Epoch 1 | Step 4480 | Loss: 1.0205 | LR: 2.00e-05 [2026-04-18 12:54:54] Epoch 1 | Step 4490 | Loss: 1.0203 | LR: 2.00e-05 [2026-04-18 12:54:58] Epoch 1 | Step 4500 | Loss: 1.0204 | LR: 2.00e-05 [2026-04-18 12:55:02] Epoch 1 | Step 4510 | Loss: 1.0201 | LR: 2.00e-05 [2026-04-18 12:55:05] Epoch 1 | Step 4520 | Loss: 1.0198 | LR: 2.00e-05 [2026-04-18 12:55:10] Epoch 1 | Step 4530 | Loss: 1.0195 | LR: 2.00e-05 [2026-04-18 12:55:14] Epoch 1 | Step 4540 | Loss: 1.0194 | LR: 2.00e-05 [2026-04-18 12:55:17] Epoch 1 | Step 4550 | Loss: 1.0190 | LR: 2.00e-05 [2026-04-18 12:55:21] Epoch 1 | Step 4560 | Loss: 1.0191 | LR: 2.00e-05 [2026-04-18 12:55:24] Epoch 1 | Step 4570 | Loss: 1.0190 | LR: 2.00e-05 [2026-04-18 12:55:28] Epoch 1 | Step 4580 | Loss: 1.0189 | LR: 2.00e-05 [2026-04-18 12:55:31] Epoch 1 | Step 4590 | Loss: 1.0188 | LR: 2.00e-05 [2026-04-18 12:55:35] Epoch 1 | Step 4600 | Loss: 1.0187 | LR: 2.00e-05 [2026-04-18 12:55:38] Epoch 1 | Step 4610 | Loss: 1.0185 | LR: 2.00e-05 [2026-04-18 12:55:41] Epoch 1 | Step 4620 | Loss: 1.0186 | LR: 2.00e-05 [2026-04-18 12:55:45] Epoch 1 | Step 4630 | Loss: 1.0185 | LR: 2.00e-05 [2026-04-18 12:55:48] Epoch 1 | Step 4640 | Loss: 1.0185 | LR: 2.00e-05 [2026-04-18 12:55:52] Epoch 1 | Step 4650 | Loss: 1.0185 | LR: 2.00e-05 [2026-04-18 12:55:55] Epoch 1 | Step 4660 | Loss: 1.0184 | LR: 2.00e-05 [2026-04-18 12:55:59] Epoch 1 | Step 4670 | Loss: 1.0182 | LR: 2.00e-05 [2026-04-18 12:56:03] Epoch 1 | Step 4680 | Loss: 1.0183 | LR: 2.00e-05 [2026-04-18 12:56:07] Epoch 1 | Step 4690 | Loss: 1.0181 | LR: 2.00e-05 [2026-04-18 12:56:10] Epoch 1 | Step 4700 | Loss: 1.0183 | LR: 2.00e-05 [2026-04-18 12:56:14] Epoch 1 | Step 4710 | Loss: 1.0182 | LR: 2.00e-05 [2026-04-18 
12:56:17] Epoch 1 | Step 4720 | Loss: 1.0182 | LR: 2.00e-05 [2026-04-18 12:56:20] Epoch 1 | Step 4730 | Loss: 1.0182 | LR: 2.00e-05 [2026-04-18 12:56:24] Epoch 1 | Step 4740 | Loss: 1.0180 | LR: 2.00e-05 [2026-04-18 12:56:27] Epoch 1 | Step 4750 | Loss: 1.0179 | LR: 2.00e-05 [2026-04-18 12:56:30] Epoch 1 | Step 4760 | Loss: 1.0178 | LR: 2.00e-05 [2026-04-18 12:56:34] Epoch 1 | Step 4770 | Loss: 1.0175 | LR: 2.00e-05 [2026-04-18 12:56:37] Epoch 1 | Step 4780 | Loss: 1.0176 | LR: 2.00e-05 [2026-04-18 12:56:41] Epoch 1 | Step 4790 | Loss: 1.0174 | LR: 2.00e-05 [2026-04-18 12:56:45] Epoch 1 | Step 4800 | Loss: 1.0172 | LR: 2.00e-05 [2026-04-18 12:56:49] Epoch 1 | Step 4810 | Loss: 1.0169 | LR: 2.00e-05 [2026-04-18 12:56:52] Epoch 1 | Step 4820 | Loss: 1.0167 | LR: 2.00e-05 [2026-04-18 12:56:56] Epoch 1 | Step 4830 | Loss: 1.0163 | LR: 2.00e-05 [2026-04-18 12:56:59] Epoch 1 | Step 4840 | Loss: 1.0163 | LR: 2.00e-05 [2026-04-18 12:57:03] Epoch 1 | Step 4850 | Loss: 1.0165 | LR: 2.00e-05 [2026-04-18 12:57:06] Epoch 1 | Step 4860 | Loss: 1.0168 | LR: 2.00e-05 [2026-04-18 12:57:10] Epoch 1 | Step 4870 | Loss: 1.0168 | LR: 2.00e-05 [2026-04-18 12:57:14] Epoch 1 | Step 4880 | Loss: 1.0168 | LR: 2.00e-05 [2026-04-18 12:57:18] Epoch 1 | Step 4890 | Loss: 1.0166 | LR: 2.00e-05 [2026-04-18 12:57:21] Epoch 1 | Step 4900 | Loss: 1.0166 | LR: 2.00e-05 [2026-04-18 12:57:25] Epoch 1 | Step 4910 | Loss: 1.0165 | LR: 2.00e-05 [2026-04-18 12:57:28] Epoch 1 | Step 4920 | Loss: 1.0164 | LR: 2.00e-05 [2026-04-18 12:57:32] Epoch 1 | Step 4930 | Loss: 1.0164 | LR: 2.00e-05 [2026-04-18 12:57:35] Epoch 1 | Step 4940 | Loss: 1.0163 | LR: 2.00e-05 [2026-04-18 12:57:38] Epoch 1 | Step 4950 | Loss: 1.0162 | LR: 2.00e-05 [2026-04-18 12:57:42] Epoch 1 | Step 4960 | Loss: 1.0162 | LR: 2.00e-05 [2026-04-18 12:57:45] Epoch 1 | Step 4970 | Loss: 1.0161 | LR: 2.00e-05 [2026-04-18 12:57:49] Epoch 1 | Step 4980 | Loss: 1.0161 | LR: 2.00e-05 [2026-04-18 12:57:52] Epoch 1 | Step 4990 | Loss: 1.0159 | LR: 
2.00e-05 [2026-04-18 12:57:56] Epoch 1 | Step 5000 | Loss: 1.0161 | LR: 2.00e-05 [2026-04-18 12:57:57] Validation | Batch 10/1567 | Loss: 0.9599 [2026-04-18 12:57:57] Validation | Batch 20/1567 | Loss: 1.0140 [2026-04-18 12:57:59] Validation | Batch 30/1567 | Loss: 1.0608 [2026-04-18 12:58:00] Validation | Batch 40/1567 | Loss: 1.0857 [2026-04-18 12:58:00] Validation | Batch 50/1567 | Loss: 1.0638 [2026-04-18 12:58:01] Validation | Batch 60/1567 | Loss: 1.0506 [2026-04-18 12:58:02] Validation | Batch 70/1567 | Loss: 1.0348 [2026-04-18 12:58:03] Validation | Batch 80/1567 | Loss: 1.0517 [2026-04-18 12:58:04] Validation | Batch 90/1567 | Loss: 1.0584 [2026-04-18 12:58:05] Validation | Batch 100/1567 | Loss: 1.0655 [2026-04-18 12:58:05] Validation | Batch 110/1567 | Loss: 1.0542 [2026-04-18 12:58:06] Validation | Batch 120/1567 | Loss: 1.0655 [2026-04-18 12:58:07] Validation | Batch 130/1567 | Loss: 1.0681 [2026-04-18 12:58:08] Validation | Batch 140/1567 | Loss: 1.0716 [2026-04-18 12:58:09] Validation | Batch 150/1567 | Loss: 1.0791 [2026-04-18 12:58:10] Validation | Batch 160/1567 | Loss: 1.0791 [2026-04-18 12:58:10] Validation | Batch 170/1567 | Loss: 1.0647 [2026-04-18 12:58:10] Validation | Batch 180/1567 | Loss: 1.0655 [2026-04-18 12:58:11] Validation | Batch 190/1567 | Loss: 1.0635 [2026-04-18 12:58:12] Validation | Batch 200/1567 | Loss: 1.0659 [2026-04-18 12:58:13] Validation | Batch 210/1567 | Loss: 1.0678 [2026-04-18 12:58:14] Validation | Batch 220/1567 | Loss: 1.0719 [2026-04-18 12:58:15] Validation | Batch 230/1567 | Loss: 1.0748 [2026-04-18 12:58:16] Validation | Batch 240/1567 | Loss: 1.0729 [2026-04-18 12:58:16] Validation | Batch 250/1567 | Loss: 1.0678 [2026-04-18 12:58:17] Validation | Batch 260/1567 | Loss: 1.0629 [2026-04-18 12:58:18] Validation | Batch 270/1567 | Loss: 1.0597 [2026-04-18 12:58:18] Validation | Batch 280/1567 | Loss: 1.0608 [2026-04-18 12:58:20] Validation | Batch 290/1567 | Loss: 1.0666 [2026-04-18 12:58:20] Validation | Batch 
300/1567 | Loss: 1.0719 [2026-04-18 12:58:21] Validation | Batch 310/1567 | Loss: 1.0721 [2026-04-18 12:58:22] Validation | Batch 320/1567 | Loss: 1.0720 [2026-04-18 12:58:23] Validation | Batch 330/1567 | Loss: 1.0695 [2026-04-18 12:58:24] Validation | Batch 340/1567 | Loss: 1.0729 [2026-04-18 12:58:25] Validation | Batch 350/1567 | Loss: 1.0725 [2026-04-18 12:58:25] Validation | Batch 360/1567 | Loss: 1.0714 [2026-04-18 12:58:26] Validation | Batch 370/1567 | Loss: 1.0690 [2026-04-18 12:58:27] Validation | Batch 380/1567 | Loss: 1.0727 [2026-04-18 12:58:28] Validation | Batch 390/1567 | Loss: 1.0735 [2026-04-18 12:58:28] Validation | Batch 400/1567 | Loss: 1.0752 [2026-04-18 12:58:29] Validation | Batch 410/1567 | Loss: 1.0749 [2026-04-18 12:58:30] Validation | Batch 420/1567 | Loss: 1.0750 [2026-04-18 12:58:31] Validation | Batch 430/1567 | Loss: 1.0745 [2026-04-18 12:58:32] Validation | Batch 440/1567 | Loss: 1.0731 [2026-04-18 12:58:33] Validation | Batch 450/1567 | Loss: 1.0733 [2026-04-18 12:58:34] Validation | Batch 460/1567 | Loss: 1.0718 [2026-04-18 12:58:34] Validation | Batch 470/1567 | Loss: 1.0699 [2026-04-18 12:58:35] Validation | Batch 480/1567 | Loss: 1.0680 [2026-04-18 12:58:36] Validation | Batch 490/1567 | Loss: 1.0681 [2026-04-18 12:58:36] Validation | Batch 500/1567 | Loss: 1.0686 [2026-04-18 12:58:37] Validation | Batch 510/1567 | Loss: 1.0708 [2026-04-18 12:58:38] Validation | Batch 520/1567 | Loss: 1.0723 [2026-04-18 12:58:39] Validation | Batch 530/1567 | Loss: 1.0718 [2026-04-18 12:58:40] Validation | Batch 540/1567 | Loss: 1.0738 [2026-04-18 12:58:41] Validation | Batch 550/1567 | Loss: 1.0771 [2026-04-18 12:58:42] Validation | Batch 560/1567 | Loss: 1.0768 [2026-04-18 12:58:42] Validation | Batch 570/1567 | Loss: 1.0766 [2026-04-18 12:58:43] Validation | Batch 580/1567 | Loss: 1.0754 [2026-04-18 12:58:44] Validation | Batch 590/1567 | Loss: 1.0744 [2026-04-18 12:58:45] Validation | Batch 600/1567 | Loss: 1.0724 [2026-04-18 12:58:46] 
Validation | Batch 610/1567 | Loss: 1.0715 [2026-04-18 12:58:47] Validation | Batch 620/1567 | Loss: 1.0728 [2026-04-18 12:58:48] Validation | Batch 630/1567 | Loss: 1.0710 [2026-04-18 12:58:49] Validation | Batch 640/1567 | Loss: 1.0725 [2026-04-18 12:58:50] Validation | Batch 650/1567 | Loss: 1.0716 [2026-04-18 12:58:50] Validation | Batch 660/1567 | Loss: 1.0701 [2026-04-18 12:58:51] Validation | Batch 670/1567 | Loss: 1.0681 [2026-04-18 12:58:52] Validation | Batch 680/1567 | Loss: 1.0678 [2026-04-18 12:58:52] Validation | Batch 690/1567 | Loss: 1.0687 [2026-04-18 12:58:53] Validation | Batch 700/1567 | Loss: 1.0673 [2026-04-18 12:58:54] Validation | Batch 710/1567 | Loss: 1.0687 [2026-04-18 12:58:55] Validation | Batch 720/1567 | Loss: 1.0680 [2026-04-18 12:58:56] Validation | Batch 730/1567 | Loss: 1.0690 [2026-04-18 12:58:57] Validation | Batch 740/1567 | Loss: 1.0696 [2026-04-18 12:58:57] Validation | Batch 750/1567 | Loss: 1.0695 [2026-04-18 12:58:58] Validation | Batch 760/1567 | Loss: 1.0697 [2026-04-18 12:58:59] Validation | Batch 770/1567 | Loss: 1.0716 [2026-04-18 12:59:00] Validation | Batch 780/1567 | Loss: 1.0729 [2026-04-18 12:59:01] Validation | Batch 790/1567 | Loss: 1.0725 [2026-04-18 12:59:01] Validation | Batch 800/1567 | Loss: 1.0741 [2026-04-18 12:59:02] Validation | Batch 810/1567 | Loss: 1.0745 [2026-04-18 12:59:03] Validation | Batch 820/1567 | Loss: 1.0743 [2026-04-18 12:59:04] Validation | Batch 830/1567 | Loss: 1.0728 [2026-04-18 12:59:04] Validation | Batch 840/1567 | Loss: 1.0730 [2026-04-18 12:59:05] Validation | Batch 850/1567 | Loss: 1.0719 [2026-04-18 12:59:06] Validation | Batch 860/1567 | Loss: 1.0733 [2026-04-18 12:59:07] Validation | Batch 870/1567 | Loss: 1.0738 [2026-04-18 12:59:07] Validation | Batch 880/1567 | Loss: 1.0748 [2026-04-18 12:59:08] Validation | Batch 890/1567 | Loss: 1.0751 [2026-04-18 12:59:09] Validation | Batch 900/1567 | Loss: 1.0769 [2026-04-18 12:59:10] Validation | Batch 910/1567 | Loss: 1.0771 
[2026-04-18 12:59:10] Validation | Batch 920/1567 | Loss: 1.0788 [2026-04-18 12:59:11] Validation | Batch 930/1567 | Loss: 1.0767 [2026-04-18 12:59:12] Validation | Batch 940/1567 | Loss: 1.0764 [2026-04-18 12:59:13] Validation | Batch 950/1567 | Loss: 1.0755 [2026-04-18 12:59:13] Validation | Batch 960/1567 | Loss: 1.0743 [2026-04-18 12:59:14] Validation | Batch 970/1567 | Loss: 1.0756 [2026-04-18 12:59:15] Validation | Batch 980/1567 | Loss: 1.0762 [2026-04-18 12:59:15] Validation | Batch 990/1567 | Loss: 1.0755 [2026-04-18 12:59:16] Validation | Batch 1000/1567 | Loss: 1.0755 [2026-04-18 12:59:17] Validation | Batch 1010/1567 | Loss: 1.0733 [2026-04-18 12:59:18] Validation | Batch 1020/1567 | Loss: 1.0736 [2026-04-18 12:59:19] Validation | Batch 1030/1567 | Loss: 1.0750 [2026-04-18 12:59:20] Validation | Batch 1040/1567 | Loss: 1.0747 [2026-04-18 12:59:20] Validation | Batch 1050/1567 | Loss: 1.0757 [2026-04-18 12:59:21] Validation | Batch 1060/1567 | Loss: 1.0750 [2026-04-18 12:59:22] Validation | Batch 1070/1567 | Loss: 1.0742 [2026-04-18 12:59:23] Validation | Batch 1080/1567 | Loss: 1.0752 [2026-04-18 12:59:23] Validation | Batch 1090/1567 | Loss: 1.0752 [2026-04-18 12:59:24] Validation | Batch 1100/1567 | Loss: 1.0757 [2026-04-18 12:59:25] Validation | Batch 1110/1567 | Loss: 1.0753 [2026-04-18 12:59:25] Validation | Batch 1120/1567 | Loss: 1.0757 [2026-04-18 12:59:26] Validation | Batch 1130/1567 | Loss: 1.0760 [2026-04-18 12:59:27] Validation | Batch 1140/1567 | Loss: 1.0765 [2026-04-18 12:59:28] Validation | Batch 1150/1567 | Loss: 1.0765 [2026-04-18 12:59:29] Validation | Batch 1160/1567 | Loss: 1.0774 [2026-04-18 12:59:30] Validation | Batch 1170/1567 | Loss: 1.0772 [2026-04-18 12:59:31] Validation | Batch 1180/1567 | Loss: 1.0770 [2026-04-18 12:59:31] Validation | Batch 1190/1567 | Loss: 1.0778 [2026-04-18 12:59:32] Validation | Batch 1200/1567 | Loss: 1.0774 [2026-04-18 12:59:33] Validation | Batch 1210/1567 | Loss: 1.0763 [2026-04-18 12:59:34] 
Validation | Batch 1220/1567 | Loss: 1.0767 [2026-04-18 12:59:35] Validation | Batch 1230/1567 | Loss: 1.0786 [2026-04-18 12:59:35] Validation | Batch 1240/1567 | Loss: 1.0776 [2026-04-18 12:59:36] Validation | Batch 1250/1567 | Loss: 1.0775 [2026-04-18 12:59:37] Validation | Batch 1260/1567 | Loss: 1.0783 [2026-04-18 12:59:38] Validation | Batch 1270/1567 | Loss: 1.0783 [2026-04-18 12:59:39] Validation | Batch 1280/1567 | Loss: 1.0775 [2026-04-18 12:59:40] Validation | Batch 1290/1567 | Loss: 1.0778 [2026-04-18 12:59:41] Validation | Batch 1300/1567 | Loss: 1.0782 [2026-04-18 12:59:41] Validation | Batch 1310/1567 | Loss: 1.0787 [2026-04-18 12:59:42] Validation | Batch 1320/1567 | Loss: 1.0777 [2026-04-18 12:59:43] Validation | Batch 1330/1567 | Loss: 1.0773 [2026-04-18 12:59:44] Validation | Batch 1340/1567 | Loss: 1.0771 [2026-04-18 12:59:44] Validation | Batch 1350/1567 | Loss: 1.0776 [2026-04-18 12:59:45] Validation | Batch 1360/1567 | Loss: 1.0771 [2026-04-18 12:59:46] Validation | Batch 1370/1567 | Loss: 1.0774 [2026-04-18 12:59:47] Validation | Batch 1380/1567 | Loss: 1.0785 [2026-04-18 12:59:48] Validation | Batch 1390/1567 | Loss: 1.0784 [2026-04-18 12:59:49] Validation | Batch 1400/1567 | Loss: 1.0786 [2026-04-18 12:59:49] Validation | Batch 1410/1567 | Loss: 1.0784 [2026-04-18 12:59:50] Validation | Batch 1420/1567 | Loss: 1.0788 [2026-04-18 12:59:51] Validation | Batch 1430/1567 | Loss: 1.0785 [2026-04-18 12:59:52] Validation | Batch 1440/1567 | Loss: 1.0789 [2026-04-18 12:59:52] Validation | Batch 1450/1567 | Loss: 1.0783 [2026-04-18 12:59:53] Validation | Batch 1460/1567 | Loss: 1.0780 [2026-04-18 12:59:54] Validation | Batch 1470/1567 | Loss: 1.0772 [2026-04-18 12:59:54] Validation | Batch 1480/1567 | Loss: 1.0757 [2026-04-18 12:59:55] Validation | Batch 1490/1567 | Loss: 1.0758 [2026-04-18 12:59:56] Validation | Batch 1500/1567 | Loss: 1.0758 [2026-04-18 12:59:57] Validation | Batch 1510/1567 | Loss: 1.0755 [2026-04-18 12:59:57] Validation | Batch 
1520/1567 | Loss: 1.0748 [2026-04-18 12:59:58] Validation | Batch 1530/1567 | Loss: 1.0758 [2026-04-18 12:59:59] Validation | Batch 1540/1567 | Loss: 1.0768 [2026-04-18 13:00:00] Validation | Batch 1550/1567 | Loss: 1.0770 [2026-04-18 13:00:01] Validation | Batch 1560/1567 | Loss: 1.0760 [2026-04-18 13:00:01] Validation | Batch 1567/1567 | Loss: 1.0763 [2026-04-18 13:00:01] Validation | Loss: 1.0763 | PPL: 2.95 | Time: 125.50s [2026-04-18 13:00:05] New best model saved! Val loss: 1.0763 [2026-04-18 13:00:09] Epoch 1 | Step 5010 | Loss: 1.0159 | LR: 2.00e-05 [2026-04-18 13:00:12] Epoch 1 | Step 5020 | Loss: 1.0156 | LR: 2.00e-05 [2026-04-18 13:00:15] Epoch 1 | Step 5030 | Loss: 1.0156 | LR: 2.00e-05 [2026-04-18 13:00:19] Epoch 1 | Step 5040 | Loss: 1.0154 | LR: 2.00e-05 [2026-04-18 13:00:22] Epoch 1 | Step 5050 | Loss: 1.0153 | LR: 2.00e-05 [2026-04-18 13:00:26] Epoch 1 | Step 5060 | Loss: 1.0152 | LR: 2.00e-05 [2026-04-18 13:00:30] Epoch 1 | Step 5070 | Loss: 1.0152 | LR: 2.00e-05 [2026-04-18 13:00:33] Epoch 1 | Step 5080 | Loss: 1.0154 | LR: 2.00e-05 [2026-04-18 13:00:37] Epoch 1 | Step 5090 | Loss: 1.0154 | LR: 2.00e-05 [2026-04-18 13:00:41] Epoch 1 | Step 5100 | Loss: 1.0152 | LR: 2.00e-05 [2026-04-18 13:00:44] Epoch 1 | Step 5110 | Loss: 1.0151 | LR: 2.00e-05 [2026-04-18 13:00:48] Epoch 1 | Step 5120 | Loss: 1.0152 | LR: 2.00e-05 [2026-04-18 13:00:52] Epoch 1 | Step 5130 | Loss: 1.0151 | LR: 2.00e-05 [2026-04-18 13:00:55] Epoch 1 | Step 5140 | Loss: 1.0149 | LR: 2.00e-05 [2026-04-18 13:00:59] Epoch 1 | Step 5150 | Loss: 1.0148 | LR: 2.00e-05 [2026-04-18 13:01:02] Epoch 1 | Step 5160 | Loss: 1.0143 | LR: 2.00e-05 [2026-04-18 13:01:06] Epoch 1 | Step 5170 | Loss: 1.0143 | LR: 2.00e-05 [2026-04-18 13:01:10] Epoch 1 | Step 5180 | Loss: 1.0142 | LR: 2.00e-05 [2026-04-18 13:01:13] Epoch 1 | Step 5190 | Loss: 1.0143 | LR: 2.00e-05 [2026-04-18 13:01:17] Epoch 1 | Step 5200 | Loss: 1.0142 | LR: 2.00e-05 [2026-04-18 13:01:21] Epoch 1 | Step 5210 | Loss: 1.0141 | LR: 
2.00e-05 [2026-04-18 13:01:25] Epoch 1 | Step 5220 | Loss: 1.0141 | LR: 2.00e-05 [2026-04-18 13:01:28] Epoch 1 | Step 5230 | Loss: 1.0140 | LR: 2.00e-05 [2026-04-18 13:01:32] Epoch 1 | Step 5240 | Loss: 1.0140 | LR: 2.00e-05 [2026-04-18 13:01:36] Epoch 1 | Step 5250 | Loss: 1.0140 | LR: 2.00e-05 [2026-04-18 13:01:39] Epoch 1 | Step 5260 | Loss: 1.0140 | LR: 2.00e-05 [2026-04-18 13:01:43] Epoch 1 | Step 5270 | Loss: 1.0140 | LR: 2.00e-05 [2026-04-18 13:01:47] Epoch 1 | Step 5280 | Loss: 1.0137 | LR: 2.00e-05 [2026-04-18 13:01:51] Epoch 1 | Step 5290 | Loss: 1.0134 | LR: 2.00e-05 [2026-04-18 13:01:54] Epoch 1 | Step 5300 | Loss: 1.0133 | LR: 2.00e-05 [2026-04-18 13:01:57] Epoch 1 | Step 5310 | Loss: 1.0134 | LR: 2.00e-05 [2026-04-18 13:02:01] Epoch 1 | Step 5320 | Loss: 1.0133 | LR: 2.00e-05 [2026-04-18 13:02:04] Epoch 1 | Step 5330 | Loss: 1.0132 | LR: 2.00e-05 [2026-04-18 13:02:08] Epoch 1 | Step 5340 | Loss: 1.0131 | LR: 2.00e-05 [2026-04-18 13:02:11] Epoch 1 | Step 5350 | Loss: 1.0129 | LR: 2.00e-05 [2026-04-18 13:02:15] Epoch 1 | Step 5360 | Loss: 1.0131 | LR: 2.00e-05 [2026-04-18 13:02:18] Epoch 1 | Step 5370 | Loss: 1.0130 | LR: 2.00e-05 [2026-04-18 13:02:22] Epoch 1 | Step 5380 | Loss: 1.0129 | LR: 2.00e-05 [2026-04-18 13:02:25] Epoch 1 | Step 5390 | Loss: 1.0127 | LR: 2.00e-05 [2026-04-18 13:02:29] Epoch 1 | Step 5400 | Loss: 1.0124 | LR: 2.00e-05 [2026-04-18 13:02:33] Epoch 1 | Step 5410 | Loss: 1.0124 | LR: 2.00e-05 [2026-04-18 13:02:37] Epoch 1 | Step 5420 | Loss: 1.0123 | LR: 2.00e-05 [2026-04-18 13:02:40] Epoch 1 | Step 5430 | Loss: 1.0123 | LR: 2.00e-05 [2026-04-18 13:02:43] Epoch 1 | Step 5440 | Loss: 1.0124 | LR: 2.00e-05 [2026-04-18 13:02:47] Epoch 1 | Step 5450 | Loss: 1.0126 | LR: 2.00e-05 [2026-04-18 13:02:51] Epoch 1 | Step 5460 | Loss: 1.0124 | LR: 2.00e-05 [2026-04-18 13:02:54] Epoch 1 | Step 5470 | Loss: 1.0123 | LR: 2.00e-05 [2026-04-18 13:02:58] Epoch 1 | Step 5480 | Loss: 1.0123 | LR: 2.00e-05 [2026-04-18 13:03:01] Epoch 1 | Step 5490 | 
Loss: 1.0123 | LR: 2.00e-05 [2026-04-18 13:03:05] Epoch 1 | Step 5500 | Loss: 1.0122 | LR: 2.00e-05 [2026-04-18 13:03:08] Epoch 1 | Step 5510 | Loss: 1.0124 | LR: 2.00e-05 [2026-04-18 13:03:12] Epoch 1 | Step 5520 | Loss: 1.0122 | LR: 2.00e-05 [2026-04-18 13:03:15] Epoch 1 | Step 5530 | Loss: 1.0121 | LR: 2.00e-05 [2026-04-18 13:03:19] Epoch 1 | Step 5540 | Loss: 1.0118 | LR: 2.00e-05 [2026-04-18 13:03:22] Epoch 1 | Step 5550 | Loss: 1.0117 | LR: 2.00e-05 [2026-04-18 13:03:26] Epoch 1 | Step 5560 | Loss: 1.0117 | LR: 2.00e-05 [2026-04-18 13:03:30] Epoch 1 | Step 5570 | Loss: 1.0120 | LR: 2.00e-05 [2026-04-18 13:03:34] Epoch 1 | Step 5580 | Loss: 1.0118 | LR: 2.00e-05 [2026-04-18 13:03:37] Epoch 1 | Step 5590 | Loss: 1.0116 | LR: 2.00e-05 [2026-04-18 13:03:41] Epoch 1 | Step 5600 | Loss: 1.0118 | LR: 2.00e-05 [2026-04-18 13:03:45] Epoch 1 | Step 5610 | Loss: 1.0119 | LR: 2.00e-05 [2026-04-18 13:03:48] Epoch 1 | Step 5620 | Loss: 1.0117 | LR: 2.00e-05 [2026-04-18 13:03:51] Epoch 1 | Step 5630 | Loss: 1.0117 | LR: 2.00e-05 [2026-04-18 13:03:55] Epoch 1 | Step 5640 | Loss: 1.0116 | LR: 2.00e-05 [2026-04-18 13:03:59] Epoch 1 | Step 5650 | Loss: 1.0116 | LR: 2.00e-05 [2026-04-18 13:04:02] Epoch 1 | Step 5660 | Loss: 1.0114 | LR: 2.00e-05 [2026-04-18 13:04:05] Epoch 1 | Step 5670 | Loss: 1.0112 | LR: 2.00e-05 [2026-04-18 13:04:09] Epoch 1 | Step 5680 | Loss: 1.0110 | LR: 2.00e-05 [2026-04-18 13:04:12] Epoch 1 | Step 5690 | Loss: 1.0111 | LR: 2.00e-05 [2026-04-18 13:04:16] Epoch 1 | Step 5700 | Loss: 1.0109 | LR: 2.00e-05 [2026-04-18 13:04:20] Epoch 1 | Step 5710 | Loss: 1.0110 | LR: 2.00e-05 [2026-04-18 13:04:24] Epoch 1 | Step 5720 | Loss: 1.0110 | LR: 2.00e-05 [2026-04-18 13:04:27] Epoch 1 | Step 5730 | Loss: 1.0109 | LR: 2.00e-05 [2026-04-18 13:04:31] Epoch 1 | Step 5740 | Loss: 1.0110 | LR: 2.00e-05 [2026-04-18 13:04:34] Epoch 1 | Step 5750 | Loss: 1.0109 | LR: 2.00e-05 [2026-04-18 13:04:39] Epoch 1 | Step 5760 | Loss: 1.0109 | LR: 2.00e-05 [2026-04-18 13:04:42] Epoch 
1 | Step 5770 | Loss: 1.0109 | LR: 2.00e-05 [2026-04-18 13:04:45] Epoch 1 | Step 5780 | Loss: 1.0108 | LR: 2.00e-05 [2026-04-18 13:04:49] Epoch 1 | Step 5790 | Loss: 1.0109 | LR: 2.00e-05 [2026-04-18 13:04:52] Epoch 1 | Step 5800 | Loss: 1.0111 | LR: 2.00e-05 [2026-04-18 13:04:56] Epoch 1 | Step 5810 | Loss: 1.0111 | LR: 2.00e-05 [2026-04-18 13:05:00] Epoch 1 | Step 5820 | Loss: 1.0108 | LR: 2.00e-05 [2026-04-18 13:05:03] Epoch 1 | Step 5830 | Loss: 1.0107 | LR: 2.00e-05 [2026-04-18 13:05:06] Epoch 1 | Step 5840 | Loss: 1.0108 | LR: 2.00e-05 [2026-04-18 13:05:10] Epoch 1 | Step 5850 | Loss: 1.0109 | LR: 2.00e-05 [2026-04-18 13:05:13] Epoch 1 | Step 5860 | Loss: 1.0108 | LR: 2.00e-05 [2026-04-18 13:05:16] Epoch 1 | Step 5870 | Loss: 1.0109 | LR: 2.00e-05 [2026-04-18 13:05:20] Epoch 1 | Step 5880 | Loss: 1.0110 | LR: 2.00e-05 [2026-04-18 13:05:24] Epoch 1 | Step 5890 | Loss: 1.0110 | LR: 2.00e-05 [2026-04-18 13:05:28] Epoch 1 | Step 5900 | Loss: 1.0109 | LR: 2.00e-05 [2026-04-18 13:05:32] Epoch 1 | Step 5910 | Loss: 1.0109 | LR: 2.00e-05 [2026-04-18 13:05:35] Epoch 1 | Step 5920 | Loss: 1.0107 | LR: 2.00e-05 [2026-04-18 13:05:39] Epoch 1 | Step 5930 | Loss: 1.0108 | LR: 2.00e-05 [2026-04-18 13:05:42] Epoch 1 | Step 5940 | Loss: 1.0106 | LR: 2.00e-05 [2026-04-18 13:05:45] Epoch 1 | Step 5950 | Loss: 1.0106 | LR: 2.00e-05 [2026-04-18 13:05:49] Epoch 1 | Step 5960 | Loss: 1.0106 | LR: 2.00e-05 [2026-04-18 13:05:52] Epoch 1 | Step 5970 | Loss: 1.0107 | LR: 2.00e-05 [2026-04-18 13:05:56] Epoch 1 | Step 5980 | Loss: 1.0106 | LR: 2.00e-05 [2026-04-18 13:06:00] Epoch 1 | Step 5990 | Loss: 1.0108 | LR: 2.00e-05 [2026-04-18 13:06:03] Epoch 1 | Step 6000 | Loss: 1.0106 | LR: 2.00e-05 [2026-04-18 13:06:13] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_6000.pt [2026-04-18 13:06:29] Validation | Batch 10/1567 | Loss: 0.9562 [2026-04-18 13:06:30] Validation | Batch 20/1567 | Loss: 1.0115 [2026-04-18 13:06:31] Validation | Batch 30/1567 | Loss: 1.0531 
[2026-04-18 13:06:32] Validation | Batch 40/1567 | Loss: 1.0742 [2026-04-18 13:06:32] Validation | Batch 50/1567 | Loss: 1.0569 [2026-04-18 13:06:33] Validation | Batch 60/1567 | Loss: 1.0446 [2026-04-18 13:06:34] Validation | Batch 70/1567 | Loss: 1.0299 [2026-04-18 13:06:35] Validation | Batch 80/1567 | Loss: 1.0459 [2026-04-18 13:06:36] Validation | Batch 90/1567 | Loss: 1.0535 [2026-04-18 13:06:37] Validation | Batch 100/1567 | Loss: 1.0623 [2026-04-18 13:06:38] Validation | Batch 110/1567 | Loss: 1.0544 [2026-04-18 13:06:39] Validation | Batch 120/1567 | Loss: 1.0638 [2026-04-18 13:06:39] Validation | Batch 130/1567 | Loss: 1.0656 [2026-04-18 13:06:40] Validation | Batch 140/1567 | Loss: 1.0683 [2026-04-18 13:06:41] Validation | Batch 150/1567 | Loss: 1.0774 [2026-04-18 13:06:42] Validation | Batch 160/1567 | Loss: 1.0782 [2026-04-18 13:06:43] Validation | Batch 170/1567 | Loss: 1.0635 [2026-04-18 13:06:43] Validation | Batch 180/1567 | Loss: 1.0651 [2026-04-18 13:06:44] Validation | Batch 190/1567 | Loss: 1.0628 [2026-04-18 13:06:45] Validation | Batch 200/1567 | Loss: 1.0659 [2026-04-18 13:06:46] Validation | Batch 210/1567 | Loss: 1.0675 [2026-04-18 13:06:47] Validation | Batch 220/1567 | Loss: 1.0711 [2026-04-18 13:06:48] Validation | Batch 230/1567 | Loss: 1.0740 [2026-04-18 13:06:49] Validation | Batch 240/1567 | Loss: 1.0718 [2026-04-18 13:06:49] Validation | Batch 250/1567 | Loss: 1.0660 [2026-04-18 13:06:50] Validation | Batch 260/1567 | Loss: 1.0615 [2026-04-18 13:06:51] Validation | Batch 270/1567 | Loss: 1.0584 [2026-04-18 13:06:51] Validation | Batch 280/1567 | Loss: 1.0592 [2026-04-18 13:06:52] Validation | Batch 290/1567 | Loss: 1.0647 [2026-04-18 13:06:53] Validation | Batch 300/1567 | Loss: 1.0705 [2026-04-18 13:06:54] Validation | Batch 310/1567 | Loss: 1.0698 [2026-04-18 13:06:55] Validation | Batch 320/1567 | Loss: 1.0693 [2026-04-18 13:06:56] Validation | Batch 330/1567 | Loss: 1.0671 [2026-04-18 13:06:57] Validation | Batch 340/1567 | 
Loss: 1.0707 [2026-04-18 13:06:57] Validation | Batch 350/1567 | Loss: 1.0702 [2026-04-18 13:06:58] Validation | Batch 360/1567 | Loss: 1.0685 [2026-04-18 13:06:59] Validation | Batch 370/1567 | Loss: 1.0660 [2026-04-18 13:07:00] Validation | Batch 380/1567 | Loss: 1.0690 [2026-04-18 13:07:00] Validation | Batch 390/1567 | Loss: 1.0701 [2026-04-18 13:07:01] Validation | Batch 400/1567 | Loss: 1.0714 [2026-04-18 13:07:02] Validation | Batch 410/1567 | Loss: 1.0705 [2026-04-18 13:07:03] Validation | Batch 420/1567 | Loss: 1.0703 [2026-04-18 13:07:04] Validation | Batch 430/1567 | Loss: 1.0705 [2026-04-18 13:07:05] Validation | Batch 440/1567 | Loss: 1.0695 [2026-04-18 13:07:05] Validation | Batch 450/1567 | Loss: 1.0696 [2026-04-18 13:07:06] Validation | Batch 460/1567 | Loss: 1.0684 [2026-04-18 13:07:07] Validation | Batch 470/1567 | Loss: 1.0669 [2026-04-18 13:07:08] Validation | Batch 480/1567 | Loss: 1.0649 [2026-04-18 13:07:09] Validation | Batch 490/1567 | Loss: 1.0654 [2026-04-18 13:07:09] Validation | Batch 500/1567 | Loss: 1.0654 [2026-04-18 13:07:10] Validation | Batch 510/1567 | Loss: 1.0673 [2026-04-18 13:07:11] Validation | Batch 520/1567 | Loss: 1.0688 [2026-04-18 13:07:12] Validation | Batch 530/1567 | Loss: 1.0685 [2026-04-18 13:07:13] Validation | Batch 540/1567 | Loss: 1.0705 [2026-04-18 13:07:14] Validation | Batch 550/1567 | Loss: 1.0738 [2026-04-18 13:07:14] Validation | Batch 560/1567 | Loss: 1.0736 [2026-04-18 13:07:15] Validation | Batch 570/1567 | Loss: 1.0735 [2026-04-18 13:07:17] Validation | Batch 580/1567 | Loss: 1.0724 [2026-04-18 13:07:18] Validation | Batch 590/1567 | Loss: 1.0711 [2026-04-18 13:07:18] Validation | Batch 600/1567 | Loss: 1.0693 [2026-04-18 13:07:20] Validation | Batch 610/1567 | Loss: 1.0682 [2026-04-18 13:07:21] Validation | Batch 620/1567 | Loss: 1.0695 [2026-04-18 13:07:21] Validation | Batch 630/1567 | Loss: 1.0674 [2026-04-18 13:07:22] Validation | Batch 640/1567 | Loss: 1.0687 [2026-04-18 13:07:23] Validation | 
Batch 650/1567 | Loss: 1.0675 [2026-04-18 13:07:24] Validation | Batch 660/1567 | Loss: 1.0661 [2026-04-18 13:07:24] Validation | Batch 670/1567 | Loss: 1.0641 [2026-04-18 13:07:25] Validation | Batch 680/1567 | Loss: 1.0637 [2026-04-18 13:07:26] Validation | Batch 690/1567 | Loss: 1.0648 [2026-04-18 13:07:27] Validation | Batch 700/1567 | Loss: 1.0633 [2026-04-18 13:07:28] Validation | Batch 710/1567 | Loss: 1.0645 [2026-04-18 13:07:28] Validation | Batch 720/1567 | Loss: 1.0639 [2026-04-18 13:07:29] Validation | Batch 730/1567 | Loss: 1.0647 [2026-04-18 13:07:30] Validation | Batch 740/1567 | Loss: 1.0656 [2026-04-18 13:07:31] Validation | Batch 750/1567 | Loss: 1.0655 [2026-04-18 13:07:31] Validation | Batch 760/1567 | Loss: 1.0653 [2026-04-18 13:07:32] Validation | Batch 770/1567 | Loss: 1.0672 [2026-04-18 13:07:33] Validation | Batch 780/1567 | Loss: 1.0684 [2026-04-18 13:07:34] Validation | Batch 790/1567 | Loss: 1.0679 [2026-04-18 13:07:35] Validation | Batch 800/1567 | Loss: 1.0696 [2026-04-18 13:07:35] Validation | Batch 810/1567 | Loss: 1.0697 [2026-04-18 13:07:36] Validation | Batch 820/1567 | Loss: 1.0694 [2026-04-18 13:07:37] Validation | Batch 830/1567 | Loss: 1.0678 [2026-04-18 13:07:38] Validation | Batch 840/1567 | Loss: 1.0680 [2026-04-18 13:07:39] Validation | Batch 850/1567 | Loss: 1.0667 [2026-04-18 13:07:39] Validation | Batch 860/1567 | Loss: 1.0681 [2026-04-18 13:07:40] Validation | Batch 870/1567 | Loss: 1.0687 [2026-04-18 13:07:41] Validation | Batch 880/1567 | Loss: 1.0694 [2026-04-18 13:07:41] Validation | Batch 890/1567 | Loss: 1.0698 [2026-04-18 13:07:42] Validation | Batch 900/1567 | Loss: 1.0718 [2026-04-18 13:07:43] Validation | Batch 910/1567 | Loss: 1.0721 [2026-04-18 13:07:44] Validation | Batch 920/1567 | Loss: 1.0737 [2026-04-18 13:07:44] Validation | Batch 930/1567 | Loss: 1.0716 [2026-04-18 13:07:45] Validation | Batch 940/1567 | Loss: 1.0715 [2026-04-18 13:07:46] Validation | Batch 950/1567 | Loss: 1.0705 [2026-04-18 
13:07:46] Validation | Batch 960/1567 | Loss: 1.0692 [2026-04-18 13:07:47] Validation | Batch 970/1567 | Loss: 1.0706 [2026-04-18 13:07:48] Validation | Batch 980/1567 | Loss: 1.0710 [2026-04-18 13:07:49] Validation | Batch 990/1567 | Loss: 1.0707 [2026-04-18 13:07:49] Validation | Batch 1000/1567 | Loss: 1.0707 [2026-04-18 13:07:50] Validation | Batch 1010/1567 | Loss: 1.0686 [2026-04-18 13:07:51] Validation | Batch 1020/1567 | Loss: 1.0688 [2026-04-18 13:07:52] Validation | Batch 1030/1567 | Loss: 1.0701 [2026-04-18 13:07:53] Validation | Batch 1040/1567 | Loss: 1.0696 [2026-04-18 13:07:53] Validation | Batch 1050/1567 | Loss: 1.0704 [2026-04-18 13:07:54] Validation | Batch 1060/1567 | Loss: 1.0697 [2026-04-18 13:07:55] Validation | Batch 1070/1567 | Loss: 1.0690 [2026-04-18 13:07:56] Validation | Batch 1080/1567 | Loss: 1.0699 [2026-04-18 13:07:57] Validation | Batch 1090/1567 | Loss: 1.0697 [2026-04-18 13:07:57] Validation | Batch 1100/1567 | Loss: 1.0701 [2026-04-18 13:07:58] Validation | Batch 1110/1567 | Loss: 1.0699 [2026-04-18 13:07:59] Validation | Batch 1120/1567 | Loss: 1.0702 [2026-04-18 13:08:00] Validation | Batch 1130/1567 | Loss: 1.0705 [2026-04-18 13:08:00] Validation | Batch 1140/1567 | Loss: 1.0710 [2026-04-18 13:08:01] Validation | Batch 1150/1567 | Loss: 1.0711 [2026-04-18 13:08:02] Validation | Batch 1160/1567 | Loss: 1.0720 [2026-04-18 13:08:03] Validation | Batch 1170/1567 | Loss: 1.0717 [2026-04-18 13:08:04] Validation | Batch 1180/1567 | Loss: 1.0715 [2026-04-18 13:08:05] Validation | Batch 1190/1567 | Loss: 1.0726 [2026-04-18 13:08:05] Validation | Batch 1200/1567 | Loss: 1.0720 [2026-04-18 13:08:06] Validation | Batch 1210/1567 | Loss: 1.0712 [2026-04-18 13:08:07] Validation | Batch 1220/1567 | Loss: 1.0714 [2026-04-18 13:08:08] Validation | Batch 1230/1567 | Loss: 1.0733 [2026-04-18 13:08:08] Validation | Batch 1240/1567 | Loss: 1.0723 [2026-04-18 13:08:09] Validation | Batch 1250/1567 | Loss: 1.0721 [2026-04-18 13:08:10] Validation | 
Batch 1260/1567 | Loss: 1.0730 [2026-04-18 13:08:11] Validation | Batch 1270/1567 | Loss: 1.0730 [2026-04-18 13:08:12] Validation | Batch 1280/1567 | Loss: 1.0724 [2026-04-18 13:08:13] Validation | Batch 1290/1567 | Loss: 1.0726 [2026-04-18 13:08:14] Validation | Batch 1300/1567 | Loss: 1.0728 [2026-04-18 13:08:15] Validation | Batch 1310/1567 | Loss: 1.0733 [2026-04-18 13:08:15] Validation | Batch 1320/1567 | Loss: 1.0723 [2026-04-18 13:08:16] Validation | Batch 1330/1567 | Loss: 1.0719 [2026-04-18 13:08:17] Validation | Batch 1340/1567 | Loss: 1.0715 [2026-04-18 13:08:18] Validation | Batch 1350/1567 | Loss: 1.0721 [2026-04-18 13:08:18] Validation | Batch 1360/1567 | Loss: 1.0715 [2026-04-18 13:08:19] Validation | Batch 1370/1567 | Loss: 1.0719 [2026-04-18 13:08:20] Validation | Batch 1380/1567 | Loss: 1.0731 [2026-04-18 13:08:21] Validation | Batch 1390/1567 | Loss: 1.0730 [2026-04-18 13:08:21] Validation | Batch 1400/1567 | Loss: 1.0731 [2026-04-18 13:08:22] Validation | Batch 1410/1567 | Loss: 1.0728 [2026-04-18 13:08:23] Validation | Batch 1420/1567 | Loss: 1.0730 [2026-04-18 13:08:23] Validation | Batch 1430/1567 | Loss: 1.0726 [2026-04-18 13:08:24] Validation | Batch 1440/1567 | Loss: 1.0729 [2026-04-18 13:08:25] Validation | Batch 1450/1567 | Loss: 1.0723 [2026-04-18 13:08:26] Validation | Batch 1460/1567 | Loss: 1.0720 [2026-04-18 13:08:27] Validation | Batch 1470/1567 | Loss: 1.0711 [2026-04-18 13:08:27] Validation | Batch 1480/1567 | Loss: 1.0696 [2026-04-18 13:08:28] Validation | Batch 1490/1567 | Loss: 1.0696 [2026-04-18 13:08:29] Validation | Batch 1500/1567 | Loss: 1.0697 [2026-04-18 13:08:30] Validation | Batch 1510/1567 | Loss: 1.0693 [2026-04-18 13:08:30] Validation | Batch 1520/1567 | Loss: 1.0687 [2026-04-18 13:08:31] Validation | Batch 1530/1567 | Loss: 1.0696 [2026-04-18 13:08:32] Validation | Batch 1540/1567 | Loss: 1.0705 [2026-04-18 13:08:33] Validation | Batch 1550/1567 | Loss: 1.0708 [2026-04-18 13:08:34] Validation | Batch 1560/1567 | 
Loss: 1.0697 [2026-04-18 13:08:34] Validation | Batch 1567/1567 | Loss: 1.0699 [2026-04-18 13:08:34] Validation | Loss: 1.0699 | PPL: 2.94 | Time: 126.16s [2026-04-18 13:08:38] New best model saved! Val loss: 1.0699 [2026-04-18 13:08:42] Epoch 1 | Step 6010 | Loss: 1.0107 | LR: 2.00e-05 [2026-04-18 13:08:45] Epoch 1 | Step 6020 | Loss: 1.0105 | LR: 2.00e-05 [2026-04-18 13:08:49] Epoch 1 | Step 6030 | Loss: 1.0106 | LR: 2.00e-05 [2026-04-18 13:08:53] Epoch 1 | Step 6040 | Loss: 1.0106 | LR: 2.00e-05 [2026-04-18 13:08:57] Epoch 1 | Step 6050 | Loss: 1.0107 | LR: 2.00e-05 [2026-04-18 13:09:00] Epoch 1 | Step 6060 | Loss: 1.0107 | LR: 2.00e-05 [2026-04-18 13:09:04] Epoch 1 | Step 6070 | Loss: 1.0105 | LR: 2.00e-05 [2026-04-18 13:09:07] Epoch 1 | Step 6080 | Loss: 1.0105 | LR: 2.00e-05 [2026-04-18 13:09:11] Epoch 1 | Step 6090 | Loss: 1.0107 | LR: 2.00e-05 [2026-04-18 13:09:15] Epoch 1 | Step 6100 | Loss: 1.0108 | LR: 2.00e-05 [2026-04-18 13:09:19] Epoch 1 | Step 6110 | Loss: 1.0108 | LR: 2.00e-05 [2026-04-18 13:09:22] Epoch 1 | Step 6120 | Loss: 1.0107 | LR: 2.00e-05 [2026-04-18 13:09:26] Epoch 1 | Step 6130 | Loss: 1.0106 | LR: 2.00e-05 [2026-04-18 13:09:29] Epoch 1 | Step 6140 | Loss: 1.0101 | LR: 2.00e-05 [2026-04-18 13:09:33] Epoch 1 | Step 6150 | Loss: 1.0101 | LR: 2.00e-05 [2026-04-18 13:09:36] Epoch 1 | Step 6160 | Loss: 1.0101 | LR: 2.00e-05 [2026-04-18 13:09:40] Epoch 1 | Step 6170 | Loss: 1.0102 | LR: 2.00e-05 [2026-04-18 13:09:43] Epoch 1 | Step 6180 | Loss: 1.0099 | LR: 2.00e-05 [2026-04-18 13:09:47] Epoch 1 | Step 6190 | Loss: 1.0098 | LR: 2.00e-05 [2026-04-18 13:09:51] Epoch 1 | Step 6200 | Loss: 1.0096 | LR: 2.00e-05 [2026-04-18 13:09:54] Epoch 1 | Step 6210 | Loss: 1.0096 | LR: 2.00e-05 [2026-04-18 13:09:58] Epoch 1 | Step 6220 | Loss: 1.0096 | LR: 2.00e-05 [2026-04-18 13:10:02] Epoch 1 | Step 6230 | Loss: 1.0095 | LR: 2.00e-05 [2026-04-18 13:10:06] Epoch 1 | Step 6240 | Loss: 1.0095 | LR: 2.00e-05 [2026-04-18 13:10:09] Epoch 1 | Step 6250 | Loss: 
1.0092 | LR: 2.00e-05 [2026-04-18 13:10:13] Epoch 1 | Step 6260 | Loss: 1.0092 | LR: 2.00e-05 [2026-04-18 13:10:16] Epoch 1 | Step 6270 | Loss: 1.0092 | LR: 2.00e-05 [2026-04-18 13:10:20] Epoch 1 | Step 6280 | Loss: 1.0090 | LR: 2.00e-05 [2026-04-18 13:10:24] Epoch 1 | Step 6290 | Loss: 1.0089 | LR: 2.00e-05 [2026-04-18 13:10:27] Epoch 1 | Step 6300 | Loss: 1.0089 | LR: 2.00e-05 [2026-04-18 13:10:31] Epoch 1 | Step 6310 | Loss: 1.0090 | LR: 2.00e-05 [2026-04-18 13:10:34] Epoch 1 | Step 6320 | Loss: 1.0089 | LR: 2.00e-05 [2026-04-18 13:10:39] Epoch 1 | Step 6330 | Loss: 1.0091 | LR: 2.00e-05 [2026-04-18 13:10:42] Epoch 1 | Step 6340 | Loss: 1.0092 | LR: 2.00e-05 [2026-04-18 13:10:45] Epoch 1 | Step 6350 | Loss: 1.0092 | LR: 2.00e-05 [2026-04-18 13:10:49] Epoch 1 | Step 6360 | Loss: 1.0093 | LR: 2.00e-05 [2026-04-18 13:10:52] Epoch 1 | Step 6370 | Loss: 1.0093 | LR: 2.00e-05 [2026-04-18 13:10:56] Epoch 1 | Step 6380 | Loss: 1.0092 | LR: 2.00e-05 [2026-04-18 13:10:59] Epoch 1 | Step 6390 | Loss: 1.0091 | LR: 2.00e-05 [2026-04-18 13:11:03] Epoch 1 | Step 6400 | Loss: 1.0090 | LR: 2.00e-05 [2026-04-18 13:11:06] Epoch 1 | Step 6410 | Loss: 1.0089 | LR: 2.00e-05 [2026-04-18 13:11:09] Epoch 1 | Step 6420 | Loss: 1.0088 | LR: 2.00e-05 [2026-04-18 13:11:13] Epoch 1 | Step 6430 | Loss: 1.0087 | LR: 2.00e-05 [2026-04-18 13:11:16] Epoch 1 | Step 6440 | Loss: 1.0087 | LR: 2.00e-05 [2026-04-18 13:11:20] Epoch 1 | Step 6450 | Loss: 1.0086 | LR: 2.00e-05 [2026-04-18 13:11:23] Epoch 1 | Step 6460 | Loss: 1.0082 | LR: 2.00e-05 [2026-04-18 13:11:27] Epoch 1 | Step 6470 | Loss: 1.0082 | LR: 2.00e-05 [2026-04-18 13:11:30] Epoch 1 | Step 6480 | Loss: 1.0083 | LR: 2.00e-05 [2026-04-18 13:11:34] Epoch 1 | Step 6490 | Loss: 1.0084 | LR: 2.00e-05 [2026-04-18 13:11:37] Epoch 1 | Step 6500 | Loss: 1.0081 | LR: 2.00e-05 [2026-04-18 13:11:41] Epoch 1 | Step 6510 | Loss: 1.0080 | LR: 2.00e-05 [2026-04-18 13:11:44] Epoch 1 | Step 6520 | Loss: 1.0078 | LR: 2.00e-05 [2026-04-18 13:11:47] Epoch 1 | 
Step 6530 | Loss: 1.0075 | LR: 2.00e-05 [2026-04-18 13:11:51] Epoch 1 | Step 6540 | Loss: 1.0074 | LR: 2.00e-05 [2026-04-18 13:11:54] Epoch 1 | Step 6550 | Loss: 1.0073 | LR: 2.00e-05 [2026-04-18 13:11:57] Epoch 1 | Step 6560 | Loss: 1.0073 | LR: 2.00e-05 [2026-04-18 13:12:01] Epoch 1 | Step 6570 | Loss: 1.0073 | LR: 2.00e-05 [2026-04-18 13:12:05] Epoch 1 | Step 6580 | Loss: 1.0072 | LR: 2.00e-05 [2026-04-18 13:12:08] Epoch 1 | Step 6590 | Loss: 1.0071 | LR: 2.00e-05 [2026-04-18 13:12:12] Epoch 1 | Step 6600 | Loss: 1.0070 | LR: 2.00e-05 [2026-04-18 13:12:16] Epoch 1 | Step 6610 | Loss: 1.0069 | LR: 2.00e-05 [2026-04-18 13:12:19] Epoch 1 | Step 6620 | Loss: 1.0068 | LR: 2.00e-05 [2026-04-18 13:12:22] Epoch 1 | Step 6630 | Loss: 1.0068 | LR: 2.00e-05 [2026-04-18 13:12:26] Epoch 1 | Step 6640 | Loss: 1.0068 | LR: 2.00e-05 [2026-04-18 13:12:29] Epoch 1 | Step 6650 | Loss: 1.0069 | LR: 2.00e-05 [2026-04-18 13:12:33] Epoch 1 | Step 6660 | Loss: 1.0067 | LR: 2.00e-05 [2026-04-18 13:12:36] Epoch 1 | Step 6670 | Loss: 1.0067 | LR: 2.00e-05 [2026-04-18 13:12:40] Epoch 1 | Step 6680 | Loss: 1.0067 | LR: 2.00e-05 [2026-04-18 13:12:43] Epoch 1 | Step 6690 | Loss: 1.0067 | LR: 2.00e-05 [2026-04-18 13:12:47] Epoch 1 | Step 6700 | Loss: 1.0066 | LR: 2.00e-05 [2026-04-18 13:12:50] Epoch 1 | Step 6710 | Loss: 1.0066 | LR: 2.00e-05 [2026-04-18 13:12:53] Epoch 1 | Step 6720 | Loss: 1.0065 | LR: 2.00e-05 [2026-04-18 13:12:57] Epoch 1 | Step 6730 | Loss: 1.0067 | LR: 2.00e-05 [2026-04-18 13:13:01] Epoch 1 | Step 6740 | Loss: 1.0066 | LR: 2.00e-05 [2026-04-18 13:13:04] Epoch 1 | Step 6750 | Loss: 1.0064 | LR: 2.00e-05 [2026-04-18 13:13:07] Epoch 1 | Step 6760 | Loss: 1.0065 | LR: 2.00e-05 [2026-04-18 13:13:11] Epoch 1 | Step 6770 | Loss: 1.0064 | LR: 2.00e-05 [2026-04-18 13:13:14] Epoch 1 | Step 6780 | Loss: 1.0064 | LR: 2.00e-05 [2026-04-18 13:13:18] Epoch 1 | Step 6790 | Loss: 1.0064 | LR: 2.00e-05 [2026-04-18 13:13:21] Epoch 1 | Step 6800 | Loss: 1.0066 | LR: 2.00e-05 [2026-04-18 
13:13:25] Epoch 1 | Step 6810 | Loss: 1.0066 | LR: 2.00e-05 [2026-04-18 13:13:28] Epoch 1 | Step 6820 | Loss: 1.0067 | LR: 2.00e-05 [2026-04-18 13:13:32] Epoch 1 | Step 6830 | Loss: 1.0068 | LR: 2.00e-05 [2026-04-18 13:13:36] Epoch 1 | Step 6840 | Loss: 1.0070 | LR: 2.00e-05 [2026-04-18 13:13:39] Epoch 1 | Step 6850 | Loss: 1.0069 | LR: 2.00e-05 [2026-04-18 13:13:43] Epoch 1 | Step 6860 | Loss: 1.0068 | LR: 2.00e-05 [2026-04-18 13:13:46] Epoch 1 | Step 6870 | Loss: 1.0068 | LR: 2.00e-05 [2026-04-18 13:13:50] Epoch 1 | Step 6880 | Loss: 1.0067 | LR: 2.00e-05 [2026-04-18 13:13:53] Epoch 1 | Step 6890 | Loss: 1.0069 | LR: 2.00e-05 [2026-04-18 13:13:57] Epoch 1 | Step 6900 | Loss: 1.0068 | LR: 2.00e-05 [2026-04-18 13:14:00] Epoch 1 | Step 6910 | Loss: 1.0065 | LR: 2.00e-05 [2026-04-18 13:14:04] Epoch 1 | Step 6920 | Loss: 1.0065 | LR: 2.00e-05 [2026-04-18 13:14:07] Epoch 1 | Step 6930 | Loss: 1.0065 | LR: 2.00e-05 [2026-04-18 13:14:11] Epoch 1 | Step 6940 | Loss: 1.0063 | LR: 2.00e-05 [2026-04-18 13:14:15] Epoch 1 | Step 6950 | Loss: 1.0062 | LR: 2.00e-05 [2026-04-18 13:14:18] Epoch 1 | Step 6960 | Loss: 1.0063 | LR: 2.00e-05 [2026-04-18 13:14:22] Epoch 1 | Step 6970 | Loss: 1.0062 | LR: 2.00e-05 [2026-04-18 13:14:25] Epoch 1 | Step 6980 | Loss: 1.0062 | LR: 2.00e-05 [2026-04-18 13:14:29] Epoch 1 | Step 6990 | Loss: 1.0059 | LR: 2.00e-05 [2026-04-18 13:14:32] Epoch 1 | Step 7000 | Loss: 1.0058 | LR: 2.00e-05 [2026-04-18 13:14:33] Validation | Batch 10/1567 | Loss: 0.9349 [2026-04-18 13:14:34] Validation | Batch 20/1567 | Loss: 1.0120 [2026-04-18 13:14:35] Validation | Batch 30/1567 | Loss: 1.0616 [2026-04-18 13:14:36] Validation | Batch 40/1567 | Loss: 1.0793 [2026-04-18 13:14:36] Validation | Batch 50/1567 | Loss: 1.0531 [2026-04-18 13:14:37] Validation | Batch 60/1567 | Loss: 1.0398 [2026-04-18 13:14:38] Validation | Batch 70/1567 | Loss: 1.0256 [2026-04-18 13:14:39] Validation | Batch 80/1567 | Loss: 1.0416 [2026-04-18 13:14:40] Validation | Batch 90/1567 | Loss: 
1.0505 [2026-04-18 13:14:41] Validation | Batch 100/1567 | Loss: 1.0587 [2026-04-18 13:14:42] Validation | Batch 110/1567 | Loss: 1.0485 [2026-04-18 13:14:43] Validation | Batch 120/1567 | Loss: 1.0579 [2026-04-18 13:14:44] Validation | Batch 130/1567 | Loss: 1.0590 [2026-04-18 13:14:44] Validation | Batch 140/1567 | Loss: 1.0629 [2026-04-18 13:14:45] Validation | Batch 150/1567 | Loss: 1.0717 [2026-04-18 13:14:46] Validation | Batch 160/1567 | Loss: 1.0724 [2026-04-18 13:14:47] Validation | Batch 170/1567 | Loss: 1.0581 [2026-04-18 13:14:47] Validation | Batch 180/1567 | Loss: 1.0601 [2026-04-18 13:14:48] Validation | Batch 190/1567 | Loss: 1.0579 [2026-04-18 13:14:49] Validation | Batch 200/1567 | Loss: 1.0608 [2026-04-18 13:14:50] Validation | Batch 210/1567 | Loss: 1.0621 [2026-04-18 13:14:51] Validation | Batch 220/1567 | Loss: 1.0650 [2026-04-18 13:14:52] Validation | Batch 230/1567 | Loss: 1.0681 [2026-04-18 13:14:53] Validation | Batch 240/1567 | Loss: 1.0663 [2026-04-18 13:14:53] Validation | Batch 250/1567 | Loss: 1.0607 [2026-04-18 13:14:54] Validation | Batch 260/1567 | Loss: 1.0563 [2026-04-18 13:14:55] Validation | Batch 270/1567 | Loss: 1.0530 [2026-04-18 13:14:55] Validation | Batch 280/1567 | Loss: 1.0540 [2026-04-18 13:14:57] Validation | Batch 290/1567 | Loss: 1.0590 [2026-04-18 13:14:57] Validation | Batch 300/1567 | Loss: 1.0647 [2026-04-18 13:14:58] Validation | Batch 310/1567 | Loss: 1.0640 [2026-04-18 13:14:59] Validation | Batch 320/1567 | Loss: 1.0638 [2026-04-18 13:15:00] Validation | Batch 330/1567 | Loss: 1.0614 [2026-04-18 13:15:01] Validation | Batch 340/1567 | Loss: 1.0652 [2026-04-18 13:15:02] Validation | Batch 350/1567 | Loss: 1.0649 [2026-04-18 13:15:02] Validation | Batch 360/1567 | Loss: 1.0629 [2026-04-18 13:15:03] Validation | Batch 370/1567 | Loss: 1.0601 [2026-04-18 13:15:04] Validation | Batch 380/1567 | Loss: 1.0635 [2026-04-18 13:15:05] Validation | Batch 390/1567 | Loss: 1.0647 [2026-04-18 13:15:05] Validation | Batch 
400/1567 | Loss: 1.0666 [2026-04-18 13:15:06] Validation | Batch 410/1567 | Loss: 1.0659 [2026-04-18 13:15:07] Validation | Batch 420/1567 | Loss: 1.0654 [2026-04-18 13:15:08] Validation | Batch 430/1567 | Loss: 1.0655 [2026-04-18 13:15:09] Validation | Batch 440/1567 | Loss: 1.0643 [2026-04-18 13:15:10] Validation | Batch 450/1567 | Loss: 1.0638 [2026-04-18 13:15:11] Validation | Batch 460/1567 | Loss: 1.0627 [2026-04-18 13:15:11] Validation | Batch 470/1567 | Loss: 1.0613 [2026-04-18 13:15:12] Validation | Batch 480/1567 | Loss: 1.0592 [2026-04-18 13:15:13] Validation | Batch 490/1567 | Loss: 1.0593 [2026-04-18 13:15:13] Validation | Batch 500/1567 | Loss: 1.0593 [2026-04-18 13:15:14] Validation | Batch 510/1567 | Loss: 1.0613 [2026-04-18 13:15:15] Validation | Batch 520/1567 | Loss: 1.0629 [2026-04-18 13:15:16] Validation | Batch 530/1567 | Loss: 1.0623 [2026-04-18 13:15:17] Validation | Batch 540/1567 | Loss: 1.0645 [2026-04-18 13:15:18] Validation | Batch 550/1567 | Loss: 1.0679 [2026-04-18 13:15:19] Validation | Batch 560/1567 | Loss: 1.0674 [2026-04-18 13:15:19] Validation | Batch 570/1567 | Loss: 1.0671 [2026-04-18 13:15:20] Validation | Batch 580/1567 | Loss: 1.0658 [2026-04-18 13:15:21] Validation | Batch 590/1567 | Loss: 1.0646 [2026-04-18 13:15:22] Validation | Batch 600/1567 | Loss: 1.0631 [2026-04-18 13:15:23] Validation | Batch 610/1567 | Loss: 1.0620 [2026-04-18 13:15:24] Validation | Batch 620/1567 | Loss: 1.0630 [2026-04-18 13:15:25] Validation | Batch 630/1567 | Loss: 1.0610 [2026-04-18 13:15:25] Validation | Batch 640/1567 | Loss: 1.0623 [2026-04-18 13:15:27] Validation | Batch 650/1567 | Loss: 1.0612 [2026-04-18 13:15:27] Validation | Batch 660/1567 | Loss: 1.0600 [2026-04-18 13:15:28] Validation | Batch 670/1567 | Loss: 1.0578 [2026-04-18 13:15:29] Validation | Batch 680/1567 | Loss: 1.0575 [2026-04-18 13:15:29] Validation | Batch 690/1567 | Loss: 1.0585 [2026-04-18 13:15:30] Validation | Batch 700/1567 | Loss: 1.0570 [2026-04-18 13:15:31] 
Validation | Batch 710/1567 | Loss: 1.0581 [2026-04-18 13:15:32] Validation | Batch 720/1567 | Loss: 1.0575 [2026-04-18 13:15:33] Validation | Batch 730/1567 | Loss: 1.0583 [2026-04-18 13:15:33] Validation | Batch 740/1567 | Loss: 1.0591 [2026-04-18 13:15:34] Validation | Batch 750/1567 | Loss: 1.0591 [2026-04-18 13:15:35] Validation | Batch 760/1567 | Loss: 1.0591 [2026-04-18 13:15:36] Validation | Batch 770/1567 | Loss: 1.0613 [2026-04-18 13:15:37] Validation | Batch 780/1567 | Loss: 1.0624 [2026-04-18 13:15:38] Validation | Batch 790/1567 | Loss: 1.0619 [2026-04-18 13:15:38] Validation | Batch 800/1567 | Loss: 1.0636 [2026-04-18 13:15:39] Validation | Batch 810/1567 | Loss: 1.0637 [2026-04-18 13:15:40] Validation | Batch 820/1567 | Loss: 1.0634 [2026-04-18 13:15:41] Validation | Batch 830/1567 | Loss: 1.0618 [2026-04-18 13:15:41] Validation | Batch 840/1567 | Loss: 1.0619 [2026-04-18 13:15:42] Validation | Batch 850/1567 | Loss: 1.0608 [2026-04-18 13:15:43] Validation | Batch 860/1567 | Loss: 1.0623 [2026-04-18 13:15:43] Validation | Batch 870/1567 | Loss: 1.0629 [2026-04-18 13:15:44] Validation | Batch 880/1567 | Loss: 1.0637 [2026-04-18 13:15:45] Validation | Batch 890/1567 | Loss: 1.0641 [2026-04-18 13:15:46] Validation | Batch 900/1567 | Loss: 1.0659 [2026-04-18 13:15:46] Validation | Batch 910/1567 | Loss: 1.0663 [2026-04-18 13:15:47] Validation | Batch 920/1567 | Loss: 1.0682 [2026-04-18 13:15:48] Validation | Batch 930/1567 | Loss: 1.0661 [2026-04-18 13:15:49] Validation | Batch 940/1567 | Loss: 1.0658 [2026-04-18 13:15:49] Validation | Batch 950/1567 | Loss: 1.0648 [2026-04-18 13:15:50] Validation | Batch 960/1567 | Loss: 1.0634 [2026-04-18 13:15:51] Validation | Batch 970/1567 | Loss: 1.0646 [2026-04-18 13:15:52] Validation | Batch 980/1567 | Loss: 1.0652 [2026-04-18 13:15:52] Validation | Batch 990/1567 | Loss: 1.0647 [2026-04-18 13:15:53] Validation | Batch 1000/1567 | Loss: 1.0648 [2026-04-18 13:15:54] Validation | Batch 1010/1567 | Loss: 1.0626 
[2026-04-18 13:15:55] Validation | Batch 1020/1567 | Loss: 1.0628 [2026-04-18 13:15:56] Validation | Batch 1030/1567 | Loss: 1.0640 [2026-04-18 13:15:56] Validation | Batch 1040/1567 | Loss: 1.0636 [2026-04-18 13:15:57] Validation | Batch 1050/1567 | Loss: 1.0645 [2026-04-18 13:15:58] Validation | Batch 1060/1567 | Loss: 1.0639 [2026-04-18 13:15:59] Validation | Batch 1070/1567 | Loss: 1.0631 [2026-04-18 13:16:00] Validation | Batch 1080/1567 | Loss: 1.0641 [2026-04-18 13:16:00] Validation | Batch 1090/1567 | Loss: 1.0639 [2026-04-18 13:16:01] Validation | Batch 1100/1567 | Loss: 1.0643 [2026-04-18 13:16:02] Validation | Batch 1110/1567 | Loss: 1.0640 [2026-04-18 13:16:02] Validation | Batch 1120/1567 | Loss: 1.0644 [2026-04-18 13:16:03] Validation | Batch 1130/1567 | Loss: 1.0647 [2026-04-18 13:16:04] Validation | Batch 1140/1567 | Loss: 1.0654 [2026-04-18 13:16:05] Validation | Batch 1150/1567 | Loss: 1.0656 [2026-04-18 13:16:06] Validation | Batch 1160/1567 | Loss: 1.0663 [2026-04-18 13:16:07] Validation | Batch 1170/1567 | Loss: 1.0661 [2026-04-18 13:16:08] Validation | Batch 1180/1567 | Loss: 1.0658 [2026-04-18 13:16:08] Validation | Batch 1190/1567 | Loss: 1.0669 [2026-04-18 13:16:09] Validation | Batch 1200/1567 | Loss: 1.0664 [2026-04-18 13:16:10] Validation | Batch 1210/1567 | Loss: 1.0653 [2026-04-18 13:16:11] Validation | Batch 1220/1567 | Loss: 1.0657 [2026-04-18 13:16:12] Validation | Batch 1230/1567 | Loss: 1.0676 [2026-04-18 13:16:12] Validation | Batch 1240/1567 | Loss: 1.0666 [2026-04-18 13:16:13] Validation | Batch 1250/1567 | Loss: 1.0666 [2026-04-18 13:16:14] Validation | Batch 1260/1567 | Loss: 1.0675 [2026-04-18 13:16:15] Validation | Batch 1270/1567 | Loss: 1.0674 [2026-04-18 13:16:16] Validation | Batch 1280/1567 | Loss: 1.0668 [2026-04-18 13:16:17] Validation | Batch 1290/1567 | Loss: 1.0671 [2026-04-18 13:16:18] Validation | Batch 1300/1567 | Loss: 1.0674 [2026-04-18 13:16:18] Validation | Batch 1310/1567 | Loss: 1.0678 [2026-04-18 
13:16:19] Validation | Batch 1320/1567 | Loss: 1.0670 [2026-04-18 13:16:20] Validation | Batch 1330/1567 | Loss: 1.0666 [2026-04-18 13:16:21] Validation | Batch 1340/1567 | Loss: 1.0662 [2026-04-18 13:16:21] Validation | Batch 1350/1567 | Loss: 1.0669 [2026-04-18 13:16:22] Validation | Batch 1360/1567 | Loss: 1.0664 [2026-04-18 13:16:23] Validation | Batch 1370/1567 | Loss: 1.0667 [2026-04-18 13:16:24] Validation | Batch 1380/1567 | Loss: 1.0679 [2026-04-18 13:16:25] Validation | Batch 1390/1567 | Loss: 1.0680 [2026-04-18 13:16:25] Validation | Batch 1400/1567 | Loss: 1.0682 [2026-04-18 13:16:26] Validation | Batch 1410/1567 | Loss: 1.0679 [2026-04-18 13:16:26] Validation | Batch 1420/1567 | Loss: 1.0684 [2026-04-18 13:16:27] Validation | Batch 1430/1567 | Loss: 1.0681 [2026-04-18 13:16:28] Validation | Batch 1440/1567 | Loss: 1.0685 [2026-04-18 13:16:29] Validation | Batch 1450/1567 | Loss: 1.0679 [2026-04-18 13:16:29] Validation | Batch 1460/1567 | Loss: 1.0677 [2026-04-18 13:16:30] Validation | Batch 1470/1567 | Loss: 1.0667 [2026-04-18 13:16:31] Validation | Batch 1480/1567 | Loss: 1.0652 [2026-04-18 13:16:32] Validation | Batch 1490/1567 | Loss: 1.0651 [2026-04-18 13:16:33] Validation | Batch 1500/1567 | Loss: 1.0652 [2026-04-18 13:16:33] Validation | Batch 1510/1567 | Loss: 1.0648 [2026-04-18 13:16:34] Validation | Batch 1520/1567 | Loss: 1.0641 [2026-04-18 13:16:34] Validation | Batch 1530/1567 | Loss: 1.0650 [2026-04-18 13:16:36] Validation | Batch 1540/1567 | Loss: 1.0660 [2026-04-18 13:16:36] Validation | Batch 1550/1567 | Loss: 1.0662 [2026-04-18 13:16:37] Validation | Batch 1560/1567 | Loss: 1.0652 [2026-04-18 13:16:38] Validation | Batch 1567/1567 | Loss: 1.0655 [2026-04-18 13:16:38] Validation | Loss: 1.0655 | PPL: 2.93 | Time: 125.78s [2026-04-18 13:16:42] New best model saved! 
Val loss: 1.0655 [2026-04-18 13:16:45] Epoch 1 | Step 7010 | Loss: 1.0057 | LR: 2.00e-05 [2026-04-18 13:16:49] Epoch 1 | Step 7020 | Loss: 1.0058 | LR: 2.00e-05 [2026-04-18 13:16:52] Epoch 1 | Step 7030 | Loss: 1.0057 | LR: 2.00e-05 [2026-04-18 13:16:56] Epoch 1 | Step 7040 | Loss: 1.0058 | LR: 2.00e-05 [2026-04-18 13:16:59] Epoch 1 | Step 7050 | Loss: 1.0056 | LR: 2.00e-05 [2026-04-18 13:17:02] Epoch 1 | Step 7060 | Loss: 1.0056 | LR: 2.00e-05 [2026-04-18 13:17:06] Epoch 1 | Step 7070 | Loss: 1.0057 | LR: 2.00e-05 [2026-04-18 13:17:09] Epoch 1 | Step 7080 | Loss: 1.0056 | LR: 2.00e-05 [2026-04-18 13:17:13] Epoch 1 | Step 7090 | Loss: 1.0056 | LR: 2.00e-05 [2026-04-18 13:17:16] Epoch 1 | Step 7100 | Loss: 1.0054 | LR: 2.00e-05 [2026-04-18 13:17:20] Epoch 1 | Step 7110 | Loss: 1.0053 | LR: 2.00e-05 [2026-04-18 13:17:24] Epoch 1 | Step 7120 | Loss: 1.0054 | LR: 2.00e-05 [2026-04-18 13:17:27] Epoch 1 | Step 7130 | Loss: 1.0052 | LR: 2.00e-05 [2026-04-18 13:17:31] Epoch 1 | Step 7140 | Loss: 1.0051 | LR: 2.00e-05 [2026-04-18 13:17:34] Epoch 1 | Step 7150 | Loss: 1.0053 | LR: 2.00e-05 [2026-04-18 13:17:38] Epoch 1 | Step 7160 | Loss: 1.0051 | LR: 2.00e-05 [2026-04-18 13:17:42] Epoch 1 | Step 7170 | Loss: 1.0051 | LR: 2.00e-05 [2026-04-18 13:17:45] Epoch 1 | Step 7180 | Loss: 1.0051 | LR: 2.00e-05 [2026-04-18 13:17:49] Epoch 1 | Step 7190 | Loss: 1.0051 | LR: 2.00e-05 [2026-04-18 13:17:52] Epoch 1 | Step 7200 | Loss: 1.0049 | LR: 2.00e-05 [2026-04-18 13:17:56] Epoch 1 | Step 7210 | Loss: 1.0048 | LR: 2.00e-05 [2026-04-18 13:18:00] Epoch 1 | Step 7220 | Loss: 1.0049 | LR: 2.00e-05 [2026-04-18 13:18:04] Epoch 1 | Step 7230 | Loss: 1.0049 | LR: 2.00e-05 [2026-04-18 13:18:07] Epoch 1 | Step 7240 | Loss: 1.0049 | LR: 2.00e-05 [2026-04-18 13:18:11] Epoch 1 | Step 7250 | Loss: 1.0049 | LR: 2.00e-05 [2026-04-18 13:18:15] Epoch 1 | Step 7260 | Loss: 1.0048 | LR: 2.00e-05 [2026-04-18 13:18:18] Epoch 1 | Step 7270 | Loss: 1.0049 | LR: 2.00e-05 [2026-04-18 13:18:22] Epoch 1 | Step 
7280 | Loss: 1.0050 | LR: 2.00e-05 [2026-04-18 13:18:25] Epoch 1 | Step 7290 | Loss: 1.0048 | LR: 2.00e-05 [2026-04-18 13:18:29] Epoch 1 | Step 7300 | Loss: 1.0047 | LR: 2.00e-05 [2026-04-18 13:18:33] Epoch 1 | Step 7310 | Loss: 1.0045 | LR: 2.00e-05 [2026-04-18 13:18:36] Epoch 1 | Step 7320 | Loss: 1.0044 | LR: 2.00e-05 [2026-04-18 13:18:40] Epoch 1 | Step 7330 | Loss: 1.0044 | LR: 2.00e-05 [2026-04-18 13:18:44] Epoch 1 | Step 7340 | Loss: 1.0045 | LR: 2.00e-05 [2026-04-18 13:18:47] Epoch 1 | Step 7350 | Loss: 1.0046 | LR: 2.00e-05 [2026-04-18 13:18:51] Epoch 1 | Step 7360 | Loss: 1.0044 | LR: 2.00e-05 [2026-04-18 13:18:55] Epoch 1 | Step 7370 | Loss: 1.0042 | LR: 2.00e-05 [2026-04-18 13:18:58] Epoch 1 | Step 7380 | Loss: 1.0041 | LR: 2.00e-05 [2026-04-18 13:19:02] Epoch 1 | Step 7390 | Loss: 1.0040 | LR: 2.00e-05 [2026-04-18 13:19:05] Epoch 1 | Step 7400 | Loss: 1.0039 | LR: 2.00e-05 [2026-04-18 13:19:09] Epoch 1 | Step 7410 | Loss: 1.0040 | LR: 2.00e-05 [2026-04-18 13:19:12] Epoch 1 | Step 7420 | Loss: 1.0040 | LR: 2.00e-05 [2026-04-18 13:19:16] Epoch 1 | Step 7430 | Loss: 1.0039 | LR: 2.00e-05 [2026-04-18 13:19:20] Epoch 1 | Step 7440 | Loss: 1.0039 | LR: 2.00e-05 [2026-04-18 13:19:23] Epoch 1 | Step 7450 | Loss: 1.0038 | LR: 2.00e-05 [2026-04-18 13:19:27] Epoch 1 | Step 7460 | Loss: 1.0036 | LR: 2.00e-05 [2026-04-18 13:19:30] Epoch 1 | Step 7470 | Loss: 1.0037 | LR: 2.00e-05 [2026-04-18 13:19:34] Epoch 1 | Step 7480 | Loss: 1.0037 | LR: 2.00e-05 [2026-04-18 13:19:37] Epoch 1 | Step 7490 | Loss: 1.0038 | LR: 2.00e-05 [2026-04-18 13:19:41] Epoch 1 | Step 7500 | Loss: 1.0038 | LR: 2.00e-05 [2026-04-18 13:19:44] Epoch 1 | Step 7510 | Loss: 1.0038 | LR: 2.00e-05 [2026-04-18 13:19:48] Epoch 1 | Step 7520 | Loss: 1.0037 | LR: 2.00e-05 [2026-04-18 13:19:51] Epoch 1 | Step 7530 | Loss: 1.0037 | LR: 2.00e-05 [2026-04-18 13:19:55] Epoch 1 | Step 7540 | Loss: 1.0035 | LR: 2.00e-05 [2026-04-18 13:19:58] Epoch 1 | Step 7550 | Loss: 1.0036 | LR: 2.00e-05 [2026-04-18 
13:20:02] Epoch 1 | Step 7560 | Loss: 1.0035 | LR: 2.00e-05 [2026-04-18 13:20:05] Epoch 1 | Step 7570 | Loss: 1.0035 | LR: 2.00e-05 [2026-04-18 13:20:09] Epoch 1 | Step 7580 | Loss: 1.0034 | LR: 2.00e-05 [2026-04-18 13:20:12] Epoch 1 | Step 7590 | Loss: 1.0033 | LR: 2.00e-05 [2026-04-18 13:20:16] Epoch 1 | Step 7600 | Loss: 1.0031 | LR: 2.00e-05 [2026-04-18 13:20:19] Epoch 1 | Step 7610 | Loss: 1.0031 | LR: 2.00e-05 [2026-04-18 13:20:23] Epoch 1 | Step 7620 | Loss: 1.0029 | LR: 2.00e-05 [2026-04-18 13:20:26] Epoch 1 | Step 7630 | Loss: 1.0028 | LR: 2.00e-05 [2026-04-18 13:20:30] Epoch 1 | Step 7640 | Loss: 1.0028 | LR: 2.00e-05 [2026-04-18 13:20:34] Epoch 1 | Step 7650 | Loss: 1.0026 | LR: 2.00e-05 [2026-04-18 13:20:37] Epoch 1 | Step 7660 | Loss: 1.0025 | LR: 2.00e-05 [2026-04-18 13:20:41] Epoch 1 | Step 7670 | Loss: 1.0024 | LR: 2.00e-05 [2026-04-18 13:20:44] Epoch 1 | Step 7680 | Loss: 1.0023 | LR: 2.00e-05 [2026-04-18 13:20:48] Epoch 1 | Step 7690 | Loss: 1.0024 | LR: 2.00e-05 [2026-04-18 13:20:52] Epoch 1 | Step 7700 | Loss: 1.0023 | LR: 2.00e-05 [2026-04-18 13:20:56] Epoch 1 | Step 7710 | Loss: 1.0021 | LR: 2.00e-05 [2026-04-18 13:20:59] Epoch 1 | Step 7720 | Loss: 1.0022 | LR: 2.00e-05 [2026-04-18 13:21:02] Epoch 1 | Step 7730 | Loss: 1.0023 | LR: 2.00e-05 [2026-04-18 13:21:06] Epoch 1 | Step 7740 | Loss: 1.0024 | LR: 2.00e-05 [2026-04-18 13:21:10] Epoch 1 | Step 7750 | Loss: 1.0024 | LR: 2.00e-05 [2026-04-18 13:21:14] Epoch 1 | Step 7760 | Loss: 1.0022 | LR: 2.00e-05 [2026-04-18 13:21:17] Epoch 1 | Step 7770 | Loss: 1.0021 | LR: 2.00e-05 [2026-04-18 13:21:20] Epoch 1 | Step 7780 | Loss: 1.0021 | LR: 2.00e-05 [2026-04-18 13:21:24] Epoch 1 | Step 7790 | Loss: 1.0020 | LR: 2.00e-05 [2026-04-18 13:21:27] Epoch 1 | Step 7800 | Loss: 1.0019 | LR: 2.00e-05 [2026-04-18 13:21:31] Epoch 1 | Step 7810 | Loss: 1.0020 | LR: 2.00e-05 [2026-04-18 13:21:35] Epoch 1 | Step 7820 | Loss: 1.0021 | LR: 2.00e-05 [2026-04-18 13:21:38] Epoch 1 | Step 7830 | Loss: 1.0020 | LR: 
2.00e-05 [2026-04-18 13:21:42] Epoch 1 | Step 7840 | Loss: 1.0019 | LR: 2.00e-05 [2026-04-18 13:21:45] Epoch 1 | Step 7850 | Loss: 1.0017 | LR: 2.00e-05 [2026-04-18 13:21:49] Epoch 1 | Step 7860 | Loss: 1.0017 | LR: 2.00e-05 [2026-04-18 13:21:53] Epoch 1 | Step 7870 | Loss: 1.0015 | LR: 2.00e-05 [2026-04-18 13:21:56] Epoch 1 | Step 7880 | Loss: 1.0016 | LR: 2.00e-05 [2026-04-18 13:22:00] Epoch 1 | Step 7890 | Loss: 1.0016 | LR: 2.00e-05 [2026-04-18 13:22:05] Epoch 1 | Step 7900 | Loss: 1.0015 | LR: 2.00e-05 [2026-04-18 13:22:08] Epoch 1 | Step 7910 | Loss: 1.0016 | LR: 2.00e-05 [2026-04-18 13:22:12] Epoch 1 | Step 7920 | Loss: 1.0016 | LR: 2.00e-05 [2026-04-18 13:22:15] Epoch 1 | Step 7930 | Loss: 1.0017 | LR: 2.00e-05 [2026-04-18 13:22:19] Epoch 1 | Step 7940 | Loss: 1.0017 | LR: 2.00e-05 [2026-04-18 13:22:22] Epoch 1 | Step 7950 | Loss: 1.0019 | LR: 2.00e-05 [2026-04-18 13:22:26] Epoch 1 | Step 7960 | Loss: 1.0018 | LR: 2.00e-05 [2026-04-18 13:22:29] Epoch 1 | Step 7970 | Loss: 1.0019 | LR: 2.00e-05 [2026-04-18 13:22:32] Epoch 1 | Step 7980 | Loss: 1.0018 | LR: 2.00e-05 [2026-04-18 13:22:36] Epoch 1 | Step 7990 | Loss: 1.0017 | LR: 2.00e-05 [2026-04-18 13:22:39] Epoch 1 | Step 8000 | Loss: 1.0017 | LR: 2.00e-05 [2026-04-18 13:22:40] Validation | Batch 10/1567 | Loss: 0.9650 [2026-04-18 13:22:41] Validation | Batch 20/1567 | Loss: 1.0162 [2026-04-18 13:22:42] Validation | Batch 30/1567 | Loss: 1.0543 [2026-04-18 13:22:43] Validation | Batch 40/1567 | Loss: 1.0733 [2026-04-18 13:22:44] Validation | Batch 50/1567 | Loss: 1.0521 [2026-04-18 13:22:45] Validation | Batch 60/1567 | Loss: 1.0419 [2026-04-18 13:22:46] Validation | Batch 70/1567 | Loss: 1.0251 [2026-04-18 13:22:47] Validation | Batch 80/1567 | Loss: 1.0433 [2026-04-18 13:22:47] Validation | Batch 90/1567 | Loss: 1.0521 [2026-04-18 13:22:48] Validation | Batch 100/1567 | Loss: 1.0605 [2026-04-18 13:22:49] Validation | Batch 110/1567 | Loss: 1.0507 [2026-04-18 13:22:50] Validation | Batch 120/1567 | Loss: 
1.0608 [2026-04-18 13:22:51] Validation | Batch 130/1567 | Loss: 1.0624 [2026-04-18 13:22:52] Validation | Batch 140/1567 | Loss: 1.0652 [2026-04-18 13:22:52] Validation | Batch 150/1567 | Loss: 1.0738 [2026-04-18 13:22:53] Validation | Batch 160/1567 | Loss: 1.0734 [2026-04-18 13:22:54] Validation | Batch 170/1567 | Loss: 1.0597 [2026-04-18 13:22:55] Validation | Batch 180/1567 | Loss: 1.0604 [2026-04-18 13:22:56] Validation | Batch 190/1567 | Loss: 1.0577 [2026-04-18 13:22:57] Validation | Batch 200/1567 | Loss: 1.0608 [2026-04-18 13:22:57] Validation | Batch 210/1567 | Loss: 1.0633 [2026-04-18 13:22:58] Validation | Batch 220/1567 | Loss: 1.0657 [2026-04-18 13:22:59] Validation | Batch 230/1567 | Loss: 1.0687 [2026-04-18 13:23:00] Validation | Batch 240/1567 | Loss: 1.0670 [2026-04-18 13:23:01] Validation | Batch 250/1567 | Loss: 1.0619 [2026-04-18 13:23:01] Validation | Batch 260/1567 | Loss: 1.0574 [2026-04-18 13:23:02] Validation | Batch 270/1567 | Loss: 1.0545 [2026-04-18 13:23:03] Validation | Batch 280/1567 | Loss: 1.0551 [2026-04-18 13:23:04] Validation | Batch 290/1567 | Loss: 1.0600 [2026-04-18 13:23:05] Validation | Batch 300/1567 | Loss: 1.0662 [2026-04-18 13:23:06] Validation | Batch 310/1567 | Loss: 1.0657 [2026-04-18 13:23:06] Validation | Batch 320/1567 | Loss: 1.0658 [2026-04-18 13:23:07] Validation | Batch 330/1567 | Loss: 1.0628 [2026-04-18 13:23:08] Validation | Batch 340/1567 | Loss: 1.0664 [2026-04-18 13:23:09] Validation | Batch 350/1567 | Loss: 1.0663 [2026-04-18 13:23:10] Validation | Batch 360/1567 | Loss: 1.0644 [2026-04-18 13:23:11] Validation | Batch 370/1567 | Loss: 1.0617 [2026-04-18 13:23:11] Validation | Batch 380/1567 | Loss: 1.0651 [2026-04-18 13:23:12] Validation | Batch 390/1567 | Loss: 1.0662 [2026-04-18 13:23:13] Validation | Batch 400/1567 | Loss: 1.0678 [2026-04-18 13:23:14] Validation | Batch 410/1567 | Loss: 1.0674 [2026-04-18 13:23:14] Validation | Batch 420/1567 | Loss: 1.0669 [2026-04-18 13:23:15] Validation | Batch 
430/1567 | Loss: 1.0670 [2026-04-18 13:23:16] Validation | Batch 440/1567 | Loss: 1.0658 [2026-04-18 13:23:17] Validation | Batch 450/1567 | Loss: 1.0656 [2026-04-18 13:23:18] Validation | Batch 460/1567 | Loss: 1.0640 [2026-04-18 13:23:19] Validation | Batch 470/1567 | Loss: 1.0628 [2026-04-18 13:23:19] Validation | Batch 480/1567 | Loss: 1.0609 [2026-04-18 13:23:20] Validation | Batch 490/1567 | Loss: 1.0612 [2026-04-18 13:23:21] Validation | Batch 500/1567 | Loss: 1.0616 [2026-04-18 13:23:22] Validation | Batch 510/1567 | Loss: 1.0637 [2026-04-18 13:23:22] Validation | Batch 520/1567 | Loss: 1.0655 [2026-04-18 13:23:23] Validation | Batch 530/1567 | Loss: 1.0649 [2026-04-18 13:23:24] Validation | Batch 540/1567 | Loss: 1.0669 [2026-04-18 13:23:25] Validation | Batch 550/1567 | Loss: 1.0703 [2026-04-18 13:23:26] Validation | Batch 560/1567 | Loss: 1.0700 [2026-04-18 13:23:27] Validation | Batch 570/1567 | Loss: 1.0697 [2026-04-18 13:23:28] Validation | Batch 580/1567 | Loss: 1.0683 [2026-04-18 13:23:29] Validation | Batch 590/1567 | Loss: 1.0669 [2026-04-18 13:23:29] Validation | Batch 600/1567 | Loss: 1.0652 [2026-04-18 13:23:31] Validation | Batch 610/1567 | Loss: 1.0644 [2026-04-18 13:23:31] Validation | Batch 620/1567 | Loss: 1.0652 [2026-04-18 13:23:32] Validation | Batch 630/1567 | Loss: 1.0633 [2026-04-18 13:23:33] Validation | Batch 640/1567 | Loss: 1.0646 [2026-04-18 13:23:34] Validation | Batch 650/1567 | Loss: 1.0637 [2026-04-18 13:23:35] Validation | Batch 660/1567 | Loss: 1.0622 [2026-04-18 13:23:35] Validation | Batch 670/1567 | Loss: 1.0603 [2026-04-18 13:23:36] Validation | Batch 680/1567 | Loss: 1.0599 [2026-04-18 13:23:37] Validation | Batch 690/1567 | Loss: 1.0611 [2026-04-18 13:23:38] Validation | Batch 700/1567 | Loss: 1.0595 [2026-04-18 13:23:39] Validation | Batch 710/1567 | Loss: 1.0604 [2026-04-18 13:23:39] Validation | Batch 720/1567 | Loss: 1.0596 [2026-04-18 13:23:40] Validation | Batch 730/1567 | Loss: 1.0603 [2026-04-18 13:23:41] 
Validation | Batch 740/1567 | Loss: 1.0611 [2026-04-18 13:23:42] Validation | Batch 750/1567 | Loss: 1.0609 [2026-04-18 13:23:42] Validation | Batch 760/1567 | Loss: 1.0611 [2026-04-18 13:23:43] Validation | Batch 770/1567 | Loss: 1.0628 [2026-04-18 13:23:44] Validation | Batch 780/1567 | Loss: 1.0641 [2026-04-18 13:23:45] Validation | Batch 790/1567 | Loss: 1.0637 [2026-04-18 13:23:46] Validation | Batch 800/1567 | Loss: 1.0653 [2026-04-18 13:23:46] Validation | Batch 810/1567 | Loss: 1.0652 [2026-04-18 13:23:47] Validation | Batch 820/1567 | Loss: 1.0649 [2026-04-18 13:23:48] Validation | Batch 830/1567 | Loss: 1.0634 [2026-04-18 13:23:49] Validation | Batch 840/1567 | Loss: 1.0637 [2026-04-18 13:23:49] Validation | Batch 850/1567 | Loss: 1.0627 [2026-04-18 13:23:50] Validation | Batch 860/1567 | Loss: 1.0642 [2026-04-18 13:23:51] Validation | Batch 870/1567 | Loss: 1.0649 [2026-04-18 13:23:52] Validation | Batch 880/1567 | Loss: 1.0656 [2026-04-18 13:23:52] Validation | Batch 890/1567 | Loss: 1.0660 [2026-04-18 13:23:53] Validation | Batch 900/1567 | Loss: 1.0680 [2026-04-18 13:23:54] Validation | Batch 910/1567 | Loss: 1.0682 [2026-04-18 13:23:54] Validation | Batch 920/1567 | Loss: 1.0701 [2026-04-18 13:23:55] Validation | Batch 930/1567 | Loss: 1.0679 [2026-04-18 13:23:56] Validation | Batch 940/1567 | Loss: 1.0675 [2026-04-18 13:23:57] Validation | Batch 950/1567 | Loss: 1.0666 [2026-04-18 13:23:57] Validation | Batch 960/1567 | Loss: 1.0655 [2026-04-18 13:23:58] Validation | Batch 970/1567 | Loss: 1.0667 [2026-04-18 13:23:59] Validation | Batch 980/1567 | Loss: 1.0673 [2026-04-18 13:23:59] Validation | Batch 990/1567 | Loss: 1.0667 [2026-04-18 13:24:00] Validation | Batch 1000/1567 | Loss: 1.0668 [2026-04-18 13:24:01] Validation | Batch 1010/1567 | Loss: 1.0647 [2026-04-18 13:24:02] Validation | Batch 1020/1567 | Loss: 1.0649 [2026-04-18 13:24:03] Validation | Batch 1030/1567 | Loss: 1.0663 [2026-04-18 13:24:04] Validation | Batch 1040/1567 | Loss: 1.0660 
[2026-04-18 13:24:04] Validation | Batch 1050/1567 | Loss: 1.0668 [2026-04-18 13:24:05] Validation | Batch 1060/1567 | Loss: 1.0661 [2026-04-18 13:24:06] Validation | Batch 1070/1567 | Loss: 1.0654 [2026-04-18 13:24:07] Validation | Batch 1080/1567 | Loss: 1.0664 [2026-04-18 13:24:07] Validation | Batch 1090/1567 | Loss: 1.0663 [2026-04-18 13:24:08] Validation | Batch 1100/1567 | Loss: 1.0668 [2026-04-18 13:24:09] Validation | Batch 1110/1567 | Loss: 1.0664 [2026-04-18 13:24:09] Validation | Batch 1120/1567 | Loss: 1.0665 [2026-04-18 13:24:10] Validation | Batch 1130/1567 | Loss: 1.0667 [2026-04-18 13:24:11] Validation | Batch 1140/1567 | Loss: 1.0673 [2026-04-18 13:24:12] Validation | Batch 1150/1567 | Loss: 1.0675 [2026-04-18 13:24:13] Validation | Batch 1160/1567 | Loss: 1.0684 [2026-04-18 13:24:14] Validation | Batch 1170/1567 | Loss: 1.0680 [2026-04-18 13:24:15] Validation | Batch 1180/1567 | Loss: 1.0678 [2026-04-18 13:24:15] Validation | Batch 1190/1567 | Loss: 1.0688 [2026-04-18 13:24:16] Validation | Batch 1200/1567 | Loss: 1.0681 [2026-04-18 13:24:17] Validation | Batch 1210/1567 | Loss: 1.0671 [2026-04-18 13:24:18] Validation | Batch 1220/1567 | Loss: 1.0675 [2026-04-18 13:24:18] Validation | Batch 1230/1567 | Loss: 1.0696 [2026-04-18 13:24:19] Validation | Batch 1240/1567 | Loss: 1.0686 [2026-04-18 13:24:20] Validation | Batch 1250/1567 | Loss: 1.0684 [2026-04-18 13:24:21] Validation | Batch 1260/1567 | Loss: 1.0693 [2026-04-18 13:24:22] Validation | Batch 1270/1567 | Loss: 1.0693 [2026-04-18 13:24:23] Validation | Batch 1280/1567 | Loss: 1.0686 [2026-04-18 13:24:24] Validation | Batch 1290/1567 | Loss: 1.0689 [2026-04-18 13:24:25] Validation | Batch 1300/1567 | Loss: 1.0692 [2026-04-18 13:24:25] Validation | Batch 1310/1567 | Loss: 1.0696 [2026-04-18 13:24:26] Validation | Batch 1320/1567 | Loss: 1.0686 [2026-04-18 13:24:27] Validation | Batch 1330/1567 | Loss: 1.0683 [2026-04-18 13:24:28] Validation | Batch 1340/1567 | Loss: 1.0681 [2026-04-18 
13:24:28] Validation | Batch 1350/1567 | Loss: 1.0688 [2026-04-18 13:24:29] Validation | Batch 1360/1567 | Loss: 1.0684 [2026-04-18 13:24:30] Validation | Batch 1370/1567 | Loss: 1.0687 [2026-04-18 13:24:31] Validation | Batch 1380/1567 | Loss: 1.0698 [2026-04-18 13:24:31] Validation | Batch 1390/1567 | Loss: 1.0698 [2026-04-18 13:24:32] Validation | Batch 1400/1567 | Loss: 1.0700 [2026-04-18 13:24:33] Validation | Batch 1410/1567 | Loss: 1.0697 [2026-04-18 13:24:33] Validation | Batch 1420/1567 | Loss: 1.0701 [2026-04-18 13:24:34] Validation | Batch 1430/1567 | Loss: 1.0698 [2026-04-18 13:24:35] Validation | Batch 1440/1567 | Loss: 1.0701 [2026-04-18 13:24:36] Validation | Batch 1450/1567 | Loss: 1.0695 [2026-04-18 13:24:36] Validation | Batch 1460/1567 | Loss: 1.0691 [2026-04-18 13:24:37] Validation | Batch 1470/1567 | Loss: 1.0681 [2026-04-18 13:24:38] Validation | Batch 1480/1567 | Loss: 1.0666 [2026-04-18 13:24:38] Validation | Batch 1490/1567 | Loss: 1.0665 [2026-04-18 13:24:39] Validation | Batch 1500/1567 | Loss: 1.0666 [2026-04-18 13:24:40] Validation | Batch 1510/1567 | Loss: 1.0664 [2026-04-18 13:24:41] Validation | Batch 1520/1567 | Loss: 1.0656 [2026-04-18 13:24:41] Validation | Batch 1530/1567 | Loss: 1.0664 [2026-04-18 13:24:42] Validation | Batch 1540/1567 | Loss: 1.0673 [2026-04-18 13:24:43] Validation | Batch 1550/1567 | Loss: 1.0674 [2026-04-18 13:24:44] Validation | Batch 1560/1567 | Loss: 1.0664 [2026-04-18 13:24:45] Validation | Batch 1567/1567 | Loss: 1.0667 [2026-04-18 13:24:45] Validation | Loss: 1.0667 | PPL: 2.93 | Time: 125.17s [2026-04-18 13:24:49] Epoch 1 | Step 8010 | Loss: 1.0016 | LR: 2.00e-05 [2026-04-18 13:24:52] Epoch 1 | Step 8020 | Loss: 1.0014 | LR: 2.00e-05 [2026-04-18 13:24:56] Epoch 1 | Step 8030 | Loss: 1.0013 | LR: 2.00e-05 [2026-04-18 13:25:00] Epoch 1 | Step 8040 | Loss: 1.0014 | LR: 2.00e-05 [2026-04-18 13:25:03] Epoch 1 | Step 8050 | Loss: 1.0013 | LR: 2.00e-05 [2026-04-18 13:25:07] Epoch 1 | Step 8060 | Loss: 1.0012 
| LR: 2.00e-05 [2026-04-18 13:25:10] Epoch 1 | Step 8070 | Loss: 1.0012 | LR: 2.00e-05 [2026-04-18 13:25:14] Epoch 1 | Step 8080 | Loss: 1.0011 | LR: 2.00e-05 [2026-04-18 13:25:18] Epoch 1 | Step 8090 | Loss: 1.0010 | LR: 2.00e-05 [2026-04-18 13:25:21] Epoch 1 | Step 8100 | Loss: 1.0009 | LR: 2.00e-05 [2026-04-18 13:25:25] Epoch 1 | Step 8110 | Loss: 1.0010 | LR: 2.00e-05 [2026-04-18 13:25:29] Epoch 1 | Step 8120 | Loss: 1.0010 | LR: 2.00e-05 [2026-04-18 13:25:33] Epoch 1 | Step 8130 | Loss: 1.0009 | LR: 2.00e-05 [2026-04-18 13:25:37] Epoch 1 | Step 8140 | Loss: 1.0010 | LR: 2.00e-05 [2026-04-18 13:25:40] Epoch 1 | Step 8150 | Loss: 1.0010 | LR: 2.00e-05 [2026-04-18 13:25:43] Epoch 1 | Step 8160 | Loss: 1.0008 | LR: 2.00e-05 [2026-04-18 13:25:47] Epoch 1 | Step 8170 | Loss: 1.0008 | LR: 2.00e-05 [2026-04-18 13:25:50] Epoch 1 | Step 8180 | Loss: 1.0007 | LR: 2.00e-05 [2026-04-18 13:25:54] Epoch 1 | Step 8190 | Loss: 1.0006 | LR: 2.00e-05 [2026-04-18 13:25:58] Epoch 1 | Step 8200 | Loss: 1.0007 | LR: 2.00e-05 [2026-04-18 13:26:01] Epoch 1 | Step 8210 | Loss: 1.0006 | LR: 2.00e-05 [2026-04-18 13:26:05] Epoch 1 | Step 8220 | Loss: 1.0006 | LR: 2.00e-05 [2026-04-18 13:26:09] Epoch 1 | Step 8230 | Loss: 1.0006 | LR: 2.00e-05 [2026-04-18 13:26:12] Epoch 1 | Step 8240 | Loss: 1.0007 | LR: 2.00e-05 [2026-04-18 13:26:15] Epoch 1 | Step 8250 | Loss: 1.0006 | LR: 2.00e-05 [2026-04-18 13:26:19] Epoch 1 | Step 8260 | Loss: 1.0007 | LR: 2.00e-05 [2026-04-18 13:26:22] Epoch 1 | Step 8270 | Loss: 1.0008 | LR: 2.00e-05 [2026-04-18 13:26:26] Epoch 1 | Step 8280 | Loss: 1.0008 | LR: 2.00e-05 [2026-04-18 13:26:30] Epoch 1 | Step 8290 | Loss: 1.0009 | LR: 2.00e-05 [2026-04-18 13:26:34] Epoch 1 | Step 8300 | Loss: 1.0008 | LR: 2.00e-05 [2026-04-18 13:26:37] Epoch 1 | Step 8310 | Loss: 1.0008 | LR: 2.00e-05 [2026-04-18 13:26:40] Epoch 1 | Step 8320 | Loss: 1.0008 | LR: 2.00e-05 [2026-04-18 13:26:43] Epoch 1 | Step 8330 | Loss: 1.0008 | LR: 2.00e-05 [2026-04-18 13:26:47] Epoch 1 | Step 
8340 | Loss: 1.0007 | LR: 2.00e-05 [2026-04-18 13:26:51] Epoch 1 | Step 8350 | Loss: 1.0006 | LR: 2.00e-05 [2026-04-18 13:26:54] Epoch 1 | Step 8360 | Loss: 1.0005 | LR: 2.00e-05 [2026-04-18 13:26:58] Epoch 1 | Step 8370 | Loss: 1.0005 | LR: 2.00e-05 [2026-04-18 13:27:01] Epoch 1 | Step 8380 | Loss: 1.0005 | LR: 2.00e-05 [2026-04-18 13:27:04] Epoch 1 | Step 8390 | Loss: 1.0005 | LR: 2.00e-05 [2026-04-18 13:27:08] Epoch 1 | Step 8400 | Loss: 1.0005 | LR: 2.00e-05 [2026-04-18 13:27:12] Epoch 1 | Step 8410 | Loss: 1.0005 | LR: 2.00e-05 [2026-04-18 13:27:15] Epoch 1 | Step 8420 | Loss: 1.0006 | LR: 2.00e-05 [2026-04-18 13:27:19] Epoch 1 | Step 8430 | Loss: 1.0004 | LR: 2.00e-05 [2026-04-18 13:27:23] Epoch 1 | Step 8440 | Loss: 1.0004 | LR: 2.00e-05 [2026-04-18 13:27:26] Epoch 1 | Step 8450 | Loss: 1.0004 | LR: 2.00e-05 [2026-04-18 13:27:30] Epoch 1 | Step 8460 | Loss: 1.0004 | LR: 2.00e-05 [2026-04-18 13:27:33] Epoch 1 | Step 8470 | Loss: 1.0004 | LR: 2.00e-05 [2026-04-18 13:27:37] Epoch 1 | Step 8480 | Loss: 1.0004 | LR: 2.00e-05 [2026-04-18 13:27:40] Epoch 1 | Step 8490 | Loss: 1.0002 | LR: 2.00e-05 [2026-04-18 13:27:43] Epoch 1 | Step 8500 | Loss: 1.0002 | LR: 2.00e-05 [2026-04-18 13:27:47] Epoch 1 | Step 8510 | Loss: 1.0001 | LR: 2.00e-05 [2026-04-18 13:27:51] Epoch 1 | Step 8520 | Loss: 1.0002 | LR: 2.00e-05 [2026-04-18 13:27:54] Epoch 1 | Step 8530 | Loss: 1.0001 | LR: 2.00e-05 [2026-04-18 13:27:58] Epoch 1 | Step 8540 | Loss: 1.0003 | LR: 2.00e-05 [2026-04-18 13:28:01] Epoch 1 | Step 8550 | Loss: 1.0003 | LR: 2.00e-05 [2026-04-18 13:28:05] Epoch 1 | Step 8560 | Loss: 1.0002 | LR: 2.00e-05 [2026-04-18 13:28:08] Epoch 1 | Step 8570 | Loss: 1.0002 | LR: 2.00e-05 [2026-04-18 13:28:11] Epoch 1 | Step 8580 | Loss: 1.0000 | LR: 2.00e-05 [2026-04-18 13:28:14] Epoch 1 | Step 8590 | Loss: 0.9999 | LR: 2.00e-05 [2026-04-18 13:28:18] Epoch 1 | Step 8600 | Loss: 0.9998 | LR: 2.00e-05 [2026-04-18 13:28:21] Epoch 1 | Step 8610 | Loss: 0.9998 | LR: 2.00e-05 [2026-04-18 
13:28:25] Epoch 1 | Step 8620 | Loss: 0.9997 | LR: 2.00e-05 [2026-04-18 13:28:28] Epoch 1 | Step 8630 | Loss: 0.9995 | LR: 2.00e-05 [2026-04-18 13:28:32] Epoch 1 | Step 8640 | Loss: 0.9997 | LR: 2.00e-05 [2026-04-18 13:28:35] Epoch 1 | Step 8650 | Loss: 0.9998 | LR: 2.00e-05 [2026-04-18 13:28:39] Epoch 1 | Step 8660 | Loss: 0.9996 | LR: 2.00e-05 [2026-04-18 13:28:42] Epoch 1 | Step 8670 | Loss: 0.9996 | LR: 2.00e-05 [2026-04-18 13:28:47] Epoch 1 | Step 8680 | Loss: 0.9997 | LR: 2.00e-05 [2026-04-18 13:28:50] Epoch 1 | Step 8690 | Loss: 0.9995 | LR: 2.00e-05 [2026-04-18 13:28:54] Epoch 1 | Step 8700 | Loss: 0.9995 | LR: 2.00e-05 [2026-04-18 13:28:58] Epoch 1 | Step 8710 | Loss: 0.9992 | LR: 2.00e-05 [2026-04-18 13:29:01] Epoch 1 | Step 8720 | Loss: 0.9991 | LR: 2.00e-05 [2026-04-18 13:29:05] Epoch 1 | Step 8730 | Loss: 0.9992 | LR: 2.00e-05 [2026-04-18 13:29:08] Epoch 1 | Step 8740 | Loss: 0.9992 | LR: 2.00e-05 [2026-04-18 13:29:12] Epoch 1 | Step 8750 | Loss: 0.9992 | LR: 2.00e-05 [2026-04-18 13:29:15] Epoch 1 | Step 8760 | Loss: 0.9992 | LR: 2.00e-05 [2026-04-18 13:29:19] Epoch 1 | Step 8770 | Loss: 0.9990 | LR: 2.00e-05 [2026-04-18 13:29:23] Epoch 1 | Step 8780 | Loss: 0.9990 | LR: 2.00e-05 [2026-04-18 13:29:27] Epoch 1 | Step 8790 | Loss: 0.9990 | LR: 2.00e-05 [2026-04-18 13:29:30] Epoch 1 | Step 8800 | Loss: 0.9988 | LR: 2.00e-05 [2026-04-18 13:29:34] Epoch 1 | Step 8810 | Loss: 0.9987 | LR: 2.00e-05 [2026-04-18 13:29:38] Epoch 1 | Step 8820 | Loss: 0.9986 | LR: 2.00e-05 [2026-04-18 13:29:41] Epoch 1 | Step 8830 | Loss: 0.9986 | LR: 2.00e-05 [2026-04-18 13:29:45] Epoch 1 | Step 8840 | Loss: 0.9986 | LR: 2.00e-05 [2026-04-18 13:29:48] Epoch 1 | Step 8850 | Loss: 0.9986 | LR: 2.00e-05 [2026-04-18 13:29:52] Epoch 1 | Step 8860 | Loss: 0.9985 | LR: 2.00e-05 [2026-04-18 13:29:55] Epoch 1 | Step 8870 | Loss: 0.9985 | LR: 2.00e-05 [2026-04-18 13:29:59] Epoch 1 | Step 8880 | Loss: 0.9984 | LR: 2.00e-05 [2026-04-18 13:30:02] Epoch 1 | Step 8890 | Loss: 0.9983 | LR: 
2.00e-05 [2026-04-18 13:30:06] Epoch 1 | Step 8900 | Loss: 0.9981 | LR: 2.00e-05 [2026-04-18 13:30:09] Epoch 1 | Step 8910 | Loss: 0.9982 | LR: 2.00e-05 [2026-04-18 13:30:13] Epoch 1 | Step 8920 | Loss: 0.9980 | LR: 2.00e-05 [2026-04-18 13:30:16] Epoch 1 | Step 8930 | Loss: 0.9979 | LR: 2.00e-05 [2026-04-18 13:30:20] Epoch 1 | Step 8940 | Loss: 0.9979 | LR: 2.00e-05 [2026-04-18 13:30:23] Epoch 1 | Step 8950 | Loss: 0.9980 | LR: 2.00e-05 [2026-04-18 13:30:26] Epoch 1 | Step 8960 | Loss: 0.9979 | LR: 2.00e-05 [2026-04-18 13:30:30] Epoch 1 | Step 8970 | Loss: 0.9979 | LR: 2.00e-05 [2026-04-18 13:30:33] Epoch 1 | Step 8980 | Loss: 0.9978 | LR: 2.00e-05 [2026-04-18 13:30:37] Epoch 1 | Step 8990 | Loss: 0.9977 | LR: 2.00e-05 [2026-04-18 13:30:41] Epoch 1 | Step 9000 | Loss: 0.9977 | LR: 2.00e-05 [2026-04-18 13:30:50] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_9000.pt [2026-04-18 13:31:07] Validation | Batch 10/1567 | Loss: 0.9376 [2026-04-18 13:31:08] Validation | Batch 20/1567 | Loss: 0.9928 [2026-04-18 13:31:09] Validation | Batch 30/1567 | Loss: 1.0413 [2026-04-18 13:31:10] Validation | Batch 40/1567 | Loss: 1.0647 [2026-04-18 13:31:10] Validation | Batch 50/1567 | Loss: 1.0458 [2026-04-18 13:31:11] Validation | Batch 60/1567 | Loss: 1.0350 [2026-04-18 13:31:12] Validation | Batch 70/1567 | Loss: 1.0203 [2026-04-18 13:31:13] Validation | Batch 80/1567 | Loss: 1.0374 [2026-04-18 13:31:14] Validation | Batch 90/1567 | Loss: 1.0466 [2026-04-18 13:31:15] Validation | Batch 100/1567 | Loss: 1.0534 [2026-04-18 13:31:15] Validation | Batch 110/1567 | Loss: 1.0445 [2026-04-18 13:31:16] Validation | Batch 120/1567 | Loss: 1.0552 [2026-04-18 13:31:17] Validation | Batch 130/1567 | Loss: 1.0565 [2026-04-18 13:31:18] Validation | Batch 140/1567 | Loss: 1.0602 [2026-04-18 13:31:19] Validation | Batch 150/1567 | Loss: 1.0686 [2026-04-18 13:31:20] Validation | Batch 160/1567 | Loss: 1.0686 [2026-04-18 13:31:20] Validation | Batch 170/1567 | Loss: 
1.0543 [2026-04-18 13:31:21] Validation | Batch 180/1567 | Loss: 1.0557 [2026-04-18 13:31:22] Validation | Batch 190/1567 | Loss: 1.0526 [2026-04-18 13:31:23] Validation | Batch 200/1567 | Loss: 1.0557 [2026-04-18 13:31:24] Validation | Batch 210/1567 | Loss: 1.0569 [2026-04-18 13:31:25] Validation | Batch 220/1567 | Loss: 1.0599 [2026-04-18 13:31:26] Validation | Batch 230/1567 | Loss: 1.0632 [2026-04-18 13:31:27] Validation | Batch 240/1567 | Loss: 1.0618 [2026-04-18 13:31:27] Validation | Batch 250/1567 | Loss: 1.0562 [2026-04-18 13:31:28] Validation | Batch 260/1567 | Loss: 1.0510 [2026-04-18 13:31:29] Validation | Batch 270/1567 | Loss: 1.0476 [2026-04-18 13:31:29] Validation | Batch 280/1567 | Loss: 1.0482 [2026-04-18 13:31:31] Validation | Batch 290/1567 | Loss: 1.0526 [2026-04-18 13:31:31] Validation | Batch 300/1567 | Loss: 1.0587 [2026-04-18 13:31:32] Validation | Batch 310/1567 | Loss: 1.0581 [2026-04-18 13:31:33] Validation | Batch 320/1567 | Loss: 1.0591 [2026-04-18 13:31:34] Validation | Batch 330/1567 | Loss: 1.0566 [2026-04-18 13:31:35] Validation | Batch 340/1567 | Loss: 1.0607 [2026-04-18 13:31:36] Validation | Batch 350/1567 | Loss: 1.0601 [2026-04-18 13:31:36] Validation | Batch 360/1567 | Loss: 1.0583 [2026-04-18 13:31:37] Validation | Batch 370/1567 | Loss: 1.0557 [2026-04-18 13:31:38] Validation | Batch 380/1567 | Loss: 1.0592 [2026-04-18 13:31:39] Validation | Batch 390/1567 | Loss: 1.0607 [2026-04-18 13:31:39] Validation | Batch 400/1567 | Loss: 1.0621 [2026-04-18 13:31:40] Validation | Batch 410/1567 | Loss: 1.0619 [2026-04-18 13:31:41] Validation | Batch 420/1567 | Loss: 1.0615 [2026-04-18 13:31:42] Validation | Batch 430/1567 | Loss: 1.0616 [2026-04-18 13:31:43] Validation | Batch 440/1567 | Loss: 1.0605 [2026-04-18 13:31:44] Validation | Batch 450/1567 | Loss: 1.0603 [2026-04-18 13:31:45] Validation | Batch 460/1567 | Loss: 1.0591 [2026-04-18 13:31:45] Validation | Batch 470/1567 | Loss: 1.0580 [2026-04-18 13:31:46] Validation | Batch 
480/1567 | Loss: 1.0563 [2026-04-18 13:31:47] Validation | Batch 490/1567 | Loss: 1.0565 [2026-04-18 13:31:48] Validation | Batch 500/1567 | Loss: 1.0567 [2026-04-18 13:31:49] Validation | Batch 510/1567 | Loss: 1.0584 [2026-04-18 13:31:49] Validation | Batch 520/1567 | Loss: 1.0603 [2026-04-18 13:31:50] Validation | Batch 530/1567 | Loss: 1.0598 [2026-04-18 13:31:51] Validation | Batch 540/1567 | Loss: 1.0621 [2026-04-18 13:31:52] Validation | Batch 550/1567 | Loss: 1.0656 [2026-04-18 13:31:53] Validation | Batch 560/1567 | Loss: 1.0650 [2026-04-18 13:31:54] Validation | Batch 570/1567 | Loss: 1.0646 [2026-04-18 13:31:55] Validation | Batch 580/1567 | Loss: 1.0637 [2026-04-18 13:31:55] Validation | Batch 590/1567 | Loss: 1.0625 [2026-04-18 13:31:56] Validation | Batch 600/1567 | Loss: 1.0608 [2026-04-18 13:31:57] Validation | Batch 610/1567 | Loss: 1.0599 [2026-04-18 13:31:58] Validation | Batch 620/1567 | Loss: 1.0615 [2026-04-18 13:31:59] Validation | Batch 630/1567 | Loss: 1.0595 [2026-04-18 13:32:00] Validation | Batch 640/1567 | Loss: 1.0609 [2026-04-18 13:32:01] Validation | Batch 650/1567 | Loss: 1.0602 [2026-04-18 13:32:01] Validation | Batch 660/1567 | Loss: 1.0590 [2026-04-18 13:32:02] Validation | Batch 670/1567 | Loss: 1.0568 [2026-04-18 13:32:03] Validation | Batch 680/1567 | Loss: 1.0566 [2026-04-18 13:32:03] Validation | Batch 690/1567 | Loss: 1.0576 [2026-04-18 13:32:04] Validation | Batch 700/1567 | Loss: 1.0561 [2026-04-18 13:32:05] Validation | Batch 710/1567 | Loss: 1.0570 [2026-04-18 13:32:06] Validation | Batch 720/1567 | Loss: 1.0565 [2026-04-18 13:32:07] Validation | Batch 730/1567 | Loss: 1.0574 [2026-04-18 13:32:07] Validation | Batch 740/1567 | Loss: 1.0583 [2026-04-18 13:32:08] Validation | Batch 750/1567 | Loss: 1.0584 [2026-04-18 13:32:09] Validation | Batch 760/1567 | Loss: 1.0584 [2026-04-18 13:32:10] Validation | Batch 770/1567 | Loss: 1.0604 [2026-04-18 13:32:11] Validation | Batch 780/1567 | Loss: 1.0619 [2026-04-18 13:32:12] 
Validation | Batch 790/1567 | Loss: 1.0614 [2026-04-18 13:32:12] Validation | Batch 800/1567 | Loss: 1.0631 [2026-04-18 13:32:13] Validation | Batch 810/1567 | Loss: 1.0630 [2026-04-18 13:32:14] Validation | Batch 820/1567 | Loss: 1.0627 [2026-04-18 13:32:15] Validation | Batch 830/1567 | Loss: 1.0613 [2026-04-18 13:32:15] Validation | Batch 840/1567 | Loss: 1.0615 [2026-04-18 13:32:16] Validation | Batch 850/1567 | Loss: 1.0602 [2026-04-18 13:32:17] Validation | Batch 860/1567 | Loss: 1.0617 [2026-04-18 13:32:17] Validation | Batch 870/1567 | Loss: 1.0623 [2026-04-18 13:32:18] Validation | Batch 880/1567 | Loss: 1.0633 [2026-04-18 13:32:19] Validation | Batch 890/1567 | Loss: 1.0637 [2026-04-18 13:32:20] Validation | Batch 900/1567 | Loss: 1.0656 [2026-04-18 13:32:20] Validation | Batch 910/1567 | Loss: 1.0657 [2026-04-18 13:32:21] Validation | Batch 920/1567 | Loss: 1.0675 [2026-04-18 13:32:22] Validation | Batch 930/1567 | Loss: 1.0653 [2026-04-18 13:32:23] Validation | Batch 940/1567 | Loss: 1.0649 [2026-04-18 13:32:23] Validation | Batch 950/1567 | Loss: 1.0640 [2026-04-18 13:32:24] Validation | Batch 960/1567 | Loss: 1.0627 [2026-04-18 13:32:25] Validation | Batch 970/1567 | Loss: 1.0639 [2026-04-18 13:32:26] Validation | Batch 980/1567 | Loss: 1.0644 [2026-04-18 13:32:26] Validation | Batch 990/1567 | Loss: 1.0639 [2026-04-18 13:32:27] Validation | Batch 1000/1567 | Loss: 1.0642 [2026-04-18 13:32:28] Validation | Batch 1010/1567 | Loss: 1.0620 [2026-04-18 13:32:28] Validation | Batch 1020/1567 | Loss: 1.0622 [2026-04-18 13:32:30] Validation | Batch 1030/1567 | Loss: 1.0636 [2026-04-18 13:32:30] Validation | Batch 1040/1567 | Loss: 1.0633 [2026-04-18 13:32:31] Validation | Batch 1050/1567 | Loss: 1.0641 [2026-04-18 13:32:32] Validation | Batch 1060/1567 | Loss: 1.0634 [2026-04-18 13:32:33] Validation | Batch 1070/1567 | Loss: 1.0626 [2026-04-18 13:32:33] Validation | Batch 1080/1567 | Loss: 1.0636 [2026-04-18 13:32:34] Validation | Batch 1090/1567 | Loss: 
1.0634 [2026-04-18 13:32:35] Validation | Batch 1100/1567 | Loss: 1.0638 [2026-04-18 13:32:35] Validation | Batch 1110/1567 | Loss: 1.0633 [2026-04-18 13:32:36] Validation | Batch 1120/1567 | Loss: 1.0633 [2026-04-18 13:32:37] Validation | Batch 1130/1567 | Loss: 1.0636 [2026-04-18 13:32:38] Validation | Batch 1140/1567 | Loss: 1.0641 [2026-04-18 13:32:39] Validation | Batch 1150/1567 | Loss: 1.0643 [2026-04-18 13:32:39] Validation | Batch 1160/1567 | Loss: 1.0653 [2026-04-18 13:32:40] Validation | Batch 1170/1567 | Loss: 1.0650 [2026-04-18 13:32:41] Validation | Batch 1180/1567 | Loss: 1.0649 [2026-04-18 13:32:42] Validation | Batch 1190/1567 | Loss: 1.0658 [2026-04-18 13:32:43] Validation | Batch 1200/1567 | Loss: 1.0652 [2026-04-18 13:32:44] Validation | Batch 1210/1567 | Loss: 1.0639 [2026-04-18 13:32:44] Validation | Batch 1220/1567 | Loss: 1.0643 [2026-04-18 13:32:45] Validation | Batch 1230/1567 | Loss: 1.0664 [2026-04-18 13:32:46] Validation | Batch 1240/1567 | Loss: 1.0654 [2026-04-18 13:32:46] Validation | Batch 1250/1567 | Loss: 1.0653 [2026-04-18 13:32:47] Validation | Batch 1260/1567 | Loss: 1.0661 [2026-04-18 13:32:48] Validation | Batch 1270/1567 | Loss: 1.0661 [2026-04-18 13:32:49] Validation | Batch 1280/1567 | Loss: 1.0655 [2026-04-18 13:32:50] Validation | Batch 1290/1567 | Loss: 1.0656 [2026-04-18 13:32:51] Validation | Batch 1300/1567 | Loss: 1.0660 [2026-04-18 13:32:52] Validation | Batch 1310/1567 | Loss: 1.0663 [2026-04-18 13:32:53] Validation | Batch 1320/1567 | Loss: 1.0653 [2026-04-18 13:32:53] Validation | Batch 1330/1567 | Loss: 1.0649 [2026-04-18 13:32:54] Validation | Batch 1340/1567 | Loss: 1.0647 [2026-04-18 13:32:55] Validation | Batch 1350/1567 | Loss: 1.0654 [2026-04-18 13:32:56] Validation | Batch 1360/1567 | Loss: 1.0647 [2026-04-18 13:32:56] Validation | Batch 1370/1567 | Loss: 1.0650 [2026-04-18 13:32:57] Validation | Batch 1380/1567 | Loss: 1.0661 [2026-04-18 13:32:58] Validation | Batch 1390/1567 | Loss: 1.0661 [2026-04-18 
13:32:59] Validation | Batch 1400/1567 | Loss: 1.0663 [2026-04-18 13:32:59] Validation | Batch 1410/1567 | Loss: 1.0659 [2026-04-18 13:33:00] Validation | Batch 1420/1567 | Loss: 1.0664 [2026-04-18 13:33:01] Validation | Batch 1430/1567 | Loss: 1.0661 [2026-04-18 13:33:02] Validation | Batch 1440/1567 | Loss: 1.0664 [2026-04-18 13:33:02] Validation | Batch 1450/1567 | Loss: 1.0659 [2026-04-18 13:33:03] Validation | Batch 1460/1567 | Loss: 1.0657 [2026-04-18 13:33:04] Validation | Batch 1470/1567 | Loss: 1.0647 [2026-04-18 13:33:05] Validation | Batch 1480/1567 | Loss: 1.0631 [2026-04-18 13:33:05] Validation | Batch 1490/1567 | Loss: 1.0630 [2026-04-18 13:33:06] Validation | Batch 1500/1567 | Loss: 1.0630 [2026-04-18 13:33:07] Validation | Batch 1510/1567 | Loss: 1.0627 [2026-04-18 13:33:08] Validation | Batch 1520/1567 | Loss: 1.0621 [2026-04-18 13:33:08] Validation | Batch 1530/1567 | Loss: 1.0629 [2026-04-18 13:33:09] Validation | Batch 1540/1567 | Loss: 1.0640 [2026-04-18 13:33:10] Validation | Batch 1550/1567 | Loss: 1.0643 [2026-04-18 13:33:11] Validation | Batch 1560/1567 | Loss: 1.0633 [2026-04-18 13:33:12] Validation | Batch 1567/1567 | Loss: 1.0636 [2026-04-18 13:33:12] Validation | Loss: 1.0636 | PPL: 2.92 | Time: 125.79s [2026-04-18 13:33:17] New best model saved! 
Val loss: 1.0636 [2026-04-18 13:33:21] Epoch 1 | Step 9010 | Loss: 0.9978 | LR: 2.00e-05 [2026-04-18 13:33:24] Epoch 1 | Step 9020 | Loss: 0.9978 | LR: 2.00e-05 [2026-04-18 13:33:28] Epoch 1 | Step 9030 | Loss: 0.9978 | LR: 2.00e-05 [2026-04-18 13:33:31] Epoch 1 | Step 9040 | Loss: 0.9977 | LR: 2.00e-05 [2026-04-18 13:33:35] Epoch 1 | Step 9050 | Loss: 0.9976 | LR: 2.00e-05 [2026-04-18 13:33:38] Epoch 1 | Step 9060 | Loss: 0.9977 | LR: 2.00e-05 [2026-04-18 13:33:42] Epoch 1 | Step 9070 | Loss: 0.9976 | LR: 2.00e-05 [2026-04-18 13:33:45] Epoch 1 | Step 9080 | Loss: 0.9976 | LR: 2.00e-05 [2026-04-18 13:33:48] Epoch 1 | Step 9090 | Loss: 0.9975 | LR: 2.00e-05 [2026-04-18 13:33:52] Epoch 1 | Step 9100 | Loss: 0.9976 | LR: 2.00e-05 [2026-04-18 13:33:56] Epoch 1 | Step 9110 | Loss: 0.9976 | LR: 2.00e-05 [2026-04-18 13:33:59] Epoch 1 | Step 9120 | Loss: 0.9977 | LR: 2.00e-05 [2026-04-18 13:34:02] Epoch 1 | Step 9130 | Loss: 0.9975 | LR: 2.00e-05 [2026-04-18 13:34:06] Epoch 1 | Step 9140 | Loss: 0.9975 | LR: 2.00e-05 [2026-04-18 13:34:09] Epoch 1 | Step 9150 | Loss: 0.9976 | LR: 2.00e-05 [2026-04-18 13:34:13] Epoch 1 | Step 9160 | Loss: 0.9975 | LR: 2.00e-05 [2026-04-18 13:34:17] Epoch 1 | Step 9170 | Loss: 0.9973 | LR: 2.00e-05 [2026-04-18 13:34:20] Epoch 1 | Step 9180 | Loss: 0.9972 | LR: 2.00e-05 [2026-04-18 13:34:24] Epoch 1 | Step 9190 | Loss: 0.9970 | LR: 2.00e-05 [2026-04-18 13:34:27] Epoch 1 | Step 9200 | Loss: 0.9970 | LR: 2.00e-05 [2026-04-18 13:34:31] Epoch 1 | Step 9210 | Loss: 0.9970 | LR: 2.00e-05 [2026-04-18 13:34:35] Epoch 1 | Step 9220 | Loss: 0.9969 | LR: 2.00e-05 [2026-04-18 13:34:38] Epoch 1 | Step 9230 | Loss: 0.9969 | LR: 2.00e-05 [2026-04-18 13:34:42] Epoch 1 | Step 9240 | Loss: 0.9967 | LR: 2.00e-05 [2026-04-18 13:34:46] Epoch 1 | Step 9250 | Loss: 0.9966 | LR: 2.00e-05 [2026-04-18 13:34:49] Epoch 1 | Step 9260 | Loss: 0.9964 | LR: 2.00e-05 [2026-04-18 13:34:53] Epoch 1 | Step 9270 | Loss: 0.9964 | LR: 2.00e-05 [2026-04-18 13:34:57] Epoch 1 | Step 
9280 | Loss: 0.9964 | LR: 2.00e-05 [2026-04-18 13:35:00] Epoch 1 | Step 9290 | Loss: 0.9963 | LR: 2.00e-05 [2026-04-18 13:35:04] Epoch 1 | Step 9300 | Loss: 0.9963 | LR: 2.00e-05 [2026-04-18 13:35:07] Epoch 1 | Step 9310 | Loss: 0.9963 | LR: 2.00e-05 [2026-04-18 13:35:11] Epoch 1 | Step 9320 | Loss: 0.9962 | LR: 2.00e-05 [2026-04-18 13:35:14] Epoch 1 | Step 9330 | Loss: 0.9962 | LR: 2.00e-05 [2026-04-18 13:35:18] Epoch 1 | Step 9340 | Loss: 0.9961 | LR: 2.00e-05 [2026-04-18 13:35:22] Epoch 1 | Step 9350 | Loss: 0.9960 | LR: 2.00e-05 [2026-04-18 13:35:26] Epoch 1 | Step 9360 | Loss: 0.9960 | LR: 2.00e-05 [2026-04-18 13:35:29] Epoch 1 | Step 9370 | Loss: 0.9960 | LR: 2.00e-05 [2026-04-18 13:35:33] Epoch 1 | Step 9380 | Loss: 0.9959 | LR: 2.00e-05 [2026-04-18 13:35:37] Epoch 1 | Step 9390 | Loss: 0.9957 | LR: 2.00e-05 [2026-04-18 13:35:40] Epoch 1 | Step 9400 | Loss: 0.9959 | LR: 2.00e-05 [2026-04-18 13:35:44] Epoch 1 | Step 9410 | Loss: 0.9959 | LR: 2.00e-05 [2026-04-18 13:35:47] Epoch 1 | Step 9420 | Loss: 0.9960 | LR: 2.00e-05 [2026-04-18 13:35:51] Epoch 1 | Step 9430 | Loss: 0.9959 | LR: 2.00e-05 [2026-04-18 13:35:54] Epoch 1 | Step 9440 | Loss: 0.9959 | LR: 2.00e-05 [2026-04-18 13:35:57] Epoch 1 | Step 9450 | Loss: 0.9960 | LR: 2.00e-05 [2026-04-18 13:36:01] Epoch 1 | Step 9460 | Loss: 0.9959 | LR: 2.00e-05 [2026-04-18 13:36:05] Epoch 1 | Step 9470 | Loss: 0.9957 | LR: 2.00e-05 [2026-04-18 13:36:08] Epoch 1 | Step 9480 | Loss: 0.9955 | LR: 2.00e-05 [2026-04-18 13:36:11] Epoch 1 | Step 9490 | Loss: 0.9955 | LR: 2.00e-05 [2026-04-18 13:36:15] Epoch 1 | Step 9500 | Loss: 0.9955 | LR: 2.00e-05 [2026-04-18 13:36:19] Epoch 1 | Step 9510 | Loss: 0.9955 | LR: 2.00e-05 [2026-04-18 13:36:22] Epoch 1 | Step 9520 | Loss: 0.9954 | LR: 2.00e-05 [2026-04-18 13:36:26] Epoch 1 | Step 9530 | Loss: 0.9954 | LR: 2.00e-05 [2026-04-18 13:36:30] Epoch 1 | Step 9540 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:36:33] Epoch 1 | Step 9550 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 
13:36:37] Epoch 1 | Step 9560 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:36:41] Epoch 1 | Step 9570 | Loss: 0.9954 | LR: 2.00e-05 [2026-04-18 13:36:44] Epoch 1 | Step 9580 | Loss: 0.9954 | LR: 2.00e-05 [2026-04-18 13:36:47] Epoch 1 | Step 9590 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:36:51] Epoch 1 | Step 9600 | Loss: 0.9952 | LR: 2.00e-05 [2026-04-18 13:36:55] Epoch 1 | Step 9610 | Loss: 0.9952 | LR: 2.00e-05 [2026-04-18 13:36:58] Epoch 1 | Step 9620 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:37:02] Epoch 1 | Step 9630 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:37:06] Epoch 1 | Step 9640 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:37:10] Epoch 1 | Step 9650 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:37:14] Epoch 1 | Step 9660 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:37:18] Epoch 1 | Step 9670 | Loss: 0.9954 | LR: 2.00e-05 [2026-04-18 13:37:21] Epoch 1 | Step 9680 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:37:24] Epoch 1 | Step 9690 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:37:28] Epoch 1 | Step 9700 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:37:32] Epoch 1 | Step 9710 | Loss: 0.9953 | LR: 2.00e-05 [2026-04-18 13:37:36] Epoch 1 | Step 9720 | Loss: 0.9952 | LR: 2.00e-05 [2026-04-18 13:37:39] Epoch 1 | Step 9730 | Loss: 0.9952 | LR: 2.00e-05 [2026-04-18 13:37:42] Epoch 1 | Step 9740 | Loss: 0.9952 | LR: 2.00e-05 [2026-04-18 13:37:46] Epoch 1 | Step 9750 | Loss: 0.9952 | LR: 2.00e-05 [2026-04-18 13:37:50] Epoch 1 | Step 9760 | Loss: 0.9951 | LR: 2.00e-05 [2026-04-18 13:37:52] Epoch 1 | Step 9770 | Loss: 0.9951 | LR: 2.00e-05 [2026-04-18 13:37:56] Epoch 1 | Step 9780 | Loss: 0.9950 | LR: 2.00e-05 [2026-04-18 13:37:59] Epoch 1 | Step 9790 | Loss: 0.9950 | LR: 2.00e-05 [2026-04-18 13:38:03] Epoch 1 | Step 9800 | Loss: 0.9950 | LR: 2.00e-05 [2026-04-18 13:38:07] Epoch 1 | Step 9810 | Loss: 0.9949 | LR: 2.00e-05 [2026-04-18 13:38:11] Epoch 1 | Step 9820 | Loss: 0.9949 | LR: 2.00e-05 [2026-04-18 13:38:15] Epoch 1 | Step 9830 | Loss: 0.9949 | LR: 
2.00e-05 [2026-04-18 13:38:18] Epoch 1 | Step 9840 | Loss: 0.9950 | LR: 2.00e-05 [2026-04-18 13:38:21] Epoch 1 | Step 9850 | Loss: 0.9949 | LR: 2.00e-05 [2026-04-18 13:38:27] Epoch 1 | Step 9860 | Loss: 0.9948 | LR: 2.00e-05 [2026-04-18 13:38:30] Epoch 1 | Step 9870 | Loss: 0.9949 | LR: 2.00e-05 [2026-04-18 13:38:34] Epoch 1 | Step 9880 | Loss: 0.9949 | LR: 2.00e-05 [2026-04-18 13:38:37] Epoch 1 completed in 4753.98s | Loss: 0.9949 [2026-04-18 13:38:47] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_9887.pt [2026-04-18 13:38:57] ============================================================ [2026-04-18 13:38:57] EPOCH 2/3 [2026-04-18 13:38:57] ============================================================ [2026-04-18 13:38:59] Epoch 2 | Step 9890 | Loss: 0.8006 | LR: 2.00e-05 [2026-04-18 13:39:02] Epoch 2 | Step 9900 | Loss: 0.6708 | LR: 2.00e-05 [2026-04-18 13:39:06] Epoch 2 | Step 9910 | Loss: 0.7634 | LR: 2.00e-05 [2026-04-18 13:39:09] Epoch 2 | Step 9920 | Loss: 0.7934 | LR: 2.00e-05 [2026-04-18 13:39:13] Epoch 2 | Step 9930 | Loss: 0.8350 | LR: 2.00e-05 [2026-04-18 13:39:17] Epoch 2 | Step 9940 | Loss: 0.8274 | LR: 2.00e-05 [2026-04-18 13:39:21] Epoch 2 | Step 9950 | Loss: 0.8252 | LR: 2.00e-05 [2026-04-18 13:39:24] Epoch 2 | Step 9960 | Loss: 0.8194 | LR: 2.00e-05 [2026-04-18 13:39:28] Epoch 2 | Step 9970 | Loss: 0.8136 | LR: 2.00e-05 [2026-04-18 13:39:32] Epoch 2 | Step 9980 | Loss: 0.8098 | LR: 2.00e-05 [2026-04-18 13:39:35] Epoch 2 | Step 9990 | Loss: 0.8099 | LR: 2.00e-05 [2026-04-18 13:39:39] Epoch 2 | Step 10000 | Loss: 0.8110 | LR: 2.00e-05 [2026-04-18 13:39:40] Validation | Batch 10/1567 | Loss: 0.9520 [2026-04-18 13:39:40] Validation | Batch 20/1567 | Loss: 1.0164 [2026-04-18 13:39:42] Validation | Batch 30/1567 | Loss: 1.0610 [2026-04-18 13:39:43] Validation | Batch 40/1567 | Loss: 1.0833 [2026-04-18 13:39:43] Validation | Batch 50/1567 | Loss: 1.0627 [2026-04-18 13:39:44] Validation | Batch 60/1567 | Loss: 1.0483 [2026-04-18 
13:39:45] Validation | Batch 70/1567 | Loss: 1.0313 [2026-04-18 13:39:46] Validation | Batch 80/1567 | Loss: 1.0477 [2026-04-18 13:39:47] Validation | Batch 90/1567 | Loss: 1.0543 [2026-04-18 13:39:48] Validation | Batch 100/1567 | Loss: 1.0649 [2026-04-18 13:39:48] Validation | Batch 110/1567 | Loss: 1.0543 [2026-04-18 13:39:49] Validation | Batch 120/1567 | Loss: 1.0660 [2026-04-18 13:39:50] Validation | Batch 130/1567 | Loss: 1.0675 [2026-04-18 13:39:51] Validation | Batch 140/1567 | Loss: 1.0711 [2026-04-18 13:39:52] Validation | Batch 150/1567 | Loss: 1.0783 [2026-04-18 13:39:53] Validation | Batch 160/1567 | Loss: 1.0773 [2026-04-18 13:39:53] Validation | Batch 170/1567 | Loss: 1.0621 [2026-04-18 13:39:54] Validation | Batch 180/1567 | Loss: 1.0641 [2026-04-18 13:39:55] Validation | Batch 190/1567 | Loss: 1.0599 [2026-04-18 13:39:56] Validation | Batch 200/1567 | Loss: 1.0632 [2026-04-18 13:39:57] Validation | Batch 210/1567 | Loss: 1.0646 [2026-04-18 13:39:57] Validation | Batch 220/1567 | Loss: 1.0674 [2026-04-18 13:39:59] Validation | Batch 230/1567 | Loss: 1.0712 [2026-04-18 13:39:59] Validation | Batch 240/1567 | Loss: 1.0695 [2026-04-18 13:39:59] Validation | Batch 250/1567 | Loss: 1.0631 [2026-04-18 13:40:00] Validation | Batch 260/1567 | Loss: 1.0582 [2026-04-18 13:40:01] Validation | Batch 270/1567 | Loss: 1.0548 [2026-04-18 13:40:02] Validation | Batch 280/1567 | Loss: 1.0562 [2026-04-18 13:40:03] Validation | Batch 290/1567 | Loss: 1.0615 [2026-04-18 13:40:04] Validation | Batch 300/1567 | Loss: 1.0678 [2026-04-18 13:40:04] Validation | Batch 310/1567 | Loss: 1.0674 [2026-04-18 13:40:05] Validation | Batch 320/1567 | Loss: 1.0678 [2026-04-18 13:40:06] Validation | Batch 330/1567 | Loss: 1.0657 [2026-04-18 13:40:07] Validation | Batch 340/1567 | Loss: 1.0697 [2026-04-18 13:40:08] Validation | Batch 350/1567 | Loss: 1.0688 [2026-04-18 13:40:09] Validation | Batch 360/1567 | Loss: 1.0668 [2026-04-18 13:40:09] Validation | Batch 370/1567 | Loss: 1.0640 
[2026-04-18 13:40:10] Validation | Batch 380/1567 | Loss: 1.0673 [2026-04-18 13:40:11] Validation | Batch 390/1567 | Loss: 1.0681 [2026-04-18 13:40:12] Validation | Batch 400/1567 | Loss: 1.0696 [2026-04-18 13:40:13] Validation | Batch 410/1567 | Loss: 1.0687 [2026-04-18 13:40:13] Validation | Batch 420/1567 | Loss: 1.0685 [2026-04-18 13:40:14] Validation | Batch 430/1567 | Loss: 1.0679 [2026-04-18 13:40:15] Validation | Batch 440/1567 | Loss: 1.0666 [2026-04-18 13:40:16] Validation | Batch 450/1567 | Loss: 1.0666 [2026-04-18 13:40:17] Validation | Batch 460/1567 | Loss: 1.0660 [2026-04-18 13:40:17] Validation | Batch 470/1567 | Loss: 1.0651 [2026-04-18 13:40:18] Validation | Batch 480/1567 | Loss: 1.0628 [2026-04-18 13:40:19] Validation | Batch 490/1567 | Loss: 1.0627 [2026-04-18 13:40:20] Validation | Batch 500/1567 | Loss: 1.0629 [2026-04-18 13:40:21] Validation | Batch 510/1567 | Loss: 1.0649 [2026-04-18 13:40:21] Validation | Batch 520/1567 | Loss: 1.0667 [2026-04-18 13:40:22] Validation | Batch 530/1567 | Loss: 1.0664 [2026-04-18 13:40:23] Validation | Batch 540/1567 | Loss: 1.0688 [2026-04-18 13:40:24] Validation | Batch 550/1567 | Loss: 1.0720 [2026-04-18 13:40:25] Validation | Batch 560/1567 | Loss: 1.0717 [2026-04-18 13:40:26] Validation | Batch 570/1567 | Loss: 1.0716 [2026-04-18 13:40:27] Validation | Batch 580/1567 | Loss: 1.0706 [2026-04-18 13:40:27] Validation | Batch 590/1567 | Loss: 1.0695 [2026-04-18 13:40:28] Validation | Batch 600/1567 | Loss: 1.0679 [2026-04-18 13:40:29] Validation | Batch 610/1567 | Loss: 1.0668 [2026-04-18 13:40:30] Validation | Batch 620/1567 | Loss: 1.0682 [2026-04-18 13:40:31] Validation | Batch 630/1567 | Loss: 1.0666 [2026-04-18 13:40:32] Validation | Batch 640/1567 | Loss: 1.0680 [2026-04-18 13:40:34] Validation | Batch 650/1567 | Loss: 1.0673 [2026-04-18 13:40:34] Validation | Batch 660/1567 | Loss: 1.0663 [2026-04-18 13:40:35] Validation | Batch 670/1567 | Loss: 1.0642 [2026-04-18 13:40:36] Validation | Batch 680/1567 
| Loss: 1.0640 [2026-04-18 13:40:36] Validation | Batch 690/1567 | Loss: 1.0652 [2026-04-18 13:40:37] Validation | Batch 700/1567 | Loss: 1.0638 [2026-04-18 13:40:38] Validation | Batch 710/1567 | Loss: 1.0650 [2026-04-18 13:40:39] Validation | Batch 720/1567 | Loss: 1.0646 [2026-04-18 13:40:40] Validation | Batch 730/1567 | Loss: 1.0654 [2026-04-18 13:40:40] Validation | Batch 740/1567 | Loss: 1.0660 [2026-04-18 13:40:41] Validation | Batch 750/1567 | Loss: 1.0664 [2026-04-18 13:40:42] Validation | Batch 760/1567 | Loss: 1.0663 [2026-04-18 13:40:43] Validation | Batch 770/1567 | Loss: 1.0685 [2026-04-18 13:40:44] Validation | Batch 780/1567 | Loss: 1.0698 [2026-04-18 13:40:45] Validation | Batch 790/1567 | Loss: 1.0696 [2026-04-18 13:40:45] Validation | Batch 800/1567 | Loss: 1.0714 [2026-04-18 13:40:46] Validation | Batch 810/1567 | Loss: 1.0715 [2026-04-18 13:40:47] Validation | Batch 820/1567 | Loss: 1.0714 [2026-04-18 13:40:48] Validation | Batch 830/1567 | Loss: 1.0701 [2026-04-18 13:40:48] Validation | Batch 840/1567 | Loss: 1.0704 [2026-04-18 13:40:49] Validation | Batch 850/1567 | Loss: 1.0690 [2026-04-18 13:40:50] Validation | Batch 860/1567 | Loss: 1.0707 [2026-04-18 13:40:50] Validation | Batch 870/1567 | Loss: 1.0713 [2026-04-18 13:40:51] Validation | Batch 880/1567 | Loss: 1.0722 [2026-04-18 13:40:52] Validation | Batch 890/1567 | Loss: 1.0726 [2026-04-18 13:40:53] Validation | Batch 900/1567 | Loss: 1.0745 [2026-04-18 13:40:53] Validation | Batch 910/1567 | Loss: 1.0748 [2026-04-18 13:40:54] Validation | Batch 920/1567 | Loss: 1.0769 [2026-04-18 13:40:55] Validation | Batch 930/1567 | Loss: 1.0746 [2026-04-18 13:40:56] Validation | Batch 940/1567 | Loss: 1.0741 [2026-04-18 13:40:56] Validation | Batch 950/1567 | Loss: 1.0732 [2026-04-18 13:40:57] Validation | Batch 960/1567 | Loss: 1.0718 [2026-04-18 13:40:58] Validation | Batch 970/1567 | Loss: 1.0734 [2026-04-18 13:40:59] Validation | Batch 980/1567 | Loss: 1.0737 [2026-04-18 13:40:59] Validation | 
Batch 990/1567 | Loss: 1.0732 [2026-04-18 13:41:00] Validation | Batch 1000/1567 | Loss: 1.0735 [2026-04-18 13:41:01] Validation | Batch 1010/1567 | Loss: 1.0712 [2026-04-18 13:41:02] Validation | Batch 1020/1567 | Loss: 1.0716 [2026-04-18 13:41:03] Validation | Batch 1030/1567 | Loss: 1.0734 [2026-04-18 13:41:03] Validation | Batch 1040/1567 | Loss: 1.0728 [2026-04-18 13:41:04] Validation | Batch 1050/1567 | Loss: 1.0738 [2026-04-18 13:41:05] Validation | Batch 1060/1567 | Loss: 1.0732 [2026-04-18 13:41:06] Validation | Batch 1070/1567 | Loss: 1.0724 [2026-04-18 13:41:06] Validation | Batch 1080/1567 | Loss: 1.0733 [2026-04-18 13:41:07] Validation | Batch 1090/1567 | Loss: 1.0730 [2026-04-18 13:41:08] Validation | Batch 1100/1567 | Loss: 1.0735 [2026-04-18 13:41:08] Validation | Batch 1110/1567 | Loss: 1.0734 [2026-04-18 13:41:09] Validation | Batch 1120/1567 | Loss: 1.0735 [2026-04-18 13:41:10] Validation | Batch 1130/1567 | Loss: 1.0737 [2026-04-18 13:41:11] Validation | Batch 1140/1567 | Loss: 1.0744 [2026-04-18 13:41:12] Validation | Batch 1150/1567 | Loss: 1.0748 [2026-04-18 13:41:13] Validation | Batch 1160/1567 | Loss: 1.0757 [2026-04-18 13:41:13] Validation | Batch 1170/1567 | Loss: 1.0752 [2026-04-18 13:41:14] Validation | Batch 1180/1567 | Loss: 1.0750 [2026-04-18 13:41:15] Validation | Batch 1190/1567 | Loss: 1.0760 [2026-04-18 13:41:16] Validation | Batch 1200/1567 | Loss: 1.0753 [2026-04-18 13:41:17] Validation | Batch 1210/1567 | Loss: 1.0741 [2026-04-18 13:41:17] Validation | Batch 1220/1567 | Loss: 1.0745 [2026-04-18 13:41:18] Validation | Batch 1230/1567 | Loss: 1.0765 [2026-04-18 13:41:19] Validation | Batch 1240/1567 | Loss: 1.0755 [2026-04-18 13:41:20] Validation | Batch 1250/1567 | Loss: 1.0754 [2026-04-18 13:41:21] Validation | Batch 1260/1567 | Loss: 1.0762 [2026-04-18 13:41:22] Validation | Batch 1270/1567 | Loss: 1.0762 [2026-04-18 13:41:22] Validation | Batch 1280/1567 | Loss: 1.0756 [2026-04-18 13:41:24] Validation | Batch 1290/1567 | 
Loss: 1.0758 [2026-04-18 13:41:24] Validation | Batch 1300/1567 | Loss: 1.0760 [2026-04-18 13:41:25] Validation | Batch 1310/1567 | Loss: 1.0766 [2026-04-18 13:41:26] Validation | Batch 1320/1567 | Loss: 1.0756 [2026-04-18 13:41:27] Validation | Batch 1330/1567 | Loss: 1.0752 [2026-04-18 13:41:28] Validation | Batch 1340/1567 | Loss: 1.0748 [2026-04-18 13:41:28] Validation | Batch 1350/1567 | Loss: 1.0756 [2026-04-18 13:41:29] Validation | Batch 1360/1567 | Loss: 1.0752 [2026-04-18 13:41:30] Validation | Batch 1370/1567 | Loss: 1.0755 [2026-04-18 13:41:31] Validation | Batch 1380/1567 | Loss: 1.0766 [2026-04-18 13:41:31] Validation | Batch 1390/1567 | Loss: 1.0767 [2026-04-18 13:41:32] Validation | Batch 1400/1567 | Loss: 1.0771 [2026-04-18 13:41:33] Validation | Batch 1410/1567 | Loss: 1.0768 [2026-04-18 13:41:33] Validation | Batch 1420/1567 | Loss: 1.0774 [2026-04-18 13:41:34] Validation | Batch 1430/1567 | Loss: 1.0771 [2026-04-18 13:41:35] Validation | Batch 1440/1567 | Loss: 1.0775 [2026-04-18 13:41:36] Validation | Batch 1450/1567 | Loss: 1.0769 [2026-04-18 13:41:36] Validation | Batch 1460/1567 | Loss: 1.0766 [2026-04-18 13:41:37] Validation | Batch 1470/1567 | Loss: 1.0756 [2026-04-18 13:41:38] Validation | Batch 1480/1567 | Loss: 1.0741 [2026-04-18 13:41:38] Validation | Batch 1490/1567 | Loss: 1.0741 [2026-04-18 13:41:39] Validation | Batch 1500/1567 | Loss: 1.0742 [2026-04-18 13:41:40] Validation | Batch 1510/1567 | Loss: 1.0739 [2026-04-18 13:41:41] Validation | Batch 1520/1567 | Loss: 1.0732 [2026-04-18 13:41:41] Validation | Batch 1530/1567 | Loss: 1.0740 [2026-04-18 13:41:43] Validation | Batch 1540/1567 | Loss: 1.0750 [2026-04-18 13:41:43] Validation | Batch 1550/1567 | Loss: 1.0752 [2026-04-18 13:41:44] Validation | Batch 1560/1567 | Loss: 1.0743 [2026-04-18 13:41:45] Validation | Batch 1567/1567 | Loss: 1.0746 [2026-04-18 13:41:45] Validation | Loss: 1.0746 | PPL: 2.96 | Time: 126.01s [2026-04-18 13:41:48] Epoch 2 | Step 10010 | Loss: 0.8068 | 
LR: 2.00e-05 [2026-04-18 13:41:52] Epoch 2 | Step 10020 | Loss: 0.7979 | LR: 2.00e-05 [2026-04-18 13:41:55] Epoch 2 | Step 10030 | Loss: 0.7958 | LR: 2.00e-05 [2026-04-18 13:41:59] Epoch 2 | Step 10040 | Loss: 0.7957 | LR: 2.00e-05 [2026-04-18 13:42:03] Epoch 2 | Step 10050 | Loss: 0.7942 | LR: 2.00e-05 [2026-04-18 13:42:06] Epoch 2 | Step 10060 | Loss: 0.8002 | LR: 2.00e-05 [2026-04-18 13:42:10] Epoch 2 | Step 10070 | Loss: 0.8002 | LR: 2.00e-05 [2026-04-18 13:42:14] Epoch 2 | Step 10080 | Loss: 0.8008 | LR: 2.00e-05 [2026-04-18 13:42:18] Epoch 2 | Step 10090 | Loss: 0.8010 | LR: 2.00e-05 [2026-04-18 13:42:21] Epoch 2 | Step 10100 | Loss: 0.7973 | LR: 2.00e-05 [2026-04-18 13:42:25] Epoch 2 | Step 10110 | Loss: 0.7973 | LR: 2.00e-05 [2026-04-18 13:42:28] Epoch 2 | Step 10120 | Loss: 0.7971 | LR: 2.00e-05 [2026-04-18 13:42:32] Epoch 2 | Step 10130 | Loss: 0.7986 | LR: 2.00e-05 [2026-04-18 13:42:35] Epoch 2 | Step 10140 | Loss: 0.8045 | LR: 2.00e-05 [2026-04-18 13:42:39] Epoch 2 | Step 10150 | Loss: 0.8030 | LR: 2.00e-05 [2026-04-18 13:42:42] Epoch 2 | Step 10160 | Loss: 0.8011 | LR: 2.00e-05 [2026-04-18 13:42:46] Epoch 2 | Step 10170 | Loss: 0.8055 | LR: 2.00e-05 [2026-04-18 13:42:50] Epoch 2 | Step 10180 | Loss: 0.8006 | LR: 2.00e-05 [2026-04-18 13:42:53] Epoch 2 | Step 10190 | Loss: 0.8029 | LR: 2.00e-05 [2026-04-18 13:42:57] Epoch 2 | Step 10200 | Loss: 0.8030 | LR: 2.00e-05 [2026-04-18 13:43:01] Epoch 2 | Step 10210 | Loss: 0.8018 | LR: 2.00e-05 [2026-04-18 13:43:04] Epoch 2 | Step 10220 | Loss: 0.8021 | LR: 2.00e-05 [2026-04-18 13:43:08] Epoch 2 | Step 10230 | Loss: 0.8009 | LR: 2.00e-05 [2026-04-18 13:43:11] Epoch 2 | Step 10240 | Loss: 0.7997 | LR: 2.00e-05 [2026-04-18 13:43:14] Epoch 2 | Step 10250 | Loss: 0.7980 | LR: 2.00e-05 [2026-04-18 13:43:18] Epoch 2 | Step 10260 | Loss: 0.7973 | LR: 2.00e-05 [2026-04-18 13:43:22] Epoch 2 | Step 10270 | Loss: 0.8006 | LR: 2.00e-05 [2026-04-18 13:43:25] Epoch 2 | Step 10280 | Loss: 0.8054 | LR: 2.00e-05 [2026-04-18 
13:43:29] Epoch 2 | Step 10290 | Loss: 0.8032 | LR: 2.00e-05 [2026-04-18 13:43:32] Epoch 2 | Step 10300 | Loss: 0.8000 | LR: 2.00e-05 [2026-04-18 13:43:36] Epoch 2 | Step 10310 | Loss: 0.7996 | LR: 2.00e-05 [2026-04-18 13:43:40] Epoch 2 | Step 10320 | Loss: 0.8004 | LR: 2.00e-05 [2026-04-18 13:43:44] Epoch 2 | Step 10330 | Loss: 0.8009 | LR: 2.00e-05 [2026-04-18 13:43:47] Epoch 2 | Step 10340 | Loss: 0.8002 | LR: 2.00e-05 [2026-04-18 13:43:51] Epoch 2 | Step 10350 | Loss: 0.8013 | LR: 2.00e-05 [2026-04-18 13:43:55] Epoch 2 | Step 10360 | Loss: 0.8023 | LR: 2.00e-05 [2026-04-18 13:43:58] Epoch 2 | Step 10370 | Loss: 0.8012 | LR: 2.00e-05 [2026-04-18 13:44:02] Epoch 2 | Step 10380 | Loss: 0.8021 | LR: 2.00e-05 [2026-04-18 13:44:06] Epoch 2 | Step 10390 | Loss: 0.8019 | LR: 2.00e-05 [2026-04-18 13:44:09] Epoch 2 | Step 10400 | Loss: 0.8009 | LR: 2.00e-05 [2026-04-18 13:44:13] Epoch 2 | Step 10410 | Loss: 0.8011 | LR: 2.00e-05 [2026-04-18 13:44:17] Epoch 2 | Step 10420 | Loss: 0.8005 | LR: 2.00e-05 [2026-04-18 13:44:21] Epoch 2 | Step 10430 | Loss: 0.7981 | LR: 2.00e-05 [2026-04-18 13:44:24] Epoch 2 | Step 10440 | Loss: 0.7960 | LR: 2.00e-05 [2026-04-18 13:44:27] Epoch 2 | Step 10450 | Loss: 0.7973 | LR: 2.00e-05 [2026-04-18 13:44:31] Epoch 2 | Step 10460 | Loss: 0.7990 | LR: 2.00e-05 [2026-04-18 13:44:34] Epoch 2 | Step 10470 | Loss: 0.7985 | LR: 2.00e-05 [2026-04-18 13:44:38] Epoch 2 | Step 10480 | Loss: 0.7984 | LR: 2.00e-05 [2026-04-18 13:44:41] Epoch 2 | Step 10490 | Loss: 0.7984 | LR: 2.00e-05 [2026-04-18 13:44:45] Epoch 2 | Step 10500 | Loss: 0.7968 | LR: 2.00e-05 [2026-04-18 13:44:49] Epoch 2 | Step 10510 | Loss: 0.7981 | LR: 2.00e-05 [2026-04-18 13:44:52] Epoch 2 | Step 10520 | Loss: 0.7984 | LR: 2.00e-05 [2026-04-18 13:44:55] Epoch 2 | Step 10530 | Loss: 0.7977 | LR: 2.00e-05 [2026-04-18 13:44:59] Epoch 2 | Step 10540 | Loss: 0.7979 | LR: 2.00e-05 [2026-04-18 13:45:03] Epoch 2 | Step 10550 | Loss: 0.7992 | LR: 2.00e-05 [2026-04-18 13:45:06] Epoch 2 | Step 
10560 | Loss: 0.7992 | LR: 2.00e-05 [2026-04-18 13:45:10] Epoch 2 | Step 10570 | Loss: 0.8000 | LR: 2.00e-05 [2026-04-18 13:45:14] Epoch 2 | Step 10580 | Loss: 0.8017 | LR: 2.00e-05 [2026-04-18 13:45:18] Epoch 2 | Step 10590 | Loss: 0.8009 | LR: 2.00e-05 [2026-04-18 13:45:21] Epoch 2 | Step 10600 | Loss: 0.8006 | LR: 2.00e-05 [2026-04-18 13:45:25] Epoch 2 | Step 10610 | Loss: 0.8019 | LR: 2.00e-05 [2026-04-18 13:45:28] Epoch 2 | Step 10620 | Loss: 0.8023 | LR: 2.00e-05 [2026-04-18 13:45:32] Epoch 2 | Step 10630 | Loss: 0.8016 | LR: 2.00e-05 [2026-04-18 13:45:35] Epoch 2 | Step 10640 | Loss: 0.8018 | LR: 2.00e-05 [2026-04-18 13:45:39] Epoch 2 | Step 10650 | Loss: 0.8022 | LR: 2.00e-05 [2026-04-18 13:45:43] Epoch 2 | Step 10660 | Loss: 0.8044 | LR: 2.00e-05 [2026-04-18 13:45:47] Epoch 2 | Step 10670 | Loss: 0.8047 | LR: 2.00e-05 [2026-04-18 13:45:50] Epoch 2 | Step 10680 | Loss: 0.8039 | LR: 2.00e-05 [2026-04-18 13:45:54] Epoch 2 | Step 10690 | Loss: 0.8035 | LR: 2.00e-05 [2026-04-18 13:45:58] Epoch 2 | Step 10700 | Loss: 0.8048 | LR: 2.00e-05 [2026-04-18 13:46:02] Epoch 2 | Step 10710 | Loss: 0.8052 | LR: 2.00e-05 [2026-04-18 13:46:06] Epoch 2 | Step 10720 | Loss: 0.8052 | LR: 2.00e-05 [2026-04-18 13:46:09] Epoch 2 | Step 10730 | Loss: 0.8064 | LR: 2.00e-05 [2026-04-18 13:46:12] Epoch 2 | Step 10740 | Loss: 0.8049 | LR: 2.00e-05 [2026-04-18 13:46:16] Epoch 2 | Step 10750 | Loss: 0.8054 | LR: 2.00e-05 [2026-04-18 13:46:19] Epoch 2 | Step 10760 | Loss: 0.8052 | LR: 2.00e-05 [2026-04-18 13:46:23] Epoch 2 | Step 10770 | Loss: 0.8053 | LR: 2.00e-05 [2026-04-18 13:46:26] Epoch 2 | Step 10780 | Loss: 0.8053 | LR: 2.00e-05 [2026-04-18 13:46:30] Epoch 2 | Step 10790 | Loss: 0.8065 | LR: 2.00e-05 [2026-04-18 13:46:33] Epoch 2 | Step 10800 | Loss: 0.8077 | LR: 2.00e-05 [2026-04-18 13:46:37] Epoch 2 | Step 10810 | Loss: 0.8079 | LR: 2.00e-05 [2026-04-18 13:46:40] Epoch 2 | Step 10820 | Loss: 0.8078 | LR: 2.00e-05 [2026-04-18 13:46:44] Epoch 2 | Step 10830 | Loss: 0.8079 | LR: 
2.00e-05 [2026-04-18 13:46:48] Epoch 2 | Step 10840 | Loss: 0.8089 | LR: 2.00e-05 [2026-04-18 13:46:51] Epoch 2 | Step 10850 | Loss: 0.8085 | LR: 2.00e-05 [2026-04-18 13:46:55] Epoch 2 | Step 10860 | Loss: 0.8083 | LR: 2.00e-05 [2026-04-18 13:46:58] Epoch 2 | Step 10870 | Loss: 0.8084 | LR: 2.00e-05 [2026-04-18 13:47:02] Epoch 2 | Step 10880 | Loss: 0.8085 | LR: 2.00e-05 [2026-04-18 13:47:05] Epoch 2 | Step 10890 | Loss: 0.8080 | LR: 2.00e-05 [2026-04-18 13:47:08] Epoch 2 | Step 10900 | Loss: 0.8078 | LR: 2.00e-05 [2026-04-18 13:47:12] Epoch 2 | Step 10910 | Loss: 0.8090 | LR: 2.00e-05 [2026-04-18 13:47:15] Epoch 2 | Step 10920 | Loss: 0.8082 | LR: 2.00e-05 [2026-04-18 13:47:19] Epoch 2 | Step 10930 | Loss: 0.8079 | LR: 2.00e-05 [2026-04-18 13:47:22] Epoch 2 | Step 10940 | Loss: 0.8086 | LR: 2.00e-05 [2026-04-18 13:47:26] Epoch 2 | Step 10950 | Loss: 0.8082 | LR: 2.00e-05 [2026-04-18 13:47:30] Epoch 2 | Step 10960 | Loss: 0.8078 | LR: 2.00e-05 [2026-04-18 13:47:33] Epoch 2 | Step 10970 | Loss: 0.8081 | LR: 2.00e-05 [2026-04-18 13:47:37] Epoch 2 | Step 10980 | Loss: 0.8091 | LR: 2.00e-05 [2026-04-18 13:47:40] Epoch 2 | Step 10990 | Loss: 0.8099 | LR: 2.00e-05 [2026-04-18 13:47:44] Epoch 2 | Step 11000 | Loss: 0.8097 | LR: 2.00e-05 [2026-04-18 13:47:45] Validation | Batch 10/1567 | Loss: 0.9404 [2026-04-18 13:47:45] Validation | Batch 20/1567 | Loss: 1.0091 [2026-04-18 13:47:46] Validation | Batch 30/1567 | Loss: 1.0537 [2026-04-18 13:47:47] Validation | Batch 40/1567 | Loss: 1.0787 [2026-04-18 13:47:48] Validation | Batch 50/1567 | Loss: 1.0560 [2026-04-18 13:47:49] Validation | Batch 60/1567 | Loss: 1.0452 [2026-04-18 13:47:50] Validation | Batch 70/1567 | Loss: 1.0299 [2026-04-18 13:47:51] Validation | Batch 80/1567 | Loss: 1.0447 [2026-04-18 13:47:52] Validation | Batch 90/1567 | Loss: 1.0523 [2026-04-18 13:47:53] Validation | Batch 100/1567 | Loss: 1.0606 [2026-04-18 13:47:53] Validation | Batch 110/1567 | Loss: 1.0527 [2026-04-18 13:47:54] Validation | Batch 
120/1567 | Loss: 1.0648 [2026-04-18 13:47:55] Validation | Batch 130/1567 | Loss: 1.0649 [2026-04-18 13:47:56] Validation | Batch 140/1567 | Loss: 1.0689 [2026-04-18 13:47:57] Validation | Batch 150/1567 | Loss: 1.0763 [2026-04-18 13:47:57] Validation | Batch 160/1567 | Loss: 1.0766 [2026-04-18 13:47:58] Validation | Batch 170/1567 | Loss: 1.0617 [2026-04-18 13:47:59] Validation | Batch 180/1567 | Loss: 1.0635 [2026-04-18 13:48:00] Validation | Batch 190/1567 | Loss: 1.0604 [2026-04-18 13:48:01] Validation | Batch 200/1567 | Loss: 1.0638 [2026-04-18 13:48:02] Validation | Batch 210/1567 | Loss: 1.0656 [2026-04-18 13:48:02] Validation | Batch 220/1567 | Loss: 1.0680 [2026-04-18 13:48:04] Validation | Batch 230/1567 | Loss: 1.0735 [2026-04-18 13:48:04] Validation | Batch 240/1567 | Loss: 1.0722 [2026-04-18 13:48:05] Validation | Batch 250/1567 | Loss: 1.0658 [2026-04-18 13:48:06] Validation | Batch 260/1567 | Loss: 1.0608 [2026-04-18 13:48:06] Validation | Batch 270/1567 | Loss: 1.0579 [2026-04-18 13:48:07] Validation | Batch 280/1567 | Loss: 1.0592 [2026-04-18 13:48:08] Validation | Batch 290/1567 | Loss: 1.0645 [2026-04-18 13:48:09] Validation | Batch 300/1567 | Loss: 1.0695 [2026-04-18 13:48:10] Validation | Batch 310/1567 | Loss: 1.0688 [2026-04-18 13:48:11] Validation | Batch 320/1567 | Loss: 1.0689 [2026-04-18 13:48:12] Validation | Batch 330/1567 | Loss: 1.0659 [2026-04-18 13:48:13] Validation | Batch 340/1567 | Loss: 1.0700 [2026-04-18 13:48:13] Validation | Batch 350/1567 | Loss: 1.0693 [2026-04-18 13:48:14] Validation | Batch 360/1567 | Loss: 1.0676 [2026-04-18 13:48:15] Validation | Batch 370/1567 | Loss: 1.0648 [2026-04-18 13:48:16] Validation | Batch 380/1567 | Loss: 1.0682 [2026-04-18 13:48:16] Validation | Batch 390/1567 | Loss: 1.0690 [2026-04-18 13:48:17] Validation | Batch 400/1567 | Loss: 1.0702 [2026-04-18 13:48:18] Validation | Batch 410/1567 | Loss: 1.0696 [2026-04-18 13:48:19] Validation | Batch 420/1567 | Loss: 1.0691 [2026-04-18 13:48:20] 
Validation | Batch 430/1567 | Loss: 1.0687 [2026-04-18 13:48:21] Validation | Batch 440/1567 | Loss: 1.0680 [2026-04-18 13:48:21] Validation | Batch 450/1567 | Loss: 1.0679 [2026-04-18 13:48:22] Validation | Batch 460/1567 | Loss: 1.0668 [2026-04-18 13:48:23] Validation | Batch 470/1567 | Loss: 1.0664 [2026-04-18 13:48:24] Validation | Batch 480/1567 | Loss: 1.0641 [2026-04-18 13:48:25] Validation | Batch 490/1567 | Loss: 1.0640 [2026-04-18 13:48:25] Validation | Batch 500/1567 | Loss: 1.0642 [2026-04-18 13:48:26] Validation | Batch 510/1567 | Loss: 1.0660 [2026-04-18 13:48:27] Validation | Batch 520/1567 | Loss: 1.0676 [2026-04-18 13:48:28] Validation | Batch 530/1567 | Loss: 1.0676 [2026-04-18 13:48:29] Validation | Batch 540/1567 | Loss: 1.0704 [2026-04-18 13:48:30] Validation | Batch 550/1567 | Loss: 1.0738 [2026-04-18 13:48:30] Validation | Batch 560/1567 | Loss: 1.0733 [2026-04-18 13:48:31] Validation | Batch 570/1567 | Loss: 1.0734 [2026-04-18 13:48:32] Validation | Batch 580/1567 | Loss: 1.0722 [2026-04-18 13:48:33] Validation | Batch 590/1567 | Loss: 1.0708 [2026-04-18 13:48:34] Validation | Batch 600/1567 | Loss: 1.0687 [2026-04-18 13:48:35] Validation | Batch 610/1567 | Loss: 1.0677 [2026-04-18 13:48:36] Validation | Batch 620/1567 | Loss: 1.0692 [2026-04-18 13:48:37] Validation | Batch 630/1567 | Loss: 1.0673 [2026-04-18 13:48:37] Validation | Batch 640/1567 | Loss: 1.0695 [2026-04-18 13:48:38] Validation | Batch 650/1567 | Loss: 1.0688 [2026-04-18 13:48:39] Validation | Batch 660/1567 | Loss: 1.0677 [2026-04-18 13:48:40] Validation | Batch 670/1567 | Loss: 1.0654 [2026-04-18 13:48:40] Validation | Batch 680/1567 | Loss: 1.0651 [2026-04-18 13:48:41] Validation | Batch 690/1567 | Loss: 1.0660 [2026-04-18 13:48:42] Validation | Batch 700/1567 | Loss: 1.0647 [2026-04-18 13:48:43] Validation | Batch 710/1567 | Loss: 1.0659 [2026-04-18 13:48:44] Validation | Batch 720/1567 | Loss: 1.0653 [2026-04-18 13:48:44] Validation | Batch 730/1567 | Loss: 1.0661 
[2026-04-18 13:48:45] Validation | Batch 740/1567 | Loss: 1.0668 [2026-04-18 13:48:46] Validation | Batch 750/1567 | Loss: 1.0672 [2026-04-18 13:48:47] Validation | Batch 760/1567 | Loss: 1.0670 [2026-04-18 13:48:48] Validation | Batch 770/1567 | Loss: 1.0691 [2026-04-18 13:48:49] Validation | Batch 780/1567 | Loss: 1.0704 [2026-04-18 13:48:49] Validation | Batch 790/1567 | Loss: 1.0699 [2026-04-18 13:48:50] Validation | Batch 800/1567 | Loss: 1.0718 [2026-04-18 13:48:51] Validation | Batch 810/1567 | Loss: 1.0717 [2026-04-18 13:48:52] Validation | Batch 820/1567 | Loss: 1.0714 [2026-04-18 13:48:52] Validation | Batch 830/1567 | Loss: 1.0701 [2026-04-18 13:48:53] Validation | Batch 840/1567 | Loss: 1.0704 [2026-04-18 13:48:54] Validation | Batch 850/1567 | Loss: 1.0691 [2026-04-18 13:48:54] Validation | Batch 860/1567 | Loss: 1.0706 [2026-04-18 13:48:55] Validation | Batch 870/1567 | Loss: 1.0712 [2026-04-18 13:48:56] Validation | Batch 880/1567 | Loss: 1.0720 [2026-04-18 13:48:57] Validation | Batch 890/1567 | Loss: 1.0726 [2026-04-18 13:48:57] Validation | Batch 900/1567 | Loss: 1.0747 [2026-04-18 13:48:58] Validation | Batch 910/1567 | Loss: 1.0748 [2026-04-18 13:48:59] Validation | Batch 920/1567 | Loss: 1.0771 [2026-04-18 13:49:00] Validation | Batch 930/1567 | Loss: 1.0749 [2026-04-18 13:49:00] Validation | Batch 940/1567 | Loss: 1.0746 [2026-04-18 13:49:01] Validation | Batch 950/1567 | Loss: 1.0736 [2026-04-18 13:49:02] Validation | Batch 960/1567 | Loss: 1.0723 [2026-04-18 13:49:02] Validation | Batch 970/1567 | Loss: 1.0738 [2026-04-18 13:49:03] Validation | Batch 980/1567 | Loss: 1.0742 [2026-04-18 13:49:04] Validation | Batch 990/1567 | Loss: 1.0736 [2026-04-18 13:49:05] Validation | Batch 1000/1567 | Loss: 1.0739 [2026-04-18 13:49:05] Validation | Batch 1010/1567 | Loss: 1.0716 [2026-04-18 13:49:06] Validation | Batch 1020/1567 | Loss: 1.0718 [2026-04-18 13:49:07] Validation | Batch 1030/1567 | Loss: 1.0734 [2026-04-18 13:49:08] Validation | Batch 
1040/1567 | Loss: 1.0730 [2026-04-18 13:49:09] Validation | Batch 1050/1567 | Loss: 1.0738 [2026-04-18 13:49:10] Validation | Batch 1060/1567 | Loss: 1.0731 [2026-04-18 13:49:11] Validation | Batch 1070/1567 | Loss: 1.0723 [2026-04-18 13:49:11] Validation | Batch 1080/1567 | Loss: 1.0732 [2026-04-18 13:49:12] Validation | Batch 1090/1567 | Loss: 1.0730 [2026-04-18 13:49:13] Validation | Batch 1100/1567 | Loss: 1.0734 [2026-04-18 13:49:13] Validation | Batch 1110/1567 | Loss: 1.0731 [2026-04-18 13:49:14] Validation | Batch 1120/1567 | Loss: 1.0732 [2026-04-18 13:49:15] Validation | Batch 1130/1567 | Loss: 1.0734 [2026-04-18 13:49:16] Validation | Batch 1140/1567 | Loss: 1.0740 [2026-04-18 13:49:17] Validation | Batch 1150/1567 | Loss: 1.0742 [2026-04-18 13:49:17] Validation | Batch 1160/1567 | Loss: 1.0750 [2026-04-18 13:49:18] Validation | Batch 1170/1567 | Loss: 1.0748 [2026-04-18 13:49:19] Validation | Batch 1180/1567 | Loss: 1.0744 [2026-04-18 13:49:20] Validation | Batch 1190/1567 | Loss: 1.0755 [2026-04-18 13:49:21] Validation | Batch 1200/1567 | Loss: 1.0748 [2026-04-18 13:49:22] Validation | Batch 1210/1567 | Loss: 1.0737 [2026-04-18 13:49:22] Validation | Batch 1220/1567 | Loss: 1.0740 [2026-04-18 13:49:23] Validation | Batch 1230/1567 | Loss: 1.0761 [2026-04-18 13:49:24] Validation | Batch 1240/1567 | Loss: 1.0750 [2026-04-18 13:49:24] Validation | Batch 1250/1567 | Loss: 1.0749 [2026-04-18 13:49:25] Validation | Batch 1260/1567 | Loss: 1.0760 [2026-04-18 13:49:26] Validation | Batch 1270/1567 | Loss: 1.0760 [2026-04-18 13:49:27] Validation | Batch 1280/1567 | Loss: 1.0753 [2026-04-18 13:49:28] Validation | Batch 1290/1567 | Loss: 1.0758 [2026-04-18 13:49:29] Validation | Batch 1300/1567 | Loss: 1.0759 [2026-04-18 13:49:30] Validation | Batch 1310/1567 | Loss: 1.0764 [2026-04-18 13:49:31] Validation | Batch 1320/1567 | Loss: 1.0754 [2026-04-18 13:49:31] Validation | Batch 1330/1567 | Loss: 1.0750 [2026-04-18 13:49:32] Validation | Batch 1340/1567 | Loss: 
1.0749 [2026-04-18 13:49:33] Validation | Batch 1350/1567 | Loss: 1.0756 [2026-04-18 13:49:34] Validation | Batch 1360/1567 | Loss: 1.0752 [2026-04-18 13:49:34] Validation | Batch 1370/1567 | Loss: 1.0756 [2026-04-18 13:49:35] Validation | Batch 1380/1567 | Loss: 1.0768 [2026-04-18 13:49:36] Validation | Batch 1390/1567 | Loss: 1.0770 [2026-04-18 13:49:37] Validation | Batch 1400/1567 | Loss: 1.0773 [2026-04-18 13:49:37] Validation | Batch 1410/1567 | Loss: 1.0771 [2026-04-18 13:49:38] Validation | Batch 1420/1567 | Loss: 1.0776 [2026-04-18 13:49:39] Validation | Batch 1430/1567 | Loss: 1.0772 [2026-04-18 13:49:39] Validation | Batch 1440/1567 | Loss: 1.0775 [2026-04-18 13:49:40] Validation | Batch 1450/1567 | Loss: 1.0769 [2026-04-18 13:49:40] Validation | Batch 1460/1567 | Loss: 1.0767 [2026-04-18 13:49:41] Validation | Batch 1470/1567 | Loss: 1.0757 [2026-04-18 13:49:42] Validation | Batch 1480/1567 | Loss: 1.0741 [2026-04-18 13:49:42] Validation | Batch 1490/1567 | Loss: 1.0742 [2026-04-18 13:49:43] Validation | Batch 1500/1567 | Loss: 1.0743 [2026-04-18 13:49:44] Validation | Batch 1510/1567 | Loss: 1.0740 [2026-04-18 13:49:45] Validation | Batch 1520/1567 | Loss: 1.0733 [2026-04-18 13:49:45] Validation | Batch 1530/1567 | Loss: 1.0742 [2026-04-18 13:49:46] Validation | Batch 1540/1567 | Loss: 1.0751 [2026-04-18 13:49:47] Validation | Batch 1550/1567 | Loss: 1.0753 [2026-04-18 13:49:48] Validation | Batch 1560/1567 | Loss: 1.0743 [2026-04-18 13:49:49] Validation | Batch 1567/1567 | Loss: 1.0747 [2026-04-18 13:49:49] Validation | Loss: 1.0747 | PPL: 2.95 | Time: 124.95s [2026-04-18 13:49:52] Epoch 2 | Step 11010 | Loss: 0.8095 | LR: 2.00e-05 [2026-04-18 13:49:56] Epoch 2 | Step 11020 | Loss: 0.8094 | LR: 2.00e-05 [2026-04-18 13:49:59] Epoch 2 | Step 11030 | Loss: 0.8084 | LR: 2.00e-05 [2026-04-18 13:50:03] Epoch 2 | Step 11040 | Loss: 0.8072 | LR: 2.00e-05 [2026-04-18 13:50:07] Epoch 2 | Step 11050 | Loss: 0.8073 | LR: 2.00e-05 [2026-04-18 13:50:11] Epoch 2 | 
Step 11060 | Loss: 0.8078 | LR: 2.00e-05 [2026-04-18 13:50:13] Epoch 2 | Step 11070 | Loss: 0.8071 | LR: 2.00e-05 [2026-04-18 13:50:17] Epoch 2 | Step 11080 | Loss: 0.8072 | LR: 2.00e-05 [2026-04-18 13:50:20] Epoch 2 | Step 11090 | Loss: 0.8069 | LR: 2.00e-05 [2026-04-18 13:50:24] Epoch 2 | Step 11100 | Loss: 0.8070 | LR: 2.00e-05 [2026-04-18 13:50:28] Epoch 2 | Step 11110 | Loss: 0.8067 | LR: 2.00e-05 [2026-04-18 13:50:31] Epoch 2 | Step 11120 | Loss: 0.8057 | LR: 2.00e-05 [2026-04-18 13:50:35] Epoch 2 | Step 11130 | Loss: 0.8054 | LR: 2.00e-05 [2026-04-18 13:50:39] Epoch 2 | Step 11140 | Loss: 0.8049 | LR: 2.00e-05 [2026-04-18 13:50:42] Epoch 2 | Step 11150 | Loss: 0.8053 | LR: 2.00e-05 [2026-04-18 13:50:46] Epoch 2 | Step 11160 | Loss: 0.8053 | LR: 2.00e-05 [2026-04-18 13:50:49] Epoch 2 | Step 11170 | Loss: 0.8051 | LR: 2.00e-05 [2026-04-18 13:50:53] Epoch 2 | Step 11180 | Loss: 0.8048 | LR: 2.00e-05 [2026-04-18 13:50:56] Epoch 2 | Step 11190 | Loss: 0.8045 | LR: 2.00e-05 [2026-04-18 13:51:00] Epoch 2 | Step 11200 | Loss: 0.8045 | LR: 2.00e-05 [2026-04-18 13:51:04] Epoch 2 | Step 11210 | Loss: 0.8049 | LR: 2.00e-05 [2026-04-18 13:51:07] Epoch 2 | Step 11220 | Loss: 0.8035 | LR: 2.00e-05 [2026-04-18 13:51:11] Epoch 2 | Step 11230 | Loss: 0.8036 | LR: 2.00e-05 [2026-04-18 13:51:14] Epoch 2 | Step 11240 | Loss: 0.8034 | LR: 2.00e-05 [2026-04-18 13:51:17] Epoch 2 | Step 11250 | Loss: 0.8028 | LR: 2.00e-05 [2026-04-18 13:51:21] Epoch 2 | Step 11260 | Loss: 0.8030 | LR: 2.00e-05 [2026-04-18 13:51:25] Epoch 2 | Step 11270 | Loss: 0.8020 | LR: 2.00e-05 [2026-04-18 13:51:28] Epoch 2 | Step 11280 | Loss: 0.8016 | LR: 2.00e-05 [2026-04-18 13:51:32] Epoch 2 | Step 11290 | Loss: 0.8023 | LR: 2.00e-05 [2026-04-18 13:51:35] Epoch 2 | Step 11300 | Loss: 0.8026 | LR: 2.00e-05 [2026-04-18 13:51:39] Epoch 2 | Step 11310 | Loss: 0.8029 | LR: 2.00e-05 [2026-04-18 13:51:42] Epoch 2 | Step 11320 | Loss: 0.8032 | LR: 2.00e-05 [2026-04-18 13:51:46] Epoch 2 | Step 11330 | Loss: 0.8035 | 
LR: 2.00e-05 [2026-04-18 13:51:49] Epoch 2 | Step 11340 | Loss: 0.8037 | LR: 2.00e-05 [2026-04-18 13:51:53] Epoch 2 | Step 11350 | Loss: 0.8040 | LR: 2.00e-05 [2026-04-18 13:51:56] Epoch 2 | Step 11360 | Loss: 0.8047 | LR: 2.00e-05 [2026-04-18 13:52:00] Epoch 2 | Step 11370 | Loss: 0.8052 | LR: 2.00e-05 [2026-04-18 13:52:03] Epoch 2 | Step 11380 | Loss: 0.8053 | LR: 2.00e-05 [2026-04-18 13:52:07] Epoch 2 | Step 11390 | Loss: 0.8059 | LR: 2.00e-05 [2026-04-18 13:52:11] Epoch 2 | Step 11400 | Loss: 0.8054 | LR: 2.00e-05 [2026-04-18 13:52:15] Epoch 2 | Step 11410 | Loss: 0.8063 | LR: 2.00e-05 [2026-04-18 13:52:20] Epoch 2 | Step 11420 | Loss: 0.8072 | LR: 2.00e-05 [2026-04-18 13:52:24] Epoch 2 | Step 11430 | Loss: 0.8076 | LR: 2.00e-05 [2026-04-18 13:52:27] Epoch 2 | Step 11440 | Loss: 0.8076 | LR: 2.00e-05 [2026-04-18 13:52:31] Epoch 2 | Step 11450 | Loss: 0.8080 | LR: 2.00e-05 [2026-04-18 13:52:34] Epoch 2 | Step 11460 | Loss: 0.8082 | LR: 2.00e-05 [2026-04-18 13:52:38] Epoch 2 | Step 11470 | Loss: 0.8083 | LR: 2.00e-05 [2026-04-18 13:52:41] Epoch 2 | Step 11480 | Loss: 0.8083 | LR: 2.00e-05 [2026-04-18 13:52:45] Epoch 2 | Step 11490 | Loss: 0.8081 | LR: 2.00e-05 [2026-04-18 13:52:48] Epoch 2 | Step 11500 | Loss: 0.8085 | LR: 2.00e-05 [2026-04-18 13:52:52] Epoch 2 | Step 11510 | Loss: 0.8086 | LR: 2.00e-05 [2026-04-18 13:52:57] Epoch 2 | Step 11520 | Loss: 0.8086 | LR: 2.00e-05 [2026-04-18 13:53:00] Epoch 2 | Step 11530 | Loss: 0.8088 | LR: 2.00e-05 [2026-04-18 13:53:04] Epoch 2 | Step 11540 | Loss: 0.8088 | LR: 2.00e-05 [2026-04-18 13:53:07] Epoch 2 | Step 11550 | Loss: 0.8092 | LR: 2.00e-05 [2026-04-18 13:53:11] Epoch 2 | Step 11560 | Loss: 0.8090 | LR: 2.00e-05 [2026-04-18 13:53:15] Epoch 2 | Step 11570 | Loss: 0.8091 | LR: 2.00e-05 [2026-04-18 13:53:18] Epoch 2 | Step 11580 | Loss: 0.8088 | LR: 2.00e-05 [2026-04-18 13:53:22] Epoch 2 | Step 11590 | Loss: 0.8090 | LR: 2.00e-05 [2026-04-18 13:53:25] Epoch 2 | Step 11600 | Loss: 0.8098 | LR: 2.00e-05 [2026-04-18 
13:53:28] Epoch 2 | Step 11610 | Loss: 0.8100 | LR: 2.00e-05 [2026-04-18 13:53:32] Epoch 2 | Step 11620 | Loss: 0.8097 | LR: 2.00e-05 [2026-04-18 13:53:36] Epoch 2 | Step 11630 | Loss: 0.8096 | LR: 2.00e-05 [2026-04-18 13:53:39] Epoch 2 | Step 11640 | Loss: 0.8098 | LR: 2.00e-05 [2026-04-18 13:53:43] Epoch 2 | Step 11650 | Loss: 0.8097 | LR: 2.00e-05 [2026-04-18 13:53:46] Epoch 2 | Step 11660 | Loss: 0.8097 | LR: 2.00e-05 [2026-04-18 13:53:50] Epoch 2 | Step 11670 | Loss: 0.8106 | LR: 2.00e-05 [2026-04-18 13:53:54] Epoch 2 | Step 11680 | Loss: 0.8102 | LR: 2.00e-05 [2026-04-18 13:53:58] Epoch 2 | Step 11690 | Loss: 0.8104 | LR: 2.00e-05 [2026-04-18 13:54:02] Epoch 2 | Step 11700 | Loss: 0.8102 | LR: 2.00e-05 [2026-04-18 13:54:06] Epoch 2 | Step 11710 | Loss: 0.8097 | LR: 2.00e-05 [2026-04-18 13:54:09] Epoch 2 | Step 11720 | Loss: 0.8093 | LR: 2.00e-05 [2026-04-18 13:54:13] Epoch 2 | Step 11730 | Loss: 0.8090 | LR: 2.00e-05 [2026-04-18 13:54:17] Epoch 2 | Step 11740 | Loss: 0.8092 | LR: 2.00e-05 [2026-04-18 13:54:20] Epoch 2 | Step 11750 | Loss: 0.8094 | LR: 2.00e-05 [2026-04-18 13:54:24] Epoch 2 | Step 11760 | Loss: 0.8092 | LR: 2.00e-05 [2026-04-18 13:54:27] Epoch 2 | Step 11770 | Loss: 0.8094 | LR: 2.00e-05 [2026-04-18 13:54:31] Epoch 2 | Step 11780 | Loss: 0.8095 | LR: 2.00e-05 [2026-04-18 13:54:34] Epoch 2 | Step 11790 | Loss: 0.8095 | LR: 2.00e-05 [2026-04-18 13:54:38] Epoch 2 | Step 11800 | Loss: 0.8095 | LR: 2.00e-05 [2026-04-18 13:54:41] Epoch 2 | Step 11810 | Loss: 0.8095 | LR: 2.00e-05 [2026-04-18 13:54:45] Epoch 2 | Step 11820 | Loss: 0.8090 | LR: 2.00e-05 [2026-04-18 13:54:48] Epoch 2 | Step 11830 | Loss: 0.8087 | LR: 2.00e-05 [2026-04-18 13:54:52] Epoch 2 | Step 11840 | Loss: 0.8092 | LR: 2.00e-05 [2026-04-18 13:54:55] Epoch 2 | Step 11850 | Loss: 0.8094 | LR: 2.00e-05 [2026-04-18 13:54:59] Epoch 2 | Step 11860 | Loss: 0.8092 | LR: 2.00e-05 [2026-04-18 13:55:03] Epoch 2 | Step 11870 | Loss: 0.8094 | LR: 2.00e-05 [2026-04-18 13:55:06] Epoch 2 | Step 
11880 | Loss: 0.8093 | LR: 2.00e-05 [2026-04-18 13:55:10] Epoch 2 | Step 11890 | Loss: 0.8093 | LR: 2.00e-05 [2026-04-18 13:55:13] Epoch 2 | Step 11900 | Loss: 0.8093 | LR: 2.00e-05 [2026-04-18 13:55:17] Epoch 2 | Step 11910 | Loss: 0.8098 | LR: 2.00e-05 [2026-04-18 13:55:21] Epoch 2 | Step 11920 | Loss: 0.8100 | LR: 2.00e-05 [2026-04-18 13:55:24] Epoch 2 | Step 11930 | Loss: 0.8103 | LR: 2.00e-05 [2026-04-18 13:55:28] Epoch 2 | Step 11940 | Loss: 0.8101 | LR: 2.00e-05 [2026-04-18 13:55:31] Epoch 2 | Step 11950 | Loss: 0.8100 | LR: 2.00e-05 [2026-04-18 13:55:35] Epoch 2 | Step 11960 | Loss: 0.8095 | LR: 2.00e-05 [2026-04-18 13:55:38] Epoch 2 | Step 11970 | Loss: 0.8100 | LR: 1.99e-05 [2026-04-18 13:55:42] Epoch 2 | Step 11980 | Loss: 0.8103 | LR: 1.99e-05 [2026-04-18 13:55:45] Epoch 2 | Step 11990 | Loss: 0.8107 | LR: 1.99e-05 [2026-04-18 13:55:49] Epoch 2 | Step 12000 | Loss: 0.8104 | LR: 1.99e-05 [2026-04-18 13:55:58] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_12000.pt [2026-04-18 13:56:12] Validation | Batch 10/1567 | Loss: 0.9553 [2026-04-18 13:56:13] Validation | Batch 20/1567 | Loss: 1.0207 [2026-04-18 13:56:14] Validation | Batch 30/1567 | Loss: 1.0630 [2026-04-18 13:56:15] Validation | Batch 40/1567 | Loss: 1.0867 [2026-04-18 13:56:16] Validation | Batch 50/1567 | Loss: 1.0644 [2026-04-18 13:56:17] Validation | Batch 60/1567 | Loss: 1.0520 [2026-04-18 13:56:18] Validation | Batch 70/1567 | Loss: 1.0352 [2026-04-18 13:56:19] Validation | Batch 80/1567 | Loss: 1.0516 [2026-04-18 13:56:19] Validation | Batch 90/1567 | Loss: 1.0614 [2026-04-18 13:56:20] Validation | Batch 100/1567 | Loss: 1.0706 [2026-04-18 13:56:21] Validation | Batch 110/1567 | Loss: 1.0632 [2026-04-18 13:56:22] Validation | Batch 120/1567 | Loss: 1.0727 [2026-04-18 13:56:23] Validation | Batch 130/1567 | Loss: 1.0739 [2026-04-18 13:56:24] Validation | Batch 140/1567 | Loss: 1.0763 [2026-04-18 13:56:24] Validation | Batch 150/1567 | Loss: 1.0842 [2026-04-18 
13:56:25] Validation | Batch 160/1567 | Loss: 1.0845 [2026-04-18 13:56:26] Validation | Batch 170/1567 | Loss: 1.0685 [2026-04-18 13:56:27] Validation | Batch 180/1567 | Loss: 1.0707 [2026-04-18 13:56:28] Validation | Batch 190/1567 | Loss: 1.0672 [2026-04-18 13:56:29] Validation | Batch 200/1567 | Loss: 1.0703 [2026-04-18 13:56:29] Validation | Batch 210/1567 | Loss: 1.0709 [2026-04-18 13:56:30] Validation | Batch 220/1567 | Loss: 1.0736 [2026-04-18 13:56:31] Validation | Batch 230/1567 | Loss: 1.0777 [2026-04-18 13:56:32] Validation | Batch 240/1567 | Loss: 1.0764 [2026-04-18 13:56:33] Validation | Batch 250/1567 | Loss: 1.0708 [2026-04-18 13:56:33] Validation | Batch 260/1567 | Loss: 1.0656 [2026-04-18 13:56:34] Validation | Batch 270/1567 | Loss: 1.0628 [2026-04-18 13:56:35] Validation | Batch 280/1567 | Loss: 1.0646 [2026-04-18 13:56:36] Validation | Batch 290/1567 | Loss: 1.0697 [2026-04-18 13:56:37] Validation | Batch 300/1567 | Loss: 1.0739 [2026-04-18 13:56:38] Validation | Batch 310/1567 | Loss: 1.0733 [2026-04-18 13:56:37] Validation | Batch 320/1567 | Loss: 1.0736 [2026-04-18 13:56:39] Validation | Batch 330/1567 | Loss: 1.0706 [2026-04-18 13:56:39] Validation | Batch 340/1567 | Loss: 1.0748 [2026-04-18 13:56:40] Validation | Batch 350/1567 | Loss: 1.0737 [2026-04-18 13:56:41] Validation | Batch 360/1567 | Loss: 1.0720 [2026-04-18 13:56:42] Validation | Batch 370/1567 | Loss: 1.0690 [2026-04-18 13:56:43] Validation | Batch 380/1567 | Loss: 1.0725 [2026-04-18 13:56:43] Validation | Batch 390/1567 | Loss: 1.0739 [2026-04-18 13:56:44] Validation | Batch 400/1567 | Loss: 1.0760 [2026-04-18 13:56:45] Validation | Batch 410/1567 | Loss: 1.0754 [2026-04-18 13:56:46] Validation | Batch 420/1567 | Loss: 1.0748 [2026-04-18 13:56:47] Validation | Batch 430/1567 | Loss: 1.0748 [2026-04-18 13:56:48] Validation | Batch 440/1567 | Loss: 1.0742 [2026-04-18 13:56:48] Validation | Batch 450/1567 | Loss: 1.0744 [2026-04-18 13:56:49] Validation | Batch 460/1567 | Loss: 
1.0731 [2026-04-18 13:56:50] Validation | Batch 470/1567 | Loss: 1.0724 [2026-04-18 13:56:51] Validation | Batch 480/1567 | Loss: 1.0703 [2026-04-18 13:56:51] Validation | Batch 490/1567 | Loss: 1.0703 [2026-04-18 13:56:52] Validation | Batch 500/1567 | Loss: 1.0702 [2026-04-18 13:56:53] Validation | Batch 510/1567 | Loss: 1.0726 [2026-04-18 13:56:54] Validation | Batch 520/1567 | Loss: 1.0743 [2026-04-18 13:56:55] Validation | Batch 530/1567 | Loss: 1.0743 [2026-04-18 13:56:56] Validation | Batch 540/1567 | Loss: 1.0769 [2026-04-18 13:56:56] Validation | Batch 550/1567 | Loss: 1.0804 [2026-04-18 13:56:57] Validation | Batch 560/1567 | Loss: 1.0802 [2026-04-18 13:56:58] Validation | Batch 570/1567 | Loss: 1.0801 [2026-04-18 13:56:59] Validation | Batch 580/1567 | Loss: 1.0787 [2026-04-18 13:57:00] Validation | Batch 590/1567 | Loss: 1.0772 [2026-04-18 13:57:01] Validation | Batch 600/1567 | Loss: 1.0752 [2026-04-18 13:57:02] Validation | Batch 610/1567 | Loss: 1.0744 [2026-04-18 13:57:03] Validation | Batch 620/1567 | Loss: 1.0757 [2026-04-18 13:57:04] Validation | Batch 630/1567 | Loss: 1.0740 [2026-04-18 13:57:04] Validation | Batch 640/1567 | Loss: 1.0759 [2026-04-18 13:57:05] Validation | Batch 650/1567 | Loss: 1.0752 [2026-04-18 13:57:06] Validation | Batch 660/1567 | Loss: 1.0740 [2026-04-18 13:57:07] Validation | Batch 670/1567 | Loss: 1.0719 [2026-04-18 13:57:07] Validation | Batch 680/1567 | Loss: 1.0715 [2026-04-18 13:57:08] Validation | Batch 690/1567 | Loss: 1.0723 [2026-04-18 13:57:09] Validation | Batch 700/1567 | Loss: 1.0707 [2026-04-18 13:57:10] Validation | Batch 710/1567 | Loss: 1.0722 [2026-04-18 13:57:11] Validation | Batch 720/1567 | Loss: 1.0717 [2026-04-18 13:57:11] Validation | Batch 730/1567 | Loss: 1.0722 [2026-04-18 13:57:12] Validation | Batch 740/1567 | Loss: 1.0732 [2026-04-18 13:57:13] Validation | Batch 750/1567 | Loss: 1.0735 [2026-04-18 13:57:14] Validation | Batch 760/1567 | Loss: 1.0730 [2026-04-18 13:57:15] Validation | Batch 
770/1567 | Loss: 1.0751 [2026-04-18 13:57:16] Validation | Batch 780/1567 | Loss: 1.0764 [2026-04-18 13:57:16] Validation | Batch 790/1567 | Loss: 1.0761 [2026-04-18 13:57:17] Validation | Batch 800/1567 | Loss: 1.0778 [2026-04-18 13:57:18] Validation | Batch 810/1567 | Loss: 1.0777 [2026-04-18 13:57:19] Validation | Batch 820/1567 | Loss: 1.0775 [2026-04-18 13:57:19] Validation | Batch 830/1567 | Loss: 1.0758 [2026-04-18 13:57:20] Validation | Batch 840/1567 | Loss: 1.0761 [2026-04-18 13:57:21] Validation | Batch 850/1567 | Loss: 1.0747 [2026-04-18 13:57:21] Validation | Batch 860/1567 | Loss: 1.0761 [2026-04-18 13:57:22] Validation | Batch 870/1567 | Loss: 1.0766 [2026-04-18 13:57:23] Validation | Batch 880/1567 | Loss: 1.0777 [2026-04-18 13:57:24] Validation | Batch 890/1567 | Loss: 1.0781 [2026-04-18 13:57:25] Validation | Batch 900/1567 | Loss: 1.0802 [2026-04-18 13:57:25] Validation | Batch 910/1567 | Loss: 1.0802 [2026-04-18 13:57:26] Validation | Batch 920/1567 | Loss: 1.0822 [2026-04-18 13:57:27] Validation | Batch 930/1567 | Loss: 1.0799 [2026-04-18 13:57:27] Validation | Batch 940/1567 | Loss: 1.0797 [2026-04-18 13:57:28] Validation | Batch 950/1567 | Loss: 1.0787 [2026-04-18 13:57:29] Validation | Batch 960/1567 | Loss: 1.0771 [2026-04-18 13:57:30] Validation | Batch 970/1567 | Loss: 1.0785 [2026-04-18 13:57:30] Validation | Batch 980/1567 | Loss: 1.0788 [2026-04-18 13:57:31] Validation | Batch 990/1567 | Loss: 1.0781 [2026-04-18 13:57:32] Validation | Batch 1000/1567 | Loss: 1.0785 [2026-04-18 13:57:33] Validation | Batch 1010/1567 | Loss: 1.0765 [2026-04-18 13:57:33] Validation | Batch 1020/1567 | Loss: 1.0767 [2026-04-18 13:57:34] Validation | Batch 1030/1567 | Loss: 1.0781 [2026-04-18 13:57:35] Validation | Batch 1040/1567 | Loss: 1.0778 [2026-04-18 13:57:36] Validation | Batch 1050/1567 | Loss: 1.0790 [2026-04-18 13:57:37] Validation | Batch 1060/1567 | Loss: 1.0781 [2026-04-18 13:57:38] Validation | Batch 1070/1567 | Loss: 1.0772 [2026-04-18 
13:57:38] Validation | Batch 1080/1567 | Loss: 1.0782 [2026-04-18 13:57:39] Validation | Batch 1090/1567 | Loss: 1.0781 [2026-04-18 13:57:40] Validation | Batch 1100/1567 | Loss: 1.0786 [2026-04-18 13:57:40] Validation | Batch 1110/1567 | Loss: 1.0783 [2026-04-18 13:57:41] Validation | Batch 1120/1567 | Loss: 1.0784 [2026-04-18 13:57:42] Validation | Batch 1130/1567 | Loss: 1.0784 [2026-04-18 13:57:43] Validation | Batch 1140/1567 | Loss: 1.0789 [2026-04-18 13:57:44] Validation | Batch 1150/1567 | Loss: 1.0794 [2026-04-18 13:57:44] Validation | Batch 1160/1567 | Loss: 1.0803 [2026-04-18 13:57:45] Validation | Batch 1170/1567 | Loss: 1.0799 [2026-04-18 13:57:46] Validation | Batch 1180/1567 | Loss: 1.0795 [2026-04-18 13:57:47] Validation | Batch 1190/1567 | Loss: 1.0807 [2026-04-18 13:57:48] Validation | Batch 1200/1567 | Loss: 1.0799 [2026-04-18 13:57:49] Validation | Batch 1210/1567 | Loss: 1.0789 [2026-04-18 13:57:49] Validation | Batch 1220/1567 | Loss: 1.0792 [2026-04-18 13:57:50] Validation | Batch 1230/1567 | Loss: 1.0814 [2026-04-18 13:57:51] Validation | Batch 1240/1567 | Loss: 1.0803 [2026-04-18 13:57:52] Validation | Batch 1250/1567 | Loss: 1.0802 [2026-04-18 13:57:52] Validation | Batch 1260/1567 | Loss: 1.0812 [2026-04-18 13:57:54] Validation | Batch 1270/1567 | Loss: 1.0812 [2026-04-18 13:57:54] Validation | Batch 1280/1567 | Loss: 1.0804 [2026-04-18 13:57:56] Validation | Batch 1290/1567 | Loss: 1.0808 [2026-04-18 13:57:56] Validation | Batch 1300/1567 | Loss: 1.0810 [2026-04-18 13:57:57] Validation | Batch 1310/1567 | Loss: 1.0813 [2026-04-18 13:57:58] Validation | Batch 1320/1567 | Loss: 1.0804 [2026-04-18 13:57:59] Validation | Batch 1330/1567 | Loss: 1.0800 [2026-04-18 13:57:59] Validation | Batch 1340/1567 | Loss: 1.0798 [2026-04-18 13:58:00] Validation | Batch 1350/1567 | Loss: 1.0806 [2026-04-18 13:58:01] Validation | Batch 1360/1567 | Loss: 1.0801 [2026-04-18 13:58:02] Validation | Batch 1370/1567 | Loss: 1.0805 [2026-04-18 13:58:03] 
Validation | Batch 1380/1567 | Loss: 1.0818 [2026-04-18 13:58:03] Validation | Batch 1390/1567 | Loss: 1.0819 [2026-04-18 13:58:04] Validation | Batch 1400/1567 | Loss: 1.0823 [2026-04-18 13:58:04] Validation | Batch 1410/1567 | Loss: 1.0820 [2026-04-18 13:58:05] Validation | Batch 1420/1567 | Loss: 1.0826 [2026-04-18 13:58:06] Validation | Batch 1430/1567 | Loss: 1.0824 [2026-04-18 13:58:07] Validation | Batch 1440/1567 | Loss: 1.0826 [2026-04-18 13:58:08] Validation | Batch 1450/1567 | Loss: 1.0819 [2026-04-18 13:58:08] Validation | Batch 1460/1567 | Loss: 1.0817 [2026-04-18 13:58:09] Validation | Batch 1470/1567 | Loss: 1.0808 [2026-04-18 13:58:10] Validation | Batch 1480/1567 | Loss: 1.0791 [2026-04-18 13:58:10] Validation | Batch 1490/1567 | Loss: 1.0791 [2026-04-18 13:58:11] Validation | Batch 1500/1567 | Loss: 1.0792 [2026-04-18 13:58:12] Validation | Batch 1510/1567 | Loss: 1.0790 [2026-04-18 13:58:13] Validation | Batch 1520/1567 | Loss: 1.0783 [2026-04-18 13:58:13] Validation | Batch 1530/1567 | Loss: 1.0791 [2026-04-18 13:58:14] Validation | Batch 1540/1567 | Loss: 1.0801 [2026-04-18 13:58:15] Validation | Batch 1550/1567 | Loss: 1.0804 [2026-04-18 13:58:16] Validation | Batch 1560/1567 | Loss: 1.0794 [2026-04-18 13:58:17] Validation | Batch 1567/1567 | Loss: 1.0799 [2026-04-18 13:58:17] Validation | Loss: 1.0799 | PPL: 2.97 | Time: 125.06s [2026-04-18 13:58:21] Epoch 2 | Step 12010 | Loss: 0.8104 | LR: 1.99e-05 [2026-04-18 13:58:24] Epoch 2 | Step 12020 | Loss: 0.8098 | LR: 1.99e-05 [2026-04-18 13:58:28] Epoch 2 | Step 12030 | Loss: 0.8100 | LR: 1.99e-05 [2026-04-18 13:58:32] Epoch 2 | Step 12040 | Loss: 0.8099 | LR: 1.98e-05 [2026-04-18 13:58:36] Epoch 2 | Step 12050 | Loss: 0.8095 | LR: 1.98e-05 [2026-04-18 13:58:40] Epoch 2 | Step 12060 | Loss: 0.8096 | LR: 1.98e-05 [2026-04-18 13:58:43] Epoch 2 | Step 12070 | Loss: 0.8093 | LR: 1.98e-05 [2026-04-18 13:58:48] Epoch 2 | Step 12080 | Loss: 0.8090 | LR: 1.98e-05 [2026-04-18 13:58:52] Epoch 2 | Step 
12090 | Loss: 0.8089 | LR: 1.97e-05 [2026-04-18 13:58:55] Epoch 2 | Step 12100 | Loss: 0.8096 | LR: 1.97e-05 [2026-04-18 13:58:59] Epoch 2 | Step 12110 | Loss: 0.8102 | LR: 1.97e-05 [2026-04-18 13:59:02] Epoch 2 | Step 12120 | Loss: 0.8102 | LR: 1.97e-05 [2026-04-18 13:59:06] Epoch 2 | Step 12130 | Loss: 0.8099 | LR: 1.96e-05 [2026-04-18 13:59:09] Epoch 2 | Step 12140 | Loss: 0.8100 | LR: 1.96e-05 [2026-04-18 13:59:13] Epoch 2 | Step 12150 | Loss: 0.8102 | LR: 1.96e-05 [2026-04-18 13:59:17] Epoch 2 | Step 12160 | Loss: 0.8100 | LR: 1.96e-05 [2026-04-18 13:59:20] Epoch 2 | Step 12170 | Loss: 0.8105 | LR: 1.95e-05 [2026-04-18 13:59:24] Epoch 2 | Step 12180 | Loss: 0.8106 | LR: 1.95e-05 [2026-04-18 13:59:27] Epoch 2 | Step 12190 | Loss: 0.8109 | LR: 1.95e-05 [2026-04-18 13:59:31] Epoch 2 | Step 12200 | Loss: 0.8109 | LR: 1.94e-05 [2026-04-18 13:59:34] Epoch 2 | Step 12210 | Loss: 0.8110 | LR: 1.94e-05 [2026-04-18 13:59:38] Epoch 2 | Step 12220 | Loss: 0.8109 | LR: 1.94e-05 [2026-04-18 13:59:42] Epoch 2 | Step 12230 | Loss: 0.8110 | LR: 1.93e-05 [2026-04-18 13:59:45] Epoch 2 | Step 12240 | Loss: 0.8113 | LR: 1.93e-05 [2026-04-18 13:59:49] Epoch 2 | Step 12250 | Loss: 0.8116 | LR: 1.93e-05 [2026-04-18 13:59:53] Epoch 2 | Step 12260 | Loss: 0.8119 | LR: 1.92e-05 [2026-04-18 13:59:57] Epoch 2 | Step 12270 | Loss: 0.8118 | LR: 1.92e-05 [2026-04-18 14:00:01] Epoch 2 | Step 12280 | Loss: 0.8119 | LR: 1.91e-05 [2026-04-18 14:00:05] Epoch 2 | Step 12290 | Loss: 0.8119 | LR: 1.91e-05 [2026-04-18 14:00:08] Epoch 2 | Step 12300 | Loss: 0.8117 | LR: 1.91e-05 [2026-04-18 14:00:12] Epoch 2 | Step 12310 | Loss: 0.8117 | LR: 1.90e-05 [2026-04-18 14:00:15] Epoch 2 | Step 12320 | Loss: 0.8119 | LR: 1.90e-05 [2026-04-18 14:00:18] Epoch 2 | Step 12330 | Loss: 0.8117 | LR: 1.89e-05 [2026-04-18 14:00:22] Epoch 2 | Step 12340 | Loss: 0.8115 | LR: 1.89e-05 [2026-04-18 14:00:26] Epoch 2 | Step 12350 | Loss: 0.8117 | LR: 1.88e-05 [2026-04-18 14:00:29] Epoch 2 | Step 12360 | Loss: 0.8115 | LR: 
1.88e-05 [2026-04-18 14:00:33] Epoch 2 | Step 12370 | Loss: 0.8115 | LR: 1.87e-05 [2026-04-18 14:00:36] Epoch 2 | Step 12380 | Loss: 0.8116 | LR: 1.87e-05 [2026-04-18 14:00:40] Epoch 2 | Step 12390 | Loss: 0.8119 | LR: 1.86e-05 [2026-04-18 14:00:44] Epoch 2 | Step 12400 | Loss: 0.8116 | LR: 1.86e-05 [2026-04-18 14:00:47] Epoch 2 | Step 12410 | Loss: 0.8120 | LR: 1.85e-05 [2026-04-18 14:00:51] Epoch 2 | Step 12420 | Loss: 0.8117 | LR: 1.85e-05 [2026-04-18 14:00:54] Epoch 2 | Step 12430 | Loss: 0.8112 | LR: 1.84e-05 [2026-04-18 14:00:58] Epoch 2 | Step 12440 | Loss: 0.8111 | LR: 1.84e-05 [2026-04-18 14:01:01] Epoch 2 | Step 12450 | Loss: 0.8111 | LR: 1.83e-05 [2026-04-18 14:01:05] Epoch 2 | Step 12460 | Loss: 0.8110 | LR: 1.83e-05 [2026-04-18 14:01:08] Epoch 2 | Step 12470 | Loss: 0.8114 | LR: 1.82e-05 [2026-04-18 14:01:12] Epoch 2 | Step 12480 | Loss: 0.8110 | LR: 1.82e-05 [2026-04-18 14:01:15] Epoch 2 | Step 12490 | Loss: 0.8113 | LR: 1.81e-05 [2026-04-18 14:01:18] Epoch 2 | Step 12500 | Loss: 0.8117 | LR: 1.80e-05 [2026-04-18 14:01:22] Epoch 2 | Step 12510 | Loss: 0.8120 | LR: 1.80e-05 [2026-04-18 14:01:26] Epoch 2 | Step 12520 | Loss: 0.8119 | LR: 1.79e-05 [2026-04-18 14:01:29] Epoch 2 | Step 12530 | Loss: 0.8120 | LR: 1.79e-05 [2026-04-18 14:01:33] Epoch 2 | Step 12540 | Loss: 0.8118 | LR: 1.78e-05 [2026-04-18 14:01:36] Epoch 2 | Step 12550 | Loss: 0.8120 | LR: 1.77e-05 [2026-04-18 14:01:40] Epoch 2 | Step 12560 | Loss: 0.8121 | LR: 1.77e-05 [2026-04-18 14:01:44] Epoch 2 | Step 12570 | Loss: 0.8114 | LR: 1.76e-05 [2026-04-18 14:01:47] Epoch 2 | Step 12580 | Loss: 0.8118 | LR: 1.75e-05 [2026-04-18 14:01:51] Epoch 2 | Step 12590 | Loss: 0.8119 | LR: 1.75e-05 [2026-04-18 14:01:55] Epoch 2 | Step 12600 | Loss: 0.8117 | LR: 1.74e-05 [2026-04-18 14:01:59] Epoch 2 | Step 12610 | Loss: 0.8118 | LR: 1.73e-05 [2026-04-18 14:02:03] Epoch 2 | Step 12620 | Loss: 0.8114 | LR: 1.73e-05 [2026-04-18 14:02:06] Epoch 2 | Step 12630 | Loss: 0.8112 | LR: 1.72e-05 [2026-04-18 
14:02:09] Epoch 2 | Step 12640 | Loss: 0.8112 | LR: 1.71e-05 [2026-04-18 14:02:13] Epoch 2 | Step 12650 | Loss: 0.8111 | LR: 1.71e-05 [2026-04-18 14:02:16] Epoch 2 | Step 12660 | Loss: 0.8110 | LR: 1.70e-05 [2026-04-18 14:02:20] Epoch 2 | Step 12670 | Loss: 0.8108 | LR: 1.69e-05 [2026-04-18 14:02:25] Epoch 2 | Step 12680 | Loss: 0.8110 | LR: 1.68e-05 [2026-04-18 14:02:28] Epoch 2 | Step 12690 | Loss: 0.8109 | LR: 1.68e-05 [2026-04-18 14:02:32] Epoch 2 | Step 12700 | Loss: 0.8109 | LR: 1.67e-05 [2026-04-18 14:02:35] Epoch 2 | Step 12710 | Loss: 0.8109 | LR: 1.66e-05 [2026-04-18 14:02:39] Epoch 2 | Step 12720 | Loss: 0.8107 | LR: 1.66e-05 [2026-04-18 14:02:42] Epoch 2 | Step 12730 | Loss: 0.8105 | LR: 1.65e-05 [2026-04-18 14:02:46] Epoch 2 | Step 12740 | Loss: 0.8101 | LR: 1.64e-05 [2026-04-18 14:02:49] Epoch 2 | Step 12750 | Loss: 0.8102 | LR: 1.63e-05 [2026-04-18 14:02:53] Epoch 2 | Step 12760 | Loss: 0.8102 | LR: 1.62e-05 [2026-04-18 14:02:57] Epoch 2 | Step 12770 | Loss: 0.8098 | LR: 1.62e-05 [2026-04-18 14:03:00] Epoch 2 | Step 12780 | Loss: 0.8098 | LR: 1.61e-05 [2026-04-18 14:03:04] Epoch 2 | Step 12790 | Loss: 0.8096 | LR: 1.60e-05 [2026-04-18 14:03:07] Epoch 2 | Step 12800 | Loss: 0.8096 | LR: 1.59e-05 [2026-04-18 14:03:10] Epoch 2 | Step 12810 | Loss: 0.8094 | LR: 1.59e-05 [2026-04-18 14:03:14] Epoch 2 | Step 12820 | Loss: 0.8097 | LR: 1.58e-05 [2026-04-18 14:03:17] Epoch 2 | Step 12830 | Loss: 0.8096 | LR: 1.57e-05 [2026-04-18 14:03:21] Epoch 2 | Step 12840 | Loss: 0.8096 | LR: 1.56e-05 [2026-04-18 14:03:24] Epoch 2 | Step 12850 | Loss: 0.8092 | LR: 1.55e-05 [2026-04-18 14:03:28] Epoch 2 | Step 12860 | Loss: 0.8094 | LR: 1.54e-05 [2026-04-18 14:03:32] Epoch 2 | Step 12870 | Loss: 0.8090 | LR: 1.54e-05 [2026-04-18 14:03:35] Epoch 2 | Step 12880 | Loss: 0.8092 | LR: 1.53e-05 [2026-04-18 14:03:39] Epoch 2 | Step 12890 | Loss: 0.8086 | LR: 1.52e-05 [2026-04-18 14:03:43] Epoch 2 | Step 12900 | Loss: 0.8086 | LR: 1.51e-05 [2026-04-18 14:03:46] Epoch 2 | Step 
12910 | Loss: 0.8085 | LR: 1.50e-05 [2026-04-18 14:03:50] Epoch 2 | Step 12920 | Loss: 0.8085 | LR: 1.49e-05 [2026-04-18 14:03:53] Epoch 2 | Step 12930 | Loss: 0.8078 | LR: 1.49e-05 [2026-04-18 14:03:57] Epoch 2 | Step 12940 | Loss: 0.8077 | LR: 1.48e-05 [2026-04-18 14:04:01] Epoch 2 | Step 12950 | Loss: 0.8076 | LR: 1.47e-05 [2026-04-18 14:04:04] Epoch 2 | Step 12960 | Loss: 0.8072 | LR: 1.46e-05 [2026-04-18 14:04:08] Epoch 2 | Step 12970 | Loss: 0.8073 | LR: 1.45e-05 [2026-04-18 14:04:12] Epoch 2 | Step 12980 | Loss: 0.8075 | LR: 1.44e-05 [2026-04-18 14:04:15] Epoch 2 | Step 12990 | Loss: 0.8071 | LR: 1.43e-05 [2026-04-18 14:04:19] Epoch 2 | Step 13000 | Loss: 0.8072 | LR: 1.42e-05 [2026-04-18 14:04:20] Validation | Batch 10/1567 | Loss: 0.9323 [2026-04-18 14:04:21] Validation | Batch 20/1567 | Loss: 1.0094 [2026-04-18 14:04:22] Validation | Batch 30/1567 | Loss: 1.0510 [2026-04-18 14:04:23] Validation | Batch 40/1567 | Loss: 1.0750 [2026-04-18 14:04:23] Validation | Batch 50/1567 | Loss: 1.0497 [2026-04-18 14:04:24] Validation | Batch 60/1567 | Loss: 1.0382 [2026-04-18 14:04:25] Validation | Batch 70/1567 | Loss: 1.0223 [2026-04-18 14:04:26] Validation | Batch 80/1567 | Loss: 1.0386 [2026-04-18 14:04:27] Validation | Batch 90/1567 | Loss: 1.0465 [2026-04-18 14:04:28] Validation | Batch 100/1567 | Loss: 1.0553 [2026-04-18 14:04:28] Validation | Batch 110/1567 | Loss: 1.0477 [2026-04-18 14:04:29] Validation | Batch 120/1567 | Loss: 1.0587 [2026-04-18 14:04:30] Validation | Batch 130/1567 | Loss: 1.0602 [2026-04-18 14:04:31] Validation | Batch 140/1567 | Loss: 1.0636 [2026-04-18 14:04:32] Validation | Batch 150/1567 | Loss: 1.0712 [2026-04-18 14:04:32] Validation | Batch 160/1567 | Loss: 1.0725 [2026-04-18 14:04:33] Validation | Batch 170/1567 | Loss: 1.0576 [2026-04-18 14:04:34] Validation | Batch 180/1567 | Loss: 1.0595 [2026-04-18 14:04:35] Validation | Batch 190/1567 | Loss: 1.0554 [2026-04-18 14:04:36] Validation | Batch 200/1567 | Loss: 1.0584 [2026-04-18 
14:04:37] Validation | Batch 210/1567 | Loss: 1.0594 [2026-04-18 14:04:37] Validation | Batch 220/1567 | Loss: 1.0617 [2026-04-18 14:04:39] Validation | Batch 230/1567 | Loss: 1.0653 [2026-04-18 14:04:39] Validation | Batch 240/1567 | Loss: 1.0637 [2026-04-18 14:04:40] Validation | Batch 250/1567 | Loss: 1.0576 [2026-04-18 14:04:41] Validation | Batch 260/1567 | Loss: 1.0532 [2026-04-18 14:04:41] Validation | Batch 270/1567 | Loss: 1.0504 [2026-04-18 14:04:42] Validation | Batch 280/1567 | Loss: 1.0520 [2026-04-18 14:04:43] Validation | Batch 290/1567 | Loss: 1.0574 [2026-04-18 14:04:44] Validation | Batch 300/1567 | Loss: 1.0624 [2026-04-18 14:04:45] Validation | Batch 310/1567 | Loss: 1.0618 [2026-04-18 14:04:45] Validation | Batch 320/1567 | Loss: 1.0622 [2026-04-18 14:04:47] Validation | Batch 330/1567 | Loss: 1.0594 [2026-04-18 14:04:47] Validation | Batch 340/1567 | Loss: 1.0632 [2026-04-18 14:04:48] Validation | Batch 350/1567 | Loss: 1.0619 [2026-04-18 14:04:49] Validation | Batch 360/1567 | Loss: 1.0598 [2026-04-18 14:04:50] Validation | Batch 370/1567 | Loss: 1.0570 [2026-04-18 14:04:50] Validation | Batch 380/1567 | Loss: 1.0602 [2026-04-18 14:04:51] Validation | Batch 390/1567 | Loss: 1.0613 [2026-04-18 14:04:52] Validation | Batch 400/1567 | Loss: 1.0626 [2026-04-18 14:04:53] Validation | Batch 410/1567 | Loss: 1.0617 [2026-04-18 14:04:54] Validation | Batch 420/1567 | Loss: 1.0613 [2026-04-18 14:04:55] Validation | Batch 430/1567 | Loss: 1.0615 [2026-04-18 14:04:55] Validation | Batch 440/1567 | Loss: 1.0603 [2026-04-18 14:04:56] Validation | Batch 450/1567 | Loss: 1.0603 [2026-04-18 14:04:57] Validation | Batch 460/1567 | Loss: 1.0595 [2026-04-18 14:04:58] Validation | Batch 470/1567 | Loss: 1.0584 [2026-04-18 14:04:59] Validation | Batch 480/1567 | Loss: 1.0560 [2026-04-18 14:04:59] Validation | Batch 490/1567 | Loss: 1.0558 [2026-04-18 14:05:00] Validation | Batch 500/1567 | Loss: 1.0556 [2026-04-18 14:05:01] Validation | Batch 510/1567 | Loss: 
1.0579 [2026-04-18 14:05:02] Validation | Batch 520/1567 | Loss: 1.0599 [2026-04-18 14:05:03] Validation | Batch 530/1567 | Loss: 1.0597 [2026-04-18 14:05:04] Validation | Batch 540/1567 | Loss: 1.0623 [2026-04-18 14:05:04] Validation | Batch 550/1567 | Loss: 1.0659 [2026-04-18 14:05:05] Validation | Batch 560/1567 | Loss: 1.0657 [2026-04-18 14:05:06] Validation | Batch 570/1567 | Loss: 1.0657 [2026-04-18 14:05:07] Validation | Batch 580/1567 | Loss: 1.0647 [2026-04-18 14:05:08] Validation | Batch 590/1567 | Loss: 1.0633 [2026-04-18 14:05:09] Validation | Batch 600/1567 | Loss: 1.0618 [2026-04-18 14:05:10] Validation | Batch 610/1567 | Loss: 1.0608 [2026-04-18 14:05:11] Validation | Batch 620/1567 | Loss: 1.0624 [2026-04-18 14:05:11] Validation | Batch 630/1567 | Loss: 1.0605 [2026-04-18 14:05:12] Validation | Batch 640/1567 | Loss: 1.0620 [2026-04-18 14:05:13] Validation | Batch 650/1567 | Loss: 1.0612 [2026-04-18 14:05:14] Validation | Batch 660/1567 | Loss: 1.0600 [2026-04-18 14:05:14] Validation | Batch 670/1567 | Loss: 1.0580 [2026-04-18 14:05:15] Validation | Batch 680/1567 | Loss: 1.0573 [2026-04-18 14:05:16] Validation | Batch 690/1567 | Loss: 1.0581 [2026-04-18 14:05:17] Validation | Batch 700/1567 | Loss: 1.0568 [2026-04-18 14:05:18] Validation | Batch 710/1567 | Loss: 1.0581 [2026-04-18 14:05:19] Validation | Batch 720/1567 | Loss: 1.0572 [2026-04-18 14:05:19] Validation | Batch 730/1567 | Loss: 1.0580 [2026-04-18 14:05:20] Validation | Batch 740/1567 | Loss: 1.0591 [2026-04-18 14:05:21] Validation | Batch 750/1567 | Loss: 1.0597 [2026-04-18 14:05:22] Validation | Batch 760/1567 | Loss: 1.0595 [2026-04-18 14:05:23] Validation | Batch 770/1567 | Loss: 1.0615 [2026-04-18 14:05:24] Validation | Batch 780/1567 | Loss: 1.0628 [2026-04-18 14:05:24] Validation | Batch 790/1567 | Loss: 1.0624 [2026-04-18 14:05:25] Validation | Batch 800/1567 | Loss: 1.0643 [2026-04-18 14:05:26] Validation | Batch 810/1567 | Loss: 1.0642 [2026-04-18 14:05:27] Validation | Batch 
820/1567 | Loss: 1.0638 [2026-04-18 14:05:27] Validation | Batch 830/1567 | Loss: 1.0622 [2026-04-18 14:05:28] Validation | Batch 840/1567 | Loss: 1.0622 [2026-04-18 14:05:29] Validation | Batch 850/1567 | Loss: 1.0608 [2026-04-18 14:05:29] Validation | Batch 860/1567 | Loss: 1.0624 [2026-04-18 14:05:30] Validation | Batch 870/1567 | Loss: 1.0629 [2026-04-18 14:05:31] Validation | Batch 880/1567 | Loss: 1.0639 [2026-04-18 14:05:32] Validation | Batch 890/1567 | Loss: 1.0646 [2026-04-18 14:05:33] Validation | Batch 900/1567 | Loss: 1.0665 [2026-04-18 14:05:33] Validation | Batch 910/1567 | Loss: 1.0665 [2026-04-18 14:05:34] Validation | Batch 920/1567 | Loss: 1.0687 [2026-04-18 14:05:35] Validation | Batch 930/1567 | Loss: 1.0664 [2026-04-18 14:05:35] Validation | Batch 940/1567 | Loss: 1.0661 [2026-04-18 14:05:36] Validation | Batch 950/1567 | Loss: 1.0650 [2026-04-18 14:05:37] Validation | Batch 960/1567 | Loss: 1.0636 [2026-04-18 14:05:38] Validation | Batch 970/1567 | Loss: 1.0653 [2026-04-18 14:05:38] Validation | Batch 980/1567 | Loss: 1.0656 [2026-04-18 14:05:39] Validation | Batch 990/1567 | Loss: 1.0650 [2026-04-18 14:05:40] Validation | Batch 1000/1567 | Loss: 1.0652 [2026-04-18 14:05:40] Validation | Batch 1010/1567 | Loss: 1.0630 [2026-04-18 14:05:41] Validation | Batch 1020/1567 | Loss: 1.0633 [2026-04-18 14:05:42] Validation | Batch 1030/1567 | Loss: 1.0648 [2026-04-18 14:05:43] Validation | Batch 1040/1567 | Loss: 1.0644 [2026-04-18 14:05:44] Validation | Batch 1050/1567 | Loss: 1.0655 [2026-04-18 14:05:45] Validation | Batch 1060/1567 | Loss: 1.0646 [2026-04-18 14:05:46] Validation | Batch 1070/1567 | Loss: 1.0638 [2026-04-18 14:05:46] Validation | Batch 1080/1567 | Loss: 1.0648 [2026-04-18 14:05:47] Validation | Batch 1090/1567 | Loss: 1.0646 [2026-04-18 14:05:48] Validation | Batch 1100/1567 | Loss: 1.0651 [2026-04-18 14:05:48] Validation | Batch 1110/1567 | Loss: 1.0649 [2026-04-18 14:05:49] Validation | Batch 1120/1567 | Loss: 1.0652 [2026-04-18 
14:05:50] Validation | Batch 1130/1567 | Loss: 1.0653 [2026-04-18 14:05:51] Validation | Batch 1140/1567 | Loss: 1.0661 [2026-04-18 14:05:52] Validation | Batch 1150/1567 | Loss: 1.0665 [2026-04-18 14:05:52] Validation | Batch 1160/1567 | Loss: 1.0674 [2026-04-18 14:05:53] Validation | Batch 1170/1567 | Loss: 1.0671 [2026-04-18 14:05:54] Validation | Batch 1180/1567 | Loss: 1.0667 [2026-04-18 14:05:55] Validation | Batch 1190/1567 | Loss: 1.0678 [2026-04-18 14:05:56] Validation | Batch 1200/1567 | Loss: 1.0672 [2026-04-18 14:05:56] Validation | Batch 1210/1567 | Loss: 1.0661 [2026-04-18 14:05:57] Validation | Batch 1220/1567 | Loss: 1.0665 [2026-04-18 14:05:58] Validation | Batch 1230/1567 | Loss: 1.0685 [2026-04-18 14:05:59] Validation | Batch 1240/1567 | Loss: 1.0674 [2026-04-18 14:05:59] Validation | Batch 1250/1567 | Loss: 1.0675 [2026-04-18 14:06:00] Validation | Batch 1260/1567 | Loss: 1.0685 [2026-04-18 14:06:01] Validation | Batch 1270/1567 | Loss: 1.0685 [2026-04-18 14:06:02] Validation | Batch 1280/1567 | Loss: 1.0679 [2026-04-18 14:06:03] Validation | Batch 1290/1567 | Loss: 1.0681 [2026-04-18 14:06:04] Validation | Batch 1300/1567 | Loss: 1.0684 [2026-04-18 14:06:05] Validation | Batch 1310/1567 | Loss: 1.0687 [2026-04-18 14:06:06] Validation | Batch 1320/1567 | Loss: 1.0677 [2026-04-18 14:06:06] Validation | Batch 1330/1567 | Loss: 1.0673 [2026-04-18 14:06:07] Validation | Batch 1340/1567 | Loss: 1.0671 [2026-04-18 14:06:08] Validation | Batch 1350/1567 | Loss: 1.0679 [2026-04-18 14:06:09] Validation | Batch 1360/1567 | Loss: 1.0676 [2026-04-18 14:06:09] Validation | Batch 1370/1567 | Loss: 1.0680 [2026-04-18 14:06:10] Validation | Batch 1380/1567 | Loss: 1.0692 [2026-04-18 14:06:11] Validation | Batch 1390/1567 | Loss: 1.0693 [2026-04-18 14:06:12] Validation | Batch 1400/1567 | Loss: 1.0697 [2026-04-18 14:06:12] Validation | Batch 1410/1567 | Loss: 1.0695 [2026-04-18 14:06:13] Validation | Batch 1420/1567 | Loss: 1.0700 [2026-04-18 14:06:14] 
Validation | Batch 1430/1567 | Loss: 1.0697 [2026-04-18 14:06:14] Validation | Batch 1440/1567 | Loss: 1.0700 [2026-04-18 14:06:15] Validation | Batch 1450/1567 | Loss: 1.0693 [2026-04-18 14:06:16] Validation | Batch 1460/1567 | Loss: 1.0691 [2026-04-18 14:06:17] Validation | Batch 1470/1567 | Loss: 1.0682 [2026-04-18 14:06:17] Validation | Batch 1480/1567 | Loss: 1.0665 [2026-04-18 14:06:18] Validation | Batch 1490/1567 | Loss: 1.0666 [2026-04-18 14:06:18] Validation | Batch 1500/1567 | Loss: 1.0667 [2026-04-18 14:06:19] Validation | Batch 1510/1567 | Loss: 1.0665 [2026-04-18 14:06:19] Validation | Batch 1520/1567 | Loss: 1.0658 [2026-04-18 14:06:20] Validation | Batch 1530/1567 | Loss: 1.0667 [2026-04-18 14:06:21] Validation | Batch 1540/1567 | Loss: 1.0677 [2026-04-18 14:06:22] Validation | Batch 1550/1567 | Loss: 1.0679 [2026-04-18 14:06:23] Validation | Batch 1560/1567 | Loss: 1.0670 [2026-04-18 14:06:24] Validation | Batch 1567/1567 | Loss: 1.0673 [2026-04-18 14:06:24] Validation | Loss: 1.0673 | PPL: 2.93 | Time: 124.64s [2026-04-18 14:06:27] Epoch 2 | Step 13010 | Loss: 0.8072 | LR: 1.42e-05 [2026-04-18 14:06:30] Epoch 2 | Step 13020 | Loss: 0.8071 | LR: 1.41e-05 [2026-04-18 14:06:34] Epoch 2 | Step 13030 | Loss: 0.8072 | LR: 1.40e-05 [2026-04-18 14:06:37] Epoch 2 | Step 13040 | Loss: 0.8071 | LR: 1.39e-05 [2026-04-18 14:06:41] Epoch 2 | Step 13050 | Loss: 0.8070 | LR: 1.38e-05 [2026-04-18 14:06:44] Epoch 2 | Step 13060 | Loss: 0.8070 | LR: 1.37e-05 [2026-04-18 14:06:48] Epoch 2 | Step 13070 | Loss: 0.8069 | LR: 1.36e-05 [2026-04-18 14:06:52] Epoch 2 | Step 13080 | Loss: 0.8064 | LR: 1.35e-05 [2026-04-18 14:06:55] Epoch 2 | Step 13090 | Loss: 0.8062 | LR: 1.34e-05 [2026-04-18 14:06:59] Epoch 2 | Step 13100 | Loss: 0.8062 | LR: 1.33e-05 [2026-04-18 14:07:02] Epoch 2 | Step 13110 | Loss: 0.8060 | LR: 1.32e-05 [2026-04-18 14:07:06] Epoch 2 | Step 13120 | Loss: 0.8062 | LR: 1.32e-05 [2026-04-18 14:07:09] Epoch 2 | Step 13130 | Loss: 0.8064 | LR: 1.31e-05 
[2026-04-18 14:07:12] Epoch 2 | Step 13140 | Loss: 0.8064 | LR: 1.30e-05 [2026-04-18 14:07:16] Epoch 2 | Step 13150 | Loss: 0.8068 | LR: 1.29e-05 [2026-04-18 14:07:19] Epoch 2 | Step 13160 | Loss: 0.8068 | LR: 1.28e-05 [2026-04-18 14:07:24] Epoch 2 | Step 13170 | Loss: 0.8068 | LR: 1.27e-05 [2026-04-18 14:07:28] Epoch 2 | Step 13180 | Loss: 0.8069 | LR: 1.26e-05 [2026-04-18 14:07:31] Epoch 2 | Step 13190 | Loss: 0.8069 | LR: 1.25e-05 [2026-04-18 14:07:34] Epoch 2 | Step 13200 | Loss: 0.8069 | LR: 1.24e-05 [2026-04-18 14:07:38] Epoch 2 | Step 13210 | Loss: 0.8071 | LR: 1.23e-05 [2026-04-18 14:07:42] Epoch 2 | Step 13220 | Loss: 0.8072 | LR: 1.22e-05 [2026-04-18 14:07:45] Epoch 2 | Step 13230 | Loss: 0.8072 | LR: 1.21e-05 [2026-04-18 14:07:49] Epoch 2 | Step 13240 | Loss: 0.8069 | LR: 1.20e-05 [2026-04-18 14:07:52] Epoch 2 | Step 13250 | Loss: 0.8068 | LR: 1.19e-05 [2026-04-18 14:07:56] Epoch 2 | Step 13260 | Loss: 0.8067 | LR: 1.18e-05 [2026-04-18 14:07:59] Epoch 2 | Step 13270 | Loss: 0.8067 | LR: 1.17e-05 [2026-04-18 14:08:03] Epoch 2 | Step 13280 | Loss: 0.8066 | LR: 1.16e-05 [2026-04-18 14:08:07] Epoch 2 | Step 13290 | Loss: 0.8062 | LR: 1.16e-05 [2026-04-18 14:08:10] Epoch 2 | Step 13300 | Loss: 0.8061 | LR: 1.15e-05 [2026-04-18 14:08:14] Epoch 2 | Step 13310 | Loss: 0.8063 | LR: 1.14e-05 [2026-04-18 14:08:18] Epoch 2 | Step 13320 | Loss: 0.8064 | LR: 1.13e-05 [2026-04-18 14:08:22] Epoch 2 | Step 13330 | Loss: 0.8063 | LR: 1.12e-05 [2026-04-18 14:08:25] Epoch 2 | Step 13340 | Loss: 0.8063 | LR: 1.11e-05 [2026-04-18 14:08:28] Epoch 2 | Step 13350 | Loss: 0.8062 | LR: 1.10e-05 [2026-04-18 14:08:32] Epoch 2 | Step 13360 | Loss: 0.8062 | LR: 1.09e-05 [2026-04-18 14:08:35] Epoch 2 | Step 13370 | Loss: 0.8061 | LR: 1.08e-05 [2026-04-18 14:08:39] Epoch 2 | Step 13380 | Loss: 0.8059 | LR: 1.07e-05 [2026-04-18 14:08:43] Epoch 2 | Step 13390 | Loss: 0.8062 | LR: 1.06e-05 [2026-04-18 14:08:47] Epoch 2 | Step 13400 | Loss: 0.8066 | LR: 1.05e-05 [2026-04-18 14:08:50] Epoch 
2 | Step 13410 | Loss: 0.8066 | LR: 1.04e-05 [2026-04-18 14:08:54] Epoch 2 | Step 13420 | Loss: 0.8063 | LR: 1.03e-05 [2026-04-18 14:08:57] Epoch 2 | Step 13430 | Loss: 0.8066 | LR: 1.02e-05 [2026-04-18 14:09:01] Epoch 2 | Step 13440 | Loss: 0.8065 | LR: 1.01e-05 [2026-04-18 14:09:05] Epoch 2 | Step 13450 | Loss: 0.8066 | LR: 1.00e-05 [2026-04-18 14:09:08] Epoch 2 | Step 13460 | Loss: 0.8067 | LR: 9.93e-06 [2026-04-18 14:09:11] Epoch 2 | Step 13470 | Loss: 0.8068 | LR: 9.84e-06 [2026-04-18 14:09:15] Epoch 2 | Step 13480 | Loss: 0.8067 | LR: 9.75e-06 [2026-04-18 14:09:18] Epoch 2 | Step 13490 | Loss: 0.8067 | LR: 9.65e-06 [2026-04-18 14:09:22] Epoch 2 | Step 13500 | Loss: 0.8064 | LR: 9.56e-06 [2026-04-18 14:09:26] Epoch 2 | Step 13510 | Loss: 0.8065 | LR: 9.46e-06 [2026-04-18 14:09:29] Epoch 2 | Step 13520 | Loss: 0.8063 | LR: 9.37e-06 [2026-04-18 14:09:34] Epoch 2 | Step 13530 | Loss: 0.8060 | LR: 9.28e-06 [2026-04-18 14:09:37] Epoch 2 | Step 13540 | Loss: 0.8061 | LR: 9.18e-06 [2026-04-18 14:09:41] Epoch 2 | Step 13550 | Loss: 0.8058 | LR: 9.09e-06 [2026-04-18 14:09:44] Epoch 2 | Step 13560 | Loss: 0.8059 | LR: 9.00e-06 [2026-04-18 14:09:48] Epoch 2 | Step 13570 | Loss: 0.8062 | LR: 8.90e-06 [2026-04-18 14:09:51] Epoch 2 | Step 13580 | Loss: 0.8062 | LR: 8.81e-06 [2026-04-18 14:09:55] Epoch 2 | Step 13590 | Loss: 0.8061 | LR: 8.72e-06 [2026-04-18 14:09:58] Epoch 2 | Step 13600 | Loss: 0.8057 | LR: 8.63e-06 [2026-04-18 14:10:02] Epoch 2 | Step 13610 | Loss: 0.8058 | LR: 8.53e-06 [2026-04-18 14:10:06] Epoch 2 | Step 13620 | Loss: 0.8059 | LR: 8.44e-06 [2026-04-18 14:10:10] Epoch 2 | Step 13630 | Loss: 0.8054 | LR: 8.35e-06 [2026-04-18 14:10:14] Epoch 2 | Step 13640 | Loss: 0.8052 | LR: 8.26e-06 [2026-04-18 14:10:17] Epoch 2 | Step 13650 | Loss: 0.8057 | LR: 8.17e-06 [2026-04-18 14:10:21] Epoch 2 | Step 13660 | Loss: 0.8056 | LR: 8.08e-06 [2026-04-18 14:10:24] Epoch 2 | Step 13670 | Loss: 0.8057 | LR: 7.99e-06 [2026-04-18 14:10:28] Epoch 2 | Step 13680 | Loss: 
0.8058 | LR: 7.90e-06 [2026-04-18 14:10:32] Epoch 2 | Step 13690 | Loss: 0.8057 | LR: 7.81e-06 [2026-04-18 14:10:35] Epoch 2 | Step 13700 | Loss: 0.8056 | LR: 7.72e-06 [2026-04-18 14:10:39] Epoch 2 | Step 13710 | Loss: 0.8058 | LR: 7.63e-06 [2026-04-18 14:10:42] Epoch 2 | Step 13720 | Loss: 0.8056 | LR: 7.54e-06 [2026-04-18 14:10:46] Epoch 2 | Step 13730 | Loss: 0.8057 | LR: 7.46e-06 [2026-04-18 14:10:50] Epoch 2 | Step 13740 | Loss: 0.8057 | LR: 7.37e-06 [2026-04-18 14:10:54] Epoch 2 | Step 13750 | Loss: 0.8059 | LR: 7.28e-06 [2026-04-18 14:10:58] Epoch 2 | Step 13760 | Loss: 0.8058 | LR: 7.20e-06 [2026-04-18 14:11:01] Epoch 2 | Step 13770 | Loss: 0.8059 | LR: 7.11e-06 [2026-04-18 14:11:05] Epoch 2 | Step 13780 | Loss: 0.8056 | LR: 7.02e-06 [2026-04-18 14:11:08] Epoch 2 | Step 13790 | Loss: 0.8055 | LR: 6.94e-06 [2026-04-18 14:11:12] Epoch 2 | Step 13800 | Loss: 0.8054 | LR: 6.85e-06 [2026-04-18 14:11:15] Epoch 2 | Step 13810 | Loss: 0.8054 | LR: 6.77e-06 [2026-04-18 14:11:19] Epoch 2 | Step 13820 | Loss: 0.8055 | LR: 6.69e-06 [2026-04-18 14:11:23] Epoch 2 | Step 13830 | Loss: 0.8057 | LR: 6.60e-06 [2026-04-18 14:11:26] Epoch 2 | Step 13840 | Loss: 0.8056 | LR: 6.52e-06 [2026-04-18 14:11:30] Epoch 2 | Step 13850 | Loss: 0.8055 | LR: 6.44e-06 [2026-04-18 14:11:33] Epoch 2 | Step 13860 | Loss: 0.8054 | LR: 6.35e-06 [2026-04-18 14:11:37] Epoch 2 | Step 13870 | Loss: 0.8055 | LR: 6.27e-06 [2026-04-18 14:11:41] Epoch 2 | Step 13880 | Loss: 0.8054 | LR: 6.19e-06 [2026-04-18 14:11:45] Epoch 2 | Step 13890 | Loss: 0.8055 | LR: 6.11e-06 [2026-04-18 14:11:49] Epoch 2 | Step 13900 | Loss: 0.8052 | LR: 6.03e-06 [2026-04-18 14:11:52] Epoch 2 | Step 13910 | Loss: 0.8051 | LR: 5.95e-06 [2026-04-18 14:11:56] Epoch 2 | Step 13920 | Loss: 0.8051 | LR: 5.87e-06 [2026-04-18 14:12:00] Epoch 2 | Step 13930 | Loss: 0.8048 | LR: 5.80e-06 [2026-04-18 14:12:03] Epoch 2 | Step 13940 | Loss: 0.8047 | LR: 5.72e-06 [2026-04-18 14:12:07] Epoch 2 | Step 13950 | Loss: 0.8048 | LR: 5.64e-06 
[2026-04-18 14:12:11] Epoch 2 | Step 13960 | Loss: 0.8046 | LR: 5.57e-06 [2026-04-18 14:12:14] Epoch 2 | Step 13970 | Loss: 0.8045 | LR: 5.49e-06 [2026-04-18 14:12:18] Epoch 2 | Step 13980 | Loss: 0.8042 | LR: 5.42e-06 [2026-04-18 14:12:21] Epoch 2 | Step 13990 | Loss: 0.8041 | LR: 5.34e-06 [2026-04-18 14:12:25] Epoch 2 | Step 14000 | Loss: 0.8043 | LR: 5.27e-06 [2026-04-18 14:12:26] Validation | Batch 10/1567 | Loss: 0.9344 [2026-04-18 14:12:26] Validation | Batch 20/1567 | Loss: 0.9994 [2026-04-18 14:12:27] Validation | Batch 30/1567 | Loss: 1.0395 [2026-04-18 14:12:28] Validation | Batch 40/1567 | Loss: 1.0613 [2026-04-18 14:12:29] Validation | Batch 50/1567 | Loss: 1.0370 [2026-04-18 14:12:30] Validation | Batch 60/1567 | Loss: 1.0238 [2026-04-18 14:12:31] Validation | Batch 70/1567 | Loss: 1.0091 [2026-04-18 14:12:32] Validation | Batch 80/1567 | Loss: 1.0269 [2026-04-18 14:12:33] Validation | Batch 90/1567 | Loss: 1.0353 [2026-04-18 14:12:34] Validation | Batch 100/1567 | Loss: 1.0446 [2026-04-18 14:12:34] Validation | Batch 110/1567 | Loss: 1.0368 [2026-04-18 14:12:35] Validation | Batch 120/1567 | Loss: 1.0476 [2026-04-18 14:12:36] Validation | Batch 130/1567 | Loss: 1.0489 [2026-04-18 14:12:37] Validation | Batch 140/1567 | Loss: 1.0511 [2026-04-18 14:12:38] Validation | Batch 150/1567 | Loss: 1.0589 [2026-04-18 14:12:38] Validation | Batch 160/1567 | Loss: 1.0602 [2026-04-18 14:12:39] Validation | Batch 170/1567 | Loss: 1.0454 [2026-04-18 14:12:40] Validation | Batch 180/1567 | Loss: 1.0474 [2026-04-18 14:12:41] Validation | Batch 190/1567 | Loss: 1.0439 [2026-04-18 14:12:42] Validation | Batch 200/1567 | Loss: 1.0470 [2026-04-18 14:12:43] Validation | Batch 210/1567 | Loss: 1.0480 [2026-04-18 14:12:43] Validation | Batch 220/1567 | Loss: 1.0504 [2026-04-18 14:12:44] Validation | Batch 230/1567 | Loss: 1.0543 [2026-04-18 14:12:45] Validation | Batch 240/1567 | Loss: 1.0527 [2026-04-18 14:12:46] Validation | Batch 250/1567 | Loss: 1.0466 [2026-04-18 
14:12:47] Validation | Batch 260/1567 | Loss: 1.0420 [2026-04-18 14:12:47] Validation | Batch 270/1567 | Loss: 1.0389 [2026-04-18 14:12:48] Validation | Batch 280/1567 | Loss: 1.0402 [2026-04-18 14:12:49] Validation | Batch 290/1567 | Loss: 1.0455 [2026-04-18 14:12:50] Validation | Batch 300/1567 | Loss: 1.0504 [2026-04-18 14:12:51] Validation | Batch 310/1567 | Loss: 1.0493 [2026-04-18 14:12:51] Validation | Batch 320/1567 | Loss: 1.0498 [2026-04-18 14:12:53] Validation | Batch 330/1567 | Loss: 1.0469 [2026-04-18 14:12:53] Validation | Batch 340/1567 | Loss: 1.0508 [2026-04-18 14:12:54] Validation | Batch 350/1567 | Loss: 1.0500 [2026-04-18 14:12:55] Validation | Batch 360/1567 | Loss: 1.0478 [2026-04-18 14:12:56] Validation | Batch 370/1567 | Loss: 1.0450 [2026-04-18 14:12:56] Validation | Batch 380/1567 | Loss: 1.0483 [2026-04-18 14:12:57] Validation | Batch 390/1567 | Loss: 1.0494 [2026-04-18 14:12:58] Validation | Batch 400/1567 | Loss: 1.0507 [2026-04-18 14:12:59] Validation | Batch 410/1567 | Loss: 1.0501 [2026-04-18 14:13:00] Validation | Batch 420/1567 | Loss: 1.0495 [2026-04-18 14:13:01] Validation | Batch 430/1567 | Loss: 1.0496 [2026-04-18 14:13:02] Validation | Batch 440/1567 | Loss: 1.0484 [2026-04-18 14:13:02] Validation | Batch 450/1567 | Loss: 1.0486 [2026-04-18 14:13:03] Validation | Batch 460/1567 | Loss: 1.0475 [2026-04-18 14:13:04] Validation | Batch 470/1567 | Loss: 1.0468 [2026-04-18 14:13:05] Validation | Batch 480/1567 | Loss: 1.0447 [2026-04-18 14:13:05] Validation | Batch 490/1567 | Loss: 1.0446 [2026-04-18 14:13:06] Validation | Batch 500/1567 | Loss: 1.0443 [2026-04-18 14:13:07] Validation | Batch 510/1567 | Loss: 1.0465 [2026-04-18 14:13:08] Validation | Batch 520/1567 | Loss: 1.0481 [2026-04-18 14:13:09] Validation | Batch 530/1567 | Loss: 1.0478 [2026-04-18 14:13:10] Validation | Batch 540/1567 | Loss: 1.0505 [2026-04-18 14:13:10] Validation | Batch 550/1567 | Loss: 1.0540 [2026-04-18 14:13:11] Validation | Batch 560/1567 | Loss: 
1.0538 [2026-04-18 14:13:12] Validation | Batch 570/1567 | Loss: 1.0537 [2026-04-18 14:13:13] Validation | Batch 580/1567 | Loss: 1.0528 [2026-04-18 14:13:14] Validation | Batch 590/1567 | Loss: 1.0514 [2026-04-18 14:13:15] Validation | Batch 600/1567 | Loss: 1.0497 [2026-04-18 14:13:16] Validation | Batch 610/1567 | Loss: 1.0487 [2026-04-18 14:13:17] Validation | Batch 620/1567 | Loss: 1.0502 [2026-04-18 14:13:18] Validation | Batch 630/1567 | Loss: 1.0482 [2026-04-18 14:13:18] Validation | Batch 640/1567 | Loss: 1.0497 [2026-04-18 14:13:19] Validation | Batch 650/1567 | Loss: 1.0488 [2026-04-18 14:13:20] Validation | Batch 660/1567 | Loss: 1.0476 [2026-04-18 14:13:21] Validation | Batch 670/1567 | Loss: 1.0456 [2026-04-18 14:13:21] Validation | Batch 680/1567 | Loss: 1.0451 [2026-04-18 14:13:22] Validation | Batch 690/1567 | Loss: 1.0459 [2026-04-18 14:13:23] Validation | Batch 700/1567 | Loss: 1.0445 [2026-04-18 14:13:24] Validation | Batch 710/1567 | Loss: 1.0458 [2026-04-18 14:13:25] Validation | Batch 720/1567 | Loss: 1.0450 [2026-04-18 14:13:25] Validation | Batch 730/1567 | Loss: 1.0457 [2026-04-18 14:13:26] Validation | Batch 740/1567 | Loss: 1.0468 [2026-04-18 14:13:27] Validation | Batch 750/1567 | Loss: 1.0474 [2026-04-18 14:13:28] Validation | Batch 760/1567 | Loss: 1.0471 [2026-04-18 14:13:29] Validation | Batch 770/1567 | Loss: 1.0492 [2026-04-18 14:13:29] Validation | Batch 780/1567 | Loss: 1.0505 [2026-04-18 14:13:30] Validation | Batch 790/1567 | Loss: 1.0499 [2026-04-18 14:13:31] Validation | Batch 800/1567 | Loss: 1.0518 [2026-04-18 14:13:32] Validation | Batch 810/1567 | Loss: 1.0517 [2026-04-18 14:13:32] Validation | Batch 820/1567 | Loss: 1.0514 [2026-04-18 14:13:33] Validation | Batch 830/1567 | Loss: 1.0498 [2026-04-18 14:13:34] Validation | Batch 840/1567 | Loss: 1.0499 [2026-04-18 14:13:35] Validation | Batch 850/1567 | Loss: 1.0486 [2026-04-18 14:13:35] Validation | Batch 860/1567 | Loss: 1.0502 [2026-04-18 14:13:36] Validation | Batch 
870/1567 | Loss: 1.0507 [2026-04-18 14:13:37] Validation | Batch 880/1567 | Loss: 1.0515 [2026-04-18 14:13:37] Validation | Batch 890/1567 | Loss: 1.0522 [2026-04-18 14:13:38] Validation | Batch 900/1567 | Loss: 1.0541 [2026-04-18 14:13:39] Validation | Batch 910/1567 | Loss: 1.0542 [2026-04-18 14:13:40] Validation | Batch 920/1567 | Loss: 1.0563 [2026-04-18 14:13:40] Validation | Batch 930/1567 | Loss: 1.0540 [2026-04-18 14:13:41] Validation | Batch 940/1567 | Loss: 1.0537 [2026-04-18 14:13:42] Validation | Batch 950/1567 | Loss: 1.0527 [2026-04-18 14:13:43] Validation | Batch 960/1567 | Loss: 1.0512 [2026-04-18 14:13:43] Validation | Batch 970/1567 | Loss: 1.0529 [2026-04-18 14:13:44] Validation | Batch 980/1567 | Loss: 1.0533 [2026-04-18 14:13:45] Validation | Batch 990/1567 | Loss: 1.0527 [2026-04-18 14:13:46] Validation | Batch 1000/1567 | Loss: 1.0531 [2026-04-18 14:13:46] Validation | Batch 1010/1567 | Loss: 1.0508 [2026-04-18 14:13:47] Validation | Batch 1020/1567 | Loss: 1.0511 [2026-04-18 14:13:48] Validation | Batch 1030/1567 | Loss: 1.0527 [2026-04-18 14:13:49] Validation | Batch 1040/1567 | Loss: 1.0523 [2026-04-18 14:13:50] Validation | Batch 1050/1567 | Loss: 1.0533 [2026-04-18 14:13:50] Validation | Batch 1060/1567 | Loss: 1.0524 [2026-04-18 14:13:51] Validation | Batch 1070/1567 | Loss: 1.0516 [2026-04-18 14:13:52] Validation | Batch 1080/1567 | Loss: 1.0526 [2026-04-18 14:13:53] Validation | Batch 1090/1567 | Loss: 1.0523 [2026-04-18 14:13:53] Validation | Batch 1100/1567 | Loss: 1.0529 [2026-04-18 14:13:54] Validation | Batch 1110/1567 | Loss: 1.0527 [2026-04-18 14:13:55] Validation | Batch 1120/1567 | Loss: 1.0529 [2026-04-18 14:13:56] Validation | Batch 1130/1567 | Loss: 1.0530 [2026-04-18 14:13:57] Validation | Batch 1140/1567 | Loss: 1.0538 [2026-04-18 14:13:58] Validation | Batch 1150/1567 | Loss: 1.0542 [2026-04-18 14:13:58] Validation | Batch 1160/1567 | Loss: 1.0551 [2026-04-18 14:13:59] Validation | Batch 1170/1567 | Loss: 1.0548 
[2026-04-18 14:14:00] Validation | Batch 1180/1567 | Loss: 1.0544 [2026-04-18 14:14:01] Validation | Batch 1190/1567 | Loss: 1.0555 [2026-04-18 14:14:02] Validation | Batch 1200/1567 | Loss: 1.0549 [2026-04-18 14:14:02] Validation | Batch 1210/1567 | Loss: 1.0537 [2026-04-18 14:14:03] Validation | Batch 1220/1567 | Loss: 1.0541 [2026-04-18 14:14:04] Validation | Batch 1230/1567 | Loss: 1.0561 [2026-04-18 14:14:05] Validation | Batch 1240/1567 | Loss: 1.0549 [2026-04-18 14:14:05] Validation | Batch 1250/1567 | Loss: 1.0549 [2026-04-18 14:14:06] Validation | Batch 1260/1567 | Loss: 1.0559 [2026-04-18 14:14:07] Validation | Batch 1270/1567 | Loss: 1.0559 [2026-04-18 14:14:08] Validation | Batch 1280/1567 | Loss: 1.0553 [2026-04-18 14:14:09] Validation | Batch 1290/1567 | Loss: 1.0556 [2026-04-18 14:14:10] Validation | Batch 1300/1567 | Loss: 1.0559 [2026-04-18 14:14:11] Validation | Batch 1310/1567 | Loss: 1.0562 [2026-04-18 14:14:12] Validation | Batch 1320/1567 | Loss: 1.0553 [2026-04-18 14:14:12] Validation | Batch 1330/1567 | Loss: 1.0550 [2026-04-18 14:14:13] Validation | Batch 1340/1567 | Loss: 1.0547 [2026-04-18 14:14:14] Validation | Batch 1350/1567 | Loss: 1.0555 [2026-04-18 14:14:15] Validation | Batch 1360/1567 | Loss: 1.0552 [2026-04-18 14:14:15] Validation | Batch 1370/1567 | Loss: 1.0555 [2026-04-18 14:14:16] Validation | Batch 1380/1567 | Loss: 1.0568 [2026-04-18 14:14:17] Validation | Batch 1390/1567 | Loss: 1.0569 [2026-04-18 14:14:18] Validation | Batch 1400/1567 | Loss: 1.0573 [2026-04-18 14:14:18] Validation | Batch 1410/1567 | Loss: 1.0571 [2026-04-18 14:14:19] Validation | Batch 1420/1567 | Loss: 1.0577 [2026-04-18 14:14:20] Validation | Batch 1430/1567 | Loss: 1.0574 [2026-04-18 14:14:21] Validation | Batch 1440/1567 | Loss: 1.0577 [2026-04-18 14:14:21] Validation | Batch 1450/1567 | Loss: 1.0570 [2026-04-18 14:14:21] Validation | Batch 1460/1567 | Loss: 1.0568 [2026-04-18 14:14:22] Validation | Batch 1470/1567 | Loss: 1.0558 [2026-04-18 
14:14:23] Validation | Batch 1480/1567 | Loss: 1.0542 [2026-04-18 14:14:23] Validation | Batch 1490/1567 | Loss: 1.0543 [2026-04-18 14:14:24] Validation | Batch 1500/1567 | Loss: 1.0544 [2026-04-18 14:14:25] Validation | Batch 1510/1567 | Loss: 1.0542 [2026-04-18 14:14:26] Validation | Batch 1520/1567 | Loss: 1.0535 [2026-04-18 14:14:26] Validation | Batch 1530/1567 | Loss: 1.0544 [2026-04-18 14:14:27] Validation | Batch 1540/1567 | Loss: 1.0553 [2026-04-18 14:14:28] Validation | Batch 1550/1567 | Loss: 1.0557 [2026-04-18 14:14:29] Validation | Batch 1560/1567 | Loss: 1.0547 [2026-04-18 14:14:30] Validation | Batch 1567/1567 | Loss: 1.0552 [2026-04-18 14:14:30] Validation | Loss: 1.0552 | PPL: 2.89 | Time: 124.99s [2026-04-18 14:14:33] New best model saved! Val loss: 1.0552 [2026-04-18 14:14:37] Epoch 2 | Step 14010 | Loss: 0.8043 | LR: 5.19e-06 [2026-04-18 14:14:41] Epoch 2 | Step 14020 | Loss: 0.8041 | LR: 5.12e-06 [2026-04-18 14:14:44] Epoch 2 | Step 14030 | Loss: 0.8040 | LR: 5.05e-06 [2026-04-18 14:14:48] Epoch 2 | Step 14040 | Loss: 0.8040 | LR: 4.98e-06 [2026-04-18 14:14:52] Epoch 2 | Step 14050 | Loss: 0.8037 | LR: 4.91e-06 [2026-04-18 14:14:55] Epoch 2 | Step 14060 | Loss: 0.8038 | LR: 4.84e-06 [2026-04-18 14:14:58] Epoch 2 | Step 14070 | Loss: 0.8041 | LR: 4.77e-06 [2026-04-18 14:15:01] Epoch 2 | Step 14080 | Loss: 0.8039 | LR: 4.70e-06 [2026-04-18 14:15:05] Epoch 2 | Step 14090 | Loss: 0.8040 | LR: 4.63e-06 [2026-04-18 14:15:08] Epoch 2 | Step 14100 | Loss: 0.8038 | LR: 4.57e-06 [2026-04-18 14:15:12] Epoch 2 | Step 14110 | Loss: 0.8037 | LR: 4.50e-06 [2026-04-18 14:15:16] Epoch 2 | Step 14120 | Loss: 0.8036 | LR: 4.43e-06 [2026-04-18 14:15:19] Epoch 2 | Step 14130 | Loss: 0.8035 | LR: 4.37e-06 [2026-04-18 14:15:23] Epoch 2 | Step 14140 | Loss: 0.8032 | LR: 4.30e-06 [2026-04-18 14:15:27] Epoch 2 | Step 14150 | Loss: 0.8030 | LR: 4.24e-06 [2026-04-18 14:15:30] Epoch 2 | Step 14160 | Loss: 0.8031 | LR: 4.18e-06 [2026-04-18 14:15:33] Epoch 2 | Step 14170 | 
Loss: 0.8030 | LR: 4.12e-06 [2026-04-18 14:15:37] Epoch 2 | Step 14180 | Loss: 0.8031 | LR: 4.06e-06 [2026-04-18 14:15:40] Epoch 2 | Step 14190 | Loss: 0.8028 | LR: 4.00e-06 [2026-04-18 14:15:44] Epoch 2 | Step 14200 | Loss: 0.8025 | LR: 3.94e-06 [2026-04-18 14:15:48] Epoch 2 | Step 14210 | Loss: 0.8026 | LR: 3.88e-06 [2026-04-18 14:15:51] Epoch 2 | Step 14220 | Loss: 0.8030 | LR: 3.82e-06 [2026-04-18 14:15:55] Epoch 2 | Step 14230 | Loss: 0.8029 | LR: 3.76e-06 [2026-04-18 14:15:58] Epoch 2 | Step 14240 | Loss: 0.8030 | LR: 3.71e-06 [2026-04-18 14:16:01] Epoch 2 | Step 14250 | Loss: 0.8031 | LR: 3.65e-06 [2026-04-18 14:16:05] Epoch 2 | Step 14260 | Loss: 0.8029 | LR: 3.60e-06 [2026-04-18 14:16:09] Epoch 2 | Step 14270 | Loss: 0.8025 | LR: 3.54e-06 [2026-04-18 14:16:13] Epoch 2 | Step 14280 | Loss: 0.8021 | LR: 3.49e-06 [2026-04-18 14:16:16] Epoch 2 | Step 14290 | Loss: 0.8019 | LR: 3.44e-06 [2026-04-18 14:16:19] Epoch 2 | Step 14300 | Loss: 0.8017 | LR: 3.39e-06 [2026-04-18 14:16:23] Epoch 2 | Step 14310 | Loss: 0.8017 | LR: 3.34e-06 [2026-04-18 14:16:26] Epoch 2 | Step 14320 | Loss: 0.8016 | LR: 3.29e-06 [2026-04-18 14:16:30] Epoch 2 | Step 14330 | Loss: 0.8013 | LR: 3.24e-06 [2026-04-18 14:16:34] Epoch 2 | Step 14340 | Loss: 0.8014 | LR: 3.19e-06 [2026-04-18 14:16:37] Epoch 2 | Step 14350 | Loss: 0.8014 | LR: 3.14e-06 [2026-04-18 14:16:41] Epoch 2 | Step 14360 | Loss: 0.8015 | LR: 3.10e-06 [2026-04-18 14:16:45] Epoch 2 | Step 14370 | Loss: 0.8014 | LR: 3.05e-06 [2026-04-18 14:16:48] Epoch 2 | Step 14380 | Loss: 0.8014 | LR: 3.01e-06 [2026-04-18 14:16:52] Epoch 2 | Step 14390 | Loss: 0.8014 | LR: 2.96e-06 [2026-04-18 14:16:56] Epoch 2 | Step 14400 | Loss: 0.8014 | LR: 2.92e-06 [2026-04-18 14:16:59] Epoch 2 | Step 14410 | Loss: 0.8017 | LR: 2.88e-06 [2026-04-18 14:17:04] Epoch 2 | Step 14420 | Loss: 0.8016 | LR: 2.84e-06 [2026-04-18 14:17:08] Epoch 2 | Step 14430 | Loss: 0.8016 | LR: 2.80e-06 [2026-04-18 14:17:11] Epoch 2 | Step 14440 | Loss: 0.8016 | LR: 2.76e-06 
[2026-04-18 14:17:15] Epoch 2 | Step 14450 | Loss: 0.8015 | LR: 2.72e-06 [2026-04-18 14:17:18] Epoch 2 | Step 14460 | Loss: 0.8014 | LR: 2.69e-06 [2026-04-18 14:17:22] Epoch 2 | Step 14470 | Loss: 0.8018 | LR: 2.65e-06 [2026-04-18 14:17:26] Epoch 2 | Step 14480 | Loss: 0.8017 | LR: 2.61e-06 [2026-04-18 14:17:29] Epoch 2 | Step 14490 | Loss: 0.8018 | LR: 2.58e-06 [2026-04-18 14:17:33] Epoch 2 | Step 14500 | Loss: 0.8018 | LR: 2.55e-06 [2026-04-18 14:17:37] Epoch 2 | Step 14510 | Loss: 0.8018 | LR: 2.52e-06 [2026-04-18 14:17:40] Epoch 2 | Step 14520 | Loss: 0.8017 | LR: 2.48e-06 [2026-04-18 14:17:44] Epoch 2 | Step 14530 | Loss: 0.8015 | LR: 2.45e-06 [2026-04-18 14:17:47] Epoch 2 | Step 14540 | Loss: 0.8015 | LR: 2.42e-06 [2026-04-18 14:17:51] Epoch 2 | Step 14550 | Loss: 0.8012 | LR: 2.40e-06 [2026-04-18 14:17:55] Epoch 2 | Step 14560 | Loss: 0.8013 | LR: 2.37e-06 [2026-04-18 14:17:58] Epoch 2 | Step 14570 | Loss: 0.8012 | LR: 2.34e-06 [2026-04-18 14:18:02] Epoch 2 | Step 14580 | Loss: 0.8012 | LR: 2.32e-06 [2026-04-18 14:18:05] Epoch 2 | Step 14590 | Loss: 0.8012 | LR: 2.29e-06 [2026-04-18 14:18:09] Epoch 2 | Step 14600 | Loss: 0.8012 | LR: 2.27e-06 [2026-04-18 14:18:13] Epoch 2 | Step 14610 | Loss: 0.8013 | LR: 2.25e-06 [2026-04-18 14:18:16] Epoch 2 | Step 14620 | Loss: 0.8013 | LR: 2.22e-06 [2026-04-18 14:18:20] Epoch 2 | Step 14630 | Loss: 0.8013 | LR: 2.20e-06 [2026-04-18 14:18:24] Epoch 2 | Step 14640 | Loss: 0.8014 | LR: 2.18e-06 [2026-04-18 14:18:28] Epoch 2 | Step 14650 | Loss: 0.8016 | LR: 2.16e-06 [2026-04-18 14:18:31] Epoch 2 | Step 14660 | Loss: 0.8018 | LR: 2.15e-06 [2026-04-18 14:18:34] Epoch 2 | Step 14670 | Loss: 0.8015 | LR: 2.13e-06 [2026-04-18 14:18:38] Epoch 2 | Step 14680 | Loss: 0.8013 | LR: 2.11e-06 [2026-04-18 14:18:41] Epoch 2 | Step 14690 | Loss: 0.8013 | LR: 2.10e-06 [2026-04-18 14:18:44] Epoch 2 | Step 14700 | Loss: 0.8012 | LR: 2.09e-06 [2026-04-18 14:18:47] Epoch 2 | Step 14710 | Loss: 0.8015 | LR: 2.07e-06 [2026-04-18 14:18:51] Epoch 
2 | Step 14720 | Loss: 0.8014 | LR: 2.06e-06 [2026-04-18 14:18:54] Epoch 2 | Step 14730 | Loss: 0.8015 | LR: 2.05e-06 [2026-04-18 14:18:58] Epoch 2 | Step 14740 | Loss: 0.8014 | LR: 2.04e-06 [2026-04-18 14:19:01] Epoch 2 | Step 14750 | Loss: 0.8012 | LR: 2.03e-06 [2026-04-18 14:19:05] Epoch 2 | Step 14760 | Loss: 0.8013 | LR: 2.03e-06 [2026-04-18 14:19:08] Epoch 2 | Step 14770 | Loss: 0.8014 | LR: 2.02e-06 [2026-04-18 14:19:12] Epoch 2 | Step 14780 | Loss: 0.8014 | LR: 2.01e-06 [2026-04-18 14:19:16] Epoch 2 | Step 14790 | Loss: 0.8013 | LR: 2.01e-06 [2026-04-18 14:19:19] Epoch 2 | Step 14800 | Loss: 0.8015 | LR: 2.00e-06 [2026-04-18 14:19:23] Epoch 2 | Step 14810 | Loss: 0.8014 | LR: 2.00e-06 [2026-04-18 14:19:27] Epoch 2 | Step 14820 | Loss: 0.8014 | LR: 2.00e-06 [2026-04-18 14:19:30] Epoch 2 | Step 14830 | Loss: 0.8011 | LR: 2.00e-06 [2026-04-18 14:19:34] Epoch 2 | Step 14840 | Loss: 0.8012 | LR: 2.00e-06 [2026-04-18 14:19:38] Epoch 2 | Step 14850 | Loss: 0.8015 | LR: 2.00e-06 [2026-04-18 14:19:41] Epoch 2 | Step 14860 | Loss: 0.8015 | LR: 2.00e-06 [2026-04-18 14:19:44] Epoch 2 | Step 14870 | Loss: 0.8015 | LR: 2.00e-06 [2026-04-18 14:19:48] Epoch 2 | Step 14880 | Loss: 0.8015 | LR: 2.00e-06 [2026-04-18 14:19:51] Epoch 2 | Step 14890 | Loss: 0.8014 | LR: 2.00e-06 [2026-04-18 14:19:55] Epoch 2 | Step 14900 | Loss: 0.8013 | LR: 2.00e-06 [2026-04-18 14:19:58] Epoch 2 | Step 14910 | Loss: 0.8013 | LR: 2.00e-06 [2026-04-18 14:20:02] Epoch 2 | Step 14920 | Loss: 0.8014 | LR: 2.00e-06 [2026-04-18 14:20:06] Epoch 2 | Step 14930 | Loss: 0.8013 | LR: 2.00e-06 [2026-04-18 14:20:09] Epoch 2 | Step 14940 | Loss: 0.8012 | LR: 2.00e-06 [2026-04-18 14:20:13] Epoch 2 | Step 14950 | Loss: 0.8013 | LR: 2.00e-06 [2026-04-18 14:20:16] Epoch 2 | Step 14960 | Loss: 0.8011 | LR: 2.00e-06 [2026-04-18 14:20:20] Epoch 2 | Step 14970 | Loss: 0.8010 | LR: 2.00e-06 [2026-04-18 14:20:24] Epoch 2 | Step 14980 | Loss: 0.8011 | LR: 2.00e-06 [2026-04-18 14:20:27] Epoch 2 | Step 14990 | Loss: 
0.8008 | LR: 2.00e-06 [2026-04-18 14:20:31] Epoch 2 | Step 15000 | Loss: 0.8007 | LR: 2.00e-06 [2026-04-18 14:20:40] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_15000.pt [2026-04-18 14:20:56] Validation | Batch 10/1567 | Loss: 0.9340 [2026-04-18 14:20:57] Validation | Batch 20/1567 | Loss: 0.9999 [2026-04-18 14:20:58] Validation | Batch 30/1567 | Loss: 1.0394 [2026-04-18 14:20:59] Validation | Batch 40/1567 | Loss: 1.0610 [2026-04-18 14:21:00] Validation | Batch 50/1567 | Loss: 1.0364 [2026-04-18 14:21:01] Validation | Batch 60/1567 | Loss: 1.0236 [2026-04-18 14:21:02] Validation | Batch 70/1567 | Loss: 1.0086 [2026-04-18 14:21:03] Validation | Batch 80/1567 | Loss: 1.0262 [2026-04-18 14:21:03] Validation | Batch 90/1567 | Loss: 1.0346 [2026-04-18 14:21:04] Validation | Batch 100/1567 | Loss: 1.0443 [2026-04-18 14:21:05] Validation | Batch 110/1567 | Loss: 1.0364 [2026-04-18 14:21:06] Validation | Batch 120/1567 | Loss: 1.0474 [2026-04-18 14:21:07] Validation | Batch 130/1567 | Loss: 1.0486 [2026-04-18 14:21:08] Validation | Batch 140/1567 | Loss: 1.0508 [2026-04-18 14:21:08] Validation | Batch 150/1567 | Loss: 1.0587 [2026-04-18 14:21:09] Validation | Batch 160/1567 | Loss: 1.0602 [2026-04-18 14:21:10] Validation | Batch 170/1567 | Loss: 1.0453 [2026-04-18 14:21:11] Validation | Batch 180/1567 | Loss: 1.0474 [2026-04-18 14:21:12] Validation | Batch 190/1567 | Loss: 1.0438 [2026-04-18 14:21:13] Validation | Batch 200/1567 | Loss: 1.0469 [2026-04-18 14:21:13] Validation | Batch 210/1567 | Loss: 1.0480 [2026-04-18 14:21:14] Validation | Batch 220/1567 | Loss: 1.0503 [2026-04-18 14:21:15] Validation | Batch 230/1567 | Loss: 1.0542 [2026-04-18 14:21:16] Validation | Batch 240/1567 | Loss: 1.0525 [2026-04-18 14:21:17] Validation | Batch 250/1567 | Loss: 1.0464 [2026-04-18 14:21:17] Validation | Batch 260/1567 | Loss: 1.0418 [2026-04-18 14:21:18] Validation | Batch 270/1567 | Loss: 1.0387 [2026-04-18 14:21:19] Validation | Batch 280/1567 | 
Loss: 1.0399 [2026-04-18 14:21:20] Validation | Batch 290/1567 | Loss: 1.0451 [2026-04-18 14:21:21] Validation | Batch 300/1567 | Loss: 1.0501 [2026-04-18 14:21:22] Validation | Batch 310/1567 | Loss: 1.0490 [2026-04-18 14:21:22] Validation | Batch 320/1567 | Loss: 1.0496 [2026-04-18 14:21:23] Validation | Batch 330/1567 | Loss: 1.0467 [2026-04-18 14:21:24] Validation | Batch 340/1567 | Loss: 1.0506 [2026-04-18 14:21:25] Validation | Batch 350/1567 | Loss: 1.0498 [2026-04-18 14:21:26] Validation | Batch 360/1567 | Loss: 1.0476 [2026-04-18 14:21:27] Validation | Batch 370/1567 | Loss: 1.0448 [2026-04-18 14:21:27] Validation | Batch 380/1567 | Loss: 1.0481 [2026-04-18 14:21:28] Validation | Batch 390/1567 | Loss: 1.0492 [2026-04-18 14:21:29] Validation | Batch 400/1567 | Loss: 1.0504 [2026-04-18 14:21:30] Validation | Batch 410/1567 | Loss: 1.0497 [2026-04-18 14:21:31] Validation | Batch 420/1567 | Loss: 1.0492 [2026-04-18 14:21:31] Validation | Batch 430/1567 | Loss: 1.0491 [2026-04-18 14:21:32] Validation | Batch 440/1567 | Loss: 1.0479 [2026-04-18 14:21:33] Validation | Batch 450/1567 | Loss: 1.0480 [2026-04-18 14:21:34] Validation | Batch 460/1567 | Loss: 1.0469 [2026-04-18 14:21:35] Validation | Batch 470/1567 | Loss: 1.0462 [2026-04-18 14:21:36] Validation | Batch 480/1567 | Loss: 1.0440 [2026-04-18 14:21:36] Validation | Batch 490/1567 | Loss: 1.0439 [2026-04-18 14:21:37] Validation | Batch 500/1567 | Loss: 1.0435 [2026-04-18 14:21:38] Validation | Batch 510/1567 | Loss: 1.0457 [2026-04-18 14:21:39] Validation | Batch 520/1567 | Loss: 1.0475 [2026-04-18 14:21:40] Validation | Batch 530/1567 | Loss: 1.0471 [2026-04-18 14:21:40] Validation | Batch 540/1567 | Loss: 1.0498 [2026-04-18 14:21:41] Validation | Batch 550/1567 | Loss: 1.0533 [2026-04-18 14:21:42] Validation | Batch 560/1567 | Loss: 1.0530 [2026-04-18 14:21:43] Validation | Batch 570/1567 | Loss: 1.0530 [2026-04-18 14:21:44] Validation | Batch 580/1567 | Loss: 1.0521 [2026-04-18 14:21:45] Validation | 
Batch 590/1567 | Loss: 1.0507 [2026-04-18 14:21:46] Validation | Batch 600/1567 | Loss: 1.0491 [2026-04-18 14:21:47] Validation | Batch 610/1567 | Loss: 1.0480 [2026-04-18 14:21:48] Validation | Batch 620/1567 | Loss: 1.0495 [2026-04-18 14:21:48] Validation | Batch 630/1567 | Loss: 1.0475 [2026-04-18 14:21:49] Validation | Batch 640/1567 | Loss: 1.0490 [2026-04-18 14:21:50] Validation | Batch 650/1567 | Loss: 1.0481 [2026-04-18 14:21:51] Validation | Batch 660/1567 | Loss: 1.0469 [2026-04-18 14:21:51] Validation | Batch 670/1567 | Loss: 1.0450 [2026-04-18 14:21:52] Validation | Batch 680/1567 | Loss: 1.0444 [2026-04-18 14:21:53] Validation | Batch 690/1567 | Loss: 1.0453 [2026-04-18 14:21:54] Validation | Batch 700/1567 | Loss: 1.0438 [2026-04-18 14:21:55] Validation | Batch 710/1567 | Loss: 1.0451 [2026-04-18 14:21:56] Validation | Batch 720/1567 | Loss: 1.0443 [2026-04-18 14:21:56] Validation | Batch 730/1567 | Loss: 1.0449 [2026-04-18 14:21:57] Validation | Batch 740/1567 | Loss: 1.0461 [2026-04-18 14:21:58] Validation | Batch 750/1567 | Loss: 1.0466 [2026-04-18 14:21:58] Validation | Batch 760/1567 | Loss: 1.0463 [2026-04-18 14:21:59] Validation | Batch 770/1567 | Loss: 1.0484 [2026-04-18 14:22:00] Validation | Batch 780/1567 | Loss: 1.0497 [2026-04-18 14:22:01] Validation | Batch 790/1567 | Loss: 1.0491 [2026-04-18 14:22:02] Validation | Batch 800/1567 | Loss: 1.0510 [2026-04-18 14:22:02] Validation | Batch 810/1567 | Loss: 1.0509 [2026-04-18 14:22:03] Validation | Batch 820/1567 | Loss: 1.0506 [2026-04-18 14:22:04] Validation | Batch 830/1567 | Loss: 1.0490 [2026-04-18 14:22:05] Validation | Batch 840/1567 | Loss: 1.0491 [2026-04-18 14:22:05] Validation | Batch 850/1567 | Loss: 1.0478 [2026-04-18 14:22:06] Validation | Batch 860/1567 | Loss: 1.0493 [2026-04-18 14:22:07] Validation | Batch 870/1567 | Loss: 1.0498 [2026-04-18 14:22:08] Validation | Batch 880/1567 | Loss: 1.0506 [2026-04-18 14:22:08] Validation | Batch 890/1567 | Loss: 1.0513 [2026-04-18 
14:22:09] Validation | Batch 900/1567 | Loss: 1.0532 [2026-04-18 14:22:10] Validation | Batch 910/1567 | Loss: 1.0533 [2026-04-18 14:22:11] Validation | Batch 920/1567 | Loss: 1.0554 [2026-04-18 14:22:11] Validation | Batch 930/1567 | Loss: 1.0531 [2026-04-18 14:22:12] Validation | Batch 940/1567 | Loss: 1.0528 [2026-04-18 14:22:13] Validation | Batch 950/1567 | Loss: 1.0518 [2026-04-18 14:22:13] Validation | Batch 960/1567 | Loss: 1.0504 [2026-04-18 14:22:14] Validation | Batch 970/1567 | Loss: 1.0520 [2026-04-18 14:22:15] Validation | Batch 980/1567 | Loss: 1.0524 [2026-04-18 14:22:16] Validation | Batch 990/1567 | Loss: 1.0518 [2026-04-18 14:22:16] Validation | Batch 1000/1567 | Loss: 1.0522 [2026-04-18 14:22:17] Validation | Batch 1010/1567 | Loss: 1.0499 [2026-04-18 14:22:18] Validation | Batch 1020/1567 | Loss: 1.0502 [2026-04-18 14:22:19] Validation | Batch 1030/1567 | Loss: 1.0518 [2026-04-18 14:22:20] Validation | Batch 1040/1567 | Loss: 1.0513 [2026-04-18 14:22:20] Validation | Batch 1050/1567 | Loss: 1.0523 [2026-04-18 14:22:21] Validation | Batch 1060/1567 | Loss: 1.0514 [2026-04-18 14:22:22] Validation | Batch 1070/1567 | Loss: 1.0506 [2026-04-18 14:22:23] Validation | Batch 1080/1567 | Loss: 1.0516 [2026-04-18 14:22:23] Validation | Batch 1090/1567 | Loss: 1.0514 [2026-04-18 14:22:24] Validation | Batch 1100/1567 | Loss: 1.0519 [2026-04-18 14:22:25] Validation | Batch 1110/1567 | Loss: 1.0518 [2026-04-18 14:22:25] Validation | Batch 1120/1567 | Loss: 1.0520 [2026-04-18 14:22:26] Validation | Batch 1130/1567 | Loss: 1.0521 [2026-04-18 14:22:27] Validation | Batch 1140/1567 | Loss: 1.0529 [2026-04-18 14:22:28] Validation | Batch 1150/1567 | Loss: 1.0533 [2026-04-18 14:22:29] Validation | Batch 1160/1567 | Loss: 1.0541 [2026-04-18 14:22:30] Validation | Batch 1170/1567 | Loss: 1.0539 [2026-04-18 14:22:31] Validation | Batch 1180/1567 | Loss: 1.0535 [2026-04-18 14:22:31] Validation | Batch 1190/1567 | Loss: 1.0546 [2026-04-18 14:22:32] Validation | Batch 
1200/1567 | Loss: 1.0539 [2026-04-18 14:22:33] Validation | Batch 1210/1567 | Loss: 1.0528 [2026-04-18 14:22:34] Validation | Batch 1220/1567 | Loss: 1.0532 [2026-04-18 14:22:35] Validation | Batch 1230/1567 | Loss: 1.0553 [2026-04-18 14:22:35] Validation | Batch 1240/1567 | Loss: 1.0541 [2026-04-18 14:22:36] Validation | Batch 1250/1567 | Loss: 1.0540 [2026-04-18 14:22:37] Validation | Batch 1260/1567 | Loss: 1.0550 [2026-04-18 14:22:38] Validation | Batch 1270/1567 | Loss: 1.0550 [2026-04-18 14:22:39] Validation | Batch 1280/1567 | Loss: 1.0544 [2026-04-18 14:22:40] Validation | Batch 1290/1567 | Loss: 1.0547 [2026-04-18 14:22:41] Validation | Batch 1300/1567 | Loss: 1.0550 [2026-04-18 14:22:41] Validation | Batch 1310/1567 | Loss: 1.0554 [2026-04-18 14:22:42] Validation | Batch 1320/1567 | Loss: 1.0545 [2026-04-18 14:22:43] Validation | Batch 1330/1567 | Loss: 1.0541 [2026-04-18 14:22:44] Validation | Batch 1340/1567 | Loss: 1.0539 [2026-04-18 14:22:45] Validation | Batch 1350/1567 | Loss: 1.0547 [2026-04-18 14:22:45] Validation | Batch 1360/1567 | Loss: 1.0543 [2026-04-18 14:22:46] Validation | Batch 1370/1567 | Loss: 1.0547 [2026-04-18 14:22:47] Validation | Batch 1380/1567 | Loss: 1.0560 [2026-04-18 14:22:48] Validation | Batch 1390/1567 | Loss: 1.0561 [2026-04-18 14:22:48] Validation | Batch 1400/1567 | Loss: 1.0565 [2026-04-18 14:22:49] Validation | Batch 1410/1567 | Loss: 1.0562 [2026-04-18 14:22:50] Validation | Batch 1420/1567 | Loss: 1.0568 [2026-04-18 14:22:50] Validation | Batch 1430/1567 | Loss: 1.0565 [2026-04-18 14:22:51] Validation | Batch 1440/1567 | Loss: 1.0568 [2026-04-18 14:22:52] Validation | Batch 1450/1567 | Loss: 1.0561 [2026-04-18 14:22:53] Validation | Batch 1460/1567 | Loss: 1.0559 [2026-04-18 14:22:53] Validation | Batch 1470/1567 | Loss: 1.0550 [2026-04-18 14:22:54] Validation | Batch 1480/1567 | Loss: 1.0534 [2026-04-18 14:22:55] Validation | Batch 1490/1567 | Loss: 1.0534 [2026-04-18 14:22:56] Validation | Batch 1500/1567 | Loss: 
1.0535 [2026-04-18 14:22:56] Validation | Batch 1510/1567 | Loss: 1.0534 [2026-04-18 14:22:57] Validation | Batch 1520/1567 | Loss: 1.0527 [2026-04-18 14:22:58] Validation | Batch 1530/1567 | Loss: 1.0535 [2026-04-18 14:22:59] Validation | Batch 1540/1567 | Loss: 1.0545 [2026-04-18 14:23:00] Validation | Batch 1550/1567 | Loss: 1.0548 [2026-04-18 14:23:01] Validation | Batch 1560/1567 | Loss: 1.0539 [2026-04-18 14:23:01] Validation | Batch 1567/1567 | Loss: 1.0543 [2026-04-18 14:23:01] Validation | Loss: 1.0543 | PPL: 2.89 | Time: 125.51s [2026-04-18 14:23:06] New best model saved! Val loss: 1.0543 [2026-04-18 14:23:10] Epoch 2 | Step 15010 | Loss: 0.8004 | LR: 2.00e-06 [2026-04-18 14:23:13] Epoch 2 | Step 15020 | Loss: 0.8005 | LR: 2.00e-06 [2026-04-18 14:23:17] Epoch 2 | Step 15030 | Loss: 0.8004 | LR: 2.00e-06 [2026-04-18 14:23:21] Epoch 2 | Step 15040 | Loss: 0.8005 | LR: 2.00e-06 [2026-04-18 14:23:24] Epoch 2 | Step 15050 | Loss: 0.8005 | LR: 2.00e-06 [2026-04-18 14:23:28] Epoch 2 | Step 15060 | Loss: 0.8006 | LR: 2.00e-06 [2026-04-18 14:23:32] Epoch 2 | Step 15070 | Loss: 0.8006 | LR: 2.00e-06 [2026-04-18 14:23:36] Epoch 2 | Step 15080 | Loss: 0.8006 | LR: 2.00e-06 [2026-04-18 14:23:40] Epoch 2 | Step 15090 | Loss: 0.8005 | LR: 2.00e-06 [2026-04-18 14:23:44] Epoch 2 | Step 15100 | Loss: 0.8003 | LR: 2.00e-06 [2026-04-18 14:23:47] Epoch 2 | Step 15110 | Loss: 0.8002 | LR: 2.00e-06 [2026-04-18 14:23:51] Epoch 2 | Step 15120 | Loss: 0.8002 | LR: 2.00e-06 [2026-04-18 14:23:55] Epoch 2 | Step 15130 | Loss: 0.8005 | LR: 2.00e-06 [2026-04-18 14:23:58] Epoch 2 | Step 15140 | Loss: 0.8004 | LR: 2.00e-06 [2026-04-18 14:24:02] Epoch 2 | Step 15150 | Loss: 0.8005 | LR: 2.00e-06 [2026-04-18 14:24:05] Epoch 2 | Step 15160 | Loss: 0.8005 | LR: 2.00e-06 [2026-04-18 14:24:08] Epoch 2 | Step 15170 | Loss: 0.8003 | LR: 2.00e-06 [2026-04-18 14:24:12] Epoch 2 | Step 15180 | Loss: 0.8001 | LR: 2.00e-06 [2026-04-18 14:24:16] Epoch 2 | Step 15190 | Loss: 0.8000 | LR: 2.00e-06 
[2026-04-18 14:24:19] Epoch 2 | Step 15200 | Loss: 0.8000 | LR: 2.00e-06 [2026-04-18 14:24:22] Epoch 2 | Step 15210 | Loss: 0.8002 | LR: 2.00e-06 [2026-04-18 14:24:26] Epoch 2 | Step 15220 | Loss: 0.8000 | LR: 2.00e-06 [2026-04-18 14:24:29] Epoch 2 | Step 15230 | Loss: 0.7999 | LR: 2.00e-06 [2026-04-18 14:24:33] Epoch 2 | Step 15240 | Loss: 0.7998 | LR: 2.00e-06 [2026-04-18 14:24:36] Epoch 2 | Step 15250 | Loss: 0.7996 | LR: 2.00e-06 [2026-04-18 14:24:40] Epoch 2 | Step 15260 | Loss: 0.7995 | LR: 2.00e-06 [2026-04-18 14:24:44] Epoch 2 | Step 15270 | Loss: 0.7993 | LR: 2.00e-06 [2026-04-18 14:24:47] Epoch 2 | Step 15280 | Loss: 0.7990 | LR: 2.00e-06 [2026-04-18 14:24:51] Epoch 2 | Step 15290 | Loss: 0.7993 | LR: 2.00e-06 [2026-04-18 14:24:54] Epoch 2 | Step 15300 | Loss: 0.7995 | LR: 2.00e-06 [2026-04-18 14:24:58] Epoch 2 | Step 15310 | Loss: 0.7994 | LR: 2.00e-06 [2026-04-18 14:25:01] Epoch 2 | Step 15320 | Loss: 0.7992 | LR: 2.00e-06 [2026-04-18 14:25:04] Epoch 2 | Step 15330 | Loss: 0.7992 | LR: 2.00e-06 [2026-04-18 14:25:08] Epoch 2 | Step 15340 | Loss: 0.7989 | LR: 2.00e-06 [2026-04-18 14:25:11] Epoch 2 | Step 15350 | Loss: 0.7990 | LR: 2.00e-06 [2026-04-18 14:25:15] Epoch 2 | Step 15360 | Loss: 0.7988 | LR: 2.00e-06 [2026-04-18 14:25:18] Epoch 2 | Step 15370 | Loss: 0.7987 | LR: 2.00e-06 [2026-04-18 14:25:22] Epoch 2 | Step 15380 | Loss: 0.7987 | LR: 2.00e-06 [2026-04-18 14:25:25] Epoch 2 | Step 15390 | Loss: 0.7989 | LR: 2.00e-06 [2026-04-18 14:25:29] Epoch 2 | Step 15400 | Loss: 0.7991 | LR: 2.00e-06 [2026-04-18 14:25:33] Epoch 2 | Step 15410 | Loss: 0.7990 | LR: 2.00e-06 [2026-04-18 14:25:37] Epoch 2 | Step 15420 | Loss: 0.7989 | LR: 2.00e-06 [2026-04-18 14:25:40] Epoch 2 | Step 15430 | Loss: 0.7988 | LR: 2.00e-06 [2026-04-18 14:25:43] Epoch 2 | Step 15440 | Loss: 0.7989 | LR: 2.00e-06 [2026-04-18 14:25:47] Epoch 2 | Step 15450 | Loss: 0.7990 | LR: 2.00e-06 [2026-04-18 14:25:51] Epoch 2 | Step 15460 | Loss: 0.7990 | LR: 2.00e-06 [2026-04-18 14:25:54] Epoch 
2 | Step 15470 | Loss: 0.7988 | LR: 2.00e-06 [2026-04-18 14:25:58] Epoch 2 | Step 15480 | Loss: 0.7988 | LR: 2.00e-06 [2026-04-18 14:26:02] Epoch 2 | Step 15490 | Loss: 0.7987 | LR: 2.00e-06 [2026-04-18 14:26:05] Epoch 2 | Step 15500 | Loss: 0.7989 | LR: 2.00e-06 [2026-04-18 14:26:09] Epoch 2 | Step 15510 | Loss: 0.7986 | LR: 2.00e-06 [2026-04-18 14:26:12] Epoch 2 | Step 15520 | Loss: 0.7986 | LR: 2.00e-06 [2026-04-18 14:26:15] Epoch 2 | Step 15530 | Loss: 0.7983 | LR: 2.00e-06 [2026-04-18 14:26:19] Epoch 2 | Step 15540 | Loss: 0.7982 | LR: 2.00e-06 [2026-04-18 14:26:22] Epoch 2 | Step 15550 | Loss: 0.7982 | LR: 2.00e-06 [2026-04-18 14:26:26] Epoch 2 | Step 15560 | Loss: 0.7981 | LR: 2.00e-06 [2026-04-18 14:26:29] Epoch 2 | Step 15570 | Loss: 0.7978 | LR: 2.00e-06 [2026-04-18 14:26:33] Epoch 2 | Step 15580 | Loss: 0.7979 | LR: 2.00e-06 [2026-04-18 14:26:36] Epoch 2 | Step 15590 | Loss: 0.7981 | LR: 2.00e-06 [2026-04-18 14:26:40] Epoch 2 | Step 15600 | Loss: 0.7980 | LR: 2.00e-06 [2026-04-18 14:26:43] Epoch 2 | Step 15610 | Loss: 0.7981 | LR: 2.00e-06 [2026-04-18 14:26:47] Epoch 2 | Step 15620 | Loss: 0.7981 | LR: 2.00e-06 [2026-04-18 14:26:50] Epoch 2 | Step 15630 | Loss: 0.7983 | LR: 2.00e-06 [2026-04-18 14:26:54] Epoch 2 | Step 15640 | Loss: 0.7981 | LR: 2.00e-06 [2026-04-18 14:26:57] Epoch 2 | Step 15650 | Loss: 0.7980 | LR: 2.00e-06 [2026-04-18 14:27:00] Epoch 2 | Step 15660 | Loss: 0.7979 | LR: 2.00e-06 [2026-04-18 14:27:04] Epoch 2 | Step 15670 | Loss: 0.7980 | LR: 2.00e-06 [2026-04-18 14:27:07] Epoch 2 | Step 15680 | Loss: 0.7981 | LR: 2.00e-06 [2026-04-18 14:27:11] Epoch 2 | Step 15690 | Loss: 0.7979 | LR: 2.00e-06 [2026-04-18 14:27:15] Epoch 2 | Step 15700 | Loss: 0.7978 | LR: 2.00e-06 [2026-04-18 14:27:18] Epoch 2 | Step 15710 | Loss: 0.7979 | LR: 2.00e-06 [2026-04-18 14:27:22] Epoch 2 | Step 15720 | Loss: 0.7979 | LR: 2.00e-06 [2026-04-18 14:27:25] Epoch 2 | Step 15730 | Loss: 0.7980 | LR: 2.00e-06 [2026-04-18 14:27:29] Epoch 2 | Step 15740 | Loss: 
0.7980 | LR: 2.00e-06 [2026-04-18 14:27:32] Epoch 2 | Step 15750 | Loss: 0.7980 | LR: 2.00e-06 [2026-04-18 14:27:36] Epoch 2 | Step 15760 | Loss: 0.7980 | LR: 2.00e-06 [2026-04-18 14:27:39] Epoch 2 | Step 15770 | Loss: 0.7978 | LR: 2.00e-06 [2026-04-18 14:27:43] Epoch 2 | Step 15780 | Loss: 0.7978 | LR: 2.00e-06 [2026-04-18 14:27:46] Epoch 2 | Step 15790 | Loss: 0.7977 | LR: 2.00e-06 [2026-04-18 14:27:50] Epoch 2 | Step 15800 | Loss: 0.7976 | LR: 2.00e-06 [2026-04-18 14:27:53] Epoch 2 | Step 15810 | Loss: 0.7978 | LR: 2.00e-06 [2026-04-18 14:27:57] Epoch 2 | Step 15820 | Loss: 0.7977 | LR: 2.00e-06 [2026-04-18 14:28:00] Epoch 2 | Step 15830 | Loss: 0.7977 | LR: 2.00e-06 [2026-04-18 14:28:04] Epoch 2 | Step 15840 | Loss: 0.7976 | LR: 2.00e-06 [2026-04-18 14:28:07] Epoch 2 | Step 15850 | Loss: 0.7974 | LR: 2.00e-06 [2026-04-18 14:28:11] Epoch 2 | Step 15860 | Loss: 0.7975 | LR: 2.00e-06 [2026-04-18 14:28:14] Epoch 2 | Step 15870 | Loss: 0.7975 | LR: 2.00e-06 [2026-04-18 14:28:18] Epoch 2 | Step 15880 | Loss: 0.7978 | LR: 2.00e-06 [2026-04-18 14:28:21] Epoch 2 | Step 15890 | Loss: 0.7978 | LR: 2.00e-06 [2026-04-18 14:28:25] Epoch 2 | Step 15900 | Loss: 0.7979 | LR: 2.00e-06 [2026-04-18 14:28:29] Epoch 2 | Step 15910 | Loss: 0.7977 | LR: 2.00e-06 [2026-04-18 14:28:32] Epoch 2 | Step 15920 | Loss: 0.7978 | LR: 2.00e-06 [2026-04-18 14:28:36] Epoch 2 | Step 15930 | Loss: 0.7977 | LR: 2.00e-06 [2026-04-18 14:28:40] Epoch 2 | Step 15940 | Loss: 0.7977 | LR: 2.00e-06 [2026-04-18 14:28:43] Epoch 2 | Step 15950 | Loss: 0.7977 | LR: 2.00e-06 [2026-04-18 14:28:47] Epoch 2 | Step 15960 | Loss: 0.7978 | LR: 2.00e-06 [2026-04-18 14:28:50] Epoch 2 | Step 15970 | Loss: 0.7979 | LR: 2.00e-06 [2026-04-18 14:28:54] Epoch 2 | Step 15980 | Loss: 0.7976 | LR: 2.00e-06 [2026-04-18 14:28:58] Epoch 2 | Step 15990 | Loss: 0.7976 | LR: 2.00e-06 [2026-04-18 14:29:01] Epoch 2 | Step 16000 | Loss: 0.7977 | LR: 2.00e-06 [2026-04-18 14:29:02] Validation | Batch 10/1567 | Loss: 0.9330 [2026-04-18 
14:29:03] Validation | Batch 20/1567 | Loss: 1.0008 [2026-04-18 14:29:04] Validation | Batch 30/1567 | Loss: 1.0399 [2026-04-18 14:29:05] Validation | Batch 40/1567 | Loss: 1.0615 [2026-04-18 14:29:05] Validation | Batch 50/1567 | Loss: 1.0367 [2026-04-18 14:29:07] Validation | Batch 60/1567 | Loss: 1.0237 [2026-04-18 14:29:07] Validation | Batch 70/1567 | Loss: 1.0087 [2026-04-18 14:29:08] Validation | Batch 80/1567 | Loss: 1.0263 [2026-04-18 14:29:09] Validation | Batch 90/1567 | Loss: 1.0343 [2026-04-18 14:29:10] Validation | Batch 100/1567 | Loss: 1.0435 [2026-04-18 14:29:11] Validation | Batch 110/1567 | Loss: 1.0355 [2026-04-18 14:29:12] Validation | Batch 120/1567 | Loss: 1.0463 [2026-04-18 14:29:13] Validation | Batch 130/1567 | Loss: 1.0476 [2026-04-18 14:29:13] Validation | Batch 140/1567 | Loss: 1.0499 [2026-04-18 14:29:14] Validation | Batch 150/1567 | Loss: 1.0577 [2026-04-18 14:29:15] Validation | Batch 160/1567 | Loss: 1.0591 [2026-04-18 14:29:16] Validation | Batch 170/1567 | Loss: 1.0442 [2026-04-18 14:29:16] Validation | Batch 180/1567 | Loss: 1.0460 [2026-04-18 14:29:17] Validation | Batch 190/1567 | Loss: 1.0424 [2026-04-18 14:29:18] Validation | Batch 200/1567 | Loss: 1.0455 [2026-04-18 14:29:19] Validation | Batch 210/1567 | Loss: 1.0465 [2026-04-18 14:29:20] Validation | Batch 220/1567 | Loss: 1.0487 [2026-04-18 14:29:21] Validation | Batch 230/1567 | Loss: 1.0525 [2026-04-18 14:29:22] Validation | Batch 240/1567 | Loss: 1.0506 [2026-04-18 14:29:22] Validation | Batch 250/1567 | Loss: 1.0446 [2026-04-18 14:29:23] Validation | Batch 260/1567 | Loss: 1.0399 [2026-04-18 14:29:24] Validation | Batch 270/1567 | Loss: 1.0369 [2026-04-18 14:29:24] Validation | Batch 280/1567 | Loss: 1.0381 [2026-04-18 14:29:25] Validation | Batch 290/1567 | Loss: 1.0433 [2026-04-18 14:29:26] Validation | Batch 300/1567 | Loss: 1.0483 [2026-04-18 14:29:27] Validation | Batch 310/1567 | Loss: 1.0471 [2026-04-18 14:29:27] Validation | Batch 320/1567 | Loss: 1.0476 
[2026-04-18 14:29:29] Validation | Batch 330/1567 | Loss: 1.0447 [2026-04-18 14:29:30] Validation | Batch 340/1567 | Loss: 1.0488 [2026-04-18 14:29:30] Validation | Batch 350/1567 | Loss: 1.0478 [2026-04-18 14:29:31] Validation | Batch 360/1567 | Loss: 1.0457 [2026-04-18 14:29:32] Validation | Batch 370/1567 | Loss: 1.0430 [2026-04-18 14:29:33] Validation | Batch 380/1567 | Loss: 1.0462 [2026-04-18 14:29:33] Validation | Batch 390/1567 | Loss: 1.0473 [2026-04-18 14:29:34] Validation | Batch 400/1567 | Loss: 1.0485 [2026-04-18 14:29:35] Validation | Batch 410/1567 | Loss: 1.0479 [2026-04-18 14:29:36] Validation | Batch 420/1567 | Loss: 1.0474 [2026-04-18 14:29:37] Validation | Batch 430/1567 | Loss: 1.0474 [2026-04-18 14:29:38] Validation | Batch 440/1567 | Loss: 1.0463 [2026-04-18 14:29:38] Validation | Batch 450/1567 | Loss: 1.0464 [2026-04-18 14:29:39] Validation | Batch 460/1567 | Loss: 1.0454 [2026-04-18 14:29:40] Validation | Batch 470/1567 | Loss: 1.0446 [2026-04-18 14:29:41] Validation | Batch 480/1567 | Loss: 1.0425 [2026-04-18 14:29:41] Validation | Batch 490/1567 | Loss: 1.0424 [2026-04-18 14:29:42] Validation | Batch 500/1567 | Loss: 1.0420 [2026-04-18 14:29:43] Validation | Batch 510/1567 | Loss: 1.0443 [2026-04-18 14:29:44] Validation | Batch 520/1567 | Loss: 1.0460 [2026-04-18 14:29:45] Validation | Batch 530/1567 | Loss: 1.0456 [2026-04-18 14:29:46] Validation | Batch 540/1567 | Loss: 1.0483 [2026-04-18 14:29:46] Validation | Batch 550/1567 | Loss: 1.0518 [2026-04-18 14:29:47] Validation | Batch 560/1567 | Loss: 1.0516 [2026-04-18 14:29:48] Validation | Batch 570/1567 | Loss: 1.0516 [2026-04-18 14:29:49] Validation | Batch 580/1567 | Loss: 1.0507 [2026-04-18 14:29:50] Validation | Batch 590/1567 | Loss: 1.0493 [2026-04-18 14:29:51] Validation | Batch 600/1567 | Loss: 1.0477 [2026-04-18 14:29:52] Validation | Batch 610/1567 | Loss: 1.0466 [2026-04-18 14:29:53] Validation | Batch 620/1567 | Loss: 1.0481 [2026-04-18 14:29:54] Validation | Batch 630/1567 
| Loss: 1.0461 [2026-04-18 14:29:54] Validation | Batch 640/1567 | Loss: 1.0477 [2026-04-18 14:29:55] Validation | Batch 650/1567 | Loss: 1.0468 [2026-04-18 14:29:56] Validation | Batch 660/1567 | Loss: 1.0456 [2026-04-18 14:29:57] Validation | Batch 670/1567 | Loss: 1.0436 [2026-04-18 14:29:57] Validation | Batch 680/1567 | Loss: 1.0431 [2026-04-18 14:29:58] Validation | Batch 690/1567 | Loss: 1.0440 [2026-04-18 14:29:59] Validation | Batch 700/1567 | Loss: 1.0426 [2026-04-18 14:30:00] Validation | Batch 710/1567 | Loss: 1.0438 [2026-04-18 14:30:01] Validation | Batch 720/1567 | Loss: 1.0431 [2026-04-18 14:30:01] Validation | Batch 730/1567 | Loss: 1.0437 [2026-04-18 14:30:02] Validation | Batch 740/1567 | Loss: 1.0448 [2026-04-18 14:30:03] Validation | Batch 750/1567 | Loss: 1.0454 [2026-04-18 14:30:04] Validation | Batch 760/1567 | Loss: 1.0452 [2026-04-18 14:30:05] Validation | Batch 770/1567 | Loss: 1.0472 [2026-04-18 14:30:05] Validation | Batch 780/1567 | Loss: 1.0485 [2026-04-18 14:30:06] Validation | Batch 790/1567 | Loss: 1.0480 [2026-04-18 14:30:07] Validation | Batch 800/1567 | Loss: 1.0498 [2026-04-18 14:30:08] Validation | Batch 810/1567 | Loss: 1.0498 [2026-04-18 14:30:08] Validation | Batch 820/1567 | Loss: 1.0495 [2026-04-18 14:30:09] Validation | Batch 830/1567 | Loss: 1.0479 [2026-04-18 14:30:10] Validation | Batch 840/1567 | Loss: 1.0480 [2026-04-18 14:30:11] Validation | Batch 850/1567 | Loss: 1.0467 [2026-04-18 14:30:11] Validation | Batch 860/1567 | Loss: 1.0483 [2026-04-18 14:30:12] Validation | Batch 870/1567 | Loss: 1.0487 [2026-04-18 14:30:13] Validation | Batch 880/1567 | Loss: 1.0496 [2026-04-18 14:30:13] Validation | Batch 890/1567 | Loss: 1.0502 [2026-04-18 14:30:14] Validation | Batch 900/1567 | Loss: 1.0522 [2026-04-18 14:30:15] Validation | Batch 910/1567 | Loss: 1.0523 [2026-04-18 14:30:16] Validation | Batch 920/1567 | Loss: 1.0543 [2026-04-18 14:30:16] Validation | Batch 930/1567 | Loss: 1.0520 [2026-04-18 14:30:17] Validation | 
Batch 940/1567 | Loss: 1.0517 [2026-04-18 14:30:18] Validation | Batch 950/1567 | Loss: 1.0507 [2026-04-18 14:30:19] Validation | Batch 960/1567 | Loss: 1.0493 [2026-04-18 14:30:19] Validation | Batch 970/1567 | Loss: 1.0510 [2026-04-18 14:30:20] Validation | Batch 980/1567 | Loss: 1.0513 [2026-04-18 14:30:21] Validation | Batch 990/1567 | Loss: 1.0507 [2026-04-18 14:30:22] Validation | Batch 1000/1567 | Loss: 1.0510 [2026-04-18 14:30:22] Validation | Batch 1010/1567 | Loss: 1.0487 [2026-04-18 14:30:23] Validation | Batch 1020/1567 | Loss: 1.0490 [2026-04-18 14:30:24] Validation | Batch 1030/1567 | Loss: 1.0506 [2026-04-18 14:30:25] Validation | Batch 1040/1567 | Loss: 1.0502 [2026-04-18 14:30:26] Validation | Batch 1050/1567 | Loss: 1.0512 [2026-04-18 14:30:26] Validation | Batch 1060/1567 | Loss: 1.0502 [2026-04-18 14:30:27] Validation | Batch 1070/1567 | Loss: 1.0495 [2026-04-18 14:30:28] Validation | Batch 1080/1567 | Loss: 1.0504 [2026-04-18 14:30:29] Validation | Batch 1090/1567 | Loss: 1.0502 [2026-04-18 14:30:30] Validation | Batch 1100/1567 | Loss: 1.0507 [2026-04-18 14:30:30] Validation | Batch 1110/1567 | Loss: 1.0506 [2026-04-18 14:30:31] Validation | Batch 1120/1567 | Loss: 1.0508 [2026-04-18 14:30:32] Validation | Batch 1130/1567 | Loss: 1.0509 [2026-04-18 14:30:33] Validation | Batch 1140/1567 | Loss: 1.0517 [2026-04-18 14:30:34] Validation | Batch 1150/1567 | Loss: 1.0521 [2026-04-18 14:30:34] Validation | Batch 1160/1567 | Loss: 1.0530 [2026-04-18 14:30:35] Validation | Batch 1170/1567 | Loss: 1.0527 [2026-04-18 14:30:36] Validation | Batch 1180/1567 | Loss: 1.0523 [2026-04-18 14:30:37] Validation | Batch 1190/1567 | Loss: 1.0534 [2026-04-18 14:30:38] Validation | Batch 1200/1567 | Loss: 1.0528 [2026-04-18 14:30:39] Validation | Batch 1210/1567 | Loss: 1.0517 [2026-04-18 14:30:39] Validation | Batch 1220/1567 | Loss: 1.0520 [2026-04-18 14:30:40] Validation | Batch 1230/1567 | Loss: 1.0541 [2026-04-18 14:30:41] Validation | Batch 1240/1567 | Loss: 
1.0529 [2026-04-18 14:30:42] Validation | Batch 1250/1567 | Loss: 1.0529 [2026-04-18 14:30:43] Validation | Batch 1260/1567 | Loss: 1.0539 [2026-04-18 14:30:44] Validation | Batch 1270/1567 | Loss: 1.0539 [2026-04-18 14:30:44] Validation | Batch 1280/1567 | Loss: 1.0533 [2026-04-18 14:30:46] Validation | Batch 1290/1567 | Loss: 1.0535 [2026-04-18 14:30:46] Validation | Batch 1300/1567 | Loss: 1.0538 [2026-04-18 14:30:47] Validation | Batch 1310/1567 | Loss: 1.0542 [2026-04-18 14:30:48] Validation | Batch 1320/1567 | Loss: 1.0533 [2026-04-18 14:30:49] Validation | Batch 1330/1567 | Loss: 1.0529 [2026-04-18 14:30:49] Validation | Batch 1340/1567 | Loss: 1.0527 [2026-04-18 14:30:50] Validation | Batch 1350/1567 | Loss: 1.0535 [2026-04-18 14:30:51] Validation | Batch 1360/1567 | Loss: 1.0531 [2026-04-18 14:30:52] Validation | Batch 1370/1567 | Loss: 1.0535 [2026-04-18 14:30:53] Validation | Batch 1380/1567 | Loss: 1.0548 [2026-04-18 14:30:53] Validation | Batch 1390/1567 | Loss: 1.0549 [2026-04-18 14:30:54] Validation | Batch 1400/1567 | Loss: 1.0553 [2026-04-18 14:30:55] Validation | Batch 1410/1567 | Loss: 1.0551 [2026-04-18 14:30:55] Validation | Batch 1420/1567 | Loss: 1.0556 [2026-04-18 14:30:56] Validation | Batch 1430/1567 | Loss: 1.0553 [2026-04-18 14:30:57] Validation | Batch 1440/1567 | Loss: 1.0556 [2026-04-18 14:30:58] Validation | Batch 1450/1567 | Loss: 1.0549 [2026-04-18 14:30:58] Validation | Batch 1460/1567 | Loss: 1.0547 [2026-04-18 14:30:59] Validation | Batch 1470/1567 | Loss: 1.0538 [2026-04-18 14:31:00] Validation | Batch 1480/1567 | Loss: 1.0522 [2026-04-18 14:31:00] Validation | Batch 1490/1567 | Loss: 1.0522 [2026-04-18 14:31:01] Validation | Batch 1500/1567 | Loss: 1.0524 [2026-04-18 14:31:02] Validation | Batch 1510/1567 | Loss: 1.0522 [2026-04-18 14:31:03] Validation | Batch 1520/1567 | Loss: 1.0515 [2026-04-18 14:31:03] Validation | Batch 1530/1567 | Loss: 1.0523 [2026-04-18 14:31:05] Validation | Batch 1540/1567 | Loss: 1.0533 [2026-04-18 
14:31:05] Validation | Batch 1550/1567 | Loss: 1.0536 [2026-04-18 14:31:06] Validation | Batch 1560/1567 | Loss: 1.0527 [2026-04-18 14:31:07] Validation | Batch 1567/1567 | Loss: 1.0531 [2026-04-18 14:31:07] Validation | Loss: 1.0531 | PPL: 2.89 | Time: 125.65s [2026-04-18 14:31:11] New best model saved! Val loss: 1.0531 [2026-04-18 14:31:14] Epoch 2 | Step 16010 | Loss: 0.7975 | LR: 2.00e-06 [2026-04-18 14:31:18] Epoch 2 | Step 16020 | Loss: 0.7975 | LR: 2.00e-06 [2026-04-18 14:31:22] Epoch 2 | Step 16030 | Loss: 0.7974 | LR: 2.00e-06 [2026-04-18 14:31:25] Epoch 2 | Step 16040 | Loss: 0.7976 | LR: 2.00e-06 [2026-04-18 14:31:29] Epoch 2 | Step 16050 | Loss: 0.7975 | LR: 2.00e-06 [2026-04-18 14:31:32] Epoch 2 | Step 16060 | Loss: 0.7975 | LR: 2.00e-06 [2026-04-18 14:31:36] Epoch 2 | Step 16070 | Loss: 0.7976 | LR: 2.00e-06 [2026-04-18 14:31:39] Epoch 2 | Step 16080 | Loss: 0.7974 | LR: 2.00e-06 [2026-04-18 14:31:42] Epoch 2 | Step 16090 | Loss: 0.7976 | LR: 2.00e-06 [2026-04-18 14:31:46] Epoch 2 | Step 16100 | Loss: 0.7976 | LR: 2.00e-06 [2026-04-18 14:31:49] Epoch 2 | Step 16110 | Loss: 0.7974 | LR: 2.00e-06 [2026-04-18 14:31:53] Epoch 2 | Step 16120 | Loss: 0.7973 | LR: 2.00e-06 [2026-04-18 14:31:56] Epoch 2 | Step 16130 | Loss: 0.7972 | LR: 2.00e-06 [2026-04-18 14:32:00] Epoch 2 | Step 16140 | Loss: 0.7970 | LR: 2.00e-06 [2026-04-18 14:32:03] Epoch 2 | Step 16150 | Loss: 0.7969 | LR: 2.00e-06 [2026-04-18 14:32:08] Epoch 2 | Step 16160 | Loss: 0.7970 | LR: 2.00e-06 [2026-04-18 14:32:11] Epoch 2 | Step 16170 | Loss: 0.7970 | LR: 2.00e-06 [2026-04-18 14:32:15] Epoch 2 | Step 16180 | Loss: 0.7970 | LR: 2.00e-06 [2026-04-18 14:32:19] Epoch 2 | Step 16190 | Loss: 0.7968 | LR: 2.00e-06 [2026-04-18 14:32:22] Epoch 2 | Step 16200 | Loss: 0.7966 | LR: 2.00e-06 [2026-04-18 14:32:26] Epoch 2 | Step 16210 | Loss: 0.7966 | LR: 2.00e-06 [2026-04-18 14:32:29] Epoch 2 | Step 16220 | Loss: 0.7968 | LR: 2.00e-06 [2026-04-18 14:32:33] Epoch 2 | Step 16230 | Loss: 0.7966 | LR: 
2.00e-06 [2026-04-18 14:32:36] Epoch 2 | Step 16240 | Loss: 0.7965 | LR: 2.00e-06 [2026-04-18 14:32:39] Epoch 2 | Step 16250 | Loss: 0.7965 | LR: 2.00e-06 [2026-04-18 14:32:42] Epoch 2 | Step 16260 | Loss: 0.7965 | LR: 2.00e-06 [2026-04-18 14:32:46] Epoch 2 | Step 16270 | Loss: 0.7966 | LR: 2.00e-06 [2026-04-18 14:32:50] Epoch 2 | Step 16280 | Loss: 0.7965 | LR: 2.00e-06 [2026-04-18 14:32:53] Epoch 2 | Step 16290 | Loss: 0.7966 | LR: 2.00e-06 [2026-04-18 14:32:57] Epoch 2 | Step 16300 | Loss: 0.7965 | LR: 2.00e-06 [2026-04-18 14:33:00] Epoch 2 | Step 16310 | Loss: 0.7965 | LR: 2.00e-06 [2026-04-18 14:33:04] Epoch 2 | Step 16320 | Loss: 0.7967 | LR: 2.00e-06 [2026-04-18 14:33:08] Epoch 2 | Step 16330 | Loss: 0.7967 | LR: 2.00e-06 [2026-04-18 14:33:12] Epoch 2 | Step 16340 | Loss: 0.7967 | LR: 2.00e-06 [2026-04-18 14:33:15] Epoch 2 | Step 16350 | Loss: 0.7968 | LR: 2.00e-06 [2026-04-18 14:33:19] Epoch 2 | Step 16360 | Loss: 0.7968 | LR: 2.00e-06 [2026-04-18 14:33:22] Epoch 2 | Step 16370 | Loss: 0.7966 | LR: 2.00e-06 [2026-04-18 14:33:26] Epoch 2 | Step 16380 | Loss: 0.7965 | LR: 2.00e-06 [2026-04-18 14:33:29] Epoch 2 | Step 16390 | Loss: 0.7963 | LR: 2.00e-06 [2026-04-18 14:33:33] Epoch 2 | Step 16400 | Loss: 0.7961 | LR: 2.00e-06 [2026-04-18 14:33:36] Epoch 2 | Step 16410 | Loss: 0.7961 | LR: 2.00e-06 [2026-04-18 14:33:40] Epoch 2 | Step 16420 | Loss: 0.7961 | LR: 2.00e-06 [2026-04-18 14:33:43] Epoch 2 | Step 16430 | Loss: 0.7959 | LR: 2.00e-06 [2026-04-18 14:33:47] Epoch 2 | Step 16440 | Loss: 0.7960 | LR: 2.00e-06 [2026-04-18 14:33:51] Epoch 2 | Step 16450 | Loss: 0.7962 | LR: 2.00e-06 [2026-04-18 14:33:54] Epoch 2 | Step 16460 | Loss: 0.7960 | LR: 2.00e-06 [2026-04-18 14:33:57] Epoch 2 | Step 16470 | Loss: 0.7960 | LR: 2.00e-06 [2026-04-18 14:34:01] Epoch 2 | Step 16480 | Loss: 0.7959 | LR: 2.00e-06 [2026-04-18 14:34:04] Epoch 2 | Step 16490 | Loss: 0.7959 | LR: 2.00e-06 [2026-04-18 14:34:08] Epoch 2 | Step 16500 | Loss: 0.7958 | LR: 2.00e-06 [2026-04-18 
14:34:12] Epoch 2 | Step 16510 | Loss: 0.7958 | LR: 2.00e-06 [2026-04-18 14:34:16] Epoch 2 | Step 16520 | Loss: 0.7958 | LR: 2.00e-06 [2026-04-18 14:34:19] Epoch 2 | Step 16530 | Loss: 0.7958 | LR: 2.00e-06 [2026-04-18 14:34:23] Epoch 2 | Step 16540 | Loss: 0.7957 | LR: 2.00e-06 [2026-04-18 14:34:26] Epoch 2 | Step 16550 | Loss: 0.7957 | LR: 2.00e-06 [2026-04-18 14:34:30] Epoch 2 | Step 16560 | Loss: 0.7956 | LR: 2.00e-06 [2026-04-18 14:34:34] Epoch 2 | Step 16570 | Loss: 0.7956 | LR: 2.00e-06 [2026-04-18 14:34:37] Epoch 2 | Step 16580 | Loss: 0.7955 | LR: 2.00e-06 [2026-04-18 14:34:41] Epoch 2 | Step 16590 | Loss: 0.7953 | LR: 2.00e-06 [2026-04-18 14:34:44] Epoch 2 | Step 16600 | Loss: 0.7952 | LR: 2.00e-06 [2026-04-18 14:34:48] Epoch 2 | Step 16610 | Loss: 0.7954 | LR: 2.00e-06 [2026-04-18 14:34:51] Epoch 2 | Step 16620 | Loss: 0.7955 | LR: 2.00e-06 [2026-04-18 14:34:55] Epoch 2 | Step 16630 | Loss: 0.7956 | LR: 2.00e-06 [2026-04-18 14:34:59] Epoch 2 | Step 16640 | Loss: 0.7954 | LR: 2.00e-06 [2026-04-18 14:35:02] Epoch 2 | Step 16650 | Loss: 0.7953 | LR: 2.00e-06 [2026-04-18 14:35:06] Epoch 2 | Step 16660 | Loss: 0.7954 | LR: 2.00e-06 [2026-04-18 14:35:10] Epoch 2 | Step 16670 | Loss: 0.7954 | LR: 2.00e-06 [2026-04-18 14:35:13] Epoch 2 | Step 16680 | Loss: 0.7955 | LR: 2.00e-06 [2026-04-18 14:35:17] Epoch 2 | Step 16690 | Loss: 0.7955 | LR: 2.00e-06 [2026-04-18 14:35:21] Epoch 2 | Step 16700 | Loss: 0.7954 | LR: 2.00e-06 [2026-04-18 14:35:24] Epoch 2 | Step 16710 | Loss: 0.7953 | LR: 2.00e-06 [2026-04-18 14:35:28] Epoch 2 | Step 16720 | Loss: 0.7953 | LR: 2.00e-06 [2026-04-18 14:35:31] Epoch 2 | Step 16730 | Loss: 0.7952 | LR: 2.00e-06 [2026-04-18 14:35:35] Epoch 2 | Step 16740 | Loss: 0.7952 | LR: 2.00e-06 [2026-04-18 14:35:39] Epoch 2 | Step 16750 | Loss: 0.7952 | LR: 2.00e-06 [2026-04-18 14:35:42] Epoch 2 | Step 16760 | Loss: 0.7951 | LR: 2.00e-06 [2026-04-18 14:35:45] Epoch 2 | Step 16770 | Loss: 0.7952 | LR: 2.00e-06 [2026-04-18 14:35:49] Epoch 2 | Step 
16780 | Loss: 0.7951 | LR: 2.00e-06 [2026-04-18 14:35:53] Epoch 2 | Step 16790 | Loss: 0.7950 | LR: 2.00e-06 [2026-04-18 14:35:56] Epoch 2 | Step 16800 | Loss: 0.7950 | LR: 2.00e-06 [2026-04-18 14:35:59] Epoch 2 | Step 16810 | Loss: 0.7951 | LR: 2.00e-06 [2026-04-18 14:36:03] Epoch 2 | Step 16820 | Loss: 0.7952 | LR: 2.00e-06 [2026-04-18 14:36:06] Epoch 2 | Step 16830 | Loss: 0.7950 | LR: 2.00e-06 [2026-04-18 14:36:09] Epoch 2 | Step 16840 | Loss: 0.7949 | LR: 2.00e-06 [2026-04-18 14:36:13] Epoch 2 | Step 16850 | Loss: 0.7949 | LR: 2.00e-06 [2026-04-18 14:36:16] Epoch 2 | Step 16860 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:36:20] Epoch 2 | Step 16870 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:36:23] Epoch 2 | Step 16880 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:36:27] Epoch 2 | Step 16890 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:36:31] Epoch 2 | Step 16900 | Loss: 0.7949 | LR: 2.00e-06 [2026-04-18 14:36:34] Epoch 2 | Step 16910 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:36:38] Epoch 2 | Step 16920 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:36:41] Epoch 2 | Step 16930 | Loss: 0.7949 | LR: 2.00e-06 [2026-04-18 14:36:45] Epoch 2 | Step 16940 | Loss: 0.7947 | LR: 2.00e-06 [2026-04-18 14:36:48] Epoch 2 | Step 16950 | Loss: 0.7946 | LR: 2.00e-06 [2026-04-18 14:36:52] Epoch 2 | Step 16960 | Loss: 0.7945 | LR: 2.00e-06 [2026-04-18 14:36:56] Epoch 2 | Step 16970 | Loss: 0.7946 | LR: 2.00e-06 [2026-04-18 14:36:59] Epoch 2 | Step 16980 | Loss: 0.7946 | LR: 2.00e-06 [2026-04-18 14:37:02] Epoch 2 | Step 16990 | Loss: 0.7947 | LR: 2.00e-06 [2026-04-18 14:37:06] Epoch 2 | Step 17000 | Loss: 0.7945 | LR: 2.00e-06 [2026-04-18 14:37:07] Validation | Batch 10/1567 | Loss: 0.9349 [2026-04-18 14:37:07] Validation | Batch 20/1567 | Loss: 1.0004 [2026-04-18 14:37:08] Validation | Batch 30/1567 | Loss: 1.0394 [2026-04-18 14:37:09] Validation | Batch 40/1567 | Loss: 1.0611 [2026-04-18 14:37:10] Validation | Batch 50/1567 | Loss: 1.0369 [2026-04-18 14:37:11] Validation | 
Batch 60/1567 | Loss: 1.0240 [2026-04-18 14:37:12] Validation | Batch 70/1567 | Loss: 1.0088 [2026-04-18 14:37:13] Validation | Batch 80/1567 | Loss: 1.0264 [2026-04-18 14:37:14] Validation | Batch 90/1567 | Loss: 1.0343 [2026-04-18 14:37:15] Validation | Batch 100/1567 | Loss: 1.0435 [2026-04-18 14:37:15] Validation | Batch 110/1567 | Loss: 1.0354 [2026-04-18 14:37:16] Validation | Batch 120/1567 | Loss: 1.0462 [2026-04-18 14:37:17] Validation | Batch 130/1567 | Loss: 1.0475 [2026-04-18 14:37:18] Validation | Batch 140/1567 | Loss: 1.0497 [2026-04-18 14:37:19] Validation | Batch 150/1567 | Loss: 1.0575 [2026-04-18 14:37:19] Validation | Batch 160/1567 | Loss: 1.0588 [2026-04-18 14:37:20] Validation | Batch 170/1567 | Loss: 1.0439 [2026-04-18 14:37:21] Validation | Batch 180/1567 | Loss: 1.0459 [2026-04-18 14:37:22] Validation | Batch 190/1567 | Loss: 1.0422 [2026-04-18 14:37:23] Validation | Batch 200/1567 | Loss: 1.0453 [2026-04-18 14:37:24] Validation | Batch 210/1567 | Loss: 1.0462 [2026-04-18 14:37:24] Validation | Batch 220/1567 | Loss: 1.0485 [2026-04-18 14:37:25] Validation | Batch 230/1567 | Loss: 1.0523 [2026-04-18 14:37:26] Validation | Batch 240/1567 | Loss: 1.0505 [2026-04-18 14:37:27] Validation | Batch 250/1567 | Loss: 1.0445 [2026-04-18 14:37:27] Validation | Batch 260/1567 | Loss: 1.0398 [2026-04-18 14:37:28] Validation | Batch 270/1567 | Loss: 1.0367 [2026-04-18 14:37:30] Validation | Batch 280/1567 | Loss: 1.0379 [2026-04-18 14:37:31] Validation | Batch 290/1567 | Loss: 1.0433 [2026-04-18 14:37:31] Validation | Batch 300/1567 | Loss: 1.0482 [2026-04-18 14:37:32] Validation | Batch 310/1567 | Loss: 1.0471 [2026-04-18 14:37:33] Validation | Batch 320/1567 | Loss: 1.0476 [2026-04-18 14:37:34] Validation | Batch 330/1567 | Loss: 1.0447 [2026-04-18 14:37:35] Validation | Batch 340/1567 | Loss: 1.0487 [2026-04-18 14:37:36] Validation | Batch 350/1567 | Loss: 1.0478 [2026-04-18 14:37:36] Validation | Batch 360/1567 | Loss: 1.0456 [2026-04-18 14:37:37] 
Validation | Batch 370/1567 | Loss: 1.0429 [2026-04-18 14:37:38] Validation | Batch 380/1567 | Loss: 1.0462 [2026-04-18 14:37:39] Validation | Batch 390/1567 | Loss: 1.0472 [2026-04-18 14:37:39] Validation | Batch 400/1567 | Loss: 1.0484 [2026-04-18 14:37:40] Validation | Batch 410/1567 | Loss: 1.0478 [2026-04-18 14:37:41] Validation | Batch 420/1567 | Loss: 1.0474 [2026-04-18 14:37:42] Validation | Batch 430/1567 | Loss: 1.0473 [2026-04-18 14:37:43] Validation | Batch 440/1567 | Loss: 1.0462 [2026-04-18 14:37:44] Validation | Batch 450/1567 | Loss: 1.0462 [2026-04-18 14:37:45] Validation | Batch 460/1567 | Loss: 1.0452 [2026-04-18 14:37:45] Validation | Batch 470/1567 | Loss: 1.0445 [2026-04-18 14:37:46] Validation | Batch 480/1567 | Loss: 1.0424 [2026-04-18 14:37:47] Validation | Batch 490/1567 | Loss: 1.0422 [2026-04-18 14:37:48] Validation | Batch 500/1567 | Loss: 1.0419 [2026-04-18 14:37:48] Validation | Batch 510/1567 | Loss: 1.0441 [2026-04-18 14:37:49] Validation | Batch 520/1567 | Loss: 1.0458 [2026-04-18 14:37:50] Validation | Batch 530/1567 | Loss: 1.0455 [2026-04-18 14:37:51] Validation | Batch 540/1567 | Loss: 1.0481 [2026-04-18 14:37:52] Validation | Batch 550/1567 | Loss: 1.0517 [2026-04-18 14:37:53] Validation | Batch 560/1567 | Loss: 1.0514 [2026-04-18 14:37:53] Validation | Batch 570/1567 | Loss: 1.0514 [2026-04-18 14:37:54] Validation | Batch 580/1567 | Loss: 1.0505 [2026-04-18 14:37:55] Validation | Batch 590/1567 | Loss: 1.0492 [2026-04-18 14:37:56] Validation | Batch 600/1567 | Loss: 1.0475 [2026-04-18 14:37:57] Validation | Batch 610/1567 | Loss: 1.0464 [2026-04-18 14:37:58] Validation | Batch 620/1567 | Loss: 1.0478 [2026-04-18 14:37:59] Validation | Batch 630/1567 | Loss: 1.0458 [2026-04-18 14:37:59] Validation | Batch 640/1567 | Loss: 1.0474 [2026-04-18 14:38:01] Validation | Batch 650/1567 | Loss: 1.0465 [2026-04-18 14:38:01] Validation | Batch 660/1567 | Loss: 1.0454 [2026-04-18 14:38:02] Validation | Batch 670/1567 | Loss: 1.0434 
[2026-04-18 14:38:03] Validation | Batch 680/1567 | Loss: 1.0428 [2026-04-18 14:38:03] Validation | Batch 690/1567 | Loss: 1.0436 [2026-04-18 14:38:04] Validation | Batch 700/1567 | Loss: 1.0422 [2026-04-18 14:38:05] Validation | Batch 710/1567 | Loss: 1.0435 [2026-04-18 14:38:06] Validation | Batch 720/1567 | Loss: 1.0427 [2026-04-18 14:38:07] Validation | Batch 730/1567 | Loss: 1.0434 [2026-04-18 14:38:07] Validation | Batch 740/1567 | Loss: 1.0445 [2026-04-18 14:38:08] Validation | Batch 750/1567 | Loss: 1.0450 [2026-04-18 14:38:09] Validation | Batch 760/1567 | Loss: 1.0448 [2026-04-18 14:38:10] Validation | Batch 770/1567 | Loss: 1.0468 [2026-04-18 14:38:11] Validation | Batch 780/1567 | Loss: 1.0481 [2026-04-18 14:38:12] Validation | Batch 790/1567 | Loss: 1.0475 [2026-04-18 14:38:12] Validation | Batch 800/1567 | Loss: 1.0494 [2026-04-18 14:38:13] Validation | Batch 810/1567 | Loss: 1.0493 [2026-04-18 14:38:14] Validation | Batch 820/1567 | Loss: 1.0490 [2026-04-18 14:38:15] Validation | Batch 830/1567 | Loss: 1.0474 [2026-04-18 14:38:15] Validation | Batch 840/1567 | Loss: 1.0475 [2026-04-18 14:38:16] Validation | Batch 850/1567 | Loss: 1.0463 [2026-04-18 14:38:17] Validation | Batch 860/1567 | Loss: 1.0478 [2026-04-18 14:38:17] Validation | Batch 870/1567 | Loss: 1.0483 [2026-04-18 14:38:18] Validation | Batch 880/1567 | Loss: 1.0492 [2026-04-18 14:38:19] Validation | Batch 890/1567 | Loss: 1.0498 [2026-04-18 14:38:20] Validation | Batch 900/1567 | Loss: 1.0517 [2026-04-18 14:38:20] Validation | Batch 910/1567 | Loss: 1.0518 [2026-04-18 14:38:21] Validation | Batch 920/1567 | Loss: 1.0539 [2026-04-18 14:38:22] Validation | Batch 930/1567 | Loss: 1.0516 [2026-04-18 14:38:23] Validation | Batch 940/1567 | Loss: 1.0513 [2026-04-18 14:38:23] Validation | Batch 950/1567 | Loss: 1.0503 [2026-04-18 14:38:24] Validation | Batch 960/1567 | Loss: 1.0489 [2026-04-18 14:38:25] Validation | Batch 970/1567 | Loss: 1.0505 [2026-04-18 14:38:26] Validation | Batch 980/1567 
| Loss: 1.0509 [2026-04-18 14:38:26] Validation | Batch 990/1567 | Loss: 1.0504 [2026-04-18 14:38:27] Validation | Batch 1000/1567 | Loss: 1.0507 [2026-04-18 14:38:28] Validation | Batch 1010/1567 | Loss: 1.0484 [2026-04-18 14:38:28] Validation | Batch 1020/1567 | Loss: 1.0487 [2026-04-18 14:38:29] Validation | Batch 1030/1567 | Loss: 1.0503 [2026-04-18 14:38:30] Validation | Batch 1040/1567 | Loss: 1.0499 [2026-04-18 14:38:31] Validation | Batch 1050/1567 | Loss: 1.0509 [2026-04-18 14:38:32] Validation | Batch 1060/1567 | Loss: 1.0499 [2026-04-18 14:38:33] Validation | Batch 1070/1567 | Loss: 1.0492 [2026-04-18 14:38:33] Validation | Batch 1080/1567 | Loss: 1.0501 [2026-04-18 14:38:34] Validation | Batch 1090/1567 | Loss: 1.0498 [2026-04-18 14:38:35] Validation | Batch 1100/1567 | Loss: 1.0504 [2026-04-18 14:38:35] Validation | Batch 1110/1567 | Loss: 1.0503 [2026-04-18 14:38:36] Validation | Batch 1120/1567 | Loss: 1.0505 [2026-04-18 14:38:37] Validation | Batch 1130/1567 | Loss: 1.0506 [2026-04-18 14:38:38] Validation | Batch 1140/1567 | Loss: 1.0514 [2026-04-18 14:38:39] Validation | Batch 1150/1567 | Loss: 1.0518 [2026-04-18 14:38:39] Validation | Batch 1160/1567 | Loss: 1.0527 [2026-04-18 14:38:40] Validation | Batch 1170/1567 | Loss: 1.0524 [2026-04-18 14:38:41] Validation | Batch 1180/1567 | Loss: 1.0520 [2026-04-18 14:38:42] Validation | Batch 1190/1567 | Loss: 1.0531 [2026-04-18 14:38:43] Validation | Batch 1200/1567 | Loss: 1.0524 [2026-04-18 14:38:44] Validation | Batch 1210/1567 | Loss: 1.0513 [2026-04-18 14:38:44] Validation | Batch 1220/1567 | Loss: 1.0517 [2026-04-18 14:38:45] Validation | Batch 1230/1567 | Loss: 1.0537 [2026-04-18 14:38:46] Validation | Batch 1240/1567 | Loss: 1.0525 [2026-04-18 14:38:47] Validation | Batch 1250/1567 | Loss: 1.0525 [2026-04-18 14:38:47] Validation | Batch 1260/1567 | Loss: 1.0535 [2026-04-18 14:38:49] Validation | Batch 1270/1567 | Loss: 1.0535 [2026-04-18 14:38:49] Validation | Batch 1280/1567 | Loss: 1.0529 
[2026-04-18 14:38:51] Validation | Batch 1290/1567 | Loss: 1.0532 [2026-04-18 14:38:51] Validation | Batch 1300/1567 | Loss: 1.0534 [2026-04-18 14:38:52] Validation | Batch 1310/1567 | Loss: 1.0538 [2026-04-18 14:38:53] Validation | Batch 1320/1567 | Loss: 1.0529 [2026-04-18 14:38:54] Validation | Batch 1330/1567 | Loss: 1.0525 [2026-04-18 14:38:54] Validation | Batch 1340/1567 | Loss: 1.0523 [2026-04-18 14:38:55] Validation | Batch 1350/1567 | Loss: 1.0531 [2026-04-18 14:38:56] Validation | Batch 1360/1567 | Loss: 1.0528 [2026-04-18 14:38:57] Validation | Batch 1370/1567 | Loss: 1.0531 [2026-04-18 14:38:58] Validation | Batch 1380/1567 | Loss: 1.0544 [2026-04-18 14:38:58] Validation | Batch 1390/1567 | Loss: 1.0545 [2026-04-18 14:38:59] Validation | Batch 1400/1567 | Loss: 1.0549 [2026-04-18 14:38:59] Validation | Batch 1410/1567 | Loss: 1.0547 [2026-04-18 14:39:00] Validation | Batch 1420/1567 | Loss: 1.0553 [2026-04-18 14:39:01] Validation | Batch 1430/1567 | Loss: 1.0550 [2026-04-18 14:39:02] Validation | Batch 1440/1567 | Loss: 1.0553 [2026-04-18 14:39:03] Validation | Batch 1450/1567 | Loss: 1.0546 [2026-04-18 14:39:03] Validation | Batch 1460/1567 | Loss: 1.0544 [2026-04-18 14:39:04] Validation | Batch 1470/1567 | Loss: 1.0534 [2026-04-18 14:39:05] Validation | Batch 1480/1567 | Loss: 1.0519 [2026-04-18 14:39:05] Validation | Batch 1490/1567 | Loss: 1.0519 [2026-04-18 14:39:06] Validation | Batch 1500/1567 | Loss: 1.0520 [2026-04-18 14:39:07] Validation | Batch 1510/1567 | Loss: 1.0518 [2026-04-18 14:39:07] Validation | Batch 1520/1567 | Loss: 1.0511 [2026-04-18 14:39:08] Validation | Batch 1530/1567 | Loss: 1.0519 [2026-04-18 14:39:09] Validation | Batch 1540/1567 | Loss: 1.0529 [2026-04-18 14:39:10] Validation | Batch 1550/1567 | Loss: 1.0533 [2026-04-18 14:39:11] Validation | Batch 1560/1567 | Loss: 1.0523 [2026-04-18 14:39:11] Validation | Batch 1567/1567 | Loss: 1.0527 [2026-04-18 14:39:11] Validation | Loss: 1.0527 | PPL: 2.89 | Time: 125.68s 
[2026-04-18 14:39:15] New best model saved! Val loss: 1.0527 [2026-04-18 14:39:19] Epoch 2 | Step 17010 | Loss: 0.7944 | LR: 2.00e-06 [2026-04-18 14:39:22] Epoch 2 | Step 17020 | Loss: 0.7944 | LR: 2.00e-06 [2026-04-18 14:39:26] Epoch 2 | Step 17030 | Loss: 0.7943 | LR: 2.00e-06 [2026-04-18 14:39:29] Epoch 2 | Step 17040 | Loss: 0.7944 | LR: 2.00e-06 [2026-04-18 14:39:33] Epoch 2 | Step 17050 | Loss: 0.7945 | LR: 2.00e-06 [2026-04-18 14:39:37] Epoch 2 | Step 17060 | Loss: 0.7945 | LR: 2.00e-06 [2026-04-18 14:39:41] Epoch 2 | Step 17070 | Loss: 0.7946 | LR: 2.00e-06 [2026-04-18 14:39:44] Epoch 2 | Step 17080 | Loss: 0.7947 | LR: 2.00e-06 [2026-04-18 14:39:48] Epoch 2 | Step 17090 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:39:51] Epoch 2 | Step 17100 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:39:55] Epoch 2 | Step 17110 | Loss: 0.7949 | LR: 2.00e-06 [2026-04-18 14:39:59] Epoch 2 | Step 17120 | Loss: 0.7947 | LR: 2.00e-06 [2026-04-18 14:40:02] Epoch 2 | Step 17130 | Loss: 0.7946 | LR: 2.00e-06 [2026-04-18 14:40:06] Epoch 2 | Step 17140 | Loss: 0.7946 | LR: 2.00e-06 [2026-04-18 14:40:09] Epoch 2 | Step 17150 | Loss: 0.7947 | LR: 2.00e-06 [2026-04-18 14:40:12] Epoch 2 | Step 17160 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:40:15] Epoch 2 | Step 17170 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:40:19] Epoch 2 | Step 17180 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:40:23] Epoch 2 | Step 17190 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:40:27] Epoch 2 | Step 17200 | Loss: 0.7946 | LR: 2.00e-06 [2026-04-18 14:40:30] Epoch 2 | Step 17210 | Loss: 0.7947 | LR: 2.00e-06 [2026-04-18 14:40:34] Epoch 2 | Step 17220 | Loss: 0.7949 | LR: 2.00e-06 [2026-04-18 14:40:38] Epoch 2 | Step 17230 | Loss: 0.7949 | LR: 2.00e-06 [2026-04-18 14:40:41] Epoch 2 | Step 17240 | Loss: 0.7949 | LR: 2.00e-06 [2026-04-18 14:40:44] Epoch 2 | Step 17250 | Loss: 0.7952 | LR: 2.00e-06 [2026-04-18 14:40:48] Epoch 2 | Step 17260 | Loss: 0.7951 | LR: 2.00e-06 [2026-04-18 14:40:51] Epoch 2 | Step 
17270 | Loss: 0.7949 | LR: 2.00e-06 [2026-04-18 14:40:55] Epoch 2 | Step 17280 | Loss: 0.7949 | LR: 2.00e-06 [2026-04-18 14:40:59] Epoch 2 | Step 17290 | Loss: 0.7949 | LR: 2.00e-06 [2026-04-18 14:41:02] Epoch 2 | Step 17300 | Loss: 0.7947 | LR: 2.00e-06 [2026-04-18 14:41:05] Epoch 2 | Step 17310 | Loss: 0.7948 | LR: 2.00e-06 [2026-04-18 14:41:09] Epoch 2 | Step 17320 | Loss: 0.7947 | LR: 2.00e-06 [2026-04-18 14:41:12] Epoch 2 | Step 17330 | Loss: 0.7946 | LR: 2.00e-06 [2026-04-18 14:41:16] Epoch 2 | Step 17340 | Loss: 0.7944 | LR: 2.00e-06 [2026-04-18 14:41:19] Epoch 2 | Step 17350 | Loss: 0.7942 | LR: 2.00e-06 [2026-04-18 14:41:23] Epoch 2 | Step 17360 | Loss: 0.7941 | LR: 2.00e-06 [2026-04-18 14:41:27] Epoch 2 | Step 17370 | Loss: 0.7941 | LR: 2.00e-06 [2026-04-18 14:41:30] Epoch 2 | Step 17380 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:41:34] Epoch 2 | Step 17390 | Loss: 0.7941 | LR: 2.00e-06 [2026-04-18 14:41:37] Epoch 2 | Step 17400 | Loss: 0.7941 | LR: 2.00e-06 [2026-04-18 14:41:41] Epoch 2 | Step 17410 | Loss: 0.7942 | LR: 2.00e-06 [2026-04-18 14:41:44] Epoch 2 | Step 17420 | Loss: 0.7941 | LR: 2.00e-06 [2026-04-18 14:41:47] Epoch 2 | Step 17430 | Loss: 0.7941 | LR: 2.00e-06 [2026-04-18 14:41:51] Epoch 2 | Step 17440 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:41:54] Epoch 2 | Step 17450 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:41:58] Epoch 2 | Step 17460 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:42:01] Epoch 2 | Step 17470 | Loss: 0.7941 | LR: 2.00e-06 [2026-04-18 14:42:05] Epoch 2 | Step 17480 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:42:09] Epoch 2 | Step 17490 | Loss: 0.7939 | LR: 2.00e-06 [2026-04-18 14:42:12] Epoch 2 | Step 17500 | Loss: 0.7939 | LR: 2.00e-06 [2026-04-18 14:42:16] Epoch 2 | Step 17510 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:42:20] Epoch 2 | Step 17520 | Loss: 0.7939 | LR: 2.00e-06 [2026-04-18 14:42:24] Epoch 2 | Step 17530 | Loss: 0.7939 | LR: 2.00e-06 [2026-04-18 14:42:27] Epoch 2 | Step 17540 | Loss: 0.7938 | LR: 
2.00e-06 [2026-04-18 14:42:31] Epoch 2 | Step 17550 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:42:34] Epoch 2 | Step 17560 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:42:38] Epoch 2 | Step 17570 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:42:41] Epoch 2 | Step 17580 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:42:45] Epoch 2 | Step 17590 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:42:48] Epoch 2 | Step 17600 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:42:53] Epoch 2 | Step 17610 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:42:56] Epoch 2 | Step 17620 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:43:00] Epoch 2 | Step 17630 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:43:03] Epoch 2 | Step 17640 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:43:07] Epoch 2 | Step 17650 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:43:11] Epoch 2 | Step 17660 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:43:14] Epoch 2 | Step 17670 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:43:18] Epoch 2 | Step 17680 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:43:21] Epoch 2 | Step 17690 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:43:25] Epoch 2 | Step 17700 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:43:28] Epoch 2 | Step 17710 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:43:32] Epoch 2 | Step 17720 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:43:36] Epoch 2 | Step 17730 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:43:39] Epoch 2 | Step 17740 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:43:43] Epoch 2 | Step 17750 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:43:47] Epoch 2 | Step 17760 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:43:50] Epoch 2 | Step 17770 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:43:54] Epoch 2 | Step 17780 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:43:58] Epoch 2 | Step 17790 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:44:02] Epoch 2 | Step 17800 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:44:05] Epoch 2 | Step 17810 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 
14:44:09] Epoch 2 | Step 17820 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:44:12] Epoch 2 | Step 17830 | Loss: 0.7939 | LR: 2.00e-06 [2026-04-18 14:44:16] Epoch 2 | Step 17840 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:44:19] Epoch 2 | Step 17850 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:44:23] Epoch 2 | Step 17860 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:44:27] Epoch 2 | Step 17870 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:44:30] Epoch 2 | Step 17880 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:44:33] Epoch 2 | Step 17890 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:44:37] Epoch 2 | Step 17900 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:44:40] Epoch 2 | Step 17910 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:44:45] Epoch 2 | Step 17920 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:44:48] Epoch 2 | Step 17930 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:44:52] Epoch 2 | Step 17940 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:44:56] Epoch 2 | Step 17950 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:44:59] Epoch 2 | Step 17960 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:45:03] Epoch 2 | Step 17970 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:45:07] Epoch 2 | Step 17980 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:45:10] Epoch 2 | Step 17990 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:45:14] Epoch 2 | Step 18000 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:45:23] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_18000.pt [2026-04-18 14:45:39] Validation | Batch 10/1567 | Loss: 0.9365 [2026-04-18 14:45:40] Validation | Batch 20/1567 | Loss: 1.0007 [2026-04-18 14:45:41] Validation | Batch 30/1567 | Loss: 1.0393 [2026-04-18 14:45:42] Validation | Batch 40/1567 | Loss: 1.0615 [2026-04-18 14:45:42] Validation | Batch 50/1567 | Loss: 1.0365 [2026-04-18 14:45:43] Validation | Batch 60/1567 | Loss: 1.0233 [2026-04-18 14:45:44] Validation | Batch 70/1567 | Loss: 1.0083 [2026-04-18 14:45:45] Validation | Batch 80/1567 | Loss: 1.0259 
[2026-04-18 14:45:46] Validation | Batch 90/1567 | Loss: 1.0337 [2026-04-18 14:45:47] Validation | Batch 100/1567 | Loss: 1.0426 [2026-04-18 14:45:47] Validation | Batch 110/1567 | Loss: 1.0346 [2026-04-18 14:45:48] Validation | Batch 120/1567 | Loss: 1.0456 [2026-04-18 14:45:49] Validation | Batch 130/1567 | Loss: 1.0469 [2026-04-18 14:45:50] Validation | Batch 140/1567 | Loss: 1.0492 [2026-04-18 14:45:51] Validation | Batch 150/1567 | Loss: 1.0570 [2026-04-18 14:45:52] Validation | Batch 160/1567 | Loss: 1.0584 [2026-04-18 14:45:52] Validation | Batch 170/1567 | Loss: 1.0435 [2026-04-18 14:45:53] Validation | Batch 180/1567 | Loss: 1.0455 [2026-04-18 14:45:54] Validation | Batch 190/1567 | Loss: 1.0420 [2026-04-18 14:45:55] Validation | Batch 200/1567 | Loss: 1.0450 [2026-04-18 14:45:56] Validation | Batch 210/1567 | Loss: 1.0460 [2026-04-18 14:45:57] Validation | Batch 220/1567 | Loss: 1.0482 [2026-04-18 14:45:58] Validation | Batch 230/1567 | Loss: 1.0519 [2026-04-18 14:45:58] Validation | Batch 240/1567 | Loss: 1.0502 [2026-04-18 14:45:59] Validation | Batch 250/1567 | Loss: 1.0441 [2026-04-18 14:46:00] Validation | Batch 260/1567 | Loss: 1.0395 [2026-04-18 14:46:00] Validation | Batch 270/1567 | Loss: 1.0364 [2026-04-18 14:46:01] Validation | Batch 280/1567 | Loss: 1.0375 [2026-04-18 14:46:02] Validation | Batch 290/1567 | Loss: 1.0427 [2026-04-18 14:46:03] Validation | Batch 300/1567 | Loss: 1.0476 [2026-04-18 14:46:04] Validation | Batch 310/1567 | Loss: 1.0466 [2026-04-18 14:46:05] Validation | Batch 320/1567 | Loss: 1.0471 [2026-04-18 14:46:06] Validation | Batch 330/1567 | Loss: 1.0442 [2026-04-18 14:46:07] Validation | Batch 340/1567 | Loss: 1.0483 [2026-04-18 14:46:07] Validation | Batch 350/1567 | Loss: 1.0473 [2026-04-18 14:46:08] Validation | Batch 360/1567 | Loss: 1.0451 [2026-04-18 14:46:09] Validation | Batch 370/1567 | Loss: 1.0423 [2026-04-18 14:46:10] Validation | Batch 380/1567 | Loss: 1.0456 [2026-04-18 14:46:11] Validation | Batch 390/1567 
| Loss: 1.0467 [2026-04-18 14:46:11] Validation | Batch 400/1567 | Loss: 1.0480 [2026-04-18 14:46:12] Validation | Batch 410/1567 | Loss: 1.0474 [2026-04-18 14:46:13] Validation | Batch 420/1567 | Loss: 1.0469 [2026-04-18 14:46:14] Validation | Batch 430/1567 | Loss: 1.0468 [2026-04-18 14:46:15] Validation | Batch 440/1567 | Loss: 1.0457 [2026-04-18 14:46:16] Validation | Batch 450/1567 | Loss: 1.0458 [2026-04-18 14:46:17] Validation | Batch 460/1567 | Loss: 1.0448 [2026-04-18 14:46:17] Validation | Batch 470/1567 | Loss: 1.0440 [2026-04-18 14:46:18] Validation | Batch 480/1567 | Loss: 1.0419 [2026-04-18 14:46:19] Validation | Batch 490/1567 | Loss: 1.0417 [2026-04-18 14:46:20] Validation | Batch 500/1567 | Loss: 1.0413 [2026-04-18 14:46:20] Validation | Batch 510/1567 | Loss: 1.0436 [2026-04-18 14:46:21] Validation | Batch 520/1567 | Loss: 1.0453 [2026-04-18 14:46:22] Validation | Batch 530/1567 | Loss: 1.0449 [2026-04-18 14:46:23] Validation | Batch 540/1567 | Loss: 1.0476 [2026-04-18 14:46:24] Validation | Batch 550/1567 | Loss: 1.0511 [2026-04-18 14:46:25] Validation | Batch 560/1567 | Loss: 1.0509 [2026-04-18 14:46:26] Validation | Batch 570/1567 | Loss: 1.0509 [2026-04-18 14:46:26] Validation | Batch 580/1567 | Loss: 1.0500 [2026-04-18 14:46:27] Validation | Batch 590/1567 | Loss: 1.0486 [2026-04-18 14:46:28] Validation | Batch 600/1567 | Loss: 1.0469 [2026-04-18 14:46:29] Validation | Batch 610/1567 | Loss: 1.0459 [2026-04-18 14:46:30] Validation | Batch 620/1567 | Loss: 1.0473 [2026-04-18 14:46:31] Validation | Batch 630/1567 | Loss: 1.0453 [2026-04-18 14:46:32] Validation | Batch 640/1567 | Loss: 1.0468 [2026-04-18 14:46:33] Validation | Batch 650/1567 | Loss: 1.0459 [2026-04-18 14:46:33] Validation | Batch 660/1567 | Loss: 1.0447 [2026-04-18 14:46:34] Validation | Batch 670/1567 | Loss: 1.0427 [2026-04-18 14:46:35] Validation | Batch 680/1567 | Loss: 1.0421 [2026-04-18 14:46:35] Validation | Batch 690/1567 | Loss: 1.0430 [2026-04-18 14:46:36] Validation | 
Batch 700/1567 | Loss: 1.0416 [2026-04-18 14:46:37] Validation | Batch 710/1567 | Loss: 1.0428 [2026-04-18 14:46:38] Validation | Batch 720/1567 | Loss: 1.0420 [2026-04-18 14:46:39] Validation | Batch 730/1567 | Loss: 1.0427 [2026-04-18 14:46:39] Validation | Batch 740/1567 | Loss: 1.0438 [2026-04-18 14:46:40] Validation | Batch 750/1567 | Loss: 1.0443 [2026-04-18 14:46:41] Validation | Batch 760/1567 | Loss: 1.0441 [2026-04-18 14:46:42] Validation | Batch 770/1567 | Loss: 1.0461 [2026-04-18 14:46:43] Validation | Batch 780/1567 | Loss: 1.0474 [2026-04-18 14:46:43] Validation | Batch 790/1567 | Loss: 1.0469 [2026-04-18 14:46:44] Validation | Batch 800/1567 | Loss: 1.0487 [2026-04-18 14:46:45] Validation | Batch 810/1567 | Loss: 1.0487 [2026-04-18 14:46:46] Validation | Batch 820/1567 | Loss: 1.0484 [2026-04-18 14:46:46] Validation | Batch 830/1567 | Loss: 1.0468 [2026-04-18 14:46:47] Validation | Batch 840/1567 | Loss: 1.0469 [2026-04-18 14:46:48] Validation | Batch 850/1567 | Loss: 1.0456 [2026-04-18 14:46:48] Validation | Batch 860/1567 | Loss: 1.0472 [2026-04-18 14:46:49] Validation | Batch 870/1567 | Loss: 1.0476 [2026-04-18 14:46:50] Validation | Batch 880/1567 | Loss: 1.0485 [2026-04-18 14:46:51] Validation | Batch 890/1567 | Loss: 1.0491 [2026-04-18 14:46:51] Validation | Batch 900/1567 | Loss: 1.0510 [2026-04-18 14:46:52] Validation | Batch 910/1567 | Loss: 1.0511 [2026-04-18 14:46:53] Validation | Batch 920/1567 | Loss: 1.0532 [2026-04-18 14:46:54] Validation | Batch 930/1567 | Loss: 1.0509 [2026-04-18 14:46:54] Validation | Batch 940/1567 | Loss: 1.0506 [2026-04-18 14:46:55] Validation | Batch 950/1567 | Loss: 1.0496 [2026-04-18 14:46:56] Validation | Batch 960/1567 | Loss: 1.0482 [2026-04-18 14:46:56] Validation | Batch 970/1567 | Loss: 1.0499 [2026-04-18 14:46:57] Validation | Batch 980/1567 | Loss: 1.0502 [2026-04-18 14:46:58] Validation | Batch 990/1567 | Loss: 1.0497 [2026-04-18 14:46:59] Validation | Batch 1000/1567 | Loss: 1.0500 [2026-04-18 
14:46:59] Validation | Batch 1010/1567 | Loss: 1.0477 [2026-04-18 14:47:00] Validation | Batch 1020/1567 | Loss: 1.0480 [2026-04-18 14:47:01] Validation | Batch 1030/1567 | Loss: 1.0496 [2026-04-18 14:47:02] Validation | Batch 1040/1567 | Loss: 1.0492 [2026-04-18 14:47:03] Validation | Batch 1050/1567 | Loss: 1.0501 [2026-04-18 14:47:04] Validation | Batch 1060/1567 | Loss: 1.0492 [2026-04-18 14:47:04] Validation | Batch 1070/1567 | Loss: 1.0484 [2026-04-18 14:47:05] Validation | Batch 1080/1567 | Loss: 1.0494 [2026-04-18 14:47:06] Validation | Batch 1090/1567 | Loss: 1.0491 [2026-04-18 14:47:06] Validation | Batch 1100/1567 | Loss: 1.0497 [2026-04-18 14:47:07] Validation | Batch 1110/1567 | Loss: 1.0495 [2026-04-18 14:47:08] Validation | Batch 1120/1567 | Loss: 1.0498 [2026-04-18 14:47:09] Validation | Batch 1130/1567 | Loss: 1.0499 [2026-04-18 14:47:10] Validation | Batch 1140/1567 | Loss: 1.0507 [2026-04-18 14:47:11] Validation | Batch 1150/1567 | Loss: 1.0511 [2026-04-18 14:47:11] Validation | Batch 1160/1567 | Loss: 1.0519 [2026-04-18 14:47:12] Validation | Batch 1170/1567 | Loss: 1.0516 [2026-04-18 14:47:13] Validation | Batch 1180/1567 | Loss: 1.0513 [2026-04-18 14:47:14] Validation | Batch 1190/1567 | Loss: 1.0524 [2026-04-18 14:47:15] Validation | Batch 1200/1567 | Loss: 1.0517 [2026-04-18 14:47:16] Validation | Batch 1210/1567 | Loss: 1.0506 [2026-04-18 14:47:16] Validation | Batch 1220/1567 | Loss: 1.0510 [2026-04-18 14:47:17] Validation | Batch 1230/1567 | Loss: 1.0530 [2026-04-18 14:47:18] Validation | Batch 1240/1567 | Loss: 1.0519 [2026-04-18 14:47:19] Validation | Batch 1250/1567 | Loss: 1.0518 [2026-04-18 14:47:19] Validation | Batch 1260/1567 | Loss: 1.0528 [2026-04-18 14:47:21] Validation | Batch 1270/1567 | Loss: 1.0528 [2026-04-18 14:47:21] Validation | Batch 1280/1567 | Loss: 1.0522 [2026-04-18 14:47:22] Validation | Batch 1290/1567 | Loss: 1.0525 [2026-04-18 14:47:23] Validation | Batch 1300/1567 | Loss: 1.0528 [2026-04-18 14:47:24] 
Validation | Batch 1310/1567 | Loss: 1.0531 [2026-04-18 14:47:25] Validation | Batch 1320/1567 | Loss: 1.0522 [2026-04-18 14:47:26] Validation | Batch 1330/1567 | Loss: 1.0519 [2026-04-18 14:47:26] Validation | Batch 1340/1567 | Loss: 1.0516 [2026-04-18 14:47:27] Validation | Batch 1350/1567 | Loss: 1.0525 [2026-04-18 14:47:28] Validation | Batch 1360/1567 | Loss: 1.0521 [2026-04-18 14:47:29] Validation | Batch 1370/1567 | Loss: 1.0524 [2026-04-18 14:47:30] Validation | Batch 1380/1567 | Loss: 1.0537 [2026-04-18 14:47:30] Validation | Batch 1390/1567 | Loss: 1.0538 [2026-04-18 14:47:31] Validation | Batch 1400/1567 | Loss: 1.0542 [2026-04-18 14:47:32] Validation | Batch 1410/1567 | Loss: 1.0540 [2026-04-18 14:47:32] Validation | Batch 1420/1567 | Loss: 1.0546 [2026-04-18 14:47:33] Validation | Batch 1430/1567 | Loss: 1.0543 [2026-04-18 14:47:34] Validation | Batch 1440/1567 | Loss: 1.0546 [2026-04-18 14:47:35] Validation | Batch 1450/1567 | Loss: 1.0539 [2026-04-18 14:47:35] Validation | Batch 1460/1567 | Loss: 1.0537 [2026-04-18 14:47:36] Validation | Batch 1470/1567 | Loss: 1.0528 [2026-04-18 14:47:37] Validation | Batch 1480/1567 | Loss: 1.0512 [2026-04-18 14:47:37] Validation | Batch 1490/1567 | Loss: 1.0512 [2026-04-18 14:47:38] Validation | Batch 1500/1567 | Loss: 1.0514 [2026-04-18 14:47:39] Validation | Batch 1510/1567 | Loss: 1.0512 [2026-04-18 14:47:40] Validation | Batch 1520/1567 | Loss: 1.0505 [2026-04-18 14:47:40] Validation | Batch 1530/1567 | Loss: 1.0513 [2026-04-18 14:47:41] Validation | Batch 1540/1567 | Loss: 1.0523 [2026-04-18 14:47:42] Validation | Batch 1550/1567 | Loss: 1.0526 [2026-04-18 14:47:42] Validation | Batch 1560/1567 | Loss: 1.0517 [2026-04-18 14:47:43] Validation | Batch 1567/1567 | Loss: 1.0521 [2026-04-18 14:47:43] Validation | Loss: 1.0521 | PPL: 2.89 | Time: 124.89s [2026-04-18 14:47:47] New best model saved! 
Val loss: 1.0521 [2026-04-18 14:47:50] Epoch 2 | Step 18010 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:47:53] Epoch 2 | Step 18020 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:47:57] Epoch 2 | Step 18030 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:48:00] Epoch 2 | Step 18040 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:48:04] Epoch 2 | Step 18050 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:48:07] Epoch 2 | Step 18060 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:48:11] Epoch 2 | Step 18070 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:48:14] Epoch 2 | Step 18080 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:48:17] Epoch 2 | Step 18090 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:48:21] Epoch 2 | Step 18100 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:48:24] Epoch 2 | Step 18110 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:48:28] Epoch 2 | Step 18120 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:48:32] Epoch 2 | Step 18130 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:48:36] Epoch 2 | Step 18140 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:48:39] Epoch 2 | Step 18150 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:48:42] Epoch 2 | Step 18160 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:48:46] Epoch 2 | Step 18170 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:48:50] Epoch 2 | Step 18180 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:48:54] Epoch 2 | Step 18190 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:48:57] Epoch 2 | Step 18200 | Loss: 0.7939 | LR: 2.00e-06 [2026-04-18 14:49:01] Epoch 2 | Step 18210 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:49:04] Epoch 2 | Step 18220 | Loss: 0.7939 | LR: 2.00e-06 [2026-04-18 14:49:08] Epoch 2 | Step 18230 | Loss: 0.7939 | LR: 2.00e-06 [2026-04-18 14:49:12] Epoch 2 | Step 18240 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:49:15] Epoch 2 | Step 18250 | Loss: 0.7941 | LR: 2.00e-06 [2026-04-18 14:49:20] Epoch 2 | Step 18260 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:49:24] Epoch 2 | Step 18270 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 
14:49:27] Epoch 2 | Step 18280 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:49:31] Epoch 2 | Step 18290 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:49:34] Epoch 2 | Step 18300 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:49:37] Epoch 2 | Step 18310 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:49:41] Epoch 2 | Step 18320 | Loss: 0.7941 | LR: 2.00e-06 [2026-04-18 14:49:44] Epoch 2 | Step 18330 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:49:48] Epoch 2 | Step 18340 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:49:52] Epoch 2 | Step 18350 | Loss: 0.7942 | LR: 2.00e-06 [2026-04-18 14:49:55] Epoch 2 | Step 18360 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:49:59] Epoch 2 | Step 18370 | Loss: 0.7941 | LR: 2.00e-06 [2026-04-18 14:50:03] Epoch 2 | Step 18380 | Loss: 0.7941 | LR: 2.00e-06 [2026-04-18 14:50:07] Epoch 2 | Step 18390 | Loss: 0.7940 | LR: 2.00e-06 [2026-04-18 14:50:10] Epoch 2 | Step 18400 | Loss: 0.7939 | LR: 2.00e-06 [2026-04-18 14:50:13] Epoch 2 | Step 18410 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:50:17] Epoch 2 | Step 18420 | Loss: 0.7938 | LR: 2.00e-06 [2026-04-18 14:50:21] Epoch 2 | Step 18430 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:50:25] Epoch 2 | Step 18440 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:50:28] Epoch 2 | Step 18450 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:50:32] Epoch 2 | Step 18460 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:50:35] Epoch 2 | Step 18470 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:50:38] Epoch 2 | Step 18480 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:50:42] Epoch 2 | Step 18490 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:50:46] Epoch 2 | Step 18500 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:50:50] Epoch 2 | Step 18510 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:50:53] Epoch 2 | Step 18520 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:50:57] Epoch 2 | Step 18530 | Loss: 0.7937 | LR: 2.00e-06 [2026-04-18 14:51:01] Epoch 2 | Step 18540 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:51:05] Epoch 2 | Step 
18550 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:51:09] Epoch 2 | Step 18560 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:51:12] Epoch 2 | Step 18570 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:51:16] Epoch 2 | Step 18580 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:51:19] Epoch 2 | Step 18590 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:51:23] Epoch 2 | Step 18600 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:51:27] Epoch 2 | Step 18610 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:51:31] Epoch 2 | Step 18620 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:51:34] Epoch 2 | Step 18630 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:51:38] Epoch 2 | Step 18640 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:51:42] Epoch 2 | Step 18650 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:51:46] Epoch 2 | Step 18660 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:51:49] Epoch 2 | Step 18670 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:51:53] Epoch 2 | Step 18680 | Loss: 0.7936 | LR: 2.00e-06 [2026-04-18 14:51:57] Epoch 2 | Step 18690 | Loss: 0.7935 | LR: 2.00e-06 [2026-04-18 14:52:00] Epoch 2 | Step 18700 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:52:03] Epoch 2 | Step 18710 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:52:06] Epoch 2 | Step 18720 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:52:10] Epoch 2 | Step 18730 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:52:13] Epoch 2 | Step 18740 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:52:16] Epoch 2 | Step 18750 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:52:20] Epoch 2 | Step 18760 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:52:24] Epoch 2 | Step 18770 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:52:28] Epoch 2 | Step 18780 | Loss: 0.7930 | LR: 2.00e-06 [2026-04-18 14:52:31] Epoch 2 | Step 18790 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:52:35] Epoch 2 | Step 18800 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:52:39] Epoch 2 | Step 18810 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:52:42] Epoch 2 | Step 18820 | Loss: 0.7931 | LR: 
2.00e-06 [2026-04-18 14:52:46] Epoch 2 | Step 18830 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:52:49] Epoch 2 | Step 18840 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:52:53] Epoch 2 | Step 18850 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:52:57] Epoch 2 | Step 18860 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:53:01] Epoch 2 | Step 18870 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:53:04] Epoch 2 | Step 18880 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:53:07] Epoch 2 | Step 18890 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:53:11] Epoch 2 | Step 18900 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:53:15] Epoch 2 | Step 18910 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:53:19] Epoch 2 | Step 18920 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:53:23] Epoch 2 | Step 18930 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:53:26] Epoch 2 | Step 18940 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:53:30] Epoch 2 | Step 18950 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:53:33] Epoch 2 | Step 18960 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:53:36] Epoch 2 | Step 18970 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:53:40] Epoch 2 | Step 18980 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:53:44] Epoch 2 | Step 18990 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:53:47] Epoch 2 | Step 19000 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:53:48] Validation | Batch 10/1567 | Loss: 0.9344 [2026-04-18 14:53:49] Validation | Batch 20/1567 | Loss: 0.9999 [2026-04-18 14:53:50] Validation | Batch 30/1567 | Loss: 1.0389 [2026-04-18 14:53:51] Validation | Batch 40/1567 | Loss: 1.0605 [2026-04-18 14:53:51] Validation | Batch 50/1567 | Loss: 1.0353 [2026-04-18 14:53:52] Validation | Batch 60/1567 | Loss: 1.0222 [2026-04-18 14:53:53] Validation | Batch 70/1567 | Loss: 1.0075 [2026-04-18 14:53:54] Validation | Batch 80/1567 | Loss: 1.0253 [2026-04-18 14:53:55] Validation | Batch 90/1567 | Loss: 1.0334 [2026-04-18 14:53:56] Validation | Batch 100/1567 | Loss: 1.0426 [2026-04-18 14:53:56] Validation | 
Batch 110/1567 | Loss: 1.0345 [2026-04-18 14:53:57] Validation | Batch 120/1567 | Loss: 1.0453 [2026-04-18 14:53:58] Validation | Batch 130/1567 | Loss: 1.0466 [2026-04-18 14:53:59] Validation | Batch 140/1567 | Loss: 1.0488 [2026-04-18 14:54:00] Validation | Batch 150/1567 | Loss: 1.0568 [2026-04-18 14:54:01] Validation | Batch 160/1567 | Loss: 1.0582 [2026-04-18 14:54:01] Validation | Batch 170/1567 | Loss: 1.0432 [2026-04-18 14:54:02] Validation | Batch 180/1567 | Loss: 1.0452 [2026-04-18 14:54:03] Validation | Batch 190/1567 | Loss: 1.0415 [2026-04-18 14:54:04] Validation | Batch 200/1567 | Loss: 1.0444 [2026-04-18 14:54:05] Validation | Batch 210/1567 | Loss: 1.0455 [2026-04-18 14:54:06] Validation | Batch 220/1567 | Loss: 1.0479 [2026-04-18 14:54:07] Validation | Batch 230/1567 | Loss: 1.0515 [2026-04-18 14:54:07] Validation | Batch 240/1567 | Loss: 1.0497 [2026-04-18 14:54:08] Validation | Batch 250/1567 | Loss: 1.0437 [2026-04-18 14:54:09] Validation | Batch 260/1567 | Loss: 1.0391 [2026-04-18 14:54:10] Validation | Batch 270/1567 | Loss: 1.0360 [2026-04-18 14:54:10] Validation | Batch 280/1567 | Loss: 1.0371 [2026-04-18 14:54:12] Validation | Batch 290/1567 | Loss: 1.0423 [2026-04-18 14:54:12] Validation | Batch 300/1567 | Loss: 1.0472 [2026-04-18 14:54:13] Validation | Batch 310/1567 | Loss: 1.0461 [2026-04-18 14:54:14] Validation | Batch 320/1567 | Loss: 1.0466 [2026-04-18 14:54:15] Validation | Batch 330/1567 | Loss: 1.0437 [2026-04-18 14:54:16] Validation | Batch 340/1567 | Loss: 1.0477 [2026-04-18 14:54:16] Validation | Batch 350/1567 | Loss: 1.0467 [2026-04-18 14:54:17] Validation | Batch 360/1567 | Loss: 1.0445 [2026-04-18 14:54:18] Validation | Batch 370/1567 | Loss: 1.0417 [2026-04-18 14:54:19] Validation | Batch 380/1567 | Loss: 1.0450 [2026-04-18 14:54:20] Validation | Batch 390/1567 | Loss: 1.0461 [2026-04-18 14:54:20] Validation | Batch 400/1567 | Loss: 1.0473 [2026-04-18 14:54:21] Validation | Batch 410/1567 | Loss: 1.0467 [2026-04-18 
14:54:22] Validation | Batch 420/1567 | Loss: 1.0462 [2026-04-18 14:54:23] Validation | Batch 430/1567 | Loss: 1.0462 [2026-04-18 14:54:24] Validation | Batch 440/1567 | Loss: 1.0451 [2026-04-18 14:54:24] Validation | Batch 450/1567 | Loss: 1.0451 [2026-04-18 14:54:25] Validation | Batch 460/1567 | Loss: 1.0442 [2026-04-18 14:54:26] Validation | Batch 470/1567 | Loss: 1.0434 [2026-04-18 14:54:27] Validation | Batch 480/1567 | Loss: 1.0413 [2026-04-18 14:54:28] Validation | Batch 490/1567 | Loss: 1.0412 [2026-04-18 14:54:28] Validation | Batch 500/1567 | Loss: 1.0408 [2026-04-18 14:54:29] Validation | Batch 510/1567 | Loss: 1.0431 [2026-04-18 14:54:30] Validation | Batch 520/1567 | Loss: 1.0449 [2026-04-18 14:54:31] Validation | Batch 530/1567 | Loss: 1.0445 [2026-04-18 14:54:32] Validation | Batch 540/1567 | Loss: 1.0472 [2026-04-18 14:54:33] Validation | Batch 550/1567 | Loss: 1.0507 [2026-04-18 14:54:33] Validation | Batch 560/1567 | Loss: 1.0505 [2026-04-18 14:54:34] Validation | Batch 570/1567 | Loss: 1.0505 [2026-04-18 14:54:35] Validation | Batch 580/1567 | Loss: 1.0496 [2026-04-18 14:54:36] Validation | Batch 590/1567 | Loss: 1.0483 [2026-04-18 14:54:37] Validation | Batch 600/1567 | Loss: 1.0466 [2026-04-18 14:54:38] Validation | Batch 610/1567 | Loss: 1.0455 [2026-04-18 14:54:39] Validation | Batch 620/1567 | Loss: 1.0469 [2026-04-18 14:54:40] Validation | Batch 630/1567 | Loss: 1.0449 [2026-04-18 14:54:40] Validation | Batch 640/1567 | Loss: 1.0465 [2026-04-18 14:54:41] Validation | Batch 650/1567 | Loss: 1.0456 [2026-04-18 14:54:42] Validation | Batch 660/1567 | Loss: 1.0444 [2026-04-18 14:54:43] Validation | Batch 670/1567 | Loss: 1.0424 [2026-04-18 14:54:44] Validation | Batch 680/1567 | Loss: 1.0419 [2026-04-18 14:54:44] Validation | Batch 690/1567 | Loss: 1.0427 [2026-04-18 14:54:45] Validation | Batch 700/1567 | Loss: 1.0413 [2026-04-18 14:54:46] Validation | Batch 710/1567 | Loss: 1.0426 [2026-04-18 14:54:47] Validation | Batch 720/1567 | Loss: 
1.0418 [2026-04-18 14:54:48] Validation | Batch 730/1567 | Loss: 1.0424 [2026-04-18 14:54:48] Validation | Batch 740/1567 | Loss: 1.0435 [2026-04-18 14:54:49] Validation | Batch 750/1567 | Loss: 1.0440 [2026-04-18 14:54:50] Validation | Batch 760/1567 | Loss: 1.0438 [2026-04-18 14:54:51] Validation | Batch 770/1567 | Loss: 1.0458 [2026-04-18 14:54:52] Validation | Batch 780/1567 | Loss: 1.0471 [2026-04-18 14:54:52] Validation | Batch 790/1567 | Loss: 1.0466 [2026-04-18 14:54:53] Validation | Batch 800/1567 | Loss: 1.0484 [2026-04-18 14:54:54] Validation | Batch 810/1567 | Loss: 1.0483 [2026-04-18 14:54:55] Validation | Batch 820/1567 | Loss: 1.0480 [2026-04-18 14:54:55] Validation | Batch 830/1567 | Loss: 1.0464 [2026-04-18 14:54:56] Validation | Batch 840/1567 | Loss: 1.0465 [2026-04-18 14:54:57] Validation | Batch 850/1567 | Loss: 1.0452 [2026-04-18 14:54:58] Validation | Batch 860/1567 | Loss: 1.0468 [2026-04-18 14:54:58] Validation | Batch 870/1567 | Loss: 1.0473 [2026-04-18 14:54:59] Validation | Batch 880/1567 | Loss: 1.0482 [2026-04-18 14:55:00] Validation | Batch 890/1567 | Loss: 1.0487 [2026-04-18 14:55:01] Validation | Batch 900/1567 | Loss: 1.0507 [2026-04-18 14:55:01] Validation | Batch 910/1567 | Loss: 1.0508 [2026-04-18 14:55:02] Validation | Batch 920/1567 | Loss: 1.0528 [2026-04-18 14:55:03] Validation | Batch 930/1567 | Loss: 1.0505 [2026-04-18 14:55:03] Validation | Batch 940/1567 | Loss: 1.0502 [2026-04-18 14:55:04] Validation | Batch 950/1567 | Loss: 1.0492 [2026-04-18 14:55:05] Validation | Batch 960/1567 | Loss: 1.0478 [2026-04-18 14:55:06] Validation | Batch 970/1567 | Loss: 1.0495 [2026-04-18 14:55:06] Validation | Batch 980/1567 | Loss: 1.0498 [2026-04-18 14:55:07] Validation | Batch 990/1567 | Loss: 1.0492 [2026-04-18 14:55:08] Validation | Batch 1000/1567 | Loss: 1.0496 [2026-04-18 14:55:09] Validation | Batch 1010/1567 | Loss: 1.0473 [2026-04-18 14:55:09] Validation | Batch 1020/1567 | Loss: 1.0476 [2026-04-18 14:55:10] Validation | 
Batch 1030/1567 | Loss: 1.0492 [2026-04-18 14:55:11] Validation | Batch 1040/1567 | Loss: 1.0487 [2026-04-18 14:55:12] Validation | Batch 1050/1567 | Loss: 1.0497 [2026-04-18 14:55:12] Validation | Batch 1060/1567 | Loss: 1.0488 [2026-04-18 14:55:13] Validation | Batch 1070/1567 | Loss: 1.0480 [2026-04-18 14:55:14] Validation | Batch 1080/1567 | Loss: 1.0489 [2026-04-18 14:55:14] Validation | Batch 1090/1567 | Loss: 1.0487 [2026-04-18 14:55:15] Validation | Batch 1100/1567 | Loss: 1.0492 [2026-04-18 14:55:16] Validation | Batch 1110/1567 | Loss: 1.0491 [2026-04-18 14:55:16] Validation | Batch 1120/1567 | Loss: 1.0493 [2026-04-18 14:55:17] Validation | Batch 1130/1567 | Loss: 1.0494 [2026-04-18 14:55:18] Validation | Batch 1140/1567 | Loss: 1.0502 [2026-04-18 14:55:19] Validation | Batch 1150/1567 | Loss: 1.0506 [2026-04-18 14:55:20] Validation | Batch 1160/1567 | Loss: 1.0515 [2026-04-18 14:55:21] Validation | Batch 1170/1567 | Loss: 1.0512 [2026-04-18 14:55:21] Validation | Batch 1180/1567 | Loss: 1.0508 [2026-04-18 14:55:22] Validation | Batch 1190/1567 | Loss: 1.0519 [2026-04-18 14:55:23] Validation | Batch 1200/1567 | Loss: 1.0513 [2026-04-18 14:55:24] Validation | Batch 1210/1567 | Loss: 1.0501 [2026-04-18 14:55:25] Validation | Batch 1220/1567 | Loss: 1.0505 [2026-04-18 14:55:25] Validation | Batch 1230/1567 | Loss: 1.0525 [2026-04-18 14:55:26] Validation | Batch 1240/1567 | Loss: 1.0513 [2026-04-18 14:55:27] Validation | Batch 1250/1567 | Loss: 1.0513 [2026-04-18 14:55:28] Validation | Batch 1260/1567 | Loss: 1.0523 [2026-04-18 14:55:29] Validation | Batch 1270/1567 | Loss: 1.0523 [2026-04-18 14:55:29] Validation | Batch 1280/1567 | Loss: 1.0517 [2026-04-18 14:55:31] Validation | Batch 1290/1567 | Loss: 1.0520 [2026-04-18 14:55:32] Validation | Batch 1300/1567 | Loss: 1.0523 [2026-04-18 14:55:32] Validation | Batch 1310/1567 | Loss: 1.0526 [2026-04-18 14:55:33] Validation | Batch 1320/1567 | Loss: 1.0517 [2026-04-18 14:55:34] Validation | Batch 1330/1567 | 
Loss: 1.0514 [2026-04-18 14:55:35] Validation | Batch 1340/1567 | Loss: 1.0512 [2026-04-18 14:55:35] Validation | Batch 1350/1567 | Loss: 1.0520 [2026-04-18 14:55:36] Validation | Batch 1360/1567 | Loss: 1.0516 [2026-04-18 14:55:37] Validation | Batch 1370/1567 | Loss: 1.0520 [2026-04-18 14:55:38] Validation | Batch 1380/1567 | Loss: 1.0532 [2026-04-18 14:55:38] Validation | Batch 1390/1567 | Loss: 1.0534 [2026-04-18 14:55:39] Validation | Batch 1400/1567 | Loss: 1.0537 [2026-04-18 14:55:40] Validation | Batch 1410/1567 | Loss: 1.0535 [2026-04-18 14:55:40] Validation | Batch 1420/1567 | Loss: 1.0541 [2026-04-18 14:55:41] Validation | Batch 1430/1567 | Loss: 1.0538 [2026-04-18 14:55:42] Validation | Batch 1440/1567 | Loss: 1.0541 [2026-04-18 14:55:43] Validation | Batch 1450/1567 | Loss: 1.0534 [2026-04-18 14:55:43] Validation | Batch 1460/1567 | Loss: 1.0532 [2026-04-18 14:55:44] Validation | Batch 1470/1567 | Loss: 1.0523 [2026-04-18 14:55:44] Validation | Batch 1480/1567 | Loss: 1.0507 [2026-04-18 14:55:45] Validation | Batch 1490/1567 | Loss: 1.0507 [2026-04-18 14:55:46] Validation | Batch 1500/1567 | Loss: 1.0509 [2026-04-18 14:55:46] Validation | Batch 1510/1567 | Loss: 1.0507 [2026-04-18 14:55:47] Validation | Batch 1520/1567 | Loss: 1.0500 [2026-04-18 14:55:48] Validation | Batch 1530/1567 | Loss: 1.0508 [2026-04-18 14:55:49] Validation | Batch 1540/1567 | Loss: 1.0518 [2026-04-18 14:55:49] Validation | Batch 1550/1567 | Loss: 1.0521 [2026-04-18 14:55:50] Validation | Batch 1560/1567 | Loss: 1.0512 [2026-04-18 14:55:51] Validation | Batch 1567/1567 | Loss: 1.0516 [2026-04-18 14:55:51] Validation | Loss: 1.0516 | PPL: 2.88 | Time: 124.06s [2026-04-18 14:55:55] New best model saved! 
Val loss: 1.0516 [2026-04-18 14:55:58] Epoch 2 | Step 19010 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:56:02] Epoch 2 | Step 19020 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:56:05] Epoch 2 | Step 19030 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:56:09] Epoch 2 | Step 19040 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:56:13] Epoch 2 | Step 19050 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:56:16] Epoch 2 | Step 19060 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:56:20] Epoch 2 | Step 19070 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:56:24] Epoch 2 | Step 19080 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:56:27] Epoch 2 | Step 19090 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:56:31] Epoch 2 | Step 19100 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:56:34] Epoch 2 | Step 19110 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:56:38] Epoch 2 | Step 19120 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:56:41] Epoch 2 | Step 19130 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:56:45] Epoch 2 | Step 19140 | Loss: 0.7934 | LR: 2.00e-06 [2026-04-18 14:56:49] Epoch 2 | Step 19150 | Loss: 0.7933 | LR: 2.00e-06 [2026-04-18 14:56:52] Epoch 2 | Step 19160 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:56:56] Epoch 2 | Step 19170 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:56:59] Epoch 2 | Step 19180 | Loss: 0.7932 | LR: 2.00e-06 [2026-04-18 14:57:03] Epoch 2 | Step 19190 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:57:06] Epoch 2 | Step 19200 | Loss: 0.7930 | LR: 2.00e-06 [2026-04-18 14:57:10] Epoch 2 | Step 19210 | Loss: 0.7931 | LR: 2.00e-06 [2026-04-18 14:57:14] Epoch 2 | Step 19220 | Loss: 0.7929 | LR: 2.00e-06 [2026-04-18 14:57:18] Epoch 2 | Step 19230 | Loss: 0.7929 | LR: 2.00e-06 [2026-04-18 14:57:22] Epoch 2 | Step 19240 | Loss: 0.7930 | LR: 2.00e-06 [2026-04-18 14:57:25] Epoch 2 | Step 19250 | Loss: 0.7930 | LR: 2.00e-06 [2026-04-18 14:57:29] Epoch 2 | Step 19260 | Loss: 0.7928 | LR: 2.00e-06 [2026-04-18 14:57:32] Epoch 2 | Step 19270 | Loss: 0.7927 | LR: 2.00e-06 [2026-04-18 
14:57:36] Epoch 2 | Step 19280 | Loss: 0.7927 | LR: 2.00e-06 [2026-04-18 14:57:39] Epoch 2 | Step 19290 | Loss: 0.7927 | LR: 2.00e-06 [2026-04-18 14:57:43] Epoch 2 | Step 19300 | Loss: 0.7926 | LR: 2.00e-06 [2026-04-18 14:57:46] Epoch 2 | Step 19310 | Loss: 0.7924 | LR: 2.00e-06 [2026-04-18 14:57:50] Epoch 2 | Step 19320 | Loss: 0.7924 | LR: 2.00e-06 [2026-04-18 14:57:53] Epoch 2 | Step 19330 | Loss: 0.7922 | LR: 2.00e-06 [2026-04-18 14:57:56] Epoch 2 | Step 19340 | Loss: 0.7923 | LR: 2.00e-06 [2026-04-18 14:58:00] Epoch 2 | Step 19350 | Loss: 0.7922 | LR: 2.00e-06 [2026-04-18 14:58:04] Epoch 2 | Step 19360 | Loss: 0.7922 | LR: 2.00e-06 [2026-04-18 14:58:07] Epoch 2 | Step 19370 | Loss: 0.7920 | LR: 2.00e-06 [2026-04-18 14:58:10] Epoch 2 | Step 19380 | Loss: 0.7921 | LR: 2.00e-06 [2026-04-18 14:58:14] Epoch 2 | Step 19390 | Loss: 0.7922 | LR: 2.00e-06 [2026-04-18 14:58:19] Epoch 2 | Step 19400 | Loss: 0.7923 | LR: 2.00e-06 [2026-04-18 14:58:23] Epoch 2 | Step 19410 | Loss: 0.7924 | LR: 2.00e-06 [2026-04-18 14:58:28] Epoch 2 | Step 19420 | Loss: 0.7923 | LR: 2.00e-06 [2026-04-18 14:58:31] Epoch 2 | Step 19430 | Loss: 0.7924 | LR: 2.00e-06 [2026-04-18 14:58:34] Epoch 2 | Step 19440 | Loss: 0.7926 | LR: 2.00e-06 [2026-04-18 14:58:38] Epoch 2 | Step 19450 | Loss: 0.7927 | LR: 2.00e-06 [2026-04-18 14:58:41] Epoch 2 | Step 19460 | Loss: 0.7926 | LR: 2.00e-06 [2026-04-18 14:58:45] Epoch 2 | Step 19470 | Loss: 0.7926 | LR: 2.00e-06 [2026-04-18 14:58:49] Epoch 2 | Step 19480 | Loss: 0.7925 | LR: 2.00e-06 [2026-04-18 14:58:52] Epoch 2 | Step 19490 | Loss: 0.7925 | LR: 2.00e-06 [2026-04-18 14:58:56] Epoch 2 | Step 19500 | Loss: 0.7925 | LR: 2.00e-06 [2026-04-18 14:58:59] Epoch 2 | Step 19510 | Loss: 0.7924 | LR: 2.00e-06 [2026-04-18 14:59:03] Epoch 2 | Step 19520 | Loss: 0.7924 | LR: 2.00e-06 [2026-04-18 14:59:07] Epoch 2 | Step 19530 | Loss: 0.7923 | LR: 2.00e-06 [2026-04-18 14:59:10] Epoch 2 | Step 19540 | Loss: 0.7923 | LR: 2.00e-06 [2026-04-18 14:59:14] Epoch 2 | Step 
19550 | Loss: 0.7923 | LR: 2.00e-06 [2026-04-18 14:59:17] Epoch 2 | Step 19560 | Loss: 0.7922 | LR: 2.00e-06 [2026-04-18 14:59:21] Epoch 2 | Step 19570 | Loss: 0.7922 | LR: 2.00e-06 [2026-04-18 14:59:25] Epoch 2 | Step 19580 | Loss: 0.7922 | LR: 2.00e-06 [2026-04-18 14:59:29] Epoch 2 | Step 19590 | Loss: 0.7921 | LR: 2.00e-06 [2026-04-18 14:59:33] Epoch 2 | Step 19600 | Loss: 0.7921 | LR: 2.00e-06 [2026-04-18 14:59:37] Epoch 2 | Step 19610 | Loss: 0.7921 | LR: 2.00e-06 [2026-04-18 14:59:41] Epoch 2 | Step 19620 | Loss: 0.7921 | LR: 2.00e-06 [2026-04-18 14:59:44] Epoch 2 | Step 19630 | Loss: 0.7921 | LR: 2.00e-06 [2026-04-18 14:59:47] Epoch 2 | Step 19640 | Loss: 0.7921 | LR: 2.00e-06 [2026-04-18 14:59:51] Epoch 2 | Step 19650 | Loss: 0.7921 | LR: 2.00e-06 [2026-04-18 14:59:54] Epoch 2 | Step 19660 | Loss: 0.7920 | LR: 2.00e-06 [2026-04-18 14:59:58] Epoch 2 | Step 19670 | Loss: 0.7921 | LR: 2.00e-06 [2026-04-18 15:00:02] Epoch 2 | Step 19680 | Loss: 0.7920 | LR: 2.00e-06 [2026-04-18 15:00:06] Epoch 2 | Step 19690 | Loss: 0.7920 | LR: 2.00e-06 [2026-04-18 15:00:09] Epoch 2 | Step 19700 | Loss: 0.7920 | LR: 2.00e-06 [2026-04-18 15:00:13] Epoch 2 | Step 19710 | Loss: 0.7921 | LR: 2.00e-06 [2026-04-18 15:00:17] Epoch 2 | Step 19720 | Loss: 0.7920 | LR: 2.00e-06 [2026-04-18 15:00:20] Epoch 2 | Step 19730 | Loss: 0.7920 | LR: 2.00e-06 [2026-04-18 15:00:24] Epoch 2 | Step 19740 | Loss: 0.7920 | LR: 2.00e-06 [2026-04-18 15:00:27] Epoch 2 | Step 19750 | Loss: 0.7920 | LR: 2.00e-06 [2026-04-18 15:00:31] Epoch 2 | Step 19760 | Loss: 0.7920 | LR: 2.00e-06 [2026-04-18 15:00:35] Epoch 2 | Step 19770 | Loss: 0.7919 | LR: 2.00e-06 [2026-04-18 15:00:36] Epoch 2 completed in 4898.76s | Loss: 0.7919 [2026-04-18 15:00:46] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_19774.pt [2026-04-18 15:01:00] ============================================================ [2026-04-18 15:01:00] EPOCH 3/3 [2026-04-18 15:01:00] 
============================================================ [2026-04-18 15:01:03] Epoch 3 | Step 19780 | Loss: 0.7412 | LR: 2.00e-06 [2026-04-18 15:01:07] Epoch 3 | Step 19790 | Loss: 0.7473 | LR: 2.00e-06 [2026-04-18 15:01:10] Epoch 3 | Step 19800 | Loss: 0.7293 | LR: 2.00e-06 [2026-04-18 15:01:13] Epoch 3 | Step 19810 | Loss: 0.7257 | LR: 2.00e-06 [2026-04-18 15:01:17] Epoch 3 | Step 19820 | Loss: 0.7120 | LR: 2.00e-06 [2026-04-18 15:01:20] Epoch 3 | Step 19830 | Loss: 0.6957 | LR: 2.00e-06 [2026-04-18 15:01:24] Epoch 3 | Step 19840 | Loss: 0.7080 | LR: 2.00e-06 [2026-04-18 15:01:27] Epoch 3 | Step 19850 | Loss: 0.6971 | LR: 2.00e-06 [2026-04-18 15:01:31] Epoch 3 | Step 19860 | Loss: 0.7064 | LR: 2.00e-06 [2026-04-18 15:01:34] Epoch 3 | Step 19870 | Loss: 0.7122 | LR: 2.00e-06 [2026-04-18 15:01:38] Epoch 3 | Step 19880 | Loss: 0.7056 | LR: 2.00e-06 [2026-04-18 15:01:42] Epoch 3 | Step 19890 | Loss: 0.7076 | LR: 2.00e-06 [2026-04-18 15:01:45] Epoch 3 | Step 19900 | Loss: 0.7054 | LR: 2.00e-06 [2026-04-18 15:01:49] Epoch 3 | Step 19910 | Loss: 0.7072 | LR: 2.00e-06 [2026-04-18 15:01:52] Epoch 3 | Step 19920 | Loss: 0.7041 | LR: 2.00e-06 [2026-04-18 15:01:55] Epoch 3 | Step 19930 | Loss: 0.7011 | LR: 2.00e-06 [2026-04-18 15:01:59] Epoch 3 | Step 19940 | Loss: 0.7033 | LR: 2.00e-06 [2026-04-18 15:02:03] Epoch 3 | Step 19950 | Loss: 0.7055 | LR: 2.00e-06 [2026-04-18 15:02:07] Epoch 3 | Step 19960 | Loss: 0.7037 | LR: 2.00e-06 [2026-04-18 15:02:10] Epoch 3 | Step 19970 | Loss: 0.7071 | LR: 2.00e-06 [2026-04-18 15:02:14] Epoch 3 | Step 19980 | Loss: 0.7098 | LR: 2.00e-06 [2026-04-18 15:02:18] Epoch 3 | Step 19990 | Loss: 0.7063 | LR: 2.00e-06 [2026-04-18 15:02:22] Epoch 3 | Step 20000 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:02:23] Validation | Batch 10/1567 | Loss: 0.9358 [2026-04-18 15:02:24] Validation | Batch 20/1567 | Loss: 1.0011 [2026-04-18 15:02:25] Validation | Batch 30/1567 | Loss: 1.0396 [2026-04-18 15:02:26] Validation | Batch 40/1567 | Loss: 1.0617 
[2026-04-18 15:02:26] Validation | Batch 50/1567 | Loss: 1.0375 [2026-04-18 15:02:27] Validation | Batch 60/1567 | Loss: 1.0244 [2026-04-18 15:02:28] Validation | Batch 70/1567 | Loss: 1.0097 [2026-04-18 15:02:29] Validation | Batch 80/1567 | Loss: 1.0276 [2026-04-18 15:02:30] Validation | Batch 90/1567 | Loss: 1.0355 [2026-04-18 15:02:31] Validation | Batch 100/1567 | Loss: 1.0453 [2026-04-18 15:02:31] Validation | Batch 110/1567 | Loss: 1.0372 [2026-04-18 15:02:32] Validation | Batch 120/1567 | Loss: 1.0482 [2026-04-18 15:02:33] Validation | Batch 130/1567 | Loss: 1.0498 [2026-04-18 15:02:34] Validation | Batch 140/1567 | Loss: 1.0520 [2026-04-18 15:02:35] Validation | Batch 150/1567 | Loss: 1.0599 [2026-04-18 15:02:36] Validation | Batch 160/1567 | Loss: 1.0612 [2026-04-18 15:02:36] Validation | Batch 170/1567 | Loss: 1.0459 [2026-04-18 15:02:37] Validation | Batch 180/1567 | Loss: 1.0479 [2026-04-18 15:02:38] Validation | Batch 190/1567 | Loss: 1.0444 [2026-04-18 15:02:39] Validation | Batch 200/1567 | Loss: 1.0473 [2026-04-18 15:02:40] Validation | Batch 210/1567 | Loss: 1.0484 [2026-04-18 15:02:41] Validation | Batch 220/1567 | Loss: 1.0506 [2026-04-18 15:02:42] Validation | Batch 230/1567 | Loss: 1.0544 [2026-04-18 15:02:42] Validation | Batch 240/1567 | Loss: 1.0527 [2026-04-18 15:02:43] Validation | Batch 250/1567 | Loss: 1.0466 [2026-04-18 15:02:44] Validation | Batch 260/1567 | Loss: 1.0420 [2026-04-18 15:02:44] Validation | Batch 270/1567 | Loss: 1.0390 [2026-04-18 15:02:45] Validation | Batch 280/1567 | Loss: 1.0402 [2026-04-18 15:02:46] Validation | Batch 290/1567 | Loss: 1.0454 [2026-04-18 15:02:47] Validation | Batch 300/1567 | Loss: 1.0504 [2026-04-18 15:02:47] Validation | Batch 310/1567 | Loss: 1.0493 [2026-04-18 15:02:48] Validation | Batch 320/1567 | Loss: 1.0499 [2026-04-18 15:02:49] Validation | Batch 330/1567 | Loss: 1.0469 [2026-04-18 15:02:50] Validation | Batch 340/1567 | Loss: 1.0510 [2026-04-18 15:02:51] Validation | Batch 350/1567 | 
Loss: 1.0500 [2026-04-18 15:02:52] Validation | Batch 360/1567 | Loss: 1.0479 [2026-04-18 15:02:52] Validation | Batch 370/1567 | Loss: 1.0451 [2026-04-18 15:02:53] Validation | Batch 380/1567 | Loss: 1.0484 [2026-04-18 15:02:54] Validation | Batch 390/1567 | Loss: 1.0495 [2026-04-18 15:02:55] Validation | Batch 400/1567 | Loss: 1.0507 [2026-04-18 15:02:56] Validation | Batch 410/1567 | Loss: 1.0501 [2026-04-18 15:02:56] Validation | Batch 420/1567 | Loss: 1.0497 [2026-04-18 15:02:57] Validation | Batch 430/1567 | Loss: 1.0496 [2026-04-18 15:02:58] Validation | Batch 440/1567 | Loss: 1.0486 [2026-04-18 15:02:59] Validation | Batch 450/1567 | Loss: 1.0486 [2026-04-18 15:03:00] Validation | Batch 460/1567 | Loss: 1.0476 [2026-04-18 15:03:01] Validation | Batch 470/1567 | Loss: 1.0469 [2026-04-18 15:03:01] Validation | Batch 480/1567 | Loss: 1.0448 [2026-04-18 15:03:02] Validation | Batch 490/1567 | Loss: 1.0446 [2026-04-18 15:03:03] Validation | Batch 500/1567 | Loss: 1.0442 [2026-04-18 15:03:04] Validation | Batch 510/1567 | Loss: 1.0465 [2026-04-18 15:03:04] Validation | Batch 520/1567 | Loss: 1.0482 [2026-04-18 15:03:05] Validation | Batch 530/1567 | Loss: 1.0479 [2026-04-18 15:03:06] Validation | Batch 540/1567 | Loss: 1.0506 [2026-04-18 15:03:07] Validation | Batch 550/1567 | Loss: 1.0542 [2026-04-18 15:03:08] Validation | Batch 560/1567 | Loss: 1.0539 [2026-04-18 15:03:09] Validation | Batch 570/1567 | Loss: 1.0539 [2026-04-18 15:03:10] Validation | Batch 580/1567 | Loss: 1.0531 [2026-04-18 15:03:11] Validation | Batch 590/1567 | Loss: 1.0517 [2026-04-18 15:03:11] Validation | Batch 600/1567 | Loss: 1.0499 [2026-04-18 15:03:12] Validation | Batch 610/1567 | Loss: 1.0489 [2026-04-18 15:03:13] Validation | Batch 620/1567 | Loss: 1.0503 [2026-04-18 15:03:14] Validation | Batch 630/1567 | Loss: 1.0482 [2026-04-18 15:03:15] Validation | Batch 640/1567 | Loss: 1.0498 [2026-04-18 15:03:16] Validation | Batch 650/1567 | Loss: 1.0490 [2026-04-18 15:03:16] Validation | 
Batch 660/1567 | Loss: 1.0478 [2026-04-18 15:03:17] Validation | Batch 670/1567 | Loss: 1.0458 [2026-04-18 15:03:18] Validation | Batch 680/1567 | Loss: 1.0452 [2026-04-18 15:03:19] Validation | Batch 690/1567 | Loss: 1.0461 [2026-04-18 15:03:20] Validation | Batch 700/1567 | Loss: 1.0447 [2026-04-18 15:03:21] Validation | Batch 710/1567 | Loss: 1.0460 [2026-04-18 15:03:21] Validation | Batch 720/1567 | Loss: 1.0452 [2026-04-18 15:03:22] Validation | Batch 730/1567 | Loss: 1.0458 [2026-04-18 15:03:23] Validation | Batch 740/1567 | Loss: 1.0469 [2026-04-18 15:03:24] Validation | Batch 750/1567 | Loss: 1.0474 [2026-04-18 15:03:24] Validation | Batch 760/1567 | Loss: 1.0472 [2026-04-18 15:03:25] Validation | Batch 770/1567 | Loss: 1.0492 [2026-04-18 15:03:26] Validation | Batch 780/1567 | Loss: 1.0505 [2026-04-18 15:03:27] Validation | Batch 790/1567 | Loss: 1.0499 [2026-04-18 15:03:28] Validation | Batch 800/1567 | Loss: 1.0518 [2026-04-18 15:03:29] Validation | Batch 810/1567 | Loss: 1.0517 [2026-04-18 15:03:29] Validation | Batch 820/1567 | Loss: 1.0513 [2026-04-18 15:03:30] Validation | Batch 830/1567 | Loss: 1.0498 [2026-04-18 15:03:31] Validation | Batch 840/1567 | Loss: 1.0499 [2026-04-18 15:03:32] Validation | Batch 850/1567 | Loss: 1.0486 [2026-04-18 15:03:32] Validation | Batch 860/1567 | Loss: 1.0501 [2026-04-18 15:03:33] Validation | Batch 870/1567 | Loss: 1.0506 [2026-04-18 15:03:34] Validation | Batch 880/1567 | Loss: 1.0515 [2026-04-18 15:03:34] Validation | Batch 890/1567 | Loss: 1.0520 [2026-04-18 15:03:35] Validation | Batch 900/1567 | Loss: 1.0540 [2026-04-18 15:03:36] Validation | Batch 910/1567 | Loss: 1.0541 [2026-04-18 15:03:37] Validation | Batch 920/1567 | Loss: 1.0563 [2026-04-18 15:03:37] Validation | Batch 930/1567 | Loss: 1.0539 [2026-04-18 15:03:38] Validation | Batch 940/1567 | Loss: 1.0536 [2026-04-18 15:03:39] Validation | Batch 950/1567 | Loss: 1.0525 [2026-04-18 15:03:40] Validation | Batch 960/1567 | Loss: 1.0512 [2026-04-18 
15:03:40] Validation | Batch 970/1567 | Loss: 1.0528 [2026-04-18 15:03:41] Validation | Batch 980/1567 | Loss: 1.0532 [2026-04-18 15:03:42] Validation | Batch 990/1567 | Loss: 1.0526 [2026-04-18 15:03:42] Validation | Batch 1000/1567 | Loss: 1.0530 [2026-04-18 15:03:43] Validation | Batch 1010/1567 | Loss: 1.0507 [2026-04-18 15:03:44] Validation | Batch 1020/1567 | Loss: 1.0510 [2026-04-18 15:03:45] Validation | Batch 1030/1567 | Loss: 1.0526 [2026-04-18 15:03:46] Validation | Batch 1040/1567 | Loss: 1.0521 [2026-04-18 15:03:46] Validation | Batch 1050/1567 | Loss: 1.0531 [2026-04-18 15:03:47] Validation | Batch 1060/1567 | Loss: 1.0522 [2026-04-18 15:03:48] Validation | Batch 1070/1567 | Loss: 1.0515 [2026-04-18 15:03:49] Validation | Batch 1080/1567 | Loss: 1.0524 [2026-04-18 15:03:50] Validation | Batch 1090/1567 | Loss: 1.0522 [2026-04-18 15:03:50] Validation | Batch 1100/1567 | Loss: 1.0528 [2026-04-18 15:03:51] Validation | Batch 1110/1567 | Loss: 1.0526 [2026-04-18 15:03:51] Validation | Batch 1120/1567 | Loss: 1.0529 [2026-04-18 15:03:52] Validation | Batch 1130/1567 | Loss: 1.0529 [2026-04-18 15:03:53] Validation | Batch 1140/1567 | Loss: 1.0537 [2026-04-18 15:03:54] Validation | Batch 1150/1567 | Loss: 1.0541 [2026-04-18 15:03:55] Validation | Batch 1160/1567 | Loss: 1.0550 [2026-04-18 15:03:56] Validation | Batch 1170/1567 | Loss: 1.0547 [2026-04-18 15:03:57] Validation | Batch 1180/1567 | Loss: 1.0543 [2026-04-18 15:03:57] Validation | Batch 1190/1567 | Loss: 1.0554 [2026-04-18 15:03:58] Validation | Batch 1200/1567 | Loss: 1.0548 [2026-04-18 15:03:59] Validation | Batch 1210/1567 | Loss: 1.0536 [2026-04-18 15:04:00] Validation | Batch 1220/1567 | Loss: 1.0540 [2026-04-18 15:04:00] Validation | Batch 1230/1567 | Loss: 1.0561 [2026-04-18 15:04:01] Validation | Batch 1240/1567 | Loss: 1.0549 [2026-04-18 15:04:02] Validation | Batch 1250/1567 | Loss: 1.0549 [2026-04-18 15:04:03] Validation | Batch 1260/1567 | Loss: 1.0558 [2026-04-18 15:04:04] Validation | 
Batch 1270/1567 | Loss: 1.0558 [2026-04-18 15:04:05] Validation | Batch 1280/1567 | Loss: 1.0552 [2026-04-18 15:04:06] Validation | Batch 1290/1567 | Loss: 1.0555 [2026-04-18 15:04:07] Validation | Batch 1300/1567 | Loss: 1.0558 [2026-04-18 15:04:07] Validation | Batch 1310/1567 | Loss: 1.0562 [2026-04-18 15:04:08] Validation | Batch 1320/1567 | Loss: 1.0553 [2026-04-18 15:04:09] Validation | Batch 1330/1567 | Loss: 1.0549 [2026-04-18 15:04:10] Validation | Batch 1340/1567 | Loss: 1.0547 [2026-04-18 15:04:10] Validation | Batch 1350/1567 | Loss: 1.0556 [2026-04-18 15:04:11] Validation | Batch 1360/1567 | Loss: 1.0552 [2026-04-18 15:04:12] Validation | Batch 1370/1567 | Loss: 1.0556 [2026-04-18 15:04:13] Validation | Batch 1380/1567 | Loss: 1.0568 [2026-04-18 15:04:13] Validation | Batch 1390/1567 | Loss: 1.0569 [2026-04-18 15:04:14] Validation | Batch 1400/1567 | Loss: 1.0573 [2026-04-18 15:04:15] Validation | Batch 1410/1567 | Loss: 1.0571 [2026-04-18 15:04:15] Validation | Batch 1420/1567 | Loss: 1.0577 [2026-04-18 15:04:16] Validation | Batch 1430/1567 | Loss: 1.0574 [2026-04-18 15:04:17] Validation | Batch 1440/1567 | Loss: 1.0577 [2026-04-18 15:04:18] Validation | Batch 1450/1567 | Loss: 1.0570 [2026-04-18 15:04:18] Validation | Batch 1460/1567 | Loss: 1.0568 [2026-04-18 15:04:19] Validation | Batch 1470/1567 | Loss: 1.0559 [2026-04-18 15:04:20] Validation | Batch 1480/1567 | Loss: 1.0543 [2026-04-18 15:04:20] Validation | Batch 1490/1567 | Loss: 1.0543 [2026-04-18 15:04:22] Validation | Batch 1500/1567 | Loss: 1.0544 [2026-04-18 15:04:23] Validation | Batch 1510/1567 | Loss: 1.0542 [2026-04-18 15:04:23] Validation | Batch 1520/1567 | Loss: 1.0535 [2026-04-18 15:04:24] Validation | Batch 1530/1567 | Loss: 1.0543 [2026-04-18 15:04:25] Validation | Batch 1540/1567 | Loss: 1.0553 [2026-04-18 15:04:26] Validation | Batch 1550/1567 | Loss: 1.0556 [2026-04-18 15:04:27] Validation | Batch 1560/1567 | Loss: 1.0547 [2026-04-18 15:04:27] Validation | Batch 1567/1567 | 
Loss: 1.0551 [2026-04-18 15:04:27] Validation | Loss: 1.0551 | PPL: 2.89 | Time: 125.38s [2026-04-18 15:04:31] Epoch 3 | Step 20010 | Loss: 0.7004 | LR: 2.00e-06 [2026-04-18 15:04:35] Epoch 3 | Step 20020 | Loss: 0.7012 | LR: 2.00e-06 [2026-04-18 15:04:38] Epoch 3 | Step 20030 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 15:04:42] Epoch 3 | Step 20040 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:04:45] Epoch 3 | Step 20050 | Loss: 0.7006 | LR: 2.00e-06 [2026-04-18 15:04:49] Epoch 3 | Step 20060 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:04:52] Epoch 3 | Step 20070 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:04:55] Epoch 3 | Step 20080 | Loss: 0.6953 | LR: 2.00e-06 [2026-04-18 15:04:59] Epoch 3 | Step 20090 | Loss: 0.6948 | LR: 2.00e-06 [2026-04-18 15:05:02] Epoch 3 | Step 20100 | Loss: 0.6933 | LR: 2.00e-06 [2026-04-18 15:05:06] Epoch 3 | Step 20110 | Loss: 0.6945 | LR: 2.00e-06 [2026-04-18 15:05:09] Epoch 3 | Step 20120 | Loss: 0.6968 | LR: 2.00e-06 [2026-04-18 15:05:13] Epoch 3 | Step 20130 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:05:16] Epoch 3 | Step 20140 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:05:20] Epoch 3 | Step 20150 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:05:23] Epoch 3 | Step 20160 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:05:28] Epoch 3 | Step 20170 | Loss: 0.7006 | LR: 2.00e-06 [2026-04-18 15:05:31] Epoch 3 | Step 20180 | Loss: 0.7031 | LR: 2.00e-06 [2026-04-18 15:05:35] Epoch 3 | Step 20190 | Loss: 0.7059 | LR: 2.00e-06 [2026-04-18 15:05:39] Epoch 3 | Step 20200 | Loss: 0.7088 | LR: 2.00e-06 [2026-04-18 15:05:42] Epoch 3 | Step 20210 | Loss: 0.7080 | LR: 2.00e-06 [2026-04-18 15:05:46] Epoch 3 | Step 20220 | Loss: 0.7110 | LR: 2.00e-06 [2026-04-18 15:05:49] Epoch 3 | Step 20230 | Loss: 0.7086 | LR: 2.00e-06 [2026-04-18 15:05:53] Epoch 3 | Step 20240 | Loss: 0.7079 | LR: 2.00e-06 [2026-04-18 15:05:56] Epoch 3 | Step 20250 | Loss: 0.7054 | LR: 2.00e-06 [2026-04-18 15:05:59] Epoch 3 | Step 20260 | Loss: 0.7043 | LR: 2.00e-06 [2026-04-18 
15:06:02] Epoch 3 | Step 20270 | Loss: 0.7038 | LR: 2.00e-06 [2026-04-18 15:06:06] Epoch 3 | Step 20280 | Loss: 0.7041 | LR: 2.00e-06 [2026-04-18 15:06:09] Epoch 3 | Step 20290 | Loss: 0.7042 | LR: 2.00e-06 [2026-04-18 15:06:13] Epoch 3 | Step 20300 | Loss: 0.7035 | LR: 2.00e-06 [2026-04-18 15:06:16] Epoch 3 | Step 20310 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:06:20] Epoch 3 | Step 20320 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:06:23] Epoch 3 | Step 20330 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 15:06:27] Epoch 3 | Step 20340 | Loss: 0.7002 | LR: 2.00e-06 [2026-04-18 15:06:30] Epoch 3 | Step 20350 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 15:06:34] Epoch 3 | Step 20360 | Loss: 0.6987 | LR: 2.00e-06 [2026-04-18 15:06:37] Epoch 3 | Step 20370 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:06:41] Epoch 3 | Step 20380 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:06:44] Epoch 3 | Step 20390 | Loss: 0.7004 | LR: 2.00e-06 [2026-04-18 15:06:48] Epoch 3 | Step 20400 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:06:51] Epoch 3 | Step 20410 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:06:55] Epoch 3 | Step 20420 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:06:58] Epoch 3 | Step 20430 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:07:02] Epoch 3 | Step 20440 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:07:06] Epoch 3 | Step 20450 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 15:07:09] Epoch 3 | Step 20460 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 15:07:13] Epoch 3 | Step 20470 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 15:07:16] Epoch 3 | Step 20480 | Loss: 0.7013 | LR: 2.00e-06 [2026-04-18 15:07:20] Epoch 3 | Step 20490 | Loss: 0.7012 | LR: 2.00e-06 [2026-04-18 15:07:24] Epoch 3 | Step 20500 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 15:07:27] Epoch 3 | Step 20510 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 15:07:31] Epoch 3 | Step 20520 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:07:34] Epoch 3 | Step 20530 | Loss: 0.7037 | LR: 2.00e-06 [2026-04-18 15:07:37] Epoch 3 | Step 
20540 | Loss: 0.7051 | LR: 2.00e-06 [2026-04-18 15:07:41] Epoch 3 | Step 20550 | Loss: 0.7047 | LR: 2.00e-06 [2026-04-18 15:07:45] Epoch 3 | Step 20560 | Loss: 0.7041 | LR: 2.00e-06 [2026-04-18 15:07:48] Epoch 3 | Step 20570 | Loss: 0.7052 | LR: 2.00e-06 [2026-04-18 15:07:52] Epoch 3 | Step 20580 | Loss: 0.7064 | LR: 2.00e-06 [2026-04-18 15:07:55] Epoch 3 | Step 20590 | Loss: 0.7074 | LR: 2.00e-06 [2026-04-18 15:07:59] Epoch 3 | Step 20600 | Loss: 0.7080 | LR: 2.00e-06 [2026-04-18 15:08:02] Epoch 3 | Step 20610 | Loss: 0.7075 | LR: 2.00e-06 [2026-04-18 15:08:06] Epoch 3 | Step 20620 | Loss: 0.7083 | LR: 2.00e-06 [2026-04-18 15:08:10] Epoch 3 | Step 20630 | Loss: 0.7105 | LR: 2.00e-06 [2026-04-18 15:08:13] Epoch 3 | Step 20640 | Loss: 0.7102 | LR: 2.00e-06 [2026-04-18 15:08:17] Epoch 3 | Step 20650 | Loss: 0.7092 | LR: 2.00e-06 [2026-04-18 15:08:20] Epoch 3 | Step 20660 | Loss: 0.7087 | LR: 2.00e-06 [2026-04-18 15:08:24] Epoch 3 | Step 20670 | Loss: 0.7080 | LR: 2.00e-06 [2026-04-18 15:08:27] Epoch 3 | Step 20680 | Loss: 0.7072 | LR: 2.00e-06 [2026-04-18 15:08:31] Epoch 3 | Step 20690 | Loss: 0.7080 | LR: 2.00e-06 [2026-04-18 15:08:34] Epoch 3 | Step 20700 | Loss: 0.7085 | LR: 2.00e-06 [2026-04-18 15:08:38] Epoch 3 | Step 20710 | Loss: 0.7082 | LR: 2.00e-06 [2026-04-18 15:08:41] Epoch 3 | Step 20720 | Loss: 0.7083 | LR: 2.00e-06 [2026-04-18 15:08:45] Epoch 3 | Step 20730 | Loss: 0.7073 | LR: 2.00e-06 [2026-04-18 15:08:49] Epoch 3 | Step 20740 | Loss: 0.7079 | LR: 2.00e-06 [2026-04-18 15:08:53] Epoch 3 | Step 20750 | Loss: 0.7075 | LR: 2.00e-06 [2026-04-18 15:08:56] Epoch 3 | Step 20760 | Loss: 0.7074 | LR: 2.00e-06 [2026-04-18 15:09:00] Epoch 3 | Step 20770 | Loss: 0.7067 | LR: 2.00e-06 [2026-04-18 15:09:03] Epoch 3 | Step 20780 | Loss: 0.7057 | LR: 2.00e-06 [2026-04-18 15:09:07] Epoch 3 | Step 20790 | Loss: 0.7055 | LR: 2.00e-06 [2026-04-18 15:09:11] Epoch 3 | Step 20800 | Loss: 0.7050 | LR: 2.00e-06 [2026-04-18 15:09:14] Epoch 3 | Step 20810 | Loss: 0.7038 | LR: 
2.00e-06 [2026-04-18 15:09:18] Epoch 3 | Step 20820 | Loss: 0.7028 | LR: 2.00e-06 [2026-04-18 15:09:21] Epoch 3 | Step 20830 | Loss: 0.7030 | LR: 2.00e-06 [2026-04-18 15:09:25] Epoch 3 | Step 20840 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:09:28] Epoch 3 | Step 20850 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:09:31] Epoch 3 | Step 20860 | Loss: 0.7030 | LR: 2.00e-06 [2026-04-18 15:09:35] Epoch 3 | Step 20870 | Loss: 0.7030 | LR: 2.00e-06 [2026-04-18 15:09:38] Epoch 3 | Step 20880 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:09:42] Epoch 3 | Step 20890 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:09:46] Epoch 3 | Step 20900 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:09:50] Epoch 3 | Step 20910 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 15:09:53] Epoch 3 | Step 20920 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:09:56] Epoch 3 | Step 20930 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:10:00] Epoch 3 | Step 20940 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 15:10:04] Epoch 3 | Step 20950 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 15:10:08] Epoch 3 | Step 20960 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:10:11] Epoch 3 | Step 20970 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:10:15] Epoch 3 | Step 20980 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:10:18] Epoch 3 | Step 20990 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 15:10:21] Epoch 3 | Step 21000 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 15:10:30] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_21000.pt [2026-04-18 15:10:46] Validation | Batch 10/1567 | Loss: 0.9415 [2026-04-18 15:10:47] Validation | Batch 20/1567 | Loss: 1.0097 [2026-04-18 15:10:48] Validation | Batch 30/1567 | Loss: 1.0489 [2026-04-18 15:10:49] Validation | Batch 40/1567 | Loss: 1.0725 [2026-04-18 15:10:49] Validation | Batch 50/1567 | Loss: 1.0477 [2026-04-18 15:10:51] Validation | Batch 60/1567 | Loss: 1.0341 [2026-04-18 15:10:51] Validation | Batch 70/1567 | Loss: 1.0199 [2026-04-18 15:10:52] Validation | Batch 80/1567 | 
Loss: 1.0378 [2026-04-18 15:10:53] Validation | Batch 90/1567 | Loss: 1.0457 [2026-04-18 15:10:54] Validation | Batch 100/1567 | Loss: 1.0561 [2026-04-18 15:10:55] Validation | Batch 110/1567 | Loss: 1.0478 [2026-04-18 15:10:56] Validation | Batch 120/1567 | Loss: 1.0591 [2026-04-18 15:10:57] Validation | Batch 130/1567 | Loss: 1.0604 [2026-04-18 15:10:57] Validation | Batch 140/1567 | Loss: 1.0626 [2026-04-18 15:10:58] Validation | Batch 150/1567 | Loss: 1.0704 [2026-04-18 15:10:59] Validation | Batch 160/1567 | Loss: 1.0714 [2026-04-18 15:11:00] Validation | Batch 170/1567 | Loss: 1.0559 [2026-04-18 15:11:00] Validation | Batch 180/1567 | Loss: 1.0583 [2026-04-18 15:11:01] Validation | Batch 190/1567 | Loss: 1.0545 [2026-04-18 15:11:02] Validation | Batch 200/1567 | Loss: 1.0573 [2026-04-18 15:11:03] Validation | Batch 210/1567 | Loss: 1.0581 [2026-04-18 15:11:04] Validation | Batch 220/1567 | Loss: 1.0604 [2026-04-18 15:11:05] Validation | Batch 230/1567 | Loss: 1.0643 [2026-04-18 15:11:06] Validation | Batch 240/1567 | Loss: 1.0626 [2026-04-18 15:11:06] Validation | Batch 250/1567 | Loss: 1.0564 [2026-04-18 15:11:07] Validation | Batch 260/1567 | Loss: 1.0517 [2026-04-18 15:11:08] Validation | Batch 270/1567 | Loss: 1.0488 [2026-04-18 15:11:09] Validation | Batch 280/1567 | Loss: 1.0499 [2026-04-18 15:11:10] Validation | Batch 290/1567 | Loss: 1.0553 [2026-04-18 15:11:11] Validation | Batch 300/1567 | Loss: 1.0602 [2026-04-18 15:11:11] Validation | Batch 310/1567 | Loss: 1.0590 [2026-04-18 15:11:12] Validation | Batch 320/1567 | Loss: 1.0597 [2026-04-18 15:11:13] Validation | Batch 330/1567 | Loss: 1.0568 [2026-04-18 15:11:14] Validation | Batch 340/1567 | Loss: 1.0610 [2026-04-18 15:11:15] Validation | Batch 350/1567 | Loss: 1.0598 [2026-04-18 15:11:15] Validation | Batch 360/1567 | Loss: 1.0575 [2026-04-18 15:11:16] Validation | Batch 370/1567 | Loss: 1.0548 [2026-04-18 15:11:17] Validation | Batch 380/1567 | Loss: 1.0579 [2026-04-18 15:11:18] Validation | 
Batch 390/1567 | Loss: 1.0591 [2026-04-18 15:11:18] Validation | Batch 400/1567 | Loss: 1.0604 [2026-04-18 15:11:19] Validation | Batch 410/1567 | Loss: 1.0596 [2026-04-18 15:11:20] Validation | Batch 420/1567 | Loss: 1.0592 [2026-04-18 15:11:21] Validation | Batch 430/1567 | Loss: 1.0591 [2026-04-18 15:11:22] Validation | Batch 440/1567 | Loss: 1.0579 [2026-04-18 15:11:23] Validation | Batch 450/1567 | Loss: 1.0580 [2026-04-18 15:11:24] Validation | Batch 460/1567 | Loss: 1.0569 [2026-04-18 15:11:24] Validation | Batch 470/1567 | Loss: 1.0563 [2026-04-18 15:11:25] Validation | Batch 480/1567 | Loss: 1.0541 [2026-04-18 15:11:26] Validation | Batch 490/1567 | Loss: 1.0539 [2026-04-18 15:11:27] Validation | Batch 500/1567 | Loss: 1.0534 [2026-04-18 15:11:28] Validation | Batch 510/1567 | Loss: 1.0558 [2026-04-18 15:11:28] Validation | Batch 520/1567 | Loss: 1.0575 [2026-04-18 15:11:29] Validation | Batch 530/1567 | Loss: 1.0572 [2026-04-18 15:11:30] Validation | Batch 540/1567 | Loss: 1.0599 [2026-04-18 15:11:31] Validation | Batch 550/1567 | Loss: 1.0635 [2026-04-18 15:11:32] Validation | Batch 560/1567 | Loss: 1.0633 [2026-04-18 15:11:33] Validation | Batch 570/1567 | Loss: 1.0633 [2026-04-18 15:11:34] Validation | Batch 580/1567 | Loss: 1.0625 [2026-04-18 15:11:34] Validation | Batch 590/1567 | Loss: 1.0611 [2026-04-18 15:11:35] Validation | Batch 600/1567 | Loss: 1.0593 [2026-04-18 15:11:36] Validation | Batch 610/1567 | Loss: 1.0583 [2026-04-18 15:11:37] Validation | Batch 620/1567 | Loss: 1.0598 [2026-04-18 15:11:38] Validation | Batch 630/1567 | Loss: 1.0577 [2026-04-18 15:11:39] Validation | Batch 640/1567 | Loss: 1.0594 [2026-04-18 15:11:40] Validation | Batch 650/1567 | Loss: 1.0585 [2026-04-18 15:11:40] Validation | Batch 660/1567 | Loss: 1.0573 [2026-04-18 15:11:41] Validation | Batch 670/1567 | Loss: 1.0554 [2026-04-18 15:11:42] Validation | Batch 680/1567 | Loss: 1.0548 [2026-04-18 15:11:42] Validation | Batch 690/1567 | Loss: 1.0557 [2026-04-18 
15:11:43] Validation | Batch 700/1567 | Loss: 1.0542 [2026-04-18 15:11:44] Validation | Batch 710/1567 | Loss: 1.0557 [2026-04-18 15:11:45] Validation | Batch 720/1567 | Loss: 1.0549 [2026-04-18 15:11:46] Validation | Batch 730/1567 | Loss: 1.0555 [2026-04-18 15:11:47] Validation | Batch 740/1567 | Loss: 1.0566 [2026-04-18 15:11:47] Validation | Batch 750/1567 | Loss: 1.0572 [2026-04-18 15:11:48] Validation | Batch 760/1567 | Loss: 1.0569 [2026-04-18 15:11:49] Validation | Batch 770/1567 | Loss: 1.0590 [2026-04-18 15:11:50] Validation | Batch 780/1567 | Loss: 1.0602 [2026-04-18 15:11:51] Validation | Batch 790/1567 | Loss: 1.0597 [2026-04-18 15:11:51] Validation | Batch 800/1567 | Loss: 1.0616 [2026-04-18 15:11:52] Validation | Batch 810/1567 | Loss: 1.0616 [2026-04-18 15:11:53] Validation | Batch 820/1567 | Loss: 1.0612 [2026-04-18 15:11:54] Validation | Batch 830/1567 | Loss: 1.0596 [2026-04-18 15:11:54] Validation | Batch 840/1567 | Loss: 1.0596 [2026-04-18 15:11:55] Validation | Batch 850/1567 | Loss: 1.0583 [2026-04-18 15:11:56] Validation | Batch 860/1567 | Loss: 1.0599 [2026-04-18 15:11:57] Validation | Batch 870/1567 | Loss: 1.0604 [2026-04-18 15:11:57] Validation | Batch 880/1567 | Loss: 1.0613 [2026-04-18 15:11:58] Validation | Batch 890/1567 | Loss: 1.0618 [2026-04-18 15:11:59] Validation | Batch 900/1567 | Loss: 1.0638 [2026-04-18 15:12:00] Validation | Batch 910/1567 | Loss: 1.0639 [2026-04-18 15:12:00] Validation | Batch 920/1567 | Loss: 1.0662 [2026-04-18 15:12:01] Validation | Batch 930/1567 | Loss: 1.0638 [2026-04-18 15:12:02] Validation | Batch 940/1567 | Loss: 1.0635 [2026-04-18 15:12:03] Validation | Batch 950/1567 | Loss: 1.0624 [2026-04-18 15:12:03] Validation | Batch 960/1567 | Loss: 1.0611 [2026-04-18 15:12:04] Validation | Batch 970/1567 | Loss: 1.0628 [2026-04-18 15:12:05] Validation | Batch 980/1567 | Loss: 1.0632 [2026-04-18 15:12:05] Validation | Batch 990/1567 | Loss: 1.0626 [2026-04-18 15:12:06] Validation | Batch 1000/1567 | Loss: 
1.0630 [2026-04-18 15:12:07] Validation | Batch 1010/1567 | Loss: 1.0607 [2026-04-18 15:12:08] Validation | Batch 1020/1567 | Loss: 1.0610 [2026-04-18 15:12:09] Validation | Batch 1030/1567 | Loss: 1.0626 [2026-04-18 15:12:09] Validation | Batch 1040/1567 | Loss: 1.0621 [2026-04-18 15:12:10] Validation | Batch 1050/1567 | Loss: 1.0632 [2026-04-18 15:12:11] Validation | Batch 1060/1567 | Loss: 1.0622 [2026-04-18 15:12:12] Validation | Batch 1070/1567 | Loss: 1.0614 [2026-04-18 15:12:13] Validation | Batch 1080/1567 | Loss: 1.0624 [2026-04-18 15:12:13] Validation | Batch 1090/1567 | Loss: 1.0621 [2026-04-18 15:12:14] Validation | Batch 1100/1567 | Loss: 1.0627 [2026-04-18 15:12:14] Validation | Batch 1110/1567 | Loss: 1.0626 [2026-04-18 15:12:15] Validation | Batch 1120/1567 | Loss: 1.0629 [2026-04-18 15:12:16] Validation | Batch 1130/1567 | Loss: 1.0629 [2026-04-18 15:12:17] Validation | Batch 1140/1567 | Loss: 1.0637 [2026-04-18 15:12:18] Validation | Batch 1150/1567 | Loss: 1.0641 [2026-04-18 15:12:19] Validation | Batch 1160/1567 | Loss: 1.0650 [2026-04-18 15:12:19] Validation | Batch 1170/1567 | Loss: 1.0647 [2026-04-18 15:12:20] Validation | Batch 1180/1567 | Loss: 1.0643 [2026-04-18 15:12:21] Validation | Batch 1190/1567 | Loss: 1.0654 [2026-04-18 15:12:22] Validation | Batch 1200/1567 | Loss: 1.0648 [2026-04-18 15:12:23] Validation | Batch 1210/1567 | Loss: 1.0636 [2026-04-18 15:12:23] Validation | Batch 1220/1567 | Loss: 1.0639 [2026-04-18 15:12:24] Validation | Batch 1230/1567 | Loss: 1.0660 [2026-04-18 15:12:25] Validation | Batch 1240/1567 | Loss: 1.0648 [2026-04-18 15:12:26] Validation | Batch 1250/1567 | Loss: 1.0648 [2026-04-18 15:12:27] Validation | Batch 1260/1567 | Loss: 1.0658 [2026-04-18 15:12:28] Validation | Batch 1270/1567 | Loss: 1.0658 [2026-04-18 15:12:28] Validation | Batch 1280/1567 | Loss: 1.0652 [2026-04-18 15:12:30] Validation | Batch 1290/1567 | Loss: 1.0655 [2026-04-18 15:12:30] Validation | Batch 1300/1567 | Loss: 1.0658 [2026-04-18 
15:12:31] Validation | Batch 1310/1567 | Loss: 1.0661 [2026-04-18 15:12:32] Validation | Batch 1320/1567 | Loss: 1.0652 [2026-04-18 15:12:33] Validation | Batch 1330/1567 | Loss: 1.0648 [2026-04-18 15:12:33] Validation | Batch 1340/1567 | Loss: 1.0646 [2026-04-18 15:12:34] Validation | Batch 1350/1567 | Loss: 1.0655 [2026-04-18 15:12:35] Validation | Batch 1360/1567 | Loss: 1.0651 [2026-04-18 15:12:36] Validation | Batch 1370/1567 | Loss: 1.0655 [2026-04-18 15:12:37] Validation | Batch 1380/1567 | Loss: 1.0668 [2026-04-18 15:12:37] Validation | Batch 1390/1567 | Loss: 1.0669 [2026-04-18 15:12:38] Validation | Batch 1400/1567 | Loss: 1.0673 [2026-04-18 15:12:39] Validation | Batch 1410/1567 | Loss: 1.0671 [2026-04-18 15:12:39] Validation | Batch 1420/1567 | Loss: 1.0677 [2026-04-18 15:12:40] Validation | Batch 1430/1567 | Loss: 1.0674 [2026-04-18 15:12:41] Validation | Batch 1440/1567 | Loss: 1.0677 [2026-04-18 15:12:42] Validation | Batch 1450/1567 | Loss: 1.0670 [2026-04-18 15:12:42] Validation | Batch 1460/1567 | Loss: 1.0668 [2026-04-18 15:12:43] Validation | Batch 1470/1567 | Loss: 1.0658 [2026-04-18 15:12:44] Validation | Batch 1480/1567 | Loss: 1.0642 [2026-04-18 15:12:44] Validation | Batch 1490/1567 | Loss: 1.0643 [2026-04-18 15:12:45] Validation | Batch 1500/1567 | Loss: 1.0644 [2026-04-18 15:12:46] Validation | Batch 1510/1567 | Loss: 1.0642 [2026-04-18 15:12:47] Validation | Batch 1520/1567 | Loss: 1.0635 [2026-04-18 15:12:47] Validation | Batch 1530/1567 | Loss: 1.0643 [2026-04-18 15:12:48] Validation | Batch 1540/1567 | Loss: 1.0653 [2026-04-18 15:12:49] Validation | Batch 1550/1567 | Loss: 1.0656 [2026-04-18 15:12:50] Validation | Batch 1560/1567 | Loss: 1.0647 [2026-04-18 15:12:51] Validation | Batch 1567/1567 | Loss: 1.0651 [2026-04-18 15:12:51] Validation | Loss: 1.0651 | PPL: 2.92 | Time: 125.38s [2026-04-18 15:12:54] Epoch 3 | Step 21010 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:12:58] Epoch 3 | Step 21020 | Loss: 0.7029 | LR: 2.00e-06 
[2026-04-18 15:13:02] Epoch 3 | Step 21030 | Loss: 0.7031 | LR: 2.00e-06 [2026-04-18 15:13:05] Epoch 3 | Step 21040 | Loss: 0.7039 | LR: 2.00e-06 [2026-04-18 15:13:09] Epoch 3 | Step 21050 | Loss: 0.7038 | LR: 2.00e-06 [2026-04-18 15:13:12] Epoch 3 | Step 21060 | Loss: 0.7037 | LR: 2.00e-06 [2026-04-18 15:13:16] Epoch 3 | Step 21070 | Loss: 0.7046 | LR: 2.00e-06 [2026-04-18 15:13:19] Epoch 3 | Step 21080 | Loss: 0.7047 | LR: 2.00e-06 [2026-04-18 15:13:23] Epoch 3 | Step 21090 | Loss: 0.7048 | LR: 2.00e-06 [2026-04-18 15:13:26] Epoch 3 | Step 21100 | Loss: 0.7050 | LR: 2.00e-06 [2026-04-18 15:13:29] Epoch 3 | Step 21110 | Loss: 0.7057 | LR: 2.00e-06 [2026-04-18 15:13:32] Epoch 3 | Step 21120 | Loss: 0.7061 | LR: 2.00e-06 [2026-04-18 15:13:36] Epoch 3 | Step 21130 | Loss: 0.7059 | LR: 2.00e-06 [2026-04-18 15:13:40] Epoch 3 | Step 21140 | Loss: 0.7047 | LR: 2.00e-06 [2026-04-18 15:13:44] Epoch 3 | Step 21150 | Loss: 0.7033 | LR: 2.00e-06 [2026-04-18 15:13:47] Epoch 3 | Step 21160 | Loss: 0.7039 | LR: 2.00e-06 [2026-04-18 15:13:51] Epoch 3 | Step 21170 | Loss: 0.7034 | LR: 2.00e-06 [2026-04-18 15:13:55] Epoch 3 | Step 21180 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:13:58] Epoch 3 | Step 21190 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:14:02] Epoch 3 | Step 21200 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 15:14:06] Epoch 3 | Step 21210 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:14:10] Epoch 3 | Step 21220 | Loss: 0.7011 | LR: 2.00e-06 [2026-04-18 15:14:13] Epoch 3 | Step 21230 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:14:17] Epoch 3 | Step 21240 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:14:20] Epoch 3 | Step 21250 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 15:14:24] Epoch 3 | Step 21260 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:14:27] Epoch 3 | Step 21270 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:14:31] Epoch 3 | Step 21280 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:14:34] Epoch 3 | Step 21290 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 15:14:38] Epoch 
3 | Step 21300 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 15:14:42] Epoch 3 | Step 21310 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 15:14:45] Epoch 3 | Step 21320 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 15:14:49] Epoch 3 | Step 21330 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:14:52] Epoch 3 | Step 21340 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 15:14:56] Epoch 3 | Step 21350 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:14:59] Epoch 3 | Step 21360 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:15:03] Epoch 3 | Step 21370 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:15:06] Epoch 3 | Step 21380 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 15:15:10] Epoch 3 | Step 21390 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:15:13] Epoch 3 | Step 21400 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:15:17] Epoch 3 | Step 21410 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:15:21] Epoch 3 | Step 21420 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:15:24] Epoch 3 | Step 21430 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:15:28] Epoch 3 | Step 21440 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:15:31] Epoch 3 | Step 21450 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:15:35] Epoch 3 | Step 21460 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:15:39] Epoch 3 | Step 21470 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 15:15:43] Epoch 3 | Step 21480 | Loss: 0.7013 | LR: 2.00e-06 [2026-04-18 15:15:46] Epoch 3 | Step 21490 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 15:15:49] Epoch 3 | Step 21500 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 15:15:53] Epoch 3 | Step 21510 | Loss: 0.7013 | LR: 2.00e-06 [2026-04-18 15:15:57] Epoch 3 | Step 21520 | Loss: 0.7006 | LR: 2.00e-06 [2026-04-18 15:16:01] Epoch 3 | Step 21530 | Loss: 0.7006 | LR: 2.00e-06 [2026-04-18 15:16:04] Epoch 3 | Step 21540 | Loss: 0.7005 | LR: 2.00e-06 [2026-04-18 15:16:08] Epoch 3 | Step 21550 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 15:16:11] Epoch 3 | Step 21560 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 15:16:15] Epoch 3 | Step 21570 | Loss: 
0.7012 | LR: 2.00e-06 [2026-04-18 15:16:19] Epoch 3 | Step 21580 | Loss: 0.7006 | LR: 2.00e-06 [2026-04-18 15:16:22] Epoch 3 | Step 21590 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:16:26] Epoch 3 | Step 21600 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:16:29] Epoch 3 | Step 21610 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:16:33] Epoch 3 | Step 21620 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:16:37] Epoch 3 | Step 21630 | Loss: 0.7004 | LR: 2.00e-06 [2026-04-18 15:16:41] Epoch 3 | Step 21640 | Loss: 0.7003 | LR: 2.00e-06 [2026-04-18 15:16:44] Epoch 3 | Step 21650 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:16:48] Epoch 3 | Step 21660 | Loss: 0.7003 | LR: 2.00e-06 [2026-04-18 15:16:51] Epoch 3 | Step 21670 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:16:55] Epoch 3 | Step 21680 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:16:58] Epoch 3 | Step 21690 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:17:02] Epoch 3 | Step 21700 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:17:05] Epoch 3 | Step 21710 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:17:09] Epoch 3 | Step 21720 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:17:12] Epoch 3 | Step 21730 | Loss: 0.6987 | LR: 2.00e-06 [2026-04-18 15:17:16] Epoch 3 | Step 21740 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:17:19] Epoch 3 | Step 21750 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:17:23] Epoch 3 | Step 21760 | Loss: 0.6986 | LR: 2.00e-06 [2026-04-18 15:17:27] Epoch 3 | Step 21770 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:17:30] Epoch 3 | Step 21780 | Loss: 0.6983 | LR: 2.00e-06 [2026-04-18 15:17:33] Epoch 3 | Step 21790 | Loss: 0.6983 | LR: 2.00e-06 [2026-04-18 15:17:37] Epoch 3 | Step 21800 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:17:40] Epoch 3 | Step 21810 | Loss: 0.6982 | LR: 2.00e-06 [2026-04-18 15:17:44] Epoch 3 | Step 21820 | Loss: 0.6982 | LR: 2.00e-06 [2026-04-18 15:17:48] Epoch 3 | Step 21830 | Loss: 0.6977 | LR: 2.00e-06 [2026-04-18 15:17:51] Epoch 3 | Step 21840 | Loss: 0.6977 | LR: 2.00e-06 
[2026-04-18 15:17:54] Epoch 3 | Step 21850 | Loss: 0.6977 | LR: 2.00e-06 [2026-04-18 15:17:58] Epoch 3 | Step 21860 | Loss: 0.6975 | LR: 2.00e-06 [2026-04-18 15:18:02] Epoch 3 | Step 21870 | Loss: 0.6976 | LR: 2.00e-06 [2026-04-18 15:18:06] Epoch 3 | Step 21880 | Loss: 0.6977 | LR: 2.00e-06 [2026-04-18 15:18:09] Epoch 3 | Step 21890 | Loss: 0.6975 | LR: 2.00e-06 [2026-04-18 15:18:13] Epoch 3 | Step 21900 | Loss: 0.6980 | LR: 2.00e-06 [2026-04-18 15:18:16] Epoch 3 | Step 21910 | Loss: 0.6981 | LR: 2.00e-06 [2026-04-18 15:18:20] Epoch 3 | Step 21920 | Loss: 0.6984 | LR: 2.00e-06 [2026-04-18 15:18:23] Epoch 3 | Step 21930 | Loss: 0.6986 | LR: 2.00e-06 [2026-04-18 15:18:27] Epoch 3 | Step 21940 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:18:31] Epoch 3 | Step 21950 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:18:34] Epoch 3 | Step 21960 | Loss: 0.6993 | LR: 2.00e-06 [2026-04-18 15:18:37] Epoch 3 | Step 21970 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:18:41] Epoch 3 | Step 21980 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:18:45] Epoch 3 | Step 21990 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:18:48] Epoch 3 | Step 22000 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:18:49] Validation | Batch 10/1567 | Loss: 0.9440 [2026-04-18 15:18:50] Validation | Batch 20/1567 | Loss: 1.0127 [2026-04-18 15:18:51] Validation | Batch 30/1567 | Loss: 1.0522 [2026-04-18 15:18:52] Validation | Batch 40/1567 | Loss: 1.0754 [2026-04-18 15:18:52] Validation | Batch 50/1567 | Loss: 1.0505 [2026-04-18 15:18:53] Validation | Batch 60/1567 | Loss: 1.0372 [2026-04-18 15:18:54] Validation | Batch 70/1567 | Loss: 1.0224 [2026-04-18 15:18:55] Validation | Batch 80/1567 | Loss: 1.0410 [2026-04-18 15:18:56] Validation | Batch 90/1567 | Loss: 1.0491 [2026-04-18 15:18:57] Validation | Batch 100/1567 | Loss: 1.0593 [2026-04-18 15:18:58] Validation | Batch 110/1567 | Loss: 1.0511 [2026-04-18 15:18:58] Validation | Batch 120/1567 | Loss: 1.0621 [2026-04-18 15:18:59] Validation | Batch 130/1567 | Loss: 
1.0636 [2026-04-18 15:19:00] Validation | Batch 140/1567 | Loss: 1.0658 [2026-04-18 15:19:01] Validation | Batch 150/1567 | Loss: 1.0736 [2026-04-18 15:19:02] Validation | Batch 160/1567 | Loss: 1.0746 [2026-04-18 15:19:02] Validation | Batch 170/1567 | Loss: 1.0591 [2026-04-18 15:19:03] Validation | Batch 180/1567 | Loss: 1.0614 [2026-04-18 15:19:04] Validation | Batch 190/1567 | Loss: 1.0576 [2026-04-18 15:19:05] Validation | Batch 200/1567 | Loss: 1.0602 [2026-04-18 15:19:06] Validation | Batch 210/1567 | Loss: 1.0612 [2026-04-18 15:19:07] Validation | Batch 220/1567 | Loss: 1.0634 [2026-04-18 15:19:08] Validation | Batch 230/1567 | Loss: 1.0672 [2026-04-18 15:19:09] Validation | Batch 240/1567 | Loss: 1.0655 [2026-04-18 15:19:09] Validation | Batch 250/1567 | Loss: 1.0594 [2026-04-18 15:19:10] Validation | Batch 260/1567 | Loss: 1.0546 [2026-04-18 15:19:11] Validation | Batch 270/1567 | Loss: 1.0516 [2026-04-18 15:19:11] Validation | Batch 280/1567 | Loss: 1.0527 [2026-04-18 15:19:13] Validation | Batch 290/1567 | Loss: 1.0581 [2026-04-18 15:19:13] Validation | Batch 300/1567 | Loss: 1.0630 [2026-04-18 15:19:14] Validation | Batch 310/1567 | Loss: 1.0618 [2026-04-18 15:19:15] Validation | Batch 320/1567 | Loss: 1.0626 [2026-04-18 15:19:16] Validation | Batch 330/1567 | Loss: 1.0595 [2026-04-18 15:19:17] Validation | Batch 340/1567 | Loss: 1.0636 [2026-04-18 15:19:18] Validation | Batch 350/1567 | Loss: 1.0625 [2026-04-18 15:19:18] Validation | Batch 360/1567 | Loss: 1.0602 [2026-04-18 15:19:19] Validation | Batch 370/1567 | Loss: 1.0574 [2026-04-18 15:19:20] Validation | Batch 380/1567 | Loss: 1.0607 [2026-04-18 15:19:21] Validation | Batch 390/1567 | Loss: 1.0619 [2026-04-18 15:19:21] Validation | Batch 400/1567 | Loss: 1.0631 [2026-04-18 15:19:22] Validation | Batch 410/1567 | Loss: 1.0624 [2026-04-18 15:19:23] Validation | Batch 420/1567 | Loss: 1.0620 [2026-04-18 15:19:24] Validation | Batch 430/1567 | Loss: 1.0619 [2026-04-18 15:19:25] Validation | Batch 
440/1567 | Loss: 1.0607 [2026-04-18 15:19:25] Validation | Batch 450/1567 | Loss: 1.0608 [2026-04-18 15:19:27] Validation | Batch 460/1567 | Loss: 1.0599 [2026-04-18 15:19:27] Validation | Batch 470/1567 | Loss: 1.0592 [2026-04-18 15:19:28] Validation | Batch 480/1567 | Loss: 1.0570 [2026-04-18 15:19:29] Validation | Batch 490/1567 | Loss: 1.0568 [2026-04-18 15:19:29] Validation | Batch 500/1567 | Loss: 1.0563 [2026-04-18 15:19:30] Validation | Batch 510/1567 | Loss: 1.0586 [2026-04-18 15:19:31] Validation | Batch 520/1567 | Loss: 1.0604 [2026-04-18 15:19:32] Validation | Batch 530/1567 | Loss: 1.0601 [2026-04-18 15:19:33] Validation | Batch 540/1567 | Loss: 1.0628 [2026-04-18 15:19:34] Validation | Batch 550/1567 | Loss: 1.0664 [2026-04-18 15:19:34] Validation | Batch 560/1567 | Loss: 1.0662 [2026-04-18 15:19:35] Validation | Batch 570/1567 | Loss: 1.0662 [2026-04-18 15:19:36] Validation | Batch 580/1567 | Loss: 1.0653 [2026-04-18 15:19:37] Validation | Batch 590/1567 | Loss: 1.0640 [2026-04-18 15:19:38] Validation | Batch 600/1567 | Loss: 1.0622 [2026-04-18 15:19:39] Validation | Batch 610/1567 | Loss: 1.0612 [2026-04-18 15:19:40] Validation | Batch 620/1567 | Loss: 1.0627 [2026-04-18 15:19:41] Validation | Batch 630/1567 | Loss: 1.0606 [2026-04-18 15:19:41] Validation | Batch 640/1567 | Loss: 1.0623 [2026-04-18 15:19:42] Validation | Batch 650/1567 | Loss: 1.0614 [2026-04-18 15:19:43] Validation | Batch 660/1567 | Loss: 1.0602 [2026-04-18 15:19:44] Validation | Batch 670/1567 | Loss: 1.0583 [2026-04-18 15:19:45] Validation | Batch 680/1567 | Loss: 1.0576 [2026-04-18 15:19:45] Validation | Batch 690/1567 | Loss: 1.0585 [2026-04-18 15:19:46] Validation | Batch 700/1567 | Loss: 1.0571 [2026-04-18 15:19:47] Validation | Batch 710/1567 | Loss: 1.0584 [2026-04-18 15:19:48] Validation | Batch 720/1567 | Loss: 1.0577 [2026-04-18 15:19:48] Validation | Batch 730/1567 | Loss: 1.0583 [2026-04-18 15:19:49] Validation | Batch 740/1567 | Loss: 1.0594 [2026-04-18 15:19:50] 
Validation | Batch 750/1567 | Loss: 1.0600 [2026-04-18 15:19:51] Validation | Batch 760/1567 | Loss: 1.0598 [2026-04-18 15:19:52] Validation | Batch 770/1567 | Loss: 1.0618 [2026-04-18 15:19:53] Validation | Batch 780/1567 | Loss: 1.0631 [2026-04-18 15:19:53] Validation | Batch 790/1567 | Loss: 1.0625 [2026-04-18 15:19:54] Validation | Batch 800/1567 | Loss: 1.0644 [2026-04-18 15:19:55] Validation | Batch 810/1567 | Loss: 1.0644 [2026-04-18 15:19:56] Validation | Batch 820/1567 | Loss: 1.0639 [2026-04-18 15:19:56] Validation | Batch 830/1567 | Loss: 1.0624 [2026-04-18 15:19:57] Validation | Batch 840/1567 | Loss: 1.0624 [2026-04-18 15:19:58] Validation | Batch 850/1567 | Loss: 1.0610 [2026-04-18 15:19:58] Validation | Batch 860/1567 | Loss: 1.0627 [2026-04-18 15:19:59] Validation | Batch 870/1567 | Loss: 1.0631 [2026-04-18 15:20:00] Validation | Batch 880/1567 | Loss: 1.0640 [2026-04-18 15:20:01] Validation | Batch 890/1567 | Loss: 1.0646 [2026-04-18 15:20:02] Validation | Batch 900/1567 | Loss: 1.0666 [2026-04-18 15:20:02] Validation | Batch 910/1567 | Loss: 1.0667 [2026-04-18 15:20:03] Validation | Batch 920/1567 | Loss: 1.0689 [2026-04-18 15:20:04] Validation | Batch 930/1567 | Loss: 1.0665 [2026-04-18 15:20:04] Validation | Batch 940/1567 | Loss: 1.0661 [2026-04-18 15:20:05] Validation | Batch 950/1567 | Loss: 1.0651 [2026-04-18 15:20:06] Validation | Batch 960/1567 | Loss: 1.0637 [2026-04-18 15:20:07] Validation | Batch 970/1567 | Loss: 1.0655 [2026-04-18 15:20:07] Validation | Batch 980/1567 | Loss: 1.0658 [2026-04-18 15:20:08] Validation | Batch 990/1567 | Loss: 1.0652 [2026-04-18 15:20:09] Validation | Batch 1000/1567 | Loss: 1.0656 [2026-04-18 15:20:09] Validation | Batch 1010/1567 | Loss: 1.0633 [2026-04-18 15:20:10] Validation | Batch 1020/1567 | Loss: 1.0636 [2026-04-18 15:20:11] Validation | Batch 1030/1567 | Loss: 1.0653 [2026-04-18 15:20:12] Validation | Batch 1040/1567 | Loss: 1.0648 [2026-04-18 15:20:13] Validation | Batch 1050/1567 | Loss: 1.0658 
[2026-04-18 15:20:14] Validation | Batch 1060/1567 | Loss: 1.0649 [2026-04-18 15:20:15] Validation | Batch 1070/1567 | Loss: 1.0641 [2026-04-18 15:20:15] Validation | Batch 1080/1567 | Loss: 1.0650 [2026-04-18 15:20:16] Validation | Batch 1090/1567 | Loss: 1.0648 [2026-04-18 15:20:17] Validation | Batch 1100/1567 | Loss: 1.0654 [2026-04-18 15:20:17] Validation | Batch 1110/1567 | Loss: 1.0652 [2026-04-18 15:20:18] Validation | Batch 1120/1567 | Loss: 1.0655 [2026-04-18 15:20:19] Validation | Batch 1130/1567 | Loss: 1.0655 [2026-04-18 15:20:20] Validation | Batch 1140/1567 | Loss: 1.0664 [2026-04-18 15:20:21] Validation | Batch 1150/1567 | Loss: 1.0668 [2026-04-18 15:20:21] Validation | Batch 1160/1567 | Loss: 1.0676 [2026-04-18 15:20:22] Validation | Batch 1170/1567 | Loss: 1.0673 [2026-04-18 15:20:23] Validation | Batch 1180/1567 | Loss: 1.0669 [2026-04-18 15:20:24] Validation | Batch 1190/1567 | Loss: 1.0680 [2026-04-18 15:20:25] Validation | Batch 1200/1567 | Loss: 1.0674 [2026-04-18 15:20:25] Validation | Batch 1210/1567 | Loss: 1.0662 [2026-04-18 15:20:26] Validation | Batch 1220/1567 | Loss: 1.0665 [2026-04-18 15:20:27] Validation | Batch 1230/1567 | Loss: 1.0687 [2026-04-18 15:20:28] Validation | Batch 1240/1567 | Loss: 1.0674 [2026-04-18 15:20:28] Validation | Batch 1250/1567 | Loss: 1.0674 [2026-04-18 15:20:29] Validation | Batch 1260/1567 | Loss: 1.0684 [2026-04-18 15:20:30] Validation | Batch 1270/1567 | Loss: 1.0684 [2026-04-18 15:20:31] Validation | Batch 1280/1567 | Loss: 1.0678 [2026-04-18 15:20:32] Validation | Batch 1290/1567 | Loss: 1.0681 [2026-04-18 15:20:33] Validation | Batch 1300/1567 | Loss: 1.0684 [2026-04-18 15:20:34] Validation | Batch 1310/1567 | Loss: 1.0688 [2026-04-18 15:20:35] Validation | Batch 1320/1567 | Loss: 1.0678 [2026-04-18 15:20:35] Validation | Batch 1330/1567 | Loss: 1.0675 [2026-04-18 15:20:36] Validation | Batch 1340/1567 | Loss: 1.0672 [2026-04-18 15:20:37] Validation | Batch 1350/1567 | Loss: 1.0681 [2026-04-18 
15:20:38] Validation | Batch 1360/1567 | Loss: 1.0678 [2026-04-18 15:20:38] Validation | Batch 1370/1567 | Loss: 1.0681 [2026-04-18 15:20:39] Validation | Batch 1380/1567 | Loss: 1.0695 [2026-04-18 15:20:40] Validation | Batch 1390/1567 | Loss: 1.0696 [2026-04-18 15:20:41] Validation | Batch 1400/1567 | Loss: 1.0700 [2026-04-18 15:20:41] Validation | Batch 1410/1567 | Loss: 1.0698 [2026-04-18 15:20:42] Validation | Batch 1420/1567 | Loss: 1.0704 [2026-04-18 15:20:43] Validation | Batch 1430/1567 | Loss: 1.0701 [2026-04-18 15:20:44] Validation | Batch 1440/1567 | Loss: 1.0704 [2026-04-18 15:20:44] Validation | Batch 1450/1567 | Loss: 1.0697 [2026-04-18 15:20:45] Validation | Batch 1460/1567 | Loss: 1.0695 [2026-04-18 15:20:46] Validation | Batch 1470/1567 | Loss: 1.0685 [2026-04-18 15:20:46] Validation | Batch 1480/1567 | Loss: 1.0669 [2026-04-18 15:20:47] Validation | Batch 1490/1567 | Loss: 1.0669 [2026-04-18 15:20:48] Validation | Batch 1500/1567 | Loss: 1.0670 [2026-04-18 15:20:49] Validation | Batch 1510/1567 | Loss: 1.0668 [2026-04-18 15:20:49] Validation | Batch 1520/1567 | Loss: 1.0661 [2026-04-18 15:20:50] Validation | Batch 1530/1567 | Loss: 1.0669 [2026-04-18 15:20:51] Validation | Batch 1540/1567 | Loss: 1.0679 [2026-04-18 15:20:52] Validation | Batch 1550/1567 | Loss: 1.0682 [2026-04-18 15:20:53] Validation | Batch 1560/1567 | Loss: 1.0673 [2026-04-18 15:20:53] Validation | Batch 1567/1567 | Loss: 1.0677 [2026-04-18 15:20:53] Validation | Loss: 1.0677 | PPL: 2.93 | Time: 125.38s [2026-04-18 15:20:57] Epoch 3 | Step 22010 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:21:01] Epoch 3 | Step 22020 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:21:05] Epoch 3 | Step 22030 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:21:08] Epoch 3 | Step 22040 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:21:12] Epoch 3 | Step 22050 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:21:16] Epoch 3 | Step 22060 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:21:19] Epoch 3 | Step 22070 | 
Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:21:22] Epoch 3 | Step 22080 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:21:26] Epoch 3 | Step 22090 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:21:30] Epoch 3 | Step 22100 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:21:33] Epoch 3 | Step 22110 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:21:36] Epoch 3 | Step 22120 | Loss: 0.7002 | LR: 2.00e-06 [2026-04-18 15:21:40] Epoch 3 | Step 22130 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:21:43] Epoch 3 | Step 22140 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:21:47] Epoch 3 | Step 22150 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:21:50] Epoch 3 | Step 22160 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:21:54] Epoch 3 | Step 22170 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:21:57] Epoch 3 | Step 22180 | Loss: 0.6993 | LR: 2.00e-06 [2026-04-18 15:22:01] Epoch 3 | Step 22190 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:22:05] Epoch 3 | Step 22200 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:22:08] Epoch 3 | Step 22210 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:22:12] Epoch 3 | Step 22220 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:22:15] Epoch 3 | Step 22230 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:22:19] Epoch 3 | Step 22240 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:22:22] Epoch 3 | Step 22250 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:22:26] Epoch 3 | Step 22260 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:22:30] Epoch 3 | Step 22270 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:22:33] Epoch 3 | Step 22280 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:22:37] Epoch 3 | Step 22290 | Loss: 0.6993 | LR: 2.00e-06 [2026-04-18 15:22:40] Epoch 3 | Step 22300 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:22:44] Epoch 3 | Step 22310 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:22:47] Epoch 3 | Step 22320 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:22:51] Epoch 3 | Step 22330 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:22:54] Epoch 3 | Step 22340 | Loss: 0.6994 | LR: 2.00e-06 
[2026-04-18 15:22:57] Epoch 3 | Step 22350 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:23:01] Epoch 3 | Step 22360 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:23:05] Epoch 3 | Step 22370 | Loss: 0.6986 | LR: 2.00e-06 [2026-04-18 15:23:09] Epoch 3 | Step 22380 | Loss: 0.6987 | LR: 2.00e-06 [2026-04-18 15:23:12] Epoch 3 | Step 22390 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:23:16] Epoch 3 | Step 22400 | Loss: 0.6986 | LR: 2.00e-06 [2026-04-18 15:23:19] Epoch 3 | Step 22410 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:23:23] Epoch 3 | Step 22420 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:23:27] Epoch 3 | Step 22430 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:23:31] Epoch 3 | Step 22440 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:23:34] Epoch 3 | Step 22450 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:23:38] Epoch 3 | Step 22460 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:23:42] Epoch 3 | Step 22470 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:23:45] Epoch 3 | Step 22480 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:23:49] Epoch 3 | Step 22490 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:23:52] Epoch 3 | Step 22500 | Loss: 0.6986 | LR: 2.00e-06 [2026-04-18 15:23:56] Epoch 3 | Step 22510 | Loss: 0.6984 | LR: 2.00e-06 [2026-04-18 15:23:59] Epoch 3 | Step 22520 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:24:03] Epoch 3 | Step 22530 | Loss: 0.6983 | LR: 2.00e-06 [2026-04-18 15:24:06] Epoch 3 | Step 22540 | Loss: 0.6981 | LR: 2.00e-06 [2026-04-18 15:24:09] Epoch 3 | Step 22550 | Loss: 0.6980 | LR: 2.00e-06 [2026-04-18 15:24:13] Epoch 3 | Step 22560 | Loss: 0.6978 | LR: 2.00e-06 [2026-04-18 15:24:16] Epoch 3 | Step 22570 | Loss: 0.6978 | LR: 2.00e-06 [2026-04-18 15:24:20] Epoch 3 | Step 22580 | Loss: 0.6978 | LR: 2.00e-06 [2026-04-18 15:24:24] Epoch 3 | Step 22590 | Loss: 0.6977 | LR: 2.00e-06 [2026-04-18 15:24:27] Epoch 3 | Step 22600 | Loss: 0.6976 | LR: 2.00e-06 [2026-04-18 15:24:31] Epoch 3 | Step 22610 | Loss: 0.6974 | LR: 2.00e-06 [2026-04-18 15:24:35] Epoch 
3 | Step 22620 | Loss: 0.6975 | LR: 2.00e-06 [2026-04-18 15:24:38] Epoch 3 | Step 22630 | Loss: 0.6973 | LR: 2.00e-06 [2026-04-18 15:24:42] Epoch 3 | Step 22640 | Loss: 0.6970 | LR: 2.00e-06 [2026-04-18 15:24:46] Epoch 3 | Step 22650 | Loss: 0.6968 | LR: 2.00e-06 [2026-04-18 15:24:49] Epoch 3 | Step 22660 | Loss: 0.6970 | LR: 2.00e-06 [2026-04-18 15:24:53] Epoch 3 | Step 22670 | Loss: 0.6971 | LR: 2.00e-06 [2026-04-18 15:24:56] Epoch 3 | Step 22680 | Loss: 0.6976 | LR: 2.00e-06 [2026-04-18 15:25:00] Epoch 3 | Step 22690 | Loss: 0.6975 | LR: 2.00e-06 [2026-04-18 15:25:04] Epoch 3 | Step 22700 | Loss: 0.6976 | LR: 2.00e-06 [2026-04-18 15:25:07] Epoch 3 | Step 22710 | Loss: 0.6979 | LR: 2.00e-06 [2026-04-18 15:25:11] Epoch 3 | Step 22720 | Loss: 0.6982 | LR: 2.00e-06 [2026-04-18 15:25:15] Epoch 3 | Step 22730 | Loss: 0.6981 | LR: 2.00e-06 [2026-04-18 15:25:19] Epoch 3 | Step 22740 | Loss: 0.6984 | LR: 2.00e-06 [2026-04-18 15:25:22] Epoch 3 | Step 22750 | Loss: 0.6984 | LR: 2.00e-06 [2026-04-18 15:25:26] Epoch 3 | Step 22760 | Loss: 0.6982 | LR: 2.00e-06 [2026-04-18 15:25:29] Epoch 3 | Step 22770 | Loss: 0.6978 | LR: 2.00e-06 [2026-04-18 15:25:32] Epoch 3 | Step 22780 | Loss: 0.6978 | LR: 2.00e-06 [2026-04-18 15:25:36] Epoch 3 | Step 22790 | Loss: 0.6977 | LR: 2.00e-06 [2026-04-18 15:25:40] Epoch 3 | Step 22800 | Loss: 0.6979 | LR: 2.00e-06 [2026-04-18 15:25:43] Epoch 3 | Step 22810 | Loss: 0.6980 | LR: 2.00e-06 [2026-04-18 15:25:46] Epoch 3 | Step 22820 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:25:50] Epoch 3 | Step 22830 | Loss: 0.6984 | LR: 2.00e-06 [2026-04-18 15:25:54] Epoch 3 | Step 22840 | Loss: 0.6987 | LR: 2.00e-06 [2026-04-18 15:25:57] Epoch 3 | Step 22850 | Loss: 0.6986 | LR: 2.00e-06 [2026-04-18 15:26:01] Epoch 3 | Step 22860 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:26:05] Epoch 3 | Step 22870 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:26:08] Epoch 3 | Step 22880 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:26:12] Epoch 3 | Step 22890 | Loss: 
0.6986 | LR: 2.00e-06 [2026-04-18 15:26:15] Epoch 3 | Step 22900 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:26:19] Epoch 3 | Step 22910 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:26:23] Epoch 3 | Step 22920 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:26:27] Epoch 3 | Step 22930 | Loss: 0.6987 | LR: 2.00e-06 [2026-04-18 15:26:30] Epoch 3 | Step 22940 | Loss: 0.6987 | LR: 2.00e-06 [2026-04-18 15:26:34] Epoch 3 | Step 22950 | Loss: 0.6984 | LR: 2.00e-06 [2026-04-18 15:26:38] Epoch 3 | Step 22960 | Loss: 0.6984 | LR: 2.00e-06 [2026-04-18 15:26:41] Epoch 3 | Step 22970 | Loss: 0.6983 | LR: 2.00e-06 [2026-04-18 15:26:45] Epoch 3 | Step 22980 | Loss: 0.6982 | LR: 2.00e-06 [2026-04-18 15:26:48] Epoch 3 | Step 22990 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:26:51] Epoch 3 | Step 23000 | Loss: 0.6987 | LR: 2.00e-06 [2026-04-18 15:26:52] Validation | Batch 10/1567 | Loss: 0.9440 [2026-04-18 15:26:53] Validation | Batch 20/1567 | Loss: 1.0138 [2026-04-18 15:26:54] Validation | Batch 30/1567 | Loss: 1.0529 [2026-04-18 15:26:55] Validation | Batch 40/1567 | Loss: 1.0765 [2026-04-18 15:26:56] Validation | Batch 50/1567 | Loss: 1.0514 [2026-04-18 15:26:57] Validation | Batch 60/1567 | Loss: 1.0380 [2026-04-18 15:26:57] Validation | Batch 70/1567 | Loss: 1.0231 [2026-04-18 15:26:58] Validation | Batch 80/1567 | Loss: 1.0416 [2026-04-18 15:26:59] Validation | Batch 90/1567 | Loss: 1.0499 [2026-04-18 15:27:00] Validation | Batch 100/1567 | Loss: 1.0605 [2026-04-18 15:27:01] Validation | Batch 110/1567 | Loss: 1.0524 [2026-04-18 15:27:02] Validation | Batch 120/1567 | Loss: 1.0634 [2026-04-18 15:27:03] Validation | Batch 130/1567 | Loss: 1.0650 [2026-04-18 15:27:03] Validation | Batch 140/1567 | Loss: 1.0672 [2026-04-18 15:27:04] Validation | Batch 150/1567 | Loss: 1.0750 [2026-04-18 15:27:05] Validation | Batch 160/1567 | Loss: 1.0760 [2026-04-18 15:27:06] Validation | Batch 170/1567 | Loss: 1.0605 [2026-04-18 15:27:07] Validation | Batch 180/1567 | Loss: 1.0629 [2026-04-18 
15:27:08] Validation | Batch 190/1567 | Loss: 1.0590 [2026-04-18 15:27:08] Validation | Batch 200/1567 | Loss: 1.0616 [2026-04-18 15:27:09] Validation | Batch 210/1567 | Loss: 1.0623 [2026-04-18 15:27:10] Validation | Batch 220/1567 | Loss: 1.0645 [2026-04-18 15:27:11] Validation | Batch 230/1567 | Loss: 1.0686 [2026-04-18 15:27:12] Validation | Batch 240/1567 | Loss: 1.0669 [2026-04-18 15:27:12] Validation | Batch 250/1567 | Loss: 1.0609 [2026-04-18 15:27:13] Validation | Batch 260/1567 | Loss: 1.0560 [2026-04-18 15:27:14] Validation | Batch 270/1567 | Loss: 1.0529 [2026-04-18 15:27:15] Validation | Batch 280/1567 | Loss: 1.0540 [2026-04-18 15:27:16] Validation | Batch 290/1567 | Loss: 1.0595 [2026-04-18 15:27:17] Validation | Batch 300/1567 | Loss: 1.0645 [2026-04-18 15:27:17] Validation | Batch 310/1567 | Loss: 1.0634 [2026-04-18 15:27:18] Validation | Batch 320/1567 | Loss: 1.0642 [2026-04-18 15:27:19] Validation | Batch 330/1567 | Loss: 1.0612 [2026-04-18 15:27:20] Validation | Batch 340/1567 | Loss: 1.0653 [2026-04-18 15:27:21] Validation | Batch 350/1567 | Loss: 1.0642 [2026-04-18 15:27:22] Validation | Batch 360/1567 | Loss: 1.0617 [2026-04-18 15:27:22] Validation | Batch 370/1567 | Loss: 1.0589 [2026-04-18 15:27:23] Validation | Batch 380/1567 | Loss: 1.0622 [2026-04-18 15:27:24] Validation | Batch 390/1567 | Loss: 1.0634 [2026-04-18 15:27:25] Validation | Batch 400/1567 | Loss: 1.0646 [2026-04-18 15:27:26] Validation | Batch 410/1567 | Loss: 1.0639 [2026-04-18 15:27:26] Validation | Batch 420/1567 | Loss: 1.0635 [2026-04-18 15:27:27] Validation | Batch 430/1567 | Loss: 1.0633 [2026-04-18 15:27:28] Validation | Batch 440/1567 | Loss: 1.0622 [2026-04-18 15:27:29] Validation | Batch 450/1567 | Loss: 1.0623 [2026-04-18 15:27:30] Validation | Batch 460/1567 | Loss: 1.0614 [2026-04-18 15:27:31] Validation | Batch 470/1567 | Loss: 1.0607 [2026-04-18 15:27:31] Validation | Batch 480/1567 | Loss: 1.0585 [2026-04-18 15:27:32] Validation | Batch 490/1567 | Loss: 
1.0582 [2026-04-18 15:27:33] Validation | Batch 500/1567 | Loss: 1.0578 [2026-04-18 15:27:34] Validation | Batch 510/1567 | Loss: 1.0601 [2026-04-18 15:27:34] Validation | Batch 520/1567 | Loss: 1.0619 [2026-04-18 15:27:35] Validation | Batch 530/1567 | Loss: 1.0615 [2026-04-18 15:27:36] Validation | Batch 540/1567 | Loss: 1.0643 [2026-04-18 15:27:37] Validation | Batch 550/1567 | Loss: 1.0679 [2026-04-18 15:27:38] Validation | Batch 560/1567 | Loss: 1.0676 [2026-04-18 15:27:39] Validation | Batch 570/1567 | Loss: 1.0676 [2026-04-18 15:27:40] Validation | Batch 580/1567 | Loss: 1.0667 [2026-04-18 15:27:41] Validation | Batch 590/1567 | Loss: 1.0654 [2026-04-18 15:27:41] Validation | Batch 600/1567 | Loss: 1.0636 [2026-04-18 15:27:42] Validation | Batch 610/1567 | Loss: 1.0625 [2026-04-18 15:27:43] Validation | Batch 620/1567 | Loss: 1.0640 [2026-04-18 15:27:44] Validation | Batch 630/1567 | Loss: 1.0619 [2026-04-18 15:27:45] Validation | Batch 640/1567 | Loss: 1.0637 [2026-04-18 15:27:46] Validation | Batch 650/1567 | Loss: 1.0628 [2026-04-18 15:27:47] Validation | Batch 660/1567 | Loss: 1.0616 [2026-04-18 15:27:47] Validation | Batch 670/1567 | Loss: 1.0597 [2026-04-18 15:27:48] Validation | Batch 680/1567 | Loss: 1.0591 [2026-04-18 15:27:49] Validation | Batch 690/1567 | Loss: 1.0600 [2026-04-18 15:27:49] Validation | Batch 700/1567 | Loss: 1.0585 [2026-04-18 15:27:50] Validation | Batch 710/1567 | Loss: 1.0599 [2026-04-18 15:27:51] Validation | Batch 720/1567 | Loss: 1.0591 [2026-04-18 15:27:52] Validation | Batch 730/1567 | Loss: 1.0597 [2026-04-18 15:27:53] Validation | Batch 740/1567 | Loss: 1.0608 [2026-04-18 15:27:54] Validation | Batch 750/1567 | Loss: 1.0614 [2026-04-18 15:27:54] Validation | Batch 760/1567 | Loss: 1.0612 [2026-04-18 15:27:55] Validation | Batch 770/1567 | Loss: 1.0632 [2026-04-18 15:27:56] Validation | Batch 780/1567 | Loss: 1.0644 [2026-04-18 15:27:56] Validation | Batch 790/1567 | Loss: 1.0639 [2026-04-18 15:27:57] Validation | Batch 
800/1567 | Loss: 1.0658 [2026-04-18 15:27:58] Validation | Batch 810/1567 | Loss: 1.0658 [2026-04-18 15:27:59] Validation | Batch 820/1567 | Loss: 1.0653 [2026-04-18 15:27:59] Validation | Batch 830/1567 | Loss: 1.0638 [2026-04-18 15:28:00] Validation | Batch 840/1567 | Loss: 1.0639 [2026-04-18 15:28:01] Validation | Batch 850/1567 | Loss: 1.0625 [2026-04-18 15:28:01] Validation | Batch 860/1567 | Loss: 1.0642 [2026-04-18 15:28:02] Validation | Batch 870/1567 | Loss: 1.0647 [2026-04-18 15:28:03] Validation | Batch 880/1567 | Loss: 1.0656 [2026-04-18 15:28:04] Validation | Batch 890/1567 | Loss: 1.0661 [2026-04-18 15:28:05] Validation | Batch 900/1567 | Loss: 1.0681 [2026-04-18 15:28:05] Validation | Batch 910/1567 | Loss: 1.0682 [2026-04-18 15:28:06] Validation | Batch 920/1567 | Loss: 1.0705 [2026-04-18 15:28:07] Validation | Batch 930/1567 | Loss: 1.0681 [2026-04-18 15:28:07] Validation | Batch 940/1567 | Loss: 1.0677 [2026-04-18 15:28:08] Validation | Batch 950/1567 | Loss: 1.0667 [2026-04-18 15:28:09] Validation | Batch 960/1567 | Loss: 1.0653 [2026-04-18 15:28:10] Validation | Batch 970/1567 | Loss: 1.0670 [2026-04-18 15:28:10] Validation | Batch 980/1567 | Loss: 1.0674 [2026-04-18 15:28:11] Validation | Batch 990/1567 | Loss: 1.0668 [2026-04-18 15:28:12] Validation | Batch 1000/1567 | Loss: 1.0672 [2026-04-18 15:28:13] Validation | Batch 1010/1567 | Loss: 1.0649 [2026-04-18 15:28:13] Validation | Batch 1020/1567 | Loss: 1.0652 [2026-04-18 15:28:14] Validation | Batch 1030/1567 | Loss: 1.0668 [2026-04-18 15:28:15] Validation | Batch 1040/1567 | Loss: 1.0663 [2026-04-18 15:28:16] Validation | Batch 1050/1567 | Loss: 1.0673 [2026-04-18 15:28:17] Validation | Batch 1060/1567 | Loss: 1.0663 [2026-04-18 15:28:18] Validation | Batch 1070/1567 | Loss: 1.0656 [2026-04-18 15:28:18] Validation | Batch 1080/1567 | Loss: 1.0665 [2026-04-18 15:28:19] Validation | Batch 1090/1567 | Loss: 1.0662 [2026-04-18 15:28:20] Validation | Batch 1100/1567 | Loss: 1.0668 [2026-04-18 
15:28:20] Validation | Batch 1110/1567 | Loss: 1.0667 [2026-04-18 15:28:21] Validation | Batch 1120/1567 | Loss: 1.0669 [2026-04-18 15:28:22] Validation | Batch 1130/1567 | Loss: 1.0670 [2026-04-18 15:28:23] Validation | Batch 1140/1567 | Loss: 1.0678 [2026-04-18 15:28:24] Validation | Batch 1150/1567 | Loss: 1.0682 [2026-04-18 15:28:24] Validation | Batch 1160/1567 | Loss: 1.0691 [2026-04-18 15:28:25] Validation | Batch 1170/1567 | Loss: 1.0688 [2026-04-18 15:28:26] Validation | Batch 1180/1567 | Loss: 1.0683 [2026-04-18 15:28:27] Validation | Batch 1190/1567 | Loss: 1.0695 [2026-04-18 15:28:28] Validation | Batch 1200/1567 | Loss: 1.0689 [2026-04-18 15:28:29] Validation | Batch 1210/1567 | Loss: 1.0677 [2026-04-18 15:28:29] Validation | Batch 1220/1567 | Loss: 1.0680 [2026-04-18 15:28:30] Validation | Batch 1230/1567 | Loss: 1.0701 [2026-04-18 15:28:31] Validation | Batch 1240/1567 | Loss: 1.0689 [2026-04-18 15:28:32] Validation | Batch 1250/1567 | Loss: 1.0689 [2026-04-18 15:28:33] Validation | Batch 1260/1567 | Loss: 1.0699 [2026-04-18 15:28:34] Validation | Batch 1270/1567 | Loss: 1.0699 [2026-04-18 15:28:34] Validation | Batch 1280/1567 | Loss: 1.0693 [2026-04-18 15:28:36] Validation | Batch 1290/1567 | Loss: 1.0696 [2026-04-18 15:28:36] Validation | Batch 1300/1567 | Loss: 1.0699 [2026-04-18 15:28:37] Validation | Batch 1310/1567 | Loss: 1.0703 [2026-04-18 15:28:38] Validation | Batch 1320/1567 | Loss: 1.0693 [2026-04-18 15:28:39] Validation | Batch 1330/1567 | Loss: 1.0689 [2026-04-18 15:28:39] Validation | Batch 1340/1567 | Loss: 1.0687 [2026-04-18 15:28:40] Validation | Batch 1350/1567 | Loss: 1.0696 [2026-04-18 15:28:41] Validation | Batch 1360/1567 | Loss: 1.0693 [2026-04-18 15:28:42] Validation | Batch 1370/1567 | Loss: 1.0696 [2026-04-18 15:28:43] Validation | Batch 1380/1567 | Loss: 1.0710 [2026-04-18 15:28:43] Validation | Batch 1390/1567 | Loss: 1.0711 [2026-04-18 15:28:44] Validation | Batch 1400/1567 | Loss: 1.0715 [2026-04-18 15:28:45] 
Validation | Batch 1410/1567 | Loss: 1.0713 [2026-04-18 15:28:45] Validation | Batch 1420/1567 | Loss: 1.0719 [2026-04-18 15:28:46] Validation | Batch 1430/1567 | Loss: 1.0716 [2026-04-18 15:28:47] Validation | Batch 1440/1567 | Loss: 1.0719 [2026-04-18 15:28:48] Validation | Batch 1450/1567 | Loss: 1.0712 [2026-04-18 15:28:48] Validation | Batch 1460/1567 | Loss: 1.0710 [2026-04-18 15:28:49] Validation | Batch 1470/1567 | Loss: 1.0700 [2026-04-18 15:28:50] Validation | Batch 1480/1567 | Loss: 1.0684 [2026-04-18 15:28:50] Validation | Batch 1490/1567 | Loss: 1.0684 [2026-04-18 15:28:51] Validation | Batch 1500/1567 | Loss: 1.0685 [2026-04-18 15:28:52] Validation | Batch 1510/1567 | Loss: 1.0683 [2026-04-18 15:28:53] Validation | Batch 1520/1567 | Loss: 1.0675 [2026-04-18 15:28:53] Validation | Batch 1530/1567 | Loss: 1.0683 [2026-04-18 15:28:54] Validation | Batch 1540/1567 | Loss: 1.0694 [2026-04-18 15:28:55] Validation | Batch 1550/1567 | Loss: 1.0697 [2026-04-18 15:28:56] Validation | Batch 1560/1567 | Loss: 1.0687 [2026-04-18 15:28:57] Validation | Batch 1567/1567 | Loss: 1.0692 [2026-04-18 15:28:57] Validation | Loss: 1.0692 | PPL: 2.94 | Time: 125.37s [2026-04-18 15:29:00] Epoch 3 | Step 23010 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:29:04] Epoch 3 | Step 23020 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:29:07] Epoch 3 | Step 23030 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:29:11] Epoch 3 | Step 23040 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:29:15] Epoch 3 | Step 23050 | Loss: 0.6993 | LR: 2.00e-06 [2026-04-18 15:29:18] Epoch 3 | Step 23060 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:29:22] Epoch 3 | Step 23070 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:29:25] Epoch 3 | Step 23080 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:29:29] Epoch 3 | Step 23090 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:29:34] Epoch 3 | Step 23100 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:29:38] Epoch 3 | Step 23110 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 
15:29:42] Epoch 3 | Step 23120 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:29:45] Epoch 3 | Step 23130 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:29:48] Epoch 3 | Step 23140 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:29:52] Epoch 3 | Step 23150 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:29:55] Epoch 3 | Step 23160 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:29:58] Epoch 3 | Step 23170 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:30:02] Epoch 3 | Step 23180 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:30:05] Epoch 3 | Step 23190 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:30:09] Epoch 3 | Step 23200 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:30:13] Epoch 3 | Step 23210 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:30:16] Epoch 3 | Step 23220 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:30:20] Epoch 3 | Step 23230 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:30:24] Epoch 3 | Step 23240 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:30:28] Epoch 3 | Step 23250 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:30:32] Epoch 3 | Step 23260 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:30:35] Epoch 3 | Step 23270 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:30:39] Epoch 3 | Step 23280 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:30:43] Epoch 3 | Step 23290 | Loss: 0.6993 | LR: 2.00e-06 [2026-04-18 15:30:46] Epoch 3 | Step 23300 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:30:50] Epoch 3 | Step 23310 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:30:54] Epoch 3 | Step 23320 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:30:58] Epoch 3 | Step 23330 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:31:01] Epoch 3 | Step 23340 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:31:04] Epoch 3 | Step 23350 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:31:08] Epoch 3 | Step 23360 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:31:11] Epoch 3 | Step 23370 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:31:15] Epoch 3 | Step 23380 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:31:19] Epoch 3 | Step 
23390 | Loss: 0.7002 | LR: 2.00e-06 [2026-04-18 15:31:22] Epoch 3 | Step 23400 | Loss: 0.7002 | LR: 2.00e-06 [2026-04-18 15:31:26] Epoch 3 | Step 23410 | Loss: 0.7002 | LR: 2.00e-06 [2026-04-18 15:31:29] Epoch 3 | Step 23420 | Loss: 0.7002 | LR: 2.00e-06 [2026-04-18 15:31:33] Epoch 3 | Step 23430 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:31:37] Epoch 3 | Step 23440 | Loss: 0.7002 | LR: 2.00e-06 [2026-04-18 15:31:40] Epoch 3 | Step 23450 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:31:44] Epoch 3 | Step 23460 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:31:48] Epoch 3 | Step 23470 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:31:52] Epoch 3 | Step 23480 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:31:55] Epoch 3 | Step 23490 | Loss: 0.7004 | LR: 2.00e-06 [2026-04-18 15:31:59] Epoch 3 | Step 23500 | Loss: 0.7003 | LR: 2.00e-06 [2026-04-18 15:32:03] Epoch 3 | Step 23510 | Loss: 0.7003 | LR: 2.00e-06 [2026-04-18 15:32:06] Epoch 3 | Step 23520 | Loss: 0.7004 | LR: 2.00e-06 [2026-04-18 15:32:10] Epoch 3 | Step 23530 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:32:13] Epoch 3 | Step 23540 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:32:17] Epoch 3 | Step 23550 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:32:20] Epoch 3 | Step 23560 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:32:24] Epoch 3 | Step 23570 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:32:27] Epoch 3 | Step 23580 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:32:31] Epoch 3 | Step 23590 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:32:34] Epoch 3 | Step 23600 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:32:38] Epoch 3 | Step 23610 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:32:41] Epoch 3 | Step 23620 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:32:45] Epoch 3 | Step 23630 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:32:49] Epoch 3 | Step 23640 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:32:52] Epoch 3 | Step 23650 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:32:56] Epoch 3 | Step 23660 | Loss: 0.6997 | LR: 
2.00e-06 [2026-04-18 15:32:59] Epoch 3 | Step 23670 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:33:03] Epoch 3 | Step 23680 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:33:06] Epoch 3 | Step 23690 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:33:09] Epoch 3 | Step 23700 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:33:13] Epoch 3 | Step 23710 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:33:16] Epoch 3 | Step 23720 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:33:20] Epoch 3 | Step 23730 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:33:23] Epoch 3 | Step 23740 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:33:27] Epoch 3 | Step 23750 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:33:31] Epoch 3 | Step 23760 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:33:34] Epoch 3 | Step 23770 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:33:38] Epoch 3 | Step 23780 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:33:41] Epoch 3 | Step 23790 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:33:45] Epoch 3 | Step 23800 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:33:49] Epoch 3 | Step 23810 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:33:53] Epoch 3 | Step 23820 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:33:56] Epoch 3 | Step 23830 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:34:00] Epoch 3 | Step 23840 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:34:04] Epoch 3 | Step 23850 | Loss: 0.6986 | LR: 2.00e-06 [2026-04-18 15:34:07] Epoch 3 | Step 23860 | Loss: 0.6986 | LR: 2.00e-06 [2026-04-18 15:34:11] Epoch 3 | Step 23870 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:34:14] Epoch 3 | Step 23880 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:34:18] Epoch 3 | Step 23890 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:34:21] Epoch 3 | Step 23900 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:34:25] Epoch 3 | Step 23910 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:34:28] Epoch 3 | Step 23920 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:34:32] Epoch 3 | Step 23930 | Loss: 0.6993 | LR: 2.00e-06 [2026-04-18 
15:34:36] Epoch 3 | Step 23940 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:34:39] Epoch 3 | Step 23950 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:34:43] Epoch 3 | Step 23960 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:34:46] Epoch 3 | Step 23970 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:34:50] Epoch 3 | Step 23980 | Loss: 0.6986 | LR: 2.00e-06 [2026-04-18 15:34:53] Epoch 3 | Step 23990 | Loss: 0.6987 | LR: 2.00e-06 [2026-04-18 15:34:57] Epoch 3 | Step 24000 | Loss: 0.6984 | LR: 2.00e-06 [2026-04-18 15:35:07] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_24000.pt [2026-04-18 15:35:22] Validation | Batch 10/1567 | Loss: 0.9463 [2026-04-18 15:35:23] Validation | Batch 20/1567 | Loss: 1.0162 [2026-04-18 15:35:24] Validation | Batch 30/1567 | Loss: 1.0554 [2026-04-18 15:35:25] Validation | Batch 40/1567 | Loss: 1.0785 [2026-04-18 15:35:25] Validation | Batch 50/1567 | Loss: 1.0533 [2026-04-18 15:35:26] Validation | Batch 60/1567 | Loss: 1.0396 [2026-04-18 15:35:26] Validation | Batch 70/1567 | Loss: 1.0249 [2026-04-18 15:35:28] Validation | Batch 80/1567 | Loss: 1.0434 [2026-04-18 15:35:28] Validation | Batch 90/1567 | Loss: 1.0511 [2026-04-18 15:35:29] Validation | Batch 100/1567 | Loss: 1.0617 [2026-04-18 15:35:30] Validation | Batch 110/1567 | Loss: 1.0536 [2026-04-18 15:35:31] Validation | Batch 120/1567 | Loss: 1.0647 [2026-04-18 15:35:32] Validation | Batch 130/1567 | Loss: 1.0662 [2026-04-18 15:35:33] Validation | Batch 140/1567 | Loss: 1.0685 [2026-04-18 15:35:33] Validation | Batch 150/1567 | Loss: 1.0763 [2026-04-18 15:35:34] Validation | Batch 160/1567 | Loss: 1.0772 [2026-04-18 15:35:35] Validation | Batch 170/1567 | Loss: 1.0615 [2026-04-18 15:35:36] Validation | Batch 180/1567 | Loss: 1.0641 [2026-04-18 15:35:37] Validation | Batch 190/1567 | Loss: 1.0602 [2026-04-18 15:35:38] Validation | Batch 200/1567 | Loss: 1.0628 [2026-04-18 15:35:38] Validation | Batch 210/1567 | Loss: 1.0636 [2026-04-18 15:35:39] Validation | 
Batch 220/1567 | Loss: 1.0658 [2026-04-18 15:35:40] Validation | Batch 230/1567 | Loss: 1.0696 [2026-04-18 15:35:41] Validation | Batch 240/1567 | Loss: 1.0679 [2026-04-18 15:35:42] Validation | Batch 250/1567 | Loss: 1.0617 [2026-04-18 15:35:42] Validation | Batch 260/1567 | Loss: 1.0568 [2026-04-18 15:35:43] Validation | Batch 270/1567 | Loss: 1.0537 [2026-04-18 15:35:44] Validation | Batch 280/1567 | Loss: 1.0550 [2026-04-18 15:35:45] Validation | Batch 290/1567 | Loss: 1.0604 [2026-04-18 15:35:46] Validation | Batch 300/1567 | Loss: 1.0652 [2026-04-18 15:35:46] Validation | Batch 310/1567 | Loss: 1.0641 [2026-04-18 15:35:47] Validation | Batch 320/1567 | Loss: 1.0650 [2026-04-18 15:35:48] Validation | Batch 330/1567 | Loss: 1.0619 [2026-04-18 15:35:49] Validation | Batch 340/1567 | Loss: 1.0661 [2026-04-18 15:35:50] Validation | Batch 350/1567 | Loss: 1.0650 [2026-04-18 15:35:51] Validation | Batch 360/1567 | Loss: 1.0626 [2026-04-18 15:35:52] Validation | Batch 370/1567 | Loss: 1.0598 [2026-04-18 15:35:52] Validation | Batch 380/1567 | Loss: 1.0631 [2026-04-18 15:35:53] Validation | Batch 390/1567 | Loss: 1.0642 [2026-04-18 15:35:54] Validation | Batch 400/1567 | Loss: 1.0655 [2026-04-18 15:35:55] Validation | Batch 410/1567 | Loss: 1.0648 [2026-04-18 15:35:55] Validation | Batch 420/1567 | Loss: 1.0643 [2026-04-18 15:35:56] Validation | Batch 430/1567 | Loss: 1.0642 [2026-04-18 15:35:58] Validation | Batch 440/1567 | Loss: 1.0630 [2026-04-18 15:35:58] Validation | Batch 450/1567 | Loss: 1.0632 [2026-04-18 15:35:59] Validation | Batch 460/1567 | Loss: 1.0622 [2026-04-18 15:36:00] Validation | Batch 470/1567 | Loss: 1.0616 [2026-04-18 15:36:01] Validation | Batch 480/1567 | Loss: 1.0594 [2026-04-18 15:36:02] Validation | Batch 490/1567 | Loss: 1.0592 [2026-04-18 15:36:02] Validation | Batch 500/1567 | Loss: 1.0587 [2026-04-18 15:36:03] Validation | Batch 510/1567 | Loss: 1.0611 [2026-04-18 15:36:04] Validation | Batch 520/1567 | Loss: 1.0629 [2026-04-18 
15:36:05] Validation | Batch 530/1567 | Loss: 1.0626 [2026-04-18 15:36:06] Validation | Batch 540/1567 | Loss: 1.0653 [2026-04-18 15:36:07] Validation | Batch 550/1567 | Loss: 1.0689 [2026-04-18 15:36:07] Validation | Batch 560/1567 | Loss: 1.0687 [2026-04-18 15:36:08] Validation | Batch 570/1567 | Loss: 1.0686 [2026-04-18 15:36:09] Validation | Batch 580/1567 | Loss: 1.0678 [2026-04-18 15:36:10] Validation | Batch 590/1567 | Loss: 1.0665 [2026-04-18 15:36:11] Validation | Batch 600/1567 | Loss: 1.0647 [2026-04-18 15:36:12] Validation | Batch 610/1567 | Loss: 1.0636 [2026-04-18 15:36:13] Validation | Batch 620/1567 | Loss: 1.0651 [2026-04-18 15:36:14] Validation | Batch 630/1567 | Loss: 1.0631 [2026-04-18 15:36:14] Validation | Batch 640/1567 | Loss: 1.0648 [2026-04-18 15:36:15] Validation | Batch 650/1567 | Loss: 1.0639 [2026-04-18 15:36:16] Validation | Batch 660/1567 | Loss: 1.0627 [2026-04-18 15:36:17] Validation | Batch 670/1567 | Loss: 1.0608 [2026-04-18 15:36:17] Validation | Batch 680/1567 | Loss: 1.0602 [2026-04-18 15:36:18] Validation | Batch 690/1567 | Loss: 1.0610 [2026-04-18 15:36:19] Validation | Batch 700/1567 | Loss: 1.0596 [2026-04-18 15:36:20] Validation | Batch 710/1567 | Loss: 1.0609 [2026-04-18 15:36:21] Validation | Batch 720/1567 | Loss: 1.0601 [2026-04-18 15:36:21] Validation | Batch 730/1567 | Loss: 1.0607 [2026-04-18 15:36:22] Validation | Batch 740/1567 | Loss: 1.0618 [2026-04-18 15:36:23] Validation | Batch 750/1567 | Loss: 1.0624 [2026-04-18 15:36:24] Validation | Batch 760/1567 | Loss: 1.0622 [2026-04-18 15:36:25] Validation | Batch 770/1567 | Loss: 1.0643 [2026-04-18 15:36:26] Validation | Batch 780/1567 | Loss: 1.0655 [2026-04-18 15:36:26] Validation | Batch 790/1567 | Loss: 1.0650 [2026-04-18 15:36:27] Validation | Batch 800/1567 | Loss: 1.0669 [2026-04-18 15:36:28] Validation | Batch 810/1567 | Loss: 1.0669 [2026-04-18 15:36:29] Validation | Batch 820/1567 | Loss: 1.0665 [2026-04-18 15:36:29] Validation | Batch 830/1567 | Loss: 
1.0650 [2026-04-18 15:36:30] Validation | Batch 840/1567 | Loss: 1.0650 [2026-04-18 15:36:31] Validation | Batch 850/1567 | Loss: 1.0636 [2026-04-18 15:36:31] Validation | Batch 860/1567 | Loss: 1.0653 [2026-04-18 15:36:32] Validation | Batch 870/1567 | Loss: 1.0657 [2026-04-18 15:36:33] Validation | Batch 880/1567 | Loss: 1.0666 [2026-04-18 15:36:34] Validation | Batch 890/1567 | Loss: 1.0672 [2026-04-18 15:36:34] Validation | Batch 900/1567 | Loss: 1.0691 [2026-04-18 15:36:35] Validation | Batch 910/1567 | Loss: 1.0693 [2026-04-18 15:36:36] Validation | Batch 920/1567 | Loss: 1.0715 [2026-04-18 15:36:36] Validation | Batch 930/1567 | Loss: 1.0692 [2026-04-18 15:36:37] Validation | Batch 940/1567 | Loss: 1.0688 [2026-04-18 15:36:38] Validation | Batch 950/1567 | Loss: 1.0677 [2026-04-18 15:36:39] Validation | Batch 960/1567 | Loss: 1.0663 [2026-04-18 15:36:39] Validation | Batch 970/1567 | Loss: 1.0681 [2026-04-18 15:36:40] Validation | Batch 980/1567 | Loss: 1.0685 [2026-04-18 15:36:41] Validation | Batch 990/1567 | Loss: 1.0679 [2026-04-18 15:36:41] Validation | Batch 1000/1567 | Loss: 1.0684 [2026-04-18 15:36:42] Validation | Batch 1010/1567 | Loss: 1.0661 [2026-04-18 15:36:43] Validation | Batch 1020/1567 | Loss: 1.0663 [2026-04-18 15:36:44] Validation | Batch 1030/1567 | Loss: 1.0680 [2026-04-18 15:36:45] Validation | Batch 1040/1567 | Loss: 1.0674 [2026-04-18 15:36:45] Validation | Batch 1050/1567 | Loss: 1.0685 [2026-04-18 15:36:46] Validation | Batch 1060/1567 | Loss: 1.0675 [2026-04-18 15:36:47] Validation | Batch 1070/1567 | Loss: 1.0667 [2026-04-18 15:36:48] Validation | Batch 1080/1567 | Loss: 1.0677 [2026-04-18 15:36:49] Validation | Batch 1090/1567 | Loss: 1.0674 [2026-04-18 15:36:49] Validation | Batch 1100/1567 | Loss: 1.0680 [2026-04-18 15:36:50] Validation | Batch 1110/1567 | Loss: 1.0679 [2026-04-18 15:36:50] Validation | Batch 1120/1567 | Loss: 1.0681 [2026-04-18 15:36:51] Validation | Batch 1130/1567 | Loss: 1.0681 [2026-04-18 15:36:52] 
Validation | Batch 1140/1567 | Loss: 1.0689 [2026-04-18 15:36:53] Validation | Batch 1150/1567 | Loss: 1.0694 [2026-04-18 15:36:54] Validation | Batch 1160/1567 | Loss: 1.0702 [2026-04-18 15:36:55] Validation | Batch 1170/1567 | Loss: 1.0699 [2026-04-18 15:36:56] Validation | Batch 1180/1567 | Loss: 1.0694 [2026-04-18 15:36:56] Validation | Batch 1190/1567 | Loss: 1.0706 [2026-04-18 15:36:57] Validation | Batch 1200/1567 | Loss: 1.0699 [2026-04-18 15:36:58] Validation | Batch 1210/1567 | Loss: 1.0688 [2026-04-18 15:36:59] Validation | Batch 1220/1567 | Loss: 1.0692 [2026-04-18 15:37:00] Validation | Batch 1230/1567 | Loss: 1.0713 [2026-04-18 15:37:00] Validation | Batch 1240/1567 | Loss: 1.0700 [2026-04-18 15:37:01] Validation | Batch 1250/1567 | Loss: 1.0700 [2026-04-18 15:37:02] Validation | Batch 1260/1567 | Loss: 1.0710 [2026-04-18 15:37:04] Validation | Batch 1270/1567 | Loss: 1.0710 [2026-04-18 15:37:04] Validation | Batch 1280/1567 | Loss: 1.0704 [2026-04-18 15:37:05] Validation | Batch 1290/1567 | Loss: 1.0707 [2026-04-18 15:37:06] Validation | Batch 1300/1567 | Loss: 1.0710 [2026-04-18 15:37:07] Validation | Batch 1310/1567 | Loss: 1.0714 [2026-04-18 15:37:08] Validation | Batch 1320/1567 | Loss: 1.0705 [2026-04-18 15:37:08] Validation | Batch 1330/1567 | Loss: 1.0701 [2026-04-18 15:37:09] Validation | Batch 1340/1567 | Loss: 1.0699 [2026-04-18 15:37:10] Validation | Batch 1350/1567 | Loss: 1.0708 [2026-04-18 15:37:11] Validation | Batch 1360/1567 | Loss: 1.0704 [2026-04-18 15:37:11] Validation | Batch 1370/1567 | Loss: 1.0708 [2026-04-18 15:37:12] Validation | Batch 1380/1567 | Loss: 1.0722 [2026-04-18 15:37:13] Validation | Batch 1390/1567 | Loss: 1.0723 [2026-04-18 15:37:14] Validation | Batch 1400/1567 | Loss: 1.0727 [2026-04-18 15:37:14] Validation | Batch 1410/1567 | Loss: 1.0725 [2026-04-18 15:37:15] Validation | Batch 1420/1567 | Loss: 1.0731 [2026-04-18 15:37:16] Validation | Batch 1430/1567 | Loss: 1.0728 [2026-04-18 15:37:17] Validation | Batch 
1440/1567 | Loss: 1.0731 [2026-04-18 15:37:17] Validation | Batch 1450/1567 | Loss: 1.0723 [2026-04-18 15:37:18] Validation | Batch 1460/1567 | Loss: 1.0721 [2026-04-18 15:37:19] Validation | Batch 1470/1567 | Loss: 1.0711 [2026-04-18 15:37:20] Validation | Batch 1480/1567 | Loss: 1.0695 [2026-04-18 15:37:20] Validation | Batch 1490/1567 | Loss: 1.0695 [2026-04-18 15:37:21] Validation | Batch 1500/1567 | Loss: 1.0696 [2026-04-18 15:37:22] Validation | Batch 1510/1567 | Loss: 1.0695 [2026-04-18 15:37:22] Validation | Batch 1520/1567 | Loss: 1.0687 [2026-04-18 15:37:23] Validation | Batch 1530/1567 | Loss: 1.0695 [2026-04-18 15:37:24] Validation | Batch 1540/1567 | Loss: 1.0706 [2026-04-18 15:37:25] Validation | Batch 1550/1567 | Loss: 1.0708 [2026-04-18 15:37:26] Validation | Batch 1560/1567 | Loss: 1.0699 [2026-04-18 15:37:27] Validation | Batch 1567/1567 | Loss: 1.0703 [2026-04-18 15:37:27] Validation | Loss: 1.0703 | PPL: 2.94 | Time: 124.99s [2026-04-18 15:37:30] Epoch 3 | Step 24010 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:37:34] Epoch 3 | Step 24020 | Loss: 0.6983 | LR: 2.00e-06 [2026-04-18 15:37:37] Epoch 3 | Step 24030 | Loss: 0.6983 | LR: 2.00e-06 [2026-04-18 15:37:41] Epoch 3 | Step 24040 | Loss: 0.6983 | LR: 2.00e-06 [2026-04-18 15:37:44] Epoch 3 | Step 24050 | Loss: 0.6984 | LR: 2.00e-06 [2026-04-18 15:37:48] Epoch 3 | Step 24060 | Loss: 0.6985 | LR: 2.00e-06 [2026-04-18 15:37:52] Epoch 3 | Step 24070 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:37:55] Epoch 3 | Step 24080 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:37:59] Epoch 3 | Step 24090 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:38:02] Epoch 3 | Step 24100 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:38:06] Epoch 3 | Step 24110 | Loss: 0.6988 | LR: 2.00e-06 [2026-04-18 15:38:10] Epoch 3 | Step 24120 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:38:14] Epoch 3 | Step 24130 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:38:17] Epoch 3 | Step 24140 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 
15:38:21] Epoch 3 | Step 24150 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:38:24] Epoch 3 | Step 24160 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:38:28] Epoch 3 | Step 24170 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:38:32] Epoch 3 | Step 24180 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:38:35] Epoch 3 | Step 24190 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:38:39] Epoch 3 | Step 24200 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:38:43] Epoch 3 | Step 24210 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:38:46] Epoch 3 | Step 24220 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:38:50] Epoch 3 | Step 24230 | Loss: 0.6993 | LR: 2.00e-06 [2026-04-18 15:38:53] Epoch 3 | Step 24240 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:38:57] Epoch 3 | Step 24250 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:39:00] Epoch 3 | Step 24260 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:39:04] Epoch 3 | Step 24270 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:39:07] Epoch 3 | Step 24280 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:39:11] Epoch 3 | Step 24290 | Loss: 0.6990 | LR: 2.00e-06 [2026-04-18 15:39:15] Epoch 3 | Step 24300 | Loss: 0.6989 | LR: 2.00e-06 [2026-04-18 15:39:18] Epoch 3 | Step 24310 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:39:22] Epoch 3 | Step 24320 | Loss: 0.6991 | LR: 2.00e-06 [2026-04-18 15:39:26] Epoch 3 | Step 24330 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:39:29] Epoch 3 | Step 24340 | Loss: 0.6992 | LR: 2.00e-06 [2026-04-18 15:39:33] Epoch 3 | Step 24350 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:39:36] Epoch 3 | Step 24360 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:39:40] Epoch 3 | Step 24370 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:39:44] Epoch 3 | Step 24380 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:39:47] Epoch 3 | Step 24390 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:39:50] Epoch 3 | Step 24400 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:39:54] Epoch 3 | Step 24410 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:39:57] Epoch 3 | Step 
24420 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:40:01] Epoch 3 | Step 24430 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:40:04] Epoch 3 | Step 24440 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:40:08] Epoch 3 | Step 24450 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:40:11] Epoch 3 | Step 24460 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:40:15] Epoch 3 | Step 24470 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:40:19] Epoch 3 | Step 24480 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:40:22] Epoch 3 | Step 24490 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:40:25] Epoch 3 | Step 24500 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:40:29] Epoch 3 | Step 24510 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:40:32] Epoch 3 | Step 24520 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:40:36] Epoch 3 | Step 24530 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:40:40] Epoch 3 | Step 24540 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:40:44] Epoch 3 | Step 24550 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:40:47] Epoch 3 | Step 24560 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:40:51] Epoch 3 | Step 24570 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:40:54] Epoch 3 | Step 24580 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:40:58] Epoch 3 | Step 24590 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:41:01] Epoch 3 | Step 24600 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:41:05] Epoch 3 | Step 24610 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:41:09] Epoch 3 | Step 24620 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:41:12] Epoch 3 | Step 24630 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:41:15] Epoch 3 | Step 24640 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:41:19] Epoch 3 | Step 24650 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:41:22] Epoch 3 | Step 24660 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:41:26] Epoch 3 | Step 24670 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:41:29] Epoch 3 | Step 24680 | Loss: 0.6997 | LR: 2.00e-06 [2026-04-18 15:41:33] Epoch 3 | Step 24690 | Loss: 0.6997 | LR: 
2.00e-06 [2026-04-18 15:41:37] Epoch 3 | Step 24700 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:41:40] Epoch 3 | Step 24710 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:41:44] Epoch 3 | Step 24720 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:41:48] Epoch 3 | Step 24730 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:41:51] Epoch 3 | Step 24740 | Loss: 0.6993 | LR: 2.00e-06 [2026-04-18 15:41:55] Epoch 3 | Step 24750 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:41:59] Epoch 3 | Step 24760 | Loss: 0.6994 | LR: 2.00e-06 [2026-04-18 15:42:02] Epoch 3 | Step 24770 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:42:05] Epoch 3 | Step 24780 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:42:09] Epoch 3 | Step 24790 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:42:12] Epoch 3 | Step 24800 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:42:15] Epoch 3 | Step 24810 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:42:19] Epoch 3 | Step 24820 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:42:22] Epoch 3 | Step 24830 | Loss: 0.6995 | LR: 2.00e-06 [2026-04-18 15:42:26] Epoch 3 | Step 24840 | Loss: 0.6996 | LR: 2.00e-06 [2026-04-18 15:42:30] Epoch 3 | Step 24850 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:42:34] Epoch 3 | Step 24860 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:42:37] Epoch 3 | Step 24870 | Loss: 0.6999 | LR: 2.00e-06 [2026-04-18 15:42:41] Epoch 3 | Step 24880 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:42:44] Epoch 3 | Step 24890 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:42:48] Epoch 3 | Step 24900 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:42:51] Epoch 3 | Step 24910 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:42:55] Epoch 3 | Step 24920 | Loss: 0.7002 | LR: 2.00e-06 [2026-04-18 15:42:58] Epoch 3 | Step 24930 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:43:02] Epoch 3 | Step 24940 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:43:05] Epoch 3 | Step 24950 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:43:09] Epoch 3 | Step 24960 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 
15:43:12] Epoch 3 | Step 24970 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:43:16] Epoch 3 | Step 24980 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:43:20] Epoch 3 | Step 24990 | Loss: 0.6998 | LR: 2.00e-06 [2026-04-18 15:43:23] Epoch 3 | Step 25000 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:43:24] Validation | Batch 10/1567 | Loss: 0.9433 [2026-04-18 15:43:25] Validation | Batch 20/1567 | Loss: 1.0148 [2026-04-18 15:43:26] Validation | Batch 30/1567 | Loss: 1.0553 [2026-04-18 15:43:27] Validation | Batch 40/1567 | Loss: 1.0790 [2026-04-18 15:43:28] Validation | Batch 50/1567 | Loss: 1.0533 [2026-04-18 15:43:29] Validation | Batch 60/1567 | Loss: 1.0398 [2026-04-18 15:43:29] Validation | Batch 70/1567 | Loss: 1.0249 [2026-04-18 15:43:31] Validation | Batch 80/1567 | Loss: 1.0435 [2026-04-18 15:43:31] Validation | Batch 90/1567 | Loss: 1.0513 [2026-04-18 15:43:32] Validation | Batch 100/1567 | Loss: 1.0620 [2026-04-18 15:43:33] Validation | Batch 110/1567 | Loss: 1.0539 [2026-04-18 15:43:34] Validation | Batch 120/1567 | Loss: 1.0649 [2026-04-18 15:43:35] Validation | Batch 130/1567 | Loss: 1.0666 [2026-04-18 15:43:35] Validation | Batch 140/1567 | Loss: 1.0687 [2026-04-18 15:43:36] Validation | Batch 150/1567 | Loss: 1.0766 [2026-04-18 15:43:37] Validation | Batch 160/1567 | Loss: 1.0777 [2026-04-18 15:43:38] Validation | Batch 170/1567 | Loss: 1.0622 [2026-04-18 15:43:39] Validation | Batch 180/1567 | Loss: 1.0648 [2026-04-18 15:43:40] Validation | Batch 190/1567 | Loss: 1.0609 [2026-04-18 15:43:40] Validation | Batch 200/1567 | Loss: 1.0636 [2026-04-18 15:43:41] Validation | Batch 210/1567 | Loss: 1.0643 [2026-04-18 15:43:42] Validation | Batch 220/1567 | Loss: 1.0664 [2026-04-18 15:43:43] Validation | Batch 230/1567 | Loss: 1.0703 [2026-04-18 15:43:44] Validation | Batch 240/1567 | Loss: 1.0686 [2026-04-18 15:43:44] Validation | Batch 250/1567 | Loss: 1.0624 [2026-04-18 15:43:45] Validation | Batch 260/1567 | Loss: 1.0576 [2026-04-18 15:43:46] Validation | 
Batch 270/1567 | Loss: 1.0547 [2026-04-18 15:43:47] Validation | Batch 280/1567 | Loss: 1.0558 [2026-04-18 15:43:48] Validation | Batch 290/1567 | Loss: 1.0613 [2026-04-18 15:43:49] Validation | Batch 300/1567 | Loss: 1.0663 [2026-04-18 15:43:49] Validation | Batch 310/1567 | Loss: 1.0652 [2026-04-18 15:43:50] Validation | Batch 320/1567 | Loss: 1.0660 [2026-04-18 15:43:51] Validation | Batch 330/1567 | Loss: 1.0629 [2026-04-18 15:43:52] Validation | Batch 340/1567 | Loss: 1.0670 [2026-04-18 15:43:53] Validation | Batch 350/1567 | Loss: 1.0658 [2026-04-18 15:43:54] Validation | Batch 360/1567 | Loss: 1.0635 [2026-04-18 15:43:54] Validation | Batch 370/1567 | Loss: 1.0606 [2026-04-18 15:43:55] Validation | Batch 380/1567 | Loss: 1.0640 [2026-04-18 15:43:56] Validation | Batch 390/1567 | Loss: 1.0651 [2026-04-18 15:43:57] Validation | Batch 400/1567 | Loss: 1.0664 [2026-04-18 15:43:58] Validation | Batch 410/1567 | Loss: 1.0657 [2026-04-18 15:43:58] Validation | Batch 420/1567 | Loss: 1.0653 [2026-04-18 15:43:59] Validation | Batch 430/1567 | Loss: 1.0651 [2026-04-18 15:44:00] Validation | Batch 440/1567 | Loss: 1.0640 [2026-04-18 15:44:01] Validation | Batch 450/1567 | Loss: 1.0641 [2026-04-18 15:44:02] Validation | Batch 460/1567 | Loss: 1.0631 [2026-04-18 15:44:03] Validation | Batch 470/1567 | Loss: 1.0625 [2026-04-18 15:44:03] Validation | Batch 480/1567 | Loss: 1.0603 [2026-04-18 15:44:04] Validation | Batch 490/1567 | Loss: 1.0600 [2026-04-18 15:44:05] Validation | Batch 500/1567 | Loss: 1.0596 [2026-04-18 15:44:06] Validation | Batch 510/1567 | Loss: 1.0619 [2026-04-18 15:44:06] Validation | Batch 520/1567 | Loss: 1.0637 [2026-04-18 15:44:07] Validation | Batch 530/1567 | Loss: 1.0634 [2026-04-18 15:44:08] Validation | Batch 540/1567 | Loss: 1.0661 [2026-04-18 15:44:09] Validation | Batch 550/1567 | Loss: 1.0697 [2026-04-18 15:44:10] Validation | Batch 560/1567 | Loss: 1.0695 [2026-04-18 15:44:11] Validation | Batch 570/1567 | Loss: 1.0694 [2026-04-18 
15:44:12] Validation | Batch 580/1567 | Loss: 1.0685 [2026-04-18 15:44:12] Validation | Batch 590/1567 | Loss: 1.0672 [2026-04-18 15:44:13] Validation | Batch 600/1567 | Loss: 1.0654 [2026-04-18 15:44:14] Validation | Batch 610/1567 | Loss: 1.0644 [2026-04-18 15:44:15] Validation | Batch 620/1567 | Loss: 1.0658 [2026-04-18 15:44:16] Validation | Batch 630/1567 | Loss: 1.0638 [2026-04-18 15:44:17] Validation | Batch 640/1567 | Loss: 1.0655 [2026-04-18 15:44:18] Validation | Batch 650/1567 | Loss: 1.0646 [2026-04-18 15:44:18] Validation | Batch 660/1567 | Loss: 1.0635 [2026-04-18 15:44:19] Validation | Batch 670/1567 | Loss: 1.0615 [2026-04-18 15:44:20] Validation | Batch 680/1567 | Loss: 1.0609 [2026-04-18 15:44:20] Validation | Batch 690/1567 | Loss: 1.0618 [2026-04-18 15:44:21] Validation | Batch 700/1567 | Loss: 1.0603 [2026-04-18 15:44:22] Validation | Batch 710/1567 | Loss: 1.0617 [2026-04-18 15:44:23] Validation | Batch 720/1567 | Loss: 1.0609 [2026-04-18 15:44:24] Validation | Batch 730/1567 | Loss: 1.0616 [2026-04-18 15:44:25] Validation | Batch 740/1567 | Loss: 1.0627 [2026-04-18 15:44:25] Validation | Batch 750/1567 | Loss: 1.0633 [2026-04-18 15:44:26] Validation | Batch 760/1567 | Loss: 1.0630 [2026-04-18 15:44:27] Validation | Batch 770/1567 | Loss: 1.0651 [2026-04-18 15:44:28] Validation | Batch 780/1567 | Loss: 1.0664 [2026-04-18 15:44:29] Validation | Batch 790/1567 | Loss: 1.0658 [2026-04-18 15:44:29] Validation | Batch 800/1567 | Loss: 1.0677 [2026-04-18 15:44:30] Validation | Batch 810/1567 | Loss: 1.0677 [2026-04-18 15:44:31] Validation | Batch 820/1567 | Loss: 1.0673 [2026-04-18 15:44:32] Validation | Batch 830/1567 | Loss: 1.0657 [2026-04-18 15:44:32] Validation | Batch 840/1567 | Loss: 1.0658 [2026-04-18 15:44:33] Validation | Batch 850/1567 | Loss: 1.0644 [2026-04-18 15:44:34] Validation | Batch 860/1567 | Loss: 1.0660 [2026-04-18 15:44:35] Validation | Batch 870/1567 | Loss: 1.0665 [2026-04-18 15:44:35] Validation | Batch 880/1567 | Loss: 
1.0674 [2026-04-18 15:44:36] Validation | Batch 890/1567 | Loss: 1.0680 [2026-04-18 15:44:37] Validation | Batch 900/1567 | Loss: 1.0699 [2026-04-18 15:44:38] Validation | Batch 910/1567 | Loss: 1.0701 [2026-04-18 15:44:38] Validation | Batch 920/1567 | Loss: 1.0723 [2026-04-18 15:44:39] Validation | Batch 930/1567 | Loss: 1.0700 [2026-04-18 15:44:40] Validation | Batch 940/1567 | Loss: 1.0696 [2026-04-18 15:44:41] Validation | Batch 950/1567 | Loss: 1.0685 [2026-04-18 15:44:41] Validation | Batch 960/1567 | Loss: 1.0671 [2026-04-18 15:44:42] Validation | Batch 970/1567 | Loss: 1.0689 [2026-04-18 15:44:43] Validation | Batch 980/1567 | Loss: 1.0692 [2026-04-18 15:44:44] Validation | Batch 990/1567 | Loss: 1.0687 [2026-04-18 15:44:44] Validation | Batch 1000/1567 | Loss: 1.0691 [2026-04-18 15:44:45] Validation | Batch 1010/1567 | Loss: 1.0668 [2026-04-18 15:44:46] Validation | Batch 1020/1567 | Loss: 1.0671 [2026-04-18 15:44:47] Validation | Batch 1030/1567 | Loss: 1.0687 [2026-04-18 15:44:48] Validation | Batch 1040/1567 | Loss: 1.0682 [2026-04-18 15:44:48] Validation | Batch 1050/1567 | Loss: 1.0692 [2026-04-18 15:44:49] Validation | Batch 1060/1567 | Loss: 1.0683 [2026-04-18 15:44:50] Validation | Batch 1070/1567 | Loss: 1.0675 [2026-04-18 15:44:51] Validation | Batch 1080/1567 | Loss: 1.0684 [2026-04-18 15:44:51] Validation | Batch 1090/1567 | Loss: 1.0681 [2026-04-18 15:44:52] Validation | Batch 1100/1567 | Loss: 1.0687 [2026-04-18 15:44:53] Validation | Batch 1110/1567 | Loss: 1.0686 [2026-04-18 15:44:53] Validation | Batch 1120/1567 | Loss: 1.0688 [2026-04-18 15:44:54] Validation | Batch 1130/1567 | Loss: 1.0689 [2026-04-18 15:44:55] Validation | Batch 1140/1567 | Loss: 1.0697 [2026-04-18 15:44:56] Validation | Batch 1150/1567 | Loss: 1.0701 [2026-04-18 15:44:57] Validation | Batch 1160/1567 | Loss: 1.0710 [2026-04-18 15:44:58] Validation | Batch 1170/1567 | Loss: 1.0706 [2026-04-18 15:44:59] Validation | Batch 1180/1567 | Loss: 1.0702 [2026-04-18 15:44:59] 
Validation | Batch 1190/1567 | Loss: 1.0714 [2026-04-18 15:45:00] Validation | Batch 1200/1567 | Loss: 1.0708 [2026-04-18 15:45:01] Validation | Batch 1210/1567 | Loss: 1.0696 [2026-04-18 15:45:02] Validation | Batch 1220/1567 | Loss: 1.0699 [2026-04-18 15:45:03] Validation | Batch 1230/1567 | Loss: 1.0721 [2026-04-18 15:45:03] Validation | Batch 1240/1567 | Loss: 1.0708 [2026-04-18 15:45:04] Validation | Batch 1250/1567 | Loss: 1.0708 [2026-04-18 15:45:05] Validation | Batch 1260/1567 | Loss: 1.0718 [2026-04-18 15:45:06] Validation | Batch 1270/1567 | Loss: 1.0718 [2026-04-18 15:45:07] Validation | Batch 1280/1567 | Loss: 1.0712 [2026-04-18 15:45:08] Validation | Batch 1290/1567 | Loss: 1.0715 [2026-04-18 15:45:09] Validation | Batch 1300/1567 | Loss: 1.0718 [2026-04-18 15:45:09] Validation | Batch 1310/1567 | Loss: 1.0721 [2026-04-18 15:45:10] Validation | Batch 1320/1567 | Loss: 1.0712 [2026-04-18 15:45:11] Validation | Batch 1330/1567 | Loss: 1.0708 [2026-04-18 15:45:12] Validation | Batch 1340/1567 | Loss: 1.0706 [2026-04-18 15:45:12] Validation | Batch 1350/1567 | Loss: 1.0715 [2026-04-18 15:45:13] Validation | Batch 1360/1567 | Loss: 1.0711 [2026-04-18 15:45:14] Validation | Batch 1370/1567 | Loss: 1.0715 [2026-04-18 15:45:15] Validation | Batch 1380/1567 | Loss: 1.0728 [2026-04-18 15:45:15] Validation | Batch 1390/1567 | Loss: 1.0730 [2026-04-18 15:45:16] Validation | Batch 1400/1567 | Loss: 1.0734 [2026-04-18 15:45:17] Validation | Batch 1410/1567 | Loss: 1.0732 [2026-04-18 15:45:17] Validation | Batch 1420/1567 | Loss: 1.0738 [2026-04-18 15:45:18] Validation | Batch 1430/1567 | Loss: 1.0735 [2026-04-18 15:45:19] Validation | Batch 1440/1567 | Loss: 1.0738 [2026-04-18 15:45:20] Validation | Batch 1450/1567 | Loss: 1.0731 [2026-04-18 15:45:20] Validation | Batch 1460/1567 | Loss: 1.0729 [2026-04-18 15:45:21] Validation | Batch 1470/1567 | Loss: 1.0719 [2026-04-18 15:45:22] Validation | Batch 1480/1567 | Loss: 1.0703 [2026-04-18 15:45:22] Validation | Batch 
1490/1567 | Loss: 1.0703 [2026-04-18 15:45:23] Validation | Batch 1500/1567 | Loss: 1.0704 [2026-04-18 15:45:24] Validation | Batch 1510/1567 | Loss: 1.0703 [2026-04-18 15:45:25] Validation | Batch 1520/1567 | Loss: 1.0695 [2026-04-18 15:45:25] Validation | Batch 1530/1567 | Loss: 1.0703 [2026-04-18 15:45:26] Validation | Batch 1540/1567 | Loss: 1.0713 [2026-04-18 15:45:27] Validation | Batch 1550/1567 | Loss: 1.0716 [2026-04-18 15:45:28] Validation | Batch 1560/1567 | Loss: 1.0707 [2026-04-18 15:45:29] Validation | Batch 1567/1567 | Loss: 1.0711 [2026-04-18 15:45:29] Validation | Loss: 1.0711 | PPL: 2.94 | Time: 125.32s [2026-04-18 15:45:32] Epoch 3 | Step 25010 | Loss: 0.7000 | LR: 2.00e-06 [2026-04-18 15:45:36] Epoch 3 | Step 25020 | Loss: 0.7001 | LR: 2.00e-06 [2026-04-18 15:45:40] Epoch 3 | Step 25030 | Loss: 0.7002 | LR: 2.00e-06 [2026-04-18 15:45:43] Epoch 3 | Step 25040 | Loss: 0.7003 | LR: 2.00e-06 [2026-04-18 15:45:47] Epoch 3 | Step 25050 | Loss: 0.7002 | LR: 2.00e-06 [2026-04-18 15:45:51] Epoch 3 | Step 25060 | Loss: 0.7005 | LR: 2.00e-06 [2026-04-18 15:45:54] Epoch 3 | Step 25070 | Loss: 0.7004 | LR: 2.00e-06 [2026-04-18 15:45:58] Epoch 3 | Step 25080 | Loss: 0.7005 | LR: 2.00e-06 [2026-04-18 15:46:01] Epoch 3 | Step 25090 | Loss: 0.7006 | LR: 2.00e-06 [2026-04-18 15:46:05] Epoch 3 | Step 25100 | Loss: 0.7005 | LR: 2.00e-06 [2026-04-18 15:46:08] Epoch 3 | Step 25110 | Loss: 0.7006 | LR: 2.00e-06 [2026-04-18 15:46:12] Epoch 3 | Step 25120 | Loss: 0.7006 | LR: 2.00e-06 [2026-04-18 15:46:15] Epoch 3 | Step 25130 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 15:46:19] Epoch 3 | Step 25140 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 15:46:22] Epoch 3 | Step 25150 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 15:46:26] Epoch 3 | Step 25160 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 15:46:30] Epoch 3 | Step 25170 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 15:46:34] Epoch 3 | Step 25180 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 15:46:38] Epoch 3 | Step 25190 | Loss: 
0.7010 | LR: 2.00e-06 [2026-04-18 15:46:41] Epoch 3 | Step 25200 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 15:46:45] Epoch 3 | Step 25210 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 15:46:48] Epoch 3 | Step 25220 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 15:46:52] Epoch 3 | Step 25230 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 15:46:55] Epoch 3 | Step 25240 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 15:46:59] Epoch 3 | Step 25250 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 15:47:02] Epoch 3 | Step 25260 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 15:47:06] Epoch 3 | Step 25270 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 15:47:09] Epoch 3 | Step 25280 | Loss: 0.7011 | LR: 2.00e-06 [2026-04-18 15:47:12] Epoch 3 | Step 25290 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 15:47:16] Epoch 3 | Step 25300 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 15:47:20] Epoch 3 | Step 25310 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 15:47:23] Epoch 3 | Step 25320 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 15:47:27] Epoch 3 | Step 25330 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 15:47:30] Epoch 3 | Step 25340 | Loss: 0.7011 | LR: 2.00e-06 [2026-04-18 15:47:34] Epoch 3 | Step 25350 | Loss: 0.7013 | LR: 2.00e-06 [2026-04-18 15:47:38] Epoch 3 | Step 25360 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 15:47:42] Epoch 3 | Step 25370 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 15:47:46] Epoch 3 | Step 25380 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 15:47:49] Epoch 3 | Step 25390 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 15:47:53] Epoch 3 | Step 25400 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 15:47:56] Epoch 3 | Step 25410 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 15:48:00] Epoch 3 | Step 25420 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 15:48:04] Epoch 3 | Step 25430 | Loss: 0.7013 | LR: 2.00e-06 [2026-04-18 15:48:07] Epoch 3 | Step 25440 | Loss: 0.7013 | LR: 2.00e-06 [2026-04-18 15:48:11] Epoch 3 | Step 25450 | Loss: 0.7012 | LR: 2.00e-06 [2026-04-18 15:48:15] Epoch 3 | Step 25460 | Loss: 0.7014 | LR: 2.00e-06 
[2026-04-18 15:48:18] Epoch 3 | Step 25470 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:48:22] Epoch 3 | Step 25480 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:48:26] Epoch 3 | Step 25490 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:48:29] Epoch 3 | Step 25500 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 15:48:32] Epoch 3 | Step 25510 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:48:36] Epoch 3 | Step 25520 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:48:40] Epoch 3 | Step 25530 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:48:43] Epoch 3 | Step 25540 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 15:48:47] Epoch 3 | Step 25550 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:48:51] Epoch 3 | Step 25560 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:48:54] Epoch 3 | Step 25570 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:48:58] Epoch 3 | Step 25580 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 15:49:01] Epoch 3 | Step 25590 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:49:04] Epoch 3 | Step 25600 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 15:49:08] Epoch 3 | Step 25610 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 15:49:11] Epoch 3 | Step 25620 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 15:49:15] Epoch 3 | Step 25630 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:49:19] Epoch 3 | Step 25640 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 15:49:22] Epoch 3 | Step 25650 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:49:26] Epoch 3 | Step 25660 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:49:29] Epoch 3 | Step 25670 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:49:33] Epoch 3 | Step 25680 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:49:37] Epoch 3 | Step 25690 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:49:40] Epoch 3 | Step 25700 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:49:44] Epoch 3 | Step 25710 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:49:47] Epoch 3 | Step 25720 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:49:51] Epoch 3 | Step 25730 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:49:54] Epoch 
3 | Step 25740 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:49:58] Epoch 3 | Step 25750 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:50:02] Epoch 3 | Step 25760 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:50:05] Epoch 3 | Step 25770 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:50:09] Epoch 3 | Step 25780 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:50:12] Epoch 3 | Step 25790 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:50:16] Epoch 3 | Step 25800 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:50:19] Epoch 3 | Step 25810 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:50:23] Epoch 3 | Step 25820 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:50:26] Epoch 3 | Step 25830 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 15:50:30] Epoch 3 | Step 25840 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 15:50:33] Epoch 3 | Step 25850 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:50:37] Epoch 3 | Step 25860 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:50:40] Epoch 3 | Step 25870 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:50:44] Epoch 3 | Step 25880 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:50:47] Epoch 3 | Step 25890 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:50:50] Epoch 3 | Step 25900 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:50:54] Epoch 3 | Step 25910 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:50:57] Epoch 3 | Step 25920 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:51:01] Epoch 3 | Step 25930 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:51:04] Epoch 3 | Step 25940 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:51:08] Epoch 3 | Step 25950 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 15:51:12] Epoch 3 | Step 25960 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 15:51:15] Epoch 3 | Step 25970 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:51:19] Epoch 3 | Step 25980 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:51:22] Epoch 3 | Step 25990 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:51:25] Epoch 3 | Step 26000 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:51:26] Validation | Batch 10/1567 | Loss: 
0.9480 [2026-04-18 15:51:27] Validation | Batch 20/1567 | Loss: 1.0170 [2026-04-18 15:51:28] Validation | Batch 30/1567 | Loss: 1.0554 [2026-04-18 15:51:29] Validation | Batch 40/1567 | Loss: 1.0780 [2026-04-18 15:51:30] Validation | Batch 50/1567 | Loss: 1.0530 [2026-04-18 15:51:31] Validation | Batch 60/1567 | Loss: 1.0395 [2026-04-18 15:51:31] Validation | Batch 70/1567 | Loss: 1.0248 [2026-04-18 15:51:33] Validation | Batch 80/1567 | Loss: 1.0437 [2026-04-18 15:51:33] Validation | Batch 90/1567 | Loss: 1.0520 [2026-04-18 15:51:34] Validation | Batch 100/1567 | Loss: 1.0622 [2026-04-18 15:51:34] Validation | Batch 110/1567 | Loss: 1.0543 [2026-04-18 15:51:35] Validation | Batch 120/1567 | Loss: 1.0653 [2026-04-18 15:51:36] Validation | Batch 130/1567 | Loss: 1.0667 [2026-04-18 15:51:37] Validation | Batch 140/1567 | Loss: 1.0689 [2026-04-18 15:51:38] Validation | Batch 150/1567 | Loss: 1.0767 [2026-04-18 15:51:39] Validation | Batch 160/1567 | Loss: 1.0778 [2026-04-18 15:51:39] Validation | Batch 170/1567 | Loss: 1.0623 [2026-04-18 15:51:40] Validation | Batch 180/1567 | Loss: 1.0650 [2026-04-18 15:51:41] Validation | Batch 190/1567 | Loss: 1.0610 [2026-04-18 15:51:42] Validation | Batch 200/1567 | Loss: 1.0639 [2026-04-18 15:51:43] Validation | Batch 210/1567 | Loss: 1.0646 [2026-04-18 15:51:44] Validation | Batch 220/1567 | Loss: 1.0668 [2026-04-18 15:51:45] Validation | Batch 230/1567 | Loss: 1.0707 [2026-04-18 15:51:45] Validation | Batch 240/1567 | Loss: 1.0689 [2026-04-18 15:51:46] Validation | Batch 250/1567 | Loss: 1.0627 [2026-04-18 15:51:47] Validation | Batch 260/1567 | Loss: 1.0579 [2026-04-18 15:51:48] Validation | Batch 270/1567 | Loss: 1.0549 [2026-04-18 15:51:48] Validation | Batch 280/1567 | Loss: 1.0560 [2026-04-18 15:51:50] Validation | Batch 290/1567 | Loss: 1.0615 [2026-04-18 15:51:50] Validation | Batch 300/1567 | Loss: 1.0665 [2026-04-18 15:51:51] Validation | Batch 310/1567 | Loss: 1.0653 [2026-04-18 15:51:52] Validation | Batch 320/1567 
| Loss: 1.0661 [2026-04-18 15:51:53] Validation | Batch 330/1567 | Loss: 1.0630 [2026-04-18 15:51:54] Validation | Batch 340/1567 | Loss: 1.0672 [2026-04-18 15:51:54] Validation | Batch 350/1567 | Loss: 1.0660 [2026-04-18 15:51:55] Validation | Batch 360/1567 | Loss: 1.0636 [2026-04-18 15:51:56] Validation | Batch 370/1567 | Loss: 1.0608 [2026-04-18 15:51:57] Validation | Batch 380/1567 | Loss: 1.0641 [2026-04-18 15:51:57] Validation | Batch 390/1567 | Loss: 1.0652 [2026-04-18 15:51:58] Validation | Batch 400/1567 | Loss: 1.0664 [2026-04-18 15:51:59] Validation | Batch 410/1567 | Loss: 1.0657 [2026-04-18 15:52:00] Validation | Batch 420/1567 | Loss: 1.0653 [2026-04-18 15:52:01] Validation | Batch 430/1567 | Loss: 1.0652 [2026-04-18 15:52:02] Validation | Batch 440/1567 | Loss: 1.0641 [2026-04-18 15:52:02] Validation | Batch 450/1567 | Loss: 1.0641 [2026-04-18 15:52:03] Validation | Batch 460/1567 | Loss: 1.0632 [2026-04-18 15:52:04] Validation | Batch 470/1567 | Loss: 1.0627 [2026-04-18 15:52:05] Validation | Batch 480/1567 | Loss: 1.0605 [2026-04-18 15:52:06] Validation | Batch 490/1567 | Loss: 1.0602 [2026-04-18 15:52:06] Validation | Batch 500/1567 | Loss: 1.0597 [2026-04-18 15:52:07] Validation | Batch 510/1567 | Loss: 1.0620 [2026-04-18 15:52:08] Validation | Batch 520/1567 | Loss: 1.0638 [2026-04-18 15:52:09] Validation | Batch 530/1567 | Loss: 1.0635 [2026-04-18 15:52:10] Validation | Batch 540/1567 | Loss: 1.0662 [2026-04-18 15:52:11] Validation | Batch 550/1567 | Loss: 1.0699 [2026-04-18 15:52:12] Validation | Batch 560/1567 | Loss: 1.0697 [2026-04-18 15:52:12] Validation | Batch 570/1567 | Loss: 1.0696 [2026-04-18 15:52:13] Validation | Batch 580/1567 | Loss: 1.0687 [2026-04-18 15:52:14] Validation | Batch 590/1567 | Loss: 1.0674 [2026-04-18 15:52:15] Validation | Batch 600/1567 | Loss: 1.0656 [2026-04-18 15:52:16] Validation | Batch 610/1567 | Loss: 1.0645 [2026-04-18 15:52:17] Validation | Batch 620/1567 | Loss: 1.0660 [2026-04-18 15:52:18] Validation | 
Batch 630/1567 | Loss: 1.0640 [2026-04-18 15:52:18] Validation | Batch 640/1567 | Loss: 1.0657 [2026-04-18 15:52:20] Validation | Batch 650/1567 | Loss: 1.0648 [2026-04-18 15:52:20] Validation | Batch 660/1567 | Loss: 1.0636 [2026-04-18 15:52:21] Validation | Batch 670/1567 | Loss: 1.0616 [2026-04-18 15:52:22] Validation | Batch 680/1567 | Loss: 1.0610 [2026-04-18 15:52:22] Validation | Batch 690/1567 | Loss: 1.0619 [2026-04-18 15:52:23] Validation | Batch 700/1567 | Loss: 1.0605 [2026-04-18 15:52:24] Validation | Batch 710/1567 | Loss: 1.0619 [2026-04-18 15:52:25] Validation | Batch 720/1567 | Loss: 1.0611 [2026-04-18 15:52:26] Validation | Batch 730/1567 | Loss: 1.0617 [2026-04-18 15:52:26] Validation | Batch 740/1567 | Loss: 1.0628 [2026-04-18 15:52:27] Validation | Batch 750/1567 | Loss: 1.0634 [2026-04-18 15:52:28] Validation | Batch 760/1567 | Loss: 1.0632 [2026-04-18 15:52:29] Validation | Batch 770/1567 | Loss: 1.0653 [2026-04-18 15:52:30] Validation | Batch 780/1567 | Loss: 1.0665 [2026-04-18 15:52:31] Validation | Batch 790/1567 | Loss: 1.0660 [2026-04-18 15:52:31] Validation | Batch 800/1567 | Loss: 1.0679 [2026-04-18 15:52:32] Validation | Batch 810/1567 | Loss: 1.0679 [2026-04-18 15:52:33] Validation | Batch 820/1567 | Loss: 1.0674 [2026-04-18 15:52:34] Validation | Batch 830/1567 | Loss: 1.0659 [2026-04-18 15:52:34] Validation | Batch 840/1567 | Loss: 1.0659 [2026-04-18 15:52:35] Validation | Batch 850/1567 | Loss: 1.0646 [2026-04-18 15:52:36] Validation | Batch 860/1567 | Loss: 1.0662 [2026-04-18 15:52:36] Validation | Batch 870/1567 | Loss: 1.0667 [2026-04-18 15:52:37] Validation | Batch 880/1567 | Loss: 1.0676 [2026-04-18 15:52:38] Validation | Batch 890/1567 | Loss: 1.0682 [2026-04-18 15:52:39] Validation | Batch 900/1567 | Loss: 1.0701 [2026-04-18 15:52:39] Validation | Batch 910/1567 | Loss: 1.0702 [2026-04-18 15:52:40] Validation | Batch 920/1567 | Loss: 1.0725 [2026-04-18 15:52:41] Validation | Batch 930/1567 | Loss: 1.0701 [2026-04-18 
15:52:42] Validation | Batch 940/1567 | Loss: 1.0697 [2026-04-18 15:52:42] Validation | Batch 950/1567 | Loss: 1.0687 [2026-04-18 15:52:43] Validation | Batch 960/1567 | Loss: 1.0673 [2026-04-18 15:52:44] Validation | Batch 970/1567 | Loss: 1.0690 [2026-04-18 15:52:45] Validation | Batch 980/1567 | Loss: 1.0694 [2026-04-18 15:52:45] Validation | Batch 990/1567 | Loss: 1.0688 [2026-04-18 15:52:46] Validation | Batch 1000/1567 | Loss: 1.0692 [2026-04-18 15:52:47] Validation | Batch 1010/1567 | Loss: 1.0669 [2026-04-18 15:52:47] Validation | Batch 1020/1567 | Loss: 1.0672 [2026-04-18 15:52:48] Validation | Batch 1030/1567 | Loss: 1.0688 [2026-04-18 15:52:49] Validation | Batch 1040/1567 | Loss: 1.0683 [2026-04-18 15:52:50] Validation | Batch 1050/1567 | Loss: 1.0694 [2026-04-18 15:52:51] Validation | Batch 1060/1567 | Loss: 1.0684 [2026-04-18 15:52:52] Validation | Batch 1070/1567 | Loss: 1.0676 [2026-04-18 15:52:52] Validation | Batch 1080/1567 | Loss: 1.0685 [2026-04-18 15:52:53] Validation | Batch 1090/1567 | Loss: 1.0683 [2026-04-18 15:52:54] Validation | Batch 1100/1567 | Loss: 1.0689 [2026-04-18 15:52:54] Validation | Batch 1110/1567 | Loss: 1.0687 [2026-04-18 15:52:55] Validation | Batch 1120/1567 | Loss: 1.0689 [2026-04-18 15:52:56] Validation | Batch 1130/1567 | Loss: 1.0690 [2026-04-18 15:52:57] Validation | Batch 1140/1567 | Loss: 1.0698 [2026-04-18 15:52:58] Validation | Batch 1150/1567 | Loss: 1.0702 [2026-04-18 15:52:58] Validation | Batch 1160/1567 | Loss: 1.0711 [2026-04-18 15:52:59] Validation | Batch 1170/1567 | Loss: 1.0708 [2026-04-18 15:53:00] Validation | Batch 1180/1567 | Loss: 1.0703 [2026-04-18 15:53:01] Validation | Batch 1190/1567 | Loss: 1.0715 [2026-04-18 15:53:02] Validation | Batch 1200/1567 | Loss: 1.0709 [2026-04-18 15:53:03] Validation | Batch 1210/1567 | Loss: 1.0697 [2026-04-18 15:53:03] Validation | Batch 1220/1567 | Loss: 1.0701 [2026-04-18 15:53:04] Validation | Batch 1230/1567 | Loss: 1.0722 [2026-04-18 15:53:05] Validation | 
Batch 1240/1567 | Loss: 1.0709 [2026-04-18 15:53:05] Validation | Batch 1250/1567 | Loss: 1.0709 [2026-04-18 15:53:06] Validation | Batch 1260/1567 | Loss: 1.0719 [2026-04-18 15:53:07] Validation | Batch 1270/1567 | Loss: 1.0719 [2026-04-18 15:53:08] Validation | Batch 1280/1567 | Loss: 1.0713 [2026-04-18 15:53:09] Validation | Batch 1290/1567 | Loss: 1.0716 [2026-04-18 15:53:10] Validation | Batch 1300/1567 | Loss: 1.0719 [2026-04-18 15:53:11] Validation | Batch 1310/1567 | Loss: 1.0722 [2026-04-18 15:53:12] Validation | Batch 1320/1567 | Loss: 1.0713 [2026-04-18 15:53:12] Validation | Batch 1330/1567 | Loss: 1.0709 [2026-04-18 15:53:13] Validation | Batch 1340/1567 | Loss: 1.0707 [2026-04-18 15:53:14] Validation | Batch 1350/1567 | Loss: 1.0716 [2026-04-18 15:53:15] Validation | Batch 1360/1567 | Loss: 1.0713 [2026-04-18 15:53:15] Validation | Batch 1370/1567 | Loss: 1.0716 [2026-04-18 15:53:16] Validation | Batch 1380/1567 | Loss: 1.0730 [2026-04-18 15:53:17] Validation | Batch 1390/1567 | Loss: 1.0731 [2026-04-18 15:53:18] Validation | Batch 1400/1567 | Loss: 1.0736 [2026-04-18 15:53:18] Validation | Batch 1410/1567 | Loss: 1.0734 [2026-04-18 15:53:19] Validation | Batch 1420/1567 | Loss: 1.0740 [2026-04-18 15:53:20] Validation | Batch 1430/1567 | Loss: 1.0737 [2026-04-18 15:53:21] Validation | Batch 1440/1567 | Loss: 1.0739 [2026-04-18 15:53:22] Validation | Batch 1450/1567 | Loss: 1.0732 [2026-04-18 15:53:22] Validation | Batch 1460/1567 | Loss: 1.0730 [2026-04-18 15:53:23] Validation | Batch 1470/1567 | Loss: 1.0720 [2026-04-18 15:53:24] Validation | Batch 1480/1567 | Loss: 1.0704 [2026-04-18 15:53:24] Validation | Batch 1490/1567 | Loss: 1.0704 [2026-04-18 15:53:25] Validation | Batch 1500/1567 | Loss: 1.0706 [2026-04-18 15:53:26] Validation | Batch 1510/1567 | Loss: 1.0704 [2026-04-18 15:53:27] Validation | Batch 1520/1567 | Loss: 1.0696 [2026-04-18 15:53:27] Validation | Batch 1530/1567 | Loss: 1.0704 [2026-04-18 15:53:28] Validation | Batch 1540/1567 | 
Loss: 1.0715 [2026-04-18 15:53:29] Validation | Batch 1550/1567 | Loss: 1.0717 [2026-04-18 15:53:30] Validation | Batch 1560/1567 | Loss: 1.0708 [2026-04-18 15:53:31] Validation | Batch 1567/1567 | Loss: 1.0713 [2026-04-18 15:53:31] Validation | Loss: 1.0713 | PPL: 2.94 | Time: 125.23s [2026-04-18 15:53:34] Epoch 3 | Step 26010 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:53:38] Epoch 3 | Step 26020 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:53:41] Epoch 3 | Step 26030 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:53:44] Epoch 3 | Step 26040 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:53:48] Epoch 3 | Step 26050 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:53:51] Epoch 3 | Step 26060 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 15:53:55] Epoch 3 | Step 26070 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:53:58] Epoch 3 | Step 26080 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:54:02] Epoch 3 | Step 26090 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:54:06] Epoch 3 | Step 26100 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:54:09] Epoch 3 | Step 26110 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:54:13] Epoch 3 | Step 26120 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:54:17] Epoch 3 | Step 26130 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:54:20] Epoch 3 | Step 26140 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:54:24] Epoch 3 | Step 26150 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:54:28] Epoch 3 | Step 26160 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:54:31] Epoch 3 | Step 26170 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:54:35] Epoch 3 | Step 26180 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:54:39] Epoch 3 | Step 26190 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:54:42] Epoch 3 | Step 26200 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:54:46] Epoch 3 | Step 26210 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 15:54:49] Epoch 3 | Step 26220 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:54:53] Epoch 3 | Step 26230 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:54:56] Epoch 3 | 
Step 26240 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:55:00] Epoch 3 | Step 26250 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:55:03] Epoch 3 | Step 26260 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:55:07] Epoch 3 | Step 26270 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 15:55:10] Epoch 3 | Step 26280 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:55:14] Epoch 3 | Step 26290 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:55:17] Epoch 3 | Step 26300 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:55:22] Epoch 3 | Step 26310 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 15:55:26] Epoch 3 | Step 26320 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 15:55:29] Epoch 3 | Step 26330 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:55:33] Epoch 3 | Step 26340 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:55:36] Epoch 3 | Step 26350 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 15:55:40] Epoch 3 | Step 26360 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 15:55:43] Epoch 3 | Step 26370 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 15:55:47] Epoch 3 | Step 26380 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 15:55:50] Epoch 3 | Step 26390 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:55:54] Epoch 3 | Step 26400 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:55:58] Epoch 3 | Step 26410 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 15:56:01] Epoch 3 | Step 26420 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:56:05] Epoch 3 | Step 26430 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:56:09] Epoch 3 | Step 26440 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:56:13] Epoch 3 | Step 26450 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:56:17] Epoch 3 | Step 26460 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:56:20] Epoch 3 | Step 26470 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:56:24] Epoch 3 | Step 26480 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:56:28] Epoch 3 | Step 26490 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:56:31] Epoch 3 | Step 26500 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:56:34] Epoch 3 | Step 26510 | Loss: 0.7025 | 
LR: 2.00e-06 [2026-04-18 15:56:39] Epoch 3 | Step 26520 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:56:42] Epoch 3 | Step 26530 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:56:46] Epoch 3 | Step 26540 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 15:56:49] Epoch 3 | Step 26550 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:56:53] Epoch 3 | Step 26560 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:56:57] Epoch 3 | Step 26570 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:57:01] Epoch 3 | Step 26580 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:57:04] Epoch 3 | Step 26590 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:57:08] Epoch 3 | Step 26600 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:57:11] Epoch 3 | Step 26610 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:57:15] Epoch 3 | Step 26620 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 15:57:19] Epoch 3 | Step 26630 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:57:22] Epoch 3 | Step 26640 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:57:26] Epoch 3 | Step 26650 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:57:29] Epoch 3 | Step 26660 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:57:32] Epoch 3 | Step 26670 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:57:36] Epoch 3 | Step 26680 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:57:39] Epoch 3 | Step 26690 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:57:42] Epoch 3 | Step 26700 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:57:46] Epoch 3 | Step 26710 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:57:50] Epoch 3 | Step 26720 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:57:53] Epoch 3 | Step 26730 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:57:57] Epoch 3 | Step 26740 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:58:01] Epoch 3 | Step 26750 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:58:05] Epoch 3 | Step 26760 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:58:09] Epoch 3 | Step 26770 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:58:12] Epoch 3 | Step 26780 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 
15:58:16] Epoch 3 | Step 26790 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 15:58:20] Epoch 3 | Step 26800 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:58:24] Epoch 3 | Step 26810 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:58:28] Epoch 3 | Step 26820 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 15:58:31] Epoch 3 | Step 26830 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:58:34] Epoch 3 | Step 26840 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:58:38] Epoch 3 | Step 26850 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:58:41] Epoch 3 | Step 26860 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:58:44] Epoch 3 | Step 26870 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:58:48] Epoch 3 | Step 26880 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 15:58:51] Epoch 3 | Step 26890 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 15:58:55] Epoch 3 | Step 26900 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 15:58:59] Epoch 3 | Step 26910 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:59:02] Epoch 3 | Step 26920 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 15:59:06] Epoch 3 | Step 26930 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 15:59:09] Epoch 3 | Step 26940 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 15:59:13] Epoch 3 | Step 26950 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:59:16] Epoch 3 | Step 26960 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 15:59:20] Epoch 3 | Step 26970 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:59:23] Epoch 3 | Step 26980 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:59:27] Epoch 3 | Step 26990 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 15:59:31] Epoch 3 | Step 27000 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 15:59:40] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_27000.pt [2026-04-18 15:59:55] Validation | Batch 10/1567 | Loss: 0.9455 [2026-04-18 15:59:56] Validation | Batch 20/1567 | Loss: 1.0160 [2026-04-18 15:59:57] Validation | Batch 30/1567 | Loss: 1.0560 [2026-04-18 15:59:58] Validation | Batch 40/1567 | Loss: 1.0797 [2026-04-18 15:59:58] Validation | Batch 
50/1567 | Loss: 1.0545 [2026-04-18 16:00:00] Validation | Batch 60/1567 | Loss: 1.0405 [2026-04-18 16:00:00] Validation | Batch 70/1567 | Loss: 1.0257 [2026-04-18 16:00:01] Validation | Batch 80/1567 | Loss: 1.0444 [2026-04-18 16:00:02] Validation | Batch 90/1567 | Loss: 1.0524 [2026-04-18 16:00:03] Validation | Batch 100/1567 | Loss: 1.0628 [2026-04-18 16:00:04] Validation | Batch 110/1567 | Loss: 1.0544 [2026-04-18 16:00:05] Validation | Batch 120/1567 | Loss: 1.0655 [2026-04-18 16:00:06] Validation | Batch 130/1567 | Loss: 1.0671 [2026-04-18 16:00:06] Validation | Batch 140/1567 | Loss: 1.0692 [2026-04-18 16:00:07] Validation | Batch 150/1567 | Loss: 1.0770 [2026-04-18 16:00:08] Validation | Batch 160/1567 | Loss: 1.0782 [2026-04-18 16:00:09] Validation | Batch 170/1567 | Loss: 1.0626 [2026-04-18 16:00:09] Validation | Batch 180/1567 | Loss: 1.0652 [2026-04-18 16:00:10] Validation | Batch 190/1567 | Loss: 1.0613 [2026-04-18 16:00:11] Validation | Batch 200/1567 | Loss: 1.0639 [2026-04-18 16:00:12] Validation | Batch 210/1567 | Loss: 1.0646 [2026-04-18 16:00:12] Validation | Batch 220/1567 | Loss: 1.0668 [2026-04-18 16:00:14] Validation | Batch 230/1567 | Loss: 1.0708 [2026-04-18 16:00:14] Validation | Batch 240/1567 | Loss: 1.0691 [2026-04-18 16:00:15] Validation | Batch 250/1567 | Loss: 1.0629 [2026-04-18 16:00:16] Validation | Batch 260/1567 | Loss: 1.0581 [2026-04-18 16:00:16] Validation | Batch 270/1567 | Loss: 1.0552 [2026-04-18 16:00:17] Validation | Batch 280/1567 | Loss: 1.0564 [2026-04-18 16:00:18] Validation | Batch 290/1567 | Loss: 1.0619 [2026-04-18 16:00:19] Validation | Batch 300/1567 | Loss: 1.0669 [2026-04-18 16:00:20] Validation | Batch 310/1567 | Loss: 1.0658 [2026-04-18 16:00:21] Validation | Batch 320/1567 | Loss: 1.0665 [2026-04-18 16:00:22] Validation | Batch 330/1567 | Loss: 1.0635 [2026-04-18 16:00:23] Validation | Batch 340/1567 | Loss: 1.0676 [2026-04-18 16:00:23] Validation | Batch 350/1567 | Loss: 1.0664 [2026-04-18 16:00:24] 
Validation | Batch 360/1567 | Loss: 1.0641 [2026-04-18 16:00:25] Validation | Batch 370/1567 | Loss: 1.0612 [2026-04-18 16:00:26] Validation | Batch 380/1567 | Loss: 1.0645 [2026-04-18 16:00:26] Validation | Batch 390/1567 | Loss: 1.0656 [2026-04-18 16:00:27] Validation | Batch 400/1567 | Loss: 1.0669 [2026-04-18 16:00:28] Validation | Batch 410/1567 | Loss: 1.0662 [2026-04-18 16:00:29] Validation | Batch 420/1567 | Loss: 1.0658 [2026-04-18 16:00:30] Validation | Batch 430/1567 | Loss: 1.0657 [2026-04-18 16:00:31] Validation | Batch 440/1567 | Loss: 1.0645 [2026-04-18 16:00:31] Validation | Batch 450/1567 | Loss: 1.0646 [2026-04-18 16:00:32] Validation | Batch 460/1567 | Loss: 1.0637 [2026-04-18 16:00:33] Validation | Batch 470/1567 | Loss: 1.0631 [2026-04-18 16:00:34] Validation | Batch 480/1567 | Loss: 1.0609 [2026-04-18 16:00:34] Validation | Batch 490/1567 | Loss: 1.0606 [2026-04-18 16:00:35] Validation | Batch 500/1567 | Loss: 1.0601 [2026-04-18 16:00:36] Validation | Batch 510/1567 | Loss: 1.0625 [2026-04-18 16:00:37] Validation | Batch 520/1567 | Loss: 1.0643 [2026-04-18 16:00:38] Validation | Batch 530/1567 | Loss: 1.0639 [2026-04-18 16:00:39] Validation | Batch 540/1567 | Loss: 1.0667 [2026-04-18 16:00:39] Validation | Batch 550/1567 | Loss: 1.0703 [2026-04-18 16:00:40] Validation | Batch 560/1567 | Loss: 1.0701 [2026-04-18 16:00:42] Validation | Batch 570/1567 | Loss: 1.0701 [2026-04-18 16:00:42] Validation | Batch 580/1567 | Loss: 1.0692 [2026-04-18 16:00:43] Validation | Batch 590/1567 | Loss: 1.0679 [2026-04-18 16:00:44] Validation | Batch 600/1567 | Loss: 1.0661 [2026-04-18 16:00:45] Validation | Batch 610/1567 | Loss: 1.0651 [2026-04-18 16:00:46] Validation | Batch 620/1567 | Loss: 1.0666 [2026-04-18 16:00:47] Validation | Batch 630/1567 | Loss: 1.0645 [2026-04-18 16:00:48] Validation | Batch 640/1567 | Loss: 1.0662 [2026-04-18 16:00:49] Validation | Batch 650/1567 | Loss: 1.0653 [2026-04-18 16:00:49] Validation | Batch 660/1567 | Loss: 1.0642 
[2026-04-18 16:00:50] Validation | Batch 670/1567 | Loss: 1.0622 [2026-04-18 16:00:51] Validation | Batch 680/1567 | Loss: 1.0616 [2026-04-18 16:00:51] Validation | Batch 690/1567 | Loss: 1.0625 [2026-04-18 16:00:52] Validation | Batch 700/1567 | Loss: 1.0611 [2026-04-18 16:00:53] Validation | Batch 710/1567 | Loss: 1.0625 [2026-04-18 16:00:54] Validation | Batch 720/1567 | Loss: 1.0616 [2026-04-18 16:00:55] Validation | Batch 730/1567 | Loss: 1.0622 [2026-04-18 16:00:55] Validation | Batch 740/1567 | Loss: 1.0634 [2026-04-18 16:00:56] Validation | Batch 750/1567 | Loss: 1.0640 [2026-04-18 16:00:57] Validation | Batch 760/1567 | Loss: 1.0638 [2026-04-18 16:00:58] Validation | Batch 770/1567 | Loss: 1.0659 [2026-04-18 16:00:59] Validation | Batch 780/1567 | Loss: 1.0671 [2026-04-18 16:01:00] Validation | Batch 790/1567 | Loss: 1.0666 [2026-04-18 16:01:00] Validation | Batch 800/1567 | Loss: 1.0684 [2026-04-18 16:01:01] Validation | Batch 810/1567 | Loss: 1.0685 [2026-04-18 16:01:02] Validation | Batch 820/1567 | Loss: 1.0680 [2026-04-18 16:01:03] Validation | Batch 830/1567 | Loss: 1.0665 [2026-04-18 16:01:03] Validation | Batch 840/1567 | Loss: 1.0665 [2026-04-18 16:01:04] Validation | Batch 850/1567 | Loss: 1.0651 [2026-04-18 16:01:05] Validation | Batch 860/1567 | Loss: 1.0668 [2026-04-18 16:01:05] Validation | Batch 870/1567 | Loss: 1.0673 [2026-04-18 16:01:06] Validation | Batch 880/1567 | Loss: 1.0682 [2026-04-18 16:01:07] Validation | Batch 890/1567 | Loss: 1.0687 [2026-04-18 16:01:08] Validation | Batch 900/1567 | Loss: 1.0707 [2026-04-18 16:01:08] Validation | Batch 910/1567 | Loss: 1.0708 [2026-04-18 16:01:09] Validation | Batch 920/1567 | Loss: 1.0731 [2026-04-18 16:01:10] Validation | Batch 930/1567 | Loss: 1.0707 [2026-04-18 16:01:11] Validation | Batch 940/1567 | Loss: 1.0703 [2026-04-18 16:01:12] Validation | Batch 950/1567 | Loss: 1.0693 [2026-04-18 16:01:12] Validation | Batch 960/1567 | Loss: 1.0679 [2026-04-18 16:01:13] Validation | Batch 970/1567 
| Loss: 1.0697 [2026-04-18 16:01:14] Validation | Batch 980/1567 | Loss: 1.0700 [2026-04-18 16:01:14] Validation | Batch 990/1567 | Loss: 1.0695 [2026-04-18 16:01:15] Validation | Batch 1000/1567 | Loss: 1.0699 [2026-04-18 16:01:16] Validation | Batch 1010/1567 | Loss: 1.0676 [2026-04-18 16:01:17] Validation | Batch 1020/1567 | Loss: 1.0679 [2026-04-18 16:01:17] Validation | Batch 1030/1567 | Loss: 1.0695 [2026-04-18 16:01:18] Validation | Batch 1040/1567 | Loss: 1.0690 [2026-04-18 16:01:19] Validation | Batch 1050/1567 | Loss: 1.0700 [2026-04-18 16:01:20] Validation | Batch 1060/1567 | Loss: 1.0691 [2026-04-18 16:01:21] Validation | Batch 1070/1567 | Loss: 1.0683 [2026-04-18 16:01:21] Validation | Batch 1080/1567 | Loss: 1.0692 [2026-04-18 16:01:22] Validation | Batch 1090/1567 | Loss: 1.0689 [2026-04-18 16:01:23] Validation | Batch 1100/1567 | Loss: 1.0696 [2026-04-18 16:01:23] Validation | Batch 1110/1567 | Loss: 1.0694 [2026-04-18 16:01:24] Validation | Batch 1120/1567 | Loss: 1.0697 [2026-04-18 16:01:25] Validation | Batch 1130/1567 | Loss: 1.0697 [2026-04-18 16:01:26] Validation | Batch 1140/1567 | Loss: 1.0705 [2026-04-18 16:01:27] Validation | Batch 1150/1567 | Loss: 1.0710 [2026-04-18 16:01:27] Validation | Batch 1160/1567 | Loss: 1.0718 [2026-04-18 16:01:28] Validation | Batch 1170/1567 | Loss: 1.0715 [2026-04-18 16:01:29] Validation | Batch 1180/1567 | Loss: 1.0710 [2026-04-18 16:01:30] Validation | Batch 1190/1567 | Loss: 1.0722 [2026-04-18 16:01:31] Validation | Batch 1200/1567 | Loss: 1.0716 [2026-04-18 16:01:31] Validation | Batch 1210/1567 | Loss: 1.0704 [2026-04-18 16:01:32] Validation | Batch 1220/1567 | Loss: 1.0708 [2026-04-18 16:01:33] Validation | Batch 1230/1567 | Loss: 1.0729 [2026-04-18 16:01:34] Validation | Batch 1240/1567 | Loss: 1.0716 [2026-04-18 16:01:34] Validation | Batch 1250/1567 | Loss: 1.0716 [2026-04-18 16:01:35] Validation | Batch 1260/1567 | Loss: 1.0727 [2026-04-18 16:01:36] Validation | Batch 1270/1567 | Loss: 1.0726 
[2026-04-18 16:01:37] Validation | Batch 1280/1567 | Loss: 1.0720 [2026-04-18 16:01:38] Validation | Batch 1290/1567 | Loss: 1.0723 [2026-04-18 16:01:39] Validation | Batch 1300/1567 | Loss: 1.0726 [2026-04-18 16:01:40] Validation | Batch 1310/1567 | Loss: 1.0730 [2026-04-18 16:01:41] Validation | Batch 1320/1567 | Loss: 1.0721 [2026-04-18 16:01:41] Validation | Batch 1330/1567 | Loss: 1.0717 [2026-04-18 16:01:42] Validation | Batch 1340/1567 | Loss: 1.0714 [2026-04-18 16:01:43] Validation | Batch 1350/1567 | Loss: 1.0723 [2026-04-18 16:01:44] Validation | Batch 1360/1567 | Loss: 1.0720 [2026-04-18 16:01:44] Validation | Batch 1370/1567 | Loss: 1.0723 [2026-04-18 16:01:45] Validation | Batch 1380/1567 | Loss: 1.0737 [2026-04-18 16:01:46] Validation | Batch 1390/1567 | Loss: 1.0738 [2026-04-18 16:01:47] Validation | Batch 1400/1567 | Loss: 1.0743 [2026-04-18 16:01:48] Validation | Batch 1410/1567 | Loss: 1.0741 [2026-04-18 16:01:49] Validation | Batch 1420/1567 | Loss: 1.0747 [2026-04-18 16:01:49] Validation | Batch 1430/1567 | Loss: 1.0744 [2026-04-18 16:01:50] Validation | Batch 1440/1567 | Loss: 1.0747 [2026-04-18 16:01:51] Validation | Batch 1450/1567 | Loss: 1.0739 [2026-04-18 16:01:52] Validation | Batch 1460/1567 | Loss: 1.0737 [2026-04-18 16:01:52] Validation | Batch 1470/1567 | Loss: 1.0727 [2026-04-18 16:01:53] Validation | Batch 1480/1567 | Loss: 1.0711 [2026-04-18 16:01:54] Validation | Batch 1490/1567 | Loss: 1.0711 [2026-04-18 16:01:55] Validation | Batch 1500/1567 | Loss: 1.0712 [2026-04-18 16:01:55] Validation | Batch 1510/1567 | Loss: 1.0711 [2026-04-18 16:01:56] Validation | Batch 1520/1567 | Loss: 1.0703 [2026-04-18 16:01:57] Validation | Batch 1530/1567 | Loss: 1.0711 [2026-04-18 16:01:58] Validation | Batch 1540/1567 | Loss: 1.0722 [2026-04-18 16:01:58] Validation | Batch 1550/1567 | Loss: 1.0724 [2026-04-18 16:01:59] Validation | Batch 1560/1567 | Loss: 1.0715 [2026-04-18 16:02:00] Validation | Batch 1567/1567 | Loss: 1.0720 [2026-04-18 
16:02:00] Validation | Loss: 1.0720 | PPL: 2.94 | Time: 125.84s [2026-04-18 16:02:04] Epoch 3 | Step 27010 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 16:02:07] Epoch 3 | Step 27020 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 16:02:11] Epoch 3 | Step 27030 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 16:02:15] Epoch 3 | Step 27040 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 16:02:18] Epoch 3 | Step 27050 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 16:02:22] Epoch 3 | Step 27060 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 16:02:26] Epoch 3 | Step 27070 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 16:02:29] Epoch 3 | Step 27080 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 16:02:33] Epoch 3 | Step 27090 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 16:02:36] Epoch 3 | Step 27100 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:02:40] Epoch 3 | Step 27110 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:02:43] Epoch 3 | Step 27120 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:02:46] Epoch 3 | Step 27130 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 16:02:50] Epoch 3 | Step 27140 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:02:53] Epoch 3 | Step 27150 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:02:57] Epoch 3 | Step 27160 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:03:00] Epoch 3 | Step 27170 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 16:03:04] Epoch 3 | Step 27180 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 16:03:07] Epoch 3 | Step 27190 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:03:11] Epoch 3 | Step 27200 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:03:14] Epoch 3 | Step 27210 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:03:18] Epoch 3 | Step 27220 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:03:21] Epoch 3 | Step 27230 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:03:25] Epoch 3 | Step 27240 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:03:29] Epoch 3 | Step 27250 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:03:32] Epoch 3 | Step 27260 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:03:36] Epoch 3 | Step 
27270 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:03:39] Epoch 3 | Step 27280 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:03:43] Epoch 3 | Step 27290 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:03:46] Epoch 3 | Step 27300 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:03:50] Epoch 3 | Step 27310 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:03:54] Epoch 3 | Step 27320 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:03:57] Epoch 3 | Step 27330 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:04:00] Epoch 3 | Step 27340 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:04:04] Epoch 3 | Step 27350 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:04:08] Epoch 3 | Step 27360 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:04:12] Epoch 3 | Step 27370 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:04:15] Epoch 3 | Step 27380 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:04:19] Epoch 3 | Step 27390 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:04:23] Epoch 3 | Step 27400 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:04:26] Epoch 3 | Step 27410 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:04:29] Epoch 3 | Step 27420 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:04:33] Epoch 3 | Step 27430 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:04:37] Epoch 3 | Step 27440 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:04:40] Epoch 3 | Step 27450 | Loss: 0.7028 | LR: 2.00e-06 [2026-04-18 16:04:44] Epoch 3 | Step 27460 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:04:47] Epoch 3 | Step 27470 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:04:51] Epoch 3 | Step 27480 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:04:54] Epoch 3 | Step 27490 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:04:58] Epoch 3 | Step 27500 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:05:02] Epoch 3 | Step 27510 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:05:05] Epoch 3 | Step 27520 | Loss: 0.7027 | LR: 2.00e-06 [2026-04-18 16:05:09] Epoch 3 | Step 27530 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:05:13] Epoch 3 | Step 27540 | Loss: 0.7025 | LR: 
2.00e-06 [2026-04-18 16:05:16] Epoch 3 | Step 27550 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:05:20] Epoch 3 | Step 27560 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:05:23] Epoch 3 | Step 27570 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:05:27] Epoch 3 | Step 27580 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:05:31] Epoch 3 | Step 27590 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:05:34] Epoch 3 | Step 27600 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 16:05:38] Epoch 3 | Step 27610 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:05:42] Epoch 3 | Step 27620 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:05:45] Epoch 3 | Step 27630 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:05:49] Epoch 3 | Step 27640 | Loss: 0.7026 | LR: 2.00e-06 [2026-04-18 16:05:52] Epoch 3 | Step 27650 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:05:56] Epoch 3 | Step 27660 | Loss: 0.7025 | LR: 2.00e-06 [2026-04-18 16:06:00] Epoch 3 | Step 27670 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 16:06:03] Epoch 3 | Step 27680 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 16:06:07] Epoch 3 | Step 27690 | Loss: 0.7024 | LR: 2.00e-06 [2026-04-18 16:06:10] Epoch 3 | Step 27700 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 16:06:14] Epoch 3 | Step 27710 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:06:17] Epoch 3 | Step 27720 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:06:21] Epoch 3 | Step 27730 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 16:06:25] Epoch 3 | Step 27740 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:06:28] Epoch 3 | Step 27750 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 16:06:32] Epoch 3 | Step 27760 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 16:06:35] Epoch 3 | Step 27770 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:06:39] Epoch 3 | Step 27780 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 16:06:42] Epoch 3 | Step 27790 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 16:06:46] Epoch 3 | Step 27800 | Loss: 0.7023 | LR: 2.00e-06 [2026-04-18 16:06:49] Epoch 3 | Step 27810 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 
16:06:53] Epoch 3 | Step 27820 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 16:06:56] Epoch 3 | Step 27830 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:07:00] Epoch 3 | Step 27840 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:07:04] Epoch 3 | Step 27850 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 16:07:07] Epoch 3 | Step 27860 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:07:11] Epoch 3 | Step 27870 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 16:07:14] Epoch 3 | Step 27880 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:07:18] Epoch 3 | Step 27890 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:07:21] Epoch 3 | Step 27900 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 16:07:25] Epoch 3 | Step 27910 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:07:29] Epoch 3 | Step 27920 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:07:32] Epoch 3 | Step 27930 | Loss: 0.7022 | LR: 2.00e-06 [2026-04-18 16:07:36] Epoch 3 | Step 27940 | Loss: 0.7021 | LR: 2.00e-06 [2026-04-18 16:07:39] Epoch 3 | Step 27950 | Loss: 0.7020 | LR: 2.00e-06 [2026-04-18 16:07:43] Epoch 3 | Step 27960 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 16:07:47] Epoch 3 | Step 27970 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 16:07:51] Epoch 3 | Step 27980 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 16:07:55] Epoch 3 | Step 27990 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 16:07:58] Epoch 3 | Step 28000 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 16:07:59] Validation | Batch 10/1567 | Loss: 0.9465 [2026-04-18 16:08:00] Validation | Batch 20/1567 | Loss: 1.0157 [2026-04-18 16:08:01] Validation | Batch 30/1567 | Loss: 1.0563 [2026-04-18 16:08:02] Validation | Batch 40/1567 | Loss: 1.0807 [2026-04-18 16:08:02] Validation | Batch 50/1567 | Loss: 1.0553 [2026-04-18 16:08:04] Validation | Batch 60/1567 | Loss: 1.0413 [2026-04-18 16:08:04] Validation | Batch 70/1567 | Loss: 1.0264 [2026-04-18 16:08:05] Validation | Batch 80/1567 | Loss: 1.0447 [2026-04-18 16:08:06] Validation | Batch 90/1567 | Loss: 1.0529 [2026-04-18 16:08:07] Validation | Batch 
100/1567 | Loss: 1.0632 [2026-04-18 16:08:08] Validation | Batch 110/1567 | Loss: 1.0550 [2026-04-18 16:08:09] Validation | Batch 120/1567 | Loss: 1.0657 [2026-04-18 16:08:10] Validation | Batch 130/1567 | Loss: 1.0673 [2026-04-18 16:08:10] Validation | Batch 140/1567 | Loss: 1.0694 [2026-04-18 16:08:11] Validation | Batch 150/1567 | Loss: 1.0771 [2026-04-18 16:08:12] Validation | Batch 160/1567 | Loss: 1.0783 [2026-04-18 16:08:13] Validation | Batch 170/1567 | Loss: 1.0626 [2026-04-18 16:08:13] Validation | Batch 180/1567 | Loss: 1.0652 [2026-04-18 16:08:14] Validation | Batch 190/1567 | Loss: 1.0614 [2026-04-18 16:08:15] Validation | Batch 200/1567 | Loss: 1.0641 [2026-04-18 16:08:16] Validation | Batch 210/1567 | Loss: 1.0649 [2026-04-18 16:08:17] Validation | Batch 220/1567 | Loss: 1.0670 [2026-04-18 16:08:18] Validation | Batch 230/1567 | Loss: 1.0709 [2026-04-18 16:08:19] Validation | Batch 240/1567 | Loss: 1.0693 [2026-04-18 16:08:19] Validation | Batch 250/1567 | Loss: 1.0631 [2026-04-18 16:08:20] Validation | Batch 260/1567 | Loss: 1.0583 [2026-04-18 16:08:21] Validation | Batch 270/1567 | Loss: 1.0554 [2026-04-18 16:08:22] Validation | Batch 280/1567 | Loss: 1.0565 [2026-04-18 16:08:23] Validation | Batch 290/1567 | Loss: 1.0620 [2026-04-18 16:08:24] Validation | Batch 300/1567 | Loss: 1.0669 [2026-04-18 16:08:24] Validation | Batch 310/1567 | Loss: 1.0658 [2026-04-18 16:08:25] Validation | Batch 320/1567 | Loss: 1.0666 [2026-04-18 16:08:26] Validation | Batch 330/1567 | Loss: 1.0636 [2026-04-18 16:08:27] Validation | Batch 340/1567 | Loss: 1.0677 [2026-04-18 16:08:28] Validation | Batch 350/1567 | Loss: 1.0665 [2026-04-18 16:08:29] Validation | Batch 360/1567 | Loss: 1.0642 [2026-04-18 16:08:29] Validation | Batch 370/1567 | Loss: 1.0614 [2026-04-18 16:08:30] Validation | Batch 380/1567 | Loss: 1.0647 [2026-04-18 16:08:31] Validation | Batch 390/1567 | Loss: 1.0658 [2026-04-18 16:08:32] Validation | Batch 400/1567 | Loss: 1.0671 [2026-04-18 16:08:33] 
Validation | Batch 410/1567 | Loss: 1.0664 [2026-04-18 16:08:33] Validation | Batch 420/1567 | Loss: 1.0659 [2026-04-18 16:08:34] Validation | Batch 430/1567 | Loss: 1.0658 [2026-04-18 16:08:35] Validation | Batch 440/1567 | Loss: 1.0646 [2026-04-18 16:08:36] Validation | Batch 450/1567 | Loss: 1.0647 [2026-04-18 16:08:37] Validation | Batch 460/1567 | Loss: 1.0637 [2026-04-18 16:08:38] Validation | Batch 470/1567 | Loss: 1.0631 [2026-04-18 16:08:38] Validation | Batch 480/1567 | Loss: 1.0609 [2026-04-18 16:08:39] Validation | Batch 490/1567 | Loss: 1.0606 [2026-04-18 16:08:40] Validation | Batch 500/1567 | Loss: 1.0602 [2026-04-18 16:08:41] Validation | Batch 510/1567 | Loss: 1.0626 [2026-04-18 16:08:41] Validation | Batch 520/1567 | Loss: 1.0644 [2026-04-18 16:08:42] Validation | Batch 530/1567 | Loss: 1.0640 [2026-04-18 16:08:43] Validation | Batch 540/1567 | Loss: 1.0668 [2026-04-18 16:08:44] Validation | Batch 550/1567 | Loss: 1.0703 [2026-04-18 16:08:45] Validation | Batch 560/1567 | Loss: 1.0701 [2026-04-18 16:08:46] Validation | Batch 570/1567 | Loss: 1.0700 [2026-04-18 16:08:47] Validation | Batch 580/1567 | Loss: 1.0692 [2026-04-18 16:08:47] Validation | Batch 590/1567 | Loss: 1.0678 [2026-04-18 16:08:48] Validation | Batch 600/1567 | Loss: 1.0660 [2026-04-18 16:08:49] Validation | Batch 610/1567 | Loss: 1.0650 [2026-04-18 16:08:50] Validation | Batch 620/1567 | Loss: 1.0665 [2026-04-18 16:08:51] Validation | Batch 630/1567 | Loss: 1.0645 [2026-04-18 16:08:52] Validation | Batch 640/1567 | Loss: 1.0662 [2026-04-18 16:08:53] Validation | Batch 650/1567 | Loss: 1.0652 [2026-04-18 16:08:53] Validation | Batch 660/1567 | Loss: 1.0641 [2026-04-18 16:08:54] Validation | Batch 670/1567 | Loss: 1.0621 [2026-04-18 16:08:55] Validation | Batch 680/1567 | Loss: 1.0615 [2026-04-18 16:08:55] Validation | Batch 690/1567 | Loss: 1.0623 [2026-04-18 16:08:56] Validation | Batch 700/1567 | Loss: 1.0609 [2026-04-18 16:08:57] Validation | Batch 710/1567 | Loss: 1.0623 
[2026-04-18 16:08:58] Validation | Batch 720/1567 | Loss: 1.0614 [2026-04-18 16:08:59] Validation | Batch 730/1567 | Loss: 1.0620 [2026-04-18 16:09:00] Validation | Batch 740/1567 | Loss: 1.0631 [2026-04-18 16:09:00] Validation | Batch 750/1567 | Loss: 1.0637 [2026-04-18 16:09:01] Validation | Batch 760/1567 | Loss: 1.0634 [2026-04-18 16:09:02] Validation | Batch 770/1567 | Loss: 1.0655 [2026-04-18 16:09:03] Validation | Batch 780/1567 | Loss: 1.0668 [2026-04-18 16:09:04] Validation | Batch 790/1567 | Loss: 1.0662 [2026-04-18 16:09:04] Validation | Batch 800/1567 | Loss: 1.0680 [2026-04-18 16:09:05] Validation | Batch 810/1567 | Loss: 1.0681 [2026-04-18 16:09:06] Validation | Batch 820/1567 | Loss: 1.0676 [2026-04-18 16:09:07] Validation | Batch 830/1567 | Loss: 1.0661 [2026-04-18 16:09:07] Validation | Batch 840/1567 | Loss: 1.0661 [2026-04-18 16:09:08] Validation | Batch 850/1567 | Loss: 1.0647 [2026-04-18 16:09:09] Validation | Batch 860/1567 | Loss: 1.0664 [2026-04-18 16:09:10] Validation | Batch 870/1567 | Loss: 1.0668 [2026-04-18 16:09:10] Validation | Batch 880/1567 | Loss: 1.0678 [2026-04-18 16:09:11] Validation | Batch 890/1567 | Loss: 1.0683 [2026-04-18 16:09:12] Validation | Batch 900/1567 | Loss: 1.0703 [2026-04-18 16:09:13] Validation | Batch 910/1567 | Loss: 1.0704 [2026-04-18 16:09:13] Validation | Batch 920/1567 | Loss: 1.0727 [2026-04-18 16:09:14] Validation | Batch 930/1567 | Loss: 1.0703 [2026-04-18 16:09:15] Validation | Batch 940/1567 | Loss: 1.0699 [2026-04-18 16:09:16] Validation | Batch 950/1567 | Loss: 1.0688 [2026-04-18 16:09:16] Validation | Batch 960/1567 | Loss: 1.0674 [2026-04-18 16:09:17] Validation | Batch 970/1567 | Loss: 1.0692 [2026-04-18 16:09:18] Validation | Batch 980/1567 | Loss: 1.0695 [2026-04-18 16:09:18] Validation | Batch 990/1567 | Loss: 1.0690 [2026-04-18 16:09:19] Validation | Batch 1000/1567 | Loss: 1.0694 [2026-04-18 16:09:20] Validation | Batch 1010/1567 | Loss: 1.0671 [2026-04-18 16:09:21] Validation | Batch 
1020/1567 | Loss: 1.0674 [2026-04-18 16:09:22] Validation | Batch 1030/1567 | Loss: 1.0690 [2026-04-18 16:09:23] Validation | Batch 1040/1567 | Loss: 1.0685 [2026-04-18 16:09:23] Validation | Batch 1050/1567 | Loss: 1.0695 [2026-04-18 16:09:24] Validation | Batch 1060/1567 | Loss: 1.0685 [2026-04-18 16:09:25] Validation | Batch 1070/1567 | Loss: 1.0677 [2026-04-18 16:09:26] Validation | Batch 1080/1567 | Loss: 1.0687 [2026-04-18 16:09:26] Validation | Batch 1090/1567 | Loss: 1.0684 [2026-04-18 16:09:27] Validation | Batch 1100/1567 | Loss: 1.0690 [2026-04-18 16:09:28] Validation | Batch 1110/1567 | Loss: 1.0689 [2026-04-18 16:09:28] Validation | Batch 1120/1567 | Loss: 1.0691 [2026-04-18 16:09:29] Validation | Batch 1130/1567 | Loss: 1.0691 [2026-04-18 16:09:30] Validation | Batch 1140/1567 | Loss: 1.0700 [2026-04-18 16:09:31] Validation | Batch 1150/1567 | Loss: 1.0704 [2026-04-18 16:09:32] Validation | Batch 1160/1567 | Loss: 1.0712 [2026-04-18 16:09:33] Validation | Batch 1170/1567 | Loss: 1.0709 [2026-04-18 16:09:34] Validation | Batch 1180/1567 | Loss: 1.0705 [2026-04-18 16:09:34] Validation | Batch 1190/1567 | Loss: 1.0716 [2026-04-18 16:09:35] Validation | Batch 1200/1567 | Loss: 1.0710 [2026-04-18 16:09:36] Validation | Batch 1210/1567 | Loss: 1.0699 [2026-04-18 16:09:37] Validation | Batch 1220/1567 | Loss: 1.0703 [2026-04-18 16:09:37] Validation | Batch 1230/1567 | Loss: 1.0724 [2026-04-18 16:09:38] Validation | Batch 1240/1567 | Loss: 1.0711 [2026-04-18 16:09:39] Validation | Batch 1250/1567 | Loss: 1.0711 [2026-04-18 16:09:40] Validation | Batch 1260/1567 | Loss: 1.0721 [2026-04-18 16:09:41] Validation | Batch 1270/1567 | Loss: 1.0721 [2026-04-18 16:09:42] Validation | Batch 1280/1567 | Loss: 1.0715 [2026-04-18 16:09:43] Validation | Batch 1290/1567 | Loss: 1.0717 [2026-04-18 16:09:44] Validation | Batch 1300/1567 | Loss: 1.0720 [2026-04-18 16:09:44] Validation | Batch 1310/1567 | Loss: 1.0724 [2026-04-18 16:09:45] Validation | Batch 1320/1567 | Loss: 
1.0715 [2026-04-18 16:09:46] Validation | Batch 1330/1567 | Loss: 1.0711 [2026-04-18 16:09:47] Validation | Batch 1340/1567 | Loss: 1.0709 [2026-04-18 16:09:47] Validation | Batch 1350/1567 | Loss: 1.0718 [2026-04-18 16:09:48] Validation | Batch 1360/1567 | Loss: 1.0714 [2026-04-18 16:09:49] Validation | Batch 1370/1567 | Loss: 1.0717 [2026-04-18 16:09:50] Validation | Batch 1380/1567 | Loss: 1.0731 [2026-04-18 16:09:51] Validation | Batch 1390/1567 | Loss: 1.0732 [2026-04-18 16:09:51] Validation | Batch 1400/1567 | Loss: 1.0736 [2026-04-18 16:09:52] Validation | Batch 1410/1567 | Loss: 1.0734 [2026-04-18 16:09:52] Validation | Batch 1420/1567 | Loss: 1.0740 [2026-04-18 16:09:53] Validation | Batch 1430/1567 | Loss: 1.0737 [2026-04-18 16:09:54] Validation | Batch 1440/1567 | Loss: 1.0740 [2026-04-18 16:09:55] Validation | Batch 1450/1567 | Loss: 1.0733 [2026-04-18 16:09:55] Validation | Batch 1460/1567 | Loss: 1.0731 [2026-04-18 16:09:56] Validation | Batch 1470/1567 | Loss: 1.0721 [2026-04-18 16:09:57] Validation | Batch 1480/1567 | Loss: 1.0705 [2026-04-18 16:09:58] Validation | Batch 1490/1567 | Loss: 1.0705 [2026-04-18 16:09:59] Validation | Batch 1500/1567 | Loss: 1.0706 [2026-04-18 16:09:59] Validation | Batch 1510/1567 | Loss: 1.0704 [2026-04-18 16:10:00] Validation | Batch 1520/1567 | Loss: 1.0697 [2026-04-18 16:10:01] Validation | Batch 1530/1567 | Loss: 1.0705 [2026-04-18 16:10:02] Validation | Batch 1540/1567 | Loss: 1.0715 [2026-04-18 16:10:02] Validation | Batch 1550/1567 | Loss: 1.0718 [2026-04-18 16:10:03] Validation | Batch 1560/1567 | Loss: 1.0709 [2026-04-18 16:10:04] Validation | Batch 1567/1567 | Loss: 1.0713 [2026-04-18 16:10:04] Validation | Loss: 1.0713 | PPL: 2.94 | Time: 125.79s [2026-04-18 16:10:07] Epoch 3 | Step 28010 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 16:10:11] Epoch 3 | Step 28020 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 16:10:15] Epoch 3 | Step 28030 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 16:10:18] Epoch 3 | Step 28040 | 
Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 16:10:22] Epoch 3 | Step 28050 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 16:10:26] Epoch 3 | Step 28060 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 16:10:29] Epoch 3 | Step 28070 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 16:10:33] Epoch 3 | Step 28080 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 16:10:36] Epoch 3 | Step 28090 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 16:10:40] Epoch 3 | Step 28100 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 16:10:44] Epoch 3 | Step 28110 | Loss: 0.7019 | LR: 2.00e-06 [2026-04-18 16:10:47] Epoch 3 | Step 28120 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 16:10:51] Epoch 3 | Step 28130 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 16:10:53] Epoch 3 | Step 28140 | Loss: 0.7018 | LR: 2.00e-06 [2026-04-18 16:10:57] Epoch 3 | Step 28150 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 16:11:00] Epoch 3 | Step 28160 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 16:11:04] Epoch 3 | Step 28170 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 16:11:08] Epoch 3 | Step 28180 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 16:11:11] Epoch 3 | Step 28190 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:11:15] Epoch 3 | Step 28200 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 16:11:19] Epoch 3 | Step 28210 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 16:11:22] Epoch 3 | Step 28220 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:11:27] Epoch 3 | Step 28230 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:11:31] Epoch 3 | Step 28240 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:11:35] Epoch 3 | Step 28250 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 16:11:38] Epoch 3 | Step 28260 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:11:42] Epoch 3 | Step 28270 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:11:45] Epoch 3 | Step 28280 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:11:48] Epoch 3 | Step 28290 | Loss: 0.7013 | LR: 2.00e-06 [2026-04-18 16:11:52] Epoch 3 | Step 28300 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:11:56] Epoch 3 | Step 28310 | Loss: 0.7016 | LR: 2.00e-06 
[2026-04-18 16:11:59] Epoch 3 | Step 28320 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 16:12:03] Epoch 3 | Step 28330 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 16:12:07] Epoch 3 | Step 28340 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 16:12:10] Epoch 3 | Step 28350 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:12:14] Epoch 3 | Step 28360 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:12:17] Epoch 3 | Step 28370 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 16:12:20] Epoch 3 | Step 28380 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 16:12:24] Epoch 3 | Step 28390 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 16:12:28] Epoch 3 | Step 28400 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 16:12:32] Epoch 3 | Step 28410 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 16:12:36] Epoch 3 | Step 28420 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 16:12:41] Epoch 3 | Step 28430 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 16:12:45] Epoch 3 | Step 28440 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:12:48] Epoch 3 | Step 28450 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:12:51] Epoch 3 | Step 28460 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 16:12:55] Epoch 3 | Step 28470 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:12:58] Epoch 3 | Step 28480 | Loss: 0.7013 | LR: 2.00e-06 [2026-04-18 16:13:02] Epoch 3 | Step 28490 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:13:04] Epoch 3 | Step 28500 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:13:08] Epoch 3 | Step 28510 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 16:13:11] Epoch 3 | Step 28520 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:13:14] Epoch 3 | Step 28530 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:13:18] Epoch 3 | Step 28540 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:13:21] Epoch 3 | Step 28550 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 16:13:25] Epoch 3 | Step 28560 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 16:13:29] Epoch 3 | Step 28570 | Loss: 0.7017 | LR: 2.00e-06 [2026-04-18 16:13:32] Epoch 3 | Step 28580 | Loss: 0.7016 | LR: 2.00e-06 [2026-04-18 16:13:37] Epoch 
3 | Step 28590 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:13:40] Epoch 3 | Step 28600 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:13:44] Epoch 3 | Step 28610 | Loss: 0.7015 | LR: 2.00e-06 [2026-04-18 16:13:47] Epoch 3 | Step 28620 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 16:13:51] Epoch 3 | Step 28630 | Loss: 0.7013 | LR: 2.00e-06 [2026-04-18 16:13:54] Epoch 3 | Step 28640 | Loss: 0.7012 | LR: 2.00e-06 [2026-04-18 16:13:58] Epoch 3 | Step 28650 | Loss: 0.7013 | LR: 2.00e-06 [2026-04-18 16:14:01] Epoch 3 | Step 28660 | Loss: 0.7014 | LR: 2.00e-06 [2026-04-18 16:14:05] Epoch 3 | Step 28670 | Loss: 0.7012 | LR: 2.00e-06 [2026-04-18 16:14:07] Epoch 3 | Step 28680 | Loss: 0.7011 | LR: 2.00e-06 [2026-04-18 16:14:11] Epoch 3 | Step 28690 | Loss: 0.7011 | LR: 2.00e-06 [2026-04-18 16:14:15] Epoch 3 | Step 28700 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:14:18] Epoch 3 | Step 28710 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:14:22] Epoch 3 | Step 28720 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:14:26] Epoch 3 | Step 28730 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:14:29] Epoch 3 | Step 28740 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:14:33] Epoch 3 | Step 28750 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:14:37] Epoch 3 | Step 28760 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:14:41] Epoch 3 | Step 28770 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:14:45] Epoch 3 | Step 28780 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:14:48] Epoch 3 | Step 28790 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:14:52] Epoch 3 | Step 28800 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:14:56] Epoch 3 | Step 28810 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:15:00] Epoch 3 | Step 28820 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:15:03] Epoch 3 | Step 28830 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:15:07] Epoch 3 | Step 28840 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:15:10] Epoch 3 | Step 28850 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:15:13] Epoch 3 | Step 28860 | Loss: 
0.7009 | LR: 2.00e-06 [2026-04-18 16:15:17] Epoch 3 | Step 28870 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:15:20] Epoch 3 | Step 28880 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:15:23] Epoch 3 | Step 28890 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:15:27] Epoch 3 | Step 28900 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:15:30] Epoch 3 | Step 28910 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:15:34] Epoch 3 | Step 28920 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:15:37] Epoch 3 | Step 28930 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:15:41] Epoch 3 | Step 28940 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:15:45] Epoch 3 | Step 28950 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:15:49] Epoch 3 | Step 28960 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:15:53] Epoch 3 | Step 28970 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:15:56] Epoch 3 | Step 28980 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:16:00] Epoch 3 | Step 28990 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:16:04] Epoch 3 | Step 29000 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:16:04] Validation | Batch 10/1567 | Loss: 0.9440 [2026-04-18 16:16:05] Validation | Batch 20/1567 | Loss: 1.0150 [2026-04-18 16:16:06] Validation | Batch 30/1567 | Loss: 1.0553 [2026-04-18 16:16:07] Validation | Batch 40/1567 | Loss: 1.0786 [2026-04-18 16:16:08] Validation | Batch 50/1567 | Loss: 1.0534 [2026-04-18 16:16:09] Validation | Batch 60/1567 | Loss: 1.0397 [2026-04-18 16:16:10] Validation | Batch 70/1567 | Loss: 1.0250 [2026-04-18 16:16:11] Validation | Batch 80/1567 | Loss: 1.0436 [2026-04-18 16:16:11] Validation | Batch 90/1567 | Loss: 1.0520 [2026-04-18 16:16:12] Validation | Batch 100/1567 | Loss: 1.0625 [2026-04-18 16:16:13] Validation | Batch 110/1567 | Loss: 1.0542 [2026-04-18 16:16:14] Validation | Batch 120/1567 | Loss: 1.0654 [2026-04-18 16:16:15] Validation | Batch 130/1567 | Loss: 1.0666 [2026-04-18 16:16:16] Validation | Batch 140/1567 | Loss: 1.0686 [2026-04-18 16:16:16] Validation | Batch 150/1567 | 
Loss: 1.0766 [2026-04-18 16:16:16] Validation | Batch 160/1567 | Loss: 1.0776 [2026-04-18 16:16:17] Validation | Batch 170/1567 | Loss: 1.0620 [2026-04-18 16:16:18] Validation | Batch 180/1567 | Loss: 1.0645 [2026-04-18 16:16:19] Validation | Batch 190/1567 | Loss: 1.0606 [2026-04-18 16:16:20] Validation | Batch 200/1567 | Loss: 1.0634 [2026-04-18 16:16:21] Validation | Batch 210/1567 | Loss: 1.0641 [2026-04-18 16:16:21] Validation | Batch 220/1567 | Loss: 1.0663 [2026-04-18 16:16:22] Validation | Batch 230/1567 | Loss: 1.0702 [2026-04-18 16:16:23] Validation | Batch 240/1567 | Loss: 1.0685 [2026-04-18 16:16:24] Validation | Batch 250/1567 | Loss: 1.0624 [2026-04-18 16:16:25] Validation | Batch 260/1567 | Loss: 1.0575 [2026-04-18 16:16:25] Validation | Batch 270/1567 | Loss: 1.0545 [2026-04-18 16:16:26] Validation | Batch 280/1567 | Loss: 1.0556 [2026-04-18 16:16:27] Validation | Batch 290/1567 | Loss: 1.0611 [2026-04-18 16:16:28] Validation | Batch 300/1567 | Loss: 1.0662 [2026-04-18 16:16:29] Validation | Batch 310/1567 | Loss: 1.0650 [2026-04-18 16:16:29] Validation | Batch 320/1567 | Loss: 1.0659 [2026-04-18 16:16:31] Validation | Batch 330/1567 | Loss: 1.0629 [2026-04-18 16:16:31] Validation | Batch 340/1567 | Loss: 1.0671 [2026-04-18 16:16:32] Validation | Batch 350/1567 | Loss: 1.0659 [2026-04-18 16:16:33] Validation | Batch 360/1567 | Loss: 1.0636 [2026-04-18 16:16:34] Validation | Batch 370/1567 | Loss: 1.0607 [2026-04-18 16:16:34] Validation | Batch 380/1567 | Loss: 1.0640 [2026-04-18 16:16:35] Validation | Batch 390/1567 | Loss: 1.0651 [2026-04-18 16:16:36] Validation | Batch 400/1567 | Loss: 1.0663 [2026-04-18 16:16:37] Validation | Batch 410/1567 | Loss: 1.0656 [2026-04-18 16:16:38] Validation | Batch 420/1567 | Loss: 1.0652 [2026-04-18 16:16:39] Validation | Batch 430/1567 | Loss: 1.0650 [2026-04-18 16:16:39] Validation | Batch 440/1567 | Loss: 1.0638 [2026-04-18 16:16:40] Validation | Batch 450/1567 | Loss: 1.0639 [2026-04-18 16:16:41] Validation | 
Batch 460/1567 | Loss: 1.0629 [2026-04-18 16:16:42] Validation | Batch 470/1567 | Loss: 1.0624 [2026-04-18 16:16:43] Validation | Batch 480/1567 | Loss: 1.0602 [2026-04-18 16:16:43] Validation | Batch 490/1567 | Loss: 1.0599 [2026-04-18 16:16:44] Validation | Batch 500/1567 | Loss: 1.0594 [2026-04-18 16:16:45] Validation | Batch 510/1567 | Loss: 1.0617 [2026-04-18 16:16:46] Validation | Batch 520/1567 | Loss: 1.0635 [2026-04-18 16:16:47] Validation | Batch 530/1567 | Loss: 1.0632 [2026-04-18 16:16:48] Validation | Batch 540/1567 | Loss: 1.0659 [2026-04-18 16:16:48] Validation | Batch 550/1567 | Loss: 1.0695 [2026-04-18 16:16:49] Validation | Batch 560/1567 | Loss: 1.0693 [2026-04-18 16:16:50] Validation | Batch 570/1567 | Loss: 1.0692 [2026-04-18 16:16:51] Validation | Batch 580/1567 | Loss: 1.0684 [2026-04-18 16:16:52] Validation | Batch 590/1567 | Loss: 1.0670 [2026-04-18 16:16:53] Validation | Batch 600/1567 | Loss: 1.0652 [2026-04-18 16:16:54] Validation | Batch 610/1567 | Loss: 1.0641 [2026-04-18 16:16:55] Validation | Batch 620/1567 | Loss: 1.0656 [2026-04-18 16:16:56] Validation | Batch 630/1567 | Loss: 1.0636 [2026-04-18 16:16:56] Validation | Batch 640/1567 | Loss: 1.0653 [2026-04-18 16:16:57] Validation | Batch 650/1567 | Loss: 1.0644 [2026-04-18 16:16:58] Validation | Batch 660/1567 | Loss: 1.0632 [2026-04-18 16:16:59] Validation | Batch 670/1567 | Loss: 1.0613 [2026-04-18 16:16:59] Validation | Batch 680/1567 | Loss: 1.0606 [2026-04-18 16:17:00] Validation | Batch 690/1567 | Loss: 1.0615 [2026-04-18 16:17:01] Validation | Batch 700/1567 | Loss: 1.0601 [2026-04-18 16:17:02] Validation | Batch 710/1567 | Loss: 1.0615 [2026-04-18 16:17:03] Validation | Batch 720/1567 | Loss: 1.0607 [2026-04-18 16:17:03] Validation | Batch 730/1567 | Loss: 1.0613 [2026-04-18 16:17:04] Validation | Batch 740/1567 | Loss: 1.0624 [2026-04-18 16:17:05] Validation | Batch 750/1567 | Loss: 1.0630 [2026-04-18 16:17:06] Validation | Batch 760/1567 | Loss: 1.0628 [2026-04-18 
16:17:07] Validation | Batch 770/1567 | Loss: 1.0649 [2026-04-18 16:17:08] Validation | Batch 780/1567 | Loss: 1.0661 [2026-04-18 16:17:08] Validation | Batch 790/1567 | Loss: 1.0656 [2026-04-18 16:17:09] Validation | Batch 800/1567 | Loss: 1.0674 [2026-04-18 16:17:10] Validation | Batch 810/1567 | Loss: 1.0674 [2026-04-18 16:17:11] Validation | Batch 820/1567 | Loss: 1.0670 [2026-04-18 16:17:11] Validation | Batch 830/1567 | Loss: 1.0654 [2026-04-18 16:17:12] Validation | Batch 840/1567 | Loss: 1.0655 [2026-04-18 16:17:13] Validation | Batch 850/1567 | Loss: 1.0641 [2026-04-18 16:17:13] Validation | Batch 860/1567 | Loss: 1.0658 [2026-04-18 16:17:14] Validation | Batch 870/1567 | Loss: 1.0662 [2026-04-18 16:17:15] Validation | Batch 880/1567 | Loss: 1.0671 [2026-04-18 16:17:16] Validation | Batch 890/1567 | Loss: 1.0677 [2026-04-18 16:17:17] Validation | Batch 900/1567 | Loss: 1.0697 [2026-04-18 16:17:17] Validation | Batch 910/1567 | Loss: 1.0698 [2026-04-18 16:17:18] Validation | Batch 920/1567 | Loss: 1.0720 [2026-04-18 16:17:19] Validation | Batch 930/1567 | Loss: 1.0696 [2026-04-18 16:17:19] Validation | Batch 940/1567 | Loss: 1.0692 [2026-04-18 16:17:20] Validation | Batch 950/1567 | Loss: 1.0682 [2026-04-18 16:17:21] Validation | Batch 960/1567 | Loss: 1.0668 [2026-04-18 16:17:22] Validation | Batch 970/1567 | Loss: 1.0686 [2026-04-18 16:17:23] Validation | Batch 980/1567 | Loss: 1.0689 [2026-04-18 16:17:23] Validation | Batch 990/1567 | Loss: 1.0684 [2026-04-18 16:17:24] Validation | Batch 1000/1567 | Loss: 1.0688 [2026-04-18 16:17:25] Validation | Batch 1010/1567 | Loss: 1.0665 [2026-04-18 16:17:25] Validation | Batch 1020/1567 | Loss: 1.0668 [2026-04-18 16:17:26] Validation | Batch 1030/1567 | Loss: 1.0684 [2026-04-18 16:17:27] Validation | Batch 1040/1567 | Loss: 1.0679 [2026-04-18 16:17:28] Validation | Batch 1050/1567 | Loss: 1.0690 [2026-04-18 16:17:29] Validation | Batch 1060/1567 | Loss: 1.0680 [2026-04-18 16:17:30] Validation | Batch 1070/1567 | 
Loss: 1.0672 [2026-04-18 16:17:30] Validation | Batch 1080/1567 | Loss: 1.0682 [2026-04-18 16:17:31] Validation | Batch 1090/1567 | Loss: 1.0679 [2026-04-18 16:17:32] Validation | Batch 1100/1567 | Loss: 1.0685 [2026-04-18 16:17:32] Validation | Batch 1110/1567 | Loss: 1.0684 [2026-04-18 16:17:33] Validation | Batch 1120/1567 | Loss: 1.0686 [2026-04-18 16:17:34] Validation | Batch 1130/1567 | Loss: 1.0686 [2026-04-18 16:17:35] Validation | Batch 1140/1567 | Loss: 1.0695 [2026-04-18 16:17:36] Validation | Batch 1150/1567 | Loss: 1.0699 [2026-04-18 16:17:36] Validation | Batch 1160/1567 | Loss: 1.0707 [2026-04-18 16:17:37] Validation | Batch 1170/1567 | Loss: 1.0704 [2026-04-18 16:17:38] Validation | Batch 1180/1567 | Loss: 1.0700 [2026-04-18 16:17:39] Validation | Batch 1190/1567 | Loss: 1.0712 [2026-04-18 16:17:40] Validation | Batch 1200/1567 | Loss: 1.0706 [2026-04-18 16:17:41] Validation | Batch 1210/1567 | Loss: 1.0694 [2026-04-18 16:17:41] Validation | Batch 1220/1567 | Loss: 1.0697 [2026-04-18 16:17:42] Validation | Batch 1230/1567 | Loss: 1.0719 [2026-04-18 16:17:43] Validation | Batch 1240/1567 | Loss: 1.0706 [2026-04-18 16:17:44] Validation | Batch 1250/1567 | Loss: 1.0706 [2026-04-18 16:17:44] Validation | Batch 1260/1567 | Loss: 1.0716 [2026-04-18 16:17:46] Validation | Batch 1270/1567 | Loss: 1.0716 [2026-04-18 16:17:46] Validation | Batch 1280/1567 | Loss: 1.0710 [2026-04-18 16:17:47] Validation | Batch 1290/1567 | Loss: 1.0713 [2026-04-18 16:17:48] Validation | Batch 1300/1567 | Loss: 1.0715 [2026-04-18 16:17:49] Validation | Batch 1310/1567 | Loss: 1.0719 [2026-04-18 16:17:50] Validation | Batch 1320/1567 | Loss: 1.0710 [2026-04-18 16:17:50] Validation | Batch 1330/1567 | Loss: 1.0706 [2026-04-18 16:17:51] Validation | Batch 1340/1567 | Loss: 1.0704 [2026-04-18 16:17:52] Validation | Batch 1350/1567 | Loss: 1.0713 [2026-04-18 16:17:53] Validation | Batch 1360/1567 | Loss: 1.0709 [2026-04-18 16:17:53] Validation | Batch 1370/1567 | Loss: 1.0712 
[2026-04-18 16:17:54] Validation | Batch 1380/1567 | Loss: 1.0726 [2026-04-18 16:17:55] Validation | Batch 1390/1567 | Loss: 1.0727 [2026-04-18 16:17:56] Validation | Batch 1400/1567 | Loss: 1.0731 [2026-04-18 16:17:56] Validation | Batch 1410/1567 | Loss: 1.0729 [2026-04-18 16:17:57] Validation | Batch 1420/1567 | Loss: 1.0735 [2026-04-18 16:17:58] Validation | Batch 1430/1567 | Loss: 1.0732 [2026-04-18 16:17:59] Validation | Batch 1440/1567 | Loss: 1.0735 [2026-04-18 16:17:59] Validation | Batch 1450/1567 | Loss: 1.0728 [2026-04-18 16:18:00] Validation | Batch 1460/1567 | Loss: 1.0726 [2026-04-18 16:18:01] Validation | Batch 1470/1567 | Loss: 1.0716 [2026-04-18 16:18:02] Validation | Batch 1480/1567 | Loss: 1.0700 [2026-04-18 16:18:02] Validation | Batch 1490/1567 | Loss: 1.0700 [2026-04-18 16:18:03] Validation | Batch 1500/1567 | Loss: 1.0701 [2026-04-18 16:18:04] Validation | Batch 1510/1567 | Loss: 1.0699 [2026-04-18 16:18:05] Validation | Batch 1520/1567 | Loss: 1.0692 [2026-04-18 16:18:05] Validation | Batch 1530/1567 | Loss: 1.0700 [2026-04-18 16:18:06] Validation | Batch 1540/1567 | Loss: 1.0710 [2026-04-18 16:18:07] Validation | Batch 1550/1567 | Loss: 1.0713 [2026-04-18 16:18:08] Validation | Batch 1560/1567 | Loss: 1.0704 [2026-04-18 16:18:09] Validation | Batch 1567/1567 | Loss: 1.0708 [2026-04-18 16:18:09] Validation | Loss: 1.0708 | PPL: 2.94 | Time: 125.04s [2026-04-18 16:18:12] Epoch 3 | Step 29010 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:18:16] Epoch 3 | Step 29020 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:18:20] Epoch 3 | Step 29030 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:18:23] Epoch 3 | Step 29040 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:18:27] Epoch 3 | Step 29050 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:18:31] Epoch 3 | Step 29060 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:18:35] Epoch 3 | Step 29070 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:18:38] Epoch 3 | Step 29080 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 
16:18:42] Epoch 3 | Step 29090 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:18:46] Epoch 3 | Step 29100 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:18:49] Epoch 3 | Step 29110 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:18:53] Epoch 3 | Step 29120 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:18:56] Epoch 3 | Step 29130 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:18:59] Epoch 3 | Step 29140 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:19:02] Epoch 3 | Step 29150 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:19:06] Epoch 3 | Step 29160 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:19:10] Epoch 3 | Step 29170 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:19:13] Epoch 3 | Step 29180 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:19:16] Epoch 3 | Step 29190 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:19:20] Epoch 3 | Step 29200 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:19:24] Epoch 3 | Step 29210 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:19:28] Epoch 3 | Step 29220 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:19:31] Epoch 3 | Step 29230 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:19:35] Epoch 3 | Step 29240 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:19:39] Epoch 3 | Step 29250 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:19:42] Epoch 3 | Step 29260 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:19:46] Epoch 3 | Step 29270 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:19:50] Epoch 3 | Step 29280 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:19:54] Epoch 3 | Step 29290 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:19:57] Epoch 3 | Step 29300 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:20:00] Epoch 3 | Step 29310 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:20:04] Epoch 3 | Step 29320 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:20:08] Epoch 3 | Step 29330 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:20:11] Epoch 3 | Step 29340 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:20:15] Epoch 3 | Step 29350 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:20:19] Epoch 3 | Step 
29360 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:20:22] Epoch 3 | Step 29370 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:20:26] Epoch 3 | Step 29380 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:20:29] Epoch 3 | Step 29390 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:20:33] Epoch 3 | Step 29400 | Loss: 0.7006 | LR: 2.00e-06 [2026-04-18 16:20:36] Epoch 3 | Step 29410 | Loss: 0.7005 | LR: 2.00e-06 [2026-04-18 16:20:40] Epoch 3 | Step 29420 | Loss: 0.7005 | LR: 2.00e-06 [2026-04-18 16:20:44] Epoch 3 | Step 29430 | Loss: 0.7006 | LR: 2.00e-06 [2026-04-18 16:20:48] Epoch 3 | Step 29440 | Loss: 0.7005 | LR: 2.00e-06 [2026-04-18 16:20:51] Epoch 3 | Step 29450 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:20:55] Epoch 3 | Step 29460 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:20:58] Epoch 3 | Step 29470 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:21:02] Epoch 3 | Step 29480 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:21:06] Epoch 3 | Step 29490 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:21:09] Epoch 3 | Step 29500 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:21:13] Epoch 3 | Step 29510 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:21:16] Epoch 3 | Step 29520 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:21:20] Epoch 3 | Step 29530 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:21:23] Epoch 3 | Step 29540 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:21:26] Epoch 3 | Step 29550 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:21:30] Epoch 3 | Step 29560 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:21:34] Epoch 3 | Step 29570 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:21:37] Epoch 3 | Step 29580 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:21:41] Epoch 3 | Step 29590 | Loss: 0.7008 | LR: 2.00e-06 [2026-04-18 16:21:44] Epoch 3 | Step 29600 | Loss: 0.7007 | LR: 2.00e-06 [2026-04-18 16:21:48] Epoch 3 | Step 29610 | Loss: 0.7009 | LR: 2.00e-06 [2026-04-18 16:21:52] Epoch 3 | Step 29620 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:21:56] Epoch 3 | Step 29630 | Loss: 0.7010 | LR: 
2.00e-06 [2026-04-18 16:22:00] Epoch 3 | Step 29640 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:22:03] Epoch 3 | Step 29650 | Loss: 0.7010 | LR: 2.00e-06 [2026-04-18 16:22:07] Epoch 3 | Step 29660 | Loss: 0.7011 | LR: 2.00e-06 [2026-04-18 16:22:07] Epoch 3 completed in 4867.09s | Loss: 0.7011 [2026-04-18 16:22:17] Checkpoint saved: outputs/2026-04-18/12-19-14/checkpoints/checkpoint_step_29661.pt [2026-04-18 16:22:31] Training completed! [2026-04-18 16:22:34] Final model: outputs/2026-04-18/12-19-14/model_final.pt