ImageCaptioningProject / text_files / training (2).txt
Varsha Dewangan, commit ee1d4aa ("Initial clean commit for project deployment")
2025-06-05 09:14:36,398 - __main__ - INFO -
--- Starting Model Training ---
2025-06-05 09:14:36,399 - __main__ - INFO - Starting training process...
2025-06-05 09:14:36,400 - __main__ - INFO - Using device: cuda
2025-06-05 09:14:36,405 - __main__ - INFO - Initializing training dataset for vocabulary building...
2025-06-05 09:14:37,522 - __main__ - INFO - Successfully loaded captions from /kaggle/input/coco-2017-dataset/coco2017/annotations/captions_train2017.json
2025-06-05 09:17:31,012 - __main__ - INFO -
--- Starting Model Training ---
2025-06-05 09:17:31,013 - __main__ - INFO - Starting training process...
2025-06-05 09:17:31,014 - __main__ - INFO - Using device: cuda
2025-06-05 09:17:31,088 - __main__ - INFO - Loaded vocabulary from /kaggle/input/vocabulary_s/pytorch/default/1/vocabulary.pkl
2025-06-05 09:17:32,171 - __main__ - INFO - Successfully loaded captions from /kaggle/input/coco-2017-dataset/coco2017/annotations/captions_train2017.json
2025-06-05 09:23:17,287 - __main__ - INFO - Using subset of 200000 samples for the dataset.
2025-06-05 09:23:17,288 - __main__ - INFO - Dataset size after filtering: 200000 samples.
2025-06-05 09:23:17,386 - __main__ - INFO - Successfully loaded captions from /kaggle/input/coco-2017-dataset/coco2017/annotations/captions_val2017.json
2025-06-05 09:23:33,589 - __main__ - INFO - Dataset size after filtering: 25014 samples.
2025-06-05 09:23:33,590 - __main__ - INFO - Training dataset size: 200000
2025-06-05 09:23:33,591 - __main__ - INFO - Validation dataset size: 25014
2025-06-05 09:23:34,957 - __main__ - INFO - Fine-tuning enabled for ResNet encoder base layers.
2025-06-05 09:23:35,415 - __main__ - INFO - No checkpoint found. Starting training from scratch.
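The run above found no checkpoint and started from epoch 1. A minimal sketch of that resume-or-scratch branch, assuming a checkpoint dict with an "epoch" key (the key name and path handling are illustrative, not taken from the project's code):

```python
import os

def resume_or_scratch(checkpoint_path):
    """Return the epoch to start training from.

    Mirrors the log's "No checkpoint found. Starting training from
    scratch." branch. The 'epoch' key is an assumed checkpoint field.
    """
    if not os.path.exists(checkpoint_path):
        print("No checkpoint found. Starting training from scratch.")
        return 0
    import torch  # only needed when a checkpoint actually exists
    state = torch.load(checkpoint_path, map_location="cpu")
    # Resume on the epoch after the last completed one.
    return state.get("epoch", 0) + 1
```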
2025-06-05 09:25:12,222 - __main__ - INFO - Epoch [1/9], Step [100/3125], Loss: 5.4706, Perplexity: 237.6090
2025-06-05 09:26:45,054 - __main__ - INFO - Epoch [1/9], Step [200/3125], Loss: 5.1889, Perplexity: 179.2651
2025-06-05 09:28:17,121 - __main__ - INFO - Epoch [1/9], Step [300/3125], Loss: 4.7951, Perplexity: 120.9148
2025-06-05 09:29:49,844 - __main__ - INFO - Epoch [1/9], Step [400/3125], Loss: 4.1438, Perplexity: 63.0419
2025-06-05 09:31:22,479 - __main__ - INFO - Epoch [1/9], Step [500/3125], Loss: 4.0644, Perplexity: 58.2316
2025-06-05 09:32:55,263 - __main__ - INFO - Epoch [1/9], Step [600/3125], Loss: 4.1386, Perplexity: 62.7179
2025-06-05 09:34:28,639 - __main__ - INFO - Epoch [1/9], Step [700/3125], Loss: 4.4148, Perplexity: 82.6655
2025-06-05 09:36:01,562 - __main__ - INFO - Epoch [1/9], Step [800/3125], Loss: 4.0775, Perplexity: 58.9954
2025-06-05 09:37:34,470 - __main__ - INFO - Epoch [1/9], Step [900/3125], Loss: 3.8928, Perplexity: 49.0485
2025-06-05 09:39:07,144 - __main__ - INFO - Epoch [1/9], Step [1000/3125], Loss: 3.7069, Perplexity: 40.7267
2025-06-05 09:40:39,832 - __main__ - INFO - Epoch [1/9], Step [1100/3125], Loss: 3.7165, Perplexity: 41.1201
2025-06-05 09:42:12,245 - __main__ - INFO - Epoch [1/9], Step [1200/3125], Loss: 3.6987, Perplexity: 40.3960
2025-06-05 09:43:44,664 - __main__ - INFO - Epoch [1/9], Step [1300/3125], Loss: 3.8443, Perplexity: 46.7262
2025-06-05 09:45:16,983 - __main__ - INFO - Epoch [1/9], Step [1400/3125], Loss: 3.7788, Perplexity: 43.7654
2025-06-05 09:46:49,549 - __main__ - INFO - Epoch [1/9], Step [1500/3125], Loss: 3.9427, Perplexity: 51.5586
2025-06-05 09:48:22,306 - __main__ - INFO - Epoch [1/9], Step [1600/3125], Loss: 3.9204, Perplexity: 50.4209
2025-06-05 09:49:55,016 - __main__ - INFO - Epoch [1/9], Step [1700/3125], Loss: 3.6418, Perplexity: 38.1600
2025-06-05 09:51:27,127 - __main__ - INFO - Epoch [1/9], Step [1800/3125], Loss: 3.5172, Perplexity: 33.6913
2025-06-05 09:52:59,489 - __main__ - INFO - Epoch [1/9], Step [1900/3125], Loss: 3.6409, Perplexity: 38.1244
2025-06-05 09:54:31,921 - __main__ - INFO - Epoch [1/9], Step [2000/3125], Loss: 3.2969, Perplexity: 27.0292
2025-06-05 09:56:04,383 - __main__ - INFO - Epoch [1/9], Step [2100/3125], Loss: 3.4517, Perplexity: 31.5544
2025-06-05 09:57:36,919 - __main__ - INFO - Epoch [1/9], Step [2200/3125], Loss: 3.4920, Perplexity: 32.8529
2025-06-05 09:59:10,022 - __main__ - INFO - Epoch [1/9], Step [2300/3125], Loss: 3.7028, Perplexity: 40.5603
2025-06-05 10:00:42,264 - __main__ - INFO - Epoch [1/9], Step [2400/3125], Loss: 3.6794, Perplexity: 39.6238
2025-06-05 10:02:14,437 - __main__ - INFO - Epoch [1/9], Step [2500/3125], Loss: 3.4977, Perplexity: 33.0382
2025-06-05 10:03:47,338 - __main__ - INFO - Epoch [1/9], Step [2600/3125], Loss: 3.4883, Perplexity: 32.7311
2025-06-05 10:05:20,077 - __main__ - INFO - Epoch [1/9], Step [2700/3125], Loss: 3.4596, Perplexity: 31.8036
2025-06-05 10:06:52,363 - __main__ - INFO - Epoch [1/9], Step [2800/3125], Loss: 3.4043, Perplexity: 30.0920
2025-06-05 10:08:24,721 - __main__ - INFO - Epoch [1/9], Step [2900/3125], Loss: 3.1842, Perplexity: 24.1488
2025-06-05 10:09:56,919 - __main__ - INFO - Epoch [1/9], Step [3000/3125], Loss: 3.2843, Perplexity: 26.6914
2025-06-05 10:11:29,066 - __main__ - INFO - Epoch [1/9], Step [3100/3125], Loss: 3.1950, Perplexity: 24.4100
2025-06-05 10:11:52,259 - __main__ - INFO - Epoch 1 Training finished. Avg Loss: 3.8943, Time: 2896.84s
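Throughout the log, each reported Perplexity is simply the exponential of the reported Loss (per-token cross-entropy in nats): the first step's loss of 5.4706 gives exp(5.4706) ≈ 237.609, matching the log. A one-line check, assuming natural-log cross-entropy:

```python
import math

def perplexity(cross_entropy_loss):
    # Perplexity = exp(loss) when loss is mean per-token
    # cross-entropy in nats, as the log's paired columns show.
    return math.exp(cross_entropy_loss)
```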
2025-06-05 10:24:31,901 - __main__ - INFO - Validation Avg Loss: 3.1313, Perplexity: 22.9038
2025-06-05 10:24:44,857 - __main__ - INFO - Validation BLEU-4: 0.0836
2025-06-05 10:24:45,632 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.0836.pth
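The best-model filenames embed the validation BLEU-4 score, so a new checkpoint is written only when the score improves (as the later epochs confirm: epoch 6's 0.1009 did not produce a save). A sketch of that logic, with `save_fn` standing in for the actual `torch.save` call:

```python
def maybe_save_best(bleu4, best_bleu4, save_fn):
    """Save a checkpoint only when validation BLEU-4 improves.

    Filename pattern taken from the log's
    "best_model_bleu{score}.pth" lines; save_fn is a placeholder
    for the real serialization call.
    """
    if bleu4 > best_bleu4:
        save_fn(f"./output/best_model_bleu{bleu4:.4f}.pth")
        return bleu4
    return best_bleu4
```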
2025-06-05 10:24:45,633 - __main__ - INFO - Performing memory optimization after epoch...
2025-06-05 10:24:45,924 - __main__ - INFO - CUDA cache emptied.
2025-06-05 10:24:46,199 - __main__ - INFO - Python garbage collector run.
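The per-epoch "memory optimization" pairs a CUDA cache flush with a garbage-collector pass, which matches the standard PyTorch idiom. A sketch, hedged so it also runs on machines without PyTorch or a GPU:

```python
import gc

def free_memory():
    """Release cached GPU memory (when PyTorch + CUDA are present)
    and run the Python garbage collector, as the log does after
    each epoch ("CUDA cache emptied." / "Python garbage collector run.")."""
    try:
        import torch  # optional: only needed for the CUDA cache step
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass
    # gc.collect() returns the number of unreachable objects found.
    return gc.collect()
```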
2025-06-05 10:26:22,213 - __main__ - INFO - Epoch [2/9], Step [100/3125], Loss: 3.1401, Perplexity: 23.1065
2025-06-05 10:27:55,157 - __main__ - INFO - Epoch [2/9], Step [200/3125], Loss: 3.0563, Perplexity: 21.2493
2025-06-05 10:29:27,968 - __main__ - INFO - Epoch [2/9], Step [300/3125], Loss: 3.2782, Perplexity: 26.5282
2025-06-05 10:31:00,270 - __main__ - INFO - Epoch [2/9], Step [400/3125], Loss: 3.1834, Perplexity: 24.1295
2025-06-05 10:32:33,440 - __main__ - INFO - Epoch [2/9], Step [500/3125], Loss: 3.3472, Perplexity: 28.4244
2025-06-05 10:34:06,422 - __main__ - INFO - Epoch [2/9], Step [600/3125], Loss: 3.1609, Perplexity: 23.5928
2025-06-05 10:35:39,261 - __main__ - INFO - Epoch [2/9], Step [700/3125], Loss: 2.9984, Perplexity: 20.0538
2025-06-05 10:37:11,706 - __main__ - INFO - Epoch [2/9], Step [800/3125], Loss: 3.2494, Perplexity: 25.7758
2025-06-05 10:38:44,368 - __main__ - INFO - Epoch [2/9], Step [900/3125], Loss: 3.1955, Perplexity: 24.4232
2025-06-05 10:40:17,199 - __main__ - INFO - Epoch [2/9], Step [1000/3125], Loss: 3.4101, Perplexity: 30.2673
2025-06-05 10:41:49,573 - __main__ - INFO - Epoch [2/9], Step [1100/3125], Loss: 3.3706, Perplexity: 29.0971
2025-06-05 10:43:21,777 - __main__ - INFO - Epoch [2/9], Step [1200/3125], Loss: 3.1680, Perplexity: 23.7608
2025-06-05 10:44:54,540 - __main__ - INFO - Epoch [2/9], Step [1300/3125], Loss: 3.0362, Perplexity: 20.8251
2025-06-05 10:46:27,295 - __main__ - INFO - Epoch [2/9], Step [1400/3125], Loss: 3.0925, Perplexity: 22.0321
2025-06-05 10:47:59,572 - __main__ - INFO - Epoch [2/9], Step [1500/3125], Loss: 3.1057, Perplexity: 22.3241
2025-06-05 10:49:32,053 - __main__ - INFO - Epoch [2/9], Step [1600/3125], Loss: 3.3277, Perplexity: 27.8730
2025-06-05 10:51:04,671 - __main__ - INFO - Epoch [2/9], Step [1700/3125], Loss: 3.1610, Perplexity: 23.5945
2025-06-05 10:52:37,560 - __main__ - INFO - Epoch [2/9], Step [1800/3125], Loss: 3.2718, Perplexity: 26.3582
2025-06-05 10:54:10,358 - __main__ - INFO - Epoch [2/9], Step [1900/3125], Loss: 3.1425, Perplexity: 23.1612
2025-06-05 10:55:43,281 - __main__ - INFO - Epoch [2/9], Step [2000/3125], Loss: 3.0522, Perplexity: 21.1619
2025-06-05 10:57:15,767 - __main__ - INFO - Epoch [2/9], Step [2100/3125], Loss: 2.9711, Perplexity: 19.5126
2025-06-05 10:58:48,072 - __main__ - INFO - Epoch [2/9], Step [2200/3125], Loss: 3.0638, Perplexity: 21.4079
2025-06-05 11:00:20,322 - __main__ - INFO - Epoch [2/9], Step [2300/3125], Loss: 3.1867, Perplexity: 24.2093
2025-06-05 11:01:53,270 - __main__ - INFO - Epoch [2/9], Step [2400/3125], Loss: 3.0060, Perplexity: 20.2070
2025-06-05 11:03:25,315 - __main__ - INFO - Epoch [2/9], Step [2500/3125], Loss: 3.1718, Perplexity: 23.8494
2025-06-05 11:04:57,298 - __main__ - INFO - Epoch [2/9], Step [2600/3125], Loss: 3.0628, Perplexity: 21.3874
2025-06-05 11:06:29,658 - __main__ - INFO - Epoch [2/9], Step [2700/3125], Loss: 3.3898, Perplexity: 29.6609
2025-06-05 11:08:01,931 - __main__ - INFO - Epoch [2/9], Step [2800/3125], Loss: 2.7914, Perplexity: 16.3038
2025-06-05 11:09:34,397 - __main__ - INFO - Epoch [2/9], Step [2900/3125], Loss: 2.8758, Perplexity: 17.7403
2025-06-05 11:11:06,755 - __main__ - INFO - Epoch [2/9], Step [3000/3125], Loss: 3.0766, Perplexity: 21.6847
2025-06-05 11:12:39,179 - __main__ - INFO - Epoch [2/9], Step [3100/3125], Loss: 3.0787, Perplexity: 21.7303
2025-06-05 11:13:02,556 - __main__ - INFO - Epoch 2 Training finished. Avg Loss: 3.1318, Time: 2896.36s
2025-06-05 11:25:51,359 - __main__ - INFO - Validation Avg Loss: 2.8683, Perplexity: 17.6076
2025-06-05 11:26:01,678 - __main__ - INFO - Validation BLEU-4: 0.0923
2025-06-05 11:26:02,402 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.0923.pth
2025-06-05 11:26:02,404 - __main__ - INFO - Performing memory optimization after epoch...
2025-06-05 11:26:02,690 - __main__ - INFO - CUDA cache emptied.
2025-06-05 11:26:03,026 - __main__ - INFO - Python garbage collector run.
2025-06-05 11:27:38,287 - __main__ - INFO - Epoch [3/9], Step [100/3125], Loss: 3.1375, Perplexity: 23.0468
2025-06-05 11:29:10,992 - __main__ - INFO - Epoch [3/9], Step [200/3125], Loss: 2.9503, Perplexity: 19.1119
2025-06-05 11:30:43,374 - __main__ - INFO - Epoch [3/9], Step [300/3125], Loss: 2.9096, Perplexity: 18.3486
2025-06-05 11:32:15,961 - __main__ - INFO - Epoch [3/9], Step [400/3125], Loss: 2.9213, Perplexity: 18.5648
2025-06-05 11:33:48,692 - __main__ - INFO - Epoch [3/9], Step [500/3125], Loss: 2.7807, Perplexity: 16.1304
2025-06-05 11:35:21,035 - __main__ - INFO - Epoch [3/9], Step [600/3125], Loss: 2.9757, Perplexity: 19.6029
2025-06-05 11:36:53,722 - __main__ - INFO - Epoch [3/9], Step [700/3125], Loss: 2.7691, Perplexity: 15.9448
2025-06-05 11:38:26,136 - __main__ - INFO - Epoch [3/9], Step [800/3125], Loss: 2.6290, Perplexity: 13.8597
2025-06-05 11:39:58,718 - __main__ - INFO - Epoch [3/9], Step [900/3125], Loss: 2.7852, Perplexity: 16.2028
2025-06-05 11:41:31,263 - __main__ - INFO - Epoch [3/9], Step [1000/3125], Loss: 2.9282, Perplexity: 18.6934
2025-06-05 11:43:04,789 - __main__ - INFO - Epoch [3/9], Step [1100/3125], Loss: 3.0185, Perplexity: 20.4607
2025-06-05 11:44:37,274 - __main__ - INFO - Epoch [3/9], Step [1200/3125], Loss: 3.1393, Perplexity: 23.0879
2025-06-05 11:46:10,068 - __main__ - INFO - Epoch [3/9], Step [1300/3125], Loss: 2.9986, Perplexity: 20.0580
2025-06-05 11:47:42,373 - __main__ - INFO - Epoch [3/9], Step [1400/3125], Loss: 2.8490, Perplexity: 17.2704
2025-06-05 11:49:14,841 - __main__ - INFO - Epoch [3/9], Step [1500/3125], Loss: 3.1596, Perplexity: 23.5610
2025-06-05 11:50:47,296 - __main__ - INFO - Epoch [3/9], Step [1600/3125], Loss: 2.9312, Perplexity: 18.7505
2025-06-05 11:52:19,632 - __main__ - INFO - Epoch [3/9], Step [1700/3125], Loss: 2.8247, Perplexity: 16.8555
2025-06-05 11:53:52,241 - __main__ - INFO - Epoch [3/9], Step [1800/3125], Loss: 2.9927, Perplexity: 19.9386
2025-06-05 11:55:24,747 - __main__ - INFO - Epoch [3/9], Step [1900/3125], Loss: 2.7951, Perplexity: 16.3638
2025-06-05 11:56:57,095 - __main__ - INFO - Epoch [3/9], Step [2000/3125], Loss: 2.8393, Perplexity: 17.1032
2025-06-05 11:58:29,803 - __main__ - INFO - Epoch [3/9], Step [2100/3125], Loss: 3.0383, Perplexity: 20.8706
2025-06-05 12:00:01,877 - __main__ - INFO - Epoch [3/9], Step [2200/3125], Loss: 3.1580, Perplexity: 23.5244
2025-06-05 12:01:34,524 - __main__ - INFO - Epoch [3/9], Step [2300/3125], Loss: 2.9753, Perplexity: 19.5955
2025-06-05 12:03:06,876 - __main__ - INFO - Epoch [3/9], Step [2400/3125], Loss: 2.7754, Perplexity: 16.0452
2025-06-05 12:04:39,456 - __main__ - INFO - Epoch [3/9], Step [2500/3125], Loss: 2.5844, Perplexity: 13.2559
2025-06-05 12:06:12,314 - __main__ - INFO - Epoch [3/9], Step [2600/3125], Loss: 2.9575, Perplexity: 19.2490
2025-06-05 12:07:44,508 - __main__ - INFO - Epoch [3/9], Step [2700/3125], Loss: 2.7714, Perplexity: 15.9803
2025-06-05 12:09:16,908 - __main__ - INFO - Epoch [3/9], Step [2800/3125], Loss: 2.6014, Perplexity: 13.4829
2025-06-05 12:10:49,531 - __main__ - INFO - Epoch [3/9], Step [2900/3125], Loss: 2.9240, Perplexity: 18.6159
2025-06-05 12:12:22,014 - __main__ - INFO - Epoch [3/9], Step [3000/3125], Loss: 2.8361, Perplexity: 17.0495
2025-06-05 12:13:54,500 - __main__ - INFO - Epoch [3/9], Step [3100/3125], Loss: 2.9557, Perplexity: 19.2145
2025-06-05 12:14:17,729 - __main__ - INFO - Epoch 3 Training finished. Avg Loss: 2.9138, Time: 2894.70s
2025-06-05 12:27:10,365 - __main__ - INFO - Validation Avg Loss: 2.7487, Perplexity: 15.6226
2025-06-05 12:27:20,942 - __main__ - INFO - Validation BLEU-4: 0.0986
2025-06-05 12:27:21,682 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.0986.pth
2025-06-05 12:27:21,683 - __main__ - INFO - Performing memory optimization after epoch...
2025-06-05 12:27:21,948 - __main__ - INFO - CUDA cache emptied.
2025-06-05 12:27:22,293 - __main__ - INFO - Python garbage collector run.
2025-06-05 12:28:57,255 - __main__ - INFO - Epoch [4/9], Step [100/3125], Loss: 2.8362, Perplexity: 17.0513
2025-06-05 12:30:30,013 - __main__ - INFO - Epoch [4/9], Step [200/3125], Loss: 2.5668, Perplexity: 13.0241
2025-06-05 12:32:02,681 - __main__ - INFO - Epoch [4/9], Step [300/3125], Loss: 2.7211, Perplexity: 15.1978
2025-06-05 12:33:35,422 - __main__ - INFO - Epoch [4/9], Step [400/3125], Loss: 2.9763, Perplexity: 19.6153
2025-06-05 12:35:07,932 - __main__ - INFO - Epoch [4/9], Step [500/3125], Loss: 2.7447, Perplexity: 15.5594
2025-06-05 12:36:40,894 - __main__ - INFO - Epoch [4/9], Step [600/3125], Loss: 2.6688, Perplexity: 14.4227
2025-06-05 12:38:13,657 - __main__ - INFO - Epoch [4/9], Step [700/3125], Loss: 2.7560, Perplexity: 15.7370
2025-06-05 12:39:46,304 - __main__ - INFO - Epoch [4/9], Step [800/3125], Loss: 2.7199, Perplexity: 15.1782
2025-06-05 12:41:18,843 - __main__ - INFO - Epoch [4/9], Step [900/3125], Loss: 2.9431, Perplexity: 18.9745
2025-06-05 12:42:51,310 - __main__ - INFO - Epoch [4/9], Step [1000/3125], Loss: 2.7849, Perplexity: 16.1979
2025-06-05 12:44:24,070 - __main__ - INFO - Epoch [4/9], Step [1100/3125], Loss: 2.6553, Perplexity: 14.2291
2025-06-05 12:45:56,314 - __main__ - INFO - Epoch [4/9], Step [1200/3125], Loss: 2.7191, Perplexity: 15.1662
2025-06-05 12:47:28,651 - __main__ - INFO - Epoch [4/9], Step [1300/3125], Loss: 2.7089, Perplexity: 15.0125
2025-06-05 12:49:01,294 - __main__ - INFO - Epoch [4/9], Step [1400/3125], Loss: 2.9824, Perplexity: 19.7352
2025-06-05 12:50:34,013 - __main__ - INFO - Epoch [4/9], Step [1500/3125], Loss: 2.7723, Perplexity: 15.9946
2025-06-05 12:52:06,802 - __main__ - INFO - Epoch [4/9], Step [1600/3125], Loss: 2.8024, Perplexity: 16.4847
2025-06-05 12:53:39,654 - __main__ - INFO - Epoch [4/9], Step [1700/3125], Loss: 2.7457, Perplexity: 15.5758
2025-06-05 12:55:12,141 - __main__ - INFO - Epoch [4/9], Step [1800/3125], Loss: 2.9873, Perplexity: 19.8314
2025-06-05 12:56:44,381 - __main__ - INFO - Epoch [4/9], Step [1900/3125], Loss: 2.6393, Perplexity: 14.0032
2025-06-05 12:58:16,535 - __main__ - INFO - Epoch [4/9], Step [2000/3125], Loss: 2.6532, Perplexity: 14.1999
2025-06-05 12:59:49,166 - __main__ - INFO - Epoch [4/9], Step [2100/3125], Loss: 2.7995, Perplexity: 16.4372
2025-06-05 13:01:21,714 - __main__ - INFO - Epoch [4/9], Step [2200/3125], Loss: 2.7488, Perplexity: 15.6245
2025-06-05 13:02:53,947 - __main__ - INFO - Epoch [4/9], Step [2300/3125], Loss: 2.8463, Perplexity: 17.2242
2025-06-05 13:04:26,007 - __main__ - INFO - Epoch [4/9], Step [2400/3125], Loss: 2.6901, Perplexity: 14.7335
2025-06-05 13:05:58,607 - __main__ - INFO - Epoch [4/9], Step [2500/3125], Loss: 2.8890, Perplexity: 17.9762
2025-06-05 13:07:31,240 - __main__ - INFO - Epoch [4/9], Step [2600/3125], Loss: 2.6551, Perplexity: 14.2270
2025-06-05 13:09:03,803 - __main__ - INFO - Epoch [4/9], Step [2700/3125], Loss: 2.8474, Perplexity: 17.2437
2025-06-05 13:10:36,148 - __main__ - INFO - Epoch [4/9], Step [2800/3125], Loss: 2.7218, Perplexity: 15.2075
2025-06-05 13:12:08,443 - __main__ - INFO - Epoch [4/9], Step [2900/3125], Loss: 2.6753, Perplexity: 14.5169
2025-06-05 13:13:41,235 - __main__ - INFO - Epoch [4/9], Step [3000/3125], Loss: 2.8698, Perplexity: 17.6342
2025-06-05 13:15:13,729 - __main__ - INFO - Epoch [4/9], Step [3100/3125], Loss: 2.5715, Perplexity: 13.0854
2025-06-05 13:15:37,016 - __main__ - INFO - Epoch 4 Training finished. Avg Loss: 2.7810, Time: 2894.72s
2025-06-05 13:28:14,233 - __main__ - INFO - Validation Avg Loss: 2.6835, Perplexity: 14.6358
2025-06-05 13:28:24,559 - __main__ - INFO - Validation BLEU-4: 0.0996
2025-06-05 13:28:25,299 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.0996.pth
2025-06-05 13:28:25,300 - __main__ - INFO - Performing memory optimization after epoch...
2025-06-05 13:28:25,596 - __main__ - INFO - CUDA cache emptied.
2025-06-05 13:28:25,927 - __main__ - INFO - Python garbage collector run.
2025-06-05 13:30:01,372 - __main__ - INFO - Epoch [5/9], Step [100/3125], Loss: 2.7858, Perplexity: 16.2124
2025-06-05 13:31:34,188 - __main__ - INFO - Epoch [5/9], Step [200/3125], Loss: 2.5799, Perplexity: 13.1955
2025-06-05 13:33:06,552 - __main__ - INFO - Epoch [5/9], Step [300/3125], Loss: 2.7141, Perplexity: 15.0910
2025-06-05 13:34:39,208 - __main__ - INFO - Epoch [5/9], Step [400/3125], Loss: 2.6322, Perplexity: 13.9042
2025-06-05 13:36:11,891 - __main__ - INFO - Epoch [5/9], Step [500/3125], Loss: 2.9920, Perplexity: 19.9248
2025-06-05 13:37:44,341 - __main__ - INFO - Epoch [5/9], Step [600/3125], Loss: 2.5876, Perplexity: 13.2972
2025-06-05 13:39:16,416 - __main__ - INFO - Epoch [5/9], Step [700/3125], Loss: 2.6221, Perplexity: 13.7644
2025-06-05 13:40:48,960 - __main__ - INFO - Epoch [5/9], Step [800/3125], Loss: 2.5546, Perplexity: 12.8666
2025-06-05 13:42:21,593 - __main__ - INFO - Epoch [5/9], Step [900/3125], Loss: 2.7899, Perplexity: 16.2789
2025-06-05 13:43:54,018 - __main__ - INFO - Epoch [5/9], Step [1000/3125], Loss: 2.7180, Perplexity: 15.1505
2025-06-05 13:45:26,629 - __main__ - INFO - Epoch [5/9], Step [1100/3125], Loss: 2.6338, Perplexity: 13.9268
2025-06-05 13:46:59,396 - __main__ - INFO - Epoch [5/9], Step [1200/3125], Loss: 2.9994, Perplexity: 20.0732
2025-06-05 13:48:31,472 - __main__ - INFO - Epoch [5/9], Step [1300/3125], Loss: 2.5138, Perplexity: 12.3518
2025-06-05 13:50:03,810 - __main__ - INFO - Epoch [5/9], Step [1400/3125], Loss: 2.7708, Perplexity: 15.9718
2025-06-05 13:51:36,234 - __main__ - INFO - Epoch [5/9], Step [1500/3125], Loss: 2.6756, Perplexity: 14.5208
2025-06-05 13:53:08,614 - __main__ - INFO - Epoch [5/9], Step [1600/3125], Loss: 2.6377, Perplexity: 13.9804
2025-06-05 13:54:41,451 - __main__ - INFO - Epoch [5/9], Step [1700/3125], Loss: 2.6101, Perplexity: 13.6009
2025-06-05 13:56:14,028 - __main__ - INFO - Epoch [5/9], Step [1800/3125], Loss: 2.6736, Perplexity: 14.4918
2025-06-05 13:57:46,796 - __main__ - INFO - Epoch [5/9], Step [1900/3125], Loss: 2.6328, Perplexity: 13.9123
2025-06-05 13:59:19,826 - __main__ - INFO - Epoch [5/9], Step [2000/3125], Loss: 2.5901, Perplexity: 13.3318
2025-06-05 14:00:52,403 - __main__ - INFO - Epoch [5/9], Step [2100/3125], Loss: 2.9779, Perplexity: 19.6469
2025-06-05 14:02:25,242 - __main__ - INFO - Epoch [5/9], Step [2200/3125], Loss: 2.5809, Perplexity: 13.2096
2025-06-05 14:03:58,214 - __main__ - INFO - Epoch [5/9], Step [2300/3125], Loss: 2.7360, Perplexity: 15.4247
2025-06-05 14:05:30,984 - __main__ - INFO - Epoch [5/9], Step [2400/3125], Loss: 2.5268, Perplexity: 12.5135
2025-06-05 14:07:04,027 - __main__ - INFO - Epoch [5/9], Step [2500/3125], Loss: 2.8699, Perplexity: 17.6357
2025-06-05 14:08:36,966 - __main__ - INFO - Epoch [5/9], Step [2600/3125], Loss: 2.7812, Perplexity: 16.1382
2025-06-05 14:10:09,378 - __main__ - INFO - Epoch [5/9], Step [2700/3125], Loss: 2.6540, Perplexity: 14.2111
2025-06-05 14:11:41,671 - __main__ - INFO - Epoch [5/9], Step [2800/3125], Loss: 2.6715, Perplexity: 14.4612
2025-06-05 14:13:14,497 - __main__ - INFO - Epoch [5/9], Step [2900/3125], Loss: 2.8780, Perplexity: 17.7783
2025-06-05 14:14:47,210 - __main__ - INFO - Epoch [5/9], Step [3000/3125], Loss: 2.6441, Perplexity: 14.0706
2025-06-05 14:16:19,723 - __main__ - INFO - Epoch [5/9], Step [3100/3125], Loss: 2.6745, Perplexity: 14.5049
2025-06-05 14:16:43,018 - __main__ - INFO - Epoch 5 Training finished. Avg Loss: 2.6866, Time: 2897.09s
2025-06-05 14:29:20,136 - __main__ - INFO - Validation Avg Loss: 2.6448, Perplexity: 14.0800
2025-06-05 14:29:30,680 - __main__ - INFO - Validation BLEU-4: 0.1011
2025-06-05 14:29:31,398 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.1011.pth
2025-06-05 14:29:32,142 - __main__ - INFO - Saved periodic model checkpoint to ./output/model_epoch_5.pth
2025-06-05 14:29:32,143 - __main__ - INFO - Performing memory optimization after epoch...
2025-06-05 14:29:32,408 - __main__ - INFO - CUDA cache emptied.
2025-06-05 14:29:32,753 - __main__ - INFO - Python garbage collector run.
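Epoch 5 is the only epoch that also produced a periodic checkpoint (`model_epoch_5.pth`) alongside the best-model save, suggesting a fixed save interval. A sketch of that naming rule; the interval of 5 is an assumption inferred from this single occurrence:

```python
def periodic_checkpoint_path(epoch, every=5):
    # Epoch 5 produced ./output/model_epoch_5.pth in the log;
    # 'every=5' is a guess from that one data point.
    if epoch % every == 0:
        return f"./output/model_epoch_{epoch}.pth"
    return None
```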
2025-06-05 14:31:08,760 - __main__ - INFO - Epoch [6/9], Step [100/3125], Loss: 2.9338, Perplexity: 18.7981
2025-06-05 14:32:41,036 - __main__ - INFO - Epoch [6/9], Step [200/3125], Loss: 2.7460, Perplexity: 15.5799
2025-06-05 14:34:13,610 - __main__ - INFO - Epoch [6/9], Step [300/3125], Loss: 2.3894, Perplexity: 10.9066
2025-06-05 14:35:46,857 - __main__ - INFO - Epoch [6/9], Step [400/3125], Loss: 2.6891, Perplexity: 14.7191
2025-06-05 14:37:19,167 - __main__ - INFO - Epoch [6/9], Step [500/3125], Loss: 2.7032, Perplexity: 14.9273
2025-06-05 14:38:52,048 - __main__ - INFO - Epoch [6/9], Step [600/3125], Loss: 2.6848, Perplexity: 14.6553
2025-06-05 14:40:24,881 - __main__ - INFO - Epoch [6/9], Step [700/3125], Loss: 2.4991, Perplexity: 12.1715
2025-06-05 14:41:57,087 - __main__ - INFO - Epoch [6/9], Step [800/3125], Loss: 2.5794, Perplexity: 13.1892
2025-06-05 14:43:29,854 - __main__ - INFO - Epoch [6/9], Step [900/3125], Loss: 2.5044, Perplexity: 12.2366
2025-06-05 14:45:02,445 - __main__ - INFO - Epoch [6/9], Step [1000/3125], Loss: 2.6207, Perplexity: 13.7448
2025-06-05 14:46:35,344 - __main__ - INFO - Epoch [6/9], Step [1100/3125], Loss: 2.5424, Perplexity: 12.7103
2025-06-05 14:48:08,048 - __main__ - INFO - Epoch [6/9], Step [1200/3125], Loss: 2.5540, Perplexity: 12.8587
2025-06-05 14:49:40,487 - __main__ - INFO - Epoch [6/9], Step [1300/3125], Loss: 2.5427, Perplexity: 12.7145
2025-06-05 14:51:13,035 - __main__ - INFO - Epoch [6/9], Step [1400/3125], Loss: 2.6368, Perplexity: 13.9681
2025-06-05 14:52:45,059 - __main__ - INFO - Epoch [6/9], Step [1500/3125], Loss: 2.5717, Perplexity: 13.0879
2025-06-05 14:54:17,291 - __main__ - INFO - Epoch [6/9], Step [1600/3125], Loss: 2.4953, Perplexity: 12.1255
2025-06-05 14:55:50,488 - __main__ - INFO - Epoch [6/9], Step [1700/3125], Loss: 2.7300, Perplexity: 15.3336
2025-06-05 14:57:23,442 - __main__ - INFO - Epoch [6/9], Step [1800/3125], Loss: 2.6784, Perplexity: 14.5613
2025-06-05 14:58:55,865 - __main__ - INFO - Epoch [6/9], Step [1900/3125], Loss: 2.6738, Perplexity: 14.4946
2025-06-05 15:00:28,266 - __main__ - INFO - Epoch [6/9], Step [2000/3125], Loss: 2.6148, Perplexity: 13.6649
2025-06-05 15:02:00,818 - __main__ - INFO - Epoch [6/9], Step [2100/3125], Loss: 2.8018, Perplexity: 16.4748
2025-06-05 15:03:33,616 - __main__ - INFO - Epoch [6/9], Step [2200/3125], Loss: 2.6942, Perplexity: 14.7940
2025-06-05 15:05:05,991 - __main__ - INFO - Epoch [6/9], Step [2300/3125], Loss: 2.4844, Perplexity: 11.9935
2025-06-05 15:06:38,662 - __main__ - INFO - Epoch [6/9], Step [2400/3125], Loss: 2.4125, Perplexity: 11.1615
2025-06-05 15:08:11,014 - __main__ - INFO - Epoch [6/9], Step [2500/3125], Loss: 2.5616, Perplexity: 12.9564
2025-06-05 15:09:43,766 - __main__ - INFO - Epoch [6/9], Step [2600/3125], Loss: 2.5049, Perplexity: 12.2425
2025-06-05 15:11:16,830 - __main__ - INFO - Epoch [6/9], Step [2700/3125], Loss: 2.4866, Perplexity: 12.0206
2025-06-05 15:12:49,025 - __main__ - INFO - Epoch [6/9], Step [2800/3125], Loss: 2.3411, Perplexity: 10.3931
2025-06-05 15:14:21,512 - __main__ - INFO - Epoch [6/9], Step [2900/3125], Loss: 2.6441, Perplexity: 14.0713
2025-06-05 15:15:54,097 - __main__ - INFO - Epoch [6/9], Step [3000/3125], Loss: 2.5855, Perplexity: 13.2695
2025-06-05 15:17:26,931 - __main__ - INFO - Epoch [6/9], Step [3100/3125], Loss: 2.7371, Perplexity: 15.4419
2025-06-05 15:17:50,166 - __main__ - INFO - Epoch 6 Training finished. Avg Loss: 2.6127, Time: 2897.41s
2025-06-05 15:30:17,460 - __main__ - INFO - Validation Avg Loss: 2.6184, Perplexity: 13.7138
2025-06-05 15:30:28,171 - __main__ - INFO - Validation BLEU-4: 0.1009
2025-06-05 15:30:28,172 - __main__ - INFO - Performing memory optimization after epoch...
2025-06-05 15:30:28,470 - __main__ - INFO - CUDA cache emptied.
2025-06-05 15:30:28,816 - __main__ - INFO - Python garbage collector run.
2025-06-05 15:32:05,141 - __main__ - INFO - Epoch [7/9], Step [100/3125], Loss: 2.6545, Perplexity: 14.2180
2025-06-05 15:33:37,788 - __main__ - INFO - Epoch [7/9], Step [200/3125], Loss: 2.3907, Perplexity: 10.9206
2025-06-05 15:35:09,924 - __main__ - INFO - Epoch [7/9], Step [300/3125], Loss: 2.3964, Perplexity: 10.9837
2025-06-05 15:36:42,633 - __main__ - INFO - Epoch [7/9], Step [400/3125], Loss: 2.2917, Perplexity: 9.8918
2025-06-05 15:38:15,138 - __main__ - INFO - Epoch [7/9], Step [500/3125], Loss: 2.5460, Perplexity: 12.7565
2025-06-05 15:39:47,288 - __main__ - INFO - Epoch [7/9], Step [600/3125], Loss: 2.7165, Perplexity: 15.1276
2025-06-05 15:41:19,984 - __main__ - INFO - Epoch [7/9], Step [700/3125], Loss: 2.3160, Perplexity: 10.1350
2025-06-05 15:42:52,625 - __main__ - INFO - Epoch [7/9], Step [800/3125], Loss: 2.4257, Perplexity: 11.3097
2025-06-05 15:44:25,300 - __main__ - INFO - Epoch [7/9], Step [900/3125], Loss: 2.8313, Perplexity: 16.9669
2025-06-05 15:45:57,862 - __main__ - INFO - Epoch [7/9], Step [1000/3125], Loss: 2.4841, Perplexity: 11.9898
2025-06-05 15:47:30,346 - __main__ - INFO - Epoch [7/9], Step [1100/3125], Loss: 2.4737, Perplexity: 11.8667
2025-06-05 15:49:02,664 - __main__ - INFO - Epoch [7/9], Step [1200/3125], Loss: 2.3904, Perplexity: 10.9178
2025-06-05 15:50:34,955 - __main__ - INFO - Epoch [7/9], Step [1300/3125], Loss: 2.4815, Perplexity: 11.9595
2025-06-05 15:52:07,231 - __main__ - INFO - Epoch [7/9], Step [1400/3125], Loss: 2.7324, Perplexity: 15.3697
2025-06-05 15:53:39,430 - __main__ - INFO - Epoch [7/9], Step [1500/3125], Loss: 2.7363, Perplexity: 15.4298
2025-06-05 15:55:11,826 - __main__ - INFO - Epoch [7/9], Step [1600/3125], Loss: 2.7751, Perplexity: 16.0397
2025-06-05 15:56:44,380 - __main__ - INFO - Epoch [7/9], Step [1700/3125], Loss: 2.5598, Perplexity: 12.9329
2025-06-05 15:58:16,561 - __main__ - INFO - Epoch [7/9], Step [1800/3125], Loss: 2.7369, Perplexity: 15.4388
2025-06-05 15:59:49,080 - __main__ - INFO - Epoch [7/9], Step [1900/3125], Loss: 2.8117, Perplexity: 16.6374
2025-06-05 16:01:21,995 - __main__ - INFO - Epoch [7/9], Step [2000/3125], Loss: 2.6360, Perplexity: 13.9572
2025-06-05 16:02:54,730 - __main__ - INFO - Epoch [7/9], Step [2100/3125], Loss: 2.4383, Perplexity: 11.4533
2025-06-05 16:04:28,093 - __main__ - INFO - Epoch [7/9], Step [2200/3125], Loss: 2.7142, Perplexity: 15.0932
2025-06-05 16:06:01,476 - __main__ - INFO - Epoch [7/9], Step [2300/3125], Loss: 2.4590, Perplexity: 11.6927
2025-06-05 16:07:34,763 - __main__ - INFO - Epoch [7/9], Step [2400/3125], Loss: 2.6457, Perplexity: 14.0935
2025-06-05 16:09:07,778 - __main__ - INFO - Epoch [7/9], Step [2500/3125], Loss: 2.6708, Perplexity: 14.4518
2025-06-05 16:10:40,473 - __main__ - INFO - Epoch [7/9], Step [2600/3125], Loss: 2.5663, Perplexity: 13.0179
2025-06-05 16:12:12,943 - __main__ - INFO - Epoch [7/9], Step [2700/3125], Loss: 2.8020, Perplexity: 16.4769
2025-06-05 16:13:45,226 - __main__ - INFO - Epoch [7/9], Step [2800/3125], Loss: 2.6069, Perplexity: 13.5576
2025-06-05 16:15:17,969 - __main__ - INFO - Epoch [7/9], Step [2900/3125], Loss: 2.5213, Perplexity: 12.4453
2025-06-05 16:16:51,084 - __main__ - INFO - Epoch [7/9], Step [3000/3125], Loss: 2.5426, Perplexity: 12.7128
2025-06-05 16:18:23,560 - __main__ - INFO - Epoch [7/9], Step [3100/3125], Loss: 2.6069, Perplexity: 13.5576
2025-06-05 16:18:46,982 - __main__ - INFO - Epoch 7 Training finished. Avg Loss: 2.5503, Time: 2898.16s
2025-06-05 16:31:32,517 - __main__ - INFO - Validation Avg Loss: 2.6021, Perplexity: 13.4918
2025-06-05 16:31:42,995 - __main__ - INFO - Validation BLEU-4: 0.1028
2025-06-05 16:31:43,704 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.1028.pth
2025-06-05 16:31:43,705 - __main__ - INFO - Performing memory optimization after epoch...
2025-06-05 16:31:43,969 - __main__ - INFO - CUDA cache emptied.
2025-06-05 16:31:44,301 - __main__ - INFO - Python garbage collector run.
2025-06-05 16:33:19,208 - __main__ - INFO - Epoch [8/9], Step [100/3125], Loss: 2.5949, Perplexity: 13.3954
2025-06-05 16:34:52,087 - __main__ - INFO - Epoch [8/9], Step [200/3125], Loss: 2.3867, Perplexity: 10.8772
2025-06-05 16:36:24,295 - __main__ - INFO - Epoch [8/9], Step [300/3125], Loss: 2.4636, Perplexity: 11.7471
2025-06-05 16:37:56,189 - __main__ - INFO - Epoch [8/9], Step [400/3125], Loss: 2.5906, Perplexity: 13.3381
2025-06-05 16:39:29,764 - __main__ - INFO - Epoch [8/9], Step [500/3125], Loss: 2.5785, Perplexity: 13.1772
2025-06-05 16:41:02,645 - __main__ - INFO - Epoch [8/9], Step [600/3125], Loss: 2.5902, Perplexity: 13.3330
2025-06-05 16:42:35,744 - __main__ - INFO - Epoch [8/9], Step [700/3125], Loss: 2.6504, Perplexity: 14.1599
2025-06-05 16:44:08,175 - __main__ - INFO - Epoch [8/9], Step [800/3125], Loss: 2.5812, Perplexity: 13.2125
2025-06-05 16:45:41,068 - __main__ - INFO - Epoch [8/9], Step [900/3125], Loss: 2.3477, Perplexity: 10.4617
2025-06-05 16:47:13,040 - __main__ - INFO - Epoch [8/9], Step [1000/3125], Loss: 2.5416, Perplexity: 12.7004
2025-06-05 16:48:45,442 - __main__ - INFO - Epoch [8/9], Step [1100/3125], Loss: 2.5581, Perplexity: 12.9117
2025-06-05 16:50:18,029 - __main__ - INFO - Epoch [8/9], Step [1200/3125], Loss: 2.6781, Perplexity: 14.5574
2025-06-05 16:51:51,068 - __main__ - INFO - Epoch [8/9], Step [1300/3125], Loss: 2.4678, Perplexity: 11.7967
2025-06-05 16:53:23,767 - __main__ - INFO - Epoch [8/9], Step [1400/3125], Loss: 2.7964, Perplexity: 16.3863
2025-06-05 16:54:56,092 - __main__ - INFO - Epoch [8/9], Step [1500/3125], Loss: 2.5617, Perplexity: 12.9583
2025-06-05 16:56:28,564 - __main__ - INFO - Epoch [8/9], Step [1600/3125], Loss: 2.4647, Perplexity: 11.7597
2025-06-05 16:58:01,422 - __main__ - INFO - Epoch [8/9], Step [1700/3125], Loss: 2.5186, Perplexity: 12.4108
2025-06-05 16:59:34,614 - __main__ - INFO - Epoch [8/9], Step [1800/3125], Loss: 2.7458, Perplexity: 15.5764
2025-06-05 17:01:06,892 - __main__ - INFO - Epoch [8/9], Step [1900/3125], Loss: 2.4218, Perplexity: 11.2661
2025-06-05 17:02:39,718 - __main__ - INFO - Epoch [8/9], Step [2000/3125], Loss: 2.4996, Perplexity: 12.1774
2025-06-05 17:04:12,656 - __main__ - INFO - Epoch [8/9], Step [2100/3125], Loss: 2.4715, Perplexity: 11.8400
2025-06-05 17:05:45,350 - __main__ - INFO - Epoch [8/9], Step [2200/3125], Loss: 2.7304, Perplexity: 15.3395
2025-06-05 17:07:18,313 - __main__ - INFO - Epoch [8/9], Step [2300/3125], Loss: 2.6088, Perplexity: 13.5826
2025-06-05 17:08:50,891 - __main__ - INFO - Epoch [8/9], Step [2400/3125], Loss: 2.2952, Perplexity: 9.9266
2025-06-05 17:10:23,321 - __main__ - INFO - Epoch [8/9], Step [2500/3125], Loss: 2.6801, Perplexity: 14.5865
2025-06-05 17:11:56,233 - __main__ - INFO - Epoch [8/9], Step [2600/3125], Loss: 2.6453, Perplexity: 14.0877
2025-06-05 17:13:28,949 - __main__ - INFO - Epoch [8/9], Step [2700/3125], Loss: 2.4980, Perplexity: 12.1576
2025-06-05 17:15:01,500 - __main__ - INFO - Epoch [8/9], Step [2800/3125], Loss: 2.4855, Perplexity: 12.0065
2025-06-05 17:16:33,983 - __main__ - INFO - Epoch [8/9], Step [2900/3125], Loss: 2.7647, Perplexity: 15.8748
2025-06-05 17:18:06,726 - __main__ - INFO - Epoch [8/9], Step [3000/3125], Loss: 2.3954, Perplexity: 10.9722
2025-06-05 17:19:40,090 - __main__ - INFO - Epoch [8/9], Step [3100/3125], Loss: 2.5672, Perplexity: 13.0291
2025-06-05 17:20:03,575 - __main__ - INFO - Epoch 8 Training finished. Avg Loss: 2.4984, Time: 2899.27s
2025-06-05 17:32:41,641 - __main__ - INFO - Validation Avg Loss: 2.5887, Perplexity: 13.3131
2025-06-05 17:32:52,092 - __main__ - INFO - Validation BLEU-4: 0.1029
2025-06-05 17:32:52,835 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.1029.pth
2025-06-05 17:32:52,836 - __main__ - INFO - Performing memory optimization after epoch...
2025-06-05 17:32:53,135 - __main__ - INFO - CUDA cache emptied.
2025-06-05 17:32:53,477 - __main__ - INFO - Python garbage collector run.
2025-06-05 17:34:28,617 - __main__ - INFO - Epoch [9/9], Step [100/3125], Loss: 2.4885, Perplexity: 12.0428
2025-06-05 17:36:01,256 - __main__ - INFO - Epoch [9/9], Step [200/3125], Loss: 2.4464, Perplexity: 11.5461
2025-06-05 17:37:33,875 - __main__ - INFO - Epoch [9/9], Step [300/3125], Loss: 2.5567, Perplexity: 12.8926
2025-06-05 17:39:06,254 - __main__ - INFO - Epoch [9/9], Step [400/3125], Loss: 2.5434, Perplexity: 12.7234
2025-06-05 17:40:38,946 - __main__ - INFO - Epoch [9/9], Step [500/3125], Loss: 2.4282, Perplexity: 11.3380
2025-06-05 17:42:11,097 - __main__ - INFO - Epoch [9/9], Step [600/3125], Loss: 2.4907, Perplexity: 12.0691
2025-06-05 17:43:43,481 - __main__ - INFO - Epoch [9/9], Step [700/3125], Loss: 2.5079, Perplexity: 12.2789
2025-06-05 17:45:15,684 - __main__ - INFO - Epoch [9/9], Step [800/3125], Loss: 2.4735, Perplexity: 11.8636
2025-06-05 17:46:48,773 - __main__ - INFO - Epoch [9/9], Step [900/3125], Loss: 2.5014, Perplexity: 12.2001
2025-06-05 17:48:21,463 - __main__ - INFO - Epoch [9/9], Step [1000/3125], Loss: 2.7705, Perplexity: 15.9662
2025-06-05 17:49:53,960 - __main__ - INFO - Epoch [9/9], Step [1100/3125], Loss: 2.6952, Perplexity: 14.8082