Spaces:
Build error
Build error
| 2025-06-05 09:14:36,398 - __main__ - INFO - | |
| --- Starting Model Training --- | |
| 2025-06-05 09:14:36,399 - __main__ - INFO - Starting training process... | |
| 2025-06-05 09:14:36,400 - __main__ - INFO - Using device: cuda | |
| 2025-06-05 09:14:36,405 - __main__ - INFO - Initializing training dataset for vocabulary building... | |
| 2025-06-05 09:14:37,522 - __main__ - INFO - Successfully loaded captions from /kaggle/input/coco-2017-dataset/coco2017/annotations/captions_train2017.json | |
| 2025-06-05 09:17:31,012 - __main__ - INFO - | |
| --- Starting Model Training --- | |
| 2025-06-05 09:17:31,013 - __main__ - INFO - Starting training process... | |
| 2025-06-05 09:17:31,014 - __main__ - INFO - Using device: cuda | |
| 2025-06-05 09:17:31,088 - __main__ - INFO - Loaded vocabulary from /kaggle/input/vocabulary_s/pytorch/default/1/vocabulary.pkl | |
| 2025-06-05 09:17:32,171 - __main__ - INFO - Successfully loaded captions from /kaggle/input/coco-2017-dataset/coco2017/annotations/captions_train2017.json | |
| 2025-06-05 09:23:17,287 - __main__ - INFO - Using subset of 200000 samples for the dataset. | |
| 2025-06-05 09:23:17,288 - __main__ - INFO - Dataset size after filtering: 200000 samples. | |
| 2025-06-05 09:23:17,386 - __main__ - INFO - Successfully loaded captions from /kaggle/input/coco-2017-dataset/coco2017/annotations/captions_val2017.json | |
| 2025-06-05 09:23:33,589 - __main__ - INFO - Dataset size after filtering: 25014 samples. | |
| 2025-06-05 09:23:33,590 - __main__ - INFO - Training dataset size: 200000 | |
| 2025-06-05 09:23:33,591 - __main__ - INFO - Validation dataset size: 25014 | |
| 2025-06-05 09:23:34,957 - __main__ - INFO - ResNet encoder base layers are fine-tuning enabled. | |
| 2025-06-05 09:23:35,415 - __main__ - INFO - No checkpoint found. Starting training from scratch. | |
| 2025-06-05 09:25:12,222 - __main__ - INFO - Epoch [1/9], Step [100/3125], Loss: 5.4706, Perplexity: 237.6090 | |
| 2025-06-05 09:26:45,054 - __main__ - INFO - Epoch [1/9], Step [200/3125], Loss: 5.1889, Perplexity: 179.2651 | |
| 2025-06-05 09:28:17,121 - __main__ - INFO - Epoch [1/9], Step [300/3125], Loss: 4.7951, Perplexity: 120.9148 | |
| 2025-06-05 09:29:49,844 - __main__ - INFO - Epoch [1/9], Step [400/3125], Loss: 4.1438, Perplexity: 63.0419 | |
| 2025-06-05 09:31:22,479 - __main__ - INFO - Epoch [1/9], Step [500/3125], Loss: 4.0644, Perplexity: 58.2316 | |
| 2025-06-05 09:32:55,263 - __main__ - INFO - Epoch [1/9], Step [600/3125], Loss: 4.1386, Perplexity: 62.7179 | |
| 2025-06-05 09:34:28,639 - __main__ - INFO - Epoch [1/9], Step [700/3125], Loss: 4.4148, Perplexity: 82.6655 | |
| 2025-06-05 09:36:01,562 - __main__ - INFO - Epoch [1/9], Step [800/3125], Loss: 4.0775, Perplexity: 58.9954 | |
| 2025-06-05 09:37:34,470 - __main__ - INFO - Epoch [1/9], Step [900/3125], Loss: 3.8928, Perplexity: 49.0485 | |
| 2025-06-05 09:39:07,144 - __main__ - INFO - Epoch [1/9], Step [1000/3125], Loss: 3.7069, Perplexity: 40.7267 | |
| 2025-06-05 09:40:39,832 - __main__ - INFO - Epoch [1/9], Step [1100/3125], Loss: 3.7165, Perplexity: 41.1201 | |
| 2025-06-05 09:42:12,245 - __main__ - INFO - Epoch [1/9], Step [1200/3125], Loss: 3.6987, Perplexity: 40.3960 | |
| 2025-06-05 09:43:44,664 - __main__ - INFO - Epoch [1/9], Step [1300/3125], Loss: 3.8443, Perplexity: 46.7262 | |
| 2025-06-05 09:45:16,983 - __main__ - INFO - Epoch [1/9], Step [1400/3125], Loss: 3.7788, Perplexity: 43.7654 | |
| 2025-06-05 09:46:49,549 - __main__ - INFO - Epoch [1/9], Step [1500/3125], Loss: 3.9427, Perplexity: 51.5586 | |
| 2025-06-05 09:48:22,306 - __main__ - INFO - Epoch [1/9], Step [1600/3125], Loss: 3.9204, Perplexity: 50.4209 | |
| 2025-06-05 09:49:55,016 - __main__ - INFO - Epoch [1/9], Step [1700/3125], Loss: 3.6418, Perplexity: 38.1600 | |
| 2025-06-05 09:51:27,127 - __main__ - INFO - Epoch [1/9], Step [1800/3125], Loss: 3.5172, Perplexity: 33.6913 | |
| 2025-06-05 09:52:59,489 - __main__ - INFO - Epoch [1/9], Step [1900/3125], Loss: 3.6409, Perplexity: 38.1244 | |
| 2025-06-05 09:54:31,921 - __main__ - INFO - Epoch [1/9], Step [2000/3125], Loss: 3.2969, Perplexity: 27.0292 | |
| 2025-06-05 09:56:04,383 - __main__ - INFO - Epoch [1/9], Step [2100/3125], Loss: 3.4517, Perplexity: 31.5544 | |
| 2025-06-05 09:57:36,919 - __main__ - INFO - Epoch [1/9], Step [2200/3125], Loss: 3.4920, Perplexity: 32.8529 | |
| 2025-06-05 09:59:10,022 - __main__ - INFO - Epoch [1/9], Step [2300/3125], Loss: 3.7028, Perplexity: 40.5603 | |
| 2025-06-05 10:00:42,264 - __main__ - INFO - Epoch [1/9], Step [2400/3125], Loss: 3.6794, Perplexity: 39.6238 | |
| 2025-06-05 10:02:14,437 - __main__ - INFO - Epoch [1/9], Step [2500/3125], Loss: 3.4977, Perplexity: 33.0382 | |
| 2025-06-05 10:03:47,338 - __main__ - INFO - Epoch [1/9], Step [2600/3125], Loss: 3.4883, Perplexity: 32.7311 | |
| 2025-06-05 10:05:20,077 - __main__ - INFO - Epoch [1/9], Step [2700/3125], Loss: 3.4596, Perplexity: 31.8036 | |
| 2025-06-05 10:06:52,363 - __main__ - INFO - Epoch [1/9], Step [2800/3125], Loss: 3.4043, Perplexity: 30.0920 | |
| 2025-06-05 10:08:24,721 - __main__ - INFO - Epoch [1/9], Step [2900/3125], Loss: 3.1842, Perplexity: 24.1488 | |
| 2025-06-05 10:09:56,919 - __main__ - INFO - Epoch [1/9], Step [3000/3125], Loss: 3.2843, Perplexity: 26.6914 | |
| 2025-06-05 10:11:29,066 - __main__ - INFO - Epoch [1/9], Step [3100/3125], Loss: 3.1950, Perplexity: 24.4100 | |
| 2025-06-05 10:11:52,259 - __main__ - INFO - Epoch 1 Training finished. Avg Loss: 3.8943, Time: 2896.84s | |
| 2025-06-05 10:24:31,901 - __main__ - INFO - Validation Avg Loss: 3.1313, Perplexity: 22.9038 | |
| 2025-06-05 10:24:44,857 - __main__ - INFO - Validation BLEU-4: 0.0836 | |
| 2025-06-05 10:24:45,632 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.0836.pth | |
| 2025-06-05 10:24:45,633 - __main__ - INFO - Performing memory optimization after epoch... | |
| 2025-06-05 10:24:45,924 - __main__ - INFO - CUDA cache emptied. | |
| 2025-06-05 10:24:46,199 - __main__ - INFO - Python garbage collector run. | |
| 2025-06-05 10:26:22,213 - __main__ - INFO - Epoch [2/9], Step [100/3125], Loss: 3.1401, Perplexity: 23.1065 | |
| 2025-06-05 10:27:55,157 - __main__ - INFO - Epoch [2/9], Step [200/3125], Loss: 3.0563, Perplexity: 21.2493 | |
| 2025-06-05 10:29:27,968 - __main__ - INFO - Epoch [2/9], Step [300/3125], Loss: 3.2782, Perplexity: 26.5282 | |
| 2025-06-05 10:31:00,270 - __main__ - INFO - Epoch [2/9], Step [400/3125], Loss: 3.1834, Perplexity: 24.1295 | |
| 2025-06-05 10:32:33,440 - __main__ - INFO - Epoch [2/9], Step [500/3125], Loss: 3.3472, Perplexity: 28.4244 | |
| 2025-06-05 10:34:06,422 - __main__ - INFO - Epoch [2/9], Step [600/3125], Loss: 3.1609, Perplexity: 23.5928 | |
| 2025-06-05 10:35:39,261 - __main__ - INFO - Epoch [2/9], Step [700/3125], Loss: 2.9984, Perplexity: 20.0538 | |
| 2025-06-05 10:37:11,706 - __main__ - INFO - Epoch [2/9], Step [800/3125], Loss: 3.2494, Perplexity: 25.7758 | |
| 2025-06-05 10:38:44,368 - __main__ - INFO - Epoch [2/9], Step [900/3125], Loss: 3.1955, Perplexity: 24.4232 | |
| 2025-06-05 10:40:17,199 - __main__ - INFO - Epoch [2/9], Step [1000/3125], Loss: 3.4101, Perplexity: 30.2673 | |
| 2025-06-05 10:41:49,573 - __main__ - INFO - Epoch [2/9], Step [1100/3125], Loss: 3.3706, Perplexity: 29.0971 | |
| 2025-06-05 10:43:21,777 - __main__ - INFO - Epoch [2/9], Step [1200/3125], Loss: 3.1680, Perplexity: 23.7608 | |
| 2025-06-05 10:44:54,540 - __main__ - INFO - Epoch [2/9], Step [1300/3125], Loss: 3.0362, Perplexity: 20.8251 | |
| 2025-06-05 10:46:27,295 - __main__ - INFO - Epoch [2/9], Step [1400/3125], Loss: 3.0925, Perplexity: 22.0321 | |
| 2025-06-05 10:47:59,572 - __main__ - INFO - Epoch [2/9], Step [1500/3125], Loss: 3.1057, Perplexity: 22.3241 | |
| 2025-06-05 10:49:32,053 - __main__ - INFO - Epoch [2/9], Step [1600/3125], Loss: 3.3277, Perplexity: 27.8730 | |
| 2025-06-05 10:51:04,671 - __main__ - INFO - Epoch [2/9], Step [1700/3125], Loss: 3.1610, Perplexity: 23.5945 | |
| 2025-06-05 10:52:37,560 - __main__ - INFO - Epoch [2/9], Step [1800/3125], Loss: 3.2718, Perplexity: 26.3582 | |
| 2025-06-05 10:54:10,358 - __main__ - INFO - Epoch [2/9], Step [1900/3125], Loss: 3.1425, Perplexity: 23.1612 | |
| 2025-06-05 10:55:43,281 - __main__ - INFO - Epoch [2/9], Step [2000/3125], Loss: 3.0522, Perplexity: 21.1619 | |
| 2025-06-05 10:57:15,767 - __main__ - INFO - Epoch [2/9], Step [2100/3125], Loss: 2.9711, Perplexity: 19.5126 | |
| 2025-06-05 10:58:48,072 - __main__ - INFO - Epoch [2/9], Step [2200/3125], Loss: 3.0638, Perplexity: 21.4079 | |
| 2025-06-05 11:00:20,322 - __main__ - INFO - Epoch [2/9], Step [2300/3125], Loss: 3.1867, Perplexity: 24.2093 | |
| 2025-06-05 11:01:53,270 - __main__ - INFO - Epoch [2/9], Step [2400/3125], Loss: 3.0060, Perplexity: 20.2070 | |
| 2025-06-05 11:03:25,315 - __main__ - INFO - Epoch [2/9], Step [2500/3125], Loss: 3.1718, Perplexity: 23.8494 | |
| 2025-06-05 11:04:57,298 - __main__ - INFO - Epoch [2/9], Step [2600/3125], Loss: 3.0628, Perplexity: 21.3874 | |
| 2025-06-05 11:06:29,658 - __main__ - INFO - Epoch [2/9], Step [2700/3125], Loss: 3.3898, Perplexity: 29.6609 | |
| 2025-06-05 11:08:01,931 - __main__ - INFO - Epoch [2/9], Step [2800/3125], Loss: 2.7914, Perplexity: 16.3038 | |
| 2025-06-05 11:09:34,397 - __main__ - INFO - Epoch [2/9], Step [2900/3125], Loss: 2.8758, Perplexity: 17.7403 | |
| 2025-06-05 11:11:06,755 - __main__ - INFO - Epoch [2/9], Step [3000/3125], Loss: 3.0766, Perplexity: 21.6847 | |
| 2025-06-05 11:12:39,179 - __main__ - INFO - Epoch [2/9], Step [3100/3125], Loss: 3.0787, Perplexity: 21.7303 | |
| 2025-06-05 11:13:02,556 - __main__ - INFO - Epoch 2 Training finished. Avg Loss: 3.1318, Time: 2896.36s | |
| 2025-06-05 11:25:51,359 - __main__ - INFO - Validation Avg Loss: 2.8683, Perplexity: 17.6076 | |
| 2025-06-05 11:26:01,678 - __main__ - INFO - Validation BLEU-4: 0.0923 | |
| 2025-06-05 11:26:02,402 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.0923.pth | |
| 2025-06-05 11:26:02,404 - __main__ - INFO - Performing memory optimization after epoch... | |
| 2025-06-05 11:26:02,690 - __main__ - INFO - CUDA cache emptied. | |
| 2025-06-05 11:26:03,026 - __main__ - INFO - Python garbage collector run. | |
| 2025-06-05 11:27:38,287 - __main__ - INFO - Epoch [3/9], Step [100/3125], Loss: 3.1375, Perplexity: 23.0468 | |
| 2025-06-05 11:29:10,992 - __main__ - INFO - Epoch [3/9], Step [200/3125], Loss: 2.9503, Perplexity: 19.1119 | |
| 2025-06-05 11:30:43,374 - __main__ - INFO - Epoch [3/9], Step [300/3125], Loss: 2.9096, Perplexity: 18.3486 | |
| 2025-06-05 11:32:15,961 - __main__ - INFO - Epoch [3/9], Step [400/3125], Loss: 2.9213, Perplexity: 18.5648 | |
| 2025-06-05 11:33:48,692 - __main__ - INFO - Epoch [3/9], Step [500/3125], Loss: 2.7807, Perplexity: 16.1304 | |
| 2025-06-05 11:35:21,035 - __main__ - INFO - Epoch [3/9], Step [600/3125], Loss: 2.9757, Perplexity: 19.6029 | |
| 2025-06-05 11:36:53,722 - __main__ - INFO - Epoch [3/9], Step [700/3125], Loss: 2.7691, Perplexity: 15.9448 | |
| 2025-06-05 11:38:26,136 - __main__ - INFO - Epoch [3/9], Step [800/3125], Loss: 2.6290, Perplexity: 13.8597 | |
| 2025-06-05 11:39:58,718 - __main__ - INFO - Epoch [3/9], Step [900/3125], Loss: 2.7852, Perplexity: 16.2028 | |
| 2025-06-05 11:41:31,263 - __main__ - INFO - Epoch [3/9], Step [1000/3125], Loss: 2.9282, Perplexity: 18.6934 | |
| 2025-06-05 11:43:04,789 - __main__ - INFO - Epoch [3/9], Step [1100/3125], Loss: 3.0185, Perplexity: 20.4607 | |
| 2025-06-05 11:44:37,274 - __main__ - INFO - Epoch [3/9], Step [1200/3125], Loss: 3.1393, Perplexity: 23.0879 | |
| 2025-06-05 11:46:10,068 - __main__ - INFO - Epoch [3/9], Step [1300/3125], Loss: 2.9986, Perplexity: 20.0580 | |
| 2025-06-05 11:47:42,373 - __main__ - INFO - Epoch [3/9], Step [1400/3125], Loss: 2.8490, Perplexity: 17.2704 | |
| 2025-06-05 11:49:14,841 - __main__ - INFO - Epoch [3/9], Step [1500/3125], Loss: 3.1596, Perplexity: 23.5610 | |
| 2025-06-05 11:50:47,296 - __main__ - INFO - Epoch [3/9], Step [1600/3125], Loss: 2.9312, Perplexity: 18.7505 | |
| 2025-06-05 11:52:19,632 - __main__ - INFO - Epoch [3/9], Step [1700/3125], Loss: 2.8247, Perplexity: 16.8555 | |
| 2025-06-05 11:53:52,241 - __main__ - INFO - Epoch [3/9], Step [1800/3125], Loss: 2.9927, Perplexity: 19.9386 | |
| 2025-06-05 11:55:24,747 - __main__ - INFO - Epoch [3/9], Step [1900/3125], Loss: 2.7951, Perplexity: 16.3638 | |
| 2025-06-05 11:56:57,095 - __main__ - INFO - Epoch [3/9], Step [2000/3125], Loss: 2.8393, Perplexity: 17.1032 | |
| 2025-06-05 11:58:29,803 - __main__ - INFO - Epoch [3/9], Step [2100/3125], Loss: 3.0383, Perplexity: 20.8706 | |
| 2025-06-05 12:00:01,877 - __main__ - INFO - Epoch [3/9], Step [2200/3125], Loss: 3.1580, Perplexity: 23.5244 | |
| 2025-06-05 12:01:34,524 - __main__ - INFO - Epoch [3/9], Step [2300/3125], Loss: 2.9753, Perplexity: 19.5955 | |
| 2025-06-05 12:03:06,876 - __main__ - INFO - Epoch [3/9], Step [2400/3125], Loss: 2.7754, Perplexity: 16.0452 | |
| 2025-06-05 12:04:39,456 - __main__ - INFO - Epoch [3/9], Step [2500/3125], Loss: 2.5844, Perplexity: 13.2559 | |
| 2025-06-05 12:06:12,314 - __main__ - INFO - Epoch [3/9], Step [2600/3125], Loss: 2.9575, Perplexity: 19.2490 | |
| 2025-06-05 12:07:44,508 - __main__ - INFO - Epoch [3/9], Step [2700/3125], Loss: 2.7714, Perplexity: 15.9803 | |
| 2025-06-05 12:09:16,908 - __main__ - INFO - Epoch [3/9], Step [2800/3125], Loss: 2.6014, Perplexity: 13.4829 | |
| 2025-06-05 12:10:49,531 - __main__ - INFO - Epoch [3/9], Step [2900/3125], Loss: 2.9240, Perplexity: 18.6159 | |
| 2025-06-05 12:12:22,014 - __main__ - INFO - Epoch [3/9], Step [3000/3125], Loss: 2.8361, Perplexity: 17.0495 | |
| 2025-06-05 12:13:54,500 - __main__ - INFO - Epoch [3/9], Step [3100/3125], Loss: 2.9557, Perplexity: 19.2145 | |
| 2025-06-05 12:14:17,729 - __main__ - INFO - Epoch 3 Training finished. Avg Loss: 2.9138, Time: 2894.70s | |
| 2025-06-05 12:27:10,365 - __main__ - INFO - Validation Avg Loss: 2.7487, Perplexity: 15.6226 | |
| 2025-06-05 12:27:20,942 - __main__ - INFO - Validation BLEU-4: 0.0986 | |
| 2025-06-05 12:27:21,682 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.0986.pth | |
| 2025-06-05 12:27:21,683 - __main__ - INFO - Performing memory optimization after epoch... | |
| 2025-06-05 12:27:21,948 - __main__ - INFO - CUDA cache emptied. | |
| 2025-06-05 12:27:22,293 - __main__ - INFO - Python garbage collector run. | |
| 2025-06-05 12:28:57,255 - __main__ - INFO - Epoch [4/9], Step [100/3125], Loss: 2.8362, Perplexity: 17.0513 | |
| 2025-06-05 12:30:30,013 - __main__ - INFO - Epoch [4/9], Step [200/3125], Loss: 2.5668, Perplexity: 13.0241 | |
| 2025-06-05 12:32:02,681 - __main__ - INFO - Epoch [4/9], Step [300/3125], Loss: 2.7211, Perplexity: 15.1978 | |
| 2025-06-05 12:33:35,422 - __main__ - INFO - Epoch [4/9], Step [400/3125], Loss: 2.9763, Perplexity: 19.6153 | |
| 2025-06-05 12:35:07,932 - __main__ - INFO - Epoch [4/9], Step [500/3125], Loss: 2.7447, Perplexity: 15.5594 | |
| 2025-06-05 12:36:40,894 - __main__ - INFO - Epoch [4/9], Step [600/3125], Loss: 2.6688, Perplexity: 14.4227 | |
| 2025-06-05 12:38:13,657 - __main__ - INFO - Epoch [4/9], Step [700/3125], Loss: 2.7560, Perplexity: 15.7370 | |
| 2025-06-05 12:39:46,304 - __main__ - INFO - Epoch [4/9], Step [800/3125], Loss: 2.7199, Perplexity: 15.1782 | |
| 2025-06-05 12:41:18,843 - __main__ - INFO - Epoch [4/9], Step [900/3125], Loss: 2.9431, Perplexity: 18.9745 | |
| 2025-06-05 12:42:51,310 - __main__ - INFO - Epoch [4/9], Step [1000/3125], Loss: 2.7849, Perplexity: 16.1979 | |
| 2025-06-05 12:44:24,070 - __main__ - INFO - Epoch [4/9], Step [1100/3125], Loss: 2.6553, Perplexity: 14.2291 | |
| 2025-06-05 12:45:56,314 - __main__ - INFO - Epoch [4/9], Step [1200/3125], Loss: 2.7191, Perplexity: 15.1662 | |
| 2025-06-05 12:47:28,651 - __main__ - INFO - Epoch [4/9], Step [1300/3125], Loss: 2.7089, Perplexity: 15.0125 | |
| 2025-06-05 12:49:01,294 - __main__ - INFO - Epoch [4/9], Step [1400/3125], Loss: 2.9824, Perplexity: 19.7352 | |
| 2025-06-05 12:50:34,013 - __main__ - INFO - Epoch [4/9], Step [1500/3125], Loss: 2.7723, Perplexity: 15.9946 | |
| 2025-06-05 12:52:06,802 - __main__ - INFO - Epoch [4/9], Step [1600/3125], Loss: 2.8024, Perplexity: 16.4847 | |
| 2025-06-05 12:53:39,654 - __main__ - INFO - Epoch [4/9], Step [1700/3125], Loss: 2.7457, Perplexity: 15.5758 | |
| 2025-06-05 12:55:12,141 - __main__ - INFO - Epoch [4/9], Step [1800/3125], Loss: 2.9873, Perplexity: 19.8314 | |
| 2025-06-05 12:56:44,381 - __main__ - INFO - Epoch [4/9], Step [1900/3125], Loss: 2.6393, Perplexity: 14.0032 | |
| 2025-06-05 12:58:16,535 - __main__ - INFO - Epoch [4/9], Step [2000/3125], Loss: 2.6532, Perplexity: 14.1999 | |
| 2025-06-05 12:59:49,166 - __main__ - INFO - Epoch [4/9], Step [2100/3125], Loss: 2.7995, Perplexity: 16.4372 | |
| 2025-06-05 13:01:21,714 - __main__ - INFO - Epoch [4/9], Step [2200/3125], Loss: 2.7488, Perplexity: 15.6245 | |
| 2025-06-05 13:02:53,947 - __main__ - INFO - Epoch [4/9], Step [2300/3125], Loss: 2.8463, Perplexity: 17.2242 | |
| 2025-06-05 13:04:26,007 - __main__ - INFO - Epoch [4/9], Step [2400/3125], Loss: 2.6901, Perplexity: 14.7335 | |
| 2025-06-05 13:05:58,607 - __main__ - INFO - Epoch [4/9], Step [2500/3125], Loss: 2.8890, Perplexity: 17.9762 | |
| 2025-06-05 13:07:31,240 - __main__ - INFO - Epoch [4/9], Step [2600/3125], Loss: 2.6551, Perplexity: 14.2270 | |
| 2025-06-05 13:09:03,803 - __main__ - INFO - Epoch [4/9], Step [2700/3125], Loss: 2.8474, Perplexity: 17.2437 | |
| 2025-06-05 13:10:36,148 - __main__ - INFO - Epoch [4/9], Step [2800/3125], Loss: 2.7218, Perplexity: 15.2075 | |
| 2025-06-05 13:12:08,443 - __main__ - INFO - Epoch [4/9], Step [2900/3125], Loss: 2.6753, Perplexity: 14.5169 | |
| 2025-06-05 13:13:41,235 - __main__ - INFO - Epoch [4/9], Step [3000/3125], Loss: 2.8698, Perplexity: 17.6342 | |
| 2025-06-05 13:15:13,729 - __main__ - INFO - Epoch [4/9], Step [3100/3125], Loss: 2.5715, Perplexity: 13.0854 | |
| 2025-06-05 13:15:37,016 - __main__ - INFO - Epoch 4 Training finished. Avg Loss: 2.7810, Time: 2894.72s | |
| 2025-06-05 13:28:14,233 - __main__ - INFO - Validation Avg Loss: 2.6835, Perplexity: 14.6358 | |
| 2025-06-05 13:28:24,559 - __main__ - INFO - Validation BLEU-4: 0.0996 | |
| 2025-06-05 13:28:25,299 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.0996.pth | |
| 2025-06-05 13:28:25,300 - __main__ - INFO - Performing memory optimization after epoch... | |
| 2025-06-05 13:28:25,596 - __main__ - INFO - CUDA cache emptied. | |
| 2025-06-05 13:28:25,927 - __main__ - INFO - Python garbage collector run. | |
| 2025-06-05 13:30:01,372 - __main__ - INFO - Epoch [5/9], Step [100/3125], Loss: 2.7858, Perplexity: 16.2124 | |
| 2025-06-05 13:31:34,188 - __main__ - INFO - Epoch [5/9], Step [200/3125], Loss: 2.5799, Perplexity: 13.1955 | |
| 2025-06-05 13:33:06,552 - __main__ - INFO - Epoch [5/9], Step [300/3125], Loss: 2.7141, Perplexity: 15.0910 | |
| 2025-06-05 13:34:39,208 - __main__ - INFO - Epoch [5/9], Step [400/3125], Loss: 2.6322, Perplexity: 13.9042 | |
| 2025-06-05 13:36:11,891 - __main__ - INFO - Epoch [5/9], Step [500/3125], Loss: 2.9920, Perplexity: 19.9248 | |
| 2025-06-05 13:37:44,341 - __main__ - INFO - Epoch [5/9], Step [600/3125], Loss: 2.5876, Perplexity: 13.2972 | |
| 2025-06-05 13:39:16,416 - __main__ - INFO - Epoch [5/9], Step [700/3125], Loss: 2.6221, Perplexity: 13.7644 | |
| 2025-06-05 13:40:48,960 - __main__ - INFO - Epoch [5/9], Step [800/3125], Loss: 2.5546, Perplexity: 12.8666 | |
| 2025-06-05 13:42:21,593 - __main__ - INFO - Epoch [5/9], Step [900/3125], Loss: 2.7899, Perplexity: 16.2789 | |
| 2025-06-05 13:43:54,018 - __main__ - INFO - Epoch [5/9], Step [1000/3125], Loss: 2.7180, Perplexity: 15.1505 | |
| 2025-06-05 13:45:26,629 - __main__ - INFO - Epoch [5/9], Step [1100/3125], Loss: 2.6338, Perplexity: 13.9268 | |
| 2025-06-05 13:46:59,396 - __main__ - INFO - Epoch [5/9], Step [1200/3125], Loss: 2.9994, Perplexity: 20.0732 | |
| 2025-06-05 13:48:31,472 - __main__ - INFO - Epoch [5/9], Step [1300/3125], Loss: 2.5138, Perplexity: 12.3518 | |
| 2025-06-05 13:50:03,810 - __main__ - INFO - Epoch [5/9], Step [1400/3125], Loss: 2.7708, Perplexity: 15.9718 | |
| 2025-06-05 13:51:36,234 - __main__ - INFO - Epoch [5/9], Step [1500/3125], Loss: 2.6756, Perplexity: 14.5208 | |
| 2025-06-05 13:53:08,614 - __main__ - INFO - Epoch [5/9], Step [1600/3125], Loss: 2.6377, Perplexity: 13.9804 | |
| 2025-06-05 13:54:41,451 - __main__ - INFO - Epoch [5/9], Step [1700/3125], Loss: 2.6101, Perplexity: 13.6009 | |
| 2025-06-05 13:56:14,028 - __main__ - INFO - Epoch [5/9], Step [1800/3125], Loss: 2.6736, Perplexity: 14.4918 | |
| 2025-06-05 13:57:46,796 - __main__ - INFO - Epoch [5/9], Step [1900/3125], Loss: 2.6328, Perplexity: 13.9123 | |
| 2025-06-05 13:59:19,826 - __main__ - INFO - Epoch [5/9], Step [2000/3125], Loss: 2.5901, Perplexity: 13.3318 | |
| 2025-06-05 14:00:52,403 - __main__ - INFO - Epoch [5/9], Step [2100/3125], Loss: 2.9779, Perplexity: 19.6469 | |
| 2025-06-05 14:02:25,242 - __main__ - INFO - Epoch [5/9], Step [2200/3125], Loss: 2.5809, Perplexity: 13.2096 | |
| 2025-06-05 14:03:58,214 - __main__ - INFO - Epoch [5/9], Step [2300/3125], Loss: 2.7360, Perplexity: 15.4247 | |
| 2025-06-05 14:05:30,984 - __main__ - INFO - Epoch [5/9], Step [2400/3125], Loss: 2.5268, Perplexity: 12.5135 | |
| 2025-06-05 14:07:04,027 - __main__ - INFO - Epoch [5/9], Step [2500/3125], Loss: 2.8699, Perplexity: 17.6357 | |
| 2025-06-05 14:08:36,966 - __main__ - INFO - Epoch [5/9], Step [2600/3125], Loss: 2.7812, Perplexity: 16.1382 | |
| 2025-06-05 14:10:09,378 - __main__ - INFO - Epoch [5/9], Step [2700/3125], Loss: 2.6540, Perplexity: 14.2111 | |
| 2025-06-05 14:11:41,671 - __main__ - INFO - Epoch [5/9], Step [2800/3125], Loss: 2.6715, Perplexity: 14.4612 | |
| 2025-06-05 14:13:14,497 - __main__ - INFO - Epoch [5/9], Step [2900/3125], Loss: 2.8780, Perplexity: 17.7783 | |
| 2025-06-05 14:14:47,210 - __main__ - INFO - Epoch [5/9], Step [3000/3125], Loss: 2.6441, Perplexity: 14.0706 | |
| 2025-06-05 14:16:19,723 - __main__ - INFO - Epoch [5/9], Step [3100/3125], Loss: 2.6745, Perplexity: 14.5049 | |
| 2025-06-05 14:16:43,018 - __main__ - INFO - Epoch 5 Training finished. Avg Loss: 2.6866, Time: 2897.09s | |
| 2025-06-05 14:29:20,136 - __main__ - INFO - Validation Avg Loss: 2.6448, Perplexity: 14.0800 | |
| 2025-06-05 14:29:30,680 - __main__ - INFO - Validation BLEU-4: 0.1011 | |
| 2025-06-05 14:29:31,398 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.1011.pth | |
| 2025-06-05 14:29:32,142 - __main__ - INFO - Saved periodic model checkpoint to ./output/model_epoch_5.pth | |
| 2025-06-05 14:29:32,143 - __main__ - INFO - Performing memory optimization after epoch... | |
| 2025-06-05 14:29:32,408 - __main__ - INFO - CUDA cache emptied. | |
| 2025-06-05 14:29:32,753 - __main__ - INFO - Python garbage collector run. | |
| 2025-06-05 14:31:08,760 - __main__ - INFO - Epoch [6/9], Step [100/3125], Loss: 2.9338, Perplexity: 18.7981 | |
| 2025-06-05 14:32:41,036 - __main__ - INFO - Epoch [6/9], Step [200/3125], Loss: 2.7460, Perplexity: 15.5799 | |
| 2025-06-05 14:34:13,610 - __main__ - INFO - Epoch [6/9], Step [300/3125], Loss: 2.3894, Perplexity: 10.9066 | |
| 2025-06-05 14:35:46,857 - __main__ - INFO - Epoch [6/9], Step [400/3125], Loss: 2.6891, Perplexity: 14.7191 | |
| 2025-06-05 14:37:19,167 - __main__ - INFO - Epoch [6/9], Step [500/3125], Loss: 2.7032, Perplexity: 14.9273 | |
| 2025-06-05 14:38:52,048 - __main__ - INFO - Epoch [6/9], Step [600/3125], Loss: 2.6848, Perplexity: 14.6553 | |
| 2025-06-05 14:40:24,881 - __main__ - INFO - Epoch [6/9], Step [700/3125], Loss: 2.4991, Perplexity: 12.1715 | |
| 2025-06-05 14:41:57,087 - __main__ - INFO - Epoch [6/9], Step [800/3125], Loss: 2.5794, Perplexity: 13.1892 | |
| 2025-06-05 14:43:29,854 - __main__ - INFO - Epoch [6/9], Step [900/3125], Loss: 2.5044, Perplexity: 12.2366 | |
| 2025-06-05 14:45:02,445 - __main__ - INFO - Epoch [6/9], Step [1000/3125], Loss: 2.6207, Perplexity: 13.7448 | |
| 2025-06-05 14:46:35,344 - __main__ - INFO - Epoch [6/9], Step [1100/3125], Loss: 2.5424, Perplexity: 12.7103 | |
| 2025-06-05 14:48:08,048 - __main__ - INFO - Epoch [6/9], Step [1200/3125], Loss: 2.5540, Perplexity: 12.8587 | |
| 2025-06-05 14:49:40,487 - __main__ - INFO - Epoch [6/9], Step [1300/3125], Loss: 2.5427, Perplexity: 12.7145 | |
| 2025-06-05 14:51:13,035 - __main__ - INFO - Epoch [6/9], Step [1400/3125], Loss: 2.6368, Perplexity: 13.9681 | |
| 2025-06-05 14:52:45,059 - __main__ - INFO - Epoch [6/9], Step [1500/3125], Loss: 2.5717, Perplexity: 13.0879 | |
| 2025-06-05 14:54:17,291 - __main__ - INFO - Epoch [6/9], Step [1600/3125], Loss: 2.4953, Perplexity: 12.1255 | |
| 2025-06-05 14:55:50,488 - __main__ - INFO - Epoch [6/9], Step [1700/3125], Loss: 2.7300, Perplexity: 15.3336 | |
| 2025-06-05 14:57:23,442 - __main__ - INFO - Epoch [6/9], Step [1800/3125], Loss: 2.6784, Perplexity: 14.5613 | |
| 2025-06-05 14:58:55,865 - __main__ - INFO - Epoch [6/9], Step [1900/3125], Loss: 2.6738, Perplexity: 14.4946 | |
| 2025-06-05 15:00:28,266 - __main__ - INFO - Epoch [6/9], Step [2000/3125], Loss: 2.6148, Perplexity: 13.6649 | |
| 2025-06-05 15:02:00,818 - __main__ - INFO - Epoch [6/9], Step [2100/3125], Loss: 2.8018, Perplexity: 16.4748 | |
| 2025-06-05 15:03:33,616 - __main__ - INFO - Epoch [6/9], Step [2200/3125], Loss: 2.6942, Perplexity: 14.7940 | |
| 2025-06-05 15:05:05,991 - __main__ - INFO - Epoch [6/9], Step [2300/3125], Loss: 2.4844, Perplexity: 11.9935 | |
| 2025-06-05 15:06:38,662 - __main__ - INFO - Epoch [6/9], Step [2400/3125], Loss: 2.4125, Perplexity: 11.1615 | |
| 2025-06-05 15:08:11,014 - __main__ - INFO - Epoch [6/9], Step [2500/3125], Loss: 2.5616, Perplexity: 12.9564 | |
| 2025-06-05 15:09:43,766 - __main__ - INFO - Epoch [6/9], Step [2600/3125], Loss: 2.5049, Perplexity: 12.2425 | |
| 2025-06-05 15:11:16,830 - __main__ - INFO - Epoch [6/9], Step [2700/3125], Loss: 2.4866, Perplexity: 12.0206 | |
| 2025-06-05 15:12:49,025 - __main__ - INFO - Epoch [6/9], Step [2800/3125], Loss: 2.3411, Perplexity: 10.3931 | |
| 2025-06-05 15:14:21,512 - __main__ - INFO - Epoch [6/9], Step [2900/3125], Loss: 2.6441, Perplexity: 14.0713 | |
| 2025-06-05 15:15:54,097 - __main__ - INFO - Epoch [6/9], Step [3000/3125], Loss: 2.5855, Perplexity: 13.2695 | |
| 2025-06-05 15:17:26,931 - __main__ - INFO - Epoch [6/9], Step [3100/3125], Loss: 2.7371, Perplexity: 15.4419 | |
| 2025-06-05 15:17:50,166 - __main__ - INFO - Epoch 6 Training finished. Avg Loss: 2.6127, Time: 2897.41s | |
| 2025-06-05 15:30:17,460 - __main__ - INFO - Validation Avg Loss: 2.6184, Perplexity: 13.7138 | |
| 2025-06-05 15:30:28,171 - __main__ - INFO - Validation BLEU-4: 0.1009 | |
| 2025-06-05 15:30:28,172 - __main__ - INFO - Performing memory optimization after epoch... | |
| 2025-06-05 15:30:28,470 - __main__ - INFO - CUDA cache emptied. | |
| 2025-06-05 15:30:28,816 - __main__ - INFO - Python garbage collector run. | |
| 2025-06-05 15:32:05,141 - __main__ - INFO - Epoch [7/9], Step [100/3125], Loss: 2.6545, Perplexity: 14.2180 | |
| 2025-06-05 15:33:37,788 - __main__ - INFO - Epoch [7/9], Step [200/3125], Loss: 2.3907, Perplexity: 10.9206 | |
| 2025-06-05 15:35:09,924 - __main__ - INFO - Epoch [7/9], Step [300/3125], Loss: 2.3964, Perplexity: 10.9837 | |
| 2025-06-05 15:36:42,633 - __main__ - INFO - Epoch [7/9], Step [400/3125], Loss: 2.2917, Perplexity: 9.8918 | |
| 2025-06-05 15:38:15,138 - __main__ - INFO - Epoch [7/9], Step [500/3125], Loss: 2.5460, Perplexity: 12.7565 | |
| 2025-06-05 15:39:47,288 - __main__ - INFO - Epoch [7/9], Step [600/3125], Loss: 2.7165, Perplexity: 15.1276 | |
| 2025-06-05 15:41:19,984 - __main__ - INFO - Epoch [7/9], Step [700/3125], Loss: 2.3160, Perplexity: 10.1350 | |
| 2025-06-05 15:42:52,625 - __main__ - INFO - Epoch [7/9], Step [800/3125], Loss: 2.4257, Perplexity: 11.3097 | |
| 2025-06-05 15:44:25,300 - __main__ - INFO - Epoch [7/9], Step [900/3125], Loss: 2.8313, Perplexity: 16.9669 | |
| 2025-06-05 15:45:57,862 - __main__ - INFO - Epoch [7/9], Step [1000/3125], Loss: 2.4841, Perplexity: 11.9898 | |
| 2025-06-05 15:47:30,346 - __main__ - INFO - Epoch [7/9], Step [1100/3125], Loss: 2.4737, Perplexity: 11.8667 | |
| 2025-06-05 15:49:02,664 - __main__ - INFO - Epoch [7/9], Step [1200/3125], Loss: 2.3904, Perplexity: 10.9178 | |
| 2025-06-05 15:50:34,955 - __main__ - INFO - Epoch [7/9], Step [1300/3125], Loss: 2.4815, Perplexity: 11.9595 | |
| 2025-06-05 15:52:07,231 - __main__ - INFO - Epoch [7/9], Step [1400/3125], Loss: 2.7324, Perplexity: 15.3697 | |
| 2025-06-05 15:53:39,430 - __main__ - INFO - Epoch [7/9], Step [1500/3125], Loss: 2.7363, Perplexity: 15.4298 | |
| 2025-06-05 15:55:11,826 - __main__ - INFO - Epoch [7/9], Step [1600/3125], Loss: 2.7751, Perplexity: 16.0397 | |
| 2025-06-05 15:56:44,380 - __main__ - INFO - Epoch [7/9], Step [1700/3125], Loss: 2.5598, Perplexity: 12.9329 | |
| 2025-06-05 15:58:16,561 - __main__ - INFO - Epoch [7/9], Step [1800/3125], Loss: 2.7369, Perplexity: 15.4388 | |
| 2025-06-05 15:59:49,080 - __main__ - INFO - Epoch [7/9], Step [1900/3125], Loss: 2.8117, Perplexity: 16.6374 | |
| 2025-06-05 16:01:21,995 - __main__ - INFO - Epoch [7/9], Step [2000/3125], Loss: 2.6360, Perplexity: 13.9572 | |
| 2025-06-05 16:02:54,730 - __main__ - INFO - Epoch [7/9], Step [2100/3125], Loss: 2.4383, Perplexity: 11.4533 | |
| 2025-06-05 16:04:28,093 - __main__ - INFO - Epoch [7/9], Step [2200/3125], Loss: 2.7142, Perplexity: 15.0932 | |
| 2025-06-05 16:06:01,476 - __main__ - INFO - Epoch [7/9], Step [2300/3125], Loss: 2.4590, Perplexity: 11.6927 | |
| 2025-06-05 16:07:34,763 - __main__ - INFO - Epoch [7/9], Step [2400/3125], Loss: 2.6457, Perplexity: 14.0935 | |
| 2025-06-05 16:09:07,778 - __main__ - INFO - Epoch [7/9], Step [2500/3125], Loss: 2.6708, Perplexity: 14.4518 | |
| 2025-06-05 16:10:40,473 - __main__ - INFO - Epoch [7/9], Step [2600/3125], Loss: 2.5663, Perplexity: 13.0179 | |
| 2025-06-05 16:12:12,943 - __main__ - INFO - Epoch [7/9], Step [2700/3125], Loss: 2.8020, Perplexity: 16.4769 | |
| 2025-06-05 16:13:45,226 - __main__ - INFO - Epoch [7/9], Step [2800/3125], Loss: 2.6069, Perplexity: 13.5576 | |
| 2025-06-05 16:15:17,969 - __main__ - INFO - Epoch [7/9], Step [2900/3125], Loss: 2.5213, Perplexity: 12.4453 | |
| 2025-06-05 16:16:51,084 - __main__ - INFO - Epoch [7/9], Step [3000/3125], Loss: 2.5426, Perplexity: 12.7128 | |
| 2025-06-05 16:18:23,560 - __main__ - INFO - Epoch [7/9], Step [3100/3125], Loss: 2.6069, Perplexity: 13.5576 | |
| 2025-06-05 16:18:46,982 - __main__ - INFO - Epoch 7 Training finished. Avg Loss: 2.5503, Time: 2898.16s | |
| 2025-06-05 16:31:32,517 - __main__ - INFO - Validation Avg Loss: 2.6021, Perplexity: 13.4918 | |
| 2025-06-05 16:31:42,995 - __main__ - INFO - Validation BLEU-4: 0.1028 | |
| 2025-06-05 16:31:43,704 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.1028.pth | |
| 2025-06-05 16:31:43,705 - __main__ - INFO - Performing memory optimization after epoch... | |
| 2025-06-05 16:31:43,969 - __main__ - INFO - CUDA cache emptied. | |
| 2025-06-05 16:31:44,301 - __main__ - INFO - Python garbage collector run. | |
| 2025-06-05 16:33:19,208 - __main__ - INFO - Epoch [8/9], Step [100/3125], Loss: 2.5949, Perplexity: 13.3954 | |
| 2025-06-05 16:34:52,087 - __main__ - INFO - Epoch [8/9], Step [200/3125], Loss: 2.3867, Perplexity: 10.8772 | |
| 2025-06-05 16:36:24,295 - __main__ - INFO - Epoch [8/9], Step [300/3125], Loss: 2.4636, Perplexity: 11.7471 | |
| 2025-06-05 16:37:56,189 - __main__ - INFO - Epoch [8/9], Step [400/3125], Loss: 2.5906, Perplexity: 13.3381 | |
| 2025-06-05 16:39:29,764 - __main__ - INFO - Epoch [8/9], Step [500/3125], Loss: 2.5785, Perplexity: 13.1772 | |
| 2025-06-05 16:41:02,645 - __main__ - INFO - Epoch [8/9], Step [600/3125], Loss: 2.5902, Perplexity: 13.3330 | |
| 2025-06-05 16:42:35,744 - __main__ - INFO - Epoch [8/9], Step [700/3125], Loss: 2.6504, Perplexity: 14.1599 | |
| 2025-06-05 16:44:08,175 - __main__ - INFO - Epoch [8/9], Step [800/3125], Loss: 2.5812, Perplexity: 13.2125 | |
| 2025-06-05 16:45:41,068 - __main__ - INFO - Epoch [8/9], Step [900/3125], Loss: 2.3477, Perplexity: 10.4617 | |
| 2025-06-05 16:47:13,040 - __main__ - INFO - Epoch [8/9], Step [1000/3125], Loss: 2.5416, Perplexity: 12.7004 | |
| 2025-06-05 16:48:45,442 - __main__ - INFO - Epoch [8/9], Step [1100/3125], Loss: 2.5581, Perplexity: 12.9117 | |
| 2025-06-05 16:50:18,029 - __main__ - INFO - Epoch [8/9], Step [1200/3125], Loss: 2.6781, Perplexity: 14.5574 | |
| 2025-06-05 16:51:51,068 - __main__ - INFO - Epoch [8/9], Step [1300/3125], Loss: 2.4678, Perplexity: 11.7967 | |
| 2025-06-05 16:53:23,767 - __main__ - INFO - Epoch [8/9], Step [1400/3125], Loss: 2.7964, Perplexity: 16.3863 | |
| 2025-06-05 16:54:56,092 - __main__ - INFO - Epoch [8/9], Step [1500/3125], Loss: 2.5617, Perplexity: 12.9583 | |
| 2025-06-05 16:56:28,564 - __main__ - INFO - Epoch [8/9], Step [1600/3125], Loss: 2.4647, Perplexity: 11.7597 | |
| 2025-06-05 16:58:01,422 - __main__ - INFO - Epoch [8/9], Step [1700/3125], Loss: 2.5186, Perplexity: 12.4108 | |
| 2025-06-05 16:59:34,614 - __main__ - INFO - Epoch [8/9], Step [1800/3125], Loss: 2.7458, Perplexity: 15.5764 | |
| 2025-06-05 17:01:06,892 - __main__ - INFO - Epoch [8/9], Step [1900/3125], Loss: 2.4218, Perplexity: 11.2661 | |
| 2025-06-05 17:02:39,718 - __main__ - INFO - Epoch [8/9], Step [2000/3125], Loss: 2.4996, Perplexity: 12.1774 | |
| 2025-06-05 17:04:12,656 - __main__ - INFO - Epoch [8/9], Step [2100/3125], Loss: 2.4715, Perplexity: 11.8400 | |
| 2025-06-05 17:05:45,350 - __main__ - INFO - Epoch [8/9], Step [2200/3125], Loss: 2.7304, Perplexity: 15.3395 | |
| 2025-06-05 17:07:18,313 - __main__ - INFO - Epoch [8/9], Step [2300/3125], Loss: 2.6088, Perplexity: 13.5826 | |
| 2025-06-05 17:08:50,891 - __main__ - INFO - Epoch [8/9], Step [2400/3125], Loss: 2.2952, Perplexity: 9.9266 | |
| 2025-06-05 17:10:23,321 - __main__ - INFO - Epoch [8/9], Step [2500/3125], Loss: 2.6801, Perplexity: 14.5865 | |
| 2025-06-05 17:11:56,233 - __main__ - INFO - Epoch [8/9], Step [2600/3125], Loss: 2.6453, Perplexity: 14.0877 | |
| 2025-06-05 17:13:28,949 - __main__ - INFO - Epoch [8/9], Step [2700/3125], Loss: 2.4980, Perplexity: 12.1576 | |
| 2025-06-05 17:15:01,500 - __main__ - INFO - Epoch [8/9], Step [2800/3125], Loss: 2.4855, Perplexity: 12.0065 | |
| 2025-06-05 17:16:33,983 - __main__ - INFO - Epoch [8/9], Step [2900/3125], Loss: 2.7647, Perplexity: 15.8748 | |
| 2025-06-05 17:18:06,726 - __main__ - INFO - Epoch [8/9], Step [3000/3125], Loss: 2.3954, Perplexity: 10.9722 | |
| 2025-06-05 17:19:40,090 - __main__ - INFO - Epoch [8/9], Step [3100/3125], Loss: 2.5672, Perplexity: 13.0291 | |
| 2025-06-05 17:20:03,575 - __main__ - INFO - Epoch 8 Training finished. Avg Loss: 2.4984, Time: 2899.27s | |
| 2025-06-05 17:32:41,641 - __main__ - INFO - Validation Avg Loss: 2.5887, Perplexity: 13.3131 | |
| 2025-06-05 17:32:52,092 - __main__ - INFO - Validation BLEU-4: 0.1029 | |
| 2025-06-05 17:32:52,835 - __main__ - INFO - Saved best model checkpoint to ./output/best_model_bleu0.1029.pth | |
| 2025-06-05 17:32:52,836 - __main__ - INFO - Performing memory optimization after epoch... | |
| 2025-06-05 17:32:53,135 - __main__ - INFO - CUDA cache emptied. | |
| 2025-06-05 17:32:53,477 - __main__ - INFO - Python garbage collector run. | |
| 2025-06-05 17:34:28,617 - __main__ - INFO - Epoch [9/9], Step [100/3125], Loss: 2.4885, Perplexity: 12.0428 | |
| 2025-06-05 17:36:01,256 - __main__ - INFO - Epoch [9/9], Step [200/3125], Loss: 2.4464, Perplexity: 11.5461 | |
| 2025-06-05 17:37:33,875 - __main__ - INFO - Epoch [9/9], Step [300/3125], Loss: 2.5567, Perplexity: 12.8926 | |
| 2025-06-05 17:39:06,254 - __main__ - INFO - Epoch [9/9], Step [400/3125], Loss: 2.5434, Perplexity: 12.7234 | |
| 2025-06-05 17:40:38,946 - __main__ - INFO - Epoch [9/9], Step [500/3125], Loss: 2.4282, Perplexity: 11.3380 | |
| 2025-06-05 17:42:11,097 - __main__ - INFO - Epoch [9/9], Step [600/3125], Loss: 2.4907, Perplexity: 12.0691 | |
| 2025-06-05 17:43:43,481 - __main__ - INFO - Epoch [9/9], Step [700/3125], Loss: 2.5079, Perplexity: 12.2789 | |
| 2025-06-05 17:45:15,684 - __main__ - INFO - Epoch [9/9], Step [800/3125], Loss: 2.4735, Perplexity: 11.8636 | |
| 2025-06-05 17:46:48,773 - __main__ - INFO - Epoch [9/9], Step [900/3125], Loss: 2.5014, Perplexity: 12.2001 | |
| 2025-06-05 17:48:21,463 - __main__ - INFO - Epoch [9/9], Step [1000/3125], Loss: 2.7705, Perplexity: 15.9662 | |
| 2025-06-05 17:49:53,960 - __main__ - INFO - Epoch [9/9], Step [1100/3125], Loss: 2.6952, Perplexity: 14.8082 | |