2025-10-21 00:40:53,334 - INFO: Problem Type: text_causal_language_modeling
2025-10-21 00:40:53,334 - INFO: Global random seed: 42
2025-10-21 00:40:53,334 - INFO: Preparing the data...
2025-10-21 00:40:53,334 - INFO: Setting up automatic validation split...
2025-10-21 00:40:53,364 - INFO: Preparing train and validation data
2025-10-21 00:40:53,364 - INFO: Loading train dataset...
2025-10-21 00:41:02,760 - INFO: Stop token ids: [tensor([256002]), tensor([256000]), tensor([256001])]
2025-10-21 00:41:02,811 - INFO: Loading validation dataset...
2025-10-21 00:41:05,316 - INFO: Stop token ids: [tensor([256002]), tensor([256000]), tensor([256001])]
2025-10-21 00:41:05,373 - INFO: Number of observations in train dataset: 1088
2025-10-21 00:41:05,373 - INFO: Number of observations in validation dataset: 58
2025-10-21 00:41:08,375 - INFO: Stop token ids: [tensor([256002], device='cuda:0'), tensor([256000], device='cuda:0'), tensor([256001], device='cuda:0')]
2025-10-21 00:41:08,467 - INFO: Using bfloat16 for backbone
2025-10-21 00:41:08,469 - INFO: Loading google/gemma-2-2b. This may take a while.
2025-10-21 00:55:04,688 - INFO: Loaded google/gemma-2-2b.
2025-10-21 00:55:04,688 - INFO: Resizing token embeddings to 256003
2025-10-21 00:55:05,339 - INFO: Attention implementation: eager
2025-10-21 00:55:05,342 - INFO: Lora module names: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
2025-10-21 00:55:05,518 - INFO: Unfreezing layer: base_model.model.model.embed_tokens.weight
2025-10-21 00:55:05,527 - INFO: Trainable parameters count: 755964672
2025-10-21 00:55:05,528 - INFO: Total parameters count: 2780482560
2025-10-21 00:55:05,528 - INFO: Trainable %: 27.1883%
2025-10-21 00:55:05,536 - INFO: Enough space available for saving model weights.Required space: 5788.83MB, Available space: 93745.59MB.
2025-10-21 00:55:06,168 - INFO: Starting validation inference
2025-10-21 00:55:06,169 - INFO: validation progress: 0%| | 0/6 [00:00<?, ?it/s]
2025-10-21 00:55:06,278 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,278 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,278 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,278 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,280 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,282 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,285 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,288 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,289 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,289 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,290 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,295 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,304 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:06,307 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:15,637 - INFO: validation progress: 17%|#6 | 1/6 [00:09<00:47, 9.47s/it]
2025-10-21 00:55:19,604 - INFO: validation progress: 33%|###3 | 2/6 [00:13<00:24, 6.23s/it]
2025-10-21 00:55:23,699 - INFO: validation progress: 50%|##### | 3/6 [00:17<00:15, 5.26s/it]
2025-10-21 00:55:28,406 - INFO: validation progress: 67%|######6 | 4/6 [00:22<00:10, 5.04s/it]
2025-10-21 00:55:32,716 - INFO: validation progress: 83%|########3 | 5/6 [00:26<00:04, 4.78s/it]
2025-10-21 00:55:36,187 - INFO: validation progress: 100%|##########| 6/6 [00:30<00:00, 4.33s/it]
2025-10-21 00:55:36,251 - INFO: validation progress: 100%|##########| 6/6 [00:30<00:00, 5.01s/it]
2025-10-21 00:55:36,289 - INFO: Validation Perplexity: 9580.99707
2025-10-21 00:55:36,290 - INFO: Mean validation loss: 7.27520
2025-10-21 00:55:36,513 - INFO: Training Epoch: 1 / 4
2025-10-21 00:55:36,513 - INFO: train loss: 0%| | 0/181 [00:00<?, ?it/s]
2025-10-21 00:55:36,640 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,642 - INFO: Evaluation step: 90
2025-10-21 00:55:36,642 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,644 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,645 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,647 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,647 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,648 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,654 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,656 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,658 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,659 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,662 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,662 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,671 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,673 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:36,676 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:55:39,067 - INFO: Stop token ids: [tensor([256002]), tensor([256000]), tensor([256001])]
2025-10-21 00:55:48,479 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:56:01,220 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:56:12,651 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:56:21,970 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:56:21,972 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:56:33,594 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:56:43,980 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:56:55,432 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:56:55,436 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:56:55,439 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:57:05,659 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:57:05,661 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:57:16,849 - INFO: train loss: 7.53: 5%|4 | 9/181 [01:40<31:57, 11.15s/it]
2025-10-21 00:57:16,858 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:57:25,624 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:57:35,939 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:57:35,943 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:57:45,489 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:57:56,639 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:58:06,224 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:58:06,226 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:58:37,935 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:58:37,940 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:58:47,566 - INFO: train loss: 5.02: 10%|9 | 18/181 [03:11<28:34, 10.52s/it]
2025-10-21 00:58:47,572 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:58:58,701 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:59:08,762 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:59:19,781 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:59:29,221 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 00:59:40,879 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:00:01,638 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:00:01,641 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:00:01,643 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:00:22,180 - INFO: train loss: 3.28: 15%|#4 | 27/181 [04:45<26:59, 10.52s/it]
2025-10-21 01:00:22,183 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:00:31,750 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:00:31,752 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:00:43,171 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:00:43,174 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:00:52,811 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:00:52,813 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:01:12,137 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:01:12,139 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:01:22,384 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:01:32,823 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:01:43,013 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:01:52,767 - INFO: train loss: 2.65: 20%|#9 | 36/181 [06:16<24:59, 10.34s/it]
2025-10-21 01:01:52,771 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:01:52,775 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:02:03,982 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:02:03,985 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:02:03,987 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:02:03,989 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:02:14,001 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:02:14,006 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:02:25,098 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:02:34,735 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:02:45,888 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:03:23,767 - INFO: train loss: 2.22: 25%|##4 | 45/181 [07:47<23:14, 10.26s/it]
2025-10-21 01:03:23,774 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:03:34,019 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:03:34,025 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:03:34,027 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:03:54,085 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:03:54,088 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:03:54,090 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:03:54,091 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:04:00,184 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:04:10,091 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:04:10,094 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:04:10,098 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:04:20,745 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:04:20,751 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:04:29,707 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:04:29,710 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:04:36,549 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:04:36,551 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:04:46,438 - INFO: train loss: 2.29: 30%|##9 | 54/181 [09:09<20:56, 9.89s/it]
2025-10-21 01:05:03,927 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:05:03,995 - INFO: train loss: 2.29: 30%|##9 | 54/181 [09:27<20:56, 9.89s/it]
2025-10-21 01:05:15,209 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:05:24,763 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:05:35,986 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:05:35,988 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:05:45,687 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:05:45,693 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:05:56,951 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:06:07,182 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:06:07,189 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:06:13,257 - INFO: train loss: 2.13: 35%|###4 | 63/181 [10:36<19:17, 9.81s/it]
2025-10-21 01:06:22,845 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:06:22,849 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:06:23,994 - INFO: train loss: 2.13: 35%|###4 | 63/181 [10:47<19:17, 9.81s/it]
2025-10-21 01:06:32,798 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:06:32,800 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:06:32,802 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:06:42,952 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:07:25,263 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:07:25,267 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:07:25,269 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:07:36,792 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:07:45,541 - INFO: train loss: 2.05: 40%|###9 | 72/181 [12:09<18:04, 9.95s/it]
2025-10-21 01:07:45,550 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:07:56,347 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:07:56,354 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:08:06,302 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:08:06,307 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:08:06,309 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:08:17,125 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:08:22,423 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:08:22,430 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:08:22,433 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:08:45,185 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:08:45,187 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:08:45,191 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:08:57,320 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:09:18,263 - INFO: train loss: 2.02: 45%|####4 | 81/181 [13:41<16:46, 10.06s/it]
2025-10-21 01:09:18,267 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:09:28,101 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:09:39,674 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:09:39,677 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:09:49,811 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:10:01,044 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:10:10,212 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:10:10,217 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:10:21,925 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:10:21,930 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:10:21,932 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:10:31,759 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:10:42,839 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:10:42,841 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:10:42,845 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:10:52,475 - INFO: train loss: 1.93: 50%|####9 | 90/181 [15:15<15:27, 10.19s/it]
2025-10-21 01:10:52,475 - INFO: Saving last model checkpoint to /mount/output/user/gemma-2-2b-it-HUN-v2.9.1/
2025-10-21 01:11:17,441 - INFO: Starting validation inference
2025-10-21 01:11:17,441 - INFO: validation progress: 0%| | 0/6 [00:00<?, ?it/s]
2025-10-21 01:11:17,602 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,604 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,606 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,607 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,609 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,609 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,614 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,617 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,617 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,619 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,622 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,623 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,635 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:17,639 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:23,885 - INFO: validation progress: 17%|#6 | 1/6 [00:06<00:32, 6.44s/it]
2025-10-21 01:11:29,618 - INFO: validation progress: 33%|###3 | 2/6 [00:12<00:24, 6.03s/it]
2025-10-21 01:11:35,673 - INFO: validation progress: 50%|##### | 3/6 [00:18<00:18, 6.04s/it]
2025-10-21 01:11:41,732 - INFO: validation progress: 67%|######6 | 4/6 [00:24<00:12, 6.05s/it]
2025-10-21 01:11:47,740 - INFO: validation progress: 83%|########3 | 5/6 [00:30<00:06, 6.03s/it]
2025-10-21 01:11:52,264 - INFO: validation progress: 100%|##########| 6/6 [00:34<00:00, 5.52s/it]
2025-10-21 01:11:52,331 - INFO: validation progress: 100%|##########| 6/6 [00:34<00:00, 5.81s/it]
2025-10-21 01:11:52,380 - INFO: Validation Perplexity: 7.08796
2025-10-21 01:11:52,380 - INFO: Mean validation loss: 1.59453
2025-10-21 01:11:52,612 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:52,615 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:52,617 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:52,618 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:11:52,620 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:12:04,187 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:12:04,192 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:12:04,194 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:12:14,295 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:12:14,301 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:12:25,148 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:12:34,278 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:12:54,675 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:12:54,680 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:13:15,456 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:13:15,458 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:13:15,461 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:13:15,463 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:13:27,303 - INFO: train loss: 1.99: 55%|#####4 | 99/181 [17:50<16:51, 12.33s/it]
2025-10-21 01:13:27,307 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:13:27,311 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:13:36,816 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:13:36,818 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:13:47,880 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:13:57,818 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:13:57,821 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:14:08,278 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:14:08,281 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:14:13,968 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:14:25,005 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:14:25,010 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:14:25,013 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:14:46,090 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:14:46,095 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:14:46,097 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:14:56,097 - INFO: train loss: 2.00: 60%|#####9 | 108/181 [19:19<14:05, 11.58s/it]
2025-10-21 01:14:56,101 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:14:56,105 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:15:06,141 - INFO: train loss: 2.00: 60%|#####9 | 108/181 [19:29<14:05, 11.58s/it]
2025-10-21 01:15:16,350 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:15:27,983 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:15:27,985 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:15:27,987 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:15:37,855 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:15:48,061 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:15:57,837 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:16:09,114 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:16:09,115 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:16:19,122 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:16:19,125 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:16:19,131 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:16:30,653 - INFO: train loss: 1.85: 65%|######4 | 117/181 [20:54<12:00, 11.26s/it]
2025-10-21 01:16:30,660 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:16:30,662 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:16:40,079 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:16:51,025 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:16:51,027 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:17:00,981 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:17:11,461 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:17:11,464 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:17:11,466 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:17:21,559 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:17:21,561 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:17:43,623 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:17:43,631 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:17:54,252 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:17:54,259 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:18:04,497 - INFO: train loss: 1.99: 70%|######9 | 126/181 [22:27<10:05, 11.01s/it]
2025-10-21 01:18:15,182 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:18:15,184 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:18:25,507 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:18:25,513 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:18:25,516 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:18:36,529 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:18:36,532 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:18:47,275 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:18:58,688 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:18:58,689 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:18:58,694 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:19:04,721 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:19:04,724 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:19:16,093 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:19:26,042 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:19:26,050 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:19:26,052 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:19:35,592 - INFO: train loss: 1.99: 75%|#######4 | 135/181 [23:59<08:14, 10.74s/it]
2025-10-21 01:19:35,598 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:19:45,480 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:19:56,549 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:19:56,552 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:19:56,556 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:20:06,373 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:20:06,378 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:20:17,537 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:20:17,542 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:20:26,993 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:20:26,996 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:20:38,402 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:20:47,899 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:20:59,397 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:21:09,803 - INFO: train loss: 1.91: 80%|#######9 | 144/181 [25:33<06:34, 10.66s/it]
2025-10-21 01:21:09,808 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:21:09,810 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:21:09,811 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:21:20,073 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:21:20,075 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:21:29,852 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:21:40,898 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:21:51,020 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:22:02,534 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:22:02,536 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:22:02,538 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:22:10,904 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:22:10,910 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:22:21,598 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:22:21,601 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:22:42,868 - INFO: train loss: 1.90: 85%|########4 | 153/181 [27:06<04:55, 10.56s/it]
2025-10-21 01:22:42,871 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:22:42,878 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:22:53,140 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:23:04,769 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:23:14,530 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:23:25,336 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:23:34,956 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:23:34,963 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:23:46,322 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:23:46,327 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:23:56,610 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:24:07,567 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:24:07,569 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:24:07,571 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:24:17,537 - INFO: train loss: 1.91: 90%|########9 | 162/181 [28:41<03:20, 10.55s/it]
2025-10-21 01:24:17,541 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:24:17,546 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:24:39,089 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:24:59,918 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:24:59,922 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:11,840 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:11,844 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:11,846 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:21,828 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:21,830 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:21,832 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:32,606 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:32,610 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:32,612 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:42,771 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:42,775 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:53,859 - INFO: train loss: 1.88: 94%|#########4| 171/181 [30:17<01:45, 10.60s/it]
2025-10-21 01:25:53,862 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:53,865 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:53,867 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:53,870 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:53,872 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:25:59,658 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:19,166 - INFO: train loss: 2.00: 99%|#########9| 180/181 [31:42<00:10, 10.26s/it]
2025-10-21 01:27:19,167 - INFO: Saving last model checkpoint to /mount/output/user/gemma-2-2b-it-HUN-v2.9.1/
2025-10-21 01:27:34,015 - INFO: train loss: 2.00: 99%|#########9| 180/181 [31:57<00:10, 10.26s/it]
2025-10-21 01:27:44,363 - INFO: Starting validation inference
2025-10-21 01:27:44,364 - INFO: validation progress: 0%| | 0/6 [00:00<?, ?it/s]
2025-10-21 01:27:44,496 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,499 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,499 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,500 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,504 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,507 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,510 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,510 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,515 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,515 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,516 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,522 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,531 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:44,535 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:27:51,490 - INFO: validation progress: 17%|#6 | 1/6 [00:07<00:35, 7.12s/it]
2025-10-21 01:27:57,727 - INFO: validation progress: 33%|###3 | 2/6 [00:13<00:26, 6.60s/it]
2025-10-21 01:28:03,621 - INFO: validation progress: 50%|##### | 3/6 [00:19<00:18, 6.28s/it]
2025-10-21 01:28:09,533 - INFO: validation progress: 67%|######6 | 4/6 [00:25<00:12, 6.13s/it]
2025-10-21 01:28:15,105 - INFO: validation progress: 83%|########3 | 5/6 [00:30<00:05, 5.93s/it]
2025-10-21 01:28:19,813 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.52s/it]
2025-10-21 01:28:19,879 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.92s/it]
2025-10-21 01:28:19,930 - INFO: Validation Perplexity: 6.28209
2025-10-21 01:28:19,930 - INFO: Mean validation loss: 1.50718
2025-10-21 01:28:30,468 - INFO: train loss: 1.98: 100%|##########| 181/181 [32:53<00:00, 13.04s/it]
2025-10-21 01:28:30,519 - INFO: train loss: 1.98: 100%|##########| 181/181 [32:54<00:00, 10.91s/it]
2025-10-21 01:28:30,587 - INFO: Training Epoch: 2 / 4
2025-10-21 01:28:30,587 - INFO: train loss: 0%| | 0/181 [00:00<?, ?it/s]
2025-10-21 01:28:30,713 - INFO: Evaluation step: 90
2025-10-21 01:28:30,725 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:30,727 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:30,729 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:30,729 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:30,736 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:30,737 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:30,743 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:30,745 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:30,750 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:30,755 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:41,768 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:41,770 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:28:51,854 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:29:02,957 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:29:02,964 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:29:13,039 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:29:23,980 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:29:33,828 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:29:33,830 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:29:45,111 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:29:45,117 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:29:54,341 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:29:54,346 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:30:07,870 - INFO: train loss: 1.71: 5%|4 | 9/181 [01:37<30:59, 10.81s/it]
2025-10-21 01:30:07,874 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:30:07,881 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:30:18,071 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:30:18,073 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:30:29,741 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:30:29,749 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:30:50,919 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:31:12,254 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:31:12,258 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:31:12,260 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:31:22,128 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:31:33,798 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:31:43,819 - INFO: train loss: 1.83: 10%|9 | 18/181 [03:13<29:07, 10.72s/it]
2025-10-21 01:31:43,823 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:31:43,829 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:31:43,832 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:32:07,342 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:32:07,347 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:32:20,556 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:32:20,563 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:32:30,576 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:32:30,580 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:32:30,584 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:32:41,404 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:32:41,409 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:32:41,412 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:33:02,454 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:33:02,459 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:33:11,564 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:33:11,568 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:33:11,571 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:33:22,943 - INFO: train loss: 1.69: 15%|#4 | 27/181 [04:52<27:51, 10.86s/it]
2025-10-21 01:33:32,168 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:33:32,174 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:33:43,959 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:33:53,725 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:34:05,460 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:34:35,390 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:34:35,392 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:34:35,394 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:34:35,398 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:34:47,037 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:34:47,040 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:34:56,160 - INFO: train loss: 1.75: 20%|#9 | 36/181 [06:25<25:45, 10.66s/it]
2025-10-21 01:34:56,168 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:35:17,683 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:35:17,689 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:35:28,570 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:35:38,020 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:35:38,026 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:35:48,624 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:35:48,627 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:35:48,633 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:35:57,765 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:35:57,769 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:35:57,771 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:36:20,068 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:36:31,350 - INFO: train loss: 1.66: 25%|##4 | 45/181 [08:00<24:05, 10.63s/it]
2025-10-21 01:36:31,359 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:36:31,361 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:36:40,270 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:36:40,273 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:36:40,275 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:36:51,464 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:36:51,467 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:36:51,472 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:37:01,529 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:37:01,532 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:37:01,534 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:37:12,752 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:37:43,518 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:37:53,961 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:37:53,964 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:37:53,966 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:38:04,412 - INFO: train loss: 1.76: 30%|##9 | 54/181 [09:33<22:17, 10.53s/it]
2025-10-21 01:38:04,421 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:38:25,377 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:38:25,379 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:38:37,146 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:38:46,689 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:38:56,559 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:38:56,561 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:39:06,145 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:39:06,147 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:39:17,544 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:39:17,550 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:39:39,406 - INFO: train loss: 1.71: 35%|###4 | 63/181 [11:08<20:43, 10.54s/it]
2025-10-21 01:39:39,414 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:39:39,417 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:39:59,796 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:39:59,799 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:39:59,801 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:40:09,933 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:40:21,364 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:40:21,367 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:40:21,369 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:40:43,410 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:40:43,413 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:05,580 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:05,584 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:16,241 - INFO: train loss: 1.74: 40%|###9 | 72/181 [12:45<19:16, 10.61s/it]
2025-10-21 01:41:16,246 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:16,252 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:22,751 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:22,754 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:32,996 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:32,998 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:33,002 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:42,829 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:53,899 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:53,902 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:41:53,906 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:05,187 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:05,189 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:05,195 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:05,197 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:15,343 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:26,620 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:26,623 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:36,592 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:47,863 - INFO: train loss: 1.69: 45%|####4 | 81/181 [14:17<17:27, 10.47s/it]
2025-10-21 01:42:58,556 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:58,561 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:58,563 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:42:58,565 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:43:09,860 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:43:40,709 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:43:40,713 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:43:52,548 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:43:52,553 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:11,900 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:11,902 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:21,838 - INFO: train loss: 1.64: 50%|####9 | 90/181 [15:51<15:52, 10.46s/it]
2025-10-21 01:44:21,839 - INFO: Saving last model checkpoint to /mount/output/user/gemma-2-2b-it-HUN-v2.9.1/
2025-10-21 01:44:47,214 - INFO: Starting validation inference
2025-10-21 01:44:47,215 - INFO: validation progress: 0%| | 0/6 [00:00<?, ?it/s]
2025-10-21 01:44:47,339 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,341 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,342 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,343 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,346 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,346 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,351 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,353 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,355 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,356 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,356 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,361 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,369 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:47,372 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:44:53,717 - INFO: validation progress: 17%|#6 | 1/6 [00:06<00:32, 6.50s/it]
2025-10-21 01:44:59,348 - INFO: validation progress: 33%|###3 | 2/6 [00:12<00:23, 5.99s/it]
2025-10-21 01:45:04,964 - INFO: validation progress: 50%|##### | 3/6 [00:17<00:17, 5.82s/it]
2025-10-21 01:45:11,378 - INFO: validation progress: 67%|######6 | 4/6 [00:24<00:12, 6.05s/it]
2025-10-21 01:45:17,454 - INFO: validation progress: 83%|########3 | 5/6 [00:30<00:06, 6.06s/it]
2025-10-21 01:45:22,344 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.66s/it]
2025-10-21 01:45:22,415 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.87s/it]
2025-10-21 01:45:22,464 - INFO: Validation Perplexity: 6.14510
2025-10-21 01:45:22,464 - INFO: Mean validation loss: 1.48446
2025-10-21 01:45:22,708 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:45:22,710 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:45:33,064 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:45:42,665 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:45:42,667 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:45:52,823 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:45:52,826 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:02,996 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:03,000 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:14,087 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:14,090 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:14,092 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:14,096 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:23,664 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:34,177 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:44,247 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:44,249 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:55,671 - INFO: train loss: 1.63: 55%|#####4 | 99/181 [18:25<17:04, 12.49s/it]
2025-10-21 01:46:55,674 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:46:55,679 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:47:05,124 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:47:16,694 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:47:26,599 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:47:37,725 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:47:47,802 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:48:07,806 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:48:07,811 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:48:29,490 - INFO: train loss: 1.62: 60%|#####9 | 108/181 [19:58<14:26, 11.86s/it]
2025-10-21 01:48:29,494 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:48:29,496 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:48:40,574 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:48:40,577 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:48:49,788 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:49:01,483 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:49:01,490 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:49:19,880 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:49:19,883 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:49:19,887 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:49:39,973 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:49:49,272 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:49:49,274 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:00,674 - INFO: train loss: 1.74: 65%|######4 | 117/181 [21:30<12:05, 11.34s/it]
2025-10-21 01:50:00,681 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:00,686 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:00,687 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:11,313 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:11,315 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:11,319 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:22,092 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:22,095 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:31,768 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:42,225 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:42,229 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:50:52,064 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:51:03,159 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:51:11,963 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:51:23,352 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:51:33,010 - INFO: train loss: 1.68: 70%|######9 | 126/181 [23:02<10:05, 11.01s/it]
2025-10-21 01:51:44,114 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:51:54,017 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:51:54,019 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:51:54,024 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:52:05,032 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:52:05,037 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:52:25,901 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:52:25,903 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:52:25,906 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:52:31,971 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:52:31,974 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:52:43,048 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:02,198 - INFO: train loss: 1.67: 75%|#######4 | 135/181 [24:31<08:11, 10.68s/it]
2025-10-21 01:53:02,207 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:02,209 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:11,767 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:11,769 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:11,771 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:11,776 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:13,980 - INFO: train loss: 1.67: 75%|#######4 | 135/181 [24:43<08:11, 10.68s/it]
2025-10-21 01:53:23,395 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:23,399 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:33,111 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:33,115 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:53,980 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:53,983 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:53:53,987 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:54:05,166 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:54:05,171 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:54:05,173 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:54:21,497 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:54:31,055 - INFO: train loss: 1.78: 80%|#######9 | 144/181 [26:00<06:26, 10.44s/it]
2025-10-21 01:54:43,221 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:54:43,225 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:54:43,228 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:54:43,231 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:54:43,979 - INFO: train loss: 1.78: 80%|#######9 | 144/181 [26:13<06:26, 10.44s/it]
2025-10-21 01:54:53,339 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:55:04,628 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:55:04,634 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:55:04,639 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:55:10,368 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:55:21,385 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:55:21,387 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:55:21,392 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:55:41,570 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:56:01,886 - INFO: train loss: 1.72: 85%|########4 | 153/181 [27:31<04:49, 10.33s/it]
2025-10-21 01:56:01,890 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:56:01,894 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:56:11,434 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:56:11,437 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:56:22,176 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:56:43,185 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:56:43,188 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:56:43,192 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:56:53,396 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:56:53,401 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:13,432 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:13,434 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:22,838 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:32,821 - INFO: train loss: 1.71: 90%|########9 | 162/181 [29:02<03:15, 10.26s/it]
2025-10-21 01:57:32,827 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:44,077 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:44,079 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:44,081 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:44,084 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:44,086 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:53,609 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:53,611 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:57:53,613 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:04,123 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:04,126 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:04,128 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:04,130 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:14,071 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:14,074 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:25,288 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:25,291 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:25,295 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:34,388 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:45,105 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:55,113 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:55,116 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:58:55,118 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:59:06,190 - INFO: train loss: 1.64: 94%|#########4| 171/181 [30:35<01:42, 10.30s/it]
2025-10-21 01:59:06,198 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:59:06,199 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 01:59:15,743 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:00:39,842 - INFO: train loss: 1.65: 99%|#########9| 180/181 [32:09<00:10, 10.33s/it]
2025-10-21 02:00:39,842 - INFO: Saving last model checkpoint to /mount/output/user/gemma-2-2b-it-HUN-v2.9.1/
2025-10-21 02:01:05,163 - INFO: Starting validation inference
2025-10-21 02:01:05,164 - INFO: validation progress: 0%| | 0/6 [00:00<?, ?it/s]
2025-10-21 02:01:05,301 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,303 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,305 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,305 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,308 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,309 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,316 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,318 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,323 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,323 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,324 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,330 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,338 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:05,340 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:12,144 - INFO: validation progress: 17%|#6 | 1/6 [00:06<00:34, 6.98s/it]
2025-10-21 02:01:18,061 - INFO: validation progress: 33%|###3 | 2/6 [00:12<00:25, 6.35s/it]
2025-10-21 02:01:24,305 - INFO: validation progress: 50%|##### | 3/6 [00:19<00:18, 6.30s/it]
2025-10-21 02:01:30,064 - INFO: validation progress: 67%|######6 | 4/6 [00:24<00:12, 6.09s/it]
2025-10-21 02:01:36,046 - INFO: validation progress: 83%|########3 | 5/6 [00:30<00:06, 6.05s/it]
2025-10-21 02:01:40,546 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.52s/it]
2025-10-21 02:01:40,620 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.91s/it]
2025-10-21 02:01:40,668 - INFO: Validation Perplexity: 5.77568
2025-10-21 02:01:40,668 - INFO: Mean validation loss: 1.46207
2025-10-21 02:01:50,516 - INFO: train loss: 1.65: 100%|##########| 181/181 [33:19<00:00, 13.08s/it]
2025-10-21 02:01:50,583 - INFO: train loss: 1.65: 100%|##########| 181/181 [33:19<00:00, 11.05s/it]
2025-10-21 02:01:50,659 - INFO: Training Epoch: 3 / 4
2025-10-21 02:01:50,659 - INFO: train loss: 0%| | 0/181 [00:00<?, ?it/s]
2025-10-21 02:01:50,781 - INFO: Evaluation step: 90
2025-10-21 02:01:50,785 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,785 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,789 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,789 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,789 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,792 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,793 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,795 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,796 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,803 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,806 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,808 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,810 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,812 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:01:50,823 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:02:02,390 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:02:02,394 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:02:02,398 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:02:12,336 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:02:12,338 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:02:23,712 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:02:23,718 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:02:45,005 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:02:55,112 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:02:55,114 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:02:55,119 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:03:05,494 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:03:05,501 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:03:15,227 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:03:15,232 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:03:26,265 - INFO: train loss: 1.60: 5%|4 | 9/181 [01:35<30:27, 10.62s/it]
2025-10-21 02:03:26,268 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:03:36,406 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:03:57,460 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:03:57,466 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:04:04,129 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:04:04,132 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:04:04,135 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:04:04,137 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:04:13,841 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:04:35,585 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:04:35,587 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:04:46,595 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:04:55,561 - INFO: train loss: 1.51: 10%|9 | 18/181 [03:04<27:44, 10.21s/it]
2025-10-21 02:04:55,571 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:05:05,887 - INFO: train loss: 1.51: 10%|9 | 18/181 [03:15<27:44, 10.21s/it]
2025-10-21 02:05:06,806 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:05:06,811 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:05:06,817 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:05:11,083 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:05:22,491 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:05:22,493 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:05:32,439 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:05:43,715 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:05:53,556 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:06:04,895 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:06:04,897 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:06:15,087 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:06:15,095 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:06:25,564 - INFO: train loss: 1.48: 15%|#4 | 27/181 [04:34<25:57, 10.11s/it]
2025-10-21 02:06:35,730 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:06:35,733 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:06:35,735 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:06:46,757 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:07:07,598 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:07:28,790 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:07:28,793 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:07:28,797 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:07:28,799 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:07:28,801 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:07:38,678 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:07:38,681 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:07:49,983 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:07:59,561 - INFO: train loss: 1.47: 20%|#9 | 36/181 [06:08<24:45, 10.24s/it]
2025-10-21 02:07:59,570 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:07:59,572 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:08:10,975 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:08:10,980 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:08:40,983 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:08:40,987 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:08:51,668 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:09:13,329 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:09:23,579 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:09:23,582 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:09:34,606 - INFO: train loss: 1.51: 25%|##4 | 45/181 [07:43<23:28, 10.36s/it]
2025-10-21 02:09:34,616 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:09:34,618 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:09:56,405 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:09:56,408 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:09:56,411 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:00,262 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:00,266 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:00,267 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:00,269 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:11,819 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:11,822 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:11,826 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:11,828 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:31,379 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:31,385 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:40,780 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:40,786 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:10:52,451 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:02,531 - INFO: train loss: 1.56: 30%|##9 | 54/181 [09:11<21:30, 10.16s/it]
2025-10-21 02:11:02,535 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:08,481 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:08,483 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:08,487 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:13,966 - INFO: train loss: 1.56: 30%|##9 | 54/181 [09:23<21:30, 10.16s/it]
2025-10-21 02:11:19,138 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:29,915 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:29,918 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:39,854 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:39,860 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:45,909 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:45,911 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:45,914 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:11:45,916 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:12:06,981 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:12:06,983 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:12:17,291 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:12:27,994 - INFO: train loss: 1.53: 35%|###4 | 63/181 [10:37<19:33, 9.94s/it]
2025-10-21 02:12:28,001 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:12:28,003 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:12:37,990 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:12:37,992 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:12:37,996 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:12:43,965 - INFO: train loss: 1.53: 35%|###4 | 63/181 [10:53<19:33, 9.94s/it]
2025-10-21 02:12:59,718 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:11,118 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:21,129 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:21,133 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:21,134 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:28,394 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:28,396 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:28,399 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:38,140 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:48,895 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:48,897 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:48,899 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:48,901 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:59,174 - INFO: train loss: 1.63: 40%|###9 | 72/181 [12:08<18:10, 10.00s/it]
2025-10-21 02:13:59,179 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:59,181 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:13:59,184 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:14:10,850 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:14:20,979 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:14:20,985 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:14:31,975 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:14:31,978 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:14:42,044 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:14:53,142 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:14:53,145 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:15:03,393 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:15:14,745 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:15:14,753 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:15:35,673 - INFO: train loss: 1.59: 45%|####4 | 81/181 [13:45<17:02, 10.23s/it]
2025-10-21 02:15:56,685 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:16:06,209 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:16:17,113 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:16:38,254 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:16:59,648 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:09,827 - INFO: train loss: 1.60: 50%|####9 | 90/181 [15:19<15:37, 10.30s/it]
2025-10-21 02:17:09,827 - INFO: Saving last model checkpoint to /mount/output/user/gemma-2-2b-it-HUN-v2.9.1/
2025-10-21 02:17:35,196 - INFO: Starting validation inference
2025-10-21 02:17:35,197 - INFO: validation progress: 0%| | 0/6 [00:00<?, ?it/s]
2025-10-21 02:17:35,339 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,341 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,341 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,344 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,347 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,349 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,354 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,355 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,358 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,360 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,361 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,365 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,375 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:35,377 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:17:42,462 - INFO: validation progress: 17%|#6 | 1/6 [00:07<00:36, 7.26s/it]
2025-10-21 02:17:48,705 - INFO: validation progress: 33%|###3 | 2/6 [00:13<00:26, 6.66s/it]
2025-10-21 02:17:54,557 - INFO: validation progress: 50%|##### | 3/6 [00:19<00:18, 6.29s/it]
2025-10-21 02:18:00,577 - INFO: validation progress: 67%|######6 | 4/6 [00:25<00:12, 6.19s/it]
2025-10-21 02:18:06,483 - INFO: validation progress: 83%|########3 | 5/6 [00:31<00:06, 6.08s/it]
2025-10-21 02:18:11,021 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.56s/it]
2025-10-21 02:18:11,090 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.98s/it]
2025-10-21 02:18:11,140 - INFO: Validation Perplexity: 5.71523
2025-10-21 02:18:11,141 - INFO: Mean validation loss: 1.47129
2025-10-21 02:18:11,381 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:18:11,387 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:18:11,390 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:18:27,321 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:18:27,326 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:18:47,906 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:18:54,673 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:04,531 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:04,536 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:04,539 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:15,122 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:24,730 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:36,146 - INFO: train loss: 1.52: 55%|#####4 | 99/181 [17:45<16:34, 12.12s/it]
2025-10-21 02:19:36,150 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:36,153 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:36,155 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:36,157 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:44,829 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:44,832 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:44,834 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:56,172 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:19:56,177 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:20:17,160 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:20:17,165 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:20:26,800 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:20:37,994 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:20:38,001 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:20:47,583 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:20:47,585 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:20:58,825 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:20:58,829 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:20:58,833 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:21:09,214 - INFO: train loss: 1.49: 60%|#####9 | 108/181 [19:18<14:05, 11.58s/it]
2025-10-21 02:21:20,479 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:21:30,436 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:21:37,829 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:21:37,832 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:21:37,836 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:21:37,838 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:21:47,887 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:21:47,890 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:22:09,420 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:22:20,587 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:22:30,376 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:22:41,066 - INFO: train loss: 1.44: 65%|######4 | 117/181 [20:50<11:54, 11.16s/it]
2025-10-21 02:22:41,073 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:22:50,465 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:23:02,165 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:23:02,167 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:23:12,014 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:23:23,444 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:23:23,446 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:23:23,449 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:23:28,956 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:23:28,958 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:23:39,530 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:23:48,898 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:23:48,903 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:06,181 - INFO: train loss: 1.49: 70%|######9 | 126/181 [22:15<09:45, 10.65s/it]
2025-10-21 02:24:06,191 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:17,248 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:23,948 - INFO: train loss: 1.49: 70%|######9 | 126/181 [22:33<09:45, 10.65s/it]
2025-10-21 02:24:27,501 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:27,505 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:38,101 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:38,107 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:38,110 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:47,805 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:47,809 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:47,813 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:47,815 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:24:59,407 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:25:08,358 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:25:08,363 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:25:08,366 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:25:08,368 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:25:19,458 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:25:19,461 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:25:39,278 - INFO: train loss: 1.58: 75%|#######4 | 135/181 [23:48<08:05, 10.56s/it]
2025-10-21 02:25:39,284 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:25:39,286 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:25:49,206 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:26:01,096 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:26:11,352 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:26:22,547 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:26:32,352 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:26:53,875 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:26:53,881 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:27:04,431 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:27:04,436 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:27:14,185 - INFO: train loss: 1.53: 80%|#######9 | 144/181 [25:23<06:30, 10.55s/it]
2025-10-21 02:27:14,193 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:27:35,246 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:27:46,398 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:27:56,322 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:27:56,325 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:27:56,330 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:28:07,059 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:28:16,333 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:28:37,752 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:28:48,439 - INFO: train loss: 1.48: 85%|########4 | 153/181 [26:57<04:54, 10.53s/it]
2025-10-21 02:28:48,448 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:28:58,376 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:28:58,380 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:09,628 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:09,632 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:19,521 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:19,525 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:30,807 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:30,811 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:39,238 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:39,242 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:39,244 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:49,700 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:49,704 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:49,710 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:29:58,496 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:30:11,831 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:30:11,834 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:30:11,837 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:30:21,693 - INFO: train loss: 1.57: 90%|########9 | 162/181 [28:31<03:19, 10.48s/it]
2025-10-21 02:30:21,705 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:30:32,631 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:30:42,587 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:30:42,590 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:30:42,595 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:30:53,745 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:30:53,748 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:31:04,256 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:31:16,206 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:31:25,930 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:31:25,933 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:31:37,373 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:31:37,376 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:31:37,380 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:31:47,394 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:31:58,841 - INFO: train loss: 1.56: 94%|#########4| 171/181 [30:08<01:45, 10.57s/it]
2025-10-21 02:31:58,847 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:31:58,851 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:31:58,853 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:32:08,867 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:30,946 - INFO: train loss: 1.45: 99%|#########9| 180/181 [31:40<00:10, 10.47s/it]
2025-10-21 02:33:30,946 - INFO: Saving last model checkpoint to /mount/output/user/gemma-2-2b-it-HUN-v2.9.1/
2025-10-21 02:33:56,843 - INFO: Starting validation inference
2025-10-21 02:33:56,843 - INFO: validation progress: 0%| | 0/6 [00:00<?, ?it/s]
2025-10-21 02:33:56,971 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,972 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,974 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,975 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,977 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,977 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,981 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,983 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,984 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,986 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,987 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,989 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:56,999 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:33:57,001 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:34:03,854 - INFO: validation progress: 17%|#6 | 1/6 [00:07<00:35, 7.01s/it]
2025-10-21 02:34:09,631 - INFO: validation progress: 33%|###3 | 2/6 [00:12<00:25, 6.28s/it]
2025-10-21 02:34:15,586 - INFO: validation progress: 50%|##### | 3/6 [00:18<00:18, 6.13s/it]
2025-10-21 02:34:21,414 - INFO: validation progress: 67%|######6 | 4/6 [00:24<00:12, 6.01s/it]
2025-10-21 02:34:27,224 - INFO: validation progress: 83%|########3 | 5/6 [00:30<00:05, 5.94s/it]
2025-10-21 02:34:31,496 - INFO: validation progress: 100%|##########| 6/6 [00:34<00:00, 5.37s/it]
2025-10-21 02:34:31,567 - INFO: validation progress: 100%|##########| 6/6 [00:34<00:00, 5.79s/it]
2025-10-21 02:34:31,616 - INFO: Validation Perplexity: 5.79932
2025-10-21 02:34:31,617 - INFO: Mean validation loss: 1.46649
2025-10-21 02:34:41,449 - INFO: train loss: 1.48: 100%|##########| 181/181 [32:50<00:00, 13.20s/it]
2025-10-21 02:34:41,520 - INFO: train loss: 1.48: 100%|##########| 181/181 [32:50<00:00, 10.89s/it]
2025-10-21 02:34:41,592 - INFO: Training Epoch: 4 / 4
2025-10-21 02:34:41,592 - INFO: train loss: 0%| | 0/181 [00:00<?, ?it/s]
2025-10-21 02:34:41,722 - INFO: Evaluation step: 90
2025-10-21 02:34:41,725 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:34:41,728 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:34:41,730 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:34:41,733 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:34:41,742 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:34:41,747 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:34:41,748 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:34:41,751 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:34:41,751 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:34:41,762 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:34:52,667 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:35:02,802 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:35:13,491 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:35:23,903 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:35:35,638 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:35:45,337 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:35:45,340 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:35:49,936 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:06,485 - INFO: train loss: 1.49: 5%|4 | 9/181 [01:24<27:02, 9.43s/it]
2025-10-21 02:36:06,489 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:06,491 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:06,495 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:06,497 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:16,012 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:16,017 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:23,933 - INFO: train loss: 1.49: 5%|4 | 9/181 [01:42<27:02, 9.43s/it]
2025-10-21 02:36:26,934 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:26,938 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:26,941 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:36,947 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:48,968 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:48,975 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:58,083 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:58,087 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:36:58,089 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:37:09,847 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:37:35,878 - INFO: train loss: 1.42: 10%|9 | 18/181 [02:54<26:25, 9.73s/it]
2025-10-21 02:37:46,035 - INFO: train loss: 1.42: 10%|9 | 18/181 [03:04<26:25, 9.73s/it]
2025-10-21 02:37:46,883 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:37:46,891 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:37:57,032 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:38:08,503 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:38:18,455 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:38:29,733 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:38:29,736 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:38:39,739 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:38:39,744 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:38:50,978 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:38:50,980 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:39:00,006 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:39:00,013 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:39:10,221 - INFO: train loss: 1.39: 15%|#4 | 27/181 [04:28<25:51, 10.07s/it]
2025-10-21 02:39:10,227 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:39:10,231 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:39:10,233 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:39:20,628 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:39:31,576 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:39:31,580 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:40:02,164 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:40:13,290 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:40:22,955 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:40:34,004 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:40:44,038 - INFO: train loss: 1.46: 20%|#9 | 36/181 [06:02<24:40, 10.21s/it]
2025-10-21 02:40:44,049 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:40:54,412 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:40:54,416 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:40:54,418 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:41:14,800 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:41:14,802 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:41:24,045 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:41:24,048 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:41:35,617 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:41:35,620 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:41:35,623 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:41:46,108 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:41:46,113 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:41:57,089 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:41:57,093 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:42:07,395 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:42:07,398 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:42:18,180 - INFO: train loss: 1.40: 25%|##4 | 45/181 [07:36<23:20, 10.30s/it]
2025-10-21 02:42:18,184 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:42:18,187 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:42:18,190 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:42:27,603 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:42:48,036 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:42:59,356 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:43:09,567 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:43:09,576 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:43:21,213 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:43:31,930 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:43:31,935 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:43:31,937 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:43:42,741 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:43:42,747 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:43:52,753 - INFO: train loss: 1.45: 30%|##9 | 54/181 [09:11<21:57, 10.37s/it]
2025-10-21 02:43:52,765 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:44:03,962 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:44:13,704 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:44:13,707 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:44:13,709 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:44:13,711 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:44:34,114 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:44:45,179 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:44:55,269 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:44:55,276 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:44:55,277 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:06,171 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:06,175 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:16,150 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:16,154 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:16,156 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:26,208 - INFO: train loss: 1.40: 35%|###4 | 63/181 [10:44<20:24, 10.38s/it]
2025-10-21 02:45:26,214 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:26,217 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:36,420 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:57,302 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:57,307 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:57,309 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:45:57,310 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:46:07,783 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:46:18,447 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:46:29,879 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:46:39,576 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:46:39,578 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:46:50,353 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:46:50,357 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:47:00,419 - INFO: train loss: 1.42: 40%|###9 | 72/181 [12:18<18:54, 10.40s/it]
2025-10-21 02:47:00,425 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:47:00,427 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:47:11,409 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:47:11,413 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:47:21,418 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:47:21,421 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:47:41,640 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:47:41,644 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:47:41,646 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:47:52,247 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:47:52,249 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:48:12,547 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:48:12,552 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:48:22,533 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:48:33,525 - INFO: train loss: 1.45: 45%|####4 | 81/181 [13:51<17:18, 10.39s/it]
2025-10-21 02:48:33,528 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:48:33,533 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:48:33,536 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:48:43,558 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:48:43,565 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:48:55,538 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:48:55,541 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:05,372 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:16,319 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:16,326 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:16,328 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:26,371 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:35,855 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:35,858 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:35,863 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:46,349 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:46,351 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:57,258 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:49:57,261 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:07,498 - INFO: train loss: 1.44: 50%|####9 | 90/181 [15:25<15:46, 10.40s/it]
2025-10-21 02:50:07,499 - INFO: Saving last model checkpoint to /mount/output/user/gemma-2-2b-it-HUN-v2.9.1/
2025-10-21 02:50:31,721 - INFO: Starting validation inference
2025-10-21 02:50:31,722 - INFO: validation progress: 0%| | 0/6 [00:00<?, ?it/s]
2025-10-21 02:50:31,854 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,856 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,858 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,858 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,860 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,861 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,866 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,868 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,870 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,872 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,873 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,876 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,886 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:31,891 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:50:38,518 - INFO: validation progress: 17%|#6 | 1/6 [00:06<00:33, 6.80s/it]
2025-10-21 02:50:44,368 - INFO: validation progress: 33%|###3 | 2/6 [00:12<00:24, 6.24s/it]
2025-10-21 02:50:50,231 - INFO: validation progress: 50%|##### | 3/6 [00:18<00:18, 6.07s/it]
2025-10-21 02:50:56,260 - INFO: validation progress: 67%|######6 | 4/6 [00:24<00:12, 6.05s/it]
2025-10-21 02:51:02,875 - INFO: validation progress: 83%|########3 | 5/6 [00:31<00:06, 6.26s/it]
2025-10-21 02:51:07,529 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.71s/it]
2025-10-21 02:51:07,605 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.98s/it]
2025-10-21 02:51:07,656 - INFO: Validation Perplexity: 5.97394
2025-10-21 02:51:07,657 - INFO: Mean validation loss: 1.48139
2025-10-21 02:51:08,016 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:51:08,022 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:51:18,654 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:51:18,656 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:51:18,658 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:51:18,662 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:51:28,242 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:51:49,191 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:51:49,195 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:51:59,194 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:52:09,460 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:52:09,468 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:52:21,283 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:52:21,286 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:52:31,239 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:52:31,241 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:52:31,243 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:52:42,431 - INFO: train loss: 1.42: 55%|#####4 | 99/181 [18:00<17:04, 12.49s/it]
2025-10-21 02:52:42,435 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:52:42,442 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:52:52,433 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:03,475 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:12,147 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:12,150 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:23,683 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:23,685 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:23,687 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:32,779 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:32,783 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:43,943 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:43,945 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:43,947 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:43,949 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:54,028 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:54,032 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:53:54,035 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:54:05,705 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:54:15,682 - INFO: train loss: 1.43: 60%|#####9 | 108/181 [19:34<14:24, 11.84s/it]
2025-10-21 02:54:26,394 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:54:26,395 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:54:35,669 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:54:46,423 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:54:46,426 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:54:56,221 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:54:56,225 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:55:07,656 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:55:07,660 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:55:17,951 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:55:28,602 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:55:28,605 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:55:38,666 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:55:49,592 - INFO: train loss: 1.41: 65%|######4 | 117/181 [21:07<12:10, 11.41s/it]
2025-10-21 02:55:49,600 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:55:59,159 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:56:09,381 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:56:09,386 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:56:19,350 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:56:29,950 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:56:29,956 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:56:39,792 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:56:51,810 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:57:02,283 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:57:13,534 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:57:13,536 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:57:13,540 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:57:23,326 - INFO: train loss: 1.44: 70%|######9 | 126/181 [22:41<10:11, 11.11s/it]
2025-10-21 02:57:23,334 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:57:23,337 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:57:34,703 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:57:34,711 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:57:45,183 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:57:55,435 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:58:05,501 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:58:05,504 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:58:05,507 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:58:15,658 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:58:46,030 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:58:46,033 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:58:57,889 - INFO: train loss: 1.46: 75%|#######4 | 135/181 [24:16<08:22, 10.93s/it]
2025-10-21 02:58:57,894 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:58:57,897 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:59:08,060 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:59:18,842 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:59:28,954 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:59:28,959 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:59:28,960 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:59:40,014 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:59:48,915 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:59:59,786 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 02:59:59,791 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:00:10,175 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:00:30,861 - INFO: train loss: 1.40: 80%|#######9 | 144/181 [25:49<06:37, 10.75s/it]
2025-10-21 03:00:30,871 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:00:42,045 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:00:42,047 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:00:51,338 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:01:02,917 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:01:12,856 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:01:23,489 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:01:23,493 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:01:23,494 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:01:33,751 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:01:44,925 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:01:44,930 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:01:44,932 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:01:55,041 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:02:06,173 - INFO: train loss: 1.43: 85%|########4 | 153/181 [27:24<04:59, 10.70s/it]
2025-10-21 03:02:06,178 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:02:06,180 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:02:26,956 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:02:26,959 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:02:35,910 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:02:35,913 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:02:47,234 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:02:47,237 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:02:57,276 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:03:08,582 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:03:18,958 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:03:18,965 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:03:40,799 - INFO: train loss: 1.46: 90%|########9 | 162/181 [28:59<03:22, 10.65s/it]
2025-10-21 03:03:40,807 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:03:51,679 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:03:51,686 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:03:51,688 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:04:01,565 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:04:01,571 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:04:12,757 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:04:22,364 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:04:33,208 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:04:33,213 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:04:33,214 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:04:42,751 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:05:02,986 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:05:13,709 - INFO: train loss: 1.45: 94%|#########4| 171/181 [30:32<01:45, 10.55s/it]
2025-10-21 03:05:13,714 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:05:13,716 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:05:13,719 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:05:13,721 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:05:13,724 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:05:23,691 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:06:47,129 - INFO: train loss: 1.42: 99%|#########9| 180/181 [32:05<00:10, 10.50s/it]
2025-10-21 03:06:47,129 - INFO: Saving last model checkpoint to /mount/output/user/gemma-2-2b-it-HUN-v2.9.1/
2025-10-21 03:07:12,550 - INFO: Starting validation inference
2025-10-21 03:07:12,551 - INFO: validation progress: 0%| | 0/6 [00:00<?, ?it/s]
2025-10-21 03:07:12,693 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,695 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,696 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,696 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,699 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,701 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,705 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,708 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,709 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,710 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,711 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,716 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,725 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:12,728 - INFO: Input exceeds max_length of 768, truncating sample.
2025-10-21 03:07:19,014 - INFO: validation progress: 17%|#6 | 1/6 [00:06<00:32, 6.46s/it]
2025-10-21 03:07:25,266 - INFO: validation progress: 33%|###3 | 2/6 [00:12<00:25, 6.34s/it]
2025-10-21 03:07:31,142 - INFO: validation progress: 50%|##### | 3/6 [00:18<00:18, 6.13s/it]
2025-10-21 03:07:37,011 - INFO: validation progress: 67%|######6 | 4/6 [00:24<00:12, 6.03s/it]
2025-10-21 03:07:42,960 - INFO: validation progress: 83%|########3 | 5/6 [00:30<00:05, 6.00s/it]
2025-10-21 03:07:47,936 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.65s/it]
2025-10-21 03:07:48,009 - INFO: validation progress: 100%|##########| 6/6 [00:35<00:00, 5.91s/it]
2025-10-21 03:07:48,060 - INFO: Validation Perplexity: 5.93114
2025-10-21 03:07:48,060 - INFO: Mean validation loss: 1.48006
2025-10-21 03:07:58,778 - INFO: train loss: 1.42: 100%|##########| 181/181 [33:17<00:00, 13.28s/it]
2025-10-21 03:07:58,851 - INFO: train loss: 1.42: 100%|##########| 181/181 [33:17<00:00, 11.03s/it]