| 2025-11-20 22:05:20,267 - INFO - Starting training with args: Namespace(regime='vision', data_path='data/training/splits_510k/train.jsonl', output_dir='outputs/production_vision_base_reconstruction_20251120_220510', objective='reconstruction', val_data_path='data/training/splits_510k/val.jsonl', max_samples=None, vision_mode='base', text_context_tokens=None, hybrid_text_tokens=0, vision_prompt='free_ocr', train_encoder=True, encoder_lr=1e-05, compression_window_size=9, compression_stride=9, subsample_strategy='regular', subsample_count=None, projection_dim=None, train_projection=False, compression_target=None, conv_kernel=5, timestamp='20251120_220510', batch_size=2, gradient_accumulation_steps=24, learning_rate=0.0001, weight_decay=0.01, num_epochs=1, warmup_ratio=0.1, max_grad_norm=1.0, log_steps=10, save_steps=0, eval_steps=2000, initial_validation=True, validation_only=False, no_checkpoints=False, num_qualitative_samples=5, max_generation_tokens=200, use_wandb=True, wandb_project='vision-compression-2', wandb_run_name='production_vision_base_reconstruction_20251120_220510', resume_from_checkpoint=None, resume=None, init_from_checkpoint=None, allow_objective_switch=False, aux_loss_weight=0.5, num_workers=8, prefetch_factor=2, seed=42, eval_seed=42, debug_log_sample_ids=False, device='cuda', compile=True, use_optimized_model=False, use_encoder_checkpointing=False) | |
| 2025-11-20 22:05:20,267 - INFO - Using preset vision prompt: 'free_ocr' → ''\nFree OCR.'' | |
| 2025-11-20 22:05:20,267 - INFO - Setting random seed: 42 | |
| 2025-11-20 22:05:21,769 - INFO - Initialized W&B run: vision-compression-2/production_vision_base_reconstruction_20251120_220510 (ID: 1jsg7rd3) | |
| 2025-11-20 22:05:21,769 - INFO - Loading model and tokenizer... | |
| 2025-11-20 22:05:30,726 - INFO - Compiling model with torch.compile... | |
| 2025-11-20 22:05:30,726 - INFO - Note: First forward pass will compile (may take several minutes) | |
| 2025-11-20 22:05:31,586 - INFO - Created Vision Compression trainer (mode: base) | |
| 2025-11-20 22:05:31,586 - INFO - Training objective: reconstruction | |
| 2025-11-20 22:05:31,620 - INFO - Logged parameter counts to W&B: total=3,336,106,240, trainable=3,336,106,240, encoder=401,369,600, decoder=2,934,736,640 | |
| 2025-11-20 22:05:31,620 - INFO - Loading training data from data/training/splits_510k/train.jsonl | |
| 2025-11-20 22:08:11,473 - INFO - Loaded 500000 samples from data/training/splits_510k/train.jsonl | |
| 2025-11-20 22:08:11,474 - INFO - Vision mode: base (273 tokens, 1024x1024) | |
| 2025-11-20 22:08:11,506 - INFO - Loading validation data from data/training/splits_510k/val.jsonl | |
| 2025-11-20 22:08:14,053 - INFO - Loaded 10000 samples from data/training/splits_510k/val.jsonl | |
| 2025-11-20 22:08:14,053 - INFO - Vision mode: base (273 tokens, 1024x1024) | |
| 2025-11-20 22:08:14,079 - INFO - Created AdamW optimizer with differential LR: | |
| Encoder: 474 param tensors @ lr=1e-05 | |
| Decoder: 2236 param tensors @ lr=0.0001 | |
| Fused kernels: True | |
| 2025-11-20 22:08:14,079 - INFO - Created scheduler with warmup_steps=1041, total_steps=10417 | |
| 2025-11-20 22:08:14,079 - INFO - Starting training loop... | |
| 2025-11-20 22:08:14,079 - INFO - | |
| ====================================================================== | |
| 2025-11-20 22:08:14,079 - INFO - Running initial validation (before any training)... | |
| 2025-11-20 22:08:14,080 - INFO - ====================================================================== | |
| 2025-11-20 22:25:38,099 - INFO - Validation loss: 0.1809, perplexity: 1.20 | |
| 2025-11-20 22:25:38,099 - INFO - Qualitative metrics (n=5): | |
| 2025-11-20 22:25:38,099 - INFO - BLEU: 0.9117 | |
| 2025-11-20 22:25:38,099 - INFO - METEOR: 0.9629 | |
| 2025-11-20 22:25:38,099 - INFO - Edit Distance: 0.1543 | |
| 2025-11-20 22:25:38,100 - INFO - F-measure: 0.9884 | |
| 2025-11-20 22:25:38,100 - INFO - | |
| ====================================================================== | |
| 2025-11-20 22:25:38,100 - INFO - Qualitative Evaluation Samples: | |
| 2025-11-20 22:25:38,100 - INFO - ====================================================================== | |
| 2025-11-20 22:25:38,100 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-11-20 22:25:38,100 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-11-20 22:25:38,100 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-20 22:25:38,100 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-20 22:25:38,100 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-20 22:25:38,100 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-11-20 22:25:38,101 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-11-20 22:25:38,101 - INFO - Generated: 'was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROTC;...' | |
| 2025-11-20 22:25:38,101 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-20 22:25:38,101 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-20 22:25:38,101 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-11-20 22:25:38,101 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-11-20 22:25:38,101 - INFO - Generated: 'at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and bo...' | |
| 2025-11-20 22:25:38,101 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-20 22:25:38,101 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-20 22:25:38,101 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-11-20 22:25:38,101 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-11-20 22:25:38,101 - INFO - Generated: '# Oriya (Unicode block) Oriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-20 22:25:38,102 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-20 22:25:38,102 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-20 22:25:38,102 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-11-20 22:25:38,102 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-11-20 22:25:38,102 - INFO - Generated: '| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores |\n|-----------------------|------------|---------|----------------------|\n| [ 132 ] | Ultima Underworld: The Stygian Abyss and...' | |
| 2025-11-20 22:25:38,102 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' | |
| 2025-11-20 22:25:38,102 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-20 22:25:38,103 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510/qualitative_step_0.jsonl | |
| 2025-11-20 22:25:39,225 - INFO - Initial validation - Loss: 0.1809, Perplexity: 1.20 | |
| 2025-11-20 22:25:39,225 - INFO - ====================================================================== | |
| 2025-11-20 22:25:39,226 - INFO - | |
| ====================================================================== | |
| 2025-11-20 22:25:39,226 - INFO - Epoch 1/1 | |
| 2025-11-20 22:25:39,226 - INFO - ====================================================================== | |
| 2025-11-20 22:26:25,697 - INFO - Effective context tokens (per-sample): 278 | Compression ratio: 3.60x | |
| 2025-11-20 22:26:25,697 - INFO - Target tokens per sample: 1000 | |
| 2025-11-20 22:31:07,015 - INFO - Epoch 1 Step 10 (Global: 10): loss=0.1329, ppl=1.14, grad_norm=1.19, lr=1.09e-06, throughput=1464 tok/s | |
| 2025-11-20 22:35:41,475 - INFO - Epoch 1 Step 20 (Global: 20): loss=0.1136, ppl=1.12, grad_norm=0.93, lr=1.17e-06, throughput=1749 tok/s | |
| 2025-11-20 22:40:16,781 - INFO - Epoch 1 Step 30 (Global: 30): loss=0.0966, ppl=1.10, grad_norm=0.71, lr=1.26e-06, throughput=1744 tok/s | |
| 2025-11-20 22:44:42,600 - INFO - Epoch 1 Step 40 (Global: 40): loss=0.1272, ppl=1.14, grad_norm=0.81, lr=1.35e-06, throughput=1806 tok/s | |
| 2025-11-20 22:49:19,540 - INFO - Epoch 1 Step 50 (Global: 50): loss=0.0939, ppl=1.10, grad_norm=0.88, lr=1.43e-06, throughput=1733 tok/s | |
| 2025-11-20 22:54:00,461 - INFO - Epoch 1 Step 60 (Global: 60): loss=0.1028, ppl=1.11, grad_norm=0.86, lr=1.52e-06, throughput=1709 tok/s | |
| 2025-11-20 22:58:36,914 - INFO - Epoch 1 Step 70 (Global: 70): loss=0.1129, ppl=1.12, grad_norm=1.37, lr=1.61e-06, throughput=1736 tok/s | |
| 2025-11-20 23:03:07,519 - INFO - Epoch 1 Step 80 (Global: 80): loss=0.0624, ppl=1.06, grad_norm=0.93, lr=1.69e-06, throughput=1774 tok/s | |
| 2025-11-20 23:07:45,791 - INFO - Epoch 1 Step 90 (Global: 90): loss=0.1011, ppl=1.11, grad_norm=1.61, lr=1.78e-06, throughput=1725 tok/s | |
| 2025-11-20 23:12:25,821 - INFO - Epoch 1 Step 100 (Global: 100): loss=0.0949, ppl=1.10, grad_norm=0.91, lr=1.86e-06, throughput=1714 tok/s | |
| 2025-11-20 23:17:07,145 - INFO - Epoch 1 Step 110 (Global: 110): loss=0.1003, ppl=1.11, grad_norm=1.20, lr=1.95e-06, throughput=1706 tok/s | |
| 2025-11-20 23:21:35,047 - INFO - Epoch 1 Step 120 (Global: 120): loss=0.0755, ppl=1.08, grad_norm=1.05, lr=2.04e-06, throughput=1792 tok/s | |
| 2025-11-20 23:26:08,678 - INFO - Epoch 1 Step 130 (Global: 130): loss=0.0999, ppl=1.11, grad_norm=0.81, lr=2.12e-06, throughput=1754 tok/s | |
| 2025-11-20 23:30:49,413 - INFO - Epoch 1 Step 140 (Global: 140): loss=0.0939, ppl=1.10, grad_norm=2.62, lr=2.21e-06, throughput=1710 tok/s | |
| 2025-11-20 23:35:23,149 - INFO - Epoch 1 Step 150 (Global: 150): loss=0.1064, ppl=1.11, grad_norm=1.08, lr=2.30e-06, throughput=1754 tok/s | |
| 2025-11-20 23:40:00,168 - INFO - Epoch 1 Step 160 (Global: 160): loss=0.0677, ppl=1.07, grad_norm=0.82, lr=2.38e-06, throughput=1733 tok/s | |
| 2025-11-20 23:44:58,261 - INFO - Epoch 1 Step 170 (Global: 170): loss=0.0801, ppl=1.08, grad_norm=0.82, lr=2.47e-06, throughput=1610 tok/s | |
| 2025-11-20 23:49:56,186 - INFO - Epoch 1 Step 180 (Global: 180): loss=0.0623, ppl=1.06, grad_norm=0.68, lr=2.56e-06, throughput=1611 tok/s | |
| 2025-11-20 23:54:48,005 - INFO - Epoch 1 Step 190 (Global: 190): loss=0.0610, ppl=1.06, grad_norm=0.88, lr=2.64e-06, throughput=1645 tok/s | |
| 2025-11-20 23:59:28,322 - INFO - Epoch 1 Step 200 (Global: 200): loss=0.0778, ppl=1.08, grad_norm=1.02, lr=2.73e-06, throughput=1712 tok/s | |
| 2025-11-21 00:04:21,090 - INFO - Epoch 1 Step 210 (Global: 210): loss=0.0711, ppl=1.07, grad_norm=1.00, lr=2.82e-06, throughput=1640 tok/s | |
| 2025-11-21 00:09:12,205 - INFO - Epoch 1 Step 220 (Global: 220): loss=0.0933, ppl=1.10, grad_norm=2.52, lr=2.90e-06, throughput=1649 tok/s | |
| 2025-11-21 00:13:59,919 - INFO - Epoch 1 Step 230 (Global: 230): loss=0.0714, ppl=1.07, grad_norm=1.34, lr=2.99e-06, throughput=1668 tok/s | |
| 2025-11-21 00:18:42,155 - INFO - Epoch 1 Step 240 (Global: 240): loss=0.0898, ppl=1.09, grad_norm=1.38, lr=3.07e-06, throughput=1701 tok/s | |
| 2025-11-21 00:23:32,870 - INFO - Epoch 1 Step 250 (Global: 250): loss=0.0852, ppl=1.09, grad_norm=0.98, lr=3.16e-06, throughput=1651 tok/s | |
| 2025-11-21 00:28:24,455 - INFO - Epoch 1 Step 260 (Global: 260): loss=0.0888, ppl=1.09, grad_norm=1.29, lr=3.25e-06, throughput=1646 tok/s | |
| 2025-11-21 00:33:19,712 - INFO - Epoch 1 Step 270 (Global: 270): loss=0.0883, ppl=1.09, grad_norm=0.95, lr=3.33e-06, throughput=1626 tok/s | |
| 2025-11-21 00:38:02,760 - INFO - Epoch 1 Step 280 (Global: 280): loss=0.0757, ppl=1.08, grad_norm=2.02, lr=3.42e-06, throughput=1696 tok/s | |
| 2025-11-21 00:42:56,009 - INFO - Epoch 1 Step 290 (Global: 290): loss=0.1040, ppl=1.11, grad_norm=1.85, lr=3.51e-06, throughput=1637 tok/s | |
| 2025-11-21 00:47:46,840 - INFO - Epoch 1 Step 300 (Global: 300): loss=0.0745, ppl=1.08, grad_norm=1.02, lr=3.59e-06, throughput=1650 tok/s | |
| 2025-11-21 00:52:37,858 - INFO - Epoch 1 Step 310 (Global: 310): loss=0.0846, ppl=1.09, grad_norm=0.98, lr=3.68e-06, throughput=1649 tok/s | |
| 2025-11-21 00:57:20,830 - INFO - Epoch 1 Step 320 (Global: 320): loss=0.0772, ppl=1.08, grad_norm=0.75, lr=3.77e-06, throughput=1696 tok/s | |
| 2025-11-21 01:02:12,895 - INFO - Epoch 1 Step 330 (Global: 330): loss=0.0785, ppl=1.08, grad_norm=1.18, lr=3.85e-06, throughput=1643 tok/s | |
| 2025-11-21 01:07:07,960 - INFO - Epoch 1 Step 340 (Global: 340): loss=0.0918, ppl=1.10, grad_norm=0.89, lr=3.94e-06, throughput=1627 tok/s | |
| 2025-11-21 01:11:54,495 - INFO - Epoch 1 Step 350 (Global: 350): loss=0.0628, ppl=1.06, grad_norm=1.00, lr=4.03e-06, throughput=1675 tok/s | |
| 2025-11-21 01:16:29,555 - INFO - Epoch 1 Step 360 (Global: 360): loss=0.0801, ppl=1.08, grad_norm=0.89, lr=4.11e-06, throughput=1745 tok/s | |
| 2025-11-21 01:21:18,039 - INFO - Epoch 1 Step 370 (Global: 370): loss=0.0723, ppl=1.07, grad_norm=0.92, lr=4.20e-06, throughput=1664 tok/s | |
| 2025-11-21 01:26:06,241 - INFO - Epoch 1 Step 380 (Global: 380): loss=0.0557, ppl=1.06, grad_norm=0.88, lr=4.29e-06, throughput=1666 tok/s | |
| 2025-11-21 01:30:41,027 - INFO - Epoch 1 Step 390 (Global: 390): loss=0.0877, ppl=1.09, grad_norm=1.99, lr=4.37e-06, throughput=1747 tok/s | |
| 2025-11-21 01:35:27,663 - INFO - Epoch 1 Step 400 (Global: 400): loss=0.0766, ppl=1.08, grad_norm=0.71, lr=4.46e-06, throughput=1675 tok/s | |
| 2025-11-21 01:40:13,248 - INFO - Epoch 1 Step 410 (Global: 410): loss=0.0879, ppl=1.09, grad_norm=1.55, lr=4.54e-06, throughput=1681 tok/s | |
| 2025-11-21 01:45:02,821 - INFO - Epoch 1 Step 420 (Global: 420): loss=0.0678, ppl=1.07, grad_norm=1.22, lr=4.63e-06, throughput=1658 tok/s | |
| 2025-11-21 01:49:39,594 - INFO - Epoch 1 Step 430 (Global: 430): loss=0.0784, ppl=1.08, grad_norm=1.23, lr=4.72e-06, throughput=1734 tok/s | |
| 2025-11-21 01:54:27,867 - INFO - Epoch 1 Step 440 (Global: 440): loss=0.0727, ppl=1.08, grad_norm=0.79, lr=4.80e-06, throughput=1665 tok/s | |
| 2025-11-21 01:59:16,427 - INFO - Epoch 1 Step 450 (Global: 450): loss=0.0557, ppl=1.06, grad_norm=0.70, lr=4.89e-06, throughput=1663 tok/s | |
| 2025-11-21 02:04:05,270 - INFO - Epoch 1 Step 460 (Global: 460): loss=0.0816, ppl=1.09, grad_norm=1.01, lr=4.98e-06, throughput=1662 tok/s | |
| 2025-11-21 02:08:43,350 - INFO - Epoch 1 Step 470 (Global: 470): loss=0.0619, ppl=1.06, grad_norm=0.67, lr=5.06e-06, throughput=1726 tok/s | |
| 2025-11-21 02:13:32,031 - INFO - Epoch 1 Step 480 (Global: 480): loss=0.0807, ppl=1.08, grad_norm=1.25, lr=5.15e-06, throughput=1663 tok/s | |
| 2025-11-21 02:18:21,690 - INFO - Epoch 1 Step 490 (Global: 490): loss=0.0680, ppl=1.07, grad_norm=0.93, lr=5.24e-06, throughput=1657 tok/s | |
| 2025-11-21 02:23:11,378 - INFO - Epoch 1 Step 500 (Global: 500): loss=0.0711, ppl=1.07, grad_norm=0.79, lr=5.32e-06, throughput=1657 tok/s | |
| 2025-11-21 02:27:46,472 - INFO - Epoch 1 Step 510 (Global: 510): loss=0.0669, ppl=1.07, grad_norm=1.09, lr=5.41e-06, throughput=1745 tok/s | |
| 2025-11-21 02:32:32,830 - INFO - Epoch 1 Step 520 (Global: 520): loss=0.0689, ppl=1.07, grad_norm=0.73, lr=5.50e-06, throughput=1676 tok/s | |
| 2025-11-21 02:37:18,077 - INFO - Epoch 1 Step 530 (Global: 530): loss=0.0735, ppl=1.08, grad_norm=0.63, lr=5.58e-06, throughput=1683 tok/s | |
| 2025-11-21 02:42:03,662 - INFO - Epoch 1 Step 540 (Global: 540): loss=0.0701, ppl=1.07, grad_norm=0.75, lr=5.67e-06, throughput=1681 tok/s | |
| 2025-11-21 02:46:37,603 - INFO - Epoch 1 Step 550 (Global: 550): loss=0.0717, ppl=1.07, grad_norm=1.15, lr=5.76e-06, throughput=1752 tok/s | |
| 2025-11-21 02:51:24,492 - INFO - Epoch 1 Step 560 (Global: 560): loss=0.0760, ppl=1.08, grad_norm=1.44, lr=5.84e-06, throughput=1673 tok/s | |
| 2025-11-21 02:56:11,370 - INFO - Epoch 1 Step 570 (Global: 570): loss=0.0861, ppl=1.09, grad_norm=0.82, lr=5.93e-06, throughput=1673 tok/s | |
| 2025-11-21 03:00:57,000 - INFO - Epoch 1 Step 580 (Global: 580): loss=0.0764, ppl=1.08, grad_norm=0.75, lr=6.01e-06, throughput=1681 tok/s | |
| 2025-11-21 03:05:33,083 - INFO - Epoch 1 Step 590 (Global: 590): loss=0.0759, ppl=1.08, grad_norm=1.02, lr=6.10e-06, throughput=1739 tok/s | |
| 2025-11-21 03:10:20,437 - INFO - Epoch 1 Step 600 (Global: 600): loss=0.0642, ppl=1.07, grad_norm=0.80, lr=6.19e-06, throughput=1670 tok/s | |
| 2025-11-21 03:15:11,140 - INFO - Epoch 1 Step 610 (Global: 610): loss=0.0661, ppl=1.07, grad_norm=0.94, lr=6.27e-06, throughput=1651 tok/s | |
| 2025-11-21 03:19:57,806 - INFO - Epoch 1 Step 620 (Global: 620): loss=0.0674, ppl=1.07, grad_norm=0.99, lr=6.36e-06, throughput=1674 tok/s | |
| 2025-11-21 03:24:32,706 - INFO - Epoch 1 Step 630 (Global: 630): loss=0.0681, ppl=1.07, grad_norm=0.73, lr=6.45e-06, throughput=1746 tok/s | |
| 2025-11-21 03:29:18,305 - INFO - Epoch 1 Step 640 (Global: 640): loss=0.0896, ppl=1.09, grad_norm=0.79, lr=6.53e-06, throughput=1681 tok/s | |
| 2025-11-21 03:34:04,466 - INFO - Epoch 1 Step 650 (Global: 650): loss=0.0732, ppl=1.08, grad_norm=1.10, lr=6.62e-06, throughput=1677 tok/s | |
| 2025-11-21 03:38:49,304 - INFO - Epoch 1 Step 660 (Global: 660): loss=0.0752, ppl=1.08, grad_norm=0.85, lr=6.71e-06, throughput=1685 tok/s | |
| 2025-11-21 03:43:24,904 - INFO - Epoch 1 Step 670 (Global: 670): loss=0.0756, ppl=1.08, grad_norm=0.70, lr=6.79e-06, throughput=1742 tok/s | |
| 2025-11-21 03:48:10,882 - INFO - Epoch 1 Step 680 (Global: 680): loss=0.0952, ppl=1.10, grad_norm=1.23, lr=6.88e-06, throughput=1679 tok/s | |
| 2025-11-21 03:52:56,085 - INFO - Epoch 1 Step 690 (Global: 690): loss=0.0866, ppl=1.09, grad_norm=0.88, lr=6.97e-06, throughput=1683 tok/s | |
| 2025-11-21 03:57:39,717 - INFO - Epoch 1 Step 700 (Global: 700): loss=0.0657, ppl=1.07, grad_norm=0.64, lr=7.05e-06, throughput=1692 tok/s | |
| 2025-11-21 04:02:14,655 - INFO - Epoch 1 Step 710 (Global: 710): loss=0.0798, ppl=1.08, grad_norm=0.93, lr=7.14e-06, throughput=1746 tok/s | |
| 2025-11-21 04:06:59,745 - INFO - Epoch 1 Step 720 (Global: 720): loss=0.0882, ppl=1.09, grad_norm=0.85, lr=7.22e-06, throughput=1684 tok/s | |
| 2025-11-21 04:11:46,026 - INFO - Epoch 1 Step 730 (Global: 730): loss=0.0629, ppl=1.06, grad_norm=0.75, lr=7.31e-06, throughput=1677 tok/s | |
| 2025-11-21 04:16:31,999 - INFO - Epoch 1 Step 740 (Global: 740): loss=0.0619, ppl=1.06, grad_norm=0.82, lr=7.40e-06, throughput=1678 tok/s | |
| 2025-11-21 04:21:07,328 - INFO - Epoch 1 Step 750 (Global: 750): loss=0.0760, ppl=1.08, grad_norm=0.82, lr=7.48e-06, throughput=1743 tok/s | |
| 2025-11-21 04:25:54,789 - INFO - Epoch 1 Step 760 (Global: 760): loss=0.0633, ppl=1.07, grad_norm=0.77, lr=7.57e-06, throughput=1670 tok/s | |
| 2025-11-21 04:30:41,496 - INFO - Epoch 1 Step 770 (Global: 770): loss=0.0715, ppl=1.07, grad_norm=1.08, lr=7.66e-06, throughput=1674 tok/s | |
| 2025-11-21 04:35:27,088 - INFO - Epoch 1 Step 780 (Global: 780): loss=0.0827, ppl=1.09, grad_norm=0.64, lr=7.74e-06, throughput=1681 tok/s | |
| 2025-11-21 04:40:01,047 - INFO - Epoch 1 Step 790 (Global: 790): loss=0.0876, ppl=1.09, grad_norm=0.70, lr=7.83e-06, throughput=1752 tok/s | |
| 2025-11-21 04:44:47,652 - INFO - Epoch 1 Step 800 (Global: 800): loss=0.0764, ppl=1.08, grad_norm=0.77, lr=7.92e-06, throughput=1675 tok/s | |
| 2025-11-21 04:49:32,858 - INFO - Epoch 1 Step 810 (Global: 810): loss=0.0654, ppl=1.07, grad_norm=0.74, lr=8.00e-06, throughput=1683 tok/s | |
| 2025-11-21 04:54:18,092 - INFO - Epoch 1 Step 820 (Global: 820): loss=0.0857, ppl=1.09, grad_norm=0.98, lr=8.09e-06, throughput=1683 tok/s | |
| 2025-11-21 04:58:49,142 - INFO - Epoch 1 Step 830 (Global: 830): loss=0.0793, ppl=1.08, grad_norm=0.69, lr=8.18e-06, throughput=1771 tok/s | |
| 2025-11-21 05:03:35,705 - INFO - Epoch 1 Step 840 (Global: 840): loss=0.0588, ppl=1.06, grad_norm=0.64, lr=8.26e-06, throughput=1675 tok/s | |
| 2025-11-21 05:08:20,436 - INFO - Epoch 1 Step 850 (Global: 850): loss=0.0684, ppl=1.07, grad_norm=0.73, lr=8.35e-06, throughput=1686 tok/s | |
| 2025-11-21 05:13:05,829 - INFO - Epoch 1 Step 860 (Global: 860): loss=0.0859, ppl=1.09, grad_norm=0.73, lr=8.44e-06, throughput=1682 tok/s | |
| 2025-11-21 05:17:40,621 - INFO - Epoch 1 Step 870 (Global: 870): loss=0.0778, ppl=1.08, grad_norm=1.06, lr=8.52e-06, throughput=1747 tok/s | |
| 2025-11-21 05:22:26,624 - INFO - Epoch 1 Step 880 (Global: 880): loss=0.0928, ppl=1.10, grad_norm=0.92, lr=8.61e-06, throughput=1678 tok/s | |
| 2025-11-21 05:27:12,650 - INFO - Epoch 1 Step 890 (Global: 890): loss=0.0859, ppl=1.09, grad_norm=0.71, lr=8.69e-06, throughput=1678 tok/s | |
| 2025-11-21 05:31:57,951 - INFO - Epoch 1 Step 900 (Global: 900): loss=0.0815, ppl=1.08, grad_norm=0.96, lr=8.78e-06, throughput=1682 tok/s | |
| 2025-11-21 05:36:32,164 - INFO - Epoch 1 Step 910 (Global: 910): loss=0.0836, ppl=1.09, grad_norm=0.80, lr=8.87e-06, throughput=1750 tok/s | |
| 2025-11-21 05:41:18,875 - INFO - Epoch 1 Step 920 (Global: 920): loss=0.0722, ppl=1.07, grad_norm=0.93, lr=8.95e-06, throughput=1674 tok/s | |
| 2025-11-21 05:46:06,236 - INFO - Epoch 1 Step 930 (Global: 930): loss=0.0671, ppl=1.07, grad_norm=0.77, lr=9.04e-06, throughput=1670 tok/s | |
| 2025-11-21 05:50:52,566 - INFO - Epoch 1 Step 940 (Global: 940): loss=0.0830, ppl=1.09, grad_norm=0.78, lr=9.13e-06, throughput=1676 tok/s | |
| 2025-11-21 05:55:24,742 - INFO - Epoch 1 Step 950 (Global: 950): loss=0.0628, ppl=1.06, grad_norm=0.59, lr=9.21e-06, throughput=1764 tok/s | |
| 2025-11-21 06:00:10,311 - INFO - Epoch 1 Step 960 (Global: 960): loss=0.0711, ppl=1.07, grad_norm=0.49, lr=9.30e-06, throughput=1681 tok/s | |
| 2025-11-21 06:04:55,920 - INFO - Epoch 1 Step 970 (Global: 970): loss=0.0726, ppl=1.08, grad_norm=0.76, lr=9.39e-06, throughput=1681 tok/s | |
| 2025-11-21 06:09:42,549 - INFO - Epoch 1 Step 980 (Global: 980): loss=0.0789, ppl=1.08, grad_norm=0.64, lr=9.47e-06, throughput=1675 tok/s | |
| 2025-11-21 06:14:18,453 - INFO - Epoch 1 Step 990 (Global: 990): loss=0.0798, ppl=1.08, grad_norm=0.76, lr=9.56e-06, throughput=1740 tok/s | |
| 2025-11-21 06:19:04,557 - INFO - Epoch 1 Step 1000 (Global: 1000): loss=0.0668, ppl=1.07, grad_norm=1.26, lr=9.65e-06, throughput=1678 tok/s | |
| 2025-11-21 06:23:49,992 - INFO - Epoch 1 Step 1010 (Global: 1010): loss=0.0623, ppl=1.06, grad_norm=0.57, lr=9.73e-06, throughput=1682 tok/s | |
| 2025-11-21 06:28:35,416 - INFO - Epoch 1 Step 1020 (Global: 1020): loss=0.0577, ppl=1.06, grad_norm=0.66, lr=9.82e-06, throughput=1682 tok/s | |
| 2025-11-21 06:33:08,982 - INFO - Epoch 1 Step 1030 (Global: 1030): loss=0.0615, ppl=1.06, grad_norm=0.63, lr=9.90e-06, throughput=1755 tok/s | |
| 2025-11-21 06:37:52,367 - INFO - Epoch 1 Step 1040 (Global: 1040): loss=0.0714, ppl=1.07, grad_norm=0.66, lr=9.99e-06, throughput=1694 tok/s | |
| 2025-11-21 06:42:45,709 - INFO - Epoch 1 Step 1050 (Global: 1050): loss=0.0645, ppl=1.07, grad_norm=1.21, lr=1.00e-05, throughput=1636 tok/s | |
| 2025-11-21 06:47:35,116 - INFO - Epoch 1 Step 1060 (Global: 1060): loss=0.0766, ppl=1.08, grad_norm=0.59, lr=1.00e-05, throughput=1659 tok/s | |
| 2025-11-21 06:52:08,387 - INFO - Epoch 1 Step 1070 (Global: 1070): loss=0.0857, ppl=1.09, grad_norm=0.88, lr=1.00e-05, throughput=1757 tok/s | |
| 2025-11-21 06:56:53,360 - INFO - Epoch 1 Step 1080 (Global: 1080): loss=0.0652, ppl=1.07, grad_norm=0.64, lr=1.00e-05, throughput=1684 tok/s | |
| 2025-11-21 07:01:38,915 - INFO - Epoch 1 Step 1090 (Global: 1090): loss=0.0728, ppl=1.08, grad_norm=0.94, lr=1.00e-05, throughput=1681 tok/s | |
| 2025-11-21 07:06:13,465 - INFO - Epoch 1 Step 1100 (Global: 1100): loss=0.0846, ppl=1.09, grad_norm=0.95, lr=1.00e-05, throughput=1748 tok/s | |
| 2025-11-21 07:11:00,450 - INFO - Epoch 1 Step 1110 (Global: 1110): loss=0.0801, ppl=1.08, grad_norm=0.85, lr=1.00e-05, throughput=1673 tok/s | |
| 2025-11-21 07:15:47,684 - INFO - Epoch 1 Step 1120 (Global: 1120): loss=0.0653, ppl=1.07, grad_norm=0.64, lr=1.00e-05, throughput=1671 tok/s | |
| 2025-11-21 07:20:35,314 - INFO - Epoch 1 Step 1130 (Global: 1130): loss=0.0840, ppl=1.09, grad_norm=0.78, lr=1.00e-05, throughput=1669 tok/s | |
| 2025-11-21 07:25:12,731 - INFO - Epoch 1 Step 1140 (Global: 1140): loss=0.0862, ppl=1.09, grad_norm=0.61, lr=1.00e-05, throughput=1730 tok/s | |
| 2025-11-21 07:29:59,587 - INFO - Epoch 1 Step 1150 (Global: 1150): loss=0.0550, ppl=1.06, grad_norm=0.61, lr=1.00e-05, throughput=1673 tok/s | |
| 2025-11-21 07:34:46,583 - INFO - Epoch 1 Step 1160 (Global: 1160): loss=0.0890, ppl=1.09, grad_norm=0.82, lr=1.00e-05, throughput=1673 tok/s | |
| 2025-11-21 07:39:33,625 - INFO - Epoch 1 Step 1170 (Global: 1170): loss=0.0699, ppl=1.07, grad_norm=0.76, lr=1.00e-05, throughput=1672 tok/s | |
| 2025-11-21 07:44:08,616 - INFO - Epoch 1 Step 1180 (Global: 1180): loss=0.0743, ppl=1.08, grad_norm=0.82, lr=9.99e-06, throughput=1746 tok/s | |
| 2025-11-21 07:48:54,812 - INFO - Epoch 1 Step 1190 (Global: 1190): loss=0.0715, ppl=1.07, grad_norm=0.85, lr=9.99e-06, throughput=1677 tok/s | |
| 2025-11-21 07:53:42,607 - INFO - Epoch 1 Step 1200 (Global: 1200): loss=0.0710, ppl=1.07, grad_norm=0.75, lr=9.99e-06, throughput=1668 tok/s | |
| 2025-11-21 07:58:29,378 - INFO - Epoch 1 Step 1210 (Global: 1210): loss=0.0613, ppl=1.06, grad_norm=0.69, lr=9.99e-06, throughput=1674 tok/s | |
| 2025-11-21 08:03:04,991 - INFO - Epoch 1 Step 1220 (Global: 1220): loss=0.0841, ppl=1.09, grad_norm=0.63, lr=9.99e-06, throughput=1742 tok/s | |
| 2025-11-21 08:07:50,476 - INFO - Epoch 1 Step 1230 (Global: 1230): loss=0.0628, ppl=1.06, grad_norm=0.75, lr=9.99e-06, throughput=1681 tok/s | |
| 2025-11-21 08:12:37,734 - INFO - Epoch 1 Step 1240 (Global: 1240): loss=0.0617, ppl=1.06, grad_norm=0.57, lr=9.99e-06, throughput=1671 tok/s | |
| 2025-11-21 08:17:24,729 - INFO - Epoch 1 Step 1250 (Global: 1250): loss=0.0693, ppl=1.07, grad_norm=0.61, lr=9.99e-06, throughput=1673 tok/s | |
| 2025-11-21 08:21:59,915 - INFO - Epoch 1 Step 1260 (Global: 1260): loss=0.0674, ppl=1.07, grad_norm=0.53, lr=9.99e-06, throughput=1744 tok/s | |
| 2025-11-21 08:26:45,386 - INFO - Epoch 1 Step 1270 (Global: 1270): loss=0.0646, ppl=1.07, grad_norm=0.62, lr=9.99e-06, throughput=1681 tok/s | |
| 2025-11-21 08:31:31,079 - INFO - Epoch 1 Step 1280 (Global: 1280): loss=0.0658, ppl=1.07, grad_norm=0.71, lr=9.98e-06, throughput=1680 tok/s | |
| 2025-11-21 08:36:18,021 - INFO - Epoch 1 Step 1290 (Global: 1290): loss=0.0460, ppl=1.05, grad_norm=0.47, lr=9.98e-06, throughput=1673 tok/s | |
| 2025-11-21 08:40:52,974 - INFO - Epoch 1 Step 1300 (Global: 1300): loss=0.0712, ppl=1.07, grad_norm=0.96, lr=9.98e-06, throughput=1746 tok/s | |
| 2025-11-21 08:45:39,608 - INFO - Epoch 1 Step 1310 (Global: 1310): loss=0.0625, ppl=1.06, grad_norm=0.86, lr=9.98e-06, throughput=1675 tok/s | |
| 2025-11-21 08:50:26,151 - INFO - Epoch 1 Step 1320 (Global: 1320): loss=0.0628, ppl=1.06, grad_norm=0.74, lr=9.98e-06, throughput=1675 tok/s | |
| 2025-11-21 08:55:14,710 - INFO - Epoch 1 Step 1330 (Global: 1330): loss=0.0535, ppl=1.05, grad_norm=0.69, lr=9.98e-06, throughput=1663 tok/s | |
| 2025-11-21 08:59:50,638 - INFO - Epoch 1 Step 1340 (Global: 1340): loss=0.0618, ppl=1.06, grad_norm=0.73, lr=9.97e-06, throughput=1740 tok/s | |
| 2025-11-21 09:04:36,159 - INFO - Epoch 1 Step 1350 (Global: 1350): loss=0.0549, ppl=1.06, grad_norm=0.41, lr=9.97e-06, throughput=1681 tok/s | |
| 2025-11-21 09:09:23,478 - INFO - Epoch 1 Step 1360 (Global: 1360): loss=0.0633, ppl=1.07, grad_norm=0.56, lr=9.97e-06, throughput=1671 tok/s | |
| 2025-11-21 09:14:09,544 - INFO - Epoch 1 Step 1370 (Global: 1370): loss=0.0605, ppl=1.06, grad_norm=0.68, lr=9.97e-06, throughput=1678 tok/s | |
| 2025-11-21 09:18:43,904 - INFO - Epoch 1 Step 1380 (Global: 1380): loss=0.0668, ppl=1.07, grad_norm=0.66, lr=9.97e-06, throughput=1750 tok/s | |
| 2025-11-21 09:23:31,057 - INFO - Epoch 1 Step 1390 (Global: 1390): loss=0.0629, ppl=1.06, grad_norm=0.67, lr=9.97e-06, throughput=1672 tok/s | |
| 2025-11-21 09:28:19,934 - INFO - Epoch 1 Step 1400 (Global: 1400): loss=0.0883, ppl=1.09, grad_norm=0.82, lr=9.96e-06, throughput=1662 tok/s | |
| 2025-11-21 09:33:10,494 - INFO - Epoch 1 Step 1410 (Global: 1410): loss=0.0719, ppl=1.07, grad_norm=1.18, lr=9.96e-06, throughput=1652 tok/s | |
| 2025-11-21 09:37:48,496 - INFO - Epoch 1 Step 1420 (Global: 1420): loss=0.0984, ppl=1.10, grad_norm=0.97, lr=9.96e-06, throughput=1727 tok/s | |
| 2025-11-21 09:42:35,955 - INFO - Epoch 1 Step 1430 (Global: 1430): loss=0.0761, ppl=1.08, grad_norm=0.64, lr=9.96e-06, throughput=1670 tok/s | |
| 2025-11-21 09:47:22,914 - INFO - Epoch 1 Step 1440 (Global: 1440): loss=0.0576, ppl=1.06, grad_norm=0.60, lr=9.96e-06, throughput=1673 tok/s | |
| 2025-11-21 09:52:09,334 - INFO - Epoch 1 Step 1450 (Global: 1450): loss=0.0578, ppl=1.06, grad_norm=0.89, lr=9.95e-06, throughput=1676 tok/s | |
| 2025-11-21 09:56:43,333 - INFO - Epoch 1 Step 1460 (Global: 1460): loss=0.0604, ppl=1.06, grad_norm=0.61, lr=9.95e-06, throughput=1752 tok/s | |
| 2025-11-21 10:01:29,529 - INFO - Epoch 1 Step 1470 (Global: 1470): loss=0.0857, ppl=1.09, grad_norm=0.72, lr=9.95e-06, throughput=1677 tok/s | |
| 2025-11-21 10:06:16,818 - INFO - Epoch 1 Step 1480 (Global: 1480): loss=0.0619, ppl=1.06, grad_norm=0.64, lr=9.95e-06, throughput=1671 tok/s | |
| 2025-11-21 10:11:03,344 - INFO - Epoch 1 Step 1490 (Global: 1490): loss=0.0639, ppl=1.07, grad_norm=0.68, lr=9.94e-06, throughput=1675 tok/s | |
| 2025-11-21 10:15:41,211 - INFO - Epoch 1 Step 1500 (Global: 1500): loss=0.0731, ppl=1.08, grad_norm=0.88, lr=9.94e-06, throughput=1727 tok/s | |
| 2025-11-21 10:20:29,671 - INFO - Epoch 1 Step 1510 (Global: 1510): loss=0.0517, ppl=1.05, grad_norm=0.61, lr=9.94e-06, throughput=1664 tok/s | |
| 2025-11-21 10:25:15,639 - INFO - Epoch 1 Step 1520 (Global: 1520): loss=0.0755, ppl=1.08, grad_norm=0.67, lr=9.94e-06, throughput=1679 tok/s | |
| 2025-11-21 10:30:03,523 - INFO - Epoch 1 Step 1530 (Global: 1530): loss=0.0510, ppl=1.05, grad_norm=0.73, lr=9.93e-06, throughput=1667 tok/s | |
| 2025-11-21 10:34:39,810 - INFO - Epoch 1 Step 1540 (Global: 1540): loss=0.0551, ppl=1.06, grad_norm=0.57, lr=9.93e-06, throughput=1737 tok/s | |
| 2025-11-21 10:39:25,812 - INFO - Epoch 1 Step 1550 (Global: 1550): loss=0.0585, ppl=1.06, grad_norm=0.67, lr=9.93e-06, throughput=1678 tok/s | |
| 2025-11-21 10:44:12,808 - INFO - Epoch 1 Step 1560 (Global: 1560): loss=0.0526, ppl=1.05, grad_norm=0.72, lr=9.92e-06, throughput=1673 tok/s | |
| 2025-11-21 10:48:58,995 - INFO - Epoch 1 Step 1570 (Global: 1570): loss=0.0580, ppl=1.06, grad_norm=0.70, lr=9.92e-06, throughput=1677 tok/s | |
| 2025-11-21 10:53:32,229 - INFO - Epoch 1 Step 1580 (Global: 1580): loss=0.0555, ppl=1.06, grad_norm=0.65, lr=9.92e-06, throughput=1757 tok/s | |
| 2025-11-21 10:58:18,241 - INFO - Epoch 1 Step 1590 (Global: 1590): loss=0.0612, ppl=1.06, grad_norm=0.56, lr=9.92e-06, throughput=1678 tok/s | |
| 2025-11-21 11:03:06,029 - INFO - Epoch 1 Step 1600 (Global: 1600): loss=0.0502, ppl=1.05, grad_norm=0.58, lr=9.91e-06, throughput=1668 tok/s | |
| 2025-11-21 11:07:58,358 - INFO - Epoch 1 Step 1610 (Global: 1610): loss=0.0610, ppl=1.06, grad_norm=0.59, lr=9.91e-06, throughput=1642 tok/s | |
| 2025-11-21 11:12:39,002 - INFO - Epoch 1 Step 1620 (Global: 1620): loss=0.0672, ppl=1.07, grad_norm=0.75, lr=9.91e-06, throughput=1710 tok/s | |
| 2025-11-21 11:17:28,931 - INFO - Epoch 1 Step 1630 (Global: 1630): loss=0.0554, ppl=1.06, grad_norm=0.72, lr=9.90e-06, throughput=1656 tok/s | |
| 2025-11-21 11:22:15,555 - INFO - Epoch 1 Step 1640 (Global: 1640): loss=0.0501, ppl=1.05, grad_norm=0.68, lr=9.90e-06, throughput=1675 tok/s | |
| 2025-11-21 11:26:50,429 - INFO - Epoch 1 Step 1650 (Global: 1650): loss=0.0556, ppl=1.06, grad_norm=0.75, lr=9.90e-06, throughput=1746 tok/s | |
| 2025-11-21 11:31:36,407 - INFO - Epoch 1 Step 1660 (Global: 1660): loss=0.0567, ppl=1.06, grad_norm=0.63, lr=9.89e-06, throughput=1678 tok/s | |
| 2025-11-21 11:36:22,429 - INFO - Epoch 1 Step 1670 (Global: 1670): loss=0.0440, ppl=1.05, grad_norm=0.57, lr=9.89e-06, throughput=1678 tok/s | |
| 2025-11-21 11:41:10,272 - INFO - Epoch 1 Step 1680 (Global: 1680): loss=0.0463, ppl=1.05, grad_norm=0.72, lr=9.89e-06, throughput=1668 tok/s | |
| 2025-11-21 11:45:43,949 - INFO - Epoch 1 Step 1690 (Global: 1690): loss=0.0776, ppl=1.08, grad_norm=0.72, lr=9.88e-06, throughput=1754 tok/s | |
| 2025-11-21 11:50:29,361 - INFO - Epoch 1 Step 1700 (Global: 1700): loss=0.0623, ppl=1.06, grad_norm=0.53, lr=9.88e-06, throughput=1682 tok/s | |
| 2025-11-21 11:55:14,934 - INFO - Epoch 1 Step 1710 (Global: 1710): loss=0.0547, ppl=1.06, grad_norm=0.58, lr=9.87e-06, throughput=1681 tok/s | |
| 2025-11-21 12:00:00,889 - INFO - Epoch 1 Step 1720 (Global: 1720): loss=0.0462, ppl=1.05, grad_norm=0.59, lr=9.87e-06, throughput=1679 tok/s | |
| 2025-11-21 12:04:35,983 - INFO - Epoch 1 Step 1730 (Global: 1730): loss=0.0532, ppl=1.05, grad_norm=0.74, lr=9.87e-06, throughput=1745 tok/s | |
| 2025-11-21 12:09:24,368 - INFO - Epoch 1 Step 1740 (Global: 1740): loss=0.0521, ppl=1.05, grad_norm=0.76, lr=9.86e-06, throughput=1664 tok/s | |
| 2025-11-21 12:14:13,442 - INFO - Epoch 1 Step 1750 (Global: 1750): loss=0.0640, ppl=1.07, grad_norm=0.42, lr=9.86e-06, throughput=1660 tok/s | |
| 2025-11-21 12:19:04,064 - INFO - Epoch 1 Step 1760 (Global: 1760): loss=0.0483, ppl=1.05, grad_norm=0.50, lr=9.86e-06, throughput=1652 tok/s | |
| 2025-11-21 12:23:46,443 - INFO - Epoch 1 Step 1770 (Global: 1770): loss=0.0556, ppl=1.06, grad_norm=0.73, lr=9.85e-06, throughput=1700 tok/s | |
| 2025-11-21 12:28:40,348 - INFO - Epoch 1 Step 1780 (Global: 1780): loss=0.0573, ppl=1.06, grad_norm=0.45, lr=9.85e-06, throughput=1633 tok/s | |
| 2025-11-21 12:33:30,949 - INFO - Epoch 1 Step 1790 (Global: 1790): loss=0.0500, ppl=1.05, grad_norm=0.58, lr=9.84e-06, throughput=1652 tok/s | |
| 2025-11-21 12:38:26,949 - INFO - Epoch 1 Step 1800 (Global: 1800): loss=0.0452, ppl=1.05, grad_norm=0.57, lr=9.84e-06, throughput=1622 tok/s | |
| 2025-11-21 12:43:04,015 - INFO - Epoch 1 Step 1810 (Global: 1810): loss=0.0612, ppl=1.06, grad_norm=0.52, lr=9.83e-06, throughput=1732 tok/s | |
| 2025-11-21 12:47:52,213 - INFO - Epoch 1 Step 1820 (Global: 1820): loss=0.0476, ppl=1.05, grad_norm=0.76, lr=9.83e-06, throughput=1666 tok/s | |
| 2025-11-21 12:52:44,027 - INFO - Epoch 1 Step 1830 (Global: 1830): loss=0.0619, ppl=1.06, grad_norm=0.77, lr=9.83e-06, throughput=1645 tok/s | |
| 2025-11-21 12:57:33,953 - INFO - Epoch 1 Step 1840 (Global: 1840): loss=0.0644, ppl=1.07, grad_norm=0.57, lr=9.82e-06, throughput=1656 tok/s | |
| 2025-11-21 13:02:16,233 - INFO - Epoch 1 Step 1850 (Global: 1850): loss=0.0576, ppl=1.06, grad_norm=0.61, lr=9.82e-06, throughput=1700 tok/s | |
| 2025-11-21 13:07:09,063 - INFO - Epoch 1 Step 1860 (Global: 1860): loss=0.0489, ppl=1.05, grad_norm=0.56, lr=9.81e-06, throughput=1639 tok/s | |
| 2025-11-21 13:12:00,742 - INFO - Epoch 1 Step 1870 (Global: 1870): loss=0.0687, ppl=1.07, grad_norm=0.65, lr=9.81e-06, throughput=1646 tok/s | |
| 2025-11-21 13:16:50,493 - INFO - Epoch 1 Step 1880 (Global: 1880): loss=0.0737, ppl=1.08, grad_norm=0.96, lr=9.80e-06, throughput=1657 tok/s | |
| 2025-11-21 13:21:28,534 - INFO - Epoch 1 Step 1890 (Global: 1890): loss=0.0502, ppl=1.05, grad_norm=0.38, lr=9.80e-06, throughput=1726 tok/s | |
| 2025-11-21 13:26:20,115 - INFO - Epoch 1 Step 1900 (Global: 1900): loss=0.0573, ppl=1.06, grad_norm=0.62, lr=9.79e-06, throughput=1646 tok/s | |
| 2025-11-21 13:31:12,717 - INFO - Epoch 1 Step 1910 (Global: 1910): loss=0.0535, ppl=1.05, grad_norm=0.46, lr=9.79e-06, throughput=1640 tok/s | |
| 2025-11-21 13:36:03,849 - INFO - Epoch 1 Step 1920 (Global: 1920): loss=0.0512, ppl=1.05, grad_norm=0.51, lr=9.78e-06, throughput=1649 tok/s | |
| 2025-11-21 13:40:42,595 - INFO - Epoch 1 Step 1930 (Global: 1930): loss=0.0510, ppl=1.05, grad_norm=0.48, lr=9.78e-06, throughput=1722 tok/s | |
| 2025-11-21 13:45:32,737 - INFO - Epoch 1 Step 1940 (Global: 1940): loss=0.0515, ppl=1.05, grad_norm=0.73, lr=9.77e-06, throughput=1654 tok/s | |
| 2025-11-21 13:50:23,413 - INFO - Epoch 1 Step 1950 (Global: 1950): loss=0.0636, ppl=1.07, grad_norm=0.56, lr=9.77e-06, throughput=1651 tok/s | |
| 2025-11-21 13:55:14,225 - INFO - Epoch 1 Step 1960 (Global: 1960): loss=0.0418, ppl=1.04, grad_norm=0.54, lr=9.76e-06, throughput=1651 tok/s | |
| 2025-11-21 13:59:53,965 - INFO - Epoch 1 Step 1970 (Global: 1970): loss=0.0522, ppl=1.05, grad_norm=0.57, lr=9.76e-06, throughput=1716 tok/s | |
| 2025-11-21 14:04:47,369 - INFO - Epoch 1 Step 1980 (Global: 1980): loss=0.0694, ppl=1.07, grad_norm=0.93, lr=9.75e-06, throughput=1636 tok/s | |
| 2025-11-21 14:09:41,809 - INFO - Epoch 1 Step 1990 (Global: 1990): loss=0.0741, ppl=1.08, grad_norm=0.77, lr=9.75e-06, throughput=1630 tok/s | |
| 2025-11-21 14:14:35,496 - INFO - Epoch 1 Step 2000 (Global: 2000): loss=0.0688, ppl=1.07, grad_norm=0.71, lr=9.74e-06, throughput=1634 tok/s | |
| 2025-11-21 14:14:35,496 - INFO - | |
| Running validation at step 2000... | |
| 2025-11-21 14:31:23,361 - INFO - Validation loss: 0.0558, perplexity: 1.06 | |
| 2025-11-21 14:31:23,362 - INFO - Qualitative metrics (n=5): | |
| 2025-11-21 14:31:23,362 - INFO - BLEU: 0.9330 | |
| 2025-11-21 14:31:23,362 - INFO - METEOR: 0.9476 | |
| 2025-11-21 14:31:23,362 - INFO - Edit Distance: 0.1201 | |
| 2025-11-21 14:31:23,362 - INFO - F-measure: 0.9935 | |
| 2025-11-21 14:31:23,363 - INFO - | |
| ====================================================================== | |
| 2025-11-21 14:31:23,363 - INFO - Qualitative Evaluation Samples: | |
| 2025-11-21 14:31:23,363 - INFO - ====================================================================== | |
| 2025-11-21 14:31:23,363 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-11-21 14:31:23,363 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-11-21 14:31:23,363 - INFO - Generated: ' Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s ...' | |
| 2025-11-21 14:31:23,363 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-21 14:31:23,363 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-21 14:31:23,364 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-11-21 14:31:23,364 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-11-21 14:31:23,364 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-21 14:31:23,364 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-21 14:31:23,364 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-21 14:31:23,365 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-11-21 14:31:23,365 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-11-21 14:31:23,365 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-21 14:31:23,365 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-21 14:31:23,365 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-21 14:31:23,365 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-11-21 14:31:23,365 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-11-21 14:31:23,366 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-21 14:31:23,366 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-21 14:31:23,366 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-21 14:31:23,366 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-11-21 14:31:23,366 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-11-21 14:31:23,366 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Re...' | |
| 2025-11-21 14:31:23,367 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' | |
| 2025-11-21 14:31:23,367 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-21 14:31:23,368 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510/qualitative_step_2000.jsonl | |
| 2025-11-21 14:32:04,306 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510/best_checkpoint.pt | |
| 2025-11-21 14:32:04,319 - INFO - New best validation loss: 0.0558, perplexity: 1.06 | |
| 2025-11-21 14:36:44,476 - INFO - Epoch 1 Step 2010 (Global: 2010): loss=0.0632, ppl=1.07, grad_norm=0.51, lr=9.74e-06, throughput=1713 tok/s | |
| 2025-11-21 14:41:38,444 - INFO - Epoch 1 Step 2020 (Global: 2020): loss=0.0444, ppl=1.05, grad_norm=0.50, lr=9.73e-06, throughput=1633 tok/s | |
| 2025-11-21 14:46:30,681 - INFO - Epoch 1 Step 2030 (Global: 2030): loss=0.0640, ppl=1.07, grad_norm=0.67, lr=9.73e-06, throughput=1643 tok/s | |
| 2025-11-21 14:51:20,136 - INFO - Epoch 1 Step 2040 (Global: 2040): loss=0.0575, ppl=1.06, grad_norm=0.60, lr=9.72e-06, throughput=1658 tok/s | |
| 2025-11-21 14:55:57,801 - INFO - Epoch 1 Step 2050 (Global: 2050): loss=0.0563, ppl=1.06, grad_norm=0.81, lr=9.72e-06, throughput=1729 tok/s | |
| 2025-11-21 15:00:51,215 - INFO - Epoch 1 Step 2060 (Global: 2060): loss=0.0542, ppl=1.06, grad_norm=0.59, lr=9.71e-06, throughput=1636 tok/s | |
| 2025-11-21 15:06:02,945 - INFO - Epoch 1 Step 2070 (Global: 2070): loss=0.0453, ppl=1.05, grad_norm=2.09, lr=9.71e-06, throughput=1540 tok/s | |
| 2025-11-21 15:10:52,208 - INFO - Epoch 1 Step 2080 (Global: 2080): loss=0.0543, ppl=1.06, grad_norm=0.52, lr=9.70e-06, throughput=1659 tok/s | |
| 2025-11-21 15:15:30,437 - INFO - Epoch 1 Step 2090 (Global: 2090): loss=0.0453, ppl=1.05, grad_norm=0.66, lr=9.69e-06, throughput=1725 tok/s | |
| 2025-11-21 15:20:19,943 - INFO - Epoch 1 Step 2100 (Global: 2100): loss=0.0548, ppl=1.06, grad_norm=0.55, lr=9.69e-06, throughput=1658 tok/s | |
| 2025-11-21 15:25:11,621 - INFO - Epoch 1 Step 2110 (Global: 2110): loss=0.0462, ppl=1.05, grad_norm=0.65, lr=9.68e-06, throughput=1646 tok/s | |
| 2025-11-21 15:30:04,170 - INFO - Epoch 1 Step 2120 (Global: 2120): loss=0.0573, ppl=1.06, grad_norm=0.57, lr=9.68e-06, throughput=1641 tok/s | |
| 2025-11-21 15:34:45,698 - INFO - Epoch 1 Step 2130 (Global: 2130): loss=0.0417, ppl=1.04, grad_norm=0.64, lr=9.67e-06, throughput=1705 tok/s | |
| 2025-11-21 15:39:41,536 - INFO - Epoch 1 Step 2140 (Global: 2140): loss=0.0429, ppl=1.04, grad_norm=0.41, lr=9.66e-06, throughput=1623 tok/s | |
| 2025-11-21 15:44:37,325 - INFO - Epoch 1 Step 2150 (Global: 2150): loss=0.0444, ppl=1.05, grad_norm=0.49, lr=9.66e-06, throughput=1623 tok/s | |
| 2025-11-21 15:49:34,624 - INFO - Epoch 1 Step 2160 (Global: 2160): loss=0.0626, ppl=1.06, grad_norm=0.79, lr=9.65e-06, throughput=1615 tok/s | |
| 2025-11-21 15:54:15,322 - INFO - Epoch 1 Step 2170 (Global: 2170): loss=0.0545, ppl=1.06, grad_norm=0.86, lr=9.65e-06, throughput=1710 tok/s | |
| 2025-11-21 15:59:04,086 - INFO - Epoch 1 Step 2180 (Global: 2180): loss=0.0575, ppl=1.06, grad_norm=0.57, lr=9.64e-06, throughput=1662 tok/s | |
| 2025-11-21 16:03:54,579 - INFO - Epoch 1 Step 2190 (Global: 2190): loss=0.0542, ppl=1.06, grad_norm=1.07, lr=9.63e-06, throughput=1652 tok/s | |
| 2025-11-21 16:08:48,209 - INFO - Epoch 1 Step 2200 (Global: 2200): loss=0.0472, ppl=1.05, grad_norm=0.61, lr=9.63e-06, throughput=1635 tok/s | |
| 2025-11-21 16:13:27,166 - INFO - Epoch 1 Step 2210 (Global: 2210): loss=0.0528, ppl=1.05, grad_norm=0.42, lr=9.62e-06, throughput=1721 tok/s | |
| 2025-11-21 16:18:17,469 - INFO - Epoch 1 Step 2220 (Global: 2220): loss=0.0554, ppl=1.06, grad_norm=0.68, lr=9.61e-06, throughput=1653 tok/s | |
| 2025-11-21 16:23:08,399 - INFO - Epoch 1 Step 2230 (Global: 2230): loss=0.0482, ppl=1.05, grad_norm=0.45, lr=9.61e-06, throughput=1650 tok/s | |
| 2025-11-21 16:27:48,922 - INFO - Epoch 1 Step 2240 (Global: 2240): loss=0.0630, ppl=1.06, grad_norm=0.62, lr=9.60e-06, throughput=1711 tok/s | |
| 2025-11-21 16:32:41,059 - INFO - Epoch 1 Step 2250 (Global: 2250): loss=0.0489, ppl=1.05, grad_norm=0.52, lr=9.60e-06, throughput=1643 tok/s | |
| 2025-11-21 16:37:37,124 - INFO - Epoch 1 Step 2260 (Global: 2260): loss=0.0470, ppl=1.05, grad_norm=0.54, lr=9.59e-06, throughput=1621 tok/s | |
| 2025-11-21 16:42:29,397 - INFO - Epoch 1 Step 2270 (Global: 2270): loss=0.0505, ppl=1.05, grad_norm=0.50, lr=9.58e-06, throughput=1642 tok/s | |
| 2025-11-21 16:47:11,320 - INFO - Epoch 1 Step 2280 (Global: 2280): loss=0.0536, ppl=1.06, grad_norm=0.59, lr=9.58e-06, throughput=1703 tok/s | |
| 2025-11-21 16:52:17,436 - INFO - Epoch 1 Step 2290 (Global: 2290): loss=0.0455, ppl=1.05, grad_norm=0.58, lr=9.57e-06, throughput=1568 tok/s | |
| 2025-11-21 16:57:24,548 - INFO - Epoch 1 Step 2300 (Global: 2300): loss=0.0391, ppl=1.04, grad_norm=0.68, lr=9.56e-06, throughput=1563 tok/s | |
| 2025-11-21 17:02:18,501 - INFO - Epoch 1 Step 2310 (Global: 2310): loss=0.0497, ppl=1.05, grad_norm=0.79, lr=9.55e-06, throughput=1633 tok/s | |
| 2025-11-21 17:07:04,803 - INFO - Epoch 1 Step 2320 (Global: 2320): loss=0.0464, ppl=1.05, grad_norm=0.67, lr=9.55e-06, throughput=1677 tok/s | |
| 2025-11-21 17:12:02,816 - INFO - Epoch 1 Step 2330 (Global: 2330): loss=0.0416, ppl=1.04, grad_norm=0.64, lr=9.54e-06, throughput=1611 tok/s | |
| 2025-11-21 17:17:12,101 - INFO - Epoch 1 Step 2340 (Global: 2340): loss=0.0511, ppl=1.05, grad_norm=0.52, lr=9.53e-06, throughput=1552 tok/s | |
| 2025-11-21 17:22:03,994 - INFO - Epoch 1 Step 2350 (Global: 2350): loss=0.0713, ppl=1.07, grad_norm=0.64, lr=9.53e-06, throughput=1644 tok/s | |
| 2025-11-21 17:26:43,053 - INFO - Epoch 1 Step 2360 (Global: 2360): loss=0.0593, ppl=1.06, grad_norm=0.46, lr=9.52e-06, throughput=1720 tok/s | |
| 2025-11-21 17:31:37,603 - INFO - Epoch 1 Step 2370 (Global: 2370): loss=0.0440, ppl=1.04, grad_norm=0.62, lr=9.51e-06, throughput=1630 tok/s | |
| 2025-11-21 17:36:33,135 - INFO - Epoch 1 Step 2380 (Global: 2380): loss=0.0485, ppl=1.05, grad_norm=0.51, lr=9.51e-06, throughput=1624 tok/s | |
| 2025-11-21 17:41:15,628 - INFO - Epoch 1 Step 2390 (Global: 2390): loss=0.0712, ppl=1.07, grad_norm=0.65, lr=9.50e-06, throughput=1699 tok/s | |
| 2025-11-21 17:45:49,332 - INFO - Epoch 1 Step 2400 (Global: 2400): loss=0.0450, ppl=1.05, grad_norm=0.66, lr=9.49e-06, throughput=1754 tok/s | |
| 2025-11-21 17:50:30,495 - INFO - Epoch 1 Step 2410 (Global: 2410): loss=0.0465, ppl=1.05, grad_norm=0.71, lr=9.48e-06, throughput=1707 tok/s | |
| 2025-11-21 17:55:15,691 - INFO - Epoch 1 Step 2420 (Global: 2420): loss=0.0397, ppl=1.04, grad_norm=0.55, lr=9.48e-06, throughput=1683 tok/s | |
| 2025-11-21 17:59:53,952 - INFO - Epoch 1 Step 2430 (Global: 2430): loss=0.0456, ppl=1.05, grad_norm=0.71, lr=9.47e-06, throughput=1725 tok/s | |
| 2025-11-21 18:04:25,888 - INFO - Epoch 1 Step 2440 (Global: 2440): loss=0.0514, ppl=1.05, grad_norm=0.68, lr=9.46e-06, throughput=1765 tok/s | |
| 2025-11-21 18:09:05,823 - INFO - Epoch 1 Step 2450 (Global: 2450): loss=0.0666, ppl=1.07, grad_norm=0.62, lr=9.45e-06, throughput=1715 tok/s | |
| 2025-11-21 18:13:44,815 - INFO - Epoch 1 Step 2460 (Global: 2460): loss=0.0727, ppl=1.08, grad_norm=0.61, lr=9.45e-06, throughput=1720 tok/s | |
| 2025-11-21 18:18:24,001 - INFO - Epoch 1 Step 2470 (Global: 2470): loss=0.0413, ppl=1.04, grad_norm=0.65, lr=9.44e-06, throughput=1719 tok/s | |
| 2025-11-21 18:22:52,956 - INFO - Epoch 1 Step 2480 (Global: 2480): loss=0.0424, ppl=1.04, grad_norm=0.48, lr=9.43e-06, throughput=1785 tok/s | |
| 2025-11-21 18:27:31,978 - INFO - Epoch 1 Step 2490 (Global: 2490): loss=0.0586, ppl=1.06, grad_norm=0.68, lr=9.42e-06, throughput=1720 tok/s | |
| 2025-11-21 18:32:12,052 - INFO - Epoch 1 Step 2500 (Global: 2500): loss=0.0469, ppl=1.05, grad_norm=1.01, lr=9.41e-06, throughput=1714 tok/s | |
| 2025-11-21 18:36:42,980 - INFO - Epoch 1 Step 2510 (Global: 2510): loss=0.0461, ppl=1.05, grad_norm=0.63, lr=9.41e-06, throughput=1772 tok/s | |
| 2025-11-21 18:41:22,540 - INFO - Epoch 1 Step 2520 (Global: 2520): loss=0.0466, ppl=1.05, grad_norm=0.46, lr=9.40e-06, throughput=1717 tok/s | |
| 2025-11-21 18:45:58,643 - INFO - Epoch 1 Step 2530 (Global: 2530): loss=0.0485, ppl=1.05, grad_norm=0.52, lr=9.39e-06, throughput=1738 tok/s | |
| 2025-11-21 18:50:33,568 - INFO - Epoch 1 Step 2540 (Global: 2540): loss=0.0364, ppl=1.04, grad_norm=0.77, lr=9.38e-06, throughput=1746 tok/s | |
| 2025-11-21 18:55:04,970 - INFO - Epoch 1 Step 2550 (Global: 2550): loss=0.0495, ppl=1.05, grad_norm=0.57, lr=9.37e-06, throughput=1769 tok/s | |
| 2025-11-21 18:59:41,699 - INFO - Epoch 1 Step 2560 (Global: 2560): loss=0.0685, ppl=1.07, grad_norm=0.61, lr=9.37e-06, throughput=1735 tok/s | |
| 2025-11-21 19:04:17,623 - INFO - Epoch 1 Step 2570 (Global: 2570): loss=0.0582, ppl=1.06, grad_norm=0.53, lr=9.36e-06, throughput=1740 tok/s | |
| 2025-11-21 19:08:54,912 - INFO - Epoch 1 Step 2580 (Global: 2580): loss=0.0633, ppl=1.07, grad_norm=0.68, lr=9.35e-06, throughput=1731 tok/s | |
| 2025-11-21 19:13:26,354 - INFO - Epoch 1 Step 2590 (Global: 2590): loss=0.0680, ppl=1.07, grad_norm=0.68, lr=9.34e-06, throughput=1768 tok/s | |
| 2025-11-21 19:18:07,269 - INFO - Epoch 1 Step 2600 (Global: 2600): loss=0.0719, ppl=1.07, grad_norm=0.49, lr=9.33e-06, throughput=1709 tok/s | |
| 2025-11-21 19:22:43,640 - INFO - Epoch 1 Step 2610 (Global: 2610): loss=0.0436, ppl=1.04, grad_norm=0.60, lr=9.32e-06, throughput=1737 tok/s | |
| 2025-11-21 19:27:20,313 - INFO - Epoch 1 Step 2620 (Global: 2620): loss=0.0440, ppl=1.05, grad_norm=0.70, lr=9.32e-06, throughput=1735 tok/s | |
| 2025-11-21 19:31:47,135 - INFO - Epoch 1 Step 2630 (Global: 2630): loss=0.0477, ppl=1.05, grad_norm=0.64, lr=9.31e-06, throughput=1799 tok/s | |
| 2025-11-21 19:36:23,311 - INFO - Epoch 1 Step 2640 (Global: 2640): loss=0.0437, ppl=1.04, grad_norm=0.61, lr=9.30e-06, throughput=1738 tok/s | |
| 2025-11-21 19:40:58,043 - INFO - Epoch 1 Step 2650 (Global: 2650): loss=0.0369, ppl=1.04, grad_norm=0.62, lr=9.29e-06, throughput=1747 tok/s | |
| 2025-11-21 19:45:32,608 - INFO - Epoch 1 Step 2660 (Global: 2660): loss=0.0652, ppl=1.07, grad_norm=0.50, lr=9.28e-06, throughput=1748 tok/s | |
| 2025-11-21 19:49:59,756 - INFO - Epoch 1 Step 2670 (Global: 2670): loss=0.0430, ppl=1.04, grad_norm=0.53, lr=9.27e-06, throughput=1797 tok/s | |
| 2025-11-21 19:54:34,755 - INFO - Epoch 1 Step 2680 (Global: 2680): loss=0.0436, ppl=1.04, grad_norm=0.62, lr=9.26e-06, throughput=1745 tok/s | |
| 2025-11-21 19:59:08,385 - INFO - Epoch 1 Step 2690 (Global: 2690): loss=0.1075, ppl=1.11, grad_norm=0.79, lr=9.26e-06, throughput=1754 tok/s | |
| 2025-11-21 20:03:40,800 - INFO - Epoch 1 Step 2700 (Global: 2700): loss=0.0450, ppl=1.05, grad_norm=0.51, lr=9.25e-06, throughput=1762 tok/s | |
| 2025-11-21 20:08:09,437 - INFO - Epoch 1 Step 2710 (Global: 2710): loss=0.0506, ppl=1.05, grad_norm=0.66, lr=9.24e-06, throughput=1787 tok/s | |
| 2025-11-21 20:12:49,204 - INFO - Epoch 1 Step 2720 (Global: 2720): loss=0.0389, ppl=1.04, grad_norm=0.46, lr=9.23e-06, throughput=1716 tok/s | |
| 2025-11-21 20:17:29,953 - INFO - Epoch 1 Step 2730 (Global: 2730): loss=0.0610, ppl=1.06, grad_norm=0.64, lr=9.22e-06, throughput=1710 tok/s | |
| 2025-11-21 20:22:11,409 - INFO - Epoch 1 Step 2740 (Global: 2740): loss=0.0449, ppl=1.05, grad_norm=0.38, lr=9.21e-06, throughput=1705 tok/s | |
| 2025-11-21 20:26:44,559 - INFO - Epoch 1 Step 2750 (Global: 2750): loss=0.0477, ppl=1.05, grad_norm=0.42, lr=9.20e-06, throughput=1757 tok/s | |
| 2025-11-21 20:31:24,487 - INFO - Epoch 1 Step 2760 (Global: 2760): loss=0.0371, ppl=1.04, grad_norm=0.40, lr=9.19e-06, throughput=1715 tok/s | |
| 2025-11-21 20:36:01,432 - INFO - Epoch 1 Step 2770 (Global: 2770): loss=0.0452, ppl=1.05, grad_norm=0.57, lr=9.18e-06, throughput=1733 tok/s | |
| 2025-11-21 20:40:30,180 - INFO - Epoch 1 Step 2780 (Global: 2780): loss=0.0462, ppl=1.05, grad_norm=0.43, lr=9.17e-06, throughput=1786 tok/s | |
| 2025-11-21 20:45:03,363 - INFO - Epoch 1 Step 2790 (Global: 2790): loss=0.0438, ppl=1.04, grad_norm=0.58, lr=9.17e-06, throughput=1757 tok/s | |
| 2025-11-21 20:49:38,320 - INFO - Epoch 1 Step 2800 (Global: 2800): loss=0.0487, ppl=1.05, grad_norm=0.50, lr=9.16e-06, throughput=1746 tok/s | |
| 2025-11-21 20:54:12,141 - INFO - Epoch 1 Step 2810 (Global: 2810): loss=0.0422, ppl=1.04, grad_norm=0.59, lr=9.15e-06, throughput=1753 tok/s | |
| 2025-11-21 20:58:36,987 - INFO - Epoch 1 Step 2820 (Global: 2820): loss=0.0473, ppl=1.05, grad_norm=0.45, lr=9.14e-06, throughput=1812 tok/s | |
| 2025-11-21 21:03:10,179 - INFO - Epoch 1 Step 2830 (Global: 2830): loss=0.0450, ppl=1.05, grad_norm=0.98, lr=9.13e-06, throughput=1757 tok/s | |
| 2025-11-21 21:07:42,461 - INFO - Epoch 1 Step 2840 (Global: 2840): loss=0.0400, ppl=1.04, grad_norm=0.40, lr=9.12e-06, throughput=1763 tok/s | |
| 2025-11-21 21:12:16,327 - INFO - Epoch 1 Step 2850 (Global: 2850): loss=0.0486, ppl=1.05, grad_norm=0.53, lr=9.11e-06, throughput=1753 tok/s | |
| 2025-11-21 21:16:43,047 - INFO - Epoch 1 Step 2860 (Global: 2860): loss=0.0367, ppl=1.04, grad_norm=0.44, lr=9.10e-06, throughput=1800 tok/s | |
| 2025-11-21 21:21:17,760 - INFO - Epoch 1 Step 2870 (Global: 2870): loss=0.0368, ppl=1.04, grad_norm=0.55, lr=9.09e-06, throughput=1747 tok/s | |
| 2025-11-21 21:25:55,734 - INFO - Epoch 1 Step 2880 (Global: 2880): loss=0.0486, ppl=1.05, grad_norm=0.73, lr=9.08e-06, throughput=1727 tok/s | |
| 2025-11-21 21:30:40,067 - INFO - Epoch 1 Step 2890 (Global: 2890): loss=0.0345, ppl=1.04, grad_norm=0.51, lr=9.07e-06, throughput=1688 tok/s | |
| 2025-11-21 21:35:04,390 - INFO - Epoch 1 Step 2900 (Global: 2900): loss=0.0527, ppl=1.05, grad_norm=0.55, lr=9.06e-06, throughput=1816 tok/s | |
| 2025-11-21 21:39:35,722 - INFO - Epoch 1 Step 2910 (Global: 2910): loss=0.0556, ppl=1.06, grad_norm=0.73, lr=9.05e-06, throughput=1769 tok/s | |
| 2025-11-21 21:44:13,445 - INFO - Epoch 1 Step 2920 (Global: 2920): loss=0.0521, ppl=1.05, grad_norm=0.60, lr=9.04e-06, throughput=1728 tok/s | |
| 2025-11-21 21:48:57,943 - INFO - Epoch 1 Step 2930 (Global: 2930): loss=0.0471, ppl=1.05, grad_norm=0.51, lr=9.03e-06, throughput=1687 tok/s | |
| 2025-11-21 21:53:30,603 - INFO - Epoch 1 Step 2940 (Global: 2940): loss=0.0486, ppl=1.05, grad_norm=0.46, lr=9.02e-06, throughput=1760 tok/s | |
| 2025-11-21 21:58:14,076 - INFO - Epoch 1 Step 2950 (Global: 2950): loss=0.0279, ppl=1.03, grad_norm=0.32, lr=9.01e-06, throughput=1693 tok/s | |
| 2025-11-21 22:02:51,449 - INFO - Epoch 1 Step 2960 (Global: 2960): loss=0.0351, ppl=1.04, grad_norm=0.47, lr=9.00e-06, throughput=1731 tok/s | |
| 2025-11-21 22:07:28,684 - INFO - Epoch 1 Step 2970 (Global: 2970): loss=0.0523, ppl=1.05, grad_norm=0.57, lr=8.99e-06, throughput=1731 tok/s | |
| 2025-11-21 22:11:57,132 - INFO - Epoch 1 Step 2980 (Global: 2980): loss=0.0482, ppl=1.05, grad_norm=0.39, lr=8.98e-06, throughput=1788 tok/s | |
| 2025-11-21 22:16:42,783 - INFO - Epoch 1 Step 2990 (Global: 2990): loss=0.0479, ppl=1.05, grad_norm=0.53, lr=8.97e-06, throughput=1680 tok/s | |
| 2025-11-21 22:21:42,765 - INFO - Epoch 1 Step 3000 (Global: 3000): loss=0.0617, ppl=1.06, grad_norm=0.65, lr=8.96e-06, throughput=1600 tok/s | |
| 2025-11-21 22:26:36,667 - INFO - Epoch 1 Step 3010 (Global: 3010): loss=0.0397, ppl=1.04, grad_norm=0.52, lr=8.95e-06, throughput=1633 tok/s | |
| 2025-11-21 22:31:17,872 - INFO - Epoch 1 Step 3020 (Global: 3020): loss=0.0493, ppl=1.05, grad_norm=0.54, lr=8.94e-06, throughput=1707 tok/s | |
| 2025-11-21 22:36:07,096 - INFO - Epoch 1 Step 3030 (Global: 3030): loss=0.0385, ppl=1.04, grad_norm=0.50, lr=8.93e-06, throughput=1660 tok/s | |
| 2025-11-21 22:40:55,356 - INFO - Epoch 1 Step 3040 (Global: 3040): loss=0.0466, ppl=1.05, grad_norm=0.77, lr=8.92e-06, throughput=1665 tok/s | |
| 2025-11-21 22:45:34,190 - INFO - Epoch 1 Step 3050 (Global: 3050): loss=0.0540, ppl=1.06, grad_norm=0.63, lr=8.91e-06, throughput=1721 tok/s | |
| 2025-11-21 22:50:28,871 - INFO - Epoch 1 Step 3060 (Global: 3060): loss=0.0428, ppl=1.04, grad_norm=0.54, lr=8.90e-06, throughput=1629 tok/s | |
| 2025-11-21 22:55:20,877 - INFO - Epoch 1 Step 3070 (Global: 3070): loss=0.0468, ppl=1.05, grad_norm=0.64, lr=8.89e-06, throughput=1644 tok/s | |
| 2025-11-21 23:00:11,587 - INFO - Epoch 1 Step 3080 (Global: 3080): loss=0.0434, ppl=1.04, grad_norm=0.49, lr=8.88e-06, throughput=1651 tok/s | |
| 2025-11-21 23:04:51,597 - INFO - Epoch 1 Step 3090 (Global: 3090): loss=0.0474, ppl=1.05, grad_norm=0.43, lr=8.87e-06, throughput=1714 tok/s | |
| 2025-11-21 23:09:46,116 - INFO - Epoch 1 Step 3100 (Global: 3100): loss=0.0488, ppl=1.05, grad_norm=0.83, lr=8.86e-06, throughput=1630 tok/s | |
| 2025-11-21 23:14:40,568 - INFO - Epoch 1 Step 3110 (Global: 3110): loss=0.0303, ppl=1.03, grad_norm=0.39, lr=8.85e-06, throughput=1630 tok/s | |
| 2025-11-21 23:19:35,084 - INFO - Epoch 1 Step 3120 (Global: 3120): loss=0.0383, ppl=1.04, grad_norm=0.49, lr=8.84e-06, throughput=1630 tok/s | |
| 2025-11-21 23:24:13,538 - INFO - Epoch 1 Step 3130 (Global: 3130): loss=0.0584, ppl=1.06, grad_norm=0.68, lr=8.82e-06, throughput=1724 tok/s | |
| 2025-11-21 23:29:03,173 - INFO - Epoch 1 Step 3140 (Global: 3140): loss=0.0423, ppl=1.04, grad_norm=0.40, lr=8.81e-06, throughput=1657 tok/s | |
| 2025-11-21 23:33:54,459 - INFO - Epoch 1 Step 3150 (Global: 3150): loss=0.0577, ppl=1.06, grad_norm=0.45, lr=8.80e-06, throughput=1648 tok/s | |
| 2025-11-21 23:38:47,297 - INFO - Epoch 1 Step 3160 (Global: 3160): loss=0.0559, ppl=1.06, grad_norm=0.46, lr=8.79e-06, throughput=1639 tok/s | |
| 2025-11-21 23:43:28,249 - INFO - Epoch 1 Step 3170 (Global: 3170): loss=0.0382, ppl=1.04, grad_norm=0.52, lr=8.78e-06, throughput=1708 tok/s | |
| 2025-11-21 23:48:18,848 - INFO - Epoch 1 Step 3180 (Global: 3180): loss=0.0373, ppl=1.04, grad_norm=0.40, lr=8.77e-06, throughput=1652 tok/s | |
| 2025-11-21 23:53:12,589 - INFO - Epoch 1 Step 3190 (Global: 3190): loss=0.0422, ppl=1.04, grad_norm=0.66, lr=8.76e-06, throughput=1634 tok/s | |
| 2025-11-21 23:58:04,716 - INFO - Epoch 1 Step 3200 (Global: 3200): loss=0.0340, ppl=1.03, grad_norm=0.37, lr=8.75e-06, throughput=1643 tok/s | |
| 2025-11-22 00:02:48,327 - INFO - Epoch 1 Step 3210 (Global: 3210): loss=0.0374, ppl=1.04, grad_norm=0.49, lr=8.74e-06, throughput=1692 tok/s | |
| 2025-11-22 00:07:43,616 - INFO - Epoch 1 Step 3220 (Global: 3220): loss=0.0411, ppl=1.04, grad_norm=0.60, lr=8.73e-06, throughput=1626 tok/s | |
| 2025-11-22 00:12:35,567 - INFO - Epoch 1 Step 3230 (Global: 3230): loss=0.0530, ppl=1.05, grad_norm=0.82, lr=8.71e-06, throughput=1644 tok/s | |
| 2025-11-22 00:17:14,438 - INFO - Epoch 1 Step 3240 (Global: 3240): loss=0.0565, ppl=1.06, grad_norm=0.46, lr=8.70e-06, throughput=1721 tok/s | |
| 2025-11-22 00:22:05,118 - INFO - Epoch 1 Step 3250 (Global: 3250): loss=0.0391, ppl=1.04, grad_norm=0.47, lr=8.69e-06, throughput=1651 tok/s | |
| 2025-11-22 00:26:54,523 - INFO - Epoch 1 Step 3260 (Global: 3260): loss=0.0422, ppl=1.04, grad_norm=0.60, lr=8.68e-06, throughput=1659 tok/s | |
| 2025-11-22 00:31:43,640 - INFO - Epoch 1 Step 3270 (Global: 3270): loss=0.0388, ppl=1.04, grad_norm=0.52, lr=8.67e-06, throughput=1660 tok/s | |
| 2025-11-22 00:36:19,324 - INFO - Epoch 1 Step 3280 (Global: 3280): loss=0.0458, ppl=1.05, grad_norm=0.49, lr=8.66e-06, throughput=1741 tok/s | |
| 2025-11-22 00:41:06,212 - INFO - Epoch 1 Step 3290 (Global: 3290): loss=0.0451, ppl=1.05, grad_norm=0.52, lr=8.65e-06, throughput=1673 tok/s | |
| 2025-11-22 00:45:53,675 - INFO - Epoch 1 Step 3300 (Global: 3300): loss=0.0399, ppl=1.04, grad_norm=0.32, lr=8.63e-06, throughput=1670 tok/s | |
| 2025-11-22 00:50:46,339 - INFO - Epoch 1 Step 3310 (Global: 3310): loss=0.0366, ppl=1.04, grad_norm=0.55, lr=8.62e-06, throughput=1640 tok/s | |
| 2025-11-22 00:55:28,643 - INFO - Epoch 1 Step 3320 (Global: 3320): loss=0.0378, ppl=1.04, grad_norm=0.34, lr=8.61e-06, throughput=1700 tok/s | |
| 2025-11-22 01:00:23,312 - INFO - Epoch 1 Step 3330 (Global: 3330): loss=0.0381, ppl=1.04, grad_norm=0.54, lr=8.60e-06, throughput=1629 tok/s | |
| 2025-11-22 01:05:17,235 - INFO - Epoch 1 Step 3340 (Global: 3340): loss=0.0450, ppl=1.05, grad_norm=0.62, lr=8.59e-06, throughput=1633 tok/s | |
| 2025-11-22 01:10:07,090 - INFO - Epoch 1 Step 3350 (Global: 3350): loss=0.0436, ppl=1.04, grad_norm=0.71, lr=8.58e-06, throughput=1656 tok/s | |
| 2025-11-22 01:14:41,895 - INFO - Epoch 1 Step 3360 (Global: 3360): loss=0.0344, ppl=1.03, grad_norm=0.62, lr=8.57e-06, throughput=1747 tok/s | |
| 2025-11-22 01:19:30,916 - INFO - Epoch 1 Step 3370 (Global: 3370): loss=0.0357, ppl=1.04, grad_norm=0.33, lr=8.55e-06, throughput=1661 tok/s | |
| 2025-11-22 01:24:19,195 - INFO - Epoch 1 Step 3380 (Global: 3380): loss=0.0397, ppl=1.04, grad_norm=0.54, lr=8.54e-06, throughput=1665 tok/s | |
| 2025-11-22 01:29:07,083 - INFO - Epoch 1 Step 3390 (Global: 3390): loss=0.0317, ppl=1.03, grad_norm=0.31, lr=8.53e-06, throughput=1667 tok/s | |
| 2025-11-22 01:33:45,346 - INFO - Epoch 1 Step 3400 (Global: 3400): loss=0.0596, ppl=1.06, grad_norm=0.68, lr=8.52e-06, throughput=1725 tok/s | |
| 2025-11-22 01:38:33,874 - INFO - Epoch 1 Step 3410 (Global: 3410): loss=0.0441, ppl=1.05, grad_norm=0.55, lr=8.51e-06, throughput=1664 tok/s | |
| 2025-11-22 01:43:22,389 - INFO - Epoch 1 Step 3420 (Global: 3420): loss=0.0407, ppl=1.04, grad_norm=0.40, lr=8.49e-06, throughput=1664 tok/s | |
| 2025-11-22 01:48:12,368 - INFO - Epoch 1 Step 3430 (Global: 3430): loss=0.0531, ppl=1.05, grad_norm=0.50, lr=8.48e-06, throughput=1655 tok/s | |
| 2025-11-22 01:52:50,102 - INFO - Epoch 1 Step 3440 (Global: 3440): loss=0.0457, ppl=1.05, grad_norm=0.59, lr=8.47e-06, throughput=1728 tok/s | |
| 2025-11-22 01:57:39,641 - INFO - Epoch 1 Step 3450 (Global: 3450): loss=0.0382, ppl=1.04, grad_norm=0.65, lr=8.46e-06, throughput=1658 tok/s | |
| 2025-11-22 02:02:28,638 - INFO - Epoch 1 Step 3460 (Global: 3460): loss=0.0439, ppl=1.04, grad_norm=0.53, lr=8.45e-06, throughput=1661 tok/s | |
| 2025-11-22 02:07:20,818 - INFO - Epoch 1 Step 3470 (Global: 3470): loss=0.0325, ppl=1.03, grad_norm=0.37, lr=8.43e-06, throughput=1643 tok/s | |
| 2025-11-22 02:11:58,447 - INFO - Epoch 1 Step 3480 (Global: 3480): loss=0.0387, ppl=1.04, grad_norm=0.37, lr=8.42e-06, throughput=1729 tok/s | |
| 2025-11-22 02:16:45,343 - INFO - Epoch 1 Step 3490 (Global: 3490): loss=0.0439, ppl=1.04, grad_norm=0.59, lr=8.41e-06, throughput=1673 tok/s | |
| 2025-11-22 02:21:36,099 - INFO - Epoch 1 Step 3500 (Global: 3500): loss=0.0344, ppl=1.04, grad_norm=0.41, lr=8.40e-06, throughput=1651 tok/s | |
| 2025-11-22 02:26:12,802 - INFO - Epoch 1 Step 3510 (Global: 3510): loss=0.0470, ppl=1.05, grad_norm=0.40, lr=8.38e-06, throughput=1735 tok/s | |
| 2025-11-22 02:31:04,064 - INFO - Epoch 1 Step 3520 (Global: 3520): loss=0.0423, ppl=1.04, grad_norm=0.57, lr=8.37e-06, throughput=1648 tok/s | |
| 2025-11-22 02:35:55,036 - INFO - Epoch 1 Step 3530 (Global: 3530): loss=0.0381, ppl=1.04, grad_norm=0.70, lr=8.36e-06, throughput=1650 tok/s | |
| 2025-11-22 02:40:46,400 - INFO - Epoch 1 Step 3540 (Global: 3540): loss=0.0483, ppl=1.05, grad_norm=0.46, lr=8.35e-06, throughput=1647 tok/s | |
| 2025-11-22 02:45:25,367 - INFO - Epoch 1 Step 3550 (Global: 3550): loss=0.0358, ppl=1.04, grad_norm=0.49, lr=8.33e-06, throughput=1721 tok/s | |
| 2025-11-22 02:50:16,673 - INFO - Epoch 1 Step 3560 (Global: 3560): loss=0.0382, ppl=1.04, grad_norm=0.46, lr=8.32e-06, throughput=1648 tok/s | |
| 2025-11-22 02:55:08,548 - INFO - Epoch 1 Step 3570 (Global: 3570): loss=0.0317, ppl=1.03, grad_norm=0.52, lr=8.31e-06, throughput=1645 tok/s | |
| 2025-11-22 03:00:00,160 - INFO - Epoch 1 Step 3580 (Global: 3580): loss=0.0460, ppl=1.05, grad_norm=0.64, lr=8.30e-06, throughput=1646 tok/s | |
| 2025-11-22 03:04:39,697 - INFO - Epoch 1 Step 3590 (Global: 3590): loss=0.0580, ppl=1.06, grad_norm=0.46, lr=8.28e-06, throughput=1717 tok/s | |
| 2025-11-22 03:09:35,847 - INFO - Epoch 1 Step 3600 (Global: 3600): loss=0.0487, ppl=1.05, grad_norm=1.01, lr=8.27e-06, throughput=1621 tok/s | |
| 2025-11-22 03:14:29,023 - INFO - Epoch 1 Step 3610 (Global: 3610): loss=0.0497, ppl=1.05, grad_norm=0.54, lr=8.26e-06, throughput=1637 tok/s | |
| 2025-11-22 03:19:20,858 - INFO - Epoch 1 Step 3620 (Global: 3620): loss=0.0497, ppl=1.05, grad_norm=0.89, lr=8.25e-06, throughput=1645 tok/s | |
| 2025-11-22 03:24:00,768 - INFO - Epoch 1 Step 3630 (Global: 3630): loss=0.0407, ppl=1.04, grad_norm=1.09, lr=8.23e-06, throughput=1715 tok/s | |
| 2025-11-22 03:28:53,013 - INFO - Epoch 1 Step 3640 (Global: 3640): loss=0.0523, ppl=1.05, grad_norm=0.94, lr=8.22e-06, throughput=1642 tok/s | |
| 2025-11-22 03:33:42,146 - INFO - Epoch 1 Step 3650 (Global: 3650): loss=0.0350, ppl=1.04, grad_norm=0.44, lr=8.21e-06, throughput=1660 tok/s | |
| 2025-11-22 03:38:35,896 - INFO - Epoch 1 Step 3660 (Global: 3660): loss=0.0378, ppl=1.04, grad_norm=0.66, lr=8.20e-06, throughput=1634 tok/s | |
| 2025-11-22 03:43:16,665 - INFO - Epoch 1 Step 3670 (Global: 3670): loss=0.0360, ppl=1.04, grad_norm=0.65, lr=8.18e-06, throughput=1710 tok/s | |
| 2025-11-22 03:48:07,181 - INFO - Epoch 1 Step 3680 (Global: 3680): loss=0.0334, ppl=1.03, grad_norm=0.38, lr=8.17e-06, throughput=1652 tok/s | |
| 2025-11-22 03:52:58,107 - INFO - Epoch 1 Step 3690 (Global: 3690): loss=0.0456, ppl=1.05, grad_norm=0.80, lr=8.16e-06, throughput=1650 tok/s | |
| 2025-11-22 03:57:50,277 - INFO - Epoch 1 Step 3700 (Global: 3700): loss=0.0431, ppl=1.04, grad_norm=0.66, lr=8.14e-06, throughput=1643 tok/s | |
| 2025-11-22 04:02:28,782 - INFO - Epoch 1 Step 3710 (Global: 3710): loss=0.0412, ppl=1.04, grad_norm=0.50, lr=8.13e-06, throughput=1724 tok/s | |
| 2025-11-22 04:07:18,572 - INFO - Epoch 1 Step 3720 (Global: 3720): loss=0.0365, ppl=1.04, grad_norm=0.47, lr=8.12e-06, throughput=1656 tok/s | |
| 2025-11-22 04:12:10,471 - INFO - Epoch 1 Step 3730 (Global: 3730): loss=0.0452, ppl=1.05, grad_norm=0.75, lr=8.10e-06, throughput=1644 tok/s | |
| 2025-11-22 04:16:52,165 - INFO - Epoch 1 Step 3740 (Global: 3740): loss=0.0439, ppl=1.04, grad_norm=0.42, lr=8.09e-06, throughput=1704 tok/s | |
| 2025-11-22 04:21:43,124 - INFO - Epoch 1 Step 3750 (Global: 3750): loss=0.0426, ppl=1.04, grad_norm=0.38, lr=8.08e-06, throughput=1650 tok/s | |
| 2025-11-22 04:26:35,267 - INFO - Epoch 1 Step 3760 (Global: 3760): loss=0.0327, ppl=1.03, grad_norm=0.35, lr=8.06e-06, throughput=1643 tok/s | |
| 2025-11-22 04:31:29,193 - INFO - Epoch 1 Step 3770 (Global: 3770): loss=0.0326, ppl=1.03, grad_norm=0.64, lr=8.05e-06, throughput=1633 tok/s | |
| 2025-11-22 04:36:07,786 - INFO - Epoch 1 Step 3780 (Global: 3780): loss=0.0401, ppl=1.04, grad_norm=0.54, lr=8.04e-06, throughput=1723 tok/s | |
| 2025-11-22 04:40:57,354 - INFO - Epoch 1 Step 3790 (Global: 3790): loss=0.0445, ppl=1.05, grad_norm=0.53, lr=8.02e-06, throughput=1658 tok/s | |
| 2025-11-22 04:45:51,277 - INFO - Epoch 1 Step 3800 (Global: 3800): loss=0.0362, ppl=1.04, grad_norm=0.50, lr=8.01e-06, throughput=1633 tok/s | |
| 2025-11-22 04:50:42,956 - INFO - Epoch 1 Step 3810 (Global: 3810): loss=0.0338, ppl=1.03, grad_norm=0.48, lr=8.00e-06, throughput=1646 tok/s | |
| 2025-11-22 04:55:24,224 - INFO - Epoch 1 Step 3820 (Global: 3820): loss=0.0461, ppl=1.05, grad_norm=0.43, lr=7.98e-06, throughput=1707 tok/s | |
| 2025-11-22 05:00:14,213 - INFO - Epoch 1 Step 3830 (Global: 3830): loss=0.0441, ppl=1.05, grad_norm=0.50, lr=7.97e-06, throughput=1655 tok/s | |
| 2025-11-22 05:05:09,118 - INFO - Epoch 1 Step 3840 (Global: 3840): loss=0.0358, ppl=1.04, grad_norm=0.66, lr=7.96e-06, throughput=1628 tok/s | |
| 2025-11-22 05:10:02,489 - INFO - Epoch 1 Step 3850 (Global: 3850): loss=0.0625, ppl=1.06, grad_norm=0.68, lr=7.94e-06, throughput=1636 tok/s | |
| 2025-11-22 05:14:42,971 - INFO - Epoch 1 Step 3860 (Global: 3860): loss=0.0346, ppl=1.04, grad_norm=0.43, lr=7.93e-06, throughput=1711 tok/s | |
| 2025-11-22 05:19:37,781 - INFO - Epoch 1 Step 3870 (Global: 3870): loss=0.0375, ppl=1.04, grad_norm=0.46, lr=7.92e-06, throughput=1628 tok/s | |
| 2025-11-22 05:24:31,805 - INFO - Epoch 1 Step 3880 (Global: 3880): loss=0.0556, ppl=1.06, grad_norm=0.99, lr=7.90e-06, throughput=1633 tok/s | |
| 2025-11-22 05:29:24,993 - INFO - Epoch 1 Step 3890 (Global: 3890): loss=0.0456, ppl=1.05, grad_norm=0.56, lr=7.89e-06, throughput=1638 tok/s | |
| 2025-11-22 05:34:06,242 - INFO - Epoch 1 Step 3900 (Global: 3900): loss=0.0438, ppl=1.04, grad_norm=0.47, lr=7.88e-06, throughput=1707 tok/s | |
| 2025-11-22 05:39:00,162 - INFO - Epoch 1 Step 3910 (Global: 3910): loss=0.0539, ppl=1.06, grad_norm=0.62, lr=7.86e-06, throughput=1633 tok/s | |
| 2025-11-22 05:43:50,221 - INFO - Epoch 1 Step 3920 (Global: 3920): loss=0.0354, ppl=1.04, grad_norm=0.37, lr=7.85e-06, throughput=1655 tok/s | |
| 2025-11-22 05:48:28,218 - INFO - Epoch 1 Step 3930 (Global: 3930): loss=0.0423, ppl=1.04, grad_norm=0.46, lr=7.83e-06, throughput=1727 tok/s | |
| 2025-11-22 05:53:22,206 - INFO - Epoch 1 Step 3940 (Global: 3940): loss=0.0403, ppl=1.04, grad_norm=0.50, lr=7.82e-06, throughput=1633 tok/s | |
| 2025-11-22 05:58:15,452 - INFO - Epoch 1 Step 3950 (Global: 3950): loss=0.0312, ppl=1.03, grad_norm=0.31, lr=7.81e-06, throughput=1637 tok/s | |
| 2025-11-22 06:03:07,920 - INFO - Epoch 1 Step 3960 (Global: 3960): loss=0.0355, ppl=1.04, grad_norm=0.40, lr=7.79e-06, throughput=1641 tok/s | |
| 2025-11-22 06:07:48,197 - INFO - Epoch 1 Step 3970 (Global: 3970): loss=0.0476, ppl=1.05, grad_norm=0.47, lr=7.78e-06, throughput=1713 tok/s | |
| 2025-11-22 06:12:40,165 - INFO - Epoch 1 Step 3980 (Global: 3980): loss=0.0648, ppl=1.07, grad_norm=0.41, lr=7.77e-06, throughput=1644 tok/s | |
| 2025-11-22 06:17:34,317 - INFO - Epoch 1 Step 3990 (Global: 3990): loss=0.0407, ppl=1.04, grad_norm=0.77, lr=7.75e-06, throughput=1632 tok/s | |
| 2025-11-22 06:22:27,951 - INFO - Epoch 1 Step 4000 (Global: 4000): loss=0.0351, ppl=1.04, grad_norm=0.43, lr=7.74e-06, throughput=1635 tok/s | |
| 2025-11-22 06:22:27,952 - INFO - | |
| Running validation at step 4000... | |
| 2025-11-22 06:38:54,626 - INFO - Validation loss: 0.0395, perplexity: 1.04 | |
| 2025-11-22 06:38:54,627 - INFO - Qualitative metrics (n=5): | |
| 2025-11-22 06:38:54,627 - INFO - BLEU: 0.9376 | |
| 2025-11-22 06:38:54,627 - INFO - METEOR: 0.9321 | |
| 2025-11-22 06:38:54,627 - INFO - Edit Distance: 0.2641 | |
| 2025-11-22 06:38:54,627 - INFO - F-measure: 0.9940 | |
| 2025-11-22 06:38:54,627 - INFO - | |
| ====================================================================== | |
| 2025-11-22 06:38:54,627 - INFO - Qualitative Evaluation Samples: | |
| 2025-11-22 06:38:54,627 - INFO - ====================================================================== | |
| 2025-11-22 06:38:54,627 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-11-22 06:38:54,627 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-11-22 06:38:54,628 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-22 06:38:54,628 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-22 06:38:54,628 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-22 06:38:54,628 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-11-22 06:38:54,628 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-11-22 06:38:54,628 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-22 06:38:54,628 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-22 06:38:54,628 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-22 06:38:54,628 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-11-22 06:38:54,628 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-11-22 06:38:54,628 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-22 06:38:54,628 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-22 06:38:54,629 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-22 06:38:54,629 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-11-22 06:38:54,629 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-11-22 06:38:54,629 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-22 06:38:54,629 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-22 06:38:54,629 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-22 06:38:54,629 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-11-22 06:38:54,629 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-11-22 06:38:54,629 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows ...' | |
| 2025-11-22 06:38:54,629 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' | |
| 2025-11-22 06:38:54,630 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-22 06:38:54,631 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510/qualitative_step_4000.jsonl | |
| 2025-11-22 06:39:40,254 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510/best_checkpoint.pt | |
| 2025-11-22 06:39:40,276 - INFO - New best validation loss: 0.0395, perplexity: 1.04 | |
| 2025-11-22 06:44:18,927 - INFO - Epoch 1 Step 4010 (Global: 4010): loss=0.0369, ppl=1.04, grad_norm=0.62, lr=7.72e-06, throughput=1723 tok/s | |
| 2025-11-22 06:49:07,654 - INFO - Epoch 1 Step 4020 (Global: 4020): loss=0.0424, ppl=1.04, grad_norm=0.37, lr=7.71e-06, throughput=1662 tok/s | |
| 2025-11-22 06:53:57,300 - INFO - Epoch 1 Step 4030 (Global: 4030): loss=0.0320, ppl=1.03, grad_norm=0.47, lr=7.70e-06, throughput=1657 tok/s | |
| 2025-11-22 06:58:49,772 - INFO - Epoch 1 Step 4040 (Global: 4040): loss=0.0421, ppl=1.04, grad_norm=0.57, lr=7.68e-06, throughput=1641 tok/s | |
| 2025-11-22 07:03:31,780 - INFO - Epoch 1 Step 4050 (Global: 4050): loss=0.0507, ppl=1.05, grad_norm=0.72, lr=7.67e-06, throughput=1702 tok/s | |
| 2025-11-22 07:08:22,918 - INFO - Epoch 1 Step 4060 (Global: 4060): loss=0.0387, ppl=1.04, grad_norm=0.55, lr=7.65e-06, throughput=1649 tok/s | |
| 2025-11-22 07:13:18,508 - INFO - Epoch 1 Step 4070 (Global: 4070): loss=0.0318, ppl=1.03, grad_norm=0.53, lr=7.64e-06, throughput=1624 tok/s | |
| 2025-11-22 07:18:12,421 - INFO - Epoch 1 Step 4080 (Global: 4080): loss=0.0397, ppl=1.04, grad_norm=0.59, lr=7.62e-06, throughput=1633 tok/s | |
| 2025-11-22 07:22:53,977 - INFO - Epoch 1 Step 4090 (Global: 4090): loss=0.0381, ppl=1.04, grad_norm=0.45, lr=7.61e-06, throughput=1705 tok/s | |
| 2025-11-22 07:27:49,531 - INFO - Epoch 1 Step 4100 (Global: 4100): loss=0.0387, ppl=1.04, grad_norm=0.42, lr=7.60e-06, throughput=1624 tok/s | |
| 2025-11-22 07:32:44,178 - INFO - Epoch 1 Step 4110 (Global: 4110): loss=0.0344, ppl=1.03, grad_norm=0.48, lr=7.58e-06, throughput=1629 tok/s | |
| 2025-11-22 07:37:38,233 - INFO - Epoch 1 Step 4120 (Global: 4120): loss=0.0321, ppl=1.03, grad_norm=0.53, lr=7.57e-06, throughput=1632 tok/s | |
| 2025-11-22 07:42:17,773 - INFO - Epoch 1 Step 4130 (Global: 4130): loss=0.0438, ppl=1.04, grad_norm=0.53, lr=7.55e-06, throughput=1717 tok/s | |
| 2025-11-22 07:47:12,392 - INFO - Epoch 1 Step 4140 (Global: 4140): loss=0.0426, ppl=1.04, grad_norm=0.41, lr=7.54e-06, throughput=1629 tok/s | |
| 2025-11-22 07:52:06,237 - INFO - Epoch 1 Step 4150 (Global: 4150): loss=0.0471, ppl=1.05, grad_norm=0.43, lr=7.52e-06, throughput=1634 tok/s | |
| 2025-11-22 07:56:59,090 - INFO - Epoch 1 Step 4160 (Global: 4160): loss=0.0402, ppl=1.04, grad_norm=0.44, lr=7.51e-06, throughput=1639 tok/s | |
| 2025-11-22 08:01:41,308 - INFO - Epoch 1 Step 4170 (Global: 4170): loss=0.0427, ppl=1.04, grad_norm=0.59, lr=7.49e-06, throughput=1701 tok/s | |
| 2025-11-22 08:06:43,454 - INFO - Epoch 1 Step 4180 (Global: 4180): loss=0.0375, ppl=1.04, grad_norm=0.73, lr=7.48e-06, throughput=1589 tok/s | |
| 2025-11-22 08:11:39,706 - INFO - Epoch 1 Step 4190 (Global: 4190): loss=0.0338, ppl=1.03, grad_norm=0.47, lr=7.47e-06, throughput=1620 tok/s | |
| 2025-11-22 08:16:33,537 - INFO - Epoch 1 Step 4200 (Global: 4200): loss=0.0567, ppl=1.06, grad_norm=0.45, lr=7.45e-06, throughput=1634 tok/s | |
| 2025-11-22 08:21:15,922 - INFO - Epoch 1 Step 4210 (Global: 4210): loss=0.0377, ppl=1.04, grad_norm=0.42, lr=7.44e-06, throughput=1700 tok/s | |
| 2025-11-22 08:26:08,441 - INFO - Epoch 1 Step 4220 (Global: 4220): loss=0.0421, ppl=1.04, grad_norm=0.53, lr=7.42e-06, throughput=1641 tok/s | |
| 2025-11-22 08:31:01,821 - INFO - Epoch 1 Step 4230 (Global: 4230): loss=0.0355, ppl=1.04, grad_norm=0.59, lr=7.41e-06, throughput=1636 tok/s | |
| 2025-11-22 08:35:54,737 - INFO - Epoch 1 Step 4240 (Global: 4240): loss=0.0437, ppl=1.04, grad_norm=0.65, lr=7.39e-06, throughput=1639 tok/s | |
| 2025-11-22 08:40:34,946 - INFO - Epoch 1 Step 4250 (Global: 4250): loss=0.0312, ppl=1.03, grad_norm=0.72, lr=7.38e-06, throughput=1713 tok/s | |
| 2025-11-22 08:45:27,274 - INFO - Epoch 1 Step 4260 (Global: 4260): loss=0.0320, ppl=1.03, grad_norm=0.42, lr=7.36e-06, throughput=1642 tok/s | |
| 2025-11-22 08:50:17,878 - INFO - Epoch 1 Step 4270 (Global: 4270): loss=0.0417, ppl=1.04, grad_norm=0.76, lr=7.35e-06, throughput=1652 tok/s | |
| 2025-11-22 08:54:58,379 - INFO - Epoch 1 Step 4280 (Global: 4280): loss=0.0333, ppl=1.03, grad_norm=0.50, lr=7.33e-06, throughput=1711 tok/s | |
| 2025-11-22 08:59:50,359 - INFO - Epoch 1 Step 4290 (Global: 4290): loss=0.0398, ppl=1.04, grad_norm=0.46, lr=7.32e-06, throughput=1644 tok/s | |
| 2025-11-22 09:04:41,147 - INFO - Epoch 1 Step 4300 (Global: 4300): loss=0.0356, ppl=1.04, grad_norm=0.66, lr=7.30e-06, throughput=1651 tok/s | |
| 2025-11-22 09:09:37,771 - INFO - Epoch 1 Step 4310 (Global: 4310): loss=0.0310, ppl=1.03, grad_norm=0.57, lr=7.29e-06, throughput=1618 tok/s | |
| 2025-11-22 09:14:17,269 - INFO - Epoch 1 Step 4320 (Global: 4320): loss=0.0326, ppl=1.03, grad_norm=0.46, lr=7.27e-06, throughput=1717 tok/s | |
| 2025-11-22 09:19:06,720 - INFO - Epoch 1 Step 4330 (Global: 4330): loss=0.0286, ppl=1.03, grad_norm=0.36, lr=7.26e-06, throughput=1658 tok/s | |
| 2025-11-22 09:23:59,733 - INFO - Epoch 1 Step 4340 (Global: 4340): loss=0.0323, ppl=1.03, grad_norm=0.53, lr=7.24e-06, throughput=1638 tok/s | |
| 2025-11-22 09:28:51,804 - INFO - Epoch 1 Step 4350 (Global: 4350): loss=0.0376, ppl=1.04, grad_norm=0.35, lr=7.23e-06, throughput=1643 tok/s | |
| 2025-11-22 09:33:28,700 - INFO - Epoch 1 Step 4360 (Global: 4360): loss=0.0390, ppl=1.04, grad_norm=1.16, lr=7.21e-06, throughput=1734 tok/s | |
| 2025-11-22 09:38:20,478 - INFO - Epoch 1 Step 4370 (Global: 4370): loss=0.0331, ppl=1.03, grad_norm=0.51, lr=7.20e-06, throughput=1645 tok/s | |
| 2025-11-22 09:43:14,059 - INFO - Epoch 1 Step 4380 (Global: 4380): loss=0.0339, ppl=1.03, grad_norm=0.34, lr=7.18e-06, throughput=1635 tok/s | |
| 2025-11-22 09:48:04,375 - INFO - Epoch 1 Step 4390 (Global: 4390): loss=0.0360, ppl=1.04, grad_norm=0.36, lr=7.17e-06, throughput=1653 tok/s | |
| 2025-11-22 09:52:43,127 - INFO - Epoch 1 Step 4400 (Global: 4400): loss=0.0332, ppl=1.03, grad_norm=0.50, lr=7.15e-06, throughput=1722 tok/s | |
| 2025-11-22 09:57:35,898 - INFO - Epoch 1 Step 4410 (Global: 4410): loss=0.0362, ppl=1.04, grad_norm=0.31, lr=7.14e-06, throughput=1640 tok/s | |
| 2025-11-22 10:02:30,158 - INFO - Epoch 1 Step 4420 (Global: 4420): loss=0.0415, ppl=1.04, grad_norm=0.45, lr=7.12e-06, throughput=1631 tok/s | |
| 2025-11-22 10:07:23,980 - INFO - Epoch 1 Step 4430 (Global: 4430): loss=0.0345, ppl=1.04, grad_norm=0.42, lr=7.11e-06, throughput=1634 tok/s | |
| 2025-11-22 10:12:06,750 - INFO - Epoch 1 Step 4440 (Global: 4440): loss=0.0284, ppl=1.03, grad_norm=0.25, lr=7.09e-06, throughput=1698 tok/s | |
| 2025-11-22 10:17:00,030 - INFO - Epoch 1 Step 4450 (Global: 4450): loss=0.0363, ppl=1.04, grad_norm=0.43, lr=7.08e-06, throughput=1637 tok/s | |
| 2025-11-22 10:21:51,556 - INFO - Epoch 1 Step 4460 (Global: 4460): loss=0.0383, ppl=1.04, grad_norm=0.45, lr=7.06e-06, throughput=1647 tok/s | |
| 2025-11-22 10:26:43,097 - INFO - Epoch 1 Step 4470 (Global: 4470): loss=0.0319, ppl=1.03, grad_norm=0.54, lr=7.05e-06, throughput=1646 tok/s | |
| 2025-11-22 10:31:23,799 - INFO - Epoch 1 Step 4480 (Global: 4480): loss=0.0271, ppl=1.03, grad_norm=0.52, lr=7.03e-06, throughput=1710 tok/s | |
| 2025-11-22 10:36:15,329 - INFO - Epoch 1 Step 4490 (Global: 4490): loss=0.0289, ppl=1.03, grad_norm=0.39, lr=7.02e-06, throughput=1647 tok/s | |
| 2025-11-22 10:41:06,264 - INFO - Epoch 1 Step 4500 (Global: 4500): loss=0.0297, ppl=1.03, grad_norm=0.33, lr=7.00e-06, throughput=1650 tok/s | |
| 2025-11-22 10:45:48,943 - INFO - Epoch 1 Step 4510 (Global: 4510): loss=0.0293, ppl=1.03, grad_norm=0.38, lr=6.99e-06, throughput=1698 tok/s | |
| 2025-11-22 10:50:40,969 - INFO - Epoch 1 Step 4520 (Global: 4520): loss=0.0383, ppl=1.04, grad_norm=0.61, lr=6.97e-06, throughput=1644 tok/s | |
| 2025-11-22 10:55:31,952 - INFO - Epoch 1 Step 4530 (Global: 4530): loss=0.0343, ppl=1.03, grad_norm=0.57, lr=6.96e-06, throughput=1650 tok/s | |
| 2025-11-22 11:00:24,850 - INFO - Epoch 1 Step 4540 (Global: 4540): loss=0.0298, ppl=1.03, grad_norm=0.38, lr=6.94e-06, throughput=1639 tok/s | |
| 2025-11-22 11:05:09,152 - INFO - Epoch 1 Step 4550 (Global: 4550): loss=0.0355, ppl=1.04, grad_norm=0.49, lr=6.92e-06, throughput=1688 tok/s | |
| 2025-11-22 11:10:01,994 - INFO - Epoch 1 Step 4560 (Global: 4560): loss=0.0355, ppl=1.04, grad_norm=0.74, lr=6.91e-06, throughput=1639 tok/s | |
| 2025-11-22 11:14:56,876 - INFO - Epoch 1 Step 4570 (Global: 4570): loss=0.0351, ppl=1.04, grad_norm=0.48, lr=6.89e-06, throughput=1628 tok/s | |
| 2025-11-22 11:19:52,375 - INFO - Epoch 1 Step 4580 (Global: 4580): loss=0.0422, ppl=1.04, grad_norm=0.51, lr=6.88e-06, throughput=1624 tok/s | |
| 2025-11-22 11:24:31,255 - INFO - Epoch 1 Step 4590 (Global: 4590): loss=0.0368, ppl=1.04, grad_norm=0.50, lr=6.86e-06, throughput=1721 tok/s | |
| 2025-11-22 11:29:21,183 - INFO - Epoch 1 Step 4600 (Global: 4600): loss=0.0331, ppl=1.03, grad_norm=0.49, lr=6.85e-06, throughput=1656 tok/s | |
| 2025-11-22 11:34:13,298 - INFO - Epoch 1 Step 4610 (Global: 4610): loss=0.0642, ppl=1.07, grad_norm=0.71, lr=6.83e-06, throughput=1643 tok/s | |
| 2025-11-22 11:39:05,306 - INFO - Epoch 1 Step 4620 (Global: 4620): loss=0.0389, ppl=1.04, grad_norm=0.48, lr=6.82e-06, throughput=1644 tok/s | |
| 2025-11-22 11:43:42,972 - INFO - Epoch 1 Step 4630 (Global: 4630): loss=0.0349, ppl=1.04, grad_norm=0.33, lr=6.80e-06, throughput=1729 tok/s | |
| 2025-11-22 11:48:36,710 - INFO - Epoch 1 Step 4640 (Global: 4640): loss=0.0395, ppl=1.04, grad_norm=0.50, lr=6.78e-06, throughput=1634 tok/s | |
| 2025-11-22 11:53:31,168 - INFO - Epoch 1 Step 4650 (Global: 4650): loss=0.0558, ppl=1.06, grad_norm=0.48, lr=6.77e-06, throughput=1630 tok/s | |
| 2025-11-22 11:58:23,639 - INFO - Epoch 1 Step 4660 (Global: 4660): loss=0.0430, ppl=1.04, grad_norm=0.51, lr=6.75e-06, throughput=1641 tok/s | |
| 2025-11-22 12:03:06,571 - INFO - Epoch 1 Step 4670 (Global: 4670): loss=0.0403, ppl=1.04, grad_norm=0.47, lr=6.74e-06, throughput=1697 tok/s | |
| 2025-11-22 12:08:02,783 - INFO - Epoch 1 Step 4680 (Global: 4680): loss=0.0308, ppl=1.03, grad_norm=0.62, lr=6.72e-06, throughput=1620 tok/s | |
| 2025-11-22 12:12:56,806 - INFO - Epoch 1 Step 4690 (Global: 4690): loss=0.0267, ppl=1.03, grad_norm=0.38, lr=6.71e-06, throughput=1633 tok/s | |
| 2025-11-22 12:17:53,312 - INFO - Epoch 1 Step 4700 (Global: 4700): loss=0.0282, ppl=1.03, grad_norm=0.58, lr=6.69e-06, throughput=1619 tok/s | |
| 2025-11-22 12:22:38,673 - INFO - Epoch 1 Step 4710 (Global: 4710): loss=0.0326, ppl=1.03, grad_norm=0.44, lr=6.67e-06, throughput=1682 tok/s | |
| 2025-11-22 12:27:35,180 - INFO - Epoch 1 Step 4720 (Global: 4720): loss=0.0411, ppl=1.04, grad_norm=0.63, lr=6.66e-06, throughput=1619 tok/s | |
| 2025-11-22 12:32:29,054 - INFO - Epoch 1 Step 4730 (Global: 4730): loss=0.0354, ppl=1.04, grad_norm=0.33, lr=6.64e-06, throughput=1633 tok/s | |
| 2025-11-22 12:37:11,108 - INFO - Epoch 1 Step 4740 (Global: 4740): loss=0.0589, ppl=1.06, grad_norm=0.51, lr=6.63e-06, throughput=1702 tok/s | |
| 2025-11-22 12:42:05,355 - INFO - Epoch 1 Step 4750 (Global: 4750): loss=0.0375, ppl=1.04, grad_norm=0.42, lr=6.61e-06, throughput=1631 tok/s | |
| 2025-11-22 12:47:16,884 - INFO - Epoch 1 Step 4760 (Global: 4760): loss=0.0437, ppl=1.04, grad_norm=0.40, lr=6.60e-06, throughput=1541 tok/s | |
| 2025-11-22 12:52:28,236 - INFO - Epoch 1 Step 4770 (Global: 4770): loss=0.0333, ppl=1.03, grad_norm=0.36, lr=6.58e-06, throughput=1542 tok/s | |
| 2025-11-22 12:57:13,348 - INFO - Epoch 1 Step 4780 (Global: 4780): loss=0.0387, ppl=1.04, grad_norm=0.45, lr=6.56e-06, throughput=1684 tok/s | |
| 2025-11-22 13:02:07,777 - INFO - Epoch 1 Step 4790 (Global: 4790): loss=0.0320, ppl=1.03, grad_norm=0.70, lr=6.55e-06, throughput=1630 tok/s | |
| 2025-11-22 13:07:00,140 - INFO - Epoch 1 Step 4800 (Global: 4800): loss=0.0439, ppl=1.04, grad_norm=0.45, lr=6.53e-06, throughput=1642 tok/s | |
| 2025-11-22 13:11:56,119 - INFO - Epoch 1 Step 4810 (Global: 4810): loss=0.0329, ppl=1.03, grad_norm=0.44, lr=6.52e-06, throughput=1622 tok/s | |
| 2025-11-22 13:16:43,989 - INFO - Epoch 1 Step 4820 (Global: 4820): loss=0.0325, ppl=1.03, grad_norm=0.31, lr=6.50e-06, throughput=1667 tok/s | |
| 2025-11-22 13:21:36,949 - INFO - Epoch 1 Step 4830 (Global: 4830): loss=0.0429, ppl=1.04, grad_norm=1.09, lr=6.48e-06, throughput=1638 tok/s | |
| 2025-11-22 13:26:33,834 - INFO - Epoch 1 Step 4840 (Global: 4840): loss=0.0376, ppl=1.04, grad_norm=0.37, lr=6.47e-06, throughput=1617 tok/s | |
| 2025-11-22 13:31:31,025 - INFO - Epoch 1 Step 4850 (Global: 4850): loss=0.0303, ppl=1.03, grad_norm=0.54, lr=6.45e-06, throughput=1615 tok/s | |
| 2025-11-22 13:36:12,627 - INFO - Epoch 1 Step 4860 (Global: 4860): loss=0.0352, ppl=1.04, grad_norm=0.44, lr=6.44e-06, throughput=1705 tok/s | |
| 2025-11-22 13:41:04,237 - INFO - Epoch 1 Step 4870 (Global: 4870): loss=0.0357, ppl=1.04, grad_norm=0.69, lr=6.42e-06, throughput=1646 tok/s | |
| 2025-11-22 13:45:58,681 - INFO - Epoch 1 Step 4880 (Global: 4880): loss=0.0341, ppl=1.03, grad_norm=0.35, lr=6.40e-06, throughput=1630 tok/s | |
| 2025-11-22 13:50:53,558 - INFO - Epoch 1 Step 4890 (Global: 4890): loss=0.0337, ppl=1.03, grad_norm=0.59, lr=6.39e-06, throughput=1628 tok/s | |
| 2025-11-22 13:55:37,601 - INFO - Epoch 1 Step 4900 (Global: 4900): loss=0.0404, ppl=1.04, grad_norm=1.80, lr=6.37e-06, throughput=1690 tok/s | |
| 2025-11-22 14:00:35,951 - INFO - Epoch 1 Step 4910 (Global: 4910): loss=0.0308, ppl=1.03, grad_norm=0.48, lr=6.35e-06, throughput=1609 tok/s | |
| 2025-11-22 14:05:29,335 - INFO - Epoch 1 Step 4920 (Global: 4920): loss=0.0367, ppl=1.04, grad_norm=0.59, lr=6.34e-06, throughput=1636 tok/s | |
| 2025-11-22 14:10:25,673 - INFO - Epoch 1 Step 4930 (Global: 4930): loss=0.0374, ppl=1.04, grad_norm=0.38, lr=6.32e-06, throughput=1620 tok/s | |
| 2025-11-22 14:15:10,703 - INFO - Epoch 1 Step 4940 (Global: 4940): loss=0.0297, ppl=1.03, grad_norm=0.52, lr=6.31e-06, throughput=1684 tok/s | |
| 2025-11-22 14:20:06,279 - INFO - Epoch 1 Step 4950 (Global: 4950): loss=0.0485, ppl=1.05, grad_norm=0.42, lr=6.29e-06, throughput=1624 tok/s | |
| 2025-11-22 14:25:55,189 - INFO - Epoch 1 Step 4960 (Global: 4960): loss=0.0282, ppl=1.03, grad_norm=0.71, lr=6.27e-06, throughput=1376 tok/s | |
| 2025-11-22 14:30:56,111 - INFO - Epoch 1 Step 4970 (Global: 4970): loss=0.0368, ppl=1.04, grad_norm=0.91, lr=6.26e-06, throughput=1595 tok/s | |
| 2025-11-22 14:35:51,472 - INFO - Epoch 1 Step 4980 (Global: 4980): loss=0.0365, ppl=1.04, grad_norm=0.61, lr=6.24e-06, throughput=1625 tok/s | |
| 2025-11-22 14:41:06,658 - INFO - Epoch 1 Step 4990 (Global: 4990): loss=0.0577, ppl=1.06, grad_norm=0.76, lr=6.23e-06, throughput=1523 tok/s | |
| 2025-11-22 14:46:06,627 - INFO - Epoch 1 Step 5000 (Global: 5000): loss=0.0405, ppl=1.04, grad_norm=0.94, lr=6.21e-06, throughput=1600 tok/s | |
| 2025-11-22 14:51:02,701 - INFO - Epoch 1 Step 5010 (Global: 5010): loss=0.0417, ppl=1.04, grad_norm=0.44, lr=6.19e-06, throughput=1621 tok/s | |
| 2025-11-22 14:55:51,027 - INFO - Epoch 1 Step 5020 (Global: 5020): loss=0.0326, ppl=1.03, grad_norm=0.33, lr=6.18e-06, throughput=1665 tok/s | |
| 2025-11-22 15:00:47,357 - INFO - Epoch 1 Step 5030 (Global: 5030): loss=0.0325, ppl=1.03, grad_norm=0.45, lr=6.16e-06, throughput=1620 tok/s | |
| 2025-11-22 15:05:43,509 - INFO - Epoch 1 Step 5040 (Global: 5040): loss=0.0299, ppl=1.03, grad_norm=0.51, lr=6.14e-06, throughput=1621 tok/s | |
| 2025-11-22 15:10:36,679 - INFO - Epoch 1 Step 5050 (Global: 5050): loss=0.0339, ppl=1.03, grad_norm=0.35, lr=6.13e-06, throughput=1637 tok/s | |
| 2025-11-22 15:15:35,463 - INFO - Epoch 1 Step 5060 (Global: 5060): loss=0.0340, ppl=1.03, grad_norm=0.47, lr=6.11e-06, throughput=1607 tok/s | |
| 2025-11-22 15:20:38,513 - INFO - Epoch 1 Step 5070 (Global: 5070): loss=0.0375, ppl=1.04, grad_norm=0.41, lr=6.10e-06, throughput=1584 tok/s | |
| 2025-11-22 15:25:38,543 - INFO - Epoch 1 Step 5080 (Global: 5080): loss=0.0500, ppl=1.05, grad_norm=0.46, lr=6.08e-06, throughput=1600 tok/s | |
| 2025-11-22 15:30:22,167 - INFO - Epoch 1 Step 5090 (Global: 5090): loss=0.0532, ppl=1.05, grad_norm=0.53, lr=6.06e-06, throughput=1692 tok/s | |
| 2025-11-22 15:35:20,575 - INFO - Epoch 1 Step 5100 (Global: 5100): loss=0.0327, ppl=1.03, grad_norm=0.38, lr=6.05e-06, throughput=1609 tok/s | |
| 2025-11-22 15:40:21,116 - INFO - Epoch 1 Step 5110 (Global: 5110): loss=0.0390, ppl=1.04, grad_norm=0.43, lr=6.03e-06, throughput=1597 tok/s | |
| 2025-11-22 15:45:22,423 - INFO - Epoch 1 Step 5120 (Global: 5120): loss=0.0409, ppl=1.04, grad_norm=0.57, lr=6.01e-06, throughput=1593 tok/s | |
| 2025-11-22 15:50:02,494 - INFO - Epoch 1 Step 5130 (Global: 5130): loss=0.0368, ppl=1.04, grad_norm=0.38, lr=6.00e-06, throughput=1714 tok/s | |
| 2025-11-22 15:54:51,574 - INFO - Epoch 1 Step 5140 (Global: 5140): loss=0.0424, ppl=1.04, grad_norm=0.46, lr=5.98e-06, throughput=1660 tok/s | |
| 2025-11-22 15:59:40,470 - INFO - Epoch 1 Step 5150 (Global: 5150): loss=0.0416, ppl=1.04, grad_norm=0.38, lr=5.96e-06, throughput=1662 tok/s | |
| 2025-11-22 16:04:34,492 - INFO - Epoch 1 Step 5160 (Global: 5160): loss=0.0319, ppl=1.03, grad_norm=0.70, lr=5.95e-06, throughput=1633 tok/s | |
| 2025-11-22 16:09:17,621 - INFO - Epoch 1 Step 5170 (Global: 5170): loss=0.0433, ppl=1.04, grad_norm=0.45, lr=5.93e-06, throughput=1695 tok/s | |
| 2025-11-22 16:14:08,647 - INFO - Epoch 1 Step 5180 (Global: 5180): loss=0.0360, ppl=1.04, grad_norm=0.43, lr=5.91e-06, throughput=1649 tok/s | |
| 2025-11-22 16:18:59,025 - INFO - Epoch 1 Step 5190 (Global: 5190): loss=0.0333, ppl=1.03, grad_norm=0.44, lr=5.90e-06, throughput=1653 tok/s | |
| 2025-11-22 16:23:54,910 - INFO - Epoch 1 Step 5200 (Global: 5200): loss=0.0330, ppl=1.03, grad_norm=0.47, lr=5.88e-06, throughput=1622 tok/s | |
| 2025-11-22 16:28:34,703 - INFO - Epoch 1 Step 5210 (Global: 5210): loss=0.0411, ppl=1.04, grad_norm=0.37, lr=5.87e-06, throughput=1716 tok/s | |
| 2025-11-22 16:33:25,962 - INFO - Epoch 1 Step 5220 (Global: 5220): loss=0.0321, ppl=1.03, grad_norm=0.45, lr=5.85e-06, throughput=1648 tok/s | |
| 2025-11-22 16:38:19,157 - INFO - Epoch 1 Step 5230 (Global: 5230): loss=0.0362, ppl=1.04, grad_norm=0.45, lr=5.83e-06, throughput=1637 tok/s | |
| 2025-11-22 16:43:11,499 - INFO - Epoch 1 Step 5240 (Global: 5240): loss=0.0450, ppl=1.05, grad_norm=0.60, lr=5.82e-06, throughput=1642 tok/s | |
| 2025-11-22 16:47:59,385 - INFO - Epoch 1 Step 5250 (Global: 5250): loss=0.0287, ppl=1.03, grad_norm=0.62, lr=5.80e-06, throughput=1667 tok/s | |
| 2025-11-22 16:52:57,592 - INFO - Epoch 1 Step 5260 (Global: 5260): loss=0.0343, ppl=1.03, grad_norm=0.62, lr=5.78e-06, throughput=1610 tok/s | |
| 2025-11-22 16:57:57,122 - INFO - Epoch 1 Step 5270 (Global: 5270): loss=0.0331, ppl=1.03, grad_norm=0.40, lr=5.77e-06, throughput=1603 tok/s | |
| 2025-11-22 17:02:45,058 - INFO - Epoch 1 Step 5280 (Global: 5280): loss=0.0335, ppl=1.03, grad_norm=0.35, lr=5.75e-06, throughput=1667 tok/s | |
| 2025-11-22 17:07:47,738 - INFO - Epoch 1 Step 5290 (Global: 5290): loss=0.0259, ppl=1.03, grad_norm=0.33, lr=5.73e-06, throughput=1586 tok/s | |
| 2025-11-22 17:12:47,377 - INFO - Epoch 1 Step 5300 (Global: 5300): loss=0.0318, ppl=1.03, grad_norm=0.36, lr=5.72e-06, throughput=1602 tok/s | |
| 2025-11-22 17:17:41,620 - INFO - Epoch 1 Step 5310 (Global: 5310): loss=0.0340, ppl=1.03, grad_norm=0.60, lr=5.70e-06, throughput=1631 tok/s | |
| 2025-11-22 17:22:26,373 - INFO - Epoch 1 Step 5320 (Global: 5320): loss=0.0317, ppl=1.03, grad_norm=0.41, lr=5.68e-06, throughput=1686 tok/s | |
| 2025-11-22 17:27:23,760 - INFO - Epoch 1 Step 5330 (Global: 5330): loss=0.0459, ppl=1.05, grad_norm=0.62, lr=5.67e-06, throughput=1614 tok/s | |
| 2025-11-22 17:32:36,869 - INFO - Epoch 1 Step 5340 (Global: 5340): loss=0.0315, ppl=1.03, grad_norm=0.81, lr=5.65e-06, throughput=1533 tok/s | |
| 2025-11-22 17:37:28,018 - INFO - Epoch 1 Step 5350 (Global: 5350): loss=0.0374, ppl=1.04, grad_norm=1.01, lr=5.63e-06, throughput=1649 tok/s | |
| 2025-11-22 17:42:10,070 - INFO - Epoch 1 Step 5360 (Global: 5360): loss=0.0352, ppl=1.04, grad_norm=0.44, lr=5.62e-06, throughput=1702 tok/s | |
| 2025-11-22 17:47:03,503 - INFO - Epoch 1 Step 5370 (Global: 5370): loss=0.0240, ppl=1.02, grad_norm=0.36, lr=5.60e-06, throughput=1636 tok/s | |
| 2025-11-22 17:51:55,622 - INFO - Epoch 1 Step 5380 (Global: 5380): loss=0.0411, ppl=1.04, grad_norm=0.54, lr=5.58e-06, throughput=1643 tok/s | |
| 2025-11-22 17:56:49,224 - INFO - Epoch 1 Step 5390 (Global: 5390): loss=0.0282, ppl=1.03, grad_norm=0.41, lr=5.57e-06, throughput=1635 tok/s | |
| 2025-11-22 18:01:26,962 - INFO - Epoch 1 Step 5400 (Global: 5400): loss=0.0270, ppl=1.03, grad_norm=0.44, lr=5.55e-06, throughput=1728 tok/s | |
| 2025-11-22 18:06:18,881 - INFO - Epoch 1 Step 5410 (Global: 5410): loss=0.0344, ppl=1.03, grad_norm=0.42, lr=5.53e-06, throughput=1644 tok/s | |
| 2025-11-22 18:11:11,406 - INFO - Epoch 1 Step 5420 (Global: 5420): loss=0.0336, ppl=1.03, grad_norm=0.34, lr=5.52e-06, throughput=1641 tok/s | |
| 2025-11-22 18:16:03,732 - INFO - Epoch 1 Step 5430 (Global: 5430): loss=0.0400, ppl=1.04, grad_norm=0.55, lr=5.50e-06, throughput=1642 tok/s | |
| 2025-11-22 18:20:44,541 - INFO - Epoch 1 Step 5440 (Global: 5440): loss=0.0269, ppl=1.03, grad_norm=0.42, lr=5.48e-06, throughput=1709 tok/s | |
| 2025-11-22 18:25:36,857 - INFO - Epoch 1 Step 5450 (Global: 5450): loss=0.0508, ppl=1.05, grad_norm=0.53, lr=5.47e-06, throughput=1642 tok/s | |
| 2025-11-22 18:30:30,169 - INFO - Epoch 1 Step 5460 (Global: 5460): loss=0.0338, ppl=1.03, grad_norm=0.70, lr=5.45e-06, throughput=1636 tok/s | |
| 2025-11-22 18:35:25,302 - INFO - Epoch 1 Step 5470 (Global: 5470): loss=0.0329, ppl=1.03, grad_norm=0.51, lr=5.43e-06, throughput=1626 tok/s | |
| 2025-11-22 18:40:05,920 - INFO - Epoch 1 Step 5480 (Global: 5480): loss=0.0316, ppl=1.03, grad_norm=0.38, lr=5.42e-06, throughput=1711 tok/s | |
| 2025-11-22 18:45:00,724 - INFO - Epoch 1 Step 5490 (Global: 5490): loss=0.0414, ppl=1.04, grad_norm=0.45, lr=5.40e-06, throughput=1628 tok/s | |
| 2025-11-22 18:49:54,454 - INFO - Epoch 1 Step 5500 (Global: 5500): loss=0.0357, ppl=1.04, grad_norm=0.78, lr=5.38e-06, throughput=1634 tok/s | |
| 2025-11-22 18:54:32,023 - INFO - Epoch 1 Step 5510 (Global: 5510): loss=0.0340, ppl=1.03, grad_norm=0.40, lr=5.37e-06, throughput=1729 tok/s | |
| 2025-11-22 18:59:22,826 - INFO - Epoch 1 Step 5520 (Global: 5520): loss=0.0290, ppl=1.03, grad_norm=0.31, lr=5.35e-06, throughput=1651 tok/s | |
| 2025-11-22 19:04:18,583 - INFO - Epoch 1 Step 5530 (Global: 5530): loss=0.0317, ppl=1.03, grad_norm=0.40, lr=5.33e-06, throughput=1623 tok/s | |
| 2025-11-22 19:09:20,025 - INFO - Epoch 1 Step 5540 (Global: 5540): loss=0.0377, ppl=1.04, grad_norm=0.40, lr=5.32e-06, throughput=1592 tok/s | |
| 2025-11-22 19:14:01,169 - INFO - Epoch 1 Step 5550 (Global: 5550): loss=0.0433, ppl=1.04, grad_norm=0.58, lr=5.30e-06, throughput=1707 tok/s | |
| 2025-11-22 19:18:55,775 - INFO - Epoch 1 Step 5560 (Global: 5560): loss=0.0286, ppl=1.03, grad_norm=0.78, lr=5.28e-06, throughput=1629 tok/s | |
| 2025-11-22 19:23:50,846 - INFO - Epoch 1 Step 5570 (Global: 5570): loss=0.0339, ppl=1.03, grad_norm=0.57, lr=5.27e-06, throughput=1627 tok/s | |
| 2025-11-22 19:28:44,192 - INFO - Epoch 1 Step 5580 (Global: 5580): loss=0.0333, ppl=1.03, grad_norm=0.50, lr=5.25e-06, throughput=1636 tok/s | |
| 2025-11-22 19:33:25,361 - INFO - Epoch 1 Step 5590 (Global: 5590): loss=0.0302, ppl=1.03, grad_norm=0.38, lr=5.23e-06, throughput=1707 tok/s | |
| 2025-11-22 19:38:16,471 - INFO - Epoch 1 Step 5600 (Global: 5600): loss=0.0462, ppl=1.05, grad_norm=0.47, lr=5.22e-06, throughput=1649 tok/s | |
| 2025-11-22 19:43:06,870 - INFO - Epoch 1 Step 5610 (Global: 5610): loss=0.0381, ppl=1.04, grad_norm=0.46, lr=5.20e-06, throughput=1653 tok/s | |
| 2025-11-22 19:48:08,555 - INFO - Epoch 1 Step 5620 (Global: 5620): loss=0.0301, ppl=1.03, grad_norm=0.36, lr=5.18e-06, throughput=1591 tok/s | |
| 2025-11-22 19:53:20,743 - INFO - Epoch 1 Step 5630 (Global: 5630): loss=0.0324, ppl=1.03, grad_norm=0.50, lr=5.17e-06, throughput=1538 tok/s | |
| 2025-11-22 19:58:48,205 - INFO - Epoch 1 Step 5640 (Global: 5640): loss=0.0367, ppl=1.04, grad_norm=0.49, lr=5.15e-06, throughput=1466 tok/s | |
| 2025-11-22 20:04:22,425 - INFO - Epoch 1 Step 5650 (Global: 5650): loss=0.0339, ppl=1.03, grad_norm=0.36, lr=5.13e-06, throughput=1436 tok/s | |
| 2025-11-22 20:10:05,207 - INFO - Epoch 1 Step 5660 (Global: 5660): loss=0.0425, ppl=1.04, grad_norm=0.30, lr=5.12e-06, throughput=1400 tok/s | |
| 2025-11-22 20:14:55,479 - INFO - Epoch 1 Step 5670 (Global: 5670): loss=0.0233, ppl=1.02, grad_norm=0.29, lr=5.10e-06, throughput=1654 tok/s | |
| 2025-11-22 20:19:50,176 - INFO - Epoch 1 Step 5680 (Global: 5680): loss=0.0282, ppl=1.03, grad_norm=0.42, lr=5.08e-06, throughput=1629 tok/s | |
| 2025-11-22 20:24:50,137 - INFO - Epoch 1 Step 5690 (Global: 5690): loss=0.0309, ppl=1.03, grad_norm=0.68, lr=5.07e-06, throughput=1600 tok/s | |
| 2025-11-22 20:29:50,651 - INFO - Epoch 1 Step 5700 (Global: 5700): loss=0.0354, ppl=1.04, grad_norm=0.52, lr=5.05e-06, throughput=1597 tok/s | |
| 2025-11-22 20:34:42,008 - INFO - Epoch 1 Step 5710 (Global: 5710): loss=0.0275, ppl=1.03, grad_norm=0.30, lr=5.03e-06, throughput=1647 tok/s | |
| 2025-11-22 20:39:41,728 - INFO - Epoch 1 Step 5720 (Global: 5720): loss=0.0383, ppl=1.04, grad_norm=0.54, lr=5.02e-06, throughput=1602 tok/s | |
| 2025-11-22 20:44:48,571 - INFO - Epoch 1 Step 5730 (Global: 5730): loss=0.0309, ppl=1.03, grad_norm=0.45, lr=5.00e-06, throughput=1564 tok/s | |
| 2025-11-22 20:49:55,752 - INFO - Epoch 1 Step 5740 (Global: 5740): loss=0.0307, ppl=1.03, grad_norm=0.32, lr=4.98e-06, throughput=1563 tok/s | |
| 2025-11-22 20:55:08,479 - INFO - Epoch 1 Step 5750 (Global: 5750): loss=0.0331, ppl=1.03, grad_norm=0.67, lr=4.96e-06, throughput=1535 tok/s | |
| 2025-11-22 21:00:20,686 - INFO - Epoch 1 Step 5760 (Global: 5760): loss=0.0337, ppl=1.03, grad_norm=0.42, lr=4.95e-06, throughput=1537 tok/s | |
| 2025-11-22 21:05:34,249 - INFO - Epoch 1 Step 5770 (Global: 5770): loss=0.0327, ppl=1.03, grad_norm=0.47, lr=4.93e-06, throughput=1531 tok/s | |
| 2025-11-22 21:10:30,533 - INFO - Epoch 1 Step 5780 (Global: 5780): loss=0.0351, ppl=1.04, grad_norm=0.59, lr=4.91e-06, throughput=1620 tok/s | |
| 2025-11-22 21:15:30,102 - INFO - Epoch 1 Step 5790 (Global: 5790): loss=0.0399, ppl=1.04, grad_norm=0.46, lr=4.90e-06, throughput=1602 tok/s | |
| 2025-11-22 21:20:24,547 - INFO - Epoch 1 Step 5800 (Global: 5800): loss=0.0381, ppl=1.04, grad_norm=0.41, lr=4.88e-06, throughput=1630 tok/s | |
| 2025-11-22 21:25:40,072 - INFO - Epoch 1 Step 5810 (Global: 5810): loss=0.0306, ppl=1.03, grad_norm=0.44, lr=4.86e-06, throughput=1521 tok/s | |
| 2025-11-22 21:30:37,559 - INFO - Epoch 1 Step 5820 (Global: 5820): loss=0.0308, ppl=1.03, grad_norm=0.36, lr=4.85e-06, throughput=1614 tok/s | |
| 2025-11-22 21:35:31,341 - INFO - Epoch 1 Step 5830 (Global: 5830): loss=0.0339, ppl=1.03, grad_norm=0.47, lr=4.83e-06, throughput=1634 tok/s | |
| 2025-11-22 21:40:17,908 - INFO - Epoch 1 Step 5840 (Global: 5840): loss=0.0303, ppl=1.03, grad_norm=0.58, lr=4.81e-06, throughput=1675 tok/s | |
| 2025-11-22 21:45:20,848 - INFO - Epoch 1 Step 5850 (Global: 5850): loss=0.0306, ppl=1.03, grad_norm=0.49, lr=4.80e-06, throughput=1585 tok/s | |
| 2025-11-22 21:50:18,264 - INFO - Epoch 1 Step 5860 (Global: 5860): loss=0.0483, ppl=1.05, grad_norm=0.46, lr=4.78e-06, throughput=1614 tok/s | |
| 2025-11-22 21:55:14,257 - INFO - Epoch 1 Step 5870 (Global: 5870): loss=0.0380, ppl=1.04, grad_norm=0.52, lr=4.76e-06, throughput=1622 tok/s | |
| 2025-11-22 22:00:11,799 - INFO - Epoch 1 Step 5880 (Global: 5880): loss=0.0293, ppl=1.03, grad_norm=0.38, lr=4.75e-06, throughput=1613 tok/s | |
| 2025-11-22 22:05:04,751 - INFO - Epoch 1 Step 5890 (Global: 5890): loss=0.0391, ppl=1.04, grad_norm=0.49, lr=4.73e-06, throughput=1639 tok/s | |
| 2025-11-22 22:09:39,043 - INFO - Epoch 1 Step 5900 (Global: 5900): loss=0.0405, ppl=1.04, grad_norm=0.41, lr=4.71e-06, throughput=1750 tok/s | |
| 2025-11-22 22:14:25,113 - INFO - Epoch 1 Step 5910 (Global: 5910): loss=0.0375, ppl=1.04, grad_norm=0.43, lr=4.70e-06, throughput=1678 tok/s | |
| 2025-11-22 22:19:09,941 - INFO - Epoch 1 Step 5920 (Global: 5920): loss=0.0455, ppl=1.05, grad_norm=0.40, lr=4.68e-06, throughput=1685 tok/s | |
| 2025-11-22 22:23:58,769 - INFO - Epoch 1 Step 5930 (Global: 5930): loss=0.0306, ppl=1.03, grad_norm=0.40, lr=4.66e-06, throughput=1662 tok/s | |
| 2025-11-22 22:28:29,582 - INFO - Epoch 1 Step 5940 (Global: 5940): loss=0.0252, ppl=1.03, grad_norm=0.59, lr=4.65e-06, throughput=1772 tok/s | |
| 2025-11-22 22:33:07,823 - INFO - Epoch 1 Step 5950 (Global: 5950): loss=0.0423, ppl=1.04, grad_norm=0.71, lr=4.63e-06, throughput=1725 tok/s | |
| 2025-11-22 22:37:42,548 - INFO - Epoch 1 Step 5960 (Global: 5960): loss=0.0287, ppl=1.03, grad_norm=0.33, lr=4.61e-06, throughput=1747 tok/s | |
| 2025-11-22 22:42:25,281 - INFO - Epoch 1 Step 5970 (Global: 5970): loss=0.0316, ppl=1.03, grad_norm=0.51, lr=4.60e-06, throughput=1698 tok/s | |
| 2025-11-22 22:47:12,342 - INFO - Epoch 1 Step 5980 (Global: 5980): loss=0.0353, ppl=1.04, grad_norm=0.50, lr=4.58e-06, throughput=1672 tok/s | |
| 2025-11-22 22:51:56,088 - INFO - Epoch 1 Step 5990 (Global: 5990): loss=0.0291, ppl=1.03, grad_norm=0.70, lr=4.56e-06, throughput=1692 tok/s | |
| 2025-11-22 22:56:37,187 - INFO - Epoch 1 Step 6000 (Global: 6000): loss=0.0381, ppl=1.04, grad_norm=0.28, lr=4.55e-06, throughput=1708 tok/s | |
| 2025-11-22 22:56:37,187 - INFO - | |
| Running validation at step 6000... | |
| 2025-11-22 23:14:36,114 - INFO - Validation loss: 0.0339, perplexity: 1.03 | |
| 2025-11-22 23:14:36,115 - INFO - Qualitative metrics (n=5): | |
| 2025-11-22 23:14:36,115 - INFO - BLEU: 0.9924 | |
| 2025-11-22 23:14:36,115 - INFO - METEOR: 0.9827 | |
| 2025-11-22 23:14:36,116 - INFO - Edit Distance: 0.0596 | |
| 2025-11-22 23:14:36,116 - INFO - F-measure: 0.9968 | |
| 2025-11-22 23:14:36,116 - INFO - | |
| ====================================================================== | |
| 2025-11-22 23:14:36,116 - INFO - Qualitative Evaluation Samples: | |
| 2025-11-22 23:14:36,116 - INFO - ====================================================================== | |
| 2025-11-22 23:14:36,117 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-11-22 23:14:36,117 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-11-22 23:14:36,117 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-22 23:14:36,117 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-22 23:14:36,117 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-22 23:14:36,118 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-11-22 23:14:36,118 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-11-22 23:14:36,118 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-22 23:14:36,118 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-22 23:14:36,118 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-22 23:14:36,118 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-11-22 23:14:36,119 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-11-22 23:14:36,119 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-22 23:14:36,119 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-22 23:14:36,119 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-22 23:14:36,119 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-11-22 23:14:36,120 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-11-22 23:14:36,120 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-22 23:14:36,121 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-22 23:14:36,121 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-22 23:14:36,122 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-11-22 23:14:36,122 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-11-22 23:14:36,122 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' | |
| 2025-11-22 23:14:36,123 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' | |
| 2025-11-22 23:14:36,123 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-22 23:14:36,124 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510/qualitative_step_6000.jsonl | |
| 2025-11-22 23:15:43,608 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510/best_checkpoint.pt | |
| 2025-11-22 23:15:43,622 - INFO - New best validation loss: 0.0339, perplexity: 1.03 | |
| 2025-11-22 23:20:34,813 - INFO - Epoch 1 Step 6010 (Global: 6010): loss=0.0367, ppl=1.04, grad_norm=0.48, lr=4.53e-06, throughput=1649 tok/s | |
| 2025-11-22 23:25:31,271 - INFO - Epoch 1 Step 6020 (Global: 6020): loss=0.0304, ppl=1.03, grad_norm=0.50, lr=4.51e-06, throughput=1619 tok/s | |
| 2025-11-22 23:30:20,184 - INFO - Epoch 1 Step 6030 (Global: 6030): loss=0.0332, ppl=1.03, grad_norm=0.35, lr=4.50e-06, throughput=1661 tok/s | |
| 2025-11-22 23:35:09,850 - INFO - Epoch 1 Step 6040 (Global: 6040): loss=0.0534, ppl=1.05, grad_norm=0.42, lr=4.48e-06, throughput=1657 tok/s | |
| 2025-11-22 23:40:05,169 - INFO - Epoch 1 Step 6050 (Global: 6050): loss=0.0333, ppl=1.03, grad_norm=0.68, lr=4.46e-06, throughput=1625 tok/s | |
| 2025-11-22 23:45:08,397 - INFO - Epoch 1 Step 6060 (Global: 6060): loss=0.0246, ppl=1.02, grad_norm=0.39, lr=4.45e-06, throughput=1583 tok/s | |
| 2025-11-22 23:50:06,465 - INFO - Epoch 1 Step 6070 (Global: 6070): loss=0.0279, ppl=1.03, grad_norm=0.35, lr=4.43e-06, throughput=1610 tok/s | |
| 2025-11-22 23:55:08,296 - INFO - Epoch 1 Step 6080 (Global: 6080): loss=0.0387, ppl=1.04, grad_norm=0.47, lr=4.41e-06, throughput=1590 tok/s | |
| 2025-11-22 23:59:58,299 - INFO - Epoch 1 Step 6090 (Global: 6090): loss=0.0319, ppl=1.03, grad_norm=0.39, lr=4.40e-06, throughput=1655 tok/s | |
| 2025-11-23 00:04:36,093 - INFO - Epoch 1 Step 6100 (Global: 6100): loss=0.0358, ppl=1.04, grad_norm=0.59, lr=4.38e-06, throughput=1728 tok/s | |
| 2025-11-23 00:09:18,574 - INFO - Epoch 1 Step 6110 (Global: 6110): loss=0.0318, ppl=1.03, grad_norm=0.44, lr=4.36e-06, throughput=1699 tok/s | |
| 2025-11-23 00:13:59,474 - INFO - Epoch 1 Step 6120 (Global: 6120): loss=0.0663, ppl=1.07, grad_norm=0.39, lr=4.35e-06, throughput=1709 tok/s | |
| 2025-11-23 00:18:33,907 - INFO - Epoch 1 Step 6130 (Global: 6130): loss=0.0337, ppl=1.03, grad_norm=0.41, lr=4.33e-06, throughput=1749 tok/s | |
| 2025-11-23 00:23:17,104 - INFO - Epoch 1 Step 6140 (Global: 6140): loss=0.0323, ppl=1.03, grad_norm=0.45, lr=4.31e-06, throughput=1695 tok/s | |
| 2025-11-23 00:27:57,908 - INFO - Epoch 1 Step 6150 (Global: 6150): loss=0.0304, ppl=1.03, grad_norm=0.39, lr=4.30e-06, throughput=1709 tok/s | |
| 2025-11-23 00:32:35,640 - INFO - Epoch 1 Step 6160 (Global: 6160): loss=0.0323, ppl=1.03, grad_norm=0.30, lr=4.28e-06, throughput=1728 tok/s | |
| 2025-11-23 00:37:03,744 - INFO - Epoch 1 Step 6170 (Global: 6170): loss=0.0349, ppl=1.04, grad_norm=0.49, lr=4.26e-06, throughput=1790 tok/s | |
| 2025-11-23 00:41:45,398 - INFO - Epoch 1 Step 6180 (Global: 6180): loss=0.0350, ppl=1.04, grad_norm=0.43, lr=4.25e-06, throughput=1704 tok/s | |
| 2025-11-23 00:46:27,090 - INFO - Epoch 1 Step 6190 (Global: 6190): loss=0.0525, ppl=1.05, grad_norm=0.51, lr=4.23e-06, throughput=1704 tok/s | |
| 2025-11-23 00:51:08,887 - INFO - Epoch 1 Step 6200 (Global: 6200): loss=0.0267, ppl=1.03, grad_norm=0.40, lr=4.21e-06, throughput=1703 tok/s | |
| 2025-11-23 00:55:42,147 - INFO - Epoch 1 Step 6210 (Global: 6210): loss=0.0408, ppl=1.04, grad_norm=0.57, lr=4.20e-06, throughput=1757 tok/s | |
| 2025-11-23 01:00:25,345 - INFO - Epoch 1 Step 6220 (Global: 6220): loss=0.0295, ppl=1.03, grad_norm=0.32, lr=4.18e-06, throughput=1695 tok/s | |
| 2025-11-23 01:05:04,025 - INFO - Epoch 1 Step 6230 (Global: 6230): loss=0.0315, ppl=1.03, grad_norm=0.47, lr=4.16e-06, throughput=1722 tok/s | |
| 2025-11-23 01:09:41,251 - INFO - Epoch 1 Step 6240 (Global: 6240): loss=0.0315, ppl=1.03, grad_norm=0.47, lr=4.15e-06, throughput=1731 tok/s | |
| 2025-11-23 01:14:09,867 - INFO - Epoch 1 Step 6250 (Global: 6250): loss=0.0349, ppl=1.04, grad_norm=0.35, lr=4.13e-06, throughput=1787 tok/s | |
| 2025-11-23 01:18:51,316 - INFO - Epoch 1 Step 6260 (Global: 6260): loss=0.0266, ppl=1.03, grad_norm=0.38, lr=4.12e-06, throughput=1705 tok/s | |
| 2025-11-23 01:23:27,879 - INFO - Epoch 1 Step 6270 (Global: 6270): loss=0.0381, ppl=1.04, grad_norm=0.40, lr=4.10e-06, throughput=1736 tok/s | |
| 2025-11-23 01:27:55,916 - INFO - Epoch 1 Step 6280 (Global: 6280): loss=0.0250, ppl=1.03, grad_norm=0.53, lr=4.08e-06, throughput=1791 tok/s | |
| 2025-11-23 01:32:34,879 - INFO - Epoch 1 Step 6290 (Global: 6290): loss=0.0351, ppl=1.04, grad_norm=0.46, lr=4.07e-06, throughput=1721 tok/s | |
| 2025-11-23 01:37:12,159 - INFO - Epoch 1 Step 6300 (Global: 6300): loss=0.0367, ppl=1.04, grad_norm=0.48, lr=4.05e-06, throughput=1731 tok/s | |
| 2025-11-23 01:41:50,596 - INFO - Epoch 1 Step 6310 (Global: 6310): loss=0.0355, ppl=1.04, grad_norm=0.56, lr=4.03e-06, throughput=1724 tok/s | |
| 2025-11-23 01:46:17,470 - INFO - Epoch 1 Step 6320 (Global: 6320): loss=0.0369, ppl=1.04, grad_norm=0.27, lr=4.02e-06, throughput=1799 tok/s | |
| 2025-11-23 01:50:54,789 - INFO - Epoch 1 Step 6330 (Global: 6330): loss=0.0265, ppl=1.03, grad_norm=0.40, lr=4.00e-06, throughput=1731 tok/s | |
| 2025-11-23 01:55:30,301 - INFO - Epoch 1 Step 6340 (Global: 6340): loss=0.0462, ppl=1.05, grad_norm=0.33, lr=3.98e-06, throughput=1742 tok/s | |
| 2025-11-23 02:00:05,376 - INFO - Epoch 1 Step 6350 (Global: 6350): loss=0.0317, ppl=1.03, grad_norm=0.45, lr=3.97e-06, throughput=1745 tok/s | |
| 2025-11-23 02:04:33,173 - INFO - Epoch 1 Step 6360 (Global: 6360): loss=0.0303, ppl=1.03, grad_norm=0.44, lr=3.95e-06, throughput=1792 tok/s | |
| 2025-11-23 02:09:13,832 - INFO - Epoch 1 Step 6370 (Global: 6370): loss=0.0485, ppl=1.05, grad_norm=0.35, lr=3.93e-06, throughput=1710 tok/s | |
| 2025-11-23 02:13:51,885 - INFO - Epoch 1 Step 6380 (Global: 6380): loss=0.0242, ppl=1.02, grad_norm=0.33, lr=3.92e-06, throughput=1726 tok/s | |
| 2025-11-23 02:18:28,589 - INFO - Epoch 1 Step 6390 (Global: 6390): loss=0.0419, ppl=1.04, grad_norm=0.39, lr=3.90e-06, throughput=1735 tok/s | |
| 2025-11-23 02:22:56,902 - INFO - Epoch 1 Step 6400 (Global: 6400): loss=0.0579, ppl=1.06, grad_norm=0.54, lr=3.89e-06, throughput=1789 tok/s | |
| 2025-11-23 02:27:34,108 - INFO - Epoch 1 Step 6410 (Global: 6410): loss=0.0399, ppl=1.04, grad_norm=0.45, lr=3.87e-06, throughput=1732 tok/s | |
| 2025-11-23 02:32:11,798 - INFO - Epoch 1 Step 6420 (Global: 6420): loss=0.0357, ppl=1.04, grad_norm=0.35, lr=3.85e-06, throughput=1729 tok/s | |
| 2025-11-23 02:36:47,703 - INFO - Epoch 1 Step 6430 (Global: 6430): loss=0.0309, ppl=1.03, grad_norm=0.36, lr=3.84e-06, throughput=1740 tok/s | |
| 2025-11-23 02:41:16,701 - INFO - Epoch 1 Step 6440 (Global: 6440): loss=0.0389, ppl=1.04, grad_norm=0.43, lr=3.82e-06, throughput=1784 tok/s | |
| 2025-11-23 02:45:52,821 - INFO - Epoch 1 Step 6450 (Global: 6450): loss=0.0347, ppl=1.04, grad_norm=0.43, lr=3.80e-06, throughput=1738 tok/s | |
| 2025-11-23 02:50:28,680 - INFO - Epoch 1 Step 6460 (Global: 6460): loss=0.0295, ppl=1.03, grad_norm=0.37, lr=3.79e-06, throughput=1740 tok/s | |
| 2025-11-23 02:55:07,792 - INFO - Epoch 1 Step 6470 (Global: 6470): loss=0.0333, ppl=1.03, grad_norm=0.30, lr=3.77e-06, throughput=1720 tok/s | |
| 2025-11-23 02:59:36,440 - INFO - Epoch 1 Step 6480 (Global: 6480): loss=0.0268, ppl=1.03, grad_norm=0.68, lr=3.76e-06, throughput=1787 tok/s | |
| 2025-11-23 03:04:11,249 - INFO - Epoch 1 Step 6490 (Global: 6490): loss=0.0328, ppl=1.03, grad_norm=0.37, lr=3.74e-06, throughput=1747 tok/s | |
| 2025-11-23 03:08:48,250 - INFO - Epoch 1 Step 6500 (Global: 6500): loss=0.0333, ppl=1.03, grad_norm=0.41, lr=3.72e-06, throughput=1733 tok/s | |
| 2025-11-23 03:13:28,089 - INFO - Epoch 1 Step 6510 (Global: 6510): loss=0.0251, ppl=1.03, grad_norm=0.31, lr=3.71e-06, throughput=1715 tok/s | |
| 2025-11-23 03:17:59,861 - INFO - Epoch 1 Step 6520 (Global: 6520): loss=0.0323, ppl=1.03, grad_norm=0.38, lr=3.69e-06, throughput=1766 tok/s | |
| 2025-11-23 03:22:40,883 - INFO - Epoch 1 Step 6530 (Global: 6530): loss=0.0281, ppl=1.03, grad_norm=0.32, lr=3.67e-06, throughput=1708 tok/s | |
| 2025-11-23 03:27:21,007 - INFO - Epoch 1 Step 6540 (Global: 6540): loss=0.0282, ppl=1.03, grad_norm=0.29, lr=3.66e-06, throughput=1714 tok/s | |
| 2025-11-23 03:31:50,176 - INFO - Epoch 1 Step 6550 (Global: 6550): loss=0.0307, ppl=1.03, grad_norm=0.64, lr=3.64e-06, throughput=1783 tok/s | |
| 2025-11-23 03:36:31,645 - INFO - Epoch 1 Step 6560 (Global: 6560): loss=0.0310, ppl=1.03, grad_norm=0.28, lr=3.63e-06, throughput=1705 tok/s | |
| 2025-11-23 03:41:08,490 - INFO - Epoch 1 Step 6570 (Global: 6570): loss=0.0603, ppl=1.06, grad_norm=0.54, lr=3.61e-06, throughput=1734 tok/s | |
| 2025-11-23 03:45:42,884 - INFO - Epoch 1 Step 6580 (Global: 6580): loss=0.0299, ppl=1.03, grad_norm=0.43, lr=3.59e-06, throughput=1749 tok/s | |
| 2025-11-23 03:50:08,392 - INFO - Epoch 1 Step 6590 (Global: 6590): loss=0.0421, ppl=1.04, grad_norm=0.44, lr=3.58e-06, throughput=1808 tok/s | |
| 2025-11-23 03:54:47,769 - INFO - Epoch 1 Step 6600 (Global: 6600): loss=0.0276, ppl=1.03, grad_norm=0.39, lr=3.56e-06, throughput=1718 tok/s | |
| 2025-11-23 03:59:25,728 - INFO - Epoch 1 Step 6610 (Global: 6610): loss=0.0324, ppl=1.03, grad_norm=0.53, lr=3.55e-06, throughput=1727 tok/s | |
| 2025-11-23 04:04:05,735 - INFO - Epoch 1 Step 6620 (Global: 6620): loss=0.0358, ppl=1.04, grad_norm=0.33, lr=3.53e-06, throughput=1714 tok/s | |
| 2025-11-23 04:08:33,550 - INFO - Epoch 1 Step 6630 (Global: 6630): loss=0.0358, ppl=1.04, grad_norm=0.32, lr=3.51e-06, throughput=1792 tok/s | |
| 2025-11-23 04:13:12,697 - INFO - Epoch 1 Step 6640 (Global: 6640): loss=0.0300, ppl=1.03, grad_norm=0.41, lr=3.50e-06, throughput=1720 tok/s | |
| 2025-11-23 04:17:52,772 - INFO - Epoch 1 Step 6650 (Global: 6650): loss=0.0254, ppl=1.03, grad_norm=0.43, lr=3.48e-06, throughput=1714 tok/s | |
| 2025-11-23 04:22:31,133 - INFO - Epoch 1 Step 6660 (Global: 6660): loss=0.0320, ppl=1.03, grad_norm=0.37, lr=3.47e-06, throughput=1724 tok/s | |
| 2025-11-23 04:27:03,750 - INFO - Epoch 1 Step 6670 (Global: 6670): loss=0.0378, ppl=1.04, grad_norm=0.37, lr=3.45e-06, throughput=1761 tok/s | |
| 2025-11-23 04:31:45,459 - INFO - Epoch 1 Step 6680 (Global: 6680): loss=0.0267, ppl=1.03, grad_norm=0.33, lr=3.43e-06, throughput=1704 tok/s | |
| 2025-11-23 04:36:22,906 - INFO - Epoch 1 Step 6690 (Global: 6690): loss=0.0336, ppl=1.03, grad_norm=0.36, lr=3.42e-06, throughput=1730 tok/s | |
| 2025-11-23 04:41:00,664 - INFO - Epoch 1 Step 6700 (Global: 6700): loss=0.0366, ppl=1.04, grad_norm=0.51, lr=3.40e-06, throughput=1728 tok/s | |
| 2025-11-23 04:45:29,519 - INFO - Epoch 1 Step 6710 (Global: 6710): loss=0.0296, ppl=1.03, grad_norm=0.29, lr=3.39e-06, throughput=1785 tok/s | |
| 2025-11-23 04:50:21,755 - INFO - Epoch 1 Step 6720 (Global: 6720): loss=0.0321, ppl=1.03, grad_norm=0.39, lr=3.37e-06, throughput=1643 tok/s | |
| 2025-11-23 04:55:10,903 - INFO - Epoch 1 Step 6730 (Global: 6730): loss=0.0311, ppl=1.03, grad_norm=0.62, lr=3.35e-06, throughput=1660 tok/s | |
| 2025-11-23 04:59:50,289 - INFO - Epoch 1 Step 6740 (Global: 6740): loss=0.0366, ppl=1.04, grad_norm=0.39, lr=3.34e-06, throughput=1718 tok/s | |
| 2025-11-23 05:04:24,301 - INFO - Epoch 1 Step 6750 (Global: 6750): loss=0.0261, ppl=1.03, grad_norm=0.36, lr=3.32e-06, throughput=1752 tok/s | |
| 2025-11-23 05:09:03,497 - INFO - Epoch 1 Step 6760 (Global: 6760): loss=0.0262, ppl=1.03, grad_norm=0.33, lr=3.31e-06, throughput=1719 tok/s | |
| 2025-11-23 05:13:43,802 - INFO - Epoch 1 Step 6770 (Global: 6770): loss=0.0282, ppl=1.03, grad_norm=0.34, lr=3.29e-06, throughput=1712 tok/s | |
| 2025-11-23 05:18:15,757 - INFO - Epoch 1 Step 6780 (Global: 6780): loss=0.0313, ppl=1.03, grad_norm=0.31, lr=3.28e-06, throughput=1765 tok/s | |
| 2025-11-23 05:22:59,024 - INFO - Epoch 1 Step 6790 (Global: 6790): loss=0.0313, ppl=1.03, grad_norm=0.40, lr=3.26e-06, throughput=1695 tok/s | |
| 2025-11-23 05:27:40,605 - INFO - Epoch 1 Step 6800 (Global: 6800): loss=0.0359, ppl=1.04, grad_norm=0.42, lr=3.24e-06, throughput=1705 tok/s | |
| 2025-11-23 05:32:18,789 - INFO - Epoch 1 Step 6810 (Global: 6810): loss=0.0407, ppl=1.04, grad_norm=0.61, lr=3.23e-06, throughput=1725 tok/s | |
| 2025-11-23 05:36:48,477 - INFO - Epoch 1 Step 6820 (Global: 6820): loss=0.0380, ppl=1.04, grad_norm=0.35, lr=3.21e-06, throughput=1780 tok/s | |
| 2025-11-23 05:41:29,030 - INFO - Epoch 1 Step 6830 (Global: 6830): loss=0.0339, ppl=1.03, grad_norm=0.38, lr=3.20e-06, throughput=1711 tok/s | |
| 2025-11-23 05:46:07,445 - INFO - Epoch 1 Step 6840 (Global: 6840): loss=0.0336, ppl=1.03, grad_norm=0.44, lr=3.18e-06, throughput=1724 tok/s | |
| 2025-11-23 05:50:43,985 - INFO - Epoch 1 Step 6850 (Global: 6850): loss=0.0339, ppl=1.03, grad_norm=0.39, lr=3.17e-06, throughput=1736 tok/s | |
| 2025-11-23 05:55:12,042 - INFO - Epoch 1 Step 6860 (Global: 6860): loss=0.0311, ppl=1.03, grad_norm=0.49, lr=3.15e-06, throughput=1791 tok/s | |
| 2025-11-23 05:59:51,477 - INFO - Epoch 1 Step 6870 (Global: 6870): loss=0.0284, ppl=1.03, grad_norm=0.30, lr=3.13e-06, throughput=1718 tok/s | |
| 2025-11-23 06:04:29,228 - INFO - Epoch 1 Step 6880 (Global: 6880): loss=0.0297, ppl=1.03, grad_norm=0.40, lr=3.12e-06, throughput=1728 tok/s | |
| 2025-11-23 06:09:10,594 - INFO - Epoch 1 Step 6890 (Global: 6890): loss=0.0346, ppl=1.04, grad_norm=0.38, lr=3.10e-06, throughput=1706 tok/s | |
| 2025-11-23 06:13:44,896 - INFO - Epoch 1 Step 6900 (Global: 6900): loss=0.0324, ppl=1.03, grad_norm=0.35, lr=3.09e-06, throughput=1750 tok/s | |
| 2025-11-23 06:18:27,271 - INFO - Epoch 1 Step 6910 (Global: 6910): loss=0.0377, ppl=1.04, grad_norm=0.52, lr=3.07e-06, throughput=1700 tok/s | |
| 2025-11-23 06:23:09,922 - INFO - Epoch 1 Step 6920 (Global: 6920): loss=0.0358, ppl=1.04, grad_norm=0.37, lr=3.06e-06, throughput=1698 tok/s | |
| 2025-11-23 06:27:51,345 - INFO - Epoch 1 Step 6930 (Global: 6930): loss=0.0366, ppl=1.04, grad_norm=0.40, lr=3.04e-06, throughput=1706 tok/s | |
| 2025-11-23 06:32:18,355 - INFO - Epoch 1 Step 6940 (Global: 6940): loss=0.0268, ppl=1.03, grad_norm=0.33, lr=3.03e-06, throughput=1798 tok/s | |
| 2025-11-23 06:36:57,768 - INFO - Epoch 1 Step 6950 (Global: 6950): loss=0.0413, ppl=1.04, grad_norm=0.36, lr=3.01e-06, throughput=1718 tok/s | |
| 2025-11-23 06:41:36,560 - INFO - Epoch 1 Step 6960 (Global: 6960): loss=0.0259, ppl=1.03, grad_norm=0.55, lr=3.00e-06, throughput=1722 tok/s | |
| 2025-11-23 06:46:04,836 - INFO - Epoch 1 Step 6970 (Global: 6970): loss=0.0340, ppl=1.03, grad_norm=0.36, lr=2.98e-06, throughput=1789 tok/s | |
| 2025-11-23 06:50:42,775 - INFO - Epoch 1 Step 6980 (Global: 6980): loss=0.0323, ppl=1.03, grad_norm=0.29, lr=2.96e-06, throughput=1727 tok/s | |
| 2025-11-23 06:55:21,108 - INFO - Epoch 1 Step 6990 (Global: 6990): loss=0.0437, ppl=1.04, grad_norm=0.33, lr=2.95e-06, throughput=1725 tok/s | |
| 2025-11-23 06:59:58,651 - INFO - Epoch 1 Step 7000 (Global: 7000): loss=0.0269, ppl=1.03, grad_norm=0.47, lr=2.93e-06, throughput=1729 tok/s | |
| 2025-11-23 07:04:27,529 - INFO - Epoch 1 Step 7010 (Global: 7010): loss=0.0342, ppl=1.03, grad_norm=0.40, lr=2.92e-06, throughput=1785 tok/s | |
| 2025-11-23 07:09:05,502 - INFO - Epoch 1 Step 7020 (Global: 7020): loss=0.0271, ppl=1.03, grad_norm=0.32, lr=2.90e-06, throughput=1727 tok/s | |
| 2025-11-23 07:13:47,340 - INFO - Epoch 1 Step 7030 (Global: 7030): loss=0.0314, ppl=1.03, grad_norm=0.25, lr=2.89e-06, throughput=1703 tok/s | |
| 2025-11-23 07:18:27,220 - INFO - Epoch 1 Step 7040 (Global: 7040): loss=0.0329, ppl=1.03, grad_norm=0.40, lr=2.87e-06, throughput=1715 tok/s | |
| 2025-11-23 07:22:57,917 - INFO - Epoch 1 Step 7050 (Global: 7050): loss=0.0329, ppl=1.03, grad_norm=0.44, lr=2.86e-06, throughput=1773 tok/s | |
| 2025-11-23 07:27:40,887 - INFO - Epoch 1 Step 7060 (Global: 7060): loss=0.0267, ppl=1.03, grad_norm=0.28, lr=2.84e-06, throughput=1696 tok/s | |
| 2025-11-23 07:32:23,076 - INFO - Epoch 1 Step 7070 (Global: 7070): loss=0.0305, ppl=1.03, grad_norm=0.42, lr=2.83e-06, throughput=1701 tok/s | |
| 2025-11-23 07:37:02,377 - INFO - Epoch 1 Step 7080 (Global: 7080): loss=0.0344, ppl=1.04, grad_norm=0.48, lr=2.81e-06, throughput=1719 tok/s | |
| 2025-11-23 07:41:31,045 - INFO - Epoch 1 Step 7090 (Global: 7090): loss=0.0314, ppl=1.03, grad_norm=0.32, lr=2.80e-06, throughput=1787 tok/s | |
| 2025-11-23 07:46:10,882 - INFO - Epoch 1 Step 7100 (Global: 7100): loss=0.0353, ppl=1.04, grad_norm=0.38, lr=2.78e-06, throughput=1715 tok/s | |
| 2025-11-23 07:50:47,558 - INFO - Epoch 1 Step 7110 (Global: 7110): loss=0.0285, ppl=1.03, grad_norm=0.29, lr=2.77e-06, throughput=1735 tok/s | |
| 2025-11-23 07:55:27,956 - INFO - Epoch 1 Step 7120 (Global: 7120): loss=0.0314, ppl=1.03, grad_norm=0.38, lr=2.75e-06, throughput=1712 tok/s | |
| 2025-11-23 07:59:55,704 - INFO - Epoch 1 Step 7130 (Global: 7130): loss=0.0285, ppl=1.03, grad_norm=0.31, lr=2.74e-06, throughput=1793 tok/s | |
| 2025-11-23 08:04:37,657 - INFO - Epoch 1 Step 7140 (Global: 7140): loss=0.0336, ppl=1.03, grad_norm=0.44, lr=2.72e-06, throughput=1702 tok/s | |
| 2025-11-23 08:09:17,530 - INFO - Epoch 1 Step 7150 (Global: 7150): loss=0.0250, ppl=1.03, grad_norm=0.51, lr=2.71e-06, throughput=1715 tok/s | |
| 2025-11-23 08:14:01,015 - INFO - Epoch 1 Step 7160 (Global: 7160): loss=0.0245, ppl=1.02, grad_norm=0.32, lr=2.69e-06, throughput=1693 tok/s | |
| 2025-11-23 08:18:34,442 - INFO - Epoch 1 Step 7170 (Global: 7170): loss=0.0275, ppl=1.03, grad_norm=0.37, lr=2.68e-06, throughput=1756 tok/s | |
| 2025-11-23 08:23:17,970 - INFO - Epoch 1 Step 7180 (Global: 7180): loss=0.0448, ppl=1.05, grad_norm=0.42, lr=2.66e-06, throughput=1693 tok/s | |
| 2025-11-23 08:27:57,086 - INFO - Epoch 1 Step 7190 (Global: 7190): loss=0.0292, ppl=1.03, grad_norm=0.35, lr=2.65e-06, throughput=1720 tok/s | |
| 2025-11-23 08:32:25,967 - INFO - Epoch 1 Step 7200 (Global: 7200): loss=0.0260, ppl=1.03, grad_norm=0.40, lr=2.63e-06, throughput=1785 tok/s | |
| 2025-11-23 08:37:04,035 - INFO - Epoch 1 Step 7210 (Global: 7210): loss=0.0296, ppl=1.03, grad_norm=0.49, lr=2.62e-06, throughput=1726 tok/s | |
| 2025-11-23 08:41:42,811 - INFO - Epoch 1 Step 7220 (Global: 7220): loss=0.0268, ppl=1.03, grad_norm=0.35, lr=2.60e-06, throughput=1722 tok/s | |
| 2025-11-23 08:46:20,933 - INFO - Epoch 1 Step 7230 (Global: 7230): loss=0.0346, ppl=1.04, grad_norm=0.38, lr=2.59e-06, throughput=1726 tok/s | |
| 2025-11-23 08:50:51,091 - INFO - Epoch 1 Step 7240 (Global: 7240): loss=0.0359, ppl=1.04, grad_norm=0.54, lr=2.58e-06, throughput=1777 tok/s | |
| 2025-11-23 08:55:30,656 - INFO - Epoch 1 Step 7250 (Global: 7250): loss=0.0277, ppl=1.03, grad_norm=0.27, lr=2.56e-06, throughput=1717 tok/s | |
| 2025-11-23 09:00:11,124 - INFO - Epoch 1 Step 7260 (Global: 7260): loss=0.0311, ppl=1.03, grad_norm=0.35, lr=2.55e-06, throughput=1711 tok/s | |
| 2025-11-23 09:04:52,017 - INFO - Epoch 1 Step 7270 (Global: 7270): loss=0.0295, ppl=1.03, grad_norm=0.29, lr=2.53e-06, throughput=1709 tok/s | |
| 2025-11-23 09:09:23,373 - INFO - Epoch 1 Step 7280 (Global: 7280): loss=0.0306, ppl=1.03, grad_norm=0.47, lr=2.52e-06, throughput=1769 tok/s | |
| 2025-11-23 09:14:05,471 - INFO - Epoch 1 Step 7290 (Global: 7290): loss=0.0398, ppl=1.04, grad_norm=0.44, lr=2.50e-06, throughput=1702 tok/s | |
| 2025-11-23 09:18:49,139 - INFO - Epoch 1 Step 7300 (Global: 7300): loss=0.0341, ppl=1.03, grad_norm=0.61, lr=2.49e-06, throughput=1692 tok/s | |
| 2025-11-23 09:23:32,174 - INFO - Epoch 1 Step 7310 (Global: 7310): loss=0.0339, ppl=1.03, grad_norm=0.47, lr=2.47e-06, throughput=1696 tok/s | |
| 2025-11-23 09:28:04,020 - INFO - Epoch 1 Step 7320 (Global: 7320): loss=0.0294, ppl=1.03, grad_norm=0.33, lr=2.46e-06, throughput=1766 tok/s | |
| 2025-11-23 09:32:42,333 - INFO - Epoch 1 Step 7330 (Global: 7330): loss=0.0319, ppl=1.03, grad_norm=0.37, lr=2.44e-06, throughput=1725 tok/s | |
| 2025-11-23 09:37:20,719 - INFO - Epoch 1 Step 7340 (Global: 7340): loss=0.0386, ppl=1.04, grad_norm=0.54, lr=2.43e-06, throughput=1724 tok/s | |
| 2025-11-23 09:41:57,517 - INFO - Epoch 1 Step 7350 (Global: 7350): loss=0.0381, ppl=1.04, grad_norm=0.37, lr=2.42e-06, throughput=1734 tok/s | |
| 2025-11-23 09:46:31,312 - INFO - Epoch 1 Step 7360 (Global: 7360): loss=0.0381, ppl=1.04, grad_norm=0.55, lr=2.40e-06, throughput=1753 tok/s | |
| 2025-11-23 09:51:23,517 - INFO - Epoch 1 Step 7370 (Global: 7370): loss=0.0313, ppl=1.03, grad_norm=0.41, lr=2.39e-06, throughput=1643 tok/s | |
| 2025-11-23 09:55:59,346 - INFO - Epoch 1 Step 7380 (Global: 7380): loss=0.0284, ppl=1.03, grad_norm=0.30, lr=2.37e-06, throughput=1740 tok/s | |
| 2025-11-23 10:00:34,880 - INFO - Epoch 1 Step 7390 (Global: 7390): loss=0.0506, ppl=1.05, grad_norm=0.50, lr=2.36e-06, throughput=1742 tok/s | |
| 2025-11-23 10:05:03,298 - INFO - Epoch 1 Step 7400 (Global: 7400): loss=0.0214, ppl=1.02, grad_norm=0.27, lr=2.34e-06, throughput=1788 tok/s | |
| 2025-11-23 10:09:42,771 - INFO - Epoch 1 Step 7410 (Global: 7410): loss=0.0312, ppl=1.03, grad_norm=0.66, lr=2.33e-06, throughput=1718 tok/s | |
| 2025-11-23 10:14:24,058 - INFO - Epoch 1 Step 7420 (Global: 7420): loss=0.0778, ppl=1.08, grad_norm=0.36, lr=2.32e-06, throughput=1706 tok/s | |
| 2025-11-23 10:18:53,109 - INFO - Epoch 1 Step 7430 (Global: 7430): loss=0.0296, ppl=1.03, grad_norm=1.34, lr=2.30e-06, throughput=1784 tok/s | |
| 2025-11-23 10:23:34,192 - INFO - Epoch 1 Step 7440 (Global: 7440): loss=0.0403, ppl=1.04, grad_norm=0.46, lr=2.29e-06, throughput=1708 tok/s | |
| 2025-11-23 10:28:14,242 - INFO - Epoch 1 Step 7450 (Global: 7450): loss=0.0366, ppl=1.04, grad_norm=0.42, lr=2.27e-06, throughput=1714 tok/s | |
| 2025-11-23 10:32:54,036 - INFO - Epoch 1 Step 7460 (Global: 7460): loss=0.0278, ppl=1.03, grad_norm=0.37, lr=2.26e-06, throughput=1717 tok/s | |
| 2025-11-23 10:37:21,031 - INFO - Epoch 1 Step 7470 (Global: 7470): loss=0.0248, ppl=1.03, grad_norm=0.42, lr=2.25e-06, throughput=1798 tok/s | |
| 2025-11-23 10:41:57,477 - INFO - Epoch 1 Step 7480 (Global: 7480): loss=0.0257, ppl=1.03, grad_norm=0.39, lr=2.23e-06, throughput=1736 tok/s | |
| 2025-11-23 10:46:34,346 - INFO - Epoch 1 Step 7490 (Global: 7490): loss=0.0428, ppl=1.04, grad_norm=0.35, lr=2.22e-06, throughput=1734 tok/s | |
| 2025-11-23 10:51:13,059 - INFO - Epoch 1 Step 7500 (Global: 7500): loss=0.0320, ppl=1.03, grad_norm=0.34, lr=2.20e-06, throughput=1722 tok/s | |
| 2025-11-23 10:55:42,067 - INFO - Epoch 1 Step 7510 (Global: 7510): loss=0.0314, ppl=1.03, grad_norm=0.36, lr=2.19e-06, throughput=1784 tok/s | |
| 2025-11-23 11:00:18,864 - INFO - Epoch 1 Step 7520 (Global: 7520): loss=0.0282, ppl=1.03, grad_norm=0.28, lr=2.18e-06, throughput=1734 tok/s | |
| 2025-11-23 11:04:56,543 - INFO - Epoch 1 Step 7530 (Global: 7530): loss=0.0430, ppl=1.04, grad_norm=0.38, lr=2.16e-06, throughput=1729 tok/s | |
| 2025-11-23 11:09:33,209 - INFO - Epoch 1 Step 7540 (Global: 7540): loss=0.0306, ppl=1.03, grad_norm=0.37, lr=2.15e-06, throughput=1735 tok/s | |
| 2025-11-23 11:14:02,406 - INFO - Epoch 1 Step 7550 (Global: 7550): loss=0.0250, ppl=1.03, grad_norm=0.50, lr=2.14e-06, throughput=1783 tok/s | |
| 2025-11-23 11:18:43,114 - INFO - Epoch 1 Step 7560 (Global: 7560): loss=0.0294, ppl=1.03, grad_norm=0.38, lr=2.12e-06, throughput=1710 tok/s | |
| 2025-11-23 11:23:24,454 - INFO - Epoch 1 Step 7570 (Global: 7570): loss=0.0248, ppl=1.03, grad_norm=0.34, lr=2.11e-06, throughput=1706 tok/s | |
| 2025-11-23 11:28:03,821 - INFO - Epoch 1 Step 7580 (Global: 7580): loss=0.0405, ppl=1.04, grad_norm=0.41, lr=2.09e-06, throughput=1718 tok/s | |
| 2025-11-23 11:32:33,691 - INFO - Epoch 1 Step 7590 (Global: 7590): loss=0.0293, ppl=1.03, grad_norm=0.47, lr=2.08e-06, throughput=1779 tok/s | |
| 2025-11-23 11:37:11,652 - INFO - Epoch 1 Step 7600 (Global: 7600): loss=0.0283, ppl=1.03, grad_norm=0.40, lr=2.07e-06, throughput=1727 tok/s | |
| 2025-11-23 11:41:48,711 - INFO - Epoch 1 Step 7610 (Global: 7610): loss=0.0498, ppl=1.05, grad_norm=0.37, lr=2.05e-06, throughput=1733 tok/s | |
| 2025-11-23 11:46:25,734 - INFO - Epoch 1 Step 7620 (Global: 7620): loss=0.0267, ppl=1.03, grad_norm=0.48, lr=2.04e-06, throughput=1733 tok/s | |
| 2025-11-23 11:50:52,047 - INFO - Epoch 1 Step 7630 (Global: 7630): loss=0.0290, ppl=1.03, grad_norm=0.33, lr=2.03e-06, throughput=1802 tok/s | |
| 2025-11-23 11:55:30,296 - INFO - Epoch 1 Step 7640 (Global: 7640): loss=0.0271, ppl=1.03, grad_norm=0.37, lr=2.01e-06, throughput=1725 tok/s | |
| 2025-11-23 12:00:03,768 - INFO - Epoch 1 Step 7650 (Global: 7650): loss=0.0324, ppl=1.03, grad_norm=0.33, lr=2.00e-06, throughput=1755 tok/s | |
| 2025-11-23 12:04:27,520 - INFO - Epoch 1 Step 7660 (Global: 7660): loss=0.0236, ppl=1.02, grad_norm=0.31, lr=1.99e-06, throughput=1820 tok/s | |
| 2025-11-23 12:09:01,781 - INFO - Epoch 1 Step 7670 (Global: 7670): loss=0.0297, ppl=1.03, grad_norm=0.80, lr=1.97e-06, throughput=1750 tok/s | |
| 2025-11-23 12:13:38,786 - INFO - Epoch 1 Step 7680 (Global: 7680): loss=0.0317, ppl=1.03, grad_norm=0.36, lr=1.96e-06, throughput=1733 tok/s | |
| 2025-11-23 12:18:16,906 - INFO - Epoch 1 Step 7690 (Global: 7690): loss=0.0436, ppl=1.04, grad_norm=0.59, lr=1.95e-06, throughput=1726 tok/s | |
| 2025-11-23 12:22:44,116 - INFO - Epoch 1 Step 7700 (Global: 7700): loss=0.0325, ppl=1.03, grad_norm=0.32, lr=1.93e-06, throughput=1796 tok/s | |
| 2025-11-23 12:27:22,111 - INFO - Epoch 1 Step 7710 (Global: 7710): loss=0.0330, ppl=1.03, grad_norm=0.45, lr=1.92e-06, throughput=1727 tok/s | |
| 2025-11-23 12:31:56,618 - INFO - Epoch 1 Step 7720 (Global: 7720): loss=0.0322, ppl=1.03, grad_norm=0.31, lr=1.91e-06, throughput=1749 tok/s | |
| 2025-11-23 12:36:29,711 - INFO - Epoch 1 Step 7730 (Global: 7730): loss=0.0360, ppl=1.04, grad_norm=0.38, lr=1.89e-06, throughput=1758 tok/s | |
| 2025-11-23 12:40:54,502 - INFO - Epoch 1 Step 7740 (Global: 7740): loss=0.0377, ppl=1.04, grad_norm=0.31, lr=1.88e-06, throughput=1813 tok/s | |
| 2025-11-23 12:45:28,879 - INFO - Epoch 1 Step 7750 (Global: 7750): loss=0.0352, ppl=1.04, grad_norm=0.50, lr=1.87e-06, throughput=1749 tok/s | |
| 2025-11-23 12:50:03,020 - INFO - Epoch 1 Step 7760 (Global: 7760): loss=0.0318, ppl=1.03, grad_norm=0.34, lr=1.85e-06, throughput=1751 tok/s | |
| 2025-11-23 12:54:39,243 - INFO - Epoch 1 Step 7770 (Global: 7770): loss=0.0380, ppl=1.04, grad_norm=0.35, lr=1.84e-06, throughput=1738 tok/s | |
| 2025-11-23 12:59:06,117 - INFO - Epoch 1 Step 7780 (Global: 7780): loss=0.0287, ppl=1.03, grad_norm=0.44, lr=1.83e-06, throughput=1799 tok/s | |
| 2025-11-23 13:03:40,553 - INFO - Epoch 1 Step 7790 (Global: 7790): loss=0.0351, ppl=1.04, grad_norm=0.36, lr=1.82e-06, throughput=1749 tok/s | |
| 2025-11-23 13:08:14,352 - INFO - Epoch 1 Step 7800 (Global: 7800): loss=0.0275, ppl=1.03, grad_norm=0.39, lr=1.80e-06, throughput=1753 tok/s | |
| 2025-11-23 13:12:50,383 - INFO - Epoch 1 Step 7810 (Global: 7810): loss=0.0240, ppl=1.02, grad_norm=0.29, lr=1.79e-06, throughput=1739 tok/s | |
| 2025-11-23 13:17:14,812 - INFO - Epoch 1 Step 7820 (Global: 7820): loss=0.0306, ppl=1.03, grad_norm=0.38, lr=1.78e-06, throughput=1815 tok/s | |
| 2025-11-23 13:21:50,147 - INFO - Epoch 1 Step 7830 (Global: 7830): loss=0.0276, ppl=1.03, grad_norm=0.32, lr=1.76e-06, throughput=1743 tok/s | |
| 2025-11-23 13:26:25,682 - INFO - Epoch 1 Step 7840 (Global: 7840): loss=0.0307, ppl=1.03, grad_norm=0.33, lr=1.75e-06, throughput=1742 tok/s | |
| 2025-11-23 13:30:52,931 - INFO - Epoch 1 Step 7850 (Global: 7850): loss=0.0295, ppl=1.03, grad_norm=0.44, lr=1.74e-06, throughput=1796 tok/s | |
| 2025-11-23 13:35:29,722 - INFO - Epoch 1 Step 7860 (Global: 7860): loss=0.0294, ppl=1.03, grad_norm=0.38, lr=1.73e-06, throughput=1734 tok/s | |
| 2025-11-23 13:40:01,819 - INFO - Epoch 1 Step 7870 (Global: 7870): loss=0.0284, ppl=1.03, grad_norm=0.36, lr=1.71e-06, throughput=1764 tok/s | |
| 2025-11-23 13:44:36,318 - INFO - Epoch 1 Step 7880 (Global: 7880): loss=0.0332, ppl=1.03, grad_norm=0.41, lr=1.70e-06, throughput=1749 tok/s | |
| 2025-11-23 13:49:00,315 - INFO - Epoch 1 Step 7890 (Global: 7890): loss=0.0388, ppl=1.04, grad_norm=0.39, lr=1.69e-06, throughput=1818 tok/s | |
| 2025-11-23 13:53:33,143 - INFO - Epoch 1 Step 7900 (Global: 7900): loss=0.0280, ppl=1.03, grad_norm=0.35, lr=1.68e-06, throughput=1759 tok/s | |
| 2025-11-23 13:58:10,184 - INFO - Epoch 1 Step 7910 (Global: 7910): loss=0.0331, ppl=1.03, grad_norm=0.34, lr=1.66e-06, throughput=1733 tok/s | |
| 2025-11-23 14:02:47,442 - INFO - Epoch 1 Step 7920 (Global: 7920): loss=0.0282, ppl=1.03, grad_norm=0.31, lr=1.65e-06, throughput=1731 tok/s | |
| 2025-11-23 14:07:26,473 - INFO - Epoch 1 Step 7930 (Global: 7930): loss=0.0324, ppl=1.03, grad_norm=0.52, lr=1.64e-06, throughput=1720 tok/s | |
| 2025-11-23 14:12:05,633 - INFO - Epoch 1 Step 7940 (Global: 7940): loss=0.0262, ppl=1.03, grad_norm=0.29, lr=1.63e-06, throughput=1719 tok/s | |
| 2025-11-23 14:16:40,521 - INFO - Epoch 1 Step 7950 (Global: 7950): loss=0.0350, ppl=1.04, grad_norm=0.37, lr=1.61e-06, throughput=1746 tok/s | |
| 2025-11-23 14:21:16,943 - INFO - Epoch 1 Step 7960 (Global: 7960): loss=0.0376, ppl=1.04, grad_norm=0.44, lr=1.60e-06, throughput=1736 tok/s | |
| 2025-11-23 14:25:42,813 - INFO - Epoch 1 Step 7970 (Global: 7970): loss=0.0305, ppl=1.03, grad_norm=0.35, lr=1.59e-06, throughput=1805 tok/s | |
| 2025-11-23 14:30:17,481 - INFO - Epoch 1 Step 7980 (Global: 7980): loss=0.0316, ppl=1.03, grad_norm=0.29, lr=1.58e-06, throughput=1748 tok/s | |
| 2025-11-23 14:34:58,034 - INFO - Epoch 1 Step 7990 (Global: 7990): loss=0.0295, ppl=1.03, grad_norm=0.38, lr=1.56e-06, throughput=1711 tok/s | |
| 2025-11-23 14:39:30,945 - INFO - Epoch 1 Step 8000 (Global: 8000): loss=0.0281, ppl=1.03, grad_norm=0.38, lr=1.55e-06, throughput=1759 tok/s | |
| 2025-11-23 14:39:30,945 - INFO - | |
| Running validation at step 8000... | |
| 2025-11-23 14:54:52,452 - INFO - Validation loss: 0.0320, perplexity: 1.03 | |
| 2025-11-23 14:54:52,453 - INFO - Qualitative metrics (n=5): | |
| 2025-11-23 14:54:52,453 - INFO - BLEU: 0.9938 | |
| 2025-11-23 14:54:52,453 - INFO - METEOR: 0.9844 | |
| 2025-11-23 14:54:52,453 - INFO - Edit Distance: 0.0596 | |
| 2025-11-23 14:54:52,453 - INFO - F-measure: 0.9982 | |
| 2025-11-23 14:54:52,453 - INFO - | |
| ====================================================================== | |
| 2025-11-23 14:54:52,453 - INFO - Qualitative Evaluation Samples: | |
| 2025-11-23 14:54:52,453 - INFO - ====================================================================== | |
| 2025-11-23 14:54:52,453 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-11-23 14:54:52,453 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-11-23 14:54:52,454 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-23 14:54:52,454 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-23 14:54:52,454 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-23 14:54:52,454 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-11-23 14:54:52,454 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-11-23 14:54:52,454 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-23 14:54:52,454 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-23 14:54:52,454 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-23 14:54:52,454 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-11-23 14:54:52,454 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-11-23 14:54:52,454 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-23 14:54:52,454 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-23 14:54:52,455 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-23 14:54:52,455 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-11-23 14:54:52,455 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-11-23 14:54:52,455 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-23 14:54:52,455 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-23 14:54:52,455 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-23 14:54:52,455 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-11-23 14:54:52,455 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-11-23 14:54:52,455 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' | |
| 2025-11-23 14:54:52,455 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' | |
| 2025-11-23 14:54:52,455 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-23 14:54:52,457 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510/qualitative_step_8000.jsonl | |
| 2025-11-23 14:55:43,935 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510/best_checkpoint.pt | |
| 2025-11-23 14:55:43,948 - INFO - New best validation loss: 0.0320, perplexity: 1.03 | |
| 2025-11-23 15:00:06,019 - INFO - Epoch 1 Step 8010 (Global: 8010): loss=0.0290, ppl=1.03, grad_norm=0.34, lr=1.54e-06, throughput=1832 tok/s | |
| 2025-11-23 15:04:43,744 - INFO - Epoch 1 Step 8020 (Global: 8020): loss=0.0283, ppl=1.03, grad_norm=0.38, lr=1.53e-06, throughput=1728 tok/s | |
| 2025-11-23 15:09:25,435 - INFO - Epoch 1 Step 8030 (Global: 8030): loss=0.0295, ppl=1.03, grad_norm=0.43, lr=1.52e-06, throughput=1704 tok/s | |
| 2025-11-23 15:13:59,294 - INFO - Epoch 1 Step 8040 (Global: 8040): loss=0.0324, ppl=1.03, grad_norm=0.37, lr=1.50e-06, throughput=1753 tok/s | |
| 2025-11-23 15:18:24,865 - INFO - Epoch 1 Step 8050 (Global: 8050): loss=0.0279, ppl=1.03, grad_norm=0.32, lr=1.49e-06, throughput=1807 tok/s | |
| 2025-11-23 15:23:00,450 - INFO - Epoch 1 Step 8060 (Global: 8060): loss=0.0293, ppl=1.03, grad_norm=0.30, lr=1.48e-06, throughput=1742 tok/s | |
| 2025-11-23 15:27:37,946 - INFO - Epoch 1 Step 8070 (Global: 8070): loss=0.0304, ppl=1.03, grad_norm=0.42, lr=1.47e-06, throughput=1730 tok/s | |
| 2025-11-23 15:32:10,639 - INFO - Epoch 1 Step 8080 (Global: 8080): loss=0.0329, ppl=1.03, grad_norm=0.35, lr=1.46e-06, throughput=1760 tok/s | |
| 2025-11-23 15:36:37,759 - INFO - Epoch 1 Step 8090 (Global: 8090): loss=0.0371, ppl=1.04, grad_norm=0.48, lr=1.44e-06, throughput=1797 tok/s | |
| 2025-11-23 15:41:15,507 - INFO - Epoch 1 Step 8100 (Global: 8100): loss=0.0256, ppl=1.03, grad_norm=0.36, lr=1.43e-06, throughput=1728 tok/s | |
| 2025-11-23 15:45:48,559 - INFO - Epoch 1 Step 8110 (Global: 8110): loss=0.0285, ppl=1.03, grad_norm=0.29, lr=1.42e-06, throughput=1758 tok/s | |
| 2025-11-23 15:50:23,661 - INFO - Epoch 1 Step 8120 (Global: 8120): loss=0.0304, ppl=1.03, grad_norm=0.36, lr=1.41e-06, throughput=1745 tok/s | |
| 2025-11-23 15:54:52,669 - INFO - Epoch 1 Step 8130 (Global: 8130): loss=0.0203, ppl=1.02, grad_norm=0.38, lr=1.40e-06, throughput=1784 tok/s | |
| 2025-11-23 15:59:31,953 - INFO - Epoch 1 Step 8140 (Global: 8140): loss=0.0241, ppl=1.02, grad_norm=0.27, lr=1.39e-06, throughput=1719 tok/s | |
| 2025-11-23 16:04:12,738 - INFO - Epoch 1 Step 8150 (Global: 8150): loss=0.0364, ppl=1.04, grad_norm=0.55, lr=1.37e-06, throughput=1710 tok/s | |
| 2025-11-23 16:08:43,157 - INFO - Epoch 1 Step 8160 (Global: 8160): loss=0.0339, ppl=1.03, grad_norm=0.29, lr=1.36e-06, throughput=1775 tok/s | |
| 2025-11-23 16:13:23,320 - INFO - Epoch 1 Step 8170 (Global: 8170): loss=0.0265, ppl=1.03, grad_norm=0.41, lr=1.35e-06, throughput=1713 tok/s | |
| 2025-11-23 16:18:10,924 - INFO - Epoch 1 Step 8180 (Global: 8180): loss=0.0280, ppl=1.03, grad_norm=0.51, lr=1.34e-06, throughput=1669 tok/s | |
| 2025-11-23 16:23:21,691 - INFO - Epoch 1 Step 8190 (Global: 8190): loss=0.0291, ppl=1.03, grad_norm=0.32, lr=1.33e-06, throughput=1545 tok/s | |
| 2025-11-23 16:28:43,693 - INFO - Epoch 1 Step 8200 (Global: 8200): loss=0.0203, ppl=1.02, grad_norm=0.33, lr=1.32e-06, throughput=1491 tok/s | |
| 2025-11-23 16:34:16,721 - INFO - Epoch 1 Step 8210 (Global: 8210): loss=0.0316, ppl=1.03, grad_norm=0.37, lr=1.31e-06, throughput=1441 tok/s | |
| 2025-11-23 16:39:41,386 - INFO - Epoch 1 Step 8220 (Global: 8220): loss=0.0477, ppl=1.05, grad_norm=0.55, lr=1.29e-06, throughput=1478 tok/s | |
| 2025-11-23 16:44:51,990 - INFO - Epoch 1 Step 8230 (Global: 8230): loss=0.0289, ppl=1.03, grad_norm=0.36, lr=1.28e-06, throughput=1545 tok/s | |
| 2025-11-23 16:49:42,107 - INFO - Epoch 1 Step 8240 (Global: 8240): loss=0.0323, ppl=1.03, grad_norm=0.37, lr=1.27e-06, throughput=1655 tok/s | |
| 2025-11-23 16:54:55,202 - INFO - Epoch 1 Step 8250 (Global: 8250): loss=0.0309, ppl=1.03, grad_norm=0.38, lr=1.26e-06, throughput=1533 tok/s | |
| 2025-11-23 17:00:12,399 - INFO - Epoch 1 Step 8260 (Global: 8260): loss=0.0257, ppl=1.03, grad_norm=0.36, lr=1.25e-06, throughput=1513 tok/s | |
| 2025-11-23 17:05:47,942 - INFO - Epoch 1 Step 8270 (Global: 8270): loss=0.0318, ppl=1.03, grad_norm=0.34, lr=1.24e-06, throughput=1431 tok/s | |
| 2025-11-23 17:10:37,598 - INFO - Epoch 1 Step 8280 (Global: 8280): loss=0.0267, ppl=1.03, grad_norm=0.33, lr=1.23e-06, throughput=1657 tok/s | |
| 2025-11-23 17:16:01,192 - INFO - Epoch 1 Step 8290 (Global: 8290): loss=0.0326, ppl=1.03, grad_norm=0.33, lr=1.22e-06, throughput=1483 tok/s | |
| 2025-11-23 17:21:18,080 - INFO - Epoch 1 Step 8300 (Global: 8300): loss=0.0290, ppl=1.03, grad_norm=0.32, lr=1.21e-06, throughput=1515 tok/s | |
| 2025-11-23 17:26:12,528 - INFO - Epoch 1 Step 8310 (Global: 8310): loss=0.0309, ppl=1.03, grad_norm=0.52, lr=1.20e-06, throughput=1630 tok/s | |
| 2025-11-23 17:30:52,604 - INFO - Epoch 1 Step 8320 (Global: 8320): loss=0.0313, ppl=1.03, grad_norm=0.36, lr=1.18e-06, throughput=1714 tok/s | |
| 2025-11-23 17:35:41,811 - INFO - Epoch 1 Step 8330 (Global: 8330): loss=0.0289, ppl=1.03, grad_norm=0.38, lr=1.17e-06, throughput=1660 tok/s | |
| 2025-11-23 17:40:28,367 - INFO - Epoch 1 Step 8340 (Global: 8340): loss=0.0342, ppl=1.03, grad_norm=0.36, lr=1.16e-06, throughput=1675 tok/s | |
| 2025-11-23 17:45:05,639 - INFO - Epoch 1 Step 8350 (Global: 8350): loss=0.0310, ppl=1.03, grad_norm=0.31, lr=1.15e-06, throughput=1731 tok/s | |
| 2025-11-23 17:50:00,517 - INFO - Epoch 1 Step 8360 (Global: 8360): loss=0.0334, ppl=1.03, grad_norm=0.41, lr=1.14e-06, throughput=1628 tok/s | |
| 2025-11-23 17:54:55,030 - INFO - Epoch 1 Step 8370 (Global: 8370): loss=0.0385, ppl=1.04, grad_norm=0.44, lr=1.13e-06, throughput=1630 tok/s | |
| 2025-11-23 17:59:45,735 - INFO - Epoch 1 Step 8380 (Global: 8380): loss=0.0388, ppl=1.04, grad_norm=0.30, lr=1.12e-06, throughput=1651 tok/s | |
| 2025-11-23 18:04:23,526 - INFO - Epoch 1 Step 8390 (Global: 8390): loss=0.0279, ppl=1.03, grad_norm=0.34, lr=1.11e-06, throughput=1728 tok/s | |
| 2025-11-23 18:09:10,417 - INFO - Epoch 1 Step 8400 (Global: 8400): loss=0.0331, ppl=1.03, grad_norm=0.49, lr=1.10e-06, throughput=1673 tok/s | |
| 2025-11-23 18:14:02,956 - INFO - Epoch 1 Step 8410 (Global: 8410): loss=0.0277, ppl=1.03, grad_norm=0.70, lr=1.09e-06, throughput=1641 tok/s | |
| 2025-11-23 18:18:50,761 - INFO - Epoch 1 Step 8420 (Global: 8420): loss=0.0287, ppl=1.03, grad_norm=0.33, lr=1.08e-06, throughput=1668 tok/s | |
| 2025-11-23 18:23:34,431 - INFO - Epoch 1 Step 8430 (Global: 8430): loss=0.0247, ppl=1.03, grad_norm=0.24, lr=1.07e-06, throughput=1692 tok/s | |
| 2025-11-23 18:28:28,555 - INFO - Epoch 1 Step 8440 (Global: 8440): loss=0.0599, ppl=1.06, grad_norm=0.58, lr=1.06e-06, throughput=1632 tok/s | |
| 2025-11-23 18:33:18,090 - INFO - Epoch 1 Step 8450 (Global: 8450): loss=0.0304, ppl=1.03, grad_norm=0.36, lr=1.05e-06, throughput=1658 tok/s | |
| 2025-11-23 18:38:13,097 - INFO - Epoch 1 Step 8460 (Global: 8460): loss=0.0288, ppl=1.03, grad_norm=0.54, lr=1.04e-06, throughput=1627 tok/s | |
| 2025-11-23 18:42:49,693 - INFO - Epoch 1 Step 8470 (Global: 8470): loss=0.0256, ppl=1.03, grad_norm=0.39, lr=1.03e-06, throughput=1735 tok/s | |
| 2025-11-23 18:47:45,921 - INFO - Epoch 1 Step 8480 (Global: 8480): loss=0.0392, ppl=1.04, grad_norm=0.51, lr=1.02e-06, throughput=1620 tok/s | |
| 2025-11-23 18:52:42,886 - INFO - Epoch 1 Step 8490 (Global: 8490): loss=0.0396, ppl=1.04, grad_norm=0.39, lr=1.01e-06, throughput=1616 tok/s | |
| 2025-11-23 18:57:34,152 - INFO - Epoch 1 Step 8500 (Global: 8500): loss=0.0277, ppl=1.03, grad_norm=0.34, lr=9.96e-07, throughput=1648 tok/s | |
| 2025-11-23 19:02:16,394 - INFO - Epoch 1 Step 8510 (Global: 8510): loss=0.0297, ppl=1.03, grad_norm=0.46, lr=9.86e-07, throughput=1701 tok/s | |
| 2025-11-23 19:07:12,057 - INFO - Epoch 1 Step 8520 (Global: 8520): loss=0.0402, ppl=1.04, grad_norm=0.42, lr=9.76e-07, throughput=1623 tok/s | |
| 2025-11-23 19:12:07,050 - INFO - Epoch 1 Step 8530 (Global: 8530): loss=0.0366, ppl=1.04, grad_norm=0.31, lr=9.67e-07, throughput=1627 tok/s | |
| 2025-11-23 19:17:11,056 - INFO - Epoch 1 Step 8540 (Global: 8540): loss=0.0384, ppl=1.04, grad_norm=0.39, lr=9.57e-07, throughput=1579 tok/s | |
| 2025-11-23 19:22:12,918 - INFO - Epoch 1 Step 8550 (Global: 8550): loss=0.0331, ppl=1.03, grad_norm=0.35, lr=9.47e-07, throughput=1590 tok/s | |
| 2025-11-23 19:27:01,465 - INFO - Epoch 1 Step 8560 (Global: 8560): loss=0.0269, ppl=1.03, grad_norm=0.33, lr=9.37e-07, throughput=1664 tok/s | |
| 2025-11-23 19:31:52,025 - INFO - Epoch 1 Step 8570 (Global: 8570): loss=0.0289, ppl=1.03, grad_norm=0.27, lr=9.27e-07, throughput=1652 tok/s | |
| 2025-11-23 19:36:30,539 - INFO - Epoch 1 Step 8580 (Global: 8580): loss=0.0405, ppl=1.04, grad_norm=0.44, lr=9.18e-07, throughput=1723 tok/s | |
| 2025-11-23 19:41:22,864 - INFO - Epoch 1 Step 8590 (Global: 8590): loss=0.0285, ppl=1.03, grad_norm=0.33, lr=9.08e-07, throughput=1642 tok/s | |
| 2025-11-23 19:46:17,904 - INFO - Epoch 1 Step 8600 (Global: 8600): loss=0.0437, ppl=1.04, grad_norm=0.41, lr=8.98e-07, throughput=1627 tok/s | |
| 2025-11-23 19:51:39,859 - INFO - Epoch 1 Step 8610 (Global: 8610): loss=0.0284, ppl=1.03, grad_norm=0.36, lr=8.89e-07, throughput=1491 tok/s | |
| 2025-11-23 19:56:39,831 - INFO - Epoch 1 Step 8620 (Global: 8620): loss=0.0351, ppl=1.04, grad_norm=0.37, lr=8.79e-07, throughput=1600 tok/s | |
| 2025-11-23 20:01:33,415 - INFO - Epoch 1 Step 8630 (Global: 8630): loss=0.0283, ppl=1.03, grad_norm=0.34, lr=8.70e-07, throughput=1635 tok/s | |
| 2025-11-23 20:06:33,032 - INFO - Epoch 1 Step 8640 (Global: 8640): loss=0.0247, ppl=1.02, grad_norm=0.32, lr=8.60e-07, throughput=1602 tok/s | |
| 2025-11-23 20:11:34,344 - INFO - Epoch 1 Step 8650 (Global: 8650): loss=0.0277, ppl=1.03, grad_norm=0.30, lr=8.51e-07, throughput=1593 tok/s | |
| 2025-11-23 20:16:15,552 - INFO - Epoch 1 Step 8660 (Global: 8660): loss=0.0351, ppl=1.04, grad_norm=0.36, lr=8.42e-07, throughput=1707 tok/s | |
| 2025-11-23 20:21:42,944 - INFO - Epoch 1 Step 8670 (Global: 8670): loss=0.0337, ppl=1.03, grad_norm=0.49, lr=8.32e-07, throughput=1466 tok/s | |
| 2025-11-23 20:27:04,092 - INFO - Epoch 1 Step 8680 (Global: 8680): loss=0.0248, ppl=1.03, grad_norm=0.49, lr=8.23e-07, throughput=1495 tok/s | |
| 2025-11-23 20:32:30,030 - INFO - Epoch 1 Step 8690 (Global: 8690): loss=0.0351, ppl=1.04, grad_norm=0.34, lr=8.14e-07, throughput=1473 tok/s | |
| 2025-11-23 20:37:15,690 - INFO - Epoch 1 Step 8700 (Global: 8700): loss=0.0316, ppl=1.03, grad_norm=0.32, lr=8.05e-07, throughput=1680 tok/s | |
| 2025-11-23 20:42:09,155 - INFO - Epoch 1 Step 8710 (Global: 8710): loss=0.0269, ppl=1.03, grad_norm=0.42, lr=7.96e-07, throughput=1636 tok/s | |
| 2025-11-23 20:47:01,721 - INFO - Epoch 1 Step 8720 (Global: 8720): loss=0.0331, ppl=1.03, grad_norm=0.30, lr=7.87e-07, throughput=1641 tok/s | |
| 2025-11-23 20:51:52,840 - INFO - Epoch 1 Step 8730 (Global: 8730): loss=0.0526, ppl=1.05, grad_norm=0.54, lr=7.78e-07, throughput=1649 tok/s | |
| 2025-11-23 20:56:30,169 - INFO - Epoch 1 Step 8740 (Global: 8740): loss=0.0274, ppl=1.03, grad_norm=0.36, lr=7.69e-07, throughput=1731 tok/s | |
| 2025-11-23 21:01:52,784 - INFO - Epoch 1 Step 8750 (Global: 8750): loss=0.0302, ppl=1.03, grad_norm=0.27, lr=7.60e-07, throughput=1488 tok/s | |
| 2025-11-23 21:07:58,610 - INFO - Epoch 1 Step 8760 (Global: 8760): loss=0.0350, ppl=1.04, grad_norm=0.83, lr=7.51e-07, throughput=1312 tok/s | |
| 2025-11-23 21:13:44,752 - INFO - Epoch 1 Step 8770 (Global: 8770): loss=0.0321, ppl=1.03, grad_norm=0.36, lr=7.42e-07, throughput=1387 tok/s | |
| 2025-11-23 21:19:36,917 - INFO - Epoch 1 Step 8780 (Global: 8780): loss=0.0266, ppl=1.03, grad_norm=0.31, lr=7.33e-07, throughput=1363 tok/s | |
| 2025-11-23 21:25:24,181 - INFO - Epoch 1 Step 8790 (Global: 8790): loss=0.0358, ppl=1.04, grad_norm=0.36, lr=7.25e-07, throughput=1382 tok/s | |
| 2025-11-23 21:31:10,302 - INFO - Epoch 1 Step 8800 (Global: 8800): loss=0.0306, ppl=1.03, grad_norm=0.36, lr=7.16e-07, throughput=1387 tok/s | |
| 2025-11-23 21:36:37,891 - INFO - Epoch 1 Step 8810 (Global: 8810): loss=0.0286, ppl=1.03, grad_norm=0.41, lr=7.07e-07, throughput=1465 tok/s | |
| 2025-11-23 21:41:57,502 - INFO - Epoch 1 Step 8820 (Global: 8820): loss=0.0325, ppl=1.03, grad_norm=0.45, lr=6.99e-07, throughput=1502 tok/s | |
| 2025-11-23 21:47:15,278 - INFO - Epoch 1 Step 8830 (Global: 8830): loss=0.0278, ppl=1.03, grad_norm=0.45, lr=6.90e-07, throughput=1511 tok/s | |
| 2025-11-23 21:52:46,685 - INFO - Epoch 1 Step 8840 (Global: 8840): loss=0.0295, ppl=1.03, grad_norm=0.30, lr=6.82e-07, throughput=1448 tok/s | |
| 2025-11-23 21:57:46,383 - INFO - Epoch 1 Step 8850 (Global: 8850): loss=0.0469, ppl=1.05, grad_norm=0.44, lr=6.74e-07, throughput=1602 tok/s | |
| 2025-11-23 22:02:59,174 - INFO - Epoch 1 Step 8860 (Global: 8860): loss=0.0298, ppl=1.03, grad_norm=0.37, lr=6.65e-07, throughput=1535 tok/s | |
| 2025-11-23 22:08:02,021 - INFO - Epoch 1 Step 8870 (Global: 8870): loss=0.0315, ppl=1.03, grad_norm=0.36, lr=6.57e-07, throughput=1585 tok/s | |
| 2025-11-23 22:13:04,625 - INFO - Epoch 1 Step 8880 (Global: 8880): loss=0.0369, ppl=1.04, grad_norm=0.50, lr=6.49e-07, throughput=1586 tok/s | |
| 2025-11-23 22:18:11,118 - INFO - Epoch 1 Step 8890 (Global: 8890): loss=0.0363, ppl=1.04, grad_norm=0.34, lr=6.40e-07, throughput=1566 tok/s | |
| 2025-11-23 22:23:31,100 - INFO - Epoch 1 Step 8900 (Global: 8900): loss=0.0233, ppl=1.02, grad_norm=0.32, lr=6.32e-07, throughput=1500 tok/s | |
| 2025-11-23 22:28:50,462 - INFO - Epoch 1 Step 8910 (Global: 8910): loss=0.0311, ppl=1.03, grad_norm=0.35, lr=6.24e-07, throughput=1503 tok/s | |
| 2025-11-23 22:34:09,144 - INFO - Epoch 1 Step 8920 (Global: 8920): loss=0.0349, ppl=1.04, grad_norm=0.36, lr=6.16e-07, throughput=1506 tok/s | |
| 2025-11-23 22:39:24,709 - INFO - Epoch 1 Step 8930 (Global: 8930): loss=0.0231, ppl=1.02, grad_norm=0.35, lr=6.08e-07, throughput=1521 tok/s | |
| 2025-11-23 22:44:53,817 - INFO - Epoch 1 Step 8940 (Global: 8940): loss=0.0435, ppl=1.04, grad_norm=0.48, lr=6.00e-07, throughput=1459 tok/s | |
| 2025-11-23 22:50:19,423 - INFO - Epoch 1 Step 8950 (Global: 8950): loss=0.0298, ppl=1.03, grad_norm=0.32, lr=5.92e-07, throughput=1474 tok/s | |
| 2025-11-23 22:55:01,427 - INFO - Epoch 1 Step 8960 (Global: 8960): loss=0.0263, ppl=1.03, grad_norm=0.45, lr=5.84e-07, throughput=1702 tok/s | |
| 2025-11-23 22:59:29,971 - INFO - Epoch 1 Step 8970 (Global: 8970): loss=0.0306, ppl=1.03, grad_norm=0.49, lr=5.76e-07, throughput=1787 tok/s | |
| 2025-11-23 23:04:14,828 - INFO - Epoch 1 Step 8980 (Global: 8980): loss=0.0354, ppl=1.04, grad_norm=0.38, lr=5.68e-07, throughput=1685 tok/s | |
| 2025-11-23 23:08:57,899 - INFO - Epoch 1 Step 8990 (Global: 8990): loss=0.0313, ppl=1.03, grad_norm=0.41, lr=5.61e-07, throughput=1696 tok/s | |
| 2025-11-23 23:13:30,194 - INFO - Epoch 1 Step 9000 (Global: 9000): loss=0.0252, ppl=1.03, grad_norm=0.35, lr=5.53e-07, throughput=1763 tok/s | |
| 2025-11-23 23:18:08,958 - INFO - Epoch 1 Step 9010 (Global: 9010): loss=0.0304, ppl=1.03, grad_norm=0.42, lr=5.45e-07, throughput=1722 tok/s | |
| 2025-11-23 23:22:51,216 - INFO - Epoch 1 Step 9020 (Global: 9020): loss=0.0311, ppl=1.03, grad_norm=1.26, lr=5.38e-07, throughput=1701 tok/s | |
| 2025-11-23 23:27:47,233 - INFO - Epoch 1 Step 9030 (Global: 9030): loss=0.0345, ppl=1.04, grad_norm=0.39, lr=5.30e-07, throughput=1622 tok/s | |
| 2025-11-23 23:32:25,992 - INFO - Epoch 1 Step 9040 (Global: 9040): loss=0.0253, ppl=1.03, grad_norm=0.63, lr=5.23e-07, throughput=1722 tok/s | |
| 2025-11-23 23:37:12,879 - INFO - Epoch 1 Step 9050 (Global: 9050): loss=0.0291, ppl=1.03, grad_norm=0.36, lr=5.15e-07, throughput=1673 tok/s | |
| 2025-11-23 23:41:59,112 - INFO - Epoch 1 Step 9060 (Global: 9060): loss=0.0511, ppl=1.05, grad_norm=0.33, lr=5.08e-07, throughput=1677 tok/s | |
| 2025-11-23 23:46:45,533 - INFO - Epoch 1 Step 9070 (Global: 9070): loss=0.0299, ppl=1.03, grad_norm=0.34, lr=5.01e-07, throughput=1676 tok/s | |
| 2025-11-23 23:51:16,558 - INFO - Epoch 1 Step 9080 (Global: 9080): loss=0.0254, ppl=1.03, grad_norm=0.33, lr=4.93e-07, throughput=1771 tok/s | |
| 2025-11-23 23:55:59,565 - INFO - Epoch 1 Step 9090 (Global: 9090): loss=0.0350, ppl=1.04, grad_norm=0.38, lr=4.86e-07, throughput=1696 tok/s | |
| 2025-11-24 00:00:39,997 - INFO - Epoch 1 Step 9100 (Global: 9100): loss=0.0296, ppl=1.03, grad_norm=0.37, lr=4.79e-07, throughput=1712 tok/s | |
| 2025-11-24 00:05:24,432 - INFO - Epoch 1 Step 9110 (Global: 9110): loss=0.0321, ppl=1.03, grad_norm=0.40, lr=4.72e-07, throughput=1688 tok/s | |
| 2025-11-24 00:10:01,607 - INFO - Epoch 1 Step 9120 (Global: 9120): loss=0.0238, ppl=1.02, grad_norm=0.46, lr=4.65e-07, throughput=1732 tok/s | |
| 2025-11-24 00:14:46,162 - INFO - Epoch 1 Step 9130 (Global: 9130): loss=0.0311, ppl=1.03, grad_norm=0.33, lr=4.58e-07, throughput=1687 tok/s | |
| 2025-11-24 00:19:33,544 - INFO - Epoch 1 Step 9140 (Global: 9140): loss=0.0298, ppl=1.03, grad_norm=0.34, lr=4.51e-07, throughput=1670 tok/s | |
| 2025-11-24 00:24:17,674 - INFO - Epoch 1 Step 9150 (Global: 9150): loss=0.0265, ppl=1.03, grad_norm=0.55, lr=4.44e-07, throughput=1689 tok/s | |
| 2025-11-24 00:28:50,098 - INFO - Epoch 1 Step 9160 (Global: 9160): loss=0.0426, ppl=1.04, grad_norm=0.41, lr=4.37e-07, throughput=1762 tok/s | |
| 2025-11-24 00:33:28,426 - INFO - Epoch 1 Step 9170 (Global: 9170): loss=0.0267, ppl=1.03, grad_norm=0.42, lr=4.30e-07, throughput=1725 tok/s | |
| 2025-11-24 00:38:04,801 - INFO - Epoch 1 Step 9180 (Global: 9180): loss=0.0419, ppl=1.04, grad_norm=0.53, lr=4.23e-07, throughput=1737 tok/s | |
| 2025-11-24 00:42:36,173 - INFO - Epoch 1 Step 9190 (Global: 9190): loss=0.0273, ppl=1.03, grad_norm=0.28, lr=4.17e-07, throughput=1769 tok/s | |
| 2025-11-24 00:47:15,105 - INFO - Epoch 1 Step 9200 (Global: 9200): loss=0.0355, ppl=1.04, grad_norm=0.45, lr=4.10e-07, throughput=1721 tok/s | |
| 2025-11-24 00:51:56,310 - INFO - Epoch 1 Step 9210 (Global: 9210): loss=0.0325, ppl=1.03, grad_norm=0.31, lr=4.03e-07, throughput=1707 tok/s | |
| 2025-11-24 00:56:31,447 - INFO - Epoch 1 Step 9220 (Global: 9220): loss=0.0243, ppl=1.02, grad_norm=0.36, lr=3.97e-07, throughput=1745 tok/s | |
| 2025-11-24 01:00:59,660 - INFO - Epoch 1 Step 9230 (Global: 9230): loss=0.0271, ppl=1.03, grad_norm=0.38, lr=3.90e-07, throughput=1790 tok/s | |
| 2025-11-24 01:05:31,216 - INFO - Epoch 1 Step 9240 (Global: 9240): loss=0.0312, ppl=1.03, grad_norm=0.47, lr=3.84e-07, throughput=1768 tok/s | |
| 2025-11-24 01:10:08,307 - INFO - Epoch 1 Step 9250 (Global: 9250): loss=0.0309, ppl=1.03, grad_norm=0.34, lr=3.77e-07, throughput=1732 tok/s | |
| 2025-11-24 01:14:40,639 - INFO - Epoch 1 Step 9260 (Global: 9260): loss=0.0378, ppl=1.04, grad_norm=0.37, lr=3.71e-07, throughput=1763 tok/s | |
| 2025-11-24 01:19:06,009 - INFO - Epoch 1 Step 9270 (Global: 9270): loss=0.0260, ppl=1.03, grad_norm=0.29, lr=3.65e-07, throughput=1809 tok/s | |
| 2025-11-24 01:23:46,085 - INFO - Epoch 1 Step 9280 (Global: 9280): loss=0.0378, ppl=1.04, grad_norm=0.45, lr=3.58e-07, throughput=1714 tok/s | |
| 2025-11-24 01:28:22,212 - INFO - Epoch 1 Step 9290 (Global: 9290): loss=0.0204, ppl=1.02, grad_norm=0.43, lr=3.52e-07, throughput=1738 tok/s | |
| 2025-11-24 01:33:01,295 - INFO - Epoch 1 Step 9300 (Global: 9300): loss=0.0316, ppl=1.03, grad_norm=0.35, lr=3.46e-07, throughput=1720 tok/s | |
| 2025-11-24 01:37:32,650 - INFO - Epoch 1 Step 9310 (Global: 9310): loss=0.0280, ppl=1.03, grad_norm=0.37, lr=3.40e-07, throughput=1769 tok/s | |
| 2025-11-24 01:42:10,558 - INFO - Epoch 1 Step 9320 (Global: 9320): loss=0.0280, ppl=1.03, grad_norm=0.44, lr=3.34e-07, throughput=1727 tok/s | |
| 2025-11-24 01:46:45,349 - INFO - Epoch 1 Step 9330 (Global: 9330): loss=0.0314, ppl=1.03, grad_norm=0.30, lr=3.28e-07, throughput=1747 tok/s | |
| 2025-11-24 01:51:18,414 - INFO - Epoch 1 Step 9340 (Global: 9340): loss=0.0339, ppl=1.03, grad_norm=0.31, lr=3.22e-07, throughput=1758 tok/s | |
| 2025-11-24 01:55:44,101 - INFO - Epoch 1 Step 9350 (Global: 9350): loss=0.0465, ppl=1.05, grad_norm=0.32, lr=3.16e-07, throughput=1807 tok/s | |
| 2025-11-24 02:00:23,151 - INFO - Epoch 1 Step 9360 (Global: 9360): loss=0.0439, ppl=1.04, grad_norm=0.43, lr=3.10e-07, throughput=1720 tok/s | |
| 2025-11-24 02:05:01,411 - INFO - Epoch 1 Step 9370 (Global: 9370): loss=0.0243, ppl=1.02, grad_norm=0.38, lr=3.05e-07, throughput=1725 tok/s | |
| 2025-11-24 02:09:43,842 - INFO - Epoch 1 Step 9380 (Global: 9380): loss=0.0319, ppl=1.03, grad_norm=0.29, lr=2.99e-07, throughput=1700 tok/s | |
| 2025-11-24 02:14:15,331 - INFO - Epoch 1 Step 9390 (Global: 9390): loss=0.0229, ppl=1.02, grad_norm=0.24, lr=2.93e-07, throughput=1768 tok/s | |
| 2025-11-24 02:18:59,400 - INFO - Epoch 1 Step 9400 (Global: 9400): loss=0.0353, ppl=1.04, grad_norm=0.59, lr=2.88e-07, throughput=1690 tok/s | |
| 2025-11-24 02:23:40,393 - INFO - Epoch 1 Step 9410 (Global: 9410): loss=0.0269, ppl=1.03, grad_norm=0.33, lr=2.82e-07, throughput=1708 tok/s | |
| 2025-11-24 02:28:11,294 - INFO - Epoch 1 Step 9420 (Global: 9420): loss=0.0244, ppl=1.02, grad_norm=0.30, lr=2.76e-07, throughput=1772 tok/s | |
| 2025-11-24 02:32:47,570 - INFO - Epoch 1 Step 9430 (Global: 9430): loss=0.0261, ppl=1.03, grad_norm=0.43, lr=2.71e-07, throughput=1737 tok/s | |
| 2025-11-24 02:37:25,954 - INFO - Epoch 1 Step 9440 (Global: 9440): loss=0.0265, ppl=1.03, grad_norm=0.33, lr=2.66e-07, throughput=1724 tok/s | |
| 2025-11-24 02:42:03,696 - INFO - Epoch 1 Step 9450 (Global: 9450): loss=0.0303, ppl=1.03, grad_norm=0.38, lr=2.60e-07, throughput=1728 tok/s | |
| 2025-11-24 02:46:33,262 - INFO - Epoch 1 Step 9460 (Global: 9460): loss=0.0396, ppl=1.04, grad_norm=0.37, lr=2.55e-07, throughput=1781 tok/s | |
| 2025-11-24 02:51:12,644 - INFO - Epoch 1 Step 9470 (Global: 9470): loss=0.0325, ppl=1.03, grad_norm=0.52, lr=2.50e-07, throughput=1718 tok/s | |
| 2025-11-24 02:55:50,952 - INFO - Epoch 1 Step 9480 (Global: 9480): loss=0.0306, ppl=1.03, grad_norm=0.42, lr=2.44e-07, throughput=1725 tok/s | |
| 2025-11-24 03:00:28,659 - INFO - Epoch 1 Step 9490 (Global: 9490): loss=0.0286, ppl=1.03, grad_norm=0.37, lr=2.39e-07, throughput=1728 tok/s | |
| 2025-11-24 03:04:57,667 - INFO - Epoch 1 Step 9500 (Global: 9500): loss=0.0297, ppl=1.03, grad_norm=0.73, lr=2.34e-07, throughput=1784 tok/s | |
| 2025-11-24 03:09:36,294 - INFO - Epoch 1 Step 9510 (Global: 9510): loss=0.0313, ppl=1.03, grad_norm=0.46, lr=2.29e-07, throughput=1723 tok/s | |
| 2025-11-24 03:14:17,480 - INFO - Epoch 1 Step 9520 (Global: 9520): loss=0.0321, ppl=1.03, grad_norm=0.31, lr=2.24e-07, throughput=1707 tok/s | |
| 2025-11-24 03:18:57,862 - INFO - Epoch 1 Step 9530 (Global: 9530): loss=0.0274, ppl=1.03, grad_norm=0.36, lr=2.19e-07, throughput=1712 tok/s | |
| 2025-11-24 03:23:27,942 - INFO - Epoch 1 Step 9540 (Global: 9540): loss=0.0256, ppl=1.03, grad_norm=0.31, lr=2.14e-07, throughput=1777 tok/s | |
| 2025-11-24 03:28:08,156 - INFO - Epoch 1 Step 9550 (Global: 9550): loss=0.0267, ppl=1.03, grad_norm=0.36, lr=2.10e-07, throughput=1713 tok/s | |
| 2025-11-24 03:32:46,794 - INFO - Epoch 1 Step 9560 (Global: 9560): loss=0.0297, ppl=1.03, grad_norm=0.31, lr=2.05e-07, throughput=1723 tok/s | |
| 2025-11-24 03:37:23,201 - INFO - Epoch 1 Step 9570 (Global: 9570): loss=0.0289, ppl=1.03, grad_norm=0.24, lr=2.00e-07, throughput=1737 tok/s | |
| 2025-11-24 03:41:50,614 - INFO - Epoch 1 Step 9580 (Global: 9580): loss=0.0346, ppl=1.04, grad_norm=0.34, lr=1.95e-07, throughput=1795 tok/s | |
| 2025-11-24 03:46:26,753 - INFO - Epoch 1 Step 9590 (Global: 9590): loss=0.0349, ppl=1.04, grad_norm=0.34, lr=1.91e-07, throughput=1738 tok/s | |
| 2025-11-24 03:51:06,217 - INFO - Epoch 1 Step 9600 (Global: 9600): loss=0.0292, ppl=1.03, grad_norm=0.37, lr=1.86e-07, throughput=1718 tok/s | |
| 2025-11-24 03:55:35,624 - INFO - Epoch 1 Step 9610 (Global: 9610): loss=0.0261, ppl=1.03, grad_norm=0.29, lr=1.82e-07, throughput=1782 tok/s | |
| 2025-11-24 04:00:11,510 - INFO - Epoch 1 Step 9620 (Global: 9620): loss=0.0723, ppl=1.07, grad_norm=0.54, lr=1.77e-07, throughput=1740 tok/s | |
| 2025-11-24 04:04:49,923 - INFO - Epoch 1 Step 9630 (Global: 9630): loss=0.0358, ppl=1.04, grad_norm=0.42, lr=1.73e-07, throughput=1724 tok/s | |
| 2025-11-24 04:09:28,146 - INFO - Epoch 1 Step 9640 (Global: 9640): loss=0.0292, ppl=1.03, grad_norm=0.28, lr=1.68e-07, throughput=1725 tok/s | |
| 2025-11-24 04:14:01,748 - INFO - Epoch 1 Step 9650 (Global: 9650): loss=0.0289, ppl=1.03, grad_norm=0.56, lr=1.64e-07, throughput=1754 tok/s | |
| 2025-11-24 04:18:42,005 - INFO - Epoch 1 Step 9660 (Global: 9660): loss=0.0244, ppl=1.02, grad_norm=0.44, lr=1.60e-07, throughput=1713 tok/s | |
| 2025-11-24 04:23:22,087 - INFO - Epoch 1 Step 9670 (Global: 9670): loss=0.0291, ppl=1.03, grad_norm=0.36, lr=1.56e-07, throughput=1714 tok/s | |
| 2025-11-24 04:28:03,326 - INFO - Epoch 1 Step 9680 (Global: 9680): loss=0.0307, ppl=1.03, grad_norm=0.59, lr=1.52e-07, throughput=1707 tok/s | |
| 2025-11-24 04:32:31,587 - INFO - Epoch 1 Step 9690 (Global: 9690): loss=0.0322, ppl=1.03, grad_norm=0.49, lr=1.48e-07, throughput=1789 tok/s | |
| 2025-11-24 04:37:09,957 - INFO - Epoch 1 Step 9700 (Global: 9700): loss=0.0505, ppl=1.05, grad_norm=0.41, lr=1.44e-07, throughput=1724 tok/s | |
| 2025-11-24 04:41:48,659 - INFO - Epoch 1 Step 9710 (Global: 9710): loss=0.0311, ppl=1.03, grad_norm=0.36, lr=1.40e-07, throughput=1722 tok/s | |
| 2025-11-24 04:46:29,476 - INFO - Epoch 1 Step 9720 (Global: 9720): loss=0.0300, ppl=1.03, grad_norm=0.33, lr=1.36e-07, throughput=1709 tok/s | |
| 2025-11-24 04:50:55,679 - INFO - Epoch 1 Step 9730 (Global: 9730): loss=0.0282, ppl=1.03, grad_norm=0.28, lr=1.32e-07, throughput=1803 tok/s | |
| 2025-11-24 04:55:32,884 - INFO - Epoch 1 Step 9740 (Global: 9740): loss=0.0350, ppl=1.04, grad_norm=0.55, lr=1.28e-07, throughput=1732 tok/s | |
| 2025-11-24 05:00:11,616 - INFO - Epoch 1 Step 9750 (Global: 9750): loss=0.0331, ppl=1.03, grad_norm=0.34, lr=1.24e-07, throughput=1722 tok/s | |
| 2025-11-24 05:04:50,781 - INFO - Epoch 1 Step 9760 (Global: 9760): loss=0.0293, ppl=1.03, grad_norm=0.31, lr=1.21e-07, throughput=1719 tok/s | |
| 2025-11-24 05:09:21,365 - INFO - Epoch 1 Step 9770 (Global: 9770): loss=0.0265, ppl=1.03, grad_norm=0.29, lr=1.17e-07, throughput=1774 tok/s | |
| 2025-11-24 05:14:01,778 - INFO - Epoch 1 Step 9780 (Global: 9780): loss=0.0305, ppl=1.03, grad_norm=0.39, lr=1.13e-07, throughput=1712 tok/s | |
| 2025-11-24 05:18:41,503 - INFO - Epoch 1 Step 9790 (Global: 9790): loss=0.0372, ppl=1.04, grad_norm=0.33, lr=1.10e-07, throughput=1716 tok/s | |
| 2025-11-24 05:23:09,184 - INFO - Epoch 1 Step 9800 (Global: 9800): loss=0.0357, ppl=1.04, grad_norm=0.31, lr=1.06e-07, throughput=1793 tok/s | |
| 2025-11-24 05:27:43,433 - INFO - Epoch 1 Step 9810 (Global: 9810): loss=0.0326, ppl=1.03, grad_norm=0.38, lr=1.03e-07, throughput=1750 tok/s | |
| 2025-11-24 05:32:20,810 - INFO - Epoch 1 Step 9820 (Global: 9820): loss=0.0545, ppl=1.06, grad_norm=0.30, lr=9.97e-08, throughput=1731 tok/s | |
| 2025-11-24 05:36:58,715 - INFO - Epoch 1 Step 9830 (Global: 9830): loss=0.0333, ppl=1.03, grad_norm=0.52, lr=9.64e-08, throughput=1727 tok/s | |
| 2025-11-24 05:41:27,300 - INFO - Epoch 1 Step 9840 (Global: 9840): loss=0.0265, ppl=1.03, grad_norm=0.33, lr=9.32e-08, throughput=1787 tok/s | |
| 2025-11-24 05:46:03,412 - INFO - Epoch 1 Step 9850 (Global: 9850): loss=0.0330, ppl=1.03, grad_norm=0.31, lr=9.00e-08, throughput=1738 tok/s | |
| 2025-11-24 05:50:39,630 - INFO - Epoch 1 Step 9860 (Global: 9860): loss=0.0316, ppl=1.03, grad_norm=0.52, lr=8.68e-08, throughput=1738 tok/s | |
| 2025-11-24 05:55:16,137 - INFO - Epoch 1 Step 9870 (Global: 9870): loss=0.0260, ppl=1.03, grad_norm=0.36, lr=8.37e-08, throughput=1736 tok/s | |
| 2025-11-24 05:59:45,431 - INFO - Epoch 1 Step 9880 (Global: 9880): loss=0.0332, ppl=1.03, grad_norm=0.34, lr=8.07e-08, throughput=1782 tok/s | |
| 2025-11-24 06:04:24,791 - INFO - Epoch 1 Step 9890 (Global: 9890): loss=0.0291, ppl=1.03, grad_norm=0.36, lr=7.77e-08, throughput=1718 tok/s | |
| 2025-11-24 06:09:02,561 - INFO - Epoch 1 Step 9900 (Global: 9900): loss=0.0410, ppl=1.04, grad_norm=0.33, lr=7.48e-08, throughput=1728 tok/s | |
| 2025-11-24 06:13:41,515 - INFO - Epoch 1 Step 9910 (Global: 9910): loss=0.0297, ppl=1.03, grad_norm=0.32, lr=7.20e-08, throughput=1721 tok/s | |
| 2025-11-24 06:18:12,895 - INFO - Epoch 1 Step 9920 (Global: 9920): loss=0.0255, ppl=1.03, grad_norm=0.29, lr=6.92e-08, throughput=1769 tok/s | |
| 2025-11-24 06:22:52,695 - INFO - Epoch 1 Step 9930 (Global: 9930): loss=0.0278, ppl=1.03, grad_norm=0.38, lr=6.64e-08, throughput=1716 tok/s | |
| 2025-11-24 06:27:32,504 - INFO - Epoch 1 Step 9940 (Global: 9940): loss=0.0373, ppl=1.04, grad_norm=0.48, lr=6.37e-08, throughput=1715 tok/s | |
| 2025-11-24 06:32:08,595 - INFO - Epoch 1 Step 9950 (Global: 9950): loss=0.0286, ppl=1.03, grad_norm=0.40, lr=6.11e-08, throughput=1739 tok/s | |
| 2025-11-24 06:36:38,288 - INFO - Epoch 1 Step 9960 (Global: 9960): loss=0.0358, ppl=1.04, grad_norm=0.45, lr=5.85e-08, throughput=1780 tok/s | |
| 2025-11-24 06:41:15,160 - INFO - Epoch 1 Step 9970 (Global: 9970): loss=0.0349, ppl=1.04, grad_norm=0.32, lr=5.60e-08, throughput=1734 tok/s | |
| 2025-11-24 06:45:51,571 - INFO - Epoch 1 Step 9980 (Global: 9980): loss=0.0282, ppl=1.03, grad_norm=0.32, lr=5.35e-08, throughput=1737 tok/s | |
| 2025-11-24 06:50:19,472 - INFO - Epoch 1 Step 9990 (Global: 9990): loss=0.0264, ppl=1.03, grad_norm=0.31, lr=5.11e-08, throughput=1792 tok/s | |
| 2025-11-24 06:54:56,678 - INFO - Epoch 1 Step 10000 (Global: 10000): loss=0.0220, ppl=1.02, grad_norm=0.38, lr=4.87e-08, throughput=1732 tok/s | |
| 2025-11-24 06:54:56,678 - INFO - | |
| Running validation at step 10000... | |
| 2025-11-24 07:10:43,247 - INFO - Validation loss: 0.0318, perplexity: 1.03 | |
| 2025-11-24 07:10:43,248 - INFO - Qualitative metrics (n=5): | |
| 2025-11-24 07:10:43,248 - INFO - BLEU: 0.9938 | |
| 2025-11-24 07:10:43,248 - INFO - METEOR: 0.9844 | |
| 2025-11-24 07:10:43,248 - INFO - Edit Distance: 0.0596 | |
| 2025-11-24 07:10:43,248 - INFO - F-measure: 0.9982 | |
| 2025-11-24 07:10:43,249 - INFO - | |
| ====================================================================== | |
| 2025-11-24 07:10:43,249 - INFO - Qualitative Evaluation Samples: | |
| 2025-11-24 07:10:43,249 - INFO - ====================================================================== | |
| 2025-11-24 07:10:43,249 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-11-24 07:10:43,249 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-11-24 07:10:43,249 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-24 07:10:43,249 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-24 07:10:43,249 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-24 07:10:43,249 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-11-24 07:10:43,250 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-11-24 07:10:43,250 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-24 07:10:43,250 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-24 07:10:43,251 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-24 07:10:43,251 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-11-24 07:10:43,251 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-11-24 07:10:43,251 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-24 07:10:43,252 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-24 07:10:43,252 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-24 07:10:43,252 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-11-24 07:10:43,252 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-11-24 07:10:43,252 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-24 07:10:43,253 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-24 07:10:43,253 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-24 07:10:43,253 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-11-24 07:10:43,253 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-11-24 07:10:43,253 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' | |
| 2025-11-24 07:10:43,253 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' | |
| 2025-11-24 07:10:43,253 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-24 07:10:43,255 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510/qualitative_step_10000.jsonl | |
| 2025-11-24 07:11:26,116 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510/best_checkpoint.pt | |
| 2025-11-24 07:11:26,125 - INFO - New best validation loss: 0.0318, perplexity: 1.03 | |
| 2025-11-24 07:16:07,795 - INFO - Epoch 1 Step 10010 (Global: 10010): loss=0.0273, ppl=1.03, grad_norm=0.56, lr=4.64e-08, throughput=1704 tok/s | |
| 2025-11-24 07:20:51,421 - INFO - Epoch 1 Step 10020 (Global: 10020): loss=0.0245, ppl=1.02, grad_norm=0.29, lr=4.42e-08, throughput=1692 tok/s | |
| 2025-11-24 07:25:27,357 - INFO - Epoch 1 Step 10030 (Global: 10030): loss=0.0345, ppl=1.04, grad_norm=0.57, lr=4.20e-08, throughput=1740 tok/s | |
| 2025-11-24 07:30:07,843 - INFO - Epoch 1 Step 10040 (Global: 10040): loss=0.0290, ppl=1.03, grad_norm=0.38, lr=3.98e-08, throughput=1711 tok/s | |
| 2025-11-24 07:34:44,576 - INFO - Epoch 1 Step 10050 (Global: 10050): loss=0.0280, ppl=1.03, grad_norm=0.32, lr=3.78e-08, throughput=1735 tok/s | |
| 2025-11-24 07:39:20,577 - INFO - Epoch 1 Step 10060 (Global: 10060): loss=0.0326, ppl=1.03, grad_norm=0.38, lr=3.57e-08, throughput=1739 tok/s | |
| 2025-11-24 07:43:47,169 - INFO - Epoch 1 Step 10070 (Global: 10070): loss=0.0367, ppl=1.04, grad_norm=0.35, lr=3.38e-08, throughput=1801 tok/s | |
| 2025-11-24 07:48:23,174 - INFO - Epoch 1 Step 10080 (Global: 10080): loss=0.0303, ppl=1.03, grad_norm=0.52, lr=3.18e-08, throughput=1739 tok/s | |
| 2025-11-24 07:53:02,478 - INFO - Epoch 1 Step 10090 (Global: 10090): loss=0.0385, ppl=1.04, grad_norm=0.31, lr=3.00e-08, throughput=1719 tok/s | |
| 2025-11-24 07:57:37,271 - INFO - Epoch 1 Step 10100 (Global: 10100): loss=0.0274, ppl=1.03, grad_norm=0.25, lr=2.82e-08, throughput=1747 tok/s | |
| 2025-11-24 08:02:02,294 - INFO - Epoch 1 Step 10110 (Global: 10110): loss=0.0296, ppl=1.03, grad_norm=0.39, lr=2.64e-08, throughput=1811 tok/s | |
| 2025-11-24 08:06:41,587 - INFO - Epoch 1 Step 10120 (Global: 10120): loss=0.0294, ppl=1.03, grad_norm=0.37, lr=2.47e-08, throughput=1719 tok/s | |
| 2025-11-24 08:11:20,094 - INFO - Epoch 1 Step 10130 (Global: 10130): loss=0.0247, ppl=1.02, grad_norm=0.27, lr=2.31e-08, throughput=1723 tok/s | |
| 2025-11-24 08:16:01,116 - INFO - Epoch 1 Step 10140 (Global: 10140): loss=0.0320, ppl=1.03, grad_norm=0.38, lr=2.15e-08, throughput=1708 tok/s | |
| 2025-11-24 08:20:32,934 - INFO - Epoch 1 Step 10150 (Global: 10150): loss=0.0346, ppl=1.04, grad_norm=0.49, lr=2.00e-08, throughput=1766 tok/s | |
| 2025-11-24 08:25:13,723 - INFO - Epoch 1 Step 10160 (Global: 10160): loss=0.0296, ppl=1.03, grad_norm=0.41, lr=1.85e-08, throughput=1709 tok/s | |
| 2025-11-24 08:29:52,914 - INFO - Epoch 1 Step 10170 (Global: 10170): loss=0.0249, ppl=1.03, grad_norm=0.35, lr=1.71e-08, throughput=1719 tok/s | |
| 2025-11-24 08:34:31,860 - INFO - Epoch 1 Step 10180 (Global: 10180): loss=0.0379, ppl=1.04, grad_norm=0.45, lr=1.58e-08, throughput=1721 tok/s | |
| 2025-11-24 08:39:00,999 - INFO - Epoch 1 Step 10190 (Global: 10190): loss=0.0329, ppl=1.03, grad_norm=0.30, lr=1.45e-08, throughput=1783 tok/s | |
| 2025-11-24 08:43:39,438 - INFO - Epoch 1 Step 10200 (Global: 10200): loss=0.0273, ppl=1.03, grad_norm=0.39, lr=1.32e-08, throughput=1724 tok/s | |
| 2025-11-24 08:48:17,681 - INFO - Epoch 1 Step 10210 (Global: 10210): loss=0.0767, ppl=1.08, grad_norm=0.53, lr=1.20e-08, throughput=1725 tok/s | |
| 2025-11-24 08:52:45,482 - INFO - Epoch 1 Step 10220 (Global: 10220): loss=0.0340, ppl=1.03, grad_norm=0.30, lr=1.09e-08, throughput=1792 tok/s | |
| 2025-11-24 08:57:22,869 - INFO - Epoch 1 Step 10230 (Global: 10230): loss=0.0301, ppl=1.03, grad_norm=0.27, lr=9.81e-09, throughput=1730 tok/s | |
| 2025-11-24 09:02:04,846 - INFO - Epoch 1 Step 10240 (Global: 10240): loss=0.0314, ppl=1.03, grad_norm=0.87, lr=8.79e-09, throughput=1702 tok/s | |
| 2025-11-24 09:06:42,967 - INFO - Epoch 1 Step 10250 (Global: 10250): loss=0.0306, ppl=1.03, grad_norm=0.33, lr=7.83e-09, throughput=1726 tok/s | |
| 2025-11-24 09:11:12,628 - INFO - Epoch 1 Step 10260 (Global: 10260): loss=0.0293, ppl=1.03, grad_norm=0.24, lr=6.92e-09, throughput=1780 tok/s | |
| 2025-11-24 09:15:54,311 - INFO - Epoch 1 Step 10270 (Global: 10270): loss=0.0368, ppl=1.04, grad_norm=0.28, lr=6.06e-09, throughput=1704 tok/s | |
| 2025-11-24 09:20:31,717 - INFO - Epoch 1 Step 10280 (Global: 10280): loss=0.0385, ppl=1.04, grad_norm=0.50, lr=5.27e-09, throughput=1730 tok/s | |
| 2025-11-24 09:25:12,118 - INFO - Epoch 1 Step 10290 (Global: 10290): loss=0.0238, ppl=1.02, grad_norm=0.27, lr=4.53e-09, throughput=1712 tok/s | |
| 2025-11-24 09:29:36,992 - INFO - Epoch 1 Step 10300 (Global: 10300): loss=0.0416, ppl=1.04, grad_norm=0.38, lr=3.84e-09, throughput=1812 tok/s | |
| 2025-11-24 09:34:08,288 - INFO - Epoch 1 Step 10310 (Global: 10310): loss=0.0350, ppl=1.04, grad_norm=0.38, lr=3.21e-09, throughput=1769 tok/s | |
| 2025-11-24 09:38:38,621 - INFO - Epoch 1 Step 10320 (Global: 10320): loss=0.0251, ppl=1.03, grad_norm=0.29, lr=2.64e-09, throughput=1776 tok/s | |
| 2025-11-24 09:43:15,372 - INFO - Epoch 1 Step 10330 (Global: 10330): loss=0.0327, ppl=1.03, grad_norm=0.48, lr=2.12e-09, throughput=1734 tok/s | |
| 2025-11-24 09:47:45,312 - INFO - Epoch 1 Step 10340 (Global: 10340): loss=0.0324, ppl=1.03, grad_norm=0.90, lr=1.66e-09, throughput=1778 tok/s | |
| 2025-11-24 09:52:23,376 - INFO - Epoch 1 Step 10350 (Global: 10350): loss=0.0292, ppl=1.03, grad_norm=0.33, lr=1.26e-09, throughput=1726 tok/s | |
| 2025-11-24 09:57:01,709 - INFO - Epoch 1 Step 10360 (Global: 10360): loss=0.0349, ppl=1.04, grad_norm=0.55, lr=9.12e-10, throughput=1725 tok/s | |
| 2025-11-24 10:01:40,552 - INFO - Epoch 1 Step 10370 (Global: 10370): loss=0.0410, ppl=1.04, grad_norm=0.49, lr=6.20e-10, throughput=1721 tok/s | |
| 2025-11-24 10:06:15,838 - INFO - Epoch 1 Step 10380 (Global: 10380): loss=0.0246, ppl=1.02, grad_norm=0.30, lr=3.84e-10, throughput=1744 tok/s | |
| 2025-11-24 10:10:54,888 - INFO - Epoch 1 Step 10390 (Global: 10390): loss=0.0291, ppl=1.03, grad_norm=0.41, lr=2.05e-10, throughput=1720 tok/s | |
| 2025-11-24 10:15:37,558 - INFO - Epoch 1 Step 10400 (Global: 10400): loss=0.0282, ppl=1.03, grad_norm=0.26, lr=8.11e-11, throughput=1698 tok/s | |
| 2025-11-24 10:20:23,917 - INFO - Epoch 1 Step 10410 (Global: 10410): loss=0.0325, ppl=1.03, grad_norm=0.35, lr=1.38e-11, throughput=1676 tok/s | |
| 2025-11-24 10:23:30,255 - INFO - Flushing 16 remainder batches from gradient accumulation | |
| 2025-11-24 10:23:30,258 - INFO - Rescaling gradients by 1.50x (compensating for 16/24 batches) | |
| 2025-11-24 10:23:30,484 - INFO - Remainder batch: loss=0.0270, ppl=1.03, grad_norm=0.33 | |
| 2025-11-24 10:23:30,494 - INFO - Epoch 1 training: loss=0.0436, ppl=1.04, grad_norm=0.54, throughput=1654 tok/s (302271.2s total) | |
| 2025-11-24 10:23:30,495 - INFO - | |
| Running final validation... | |
| 2025-11-24 10:39:32,933 - INFO - Validation loss: 0.0318, perplexity: 1.03 | |
| 2025-11-24 10:39:32,934 - INFO - Qualitative metrics (n=5): | |
| 2025-11-24 10:39:32,934 - INFO - BLEU: 0.9924 | |
| 2025-11-24 10:39:32,934 - INFO - METEOR: 0.9827 | |
| 2025-11-24 10:39:32,934 - INFO - Edit Distance: 0.0596 | |
| 2025-11-24 10:39:32,934 - INFO - F-measure: 0.9968 | |
| 2025-11-24 10:39:32,935 - INFO - | |
| ====================================================================== | |
| 2025-11-24 10:39:32,935 - INFO - Qualitative Evaluation Samples: | |
| 2025-11-24 10:39:32,935 - INFO - ====================================================================== | |
| 2025-11-24 10:39:32,935 - INFO - | |
| Sample 1 (ID: sample_141920_chunk_1): | |
| 2025-11-24 10:39:32,935 - INFO - Context: [Image: sample_141920_chunk_1] + " | |
| Free OCR." | |
| 2025-11-24 10:39:32,935 - INFO - Generated: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-24 10:39:32,935 - INFO - Ground Truth: 'Q gave it four stars out of five and said that "Perhaps [the album\'s] seemingly illogical sequencing of songs makes sense if they wish to lure their audience into thinking it\'s as-you-were. But it\'s n...' | |
| 2025-11-24 10:39:32,935 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-24 10:39:32,935 - INFO - | |
| Sample 2 (ID: sample_170543_chunk_2): | |
| 2025-11-24 10:39:32,935 - INFO - Context: [Image: sample_170543_chunk_2] + " | |
| Free OCR." | |
| 2025-11-24 10:39:32,935 - INFO - Generated: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-24 10:39:32,935 - INFO - Ground Truth: ', was Sirene Abou-Chakra, a Lebanese-American student who led the Arab-American Student Association. Other members included the woman president of the Michigan Student Assembly; the leader of Army ROT...' | |
| 2025-11-24 10:39:32,936 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-24 10:39:32,936 - INFO - | |
| Sample 3 (ID: sample_107152_chunk_9): | |
| 2025-11-24 10:39:32,936 - INFO - Context: [Image: sample_107152_chunk_9] + " | |
| Free OCR." | |
| 2025-11-24 10:39:32,936 - INFO - Generated: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-24 10:39:32,936 - INFO - Ground Truth: ' at the meeting Laymia headed. His weapon of choice is a giant ax, and he has the power to immobilise his opponents if they look at him in the eye. Oga falls for the trick, but Beel stops the ax and b...' | |
| 2025-11-24 10:39:32,936 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-24 10:39:32,936 - INFO - | |
| Sample 4 (ID: sample_069148_chunk_0): | |
| 2025-11-24 10:39:32,936 - INFO - Context: [Image: sample_069148_chunk_0] + " | |
| Free OCR." | |
| 2025-11-24 10:39:32,936 - INFO - Generated: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-24 10:39:32,936 - INFO - Ground Truth: '# Oriya (Unicode block)\nOriya is a Unicode block containing characters for the Odia, Khondi and Santali languages of the state of Odisha in India. In its original incarnation, the code points U+0B01.....' | |
| 2025-11-24 10:39:32,936 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-24 10:39:32,937 - INFO - | |
| Sample 5 (ID: sample_103176_chunk_4): | |
| 2025-11-24 10:39:32,937 - INFO - Context: [Image: sample_103176_chunk_4] + " | |
| Free OCR." | |
| 2025-11-24 10:39:32,937 - INFO - Generated: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' | |
| 2025-11-24 10:39:32,937 - INFO - Ground Truth: ' |\n| The Sims 3: Generations | May 31, 2011 | Windows | Maxis Redwood Shores ...' | |
| 2025-11-24 10:39:32,937 - INFO - ---------------------------------------------------------------------- | |
| 2025-11-24 10:39:32,938 - INFO - | |
| Qualitative samples saved to: outputs/production_vision_base_reconstruction_20251120_220510/qualitative_step_10417.jsonl | |
| 2025-11-24 10:39:33,521 - INFO - | |
| Training complete! | |
| 2025-11-24 10:40:10,780 - INFO - Saved checkpoint to outputs/production_vision_base_reconstruction_20251120_220510/final_checkpoint.pt | |
| 2025-11-24 10:40:10,786 - INFO - Final checkpoint saved to outputs/production_vision_base_reconstruction_20251120_220510/final_checkpoint.pt | |
| 2025-11-24 10:40:10,787 - INFO - Best validation loss: 0.0318, perplexity: 1.03 | |
| 2025-11-24 10:40:10,787 - INFO - Checkpoints saved to outputs/production_vision_base_reconstruction_20251120_220510 | |
| 2025-11-24 10:40:11,576 - INFO - W&B run finished | |