| 2025-08-30 04:44:44 - pico-train - INFO - Step 40000 -- Evaluation Results |
| 2025-08-30 04:44:44 - pico-train - INFO - └── paloma: 7.314096757540847e+26 |
| 2025-08-30 04:44:44 - pico-train - INFO - ================================================== |
| 2025-08-30 04:44:44 - pico-train - INFO - Training Configuration |
| 2025-08-30 04:44:44 - pico-train - INFO - ================================================== |
| 2025-08-30 04:44:44 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮ |
| 2025-08-30 04:44:44 - pico-train - INFO - │ checkpointing:                                      │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   evaluation:                                       │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     eval_results_dir: eval_results                  │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   hf_checkpoint:                                    │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     collection_slug: null                           │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   learning_dynamics:                                │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     batch_size: 1                                   │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     eval_data: null                                 │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     layer_suffixes:                                 │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     - attention.v_proj                              │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     - attention.o_proj                              │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     - swiglu.w_2                                    │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     sequence_idx: -1                                │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   logs_dir: logs                                    │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma5M-v1            │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   runs_dir: runs                                    │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   save_every_n_steps: 500                           │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   save_to_hf: true                                  │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   training:                                         │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     auto_resume: true                               │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │ data:                                               │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   dataloader:                                       │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     batch_size: 4                                   │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   dataset:                                          │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-5M      │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   tokenizer:                                        │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     vocab_size: 50304                               │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │ evaluation:                                         │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   metrics:                                          │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   - paloma                                          │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   paloma:                                           │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     batch_size: 1                                   │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     dataset_split: val                              │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     max_length: 2048                                │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │ model:                                              │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   activation_hidden_dim: 384                        │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   attention_n_heads: 12                             │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   attention_n_kv_heads: 4                           │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   batch_size: 1024                                  │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   d_model: 96                                       │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   max_seq_len: 2048                                 │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   model_type: pico_decoder                          │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   n_layers: 12                                      │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   position_emb_theta: 10000.0                       │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   vocab_size: 50304                                 │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │ monitoring:                                         │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   logging:                                          │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     log_every_n_steps: 25                           │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     log_level: INFO                                 │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   save_to_wandb: false                              │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   wandb:                                            │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     entity: boymyc                                  │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     project: pico-decoder-tiny                      │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │ training:                                           │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   fabric:                                           │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     accelerator: cuda                               │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     num_devices: 1                                  │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     num_nodes: 1                                    │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     precision: bf16-mixed                           │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   max_steps: 20000                                  │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │   optimization:                                     │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     gradient_accumulation_steps: 4                  │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     lr: 5.0e-05                                     │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     lr_scheduler: cosine                            │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     lr_warmup_steps: 8000                           │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │     optimizer: adamw                                │ |
| 2025-08-30 04:44:44 - pico-train - INFO - │                                                     │ |
| 2025-08-30 04:44:44 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯ |
| 2025-08-30 04:44:44 - pico-train - INFO - ================================================== |
| 2025-08-30 04:44:44 - pico-train - INFO - Runtime Summary: |
| 2025-08-30 04:44:44 - pico-train - INFO - ================================================== |
| 2025-08-30 04:44:44 - pico-train - INFO - Starting from step: 40000 |
| 2025-08-30 04:44:44 - pico-train - INFO - Model Setup: |
| 2025-08-30 04:44:44 - pico-train - INFO - ├─ Total Parameters: 11,282,784 |
| 2025-08-30 04:44:44 - pico-train - INFO - └─ Trainable Parameters: 11,282,784 |
| 2025-08-30 04:44:44 - pico-train - INFO - Distributed Setup: |
| 2025-08-30 04:44:44 - pico-train - INFO - ├─ Number of Devices: 1 |
| 2025-08-30 04:44:44 - pico-train - INFO - ├─ Device Type: NVIDIA GeForce RTX 5090 |
| 2025-08-30 04:44:44 - pico-train - INFO - └─ Available Memory: 33.68 GB |
| 2025-08-30 04:44:44 - pico-train - INFO - Software Setup: |
| 2025-08-30 04:44:44 - pico-train - INFO - ├─ Python Version: 3.10.12 |
| 2025-08-30 04:44:44 - pico-train - INFO - ├─ PyTorch Version: 2.8.0+cu128 |
| 2025-08-30 04:44:44 - pico-train - INFO - ├─ CUDA Version: 12.8 |
| 2025-08-30 04:44:44 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic |
| 2025-08-30 04:44:44 - pico-train - INFO - Batch Size Configuration: |
| 2025-08-30 04:44:44 - pico-train - INFO - ├─ Global Batch Size: 4 |
| 2025-08-30 04:44:44 - pico-train - INFO - ├─ Per Device Batch Size: 1 |
| 2025-08-30 04:44:44 - pico-train - INFO - └─ Gradient Accumulation Steps: 4 |
| 2025-08-30 04:44:44 - pico-train - INFO - ================================================== |
| 2025-08-30 04:44:45 - pico-train - INFO - Step 40000 -- Training Metrics |
| 2025-08-30 04:44:45 - pico-train - INFO - ├── Loss: 6.3052 |
| 2025-08-30 04:44:45 - pico-train - INFO - ├── Learning Rate: 5.00e-06 |
| 2025-08-30 04:44:45 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:44:45 - pico-train - INFO - Step 40000 -- Saving Learning Dynamics |
| 2025-08-30 04:45:06 - pico-train - INFO - Step 40025 -- Training Metrics |
| 2025-08-30 04:45:06 - pico-train - INFO - ├── Loss: 6.1689 |
| 2025-08-30 04:45:06 - pico-train - INFO - ├── Learning Rate: 3.65e-05 |
| 2025-08-30 04:45:06 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:45:22 - pico-train - INFO - Step 40050 -- Training Metrics |
| 2025-08-30 04:45:22 - pico-train - INFO - ├── Loss: 6.1212 |
| 2025-08-30 04:45:22 - pico-train - INFO - ├── Learning Rate: 3.65e-05 |
| 2025-08-30 04:45:22 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:45:39 - pico-train - INFO - Step 40075 -- Training Metrics |
| 2025-08-30 04:45:39 - pico-train - INFO - ├── Loss: 6.0189 |
| 2025-08-30 04:45:39 - pico-train - INFO - ├── Learning Rate: 3.64e-05 |
| 2025-08-30 04:45:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:45:55 - pico-train - INFO - Step 40100 -- Training Metrics |
| 2025-08-30 04:45:55 - pico-train - INFO - ├── Loss: 6.1347 |
| 2025-08-30 04:45:55 - pico-train - INFO - ├── Learning Rate: 3.64e-05 |
| 2025-08-30 04:45:55 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:46:12 - pico-train - INFO - Step 40125 -- Training Metrics |
| 2025-08-30 04:46:12 - pico-train - INFO - ├── Loss: 6.1791 |
| 2025-08-30 04:46:12 - pico-train - INFO - ├── Learning Rate: 3.64e-05 |
| 2025-08-30 04:46:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:46:28 - pico-train - INFO - Step 40150 -- Training Metrics |
| 2025-08-30 04:46:28 - pico-train - INFO - ├── Loss: 6.1368 |
| 2025-08-30 04:46:28 - pico-train - INFO - ├── Learning Rate: 3.64e-05 |
| 2025-08-30 04:46:28 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:46:44 - pico-train - INFO - Step 40175 -- Training Metrics |
| 2025-08-30 04:46:44 - pico-train - INFO - ├── Loss: 6.1443 |
| 2025-08-30 04:46:44 - pico-train - INFO - ├── Learning Rate: 3.64e-05 |
| 2025-08-30 04:46:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:47:01 - pico-train - INFO - Step 40200 -- Training Metrics |
| 2025-08-30 04:47:01 - pico-train - INFO - ├── Loss: 6.1815 |
| 2025-08-30 04:47:01 - pico-train - INFO - ├── Learning Rate: 3.63e-05 |
| 2025-08-30 04:47:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:47:17 - pico-train - INFO - Step 40225 -- Training Metrics |
| 2025-08-30 04:47:17 - pico-train - INFO - ├── Loss: 6.1685 |
| 2025-08-30 04:47:17 - pico-train - INFO - ├── Learning Rate: 3.63e-05 |
| 2025-08-30 04:47:17 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:47:34 - pico-train - INFO - Step 40250 -- Training Metrics |
| 2025-08-30 04:47:34 - pico-train - INFO - ├── Loss: 6.0835 |
| 2025-08-30 04:47:34 - pico-train - INFO - ├── Learning Rate: 3.63e-05 |
| 2025-08-30 04:47:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:47:50 - pico-train - INFO - Step 40275 -- Training Metrics |
| 2025-08-30 04:47:50 - pico-train - INFO - ├── Loss: 6.0785 |
| 2025-08-30 04:47:50 - pico-train - INFO - ├── Learning Rate: 3.63e-05 |
| 2025-08-30 04:47:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:48:07 - pico-train - INFO - Step 40300 -- Training Metrics |
| 2025-08-30 04:48:07 - pico-train - INFO - ├── Loss: 6.0537 |
| 2025-08-30 04:48:07 - pico-train - INFO - ├── Learning Rate: 3.63e-05 |
| 2025-08-30 04:48:07 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:48:23 - pico-train - INFO - Step 40325 -- Training Metrics |
| 2025-08-30 04:48:23 - pico-train - INFO - ├── Loss: 6.0608 |
| 2025-08-30 04:48:23 - pico-train - INFO - ├── Learning Rate: 3.63e-05 |
| 2025-08-30 04:48:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:48:36 - pico-train - INFO - Step 40350 -- Training Metrics |
| 2025-08-30 04:48:36 - pico-train - INFO - ├── Loss: 6.1696 |
| 2025-08-30 04:48:36 - pico-train - INFO - ├── Learning Rate: 3.62e-05 |
| 2025-08-30 04:48:36 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:48:49 - pico-train - INFO - Step 40375 -- Training Metrics |
| 2025-08-30 04:48:49 - pico-train - INFO - ├── Loss: 6.1070 |
| 2025-08-30 04:48:49 - pico-train - INFO - ├── Learning Rate: 3.62e-05 |
| 2025-08-30 04:48:49 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:49:02 - pico-train - INFO - Step 40400 -- Training Metrics |
| 2025-08-30 04:49:02 - pico-train - INFO - ├── Loss: 6.0783 |
| 2025-08-30 04:49:02 - pico-train - INFO - ├── Learning Rate: 3.62e-05 |
| 2025-08-30 04:49:02 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:49:14 - pico-train - INFO - Step 40425 -- Training Metrics |
| 2025-08-30 04:49:14 - pico-train - INFO - ├── Loss: 6.2326 |
| 2025-08-30 04:49:14 - pico-train - INFO - ├── Learning Rate: 3.62e-05 |
| 2025-08-30 04:49:14 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:49:27 - pico-train - INFO - Step 40450 -- Training Metrics |
| 2025-08-30 04:49:27 - pico-train - INFO - ├── Loss: 6.0715 |
| 2025-08-30 04:49:27 - pico-train - INFO - ├── Learning Rate: 3.62e-05 |
| 2025-08-30 04:49:27 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:49:39 - pico-train - INFO - Step 40475 -- Training Metrics |
| 2025-08-30 04:49:39 - pico-train - INFO - ├── Loss: 6.1857 |
| 2025-08-30 04:49:39 - pico-train - INFO - ├── Learning Rate: 3.61e-05 |
| 2025-08-30 04:49:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:49:51 - pico-train - INFO - Step 40500 -- Saving Checkpoint |
| 2025-08-30 04:51:47 - pico-train - INFO - Step 40500 -- Evaluation Results |
| 2025-08-30 04:51:47 - pico-train - INFO - └── paloma: 1.2201991301470252e+27 |
| 2025-08-30 04:51:50 - pico-train - INFO - Step 40500 -- Training Metrics |
| 2025-08-30 04:51:50 - pico-train - INFO - ├── Loss: 6.1294 |
| 2025-08-30 04:51:50 - pico-train - INFO - ├── Learning Rate: 3.61e-05 |
| 2025-08-30 04:51:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:51:50 - pico-train - INFO - Step 40500 -- Saving Learning Dynamics |
| 2025-08-30 04:52:05 - pico-train - INFO - Step 40525 -- Training Metrics |
| 2025-08-30 04:52:05 - pico-train - INFO - ├── Loss: 6.1508 |
| 2025-08-30 04:52:05 - pico-train - INFO - ├── Learning Rate: 3.61e-05 |
| 2025-08-30 04:52:05 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:52:18 - pico-train - INFO - Step 40550 -- Training Metrics |
| 2025-08-30 04:52:18 - pico-train - INFO - ├── Loss: 6.1130 |
| 2025-08-30 04:52:18 - pico-train - INFO - ├── Learning Rate: 3.61e-05 |
| 2025-08-30 04:52:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:52:30 - pico-train - INFO - Step 40575 -- Training Metrics |
| 2025-08-30 04:52:30 - pico-train - INFO - ├── Loss: 6.1631 |
| 2025-08-30 04:52:30 - pico-train - INFO - ├── Learning Rate: 3.61e-05 |
| 2025-08-30 04:52:30 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:52:43 - pico-train - INFO - Step 40600 -- Training Metrics |
| 2025-08-30 04:52:43 - pico-train - INFO - ├── Loss: 6.2337 |
| 2025-08-30 04:52:43 - pico-train - INFO - ├── Learning Rate: 3.60e-05 |
| 2025-08-30 04:52:43 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:52:55 - pico-train - INFO - Step 40625 -- Training Metrics |
| 2025-08-30 04:52:55 - pico-train - INFO - ├── Loss: 6.0858 |
| 2025-08-30 04:52:55 - pico-train - INFO - ├── Learning Rate: 3.60e-05 |
| 2025-08-30 04:52:55 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:53:08 - pico-train - INFO - Step 40650 -- Training Metrics |
| 2025-08-30 04:53:08 - pico-train - INFO - ├── Loss: 6.1727 |
| 2025-08-30 04:53:08 - pico-train - INFO - ├── Learning Rate: 3.60e-05 |
| 2025-08-30 04:53:08 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:53:21 - pico-train - INFO - Step 40675 -- Training Metrics |
| 2025-08-30 04:53:21 - pico-train - INFO - ├── Loss: 6.1629 |
| 2025-08-30 04:53:21 - pico-train - INFO - ├── Learning Rate: 3.60e-05 |
| 2025-08-30 04:53:21 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:53:33 - pico-train - INFO - Step 40700 -- Training Metrics |
| 2025-08-30 04:53:33 - pico-train - INFO - ├── Loss: 6.1451 |
| 2025-08-30 04:53:33 - pico-train - INFO - ├── Learning Rate: 3.60e-05 |
| 2025-08-30 04:53:33 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:53:46 - pico-train - INFO - Step 40725 -- Training Metrics |
| 2025-08-30 04:53:46 - pico-train - INFO - ├── Loss: 6.1482 |
| 2025-08-30 04:53:46 - pico-train - INFO - ├── Learning Rate: 3.59e-05 |
| 2025-08-30 04:53:46 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:53:58 - pico-train - INFO - Step 40750 -- Training Metrics |
| 2025-08-30 04:53:58 - pico-train - INFO - ├── Loss: 6.0939 |
| 2025-08-30 04:53:58 - pico-train - INFO - ├── Learning Rate: 3.59e-05 |
| 2025-08-30 04:53:58 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:54:11 - pico-train - INFO - Step 40775 -- Training Metrics |
| 2025-08-30 04:54:11 - pico-train - INFO - ├── Loss: 6.1594 |
| 2025-08-30 04:54:11 - pico-train - INFO - ├── Learning Rate: 3.59e-05 |
| 2025-08-30 04:54:11 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:54:23 - pico-train - INFO - Step 40800 -- Training Metrics |
| 2025-08-30 04:54:23 - pico-train - INFO - ├── Loss: 6.1450 |
| 2025-08-30 04:54:23 - pico-train - INFO - ├── Learning Rate: 3.59e-05 |
| 2025-08-30 04:54:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:54:36 - pico-train - INFO - Step 40825 -- Training Metrics |
| 2025-08-30 04:54:36 - pico-train - INFO - ├── Loss: 6.0952 |
| 2025-08-30 04:54:36 - pico-train - INFO - ├── Learning Rate: 3.59e-05 |
| 2025-08-30 04:54:36 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:54:48 - pico-train - INFO - Step 40850 -- Training Metrics |
| 2025-08-30 04:54:48 - pico-train - INFO - ├── Loss: 6.1180 |
| 2025-08-30 04:54:48 - pico-train - INFO - ├── Learning Rate: 3.59e-05 |
| 2025-08-30 04:54:48 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:55:01 - pico-train - INFO - Step 40875 -- Training Metrics |
| 2025-08-30 04:55:01 - pico-train - INFO - ├── Loss: 6.0993 |
| 2025-08-30 04:55:01 - pico-train - INFO - ├── Learning Rate: 3.58e-05 |
| 2025-08-30 04:55:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:55:13 - pico-train - INFO - Step 40900 -- Training Metrics |
| 2025-08-30 04:55:13 - pico-train - INFO - ├── Loss: 6.0885 |
| 2025-08-30 04:55:13 - pico-train - INFO - ├── Learning Rate: 3.58e-05 |
| 2025-08-30 04:55:13 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:55:26 - pico-train - INFO - Step 40925 -- Training Metrics |
| 2025-08-30 04:55:26 - pico-train - INFO - ├── Loss: 6.0793 |
| 2025-08-30 04:55:26 - pico-train - INFO - ├── Learning Rate: 3.58e-05 |
| 2025-08-30 04:55:26 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:55:39 - pico-train - INFO - Step 40950 -- Training Metrics |
| 2025-08-30 04:55:39 - pico-train - INFO - ├── Loss: 6.1996 |
| 2025-08-30 04:55:39 - pico-train - INFO - ├── Learning Rate: 3.58e-05 |
| 2025-08-30 04:55:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:55:51 - pico-train - INFO - Step 40975 -- Training Metrics |
| 2025-08-30 04:55:51 - pico-train - INFO - ├── Loss: 6.1833 |
| 2025-08-30 04:55:51 - pico-train - INFO - ├── Learning Rate: 3.58e-05 |
| 2025-08-30 04:55:51 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:56:03 - pico-train - INFO - Step 41000 -- Saving Checkpoint |
| 2025-08-30 04:58:02 - pico-train - INFO - Step 41000 -- Evaluation Results |
| 2025-08-30 04:58:02 - pico-train - INFO - └── paloma: 1.2786105287360795e+27 |
| 2025-08-30 04:58:05 - pico-train - INFO - Step 41000 -- Training Metrics |
| 2025-08-30 04:58:05 - pico-train - INFO - ├── Loss: 6.0609 |
| 2025-08-30 04:58:05 - pico-train - INFO - ├── Learning Rate: 3.57e-05 |
| 2025-08-30 04:58:05 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:58:05 - pico-train - INFO - Step 41000 -- Saving Learning Dynamics |
| 2025-08-30 04:58:20 - pico-train - INFO - Step 41025 -- Training Metrics |
| 2025-08-30 04:58:20 - pico-train - INFO - ├── Loss: 6.0776 |
| 2025-08-30 04:58:20 - pico-train - INFO - ├── Learning Rate: 3.57e-05 |
| 2025-08-30 04:58:20 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:58:32 - pico-train - INFO - Step 41050 -- Training Metrics |
| 2025-08-30 04:58:32 - pico-train - INFO - ├── Loss: 6.0842 |
| 2025-08-30 04:58:32 - pico-train - INFO - ├── Learning Rate: 3.57e-05 |
| 2025-08-30 04:58:32 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:58:45 - pico-train - INFO - Step 41075 -- Training Metrics |
| 2025-08-30 04:58:45 - pico-train - INFO - ├── Loss: 6.0750 |
| 2025-08-30 04:58:45 - pico-train - INFO - ├── Learning Rate: 3.57e-05 |
| 2025-08-30 04:58:45 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:58:57 - pico-train - INFO - Step 41100 -- Training Metrics |
| 2025-08-30 04:58:57 - pico-train - INFO - ├── Loss: 6.1881 |
| 2025-08-30 04:58:57 - pico-train - INFO - ├── Learning Rate: 3.57e-05 |
| 2025-08-30 04:58:57 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:59:10 - pico-train - INFO - Step 41125 -- Training Metrics |
| 2025-08-30 04:59:10 - pico-train - INFO - ├── Loss: 6.1206 |
| 2025-08-30 04:59:10 - pico-train - INFO - ├── Learning Rate: 3.56e-05 |
| 2025-08-30 04:59:10 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:59:23 - pico-train - INFO - Step 41150 -- Training Metrics |
| 2025-08-30 04:59:23 - pico-train - INFO - ├── Loss: 6.0181 |
| 2025-08-30 04:59:23 - pico-train - INFO - ├── Learning Rate: 3.56e-05 |
| 2025-08-30 04:59:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:59:35 - pico-train - INFO - Step 41175 -- Training Metrics |
| 2025-08-30 04:59:35 - pico-train - INFO - ├── Loss: 6.2113 |
| 2025-08-30 04:59:35 - pico-train - INFO - ├── Learning Rate: 3.56e-05 |
| 2025-08-30 04:59:35 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 04:59:48 - pico-train - INFO - Step 41200 -- Training Metrics |
| 2025-08-30 04:59:48 - pico-train - INFO - ├── Loss: 6.1853 |
| 2025-08-30 04:59:48 - pico-train - INFO - ├── Learning Rate: 3.56e-05 |
| 2025-08-30 04:59:48 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:00:00 - pico-train - INFO - Step 41225 -- Training Metrics |
| 2025-08-30 05:00:00 - pico-train - INFO - ├── Loss: 6.0819 |
| 2025-08-30 05:00:00 - pico-train - INFO - ├── Learning Rate: 3.56e-05 |
| 2025-08-30 05:00:00 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:00:13 - pico-train - INFO - Step 41250 -- Training Metrics |
| 2025-08-30 05:00:13 - pico-train - INFO - ├── Loss: 6.0575 |
| 2025-08-30 05:00:13 - pico-train - INFO - ├── Learning Rate: 3.55e-05 |
| 2025-08-30 05:00:13 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:00:25 - pico-train - INFO - Step 41275 -- Training Metrics |
| 2025-08-30 05:00:25 - pico-train - INFO - ├── Loss: 6.0731 |
| 2025-08-30 05:00:25 - pico-train - INFO - ├── Learning Rate: 3.55e-05 |
| 2025-08-30 05:00:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:00:38 - pico-train - INFO - Step 41300 -- Training Metrics |
| 2025-08-30 05:00:38 - pico-train - INFO - ├── Loss: 6.0200 |
| 2025-08-30 05:00:38 - pico-train - INFO - ├── Learning Rate: 3.55e-05 |
| 2025-08-30 05:00:38 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:00:50 - pico-train - INFO - Step 41325 -- Training Metrics |
| 2025-08-30 05:00:50 - pico-train - INFO - ├── Loss: 6.0379 |
| 2025-08-30 05:00:50 - pico-train - INFO - ├── Learning Rate: 3.55e-05 |
| 2025-08-30 05:00:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:01:03 - pico-train - INFO - Step 41350 -- Training Metrics |
| 2025-08-30 05:01:03 - pico-train - INFO - ├── Loss: 6.0660 |
| 2025-08-30 05:01:03 - pico-train - INFO - ├── Learning Rate: 3.55e-05 |
| 2025-08-30 05:01:03 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:01:15 - pico-train - INFO - Step 41375 -- Training Metrics |
| 2025-08-30 05:01:15 - pico-train - INFO - ├── Loss: 6.1597 |
| 2025-08-30 05:01:15 - pico-train - INFO - ├── Learning Rate: 3.54e-05 |
| 2025-08-30 05:01:15 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:01:28 - pico-train - INFO - Step 41400 -- Training Metrics |
| 2025-08-30 05:01:28 - pico-train - INFO - ├── Loss: 6.0449 |
| 2025-08-30 05:01:28 - pico-train - INFO - ├── Learning Rate: 3.54e-05 |
| 2025-08-30 05:01:28 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:01:41 - pico-train - INFO - Step 41425 -- Training Metrics |
| 2025-08-30 05:01:41 - pico-train - INFO - ├── Loss: 6.1370 |
| 2025-08-30 05:01:41 - pico-train - INFO - ├── Learning Rate: 3.54e-05 |
| 2025-08-30 05:01:41 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:01:53 - pico-train - INFO - Step 41450 -- Training Metrics |
| 2025-08-30 05:01:53 - pico-train - INFO - ├── Loss: 6.1647 |
| 2025-08-30 05:01:53 - pico-train - INFO - ├── Learning Rate: 3.54e-05 |
| 2025-08-30 05:01:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:02:06 - pico-train - INFO - Step 41475 -- Training Metrics |
| 2025-08-30 05:02:06 - pico-train - INFO - ├── Loss: 6.0793 |
| 2025-08-30 05:02:06 - pico-train - INFO - ├── Learning Rate: 3.54e-05 |
| 2025-08-30 05:02:06 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:02:18 - pico-train - INFO - Step 41500 -- Saving Checkpoint |
| 2025-08-30 05:04:19 - pico-train - INFO - Step 41500 -- Evaluation Results |
| 2025-08-30 05:04:19 - pico-train - INFO - └── paloma: 2.062057669347938e+27 |
| 2025-08-30 05:04:23 - pico-train - INFO - Step 41500 -- Training Metrics |
| 2025-08-30 05:04:23 - pico-train - INFO - ├── Loss: 6.0860 |
| 2025-08-30 05:04:23 - pico-train - INFO - ├── Learning Rate: 3.54e-05 |
| 2025-08-30 05:04:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:04:23 - pico-train - INFO - Step 41500 -- Saving Learning Dynamics |
| 2025-08-30 05:04:39 - pico-train - INFO - Step 41525 -- Training Metrics |
| 2025-08-30 05:04:39 - pico-train - INFO - ├── Loss: 6.0604 |
| 2025-08-30 05:04:39 - pico-train - INFO - ├── Learning Rate: 3.53e-05 |
| 2025-08-30 05:04:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:04:51 - pico-train - INFO - Step 41550 -- Training Metrics |
| 2025-08-30 05:04:51 - pico-train - INFO - ├── Loss: 6.0622 |
| 2025-08-30 05:04:51 - pico-train - INFO - ├── Learning Rate: 3.53e-05 |
| 2025-08-30 05:04:51 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:05:04 - pico-train - INFO - Step 41575 -- Training Metrics |
| 2025-08-30 05:05:04 - pico-train - INFO - ├── Loss: 6.0831 |
| 2025-08-30 05:05:04 - pico-train - INFO - ├── Learning Rate: 3.53e-05 |
| 2025-08-30 05:05:04 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:05:16 - pico-train - INFO - Step 41600 -- Training Metrics |
| 2025-08-30 05:05:16 - pico-train - INFO - ├── Loss: 6.0853 |
| 2025-08-30 05:05:16 - pico-train - INFO - ├── Learning Rate: 3.53e-05 |
| 2025-08-30 05:05:16 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:05:29 - pico-train - INFO - Step 41625 -- Training Metrics |
| 2025-08-30 05:05:29 - pico-train - INFO - ├── Loss: 6.0860 |
| 2025-08-30 05:05:29 - pico-train - INFO - ├── Learning Rate: 3.53e-05 |
| 2025-08-30 05:05:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:05:41 - pico-train - INFO - Step 41650 -- Training Metrics |
| 2025-08-30 05:05:41 - pico-train - INFO - ├── Loss: 6.0905 |
| 2025-08-30 05:05:41 - pico-train - INFO - ├── Learning Rate: 3.52e-05 |
| 2025-08-30 05:05:41 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:05:54 - pico-train - INFO - Step 41675 -- Training Metrics |
| 2025-08-30 05:05:54 - pico-train - INFO - ├── Loss: 6.0475 |
| 2025-08-30 05:05:54 - pico-train - INFO - ├── Learning Rate: 3.52e-05 |
| 2025-08-30 05:05:54 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:06:07 - pico-train - INFO - Step 41700 -- Training Metrics |
| 2025-08-30 05:06:07 - pico-train - INFO - ├── Loss: 6.1168 |
| 2025-08-30 05:06:07 - pico-train - INFO - ├── Learning Rate: 3.52e-05 |
| 2025-08-30 05:06:07 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:06:19 - pico-train - INFO - Step 41725 -- Training Metrics |
| 2025-08-30 05:06:19 - pico-train - INFO - ├── Loss: 6.1310 |
| 2025-08-30 05:06:19 - pico-train - INFO - ├── Learning Rate: 3.52e-05 |
| 2025-08-30 05:06:19 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:06:32 - pico-train - INFO - Step 41750 -- Training Metrics |
| 2025-08-30 05:06:32 - pico-train - INFO - ├── Loss: 6.0966 |
| 2025-08-30 05:06:32 - pico-train - INFO - ├── Learning Rate: 3.52e-05 |
| 2025-08-30 05:06:32 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:06:44 - pico-train - INFO - Step 41775 -- Training Metrics |
| 2025-08-30 05:06:44 - pico-train - INFO - ├── Loss: 6.1002 |
| 2025-08-30 05:06:44 - pico-train - INFO - ├── Learning Rate: 3.51e-05 |
| 2025-08-30 05:06:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:06:57 - pico-train - INFO - Step 41800 -- Training Metrics |
| 2025-08-30 05:06:57 - pico-train - INFO - ├── Loss: 6.1383 |
| 2025-08-30 05:06:57 - pico-train - INFO - ├── Learning Rate: 3.51e-05 |
| 2025-08-30 05:06:57 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:07:09 - pico-train - INFO - Step 41825 -- Training Metrics |
| 2025-08-30 05:07:09 - pico-train - INFO - ├── Loss: 6.0973 |
| 2025-08-30 05:07:09 - pico-train - INFO - ├── Learning Rate: 3.51e-05 |
| 2025-08-30 05:07:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:07:22 - pico-train - INFO - Step 41850 -- Training Metrics |
| 2025-08-30 05:07:22 - pico-train - INFO - ├── Loss: 6.0864 |
| 2025-08-30 05:07:22 - pico-train - INFO - ├── Learning Rate: 3.51e-05 |
| 2025-08-30 05:07:22 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:07:34 - pico-train - INFO - Step 41875 -- Training Metrics |
| 2025-08-30 05:07:34 - pico-train - INFO - ├── Loss: 6.1542 |
| 2025-08-30 05:07:34 - pico-train - INFO - ├── Learning Rate: 3.51e-05 |
| 2025-08-30 05:07:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:07:47 - pico-train - INFO - Step 41900 -- Training Metrics |
| 2025-08-30 05:07:47 - pico-train - INFO - ├── Loss: 6.1191 |
| 2025-08-30 05:07:47 - pico-train - INFO - ├── Learning Rate: 3.50e-05 |
| 2025-08-30 05:07:47 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:07:59 - pico-train - INFO - Step 41925 -- Training Metrics |
| 2025-08-30 05:07:59 - pico-train - INFO - ├── Loss: 6.1827 |
| 2025-08-30 05:07:59 - pico-train - INFO - ├── Learning Rate: 3.50e-05 |
| 2025-08-30 05:07:59 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:08:12 - pico-train - INFO - Step 41950 -- Training Metrics |
| 2025-08-30 05:08:12 - pico-train - INFO - ├── Loss: 6.1001 |
| 2025-08-30 05:08:12 - pico-train - INFO - ├── Learning Rate: 3.50e-05 |
| 2025-08-30 05:08:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:08:24 - pico-train - INFO - Step 41975 -- Training Metrics |
| 2025-08-30 05:08:24 - pico-train - INFO - ├── Loss: 6.1700 |
| 2025-08-30 05:08:24 - pico-train - INFO - ├── Learning Rate: 3.50e-05 |
| 2025-08-30 05:08:24 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:08:36 - pico-train - INFO - Step 42000 -- Saving Checkpoint |
| 2025-08-30 05:10:36 - pico-train - INFO - Step 42000 -- Evaluation Results |
| 2025-08-30 05:10:36 - pico-train - INFO - └── paloma: 2.5987478678619155e+27 |
| 2025-08-30 05:10:39 - pico-train - INFO - Step 42000 -- Training Metrics |
| 2025-08-30 05:10:39 - pico-train - INFO - ├── Loss: 6.1167 |
| 2025-08-30 05:10:39 - pico-train - INFO - ├── Learning Rate: 3.50e-05 |
| 2025-08-30 05:10:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:10:39 - pico-train - INFO - Step 42000 -- Saving Learning Dynamics |
| 2025-08-30 05:10:57 - pico-train - INFO - Step 42025 -- Training Metrics |
| 2025-08-30 05:10:57 - pico-train - INFO - ├── Loss: 6.1833 |
| 2025-08-30 05:10:57 - pico-train - INFO - ├── Learning Rate: 3.49e-05 |
| 2025-08-30 05:10:57 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:11:09 - pico-train - INFO - Step 42050 -- Training Metrics |
| 2025-08-30 05:11:09 - pico-train - INFO - ├── Loss: 6.0939 |
| 2025-08-30 05:11:09 - pico-train - INFO - ├── Learning Rate: 3.49e-05 |
| 2025-08-30 05:11:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:11:22 - pico-train - INFO - Step 42075 -- Training Metrics |
| 2025-08-30 05:11:22 - pico-train - INFO - ├── Loss: 6.0309 |
| 2025-08-30 05:11:22 - pico-train - INFO - ├── Learning Rate: 3.49e-05 |
| 2025-08-30 05:11:22 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:11:34 - pico-train - INFO - Step 42100 -- Training Metrics |
| 2025-08-30 05:11:34 - pico-train - INFO - ├── Loss: 6.0340 |
| 2025-08-30 05:11:34 - pico-train - INFO - ├── Learning Rate: 3.49e-05 |
| 2025-08-30 05:11:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:11:47 - pico-train - INFO - Step 42125 -- Training Metrics |
| 2025-08-30 05:11:47 - pico-train - INFO - ├── Loss: 6.0556 |
| 2025-08-30 05:11:47 - pico-train - INFO - ├── Learning Rate: 3.49e-05 |
| 2025-08-30 05:11:47 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:11:59 - pico-train - INFO - Step 42150 -- Training Metrics |
| 2025-08-30 05:11:59 - pico-train - INFO - ├── Loss: 6.1500 |
| 2025-08-30 05:11:59 - pico-train - INFO - ├── Learning Rate: 3.48e-05 |
| 2025-08-30 05:11:59 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:12:12 - pico-train - INFO - Step 42175 -- Training Metrics |
| 2025-08-30 05:12:12 - pico-train - INFO - ├── Loss: 6.1793 |
| 2025-08-30 05:12:12 - pico-train - INFO - ├── Learning Rate: 3.48e-05 |
| 2025-08-30 05:12:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:12:25 - pico-train - INFO - Step 42200 -- Training Metrics |
| 2025-08-30 05:12:25 - pico-train - INFO - ├── Loss: 6.0804 |
| 2025-08-30 05:12:25 - pico-train - INFO - ├── Learning Rate: 3.48e-05 |
| 2025-08-30 05:12:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 05:12:37 - pico-train - INFO - Step 42225 -- Training Metrics |
| 2025-08-30 05:12:37 - pico-train - INFO - โโโ Loss: 6.1646 |
| 2025-08-30 05:12:37 - pico-train - INFO - โโโ Learning Rate: 3.48e-05 |
| 2025-08-30 05:12:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:12:50 - pico-train - INFO - Step 42250 -- ๐ Training Metrics |
| 2025-08-30 05:12:50 - pico-train - INFO - โโโ Loss: 6.1414 |
| 2025-08-30 05:12:50 - pico-train - INFO - โโโ Learning Rate: 3.48e-05 |
| 2025-08-30 05:12:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:13:02 - pico-train - INFO - Step 42275 -- ๐ Training Metrics |
| 2025-08-30 05:13:02 - pico-train - INFO - โโโ Loss: 6.0790 |
| 2025-08-30 05:13:02 - pico-train - INFO - โโโ Learning Rate: 3.47e-05 |
| 2025-08-30 05:13:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:13:15 - pico-train - INFO - Step 42300 -- ๐ Training Metrics |
| 2025-08-30 05:13:15 - pico-train - INFO - โโโ Loss: 6.0907 |
| 2025-08-30 05:13:15 - pico-train - INFO - โโโ Learning Rate: 3.47e-05 |
| 2025-08-30 05:13:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:13:27 - pico-train - INFO - Step 42325 -- ๐ Training Metrics |
| 2025-08-30 05:13:27 - pico-train - INFO - โโโ Loss: 6.1426 |
| 2025-08-30 05:13:27 - pico-train - INFO - โโโ Learning Rate: 3.47e-05 |
| 2025-08-30 05:13:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:13:40 - pico-train - INFO - Step 42350 -- ๐ Training Metrics |
| 2025-08-30 05:13:40 - pico-train - INFO - โโโ Loss: 6.1071 |
| 2025-08-30 05:13:40 - pico-train - INFO - โโโ Learning Rate: 3.47e-05 |
| 2025-08-30 05:13:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:13:52 - pico-train - INFO - Step 42375 -- ๐ Training Metrics |
| 2025-08-30 05:13:52 - pico-train - INFO - โโโ Loss: 6.0071 |
| 2025-08-30 05:13:52 - pico-train - INFO - โโโ Learning Rate: 3.47e-05 |
| 2025-08-30 05:13:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:14:05 - pico-train - INFO - Step 42400 -- ๐ Training Metrics |
| 2025-08-30 05:14:05 - pico-train - INFO - โโโ Loss: 6.1562 |
| 2025-08-30 05:14:05 - pico-train - INFO - โโโ Learning Rate: 3.46e-05 |
| 2025-08-30 05:14:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:14:18 - pico-train - INFO - Step 42425 -- ๐ Training Metrics |
| 2025-08-30 05:14:18 - pico-train - INFO - โโโ Loss: 6.1296 |
| 2025-08-30 05:14:18 - pico-train - INFO - โโโ Learning Rate: 3.46e-05 |
| 2025-08-30 05:14:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:14:30 - pico-train - INFO - Step 42450 -- ๐ Training Metrics |
| 2025-08-30 05:14:30 - pico-train - INFO - โโโ Loss: 6.1257 |
| 2025-08-30 05:14:30 - pico-train - INFO - โโโ Learning Rate: 3.46e-05 |
| 2025-08-30 05:14:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:14:43 - pico-train - INFO - Step 42475 -- ๐ Training Metrics |
| 2025-08-30 05:14:43 - pico-train - INFO - โโโ Loss: 6.1398 |
| 2025-08-30 05:14:43 - pico-train - INFO - โโโ Learning Rate: 3.46e-05 |
| 2025-08-30 05:14:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:14:55 - pico-train - INFO - Step 42500 -- ๐พ Saving Checkpoint |
| 2025-08-30 05:16:58 - pico-train - INFO - Step 42500 -- ๐ Evaluation Results |
| 2025-08-30 05:16:58 - pico-train - INFO - โโโ paloma: 3.0154563482458477e+27 |
| 2025-08-30 05:17:01 - pico-train - INFO - Step 42500 -- ๐ Training Metrics |
| 2025-08-30 05:17:01 - pico-train - INFO - โโโ Loss: 6.0496 |
| 2025-08-30 05:17:01 - pico-train - INFO - โโโ Learning Rate: 3.46e-05 |
| 2025-08-30 05:17:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:17:01 - pico-train - INFO - Step 42500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 05:17:16 - pico-train - INFO - Step 42525 -- ๐ Training Metrics |
| 2025-08-30 05:17:16 - pico-train - INFO - โโโ Loss: 6.0819 |
| 2025-08-30 05:17:16 - pico-train - INFO - โโโ Learning Rate: 3.45e-05 |
| 2025-08-30 05:17:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:17:29 - pico-train - INFO - Step 42550 -- ๐ Training Metrics |
| 2025-08-30 05:17:29 - pico-train - INFO - โโโ Loss: 6.0871 |
| 2025-08-30 05:17:29 - pico-train - INFO - โโโ Learning Rate: 3.45e-05 |
| 2025-08-30 05:17:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:17:42 - pico-train - INFO - Step 42575 -- ๐ Training Metrics |
| 2025-08-30 05:17:42 - pico-train - INFO - โโโ Loss: 6.0924 |
| 2025-08-30 05:17:42 - pico-train - INFO - โโโ Learning Rate: 3.45e-05 |
| 2025-08-30 05:17:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:17:54 - pico-train - INFO - Step 42600 -- ๐ Training Metrics |
| 2025-08-30 05:17:54 - pico-train - INFO - โโโ Loss: 6.0553 |
| 2025-08-30 05:17:54 - pico-train - INFO - โโโ Learning Rate: 3.45e-05 |
| 2025-08-30 05:17:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:18:07 - pico-train - INFO - Step 42625 -- ๐ Training Metrics |
| 2025-08-30 05:18:07 - pico-train - INFO - โโโ Loss: 6.1371 |
| 2025-08-30 05:18:07 - pico-train - INFO - โโโ Learning Rate: 3.45e-05 |
| 2025-08-30 05:18:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:18:19 - pico-train - INFO - Step 42650 -- ๐ Training Metrics |
| 2025-08-30 05:18:19 - pico-train - INFO - โโโ Loss: 6.0776 |
| 2025-08-30 05:18:19 - pico-train - INFO - โโโ Learning Rate: 3.44e-05 |
| 2025-08-30 05:18:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:18:32 - pico-train - INFO - Step 42675 -- ๐ Training Metrics |
| 2025-08-30 05:18:32 - pico-train - INFO - โโโ Loss: 6.1134 |
| 2025-08-30 05:18:32 - pico-train - INFO - โโโ Learning Rate: 3.44e-05 |
| 2025-08-30 05:18:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:18:45 - pico-train - INFO - Step 42700 -- ๐ Training Metrics |
| 2025-08-30 05:18:45 - pico-train - INFO - โโโ Loss: 5.9718 |
| 2025-08-30 05:18:45 - pico-train - INFO - โโโ Learning Rate: 3.44e-05 |
| 2025-08-30 05:18:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:18:57 - pico-train - INFO - Step 42725 -- ๐ Training Metrics |
| 2025-08-30 05:18:57 - pico-train - INFO - โโโ Loss: 6.0381 |
| 2025-08-30 05:18:57 - pico-train - INFO - โโโ Learning Rate: 3.44e-05 |
| 2025-08-30 05:18:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:19:10 - pico-train - INFO - Step 42750 -- ๐ Training Metrics |
| 2025-08-30 05:19:10 - pico-train - INFO - โโโ Loss: 6.1626 |
| 2025-08-30 05:19:10 - pico-train - INFO - โโโ Learning Rate: 3.44e-05 |
| 2025-08-30 05:19:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:19:22 - pico-train - INFO - Step 42775 -- ๐ Training Metrics |
| 2025-08-30 05:19:22 - pico-train - INFO - โโโ Loss: 6.0909 |
| 2025-08-30 05:19:22 - pico-train - INFO - โโโ Learning Rate: 3.43e-05 |
| 2025-08-30 05:19:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:19:35 - pico-train - INFO - Step 42800 -- ๐ Training Metrics |
| 2025-08-30 05:19:35 - pico-train - INFO - โโโ Loss: 6.1275 |
| 2025-08-30 05:19:35 - pico-train - INFO - โโโ Learning Rate: 3.43e-05 |
| 2025-08-30 05:19:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:19:47 - pico-train - INFO - Step 42825 -- ๐ Training Metrics |
| 2025-08-30 05:19:47 - pico-train - INFO - โโโ Loss: 6.0942 |
| 2025-08-30 05:19:47 - pico-train - INFO - โโโ Learning Rate: 3.43e-05 |
| 2025-08-30 05:19:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:20:00 - pico-train - INFO - Step 42850 -- ๐ Training Metrics |
| 2025-08-30 05:20:00 - pico-train - INFO - โโโ Loss: 6.0309 |
| 2025-08-30 05:20:00 - pico-train - INFO - โโโ Learning Rate: 3.43e-05 |
| 2025-08-30 05:20:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:20:12 - pico-train - INFO - Step 42875 -- ๐ Training Metrics |
| 2025-08-30 05:20:12 - pico-train - INFO - โโโ Loss: 6.1312 |
| 2025-08-30 05:20:12 - pico-train - INFO - โโโ Learning Rate: 3.43e-05 |
| 2025-08-30 05:20:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:20:25 - pico-train - INFO - Step 42900 -- ๐ Training Metrics |
| 2025-08-30 05:20:25 - pico-train - INFO - โโโ Loss: 6.1728 |
| 2025-08-30 05:20:25 - pico-train - INFO - โโโ Learning Rate: 3.43e-05 |
| 2025-08-30 05:20:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:20:38 - pico-train - INFO - Step 42925 -- ๐ Training Metrics |
| 2025-08-30 05:20:38 - pico-train - INFO - โโโ Loss: 5.9740 |
| 2025-08-30 05:20:38 - pico-train - INFO - โโโ Learning Rate: 3.42e-05 |
| 2025-08-30 05:20:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:20:50 - pico-train - INFO - Step 42950 -- ๐ Training Metrics |
| 2025-08-30 05:20:50 - pico-train - INFO - โโโ Loss: 6.0812 |
| 2025-08-30 05:20:50 - pico-train - INFO - โโโ Learning Rate: 3.42e-05 |
| 2025-08-30 05:20:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:21:03 - pico-train - INFO - Step 42975 -- ๐ Training Metrics |
| 2025-08-30 05:21:03 - pico-train - INFO - โโโ Loss: 6.0484 |
| 2025-08-30 05:21:03 - pico-train - INFO - โโโ Learning Rate: 3.42e-05 |
| 2025-08-30 05:21:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:21:15 - pico-train - INFO - Step 43000 -- ๐พ Saving Checkpoint |
| 2025-08-30 05:23:15 - pico-train - INFO - Step 43000 -- ๐ Evaluation Results |
| 2025-08-30 05:23:15 - pico-train - INFO - โโโ paloma: 4.4972099298583296e+27 |
| 2025-08-30 05:23:19 - pico-train - INFO - Step 43000 -- ๐ Training Metrics |
| 2025-08-30 05:23:19 - pico-train - INFO - โโโ Loss: 6.2475 |
| 2025-08-30 05:23:19 - pico-train - INFO - โโโ Learning Rate: 3.42e-05 |
| 2025-08-30 05:23:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:23:19 - pico-train - INFO - Step 43000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 05:23:36 - pico-train - INFO - Step 43025 -- ๐ Training Metrics |
| 2025-08-30 05:23:36 - pico-train - INFO - โโโ Loss: 6.0959 |
| 2025-08-30 05:23:36 - pico-train - INFO - โโโ Learning Rate: 3.42e-05 |
| 2025-08-30 05:23:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:23:48 - pico-train - INFO - Step 43050 -- ๐ Training Metrics |
| 2025-08-30 05:23:48 - pico-train - INFO - โโโ Loss: 6.0753 |
| 2025-08-30 05:23:48 - pico-train - INFO - โโโ Learning Rate: 3.41e-05 |
| 2025-08-30 05:23:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:24:01 - pico-train - INFO - Step 43075 -- ๐ Training Metrics |
| 2025-08-30 05:24:01 - pico-train - INFO - โโโ Loss: 6.1130 |
| 2025-08-30 05:24:01 - pico-train - INFO - โโโ Learning Rate: 3.41e-05 |
| 2025-08-30 05:24:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:24:13 - pico-train - INFO - Step 43100 -- ๐ Training Metrics |
| 2025-08-30 05:24:13 - pico-train - INFO - โโโ Loss: 6.0777 |
| 2025-08-30 05:24:13 - pico-train - INFO - โโโ Learning Rate: 3.41e-05 |
| 2025-08-30 05:24:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:24:26 - pico-train - INFO - Step 43125 -- ๐ Training Metrics |
| 2025-08-30 05:24:26 - pico-train - INFO - โโโ Loss: 6.1311 |
| 2025-08-30 05:24:26 - pico-train - INFO - โโโ Learning Rate: 3.41e-05 |
| 2025-08-30 05:24:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:24:39 - pico-train - INFO - Step 43150 -- ๐ Training Metrics |
| 2025-08-30 05:24:39 - pico-train - INFO - โโโ Loss: 6.0421 |
| 2025-08-30 05:24:39 - pico-train - INFO - โโโ Learning Rate: 3.41e-05 |
| 2025-08-30 05:24:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:24:52 - pico-train - INFO - Step 43175 -- ๐ Training Metrics |
| 2025-08-30 05:24:52 - pico-train - INFO - โโโ Loss: 6.0355 |
| 2025-08-30 05:24:52 - pico-train - INFO - โโโ Learning Rate: 3.40e-05 |
| 2025-08-30 05:24:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:25:04 - pico-train - INFO - Step 43200 -- ๐ Training Metrics |
| 2025-08-30 05:25:04 - pico-train - INFO - โโโ Loss: 6.0889 |
| 2025-08-30 05:25:04 - pico-train - INFO - โโโ Learning Rate: 3.40e-05 |
| 2025-08-30 05:25:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:25:17 - pico-train - INFO - Step 43225 -- ๐ Training Metrics |
| 2025-08-30 05:25:17 - pico-train - INFO - โโโ Loss: 6.0605 |
| 2025-08-30 05:25:17 - pico-train - INFO - โโโ Learning Rate: 3.40e-05 |
| 2025-08-30 05:25:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:25:30 - pico-train - INFO - Step 43250 -- ๐ Training Metrics |
| 2025-08-30 05:25:30 - pico-train - INFO - โโโ Loss: 6.1064 |
| 2025-08-30 05:25:30 - pico-train - INFO - โโโ Learning Rate: 3.40e-05 |
| 2025-08-30 05:25:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:25:42 - pico-train - INFO - Step 43275 -- ๐ Training Metrics |
| 2025-08-30 05:25:42 - pico-train - INFO - โโโ Loss: 6.1053 |
| 2025-08-30 05:25:42 - pico-train - INFO - โโโ Learning Rate: 3.40e-05 |
| 2025-08-30 05:25:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:25:55 - pico-train - INFO - Step 43300 -- ๐ Training Metrics |
| 2025-08-30 05:25:55 - pico-train - INFO - โโโ Loss: 6.1399 |
| 2025-08-30 05:25:55 - pico-train - INFO - โโโ Learning Rate: 3.39e-05 |
| 2025-08-30 05:25:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:26:07 - pico-train - INFO - Step 43325 -- ๐ Training Metrics |
| 2025-08-30 05:26:07 - pico-train - INFO - โโโ Loss: 6.1271 |
| 2025-08-30 05:26:07 - pico-train - INFO - โโโ Learning Rate: 3.39e-05 |
| 2025-08-30 05:26:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:26:20 - pico-train - INFO - Step 43350 -- ๐ Training Metrics |
| 2025-08-30 05:26:20 - pico-train - INFO - โโโ Loss: 6.0790 |
| 2025-08-30 05:26:20 - pico-train - INFO - โโโ Learning Rate: 3.39e-05 |
| 2025-08-30 05:26:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:26:33 - pico-train - INFO - Step 43375 -- ๐ Training Metrics |
| 2025-08-30 05:26:33 - pico-train - INFO - โโโ Loss: 6.0567 |
| 2025-08-30 05:26:33 - pico-train - INFO - โโโ Learning Rate: 3.39e-05 |
| 2025-08-30 05:26:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:26:45 - pico-train - INFO - Step 43400 -- ๐ Training Metrics |
| 2025-08-30 05:26:45 - pico-train - INFO - โโโ Loss: 6.0771 |
| 2025-08-30 05:26:45 - pico-train - INFO - โโโ Learning Rate: 3.39e-05 |
| 2025-08-30 05:26:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:26:58 - pico-train - INFO - Step 43425 -- ๐ Training Metrics |
| 2025-08-30 05:26:58 - pico-train - INFO - โโโ Loss: 6.1399 |
| 2025-08-30 05:26:58 - pico-train - INFO - โโโ Learning Rate: 3.38e-05 |
| 2025-08-30 05:26:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:27:10 - pico-train - INFO - Step 43450 -- ๐ Training Metrics |
| 2025-08-30 05:27:10 - pico-train - INFO - โโโ Loss: 6.1330 |
| 2025-08-30 05:27:10 - pico-train - INFO - โโโ Learning Rate: 3.38e-05 |
| 2025-08-30 05:27:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:27:23 - pico-train - INFO - Step 43475 -- ๐ Training Metrics |
| 2025-08-30 05:27:23 - pico-train - INFO - โโโ Loss: 6.0139 |
| 2025-08-30 05:27:23 - pico-train - INFO - โโโ Learning Rate: 3.38e-05 |
| 2025-08-30 05:27:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:27:35 - pico-train - INFO - Step 43500 -- ๐พ Saving Checkpoint |
| 2025-08-30 05:29:43 - pico-train - INFO - Step 43500 -- ๐ Evaluation Results |
| 2025-08-30 05:29:43 - pico-train - INFO - โโโ paloma: 5.326210528222522e+27 |
| 2025-08-30 05:29:45 - pico-train - INFO - Step 43500 -- ๐ Training Metrics |
| 2025-08-30 05:29:45 - pico-train - INFO - โโโ Loss: 6.1439 |
| 2025-08-30 05:29:45 - pico-train - INFO - โโโ Learning Rate: 3.38e-05 |
| 2025-08-30 05:29:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:29:45 - pico-train - INFO - Step 43500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 05:30:00 - pico-train - INFO - Step 43525 -- ๐ Training Metrics |
| 2025-08-30 05:30:00 - pico-train - INFO - โโโ Loss: 6.0445 |
| 2025-08-30 05:30:00 - pico-train - INFO - โโโ Learning Rate: 3.38e-05 |
| 2025-08-30 05:30:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:30:12 - pico-train - INFO - Step 43550 -- ๐ Training Metrics |
| 2025-08-30 05:30:12 - pico-train - INFO - โโโ Loss: 6.0780 |
| 2025-08-30 05:30:12 - pico-train - INFO - โโโ Learning Rate: 3.37e-05 |
| 2025-08-30 05:30:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:30:25 - pico-train - INFO - Step 43575 -- ๐ Training Metrics |
| 2025-08-30 05:30:25 - pico-train - INFO - โโโ Loss: 6.0044 |
| 2025-08-30 05:30:25 - pico-train - INFO - โโโ Learning Rate: 3.37e-05 |
| 2025-08-30 05:30:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:30:38 - pico-train - INFO - Step 43600 -- ๐ Training Metrics |
| 2025-08-30 05:30:38 - pico-train - INFO - โโโ Loss: 6.0087 |
| 2025-08-30 05:30:38 - pico-train - INFO - โโโ Learning Rate: 3.37e-05 |
| 2025-08-30 05:30:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:30:50 - pico-train - INFO - Step 43625 -- ๐ Training Metrics |
| 2025-08-30 05:30:50 - pico-train - INFO - โโโ Loss: 6.1263 |
| 2025-08-30 05:30:50 - pico-train - INFO - โโโ Learning Rate: 3.37e-05 |
| 2025-08-30 05:30:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:31:03 - pico-train - INFO - Step 43650 -- ๐ Training Metrics |
| 2025-08-30 05:31:03 - pico-train - INFO - โโโ Loss: 6.0459 |
| 2025-08-30 05:31:03 - pico-train - INFO - โโโ Learning Rate: 3.37e-05 |
| 2025-08-30 05:31:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:31:15 - pico-train - INFO - Step 43675 -- ๐ Training Metrics |
| 2025-08-30 05:31:15 - pico-train - INFO - โโโ Loss: 6.0390 |
| 2025-08-30 05:31:15 - pico-train - INFO - โโโ Learning Rate: 3.36e-05 |
| 2025-08-30 05:31:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:31:28 - pico-train - INFO - Step 43700 -- ๐ Training Metrics |
| 2025-08-30 05:31:28 - pico-train - INFO - โโโ Loss: 6.0918 |
| 2025-08-30 05:31:28 - pico-train - INFO - โโโ Learning Rate: 3.36e-05 |
| 2025-08-30 05:31:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:31:41 - pico-train - INFO - Step 43725 -- ๐ Training Metrics |
| 2025-08-30 05:31:41 - pico-train - INFO - โโโ Loss: 6.0426 |
| 2025-08-30 05:31:41 - pico-train - INFO - โโโ Learning Rate: 3.36e-05 |
| 2025-08-30 05:31:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:31:53 - pico-train - INFO - Step 43750 -- ๐ Training Metrics |
| 2025-08-30 05:31:53 - pico-train - INFO - โโโ Loss: 6.0634 |
| 2025-08-30 05:31:53 - pico-train - INFO - โโโ Learning Rate: 3.36e-05 |
| 2025-08-30 05:31:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:32:06 - pico-train - INFO - Step 43775 -- ๐ Training Metrics |
| 2025-08-30 05:32:06 - pico-train - INFO - โโโ Loss: 6.1042 |
| 2025-08-30 05:32:06 - pico-train - INFO - โโโ Learning Rate: 3.36e-05 |
| 2025-08-30 05:32:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:32:18 - pico-train - INFO - Step 43800 -- ๐ Training Metrics |
| 2025-08-30 05:32:18 - pico-train - INFO - โโโ Loss: 6.0510 |
| 2025-08-30 05:32:18 - pico-train - INFO - โโโ Learning Rate: 3.35e-05 |
| 2025-08-30 05:32:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:32:31 - pico-train - INFO - Step 43825 -- ๐ Training Metrics |
| 2025-08-30 05:32:31 - pico-train - INFO - โโโ Loss: 6.0403 |
| 2025-08-30 05:32:31 - pico-train - INFO - โโโ Learning Rate: 3.35e-05 |
| 2025-08-30 05:32:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:32:43 - pico-train - INFO - Step 43850 -- ๐ Training Metrics |
| 2025-08-30 05:32:43 - pico-train - INFO - โโโ Loss: 6.0537 |
| 2025-08-30 05:32:43 - pico-train - INFO - โโโ Learning Rate: 3.35e-05 |
| 2025-08-30 05:32:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:32:56 - pico-train - INFO - Step 43875 -- ๐ Training Metrics |
| 2025-08-30 05:32:56 - pico-train - INFO - โโโ Loss: 6.1244 |
| 2025-08-30 05:32:56 - pico-train - INFO - โโโ Learning Rate: 3.35e-05 |
| 2025-08-30 05:32:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:33:09 - pico-train - INFO - Step 43900 -- ๐ Training Metrics |
| 2025-08-30 05:33:09 - pico-train - INFO - โโโ Loss: 6.1294 |
| 2025-08-30 05:33:09 - pico-train - INFO - โโโ Learning Rate: 3.35e-05 |
| 2025-08-30 05:33:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:33:21 - pico-train - INFO - Step 43925 -- ๐ Training Metrics |
| 2025-08-30 05:33:21 - pico-train - INFO - โโโ Loss: 6.0845 |
| 2025-08-30 05:33:21 - pico-train - INFO - โโโ Learning Rate: 3.34e-05 |
| 2025-08-30 05:33:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:33:34 - pico-train - INFO - Step 43950 -- ๐ Training Metrics |
| 2025-08-30 05:33:34 - pico-train - INFO - โโโ Loss: 6.0365 |
| 2025-08-30 05:33:34 - pico-train - INFO - โโโ Learning Rate: 3.34e-05 |
| 2025-08-30 05:33:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:33:46 - pico-train - INFO - Step 43975 -- ๐ Training Metrics |
| 2025-08-30 05:33:46 - pico-train - INFO - โโโ Loss: 6.0507 |
| 2025-08-30 05:33:46 - pico-train - INFO - โโโ Learning Rate: 3.34e-05 |
| 2025-08-30 05:33:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:33:58 - pico-train - INFO - Step 44000 -- ๐พ Saving Checkpoint |
| 2025-08-30 05:35:55 - pico-train - INFO - Step 44000 -- ๐ Evaluation Results |
| 2025-08-30 05:35:55 - pico-train - INFO - โโโ paloma: 1.0515089806395209e+28 |
| 2025-08-30 05:35:57 - pico-train - INFO - Step 44000 -- ๐ Training Metrics |
| 2025-08-30 05:35:57 - pico-train - INFO - โโโ Loss: 5.9669 |
| 2025-08-30 05:35:57 - pico-train - INFO - โโโ Learning Rate: 3.34e-05 |
| 2025-08-30 05:35:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:35:57 - pico-train - INFO - Step 44000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 05:36:12 - pico-train - INFO - Step 44025 -- ๐ Training Metrics |
| 2025-08-30 05:36:12 - pico-train - INFO - โโโ Loss: 6.0454 |
| 2025-08-30 05:36:12 - pico-train - INFO - โโโ Learning Rate: 3.34e-05 |
| 2025-08-30 05:36:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:36:25 - pico-train - INFO - Step 44050 -- ๐ Training Metrics |
| 2025-08-30 05:36:25 - pico-train - INFO - โโโ Loss: 6.0395 |
| 2025-08-30 05:36:25 - pico-train - INFO - โโโ Learning Rate: 3.33e-05 |
| 2025-08-30 05:36:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:36:38 - pico-train - INFO - Step 44075 -- ๐ Training Metrics |
| 2025-08-30 05:36:38 - pico-train - INFO - โโโ Loss: 5.9733 |
| 2025-08-30 05:36:38 - pico-train - INFO - โโโ Learning Rate: 3.33e-05 |
| 2025-08-30 05:36:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:36:50 - pico-train - INFO - Step 44100 -- ๐ Training Metrics |
| 2025-08-30 05:36:50 - pico-train - INFO - โโโ Loss: 6.1172 |
| 2025-08-30 05:36:50 - pico-train - INFO - โโโ Learning Rate: 3.33e-05 |
| 2025-08-30 05:36:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:37:03 - pico-train - INFO - Step 44125 -- ๐ Training Metrics |
| 2025-08-30 05:37:03 - pico-train - INFO - โโโ Loss: 6.0527 |
| 2025-08-30 05:37:03 - pico-train - INFO - โโโ Learning Rate: 3.33e-05 |
| 2025-08-30 05:37:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:37:15 - pico-train - INFO - Step 44150 -- ๐ Training Metrics |
| 2025-08-30 05:37:15 - pico-train - INFO - โโโ Loss: 6.0853 |
| 2025-08-30 05:37:15 - pico-train - INFO - โโโ Learning Rate: 3.33e-05 |
| 2025-08-30 05:37:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:37:28 - pico-train - INFO - Step 44175 -- ๐ Training Metrics |
| 2025-08-30 05:37:28 - pico-train - INFO - โโโ Loss: 6.0303 |
| 2025-08-30 05:37:28 - pico-train - INFO - โโโ Learning Rate: 3.32e-05 |
| 2025-08-30 05:37:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:37:40 - pico-train - INFO - Step 44200 -- ๐ Training Metrics |
| 2025-08-30 05:37:40 - pico-train - INFO - โโโ Loss: 5.9986 |
| 2025-08-30 05:37:40 - pico-train - INFO - โโโ Learning Rate: 3.32e-05 |
| 2025-08-30 05:37:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:37:53 - pico-train - INFO - Step 44225 -- ๐ Training Metrics |
| 2025-08-30 05:37:53 - pico-train - INFO - โโโ Loss: 6.0450 |
| 2025-08-30 05:37:53 - pico-train - INFO - โโโ Learning Rate: 3.32e-05 |
| 2025-08-30 05:37:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:38:06 - pico-train - INFO - Step 44250 -- ๐ Training Metrics |
| 2025-08-30 05:38:06 - pico-train - INFO - โโโ Loss: 6.0449 |
| 2025-08-30 05:38:06 - pico-train - INFO - โโโ Learning Rate: 3.32e-05 |
| 2025-08-30 05:38:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:38:18 - pico-train - INFO - Step 44275 -- ๐ Training Metrics |
| 2025-08-30 05:38:18 - pico-train - INFO - โโโ Loss: 6.0811 |
| 2025-08-30 05:38:18 - pico-train - INFO - โโโ Learning Rate: 3.32e-05 |
| 2025-08-30 05:38:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:38:31 - pico-train - INFO - Step 44300 -- ๐ Training Metrics |
| 2025-08-30 05:38:31 - pico-train - INFO - โโโ Loss: 6.0524 |
| 2025-08-30 05:38:31 - pico-train - INFO - โโโ Learning Rate: 3.31e-05 |
| 2025-08-30 05:38:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:38:43 - pico-train - INFO - Step 44325 -- ๐ Training Metrics |
| 2025-08-30 05:38:43 - pico-train - INFO - โโโ Loss: 6.0148 |
| 2025-08-30 05:38:43 - pico-train - INFO - โโโ Learning Rate: 3.31e-05 |
| 2025-08-30 05:38:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:38:56 - pico-train - INFO - Step 44350 -- ๐ Training Metrics |
| 2025-08-30 05:38:56 - pico-train - INFO - โโโ Loss: 6.0216 |
| 2025-08-30 05:38:56 - pico-train - INFO - โโโ Learning Rate: 3.31e-05 |
| 2025-08-30 05:38:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:39:08 - pico-train - INFO - Step 44375 -- ๐ Training Metrics |
| 2025-08-30 05:39:08 - pico-train - INFO - โโโ Loss: 5.9966 |
| 2025-08-30 05:39:08 - pico-train - INFO - โโโ Learning Rate: 3.31e-05 |
| 2025-08-30 05:39:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:39:21 - pico-train - INFO - Step 44400 -- ๐ Training Metrics |
| 2025-08-30 05:39:21 - pico-train - INFO - โโโ Loss: 6.0301 |
| 2025-08-30 05:39:21 - pico-train - INFO - โโโ Learning Rate: 3.30e-05 |
| 2025-08-30 05:39:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:39:34 - pico-train - INFO - Step 44425 -- ๐ Training Metrics |
| 2025-08-30 05:39:34 - pico-train - INFO - โโโ Loss: 6.1473 |
| 2025-08-30 05:39:34 - pico-train - INFO - โโโ Learning Rate: 3.30e-05 |
| 2025-08-30 05:39:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:39:46 - pico-train - INFO - Step 44450 -- ๐ Training Metrics |
| 2025-08-30 05:39:46 - pico-train - INFO - โโโ Loss: 6.0092 |
| 2025-08-30 05:39:46 - pico-train - INFO - โโโ Learning Rate: 3.30e-05 |
| 2025-08-30 05:39:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:39:59 - pico-train - INFO - Step 44475 -- ๐ Training Metrics |
| 2025-08-30 05:39:59 - pico-train - INFO - โโโ Loss: 6.0807 |
| 2025-08-30 05:39:59 - pico-train - INFO - โโโ Learning Rate: 3.30e-05 |
| 2025-08-30 05:39:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:40:11 - pico-train - INFO - Step 44500 -- ๐พ Saving Checkpoint |
| 2025-08-30 05:42:28 - pico-train - INFO - Step 44500 -- ๐ Evaluation Results |
| 2025-08-30 05:42:28 - pico-train - INFO - โโโ paloma: 9.953158679071717e+27 |
| 2025-08-30 05:42:31 - pico-train - INFO - Step 44500 -- ๐ Training Metrics |
| 2025-08-30 05:42:31 - pico-train - INFO - โโโ Loss: 6.0974 |
| 2025-08-30 05:42:31 - pico-train - INFO - โโโ Learning Rate: 3.30e-05 |
| 2025-08-30 05:42:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:42:31 - pico-train - INFO - Step 44500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 05:42:46 - pico-train - INFO - Step 44525 -- ๐ Training Metrics |
| 2025-08-30 05:42:46 - pico-train - INFO - โโโ Loss: 6.0606 |
| 2025-08-30 05:42:46 - pico-train - INFO - โโโ Learning Rate: 3.29e-05 |
| 2025-08-30 05:42:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:42:59 - pico-train - INFO - Step 44550 -- ๐ Training Metrics |
| 2025-08-30 05:42:59 - pico-train - INFO - โโโ Loss: 6.0374 |
| 2025-08-30 05:42:59 - pico-train - INFO - โโโ Learning Rate: 3.29e-05 |
| 2025-08-30 05:42:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:43:11 - pico-train - INFO - Step 44575 -- ๐ Training Metrics |
| 2025-08-30 05:43:11 - pico-train - INFO - โโโ Loss: 5.9995 |
| 2025-08-30 05:43:11 - pico-train - INFO - โโโ Learning Rate: 3.29e-05 |
| 2025-08-30 05:43:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:43:24 - pico-train - INFO - Step 44600 -- ๐ Training Metrics |
| 2025-08-30 05:43:24 - pico-train - INFO - โโโ Loss: 6.0354 |
| 2025-08-30 05:43:24 - pico-train - INFO - โโโ Learning Rate: 3.29e-05 |
| 2025-08-30 05:43:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:43:36 - pico-train - INFO - Step 44625 -- ๐ Training Metrics |
| 2025-08-30 05:43:36 - pico-train - INFO - โโโ Loss: 6.0512 |
| 2025-08-30 05:43:36 - pico-train - INFO - โโโ Learning Rate: 3.29e-05 |
| 2025-08-30 05:43:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:43:49 - pico-train - INFO - Step 44650 -- ๐ Training Metrics |
| 2025-08-30 05:43:49 - pico-train - INFO - โโโ Loss: 5.9998 |
| 2025-08-30 05:43:49 - pico-train - INFO - โโโ Learning Rate: 3.28e-05 |
| 2025-08-30 05:43:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:44:01 - pico-train - INFO - Step 44675 -- ๐ Training Metrics |
| 2025-08-30 05:44:01 - pico-train - INFO - โโโ Loss: 6.0010 |
| 2025-08-30 05:44:01 - pico-train - INFO - โโโ Learning Rate: 3.28e-05 |
| 2025-08-30 05:44:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:44:14 - pico-train - INFO - Step 44700 -- ๐ Training Metrics |
| 2025-08-30 05:44:14 - pico-train - INFO - โโโ Loss: 6.0795 |
| 2025-08-30 05:44:14 - pico-train - INFO - โโโ Learning Rate: 3.28e-05 |
| 2025-08-30 05:44:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:44:26 - pico-train - INFO - Step 44725 -- ๐ Training Metrics |
| 2025-08-30 05:44:26 - pico-train - INFO - โโโ Loss: 6.0255 |
| 2025-08-30 05:44:26 - pico-train - INFO - โโโ Learning Rate: 3.28e-05 |
| 2025-08-30 05:44:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:44:39 - pico-train - INFO - Step 44750 -- ๐ Training Metrics |
| 2025-08-30 05:44:39 - pico-train - INFO - โโโ Loss: 6.0648 |
| 2025-08-30 05:44:39 - pico-train - INFO - โโโ Learning Rate: 3.28e-05 |
| 2025-08-30 05:44:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:44:52 - pico-train - INFO - Step 44775 -- ๐ Training Metrics |
| 2025-08-30 05:44:52 - pico-train - INFO - โโโ Loss: 6.0873 |
| 2025-08-30 05:44:52 - pico-train - INFO - โโโ Learning Rate: 3.27e-05 |
| 2025-08-30 05:44:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:45:04 - pico-train - INFO - Step 44800 -- ๐ Training Metrics |
| 2025-08-30 05:45:04 - pico-train - INFO - โโโ Loss: 6.0366 |
| 2025-08-30 05:45:04 - pico-train - INFO - โโโ Learning Rate: 3.27e-05 |
| 2025-08-30 05:45:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:45:17 - pico-train - INFO - Step 44825 -- ๐ Training Metrics |
| 2025-08-30 05:45:17 - pico-train - INFO - โโโ Loss: 6.0182 |
| 2025-08-30 05:45:17 - pico-train - INFO - โโโ Learning Rate: 3.27e-05 |
| 2025-08-30 05:45:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:45:29 - pico-train - INFO - Step 44850 -- ๐ Training Metrics |
| 2025-08-30 05:45:29 - pico-train - INFO - โโโ Loss: 6.0006 |
| 2025-08-30 05:45:29 - pico-train - INFO - โโโ Learning Rate: 3.27e-05 |
| 2025-08-30 05:45:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:45:42 - pico-train - INFO - Step 44875 -- ๐ Training Metrics |
| 2025-08-30 05:45:42 - pico-train - INFO - โโโ Loss: 6.0773 |
| 2025-08-30 05:45:42 - pico-train - INFO - โโโ Learning Rate: 3.27e-05 |
| 2025-08-30 05:45:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:45:54 - pico-train - INFO - Step 44900 -- ๐ Training Metrics |
| 2025-08-30 05:45:54 - pico-train - INFO - โโโ Loss: 6.0644 |
| 2025-08-30 05:45:54 - pico-train - INFO - โโโ Learning Rate: 3.26e-05 |
| 2025-08-30 05:45:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:46:07 - pico-train - INFO - Step 44925 -- ๐ Training Metrics |
| 2025-08-30 05:46:07 - pico-train - INFO - โโโ Loss: 6.0927 |
| 2025-08-30 05:46:07 - pico-train - INFO - โโโ Learning Rate: 3.26e-05 |
| 2025-08-30 05:46:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:46:20 - pico-train - INFO - Step 44950 -- ๐ Training Metrics |
| 2025-08-30 05:46:20 - pico-train - INFO - โโโ Loss: 6.0458 |
| 2025-08-30 05:46:20 - pico-train - INFO - โโโ Learning Rate: 3.26e-05 |
| 2025-08-30 05:46:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:46:32 - pico-train - INFO - Step 44975 -- ๐ Training Metrics |
| 2025-08-30 05:46:32 - pico-train - INFO - โโโ Loss: 6.0466 |
| 2025-08-30 05:46:32 - pico-train - INFO - โโโ Learning Rate: 3.26e-05 |
| 2025-08-30 05:46:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:46:44 - pico-train - INFO - Step 45000 -- 💾 Saving Checkpoint |
| 2025-08-30 05:48:48 - pico-train - INFO - Step 45000 -- ๐ Evaluation Results |
| 2025-08-30 05:48:48 - pico-train - INFO - โโโ paloma: 1.3981708485109732e+28 |
| 2025-08-30 05:48:50 - pico-train - INFO - Step 45000 -- ๐ Training Metrics |
| 2025-08-30 05:48:50 - pico-train - INFO - โโโ Loss: 6.0790 |
| 2025-08-30 05:48:50 - pico-train - INFO - โโโ Learning Rate: 3.26e-05 |
| 2025-08-30 05:48:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:48:50 - pico-train - INFO - Step 45000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 05:49:05 - pico-train - INFO - Step 45025 -- ๐ Training Metrics |
| 2025-08-30 05:49:05 - pico-train - INFO - โโโ Loss: 6.0231 |
| 2025-08-30 05:49:05 - pico-train - INFO - โโโ Learning Rate: 3.25e-05 |
| 2025-08-30 05:49:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:49:17 - pico-train - INFO - Step 45050 -- ๐ Training Metrics |
| 2025-08-30 05:49:17 - pico-train - INFO - โโโ Loss: 6.0257 |
| 2025-08-30 05:49:17 - pico-train - INFO - โโโ Learning Rate: 3.25e-05 |
| 2025-08-30 05:49:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:49:30 - pico-train - INFO - Step 45075 -- ๐ Training Metrics |
| 2025-08-30 05:49:30 - pico-train - INFO - โโโ Loss: 6.0401 |
| 2025-08-30 05:49:30 - pico-train - INFO - โโโ Learning Rate: 3.25e-05 |
| 2025-08-30 05:49:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:49:43 - pico-train - INFO - Step 45100 -- ๐ Training Metrics |
| 2025-08-30 05:49:43 - pico-train - INFO - โโโ Loss: 6.0050 |
| 2025-08-30 05:49:43 - pico-train - INFO - โโโ Learning Rate: 3.25e-05 |
| 2025-08-30 05:49:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:49:56 - pico-train - INFO - Step 45125 -- ๐ Training Metrics |
| 2025-08-30 05:49:56 - pico-train - INFO - โโโ Loss: 6.0666 |
| 2025-08-30 05:49:56 - pico-train - INFO - โโโ Learning Rate: 3.25e-05 |
| 2025-08-30 05:49:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:50:08 - pico-train - INFO - Step 45150 -- ๐ Training Metrics |
| 2025-08-30 05:50:08 - pico-train - INFO - โโโ Loss: 6.0214 |
| 2025-08-30 05:50:08 - pico-train - INFO - โโโ Learning Rate: 3.24e-05 |
| 2025-08-30 05:50:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:50:21 - pico-train - INFO - Step 45175 -- ๐ Training Metrics |
| 2025-08-30 05:50:21 - pico-train - INFO - โโโ Loss: 6.1788 |
| 2025-08-30 05:50:21 - pico-train - INFO - โโโ Learning Rate: 3.24e-05 |
| 2025-08-30 05:50:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:50:33 - pico-train - INFO - Step 45200 -- ๐ Training Metrics |
| 2025-08-30 05:50:33 - pico-train - INFO - โโโ Loss: 6.0156 |
| 2025-08-30 05:50:33 - pico-train - INFO - โโโ Learning Rate: 3.24e-05 |
| 2025-08-30 05:50:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:50:46 - pico-train - INFO - Step 45225 -- ๐ Training Metrics |
| 2025-08-30 05:50:46 - pico-train - INFO - โโโ Loss: 6.0201 |
| 2025-08-30 05:50:46 - pico-train - INFO - โโโ Learning Rate: 3.24e-05 |
| 2025-08-30 05:50:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:50:59 - pico-train - INFO - Step 45250 -- ๐ Training Metrics |
| 2025-08-30 05:50:59 - pico-train - INFO - โโโ Loss: 6.0011 |
| 2025-08-30 05:50:59 - pico-train - INFO - โโโ Learning Rate: 3.24e-05 |
| 2025-08-30 05:50:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:51:11 - pico-train - INFO - Step 45275 -- ๐ Training Metrics |
| 2025-08-30 05:51:11 - pico-train - INFO - โโโ Loss: 6.1612 |
| 2025-08-30 05:51:11 - pico-train - INFO - โโโ Learning Rate: 3.23e-05 |
| 2025-08-30 05:51:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:51:24 - pico-train - INFO - Step 45300 -- ๐ Training Metrics |
| 2025-08-30 05:51:24 - pico-train - INFO - โโโ Loss: 6.0480 |
| 2025-08-30 05:51:24 - pico-train - INFO - โโโ Learning Rate: 3.23e-05 |
| 2025-08-30 05:51:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:51:36 - pico-train - INFO - Step 45325 -- ๐ Training Metrics |
| 2025-08-30 05:51:36 - pico-train - INFO - โโโ Loss: 5.9685 |
| 2025-08-30 05:51:36 - pico-train - INFO - โโโ Learning Rate: 3.23e-05 |
| 2025-08-30 05:51:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:51:49 - pico-train - INFO - Step 45350 -- ๐ Training Metrics |
| 2025-08-30 05:51:49 - pico-train - INFO - โโโ Loss: 6.0803 |
| 2025-08-30 05:51:49 - pico-train - INFO - โโโ Learning Rate: 3.23e-05 |
| 2025-08-30 05:51:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:52:02 - pico-train - INFO - Step 45375 -- ๐ Training Metrics |
| 2025-08-30 05:52:02 - pico-train - INFO - โโโ Loss: 6.0258 |
| 2025-08-30 05:52:02 - pico-train - INFO - โโโ Learning Rate: 3.23e-05 |
| 2025-08-30 05:52:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:52:14 - pico-train - INFO - Step 45400 -- ๐ Training Metrics |
| 2025-08-30 05:52:14 - pico-train - INFO - โโโ Loss: 6.0367 |
| 2025-08-30 05:52:14 - pico-train - INFO - โโโ Learning Rate: 3.22e-05 |
| 2025-08-30 05:52:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:52:27 - pico-train - INFO - Step 45425 -- ๐ Training Metrics |
| 2025-08-30 05:52:27 - pico-train - INFO - โโโ Loss: 5.9915 |
| 2025-08-30 05:52:27 - pico-train - INFO - โโโ Learning Rate: 3.22e-05 |
| 2025-08-30 05:52:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:52:39 - pico-train - INFO - Step 45450 -- ๐ Training Metrics |
| 2025-08-30 05:52:39 - pico-train - INFO - โโโ Loss: 5.9926 |
| 2025-08-30 05:52:39 - pico-train - INFO - โโโ Learning Rate: 3.22e-05 |
| 2025-08-30 05:52:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:52:52 - pico-train - INFO - Step 45475 -- ๐ Training Metrics |
| 2025-08-30 05:52:52 - pico-train - INFO - โโโ Loss: 5.9767 |
| 2025-08-30 05:52:52 - pico-train - INFO - โโโ Learning Rate: 3.22e-05 |
| 2025-08-30 05:52:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:53:04 - pico-train - INFO - Step 45500 -- 💾 Saving Checkpoint |
| 2025-08-30 05:55:00 - pico-train - INFO - Step 45500 -- ๐ Evaluation Results |
| 2025-08-30 05:55:00 - pico-train - INFO - โโโ paloma: 2.1286507820171466e+28 |
| 2025-08-30 05:55:02 - pico-train - INFO - Step 45500 -- ๐ Training Metrics |
| 2025-08-30 05:55:02 - pico-train - INFO - โโโ Loss: 6.0752 |
| 2025-08-30 05:55:02 - pico-train - INFO - โโโ Learning Rate: 3.22e-05 |
| 2025-08-30 05:55:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:55:02 - pico-train - INFO - Step 45500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 05:55:17 - pico-train - INFO - Step 45525 -- ๐ Training Metrics |
| 2025-08-30 05:55:17 - pico-train - INFO - โโโ Loss: 6.0444 |
| 2025-08-30 05:55:17 - pico-train - INFO - โโโ Learning Rate: 3.21e-05 |
| 2025-08-30 05:55:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:55:29 - pico-train - INFO - Step 45550 -- ๐ Training Metrics |
| 2025-08-30 05:55:29 - pico-train - INFO - โโโ Loss: 6.0119 |
| 2025-08-30 05:55:29 - pico-train - INFO - โโโ Learning Rate: 3.21e-05 |
| 2025-08-30 05:55:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:55:42 - pico-train - INFO - Step 45575 -- ๐ Training Metrics |
| 2025-08-30 05:55:42 - pico-train - INFO - โโโ Loss: 6.0627 |
| 2025-08-30 05:55:42 - pico-train - INFO - โโโ Learning Rate: 3.21e-05 |
| 2025-08-30 05:55:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:55:55 - pico-train - INFO - Step 45600 -- ๐ Training Metrics |
| 2025-08-30 05:55:55 - pico-train - INFO - โโโ Loss: 5.9389 |
| 2025-08-30 05:55:55 - pico-train - INFO - โโโ Learning Rate: 3.21e-05 |
| 2025-08-30 05:55:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:56:07 - pico-train - INFO - Step 45625 -- ๐ Training Metrics |
| 2025-08-30 05:56:07 - pico-train - INFO - โโโ Loss: 6.1041 |
| 2025-08-30 05:56:07 - pico-train - INFO - โโโ Learning Rate: 3.21e-05 |
| 2025-08-30 05:56:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:56:20 - pico-train - INFO - Step 45650 -- ๐ Training Metrics |
| 2025-08-30 05:56:20 - pico-train - INFO - โโโ Loss: 6.0837 |
| 2025-08-30 05:56:20 - pico-train - INFO - โโโ Learning Rate: 3.20e-05 |
| 2025-08-30 05:56:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:56:33 - pico-train - INFO - Step 45675 -- ๐ Training Metrics |
| 2025-08-30 05:56:33 - pico-train - INFO - โโโ Loss: 6.0495 |
| 2025-08-30 05:56:33 - pico-train - INFO - โโโ Learning Rate: 3.20e-05 |
| 2025-08-30 05:56:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:56:45 - pico-train - INFO - Step 45700 -- ๐ Training Metrics |
| 2025-08-30 05:56:45 - pico-train - INFO - โโโ Loss: 6.0507 |
| 2025-08-30 05:56:45 - pico-train - INFO - โโโ Learning Rate: 3.20e-05 |
| 2025-08-30 05:56:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:56:58 - pico-train - INFO - Step 45725 -- ๐ Training Metrics |
| 2025-08-30 05:56:58 - pico-train - INFO - โโโ Loss: 6.0594 |
| 2025-08-30 05:56:58 - pico-train - INFO - โโโ Learning Rate: 3.20e-05 |
| 2025-08-30 05:56:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:57:11 - pico-train - INFO - Step 45750 -- ๐ Training Metrics |
| 2025-08-30 05:57:11 - pico-train - INFO - โโโ Loss: 6.0685 |
| 2025-08-30 05:57:11 - pico-train - INFO - โโโ Learning Rate: 3.20e-05 |
| 2025-08-30 05:57:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:57:23 - pico-train - INFO - Step 45775 -- ๐ Training Metrics |
| 2025-08-30 05:57:23 - pico-train - INFO - โโโ Loss: 6.0040 |
| 2025-08-30 05:57:23 - pico-train - INFO - โโโ Learning Rate: 3.19e-05 |
| 2025-08-30 05:57:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:57:36 - pico-train - INFO - Step 45800 -- ๐ Training Metrics |
| 2025-08-30 05:57:36 - pico-train - INFO - โโโ Loss: 6.0630 |
| 2025-08-30 05:57:36 - pico-train - INFO - โโโ Learning Rate: 3.19e-05 |
| 2025-08-30 05:57:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:57:48 - pico-train - INFO - Step 45825 -- ๐ Training Metrics |
| 2025-08-30 05:57:48 - pico-train - INFO - โโโ Loss: 6.0334 |
| 2025-08-30 05:57:48 - pico-train - INFO - โโโ Learning Rate: 3.19e-05 |
| 2025-08-30 05:57:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:58:01 - pico-train - INFO - Step 45850 -- ๐ Training Metrics |
| 2025-08-30 05:58:01 - pico-train - INFO - โโโ Loss: 6.0141 |
| 2025-08-30 05:58:01 - pico-train - INFO - โโโ Learning Rate: 3.19e-05 |
| 2025-08-30 05:58:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:58:14 - pico-train - INFO - Step 45875 -- ๐ Training Metrics |
| 2025-08-30 05:58:14 - pico-train - INFO - โโโ Loss: 6.0175 |
| 2025-08-30 05:58:14 - pico-train - INFO - โโโ Learning Rate: 3.18e-05 |
| 2025-08-30 05:58:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:58:27 - pico-train - INFO - Step 45900 -- ๐ Training Metrics |
| 2025-08-30 05:58:27 - pico-train - INFO - โโโ Loss: 6.0745 |
| 2025-08-30 05:58:27 - pico-train - INFO - โโโ Learning Rate: 3.18e-05 |
| 2025-08-30 05:58:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:58:39 - pico-train - INFO - Step 45925 -- ๐ Training Metrics |
| 2025-08-30 05:58:39 - pico-train - INFO - โโโ Loss: 6.0172 |
| 2025-08-30 05:58:39 - pico-train - INFO - โโโ Learning Rate: 3.18e-05 |
| 2025-08-30 05:58:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:58:52 - pico-train - INFO - Step 45950 -- ๐ Training Metrics |
| 2025-08-30 05:58:52 - pico-train - INFO - โโโ Loss: 5.9627 |
| 2025-08-30 05:58:52 - pico-train - INFO - โโโ Learning Rate: 3.18e-05 |
| 2025-08-30 05:58:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:59:04 - pico-train - INFO - Step 45975 -- ๐ Training Metrics |
| 2025-08-30 05:59:04 - pico-train - INFO - โโโ Loss: 5.9906 |
| 2025-08-30 05:59:04 - pico-train - INFO - โโโ Learning Rate: 3.18e-05 |
| 2025-08-30 05:59:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 05:59:17 - pico-train - INFO - Step 46000 -- 💾 Saving Checkpoint |
| 2025-08-30 06:01:12 - pico-train - INFO - Step 46000 -- ๐ Evaluation Results |
| 2025-08-30 06:01:12 - pico-train - INFO - โโโ paloma: 2.287805203128674e+28 |
| 2025-08-30 06:01:14 - pico-train - INFO - Step 46000 -- ๐ Training Metrics |
| 2025-08-30 06:01:14 - pico-train - INFO - โโโ Loss: 6.0973 |
| 2025-08-30 06:01:14 - pico-train - INFO - โโโ Learning Rate: 3.17e-05 |
| 2025-08-30 06:01:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:01:14 - pico-train - INFO - Step 46000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 06:01:29 - pico-train - INFO - Step 46025 -- ๐ Training Metrics |
| 2025-08-30 06:01:29 - pico-train - INFO - โโโ Loss: 5.9999 |
| 2025-08-30 06:01:29 - pico-train - INFO - โโโ Learning Rate: 3.17e-05 |
| 2025-08-30 06:01:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:01:41 - pico-train - INFO - Step 46050 -- ๐ Training Metrics |
| 2025-08-30 06:01:41 - pico-train - INFO - โโโ Loss: 5.9786 |
| 2025-08-30 06:01:41 - pico-train - INFO - โโโ Learning Rate: 3.17e-05 |
| 2025-08-30 06:01:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:01:54 - pico-train - INFO - Step 46075 -- ๐ Training Metrics |
| 2025-08-30 06:01:54 - pico-train - INFO - โโโ Loss: 6.0511 |
| 2025-08-30 06:01:54 - pico-train - INFO - โโโ Learning Rate: 3.17e-05 |
| 2025-08-30 06:01:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:02:06 - pico-train - INFO - Step 46100 -- ๐ Training Metrics |
| 2025-08-30 06:02:06 - pico-train - INFO - โโโ Loss: 5.9915 |
| 2025-08-30 06:02:06 - pico-train - INFO - โโโ Learning Rate: 3.17e-05 |
| 2025-08-30 06:02:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:02:19 - pico-train - INFO - Step 46125 -- ๐ Training Metrics |
| 2025-08-30 06:02:19 - pico-train - INFO - โโโ Loss: 6.0164 |
| 2025-08-30 06:02:19 - pico-train - INFO - โโโ Learning Rate: 3.16e-05 |
| 2025-08-30 06:02:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:02:32 - pico-train - INFO - Step 46150 -- ๐ Training Metrics |
| 2025-08-30 06:02:32 - pico-train - INFO - โโโ Loss: 6.0278 |
| 2025-08-30 06:02:32 - pico-train - INFO - โโโ Learning Rate: 3.16e-05 |
| 2025-08-30 06:02:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:02:45 - pico-train - INFO - Step 46175 -- ๐ Training Metrics |
| 2025-08-30 06:02:45 - pico-train - INFO - โโโ Loss: 5.9636 |
| 2025-08-30 06:02:45 - pico-train - INFO - โโโ Learning Rate: 3.16e-05 |
| 2025-08-30 06:02:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:02:57 - pico-train - INFO - Step 46200 -- ๐ Training Metrics |
| 2025-08-30 06:02:57 - pico-train - INFO - โโโ Loss: 5.9233 |
| 2025-08-30 06:02:57 - pico-train - INFO - โโโ Learning Rate: 3.16e-05 |
| 2025-08-30 06:02:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:03:10 - pico-train - INFO - Step 46225 -- ๐ Training Metrics |
| 2025-08-30 06:03:10 - pico-train - INFO - โโโ Loss: 6.1381 |
| 2025-08-30 06:03:10 - pico-train - INFO - โโโ Learning Rate: 3.16e-05 |
| 2025-08-30 06:03:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:03:23 - pico-train - INFO - Step 46250 -- ๐ Training Metrics |
| 2025-08-30 06:03:23 - pico-train - INFO - โโโ Loss: 5.9423 |
| 2025-08-30 06:03:23 - pico-train - INFO - โโโ Learning Rate: 3.15e-05 |
| 2025-08-30 06:03:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:03:36 - pico-train - INFO - Step 46275 -- ๐ Training Metrics |
| 2025-08-30 06:03:36 - pico-train - INFO - โโโ Loss: 5.9885 |
| 2025-08-30 06:03:36 - pico-train - INFO - โโโ Learning Rate: 3.15e-05 |
| 2025-08-30 06:03:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:03:48 - pico-train - INFO - Step 46300 -- ๐ Training Metrics |
| 2025-08-30 06:03:48 - pico-train - INFO - โโโ Loss: 6.0572 |
| 2025-08-30 06:03:48 - pico-train - INFO - โโโ Learning Rate: 3.15e-05 |
| 2025-08-30 06:03:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:04:01 - pico-train - INFO - Step 46325 -- ๐ Training Metrics |
| 2025-08-30 06:04:01 - pico-train - INFO - โโโ Loss: 6.0765 |
| 2025-08-30 06:04:01 - pico-train - INFO - โโโ Learning Rate: 3.15e-05 |
| 2025-08-30 06:04:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:04:13 - pico-train - INFO - Step 46350 -- ๐ Training Metrics |
| 2025-08-30 06:04:13 - pico-train - INFO - โโโ Loss: 6.0594 |
| 2025-08-30 06:04:13 - pico-train - INFO - โโโ Learning Rate: 3.15e-05 |
| 2025-08-30 06:04:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:04:26 - pico-train - INFO - Step 46375 -- ๐ Training Metrics |
| 2025-08-30 06:04:26 - pico-train - INFO - โโโ Loss: 6.0579 |
| 2025-08-30 06:04:26 - pico-train - INFO - โโโ Learning Rate: 3.14e-05 |
| 2025-08-30 06:04:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:04:39 - pico-train - INFO - Step 46400 -- ๐ Training Metrics |
| 2025-08-30 06:04:39 - pico-train - INFO - โโโ Loss: 5.9964 |
| 2025-08-30 06:04:39 - pico-train - INFO - โโโ Learning Rate: 3.14e-05 |
| 2025-08-30 06:04:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:04:51 - pico-train - INFO - Step 46425 -- ๐ Training Metrics |
| 2025-08-30 06:04:51 - pico-train - INFO - โโโ Loss: 6.0002 |
| 2025-08-30 06:04:51 - pico-train - INFO - โโโ Learning Rate: 3.14e-05 |
| 2025-08-30 06:04:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:05:04 - pico-train - INFO - Step 46450 -- ๐ Training Metrics |
| 2025-08-30 06:05:04 - pico-train - INFO - โโโ Loss: 6.0970 |
| 2025-08-30 06:05:04 - pico-train - INFO - โโโ Learning Rate: 3.14e-05 |
| 2025-08-30 06:05:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:05:17 - pico-train - INFO - Step 46475 -- ๐ Training Metrics |
| 2025-08-30 06:05:17 - pico-train - INFO - โโโ Loss: 5.9791 |
| 2025-08-30 06:05:17 - pico-train - INFO - โโโ Learning Rate: 3.14e-05 |
| 2025-08-30 06:05:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:05:29 - pico-train - INFO - Step 46500 -- 💾 Saving Checkpoint |
| 2025-08-30 06:07:22 - pico-train - INFO - Step 46500 -- ๐ Evaluation Results |
| 2025-08-30 06:07:22 - pico-train - INFO - โโโ paloma: 2.5264771857772e+28 |
| 2025-08-30 06:07:35 - pico-train - INFO - Step 46500 -- ๐ Training Metrics |
| 2025-08-30 06:07:35 - pico-train - INFO - โโโ Loss: 5.9970 |
| 2025-08-30 06:07:35 - pico-train - INFO - โโโ Learning Rate: 3.13e-05 |
| 2025-08-30 06:07:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:07:35 - pico-train - INFO - Step 46500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 06:07:50 - pico-train - INFO - Step 46525 -- ๐ Training Metrics |
| 2025-08-30 06:07:50 - pico-train - INFO - โโโ Loss: 5.9723 |
| 2025-08-30 06:07:50 - pico-train - INFO - โโโ Learning Rate: 3.13e-05 |
| 2025-08-30 06:07:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:08:02 - pico-train - INFO - Step 46550 -- ๐ Training Metrics |
| 2025-08-30 06:08:02 - pico-train - INFO - โโโ Loss: 5.9671 |
| 2025-08-30 06:08:02 - pico-train - INFO - โโโ Learning Rate: 3.13e-05 |
| 2025-08-30 06:08:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:08:15 - pico-train - INFO - Step 46575 -- ๐ Training Metrics |
| 2025-08-30 06:08:15 - pico-train - INFO - โโโ Loss: 5.9461 |
| 2025-08-30 06:08:15 - pico-train - INFO - โโโ Learning Rate: 3.13e-05 |
| 2025-08-30 06:08:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:08:27 - pico-train - INFO - Step 46600 -- ๐ Training Metrics |
| 2025-08-30 06:08:27 - pico-train - INFO - โโโ Loss: 6.0239 |
| 2025-08-30 06:08:27 - pico-train - INFO - โโโ Learning Rate: 3.13e-05 |
| 2025-08-30 06:08:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:08:40 - pico-train - INFO - Step 46625 -- ๐ Training Metrics |
| 2025-08-30 06:08:40 - pico-train - INFO - โโโ Loss: 6.0496 |
| 2025-08-30 06:08:40 - pico-train - INFO - โโโ Learning Rate: 3.12e-05 |
| 2025-08-30 06:08:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:08:53 - pico-train - INFO - Step 46650 -- ๐ Training Metrics |
| 2025-08-30 06:08:53 - pico-train - INFO - โโโ Loss: 5.9859 |
| 2025-08-30 06:08:53 - pico-train - INFO - โโโ Learning Rate: 3.12e-05 |
| 2025-08-30 06:08:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:09:05 - pico-train - INFO - Step 46675 -- ๐ Training Metrics |
| 2025-08-30 06:09:05 - pico-train - INFO - โโโ Loss: 6.0529 |
| 2025-08-30 06:09:05 - pico-train - INFO - โโโ Learning Rate: 3.12e-05 |
| 2025-08-30 06:09:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:09:18 - pico-train - INFO - Step 46700 -- ๐ Training Metrics |
| 2025-08-30 06:09:18 - pico-train - INFO - โโโ Loss: 6.0469 |
| 2025-08-30 06:09:18 - pico-train - INFO - โโโ Learning Rate: 3.12e-05 |
| 2025-08-30 06:09:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:09:31 - pico-train - INFO - Step 46725 -- ๐ Training Metrics |
| 2025-08-30 06:09:31 - pico-train - INFO - โโโ Loss: 6.0152 |
| 2025-08-30 06:09:31 - pico-train - INFO - โโโ Learning Rate: 3.11e-05 |
| 2025-08-30 06:09:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:09:43 - pico-train - INFO - Step 46750 -- ๐ Training Metrics |
| 2025-08-30 06:09:43 - pico-train - INFO - โโโ Loss: 6.0636 |
| 2025-08-30 06:09:43 - pico-train - INFO - โโโ Learning Rate: 3.11e-05 |
| 2025-08-30 06:09:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:09:56 - pico-train - INFO - Step 46775 -- ๐ Training Metrics |
| 2025-08-30 06:09:56 - pico-train - INFO - โโโ Loss: 6.0503 |
| 2025-08-30 06:09:56 - pico-train - INFO - โโโ Learning Rate: 3.11e-05 |
| 2025-08-30 06:09:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:10:08 - pico-train - INFO - Step 46800 -- ๐ Training Metrics |
| 2025-08-30 06:10:08 - pico-train - INFO - โโโ Loss: 6.0151 |
| 2025-08-30 06:10:08 - pico-train - INFO - โโโ Learning Rate: 3.11e-05 |
| 2025-08-30 06:10:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:10:21 - pico-train - INFO - Step 46825 -- ๐ Training Metrics |
| 2025-08-30 06:10:21 - pico-train - INFO - โโโ Loss: 5.9617 |
| 2025-08-30 06:10:21 - pico-train - INFO - โโโ Learning Rate: 3.11e-05 |
| 2025-08-30 06:10:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:10:33 - pico-train - INFO - Step 46850 -- ๐ Training Metrics |
| 2025-08-30 06:10:33 - pico-train - INFO - โโโ Loss: 5.9888 |
| 2025-08-30 06:10:33 - pico-train - INFO - โโโ Learning Rate: 3.10e-05 |
| 2025-08-30 06:10:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:10:46 - pico-train - INFO - Step 46875 -- ๐ Training Metrics |
| 2025-08-30 06:10:46 - pico-train - INFO - โโโ Loss: 5.9116 |
| 2025-08-30 06:10:46 - pico-train - INFO - โโโ Learning Rate: 3.10e-05 |
| 2025-08-30 06:10:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:10:59 - pico-train - INFO - Step 46900 -- ๐ Training Metrics |
| 2025-08-30 06:10:59 - pico-train - INFO - โโโ Loss: 6.0299 |
| 2025-08-30 06:10:59 - pico-train - INFO - โโโ Learning Rate: 3.10e-05 |
| 2025-08-30 06:10:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:11:11 - pico-train - INFO - Step 46925 -- ๐ Training Metrics |
| 2025-08-30 06:11:11 - pico-train - INFO - โโโ Loss: 5.9876 |
| 2025-08-30 06:11:11 - pico-train - INFO - โโโ Learning Rate: 3.10e-05 |
| 2025-08-30 06:11:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:11:24 - pico-train - INFO - Step 46950 -- ๐ Training Metrics |
| 2025-08-30 06:11:24 - pico-train - INFO - โโโ Loss: 6.0462 |
| 2025-08-30 06:11:24 - pico-train - INFO - โโโ Learning Rate: 3.10e-05 |
| 2025-08-30 06:11:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:11:36 - pico-train - INFO - Step 46975 -- ๐ Training Metrics |
| 2025-08-30 06:11:36 - pico-train - INFO - โโโ Loss: 6.0083 |
| 2025-08-30 06:11:36 - pico-train - INFO - โโโ Learning Rate: 3.09e-05 |
| 2025-08-30 06:11:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:11:48 - pico-train - INFO - Step 47000 -- 💾 Saving Checkpoint |
| 2025-08-30 06:13:47 - pico-train - INFO - Step 47000 -- ๐ Evaluation Results |
| 2025-08-30 06:13:47 - pico-train - INFO - โโโ paloma: 3.374744437022473e+28 |
| 2025-08-30 06:13:49 - pico-train - INFO - Step 47000 -- ๐ Training Metrics |
| 2025-08-30 06:13:49 - pico-train - INFO - โโโ Loss: 6.0269 |
| 2025-08-30 06:13:49 - pico-train - INFO - โโโ Learning Rate: 3.09e-05 |
| 2025-08-30 06:13:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:13:49 - pico-train - INFO - Step 47000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 06:14:04 - pico-train - INFO - Step 47025 -- ๐ Training Metrics |
| 2025-08-30 06:14:04 - pico-train - INFO - โโโ Loss: 6.0510 |
| 2025-08-30 06:14:04 - pico-train - INFO - โโโ Learning Rate: 3.09e-05 |
| 2025-08-30 06:14:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:14:17 - pico-train - INFO - Step 47050 -- ๐ Training Metrics |
| 2025-08-30 06:14:17 - pico-train - INFO - โโโ Loss: 5.9631 |
| 2025-08-30 06:14:17 - pico-train - INFO - โโโ Learning Rate: 3.09e-05 |
| 2025-08-30 06:14:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:14:29 - pico-train - INFO - Step 47075 -- ๐ Training Metrics |
| 2025-08-30 06:14:29 - pico-train - INFO - โโโ Loss: 5.9767 |
| 2025-08-30 06:14:29 - pico-train - INFO - โโโ Learning Rate: 3.09e-05 |
| 2025-08-30 06:14:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:14:42 - pico-train - INFO - Step 47100 -- ๐ Training Metrics |
| 2025-08-30 06:14:42 - pico-train - INFO - โโโ Loss: 6.0403 |
| 2025-08-30 06:14:42 - pico-train - INFO - โโโ Learning Rate: 3.08e-05 |
| 2025-08-30 06:14:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:14:55 - pico-train - INFO - Step 47125 -- ๐ Training Metrics |
| 2025-08-30 06:14:55 - pico-train - INFO - โโโ Loss: 6.0179 |
| 2025-08-30 06:14:55 - pico-train - INFO - โโโ Learning Rate: 3.08e-05 |
| 2025-08-30 06:14:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:15:08 - pico-train - INFO - Step 47150 -- ๐ Training Metrics |
| 2025-08-30 06:15:08 - pico-train - INFO - โโโ Loss: 6.0036 |
| 2025-08-30 06:15:08 - pico-train - INFO - โโโ Learning Rate: 3.08e-05 |
| 2025-08-30 06:15:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:15:21 - pico-train - INFO - Step 47175 -- ๐ Training Metrics |
| 2025-08-30 06:15:21 - pico-train - INFO - โโโ Loss: 6.0186 |
| 2025-08-30 06:15:21 - pico-train - INFO - โโโ Learning Rate: 3.08e-05 |
| 2025-08-30 06:15:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:15:33 - pico-train - INFO - Step 47200 -- ๐ Training Metrics |
| 2025-08-30 06:15:33 - pico-train - INFO - โโโ Loss: 5.9299 |
| 2025-08-30 06:15:33 - pico-train - INFO - โโโ Learning Rate: 3.08e-05 |
| 2025-08-30 06:15:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:15:46 - pico-train - INFO - Step 47225 -- ๐ Training Metrics |
| 2025-08-30 06:15:46 - pico-train - INFO - โโโ Loss: 6.1006 |
| 2025-08-30 06:15:46 - pico-train - INFO - โโโ Learning Rate: 3.07e-05 |
| 2025-08-30 06:15:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:15:58 - pico-train - INFO - Step 47250 -- ๐ Training Metrics |
| 2025-08-30 06:15:58 - pico-train - INFO - โโโ Loss: 5.9586 |
| 2025-08-30 06:15:58 - pico-train - INFO - โโโ Learning Rate: 3.07e-05 |
| 2025-08-30 06:15:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:16:11 - pico-train - INFO - Step 47275 -- ๐ Training Metrics |
| 2025-08-30 06:16:11 - pico-train - INFO - โโโ Loss: 6.0152 |
| 2025-08-30 06:16:11 - pico-train - INFO - โโโ Learning Rate: 3.07e-05 |
| 2025-08-30 06:16:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:16:24 - pico-train - INFO - Step 47300 -- ๐ Training Metrics |
| 2025-08-30 06:16:24 - pico-train - INFO - โโโ Loss: 5.9418 |
| 2025-08-30 06:16:24 - pico-train - INFO - โโโ Learning Rate: 3.07e-05 |
| 2025-08-30 06:16:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:16:36 - pico-train - INFO - Step 47325 -- ๐ Training Metrics |
| 2025-08-30 06:16:36 - pico-train - INFO - โโโ Loss: 5.9040 |
| 2025-08-30 06:16:36 - pico-train - INFO - โโโ Learning Rate: 3.06e-05 |
| 2025-08-30 06:16:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:16:49 - pico-train - INFO - Step 47350 -- ๐ Training Metrics |
| 2025-08-30 06:16:49 - pico-train - INFO - โโโ Loss: 6.0085 |
| 2025-08-30 06:16:49 - pico-train - INFO - โโโ Learning Rate: 3.06e-05 |
| 2025-08-30 06:16:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:17:01 - pico-train - INFO - Step 47375 -- ๐ Training Metrics |
| 2025-08-30 06:17:01 - pico-train - INFO - โโโ Loss: 5.9546 |
| 2025-08-30 06:17:01 - pico-train - INFO - โโโ Learning Rate: 3.06e-05 |
| 2025-08-30 06:17:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:17:14 - pico-train - INFO - Step 47400 -- ๐ Training Metrics |
| 2025-08-30 06:17:14 - pico-train - INFO - โโโ Loss: 6.0002 |
| 2025-08-30 06:17:14 - pico-train - INFO - โโโ Learning Rate: 3.06e-05 |
| 2025-08-30 06:17:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:17:26 - pico-train - INFO - Step 47425 -- ๐ Training Metrics |
| 2025-08-30 06:17:26 - pico-train - INFO - โโโ Loss: 5.9671 |
| 2025-08-30 06:17:26 - pico-train - INFO - โโโ Learning Rate: 3.06e-05 |
| 2025-08-30 06:17:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:17:39 - pico-train - INFO - Step 47450 -- ๐ Training Metrics |
| 2025-08-30 06:17:39 - pico-train - INFO - โโโ Loss: 5.9857 |
| 2025-08-30 06:17:39 - pico-train - INFO - โโโ Learning Rate: 3.05e-05 |
| 2025-08-30 06:17:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:17:52 - pico-train - INFO - Step 47475 -- ๐ Training Metrics |
| 2025-08-30 06:17:52 - pico-train - INFO - โโโ Loss: 6.0252 |
| 2025-08-30 06:17:52 - pico-train - INFO - โโโ Learning Rate: 3.05e-05 |
| 2025-08-30 06:17:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:18:04 - pico-train - INFO - Step 47500 -- ๐พ Saving Checkpoint |
| 2025-08-30 06:19:56 - pico-train - INFO - Step 47500 -- ๐ Evaluation Results |
| 2025-08-30 06:19:56 - pico-train - INFO - โโโ paloma: 6.3085366283161405e+28 |
| 2025-08-30 06:19:57 - pico-train - INFO - Step 47500 -- ๐ Training Metrics |
| 2025-08-30 06:19:57 - pico-train - INFO - โโโ Loss: 6.0560 |
| 2025-08-30 06:19:57 - pico-train - INFO - โโโ Learning Rate: 3.05e-05 |
| 2025-08-30 06:19:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:19:57 - pico-train - INFO - Step 47500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 06:20:12 - pico-train - INFO - Step 47525 -- ๐ Training Metrics |
| 2025-08-30 06:20:12 - pico-train - INFO - โโโ Loss: 5.9855 |
| 2025-08-30 06:20:12 - pico-train - INFO - โโโ Learning Rate: 3.05e-05 |
| 2025-08-30 06:20:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:20:25 - pico-train - INFO - Step 47550 -- ๐ Training Metrics |
| 2025-08-30 06:20:25 - pico-train - INFO - โโโ Loss: 5.9577 |
| 2025-08-30 06:20:25 - pico-train - INFO - โโโ Learning Rate: 3.05e-05 |
| 2025-08-30 06:20:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:20:37 - pico-train - INFO - Step 47575 -- ๐ Training Metrics |
| 2025-08-30 06:20:37 - pico-train - INFO - โโโ Loss: 6.0061 |
| 2025-08-30 06:20:37 - pico-train - INFO - โโโ Learning Rate: 3.04e-05 |
| 2025-08-30 06:20:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:20:50 - pico-train - INFO - Step 47600 -- ๐ Training Metrics |
| 2025-08-30 06:20:50 - pico-train - INFO - โโโ Loss: 5.9977 |
| 2025-08-30 06:20:50 - pico-train - INFO - โโโ Learning Rate: 3.04e-05 |
| 2025-08-30 06:20:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:21:03 - pico-train - INFO - Step 47625 -- ๐ Training Metrics |
| 2025-08-30 06:21:03 - pico-train - INFO - โโโ Loss: 5.9507 |
| 2025-08-30 06:21:03 - pico-train - INFO - โโโ Learning Rate: 3.04e-05 |
| 2025-08-30 06:21:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:21:15 - pico-train - INFO - Step 47650 -- ๐ Training Metrics |
| 2025-08-30 06:21:15 - pico-train - INFO - โโโ Loss: 5.9363 |
| 2025-08-30 06:21:15 - pico-train - INFO - โโโ Learning Rate: 3.04e-05 |
| 2025-08-30 06:21:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:21:28 - pico-train - INFO - Step 47675 -- ๐ Training Metrics |
| 2025-08-30 06:21:28 - pico-train - INFO - โโโ Loss: 6.0677 |
| 2025-08-30 06:21:28 - pico-train - INFO - โโโ Learning Rate: 3.04e-05 |
| 2025-08-30 06:21:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:21:41 - pico-train - INFO - Step 47700 -- ๐ Training Metrics |
| 2025-08-30 06:21:41 - pico-train - INFO - โโโ Loss: 6.0777 |
| 2025-08-30 06:21:41 - pico-train - INFO - โโโ Learning Rate: 3.03e-05 |
| 2025-08-30 06:21:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:21:53 - pico-train - INFO - Step 47725 -- ๐ Training Metrics |
| 2025-08-30 06:21:53 - pico-train - INFO - โโโ Loss: 5.9203 |
| 2025-08-30 06:21:53 - pico-train - INFO - โโโ Learning Rate: 3.03e-05 |
| 2025-08-30 06:21:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:22:06 - pico-train - INFO - Step 47750 -- ๐ Training Metrics |
| 2025-08-30 06:22:06 - pico-train - INFO - โโโ Loss: 6.0014 |
| 2025-08-30 06:22:06 - pico-train - INFO - โโโ Learning Rate: 3.03e-05 |
| 2025-08-30 06:22:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:22:19 - pico-train - INFO - Step 47775 -- ๐ Training Metrics |
| 2025-08-30 06:22:19 - pico-train - INFO - โโโ Loss: 5.9680 |
| 2025-08-30 06:22:19 - pico-train - INFO - โโโ Learning Rate: 3.03e-05 |
| 2025-08-30 06:22:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:22:31 - pico-train - INFO - Step 47800 -- ๐ Training Metrics |
| 2025-08-30 06:22:31 - pico-train - INFO - โโโ Loss: 6.0516 |
| 2025-08-30 06:22:31 - pico-train - INFO - โโโ Learning Rate: 3.03e-05 |
| 2025-08-30 06:22:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:22:44 - pico-train - INFO - Step 47825 -- ๐ Training Metrics |
| 2025-08-30 06:22:44 - pico-train - INFO - โโโ Loss: 6.0163 |
| 2025-08-30 06:22:44 - pico-train - INFO - โโโ Learning Rate: 3.02e-05 |
| 2025-08-30 06:22:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:22:56 - pico-train - INFO - Step 47850 -- ๐ Training Metrics |
| 2025-08-30 06:22:56 - pico-train - INFO - โโโ Loss: 6.0132 |
| 2025-08-30 06:22:56 - pico-train - INFO - โโโ Learning Rate: 3.02e-05 |
| 2025-08-30 06:22:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:23:09 - pico-train - INFO - Step 47875 -- ๐ Training Metrics |
| 2025-08-30 06:23:09 - pico-train - INFO - โโโ Loss: 5.9571 |
| 2025-08-30 06:23:09 - pico-train - INFO - โโโ Learning Rate: 3.02e-05 |
| 2025-08-30 06:23:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:23:22 - pico-train - INFO - Step 47900 -- ๐ Training Metrics |
| 2025-08-30 06:23:22 - pico-train - INFO - โโโ Loss: 5.9390 |
| 2025-08-30 06:23:22 - pico-train - INFO - โโโ Learning Rate: 3.02e-05 |
| 2025-08-30 06:23:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:23:34 - pico-train - INFO - Step 47925 -- ๐ Training Metrics |
| 2025-08-30 06:23:34 - pico-train - INFO - โโโ Loss: 5.9870 |
| 2025-08-30 06:23:34 - pico-train - INFO - โโโ Learning Rate: 3.01e-05 |
| 2025-08-30 06:23:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:23:47 - pico-train - INFO - Step 47950 -- ๐ Training Metrics |
| 2025-08-30 06:23:47 - pico-train - INFO - โโโ Loss: 5.9717 |
| 2025-08-30 06:23:47 - pico-train - INFO - โโโ Learning Rate: 3.01e-05 |
| 2025-08-30 06:23:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:23:59 - pico-train - INFO - Step 47975 -- ๐ Training Metrics |
| 2025-08-30 06:23:59 - pico-train - INFO - โโโ Loss: 6.0558 |
| 2025-08-30 06:23:59 - pico-train - INFO - โโโ Learning Rate: 3.01e-05 |
| 2025-08-30 06:23:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:24:12 - pico-train - INFO - Step 48000 -- ๐พ Saving Checkpoint |
| 2025-08-30 06:26:04 - pico-train - INFO - Step 48000 -- ๐ Evaluation Results |
| 2025-08-30 06:26:04 - pico-train - INFO - โโโ paloma: 6.49975478431273e+28 |
| 2025-08-30 06:26:06 - pico-train - INFO - Step 48000 -- ๐ Training Metrics |
| 2025-08-30 06:26:06 - pico-train - INFO - โโโ Loss: 6.0808 |
| 2025-08-30 06:26:06 - pico-train - INFO - โโโ Learning Rate: 3.01e-05 |
| 2025-08-30 06:26:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:26:06 - pico-train - INFO - Step 48000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 06:26:20 - pico-train - INFO - Step 48025 -- ๐ Training Metrics |
| 2025-08-30 06:26:20 - pico-train - INFO - โโโ Loss: 6.0001 |
| 2025-08-30 06:26:20 - pico-train - INFO - โโโ Learning Rate: 3.01e-05 |
| 2025-08-30 06:26:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:26:33 - pico-train - INFO - Step 48050 -- ๐ Training Metrics |
| 2025-08-30 06:26:33 - pico-train - INFO - โโโ Loss: 6.0349 |
| 2025-08-30 06:26:33 - pico-train - INFO - โโโ Learning Rate: 3.00e-05 |
| 2025-08-30 06:26:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:26:46 - pico-train - INFO - Step 48075 -- ๐ Training Metrics |
| 2025-08-30 06:26:46 - pico-train - INFO - โโโ Loss: 5.9524 |
| 2025-08-30 06:26:46 - pico-train - INFO - โโโ Learning Rate: 3.00e-05 |
| 2025-08-30 06:26:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:26:58 - pico-train - INFO - Step 48100 -- ๐ Training Metrics |
| 2025-08-30 06:26:58 - pico-train - INFO - โโโ Loss: 5.9626 |
| 2025-08-30 06:26:58 - pico-train - INFO - โโโ Learning Rate: 3.00e-05 |
| 2025-08-30 06:26:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:27:11 - pico-train - INFO - Step 48125 -- ๐ Training Metrics |
| 2025-08-30 06:27:11 - pico-train - INFO - โโโ Loss: 6.0514 |
| 2025-08-30 06:27:11 - pico-train - INFO - โโโ Learning Rate: 3.00e-05 |
| 2025-08-30 06:27:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:27:24 - pico-train - INFO - Step 48150 -- ๐ Training Metrics |
| 2025-08-30 06:27:24 - pico-train - INFO - โโโ Loss: 6.0687 |
| 2025-08-30 06:27:24 - pico-train - INFO - โโโ Learning Rate: 3.00e-05 |
| 2025-08-30 06:27:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:27:36 - pico-train - INFO - Step 48175 -- ๐ Training Metrics |
| 2025-08-30 06:27:36 - pico-train - INFO - โโโ Loss: 6.0928 |
| 2025-08-30 06:27:36 - pico-train - INFO - โโโ Learning Rate: 2.99e-05 |
| 2025-08-30 06:27:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:27:49 - pico-train - INFO - Step 48200 -- ๐ Training Metrics |
| 2025-08-30 06:27:49 - pico-train - INFO - โโโ Loss: 5.9182 |
| 2025-08-30 06:27:49 - pico-train - INFO - โโโ Learning Rate: 2.99e-05 |
| 2025-08-30 06:27:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:28:02 - pico-train - INFO - Step 48225 -- ๐ Training Metrics |
| 2025-08-30 06:28:02 - pico-train - INFO - โโโ Loss: 5.9677 |
| 2025-08-30 06:28:02 - pico-train - INFO - โโโ Learning Rate: 2.99e-05 |
| 2025-08-30 06:28:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:28:14 - pico-train - INFO - Step 48250 -- ๐ Training Metrics |
| 2025-08-30 06:28:14 - pico-train - INFO - โโโ Loss: 6.0330 |
| 2025-08-30 06:28:14 - pico-train - INFO - โโโ Learning Rate: 2.99e-05 |
| 2025-08-30 06:28:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:28:27 - pico-train - INFO - Step 48275 -- ๐ Training Metrics |
| 2025-08-30 06:28:27 - pico-train - INFO - โโโ Loss: 6.0136 |
| 2025-08-30 06:28:27 - pico-train - INFO - โโโ Learning Rate: 2.99e-05 |
| 2025-08-30 06:28:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:28:39 - pico-train - INFO - Step 48300 -- ๐ Training Metrics |
| 2025-08-30 06:28:39 - pico-train - INFO - โโโ Loss: 6.0606 |
| 2025-08-30 06:28:39 - pico-train - INFO - โโโ Learning Rate: 2.98e-05 |
| 2025-08-30 06:28:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:28:52 - pico-train - INFO - Step 48325 -- ๐ Training Metrics |
| 2025-08-30 06:28:52 - pico-train - INFO - โโโ Loss: 5.9799 |
| 2025-08-30 06:28:52 - pico-train - INFO - โโโ Learning Rate: 2.98e-05 |
| 2025-08-30 06:28:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:29:04 - pico-train - INFO - Step 48350 -- ๐ Training Metrics |
| 2025-08-30 06:29:04 - pico-train - INFO - โโโ Loss: 5.9201 |
| 2025-08-30 06:29:04 - pico-train - INFO - โโโ Learning Rate: 2.98e-05 |
| 2025-08-30 06:29:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:29:17 - pico-train - INFO - Step 48375 -- ๐ Training Metrics |
| 2025-08-30 06:29:17 - pico-train - INFO - โโโ Loss: 6.0589 |
| 2025-08-30 06:29:17 - pico-train - INFO - โโโ Learning Rate: 2.98e-05 |
| 2025-08-30 06:29:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:29:30 - pico-train - INFO - Step 48400 -- ๐ Training Metrics |
| 2025-08-30 06:29:30 - pico-train - INFO - โโโ Loss: 5.9895 |
| 2025-08-30 06:29:30 - pico-train - INFO - โโโ Learning Rate: 2.98e-05 |
| 2025-08-30 06:29:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:29:42 - pico-train - INFO - Step 48425 -- ๐ Training Metrics |
| 2025-08-30 06:29:42 - pico-train - INFO - โโโ Loss: 6.0389 |
| 2025-08-30 06:29:42 - pico-train - INFO - โโโ Learning Rate: 2.97e-05 |
| 2025-08-30 06:29:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:29:55 - pico-train - INFO - Step 48450 -- ๐ Training Metrics |
| 2025-08-30 06:29:55 - pico-train - INFO - โโโ Loss: 5.9910 |
| 2025-08-30 06:29:55 - pico-train - INFO - โโโ Learning Rate: 2.97e-05 |
| 2025-08-30 06:29:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:30:07 - pico-train - INFO - Step 48475 -- ๐ Training Metrics |
| 2025-08-30 06:30:07 - pico-train - INFO - โโโ Loss: 6.0012 |
| 2025-08-30 06:30:07 - pico-train - INFO - โโโ Learning Rate: 2.97e-05 |
| 2025-08-30 06:30:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:30:20 - pico-train - INFO - Step 48500 -- ๐พ Saving Checkpoint |
| 2025-08-30 06:32:12 - pico-train - INFO - Step 48500 -- ๐ Evaluation Results |
| 2025-08-30 06:32:12 - pico-train - INFO - โโโ paloma: 7.468048914747141e+28 |
| 2025-08-30 06:32:15 - pico-train - INFO - Step 48500 -- ๐ Training Metrics |
| 2025-08-30 06:32:15 - pico-train - INFO - โโโ Loss: 6.0047 |
| 2025-08-30 06:32:15 - pico-train - INFO - โโโ Learning Rate: 2.97e-05 |
| 2025-08-30 06:32:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:32:15 - pico-train - INFO - Step 48500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 06:32:31 - pico-train - INFO - Step 48525 -- ๐ Training Metrics |
| 2025-08-30 06:32:31 - pico-train - INFO - โโโ Loss: 5.9447 |
| 2025-08-30 06:32:31 - pico-train - INFO - โโโ Learning Rate: 2.96e-05 |
| 2025-08-30 06:32:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:32:43 - pico-train - INFO - Step 48550 -- ๐ Training Metrics |
| 2025-08-30 06:32:43 - pico-train - INFO - โโโ Loss: 5.9573 |
| 2025-08-30 06:32:43 - pico-train - INFO - โโโ Learning Rate: 2.96e-05 |
| 2025-08-30 06:32:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:32:56 - pico-train - INFO - Step 48575 -- ๐ Training Metrics |
| 2025-08-30 06:32:56 - pico-train - INFO - โโโ Loss: 5.9279 |
| 2025-08-30 06:32:56 - pico-train - INFO - โโโ Learning Rate: 2.96e-05 |
| 2025-08-30 06:32:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:33:08 - pico-train - INFO - Step 48600 -- ๐ Training Metrics |
| 2025-08-30 06:33:08 - pico-train - INFO - โโโ Loss: 6.0511 |
| 2025-08-30 06:33:08 - pico-train - INFO - โโโ Learning Rate: 2.96e-05 |
| 2025-08-30 06:33:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:33:21 - pico-train - INFO - Step 48625 -- ๐ Training Metrics |
| 2025-08-30 06:33:21 - pico-train - INFO - โโโ Loss: 5.9875 |
| 2025-08-30 06:33:21 - pico-train - INFO - โโโ Learning Rate: 2.96e-05 |
| 2025-08-30 06:33:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:33:34 - pico-train - INFO - Step 48650 -- ๐ Training Metrics |
| 2025-08-30 06:33:34 - pico-train - INFO - โโโ Loss: 5.9392 |
| 2025-08-30 06:33:34 - pico-train - INFO - โโโ Learning Rate: 2.95e-05 |
| 2025-08-30 06:33:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:33:46 - pico-train - INFO - Step 48675 -- ๐ Training Metrics |
| 2025-08-30 06:33:46 - pico-train - INFO - โโโ Loss: 5.9466 |
| 2025-08-30 06:33:46 - pico-train - INFO - โโโ Learning Rate: 2.95e-05 |
| 2025-08-30 06:33:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:33:59 - pico-train - INFO - Step 48700 -- ๐ Training Metrics |
| 2025-08-30 06:33:59 - pico-train - INFO - โโโ Loss: 6.0769 |
| 2025-08-30 06:33:59 - pico-train - INFO - โโโ Learning Rate: 2.95e-05 |
| 2025-08-30 06:33:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:34:11 - pico-train - INFO - Step 48725 -- ๐ Training Metrics |
| 2025-08-30 06:34:11 - pico-train - INFO - โโโ Loss: 5.8933 |
| 2025-08-30 06:34:11 - pico-train - INFO - โโโ Learning Rate: 2.95e-05 |
| 2025-08-30 06:34:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:34:24 - pico-train - INFO - Step 48750 -- ๐ Training Metrics |
| 2025-08-30 06:34:24 - pico-train - INFO - โโโ Loss: 5.9891 |
| 2025-08-30 06:34:24 - pico-train - INFO - โโโ Learning Rate: 2.95e-05 |
| 2025-08-30 06:34:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:34:37 - pico-train - INFO - Step 48775 -- ๐ Training Metrics |
| 2025-08-30 06:34:37 - pico-train - INFO - โโโ Loss: 5.9740 |
| 2025-08-30 06:34:37 - pico-train - INFO - โโโ Learning Rate: 2.94e-05 |
| 2025-08-30 06:34:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:34:49 - pico-train - INFO - Step 48800 -- ๐ Training Metrics |
| 2025-08-30 06:34:49 - pico-train - INFO - โโโ Loss: 5.9417 |
| 2025-08-30 06:34:49 - pico-train - INFO - โโโ Learning Rate: 2.94e-05 |
| 2025-08-30 06:34:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:35:02 - pico-train - INFO - Step 48825 -- ๐ Training Metrics |
| 2025-08-30 06:35:02 - pico-train - INFO - โโโ Loss: 5.9812 |
| 2025-08-30 06:35:02 - pico-train - INFO - โโโ Learning Rate: 2.94e-05 |
| 2025-08-30 06:35:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:35:14 - pico-train - INFO - Step 48850 -- ๐ Training Metrics |
| 2025-08-30 06:35:14 - pico-train - INFO - โโโ Loss: 5.9183 |
| 2025-08-30 06:35:14 - pico-train - INFO - โโโ Learning Rate: 2.94e-05 |
| 2025-08-30 06:35:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:35:27 - pico-train - INFO - Step 48875 -- ๐ Training Metrics |
| 2025-08-30 06:35:27 - pico-train - INFO - โโโ Loss: 5.8828 |
| 2025-08-30 06:35:27 - pico-train - INFO - โโโ Learning Rate: 2.94e-05 |
| 2025-08-30 06:35:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:35:40 - pico-train - INFO - Step 48900 -- ๐ Training Metrics |
| 2025-08-30 06:35:40 - pico-train - INFO - โโโ Loss: 6.0054 |
| 2025-08-30 06:35:40 - pico-train - INFO - โโโ Learning Rate: 2.93e-05 |
| 2025-08-30 06:35:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:35:52 - pico-train - INFO - Step 48925 -- ๐ Training Metrics |
| 2025-08-30 06:35:52 - pico-train - INFO - โโโ Loss: 5.9383 |
| 2025-08-30 06:35:52 - pico-train - INFO - โโโ Learning Rate: 2.93e-05 |
| 2025-08-30 06:35:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:36:05 - pico-train - INFO - Step 48950 -- ๐ Training Metrics |
| 2025-08-30 06:36:05 - pico-train - INFO - โโโ Loss: 5.9938 |
| 2025-08-30 06:36:05 - pico-train - INFO - โโโ Learning Rate: 2.93e-05 |
| 2025-08-30 06:36:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:36:17 - pico-train - INFO - Step 48975 -- ๐ Training Metrics |
| 2025-08-30 06:36:17 - pico-train - INFO - โโโ Loss: 6.0000 |
| 2025-08-30 06:36:17 - pico-train - INFO - โโโ Learning Rate: 2.93e-05 |
| 2025-08-30 06:36:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:36:30 - pico-train - INFO - Step 49000 -- ๐พ Saving Checkpoint |
| 2025-08-30 06:38:29 - pico-train - INFO - Step 49000 -- ๐ Evaluation Results |
| 2025-08-30 06:38:29 - pico-train - INFO - โโโ paloma: 1.0055567105609192e+29 |
| 2025-08-30 06:38:32 - pico-train - INFO - Step 49000 -- ๐ Training Metrics |
| 2025-08-30 06:38:32 - pico-train - INFO - โโโ Loss: 5.9422 |
| 2025-08-30 06:38:32 - pico-train - INFO - โโโ Learning Rate: 2.92e-05 |
| 2025-08-30 06:38:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:38:32 - pico-train - INFO - Step 49000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 06:38:47 - pico-train - INFO - Step 49025 -- ๐ Training Metrics |
| 2025-08-30 06:38:47 - pico-train - INFO - โโโ Loss: 6.0063 |
| 2025-08-30 06:38:47 - pico-train - INFO - โโโ Learning Rate: 2.92e-05 |
| 2025-08-30 06:38:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:39:00 - pico-train - INFO - Step 49050 -- ๐ Training Metrics |
| 2025-08-30 06:39:00 - pico-train - INFO - โโโ Loss: 5.9355 |
| 2025-08-30 06:39:00 - pico-train - INFO - โโโ Learning Rate: 2.92e-05 |
| 2025-08-30 06:39:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:39:12 - pico-train - INFO - Step 49075 -- ๐ Training Metrics |
| 2025-08-30 06:39:12 - pico-train - INFO - โโโ Loss: 5.9666 |
| 2025-08-30 06:39:12 - pico-train - INFO - โโโ Learning Rate: 2.92e-05 |
| 2025-08-30 06:39:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:39:25 - pico-train - INFO - Step 49100 -- ๐ Training Metrics |
| 2025-08-30 06:39:25 - pico-train - INFO - โโโ Loss: 5.9422 |
| 2025-08-30 06:39:25 - pico-train - INFO - โโโ Learning Rate: 2.92e-05 |
| 2025-08-30 06:39:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:39:38 - pico-train - INFO - Step 49125 -- ๐ Training Metrics |
| 2025-08-30 06:39:38 - pico-train - INFO - โโโ Loss: 6.0065 |
| 2025-08-30 06:39:38 - pico-train - INFO - โโโ Learning Rate: 2.91e-05 |
| 2025-08-30 06:39:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:39:51 - pico-train - INFO - Step 49150 -- ๐ Training Metrics |
| 2025-08-30 06:39:51 - pico-train - INFO - โโโ Loss: 5.8978 |
| 2025-08-30 06:39:51 - pico-train - INFO - โโโ Learning Rate: 2.91e-05 |
| 2025-08-30 06:39:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:40:04 - pico-train - INFO - Step 49175 -- ๐ Training Metrics |
| 2025-08-30 06:40:04 - pico-train - INFO - โโโ Loss: 5.9054 |
| 2025-08-30 06:40:04 - pico-train - INFO - โโโ Learning Rate: 2.91e-05 |
| 2025-08-30 06:40:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:40:16 - pico-train - INFO - Step 49200 -- ๐ Training Metrics |
| 2025-08-30 06:40:16 - pico-train - INFO - โโโ Loss: 5.9853 |
| 2025-08-30 06:40:16 - pico-train - INFO - โโโ Learning Rate: 2.91e-05 |
| 2025-08-30 06:40:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:40:29 - pico-train - INFO - Step 49225 -- ๐ Training Metrics |
| 2025-08-30 06:40:29 - pico-train - INFO - โโโ Loss: 6.1100 |
| 2025-08-30 06:40:29 - pico-train - INFO - โโโ Learning Rate: 2.91e-05 |
| 2025-08-30 06:40:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:40:42 - pico-train - INFO - Step 49250 -- ๐ Training Metrics |
| 2025-08-30 06:40:42 - pico-train - INFO - โโโ Loss: 5.9674 |
| 2025-08-30 06:40:42 - pico-train - INFO - โโโ Learning Rate: 2.90e-05 |
| 2025-08-30 06:40:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:40:54 - pico-train - INFO - Step 49275 -- ๐ Training Metrics |
| 2025-08-30 06:40:54 - pico-train - INFO - โโโ Loss: 5.9658 |
| 2025-08-30 06:40:54 - pico-train - INFO - โโโ Learning Rate: 2.90e-05 |
| 2025-08-30 06:40:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:41:07 - pico-train - INFO - Step 49300 -- ๐ Training Metrics |
| 2025-08-30 06:41:07 - pico-train - INFO - โโโ Loss: 5.8764 |
| 2025-08-30 06:41:07 - pico-train - INFO - โโโ Learning Rate: 2.90e-05 |
| 2025-08-30 06:41:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:41:19 - pico-train - INFO - Step 49325 -- ๐ Training Metrics |
| 2025-08-30 06:41:19 - pico-train - INFO - โโโ Loss: 6.0664 |
| 2025-08-30 06:41:19 - pico-train - INFO - โโโ Learning Rate: 2.90e-05 |
| 2025-08-30 06:41:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:41:32 - pico-train - INFO - Step 49350 -- ๐ Training Metrics |
| 2025-08-30 06:41:32 - pico-train - INFO - โโโ Loss: 6.0158 |
| 2025-08-30 06:41:32 - pico-train - INFO - โโโ Learning Rate: 2.90e-05 |
| 2025-08-30 06:41:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:41:44 - pico-train - INFO - Step 49375 -- ๐ Training Metrics |
| 2025-08-30 06:41:44 - pico-train - INFO - โโโ Loss: 5.8884 |
| 2025-08-30 06:41:44 - pico-train - INFO - โโโ Learning Rate: 2.89e-05 |
| 2025-08-30 06:41:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:41:57 - pico-train - INFO - Step 49400 -- ๐ Training Metrics |
| 2025-08-30 06:41:57 - pico-train - INFO - โโโ Loss: 5.9176 |
| 2025-08-30 06:41:57 - pico-train - INFO - โโโ Learning Rate: 2.89e-05 |
| 2025-08-30 06:41:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:42:10 - pico-train - INFO - Step 49425 -- ๐ Training Metrics |
| 2025-08-30 06:42:10 - pico-train - INFO - โโโ Loss: 6.0363 |
| 2025-08-30 06:42:10 - pico-train - INFO - โโโ Learning Rate: 2.89e-05 |
| 2025-08-30 06:42:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:42:22 - pico-train - INFO - Step 49450 -- ๐ Training Metrics |
| 2025-08-30 06:42:22 - pico-train - INFO - โโโ Loss: 5.9492 |
| 2025-08-30 06:42:22 - pico-train - INFO - โโโ Learning Rate: 2.89e-05 |
| 2025-08-30 06:42:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:42:35 - pico-train - INFO - Step 49475 -- ๐ Training Metrics |
| 2025-08-30 06:42:35 - pico-train - INFO - โโโ Loss: 5.9670 |
| 2025-08-30 06:42:35 - pico-train - INFO - โโโ Learning Rate: 2.88e-05 |
| 2025-08-30 06:42:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:42:47 - pico-train - INFO - Step 49500 -- ๐พ Saving Checkpoint |
| 2025-08-30 06:44:54 - pico-train - INFO - Step 49500 -- ๐ Evaluation Results |
| 2025-08-30 06:44:54 - pico-train - INFO - โโโ paloma: 1.1754196849862509e+29 |
| 2025-08-30 06:44:57 - pico-train - INFO - Step 49500 -- ๐ Training Metrics |
| 2025-08-30 06:44:57 - pico-train - INFO - โโโ Loss: 5.9626 |
| 2025-08-30 06:44:57 - pico-train - INFO - โโโ Learning Rate: 2.88e-05 |
| 2025-08-30 06:44:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:44:57 - pico-train - INFO - Step 49500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 06:45:12 - pico-train - INFO - Step 49525 -- ๐ Training Metrics |
| 2025-08-30 06:45:12 - pico-train - INFO - โโโ Loss: 5.9286 |
| 2025-08-30 06:45:12 - pico-train - INFO - โโโ Learning Rate: 2.88e-05 |
| 2025-08-30 06:45:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:45:24 - pico-train - INFO - Step 49550 -- ๐ Training Metrics |
| 2025-08-30 06:45:24 - pico-train - INFO - โโโ Loss: 5.8746 |
| 2025-08-30 06:45:24 - pico-train - INFO - โโโ Learning Rate: 2.88e-05 |
| 2025-08-30 06:45:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:45:37 - pico-train - INFO - Step 49575 -- ๐ Training Metrics |
| 2025-08-30 06:45:37 - pico-train - INFO - โโโ Loss: 5.9669 |
| 2025-08-30 06:45:37 - pico-train - INFO - โโโ Learning Rate: 2.88e-05 |
| 2025-08-30 06:45:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:45:49 - pico-train - INFO - Step 49600 -- ๐ Training Metrics |
| 2025-08-30 06:45:49 - pico-train - INFO - โโโ Loss: 6.0786 |
| 2025-08-30 06:45:49 - pico-train - INFO - โโโ Learning Rate: 2.87e-05 |
| 2025-08-30 06:45:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:46:03 - pico-train - INFO - Step 49625 -- ๐ Training Metrics |
| 2025-08-30 06:46:03 - pico-train - INFO - โโโ Loss: 6.0407 |
| 2025-08-30 06:46:03 - pico-train - INFO - โโโ Learning Rate: 2.87e-05 |
| 2025-08-30 06:46:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:46:15 - pico-train - INFO - Step 49650 -- ๐ Training Metrics |
| 2025-08-30 06:46:15 - pico-train - INFO - โโโ Loss: 5.9219 |
| 2025-08-30 06:46:15 - pico-train - INFO - โโโ Learning Rate: 2.87e-05 |
| 2025-08-30 06:46:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:46:28 - pico-train - INFO - Step 49675 -- ๐ Training Metrics |
| 2025-08-30 06:46:28 - pico-train - INFO - โโโ Loss: 5.8997 |
| 2025-08-30 06:46:28 - pico-train - INFO - โโโ Learning Rate: 2.87e-05 |
| 2025-08-30 06:46:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:46:41 - pico-train - INFO - Step 49700 -- ๐ Training Metrics |
| 2025-08-30 06:46:41 - pico-train - INFO - โโโ Loss: 5.9761 |
| 2025-08-30 06:46:41 - pico-train - INFO - โโโ Learning Rate: 2.87e-05 |
| 2025-08-30 06:46:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:46:53 - pico-train - INFO - Step 49725 -- ๐ Training Metrics |
| 2025-08-30 06:46:53 - pico-train - INFO - โโโ Loss: 5.8749 |
| 2025-08-30 06:46:53 - pico-train - INFO - โโโ Learning Rate: 2.86e-05 |
| 2025-08-30 06:46:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:47:06 - pico-train - INFO - Step 49750 -- ๐ Training Metrics |
| 2025-08-30 06:47:06 - pico-train - INFO - โโโ Loss: 5.9646 |
| 2025-08-30 06:47:06 - pico-train - INFO - โโโ Learning Rate: 2.86e-05 |
| 2025-08-30 06:47:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:47:18 - pico-train - INFO - Step 49775 -- ๐ Training Metrics |
| 2025-08-30 06:47:18 - pico-train - INFO - โโโ Loss: 5.9586 |
| 2025-08-30 06:47:18 - pico-train - INFO - โโโ Learning Rate: 2.86e-05 |
| 2025-08-30 06:47:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:47:31 - pico-train - INFO - Step 49800 -- ๐ Training Metrics |
| 2025-08-30 06:47:31 - pico-train - INFO - โโโ Loss: 5.9586 |
| 2025-08-30 06:47:31 - pico-train - INFO - โโโ Learning Rate: 2.86e-05 |
| 2025-08-30 06:47:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:47:44 - pico-train - INFO - Step 49825 -- ๐ Training Metrics |
| 2025-08-30 06:47:44 - pico-train - INFO - โโโ Loss: 5.9795 |
| 2025-08-30 06:47:44 - pico-train - INFO - โโโ Learning Rate: 2.86e-05 |
| 2025-08-30 06:47:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:47:56 - pico-train - INFO - Step 49850 -- ๐ Training Metrics |
| 2025-08-30 06:47:56 - pico-train - INFO - โโโ Loss: 5.9432 |
| 2025-08-30 06:47:56 - pico-train - INFO - โโโ Learning Rate: 2.85e-05 |
| 2025-08-30 06:47:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:48:09 - pico-train - INFO - Step 49875 -- ๐ Training Metrics |
| 2025-08-30 06:48:09 - pico-train - INFO - โโโ Loss: 6.0729 |
| 2025-08-30 06:48:09 - pico-train - INFO - โโโ Learning Rate: 2.85e-05 |
| 2025-08-30 06:48:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:48:22 - pico-train - INFO - Step 49900 -- ๐ Training Metrics |
| 2025-08-30 06:48:22 - pico-train - INFO - โโโ Loss: 6.0377 |
| 2025-08-30 06:48:22 - pico-train - INFO - โโโ Learning Rate: 2.85e-05 |
| 2025-08-30 06:48:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:48:34 - pico-train - INFO - Step 49925 -- ๐ Training Metrics |
| 2025-08-30 06:48:34 - pico-train - INFO - โโโ Loss: 5.9408 |
| 2025-08-30 06:48:34 - pico-train - INFO - โโโ Learning Rate: 2.85e-05 |
| 2025-08-30 06:48:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:48:47 - pico-train - INFO - Step 49950 -- ๐ Training Metrics |
| 2025-08-30 06:48:47 - pico-train - INFO - โโโ Loss: 5.9974 |
| 2025-08-30 06:48:47 - pico-train - INFO - โโโ Learning Rate: 2.84e-05 |
| 2025-08-30 06:48:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:48:59 - pico-train - INFO - Step 49975 -- ๐ Training Metrics |
| 2025-08-30 06:48:59 - pico-train - INFO - โโโ Loss: 5.8671 |
| 2025-08-30 06:48:59 - pico-train - INFO - โโโ Learning Rate: 2.84e-05 |
| 2025-08-30 06:48:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:49:12 - pico-train - INFO - Step 50000 -- ๐พ Saving Checkpoint |
| 2025-08-30 06:51:08 - pico-train - INFO - Step 50000 -- ๐ Evaluation Results |
| 2025-08-30 06:51:08 - pico-train - INFO - โโโ paloma: 1.5844205816962802e+29 |
| 2025-08-30 06:51:11 - pico-train - INFO - Step 50000 -- ๐ Training Metrics |
| 2025-08-30 06:51:11 - pico-train - INFO - โโโ Loss: 5.9467 |
| 2025-08-30 06:51:11 - pico-train - INFO - โโโ Learning Rate: 2.84e-05 |
| 2025-08-30 06:51:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 06:51:11 - pico-train - INFO - Step 50000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 06:51:26 - pico-train - INFO - Step 50025 -- Training Metrics |
| 2025-08-30 06:51:26 - pico-train - INFO - Loss: 5.9961 |
| 2025-08-30 06:51:26 - pico-train - INFO - Learning Rate: 2.84e-05 |
| 2025-08-30 06:51:26 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:51:39 - pico-train - INFO - Step 50050 -- Training Metrics |
| 2025-08-30 06:51:39 - pico-train - INFO - Loss: 5.9269 |
| 2025-08-30 06:51:39 - pico-train - INFO - Learning Rate: 2.84e-05 |
| 2025-08-30 06:51:39 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:51:52 - pico-train - INFO - Step 50075 -- Training Metrics |
| 2025-08-30 06:51:52 - pico-train - INFO - Loss: 5.9394 |
| 2025-08-30 06:51:52 - pico-train - INFO - Learning Rate: 2.83e-05 |
| 2025-08-30 06:51:52 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:52:04 - pico-train - INFO - Step 50100 -- Training Metrics |
| 2025-08-30 06:52:04 - pico-train - INFO - Loss: 5.9330 |
| 2025-08-30 06:52:04 - pico-train - INFO - Learning Rate: 2.83e-05 |
| 2025-08-30 06:52:04 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:52:18 - pico-train - INFO - Step 50125 -- Training Metrics |
| 2025-08-30 06:52:18 - pico-train - INFO - Loss: 5.9620 |
| 2025-08-30 06:52:18 - pico-train - INFO - Learning Rate: 2.83e-05 |
| 2025-08-30 06:52:18 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:52:30 - pico-train - INFO - Step 50150 -- Training Metrics |
| 2025-08-30 06:52:30 - pico-train - INFO - Loss: 6.0199 |
| 2025-08-30 06:52:30 - pico-train - INFO - Learning Rate: 2.83e-05 |
| 2025-08-30 06:52:30 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:52:43 - pico-train - INFO - Step 50175 -- Training Metrics |
| 2025-08-30 06:52:43 - pico-train - INFO - Loss: 6.0399 |
| 2025-08-30 06:52:43 - pico-train - INFO - Learning Rate: 2.83e-05 |
| 2025-08-30 06:52:43 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:52:55 - pico-train - INFO - Step 50200 -- Training Metrics |
| 2025-08-30 06:52:55 - pico-train - INFO - Loss: 6.0137 |
| 2025-08-30 06:52:55 - pico-train - INFO - Learning Rate: 2.82e-05 |
| 2025-08-30 06:52:55 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:53:08 - pico-train - INFO - Step 50225 -- Training Metrics |
| 2025-08-30 06:53:08 - pico-train - INFO - Loss: 5.9405 |
| 2025-08-30 06:53:08 - pico-train - INFO - Learning Rate: 2.82e-05 |
| 2025-08-30 06:53:08 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:53:21 - pico-train - INFO - Step 50250 -- Training Metrics |
| 2025-08-30 06:53:21 - pico-train - INFO - Loss: 5.9045 |
| 2025-08-30 06:53:21 - pico-train - INFO - Learning Rate: 2.82e-05 |
| 2025-08-30 06:53:21 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:53:33 - pico-train - INFO - Step 50275 -- Training Metrics |
| 2025-08-30 06:53:33 - pico-train - INFO - Loss: 6.0237 |
| 2025-08-30 06:53:33 - pico-train - INFO - Learning Rate: 2.82e-05 |
| 2025-08-30 06:53:33 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:53:46 - pico-train - INFO - Step 50300 -- Training Metrics |
| 2025-08-30 06:53:46 - pico-train - INFO - Loss: 5.9869 |
| 2025-08-30 06:53:46 - pico-train - INFO - Learning Rate: 2.82e-05 |
| 2025-08-30 06:53:46 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:53:59 - pico-train - INFO - Step 50325 -- Training Metrics |
| 2025-08-30 06:53:59 - pico-train - INFO - Loss: 5.9344 |
| 2025-08-30 06:53:59 - pico-train - INFO - Learning Rate: 2.81e-05 |
| 2025-08-30 06:53:59 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:54:11 - pico-train - INFO - Step 50350 -- Training Metrics |
| 2025-08-30 06:54:11 - pico-train - INFO - Loss: 6.0131 |
| 2025-08-30 06:54:11 - pico-train - INFO - Learning Rate: 2.81e-05 |
| 2025-08-30 06:54:11 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:54:24 - pico-train - INFO - Step 50375 -- Training Metrics |
| 2025-08-30 06:54:24 - pico-train - INFO - Loss: 5.9916 |
| 2025-08-30 06:54:24 - pico-train - INFO - Learning Rate: 2.81e-05 |
| 2025-08-30 06:54:24 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:54:36 - pico-train - INFO - Step 50400 -- Training Metrics |
| 2025-08-30 06:54:36 - pico-train - INFO - Loss: 6.0289 |
| 2025-08-30 06:54:36 - pico-train - INFO - Learning Rate: 2.81e-05 |
| 2025-08-30 06:54:36 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:54:49 - pico-train - INFO - Step 50425 -- Training Metrics |
| 2025-08-30 06:54:49 - pico-train - INFO - Loss: 6.0051 |
| 2025-08-30 06:54:49 - pico-train - INFO - Learning Rate: 2.80e-05 |
| 2025-08-30 06:54:49 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:55:01 - pico-train - INFO - Step 50450 -- Training Metrics |
| 2025-08-30 06:55:01 - pico-train - INFO - Loss: 5.9803 |
| 2025-08-30 06:55:01 - pico-train - INFO - Learning Rate: 2.80e-05 |
| 2025-08-30 06:55:01 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:55:14 - pico-train - INFO - Step 50475 -- Training Metrics |
| 2025-08-30 06:55:14 - pico-train - INFO - Loss: 5.9222 |
| 2025-08-30 06:55:14 - pico-train - INFO - Learning Rate: 2.80e-05 |
| 2025-08-30 06:55:14 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:55:26 - pico-train - INFO - Step 50500 -- Saving Checkpoint |
| 2025-08-30 06:57:22 - pico-train - INFO - Step 50500 -- Evaluation Results |
| 2025-08-30 06:57:22 - pico-train - INFO - paloma: 2.307126408238767e+29 |
| 2025-08-30 06:57:25 - pico-train - INFO - Step 50500 -- Training Metrics |
| 2025-08-30 06:57:25 - pico-train - INFO - Loss: 5.9666 |
| 2025-08-30 06:57:25 - pico-train - INFO - Learning Rate: 2.80e-05 |
| 2025-08-30 06:57:25 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:57:25 - pico-train - INFO - Step 50500 -- Saving Learning Dynamics |
| 2025-08-30 06:57:41 - pico-train - INFO - Step 50525 -- Training Metrics |
| 2025-08-30 06:57:41 - pico-train - INFO - Loss: 5.9108 |
| 2025-08-30 06:57:41 - pico-train - INFO - Learning Rate: 2.80e-05 |
| 2025-08-30 06:57:41 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:57:54 - pico-train - INFO - Step 50550 -- Training Metrics |
| 2025-08-30 06:57:54 - pico-train - INFO - Loss: 5.9066 |
| 2025-08-30 06:57:54 - pico-train - INFO - Learning Rate: 2.79e-05 |
| 2025-08-30 06:57:54 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:58:06 - pico-train - INFO - Step 50575 -- Training Metrics |
| 2025-08-30 06:58:06 - pico-train - INFO - Loss: 5.9447 |
| 2025-08-30 06:58:06 - pico-train - INFO - Learning Rate: 2.79e-05 |
| 2025-08-30 06:58:06 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:58:19 - pico-train - INFO - Step 50600 -- Training Metrics |
| 2025-08-30 06:58:19 - pico-train - INFO - Loss: 5.9920 |
| 2025-08-30 06:58:19 - pico-train - INFO - Learning Rate: 2.79e-05 |
| 2025-08-30 06:58:19 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:58:32 - pico-train - INFO - Step 50625 -- Training Metrics |
| 2025-08-30 06:58:32 - pico-train - INFO - Loss: 5.8969 |
| 2025-08-30 06:58:32 - pico-train - INFO - Learning Rate: 2.79e-05 |
| 2025-08-30 06:58:32 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:58:44 - pico-train - INFO - Step 50650 -- Training Metrics |
| 2025-08-30 06:58:44 - pico-train - INFO - Loss: 5.9352 |
| 2025-08-30 06:58:44 - pico-train - INFO - Learning Rate: 2.79e-05 |
| 2025-08-30 06:58:44 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:58:57 - pico-train - INFO - Step 50675 -- Training Metrics |
| 2025-08-30 06:58:57 - pico-train - INFO - Loss: 5.9511 |
| 2025-08-30 06:58:57 - pico-train - INFO - Learning Rate: 2.78e-05 |
| 2025-08-30 06:58:57 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:59:10 - pico-train - INFO - Step 50700 -- Training Metrics |
| 2025-08-30 06:59:10 - pico-train - INFO - Loss: 5.9762 |
| 2025-08-30 06:59:10 - pico-train - INFO - Learning Rate: 2.78e-05 |
| 2025-08-30 06:59:10 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:59:22 - pico-train - INFO - Step 50725 -- Training Metrics |
| 2025-08-30 06:59:22 - pico-train - INFO - Loss: 5.8962 |
| 2025-08-30 06:59:22 - pico-train - INFO - Learning Rate: 2.78e-05 |
| 2025-08-30 06:59:22 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:59:35 - pico-train - INFO - Step 50750 -- Training Metrics |
| 2025-08-30 06:59:35 - pico-train - INFO - Loss: 5.9610 |
| 2025-08-30 06:59:35 - pico-train - INFO - Learning Rate: 2.78e-05 |
| 2025-08-30 06:59:35 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 06:59:47 - pico-train - INFO - Step 50775 -- Training Metrics |
| 2025-08-30 06:59:47 - pico-train - INFO - Loss: 5.9507 |
| 2025-08-30 06:59:47 - pico-train - INFO - Learning Rate: 2.77e-05 |
| 2025-08-30 06:59:47 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:00:00 - pico-train - INFO - Step 50800 -- Training Metrics |
| 2025-08-30 07:00:00 - pico-train - INFO - Loss: 6.0094 |
| 2025-08-30 07:00:00 - pico-train - INFO - Learning Rate: 2.77e-05 |
| 2025-08-30 07:00:00 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:00:13 - pico-train - INFO - Step 50825 -- Training Metrics |
| 2025-08-30 07:00:13 - pico-train - INFO - Loss: 5.9076 |
| 2025-08-30 07:00:13 - pico-train - INFO - Learning Rate: 2.77e-05 |
| 2025-08-30 07:00:13 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:00:25 - pico-train - INFO - Step 50850 -- Training Metrics |
| 2025-08-30 07:00:25 - pico-train - INFO - Loss: 5.9857 |
| 2025-08-30 07:00:25 - pico-train - INFO - Learning Rate: 2.77e-05 |
| 2025-08-30 07:00:25 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:00:38 - pico-train - INFO - Step 50875 -- Training Metrics |
| 2025-08-30 07:00:38 - pico-train - INFO - Loss: 6.0201 |
| 2025-08-30 07:00:38 - pico-train - INFO - Learning Rate: 2.77e-05 |
| 2025-08-30 07:00:38 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:00:51 - pico-train - INFO - Step 50900 -- Training Metrics |
| 2025-08-30 07:00:51 - pico-train - INFO - Loss: 6.0121 |
| 2025-08-30 07:00:51 - pico-train - INFO - Learning Rate: 2.76e-05 |
| 2025-08-30 07:00:51 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:01:03 - pico-train - INFO - Step 50925 -- Training Metrics |
| 2025-08-30 07:01:03 - pico-train - INFO - Loss: 5.9658 |
| 2025-08-30 07:01:03 - pico-train - INFO - Learning Rate: 2.76e-05 |
| 2025-08-30 07:01:03 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:01:16 - pico-train - INFO - Step 50950 -- Training Metrics |
| 2025-08-30 07:01:16 - pico-train - INFO - Loss: 5.9981 |
| 2025-08-30 07:01:16 - pico-train - INFO - Learning Rate: 2.76e-05 |
| 2025-08-30 07:01:16 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:01:28 - pico-train - INFO - Step 50975 -- Training Metrics |
| 2025-08-30 07:01:28 - pico-train - INFO - Loss: 5.9961 |
| 2025-08-30 07:01:28 - pico-train - INFO - Learning Rate: 2.76e-05 |
| 2025-08-30 07:01:28 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:01:41 - pico-train - INFO - Step 51000 -- Saving Checkpoint |
| 2025-08-30 07:03:38 - pico-train - INFO - Step 51000 -- Evaluation Results |
| 2025-08-30 07:03:38 - pico-train - INFO - paloma: 2.410761895962811e+29 |
| 2025-08-30 07:03:42 - pico-train - INFO - Step 51000 -- Training Metrics |
| 2025-08-30 07:03:42 - pico-train - INFO - Loss: 5.9076 |
| 2025-08-30 07:03:42 - pico-train - INFO - Learning Rate: 2.76e-05 |
| 2025-08-30 07:03:42 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:03:42 - pico-train - INFO - Step 51000 -- Saving Learning Dynamics |
| 2025-08-30 07:03:57 - pico-train - INFO - Step 51025 -- Training Metrics |
| 2025-08-30 07:03:57 - pico-train - INFO - Loss: 5.9965 |
| 2025-08-30 07:03:57 - pico-train - INFO - Learning Rate: 2.75e-05 |
| 2025-08-30 07:03:57 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:04:09 - pico-train - INFO - Step 51050 -- Training Metrics |
| 2025-08-30 07:04:09 - pico-train - INFO - Loss: 5.9633 |
| 2025-08-30 07:04:09 - pico-train - INFO - Learning Rate: 2.75e-05 |
| 2025-08-30 07:04:09 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:04:22 - pico-train - INFO - Step 51075 -- Training Metrics |
| 2025-08-30 07:04:22 - pico-train - INFO - Loss: 5.9476 |
| 2025-08-30 07:04:22 - pico-train - INFO - Learning Rate: 2.75e-05 |
| 2025-08-30 07:04:22 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:04:34 - pico-train - INFO - Step 51100 -- Training Metrics |
| 2025-08-30 07:04:34 - pico-train - INFO - Loss: 5.9797 |
| 2025-08-30 07:04:34 - pico-train - INFO - Learning Rate: 2.75e-05 |
| 2025-08-30 07:04:34 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:04:47 - pico-train - INFO - Step 51125 -- Training Metrics |
| 2025-08-30 07:04:47 - pico-train - INFO - Loss: 5.9138 |
| 2025-08-30 07:04:47 - pico-train - INFO - Learning Rate: 2.75e-05 |
| 2025-08-30 07:04:47 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:04:59 - pico-train - INFO - Step 51150 -- Training Metrics |
| 2025-08-30 07:04:59 - pico-train - INFO - Loss: 5.9946 |
| 2025-08-30 07:04:59 - pico-train - INFO - Learning Rate: 2.74e-05 |
| 2025-08-30 07:04:59 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:05:12 - pico-train - INFO - Step 51175 -- Training Metrics |
| 2025-08-30 07:05:12 - pico-train - INFO - Loss: 5.9050 |
| 2025-08-30 07:05:12 - pico-train - INFO - Learning Rate: 2.74e-05 |
| 2025-08-30 07:05:12 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:05:25 - pico-train - INFO - Step 51200 -- Training Metrics |
| 2025-08-30 07:05:25 - pico-train - INFO - Loss: 5.9431 |
| 2025-08-30 07:05:25 - pico-train - INFO - Learning Rate: 2.74e-05 |
| 2025-08-30 07:05:25 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:05:37 - pico-train - INFO - Step 51225 -- Training Metrics |
| 2025-08-30 07:05:37 - pico-train - INFO - Loss: 5.9906 |
| 2025-08-30 07:05:37 - pico-train - INFO - Learning Rate: 2.74e-05 |
| 2025-08-30 07:05:37 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:05:50 - pico-train - INFO - Step 51250 -- Training Metrics |
| 2025-08-30 07:05:50 - pico-train - INFO - Loss: 5.9408 |
| 2025-08-30 07:05:50 - pico-train - INFO - Learning Rate: 2.73e-05 |
| 2025-08-30 07:05:50 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:06:03 - pico-train - INFO - Step 51275 -- Training Metrics |
| 2025-08-30 07:06:03 - pico-train - INFO - Loss: 6.0058 |
| 2025-08-30 07:06:03 - pico-train - INFO - Learning Rate: 2.73e-05 |
| 2025-08-30 07:06:03 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:06:15 - pico-train - INFO - Step 51300 -- Training Metrics |
| 2025-08-30 07:06:15 - pico-train - INFO - Loss: 5.9526 |
| 2025-08-30 07:06:15 - pico-train - INFO - Learning Rate: 2.73e-05 |
| 2025-08-30 07:06:15 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:06:28 - pico-train - INFO - Step 51325 -- Training Metrics |
| 2025-08-30 07:06:28 - pico-train - INFO - Loss: 5.9452 |
| 2025-08-30 07:06:28 - pico-train - INFO - Learning Rate: 2.73e-05 |
| 2025-08-30 07:06:28 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:06:40 - pico-train - INFO - Step 51350 -- Training Metrics |
| 2025-08-30 07:06:40 - pico-train - INFO - Loss: 6.0049 |
| 2025-08-30 07:06:40 - pico-train - INFO - Learning Rate: 2.73e-05 |
| 2025-08-30 07:06:40 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:06:53 - pico-train - INFO - Step 51375 -- Training Metrics |
| 2025-08-30 07:06:53 - pico-train - INFO - Loss: 5.9591 |
| 2025-08-30 07:06:53 - pico-train - INFO - Learning Rate: 2.72e-05 |
| 2025-08-30 07:06:53 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:07:06 - pico-train - INFO - Step 51400 -- Training Metrics |
| 2025-08-30 07:07:06 - pico-train - INFO - Loss: 5.9947 |
| 2025-08-30 07:07:06 - pico-train - INFO - Learning Rate: 2.72e-05 |
| 2025-08-30 07:07:06 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:07:18 - pico-train - INFO - Step 51425 -- Training Metrics |
| 2025-08-30 07:07:18 - pico-train - INFO - Loss: 5.9487 |
| 2025-08-30 07:07:18 - pico-train - INFO - Learning Rate: 2.72e-05 |
| 2025-08-30 07:07:18 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:07:31 - pico-train - INFO - Step 51450 -- Training Metrics |
| 2025-08-30 07:07:31 - pico-train - INFO - Loss: 5.9444 |
| 2025-08-30 07:07:31 - pico-train - INFO - Learning Rate: 2.72e-05 |
| 2025-08-30 07:07:31 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:07:44 - pico-train - INFO - Step 51475 -- Training Metrics |
| 2025-08-30 07:07:44 - pico-train - INFO - Loss: 5.9784 |
| 2025-08-30 07:07:44 - pico-train - INFO - Learning Rate: 2.72e-05 |
| 2025-08-30 07:07:44 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:07:56 - pico-train - INFO - Step 51500 -- Saving Checkpoint |
| 2025-08-30 07:09:51 - pico-train - INFO - Step 51500 -- Evaluation Results |
| 2025-08-30 07:09:51 - pico-train - INFO - paloma: 3.779255466147184e+29 |
| 2025-08-30 07:09:53 - pico-train - INFO - Step 51500 -- Training Metrics |
| 2025-08-30 07:09:53 - pico-train - INFO - Loss: 5.9132 |
| 2025-08-30 07:09:53 - pico-train - INFO - Learning Rate: 2.71e-05 |
| 2025-08-30 07:09:53 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:09:53 - pico-train - INFO - Step 51500 -- Saving Learning Dynamics |
| 2025-08-30 07:10:09 - pico-train - INFO - Step 51525 -- Training Metrics |
| 2025-08-30 07:10:09 - pico-train - INFO - Loss: 5.9536 |
| 2025-08-30 07:10:09 - pico-train - INFO - Learning Rate: 2.71e-05 |
| 2025-08-30 07:10:09 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:10:21 - pico-train - INFO - Step 51550 -- Training Metrics |
| 2025-08-30 07:10:21 - pico-train - INFO - Loss: 5.9491 |
| 2025-08-30 07:10:21 - pico-train - INFO - Learning Rate: 2.71e-05 |
| 2025-08-30 07:10:21 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:10:34 - pico-train - INFO - Step 51575 -- Training Metrics |
| 2025-08-30 07:10:34 - pico-train - INFO - Loss: 5.9868 |
| 2025-08-30 07:10:34 - pico-train - INFO - Learning Rate: 2.71e-05 |
| 2025-08-30 07:10:34 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:10:47 - pico-train - INFO - Step 51600 -- Training Metrics |
| 2025-08-30 07:10:47 - pico-train - INFO - Loss: 5.9299 |
| 2025-08-30 07:10:47 - pico-train - INFO - Learning Rate: 2.70e-05 |
| 2025-08-30 07:10:47 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:10:59 - pico-train - INFO - Step 51625 -- Training Metrics |
| 2025-08-30 07:10:59 - pico-train - INFO - Loss: 5.9520 |
| 2025-08-30 07:10:59 - pico-train - INFO - Learning Rate: 2.70e-05 |
| 2025-08-30 07:10:59 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:11:12 - pico-train - INFO - Step 51650 -- Training Metrics |
| 2025-08-30 07:11:12 - pico-train - INFO - Loss: 5.8812 |
| 2025-08-30 07:11:12 - pico-train - INFO - Learning Rate: 2.70e-05 |
| 2025-08-30 07:11:12 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:11:24 - pico-train - INFO - Step 51675 -- Training Metrics |
| 2025-08-30 07:11:24 - pico-train - INFO - Loss: 5.9874 |
| 2025-08-30 07:11:24 - pico-train - INFO - Learning Rate: 2.70e-05 |
| 2025-08-30 07:11:24 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:11:37 - pico-train - INFO - Step 51700 -- Training Metrics |
| 2025-08-30 07:11:37 - pico-train - INFO - Loss: 5.8259 |
| 2025-08-30 07:11:37 - pico-train - INFO - Learning Rate: 2.70e-05 |
| 2025-08-30 07:11:37 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:11:50 - pico-train - INFO - Step 51725 -- Training Metrics |
| 2025-08-30 07:11:50 - pico-train - INFO - Loss: 5.8867 |
| 2025-08-30 07:11:50 - pico-train - INFO - Learning Rate: 2.69e-05 |
| 2025-08-30 07:11:50 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:12:03 - pico-train - INFO - Step 51750 -- Training Metrics |
| 2025-08-30 07:12:03 - pico-train - INFO - Loss: 5.9863 |
| 2025-08-30 07:12:03 - pico-train - INFO - Learning Rate: 2.69e-05 |
| 2025-08-30 07:12:03 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:12:16 - pico-train - INFO - Step 51775 -- Training Metrics |
| 2025-08-30 07:12:16 - pico-train - INFO - Loss: 6.0154 |
| 2025-08-30 07:12:16 - pico-train - INFO - Learning Rate: 2.69e-05 |
| 2025-08-30 07:12:16 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:12:28 - pico-train - INFO - Step 51800 -- Training Metrics |
| 2025-08-30 07:12:28 - pico-train - INFO - Loss: 5.9222 |
| 2025-08-30 07:12:28 - pico-train - INFO - Learning Rate: 2.69e-05 |
| 2025-08-30 07:12:28 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:12:41 - pico-train - INFO - Step 51825 -- Training Metrics |
| 2025-08-30 07:12:41 - pico-train - INFO - Loss: 5.9468 |
| 2025-08-30 07:12:41 - pico-train - INFO - Learning Rate: 2.69e-05 |
| 2025-08-30 07:12:41 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:12:53 - pico-train - INFO - Step 51850 -- Training Metrics |
| 2025-08-30 07:12:53 - pico-train - INFO - Loss: 5.9967 |
| 2025-08-30 07:12:53 - pico-train - INFO - Learning Rate: 2.68e-05 |
| 2025-08-30 07:12:53 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:13:06 - pico-train - INFO - Step 51875 -- Training Metrics |
| 2025-08-30 07:13:06 - pico-train - INFO - Loss: 5.9565 |
| 2025-08-30 07:13:06 - pico-train - INFO - Learning Rate: 2.68e-05 |
| 2025-08-30 07:13:06 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:13:19 - pico-train - INFO - Step 51900 -- Training Metrics |
| 2025-08-30 07:13:19 - pico-train - INFO - Loss: 5.9186 |
| 2025-08-30 07:13:19 - pico-train - INFO - Learning Rate: 2.68e-05 |
| 2025-08-30 07:13:19 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:13:31 - pico-train - INFO - Step 51925 -- Training Metrics |
| 2025-08-30 07:13:31 - pico-train - INFO - Loss: 5.7959 |
| 2025-08-30 07:13:31 - pico-train - INFO - Learning Rate: 2.68e-05 |
| 2025-08-30 07:13:31 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:13:44 - pico-train - INFO - Step 51950 -- Training Metrics |
| 2025-08-30 07:13:44 - pico-train - INFO - Loss: 5.9955 |
| 2025-08-30 07:13:44 - pico-train - INFO - Learning Rate: 2.67e-05 |
| 2025-08-30 07:13:44 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:13:56 - pico-train - INFO - Step 51975 -- Training Metrics |
| 2025-08-30 07:13:56 - pico-train - INFO - Loss: 5.9673 |
| 2025-08-30 07:13:56 - pico-train - INFO - Learning Rate: 2.67e-05 |
| 2025-08-30 07:13:56 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:14:09 - pico-train - INFO - Step 52000 -- Saving Checkpoint |
| 2025-08-30 07:16:04 - pico-train - INFO - Step 52000 -- Evaluation Results |
| 2025-08-30 07:16:04 - pico-train - INFO - paloma: 4.093974189838008e+29 |
| 2025-08-30 07:16:07 - pico-train - INFO - Step 52000 -- Training Metrics |
| 2025-08-30 07:16:07 - pico-train - INFO - Loss: 6.0273 |
| 2025-08-30 07:16:07 - pico-train - INFO - Learning Rate: 2.67e-05 |
| 2025-08-30 07:16:07 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:16:07 - pico-train - INFO - Step 52000 -- Saving Learning Dynamics |
| 2025-08-30 07:16:23 - pico-train - INFO - Step 52025 -- Training Metrics |
| 2025-08-30 07:16:23 - pico-train - INFO - Loss: 5.8760 |
| 2025-08-30 07:16:23 - pico-train - INFO - Learning Rate: 2.67e-05 |
| 2025-08-30 07:16:23 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:16:36 - pico-train - INFO - Step 52050 -- Training Metrics |
| 2025-08-30 07:16:36 - pico-train - INFO - Loss: 5.9087 |
| 2025-08-30 07:16:36 - pico-train - INFO - Learning Rate: 2.67e-05 |
| 2025-08-30 07:16:36 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:16:48 - pico-train - INFO - Step 52075 -- Training Metrics |
| 2025-08-30 07:16:48 - pico-train - INFO - Loss: 5.8656 |
| 2025-08-30 07:16:48 - pico-train - INFO - Learning Rate: 2.66e-05 |
| 2025-08-30 07:16:48 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:17:01 - pico-train - INFO - Step 52100 -- Training Metrics |
| 2025-08-30 07:17:01 - pico-train - INFO - Loss: 5.9358 |
| 2025-08-30 07:17:01 - pico-train - INFO - Learning Rate: 2.66e-05 |
| 2025-08-30 07:17:01 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:17:13 - pico-train - INFO - Step 52125 -- Training Metrics |
| 2025-08-30 07:17:13 - pico-train - INFO - Loss: 6.0056 |
| 2025-08-30 07:17:13 - pico-train - INFO - Learning Rate: 2.66e-05 |
| 2025-08-30 07:17:13 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:17:26 - pico-train - INFO - Step 52150 -- Training Metrics |
| 2025-08-30 07:17:26 - pico-train - INFO - Loss: 5.9770 |
| 2025-08-30 07:17:26 - pico-train - INFO - Learning Rate: 2.66e-05 |
| 2025-08-30 07:17:26 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:17:38 - pico-train - INFO - Step 52175 -- Training Metrics |
| 2025-08-30 07:17:38 - pico-train - INFO - Loss: 5.9145 |
| 2025-08-30 07:17:38 - pico-train - INFO - Learning Rate: 2.66e-05 |
| 2025-08-30 07:17:38 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:17:51 - pico-train - INFO - Step 52200 -- Training Metrics |
| 2025-08-30 07:17:51 - pico-train - INFO - Loss: 5.9592 |
| 2025-08-30 07:17:51 - pico-train - INFO - Learning Rate: 2.65e-05 |
| 2025-08-30 07:17:51 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:18:04 - pico-train - INFO - Step 52225 -- Training Metrics |
| 2025-08-30 07:18:04 - pico-train - INFO - Loss: 5.9323 |
| 2025-08-30 07:18:04 - pico-train - INFO - Learning Rate: 2.65e-05 |
| 2025-08-30 07:18:04 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:18:17 - pico-train - INFO - Step 52250 -- Training Metrics |
| 2025-08-30 07:18:17 - pico-train - INFO - Loss: 5.9309 |
| 2025-08-30 07:18:17 - pico-train - INFO - Learning Rate: 2.65e-05 |
| 2025-08-30 07:18:17 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:18:29 - pico-train - INFO - Step 52275 -- Training Metrics |
| 2025-08-30 07:18:29 - pico-train - INFO - Loss: 6.0290 |
| 2025-08-30 07:18:29 - pico-train - INFO - Learning Rate: 2.65e-05 |
| 2025-08-30 07:18:29 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:18:42 - pico-train - INFO - Step 52300 -- Training Metrics |
| 2025-08-30 07:18:42 - pico-train - INFO - Loss: 6.0121 |
| 2025-08-30 07:18:42 - pico-train - INFO - Learning Rate: 2.65e-05 |
| 2025-08-30 07:18:42 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:18:55 - pico-train - INFO - Step 52325 -- Training Metrics |
| 2025-08-30 07:18:55 - pico-train - INFO - Loss: 5.8936 |
| 2025-08-30 07:18:55 - pico-train - INFO - Learning Rate: 2.64e-05 |
| 2025-08-30 07:18:55 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:19:07 - pico-train - INFO - Step 52350 -- Training Metrics |
| 2025-08-30 07:19:07 - pico-train - INFO - Loss: 5.9461 |
| 2025-08-30 07:19:07 - pico-train - INFO - Learning Rate: 2.64e-05 |
| 2025-08-30 07:19:07 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:19:20 - pico-train - INFO - Step 52375 -- Training Metrics |
| 2025-08-30 07:19:20 - pico-train - INFO - Loss: 6.0288 |
| 2025-08-30 07:19:20 - pico-train - INFO - Learning Rate: 2.64e-05 |
| 2025-08-30 07:19:20 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:19:32 - pico-train - INFO - Step 52400 -- Training Metrics |
| 2025-08-30 07:19:32 - pico-train - INFO - Loss: 5.9728 |
| 2025-08-30 07:19:32 - pico-train - INFO - Learning Rate: 2.64e-05 |
| 2025-08-30 07:19:32 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:19:45 - pico-train - INFO - Step 52425 -- Training Metrics |
| 2025-08-30 07:19:45 - pico-train - INFO - Loss: 5.8944 |
| 2025-08-30 07:19:45 - pico-train - INFO - Learning Rate: 2.63e-05 |
| 2025-08-30 07:19:45 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:19:58 - pico-train - INFO - Step 52450 -- Training Metrics |
| 2025-08-30 07:19:58 - pico-train - INFO - Loss: 6.0148 |
| 2025-08-30 07:19:58 - pico-train - INFO - Learning Rate: 2.63e-05 |
| 2025-08-30 07:19:58 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:20:10 - pico-train - INFO - Step 52475 -- Training Metrics |
| 2025-08-30 07:20:10 - pico-train - INFO - Loss: 5.9232 |
| 2025-08-30 07:20:10 - pico-train - INFO - Learning Rate: 2.63e-05 |
| 2025-08-30 07:20:10 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:20:22 - pico-train - INFO - Step 52500 -- Saving Checkpoint |
| 2025-08-30 07:22:31 - pico-train - INFO - Step 52500 -- Evaluation Results |
| 2025-08-30 07:22:31 - pico-train - INFO - paloma: 4.3829741549987764e+29 |
| 2025-08-30 07:22:34 - pico-train - INFO - Step 52500 -- Training Metrics |
| 2025-08-30 07:22:34 - pico-train - INFO - Loss: 6.0213 |
| 2025-08-30 07:22:34 - pico-train - INFO - Learning Rate: 2.63e-05 |
| 2025-08-30 07:22:34 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:22:34 - pico-train - INFO - Step 52500 -- Saving Learning Dynamics |
| 2025-08-30 07:22:49 - pico-train - INFO - Step 52525 -- Training Metrics |
| 2025-08-30 07:22:49 - pico-train - INFO - Loss: 5.9703 |
| 2025-08-30 07:22:49 - pico-train - INFO - Learning Rate: 2.63e-05 |
| 2025-08-30 07:22:49 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:23:01 - pico-train - INFO - Step 52550 -- Training Metrics |
| 2025-08-30 07:23:01 - pico-train - INFO - Loss: 5.9471 |
| 2025-08-30 07:23:01 - pico-train - INFO - Learning Rate: 2.62e-05 |
| 2025-08-30 07:23:01 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:23:14 - pico-train - INFO - Step 52575 -- Training Metrics |
| 2025-08-30 07:23:14 - pico-train - INFO - Loss: 5.9502 |
| 2025-08-30 07:23:14 - pico-train - INFO - Learning Rate: 2.62e-05 |
| 2025-08-30 07:23:14 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:23:26 - pico-train - INFO - Step 52600 -- Training Metrics |
| 2025-08-30 07:23:26 - pico-train - INFO - Loss: 5.9032 |
| 2025-08-30 07:23:26 - pico-train - INFO - Learning Rate: 2.62e-05 |
| 2025-08-30 07:23:26 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:23:39 - pico-train - INFO - Step 52625 -- Training Metrics |
| 2025-08-30 07:23:39 - pico-train - INFO - Loss: 5.8965 |
| 2025-08-30 07:23:39 - pico-train - INFO - Learning Rate: 2.62e-05 |
| 2025-08-30 07:23:39 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:23:51 - pico-train - INFO - Step 52650 -- Training Metrics |
| 2025-08-30 07:23:51 - pico-train - INFO - Loss: 5.9213 |
| 2025-08-30 07:23:51 - pico-train - INFO - Learning Rate: 2.62e-05 |
| 2025-08-30 07:23:51 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:24:04 - pico-train - INFO - Step 52675 -- Training Metrics |
| 2025-08-30 07:24:04 - pico-train - INFO - Loss: 6.0115 |
| 2025-08-30 07:24:04 - pico-train - INFO - Learning Rate: 2.61e-05 |
| 2025-08-30 07:24:04 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:24:17 - pico-train - INFO - Step 52700 -- Training Metrics |
| 2025-08-30 07:24:17 - pico-train - INFO - Loss: 5.9340 |
| 2025-08-30 07:24:17 - pico-train - INFO - Learning Rate: 2.61e-05 |
| 2025-08-30 07:24:17 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:24:29 - pico-train - INFO - Step 52725 -- Training Metrics |
| 2025-08-30 07:24:29 - pico-train - INFO - Loss: 5.8320 |
| 2025-08-30 07:24:29 - pico-train - INFO - Learning Rate: 2.61e-05 |
| 2025-08-30 07:24:29 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:24:42 - pico-train - INFO - Step 52750 -- Training Metrics |
| 2025-08-30 07:24:42 - pico-train - INFO - Loss: 5.9125 |
| 2025-08-30 07:24:42 - pico-train - INFO - Learning Rate: 2.61e-05 |
| 2025-08-30 07:24:42 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:24:55 - pico-train - INFO - Step 52775 -- Training Metrics |
| 2025-08-30 07:24:55 - pico-train - INFO - Loss: 5.8468 |
| 2025-08-30 07:24:55 - pico-train - INFO - Learning Rate: 2.60e-05 |
| 2025-08-30 07:24:55 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:25:07 - pico-train - INFO - Step 52800 -- Training Metrics |
| 2025-08-30 07:25:07 - pico-train - INFO - Loss: 5.9822 |
| 2025-08-30 07:25:07 - pico-train - INFO - Learning Rate: 2.60e-05 |
| 2025-08-30 07:25:07 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:25:20 - pico-train - INFO - Step 52825 -- Training Metrics |
| 2025-08-30 07:25:20 - pico-train - INFO - Loss: 6.0151 |
| 2025-08-30 07:25:20 - pico-train - INFO - Learning Rate: 2.60e-05 |
| 2025-08-30 07:25:20 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:25:33 - pico-train - INFO - Step 52850 -- Training Metrics |
| 2025-08-30 07:25:33 - pico-train - INFO - Loss: 5.9894 |
| 2025-08-30 07:25:33 - pico-train - INFO - Learning Rate: 2.60e-05 |
| 2025-08-30 07:25:33 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:25:45 - pico-train - INFO - Step 52875 -- Training Metrics |
| 2025-08-30 07:25:45 - pico-train - INFO - Loss: 5.8899 |
| 2025-08-30 07:25:45 - pico-train - INFO - Learning Rate: 2.60e-05 |
| 2025-08-30 07:25:45 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:25:58 - pico-train - INFO - Step 52900 -- Training Metrics |
| 2025-08-30 07:25:58 - pico-train - INFO - Loss: 5.8752 |
| 2025-08-30 07:25:58 - pico-train - INFO - Learning Rate: 2.59e-05 |
| 2025-08-30 07:25:58 - pico-train - INFO - Inf/NaN count: 0 |
| 2025-08-30 07:26:11 - pico-train - INFO - Step 52925 -- Training Metrics |
| 2025-08-30 07:26:11 - pico-train - INFO - Loss: 5.9977 |
| 2025-08-30 07:26:11 - pico-train - INFO - โโโ Learning Rate: 2.59e-05 |
| 2025-08-30 07:26:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:26:23 - pico-train - INFO - Step 52950 -- ๐ Training Metrics |
| 2025-08-30 07:26:23 - pico-train - INFO - โโโ Loss: 5.9522 |
| 2025-08-30 07:26:23 - pico-train - INFO - โโโ Learning Rate: 2.59e-05 |
| 2025-08-30 07:26:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:26:36 - pico-train - INFO - Step 52975 -- ๐ Training Metrics |
| 2025-08-30 07:26:36 - pico-train - INFO - โโโ Loss: 5.9471 |
| 2025-08-30 07:26:36 - pico-train - INFO - โโโ Learning Rate: 2.59e-05 |
| 2025-08-30 07:26:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:26:48 - pico-train - INFO - Step 53000 -- ๐พ Saving Checkpoint |
| 2025-08-30 07:28:43 - pico-train - INFO - Step 53000 -- ๐ Evaluation Results |
| 2025-08-30 07:28:43 - pico-train - INFO - โโโ paloma: 6.0678060947205406e+29 |
| 2025-08-30 07:28:45 - pico-train - INFO - Step 53000 -- ๐ Training Metrics |
| 2025-08-30 07:28:45 - pico-train - INFO - โโโ Loss: 5.9668 |
| 2025-08-30 07:28:45 - pico-train - INFO - โโโ Learning Rate: 2.59e-05 |
| 2025-08-30 07:28:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:28:45 - pico-train - INFO - Step 53000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 07:29:00 - pico-train - INFO - Step 53025 -- ๐ Training Metrics |
| 2025-08-30 07:29:00 - pico-train - INFO - โโโ Loss: 5.9943 |
| 2025-08-30 07:29:00 - pico-train - INFO - โโโ Learning Rate: 2.58e-05 |
| 2025-08-30 07:29:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:29:13 - pico-train - INFO - Step 53050 -- ๐ Training Metrics |
| 2025-08-30 07:29:13 - pico-train - INFO - โโโ Loss: 5.9316 |
| 2025-08-30 07:29:13 - pico-train - INFO - โโโ Learning Rate: 2.58e-05 |
| 2025-08-30 07:29:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:29:25 - pico-train - INFO - Step 53075 -- ๐ Training Metrics |
| 2025-08-30 07:29:25 - pico-train - INFO - โโโ Loss: 5.9268 |
| 2025-08-30 07:29:25 - pico-train - INFO - โโโ Learning Rate: 2.58e-05 |
| 2025-08-30 07:29:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:29:38 - pico-train - INFO - Step 53100 -- ๐ Training Metrics |
| 2025-08-30 07:29:38 - pico-train - INFO - โโโ Loss: 5.9084 |
| 2025-08-30 07:29:38 - pico-train - INFO - โโโ Learning Rate: 2.58e-05 |
| 2025-08-30 07:29:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:29:51 - pico-train - INFO - Step 53125 -- ๐ Training Metrics |
| 2025-08-30 07:29:51 - pico-train - INFO - โโโ Loss: 6.0159 |
| 2025-08-30 07:29:51 - pico-train - INFO - โโโ Learning Rate: 2.57e-05 |
| 2025-08-30 07:29:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:30:03 - pico-train - INFO - Step 53150 -- ๐ Training Metrics |
| 2025-08-30 07:30:03 - pico-train - INFO - โโโ Loss: 5.8973 |
| 2025-08-30 07:30:03 - pico-train - INFO - โโโ Learning Rate: 2.57e-05 |
| 2025-08-30 07:30:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:30:16 - pico-train - INFO - Step 53175 -- ๐ Training Metrics |
| 2025-08-30 07:30:16 - pico-train - INFO - โโโ Loss: 5.9660 |
| 2025-08-30 07:30:16 - pico-train - INFO - โโโ Learning Rate: 2.57e-05 |
| 2025-08-30 07:30:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:30:29 - pico-train - INFO - Step 53200 -- ๐ Training Metrics |
| 2025-08-30 07:30:29 - pico-train - INFO - โโโ Loss: 5.9716 |
| 2025-08-30 07:30:29 - pico-train - INFO - โโโ Learning Rate: 2.57e-05 |
| 2025-08-30 07:30:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:30:41 - pico-train - INFO - Step 53225 -- ๐ Training Metrics |
| 2025-08-30 07:30:41 - pico-train - INFO - โโโ Loss: 5.8883 |
| 2025-08-30 07:30:41 - pico-train - INFO - โโโ Learning Rate: 2.57e-05 |
| 2025-08-30 07:30:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:30:54 - pico-train - INFO - Step 53250 -- ๐ Training Metrics |
| 2025-08-30 07:30:54 - pico-train - INFO - โโโ Loss: 5.9727 |
| 2025-08-30 07:30:54 - pico-train - INFO - โโโ Learning Rate: 2.56e-05 |
| 2025-08-30 07:30:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:31:07 - pico-train - INFO - Step 53275 -- ๐ Training Metrics |
| 2025-08-30 07:31:07 - pico-train - INFO - โโโ Loss: 5.8948 |
| 2025-08-30 07:31:07 - pico-train - INFO - โโโ Learning Rate: 2.56e-05 |
| 2025-08-30 07:31:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:31:19 - pico-train - INFO - Step 53300 -- ๐ Training Metrics |
| 2025-08-30 07:31:19 - pico-train - INFO - โโโ Loss: 5.8979 |
| 2025-08-30 07:31:19 - pico-train - INFO - โโโ Learning Rate: 2.56e-05 |
| 2025-08-30 07:31:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:31:32 - pico-train - INFO - Step 53325 -- ๐ Training Metrics |
| 2025-08-30 07:31:32 - pico-train - INFO - โโโ Loss: 5.9572 |
| 2025-08-30 07:31:32 - pico-train - INFO - โโโ Learning Rate: 2.56e-05 |
| 2025-08-30 07:31:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:31:44 - pico-train - INFO - Step 53350 -- ๐ Training Metrics |
| 2025-08-30 07:31:44 - pico-train - INFO - โโโ Loss: 5.8599 |
| 2025-08-30 07:31:44 - pico-train - INFO - โโโ Learning Rate: 2.56e-05 |
| 2025-08-30 07:31:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:31:57 - pico-train - INFO - Step 53375 -- ๐ Training Metrics |
| 2025-08-30 07:31:57 - pico-train - INFO - โโโ Loss: 5.8751 |
| 2025-08-30 07:31:57 - pico-train - INFO - โโโ Learning Rate: 2.55e-05 |
| 2025-08-30 07:31:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:32:09 - pico-train - INFO - Step 53400 -- ๐ Training Metrics |
| 2025-08-30 07:32:09 - pico-train - INFO - โโโ Loss: 5.9950 |
| 2025-08-30 07:32:09 - pico-train - INFO - โโโ Learning Rate: 2.55e-05 |
| 2025-08-30 07:32:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:32:22 - pico-train - INFO - Step 53425 -- ๐ Training Metrics |
| 2025-08-30 07:32:22 - pico-train - INFO - โโโ Loss: 5.9827 |
| 2025-08-30 07:32:22 - pico-train - INFO - โโโ Learning Rate: 2.55e-05 |
| 2025-08-30 07:32:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:32:34 - pico-train - INFO - Step 53450 -- ๐ Training Metrics |
| 2025-08-30 07:32:34 - pico-train - INFO - โโโ Loss: 5.8589 |
| 2025-08-30 07:32:34 - pico-train - INFO - โโโ Learning Rate: 2.55e-05 |
| 2025-08-30 07:32:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:32:47 - pico-train - INFO - Step 53475 -- ๐ Training Metrics |
| 2025-08-30 07:32:47 - pico-train - INFO - โโโ Loss: 5.9415 |
| 2025-08-30 07:32:47 - pico-train - INFO - โโโ Learning Rate: 2.54e-05 |
| 2025-08-30 07:32:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:32:59 - pico-train - INFO - Step 53500 -- ๐พ Saving Checkpoint |
| 2025-08-30 07:34:53 - pico-train - INFO - Step 53500 -- ๐ Evaluation Results |
| 2025-08-30 07:34:53 - pico-train - INFO - โโโ paloma: 5.560195307890516e+29 |
| 2025-08-30 07:34:55 - pico-train - INFO - Step 53500 -- ๐ Training Metrics |
| 2025-08-30 07:34:55 - pico-train - INFO - โโโ Loss: 5.8976 |
| 2025-08-30 07:34:55 - pico-train - INFO - โโโ Learning Rate: 2.54e-05 |
| 2025-08-30 07:34:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:34:55 - pico-train - INFO - Step 53500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 07:35:10 - pico-train - INFO - Step 53525 -- ๐ Training Metrics |
| 2025-08-30 07:35:10 - pico-train - INFO - โโโ Loss: 5.9070 |
| 2025-08-30 07:35:10 - pico-train - INFO - โโโ Learning Rate: 2.54e-05 |
| 2025-08-30 07:35:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:35:22 - pico-train - INFO - Step 53550 -- ๐ Training Metrics |
| 2025-08-30 07:35:22 - pico-train - INFO - โโโ Loss: 5.8362 |
| 2025-08-30 07:35:22 - pico-train - INFO - โโโ Learning Rate: 2.54e-05 |
| 2025-08-30 07:35:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:35:35 - pico-train - INFO - Step 53575 -- ๐ Training Metrics |
| 2025-08-30 07:35:35 - pico-train - INFO - โโโ Loss: 5.8874 |
| 2025-08-30 07:35:35 - pico-train - INFO - โโโ Learning Rate: 2.54e-05 |
| 2025-08-30 07:35:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:35:48 - pico-train - INFO - Step 53600 -- ๐ Training Metrics |
| 2025-08-30 07:35:48 - pico-train - INFO - โโโ Loss: 5.8866 |
| 2025-08-30 07:35:48 - pico-train - INFO - โโโ Learning Rate: 2.53e-05 |
| 2025-08-30 07:35:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:36:00 - pico-train - INFO - Step 53625 -- ๐ Training Metrics |
| 2025-08-30 07:36:00 - pico-train - INFO - โโโ Loss: 5.8824 |
| 2025-08-30 07:36:00 - pico-train - INFO - โโโ Learning Rate: 2.53e-05 |
| 2025-08-30 07:36:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:36:13 - pico-train - INFO - Step 53650 -- ๐ Training Metrics |
| 2025-08-30 07:36:13 - pico-train - INFO - โโโ Loss: 5.7949 |
| 2025-08-30 07:36:13 - pico-train - INFO - โโโ Learning Rate: 2.53e-05 |
| 2025-08-30 07:36:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:36:25 - pico-train - INFO - Step 53675 -- ๐ Training Metrics |
| 2025-08-30 07:36:25 - pico-train - INFO - โโโ Loss: 5.9849 |
| 2025-08-30 07:36:25 - pico-train - INFO - โโโ Learning Rate: 2.53e-05 |
| 2025-08-30 07:36:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:36:38 - pico-train - INFO - Step 53700 -- ๐ Training Metrics |
| 2025-08-30 07:36:38 - pico-train - INFO - โโโ Loss: 5.9197 |
| 2025-08-30 07:36:38 - pico-train - INFO - โโโ Learning Rate: 2.53e-05 |
| 2025-08-30 07:36:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:36:50 - pico-train - INFO - Step 53725 -- ๐ Training Metrics |
| 2025-08-30 07:36:50 - pico-train - INFO - โโโ Loss: 5.9326 |
| 2025-08-30 07:36:50 - pico-train - INFO - โโโ Learning Rate: 2.52e-05 |
| 2025-08-30 07:36:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:37:03 - pico-train - INFO - Step 53750 -- ๐ Training Metrics |
| 2025-08-30 07:37:03 - pico-train - INFO - โโโ Loss: 5.8980 |
| 2025-08-30 07:37:03 - pico-train - INFO - โโโ Learning Rate: 2.52e-05 |
| 2025-08-30 07:37:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:37:16 - pico-train - INFO - Step 53775 -- ๐ Training Metrics |
| 2025-08-30 07:37:16 - pico-train - INFO - โโโ Loss: 5.8599 |
| 2025-08-30 07:37:16 - pico-train - INFO - โโโ Learning Rate: 2.52e-05 |
| 2025-08-30 07:37:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:37:28 - pico-train - INFO - Step 53800 -- ๐ Training Metrics |
| 2025-08-30 07:37:28 - pico-train - INFO - โโโ Loss: 5.8844 |
| 2025-08-30 07:37:28 - pico-train - INFO - โโโ Learning Rate: 2.52e-05 |
| 2025-08-30 07:37:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:37:41 - pico-train - INFO - Step 53825 -- ๐ Training Metrics |
| 2025-08-30 07:37:41 - pico-train - INFO - โโโ Loss: 5.9178 |
| 2025-08-30 07:37:41 - pico-train - INFO - โโโ Learning Rate: 2.51e-05 |
| 2025-08-30 07:37:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:37:54 - pico-train - INFO - Step 53850 -- ๐ Training Metrics |
| 2025-08-30 07:37:54 - pico-train - INFO - โโโ Loss: 5.9118 |
| 2025-08-30 07:37:54 - pico-train - INFO - โโโ Learning Rate: 2.51e-05 |
| 2025-08-30 07:37:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:38:06 - pico-train - INFO - Step 53875 -- ๐ Training Metrics |
| 2025-08-30 07:38:06 - pico-train - INFO - โโโ Loss: 5.9270 |
| 2025-08-30 07:38:06 - pico-train - INFO - โโโ Learning Rate: 2.51e-05 |
| 2025-08-30 07:38:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:38:19 - pico-train - INFO - Step 53900 -- ๐ Training Metrics |
| 2025-08-30 07:38:19 - pico-train - INFO - โโโ Loss: 5.8265 |
| 2025-08-30 07:38:19 - pico-train - INFO - โโโ Learning Rate: 2.51e-05 |
| 2025-08-30 07:38:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:38:31 - pico-train - INFO - Step 53925 -- ๐ Training Metrics |
| 2025-08-30 07:38:31 - pico-train - INFO - โโโ Loss: 5.9337 |
| 2025-08-30 07:38:31 - pico-train - INFO - โโโ Learning Rate: 2.51e-05 |
| 2025-08-30 07:38:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:38:44 - pico-train - INFO - Step 53950 -- ๐ Training Metrics |
| 2025-08-30 07:38:44 - pico-train - INFO - โโโ Loss: 5.8446 |
| 2025-08-30 07:38:44 - pico-train - INFO - โโโ Learning Rate: 2.50e-05 |
| 2025-08-30 07:38:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:38:57 - pico-train - INFO - Step 53975 -- ๐ Training Metrics |
| 2025-08-30 07:38:57 - pico-train - INFO - โโโ Loss: 5.8990 |
| 2025-08-30 07:38:57 - pico-train - INFO - โโโ Learning Rate: 2.50e-05 |
| 2025-08-30 07:38:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:39:09 - pico-train - INFO - Step 54000 -- ๐พ Saving Checkpoint |
| 2025-08-30 07:41:23 - pico-train - INFO - Step 54000 -- ๐ Evaluation Results |
| 2025-08-30 07:41:23 - pico-train - INFO - โโโ paloma: 7.742991230238928e+29 |
| 2025-08-30 07:41:25 - pico-train - INFO - Step 54000 -- ๐ Training Metrics |
| 2025-08-30 07:41:25 - pico-train - INFO - โโโ Loss: 5.7472 |
| 2025-08-30 07:41:25 - pico-train - INFO - โโโ Learning Rate: 2.50e-05 |
| 2025-08-30 07:41:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:41:25 - pico-train - INFO - Step 54000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 07:41:40 - pico-train - INFO - Step 54025 -- ๐ Training Metrics |
| 2025-08-30 07:41:40 - pico-train - INFO - โโโ Loss: 5.9241 |
| 2025-08-30 07:41:40 - pico-train - INFO - โโโ Learning Rate: 2.50e-05 |
| 2025-08-30 07:41:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:41:53 - pico-train - INFO - Step 54050 -- ๐ Training Metrics |
| 2025-08-30 07:41:53 - pico-train - INFO - โโโ Loss: 5.9308 |
| 2025-08-30 07:41:53 - pico-train - INFO - โโโ Learning Rate: 2.50e-05 |
| 2025-08-30 07:41:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:42:05 - pico-train - INFO - Step 54075 -- ๐ Training Metrics |
| 2025-08-30 07:42:05 - pico-train - INFO - โโโ Loss: 5.9971 |
| 2025-08-30 07:42:05 - pico-train - INFO - โโโ Learning Rate: 2.49e-05 |
| 2025-08-30 07:42:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:42:18 - pico-train - INFO - Step 54100 -- ๐ Training Metrics |
| 2025-08-30 07:42:18 - pico-train - INFO - โโโ Loss: 5.9050 |
| 2025-08-30 07:42:18 - pico-train - INFO - โโโ Learning Rate: 2.49e-05 |
| 2025-08-30 07:42:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:42:31 - pico-train - INFO - Step 54125 -- ๐ Training Metrics |
| 2025-08-30 07:42:31 - pico-train - INFO - โโโ Loss: 6.0286 |
| 2025-08-30 07:42:31 - pico-train - INFO - โโโ Learning Rate: 2.49e-05 |
| 2025-08-30 07:42:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:42:43 - pico-train - INFO - Step 54150 -- ๐ Training Metrics |
| 2025-08-30 07:42:43 - pico-train - INFO - โโโ Loss: 5.9402 |
| 2025-08-30 07:42:43 - pico-train - INFO - โโโ Learning Rate: 2.49e-05 |
| 2025-08-30 07:42:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:42:56 - pico-train - INFO - Step 54175 -- ๐ Training Metrics |
| 2025-08-30 07:42:56 - pico-train - INFO - โโโ Loss: 5.8842 |
| 2025-08-30 07:42:56 - pico-train - INFO - โโโ Learning Rate: 2.49e-05 |
| 2025-08-30 07:42:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:43:08 - pico-train - INFO - Step 54200 -- ๐ Training Metrics |
| 2025-08-30 07:43:08 - pico-train - INFO - โโโ Loss: 5.9398 |
| 2025-08-30 07:43:08 - pico-train - INFO - โโโ Learning Rate: 2.48e-05 |
| 2025-08-30 07:43:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:43:21 - pico-train - INFO - Step 54225 -- ๐ Training Metrics |
| 2025-08-30 07:43:21 - pico-train - INFO - โโโ Loss: 5.8637 |
| 2025-08-30 07:43:21 - pico-train - INFO - โโโ Learning Rate: 2.48e-05 |
| 2025-08-30 07:43:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:43:34 - pico-train - INFO - Step 54250 -- ๐ Training Metrics |
| 2025-08-30 07:43:34 - pico-train - INFO - โโโ Loss: 5.8928 |
| 2025-08-30 07:43:34 - pico-train - INFO - โโโ Learning Rate: 2.48e-05 |
| 2025-08-30 07:43:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:43:47 - pico-train - INFO - Step 54275 -- ๐ Training Metrics |
| 2025-08-30 07:43:47 - pico-train - INFO - โโโ Loss: 5.8749 |
| 2025-08-30 07:43:47 - pico-train - INFO - โโโ Learning Rate: 2.48e-05 |
| 2025-08-30 07:43:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:43:59 - pico-train - INFO - Step 54300 -- ๐ Training Metrics |
| 2025-08-30 07:43:59 - pico-train - INFO - โโโ Loss: 5.8800 |
| 2025-08-30 07:43:59 - pico-train - INFO - โโโ Learning Rate: 2.47e-05 |
| 2025-08-30 07:43:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:44:12 - pico-train - INFO - Step 54325 -- ๐ Training Metrics |
| 2025-08-30 07:44:12 - pico-train - INFO - โโโ Loss: 5.9748 |
| 2025-08-30 07:44:12 - pico-train - INFO - โโโ Learning Rate: 2.47e-05 |
| 2025-08-30 07:44:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:44:25 - pico-train - INFO - Step 54350 -- ๐ Training Metrics |
| 2025-08-30 07:44:25 - pico-train - INFO - โโโ Loss: 5.8758 |
| 2025-08-30 07:44:25 - pico-train - INFO - โโโ Learning Rate: 2.47e-05 |
| 2025-08-30 07:44:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:44:37 - pico-train - INFO - Step 54375 -- ๐ Training Metrics |
| 2025-08-30 07:44:37 - pico-train - INFO - โโโ Loss: 6.0149 |
| 2025-08-30 07:44:37 - pico-train - INFO - โโโ Learning Rate: 2.47e-05 |
| 2025-08-30 07:44:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:44:50 - pico-train - INFO - Step 54400 -- ๐ Training Metrics |
| 2025-08-30 07:44:50 - pico-train - INFO - โโโ Loss: 5.9165 |
| 2025-08-30 07:44:50 - pico-train - INFO - โโโ Learning Rate: 2.47e-05 |
| 2025-08-30 07:44:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:45:02 - pico-train - INFO - Step 54425 -- ๐ Training Metrics |
| 2025-08-30 07:45:02 - pico-train - INFO - โโโ Loss: 5.8508 |
| 2025-08-30 07:45:02 - pico-train - INFO - โโโ Learning Rate: 2.46e-05 |
| 2025-08-30 07:45:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:45:15 - pico-train - INFO - Step 54450 -- ๐ Training Metrics |
| 2025-08-30 07:45:15 - pico-train - INFO - โโโ Loss: 5.9284 |
| 2025-08-30 07:45:15 - pico-train - INFO - โโโ Learning Rate: 2.46e-05 |
| 2025-08-30 07:45:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:45:27 - pico-train - INFO - Step 54475 -- ๐ Training Metrics |
| 2025-08-30 07:45:27 - pico-train - INFO - โโโ Loss: 5.9071 |
| 2025-08-30 07:45:27 - pico-train - INFO - โโโ Learning Rate: 2.46e-05 |
| 2025-08-30 07:45:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:45:39 - pico-train - INFO - Step 54500 -- ๐พ Saving Checkpoint |
| 2025-08-30 07:47:45 - pico-train - INFO - Step 54500 -- ๐ Evaluation Results |
| 2025-08-30 07:47:45 - pico-train - INFO - โโโ paloma: 9.839335327293338e+29 |
| 2025-08-30 07:47:47 - pico-train - INFO - Step 54500 -- ๐ Training Metrics |
| 2025-08-30 07:47:47 - pico-train - INFO - โโโ Loss: 5.8753 |
| 2025-08-30 07:47:47 - pico-train - INFO - โโโ Learning Rate: 2.46e-05 |
| 2025-08-30 07:47:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:47:47 - pico-train - INFO - Step 54500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 07:48:03 - pico-train - INFO - Step 54525 -- ๐ Training Metrics |
| 2025-08-30 07:48:03 - pico-train - INFO - โโโ Loss: 5.9132 |
| 2025-08-30 07:48:03 - pico-train - INFO - โโโ Learning Rate: 2.46e-05 |
| 2025-08-30 07:48:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:48:16 - pico-train - INFO - Step 54550 -- ๐ Training Metrics |
| 2025-08-30 07:48:16 - pico-train - INFO - โโโ Loss: 5.9826 |
| 2025-08-30 07:48:16 - pico-train - INFO - โโโ Learning Rate: 2.45e-05 |
| 2025-08-30 07:48:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:48:28 - pico-train - INFO - Step 54575 -- ๐ Training Metrics |
| 2025-08-30 07:48:28 - pico-train - INFO - โโโ Loss: 5.8963 |
| 2025-08-30 07:48:28 - pico-train - INFO - โโโ Learning Rate: 2.45e-05 |
| 2025-08-30 07:48:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:48:41 - pico-train - INFO - Step 54600 -- ๐ Training Metrics |
| 2025-08-30 07:48:41 - pico-train - INFO - โโโ Loss: 5.9433 |
| 2025-08-30 07:48:41 - pico-train - INFO - โโโ Learning Rate: 2.45e-05 |
| 2025-08-30 07:48:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:48:53 - pico-train - INFO - Step 54625 -- ๐ Training Metrics |
| 2025-08-30 07:48:53 - pico-train - INFO - โโโ Loss: 5.9281 |
| 2025-08-30 07:48:53 - pico-train - INFO - โโโ Learning Rate: 2.45e-05 |
| 2025-08-30 07:48:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:49:06 - pico-train - INFO - Step 54650 -- ๐ Training Metrics |
| 2025-08-30 07:49:06 - pico-train - INFO - โโโ Loss: 5.8462 |
| 2025-08-30 07:49:06 - pico-train - INFO - โโโ Learning Rate: 2.44e-05 |
| 2025-08-30 07:49:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:49:19 - pico-train - INFO - Step 54675 -- ๐ Training Metrics |
| 2025-08-30 07:49:19 - pico-train - INFO - โโโ Loss: 5.9508 |
| 2025-08-30 07:49:19 - pico-train - INFO - โโโ Learning Rate: 2.44e-05 |
| 2025-08-30 07:49:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:49:31 - pico-train - INFO - Step 54700 -- ๐ Training Metrics |
| 2025-08-30 07:49:31 - pico-train - INFO - โโโ Loss: 5.8880 |
| 2025-08-30 07:49:31 - pico-train - INFO - โโโ Learning Rate: 2.44e-05 |
| 2025-08-30 07:49:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:49:44 - pico-train - INFO - Step 54725 -- ๐ Training Metrics |
| 2025-08-30 07:49:44 - pico-train - INFO - โโโ Loss: 5.8829 |
| 2025-08-30 07:49:44 - pico-train - INFO - โโโ Learning Rate: 2.44e-05 |
| 2025-08-30 07:49:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:49:57 - pico-train - INFO - Step 54750 -- ๐ Training Metrics |
| 2025-08-30 07:49:57 - pico-train - INFO - โโโ Loss: 5.9466 |
| 2025-08-30 07:49:57 - pico-train - INFO - โโโ Learning Rate: 2.44e-05 |
| 2025-08-30 07:49:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:50:10 - pico-train - INFO - Step 54775 -- ๐ Training Metrics |
| 2025-08-30 07:50:10 - pico-train - INFO - โโโ Loss: 5.9607 |
| 2025-08-30 07:50:10 - pico-train - INFO - โโโ Learning Rate: 2.43e-05 |
| 2025-08-30 07:50:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:50:22 - pico-train - INFO - Step 54800 -- ๐ Training Metrics |
| 2025-08-30 07:50:22 - pico-train - INFO - โโโ Loss: 5.9967 |
| 2025-08-30 07:50:22 - pico-train - INFO - โโโ Learning Rate: 2.43e-05 |
| 2025-08-30 07:50:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:50:35 - pico-train - INFO - Step 54825 -- ๐ Training Metrics |
| 2025-08-30 07:50:35 - pico-train - INFO - โโโ Loss: 5.8599 |
| 2025-08-30 07:50:35 - pico-train - INFO - โโโ Learning Rate: 2.43e-05 |
| 2025-08-30 07:50:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:50:48 - pico-train - INFO - Step 54850 -- ๐ Training Metrics |
| 2025-08-30 07:50:48 - pico-train - INFO - โโโ Loss: 5.9756 |
| 2025-08-30 07:50:48 - pico-train - INFO - โโโ Learning Rate: 2.43e-05 |
| 2025-08-30 07:50:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:51:00 - pico-train - INFO - Step 54875 -- ๐ Training Metrics |
| 2025-08-30 07:51:00 - pico-train - INFO - โโโ Loss: 5.8856 |
| 2025-08-30 07:51:00 - pico-train - INFO - โโโ Learning Rate: 2.43e-05 |
| 2025-08-30 07:51:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:51:13 - pico-train - INFO - Step 54900 -- ๐ Training Metrics |
| 2025-08-30 07:51:13 - pico-train - INFO - โโโ Loss: 5.9306 |
| 2025-08-30 07:51:13 - pico-train - INFO - โโโ Learning Rate: 2.42e-05 |
| 2025-08-30 07:51:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:51:26 - pico-train - INFO - Step 54925 -- ๐ Training Metrics |
| 2025-08-30 07:51:26 - pico-train - INFO - โโโ Loss: 6.0266 |
| 2025-08-30 07:51:26 - pico-train - INFO - โโโ Learning Rate: 2.42e-05 |
| 2025-08-30 07:51:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:51:38 - pico-train - INFO - Step 54950 -- ๐ Training Metrics |
| 2025-08-30 07:51:38 - pico-train - INFO - โโโ Loss: 5.9054 |
| 2025-08-30 07:51:38 - pico-train - INFO - โโโ Learning Rate: 2.42e-05 |
| 2025-08-30 07:51:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:51:51 - pico-train - INFO - Step 54975 -- ๐ Training Metrics |
| 2025-08-30 07:51:51 - pico-train - INFO - โโโ Loss: 5.8885 |
| 2025-08-30 07:51:51 - pico-train - INFO - โโโ Learning Rate: 2.42e-05 |
| 2025-08-30 07:51:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:52:03 - pico-train - INFO - Step 55000 -- ๐พ Saving Checkpoint |
| 2025-08-30 07:54:06 - pico-train - INFO - Step 55000 -- ๐ Evaluation Results |
| 2025-08-30 07:54:06 - pico-train - INFO - โโโ paloma: 1.0447307155558866e+30 |
| 2025-08-30 07:54:07 - pico-train - INFO - Step 55000 -- ๐ Training Metrics |
| 2025-08-30 07:54:07 - pico-train - INFO - โโโ Loss: 6.0147 |
| 2025-08-30 07:54:07 - pico-train - INFO - โโโ Learning Rate: 2.41e-05 |
| 2025-08-30 07:54:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:54:07 - pico-train - INFO - Step 55000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 07:54:22 - pico-train - INFO - Step 55025 -- ๐ Training Metrics |
| 2025-08-30 07:54:22 - pico-train - INFO - โโโ Loss: 5.9628 |
| 2025-08-30 07:54:22 - pico-train - INFO - โโโ Learning Rate: 2.41e-05 |
| 2025-08-30 07:54:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:54:34 - pico-train - INFO - Step 55050 -- ๐ Training Metrics |
| 2025-08-30 07:54:34 - pico-train - INFO - โโโ Loss: 5.9456 |
| 2025-08-30 07:54:34 - pico-train - INFO - โโโ Learning Rate: 2.41e-05 |
| 2025-08-30 07:54:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:54:47 - pico-train - INFO - Step 55075 -- ๐ Training Metrics |
| 2025-08-30 07:54:47 - pico-train - INFO - โโโ Loss: 5.9061 |
| 2025-08-30 07:54:47 - pico-train - INFO - โโโ Learning Rate: 2.41e-05 |
| 2025-08-30 07:54:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:55:00 - pico-train - INFO - Step 55100 -- ๐ Training Metrics |
| 2025-08-30 07:55:00 - pico-train - INFO - โโโ Loss: 5.9604 |
| 2025-08-30 07:55:00 - pico-train - INFO - โโโ Learning Rate: 2.41e-05 |
| 2025-08-30 07:55:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:55:12 - pico-train - INFO - Step 55125 -- ๐ Training Metrics |
| 2025-08-30 07:55:12 - pico-train - INFO - โโโ Loss: 5.8649 |
| 2025-08-30 07:55:12 - pico-train - INFO - โโโ Learning Rate: 2.40e-05 |
| 2025-08-30 07:55:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:55:25 - pico-train - INFO - Step 55150 -- ๐ Training Metrics |
| 2025-08-30 07:55:25 - pico-train - INFO - โโโ Loss: 5.8123 |
| 2025-08-30 07:55:25 - pico-train - INFO - โโโ Learning Rate: 2.40e-05 |
| 2025-08-30 07:55:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:55:37 - pico-train - INFO - Step 55175 -- ๐ Training Metrics |
| 2025-08-30 07:55:37 - pico-train - INFO - โโโ Loss: 5.9016 |
| 2025-08-30 07:55:37 - pico-train - INFO - โโโ Learning Rate: 2.40e-05 |
| 2025-08-30 07:55:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:55:50 - pico-train - INFO - Step 55200 -- ๐ Training Metrics |
| 2025-08-30 07:55:50 - pico-train - INFO - โโโ Loss: 5.9233 |
| 2025-08-30 07:55:50 - pico-train - INFO - โโโ Learning Rate: 2.40e-05 |
| 2025-08-30 07:55:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:56:03 - pico-train - INFO - Step 55225 -- ๐ Training Metrics |
| 2025-08-30 07:56:03 - pico-train - INFO - โโโ Loss: 5.8768 |
| 2025-08-30 07:56:03 - pico-train - INFO - โโโ Learning Rate: 2.40e-05 |
| 2025-08-30 07:56:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:56:15 - pico-train - INFO - Step 55250 -- ๐ Training Metrics |
| 2025-08-30 07:56:15 - pico-train - INFO - โโโ Loss: 5.9265 |
| 2025-08-30 07:56:15 - pico-train - INFO - โโโ Learning Rate: 2.39e-05 |
| 2025-08-30 07:56:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:56:28 - pico-train - INFO - Step 55275 -- ๐ Training Metrics |
| 2025-08-30 07:56:28 - pico-train - INFO - โโโ Loss: 5.8929 |
| 2025-08-30 07:56:28 - pico-train - INFO - โโโ Learning Rate: 2.39e-05 |
| 2025-08-30 07:56:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:56:41 - pico-train - INFO - Step 55300 -- ๐ Training Metrics |
| 2025-08-30 07:56:41 - pico-train - INFO - โโโ Loss: 5.8863 |
| 2025-08-30 07:56:41 - pico-train - INFO - โโโ Learning Rate: 2.39e-05 |
| 2025-08-30 07:56:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:56:53 - pico-train - INFO - Step 55325 -- ๐ Training Metrics |
| 2025-08-30 07:56:53 - pico-train - INFO - โโโ Loss: 5.8588 |
| 2025-08-30 07:56:53 - pico-train - INFO - โโโ Learning Rate: 2.39e-05 |
| 2025-08-30 07:56:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:57:06 - pico-train - INFO - Step 55350 -- ๐ Training Metrics |
| 2025-08-30 07:57:06 - pico-train - INFO - โโโ Loss: 5.8740 |
| 2025-08-30 07:57:06 - pico-train - INFO - โโโ Learning Rate: 2.38e-05 |
| 2025-08-30 07:57:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:57:19 - pico-train - INFO - Step 55375 -- ๐ Training Metrics |
| 2025-08-30 07:57:19 - pico-train - INFO - โโโ Loss: 5.9163 |
| 2025-08-30 07:57:19 - pico-train - INFO - โโโ Learning Rate: 2.38e-05 |
| 2025-08-30 07:57:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:57:32 - pico-train - INFO - Step 55400 -- ๐ Training Metrics |
| 2025-08-30 07:57:32 - pico-train - INFO - โโโ Loss: 5.8563 |
| 2025-08-30 07:57:32 - pico-train - INFO - โโโ Learning Rate: 2.38e-05 |
| 2025-08-30 07:57:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:57:44 - pico-train - INFO - Step 55425 -- ๐ Training Metrics |
| 2025-08-30 07:57:44 - pico-train - INFO - โโโ Loss: 5.8964 |
| 2025-08-30 07:57:44 - pico-train - INFO - โโโ Learning Rate: 2.38e-05 |
| 2025-08-30 07:57:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:57:57 - pico-train - INFO - Step 55450 -- ๐ Training Metrics |
| 2025-08-30 07:57:57 - pico-train - INFO - โโโ Loss: 5.9860 |
| 2025-08-30 07:57:57 - pico-train - INFO - โโโ Learning Rate: 2.38e-05 |
| 2025-08-30 07:57:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:58:09 - pico-train - INFO - Step 55475 -- ๐ Training Metrics |
| 2025-08-30 07:58:09 - pico-train - INFO - โโโ Loss: 5.9124 |
| 2025-08-30 07:58:09 - pico-train - INFO - โโโ Learning Rate: 2.37e-05 |
| 2025-08-30 07:58:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 07:58:22 - pico-train - INFO - Step 55500 -- 💾 Saving Checkpoint |
| 2025-08-30 08:00:23 - pico-train - INFO - Step 55500 -- ๐ Evaluation Results |
| 2025-08-30 08:00:23 - pico-train - INFO - โโโ paloma: 1.3871906758809066e+30 |
| 2025-08-30 08:00:33 - pico-train - INFO - Step 55500 -- ๐ Training Metrics |
| 2025-08-30 08:00:33 - pico-train - INFO - โโโ Loss: 5.9344 |
| 2025-08-30 08:00:33 - pico-train - INFO - โโโ Learning Rate: 2.37e-05 |
| 2025-08-30 08:00:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:00:33 - pico-train - INFO - Step 55500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 08:00:48 - pico-train - INFO - Step 55525 -- ๐ Training Metrics |
| 2025-08-30 08:00:48 - pico-train - INFO - โโโ Loss: 5.8185 |
| 2025-08-30 08:00:48 - pico-train - INFO - โโโ Learning Rate: 2.37e-05 |
| 2025-08-30 08:00:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:01:00 - pico-train - INFO - Step 55550 -- ๐ Training Metrics |
| 2025-08-30 08:01:00 - pico-train - INFO - โโโ Loss: 5.8907 |
| 2025-08-30 08:01:00 - pico-train - INFO - โโโ Learning Rate: 2.37e-05 |
| 2025-08-30 08:01:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:01:13 - pico-train - INFO - Step 55575 -- ๐ Training Metrics |
| 2025-08-30 08:01:13 - pico-train - INFO - โโโ Loss: 5.8578 |
| 2025-08-30 08:01:13 - pico-train - INFO - โโโ Learning Rate: 2.37e-05 |
| 2025-08-30 08:01:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:01:25 - pico-train - INFO - Step 55600 -- ๐ Training Metrics |
| 2025-08-30 08:01:25 - pico-train - INFO - โโโ Loss: 5.8699 |
| 2025-08-30 08:01:25 - pico-train - INFO - โโโ Learning Rate: 2.36e-05 |
| 2025-08-30 08:01:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:01:38 - pico-train - INFO - Step 55625 -- ๐ Training Metrics |
| 2025-08-30 08:01:38 - pico-train - INFO - โโโ Loss: 5.9257 |
| 2025-08-30 08:01:38 - pico-train - INFO - โโโ Learning Rate: 2.36e-05 |
| 2025-08-30 08:01:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:01:50 - pico-train - INFO - Step 55650 -- ๐ Training Metrics |
| 2025-08-30 08:01:50 - pico-train - INFO - โโโ Loss: 5.9346 |
| 2025-08-30 08:01:50 - pico-train - INFO - โโโ Learning Rate: 2.36e-05 |
| 2025-08-30 08:01:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:02:03 - pico-train - INFO - Step 55675 -- ๐ Training Metrics |
| 2025-08-30 08:02:03 - pico-train - INFO - โโโ Loss: 5.9879 |
| 2025-08-30 08:02:03 - pico-train - INFO - โโโ Learning Rate: 2.36e-05 |
| 2025-08-30 08:02:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:02:15 - pico-train - INFO - Step 55700 -- ๐ Training Metrics |
| 2025-08-30 08:02:15 - pico-train - INFO - โโโ Loss: 5.9003 |
| 2025-08-30 08:02:15 - pico-train - INFO - โโโ Learning Rate: 2.35e-05 |
| 2025-08-30 08:02:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:02:28 - pico-train - INFO - Step 55725 -- ๐ Training Metrics |
| 2025-08-30 08:02:28 - pico-train - INFO - โโโ Loss: 5.9490 |
| 2025-08-30 08:02:28 - pico-train - INFO - โโโ Learning Rate: 2.35e-05 |
| 2025-08-30 08:02:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:02:40 - pico-train - INFO - Step 55750 -- ๐ Training Metrics |
| 2025-08-30 08:02:40 - pico-train - INFO - โโโ Loss: 5.8409 |
| 2025-08-30 08:02:40 - pico-train - INFO - โโโ Learning Rate: 2.35e-05 |
| 2025-08-30 08:02:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:02:53 - pico-train - INFO - Step 55775 -- ๐ Training Metrics |
| 2025-08-30 08:02:53 - pico-train - INFO - โโโ Loss: 5.9248 |
| 2025-08-30 08:02:53 - pico-train - INFO - โโโ Learning Rate: 2.35e-05 |
| 2025-08-30 08:02:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:03:06 - pico-train - INFO - Step 55800 -- ๐ Training Metrics |
| 2025-08-30 08:03:06 - pico-train - INFO - โโโ Loss: 5.8427 |
| 2025-08-30 08:03:06 - pico-train - INFO - โโโ Learning Rate: 2.35e-05 |
| 2025-08-30 08:03:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:03:18 - pico-train - INFO - Step 55825 -- ๐ Training Metrics |
| 2025-08-30 08:03:18 - pico-train - INFO - โโโ Loss: 5.9812 |
| 2025-08-30 08:03:18 - pico-train - INFO - โโโ Learning Rate: 2.34e-05 |
| 2025-08-30 08:03:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:03:31 - pico-train - INFO - Step 55850 -- ๐ Training Metrics |
| 2025-08-30 08:03:31 - pico-train - INFO - โโโ Loss: 5.8846 |
| 2025-08-30 08:03:31 - pico-train - INFO - โโโ Learning Rate: 2.34e-05 |
| 2025-08-30 08:03:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:03:44 - pico-train - INFO - Step 55875 -- ๐ Training Metrics |
| 2025-08-30 08:03:44 - pico-train - INFO - โโโ Loss: 5.8634 |
| 2025-08-30 08:03:44 - pico-train - INFO - โโโ Learning Rate: 2.34e-05 |
| 2025-08-30 08:03:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:03:56 - pico-train - INFO - Step 55900 -- ๐ Training Metrics |
| 2025-08-30 08:03:56 - pico-train - INFO - โโโ Loss: 5.8900 |
| 2025-08-30 08:03:56 - pico-train - INFO - โโโ Learning Rate: 2.34e-05 |
| 2025-08-30 08:03:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:04:09 - pico-train - INFO - Step 55925 -- ๐ Training Metrics |
| 2025-08-30 08:04:09 - pico-train - INFO - โโโ Loss: 5.8378 |
| 2025-08-30 08:04:09 - pico-train - INFO - โโโ Learning Rate: 2.34e-05 |
| 2025-08-30 08:04:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:04:21 - pico-train - INFO - Step 55950 -- ๐ Training Metrics |
| 2025-08-30 08:04:21 - pico-train - INFO - โโโ Loss: 5.8298 |
| 2025-08-30 08:04:21 - pico-train - INFO - โโโ Learning Rate: 2.33e-05 |
| 2025-08-30 08:04:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:04:34 - pico-train - INFO - Step 55975 -- ๐ Training Metrics |
| 2025-08-30 08:04:34 - pico-train - INFO - โโโ Loss: 6.0091 |
| 2025-08-30 08:04:34 - pico-train - INFO - โโโ Learning Rate: 2.33e-05 |
| 2025-08-30 08:04:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:04:46 - pico-train - INFO - Step 56000 -- 💾 Saving Checkpoint |
| 2025-08-30 08:06:44 - pico-train - INFO - Step 56000 -- ๐ Evaluation Results |
| 2025-08-30 08:06:44 - pico-train - INFO - โโโ paloma: 1.5920277240703402e+30 |
| 2025-08-30 08:06:45 - pico-train - INFO - Step 56000 -- ๐ Training Metrics |
| 2025-08-30 08:06:45 - pico-train - INFO - โโโ Loss: 5.9868 |
| 2025-08-30 08:06:45 - pico-train - INFO - โโโ Learning Rate: 2.33e-05 |
| 2025-08-30 08:06:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:06:45 - pico-train - INFO - Step 56000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 08:07:00 - pico-train - INFO - Step 56025 -- ๐ Training Metrics |
| 2025-08-30 08:07:00 - pico-train - INFO - โโโ Loss: 5.9389 |
| 2025-08-30 08:07:00 - pico-train - INFO - โโโ Learning Rate: 2.33e-05 |
| 2025-08-30 08:07:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:07:13 - pico-train - INFO - Step 56050 -- ๐ Training Metrics |
| 2025-08-30 08:07:13 - pico-train - INFO - โโโ Loss: 5.8835 |
| 2025-08-30 08:07:13 - pico-train - INFO - โโโ Learning Rate: 2.33e-05 |
| 2025-08-30 08:07:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:07:25 - pico-train - INFO - Step 56075 -- ๐ Training Metrics |
| 2025-08-30 08:07:25 - pico-train - INFO - โโโ Loss: 5.8286 |
| 2025-08-30 08:07:25 - pico-train - INFO - โโโ Learning Rate: 2.32e-05 |
| 2025-08-30 08:07:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:07:38 - pico-train - INFO - Step 56100 -- ๐ Training Metrics |
| 2025-08-30 08:07:38 - pico-train - INFO - โโโ Loss: 5.8313 |
| 2025-08-30 08:07:38 - pico-train - INFO - โโโ Learning Rate: 2.32e-05 |
| 2025-08-30 08:07:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:07:51 - pico-train - INFO - Step 56125 -- ๐ Training Metrics |
| 2025-08-30 08:07:51 - pico-train - INFO - โโโ Loss: 5.8921 |
| 2025-08-30 08:07:51 - pico-train - INFO - โโโ Learning Rate: 2.32e-05 |
| 2025-08-30 08:07:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:08:03 - pico-train - INFO - Step 56150 -- ๐ Training Metrics |
| 2025-08-30 08:08:03 - pico-train - INFO - โโโ Loss: 5.8274 |
| 2025-08-30 08:08:03 - pico-train - INFO - โโโ Learning Rate: 2.32e-05 |
| 2025-08-30 08:08:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:08:16 - pico-train - INFO - Step 56175 -- ๐ Training Metrics |
| 2025-08-30 08:08:16 - pico-train - INFO - โโโ Loss: 5.9244 |
| 2025-08-30 08:08:16 - pico-train - INFO - โโโ Learning Rate: 2.31e-05 |
| 2025-08-30 08:08:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:08:29 - pico-train - INFO - Step 56200 -- ๐ Training Metrics |
| 2025-08-30 08:08:29 - pico-train - INFO - โโโ Loss: 6.0019 |
| 2025-08-30 08:08:29 - pico-train - INFO - โโโ Learning Rate: 2.31e-05 |
| 2025-08-30 08:08:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:08:41 - pico-train - INFO - Step 56225 -- ๐ Training Metrics |
| 2025-08-30 08:08:41 - pico-train - INFO - โโโ Loss: 5.8920 |
| 2025-08-30 08:08:41 - pico-train - INFO - โโโ Learning Rate: 2.31e-05 |
| 2025-08-30 08:08:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:08:54 - pico-train - INFO - Step 56250 -- ๐ Training Metrics |
| 2025-08-30 08:08:54 - pico-train - INFO - โโโ Loss: 5.8811 |
| 2025-08-30 08:08:54 - pico-train - INFO - โโโ Learning Rate: 2.31e-05 |
| 2025-08-30 08:08:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:09:07 - pico-train - INFO - Step 56275 -- ๐ Training Metrics |
| 2025-08-30 08:09:07 - pico-train - INFO - โโโ Loss: 5.9166 |
| 2025-08-30 08:09:07 - pico-train - INFO - โโโ Learning Rate: 2.31e-05 |
| 2025-08-30 08:09:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:09:19 - pico-train - INFO - Step 56300 -- ๐ Training Metrics |
| 2025-08-30 08:09:19 - pico-train - INFO - โโโ Loss: 5.8974 |
| 2025-08-30 08:09:19 - pico-train - INFO - โโโ Learning Rate: 2.30e-05 |
| 2025-08-30 08:09:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:09:32 - pico-train - INFO - Step 56325 -- ๐ Training Metrics |
| 2025-08-30 08:09:32 - pico-train - INFO - โโโ Loss: 5.8989 |
| 2025-08-30 08:09:32 - pico-train - INFO - โโโ Learning Rate: 2.30e-05 |
| 2025-08-30 08:09:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:09:45 - pico-train - INFO - Step 56350 -- ๐ Training Metrics |
| 2025-08-30 08:09:45 - pico-train - INFO - โโโ Loss: 5.8976 |
| 2025-08-30 08:09:45 - pico-train - INFO - โโโ Learning Rate: 2.30e-05 |
| 2025-08-30 08:09:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:09:57 - pico-train - INFO - Step 56375 -- ๐ Training Metrics |
| 2025-08-30 08:09:57 - pico-train - INFO - โโโ Loss: 5.9189 |
| 2025-08-30 08:09:57 - pico-train - INFO - โโโ Learning Rate: 2.30e-05 |
| 2025-08-30 08:09:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:10:10 - pico-train - INFO - Step 56400 -- ๐ Training Metrics |
| 2025-08-30 08:10:10 - pico-train - INFO - โโโ Loss: 5.8489 |
| 2025-08-30 08:10:10 - pico-train - INFO - โโโ Learning Rate: 2.30e-05 |
| 2025-08-30 08:10:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:10:23 - pico-train - INFO - Step 56425 -- ๐ Training Metrics |
| 2025-08-30 08:10:23 - pico-train - INFO - โโโ Loss: 5.9099 |
| 2025-08-30 08:10:23 - pico-train - INFO - โโโ Learning Rate: 2.29e-05 |
| 2025-08-30 08:10:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:10:35 - pico-train - INFO - Step 56450 -- ๐ Training Metrics |
| 2025-08-30 08:10:35 - pico-train - INFO - โโโ Loss: 5.8612 |
| 2025-08-30 08:10:35 - pico-train - INFO - โโโ Learning Rate: 2.29e-05 |
| 2025-08-30 08:10:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:10:48 - pico-train - INFO - Step 56475 -- ๐ Training Metrics |
| 2025-08-30 08:10:48 - pico-train - INFO - โโโ Loss: 5.8795 |
| 2025-08-30 08:10:48 - pico-train - INFO - โโโ Learning Rate: 2.29e-05 |
| 2025-08-30 08:10:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:11:00 - pico-train - INFO - Step 56500 -- 💾 Saving Checkpoint |
| 2025-08-30 08:12:53 - pico-train - INFO - Step 56500 -- ๐ Evaluation Results |
| 2025-08-30 08:12:53 - pico-train - INFO - โโโ paloma: 1.7892090663438402e+30 |
| 2025-08-30 08:12:55 - pico-train - INFO - Step 56500 -- ๐ Training Metrics |
| 2025-08-30 08:12:55 - pico-train - INFO - โโโ Loss: 5.8945 |
| 2025-08-30 08:12:55 - pico-train - INFO - โโโ Learning Rate: 2.29e-05 |
| 2025-08-30 08:12:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:12:55 - pico-train - INFO - Step 56500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 08:13:10 - pico-train - INFO - Step 56525 -- ๐ Training Metrics |
| 2025-08-30 08:13:10 - pico-train - INFO - โโโ Loss: 5.8448 |
| 2025-08-30 08:13:10 - pico-train - INFO - โโโ Learning Rate: 2.28e-05 |
| 2025-08-30 08:13:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:13:23 - pico-train - INFO - Step 56550 -- ๐ Training Metrics |
| 2025-08-30 08:13:23 - pico-train - INFO - โโโ Loss: 5.8696 |
| 2025-08-30 08:13:23 - pico-train - INFO - โโโ Learning Rate: 2.28e-05 |
| 2025-08-30 08:13:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:13:35 - pico-train - INFO - Step 56575 -- ๐ Training Metrics |
| 2025-08-30 08:13:35 - pico-train - INFO - โโโ Loss: 5.8567 |
| 2025-08-30 08:13:35 - pico-train - INFO - โโโ Learning Rate: 2.28e-05 |
| 2025-08-30 08:13:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:13:48 - pico-train - INFO - Step 56600 -- ๐ Training Metrics |
| 2025-08-30 08:13:48 - pico-train - INFO - โโโ Loss: 5.8884 |
| 2025-08-30 08:13:48 - pico-train - INFO - โโโ Learning Rate: 2.28e-05 |
| 2025-08-30 08:13:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:14:01 - pico-train - INFO - Step 56625 -- ๐ Training Metrics |
| 2025-08-30 08:14:01 - pico-train - INFO - โโโ Loss: 5.9933 |
| 2025-08-30 08:14:01 - pico-train - INFO - โโโ Learning Rate: 2.28e-05 |
| 2025-08-30 08:14:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:14:13 - pico-train - INFO - Step 56650 -- ๐ Training Metrics |
| 2025-08-30 08:14:13 - pico-train - INFO - โโโ Loss: 5.8716 |
| 2025-08-30 08:14:13 - pico-train - INFO - โโโ Learning Rate: 2.27e-05 |
| 2025-08-30 08:14:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:14:26 - pico-train - INFO - Step 56675 -- ๐ Training Metrics |
| 2025-08-30 08:14:26 - pico-train - INFO - โโโ Loss: 5.9089 |
| 2025-08-30 08:14:26 - pico-train - INFO - โโโ Learning Rate: 2.27e-05 |
| 2025-08-30 08:14:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:14:39 - pico-train - INFO - Step 56700 -- ๐ Training Metrics |
| 2025-08-30 08:14:39 - pico-train - INFO - โโโ Loss: 5.8769 |
| 2025-08-30 08:14:39 - pico-train - INFO - โโโ Learning Rate: 2.27e-05 |
| 2025-08-30 08:14:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:14:51 - pico-train - INFO - Step 56725 -- ๐ Training Metrics |
| 2025-08-30 08:14:51 - pico-train - INFO - โโโ Loss: 5.8934 |
| 2025-08-30 08:14:51 - pico-train - INFO - โโโ Learning Rate: 2.27e-05 |
| 2025-08-30 08:14:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:15:04 - pico-train - INFO - Step 56750 -- ๐ Training Metrics |
| 2025-08-30 08:15:04 - pico-train - INFO - โโโ Loss: 5.8711 |
| 2025-08-30 08:15:04 - pico-train - INFO - โโโ Learning Rate: 2.27e-05 |
| 2025-08-30 08:15:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:15:16 - pico-train - INFO - Step 56775 -- ๐ Training Metrics |
| 2025-08-30 08:15:16 - pico-train - INFO - โโโ Loss: 5.8866 |
| 2025-08-30 08:15:16 - pico-train - INFO - โโโ Learning Rate: 2.26e-05 |
| 2025-08-30 08:15:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:15:29 - pico-train - INFO - Step 56800 -- ๐ Training Metrics |
| 2025-08-30 08:15:29 - pico-train - INFO - โโโ Loss: 5.9154 |
| 2025-08-30 08:15:29 - pico-train - INFO - โโโ Learning Rate: 2.26e-05 |
| 2025-08-30 08:15:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:15:42 - pico-train - INFO - Step 56825 -- ๐ Training Metrics |
| 2025-08-30 08:15:42 - pico-train - INFO - โโโ Loss: 5.8844 |
| 2025-08-30 08:15:42 - pico-train - INFO - โโโ Learning Rate: 2.26e-05 |
| 2025-08-30 08:15:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:15:54 - pico-train - INFO - Step 56850 -- ๐ Training Metrics |
| 2025-08-30 08:15:54 - pico-train - INFO - โโโ Loss: 5.9142 |
| 2025-08-30 08:15:54 - pico-train - INFO - โโโ Learning Rate: 2.26e-05 |
| 2025-08-30 08:15:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:16:07 - pico-train - INFO - Step 56875 -- ๐ Training Metrics |
| 2025-08-30 08:16:07 - pico-train - INFO - โโโ Loss: 5.8741 |
| 2025-08-30 08:16:07 - pico-train - INFO - โโโ Learning Rate: 2.25e-05 |
| 2025-08-30 08:16:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:16:20 - pico-train - INFO - Step 56900 -- ๐ Training Metrics |
| 2025-08-30 08:16:20 - pico-train - INFO - โโโ Loss: 5.9399 |
| 2025-08-30 08:16:20 - pico-train - INFO - โโโ Learning Rate: 2.25e-05 |
| 2025-08-30 08:16:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:16:32 - pico-train - INFO - Step 56925 -- ๐ Training Metrics |
| 2025-08-30 08:16:32 - pico-train - INFO - โโโ Loss: 5.8245 |
| 2025-08-30 08:16:32 - pico-train - INFO - โโโ Learning Rate: 2.25e-05 |
| 2025-08-30 08:16:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:16:45 - pico-train - INFO - Step 56950 -- ๐ Training Metrics |
| 2025-08-30 08:16:45 - pico-train - INFO - โโโ Loss: 5.9157 |
| 2025-08-30 08:16:45 - pico-train - INFO - โโโ Learning Rate: 2.25e-05 |
| 2025-08-30 08:16:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:16:58 - pico-train - INFO - Step 56975 -- ๐ Training Metrics |
| 2025-08-30 08:16:58 - pico-train - INFO - โโโ Loss: 5.8869 |
| 2025-08-30 08:16:58 - pico-train - INFO - โโโ Learning Rate: 2.25e-05 |
| 2025-08-30 08:16:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:17:10 - pico-train - INFO - Step 57000 -- 💾 Saving Checkpoint |
| 2025-08-30 08:19:09 - pico-train - INFO - Step 57000 -- ๐ Evaluation Results |
| 2025-08-30 08:19:09 - pico-train - INFO - โโโ paloma: 2.2911292914273982e+30 |
| 2025-08-30 08:19:12 - pico-train - INFO - Step 57000 -- ๐ Training Metrics |
| 2025-08-30 08:19:12 - pico-train - INFO - โโโ Loss: 5.8465 |
| 2025-08-30 08:19:12 - pico-train - INFO - โโโ Learning Rate: 2.24e-05 |
| 2025-08-30 08:19:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:19:12 - pico-train - INFO - Step 57000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 08:19:27 - pico-train - INFO - Step 57025 -- ๐ Training Metrics |
| 2025-08-30 08:19:27 - pico-train - INFO - โโโ Loss: 5.8565 |
| 2025-08-30 08:19:27 - pico-train - INFO - โโโ Learning Rate: 2.24e-05 |
| 2025-08-30 08:19:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:19:39 - pico-train - INFO - Step 57050 -- ๐ Training Metrics |
| 2025-08-30 08:19:39 - pico-train - INFO - โโโ Loss: 5.8628 |
| 2025-08-30 08:19:39 - pico-train - INFO - โโโ Learning Rate: 2.24e-05 |
| 2025-08-30 08:19:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:19:52 - pico-train - INFO - Step 57075 -- ๐ Training Metrics |
| 2025-08-30 08:19:52 - pico-train - INFO - โโโ Loss: 5.9016 |
| 2025-08-30 08:19:52 - pico-train - INFO - โโโ Learning Rate: 2.24e-05 |
| 2025-08-30 08:19:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:20:04 - pico-train - INFO - Step 57100 -- ๐ Training Metrics |
| 2025-08-30 08:20:04 - pico-train - INFO - โโโ Loss: 5.9662 |
| 2025-08-30 08:20:04 - pico-train - INFO - โโโ Learning Rate: 2.24e-05 |
| 2025-08-30 08:20:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:20:17 - pico-train - INFO - Step 57125 -- ๐ Training Metrics |
| 2025-08-30 08:20:17 - pico-train - INFO - โโโ Loss: 5.8192 |
| 2025-08-30 08:20:17 - pico-train - INFO - โโโ Learning Rate: 2.23e-05 |
| 2025-08-30 08:20:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:20:30 - pico-train - INFO - Step 57150 -- ๐ Training Metrics |
| 2025-08-30 08:20:30 - pico-train - INFO - โโโ Loss: 5.9000 |
| 2025-08-30 08:20:30 - pico-train - INFO - โโโ Learning Rate: 2.23e-05 |
| 2025-08-30 08:20:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:20:42 - pico-train - INFO - Step 57175 -- ๐ Training Metrics |
| 2025-08-30 08:20:42 - pico-train - INFO - โโโ Loss: 5.7458 |
| 2025-08-30 08:20:42 - pico-train - INFO - โโโ Learning Rate: 2.23e-05 |
| 2025-08-30 08:20:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:20:55 - pico-train - INFO - Step 57200 -- ๐ Training Metrics |
| 2025-08-30 08:20:55 - pico-train - INFO - โโโ Loss: 5.8635 |
| 2025-08-30 08:20:55 - pico-train - INFO - โโโ Learning Rate: 2.23e-05 |
| 2025-08-30 08:20:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:21:08 - pico-train - INFO - Step 57225 -- ๐ Training Metrics |
| 2025-08-30 08:21:08 - pico-train - INFO - โโโ Loss: 5.9097 |
| 2025-08-30 08:21:08 - pico-train - INFO - โโโ Learning Rate: 2.23e-05 |
| 2025-08-30 08:21:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:21:20 - pico-train - INFO - Step 57250 -- ๐ Training Metrics |
| 2025-08-30 08:21:20 - pico-train - INFO - โโโ Loss: 5.9121 |
| 2025-08-30 08:21:20 - pico-train - INFO - โโโ Learning Rate: 2.22e-05 |
| 2025-08-30 08:21:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:21:33 - pico-train - INFO - Step 57275 -- ๐ Training Metrics |
| 2025-08-30 08:21:33 - pico-train - INFO - โโโ Loss: 5.8948 |
| 2025-08-30 08:21:33 - pico-train - INFO - โโโ Learning Rate: 2.22e-05 |
| 2025-08-30 08:21:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:21:46 - pico-train - INFO - Step 57300 -- ๐ Training Metrics |
| 2025-08-30 08:21:46 - pico-train - INFO - โโโ Loss: 5.8280 |
| 2025-08-30 08:21:46 - pico-train - INFO - โโโ Learning Rate: 2.22e-05 |
| 2025-08-30 08:21:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:21:58 - pico-train - INFO - Step 57325 -- ๐ Training Metrics |
| 2025-08-30 08:21:58 - pico-train - INFO - โโโ Loss: 5.8445 |
| 2025-08-30 08:21:58 - pico-train - INFO - โโโ Learning Rate: 2.22e-05 |
| 2025-08-30 08:21:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:22:11 - pico-train - INFO - Step 57350 -- ๐ Training Metrics |
| 2025-08-30 08:22:11 - pico-train - INFO - โโโ Loss: 5.9213 |
| 2025-08-30 08:22:11 - pico-train - INFO - โโโ Learning Rate: 2.21e-05 |
| 2025-08-30 08:22:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:22:24 - pico-train - INFO - Step 57375 -- ๐ Training Metrics |
| 2025-08-30 08:22:24 - pico-train - INFO - โโโ Loss: 5.9795 |
| 2025-08-30 08:22:24 - pico-train - INFO - โโโ Learning Rate: 2.21e-05 |
| 2025-08-30 08:22:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:22:36 - pico-train - INFO - Step 57400 -- ๐ Training Metrics |
| 2025-08-30 08:22:36 - pico-train - INFO - โโโ Loss: 5.9827 |
| 2025-08-30 08:22:36 - pico-train - INFO - โโโ Learning Rate: 2.21e-05 |
| 2025-08-30 08:22:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:22:49 - pico-train - INFO - Step 57425 -- ๐ Training Metrics |
| 2025-08-30 08:22:49 - pico-train - INFO - โโโ Loss: 5.9802 |
| 2025-08-30 08:22:49 - pico-train - INFO - โโโ Learning Rate: 2.21e-05 |
| 2025-08-30 08:22:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:23:01 - pico-train - INFO - Step 57450 -- ๐ Training Metrics |
| 2025-08-30 08:23:01 - pico-train - INFO - โโโ Loss: 5.8669 |
| 2025-08-30 08:23:01 - pico-train - INFO - โโโ Learning Rate: 2.21e-05 |
| 2025-08-30 08:23:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:23:14 - pico-train - INFO - Step 57475 -- ๐ Training Metrics |
| 2025-08-30 08:23:14 - pico-train - INFO - โโโ Loss: 5.8762 |
| 2025-08-30 08:23:14 - pico-train - INFO - โโโ Learning Rate: 2.20e-05 |
| 2025-08-30 08:23:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:23:26 - pico-train - INFO - Step 57500 -- 💾 Saving Checkpoint |
| 2025-08-30 08:25:33 - pico-train - INFO - Step 57500 -- ๐ Evaluation Results |
| 2025-08-30 08:25:33 - pico-train - INFO - โโโ paloma: 2.2146898668006388e+30 |
| 2025-08-30 08:25:35 - pico-train - INFO - Step 57500 -- ๐ Training Metrics |
| 2025-08-30 08:25:35 - pico-train - INFO - โโโ Loss: 5.8685 |
| 2025-08-30 08:25:35 - pico-train - INFO - โโโ Learning Rate: 2.20e-05 |
| 2025-08-30 08:25:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:25:35 - pico-train - INFO - Step 57500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 08:25:49 - pico-train - INFO - Step 57525 -- ๐ Training Metrics |
| 2025-08-30 08:25:49 - pico-train - INFO - โโโ Loss: 5.8952 |
| 2025-08-30 08:25:49 - pico-train - INFO - โโโ Learning Rate: 2.20e-05 |
| 2025-08-30 08:25:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:26:02 - pico-train - INFO - Step 57550 -- ๐ Training Metrics |
| 2025-08-30 08:26:02 - pico-train - INFO - โโโ Loss: 5.8838 |
| 2025-08-30 08:26:02 - pico-train - INFO - โโโ Learning Rate: 2.20e-05 |
| 2025-08-30 08:26:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:26:15 - pico-train - INFO - Step 57575 -- ๐ Training Metrics |
| 2025-08-30 08:26:15 - pico-train - INFO - โโโ Loss: 5.8700 |
| 2025-08-30 08:26:15 - pico-train - INFO - โโโ Learning Rate: 2.20e-05 |
| 2025-08-30 08:26:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:26:28 - pico-train - INFO - Step 57600 -- ๐ Training Metrics |
| 2025-08-30 08:26:28 - pico-train - INFO - โโโ Loss: 5.8803 |
| 2025-08-30 08:26:28 - pico-train - INFO - โโโ Learning Rate: 2.19e-05 |
| 2025-08-30 08:26:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:26:40 - pico-train - INFO - Step 57625 -- ๐ Training Metrics |
| 2025-08-30 08:26:40 - pico-train - INFO - โโโ Loss: 5.9336 |
| 2025-08-30 08:26:40 - pico-train - INFO - โโโ Learning Rate: 2.19e-05 |
| 2025-08-30 08:26:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:26:53 - pico-train - INFO - Step 57650 -- ๐ Training Metrics |
| 2025-08-30 08:26:53 - pico-train - INFO - โโโ Loss: 5.8840 |
| 2025-08-30 08:26:53 - pico-train - INFO - โโโ Learning Rate: 2.19e-05 |
| 2025-08-30 08:26:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:27:06 - pico-train - INFO - Step 57675 -- ๐ Training Metrics |
| 2025-08-30 08:27:06 - pico-train - INFO - โโโ Loss: 5.9388 |
| 2025-08-30 08:27:06 - pico-train - INFO - โโโ Learning Rate: 2.19e-05 |
| 2025-08-30 08:27:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:27:18 - pico-train - INFO - Step 57700 -- ๐ Training Metrics |
| 2025-08-30 08:27:18 - pico-train - INFO - โโโ Loss: 5.9069 |
| 2025-08-30 08:27:18 - pico-train - INFO - โโโ Learning Rate: 2.18e-05 |
| 2025-08-30 08:27:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:27:31 - pico-train - INFO - Step 57725 -- ๐ Training Metrics |
| 2025-08-30 08:27:31 - pico-train - INFO - โโโ Loss: 5.9429 |
| 2025-08-30 08:27:31 - pico-train - INFO - โโโ Learning Rate: 2.18e-05 |
| 2025-08-30 08:27:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:27:44 - pico-train - INFO - Step 57750 -- ๐ Training Metrics |
| 2025-08-30 08:27:44 - pico-train - INFO - โโโ Loss: 5.8362 |
| 2025-08-30 08:27:44 - pico-train - INFO - โโโ Learning Rate: 2.18e-05 |
| 2025-08-30 08:27:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:27:56 - pico-train - INFO - Step 57775 -- ๐ Training Metrics |
| 2025-08-30 08:27:56 - pico-train - INFO - โโโ Loss: 5.8943 |
| 2025-08-30 08:27:56 - pico-train - INFO - โโโ Learning Rate: 2.18e-05 |
| 2025-08-30 08:27:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:28:09 - pico-train - INFO - Step 57800 -- ๐ Training Metrics |
| 2025-08-30 08:28:09 - pico-train - INFO - โโโ Loss: 5.8114 |
| 2025-08-30 08:28:09 - pico-train - INFO - โโโ Learning Rate: 2.18e-05 |
| 2025-08-30 08:28:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:28:22 - pico-train - INFO - Step 57825 -- ๐ Training Metrics |
| 2025-08-30 08:28:22 - pico-train - INFO - โโโ Loss: 5.9848 |
| 2025-08-30 08:28:22 - pico-train - INFO - โโโ Learning Rate: 2.17e-05 |
| 2025-08-30 08:28:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:28:34 - pico-train - INFO - Step 57850 -- ๐ Training Metrics |
| 2025-08-30 08:28:34 - pico-train - INFO - โโโ Loss: 5.8611 |
| 2025-08-30 08:28:34 - pico-train - INFO - โโโ Learning Rate: 2.17e-05 |
| 2025-08-30 08:28:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:28:47 - pico-train - INFO - Step 57875 -- ๐ Training Metrics |
| 2025-08-30 08:28:47 - pico-train - INFO - โโโ Loss: 5.9010 |
| 2025-08-30 08:28:47 - pico-train - INFO - โโโ Learning Rate: 2.17e-05 |
| 2025-08-30 08:28:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:29:00 - pico-train - INFO - Step 57900 -- ๐ Training Metrics |
| 2025-08-30 08:29:00 - pico-train - INFO - โโโ Loss: 5.8876 |
| 2025-08-30 08:29:00 - pico-train - INFO - โโโ Learning Rate: 2.17e-05 |
| 2025-08-30 08:29:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:29:12 - pico-train - INFO - Step 57925 -- ๐ Training Metrics |
| 2025-08-30 08:29:12 - pico-train - INFO - โโโ Loss: 5.9053 |
| 2025-08-30 08:29:12 - pico-train - INFO - โโโ Learning Rate: 2.17e-05 |
| 2025-08-30 08:29:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:29:25 - pico-train - INFO - Step 57950 -- ๐ Training Metrics |
| 2025-08-30 08:29:25 - pico-train - INFO - โโโ Loss: 5.9021 |
| 2025-08-30 08:29:25 - pico-train - INFO - โโโ Learning Rate: 2.16e-05 |
| 2025-08-30 08:29:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:29:38 - pico-train - INFO - Step 57975 -- ๐ Training Metrics |
| 2025-08-30 08:29:38 - pico-train - INFO - โโโ Loss: 5.8546 |
| 2025-08-30 08:29:38 - pico-train - INFO - โโโ Learning Rate: 2.16e-05 |
| 2025-08-30 08:29:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:29:50 - pico-train - INFO - Step 58000 -- 💾 Saving Checkpoint |
| 2025-08-30 08:31:44 - pico-train - INFO - Step 58000 -- ๐ Evaluation Results |
| 2025-08-30 08:31:44 - pico-train - INFO - โโโ paloma: 2.9327628683408786e+30 |
| 2025-08-30 08:31:47 - pico-train - INFO - Step 58000 -- ๐ Training Metrics |
| 2025-08-30 08:31:47 - pico-train - INFO - โโโ Loss: 5.8753 |
| 2025-08-30 08:31:47 - pico-train - INFO - โโโ Learning Rate: 2.16e-05 |
| 2025-08-30 08:31:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:31:47 - pico-train - INFO - Step 58000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 08:32:02 - pico-train - INFO - Step 58025 -- ๐ Training Metrics |
| 2025-08-30 08:32:02 - pico-train - INFO - โโโ Loss: 5.8882 |
| 2025-08-30 08:32:02 - pico-train - INFO - โโโ Learning Rate: 2.16e-05 |
| 2025-08-30 08:32:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:32:15 - pico-train - INFO - Step 58050 -- ๐ Training Metrics |
| 2025-08-30 08:32:15 - pico-train - INFO - โโโ Loss: 5.8783 |
| 2025-08-30 08:32:15 - pico-train - INFO - โโโ Learning Rate: 2.16e-05 |
| 2025-08-30 08:32:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:32:27 - pico-train - INFO - Step 58075 -- ๐ Training Metrics |
| 2025-08-30 08:32:27 - pico-train - INFO - โโโ Loss: 5.8479 |
| 2025-08-30 08:32:27 - pico-train - INFO - โโโ Learning Rate: 2.15e-05 |
| 2025-08-30 08:32:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:32:40 - pico-train - INFO - Step 58100 -- ๐ Training Metrics |
| 2025-08-30 08:32:40 - pico-train - INFO - โโโ Loss: 5.8465 |
| 2025-08-30 08:32:40 - pico-train - INFO - โโโ Learning Rate: 2.15e-05 |
| 2025-08-30 08:32:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:32:53 - pico-train - INFO - Step 58125 -- ๐ Training Metrics |
| 2025-08-30 08:32:53 - pico-train - INFO - โโโ Loss: 5.8889 |
| 2025-08-30 08:32:53 - pico-train - INFO - โโโ Learning Rate: 2.15e-05 |
| 2025-08-30 08:32:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:33:05 - pico-train - INFO - Step 58150 -- ๐ Training Metrics |
| 2025-08-30 08:33:05 - pico-train - INFO - โโโ Loss: 5.8143 |
| 2025-08-30 08:33:05 - pico-train - INFO - โโโ Learning Rate: 2.15e-05 |
| 2025-08-30 08:33:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:33:18 - pico-train - INFO - Step 58175 -- ๐ Training Metrics |
| 2025-08-30 08:33:18 - pico-train - INFO - โโโ Loss: 5.9133 |
| 2025-08-30 08:33:18 - pico-train - INFO - โโโ Learning Rate: 2.14e-05 |
| 2025-08-30 08:33:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:33:31 - pico-train - INFO - Step 58200 -- ๐ Training Metrics |
| 2025-08-30 08:33:31 - pico-train - INFO - โโโ Loss: 5.8496 |
| 2025-08-30 08:33:31 - pico-train - INFO - โโโ Learning Rate: 2.14e-05 |
| 2025-08-30 08:33:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:33:43 - pico-train - INFO - Step 58225 -- ๐ Training Metrics |
| 2025-08-30 08:33:43 - pico-train - INFO - โโโ Loss: 5.9211 |
| 2025-08-30 08:33:43 - pico-train - INFO - โโโ Learning Rate: 2.14e-05 |
| 2025-08-30 08:33:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:33:56 - pico-train - INFO - Step 58250 -- ๐ Training Metrics |
| 2025-08-30 08:33:56 - pico-train - INFO - โโโ Loss: 5.8764 |
| 2025-08-30 08:33:56 - pico-train - INFO - โโโ Learning Rate: 2.14e-05 |
| 2025-08-30 08:33:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:34:09 - pico-train - INFO - Step 58275 -- ๐ Training Metrics |
| 2025-08-30 08:34:09 - pico-train - INFO - โโโ Loss: 5.9342 |
| 2025-08-30 08:34:09 - pico-train - INFO - โโโ Learning Rate: 2.14e-05 |
| 2025-08-30 08:34:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:34:21 - pico-train - INFO - Step 58300 -- ๐ Training Metrics |
| 2025-08-30 08:34:21 - pico-train - INFO - โโโ Loss: 5.8601 |
| 2025-08-30 08:34:21 - pico-train - INFO - โโโ Learning Rate: 2.13e-05 |
| 2025-08-30 08:34:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:34:34 - pico-train - INFO - Step 58325 -- ๐ Training Metrics |
| 2025-08-30 08:34:34 - pico-train - INFO - โโโ Loss: 5.8394 |
| 2025-08-30 08:34:34 - pico-train - INFO - โโโ Learning Rate: 2.13e-05 |
| 2025-08-30 08:34:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:34:46 - pico-train - INFO - Step 58350 -- ๐ Training Metrics |
| 2025-08-30 08:34:46 - pico-train - INFO - โโโ Loss: 5.9285 |
| 2025-08-30 08:34:46 - pico-train - INFO - โโโ Learning Rate: 2.13e-05 |
| 2025-08-30 08:34:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:34:59 - pico-train - INFO - Step 58375 -- ๐ Training Metrics |
| 2025-08-30 08:34:59 - pico-train - INFO - โโโ Loss: 5.8421 |
| 2025-08-30 08:34:59 - pico-train - INFO - โโโ Learning Rate: 2.13e-05 |
| 2025-08-30 08:34:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:35:12 - pico-train - INFO - Step 58400 -- ๐ Training Metrics |
| 2025-08-30 08:35:12 - pico-train - INFO - โโโ Loss: 5.7891 |
| 2025-08-30 08:35:12 - pico-train - INFO - โโโ Learning Rate: 2.13e-05 |
| 2025-08-30 08:35:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:35:25 - pico-train - INFO - Step 58425 -- ๐ Training Metrics |
| 2025-08-30 08:35:25 - pico-train - INFO - โโโ Loss: 5.8921 |
| 2025-08-30 08:35:25 - pico-train - INFO - โโโ Learning Rate: 2.12e-05 |
| 2025-08-30 08:35:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:35:37 - pico-train - INFO - Step 58450 -- ๐ Training Metrics |
| 2025-08-30 08:35:37 - pico-train - INFO - โโโ Loss: 5.8410 |
| 2025-08-30 08:35:37 - pico-train - INFO - โโโ Learning Rate: 2.12e-05 |
| 2025-08-30 08:35:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:35:50 - pico-train - INFO - Step 58475 -- ๐ Training Metrics |
| 2025-08-30 08:35:50 - pico-train - INFO - โโโ Loss: 5.8166 |
| 2025-08-30 08:35:50 - pico-train - INFO - โโโ Learning Rate: 2.12e-05 |
| 2025-08-30 08:35:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:36:02 - pico-train - INFO - Step 58500 -- ๐พ Saving Checkpoint |
| 2025-08-30 08:37:56 - pico-train - INFO - Step 58500 -- ๐ Evaluation Results |
| 2025-08-30 08:37:56 - pico-train - INFO - โโโ paloma: 2.9542125550009274e+30 |
| 2025-08-30 08:38:01 - pico-train - INFO - Step 58500 -- ๐ Training Metrics |
| 2025-08-30 08:38:01 - pico-train - INFO - โโโ Loss: 5.8586 |
| 2025-08-30 08:38:01 - pico-train - INFO - โโโ Learning Rate: 2.12e-05 |
| 2025-08-30 08:38:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:38:01 - pico-train - INFO - Step 58500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 08:38:16 - pico-train - INFO - Step 58525 -- ๐ Training Metrics |
| 2025-08-30 08:38:16 - pico-train - INFO - โโโ Loss: 5.8248 |
| 2025-08-30 08:38:16 - pico-train - INFO - โโโ Learning Rate: 2.12e-05 |
| 2025-08-30 08:38:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:38:29 - pico-train - INFO - Step 58550 -- ๐ Training Metrics |
| 2025-08-30 08:38:29 - pico-train - INFO - โโโ Loss: 5.8162 |
| 2025-08-30 08:38:29 - pico-train - INFO - โโโ Learning Rate: 2.11e-05 |
| 2025-08-30 08:38:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:38:41 - pico-train - INFO - Step 58575 -- ๐ Training Metrics |
| 2025-08-30 08:38:41 - pico-train - INFO - โโโ Loss: 5.9361 |
| 2025-08-30 08:38:41 - pico-train - INFO - โโโ Learning Rate: 2.11e-05 |
| 2025-08-30 08:38:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:38:54 - pico-train - INFO - Step 58600 -- ๐ Training Metrics |
| 2025-08-30 08:38:54 - pico-train - INFO - โโโ Loss: 5.8945 |
| 2025-08-30 08:38:54 - pico-train - INFO - โโโ Learning Rate: 2.11e-05 |
| 2025-08-30 08:38:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:39:06 - pico-train - INFO - Step 58625 -- ๐ Training Metrics |
| 2025-08-30 08:39:06 - pico-train - INFO - โโโ Loss: 5.7984 |
| 2025-08-30 08:39:06 - pico-train - INFO - โโโ Learning Rate: 2.11e-05 |
| 2025-08-30 08:39:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:39:19 - pico-train - INFO - Step 58650 -- ๐ Training Metrics |
| 2025-08-30 08:39:19 - pico-train - INFO - โโโ Loss: 5.8764 |
| 2025-08-30 08:39:19 - pico-train - INFO - โโโ Learning Rate: 2.10e-05 |
| 2025-08-30 08:39:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:39:32 - pico-train - INFO - Step 58675 -- ๐ Training Metrics |
| 2025-08-30 08:39:32 - pico-train - INFO - โโโ Loss: 5.9141 |
| 2025-08-30 08:39:32 - pico-train - INFO - โโโ Learning Rate: 2.10e-05 |
| 2025-08-30 08:39:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:39:44 - pico-train - INFO - Step 58700 -- ๐ Training Metrics |
| 2025-08-30 08:39:44 - pico-train - INFO - โโโ Loss: 5.9118 |
| 2025-08-30 08:39:44 - pico-train - INFO - โโโ Learning Rate: 2.10e-05 |
| 2025-08-30 08:39:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:39:57 - pico-train - INFO - Step 58725 -- ๐ Training Metrics |
| 2025-08-30 08:39:57 - pico-train - INFO - โโโ Loss: 5.8585 |
| 2025-08-30 08:39:57 - pico-train - INFO - โโโ Learning Rate: 2.10e-05 |
| 2025-08-30 08:39:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:40:10 - pico-train - INFO - Step 58750 -- ๐ Training Metrics |
| 2025-08-30 08:40:10 - pico-train - INFO - โโโ Loss: 5.8661 |
| 2025-08-30 08:40:10 - pico-train - INFO - โโโ Learning Rate: 2.10e-05 |
| 2025-08-30 08:40:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:40:22 - pico-train - INFO - Step 58775 -- ๐ Training Metrics |
| 2025-08-30 08:40:22 - pico-train - INFO - โโโ Loss: 5.8330 |
| 2025-08-30 08:40:22 - pico-train - INFO - โโโ Learning Rate: 2.09e-05 |
| 2025-08-30 08:40:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:40:35 - pico-train - INFO - Step 58800 -- ๐ Training Metrics |
| 2025-08-30 08:40:35 - pico-train - INFO - โโโ Loss: 5.8415 |
| 2025-08-30 08:40:35 - pico-train - INFO - โโโ Learning Rate: 2.09e-05 |
| 2025-08-30 08:40:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:40:47 - pico-train - INFO - Step 58825 -- ๐ Training Metrics |
| 2025-08-30 08:40:47 - pico-train - INFO - โโโ Loss: 5.9273 |
| 2025-08-30 08:40:47 - pico-train - INFO - โโโ Learning Rate: 2.09e-05 |
| 2025-08-30 08:40:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:41:00 - pico-train - INFO - Step 58850 -- ๐ Training Metrics |
| 2025-08-30 08:41:00 - pico-train - INFO - โโโ Loss: 5.8663 |
| 2025-08-30 08:41:00 - pico-train - INFO - โโโ Learning Rate: 2.09e-05 |
| 2025-08-30 08:41:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:41:14 - pico-train - INFO - Step 58875 -- ๐ Training Metrics |
| 2025-08-30 08:41:14 - pico-train - INFO - โโโ Loss: 5.8209 |
| 2025-08-30 08:41:14 - pico-train - INFO - โโโ Learning Rate: 2.09e-05 |
| 2025-08-30 08:41:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:41:26 - pico-train - INFO - Step 58900 -- ๐ Training Metrics |
| 2025-08-30 08:41:26 - pico-train - INFO - โโโ Loss: 5.9101 |
| 2025-08-30 08:41:26 - pico-train - INFO - โโโ Learning Rate: 2.08e-05 |
| 2025-08-30 08:41:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:41:39 - pico-train - INFO - Step 58925 -- ๐ Training Metrics |
| 2025-08-30 08:41:39 - pico-train - INFO - โโโ Loss: 5.9064 |
| 2025-08-30 08:41:39 - pico-train - INFO - โโโ Learning Rate: 2.08e-05 |
| 2025-08-30 08:41:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:41:51 - pico-train - INFO - Step 58950 -- ๐ Training Metrics |
| 2025-08-30 08:41:51 - pico-train - INFO - โโโ Loss: 5.8527 |
| 2025-08-30 08:41:51 - pico-train - INFO - โโโ Learning Rate: 2.08e-05 |
| 2025-08-30 08:41:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:42:04 - pico-train - INFO - Step 58975 -- ๐ Training Metrics |
| 2025-08-30 08:42:04 - pico-train - INFO - โโโ Loss: 5.8115 |
| 2025-08-30 08:42:04 - pico-train - INFO - โโโ Learning Rate: 2.08e-05 |
| 2025-08-30 08:42:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:42:16 - pico-train - INFO - Step 59000 -- ๐พ Saving Checkpoint |
| 2025-08-30 08:44:14 - pico-train - INFO - Step 59000 -- ๐ Evaluation Results |
| 2025-08-30 08:44:14 - pico-train - INFO - โโโ paloma: 3.916054030122377e+30 |
| 2025-08-30 08:44:17 - pico-train - INFO - Step 59000 -- ๐ Training Metrics |
| 2025-08-30 08:44:17 - pico-train - INFO - โโโ Loss: 5.8043 |
| 2025-08-30 08:44:17 - pico-train - INFO - โโโ Learning Rate: 2.08e-05 |
| 2025-08-30 08:44:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:44:17 - pico-train - INFO - Step 59000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 08:44:33 - pico-train - INFO - Step 59025 -- ๐ Training Metrics |
| 2025-08-30 08:44:33 - pico-train - INFO - โโโ Loss: 5.7710 |
| 2025-08-30 08:44:33 - pico-train - INFO - โโโ Learning Rate: 2.07e-05 |
| 2025-08-30 08:44:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:44:45 - pico-train - INFO - Step 59050 -- ๐ Training Metrics |
| 2025-08-30 08:44:45 - pico-train - INFO - โโโ Loss: 5.8913 |
| 2025-08-30 08:44:45 - pico-train - INFO - โโโ Learning Rate: 2.07e-05 |
| 2025-08-30 08:44:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:44:59 - pico-train - INFO - Step 59075 -- ๐ Training Metrics |
| 2025-08-30 08:44:59 - pico-train - INFO - โโโ Loss: 5.8823 |
| 2025-08-30 08:44:59 - pico-train - INFO - โโโ Learning Rate: 2.07e-05 |
| 2025-08-30 08:44:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:45:11 - pico-train - INFO - Step 59100 -- ๐ Training Metrics |
| 2025-08-30 08:45:11 - pico-train - INFO - โโโ Loss: 5.8189 |
| 2025-08-30 08:45:11 - pico-train - INFO - โโโ Learning Rate: 2.07e-05 |
| 2025-08-30 08:45:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:45:24 - pico-train - INFO - Step 59125 -- ๐ Training Metrics |
| 2025-08-30 08:45:24 - pico-train - INFO - โโโ Loss: 5.7997 |
| 2025-08-30 08:45:24 - pico-train - INFO - โโโ Learning Rate: 2.06e-05 |
| 2025-08-30 08:45:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:45:37 - pico-train - INFO - Step 59150 -- ๐ Training Metrics |
| 2025-08-30 08:45:37 - pico-train - INFO - โโโ Loss: 5.8950 |
| 2025-08-30 08:45:37 - pico-train - INFO - โโโ Learning Rate: 2.06e-05 |
| 2025-08-30 08:45:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:45:50 - pico-train - INFO - Step 59175 -- ๐ Training Metrics |
| 2025-08-30 08:45:50 - pico-train - INFO - โโโ Loss: 5.9084 |
| 2025-08-30 08:45:50 - pico-train - INFO - โโโ Learning Rate: 2.06e-05 |
| 2025-08-30 08:45:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:46:03 - pico-train - INFO - Step 59200 -- ๐ Training Metrics |
| 2025-08-30 08:46:03 - pico-train - INFO - โโโ Loss: 5.8141 |
| 2025-08-30 08:46:03 - pico-train - INFO - โโโ Learning Rate: 2.06e-05 |
| 2025-08-30 08:46:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:46:16 - pico-train - INFO - Step 59225 -- ๐ Training Metrics |
| 2025-08-30 08:46:16 - pico-train - INFO - โโโ Loss: 5.8814 |
| 2025-08-30 08:46:16 - pico-train - INFO - โโโ Learning Rate: 2.06e-05 |
| 2025-08-30 08:46:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:46:28 - pico-train - INFO - Step 59250 -- ๐ Training Metrics |
| 2025-08-30 08:46:28 - pico-train - INFO - โโโ Loss: 5.8316 |
| 2025-08-30 08:46:28 - pico-train - INFO - โโโ Learning Rate: 2.05e-05 |
| 2025-08-30 08:46:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:46:41 - pico-train - INFO - Step 59275 -- ๐ Training Metrics |
| 2025-08-30 08:46:41 - pico-train - INFO - โโโ Loss: 5.8489 |
| 2025-08-30 08:46:41 - pico-train - INFO - โโโ Learning Rate: 2.05e-05 |
| 2025-08-30 08:46:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:46:54 - pico-train - INFO - Step 59300 -- ๐ Training Metrics |
| 2025-08-30 08:46:54 - pico-train - INFO - โโโ Loss: 5.7998 |
| 2025-08-30 08:46:54 - pico-train - INFO - โโโ Learning Rate: 2.05e-05 |
| 2025-08-30 08:46:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:47:06 - pico-train - INFO - Step 59325 -- ๐ Training Metrics |
| 2025-08-30 08:47:06 - pico-train - INFO - โโโ Loss: 5.8848 |
| 2025-08-30 08:47:06 - pico-train - INFO - โโโ Learning Rate: 2.05e-05 |
| 2025-08-30 08:47:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:47:19 - pico-train - INFO - Step 59350 -- ๐ Training Metrics |
| 2025-08-30 08:47:19 - pico-train - INFO - โโโ Loss: 5.8543 |
| 2025-08-30 08:47:19 - pico-train - INFO - โโโ Learning Rate: 2.05e-05 |
| 2025-08-30 08:47:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:47:32 - pico-train - INFO - Step 59375 -- ๐ Training Metrics |
| 2025-08-30 08:47:32 - pico-train - INFO - โโโ Loss: 5.8655 |
| 2025-08-30 08:47:32 - pico-train - INFO - โโโ Learning Rate: 2.04e-05 |
| 2025-08-30 08:47:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:47:44 - pico-train - INFO - Step 59400 -- ๐ Training Metrics |
| 2025-08-30 08:47:44 - pico-train - INFO - โโโ Loss: 5.8870 |
| 2025-08-30 08:47:44 - pico-train - INFO - โโโ Learning Rate: 2.04e-05 |
| 2025-08-30 08:47:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:47:57 - pico-train - INFO - Step 59425 -- ๐ Training Metrics |
| 2025-08-30 08:47:57 - pico-train - INFO - โโโ Loss: 5.8000 |
| 2025-08-30 08:47:57 - pico-train - INFO - โโโ Learning Rate: 2.04e-05 |
| 2025-08-30 08:47:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:48:10 - pico-train - INFO - Step 59450 -- ๐ Training Metrics |
| 2025-08-30 08:48:10 - pico-train - INFO - โโโ Loss: 5.8162 |
| 2025-08-30 08:48:10 - pico-train - INFO - โโโ Learning Rate: 2.04e-05 |
| 2025-08-30 08:48:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:48:22 - pico-train - INFO - Step 59475 -- ๐ Training Metrics |
| 2025-08-30 08:48:22 - pico-train - INFO - โโโ Loss: 5.8936 |
| 2025-08-30 08:48:22 - pico-train - INFO - โโโ Learning Rate: 2.04e-05 |
| 2025-08-30 08:48:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:48:35 - pico-train - INFO - Step 59500 -- ๐พ Saving Checkpoint |
| 2025-08-30 08:50:28 - pico-train - INFO - Step 59500 -- ๐ Evaluation Results |
| 2025-08-30 08:50:28 - pico-train - INFO - โโโ paloma: 4.0666865028851395e+30 |
| 2025-08-30 08:50:32 - pico-train - INFO - Step 59500 -- ๐ Training Metrics |
| 2025-08-30 08:50:32 - pico-train - INFO - โโโ Loss: 5.8731 |
| 2025-08-30 08:50:32 - pico-train - INFO - โโโ Learning Rate: 2.03e-05 |
| 2025-08-30 08:50:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:50:32 - pico-train - INFO - Step 59500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 08:50:47 - pico-train - INFO - Step 59525 -- ๐ Training Metrics |
| 2025-08-30 08:50:47 - pico-train - INFO - โโโ Loss: 5.9058 |
| 2025-08-30 08:50:47 - pico-train - INFO - โโโ Learning Rate: 2.03e-05 |
| 2025-08-30 08:50:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:51:00 - pico-train - INFO - Step 59550 -- ๐ Training Metrics |
| 2025-08-30 08:51:00 - pico-train - INFO - โโโ Loss: 5.8037 |
| 2025-08-30 08:51:00 - pico-train - INFO - โโโ Learning Rate: 2.03e-05 |
| 2025-08-30 08:51:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:51:12 - pico-train - INFO - Step 59575 -- ๐ Training Metrics |
| 2025-08-30 08:51:12 - pico-train - INFO - โโโ Loss: 5.8553 |
| 2025-08-30 08:51:12 - pico-train - INFO - โโโ Learning Rate: 2.03e-05 |
| 2025-08-30 08:51:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:51:25 - pico-train - INFO - Step 59600 -- ๐ Training Metrics |
| 2025-08-30 08:51:25 - pico-train - INFO - โโโ Loss: 5.8022 |
| 2025-08-30 08:51:25 - pico-train - INFO - โโโ Learning Rate: 2.02e-05 |
| 2025-08-30 08:51:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:51:38 - pico-train - INFO - Step 59625 -- ๐ Training Metrics |
| 2025-08-30 08:51:38 - pico-train - INFO - โโโ Loss: 5.8279 |
| 2025-08-30 08:51:38 - pico-train - INFO - โโโ Learning Rate: 2.02e-05 |
| 2025-08-30 08:51:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:51:51 - pico-train - INFO - Step 59650 -- ๐ Training Metrics |
| 2025-08-30 08:51:51 - pico-train - INFO - โโโ Loss: 5.7732 |
| 2025-08-30 08:51:51 - pico-train - INFO - โโโ Learning Rate: 2.02e-05 |
| 2025-08-30 08:51:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:52:03 - pico-train - INFO - Step 59675 -- ๐ Training Metrics |
| 2025-08-30 08:52:03 - pico-train - INFO - โโโ Loss: 5.8738 |
| 2025-08-30 08:52:03 - pico-train - INFO - โโโ Learning Rate: 2.02e-05 |
| 2025-08-30 08:52:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:52:16 - pico-train - INFO - Step 59700 -- ๐ Training Metrics |
| 2025-08-30 08:52:16 - pico-train - INFO - โโโ Loss: 5.8618 |
| 2025-08-30 08:52:16 - pico-train - INFO - โโโ Learning Rate: 2.02e-05 |
| 2025-08-30 08:52:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:52:29 - pico-train - INFO - Step 59725 -- ๐ Training Metrics |
| 2025-08-30 08:52:29 - pico-train - INFO - โโโ Loss: 5.8423 |
| 2025-08-30 08:52:29 - pico-train - INFO - โโโ Learning Rate: 2.01e-05 |
| 2025-08-30 08:52:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:52:41 - pico-train - INFO - Step 59750 -- ๐ Training Metrics |
| 2025-08-30 08:52:41 - pico-train - INFO - โโโ Loss: 5.9335 |
| 2025-08-30 08:52:41 - pico-train - INFO - โโโ Learning Rate: 2.01e-05 |
| 2025-08-30 08:52:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:52:54 - pico-train - INFO - Step 59775 -- ๐ Training Metrics |
| 2025-08-30 08:52:54 - pico-train - INFO - โโโ Loss: 5.7709 |
| 2025-08-30 08:52:54 - pico-train - INFO - โโโ Learning Rate: 2.01e-05 |
| 2025-08-30 08:52:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:53:07 - pico-train - INFO - Step 59800 -- ๐ Training Metrics |
| 2025-08-30 08:53:07 - pico-train - INFO - โโโ Loss: 5.9237 |
| 2025-08-30 08:53:07 - pico-train - INFO - โโโ Learning Rate: 2.01e-05 |
| 2025-08-30 08:53:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:53:19 - pico-train - INFO - Step 59825 -- ๐ Training Metrics |
| 2025-08-30 08:53:19 - pico-train - INFO - โโโ Loss: 5.9029 |
| 2025-08-30 08:53:19 - pico-train - INFO - โโโ Learning Rate: 2.01e-05 |
| 2025-08-30 08:53:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:53:32 - pico-train - INFO - Step 59850 -- ๐ Training Metrics |
| 2025-08-30 08:53:32 - pico-train - INFO - โโโ Loss: 5.9280 |
| 2025-08-30 08:53:32 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-30 08:53:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:53:45 - pico-train - INFO - Step 59875 -- ๐ Training Metrics |
| 2025-08-30 08:53:45 - pico-train - INFO - โโโ Loss: 5.8758 |
| 2025-08-30 08:53:45 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-30 08:53:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:53:58 - pico-train - INFO - Step 59900 -- ๐ Training Metrics |
| 2025-08-30 08:53:58 - pico-train - INFO - โโโ Loss: 5.8195 |
| 2025-08-30 08:53:58 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-30 08:53:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:54:11 - pico-train - INFO - Step 59925 -- ๐ Training Metrics |
| 2025-08-30 08:54:11 - pico-train - INFO - โโโ Loss: 5.9247 |
| 2025-08-30 08:54:11 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-30 08:54:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:54:23 - pico-train - INFO - Step 59950 -- ๐ Training Metrics |
| 2025-08-30 08:54:23 - pico-train - INFO - โโโ Loss: 5.8941 |
| 2025-08-30 08:54:23 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-30 08:54:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:54:36 - pico-train - INFO - Step 59975 -- ๐ Training Metrics |
| 2025-08-30 08:54:36 - pico-train - INFO - โโโ Loss: 5.9192 |
| 2025-08-30 08:54:36 - pico-train - INFO - โโโ Learning Rate: 1.99e-05 |
| 2025-08-30 08:54:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:54:48 - pico-train - INFO - Step 60000 -- ๐พ Saving Checkpoint |
| 2025-08-30 08:56:42 - pico-train - INFO - Step 60000 -- ๐ Evaluation Results |
| 2025-08-30 08:56:42 - pico-train - INFO - โโโ paloma: 5.67735563606023e+30 |
| 2025-08-30 08:56:44 - pico-train - INFO - Step 60000 -- ๐ Training Metrics |
| 2025-08-30 08:56:44 - pico-train - INFO - โโโ Loss: 5.9175 |
| 2025-08-30 08:56:44 - pico-train - INFO - โโโ Learning Rate: 1.99e-05 |
| 2025-08-30 08:56:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:56:44 - pico-train - INFO - Step 60000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 08:56:59 - pico-train - INFO - Step 60025 -- ๐ Training Metrics |
| 2025-08-30 08:56:59 - pico-train - INFO - โโโ Loss: 5.8005 |
| 2025-08-30 08:56:59 - pico-train - INFO - โโโ Learning Rate: 1.99e-05 |
| 2025-08-30 08:56:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:57:12 - pico-train - INFO - Step 60050 -- ๐ Training Metrics |
| 2025-08-30 08:57:12 - pico-train - INFO - โโโ Loss: 5.8668 |
| 2025-08-30 08:57:12 - pico-train - INFO - โโโ Learning Rate: 1.99e-05 |
| 2025-08-30 08:57:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:57:24 - pico-train - INFO - Step 60075 -- ๐ Training Metrics |
| 2025-08-30 08:57:24 - pico-train - INFO - โโโ Loss: 5.9150 |
| 2025-08-30 08:57:24 - pico-train - INFO - โโโ Learning Rate: 1.99e-05 |
| 2025-08-30 08:57:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:57:37 - pico-train - INFO - Step 60100 -- ๐ Training Metrics |
| 2025-08-30 08:57:37 - pico-train - INFO - โโโ Loss: 5.8577 |
| 2025-08-30 08:57:37 - pico-train - INFO - โโโ Learning Rate: 1.98e-05 |
| 2025-08-30 08:57:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:57:50 - pico-train - INFO - Step 60125 -- ๐ Training Metrics |
| 2025-08-30 08:57:50 - pico-train - INFO - โโโ Loss: 5.9463 |
| 2025-08-30 08:57:50 - pico-train - INFO - โโโ Learning Rate: 1.98e-05 |
| 2025-08-30 08:57:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:58:03 - pico-train - INFO - Step 60150 -- ๐ Training Metrics |
| 2025-08-30 08:58:03 - pico-train - INFO - โโโ Loss: 5.9613 |
| 2025-08-30 08:58:03 - pico-train - INFO - โโโ Learning Rate: 1.98e-05 |
| 2025-08-30 08:58:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:58:15 - pico-train - INFO - Step 60175 -- ๐ Training Metrics |
| 2025-08-30 08:58:15 - pico-train - INFO - โโโ Loss: 5.7742 |
| 2025-08-30 08:58:15 - pico-train - INFO - โโโ Learning Rate: 1.98e-05 |
| 2025-08-30 08:58:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:58:28 - pico-train - INFO - Step 60200 -- ๐ Training Metrics |
| 2025-08-30 08:58:28 - pico-train - INFO - โโโ Loss: 5.9330 |
| 2025-08-30 08:58:28 - pico-train - INFO - โโโ Learning Rate: 1.97e-05 |
| 2025-08-30 08:58:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:58:40 - pico-train - INFO - Step 60225 -- ๐ Training Metrics |
| 2025-08-30 08:58:40 - pico-train - INFO - โโโ Loss: 5.9165 |
| 2025-08-30 08:58:40 - pico-train - INFO - โโโ Learning Rate: 1.97e-05 |
| 2025-08-30 08:58:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:58:53 - pico-train - INFO - Step 60250 -- ๐ Training Metrics |
| 2025-08-30 08:58:53 - pico-train - INFO - โโโ Loss: 5.8891 |
| 2025-08-30 08:58:53 - pico-train - INFO - โโโ Learning Rate: 1.97e-05 |
| 2025-08-30 08:58:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:59:06 - pico-train - INFO - Step 60275 -- ๐ Training Metrics |
| 2025-08-30 08:59:06 - pico-train - INFO - โโโ Loss: 5.8293 |
| 2025-08-30 08:59:06 - pico-train - INFO - โโโ Learning Rate: 1.97e-05 |
| 2025-08-30 08:59:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:59:18 - pico-train - INFO - Step 60300 -- ๐ Training Metrics |
| 2025-08-30 08:59:18 - pico-train - INFO - โโโ Loss: 5.7729 |
| 2025-08-30 08:59:18 - pico-train - INFO - โโโ Learning Rate: 1.97e-05 |
| 2025-08-30 08:59:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:59:31 - pico-train - INFO - Step 60325 -- ๐ Training Metrics |
| 2025-08-30 08:59:31 - pico-train - INFO - โโโ Loss: 5.8043 |
| 2025-08-30 08:59:31 - pico-train - INFO - โโโ Learning Rate: 1.96e-05 |
| 2025-08-30 08:59:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:59:43 - pico-train - INFO - Step 60350 -- ๐ Training Metrics |
| 2025-08-30 08:59:43 - pico-train - INFO - โโโ Loss: 5.8123 |
| 2025-08-30 08:59:43 - pico-train - INFO - โโโ Learning Rate: 1.96e-05 |
| 2025-08-30 08:59:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 08:59:57 - pico-train - INFO - Step 60375 -- ๐ Training Metrics |
| 2025-08-30 08:59:57 - pico-train - INFO - โโโ Loss: 5.9085 |
| 2025-08-30 08:59:57 - pico-train - INFO - โโโ Learning Rate: 1.96e-05 |
| 2025-08-30 08:59:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:00:09 - pico-train - INFO - Step 60400 -- ๐ Training Metrics |
| 2025-08-30 09:00:09 - pico-train - INFO - โโโ Loss: 5.8004 |
| 2025-08-30 09:00:09 - pico-train - INFO - โโโ Learning Rate: 1.96e-05 |
| 2025-08-30 09:00:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:00:22 - pico-train - INFO - Step 60425 -- ๐ Training Metrics |
| 2025-08-30 09:00:22 - pico-train - INFO - โโโ Loss: 5.8664 |
| 2025-08-30 09:00:22 - pico-train - INFO - โโโ Learning Rate: 1.96e-05 |
| 2025-08-30 09:00:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:00:35 - pico-train - INFO - Step 60450 -- ๐ Training Metrics |
| 2025-08-30 09:00:35 - pico-train - INFO - โโโ Loss: 5.8370 |
| 2025-08-30 09:00:35 - pico-train - INFO - โโโ Learning Rate: 1.95e-05 |
| 2025-08-30 09:00:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:00:48 - pico-train - INFO - Step 60475 -- ๐ Training Metrics |
| 2025-08-30 09:00:48 - pico-train - INFO - โโโ Loss: 5.8813 |
| 2025-08-30 09:00:48 - pico-train - INFO - โโโ Learning Rate: 1.95e-05 |
| 2025-08-30 09:00:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:01:01 - pico-train - INFO - Step 60500 -- ๐พ Saving Checkpoint |
| 2025-08-30 09:03:00 - pico-train - INFO - Step 60500 -- ๐ Evaluation Results |
| 2025-08-30 09:03:00 - pico-train - INFO - โโโ paloma: 6.577053610858546e+30 |
| 2025-08-30 09:03:04 - pico-train - INFO - Step 60500 -- ๐ Training Metrics |
| 2025-08-30 09:03:04 - pico-train - INFO - โโโ Loss: 5.8644 |
| 2025-08-30 09:03:04 - pico-train - INFO - โโโ Learning Rate: 1.95e-05 |
| 2025-08-30 09:03:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:03:04 - pico-train - INFO - Step 60500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 09:03:20 - pico-train - INFO - Step 60525 -- ๐ Training Metrics |
| 2025-08-30 09:03:20 - pico-train - INFO - โโโ Loss: 5.9048 |
| 2025-08-30 09:03:20 - pico-train - INFO - โโโ Learning Rate: 1.95e-05 |
| 2025-08-30 09:03:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:03:32 - pico-train - INFO - Step 60550 -- ๐ Training Metrics |
| 2025-08-30 09:03:32 - pico-train - INFO - โโโ Loss: 5.8286 |
| 2025-08-30 09:03:32 - pico-train - INFO - โโโ Learning Rate: 1.95e-05 |
| 2025-08-30 09:03:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:03:45 - pico-train - INFO - Step 60575 -- ๐ Training Metrics |
| 2025-08-30 09:03:45 - pico-train - INFO - โโโ Loss: 5.9112 |
| 2025-08-30 09:03:45 - pico-train - INFO - โโโ Learning Rate: 1.94e-05 |
| 2025-08-30 09:03:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:03:58 - pico-train - INFO - Step 60600 -- ๐ Training Metrics |
| 2025-08-30 09:03:58 - pico-train - INFO - โโโ Loss: 5.8445 |
| 2025-08-30 09:03:58 - pico-train - INFO - โโโ Learning Rate: 1.94e-05 |
| 2025-08-30 09:03:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:04:10 - pico-train - INFO - Step 60625 -- ๐ Training Metrics |
| 2025-08-30 09:04:10 - pico-train - INFO - โโโ Loss: 5.8444 |
| 2025-08-30 09:04:10 - pico-train - INFO - โโโ Learning Rate: 1.94e-05 |
| 2025-08-30 09:04:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:04:23 - pico-train - INFO - Step 60650 -- ๐ Training Metrics |
| 2025-08-30 09:04:23 - pico-train - INFO - โโโ Loss: 5.7993 |
| 2025-08-30 09:04:23 - pico-train - INFO - โโโ Learning Rate: 1.94e-05 |
| 2025-08-30 09:04:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:04:36 - pico-train - INFO - Step 60675 -- ๐ Training Metrics |
| 2025-08-30 09:04:36 - pico-train - INFO - โโโ Loss: 5.8188 |
| 2025-08-30 09:04:36 - pico-train - INFO - โโโ Learning Rate: 1.94e-05 |
| 2025-08-30 09:04:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:04:48 - pico-train - INFO - Step 60700 -- ๐ Training Metrics |
| 2025-08-30 09:04:48 - pico-train - INFO - โโโ Loss: 5.8257 |
| 2025-08-30 09:04:48 - pico-train - INFO - โโโ Learning Rate: 1.93e-05 |
| 2025-08-30 09:04:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:05:01 - pico-train - INFO - Step 60725 -- ๐ Training Metrics |
| 2025-08-30 09:05:01 - pico-train - INFO - โโโ Loss: 5.9364 |
| 2025-08-30 09:05:01 - pico-train - INFO - โโโ Learning Rate: 1.93e-05 |
| 2025-08-30 09:05:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:05:14 - pico-train - INFO - Step 60750 -- ๐ Training Metrics |
| 2025-08-30 09:05:14 - pico-train - INFO - โโโ Loss: 5.8968 |
| 2025-08-30 09:05:14 - pico-train - INFO - โโโ Learning Rate: 1.93e-05 |
| 2025-08-30 09:05:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:05:26 - pico-train - INFO - Step 60775 -- ๐ Training Metrics |
| 2025-08-30 09:05:26 - pico-train - INFO - โโโ Loss: 5.7561 |
| 2025-08-30 09:05:26 - pico-train - INFO - โโโ Learning Rate: 1.93e-05 |
| 2025-08-30 09:05:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:05:39 - pico-train - INFO - Step 60800 -- ๐ Training Metrics |
| 2025-08-30 09:05:39 - pico-train - INFO - โโโ Loss: 5.8257 |
| 2025-08-30 09:05:39 - pico-train - INFO - โโโ Learning Rate: 1.92e-05 |
| 2025-08-30 09:05:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:05:52 - pico-train - INFO - Step 60825 -- ๐ Training Metrics |
| 2025-08-30 09:05:52 - pico-train - INFO - โโโ Loss: 5.8018 |
| 2025-08-30 09:05:52 - pico-train - INFO - โโโ Learning Rate: 1.92e-05 |
| 2025-08-30 09:05:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:06:04 - pico-train - INFO - Step 60850 -- ๐ Training Metrics |
| 2025-08-30 09:06:04 - pico-train - INFO - โโโ Loss: 5.8325 |
| 2025-08-30 09:06:04 - pico-train - INFO - โโโ Learning Rate: 1.92e-05 |
| 2025-08-30 09:06:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:06:17 - pico-train - INFO - Step 60875 -- ๐ Training Metrics |
| 2025-08-30 09:06:17 - pico-train - INFO - โโโ Loss: 5.9502 |
| 2025-08-30 09:06:17 - pico-train - INFO - โโโ Learning Rate: 1.92e-05 |
| 2025-08-30 09:06:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:06:30 - pico-train - INFO - Step 60900 -- ๐ Training Metrics |
| 2025-08-30 09:06:30 - pico-train - INFO - โโโ Loss: 5.8632 |
| 2025-08-30 09:06:30 - pico-train - INFO - โโโ Learning Rate: 1.92e-05 |
| 2025-08-30 09:06:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:06:42 - pico-train - INFO - Step 60925 -- ๐ Training Metrics |
| 2025-08-30 09:06:42 - pico-train - INFO - โโโ Loss: 5.7790 |
| 2025-08-30 09:06:42 - pico-train - INFO - โโโ Learning Rate: 1.91e-05 |
| 2025-08-30 09:06:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:06:55 - pico-train - INFO - Step 60950 -- ๐ Training Metrics |
| 2025-08-30 09:06:55 - pico-train - INFO - โโโ Loss: 5.8264 |
| 2025-08-30 09:06:55 - pico-train - INFO - โโโ Learning Rate: 1.91e-05 |
| 2025-08-30 09:06:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:07:08 - pico-train - INFO - Step 60975 -- ๐ Training Metrics |
| 2025-08-30 09:07:08 - pico-train - INFO - โโโ Loss: 5.8425 |
| 2025-08-30 09:07:08 - pico-train - INFO - โโโ Learning Rate: 1.91e-05 |
| 2025-08-30 09:07:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:07:20 - pico-train - INFO - Step 61000 -- 💾 Saving Checkpoint |
| 2025-08-30 09:09:19 - pico-train - INFO - Step 61000 -- ๐ Evaluation Results |
| 2025-08-30 09:09:19 - pico-train - INFO - โโโ paloma: 7.381800813081388e+30 |
| 2025-08-30 09:09:23 - pico-train - INFO - Step 61000 -- ๐ Training Metrics |
| 2025-08-30 09:09:23 - pico-train - INFO - โโโ Loss: 5.8442 |
| 2025-08-30 09:09:23 - pico-train - INFO - โโโ Learning Rate: 1.91e-05 |
| 2025-08-30 09:09:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:09:23 - pico-train - INFO - Step 61000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 09:09:38 - pico-train - INFO - Step 61025 -- ๐ Training Metrics |
| 2025-08-30 09:09:38 - pico-train - INFO - โโโ Loss: 5.9313 |
| 2025-08-30 09:09:38 - pico-train - INFO - โโโ Learning Rate: 1.91e-05 |
| 2025-08-30 09:09:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:09:51 - pico-train - INFO - Step 61050 -- ๐ Training Metrics |
| 2025-08-30 09:09:51 - pico-train - INFO - โโโ Loss: 5.8519 |
| 2025-08-30 09:09:51 - pico-train - INFO - โโโ Learning Rate: 1.90e-05 |
| 2025-08-30 09:09:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:10:03 - pico-train - INFO - Step 61075 -- ๐ Training Metrics |
| 2025-08-30 09:10:03 - pico-train - INFO - โโโ Loss: 5.8725 |
| 2025-08-30 09:10:03 - pico-train - INFO - โโโ Learning Rate: 1.90e-05 |
| 2025-08-30 09:10:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:10:16 - pico-train - INFO - Step 61100 -- ๐ Training Metrics |
| 2025-08-30 09:10:16 - pico-train - INFO - โโโ Loss: 5.8322 |
| 2025-08-30 09:10:16 - pico-train - INFO - โโโ Learning Rate: 1.90e-05 |
| 2025-08-30 09:10:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:10:29 - pico-train - INFO - Step 61125 -- ๐ Training Metrics |
| 2025-08-30 09:10:29 - pico-train - INFO - โโโ Loss: 5.8354 |
| 2025-08-30 09:10:29 - pico-train - INFO - โโโ Learning Rate: 1.90e-05 |
| 2025-08-30 09:10:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:10:41 - pico-train - INFO - Step 61150 -- ๐ Training Metrics |
| 2025-08-30 09:10:41 - pico-train - INFO - โโโ Loss: 5.8735 |
| 2025-08-30 09:10:41 - pico-train - INFO - โโโ Learning Rate: 1.90e-05 |
| 2025-08-30 09:10:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:10:54 - pico-train - INFO - Step 61175 -- ๐ Training Metrics |
| 2025-08-30 09:10:54 - pico-train - INFO - โโโ Loss: 5.9433 |
| 2025-08-30 09:10:54 - pico-train - INFO - โโโ Learning Rate: 1.89e-05 |
| 2025-08-30 09:10:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:11:06 - pico-train - INFO - Step 61200 -- ๐ Training Metrics |
| 2025-08-30 09:11:06 - pico-train - INFO - โโโ Loss: 5.8394 |
| 2025-08-30 09:11:06 - pico-train - INFO - โโโ Learning Rate: 1.89e-05 |
| 2025-08-30 09:11:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:11:19 - pico-train - INFO - Step 61225 -- ๐ Training Metrics |
| 2025-08-30 09:11:19 - pico-train - INFO - โโโ Loss: 5.9396 |
| 2025-08-30 09:11:19 - pico-train - INFO - โโโ Learning Rate: 1.89e-05 |
| 2025-08-30 09:11:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:11:32 - pico-train - INFO - Step 61250 -- ๐ Training Metrics |
| 2025-08-30 09:11:32 - pico-train - INFO - โโโ Loss: 5.8461 |
| 2025-08-30 09:11:32 - pico-train - INFO - โโโ Learning Rate: 1.89e-05 |
| 2025-08-30 09:11:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:11:44 - pico-train - INFO - Step 61275 -- ๐ Training Metrics |
| 2025-08-30 09:11:44 - pico-train - INFO - โโโ Loss: 5.9137 |
| 2025-08-30 09:11:44 - pico-train - INFO - โโโ Learning Rate: 1.89e-05 |
| 2025-08-30 09:11:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:11:57 - pico-train - INFO - Step 61300 -- ๐ Training Metrics |
| 2025-08-30 09:11:57 - pico-train - INFO - โโโ Loss: 5.8249 |
| 2025-08-30 09:11:57 - pico-train - INFO - โโโ Learning Rate: 1.88e-05 |
| 2025-08-30 09:11:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:12:10 - pico-train - INFO - Step 61325 -- ๐ Training Metrics |
| 2025-08-30 09:12:10 - pico-train - INFO - โโโ Loss: 5.8248 |
| 2025-08-30 09:12:10 - pico-train - INFO - โโโ Learning Rate: 1.88e-05 |
| 2025-08-30 09:12:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:12:22 - pico-train - INFO - Step 61350 -- ๐ Training Metrics |
| 2025-08-30 09:12:22 - pico-train - INFO - โโโ Loss: 5.8349 |
| 2025-08-30 09:12:22 - pico-train - INFO - โโโ Learning Rate: 1.88e-05 |
| 2025-08-30 09:12:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:12:35 - pico-train - INFO - Step 61375 -- ๐ Training Metrics |
| 2025-08-30 09:12:35 - pico-train - INFO - โโโ Loss: 5.8265 |
| 2025-08-30 09:12:35 - pico-train - INFO - โโโ Learning Rate: 1.88e-05 |
| 2025-08-30 09:12:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:12:47 - pico-train - INFO - Step 61400 -- ๐ Training Metrics |
| 2025-08-30 09:12:47 - pico-train - INFO - โโโ Loss: 5.8919 |
| 2025-08-30 09:12:47 - pico-train - INFO - โโโ Learning Rate: 1.87e-05 |
| 2025-08-30 09:12:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:13:00 - pico-train - INFO - Step 61425 -- ๐ Training Metrics |
| 2025-08-30 09:13:00 - pico-train - INFO - โโโ Loss: 5.8929 |
| 2025-08-30 09:13:00 - pico-train - INFO - โโโ Learning Rate: 1.87e-05 |
| 2025-08-30 09:13:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:13:13 - pico-train - INFO - Step 61450 -- ๐ Training Metrics |
| 2025-08-30 09:13:13 - pico-train - INFO - โโโ Loss: 5.8063 |
| 2025-08-30 09:13:13 - pico-train - INFO - โโโ Learning Rate: 1.87e-05 |
| 2025-08-30 09:13:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:13:25 - pico-train - INFO - Step 61475 -- ๐ Training Metrics |
| 2025-08-30 09:13:25 - pico-train - INFO - โโโ Loss: 5.8834 |
| 2025-08-30 09:13:25 - pico-train - INFO - โโโ Learning Rate: 1.87e-05 |
| 2025-08-30 09:13:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:13:38 - pico-train - INFO - Step 61500 -- 💾 Saving Checkpoint |
| 2025-08-30 09:15:52 - pico-train - INFO - Step 61500 -- ๐ Evaluation Results |
| 2025-08-30 09:15:52 - pico-train - INFO - โโโ paloma: 7.5580512131553e+30 |
| 2025-08-30 09:15:55 - pico-train - INFO - Step 61500 -- ๐ Training Metrics |
| 2025-08-30 09:15:55 - pico-train - INFO - โโโ Loss: 5.8274 |
| 2025-08-30 09:15:55 - pico-train - INFO - โโโ Learning Rate: 1.87e-05 |
| 2025-08-30 09:15:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:15:55 - pico-train - INFO - Step 61500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 09:16:20 - pico-train - INFO - Step 61525 -- ๐ Training Metrics |
| 2025-08-30 09:16:20 - pico-train - INFO - โโโ Loss: 5.8780 |
| 2025-08-30 09:16:20 - pico-train - INFO - โโโ Learning Rate: 1.86e-05 |
| 2025-08-30 09:16:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:16:33 - pico-train - INFO - Step 61550 -- ๐ Training Metrics |
| 2025-08-30 09:16:33 - pico-train - INFO - โโโ Loss: 5.8784 |
| 2025-08-30 09:16:33 - pico-train - INFO - โโโ Learning Rate: 1.86e-05 |
| 2025-08-30 09:16:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:16:45 - pico-train - INFO - Step 61575 -- ๐ Training Metrics |
| 2025-08-30 09:16:45 - pico-train - INFO - โโโ Loss: 5.8547 |
| 2025-08-30 09:16:45 - pico-train - INFO - โโโ Learning Rate: 1.86e-05 |
| 2025-08-30 09:16:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:16:58 - pico-train - INFO - Step 61600 -- ๐ Training Metrics |
| 2025-08-30 09:16:58 - pico-train - INFO - โโโ Loss: 5.8624 |
| 2025-08-30 09:16:58 - pico-train - INFO - โโโ Learning Rate: 1.86e-05 |
| 2025-08-30 09:16:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:17:11 - pico-train - INFO - Step 61625 -- ๐ Training Metrics |
| 2025-08-30 09:17:11 - pico-train - INFO - โโโ Loss: 5.9047 |
| 2025-08-30 09:17:11 - pico-train - INFO - โโโ Learning Rate: 1.86e-05 |
| 2025-08-30 09:17:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:17:24 - pico-train - INFO - Step 61650 -- ๐ Training Metrics |
| 2025-08-30 09:17:24 - pico-train - INFO - โโโ Loss: 5.8888 |
| 2025-08-30 09:17:24 - pico-train - INFO - โโโ Learning Rate: 1.85e-05 |
| 2025-08-30 09:17:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:17:36 - pico-train - INFO - Step 61675 -- ๐ Training Metrics |
| 2025-08-30 09:17:36 - pico-train - INFO - โโโ Loss: 5.8195 |
| 2025-08-30 09:17:36 - pico-train - INFO - โโโ Learning Rate: 1.85e-05 |
| 2025-08-30 09:17:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:17:49 - pico-train - INFO - Step 61700 -- ๐ Training Metrics |
| 2025-08-30 09:17:49 - pico-train - INFO - โโโ Loss: 5.8452 |
| 2025-08-30 09:17:49 - pico-train - INFO - โโโ Learning Rate: 1.85e-05 |
| 2025-08-30 09:17:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:18:02 - pico-train - INFO - Step 61725 -- ๐ Training Metrics |
| 2025-08-30 09:18:02 - pico-train - INFO - โโโ Loss: 5.9150 |
| 2025-08-30 09:18:02 - pico-train - INFO - โโโ Learning Rate: 1.85e-05 |
| 2025-08-30 09:18:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:18:14 - pico-train - INFO - Step 61750 -- ๐ Training Metrics |
| 2025-08-30 09:18:14 - pico-train - INFO - โโโ Loss: 5.7953 |
| 2025-08-30 09:18:14 - pico-train - INFO - โโโ Learning Rate: 1.85e-05 |
| 2025-08-30 09:18:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:18:27 - pico-train - INFO - Step 61775 -- ๐ Training Metrics |
| 2025-08-30 09:18:27 - pico-train - INFO - โโโ Loss: 5.8075 |
| 2025-08-30 09:18:27 - pico-train - INFO - โโโ Learning Rate: 1.84e-05 |
| 2025-08-30 09:18:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:18:40 - pico-train - INFO - Step 61800 -- ๐ Training Metrics |
| 2025-08-30 09:18:40 - pico-train - INFO - โโโ Loss: 5.8305 |
| 2025-08-30 09:18:40 - pico-train - INFO - โโโ Learning Rate: 1.84e-05 |
| 2025-08-30 09:18:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:18:53 - pico-train - INFO - Step 61825 -- ๐ Training Metrics |
| 2025-08-30 09:18:53 - pico-train - INFO - โโโ Loss: 5.8460 |
| 2025-08-30 09:18:53 - pico-train - INFO - โโโ Learning Rate: 1.84e-05 |
| 2025-08-30 09:18:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:19:05 - pico-train - INFO - Step 61850 -- ๐ Training Metrics |
| 2025-08-30 09:19:05 - pico-train - INFO - โโโ Loss: 5.9274 |
| 2025-08-30 09:19:05 - pico-train - INFO - โโโ Learning Rate: 1.84e-05 |
| 2025-08-30 09:19:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:19:18 - pico-train - INFO - Step 61875 -- ๐ Training Metrics |
| 2025-08-30 09:19:18 - pico-train - INFO - โโโ Loss: 5.8535 |
| 2025-08-30 09:19:18 - pico-train - INFO - โโโ Learning Rate: 1.84e-05 |
| 2025-08-30 09:19:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:19:31 - pico-train - INFO - Step 61900 -- ๐ Training Metrics |
| 2025-08-30 09:19:31 - pico-train - INFO - โโโ Loss: 5.8254 |
| 2025-08-30 09:19:31 - pico-train - INFO - โโโ Learning Rate: 1.83e-05 |
| 2025-08-30 09:19:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:19:44 - pico-train - INFO - Step 61925 -- ๐ Training Metrics |
| 2025-08-30 09:19:44 - pico-train - INFO - โโโ Loss: 5.6957 |
| 2025-08-30 09:19:44 - pico-train - INFO - โโโ Learning Rate: 1.83e-05 |
| 2025-08-30 09:19:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:19:56 - pico-train - INFO - Step 61950 -- ๐ Training Metrics |
| 2025-08-30 09:19:56 - pico-train - INFO - โโโ Loss: 5.8474 |
| 2025-08-30 09:19:56 - pico-train - INFO - โโโ Learning Rate: 1.83e-05 |
| 2025-08-30 09:19:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:20:09 - pico-train - INFO - Step 61975 -- ๐ Training Metrics |
| 2025-08-30 09:20:09 - pico-train - INFO - โโโ Loss: 5.8588 |
| 2025-08-30 09:20:09 - pico-train - INFO - โโโ Learning Rate: 1.83e-05 |
| 2025-08-30 09:20:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:20:26 - pico-train - INFO - Step 62000 -- 💾 Saving Checkpoint |
| 2025-08-30 09:22:33 - pico-train - INFO - Step 62000 -- ๐ Evaluation Results |
| 2025-08-30 09:22:33 - pico-train - INFO - โโโ paloma: 1.0115134118607476e+31 |
| 2025-08-30 09:22:37 - pico-train - INFO - Step 62000 -- ๐ Training Metrics |
| 2025-08-30 09:22:37 - pico-train - INFO - โโโ Loss: 5.8579 |
| 2025-08-30 09:22:37 - pico-train - INFO - โโโ Learning Rate: 1.83e-05 |
| 2025-08-30 09:22:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:22:37 - pico-train - INFO - Step 62000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 09:22:52 - pico-train - INFO - Step 62025 -- ๐ Training Metrics |
| 2025-08-30 09:22:52 - pico-train - INFO - โโโ Loss: 5.8263 |
| 2025-08-30 09:22:52 - pico-train - INFO - โโโ Learning Rate: 1.82e-05 |
| 2025-08-30 09:22:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:23:05 - pico-train - INFO - Step 62050 -- ๐ Training Metrics |
| 2025-08-30 09:23:05 - pico-train - INFO - โโโ Loss: 5.8617 |
| 2025-08-30 09:23:05 - pico-train - INFO - โโโ Learning Rate: 1.82e-05 |
| 2025-08-30 09:23:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:23:18 - pico-train - INFO - Step 62075 -- ๐ Training Metrics |
| 2025-08-30 09:23:18 - pico-train - INFO - โโโ Loss: 5.8762 |
| 2025-08-30 09:23:18 - pico-train - INFO - โโโ Learning Rate: 1.82e-05 |
| 2025-08-30 09:23:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:23:30 - pico-train - INFO - Step 62100 -- ๐ Training Metrics |
| 2025-08-30 09:23:30 - pico-train - INFO - โโโ Loss: 5.8857 |
| 2025-08-30 09:23:30 - pico-train - INFO - โโโ Learning Rate: 1.82e-05 |
| 2025-08-30 09:23:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:23:43 - pico-train - INFO - Step 62125 -- ๐ Training Metrics |
| 2025-08-30 09:23:43 - pico-train - INFO - โโโ Loss: 5.7406 |
| 2025-08-30 09:23:43 - pico-train - INFO - โโโ Learning Rate: 1.82e-05 |
| 2025-08-30 09:23:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:23:55 - pico-train - INFO - Step 62150 -- ๐ Training Metrics |
| 2025-08-30 09:23:55 - pico-train - INFO - โโโ Loss: 5.8648 |
| 2025-08-30 09:23:55 - pico-train - INFO - โโโ Learning Rate: 1.81e-05 |
| 2025-08-30 09:23:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:24:08 - pico-train - INFO - Step 62175 -- ๐ Training Metrics |
| 2025-08-30 09:24:08 - pico-train - INFO - โโโ Loss: 5.8611 |
| 2025-08-30 09:24:08 - pico-train - INFO - โโโ Learning Rate: 1.81e-05 |
| 2025-08-30 09:24:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:24:21 - pico-train - INFO - Step 62200 -- ๐ Training Metrics |
| 2025-08-30 09:24:21 - pico-train - INFO - โโโ Loss: 5.8327 |
| 2025-08-30 09:24:21 - pico-train - INFO - โโโ Learning Rate: 1.81e-05 |
| 2025-08-30 09:24:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:24:33 - pico-train - INFO - Step 62225 -- ๐ Training Metrics |
| 2025-08-30 09:24:33 - pico-train - INFO - โโโ Loss: 5.8680 |
| 2025-08-30 09:24:33 - pico-train - INFO - โโโ Learning Rate: 1.81e-05 |
| 2025-08-30 09:24:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:24:46 - pico-train - INFO - Step 62250 -- ๐ Training Metrics |
| 2025-08-30 09:24:46 - pico-train - INFO - โโโ Loss: 5.8013 |
| 2025-08-30 09:24:46 - pico-train - INFO - โโโ Learning Rate: 1.80e-05 |
| 2025-08-30 09:24:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:24:58 - pico-train - INFO - Step 62275 -- ๐ Training Metrics |
| 2025-08-30 09:24:58 - pico-train - INFO - โโโ Loss: 5.7716 |
| 2025-08-30 09:24:58 - pico-train - INFO - โโโ Learning Rate: 1.80e-05 |
| 2025-08-30 09:24:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:25:11 - pico-train - INFO - Step 62300 -- ๐ Training Metrics |
| 2025-08-30 09:25:11 - pico-train - INFO - โโโ Loss: 5.8227 |
| 2025-08-30 09:25:11 - pico-train - INFO - โโโ Learning Rate: 1.80e-05 |
| 2025-08-30 09:25:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:25:24 - pico-train - INFO - Step 62325 -- ๐ Training Metrics |
| 2025-08-30 09:25:24 - pico-train - INFO - โโโ Loss: 5.8460 |
| 2025-08-30 09:25:24 - pico-train - INFO - โโโ Learning Rate: 1.80e-05 |
| 2025-08-30 09:25:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:25:37 - pico-train - INFO - Step 62350 -- ๐ Training Metrics |
| 2025-08-30 09:25:37 - pico-train - INFO - โโโ Loss: 5.8503 |
| 2025-08-30 09:25:37 - pico-train - INFO - โโโ Learning Rate: 1.80e-05 |
| 2025-08-30 09:25:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:25:49 - pico-train - INFO - Step 62375 -- ๐ Training Metrics |
| 2025-08-30 09:25:49 - pico-train - INFO - โโโ Loss: 5.7188 |
| 2025-08-30 09:25:49 - pico-train - INFO - โโโ Learning Rate: 1.79e-05 |
| 2025-08-30 09:25:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:26:02 - pico-train - INFO - Step 62400 -- ๐ Training Metrics |
| 2025-08-30 09:26:02 - pico-train - INFO - โโโ Loss: 5.8399 |
| 2025-08-30 09:26:02 - pico-train - INFO - โโโ Learning Rate: 1.79e-05 |
| 2025-08-30 09:26:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:26:15 - pico-train - INFO - Step 62425 -- ๐ Training Metrics |
| 2025-08-30 09:26:15 - pico-train - INFO - โโโ Loss: 5.8522 |
| 2025-08-30 09:26:15 - pico-train - INFO - โโโ Learning Rate: 1.79e-05 |
| 2025-08-30 09:26:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:26:27 - pico-train - INFO - Step 62450 -- ๐ Training Metrics |
| 2025-08-30 09:26:27 - pico-train - INFO - โโโ Loss: 5.8175 |
| 2025-08-30 09:26:27 - pico-train - INFO - โโโ Learning Rate: 1.79e-05 |
| 2025-08-30 09:26:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:26:40 - pico-train - INFO - Step 62475 -- ๐ Training Metrics |
| 2025-08-30 09:26:40 - pico-train - INFO - โโโ Loss: 5.9304 |
| 2025-08-30 09:26:40 - pico-train - INFO - โโโ Learning Rate: 1.79e-05 |
| 2025-08-30 09:26:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:26:53 - pico-train - INFO - Step 62500 -- 💾 Saving Checkpoint |
| 2025-08-30 09:28:59 - pico-train - INFO - Step 62500 -- ๐ Evaluation Results |
| 2025-08-30 09:28:59 - pico-train - INFO - โโโ paloma: 1.026584430453375e+31 |
| 2025-08-30 09:29:02 - pico-train - INFO - Step 62500 -- ๐ Training Metrics |
| 2025-08-30 09:29:02 - pico-train - INFO - โโโ Loss: 5.9047 |
| 2025-08-30 09:29:02 - pico-train - INFO - โโโ Learning Rate: 1.78e-05 |
| 2025-08-30 09:29:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:29:02 - pico-train - INFO - Step 62500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 09:29:17 - pico-train - INFO - Step 62525 -- ๐ Training Metrics |
| 2025-08-30 09:29:17 - pico-train - INFO - โโโ Loss: 5.8436 |
| 2025-08-30 09:29:17 - pico-train - INFO - โโโ Learning Rate: 1.78e-05 |
| 2025-08-30 09:29:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:29:29 - pico-train - INFO - Step 62550 -- ๐ Training Metrics |
| 2025-08-30 09:29:29 - pico-train - INFO - โโโ Loss: 5.8456 |
| 2025-08-30 09:29:29 - pico-train - INFO - โโโ Learning Rate: 1.78e-05 |
| 2025-08-30 09:29:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:29:42 - pico-train - INFO - Step 62575 -- ๐ Training Metrics |
| 2025-08-30 09:29:42 - pico-train - INFO - โโโ Loss: 5.8538 |
| 2025-08-30 09:29:42 - pico-train - INFO - โโโ Learning Rate: 1.78e-05 |
| 2025-08-30 09:29:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:29:55 - pico-train - INFO - Step 62600 -- ๐ Training Metrics |
| 2025-08-30 09:29:55 - pico-train - INFO - โโโ Loss: 5.9303 |
| 2025-08-30 09:29:55 - pico-train - INFO - โโโ Learning Rate: 1.78e-05 |
| 2025-08-30 09:29:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:30:08 - pico-train - INFO - Step 62625 -- ๐ Training Metrics |
| 2025-08-30 09:30:08 - pico-train - INFO - โโโ Loss: 5.8303 |
| 2025-08-30 09:30:08 - pico-train - INFO - โโโ Learning Rate: 1.77e-05 |
| 2025-08-30 09:30:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:30:21 - pico-train - INFO - Step 62650 -- ๐ Training Metrics |
| 2025-08-30 09:30:21 - pico-train - INFO - โโโ Loss: 5.8259 |
| 2025-08-30 09:30:21 - pico-train - INFO - โโโ Learning Rate: 1.77e-05 |
| 2025-08-30 09:30:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:30:33 - pico-train - INFO - Step 62675 -- ๐ Training Metrics |
| 2025-08-30 09:30:33 - pico-train - INFO - โโโ Loss: 5.8603 |
| 2025-08-30 09:30:33 - pico-train - INFO - โโโ Learning Rate: 1.77e-05 |
| 2025-08-30 09:30:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:30:46 - pico-train - INFO - Step 62700 -- ๐ Training Metrics |
| 2025-08-30 09:30:46 - pico-train - INFO - โโโ Loss: 5.8287 |
| 2025-08-30 09:30:46 - pico-train - INFO - โโโ Learning Rate: 1.77e-05 |
| 2025-08-30 09:30:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:30:59 - pico-train - INFO - Step 62725 -- ๐ Training Metrics |
| 2025-08-30 09:30:59 - pico-train - INFO - โโโ Loss: 5.8268 |
| 2025-08-30 09:30:59 - pico-train - INFO - โโโ Learning Rate: 1.77e-05 |
| 2025-08-30 09:30:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:31:12 - pico-train - INFO - Step 62750 -- ๐ Training Metrics |
| 2025-08-30 09:31:12 - pico-train - INFO - โโโ Loss: 5.8671 |
| 2025-08-30 09:31:12 - pico-train - INFO - โโโ Learning Rate: 1.76e-05 |
| 2025-08-30 09:31:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:31:25 - pico-train - INFO - Step 62775 -- ๐ Training Metrics |
| 2025-08-30 09:31:25 - pico-train - INFO - โโโ Loss: 5.7714 |
| 2025-08-30 09:31:25 - pico-train - INFO - โโโ Learning Rate: 1.76e-05 |
| 2025-08-30 09:31:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:31:37 - pico-train - INFO - Step 62800 -- ๐ Training Metrics |
| 2025-08-30 09:31:37 - pico-train - INFO - โโโ Loss: 5.8034 |
| 2025-08-30 09:31:37 - pico-train - INFO - โโโ Learning Rate: 1.76e-05 |
| 2025-08-30 09:31:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:31:50 - pico-train - INFO - Step 62825 -- ๐ Training Metrics |
| 2025-08-30 09:31:50 - pico-train - INFO - โโโ Loss: 5.8833 |
| 2025-08-30 09:31:50 - pico-train - INFO - โโโ Learning Rate: 1.76e-05 |
| 2025-08-30 09:31:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:32:03 - pico-train - INFO - Step 62850 -- ๐ Training Metrics |
| 2025-08-30 09:32:03 - pico-train - INFO - โโโ Loss: 5.7885 |
| 2025-08-30 09:32:03 - pico-train - INFO - โโโ Learning Rate: 1.76e-05 |
| 2025-08-30 09:32:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:32:15 - pico-train - INFO - Step 62875 -- ๐ Training Metrics |
| 2025-08-30 09:32:15 - pico-train - INFO - โโโ Loss: 5.8884 |
| 2025-08-30 09:32:15 - pico-train - INFO - โโโ Learning Rate: 1.75e-05 |
| 2025-08-30 09:32:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:32:28 - pico-train - INFO - Step 62900 -- ๐ Training Metrics |
| 2025-08-30 09:32:28 - pico-train - INFO - โโโ Loss: 5.7919 |
| 2025-08-30 09:32:28 - pico-train - INFO - โโโ Learning Rate: 1.75e-05 |
| 2025-08-30 09:32:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:32:41 - pico-train - INFO - Step 62925 -- ๐ Training Metrics |
| 2025-08-30 09:32:41 - pico-train - INFO - โโโ Loss: 5.8612 |
| 2025-08-30 09:32:41 - pico-train - INFO - โโโ Learning Rate: 1.75e-05 |
| 2025-08-30 09:32:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:32:54 - pico-train - INFO - Step 62950 -- ๐ Training Metrics |
| 2025-08-30 09:32:54 - pico-train - INFO - โโโ Loss: 5.7049 |
| 2025-08-30 09:32:54 - pico-train - INFO - โโโ Learning Rate: 1.75e-05 |
| 2025-08-30 09:32:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:33:06 - pico-train - INFO - Step 62975 -- ๐ Training Metrics |
| 2025-08-30 09:33:06 - pico-train - INFO - โโโ Loss: 5.8447 |
| 2025-08-30 09:33:06 - pico-train - INFO - โโโ Learning Rate: 1.75e-05 |
| 2025-08-30 09:33:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:33:19 - pico-train - INFO - Step 63000 -- 💾 Saving Checkpoint |
| 2025-08-30 09:35:29 - pico-train - INFO - Step 63000 -- ๐ Evaluation Results |
| 2025-08-30 09:35:29 - pico-train - INFO - โโโ paloma: 1.053901252110863e+31 |
| 2025-08-30 09:35:31 - pico-train - INFO - Step 63000 -- ๐ Training Metrics |
| 2025-08-30 09:35:31 - pico-train - INFO - โโโ Loss: 5.8600 |
| 2025-08-30 09:35:31 - pico-train - INFO - โโโ Learning Rate: 1.74e-05 |
| 2025-08-30 09:35:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:35:31 - pico-train - INFO - Step 63000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 09:35:46 - pico-train - INFO - Step 63025 -- ๐ Training Metrics |
| 2025-08-30 09:35:46 - pico-train - INFO - โโโ Loss: 5.8323 |
| 2025-08-30 09:35:46 - pico-train - INFO - โโโ Learning Rate: 1.74e-05 |
| 2025-08-30 09:35:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:35:59 - pico-train - INFO - Step 63050 -- ๐ Training Metrics |
| 2025-08-30 09:35:59 - pico-train - INFO - โโโ Loss: 5.7825 |
| 2025-08-30 09:35:59 - pico-train - INFO - โโโ Learning Rate: 1.74e-05 |
| 2025-08-30 09:35:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:36:12 - pico-train - INFO - Step 63075 -- ๐ Training Metrics |
| 2025-08-30 09:36:12 - pico-train - INFO - โโโ Loss: 5.8469 |
| 2025-08-30 09:36:12 - pico-train - INFO - โโโ Learning Rate: 1.74e-05 |
| 2025-08-30 09:36:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:36:24 - pico-train - INFO - Step 63100 -- ๐ Training Metrics |
| 2025-08-30 09:36:24 - pico-train - INFO - โโโ Loss: 5.8636 |
| 2025-08-30 09:36:24 - pico-train - INFO - โโโ Learning Rate: 1.74e-05 |
| 2025-08-30 09:36:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:36:37 - pico-train - INFO - Step 63125 -- ๐ Training Metrics |
| 2025-08-30 09:36:37 - pico-train - INFO - โโโ Loss: 5.8131 |
| 2025-08-30 09:36:37 - pico-train - INFO - โโโ Learning Rate: 1.73e-05 |
| 2025-08-30 09:36:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:36:50 - pico-train - INFO - Step 63150 -- ๐ Training Metrics |
| 2025-08-30 09:36:50 - pico-train - INFO - โโโ Loss: 5.8570 |
| 2025-08-30 09:36:50 - pico-train - INFO - โโโ Learning Rate: 1.73e-05 |
| 2025-08-30 09:36:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:37:02 - pico-train - INFO - Step 63175 -- ๐ Training Metrics |
| 2025-08-30 09:37:02 - pico-train - INFO - โโโ Loss: 5.9120 |
| 2025-08-30 09:37:02 - pico-train - INFO - โโโ Learning Rate: 1.73e-05 |
| 2025-08-30 09:37:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:37:15 - pico-train - INFO - Step 63200 -- ๐ Training Metrics |
| 2025-08-30 09:37:15 - pico-train - INFO - โโโ Loss: 5.7894 |
| 2025-08-30 09:37:15 - pico-train - INFO - โโโ Learning Rate: 1.73e-05 |
| 2025-08-30 09:37:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:37:28 - pico-train - INFO - Step 63225 -- ๐ Training Metrics |
| 2025-08-30 09:37:28 - pico-train - INFO - โโโ Loss: 5.7796 |
| 2025-08-30 09:37:28 - pico-train - INFO - โโโ Learning Rate: 1.73e-05 |
| 2025-08-30 09:37:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:37:41 - pico-train - INFO - Step 63250 -- ๐ Training Metrics |
| 2025-08-30 09:37:41 - pico-train - INFO - โโโ Loss: 5.7788 |
| 2025-08-30 09:37:41 - pico-train - INFO - โโโ Learning Rate: 1.72e-05 |
| 2025-08-30 09:37:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 09:37:53 - pico-train - INFO - Step 63275 -- ๐ Training Metrics |
| 2025-08-30 09:37:53 - pico-train - INFO - โโโ Loss: 5.9341 |
| 2025-08-30 09:37:53 - pico-train - INFO - ├── Learning Rate: 1.72e-05 |
| 2025-08-30 09:37:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:38:06 - pico-train - INFO - Step 63300 -- Training Metrics |
| 2025-08-30 09:38:06 - pico-train - INFO - ├── Loss: 5.7428 |
| 2025-08-30 09:38:06 - pico-train - INFO - ├── Learning Rate: 1.72e-05 |
| 2025-08-30 09:38:06 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:38:19 - pico-train - INFO - Step 63325 -- Training Metrics |
| 2025-08-30 09:38:19 - pico-train - INFO - ├── Loss: 5.8475 |
| 2025-08-30 09:38:19 - pico-train - INFO - ├── Learning Rate: 1.72e-05 |
| 2025-08-30 09:38:19 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:38:31 - pico-train - INFO - Step 63350 -- Training Metrics |
| 2025-08-30 09:38:31 - pico-train - INFO - ├── Loss: 5.8675 |
| 2025-08-30 09:38:31 - pico-train - INFO - ├── Learning Rate: 1.72e-05 |
| 2025-08-30 09:38:31 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:38:44 - pico-train - INFO - Step 63375 -- Training Metrics |
| 2025-08-30 09:38:44 - pico-train - INFO - ├── Loss: 5.8387 |
| 2025-08-30 09:38:44 - pico-train - INFO - ├── Learning Rate: 1.71e-05 |
| 2025-08-30 09:38:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:38:57 - pico-train - INFO - Step 63400 -- Training Metrics |
| 2025-08-30 09:38:57 - pico-train - INFO - ├── Loss: 5.8082 |
| 2025-08-30 09:38:57 - pico-train - INFO - ├── Learning Rate: 1.71e-05 |
| 2025-08-30 09:38:57 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:39:09 - pico-train - INFO - Step 63425 -- Training Metrics |
| 2025-08-30 09:39:09 - pico-train - INFO - ├── Loss: 5.8823 |
| 2025-08-30 09:39:09 - pico-train - INFO - ├── Learning Rate: 1.71e-05 |
| 2025-08-30 09:39:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:39:22 - pico-train - INFO - Step 63450 -- Training Metrics |
| 2025-08-30 09:39:22 - pico-train - INFO - ├── Loss: 5.8131 |
| 2025-08-30 09:39:22 - pico-train - INFO - ├── Learning Rate: 1.71e-05 |
| 2025-08-30 09:39:22 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:39:35 - pico-train - INFO - Step 63475 -- Training Metrics |
| 2025-08-30 09:39:35 - pico-train - INFO - ├── Loss: 5.8368 |
| 2025-08-30 09:39:35 - pico-train - INFO - ├── Learning Rate: 1.71e-05 |
| 2025-08-30 09:39:35 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:39:48 - pico-train - INFO - Step 63500 -- 💾 Saving Checkpoint |
| 2025-08-30 09:41:56 - pico-train - INFO - Step 63500 -- Evaluation Results |
| 2025-08-30 09:41:56 - pico-train - INFO - └── paloma: 1.3798321560822609e+31 |
| 2025-08-30 09:41:59 - pico-train - INFO - Step 63500 -- Training Metrics |
| 2025-08-30 09:41:59 - pico-train - INFO - ├── Loss: 5.8774 |
| 2025-08-30 09:41:59 - pico-train - INFO - ├── Learning Rate: 1.70e-05 |
| 2025-08-30 09:41:59 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:41:59 - pico-train - INFO - Step 63500 -- Saving Learning Dynamics |
| 2025-08-30 09:42:14 - pico-train - INFO - Step 63525 -- Training Metrics |
| 2025-08-30 09:42:14 - pico-train - INFO - ├── Loss: 5.8403 |
| 2025-08-30 09:42:14 - pico-train - INFO - ├── Learning Rate: 1.70e-05 |
| 2025-08-30 09:42:14 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:42:27 - pico-train - INFO - Step 63550 -- Training Metrics |
| 2025-08-30 09:42:27 - pico-train - INFO - ├── Loss: 5.8268 |
| 2025-08-30 09:42:27 - pico-train - INFO - ├── Learning Rate: 1.70e-05 |
| 2025-08-30 09:42:27 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:42:40 - pico-train - INFO - Step 63575 -- Training Metrics |
| 2025-08-30 09:42:40 - pico-train - INFO - ├── Loss: 5.8713 |
| 2025-08-30 09:42:40 - pico-train - INFO - ├── Learning Rate: 1.70e-05 |
| 2025-08-30 09:42:40 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:42:52 - pico-train - INFO - Step 63600 -- Training Metrics |
| 2025-08-30 09:42:52 - pico-train - INFO - ├── Loss: 5.9887 |
| 2025-08-30 09:42:52 - pico-train - INFO - ├── Learning Rate: 1.70e-05 |
| 2025-08-30 09:42:52 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:43:05 - pico-train - INFO - Step 63625 -- Training Metrics |
| 2025-08-30 09:43:05 - pico-train - INFO - ├── Loss: 5.7719 |
| 2025-08-30 09:43:05 - pico-train - INFO - ├── Learning Rate: 1.69e-05 |
| 2025-08-30 09:43:05 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:43:18 - pico-train - INFO - Step 63650 -- Training Metrics |
| 2025-08-30 09:43:18 - pico-train - INFO - ├── Loss: 5.9020 |
| 2025-08-30 09:43:18 - pico-train - INFO - ├── Learning Rate: 1.69e-05 |
| 2025-08-30 09:43:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:43:30 - pico-train - INFO - Step 63675 -- Training Metrics |
| 2025-08-30 09:43:30 - pico-train - INFO - ├── Loss: 5.7964 |
| 2025-08-30 09:43:30 - pico-train - INFO - ├── Learning Rate: 1.69e-05 |
| 2025-08-30 09:43:30 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:43:43 - pico-train - INFO - Step 63700 -- Training Metrics |
| 2025-08-30 09:43:43 - pico-train - INFO - ├── Loss: 5.7920 |
| 2025-08-30 09:43:43 - pico-train - INFO - ├── Learning Rate: 1.69e-05 |
| 2025-08-30 09:43:43 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:43:55 - pico-train - INFO - Step 63725 -- Training Metrics |
| 2025-08-30 09:43:55 - pico-train - INFO - ├── Loss: 5.7781 |
| 2025-08-30 09:43:55 - pico-train - INFO - ├── Learning Rate: 1.68e-05 |
| 2025-08-30 09:43:55 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:44:08 - pico-train - INFO - Step 63750 -- Training Metrics |
| 2025-08-30 09:44:08 - pico-train - INFO - ├── Loss: 5.8701 |
| 2025-08-30 09:44:08 - pico-train - INFO - ├── Learning Rate: 1.68e-05 |
| 2025-08-30 09:44:08 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:44:21 - pico-train - INFO - Step 63775 -- Training Metrics |
| 2025-08-30 09:44:21 - pico-train - INFO - ├── Loss: 5.7957 |
| 2025-08-30 09:44:21 - pico-train - INFO - ├── Learning Rate: 1.68e-05 |
| 2025-08-30 09:44:21 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:44:33 - pico-train - INFO - Step 63800 -- Training Metrics |
| 2025-08-30 09:44:33 - pico-train - INFO - ├── Loss: 5.8493 |
| 2025-08-30 09:44:33 - pico-train - INFO - ├── Learning Rate: 1.68e-05 |
| 2025-08-30 09:44:33 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:44:46 - pico-train - INFO - Step 63825 -- Training Metrics |
| 2025-08-30 09:44:46 - pico-train - INFO - ├── Loss: 5.8591 |
| 2025-08-30 09:44:46 - pico-train - INFO - ├── Learning Rate: 1.68e-05 |
| 2025-08-30 09:44:46 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:44:59 - pico-train - INFO - Step 63850 -- Training Metrics |
| 2025-08-30 09:44:59 - pico-train - INFO - ├── Loss: 5.9283 |
| 2025-08-30 09:44:59 - pico-train - INFO - ├── Learning Rate: 1.67e-05 |
| 2025-08-30 09:44:59 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:45:12 - pico-train - INFO - Step 63875 -- Training Metrics |
| 2025-08-30 09:45:12 - pico-train - INFO - ├── Loss: 5.8760 |
| 2025-08-30 09:45:12 - pico-train - INFO - ├── Learning Rate: 1.67e-05 |
| 2025-08-30 09:45:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:45:24 - pico-train - INFO - Step 63900 -- Training Metrics |
| 2025-08-30 09:45:24 - pico-train - INFO - ├── Loss: 5.8496 |
| 2025-08-30 09:45:24 - pico-train - INFO - ├── Learning Rate: 1.67e-05 |
| 2025-08-30 09:45:24 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:45:37 - pico-train - INFO - Step 63925 -- Training Metrics |
| 2025-08-30 09:45:37 - pico-train - INFO - ├── Loss: 5.7896 |
| 2025-08-30 09:45:37 - pico-train - INFO - ├── Learning Rate: 1.67e-05 |
| 2025-08-30 09:45:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:45:50 - pico-train - INFO - Step 63950 -- Training Metrics |
| 2025-08-30 09:45:50 - pico-train - INFO - ├── Loss: 5.8621 |
| 2025-08-30 09:45:50 - pico-train - INFO - ├── Learning Rate: 1.67e-05 |
| 2025-08-30 09:45:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:46:02 - pico-train - INFO - Step 63975 -- Training Metrics |
| 2025-08-30 09:46:02 - pico-train - INFO - ├── Loss: 5.8765 |
| 2025-08-30 09:46:02 - pico-train - INFO - ├── Learning Rate: 1.66e-05 |
| 2025-08-30 09:46:02 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:46:15 - pico-train - INFO - Step 64000 -- 💾 Saving Checkpoint |
| 2025-08-30 09:48:17 - pico-train - INFO - Step 64000 -- Evaluation Results |
| 2025-08-30 09:48:17 - pico-train - INFO - └── paloma: 1.5176259204672668e+31 |
| 2025-08-30 09:48:20 - pico-train - INFO - Step 64000 -- Training Metrics |
| 2025-08-30 09:48:20 - pico-train - INFO - ├── Loss: 5.9281 |
| 2025-08-30 09:48:20 - pico-train - INFO - ├── Learning Rate: 1.66e-05 |
| 2025-08-30 09:48:20 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:48:20 - pico-train - INFO - Step 64000 -- Saving Learning Dynamics |
| 2025-08-30 09:48:35 - pico-train - INFO - Step 64025 -- Training Metrics |
| 2025-08-30 09:48:35 - pico-train - INFO - ├── Loss: 5.8790 |
| 2025-08-30 09:48:35 - pico-train - INFO - ├── Learning Rate: 1.66e-05 |
| 2025-08-30 09:48:35 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:48:47 - pico-train - INFO - Step 64050 -- Training Metrics |
| 2025-08-30 09:48:47 - pico-train - INFO - ├── Loss: 5.8652 |
| 2025-08-30 09:48:47 - pico-train - INFO - ├── Learning Rate: 1.66e-05 |
| 2025-08-30 09:48:47 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:49:00 - pico-train - INFO - Step 64075 -- Training Metrics |
| 2025-08-30 09:49:00 - pico-train - INFO - ├── Loss: 5.8631 |
| 2025-08-30 09:49:00 - pico-train - INFO - ├── Learning Rate: 1.66e-05 |
| 2025-08-30 09:49:00 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:49:12 - pico-train - INFO - Step 64100 -- Training Metrics |
| 2025-08-30 09:49:12 - pico-train - INFO - ├── Loss: 5.8123 |
| 2025-08-30 09:49:12 - pico-train - INFO - ├── Learning Rate: 1.65e-05 |
| 2025-08-30 09:49:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:49:25 - pico-train - INFO - Step 64125 -- Training Metrics |
| 2025-08-30 09:49:25 - pico-train - INFO - ├── Loss: 5.8136 |
| 2025-08-30 09:49:25 - pico-train - INFO - ├── Learning Rate: 1.65e-05 |
| 2025-08-30 09:49:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:49:38 - pico-train - INFO - Step 64150 -- Training Metrics |
| 2025-08-30 09:49:38 - pico-train - INFO - ├── Loss: 5.8727 |
| 2025-08-30 09:49:38 - pico-train - INFO - ├── Learning Rate: 1.65e-05 |
| 2025-08-30 09:49:38 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:49:50 - pico-train - INFO - Step 64175 -- Training Metrics |
| 2025-08-30 09:49:50 - pico-train - INFO - ├── Loss: 5.8386 |
| 2025-08-30 09:49:50 - pico-train - INFO - ├── Learning Rate: 1.65e-05 |
| 2025-08-30 09:49:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:50:03 - pico-train - INFO - Step 64200 -- Training Metrics |
| 2025-08-30 09:50:03 - pico-train - INFO - ├── Loss: 5.8189 |
| 2025-08-30 09:50:03 - pico-train - INFO - ├── Learning Rate: 1.65e-05 |
| 2025-08-30 09:50:03 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:50:16 - pico-train - INFO - Step 64225 -- Training Metrics |
| 2025-08-30 09:50:16 - pico-train - INFO - ├── Loss: 5.8936 |
| 2025-08-30 09:50:16 - pico-train - INFO - ├── Learning Rate: 1.64e-05 |
| 2025-08-30 09:50:16 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:50:29 - pico-train - INFO - Step 64250 -- Training Metrics |
| 2025-08-30 09:50:29 - pico-train - INFO - ├── Loss: 5.8517 |
| 2025-08-30 09:50:29 - pico-train - INFO - ├── Learning Rate: 1.64e-05 |
| 2025-08-30 09:50:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:50:41 - pico-train - INFO - Step 64275 -- Training Metrics |
| 2025-08-30 09:50:41 - pico-train - INFO - ├── Loss: 5.9134 |
| 2025-08-30 09:50:41 - pico-train - INFO - ├── Learning Rate: 1.64e-05 |
| 2025-08-30 09:50:41 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:50:54 - pico-train - INFO - Step 64300 -- Training Metrics |
| 2025-08-30 09:50:54 - pico-train - INFO - ├── Loss: 5.8338 |
| 2025-08-30 09:50:54 - pico-train - INFO - ├── Learning Rate: 1.64e-05 |
| 2025-08-30 09:50:54 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:51:07 - pico-train - INFO - Step 64325 -- Training Metrics |
| 2025-08-30 09:51:07 - pico-train - INFO - ├── Loss: 5.9309 |
| 2025-08-30 09:51:07 - pico-train - INFO - ├── Learning Rate: 1.64e-05 |
| 2025-08-30 09:51:07 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:51:19 - pico-train - INFO - Step 64350 -- Training Metrics |
| 2025-08-30 09:51:19 - pico-train - INFO - ├── Loss: 5.8091 |
| 2025-08-30 09:51:19 - pico-train - INFO - ├── Learning Rate: 1.63e-05 |
| 2025-08-30 09:51:19 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:51:32 - pico-train - INFO - Step 64375 -- Training Metrics |
| 2025-08-30 09:51:32 - pico-train - INFO - ├── Loss: 5.8666 |
| 2025-08-30 09:51:32 - pico-train - INFO - ├── Learning Rate: 1.63e-05 |
| 2025-08-30 09:51:32 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:51:44 - pico-train - INFO - Step 64400 -- Training Metrics |
| 2025-08-30 09:51:44 - pico-train - INFO - ├── Loss: 5.7732 |
| 2025-08-30 09:51:44 - pico-train - INFO - ├── Learning Rate: 1.63e-05 |
| 2025-08-30 09:51:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:51:57 - pico-train - INFO - Step 64425 -- Training Metrics |
| 2025-08-30 09:51:57 - pico-train - INFO - ├── Loss: 5.8354 |
| 2025-08-30 09:51:57 - pico-train - INFO - ├── Learning Rate: 1.63e-05 |
| 2025-08-30 09:51:57 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:52:10 - pico-train - INFO - Step 64450 -- Training Metrics |
| 2025-08-30 09:52:10 - pico-train - INFO - ├── Loss: 5.8674 |
| 2025-08-30 09:52:10 - pico-train - INFO - ├── Learning Rate: 1.63e-05 |
| 2025-08-30 09:52:10 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:52:23 - pico-train - INFO - Step 64475 -- Training Metrics |
| 2025-08-30 09:52:23 - pico-train - INFO - ├── Loss: 5.8365 |
| 2025-08-30 09:52:23 - pico-train - INFO - ├── Learning Rate: 1.62e-05 |
| 2025-08-30 09:52:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:52:36 - pico-train - INFO - Step 64500 -- 💾 Saving Checkpoint |
| 2025-08-30 09:54:35 - pico-train - INFO - Step 64500 -- Evaluation Results |
| 2025-08-30 09:54:35 - pico-train - INFO - └── paloma: 1.7413715227937596e+31 |
| 2025-08-30 09:54:37 - pico-train - INFO - Step 64500 -- Training Metrics |
| 2025-08-30 09:54:37 - pico-train - INFO - ├── Loss: 5.7904 |
| 2025-08-30 09:54:37 - pico-train - INFO - ├── Learning Rate: 1.62e-05 |
| 2025-08-30 09:54:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:54:37 - pico-train - INFO - Step 64500 -- Saving Learning Dynamics |
| 2025-08-30 09:54:52 - pico-train - INFO - Step 64525 -- Training Metrics |
| 2025-08-30 09:54:52 - pico-train - INFO - ├── Loss: 5.7861 |
| 2025-08-30 09:54:52 - pico-train - INFO - ├── Learning Rate: 1.62e-05 |
| 2025-08-30 09:54:52 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:55:05 - pico-train - INFO - Step 64550 -- Training Metrics |
| 2025-08-30 09:55:05 - pico-train - INFO - ├── Loss: 5.7797 |
| 2025-08-30 09:55:05 - pico-train - INFO - ├── Learning Rate: 1.62e-05 |
| 2025-08-30 09:55:05 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:55:18 - pico-train - INFO - Step 64575 -- Training Metrics |
| 2025-08-30 09:55:18 - pico-train - INFO - ├── Loss: 5.7777 |
| 2025-08-30 09:55:18 - pico-train - INFO - ├── Learning Rate: 1.62e-05 |
| 2025-08-30 09:55:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:55:30 - pico-train - INFO - Step 64600 -- Training Metrics |
| 2025-08-30 09:55:30 - pico-train - INFO - ├── Loss: 5.8649 |
| 2025-08-30 09:55:30 - pico-train - INFO - ├── Learning Rate: 1.61e-05 |
| 2025-08-30 09:55:30 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:55:43 - pico-train - INFO - Step 64625 -- Training Metrics |
| 2025-08-30 09:55:43 - pico-train - INFO - ├── Loss: 5.8215 |
| 2025-08-30 09:55:43 - pico-train - INFO - ├── Learning Rate: 1.61e-05 |
| 2025-08-30 09:55:43 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:55:56 - pico-train - INFO - Step 64650 -- Training Metrics |
| 2025-08-30 09:55:56 - pico-train - INFO - ├── Loss: 5.8024 |
| 2025-08-30 09:55:56 - pico-train - INFO - ├── Learning Rate: 1.61e-05 |
| 2025-08-30 09:55:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:56:08 - pico-train - INFO - Step 64675 -- Training Metrics |
| 2025-08-30 09:56:08 - pico-train - INFO - ├── Loss: 5.8857 |
| 2025-08-30 09:56:08 - pico-train - INFO - ├── Learning Rate: 1.61e-05 |
| 2025-08-30 09:56:08 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:56:21 - pico-train - INFO - Step 64700 -- Training Metrics |
| 2025-08-30 09:56:21 - pico-train - INFO - ├── Loss: 5.7671 |
| 2025-08-30 09:56:21 - pico-train - INFO - ├── Learning Rate: 1.61e-05 |
| 2025-08-30 09:56:21 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:56:34 - pico-train - INFO - Step 64725 -- Training Metrics |
| 2025-08-30 09:56:34 - pico-train - INFO - ├── Loss: 5.8027 |
| 2025-08-30 09:56:34 - pico-train - INFO - ├── Learning Rate: 1.60e-05 |
| 2025-08-30 09:56:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:56:47 - pico-train - INFO - Step 64750 -- Training Metrics |
| 2025-08-30 09:56:47 - pico-train - INFO - ├── Loss: 5.8995 |
| 2025-08-30 09:56:47 - pico-train - INFO - ├── Learning Rate: 1.60e-05 |
| 2025-08-30 09:56:47 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:56:59 - pico-train - INFO - Step 64775 -- Training Metrics |
| 2025-08-30 09:56:59 - pico-train - INFO - ├── Loss: 5.7634 |
| 2025-08-30 09:56:59 - pico-train - INFO - ├── Learning Rate: 1.60e-05 |
| 2025-08-30 09:56:59 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:57:12 - pico-train - INFO - Step 64800 -- Training Metrics |
| 2025-08-30 09:57:12 - pico-train - INFO - ├── Loss: 5.8010 |
| 2025-08-30 09:57:12 - pico-train - INFO - ├── Learning Rate: 1.60e-05 |
| 2025-08-30 09:57:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:57:25 - pico-train - INFO - Step 64825 -- Training Metrics |
| 2025-08-30 09:57:25 - pico-train - INFO - ├── Loss: 5.7916 |
| 2025-08-30 09:57:25 - pico-train - INFO - ├── Learning Rate: 1.60e-05 |
| 2025-08-30 09:57:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:57:38 - pico-train - INFO - Step 64850 -- Training Metrics |
| 2025-08-30 09:57:38 - pico-train - INFO - ├── Loss: 5.7833 |
| 2025-08-30 09:57:38 - pico-train - INFO - ├── Learning Rate: 1.59e-05 |
| 2025-08-30 09:57:38 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:57:50 - pico-train - INFO - Step 64875 -- Training Metrics |
| 2025-08-30 09:57:50 - pico-train - INFO - ├── Loss: 5.8170 |
| 2025-08-30 09:57:50 - pico-train - INFO - ├── Learning Rate: 1.59e-05 |
| 2025-08-30 09:57:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:58:03 - pico-train - INFO - Step 64900 -- Training Metrics |
| 2025-08-30 09:58:03 - pico-train - INFO - ├── Loss: 5.8529 |
| 2025-08-30 09:58:03 - pico-train - INFO - ├── Learning Rate: 1.59e-05 |
| 2025-08-30 09:58:03 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:58:16 - pico-train - INFO - Step 64925 -- Training Metrics |
| 2025-08-30 09:58:16 - pico-train - INFO - ├── Loss: 5.8294 |
| 2025-08-30 09:58:16 - pico-train - INFO - ├── Learning Rate: 1.59e-05 |
| 2025-08-30 09:58:16 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:58:28 - pico-train - INFO - Step 64950 -- Training Metrics |
| 2025-08-30 09:58:28 - pico-train - INFO - ├── Loss: 5.8264 |
| 2025-08-30 09:58:28 - pico-train - INFO - ├── Learning Rate: 1.59e-05 |
| 2025-08-30 09:58:28 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:58:41 - pico-train - INFO - Step 64975 -- Training Metrics |
| 2025-08-30 09:58:41 - pico-train - INFO - ├── Loss: 5.7959 |
| 2025-08-30 09:58:41 - pico-train - INFO - ├── Learning Rate: 1.58e-05 |
| 2025-08-30 09:58:41 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 09:58:54 - pico-train - INFO - Step 65000 -- 💾 Saving Checkpoint |
| 2025-08-30 10:00:52 - pico-train - INFO - Step 65000 -- Evaluation Results |
| 2025-08-30 10:00:52 - pico-train - INFO - └── paloma: 1.9165716287373382e+31 |
| 2025-08-30 10:00:55 - pico-train - INFO - Step 65000 -- Training Metrics |
| 2025-08-30 10:00:55 - pico-train - INFO - ├── Loss: 5.8632 |
| 2025-08-30 10:00:55 - pico-train - INFO - ├── Learning Rate: 1.58e-05 |
| 2025-08-30 10:00:55 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:00:55 - pico-train - INFO - Step 65000 -- Saving Learning Dynamics |
| 2025-08-30 10:01:10 - pico-train - INFO - Step 65025 -- Training Metrics |
| 2025-08-30 10:01:10 - pico-train - INFO - ├── Loss: 5.8177 |
| 2025-08-30 10:01:10 - pico-train - INFO - ├── Learning Rate: 1.58e-05 |
| 2025-08-30 10:01:10 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:01:23 - pico-train - INFO - Step 65050 -- Training Metrics |
| 2025-08-30 10:01:23 - pico-train - INFO - ├── Loss: 5.7954 |
| 2025-08-30 10:01:23 - pico-train - INFO - ├── Learning Rate: 1.58e-05 |
| 2025-08-30 10:01:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:01:35 - pico-train - INFO - Step 65075 -- Training Metrics |
| 2025-08-30 10:01:35 - pico-train - INFO - ├── Loss: 5.7900 |
| 2025-08-30 10:01:35 - pico-train - INFO - ├── Learning Rate: 1.58e-05 |
| 2025-08-30 10:01:35 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:01:48 - pico-train - INFO - Step 65100 -- Training Metrics |
| 2025-08-30 10:01:48 - pico-train - INFO - ├── Loss: 5.8748 |
| 2025-08-30 10:01:48 - pico-train - INFO - ├── Learning Rate: 1.57e-05 |
| 2025-08-30 10:01:48 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:02:00 - pico-train - INFO - Step 65125 -- Training Metrics |
| 2025-08-30 10:02:00 - pico-train - INFO - ├── Loss: 5.8848 |
| 2025-08-30 10:02:00 - pico-train - INFO - ├── Learning Rate: 1.57e-05 |
| 2025-08-30 10:02:00 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:02:13 - pico-train - INFO - Step 65150 -- Training Metrics |
| 2025-08-30 10:02:13 - pico-train - INFO - ├── Loss: 5.8230 |
| 2025-08-30 10:02:13 - pico-train - INFO - ├── Learning Rate: 1.57e-05 |
| 2025-08-30 10:02:13 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:02:26 - pico-train - INFO - Step 65175 -- Training Metrics |
| 2025-08-30 10:02:26 - pico-train - INFO - ├── Loss: 5.8187 |
| 2025-08-30 10:02:26 - pico-train - INFO - ├── Learning Rate: 1.57e-05 |
| 2025-08-30 10:02:26 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:02:38 - pico-train - INFO - Step 65200 -- Training Metrics |
| 2025-08-30 10:02:38 - pico-train - INFO - ├── Loss: 5.7594 |
| 2025-08-30 10:02:38 - pico-train - INFO - ├── Learning Rate: 1.57e-05 |
| 2025-08-30 10:02:38 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:02:51 - pico-train - INFO - Step 65225 -- Training Metrics |
| 2025-08-30 10:02:51 - pico-train - INFO - ├── Loss: 5.8269 |
| 2025-08-30 10:02:51 - pico-train - INFO - ├── Learning Rate: 1.57e-05 |
| 2025-08-30 10:02:51 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:03:03 - pico-train - INFO - Step 65250 -- Training Metrics |
| 2025-08-30 10:03:03 - pico-train - INFO - ├── Loss: 5.8085 |
| 2025-08-30 10:03:03 - pico-train - INFO - ├── Learning Rate: 1.56e-05 |
| 2025-08-30 10:03:03 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:03:16 - pico-train - INFO - Step 65275 -- Training Metrics |
| 2025-08-30 10:03:16 - pico-train - INFO - ├── Loss: 5.7563 |
| 2025-08-30 10:03:16 - pico-train - INFO - ├── Learning Rate: 1.56e-05 |
| 2025-08-30 10:03:16 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:03:29 - pico-train - INFO - Step 65300 -- Training Metrics |
| 2025-08-30 10:03:29 - pico-train - INFO - ├── Loss: 5.8133 |
| 2025-08-30 10:03:29 - pico-train - INFO - ├── Learning Rate: 1.56e-05 |
| 2025-08-30 10:03:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:03:41 - pico-train - INFO - Step 65325 -- Training Metrics |
| 2025-08-30 10:03:41 - pico-train - INFO - ├── Loss: 5.8193 |
| 2025-08-30 10:03:41 - pico-train - INFO - ├── Learning Rate: 1.56e-05 |
| 2025-08-30 10:03:41 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:03:54 - pico-train - INFO - Step 65350 -- Training Metrics |
| 2025-08-30 10:03:54 - pico-train - INFO - ├── Loss: 5.8060 |
| 2025-08-30 10:03:54 - pico-train - INFO - ├── Learning Rate: 1.56e-05 |
| 2025-08-30 10:03:54 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:04:06 - pico-train - INFO - Step 65375 -- Training Metrics |
| 2025-08-30 10:04:06 - pico-train - INFO - ├── Loss: 5.8249 |
| 2025-08-30 10:04:06 - pico-train - INFO - ├── Learning Rate: 1.55e-05 |
| 2025-08-30 10:04:06 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:04:19 - pico-train - INFO - Step 65400 -- Training Metrics |
| 2025-08-30 10:04:19 - pico-train - INFO - ├── Loss: 5.8455 |
| 2025-08-30 10:04:19 - pico-train - INFO - ├── Learning Rate: 1.55e-05 |
| 2025-08-30 10:04:19 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:04:32 - pico-train - INFO - Step 65425 -- Training Metrics |
| 2025-08-30 10:04:32 - pico-train - INFO - ├── Loss: 5.8625 |
| 2025-08-30 10:04:32 - pico-train - INFO - ├── Learning Rate: 1.55e-05 |
| 2025-08-30 10:04:32 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:04:44 - pico-train - INFO - Step 65450 -- Training Metrics |
| 2025-08-30 10:04:44 - pico-train - INFO - ├── Loss: 5.8366 |
| 2025-08-30 10:04:44 - pico-train - INFO - ├── Learning Rate: 1.55e-05 |
| 2025-08-30 10:04:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:04:57 - pico-train - INFO - Step 65475 -- Training Metrics |
| 2025-08-30 10:04:57 - pico-train - INFO - ├── Loss: 5.8005 |
| 2025-08-30 10:04:57 - pico-train - INFO - ├── Learning Rate: 1.55e-05 |
| 2025-08-30 10:04:57 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:05:09 - pico-train - INFO - Step 65500 -- 💾 Saving Checkpoint |
| 2025-08-30 10:07:04 - pico-train - INFO - Step 65500 -- Evaluation Results |
| 2025-08-30 10:07:04 - pico-train - INFO - └── paloma: 1.8707850216569984e+31 |
| 2025-08-30 10:07:06 - pico-train - INFO - Step 65500 -- Training Metrics |
| 2025-08-30 10:07:06 - pico-train - INFO - ├── Loss: 5.8969 |
| 2025-08-30 10:07:06 - pico-train - INFO - ├── Learning Rate: 1.54e-05 |
| 2025-08-30 10:07:06 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:07:06 - pico-train - INFO - Step 65500 -- Saving Learning Dynamics |
| 2025-08-30 10:07:21 - pico-train - INFO - Step 65525 -- Training Metrics |
| 2025-08-30 10:07:21 - pico-train - INFO - ├── Loss: 5.8361 |
| 2025-08-30 10:07:21 - pico-train - INFO - ├── Learning Rate: 1.54e-05 |
| 2025-08-30 10:07:21 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:07:33 - pico-train - INFO - Step 65550 -- Training Metrics |
| 2025-08-30 10:07:33 - pico-train - INFO - ├── Loss: 5.8304 |
| 2025-08-30 10:07:33 - pico-train - INFO - ├── Learning Rate: 1.54e-05 |
| 2025-08-30 10:07:33 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:07:46 - pico-train - INFO - Step 65575 -- Training Metrics |
| 2025-08-30 10:07:46 - pico-train - INFO - ├── Loss: 5.8668 |
| 2025-08-30 10:07:46 - pico-train - INFO - ├── Learning Rate: 1.54e-05 |
| 2025-08-30 10:07:46 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:07:59 - pico-train - INFO - Step 65600 -- Training Metrics |
| 2025-08-30 10:07:59 - pico-train - INFO - ├── Loss: 5.8797 |
| 2025-08-30 10:07:59 - pico-train - INFO - ├── Learning Rate: 1.54e-05 |
| 2025-08-30 10:07:59 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:08:12 - pico-train - INFO - Step 65625 -- Training Metrics |
| 2025-08-30 10:08:12 - pico-train - INFO - ├── Loss: 5.8747 |
| 2025-08-30 10:08:12 - pico-train - INFO - ├── Learning Rate: 1.53e-05 |
| 2025-08-30 10:08:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:08:25 - pico-train - INFO - Step 65650 -- Training Metrics |
| 2025-08-30 10:08:25 - pico-train - INFO - ├── Loss: 5.8350 |
| 2025-08-30 10:08:25 - pico-train - INFO - ├── Learning Rate: 1.53e-05 |
| 2025-08-30 10:08:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:08:37 - pico-train - INFO - Step 65675 -- Training Metrics |
| 2025-08-30 10:08:37 - pico-train - INFO - ├── Loss: 5.8606 |
| 2025-08-30 10:08:37 - pico-train - INFO - ├── Learning Rate: 1.53e-05 |
| 2025-08-30 10:08:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:08:50 - pico-train - INFO - Step 65700 -- Training Metrics |
| 2025-08-30 10:08:50 - pico-train - INFO - ├── Loss: 5.8106 |
| 2025-08-30 10:08:50 - pico-train - INFO - ├── Learning Rate: 1.53e-05 |
| 2025-08-30 10:08:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:09:02 - pico-train - INFO - Step 65725 -- Training Metrics |
| 2025-08-30 10:09:02 - pico-train - INFO - ├── Loss: 5.9222 |
| 2025-08-30 10:09:02 - pico-train - INFO - ├── Learning Rate: 1.53e-05 |
| 2025-08-30 10:09:02 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:09:15 - pico-train - INFO - Step 65750 -- Training Metrics |
| 2025-08-30 10:09:15 - pico-train - INFO - ├── Loss: 5.8246 |
| 2025-08-30 10:09:15 - pico-train - INFO - ├── Learning Rate: 1.52e-05 |
| 2025-08-30 10:09:15 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:09:27 - pico-train - INFO - Step 65775 -- Training Metrics |
| 2025-08-30 10:09:27 - pico-train - INFO - ├── Loss: 5.8507 |
| 2025-08-30 10:09:27 - pico-train - INFO - ├── Learning Rate: 1.52e-05 |
| 2025-08-30 10:09:27 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:09:40 - pico-train - INFO - Step 65800 -- Training Metrics |
| 2025-08-30 10:09:40 - pico-train - INFO - ├── Loss: 5.8379 |
| 2025-08-30 10:09:40 - pico-train - INFO - ├── Learning Rate: 1.52e-05 |
| 2025-08-30 10:09:40 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:09:53 - pico-train - INFO - Step 65825 -- Training Metrics |
| 2025-08-30 10:09:53 - pico-train - INFO - ├── Loss: 5.8610 |
| 2025-08-30 10:09:53 - pico-train - INFO - ├── Learning Rate: 1.52e-05 |
| 2025-08-30 10:09:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:10:05 - pico-train - INFO - Step 65850 -- Training Metrics |
| 2025-08-30 10:10:05 - pico-train - INFO - ├── Loss: 5.8496 |
| 2025-08-30 10:10:05 - pico-train - INFO - ├── Learning Rate: 1.52e-05 |
| 2025-08-30 10:10:05 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:10:18 - pico-train - INFO - Step 65875 -- Training Metrics |
| 2025-08-30 10:10:18 - pico-train - INFO - ├── Loss: 5.8066 |
| 2025-08-30 10:10:18 - pico-train - INFO - ├── Learning Rate: 1.51e-05 |
| 2025-08-30 10:10:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:10:30 - pico-train - INFO - Step 65900 -- Training Metrics |
| 2025-08-30 10:10:30 - pico-train - INFO - ├── Loss: 5.8117 |
| 2025-08-30 10:10:30 - pico-train - INFO - ├── Learning Rate: 1.51e-05 |
| 2025-08-30 10:10:30 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:10:43 - pico-train - INFO - Step 65925 -- Training Metrics |
| 2025-08-30 10:10:43 - pico-train - INFO - ├── Loss: 5.7019 |
| 2025-08-30 10:10:43 - pico-train - INFO - ├── Learning Rate: 1.51e-05 |
| 2025-08-30 10:10:43 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:10:56 - pico-train - INFO - Step 65950 -- Training Metrics |
| 2025-08-30 10:10:56 - pico-train - INFO - ├── Loss: 5.8699 |
| 2025-08-30 10:10:56 - pico-train - INFO - ├── Learning Rate: 1.51e-05 |
| 2025-08-30 10:10:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:11:08 - pico-train - INFO - Step 65975 -- Training Metrics |
| 2025-08-30 10:11:08 - pico-train - INFO - ├── Loss: 5.8359 |
| 2025-08-30 10:11:08 - pico-train - INFO - ├── Learning Rate: 1.51e-05 |
| 2025-08-30 10:11:08 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:11:20 - pico-train - INFO - Step 66000 -- 💾 Saving Checkpoint |
| 2025-08-30 10:13:16 - pico-train - INFO - Step 66000 -- Evaluation Results |
| 2025-08-30 10:13:16 - pico-train - INFO - └── paloma: 2.5231045927678714e+31 |
| 2025-08-30 10:13:18 - pico-train - INFO - Step 66000 -- Training Metrics |
| 2025-08-30 10:13:18 - pico-train - INFO - ├── Loss: 5.8326 |
| 2025-08-30 10:13:18 - pico-train - INFO - ├── Learning Rate: 1.50e-05 |
| 2025-08-30 10:13:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:13:18 - pico-train - INFO - Step 66000 -- Saving Learning Dynamics |
| 2025-08-30 10:13:33 - pico-train - INFO - Step 66025 -- Training Metrics |
| 2025-08-30 10:13:33 - pico-train - INFO - ├── Loss: 5.7993 |
| 2025-08-30 10:13:33 - pico-train - INFO - ├── Learning Rate: 1.50e-05 |
| 2025-08-30 10:13:33 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:13:45 - pico-train - INFO - Step 66050 -- Training Metrics |
| 2025-08-30 10:13:45 - pico-train - INFO - ├── Loss: 5.7906 |
| 2025-08-30 10:13:45 - pico-train - INFO - ├── Learning Rate: 1.50e-05 |
| 2025-08-30 10:13:45 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:13:58 - pico-train - INFO - Step 66075 -- Training Metrics |
| 2025-08-30 10:13:58 - pico-train - INFO - ├── Loss: 5.8668 |
| 2025-08-30 10:13:58 - pico-train - INFO - ├── Learning Rate: 1.50e-05 |
| 2025-08-30 10:13:58 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:14:11 - pico-train - INFO - Step 66100 -- Training Metrics |
| 2025-08-30 10:14:11 - pico-train - INFO - ├── Loss: 5.7929 |
| 2025-08-30 10:14:11 - pico-train - INFO - ├── Learning Rate: 1.50e-05 |
| 2025-08-30 10:14:11 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:14:24 - pico-train - INFO - Step 66125 -- Training Metrics |
| 2025-08-30 10:14:24 - pico-train - INFO - ├── Loss: 5.8483 |
| 2025-08-30 10:14:24 - pico-train - INFO - ├── Learning Rate: 1.49e-05 |
| 2025-08-30 10:14:24 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:14:37 - pico-train - INFO - Step 66150 -- Training Metrics |
| 2025-08-30 10:14:37 - pico-train - INFO - ├── Loss: 5.8747 |
| 2025-08-30 10:14:37 - pico-train - INFO - ├── Learning Rate: 1.49e-05 |
| 2025-08-30 10:14:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:14:49 - pico-train - INFO - Step 66175 -- Training Metrics |
| 2025-08-30 10:14:49 - pico-train - INFO - ├── Loss: 5.7636 |
| 2025-08-30 10:14:49 - pico-train - INFO - ├── Learning Rate: 1.49e-05 |
| 2025-08-30 10:14:49 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:15:02 - pico-train - INFO - Step 66200 -- Training Metrics |
| 2025-08-30 10:15:02 - pico-train - INFO - ├── Loss: 5.6910 |
| 2025-08-30 10:15:02 - pico-train - INFO - ├── Learning Rate: 1.49e-05 |
| 2025-08-30 10:15:02 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:15:14 - pico-train - INFO - Step 66225 -- Training Metrics |
| 2025-08-30 10:15:14 - pico-train - INFO - ├── Loss: 5.7696 |
| 2025-08-30 10:15:14 - pico-train - INFO - ├── Learning Rate: 1.49e-05 |
| 2025-08-30 10:15:14 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:15:27 - pico-train - INFO - Step 66250 -- Training Metrics |
| 2025-08-30 10:15:27 - pico-train - INFO - ├── Loss: 5.8958 |
| 2025-08-30 10:15:27 - pico-train - INFO - ├── Learning Rate: 1.48e-05 |
| 2025-08-30 10:15:27 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:15:40 - pico-train - INFO - Step 66275 -- Training Metrics |
| 2025-08-30 10:15:40 - pico-train - INFO - ├── Loss: 5.8720 |
| 2025-08-30 10:15:40 - pico-train - INFO - ├── Learning Rate: 1.48e-05 |
| 2025-08-30 10:15:40 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 10:15:53 - pico-train - INFO - Step 66300 -- Training Metrics |
| 2025-08-30 10:15:53 - pico-train - INFO - โโโ Loss: 5.7927 |
| 2025-08-30 10:15:53 - pico-train - INFO - โโโ Learning Rate: 1.48e-05 |
| 2025-08-30 10:15:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:16:05 - pico-train - INFO - Step 66325 -- ๐ Training Metrics |
| 2025-08-30 10:16:05 - pico-train - INFO - โโโ Loss: 5.7417 |
| 2025-08-30 10:16:05 - pico-train - INFO - โโโ Learning Rate: 1.48e-05 |
| 2025-08-30 10:16:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:16:18 - pico-train - INFO - Step 66350 -- ๐ Training Metrics |
| 2025-08-30 10:16:18 - pico-train - INFO - โโโ Loss: 5.7908 |
| 2025-08-30 10:16:18 - pico-train - INFO - โโโ Learning Rate: 1.48e-05 |
| 2025-08-30 10:16:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:16:31 - pico-train - INFO - Step 66375 -- ๐ Training Metrics |
| 2025-08-30 10:16:31 - pico-train - INFO - โโโ Loss: 5.8609 |
| 2025-08-30 10:16:31 - pico-train - INFO - โโโ Learning Rate: 1.47e-05 |
| 2025-08-30 10:16:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:16:43 - pico-train - INFO - Step 66400 -- ๐ Training Metrics |
| 2025-08-30 10:16:43 - pico-train - INFO - โโโ Loss: 5.7846 |
| 2025-08-30 10:16:43 - pico-train - INFO - โโโ Learning Rate: 1.47e-05 |
| 2025-08-30 10:16:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:16:56 - pico-train - INFO - Step 66425 -- ๐ Training Metrics |
| 2025-08-30 10:16:56 - pico-train - INFO - โโโ Loss: 5.7744 |
| 2025-08-30 10:16:56 - pico-train - INFO - โโโ Learning Rate: 1.47e-05 |
| 2025-08-30 10:16:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:17:09 - pico-train - INFO - Step 66450 -- ๐ Training Metrics |
| 2025-08-30 10:17:09 - pico-train - INFO - โโโ Loss: 5.7639 |
| 2025-08-30 10:17:09 - pico-train - INFO - โโโ Learning Rate: 1.47e-05 |
| 2025-08-30 10:17:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:17:22 - pico-train - INFO - Step 66475 -- ๐ Training Metrics |
| 2025-08-30 10:17:22 - pico-train - INFO - โโโ Loss: 5.8572 |
| 2025-08-30 10:17:22 - pico-train - INFO - โโโ Learning Rate: 1.47e-05 |
| 2025-08-30 10:17:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:17:34 - pico-train - INFO - Step 66500 -- 💾 Saving Checkpoint |
| 2025-08-30 10:19:32 - pico-train - INFO - Step 66500 -- ๐ Evaluation Results |
| 2025-08-30 10:19:32 - pico-train - INFO - โโโ paloma: 2.557649624835569e+31 |
| 2025-08-30 10:19:34 - pico-train - INFO - Step 66500 -- ๐ Training Metrics |
| 2025-08-30 10:19:34 - pico-train - INFO - โโโ Loss: 5.7731 |
| 2025-08-30 10:19:34 - pico-train - INFO - โโโ Learning Rate: 1.46e-05 |
| 2025-08-30 10:19:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:19:34 - pico-train - INFO - Step 66500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 10:19:49 - pico-train - INFO - Step 66525 -- ๐ Training Metrics |
| 2025-08-30 10:19:49 - pico-train - INFO - โโโ Loss: 5.8698 |
| 2025-08-30 10:19:49 - pico-train - INFO - โโโ Learning Rate: 1.46e-05 |
| 2025-08-30 10:19:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:20:01 - pico-train - INFO - Step 66550 -- ๐ Training Metrics |
| 2025-08-30 10:20:01 - pico-train - INFO - โโโ Loss: 5.7763 |
| 2025-08-30 10:20:01 - pico-train - INFO - โโโ Learning Rate: 1.46e-05 |
| 2025-08-30 10:20:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:20:14 - pico-train - INFO - Step 66575 -- ๐ Training Metrics |
| 2025-08-30 10:20:14 - pico-train - INFO - โโโ Loss: 5.7793 |
| 2025-08-30 10:20:14 - pico-train - INFO - โโโ Learning Rate: 1.46e-05 |
| 2025-08-30 10:20:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:20:27 - pico-train - INFO - Step 66600 -- ๐ Training Metrics |
| 2025-08-30 10:20:27 - pico-train - INFO - โโโ Loss: 5.8998 |
| 2025-08-30 10:20:27 - pico-train - INFO - โโโ Learning Rate: 1.46e-05 |
| 2025-08-30 10:20:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:20:40 - pico-train - INFO - Step 66625 -- ๐ Training Metrics |
| 2025-08-30 10:20:40 - pico-train - INFO - โโโ Loss: 5.8772 |
| 2025-08-30 10:20:40 - pico-train - INFO - โโโ Learning Rate: 1.46e-05 |
| 2025-08-30 10:20:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:20:53 - pico-train - INFO - Step 66650 -- ๐ Training Metrics |
| 2025-08-30 10:20:53 - pico-train - INFO - โโโ Loss: 5.7580 |
| 2025-08-30 10:20:53 - pico-train - INFO - โโโ Learning Rate: 1.45e-05 |
| 2025-08-30 10:20:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:21:05 - pico-train - INFO - Step 66675 -- ๐ Training Metrics |
| 2025-08-30 10:21:05 - pico-train - INFO - โโโ Loss: 5.8102 |
| 2025-08-30 10:21:05 - pico-train - INFO - โโโ Learning Rate: 1.45e-05 |
| 2025-08-30 10:21:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:21:18 - pico-train - INFO - Step 66700 -- ๐ Training Metrics |
| 2025-08-30 10:21:18 - pico-train - INFO - โโโ Loss: 5.8321 |
| 2025-08-30 10:21:18 - pico-train - INFO - โโโ Learning Rate: 1.45e-05 |
| 2025-08-30 10:21:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:21:30 - pico-train - INFO - Step 66725 -- ๐ Training Metrics |
| 2025-08-30 10:21:30 - pico-train - INFO - โโโ Loss: 5.7792 |
| 2025-08-30 10:21:30 - pico-train - INFO - โโโ Learning Rate: 1.45e-05 |
| 2025-08-30 10:21:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:21:43 - pico-train - INFO - Step 66750 -- ๐ Training Metrics |
| 2025-08-30 10:21:43 - pico-train - INFO - โโโ Loss: 5.7811 |
| 2025-08-30 10:21:43 - pico-train - INFO - โโโ Learning Rate: 1.45e-05 |
| 2025-08-30 10:21:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:21:56 - pico-train - INFO - Step 66775 -- ๐ Training Metrics |
| 2025-08-30 10:21:56 - pico-train - INFO - โโโ Loss: 5.7789 |
| 2025-08-30 10:21:56 - pico-train - INFO - โโโ Learning Rate: 1.44e-05 |
| 2025-08-30 10:21:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:22:08 - pico-train - INFO - Step 66800 -- ๐ Training Metrics |
| 2025-08-30 10:22:08 - pico-train - INFO - โโโ Loss: 5.7714 |
| 2025-08-30 10:22:08 - pico-train - INFO - โโโ Learning Rate: 1.44e-05 |
| 2025-08-30 10:22:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:22:21 - pico-train - INFO - Step 66825 -- ๐ Training Metrics |
| 2025-08-30 10:22:21 - pico-train - INFO - โโโ Loss: 5.8399 |
| 2025-08-30 10:22:21 - pico-train - INFO - โโโ Learning Rate: 1.44e-05 |
| 2025-08-30 10:22:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:22:34 - pico-train - INFO - Step 66850 -- ๐ Training Metrics |
| 2025-08-30 10:22:34 - pico-train - INFO - โโโ Loss: 5.7693 |
| 2025-08-30 10:22:34 - pico-train - INFO - โโโ Learning Rate: 1.44e-05 |
| 2025-08-30 10:22:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:22:46 - pico-train - INFO - Step 66875 -- ๐ Training Metrics |
| 2025-08-30 10:22:46 - pico-train - INFO - โโโ Loss: 5.8165 |
| 2025-08-30 10:22:46 - pico-train - INFO - โโโ Learning Rate: 1.44e-05 |
| 2025-08-30 10:22:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:22:59 - pico-train - INFO - Step 66900 -- ๐ Training Metrics |
| 2025-08-30 10:22:59 - pico-train - INFO - โโโ Loss: 5.7763 |
| 2025-08-30 10:22:59 - pico-train - INFO - โโโ Learning Rate: 1.43e-05 |
| 2025-08-30 10:22:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:23:11 - pico-train - INFO - Step 66925 -- ๐ Training Metrics |
| 2025-08-30 10:23:11 - pico-train - INFO - โโโ Loss: 5.8683 |
| 2025-08-30 10:23:11 - pico-train - INFO - โโโ Learning Rate: 1.43e-05 |
| 2025-08-30 10:23:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:23:24 - pico-train - INFO - Step 66950 -- ๐ Training Metrics |
| 2025-08-30 10:23:24 - pico-train - INFO - โโโ Loss: 5.8662 |
| 2025-08-30 10:23:24 - pico-train - INFO - โโโ Learning Rate: 1.43e-05 |
| 2025-08-30 10:23:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:23:37 - pico-train - INFO - Step 66975 -- ๐ Training Metrics |
| 2025-08-30 10:23:37 - pico-train - INFO - โโโ Loss: 5.8864 |
| 2025-08-30 10:23:37 - pico-train - INFO - โโโ Learning Rate: 1.43e-05 |
| 2025-08-30 10:23:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:23:49 - pico-train - INFO - Step 67000 -- 💾 Saving Checkpoint |
| 2025-08-30 10:25:56 - pico-train - INFO - Step 67000 -- ๐ Evaluation Results |
| 2025-08-30 10:25:56 - pico-train - INFO - โโโ paloma: 2.6865032383433974e+31 |
| 2025-08-30 10:25:58 - pico-train - INFO - Step 67000 -- ๐ Training Metrics |
| 2025-08-30 10:25:58 - pico-train - INFO - โโโ Loss: 5.7555 |
| 2025-08-30 10:25:58 - pico-train - INFO - โโโ Learning Rate: 1.43e-05 |
| 2025-08-30 10:25:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:25:58 - pico-train - INFO - Step 67000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 10:26:12 - pico-train - INFO - Step 67025 -- ๐ Training Metrics |
| 2025-08-30 10:26:12 - pico-train - INFO - โโโ Loss: 5.8167 |
| 2025-08-30 10:26:12 - pico-train - INFO - โโโ Learning Rate: 1.42e-05 |
| 2025-08-30 10:26:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:26:25 - pico-train - INFO - Step 67050 -- ๐ Training Metrics |
| 2025-08-30 10:26:25 - pico-train - INFO - โโโ Loss: 5.8101 |
| 2025-08-30 10:26:25 - pico-train - INFO - โโโ Learning Rate: 1.42e-05 |
| 2025-08-30 10:26:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:26:38 - pico-train - INFO - Step 67075 -- ๐ Training Metrics |
| 2025-08-30 10:26:38 - pico-train - INFO - โโโ Loss: 5.8146 |
| 2025-08-30 10:26:38 - pico-train - INFO - โโโ Learning Rate: 1.42e-05 |
| 2025-08-30 10:26:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:26:51 - pico-train - INFO - Step 67100 -- ๐ Training Metrics |
| 2025-08-30 10:26:51 - pico-train - INFO - โโโ Loss: 5.9005 |
| 2025-08-30 10:26:51 - pico-train - INFO - โโโ Learning Rate: 1.42e-05 |
| 2025-08-30 10:26:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:27:04 - pico-train - INFO - Step 67125 -- ๐ Training Metrics |
| 2025-08-30 10:27:04 - pico-train - INFO - โโโ Loss: 5.7768 |
| 2025-08-30 10:27:04 - pico-train - INFO - โโโ Learning Rate: 1.42e-05 |
| 2025-08-30 10:27:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:27:16 - pico-train - INFO - Step 67150 -- ๐ Training Metrics |
| 2025-08-30 10:27:16 - pico-train - INFO - โโโ Loss: 5.7152 |
| 2025-08-30 10:27:16 - pico-train - INFO - โโโ Learning Rate: 1.41e-05 |
| 2025-08-30 10:27:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:27:29 - pico-train - INFO - Step 67175 -- ๐ Training Metrics |
| 2025-08-30 10:27:29 - pico-train - INFO - โโโ Loss: 5.8443 |
| 2025-08-30 10:27:29 - pico-train - INFO - โโโ Learning Rate: 1.41e-05 |
| 2025-08-30 10:27:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:27:41 - pico-train - INFO - Step 67200 -- ๐ Training Metrics |
| 2025-08-30 10:27:41 - pico-train - INFO - โโโ Loss: 5.7907 |
| 2025-08-30 10:27:41 - pico-train - INFO - โโโ Learning Rate: 1.41e-05 |
| 2025-08-30 10:27:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:27:54 - pico-train - INFO - Step 67225 -- ๐ Training Metrics |
| 2025-08-30 10:27:54 - pico-train - INFO - โโโ Loss: 5.8160 |
| 2025-08-30 10:27:54 - pico-train - INFO - โโโ Learning Rate: 1.41e-05 |
| 2025-08-30 10:27:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:28:06 - pico-train - INFO - Step 67250 -- ๐ Training Metrics |
| 2025-08-30 10:28:06 - pico-train - INFO - โโโ Loss: 5.8334 |
| 2025-08-30 10:28:06 - pico-train - INFO - โโโ Learning Rate: 1.41e-05 |
| 2025-08-30 10:28:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:28:19 - pico-train - INFO - Step 67275 -- ๐ Training Metrics |
| 2025-08-30 10:28:19 - pico-train - INFO - โโโ Loss: 5.8201 |
| 2025-08-30 10:28:19 - pico-train - INFO - โโโ Learning Rate: 1.41e-05 |
| 2025-08-30 10:28:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:28:32 - pico-train - INFO - Step 67300 -- ๐ Training Metrics |
| 2025-08-30 10:28:32 - pico-train - INFO - โโโ Loss: 5.8962 |
| 2025-08-30 10:28:32 - pico-train - INFO - โโโ Learning Rate: 1.40e-05 |
| 2025-08-30 10:28:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:28:44 - pico-train - INFO - Step 67325 -- ๐ Training Metrics |
| 2025-08-30 10:28:44 - pico-train - INFO - โโโ Loss: 5.7876 |
| 2025-08-30 10:28:44 - pico-train - INFO - โโโ Learning Rate: 1.40e-05 |
| 2025-08-30 10:28:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:28:57 - pico-train - INFO - Step 67350 -- ๐ Training Metrics |
| 2025-08-30 10:28:57 - pico-train - INFO - โโโ Loss: 5.8093 |
| 2025-08-30 10:28:57 - pico-train - INFO - โโโ Learning Rate: 1.40e-05 |
| 2025-08-30 10:28:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:29:09 - pico-train - INFO - Step 67375 -- ๐ Training Metrics |
| 2025-08-30 10:29:09 - pico-train - INFO - โโโ Loss: 5.7282 |
| 2025-08-30 10:29:09 - pico-train - INFO - โโโ Learning Rate: 1.40e-05 |
| 2025-08-30 10:29:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:29:22 - pico-train - INFO - Step 67400 -- ๐ Training Metrics |
| 2025-08-30 10:29:22 - pico-train - INFO - โโโ Loss: 5.7584 |
| 2025-08-30 10:29:22 - pico-train - INFO - โโโ Learning Rate: 1.40e-05 |
| 2025-08-30 10:29:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:29:34 - pico-train - INFO - Step 67425 -- ๐ Training Metrics |
| 2025-08-30 10:29:34 - pico-train - INFO - โโโ Loss: 5.7801 |
| 2025-08-30 10:29:34 - pico-train - INFO - โโโ Learning Rate: 1.39e-05 |
| 2025-08-30 10:29:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:29:47 - pico-train - INFO - Step 67450 -- ๐ Training Metrics |
| 2025-08-30 10:29:47 - pico-train - INFO - โโโ Loss: 5.7262 |
| 2025-08-30 10:29:47 - pico-train - INFO - โโโ Learning Rate: 1.39e-05 |
| 2025-08-30 10:29:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:30:00 - pico-train - INFO - Step 67475 -- ๐ Training Metrics |
| 2025-08-30 10:30:00 - pico-train - INFO - โโโ Loss: 5.7496 |
| 2025-08-30 10:30:00 - pico-train - INFO - โโโ Learning Rate: 1.39e-05 |
| 2025-08-30 10:30:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:30:12 - pico-train - INFO - Step 67500 -- 💾 Saving Checkpoint |
| 2025-08-30 10:32:10 - pico-train - INFO - Step 67500 -- ๐ Evaluation Results |
| 2025-08-30 10:32:10 - pico-train - INFO - โโโ paloma: 3.1065040652754565e+31 |
| 2025-08-30 10:32:13 - pico-train - INFO - Step 67500 -- ๐ Training Metrics |
| 2025-08-30 10:32:13 - pico-train - INFO - โโโ Loss: 5.7965 |
| 2025-08-30 10:32:13 - pico-train - INFO - โโโ Learning Rate: 1.39e-05 |
| 2025-08-30 10:32:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:32:13 - pico-train - INFO - Step 67500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 10:32:27 - pico-train - INFO - Step 67525 -- ๐ Training Metrics |
| 2025-08-30 10:32:27 - pico-train - INFO - โโโ Loss: 5.8326 |
| 2025-08-30 10:32:27 - pico-train - INFO - โโโ Learning Rate: 1.39e-05 |
| 2025-08-30 10:32:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:32:40 - pico-train - INFO - Step 67550 -- ๐ Training Metrics |
| 2025-08-30 10:32:40 - pico-train - INFO - โโโ Loss: 5.8544 |
| 2025-08-30 10:32:40 - pico-train - INFO - โโโ Learning Rate: 1.38e-05 |
| 2025-08-30 10:32:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:32:53 - pico-train - INFO - Step 67575 -- ๐ Training Metrics |
| 2025-08-30 10:32:53 - pico-train - INFO - โโโ Loss: 5.8529 |
| 2025-08-30 10:32:53 - pico-train - INFO - โโโ Learning Rate: 1.38e-05 |
| 2025-08-30 10:32:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:33:06 - pico-train - INFO - Step 67600 -- ๐ Training Metrics |
| 2025-08-30 10:33:06 - pico-train - INFO - โโโ Loss: 5.7630 |
| 2025-08-30 10:33:06 - pico-train - INFO - โโโ Learning Rate: 1.38e-05 |
| 2025-08-30 10:33:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:33:19 - pico-train - INFO - Step 67625 -- ๐ Training Metrics |
| 2025-08-30 10:33:19 - pico-train - INFO - โโโ Loss: 5.8400 |
| 2025-08-30 10:33:19 - pico-train - INFO - โโโ Learning Rate: 1.38e-05 |
| 2025-08-30 10:33:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:33:31 - pico-train - INFO - Step 67650 -- ๐ Training Metrics |
| 2025-08-30 10:33:31 - pico-train - INFO - โโโ Loss: 5.6921 |
| 2025-08-30 10:33:31 - pico-train - INFO - โโโ Learning Rate: 1.38e-05 |
| 2025-08-30 10:33:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:33:44 - pico-train - INFO - Step 67675 -- ๐ Training Metrics |
| 2025-08-30 10:33:44 - pico-train - INFO - โโโ Loss: 5.7714 |
| 2025-08-30 10:33:44 - pico-train - INFO - โโโ Learning Rate: 1.37e-05 |
| 2025-08-30 10:33:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:33:57 - pico-train - INFO - Step 67700 -- ๐ Training Metrics |
| 2025-08-30 10:33:57 - pico-train - INFO - โโโ Loss: 5.8415 |
| 2025-08-30 10:33:57 - pico-train - INFO - โโโ Learning Rate: 1.37e-05 |
| 2025-08-30 10:33:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:34:09 - pico-train - INFO - Step 67725 -- ๐ Training Metrics |
| 2025-08-30 10:34:09 - pico-train - INFO - โโโ Loss: 5.7966 |
| 2025-08-30 10:34:09 - pico-train - INFO - โโโ Learning Rate: 1.37e-05 |
| 2025-08-30 10:34:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:34:22 - pico-train - INFO - Step 67750 -- ๐ Training Metrics |
| 2025-08-30 10:34:22 - pico-train - INFO - โโโ Loss: 5.7681 |
| 2025-08-30 10:34:22 - pico-train - INFO - โโโ Learning Rate: 1.37e-05 |
| 2025-08-30 10:34:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:34:34 - pico-train - INFO - Step 67775 -- ๐ Training Metrics |
| 2025-08-30 10:34:34 - pico-train - INFO - โโโ Loss: 5.8142 |
| 2025-08-30 10:34:34 - pico-train - INFO - โโโ Learning Rate: 1.37e-05 |
| 2025-08-30 10:34:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:34:47 - pico-train - INFO - Step 67800 -- ๐ Training Metrics |
| 2025-08-30 10:34:47 - pico-train - INFO - โโโ Loss: 5.8364 |
| 2025-08-30 10:34:47 - pico-train - INFO - โโโ Learning Rate: 1.37e-05 |
| 2025-08-30 10:34:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:35:00 - pico-train - INFO - Step 67825 -- ๐ Training Metrics |
| 2025-08-30 10:35:00 - pico-train - INFO - โโโ Loss: 5.7471 |
| 2025-08-30 10:35:00 - pico-train - INFO - โโโ Learning Rate: 1.36e-05 |
| 2025-08-30 10:35:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:35:12 - pico-train - INFO - Step 67850 -- ๐ Training Metrics |
| 2025-08-30 10:35:12 - pico-train - INFO - โโโ Loss: 5.7829 |
| 2025-08-30 10:35:12 - pico-train - INFO - โโโ Learning Rate: 1.36e-05 |
| 2025-08-30 10:35:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:35:25 - pico-train - INFO - Step 67875 -- ๐ Training Metrics |
| 2025-08-30 10:35:25 - pico-train - INFO - โโโ Loss: 5.7502 |
| 2025-08-30 10:35:25 - pico-train - INFO - โโโ Learning Rate: 1.36e-05 |
| 2025-08-30 10:35:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:35:38 - pico-train - INFO - Step 67900 -- ๐ Training Metrics |
| 2025-08-30 10:35:38 - pico-train - INFO - โโโ Loss: 5.8291 |
| 2025-08-30 10:35:38 - pico-train - INFO - โโโ Learning Rate: 1.36e-05 |
| 2025-08-30 10:35:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:35:50 - pico-train - INFO - Step 67925 -- ๐ Training Metrics |
| 2025-08-30 10:35:50 - pico-train - INFO - โโโ Loss: 5.8411 |
| 2025-08-30 10:35:50 - pico-train - INFO - โโโ Learning Rate: 1.36e-05 |
| 2025-08-30 10:35:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:36:03 - pico-train - INFO - Step 67950 -- ๐ Training Metrics |
| 2025-08-30 10:36:03 - pico-train - INFO - โโโ Loss: 5.8542 |
| 2025-08-30 10:36:03 - pico-train - INFO - โโโ Learning Rate: 1.35e-05 |
| 2025-08-30 10:36:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:36:16 - pico-train - INFO - Step 67975 -- ๐ Training Metrics |
| 2025-08-30 10:36:16 - pico-train - INFO - โโโ Loss: 5.9065 |
| 2025-08-30 10:36:16 - pico-train - INFO - โโโ Learning Rate: 1.35e-05 |
| 2025-08-30 10:36:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:36:28 - pico-train - INFO - Step 68000 -- 💾 Saving Checkpoint |
| 2025-08-30 10:38:32 - pico-train - INFO - Step 68000 -- ๐ Evaluation Results |
| 2025-08-30 10:38:32 - pico-train - INFO - โโโ paloma: 3.3702997728095594e+31 |
| 2025-08-30 10:38:35 - pico-train - INFO - Step 68000 -- ๐ Training Metrics |
| 2025-08-30 10:38:35 - pico-train - INFO - โโโ Loss: 5.7845 |
| 2025-08-30 10:38:35 - pico-train - INFO - โโโ Learning Rate: 1.35e-05 |
| 2025-08-30 10:38:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:38:35 - pico-train - INFO - Step 68000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 10:38:50 - pico-train - INFO - Step 68025 -- ๐ Training Metrics |
| 2025-08-30 10:38:50 - pico-train - INFO - โโโ Loss: 5.6880 |
| 2025-08-30 10:38:50 - pico-train - INFO - โโโ Learning Rate: 1.35e-05 |
| 2025-08-30 10:38:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:39:02 - pico-train - INFO - Step 68050 -- ๐ Training Metrics |
| 2025-08-30 10:39:02 - pico-train - INFO - โโโ Loss: 5.7669 |
| 2025-08-30 10:39:02 - pico-train - INFO - โโโ Learning Rate: 1.35e-05 |
| 2025-08-30 10:39:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:39:15 - pico-train - INFO - Step 68075 -- ๐ Training Metrics |
| 2025-08-30 10:39:15 - pico-train - INFO - โโโ Loss: 5.7084 |
| 2025-08-30 10:39:15 - pico-train - INFO - โโโ Learning Rate: 1.34e-05 |
| 2025-08-30 10:39:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:39:28 - pico-train - INFO - Step 68100 -- ๐ Training Metrics |
| 2025-08-30 10:39:28 - pico-train - INFO - โโโ Loss: 5.8807 |
| 2025-08-30 10:39:28 - pico-train - INFO - โโโ Learning Rate: 1.34e-05 |
| 2025-08-30 10:39:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:39:41 - pico-train - INFO - Step 68125 -- ๐ Training Metrics |
| 2025-08-30 10:39:41 - pico-train - INFO - โโโ Loss: 5.8497 |
| 2025-08-30 10:39:41 - pico-train - INFO - โโโ Learning Rate: 1.34e-05 |
| 2025-08-30 10:39:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:39:54 - pico-train - INFO - Step 68150 -- ๐ Training Metrics |
| 2025-08-30 10:39:54 - pico-train - INFO - โโโ Loss: 5.7487 |
| 2025-08-30 10:39:54 - pico-train - INFO - โโโ Learning Rate: 1.34e-05 |
| 2025-08-30 10:39:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:40:06 - pico-train - INFO - Step 68175 -- ๐ Training Metrics |
| 2025-08-30 10:40:06 - pico-train - INFO - โโโ Loss: 5.7784 |
| 2025-08-30 10:40:06 - pico-train - INFO - โโโ Learning Rate: 1.34e-05 |
| 2025-08-30 10:40:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:40:19 - pico-train - INFO - Step 68200 -- ๐ Training Metrics |
| 2025-08-30 10:40:19 - pico-train - INFO - โโโ Loss: 5.7622 |
| 2025-08-30 10:40:19 - pico-train - INFO - โโโ Learning Rate: 1.33e-05 |
| 2025-08-30 10:40:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:40:31 - pico-train - INFO - Step 68225 -- ๐ Training Metrics |
| 2025-08-30 10:40:31 - pico-train - INFO - โโโ Loss: 5.7823 |
| 2025-08-30 10:40:31 - pico-train - INFO - โโโ Learning Rate: 1.33e-05 |
| 2025-08-30 10:40:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:40:44 - pico-train - INFO - Step 68250 -- ๐ Training Metrics |
| 2025-08-30 10:40:44 - pico-train - INFO - โโโ Loss: 5.7689 |
| 2025-08-30 10:40:44 - pico-train - INFO - โโโ Learning Rate: 1.33e-05 |
| 2025-08-30 10:40:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:40:57 - pico-train - INFO - Step 68275 -- ๐ Training Metrics |
| 2025-08-30 10:40:57 - pico-train - INFO - โโโ Loss: 5.7719 |
| 2025-08-30 10:40:57 - pico-train - INFO - โโโ Learning Rate: 1.33e-05 |
| 2025-08-30 10:40:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:41:09 - pico-train - INFO - Step 68300 -- ๐ Training Metrics |
| 2025-08-30 10:41:09 - pico-train - INFO - โโโ Loss: 5.7754 |
| 2025-08-30 10:41:09 - pico-train - INFO - โโโ Learning Rate: 1.33e-05 |
| 2025-08-30 10:41:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:41:22 - pico-train - INFO - Step 68325 -- ๐ Training Metrics |
| 2025-08-30 10:41:22 - pico-train - INFO - โโโ Loss: 5.8183 |
| 2025-08-30 10:41:22 - pico-train - INFO - โโโ Learning Rate: 1.33e-05 |
| 2025-08-30 10:41:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:41:34 - pico-train - INFO - Step 68350 -- ๐ Training Metrics |
| 2025-08-30 10:41:34 - pico-train - INFO - โโโ Loss: 5.8116 |
| 2025-08-30 10:41:34 - pico-train - INFO - โโโ Learning Rate: 1.32e-05 |
| 2025-08-30 10:41:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:41:47 - pico-train - INFO - Step 68375 -- ๐ Training Metrics |
| 2025-08-30 10:41:47 - pico-train - INFO - โโโ Loss: 5.6714 |
| 2025-08-30 10:41:47 - pico-train - INFO - โโโ Learning Rate: 1.32e-05 |
| 2025-08-30 10:41:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:42:00 - pico-train - INFO - Step 68400 -- ๐ Training Metrics |
| 2025-08-30 10:42:00 - pico-train - INFO - โโโ Loss: 5.7859 |
| 2025-08-30 10:42:00 - pico-train - INFO - โโโ Learning Rate: 1.32e-05 |
| 2025-08-30 10:42:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:42:12 - pico-train - INFO - Step 68425 -- ๐ Training Metrics |
| 2025-08-30 10:42:12 - pico-train - INFO - โโโ Loss: 5.8268 |
| 2025-08-30 10:42:12 - pico-train - INFO - โโโ Learning Rate: 1.32e-05 |
| 2025-08-30 10:42:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:42:25 - pico-train - INFO - Step 68450 -- ๐ Training Metrics |
| 2025-08-30 10:42:25 - pico-train - INFO - โโโ Loss: 5.8194 |
| 2025-08-30 10:42:25 - pico-train - INFO - โโโ Learning Rate: 1.32e-05 |
| 2025-08-30 10:42:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:42:37 - pico-train - INFO - Step 68475 -- ๐ Training Metrics |
| 2025-08-30 10:42:37 - pico-train - INFO - โโโ Loss: 5.8550 |
| 2025-08-30 10:42:37 - pico-train - INFO - โโโ Learning Rate: 1.31e-05 |
| 2025-08-30 10:42:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:42:49 - pico-train - INFO - Step 68500 -- 💾 Saving Checkpoint |
| 2025-08-30 10:45:00 - pico-train - INFO - Step 68500 -- ๐ Evaluation Results |
| 2025-08-30 10:45:00 - pico-train - INFO - โโโ paloma: 3.3728195138741334e+31 |
| 2025-08-30 10:45:04 - pico-train - INFO - Step 68500 -- ๐ Training Metrics |
| 2025-08-30 10:45:04 - pico-train - INFO - โโโ Loss: 5.9096 |
| 2025-08-30 10:45:04 - pico-train - INFO - โโโ Learning Rate: 1.31e-05 |
| 2025-08-30 10:45:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:45:04 - pico-train - INFO - Step 68500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 10:45:19 - pico-train - INFO - Step 68525 -- ๐ Training Metrics |
| 2025-08-30 10:45:19 - pico-train - INFO - โโโ Loss: 5.7826 |
| 2025-08-30 10:45:19 - pico-train - INFO - โโโ Learning Rate: 1.31e-05 |
| 2025-08-30 10:45:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:45:32 - pico-train - INFO - Step 68550 -- ๐ Training Metrics |
| 2025-08-30 10:45:32 - pico-train - INFO - โโโ Loss: 5.7860 |
| 2025-08-30 10:45:32 - pico-train - INFO - โโโ Learning Rate: 1.31e-05 |
| 2025-08-30 10:45:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:45:45 - pico-train - INFO - Step 68575 -- ๐ Training Metrics |
| 2025-08-30 10:45:45 - pico-train - INFO - โโโ Loss: 5.7932 |
| 2025-08-30 10:45:45 - pico-train - INFO - โโโ Learning Rate: 1.31e-05 |
| 2025-08-30 10:45:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:45:58 - pico-train - INFO - Step 68600 -- ๐ Training Metrics |
| 2025-08-30 10:45:58 - pico-train - INFO - โโโ Loss: 5.8207 |
| 2025-08-30 10:45:58 - pico-train - INFO - โโโ Learning Rate: 1.30e-05 |
| 2025-08-30 10:45:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:46:10 - pico-train - INFO - Step 68625 -- ๐ Training Metrics |
| 2025-08-30 10:46:10 - pico-train - INFO - โโโ Loss: 5.6706 |
| 2025-08-30 10:46:10 - pico-train - INFO - โโโ Learning Rate: 1.30e-05 |
| 2025-08-30 10:46:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:46:23 - pico-train - INFO - Step 68650 -- ๐ Training Metrics |
| 2025-08-30 10:46:23 - pico-train - INFO - โโโ Loss: 5.7751 |
| 2025-08-30 10:46:23 - pico-train - INFO - โโโ Learning Rate: 1.30e-05 |
| 2025-08-30 10:46:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:46:36 - pico-train - INFO - Step 68675 -- ๐ Training Metrics |
| 2025-08-30 10:46:36 - pico-train - INFO - โโโ Loss: 5.7419 |
| 2025-08-30 10:46:36 - pico-train - INFO - โโโ Learning Rate: 1.30e-05 |
| 2025-08-30 10:46:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:46:48 - pico-train - INFO - Step 68700 -- ๐ Training Metrics |
| 2025-08-30 10:46:48 - pico-train - INFO - โโโ Loss: 5.8879 |
| 2025-08-30 10:46:48 - pico-train - INFO - โโโ Learning Rate: 1.30e-05 |
| 2025-08-30 10:46:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:47:01 - pico-train - INFO - Step 68725 -- ๐ Training Metrics |
| 2025-08-30 10:47:01 - pico-train - INFO - โโโ Loss: 5.8349 |
| 2025-08-30 10:47:01 - pico-train - INFO - โโโ Learning Rate: 1.30e-05 |
| 2025-08-30 10:47:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:47:13 - pico-train - INFO - Step 68750 -- ๐ Training Metrics |
| 2025-08-30 10:47:13 - pico-train - INFO - โโโ Loss: 5.8237 |
| 2025-08-30 10:47:13 - pico-train - INFO - โโโ Learning Rate: 1.29e-05 |
| 2025-08-30 10:47:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:47:26 - pico-train - INFO - Step 68775 -- ๐ Training Metrics |
| 2025-08-30 10:47:26 - pico-train - INFO - โโโ Loss: 5.8724 |
| 2025-08-30 10:47:26 - pico-train - INFO - โโโ Learning Rate: 1.29e-05 |
| 2025-08-30 10:47:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:47:39 - pico-train - INFO - Step 68800 -- ๐ Training Metrics |
| 2025-08-30 10:47:39 - pico-train - INFO - โโโ Loss: 5.7777 |
| 2025-08-30 10:47:39 - pico-train - INFO - โโโ Learning Rate: 1.29e-05 |
| 2025-08-30 10:47:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:47:52 - pico-train - INFO - Step 68825 -- ๐ Training Metrics |
| 2025-08-30 10:47:52 - pico-train - INFO - โโโ Loss: 5.7775 |
| 2025-08-30 10:47:52 - pico-train - INFO - โโโ Learning Rate: 1.29e-05 |
| 2025-08-30 10:47:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:48:04 - pico-train - INFO - Step 68850 -- ๐ Training Metrics |
| 2025-08-30 10:48:04 - pico-train - INFO - โโโ Loss: 5.8112 |
| 2025-08-30 10:48:04 - pico-train - INFO - โโโ Learning Rate: 1.29e-05 |
| 2025-08-30 10:48:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:48:17 - pico-train - INFO - Step 68875 -- ๐ Training Metrics |
| 2025-08-30 10:48:17 - pico-train - INFO - โโโ Loss: 5.7673 |
| 2025-08-30 10:48:17 - pico-train - INFO - โโโ Learning Rate: 1.28e-05 |
| 2025-08-30 10:48:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:48:30 - pico-train - INFO - Step 68900 -- ๐ Training Metrics |
| 2025-08-30 10:48:30 - pico-train - INFO - โโโ Loss: 5.7477 |
| 2025-08-30 10:48:30 - pico-train - INFO - โโโ Learning Rate: 1.28e-05 |
| 2025-08-30 10:48:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:48:43 - pico-train - INFO - Step 68925 -- ๐ Training Metrics |
| 2025-08-30 10:48:43 - pico-train - INFO - โโโ Loss: 5.8516 |
| 2025-08-30 10:48:43 - pico-train - INFO - โโโ Learning Rate: 1.28e-05 |
| 2025-08-30 10:48:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:48:56 - pico-train - INFO - Step 68950 -- ๐ Training Metrics |
| 2025-08-30 10:48:56 - pico-train - INFO - โโโ Loss: 5.7671 |
| 2025-08-30 10:48:56 - pico-train - INFO - โโโ Learning Rate: 1.28e-05 |
| 2025-08-30 10:48:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:49:08 - pico-train - INFO - Step 68975 -- ๐ Training Metrics |
| 2025-08-30 10:49:08 - pico-train - INFO - โโโ Loss: 5.8476 |
| 2025-08-30 10:49:08 - pico-train - INFO - โโโ Learning Rate: 1.28e-05 |
| 2025-08-30 10:49:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:49:20 - pico-train - INFO - Step 69000 -- 💾 Saving Checkpoint |
| 2025-08-30 10:51:18 - pico-train - INFO - Step 69000 -- Evaluation Results |
| 2025-08-30 10:51:18 - pico-train - INFO - └── paloma: 4.015441614691927e+31 |
| 2025-08-30 10:51:22 - pico-train - INFO - Step 69000 -- ๐ Training Metrics |
| 2025-08-30 10:51:22 - pico-train - INFO - โโโ Loss: 5.7945 |
| 2025-08-30 10:51:22 - pico-train - INFO - โโโ Learning Rate: 1.27e-05 |
| 2025-08-30 10:51:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:51:22 - pico-train - INFO - Step 69000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 10:51:38 - pico-train - INFO - Step 69025 -- ๐ Training Metrics |
| 2025-08-30 10:51:38 - pico-train - INFO - โโโ Loss: 5.7222 |
| 2025-08-30 10:51:38 - pico-train - INFO - โโโ Learning Rate: 1.27e-05 |
| 2025-08-30 10:51:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:51:50 - pico-train - INFO - Step 69050 -- ๐ Training Metrics |
| 2025-08-30 10:51:50 - pico-train - INFO - โโโ Loss: 5.8469 |
| 2025-08-30 10:51:50 - pico-train - INFO - โโโ Learning Rate: 1.27e-05 |
| 2025-08-30 10:51:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:52:03 - pico-train - INFO - Step 69075 -- ๐ Training Metrics |
| 2025-08-30 10:52:03 - pico-train - INFO - โโโ Loss: 5.7888 |
| 2025-08-30 10:52:03 - pico-train - INFO - โโโ Learning Rate: 1.27e-05 |
| 2025-08-30 10:52:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:52:16 - pico-train - INFO - Step 69100 -- ๐ Training Metrics |
| 2025-08-30 10:52:16 - pico-train - INFO - โโโ Loss: 5.8239 |
| 2025-08-30 10:52:16 - pico-train - INFO - โโโ Learning Rate: 1.27e-05 |
| 2025-08-30 10:52:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:52:29 - pico-train - INFO - Step 69125 -- ๐ Training Metrics |
| 2025-08-30 10:52:29 - pico-train - INFO - โโโ Loss: 5.8123 |
| 2025-08-30 10:52:29 - pico-train - INFO - โโโ Learning Rate: 1.27e-05 |
| 2025-08-30 10:52:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:52:42 - pico-train - INFO - Step 69150 -- ๐ Training Metrics |
| 2025-08-30 10:52:42 - pico-train - INFO - โโโ Loss: 5.8655 |
| 2025-08-30 10:52:42 - pico-train - INFO - โโโ Learning Rate: 1.26e-05 |
| 2025-08-30 10:52:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:52:54 - pico-train - INFO - Step 69175 -- ๐ Training Metrics |
| 2025-08-30 10:52:54 - pico-train - INFO - โโโ Loss: 5.8294 |
| 2025-08-30 10:52:54 - pico-train - INFO - โโโ Learning Rate: 1.26e-05 |
| 2025-08-30 10:52:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:53:07 - pico-train - INFO - Step 69200 -- ๐ Training Metrics |
| 2025-08-30 10:53:07 - pico-train - INFO - โโโ Loss: 5.8492 |
| 2025-08-30 10:53:07 - pico-train - INFO - โโโ Learning Rate: 1.26e-05 |
| 2025-08-30 10:53:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:53:19 - pico-train - INFO - Step 69225 -- ๐ Training Metrics |
| 2025-08-30 10:53:19 - pico-train - INFO - โโโ Loss: 5.8203 |
| 2025-08-30 10:53:19 - pico-train - INFO - โโโ Learning Rate: 1.26e-05 |
| 2025-08-30 10:53:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:53:32 - pico-train - INFO - Step 69250 -- ๐ Training Metrics |
| 2025-08-30 10:53:32 - pico-train - INFO - โโโ Loss: 5.8163 |
| 2025-08-30 10:53:32 - pico-train - INFO - โโโ Learning Rate: 1.26e-05 |
| 2025-08-30 10:53:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:53:45 - pico-train - INFO - Step 69275 -- ๐ Training Metrics |
| 2025-08-30 10:53:45 - pico-train - INFO - โโโ Loss: 5.8982 |
| 2025-08-30 10:53:45 - pico-train - INFO - โโโ Learning Rate: 1.25e-05 |
| 2025-08-30 10:53:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:53:57 - pico-train - INFO - Step 69300 -- ๐ Training Metrics |
| 2025-08-30 10:53:57 - pico-train - INFO - โโโ Loss: 5.7549 |
| 2025-08-30 10:53:57 - pico-train - INFO - โโโ Learning Rate: 1.25e-05 |
| 2025-08-30 10:53:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:54:10 - pico-train - INFO - Step 69325 -- ๐ Training Metrics |
| 2025-08-30 10:54:10 - pico-train - INFO - โโโ Loss: 5.8212 |
| 2025-08-30 10:54:10 - pico-train - INFO - โโโ Learning Rate: 1.25e-05 |
| 2025-08-30 10:54:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:54:23 - pico-train - INFO - Step 69350 -- ๐ Training Metrics |
| 2025-08-30 10:54:23 - pico-train - INFO - โโโ Loss: 5.8512 |
| 2025-08-30 10:54:23 - pico-train - INFO - โโโ Learning Rate: 1.25e-05 |
| 2025-08-30 10:54:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:54:35 - pico-train - INFO - Step 69375 -- ๐ Training Metrics |
| 2025-08-30 10:54:35 - pico-train - INFO - โโโ Loss: 5.8506 |
| 2025-08-30 10:54:35 - pico-train - INFO - โโโ Learning Rate: 1.25e-05 |
| 2025-08-30 10:54:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:54:48 - pico-train - INFO - Step 69400 -- ๐ Training Metrics |
| 2025-08-30 10:54:48 - pico-train - INFO - โโโ Loss: 5.7973 |
| 2025-08-30 10:54:48 - pico-train - INFO - โโโ Learning Rate: 1.25e-05 |
| 2025-08-30 10:54:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:55:00 - pico-train - INFO - Step 69425 -- ๐ Training Metrics |
| 2025-08-30 10:55:00 - pico-train - INFO - โโโ Loss: 5.8587 |
| 2025-08-30 10:55:00 - pico-train - INFO - โโโ Learning Rate: 1.24e-05 |
| 2025-08-30 10:55:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:55:13 - pico-train - INFO - Step 69450 -- ๐ Training Metrics |
| 2025-08-30 10:55:13 - pico-train - INFO - โโโ Loss: 5.7108 |
| 2025-08-30 10:55:13 - pico-train - INFO - โโโ Learning Rate: 1.24e-05 |
| 2025-08-30 10:55:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:55:26 - pico-train - INFO - Step 69475 -- ๐ Training Metrics |
| 2025-08-30 10:55:26 - pico-train - INFO - โโโ Loss: 5.7860 |
| 2025-08-30 10:55:26 - pico-train - INFO - โโโ Learning Rate: 1.24e-05 |
| 2025-08-30 10:55:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:55:38 - pico-train - INFO - Step 69500 -- 💾 Saving Checkpoint |
| 2025-08-30 10:57:37 - pico-train - INFO - Step 69500 -- Evaluation Results |
| 2025-08-30 10:57:37 - pico-train - INFO - └── paloma: 4.498437349495611e+31 |
| 2025-08-30 10:57:40 - pico-train - INFO - Step 69500 -- ๐ Training Metrics |
| 2025-08-30 10:57:40 - pico-train - INFO - โโโ Loss: 5.8497 |
| 2025-08-30 10:57:40 - pico-train - INFO - โโโ Learning Rate: 1.24e-05 |
| 2025-08-30 10:57:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:57:40 - pico-train - INFO - Step 69500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 10:57:55 - pico-train - INFO - Step 69525 -- ๐ Training Metrics |
| 2025-08-30 10:57:55 - pico-train - INFO - โโโ Loss: 5.8320 |
| 2025-08-30 10:57:55 - pico-train - INFO - โโโ Learning Rate: 1.24e-05 |
| 2025-08-30 10:57:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:58:08 - pico-train - INFO - Step 69550 -- ๐ Training Metrics |
| 2025-08-30 10:58:08 - pico-train - INFO - โโโ Loss: 5.7277 |
| 2025-08-30 10:58:08 - pico-train - INFO - โโโ Learning Rate: 1.23e-05 |
| 2025-08-30 10:58:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:58:20 - pico-train - INFO - Step 69575 -- ๐ Training Metrics |
| 2025-08-30 10:58:20 - pico-train - INFO - โโโ Loss: 5.8119 |
| 2025-08-30 10:58:20 - pico-train - INFO - โโโ Learning Rate: 1.23e-05 |
| 2025-08-30 10:58:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:58:34 - pico-train - INFO - Step 69600 -- ๐ Training Metrics |
| 2025-08-30 10:58:34 - pico-train - INFO - โโโ Loss: 5.8142 |
| 2025-08-30 10:58:34 - pico-train - INFO - โโโ Learning Rate: 1.23e-05 |
| 2025-08-30 10:58:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:58:47 - pico-train - INFO - Step 69625 -- ๐ Training Metrics |
| 2025-08-30 10:58:47 - pico-train - INFO - โโโ Loss: 5.8271 |
| 2025-08-30 10:58:47 - pico-train - INFO - โโโ Learning Rate: 1.23e-05 |
| 2025-08-30 10:58:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:58:59 - pico-train - INFO - Step 69650 -- ๐ Training Metrics |
| 2025-08-30 10:58:59 - pico-train - INFO - โโโ Loss: 5.7488 |
| 2025-08-30 10:58:59 - pico-train - INFO - โโโ Learning Rate: 1.23e-05 |
| 2025-08-30 10:58:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:59:12 - pico-train - INFO - Step 69675 -- ๐ Training Metrics |
| 2025-08-30 10:59:12 - pico-train - INFO - โโโ Loss: 5.8036 |
| 2025-08-30 10:59:12 - pico-train - INFO - โโโ Learning Rate: 1.22e-05 |
| 2025-08-30 10:59:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:59:25 - pico-train - INFO - Step 69700 -- ๐ Training Metrics |
| 2025-08-30 10:59:25 - pico-train - INFO - โโโ Loss: 5.8718 |
| 2025-08-30 10:59:25 - pico-train - INFO - โโโ Learning Rate: 1.22e-05 |
| 2025-08-30 10:59:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:59:37 - pico-train - INFO - Step 69725 -- ๐ Training Metrics |
| 2025-08-30 10:59:37 - pico-train - INFO - โโโ Loss: 5.7624 |
| 2025-08-30 10:59:37 - pico-train - INFO - โโโ Learning Rate: 1.22e-05 |
| 2025-08-30 10:59:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 10:59:50 - pico-train - INFO - Step 69750 -- ๐ Training Metrics |
| 2025-08-30 10:59:50 - pico-train - INFO - โโโ Loss: 5.7221 |
| 2025-08-30 10:59:50 - pico-train - INFO - โโโ Learning Rate: 1.22e-05 |
| 2025-08-30 10:59:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:00:03 - pico-train - INFO - Step 69775 -- ๐ Training Metrics |
| 2025-08-30 11:00:03 - pico-train - INFO - โโโ Loss: 5.8421 |
| 2025-08-30 11:00:03 - pico-train - INFO - โโโ Learning Rate: 1.22e-05 |
| 2025-08-30 11:00:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:00:16 - pico-train - INFO - Step 69800 -- ๐ Training Metrics |
| 2025-08-30 11:00:16 - pico-train - INFO - โโโ Loss: 5.8152 |
| 2025-08-30 11:00:16 - pico-train - INFO - โโโ Learning Rate: 1.22e-05 |
| 2025-08-30 11:00:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:00:28 - pico-train - INFO - Step 69825 -- ๐ Training Metrics |
| 2025-08-30 11:00:28 - pico-train - INFO - โโโ Loss: 5.8357 |
| 2025-08-30 11:00:28 - pico-train - INFO - โโโ Learning Rate: 1.21e-05 |
| 2025-08-30 11:00:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:00:41 - pico-train - INFO - Step 69850 -- ๐ Training Metrics |
| 2025-08-30 11:00:41 - pico-train - INFO - โโโ Loss: 5.8124 |
| 2025-08-30 11:00:41 - pico-train - INFO - โโโ Learning Rate: 1.21e-05 |
| 2025-08-30 11:00:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:00:53 - pico-train - INFO - Step 69875 -- ๐ Training Metrics |
| 2025-08-30 11:00:53 - pico-train - INFO - โโโ Loss: 5.8160 |
| 2025-08-30 11:00:53 - pico-train - INFO - โโโ Learning Rate: 1.21e-05 |
| 2025-08-30 11:00:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:01:06 - pico-train - INFO - Step 69900 -- ๐ Training Metrics |
| 2025-08-30 11:01:06 - pico-train - INFO - โโโ Loss: 5.7780 |
| 2025-08-30 11:01:06 - pico-train - INFO - โโโ Learning Rate: 1.21e-05 |
| 2025-08-30 11:01:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:01:19 - pico-train - INFO - Step 69925 -- ๐ Training Metrics |
| 2025-08-30 11:01:19 - pico-train - INFO - โโโ Loss: 5.7680 |
| 2025-08-30 11:01:19 - pico-train - INFO - โโโ Learning Rate: 1.21e-05 |
| 2025-08-30 11:01:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:01:31 - pico-train - INFO - Step 69950 -- ๐ Training Metrics |
| 2025-08-30 11:01:31 - pico-train - INFO - โโโ Loss: 5.7678 |
| 2025-08-30 11:01:31 - pico-train - INFO - โโโ Learning Rate: 1.20e-05 |
| 2025-08-30 11:01:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:01:44 - pico-train - INFO - Step 69975 -- ๐ Training Metrics |
| 2025-08-30 11:01:44 - pico-train - INFO - โโโ Loss: 5.7694 |
| 2025-08-30 11:01:44 - pico-train - INFO - โโโ Learning Rate: 1.20e-05 |
| 2025-08-30 11:01:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:01:56 - pico-train - INFO - Step 70000 -- 💾 Saving Checkpoint |
| 2025-08-30 11:03:54 - pico-train - INFO - Step 70000 -- Evaluation Results |
| 2025-08-30 11:03:54 - pico-train - INFO - └── paloma: 4.524086501230947e+31 |
| 2025-08-30 11:03:58 - pico-train - INFO - Step 70000 -- ๐ Training Metrics |
| 2025-08-30 11:03:58 - pico-train - INFO - โโโ Loss: 5.7691 |
| 2025-08-30 11:03:58 - pico-train - INFO - โโโ Learning Rate: 1.20e-05 |
| 2025-08-30 11:03:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:03:58 - pico-train - INFO - Step 70000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 11:04:13 - pico-train - INFO - Step 70025 -- ๐ Training Metrics |
| 2025-08-30 11:04:13 - pico-train - INFO - โโโ Loss: 5.8459 |
| 2025-08-30 11:04:13 - pico-train - INFO - โโโ Learning Rate: 1.20e-05 |
| 2025-08-30 11:04:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:04:26 - pico-train - INFO - Step 70050 -- ๐ Training Metrics |
| 2025-08-30 11:04:26 - pico-train - INFO - โโโ Loss: 5.7648 |
| 2025-08-30 11:04:26 - pico-train - INFO - โโโ Learning Rate: 1.20e-05 |
| 2025-08-30 11:04:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:04:38 - pico-train - INFO - Step 70075 -- ๐ Training Metrics |
| 2025-08-30 11:04:38 - pico-train - INFO - โโโ Loss: 5.9146 |
| 2025-08-30 11:04:38 - pico-train - INFO - โโโ Learning Rate: 1.20e-05 |
| 2025-08-30 11:04:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:04:51 - pico-train - INFO - Step 70100 -- ๐ Training Metrics |
| 2025-08-30 11:04:51 - pico-train - INFO - โโโ Loss: 5.8547 |
| 2025-08-30 11:04:51 - pico-train - INFO - โโโ Learning Rate: 1.19e-05 |
| 2025-08-30 11:04:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:05:04 - pico-train - INFO - Step 70125 -- ๐ Training Metrics |
| 2025-08-30 11:05:04 - pico-train - INFO - โโโ Loss: 5.7720 |
| 2025-08-30 11:05:04 - pico-train - INFO - โโโ Learning Rate: 1.19e-05 |
| 2025-08-30 11:05:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:05:17 - pico-train - INFO - Step 70150 -- ๐ Training Metrics |
| 2025-08-30 11:05:17 - pico-train - INFO - โโโ Loss: 5.7761 |
| 2025-08-30 11:05:17 - pico-train - INFO - โโโ Learning Rate: 1.19e-05 |
| 2025-08-30 11:05:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:05:29 - pico-train - INFO - Step 70175 -- ๐ Training Metrics |
| 2025-08-30 11:05:29 - pico-train - INFO - โโโ Loss: 5.7980 |
| 2025-08-30 11:05:29 - pico-train - INFO - โโโ Learning Rate: 1.19e-05 |
| 2025-08-30 11:05:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:05:42 - pico-train - INFO - Step 70200 -- ๐ Training Metrics |
| 2025-08-30 11:05:42 - pico-train - INFO - โโโ Loss: 5.7824 |
| 2025-08-30 11:05:42 - pico-train - INFO - โโโ Learning Rate: 1.19e-05 |
| 2025-08-30 11:05:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:05:55 - pico-train - INFO - Step 70225 -- ๐ Training Metrics |
| 2025-08-30 11:05:55 - pico-train - INFO - โโโ Loss: 5.8025 |
| 2025-08-30 11:05:55 - pico-train - INFO - โโโ Learning Rate: 1.18e-05 |
| 2025-08-30 11:05:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:06:07 - pico-train - INFO - Step 70250 -- ๐ Training Metrics |
| 2025-08-30 11:06:07 - pico-train - INFO - โโโ Loss: 5.8501 |
| 2025-08-30 11:06:07 - pico-train - INFO - โโโ Learning Rate: 1.18e-05 |
| 2025-08-30 11:06:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:06:20 - pico-train - INFO - Step 70275 -- ๐ Training Metrics |
| 2025-08-30 11:06:20 - pico-train - INFO - โโโ Loss: 5.7877 |
| 2025-08-30 11:06:20 - pico-train - INFO - โโโ Learning Rate: 1.18e-05 |
| 2025-08-30 11:06:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:06:32 - pico-train - INFO - Step 70300 -- ๐ Training Metrics |
| 2025-08-30 11:06:32 - pico-train - INFO - โโโ Loss: 5.7537 |
| 2025-08-30 11:06:32 - pico-train - INFO - โโโ Learning Rate: 1.18e-05 |
| 2025-08-30 11:06:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:06:45 - pico-train - INFO - Step 70325 -- ๐ Training Metrics |
| 2025-08-30 11:06:45 - pico-train - INFO - โโโ Loss: 5.8530 |
| 2025-08-30 11:06:45 - pico-train - INFO - โโโ Learning Rate: 1.18e-05 |
| 2025-08-30 11:06:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:06:58 - pico-train - INFO - Step 70350 -- ๐ Training Metrics |
| 2025-08-30 11:06:58 - pico-train - INFO - โโโ Loss: 5.6919 |
| 2025-08-30 11:06:58 - pico-train - INFO - โโโ Learning Rate: 1.18e-05 |
| 2025-08-30 11:06:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:07:10 - pico-train - INFO - Step 70375 -- ๐ Training Metrics |
| 2025-08-30 11:07:10 - pico-train - INFO - โโโ Loss: 5.7595 |
| 2025-08-30 11:07:10 - pico-train - INFO - โโโ Learning Rate: 1.17e-05 |
| 2025-08-30 11:07:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:07:23 - pico-train - INFO - Step 70400 -- ๐ Training Metrics |
| 2025-08-30 11:07:23 - pico-train - INFO - โโโ Loss: 5.7637 |
| 2025-08-30 11:07:23 - pico-train - INFO - โโโ Learning Rate: 1.17e-05 |
| 2025-08-30 11:07:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:07:36 - pico-train - INFO - Step 70425 -- ๐ Training Metrics |
| 2025-08-30 11:07:36 - pico-train - INFO - โโโ Loss: 5.8013 |
| 2025-08-30 11:07:36 - pico-train - INFO - โโโ Learning Rate: 1.17e-05 |
| 2025-08-30 11:07:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:07:48 - pico-train - INFO - Step 70450 -- ๐ Training Metrics |
| 2025-08-30 11:07:48 - pico-train - INFO - โโโ Loss: 5.8487 |
| 2025-08-30 11:07:48 - pico-train - INFO - โโโ Learning Rate: 1.17e-05 |
| 2025-08-30 11:07:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:08:01 - pico-train - INFO - Step 70475 -- ๐ Training Metrics |
| 2025-08-30 11:08:01 - pico-train - INFO - โโโ Loss: 5.7931 |
| 2025-08-30 11:08:01 - pico-train - INFO - โโโ Learning Rate: 1.17e-05 |
| 2025-08-30 11:08:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:08:13 - pico-train - INFO - Step 70500 -- 💾 Saving Checkpoint |
| 2025-08-30 11:10:14 - pico-train - INFO - Step 70500 -- Evaluation Results |
| 2025-08-30 11:10:14 - pico-train - INFO - └── paloma: 5.389143520871013e+31 |
| 2025-08-30 11:10:18 - pico-train - INFO - Step 70500 -- ๐ Training Metrics |
| 2025-08-30 11:10:18 - pico-train - INFO - โโโ Loss: 5.8130 |
| 2025-08-30 11:10:18 - pico-train - INFO - โโโ Learning Rate: 1.16e-05 |
| 2025-08-30 11:10:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:10:18 - pico-train - INFO - Step 70500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 11:10:34 - pico-train - INFO - Step 70525 -- ๐ Training Metrics |
| 2025-08-30 11:10:34 - pico-train - INFO - โโโ Loss: 5.8003 |
| 2025-08-30 11:10:34 - pico-train - INFO - โโโ Learning Rate: 1.16e-05 |
| 2025-08-30 11:10:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:10:46 - pico-train - INFO - Step 70550 -- ๐ Training Metrics |
| 2025-08-30 11:10:46 - pico-train - INFO - โโโ Loss: 5.7638 |
| 2025-08-30 11:10:46 - pico-train - INFO - โโโ Learning Rate: 1.16e-05 |
| 2025-08-30 11:10:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:10:59 - pico-train - INFO - Step 70575 -- ๐ Training Metrics |
| 2025-08-30 11:10:59 - pico-train - INFO - โโโ Loss: 5.8081 |
| 2025-08-30 11:10:59 - pico-train - INFO - โโโ Learning Rate: 1.16e-05 |
| 2025-08-30 11:10:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:11:12 - pico-train - INFO - Step 70600 -- ๐ Training Metrics |
| 2025-08-30 11:11:12 - pico-train - INFO - โโโ Loss: 5.8433 |
| 2025-08-30 11:11:12 - pico-train - INFO - โโโ Learning Rate: 1.16e-05 |
| 2025-08-30 11:11:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:11:24 - pico-train - INFO - Step 70625 -- ๐ Training Metrics |
| 2025-08-30 11:11:24 - pico-train - INFO - โโโ Loss: 5.7845 |
| 2025-08-30 11:11:24 - pico-train - INFO - โโโ Learning Rate: 1.16e-05 |
| 2025-08-30 11:11:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:11:37 - pico-train - INFO - Step 70650 -- ๐ Training Metrics |
| 2025-08-30 11:11:37 - pico-train - INFO - โโโ Loss: 5.7766 |
| 2025-08-30 11:11:37 - pico-train - INFO - โโโ Learning Rate: 1.15e-05 |
| 2025-08-30 11:11:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:11:50 - pico-train - INFO - Step 70675 -- ๐ Training Metrics |
| 2025-08-30 11:11:50 - pico-train - INFO - โโโ Loss: 5.8443 |
| 2025-08-30 11:11:50 - pico-train - INFO - โโโ Learning Rate: 1.15e-05 |
| 2025-08-30 11:11:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:12:02 - pico-train - INFO - Step 70700 -- ๐ Training Metrics |
| 2025-08-30 11:12:02 - pico-train - INFO - โโโ Loss: 5.8557 |
| 2025-08-30 11:12:02 - pico-train - INFO - โโโ Learning Rate: 1.15e-05 |
| 2025-08-30 11:12:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:12:16 - pico-train - INFO - Step 70725 -- ๐ Training Metrics |
| 2025-08-30 11:12:16 - pico-train - INFO - โโโ Loss: 5.7753 |
| 2025-08-30 11:12:16 - pico-train - INFO - โโโ Learning Rate: 1.15e-05 |
| 2025-08-30 11:12:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:12:28 - pico-train - INFO - Step 70750 -- ๐ Training Metrics |
| 2025-08-30 11:12:28 - pico-train - INFO - โโโ Loss: 5.7036 |
| 2025-08-30 11:12:28 - pico-train - INFO - โโโ Learning Rate: 1.15e-05 |
| 2025-08-30 11:12:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:12:41 - pico-train - INFO - Step 70775 -- ๐ Training Metrics |
| 2025-08-30 11:12:41 - pico-train - INFO - โโโ Loss: 5.8355 |
| 2025-08-30 11:12:41 - pico-train - INFO - โโโ Learning Rate: 1.14e-05 |
| 2025-08-30 11:12:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:12:54 - pico-train - INFO - Step 70800 -- ๐ Training Metrics |
| 2025-08-30 11:12:54 - pico-train - INFO - โโโ Loss: 5.7925 |
| 2025-08-30 11:12:54 - pico-train - INFO - โโโ Learning Rate: 1.14e-05 |
| 2025-08-30 11:12:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:13:06 - pico-train - INFO - Step 70825 -- ๐ Training Metrics |
| 2025-08-30 11:13:06 - pico-train - INFO - โโโ Loss: 5.7594 |
| 2025-08-30 11:13:06 - pico-train - INFO - โโโ Learning Rate: 1.14e-05 |
| 2025-08-30 11:13:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:13:19 - pico-train - INFO - Step 70850 -- ๐ Training Metrics |
| 2025-08-30 11:13:19 - pico-train - INFO - โโโ Loss: 5.7899 |
| 2025-08-30 11:13:19 - pico-train - INFO - โโโ Learning Rate: 1.14e-05 |
| 2025-08-30 11:13:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:13:31 - pico-train - INFO - Step 70875 -- ๐ Training Metrics |
| 2025-08-30 11:13:31 - pico-train - INFO - โโโ Loss: 5.8210 |
| 2025-08-30 11:13:31 - pico-train - INFO - โโโ Learning Rate: 1.14e-05 |
| 2025-08-30 11:13:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:13:44 - pico-train - INFO - Step 70900 -- ๐ Training Metrics |
| 2025-08-30 11:13:44 - pico-train - INFO - โโโ Loss: 5.7877 |
| 2025-08-30 11:13:44 - pico-train - INFO - โโโ Learning Rate: 1.14e-05 |
| 2025-08-30 11:13:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:13:57 - pico-train - INFO - Step 70925 -- ๐ Training Metrics |
| 2025-08-30 11:13:57 - pico-train - INFO - โโโ Loss: 5.8528 |
| 2025-08-30 11:13:57 - pico-train - INFO - โโโ Learning Rate: 1.13e-05 |
| 2025-08-30 11:13:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:14:09 - pico-train - INFO - Step 70950 -- ๐ Training Metrics |
| 2025-08-30 11:14:09 - pico-train - INFO - โโโ Loss: 5.7071 |
| 2025-08-30 11:14:09 - pico-train - INFO - โโโ Learning Rate: 1.13e-05 |
| 2025-08-30 11:14:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:14:22 - pico-train - INFO - Step 70975 -- ๐ Training Metrics |
| 2025-08-30 11:14:22 - pico-train - INFO - โโโ Loss: 5.7500 |
| 2025-08-30 11:14:22 - pico-train - INFO - โโโ Learning Rate: 1.13e-05 |
| 2025-08-30 11:14:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:14:34 - pico-train - INFO - Step 71000 -- 💾 Saving Checkpoint |
| 2025-08-30 11:16:34 - pico-train - INFO - Step 71000 -- Evaluation Results |
| 2025-08-30 11:16:34 - pico-train - INFO - └── paloma: 6.106796255447029e+31 |
| 2025-08-30 11:16:38 - pico-train - INFO - Step 71000 -- ๐ Training Metrics |
| 2025-08-30 11:16:38 - pico-train - INFO - โโโ Loss: 5.8512 |
| 2025-08-30 11:16:38 - pico-train - INFO - โโโ Learning Rate: 1.13e-05 |
| 2025-08-30 11:16:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:16:38 - pico-train - INFO - Step 71000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 11:16:53 - pico-train - INFO - Step 71025 -- ๐ Training Metrics |
| 2025-08-30 11:16:53 - pico-train - INFO - โโโ Loss: 5.7849 |
| 2025-08-30 11:16:53 - pico-train - INFO - โโโ Learning Rate: 1.13e-05 |
| 2025-08-30 11:16:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:17:06 - pico-train - INFO - Step 71050 -- ๐ Training Metrics |
| 2025-08-30 11:17:06 - pico-train - INFO - โโโ Loss: 5.7794 |
| 2025-08-30 11:17:06 - pico-train - INFO - โโโ Learning Rate: 1.13e-05 |
| 2025-08-30 11:17:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:17:19 - pico-train - INFO - Step 71075 -- ๐ Training Metrics |
| 2025-08-30 11:17:19 - pico-train - INFO - โโโ Loss: 5.8584 |
| 2025-08-30 11:17:19 - pico-train - INFO - โโโ Learning Rate: 1.12e-05 |
| 2025-08-30 11:17:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:17:32 - pico-train - INFO - Step 71100 -- ๐ Training Metrics |
| 2025-08-30 11:17:32 - pico-train - INFO - โโโ Loss: 5.7866 |
| 2025-08-30 11:17:32 - pico-train - INFO - โโโ Learning Rate: 1.12e-05 |
| 2025-08-30 11:17:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:17:44 - pico-train - INFO - Step 71125 -- ๐ Training Metrics |
| 2025-08-30 11:17:44 - pico-train - INFO - โโโ Loss: 5.7744 |
| 2025-08-30 11:17:44 - pico-train - INFO - โโโ Learning Rate: 1.12e-05 |
| 2025-08-30 11:17:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:17:57 - pico-train - INFO - Step 71150 -- ๐ Training Metrics |
| 2025-08-30 11:17:57 - pico-train - INFO - โโโ Loss: 5.8179 |
| 2025-08-30 11:17:57 - pico-train - INFO - โโโ Learning Rate: 1.12e-05 |
| 2025-08-30 11:17:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:18:09 - pico-train - INFO - Step 71175 -- ๐ Training Metrics |
| 2025-08-30 11:18:09 - pico-train - INFO - โโโ Loss: 5.8349 |
| 2025-08-30 11:18:09 - pico-train - INFO - โโโ Learning Rate: 1.12e-05 |
| 2025-08-30 11:18:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:18:22 - pico-train - INFO - Step 71200 -- ๐ Training Metrics |
| 2025-08-30 11:18:22 - pico-train - INFO - โโโ Loss: 5.7446 |
| 2025-08-30 11:18:22 - pico-train - INFO - โโโ Learning Rate: 1.11e-05 |
| 2025-08-30 11:18:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:18:35 - pico-train - INFO - Step 71225 -- ๐ Training Metrics |
| 2025-08-30 11:18:35 - pico-train - INFO - โโโ Loss: 5.8961 |
| 2025-08-30 11:18:35 - pico-train - INFO - โโโ Learning Rate: 1.11e-05 |
| 2025-08-30 11:18:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:18:48 - pico-train - INFO - Step 71250 -- ๐ Training Metrics |
| 2025-08-30 11:18:48 - pico-train - INFO - โโโ Loss: 5.7719 |
| 2025-08-30 11:18:48 - pico-train - INFO - โโโ Learning Rate: 1.11e-05 |
| 2025-08-30 11:18:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:19:01 - pico-train - INFO - Step 71275 -- ๐ Training Metrics |
| 2025-08-30 11:19:01 - pico-train - INFO - โโโ Loss: 5.7171 |
| 2025-08-30 11:19:01 - pico-train - INFO - โโโ Learning Rate: 1.11e-05 |
| 2025-08-30 11:19:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:19:13 - pico-train - INFO - Step 71300 -- ๐ Training Metrics |
| 2025-08-30 11:19:13 - pico-train - INFO - โโโ Loss: 5.7381 |
| 2025-08-30 11:19:13 - pico-train - INFO - โโโ Learning Rate: 1.11e-05 |
| 2025-08-30 11:19:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:19:26 - pico-train - INFO - Step 71325 -- ๐ Training Metrics |
| 2025-08-30 11:19:26 - pico-train - INFO - โโโ Loss: 5.7906 |
| 2025-08-30 11:19:26 - pico-train - INFO - โโโ Learning Rate: 1.11e-05 |
| 2025-08-30 11:19:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:19:39 - pico-train - INFO - Step 71350 -- ๐ Training Metrics |
| 2025-08-30 11:19:39 - pico-train - INFO - โโโ Loss: 5.9247 |
| 2025-08-30 11:19:39 - pico-train - INFO - โโโ Learning Rate: 1.10e-05 |
| 2025-08-30 11:19:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:19:51 - pico-train - INFO - Step 71375 -- ๐ Training Metrics |
| 2025-08-30 11:19:51 - pico-train - INFO - โโโ Loss: 5.8136 |
| 2025-08-30 11:19:51 - pico-train - INFO - โโโ Learning Rate: 1.10e-05 |
| 2025-08-30 11:19:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:20:04 - pico-train - INFO - Step 71400 -- ๐ Training Metrics |
| 2025-08-30 11:20:04 - pico-train - INFO - โโโ Loss: 5.7196 |
| 2025-08-30 11:20:04 - pico-train - INFO - โโโ Learning Rate: 1.10e-05 |
| 2025-08-30 11:20:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:20:16 - pico-train - INFO - Step 71425 -- ๐ Training Metrics |
| 2025-08-30 11:20:16 - pico-train - INFO - โโโ Loss: 5.7807 |
| 2025-08-30 11:20:16 - pico-train - INFO - โโโ Learning Rate: 1.10e-05 |
| 2025-08-30 11:20:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:20:29 - pico-train - INFO - Step 71450 -- ๐ Training Metrics |
| 2025-08-30 11:20:29 - pico-train - INFO - โโโ Loss: 5.8609 |
| 2025-08-30 11:20:29 - pico-train - INFO - โโโ Learning Rate: 1.10e-05 |
| 2025-08-30 11:20:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:20:42 - pico-train - INFO - Step 71475 -- ๐ Training Metrics |
| 2025-08-30 11:20:42 - pico-train - INFO - โโโ Loss: 5.7683 |
| 2025-08-30 11:20:42 - pico-train - INFO - โโโ Learning Rate: 1.10e-05 |
| 2025-08-30 11:20:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:20:54 - pico-train - INFO - Step 71500 -- 💾 Saving Checkpoint |
| 2025-08-30 11:22:50 - pico-train - INFO - Step 71500 -- ๐ Evaluation Results |
| 2025-08-30 11:22:50 - pico-train - INFO - โโโ paloma: 6.282048257805562e+31 |
| 2025-08-30 11:22:53 - pico-train - INFO - Step 71500 -- ๐ Training Metrics |
| 2025-08-30 11:22:53 - pico-train - INFO - โโโ Loss: 5.8034 |
| 2025-08-30 11:22:53 - pico-train - INFO - โโโ Learning Rate: 1.09e-05 |
| 2025-08-30 11:22:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:22:53 - pico-train - INFO - Step 71500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 11:23:08 - pico-train - INFO - Step 71525 -- ๐ Training Metrics |
| 2025-08-30 11:23:08 - pico-train - INFO - โโโ Loss: 5.7923 |
| 2025-08-30 11:23:08 - pico-train - INFO - โโโ Learning Rate: 1.09e-05 |
| 2025-08-30 11:23:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:23:21 - pico-train - INFO - Step 71550 -- ๐ Training Metrics |
| 2025-08-30 11:23:21 - pico-train - INFO - โโโ Loss: 5.8365 |
| 2025-08-30 11:23:21 - pico-train - INFO - โโโ Learning Rate: 1.09e-05 |
| 2025-08-30 11:23:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:23:34 - pico-train - INFO - Step 71575 -- ๐ Training Metrics |
| 2025-08-30 11:23:34 - pico-train - INFO - โโโ Loss: 5.7924 |
| 2025-08-30 11:23:34 - pico-train - INFO - โโโ Learning Rate: 1.09e-05 |
| 2025-08-30 11:23:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:23:46 - pico-train - INFO - Step 71600 -- ๐ Training Metrics |
| 2025-08-30 11:23:46 - pico-train - INFO - โโโ Loss: 5.8132 |
| 2025-08-30 11:23:46 - pico-train - INFO - โโโ Learning Rate: 1.09e-05 |
| 2025-08-30 11:23:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:23:59 - pico-train - INFO - Step 71625 -- ๐ Training Metrics |
| 2025-08-30 11:23:59 - pico-train - INFO - โโโ Loss: 5.8109 |
| 2025-08-30 11:23:59 - pico-train - INFO - โโโ Learning Rate: 1.08e-05 |
| 2025-08-30 11:23:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:24:11 - pico-train - INFO - Step 71650 -- ๐ Training Metrics |
| 2025-08-30 11:24:11 - pico-train - INFO - โโโ Loss: 5.8357 |
| 2025-08-30 11:24:11 - pico-train - INFO - โโโ Learning Rate: 1.08e-05 |
| 2025-08-30 11:24:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:24:24 - pico-train - INFO - Step 71675 -- ๐ Training Metrics |
| 2025-08-30 11:24:24 - pico-train - INFO - โโโ Loss: 5.8117 |
| 2025-08-30 11:24:24 - pico-train - INFO - โโโ Learning Rate: 1.08e-05 |
| 2025-08-30 11:24:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:24:37 - pico-train - INFO - Step 71700 -- ๐ Training Metrics |
| 2025-08-30 11:24:37 - pico-train - INFO - โโโ Loss: 5.6849 |
| 2025-08-30 11:24:37 - pico-train - INFO - โโโ Learning Rate: 1.08e-05 |
| 2025-08-30 11:24:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:24:50 - pico-train - INFO - Step 71725 -- ๐ Training Metrics |
| 2025-08-30 11:24:50 - pico-train - INFO - โโโ Loss: 5.8316 |
| 2025-08-30 11:24:50 - pico-train - INFO - โโโ Learning Rate: 1.08e-05 |
| 2025-08-30 11:24:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:25:03 - pico-train - INFO - Step 71750 -- ๐ Training Metrics |
| 2025-08-30 11:25:03 - pico-train - INFO - โโโ Loss: 5.8852 |
| 2025-08-30 11:25:03 - pico-train - INFO - โโโ Learning Rate: 1.08e-05 |
| 2025-08-30 11:25:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:25:15 - pico-train - INFO - Step 71775 -- ๐ Training Metrics |
| 2025-08-30 11:25:15 - pico-train - INFO - โโโ Loss: 5.7825 |
| 2025-08-30 11:25:15 - pico-train - INFO - โโโ Learning Rate: 1.07e-05 |
| 2025-08-30 11:25:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:25:28 - pico-train - INFO - Step 71800 -- ๐ Training Metrics |
| 2025-08-30 11:25:28 - pico-train - INFO - โโโ Loss: 5.8405 |
| 2025-08-30 11:25:28 - pico-train - INFO - โโโ Learning Rate: 1.07e-05 |
| 2025-08-30 11:25:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:25:41 - pico-train - INFO - Step 71825 -- ๐ Training Metrics |
| 2025-08-30 11:25:41 - pico-train - INFO - โโโ Loss: 5.7973 |
| 2025-08-30 11:25:41 - pico-train - INFO - โโโ Learning Rate: 1.07e-05 |
| 2025-08-30 11:25:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:25:53 - pico-train - INFO - Step 71850 -- ๐ Training Metrics |
| 2025-08-30 11:25:53 - pico-train - INFO - โโโ Loss: 5.8016 |
| 2025-08-30 11:25:53 - pico-train - INFO - โโโ Learning Rate: 1.07e-05 |
| 2025-08-30 11:25:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:26:06 - pico-train - INFO - Step 71875 -- ๐ Training Metrics |
| 2025-08-30 11:26:06 - pico-train - INFO - โโโ Loss: 5.6851 |
| 2025-08-30 11:26:06 - pico-train - INFO - โโโ Learning Rate: 1.07e-05 |
| 2025-08-30 11:26:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:26:19 - pico-train - INFO - Step 71900 -- ๐ Training Metrics |
| 2025-08-30 11:26:19 - pico-train - INFO - โโโ Loss: 5.7568 |
| 2025-08-30 11:26:19 - pico-train - INFO - โโโ Learning Rate: 1.07e-05 |
| 2025-08-30 11:26:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:26:31 - pico-train - INFO - Step 71925 -- ๐ Training Metrics |
| 2025-08-30 11:26:31 - pico-train - INFO - โโโ Loss: 5.7542 |
| 2025-08-30 11:26:31 - pico-train - INFO - โโโ Learning Rate: 1.06e-05 |
| 2025-08-30 11:26:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:26:44 - pico-train - INFO - Step 71950 -- ๐ Training Metrics |
| 2025-08-30 11:26:44 - pico-train - INFO - โโโ Loss: 5.6807 |
| 2025-08-30 11:26:44 - pico-train - INFO - โโโ Learning Rate: 1.06e-05 |
| 2025-08-30 11:26:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:26:56 - pico-train - INFO - Step 71975 -- ๐ Training Metrics |
| 2025-08-30 11:26:56 - pico-train - INFO - โโโ Loss: 5.7309 |
| 2025-08-30 11:26:56 - pico-train - INFO - โโโ Learning Rate: 1.06e-05 |
| 2025-08-30 11:26:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:27:09 - pico-train - INFO - Step 72000 -- 💾 Saving Checkpoint |
| 2025-08-30 11:29:13 - pico-train - INFO - Step 72000 -- ๐ Evaluation Results |
| 2025-08-30 11:29:13 - pico-train - INFO - โโโ paloma: 6.442465619967253e+31 |
| 2025-08-30 11:29:15 - pico-train - INFO - Step 72000 -- ๐ Training Metrics |
| 2025-08-30 11:29:15 - pico-train - INFO - โโโ Loss: 5.7989 |
| 2025-08-30 11:29:15 - pico-train - INFO - โโโ Learning Rate: 1.06e-05 |
| 2025-08-30 11:29:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:29:15 - pico-train - INFO - Step 72000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 11:29:30 - pico-train - INFO - Step 72025 -- ๐ Training Metrics |
| 2025-08-30 11:29:30 - pico-train - INFO - โโโ Loss: 5.7701 |
| 2025-08-30 11:29:30 - pico-train - INFO - โโโ Learning Rate: 1.06e-05 |
| 2025-08-30 11:29:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:29:42 - pico-train - INFO - Step 72050 -- ๐ Training Metrics |
| 2025-08-30 11:29:42 - pico-train - INFO - โโโ Loss: 5.7553 |
| 2025-08-30 11:29:42 - pico-train - INFO - โโโ Learning Rate: 1.05e-05 |
| 2025-08-30 11:29:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:29:55 - pico-train - INFO - Step 72075 -- ๐ Training Metrics |
| 2025-08-30 11:29:55 - pico-train - INFO - โโโ Loss: 5.6550 |
| 2025-08-30 11:29:55 - pico-train - INFO - โโโ Learning Rate: 1.05e-05 |
| 2025-08-30 11:29:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:30:08 - pico-train - INFO - Step 72100 -- ๐ Training Metrics |
| 2025-08-30 11:30:08 - pico-train - INFO - โโโ Loss: 5.7120 |
| 2025-08-30 11:30:08 - pico-train - INFO - โโโ Learning Rate: 1.05e-05 |
| 2025-08-30 11:30:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:30:20 - pico-train - INFO - Step 72125 -- ๐ Training Metrics |
| 2025-08-30 11:30:20 - pico-train - INFO - โโโ Loss: 5.8457 |
| 2025-08-30 11:30:20 - pico-train - INFO - โโโ Learning Rate: 1.05e-05 |
| 2025-08-30 11:30:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:30:33 - pico-train - INFO - Step 72150 -- ๐ Training Metrics |
| 2025-08-30 11:30:33 - pico-train - INFO - โโโ Loss: 5.7710 |
| 2025-08-30 11:30:33 - pico-train - INFO - โโโ Learning Rate: 1.05e-05 |
| 2025-08-30 11:30:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:30:46 - pico-train - INFO - Step 72175 -- ๐ Training Metrics |
| 2025-08-30 11:30:46 - pico-train - INFO - โโโ Loss: 5.8311 |
| 2025-08-30 11:30:46 - pico-train - INFO - โโโ Learning Rate: 1.05e-05 |
| 2025-08-30 11:30:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:30:58 - pico-train - INFO - Step 72200 -- ๐ Training Metrics |
| 2025-08-30 11:30:58 - pico-train - INFO - โโโ Loss: 5.8419 |
| 2025-08-30 11:30:58 - pico-train - INFO - โโโ Learning Rate: 1.04e-05 |
| 2025-08-30 11:30:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:31:12 - pico-train - INFO - Step 72225 -- ๐ Training Metrics |
| 2025-08-30 11:31:12 - pico-train - INFO - โโโ Loss: 5.7954 |
| 2025-08-30 11:31:12 - pico-train - INFO - โโโ Learning Rate: 1.04e-05 |
| 2025-08-30 11:31:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:31:24 - pico-train - INFO - Step 72250 -- ๐ Training Metrics |
| 2025-08-30 11:31:24 - pico-train - INFO - โโโ Loss: 5.7894 |
| 2025-08-30 11:31:24 - pico-train - INFO - โโโ Learning Rate: 1.04e-05 |
| 2025-08-30 11:31:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:31:37 - pico-train - INFO - Step 72275 -- ๐ Training Metrics |
| 2025-08-30 11:31:37 - pico-train - INFO - โโโ Loss: 5.7746 |
| 2025-08-30 11:31:37 - pico-train - INFO - โโโ Learning Rate: 1.04e-05 |
| 2025-08-30 11:31:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:31:50 - pico-train - INFO - Step 72300 -- ๐ Training Metrics |
| 2025-08-30 11:31:50 - pico-train - INFO - โโโ Loss: 5.9178 |
| 2025-08-30 11:31:50 - pico-train - INFO - โโโ Learning Rate: 1.04e-05 |
| 2025-08-30 11:31:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:32:02 - pico-train - INFO - Step 72325 -- ๐ Training Metrics |
| 2025-08-30 11:32:02 - pico-train - INFO - โโโ Loss: 5.8326 |
| 2025-08-30 11:32:02 - pico-train - INFO - โโโ Learning Rate: 1.04e-05 |
| 2025-08-30 11:32:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:32:15 - pico-train - INFO - Step 72350 -- ๐ Training Metrics |
| 2025-08-30 11:32:15 - pico-train - INFO - โโโ Loss: 5.8099 |
| 2025-08-30 11:32:15 - pico-train - INFO - โโโ Learning Rate: 1.03e-05 |
| 2025-08-30 11:32:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:32:27 - pico-train - INFO - Step 72375 -- ๐ Training Metrics |
| 2025-08-30 11:32:27 - pico-train - INFO - โโโ Loss: 5.7497 |
| 2025-08-30 11:32:27 - pico-train - INFO - โโโ Learning Rate: 1.03e-05 |
| 2025-08-30 11:32:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:32:40 - pico-train - INFO - Step 72400 -- ๐ Training Metrics |
| 2025-08-30 11:32:40 - pico-train - INFO - โโโ Loss: 5.7700 |
| 2025-08-30 11:32:40 - pico-train - INFO - โโโ Learning Rate: 1.03e-05 |
| 2025-08-30 11:32:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:32:53 - pico-train - INFO - Step 72425 -- ๐ Training Metrics |
| 2025-08-30 11:32:53 - pico-train - INFO - โโโ Loss: 5.8295 |
| 2025-08-30 11:32:53 - pico-train - INFO - โโโ Learning Rate: 1.03e-05 |
| 2025-08-30 11:32:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:33:06 - pico-train - INFO - Step 72450 -- ๐ Training Metrics |
| 2025-08-30 11:33:06 - pico-train - INFO - โโโ Loss: 5.7635 |
| 2025-08-30 11:33:06 - pico-train - INFO - โโโ Learning Rate: 1.03e-05 |
| 2025-08-30 11:33:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:33:18 - pico-train - INFO - Step 72475 -- ๐ Training Metrics |
| 2025-08-30 11:33:18 - pico-train - INFO - โโโ Loss: 5.7644 |
| 2025-08-30 11:33:18 - pico-train - INFO - โโโ Learning Rate: 1.03e-05 |
| 2025-08-30 11:33:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:33:31 - pico-train - INFO - Step 72500 -- 💾 Saving Checkpoint |
| 2025-08-30 11:35:40 - pico-train - INFO - Step 72500 -- ๐ Evaluation Results |
| 2025-08-30 11:35:40 - pico-train - INFO - โโโ paloma: 7.433151564209409e+31 |
| 2025-08-30 11:35:42 - pico-train - INFO - Step 72500 -- ๐ Training Metrics |
| 2025-08-30 11:35:42 - pico-train - INFO - โโโ Loss: 5.7986 |
| 2025-08-30 11:35:42 - pico-train - INFO - โโโ Learning Rate: 1.02e-05 |
| 2025-08-30 11:35:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:35:42 - pico-train - INFO - Step 72500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 11:35:57 - pico-train - INFO - Step 72525 -- ๐ Training Metrics |
| 2025-08-30 11:35:57 - pico-train - INFO - โโโ Loss: 5.8320 |
| 2025-08-30 11:35:57 - pico-train - INFO - โโโ Learning Rate: 1.02e-05 |
| 2025-08-30 11:35:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:36:10 - pico-train - INFO - Step 72550 -- ๐ Training Metrics |
| 2025-08-30 11:36:10 - pico-train - INFO - โโโ Loss: 5.7602 |
| 2025-08-30 11:36:10 - pico-train - INFO - โโโ Learning Rate: 1.02e-05 |
| 2025-08-30 11:36:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:36:22 - pico-train - INFO - Step 72575 -- ๐ Training Metrics |
| 2025-08-30 11:36:22 - pico-train - INFO - โโโ Loss: 5.7627 |
| 2025-08-30 11:36:22 - pico-train - INFO - โโโ Learning Rate: 1.02e-05 |
| 2025-08-30 11:36:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:36:35 - pico-train - INFO - Step 72600 -- ๐ Training Metrics |
| 2025-08-30 11:36:35 - pico-train - INFO - โโโ Loss: 5.7779 |
| 2025-08-30 11:36:35 - pico-train - INFO - โโโ Learning Rate: 1.02e-05 |
| 2025-08-30 11:36:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:36:48 - pico-train - INFO - Step 72625 -- ๐ Training Metrics |
| 2025-08-30 11:36:48 - pico-train - INFO - โโโ Loss: 5.8076 |
| 2025-08-30 11:36:48 - pico-train - INFO - โโโ Learning Rate: 1.02e-05 |
| 2025-08-30 11:36:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:37:00 - pico-train - INFO - Step 72650 -- ๐ Training Metrics |
| 2025-08-30 11:37:00 - pico-train - INFO - โโโ Loss: 5.8050 |
| 2025-08-30 11:37:00 - pico-train - INFO - โโโ Learning Rate: 1.01e-05 |
| 2025-08-30 11:37:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:37:13 - pico-train - INFO - Step 72675 -- ๐ Training Metrics |
| 2025-08-30 11:37:13 - pico-train - INFO - โโโ Loss: 5.8470 |
| 2025-08-30 11:37:13 - pico-train - INFO - โโโ Learning Rate: 1.01e-05 |
| 2025-08-30 11:37:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:37:26 - pico-train - INFO - Step 72700 -- ๐ Training Metrics |
| 2025-08-30 11:37:26 - pico-train - INFO - โโโ Loss: 5.7896 |
| 2025-08-30 11:37:26 - pico-train - INFO - โโโ Learning Rate: 1.01e-05 |
| 2025-08-30 11:37:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:37:39 - pico-train - INFO - Step 72725 -- ๐ Training Metrics |
| 2025-08-30 11:37:39 - pico-train - INFO - โโโ Loss: 5.7821 |
| 2025-08-30 11:37:39 - pico-train - INFO - โโโ Learning Rate: 1.01e-05 |
| 2025-08-30 11:37:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:37:52 - pico-train - INFO - Step 72750 -- ๐ Training Metrics |
| 2025-08-30 11:37:52 - pico-train - INFO - โโโ Loss: 5.7733 |
| 2025-08-30 11:37:52 - pico-train - INFO - โโโ Learning Rate: 1.01e-05 |
| 2025-08-30 11:37:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:38:04 - pico-train - INFO - Step 72775 -- ๐ Training Metrics |
| 2025-08-30 11:38:04 - pico-train - INFO - โโโ Loss: 5.8627 |
| 2025-08-30 11:38:04 - pico-train - INFO - โโโ Learning Rate: 1.00e-05 |
| 2025-08-30 11:38:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:38:17 - pico-train - INFO - Step 72800 -- ๐ Training Metrics |
| 2025-08-30 11:38:17 - pico-train - INFO - โโโ Loss: 5.8219 |
| 2025-08-30 11:38:17 - pico-train - INFO - โโโ Learning Rate: 1.00e-05 |
| 2025-08-30 11:38:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:38:30 - pico-train - INFO - Step 72825 -- ๐ Training Metrics |
| 2025-08-30 11:38:30 - pico-train - INFO - โโโ Loss: 5.8448 |
| 2025-08-30 11:38:30 - pico-train - INFO - โโโ Learning Rate: 1.00e-05 |
| 2025-08-30 11:38:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:38:42 - pico-train - INFO - Step 72850 -- ๐ Training Metrics |
| 2025-08-30 11:38:42 - pico-train - INFO - โโโ Loss: 5.7459 |
| 2025-08-30 11:38:42 - pico-train - INFO - โโโ Learning Rate: 1.00e-05 |
| 2025-08-30 11:38:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:38:55 - pico-train - INFO - Step 72875 -- ๐ Training Metrics |
| 2025-08-30 11:38:55 - pico-train - INFO - โโโ Loss: 5.8400 |
| 2025-08-30 11:38:55 - pico-train - INFO - โโโ Learning Rate: 9.98e-06 |
| 2025-08-30 11:38:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:39:07 - pico-train - INFO - Step 72900 -- ๐ Training Metrics |
| 2025-08-30 11:39:07 - pico-train - INFO - โโโ Loss: 5.7810 |
| 2025-08-30 11:39:07 - pico-train - INFO - โโโ Learning Rate: 9.96e-06 |
| 2025-08-30 11:39:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:39:20 - pico-train - INFO - Step 72925 -- ๐ Training Metrics |
| 2025-08-30 11:39:20 - pico-train - INFO - โโโ Loss: 5.8001 |
| 2025-08-30 11:39:20 - pico-train - INFO - โโโ Learning Rate: 9.95e-06 |
| 2025-08-30 11:39:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:39:33 - pico-train - INFO - Step 72950 -- ๐ Training Metrics |
| 2025-08-30 11:39:33 - pico-train - INFO - โโโ Loss: 5.8616 |
| 2025-08-30 11:39:33 - pico-train - INFO - โโโ Learning Rate: 9.93e-06 |
| 2025-08-30 11:39:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:39:45 - pico-train - INFO - Step 72975 -- ๐ Training Metrics |
| 2025-08-30 11:39:45 - pico-train - INFO - โโโ Loss: 5.8884 |
| 2025-08-30 11:39:45 - pico-train - INFO - โโโ Learning Rate: 9.91e-06 |
| 2025-08-30 11:39:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:39:57 - pico-train - INFO - Step 73000 -- 💾 Saving Checkpoint |
| 2025-08-30 11:41:54 - pico-train - INFO - Step 73000 -- ๐ Evaluation Results |
| 2025-08-30 11:41:54 - pico-train - INFO - โโโ paloma: 8.156828131388013e+31 |
| 2025-08-30 11:41:56 - pico-train - INFO - Step 73000 -- ๐ Training Metrics |
| 2025-08-30 11:41:56 - pico-train - INFO - โโโ Loss: 5.7843 |
| 2025-08-30 11:41:56 - pico-train - INFO - โโโ Learning Rate: 9.89e-06 |
| 2025-08-30 11:41:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:41:56 - pico-train - INFO - Step 73000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 11:42:11 - pico-train - INFO - Step 73025 -- ๐ Training Metrics |
| 2025-08-30 11:42:11 - pico-train - INFO - โโโ Loss: 5.7129 |
| 2025-08-30 11:42:11 - pico-train - INFO - โโโ Learning Rate: 9.88e-06 |
| 2025-08-30 11:42:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:42:24 - pico-train - INFO - Step 73050 -- ๐ Training Metrics |
| 2025-08-30 11:42:24 - pico-train - INFO - โโโ Loss: 5.8605 |
| 2025-08-30 11:42:24 - pico-train - INFO - โโโ Learning Rate: 9.86e-06 |
| 2025-08-30 11:42:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:42:37 - pico-train - INFO - Step 73075 -- ๐ Training Metrics |
| 2025-08-30 11:42:37 - pico-train - INFO - โโโ Loss: 5.8538 |
| 2025-08-30 11:42:37 - pico-train - INFO - โโโ Learning Rate: 9.84e-06 |
| 2025-08-30 11:42:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:42:49 - pico-train - INFO - Step 73100 -- ๐ Training Metrics |
| 2025-08-30 11:42:49 - pico-train - INFO - โโโ Loss: 5.8061 |
| 2025-08-30 11:42:49 - pico-train - INFO - โโโ Learning Rate: 9.83e-06 |
| 2025-08-30 11:42:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:43:02 - pico-train - INFO - Step 73125 -- ๐ Training Metrics |
| 2025-08-30 11:43:02 - pico-train - INFO - โโโ Loss: 5.6467 |
| 2025-08-30 11:43:02 - pico-train - INFO - โโโ Learning Rate: 9.81e-06 |
| 2025-08-30 11:43:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:43:14 - pico-train - INFO - Step 73150 -- ๐ Training Metrics |
| 2025-08-30 11:43:14 - pico-train - INFO - โโโ Loss: 5.7946 |
| 2025-08-30 11:43:14 - pico-train - INFO - โโโ Learning Rate: 9.79e-06 |
| 2025-08-30 11:43:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:43:27 - pico-train - INFO - Step 73175 -- ๐ Training Metrics |
| 2025-08-30 11:43:27 - pico-train - INFO - โโโ Loss: 5.7591 |
| 2025-08-30 11:43:27 - pico-train - INFO - โโโ Learning Rate: 9.78e-06 |
| 2025-08-30 11:43:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:43:40 - pico-train - INFO - Step 73200 -- ๐ Training Metrics |
| 2025-08-30 11:43:40 - pico-train - INFO - โโโ Loss: 5.7435 |
| 2025-08-30 11:43:40 - pico-train - INFO - โโโ Learning Rate: 9.76e-06 |
| 2025-08-30 11:43:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:43:55 - pico-train - INFO - Step 73225 -- ๐ Training Metrics |
| 2025-08-30 11:43:55 - pico-train - INFO - โโโ Loss: 5.7541 |
| 2025-08-30 11:43:55 - pico-train - INFO - โโโ Learning Rate: 9.74e-06 |
| 2025-08-30 11:43:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:44:08 - pico-train - INFO - Step 73250 -- ๐ Training Metrics |
| 2025-08-30 11:44:08 - pico-train - INFO - โโโ Loss: 5.8107 |
| 2025-08-30 11:44:08 - pico-train - INFO - โโโ Learning Rate: 9.72e-06 |
| 2025-08-30 11:44:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:44:20 - pico-train - INFO - Step 73275 -- ๐ Training Metrics |
| 2025-08-30 11:44:20 - pico-train - INFO - โโโ Loss: 5.7636 |
| 2025-08-30 11:44:20 - pico-train - INFO - โโโ Learning Rate: 9.71e-06 |
| 2025-08-30 11:44:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:44:33 - pico-train - INFO - Step 73300 -- ๐ Training Metrics |
| 2025-08-30 11:44:33 - pico-train - INFO - โโโ Loss: 5.7746 |
| 2025-08-30 11:44:33 - pico-train - INFO - โโโ Learning Rate: 9.69e-06 |
| 2025-08-30 11:44:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:44:46 - pico-train - INFO - Step 73325 -- ๐ Training Metrics |
| 2025-08-30 11:44:46 - pico-train - INFO - โโโ Loss: 5.8366 |
| 2025-08-30 11:44:46 - pico-train - INFO - โโโ Learning Rate: 9.67e-06 |
| 2025-08-30 11:44:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:44:58 - pico-train - INFO - Step 73350 -- ๐ Training Metrics |
| 2025-08-30 11:44:58 - pico-train - INFO - โโโ Loss: 5.8148 |
| 2025-08-30 11:44:58 - pico-train - INFO - โโโ Learning Rate: 9.66e-06 |
| 2025-08-30 11:44:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:45:11 - pico-train - INFO - Step 73375 -- ๐ Training Metrics |
| 2025-08-30 11:45:11 - pico-train - INFO - โโโ Loss: 5.8216 |
| 2025-08-30 11:45:11 - pico-train - INFO - โโโ Learning Rate: 9.64e-06 |
| 2025-08-30 11:45:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:45:24 - pico-train - INFO - Step 73400 -- ๐ Training Metrics |
| 2025-08-30 11:45:24 - pico-train - INFO - โโโ Loss: 5.8380 |
| 2025-08-30 11:45:24 - pico-train - INFO - โโโ Learning Rate: 9.62e-06 |
| 2025-08-30 11:45:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:45:36 - pico-train - INFO - Step 73425 -- ๐ Training Metrics |
| 2025-08-30 11:45:36 - pico-train - INFO - โโโ Loss: 5.7821 |
| 2025-08-30 11:45:36 - pico-train - INFO - โโโ Learning Rate: 9.61e-06 |
| 2025-08-30 11:45:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:45:49 - pico-train - INFO - Step 73450 -- ๐ Training Metrics |
| 2025-08-30 11:45:49 - pico-train - INFO - โโโ Loss: 5.7886 |
| 2025-08-30 11:45:49 - pico-train - INFO - โโโ Learning Rate: 9.59e-06 |
| 2025-08-30 11:45:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:46:01 - pico-train - INFO - Step 73475 -- ๐ Training Metrics |
| 2025-08-30 11:46:01 - pico-train - INFO - โโโ Loss: 5.7748 |
| 2025-08-30 11:46:01 - pico-train - INFO - โโโ Learning Rate: 9.57e-06 |
| 2025-08-30 11:46:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:46:13 - pico-train - INFO - Step 73500 -- 💾 Saving Checkpoint |
| 2025-08-30 11:48:25 - pico-train - INFO - Step 73500 -- ๐ Evaluation Results |
| 2025-08-30 11:48:25 - pico-train - INFO - โโโ paloma: 9.704589730778985e+31 |
| 2025-08-30 11:48:27 - pico-train - INFO - Step 73500 -- ๐ Training Metrics |
| 2025-08-30 11:48:27 - pico-train - INFO - โโโ Loss: 5.6891 |
| 2025-08-30 11:48:27 - pico-train - INFO - โโโ Learning Rate: 9.56e-06 |
| 2025-08-30 11:48:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:48:27 - pico-train - INFO - Step 73500 -- ๐ Saving Learning Dynamics |
| 2025-08-30 11:48:43 - pico-train - INFO - Step 73525 -- ๐ Training Metrics |
| 2025-08-30 11:48:43 - pico-train - INFO - โโโ Loss: 5.7314 |
| 2025-08-30 11:48:43 - pico-train - INFO - โโโ Learning Rate: 9.54e-06 |
| 2025-08-30 11:48:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:48:56 - pico-train - INFO - Step 73550 -- ๐ Training Metrics |
| 2025-08-30 11:48:56 - pico-train - INFO - โโโ Loss: 5.6260 |
| 2025-08-30 11:48:56 - pico-train - INFO - โโโ Learning Rate: 9.52e-06 |
| 2025-08-30 11:48:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:49:08 - pico-train - INFO - Step 73575 -- ๐ Training Metrics |
| 2025-08-30 11:49:08 - pico-train - INFO - โโโ Loss: 5.7770 |
| 2025-08-30 11:49:08 - pico-train - INFO - โโโ Learning Rate: 9.51e-06 |
| 2025-08-30 11:49:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:49:21 - pico-train - INFO - Step 73600 -- ๐ Training Metrics |
| 2025-08-30 11:49:21 - pico-train - INFO - โโโ Loss: 5.7806 |
| 2025-08-30 11:49:21 - pico-train - INFO - โโโ Learning Rate: 9.49e-06 |
| 2025-08-30 11:49:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:49:34 - pico-train - INFO - Step 73625 -- ๐ Training Metrics |
| 2025-08-30 11:49:34 - pico-train - INFO - โโโ Loss: 5.7744 |
| 2025-08-30 11:49:34 - pico-train - INFO - โโโ Learning Rate: 9.47e-06 |
| 2025-08-30 11:49:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:49:46 - pico-train - INFO - Step 73650 -- ๐ Training Metrics |
| 2025-08-30 11:49:46 - pico-train - INFO - โโโ Loss: 5.7460 |
| 2025-08-30 11:49:46 - pico-train - INFO - โโโ Learning Rate: 9.46e-06 |
| 2025-08-30 11:49:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:49:59 - pico-train - INFO - Step 73675 -- ๐ Training Metrics |
| 2025-08-30 11:49:59 - pico-train - INFO - โโโ Loss: 5.8272 |
| 2025-08-30 11:49:59 - pico-train - INFO - โโโ Learning Rate: 9.44e-06 |
| 2025-08-30 11:49:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:50:12 - pico-train - INFO - Step 73700 -- ๐ Training Metrics |
| 2025-08-30 11:50:12 - pico-train - INFO - โโโ Loss: 5.7866 |
| 2025-08-30 11:50:12 - pico-train - INFO - โโโ Learning Rate: 9.42e-06 |
| 2025-08-30 11:50:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:50:25 - pico-train - INFO - Step 73725 -- ๐ Training Metrics |
| 2025-08-30 11:50:25 - pico-train - INFO - โโโ Loss: 5.7838 |
| 2025-08-30 11:50:25 - pico-train - INFO - โโโ Learning Rate: 9.41e-06 |
| 2025-08-30 11:50:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:50:38 - pico-train - INFO - Step 73750 -- ๐ Training Metrics |
| 2025-08-30 11:50:38 - pico-train - INFO - โโโ Loss: 5.6949 |
| 2025-08-30 11:50:38 - pico-train - INFO - โโโ Learning Rate: 9.39e-06 |
| 2025-08-30 11:50:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:50:51 - pico-train - INFO - Step 73775 -- ๐ Training Metrics |
| 2025-08-30 11:50:51 - pico-train - INFO - โโโ Loss: 5.7301 |
| 2025-08-30 11:50:51 - pico-train - INFO - โโโ Learning Rate: 9.37e-06 |
| 2025-08-30 11:50:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:51:03 - pico-train - INFO - Step 73800 -- ๐ Training Metrics |
| 2025-08-30 11:51:03 - pico-train - INFO - โโโ Loss: 5.7987 |
| 2025-08-30 11:51:03 - pico-train - INFO - โโโ Learning Rate: 9.36e-06 |
| 2025-08-30 11:51:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:51:16 - pico-train - INFO - Step 73825 -- ๐ Training Metrics |
| 2025-08-30 11:51:16 - pico-train - INFO - โโโ Loss: 5.8495 |
| 2025-08-30 11:51:16 - pico-train - INFO - โโโ Learning Rate: 9.34e-06 |
| 2025-08-30 11:51:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:51:29 - pico-train - INFO - Step 73850 -- ๐ Training Metrics |
| 2025-08-30 11:51:29 - pico-train - INFO - โโโ Loss: 5.7411 |
| 2025-08-30 11:51:29 - pico-train - INFO - โโโ Learning Rate: 9.32e-06 |
| 2025-08-30 11:51:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:51:41 - pico-train - INFO - Step 73875 -- ๐ Training Metrics |
| 2025-08-30 11:51:41 - pico-train - INFO - โโโ Loss: 5.7792 |
| 2025-08-30 11:51:41 - pico-train - INFO - โโโ Learning Rate: 9.31e-06 |
| 2025-08-30 11:51:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:51:54 - pico-train - INFO - Step 73900 -- ๐ Training Metrics |
| 2025-08-30 11:51:54 - pico-train - INFO - โโโ Loss: 5.8225 |
| 2025-08-30 11:51:54 - pico-train - INFO - โโโ Learning Rate: 9.29e-06 |
| 2025-08-30 11:51:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:52:07 - pico-train - INFO - Step 73925 -- ๐ Training Metrics |
| 2025-08-30 11:52:07 - pico-train - INFO - โโโ Loss: 5.7823 |
| 2025-08-30 11:52:07 - pico-train - INFO - โโโ Learning Rate: 9.27e-06 |
| 2025-08-30 11:52:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:52:19 - pico-train - INFO - Step 73950 -- ๐ Training Metrics |
| 2025-08-30 11:52:19 - pico-train - INFO - โโโ Loss: 5.6970 |
| 2025-08-30 11:52:19 - pico-train - INFO - โโโ Learning Rate: 9.26e-06 |
| 2025-08-30 11:52:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:52:32 - pico-train - INFO - Step 73975 -- ๐ Training Metrics |
| 2025-08-30 11:52:32 - pico-train - INFO - โโโ Loss: 5.7531 |
| 2025-08-30 11:52:32 - pico-train - INFO - โโโ Learning Rate: 9.24e-06 |
| 2025-08-30 11:52:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:52:45 - pico-train - INFO - Step 74000 -- 💾 Saving Checkpoint |
| 2025-08-30 11:54:39 - pico-train - INFO - Step 74000 -- ๐ Evaluation Results |
| 2025-08-30 11:54:39 - pico-train - INFO - โโโ paloma: 8.636477783625786e+31 |
| 2025-08-30 11:54:42 - pico-train - INFO - Step 74000 -- ๐ Training Metrics |
| 2025-08-30 11:54:42 - pico-train - INFO - โโโ Loss: 5.7592 |
| 2025-08-30 11:54:42 - pico-train - INFO - โโโ Learning Rate: 9.22e-06 |
| 2025-08-30 11:54:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:54:42 - pico-train - INFO - Step 74000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 11:54:58 - pico-train - INFO - Step 74025 -- ๐ Training Metrics |
| 2025-08-30 11:54:58 - pico-train - INFO - โโโ Loss: 5.7057 |
| 2025-08-30 11:54:58 - pico-train - INFO - โโโ Learning Rate: 9.21e-06 |
| 2025-08-30 11:54:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:55:11 - pico-train - INFO - Step 74050 -- ๐ Training Metrics |
| 2025-08-30 11:55:11 - pico-train - INFO - โโโ Loss: 5.8112 |
| 2025-08-30 11:55:11 - pico-train - INFO - โโโ Learning Rate: 9.19e-06 |
| 2025-08-30 11:55:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:55:23 - pico-train - INFO - Step 74075 -- ๐ Training Metrics |
| 2025-08-30 11:55:23 - pico-train - INFO - โโโ Loss: 5.8551 |
| 2025-08-30 11:55:23 - pico-train - INFO - โโโ Learning Rate: 9.17e-06 |
| 2025-08-30 11:55:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:55:36 - pico-train - INFO - Step 74100 -- ๐ Training Metrics |
| 2025-08-30 11:55:36 - pico-train - INFO - โโโ Loss: 5.7881 |
| 2025-08-30 11:55:36 - pico-train - INFO - โโโ Learning Rate: 9.16e-06 |
| 2025-08-30 11:55:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:55:49 - pico-train - INFO - Step 74125 -- ๐ Training Metrics |
| 2025-08-30 11:55:49 - pico-train - INFO - โโโ Loss: 5.7239 |
| 2025-08-30 11:55:49 - pico-train - INFO - โโโ Learning Rate: 9.14e-06 |
| 2025-08-30 11:55:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 11:56:01 - pico-train - INFO - Step 74150 -- 🔄 Training Metrics |
| 2025-08-30 11:56:01 - pico-train - INFO - ├── Loss: 5.7491 |
| 2025-08-30 11:56:01 - pico-train - INFO - ├── Learning Rate: 9.12e-06 |
| 2025-08-30 11:56:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:56:14 - pico-train - INFO - Step 74175 -- 🔄 Training Metrics |
| 2025-08-30 11:56:14 - pico-train - INFO - ├── Loss: 5.7418 |
| 2025-08-30 11:56:14 - pico-train - INFO - ├── Learning Rate: 9.11e-06 |
| 2025-08-30 11:56:14 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:56:27 - pico-train - INFO - Step 74200 -- 🔄 Training Metrics |
| 2025-08-30 11:56:27 - pico-train - INFO - ├── Loss: 5.8195 |
| 2025-08-30 11:56:27 - pico-train - INFO - ├── Learning Rate: 9.09e-06 |
| 2025-08-30 11:56:27 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:56:40 - pico-train - INFO - Step 74225 -- 🔄 Training Metrics |
| 2025-08-30 11:56:40 - pico-train - INFO - ├── Loss: 5.8008 |
| 2025-08-30 11:56:40 - pico-train - INFO - ├── Learning Rate: 9.07e-06 |
| 2025-08-30 11:56:40 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:56:53 - pico-train - INFO - Step 74250 -- 🔄 Training Metrics |
| 2025-08-30 11:56:53 - pico-train - INFO - ├── Loss: 5.7900 |
| 2025-08-30 11:56:53 - pico-train - INFO - ├── Learning Rate: 9.06e-06 |
| 2025-08-30 11:56:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:57:05 - pico-train - INFO - Step 74275 -- 🔄 Training Metrics |
| 2025-08-30 11:57:05 - pico-train - INFO - ├── Loss: 5.8471 |
| 2025-08-30 11:57:05 - pico-train - INFO - ├── Learning Rate: 9.04e-06 |
| 2025-08-30 11:57:05 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:57:18 - pico-train - INFO - Step 74300 -- 🔄 Training Metrics |
| 2025-08-30 11:57:18 - pico-train - INFO - ├── Loss: 5.8221 |
| 2025-08-30 11:57:18 - pico-train - INFO - ├── Learning Rate: 9.02e-06 |
| 2025-08-30 11:57:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:57:31 - pico-train - INFO - Step 74325 -- 🔄 Training Metrics |
| 2025-08-30 11:57:31 - pico-train - INFO - ├── Loss: 5.7390 |
| 2025-08-30 11:57:31 - pico-train - INFO - ├── Learning Rate: 9.01e-06 |
| 2025-08-30 11:57:31 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:57:43 - pico-train - INFO - Step 74350 -- 🔄 Training Metrics |
| 2025-08-30 11:57:43 - pico-train - INFO - ├── Loss: 5.7864 |
| 2025-08-30 11:57:43 - pico-train - INFO - ├── Learning Rate: 8.99e-06 |
| 2025-08-30 11:57:43 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:57:56 - pico-train - INFO - Step 74375 -- 🔄 Training Metrics |
| 2025-08-30 11:57:56 - pico-train - INFO - ├── Loss: 5.8961 |
| 2025-08-30 11:57:56 - pico-train - INFO - ├── Learning Rate: 8.98e-06 |
| 2025-08-30 11:57:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:58:08 - pico-train - INFO - Step 74400 -- 🔄 Training Metrics |
| 2025-08-30 11:58:08 - pico-train - INFO - ├── Loss: 5.7558 |
| 2025-08-30 11:58:08 - pico-train - INFO - ├── Learning Rate: 8.96e-06 |
| 2025-08-30 11:58:08 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:58:21 - pico-train - INFO - Step 74425 -- 🔄 Training Metrics |
| 2025-08-30 11:58:21 - pico-train - INFO - ├── Loss: 5.7641 |
| 2025-08-30 11:58:21 - pico-train - INFO - ├── Learning Rate: 8.94e-06 |
| 2025-08-30 11:58:21 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:58:34 - pico-train - INFO - Step 74450 -- 🔄 Training Metrics |
| 2025-08-30 11:58:34 - pico-train - INFO - ├── Loss: 5.7386 |
| 2025-08-30 11:58:34 - pico-train - INFO - ├── Learning Rate: 8.93e-06 |
| 2025-08-30 11:58:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:58:46 - pico-train - INFO - Step 74475 -- 🔄 Training Metrics |
| 2025-08-30 11:58:46 - pico-train - INFO - ├── Loss: 5.7682 |
| 2025-08-30 11:58:46 - pico-train - INFO - ├── Learning Rate: 8.91e-06 |
| 2025-08-30 11:58:46 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 11:58:59 - pico-train - INFO - Step 74500 -- 💾 Saving Checkpoint |
| 2025-08-30 12:00:53 - pico-train - INFO - Step 74500 -- 📊 Evaluation Results |
| 2025-08-30 12:00:53 - pico-train - INFO - └── paloma: 9.875388203359053e+31 |
| 2025-08-30 12:00:55 - pico-train - INFO - Step 74500 -- 🔄 Training Metrics |
| 2025-08-30 12:00:55 - pico-train - INFO - ├── Loss: 5.7399 |
| 2025-08-30 12:00:55 - pico-train - INFO - ├── Learning Rate: 8.89e-06 |
| 2025-08-30 12:00:55 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:00:55 - pico-train - INFO - Step 74500 -- 📈 Saving Learning Dynamics |
| 2025-08-30 12:01:10 - pico-train - INFO - Step 74525 -- 🔄 Training Metrics |
| 2025-08-30 12:01:10 - pico-train - INFO - ├── Loss: 5.7499 |
| 2025-08-30 12:01:10 - pico-train - INFO - ├── Learning Rate: 8.88e-06 |
| 2025-08-30 12:01:10 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:01:22 - pico-train - INFO - Step 74550 -- 🔄 Training Metrics |
| 2025-08-30 12:01:22 - pico-train - INFO - ├── Loss: 5.8008 |
| 2025-08-30 12:01:22 - pico-train - INFO - ├── Learning Rate: 8.86e-06 |
| 2025-08-30 12:01:22 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:01:35 - pico-train - INFO - Step 74575 -- 🔄 Training Metrics |
| 2025-08-30 12:01:35 - pico-train - INFO - ├── Loss: 5.8048 |
| 2025-08-30 12:01:35 - pico-train - INFO - ├── Learning Rate: 8.85e-06 |
| 2025-08-30 12:01:35 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:01:48 - pico-train - INFO - Step 74600 -- 🔄 Training Metrics |
| 2025-08-30 12:01:48 - pico-train - INFO - ├── Loss: 5.7352 |
| 2025-08-30 12:01:48 - pico-train - INFO - ├── Learning Rate: 8.83e-06 |
| 2025-08-30 12:01:48 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:02:00 - pico-train - INFO - Step 74625 -- 🔄 Training Metrics |
| 2025-08-30 12:02:00 - pico-train - INFO - ├── Loss: 5.7900 |
| 2025-08-30 12:02:00 - pico-train - INFO - ├── Learning Rate: 8.81e-06 |
| 2025-08-30 12:02:00 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:02:13 - pico-train - INFO - Step 74650 -- 🔄 Training Metrics |
| 2025-08-30 12:02:13 - pico-train - INFO - ├── Loss: 5.8181 |
| 2025-08-30 12:02:13 - pico-train - INFO - ├── Learning Rate: 8.80e-06 |
| 2025-08-30 12:02:13 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:02:25 - pico-train - INFO - Step 74675 -- 🔄 Training Metrics |
| 2025-08-30 12:02:25 - pico-train - INFO - ├── Loss: 5.8068 |
| 2025-08-30 12:02:25 - pico-train - INFO - ├── Learning Rate: 8.78e-06 |
| 2025-08-30 12:02:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:02:38 - pico-train - INFO - Step 74700 -- 🔄 Training Metrics |
| 2025-08-30 12:02:38 - pico-train - INFO - ├── Loss: 5.7906 |
| 2025-08-30 12:02:38 - pico-train - INFO - ├── Learning Rate: 8.76e-06 |
| 2025-08-30 12:02:38 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:02:51 - pico-train - INFO - Step 74725 -- 🔄 Training Metrics |
| 2025-08-30 12:02:51 - pico-train - INFO - ├── Loss: 5.7719 |
| 2025-08-30 12:02:51 - pico-train - INFO - ├── Learning Rate: 8.75e-06 |
| 2025-08-30 12:02:51 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:03:04 - pico-train - INFO - Step 74750 -- 🔄 Training Metrics |
| 2025-08-30 12:03:04 - pico-train - INFO - ├── Loss: 5.7901 |
| 2025-08-30 12:03:04 - pico-train - INFO - ├── Learning Rate: 8.73e-06 |
| 2025-08-30 12:03:04 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:03:16 - pico-train - INFO - Step 74775 -- 🔄 Training Metrics |
| 2025-08-30 12:03:16 - pico-train - INFO - ├── Loss: 5.7765 |
| 2025-08-30 12:03:16 - pico-train - INFO - ├── Learning Rate: 8.72e-06 |
| 2025-08-30 12:03:16 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:03:29 - pico-train - INFO - Step 74800 -- 🔄 Training Metrics |
| 2025-08-30 12:03:29 - pico-train - INFO - ├── Loss: 5.7052 |
| 2025-08-30 12:03:29 - pico-train - INFO - ├── Learning Rate: 8.70e-06 |
| 2025-08-30 12:03:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:03:42 - pico-train - INFO - Step 74825 -- 🔄 Training Metrics |
| 2025-08-30 12:03:42 - pico-train - INFO - ├── Loss: 5.7863 |
| 2025-08-30 12:03:42 - pico-train - INFO - ├── Learning Rate: 8.68e-06 |
| 2025-08-30 12:03:42 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:03:54 - pico-train - INFO - Step 74850 -- 🔄 Training Metrics |
| 2025-08-30 12:03:54 - pico-train - INFO - ├── Loss: 5.7816 |
| 2025-08-30 12:03:54 - pico-train - INFO - ├── Learning Rate: 8.67e-06 |
| 2025-08-30 12:03:54 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:04:07 - pico-train - INFO - Step 74875 -- 🔄 Training Metrics |
| 2025-08-30 12:04:07 - pico-train - INFO - ├── Loss: 5.7777 |
| 2025-08-30 12:04:07 - pico-train - INFO - ├── Learning Rate: 8.65e-06 |
| 2025-08-30 12:04:07 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 12:04:21 - pico-train - INFO - Step 74900 -- 🔄 Training Metrics |
| 2025-08-30 12:04:21 - pico-train - INFO - ├── Loss: 5.8692 |
| 2025-08-30 12:04:21 - pico-train - INFO - ├── Learning Rate: 8.63e-06 |
| 2025-08-30 12:04:21 - pico-train - INFO - └── Inf/NaN count: 0 |
|
|