| 2025-08-30 18:45:19 - pico-train - INFO - Step 0 -- Evaluation Results |
| 2025-08-30 18:45:19 - pico-train - INFO - └── paloma: inf |
| 2025-08-30 18:45:19 - pico-train - INFO - ================================================== |
| 2025-08-30 18:45:19 - pico-train - INFO - Training Configuration |
| 2025-08-30 18:45:19 - pico-train - INFO - ================================================== |
| 2025-08-30 18:45:19 - pico-train - INFO - checkpointing: |
| 2025-08-30 18:45:19 - pico-train - INFO -   checkpoints_dir: checkpoints |
| 2025-08-30 18:45:19 - pico-train - INFO -   evaluation: |
| 2025-08-30 18:45:19 - pico-train - INFO -     eval_results_dir: eval_results |
| 2025-08-30 18:45:19 - pico-train - INFO -   fabric_checkpoint_dir: fabric_state |
| 2025-08-30 18:45:19 - pico-train - INFO -   fabric_checkpoint_filename: checkpoint.pt |
| 2025-08-30 18:45:19 - pico-train - INFO -   hf_checkpoint: |
| 2025-08-30 18:45:19 - pico-train - INFO -     collection_slug: null |
| 2025-08-30 18:45:19 - pico-train - INFO -     repo_id: ThomasTheMaker/pico-decoder-tiny |
| 2025-08-30 18:45:19 - pico-train - INFO -   learning_dynamics: |
| 2025-08-30 18:45:19 - pico-train - INFO -     batch_size: 1 |
| 2025-08-30 18:45:19 - pico-train - INFO -     eval_data: null |
| 2025-08-30 18:45:19 - pico-train - INFO -     layer_suffixes: |
| 2025-08-30 18:45:19 - pico-train - INFO -     - attention.v_proj |
| 2025-08-30 18:45:19 - pico-train - INFO -     - attention.o_proj |
| 2025-08-30 18:45:19 - pico-train - INFO -     - swiglu.w_2 |
| 2025-08-30 18:45:19 - pico-train - INFO -     sequence_idx: -1 |
| 2025-08-30 18:45:19 - pico-train - INFO -   learning_dynamics_dir: learning_dynamics |
| 2025-08-30 18:45:19 - pico-train - INFO -   logs_dir: logs |
| 2025-08-30 18:45:19 - pico-train - INFO -   run_name: pico-decoder-tiny-wikipedia_en-v1 |
| 2025-08-30 18:45:19 - pico-train - INFO -   runs_dir: runs |
| 2025-08-30 18:45:19 - pico-train - INFO -   save_every_n_steps: 2000 |
| 2025-08-30 18:45:19 - pico-train - INFO -   save_to_hf: false |
| 2025-08-30 18:45:19 - pico-train - INFO -   training: |
| 2025-08-30 18:45:19 - pico-train - INFO -     auto_resume: true |
| 2025-08-30 18:45:19 - pico-train - INFO - data: |
| 2025-08-30 18:45:19 - pico-train - INFO -   dataloader: |
| 2025-08-30 18:45:19 - pico-train - INFO -     batch_size: 16 |
| 2025-08-30 18:45:19 - pico-train - INFO -   dataset: |
| 2025-08-30 18:45:19 - pico-train - INFO -     name: ThomasTheMaker/pretokenized_wiki_en |
| 2025-08-30 18:45:19 - pico-train - INFO -   tokenizer: |
| 2025-08-30 18:45:19 - pico-train - INFO -     name: allenai/OLMo-7B-0724-hf |
| 2025-08-30 18:45:19 - pico-train - INFO -     vocab_size: 50304 |
| 2025-08-30 18:45:19 - pico-train - INFO - evaluation: |
| 2025-08-30 18:45:19 - pico-train - INFO -   metrics: |
| 2025-08-30 18:45:19 - pico-train - INFO -   - paloma |
| 2025-08-30 18:45:19 - pico-train - INFO -   paloma: |
| 2025-08-30 18:45:19 - pico-train - INFO -     batch_size: 1 |
| 2025-08-30 18:45:19 - pico-train - INFO -     dataset_name: pico-lm/pretokenized-paloma-tinsy |
| 2025-08-30 18:45:19 - pico-train - INFO -     dataset_split: val |
| 2025-08-30 18:45:19 - pico-train - INFO -     max_length: 2048 |
| 2025-08-30 18:45:19 - pico-train - INFO - model: |
| 2025-08-30 18:45:19 - pico-train - INFO -   activation_hidden_dim: 384 |
| 2025-08-30 18:45:19 - pico-train - INFO -   attention_n_heads: 12 |
| 2025-08-30 18:45:19 - pico-train - INFO -   attention_n_kv_heads: 4 |
| 2025-08-30 18:45:19 - pico-train - INFO -   batch_size: 1024 |
| 2025-08-30 18:45:19 - pico-train - INFO -   d_model: 96 |
| 2025-08-30 18:45:19 - pico-train - INFO -   max_seq_len: 2048 |
| 2025-08-30 18:45:19 - pico-train - INFO -   model_type: pico_decoder |
| 2025-08-30 18:45:19 - pico-train - INFO -   n_layers: 12 |
| 2025-08-30 18:45:19 - pico-train - INFO -   norm_eps: 1.0e-06 |
| 2025-08-30 18:45:19 - pico-train - INFO -   position_emb_theta: 10000.0 |
| 2025-08-30 18:45:19 - pico-train - INFO -   vocab_size: 50304 |
| 2025-08-30 18:45:19 - pico-train - INFO - monitoring: |
| 2025-08-30 18:45:19 - pico-train - INFO -   logging: |
| 2025-08-30 18:45:19 - pico-train - INFO -     log_every_n_steps: 100 |
| 2025-08-30 18:45:19 - pico-train - INFO -     log_level: INFO |
| 2025-08-30 18:45:19 - pico-train - INFO -   save_to_wandb: false |
| 2025-08-30 18:45:19 - pico-train - INFO -   wandb: |
| 2025-08-30 18:45:19 - pico-train - INFO -     entity: boymyc |
| 2025-08-30 18:45:19 - pico-train - INFO -     project: pico-decoder-tiny |
| 2025-08-30 18:45:19 - pico-train - INFO - training: |
| 2025-08-30 18:45:19 - pico-train - INFO -   fabric: |
| 2025-08-30 18:45:19 - pico-train - INFO -     accelerator: cuda |
| 2025-08-30 18:45:19 - pico-train - INFO -     num_devices: 1 |
| 2025-08-30 18:45:19 - pico-train - INFO -     num_nodes: 1 |
| 2025-08-30 18:45:19 - pico-train - INFO -     precision: bf16-mixed |
| 2025-08-30 18:45:19 - pico-train - INFO -   max_steps: 100000 |
| 2025-08-30 18:45:19 - pico-train - INFO -   optimization: |
| 2025-08-30 18:45:19 - pico-train - INFO -     gradient_accumulation_steps: 1 |
| 2025-08-30 18:45:19 - pico-train - INFO -     lr: 0.0002 |
| 2025-08-30 18:45:19 - pico-train - INFO -     lr_scheduler: cosine |
| 2025-08-30 18:45:19 - pico-train - INFO -     lr_warmup_steps: 2000 |
| 2025-08-30 18:45:19 - pico-train - INFO -     optimizer: adamw |
| 2025-08-30 18:45:19 - pico-train - INFO - ================================================== |
| 2025-08-30 18:45:19 - pico-train - INFO - Runtime Summary: |
| 2025-08-30 18:45:19 - pico-train - INFO - ================================================== |
| 2025-08-30 18:45:19 - pico-train - INFO - Starting from step: 0 |
| 2025-08-30 18:45:19 - pico-train - INFO - Model Setup: |
| 2025-08-30 18:45:19 - pico-train - INFO - ├─ Total Parameters: 11,282,784 |
| 2025-08-30 18:45:19 - pico-train - INFO - └─ Trainable Parameters: 11,282,784 |
| 2025-08-30 18:45:19 - pico-train - INFO - Distributed Setup: |
| 2025-08-30 18:45:19 - pico-train - INFO - ├─ Number of Devices: 1 |
| 2025-08-30 18:45:19 - pico-train - INFO - ├─ Device Type: NVIDIA GeForce RTX 5090 |
| 2025-08-30 18:45:19 - pico-train - INFO - └─ Available Memory: 33.68 GB |
| 2025-08-30 18:45:19 - pico-train - INFO - Software Setup: |
| 2025-08-30 18:45:19 - pico-train - INFO - ├─ Python Version: 3.10.12 |
| 2025-08-30 18:45:19 - pico-train - INFO - ├─ PyTorch Version: 2.8.0+cu128 |
| 2025-08-30 18:45:19 - pico-train - INFO - ├─ CUDA Version: 12.8 |
| 2025-08-30 18:45:19 - pico-train - INFO - └─ Operating System: Linux 6.8.0-63-generic |
| 2025-08-30 18:45:19 - pico-train - INFO - Batch Size Configuration: |
| 2025-08-30 18:45:19 - pico-train - INFO - ├─ Global Batch Size: 4 |
| 2025-08-30 18:45:19 - pico-train - INFO - ├─ Per Device Batch Size: 1 |
| 2025-08-30 18:45:19 - pico-train - INFO - └─ Gradient Accumulation Steps: 4 |
| 2025-08-30 18:45:19 - pico-train - INFO - ================================================== |
| 2025-08-30 18:45:20 - pico-train - INFO - Step 0 -- Training Metrics |
| 2025-08-30 18:45:20 - pico-train - INFO - ├── Loss: 10.9887 |
| 2025-08-30 18:45:20 - pico-train - INFO - ├── Learning Rate: 0.00e+00 |
| 2025-08-30 18:45:20 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:45:20 - pico-train - INFO - Step 0 -- Saving Learning Dynamics |
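Every logging interval in this run emits a "Training Metrics" header followed by a Loss line, as in the Step 0 block above. To turn the log into a loss curve, a minimal parsing sketch might look like the following. The function name `parse_losses` and both regular expressions are illustrative assumptions, not part of pico-train; the patterns key only on the ASCII parts of each line, so any decorative glyphs around them are ignored.

```python
import re

# Hypothetical patterns: they match only the ASCII text of each log line.
STEP_RE = re.compile(r"Step (\d+) -- .*Training Metrics")
LOSS_RE = re.compile(r"Loss: ([0-9.]+)")


def parse_losses(lines):
    """Return (step, loss) pairs, pairing each Loss line with the
    most recent Training Metrics header seen before it."""
    results = []
    current_step = None
    for line in lines:
        header = STEP_RE.search(line)
        if header:
            current_step = int(header.group(1))
            continue
        loss = LOSS_RE.search(line)
        if loss and current_step is not None:
            results.append((current_step, float(loss.group(1))))
            current_step = None  # consume the header so it is used once
    return results


sample = [
    "| 2025-08-30 18:45:20 - pico-train - INFO - Step 0 -- Training Metrics |",
    "| 2025-08-30 18:45:20 - pico-train - INFO - Loss: 10.9887 |",
    "| 2025-08-30 18:45:20 - pico-train - INFO - Learning Rate: 0.00e+00 |",
    "| 2025-08-30 18:46:12 - pico-train - INFO - Step 100 -- Training Metrics |",
    "| 2025-08-30 18:46:12 - pico-train - INFO - Loss: 10.9677 |",
]
pairs = parse_losses(sample)
```

Headers such as "Evaluation Results" or "Saving Checkpoint" do not match `STEP_RE`, so they are skipped without disturbing the pairing.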
| 2025-08-30 18:46:12 - pico-train - INFO - Step 100 -- Training Metrics |
| 2025-08-30 18:46:12 - pico-train - INFO - ├── Loss: 10.9677 |
| 2025-08-30 18:46:12 - pico-train - INFO - ├── Learning Rate: 1.00e-05 |
| 2025-08-30 18:46:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:47:04 - pico-train - INFO - Step 200 -- Training Metrics |
| 2025-08-30 18:47:04 - pico-train - INFO - ├── Loss: 10.7751 |
| 2025-08-30 18:47:04 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-30 18:47:04 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:47:56 - pico-train - INFO - Step 300 -- Training Metrics |
| 2025-08-30 18:47:56 - pico-train - INFO - ├── Loss: 10.2959 |
| 2025-08-30 18:47:56 - pico-train - INFO - ├── Learning Rate: 3.00e-05 |
| 2025-08-30 18:47:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:48:47 - pico-train - INFO - Step 400 -- Training Metrics |
| 2025-08-30 18:48:47 - pico-train - INFO - ├── Loss: 9.8408 |
| 2025-08-30 18:48:47 - pico-train - INFO - ├── Learning Rate: 4.00e-05 |
| 2025-08-30 18:48:47 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:49:39 - pico-train - INFO - Step 500 -- Training Metrics |
| 2025-08-30 18:49:39 - pico-train - INFO - ├── Loss: 9.4370 |
| 2025-08-30 18:49:39 - pico-train - INFO - ├── Learning Rate: 5.00e-05 |
| 2025-08-30 18:49:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:50:31 - pico-train - INFO - Step 600 -- Training Metrics |
| 2025-08-30 18:50:31 - pico-train - INFO - ├── Loss: 9.0035 |
| 2025-08-30 18:50:31 - pico-train - INFO - ├── Learning Rate: 6.00e-05 |
| 2025-08-30 18:50:31 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:51:23 - pico-train - INFO - Step 700 -- Training Metrics |
| 2025-08-30 18:51:23 - pico-train - INFO - ├── Loss: 8.5848 |
| 2025-08-30 18:51:23 - pico-train - INFO - ├── Learning Rate: 7.00e-05 |
| 2025-08-30 18:51:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:52:15 - pico-train - INFO - Step 800 -- Training Metrics |
| 2025-08-30 18:52:15 - pico-train - INFO - ├── Loss: 8.1784 |
| 2025-08-30 18:52:15 - pico-train - INFO - ├── Learning Rate: 8.00e-05 |
| 2025-08-30 18:52:15 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:53:07 - pico-train - INFO - Step 900 -- Training Metrics |
| 2025-08-30 18:53:07 - pico-train - INFO - ├── Loss: 7.8857 |
| 2025-08-30 18:53:07 - pico-train - INFO - ├── Learning Rate: 9.00e-05 |
| 2025-08-30 18:53:07 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:53:59 - pico-train - INFO - Step 1000 -- Training Metrics |
| 2025-08-30 18:53:59 - pico-train - INFO - ├── Loss: 7.7107 |
| 2025-08-30 18:53:59 - pico-train - INFO - ├── Learning Rate: 1.00e-04 |
| 2025-08-30 18:53:59 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:54:51 - pico-train - INFO - Step 1100 -- Training Metrics |
| 2025-08-30 18:54:51 - pico-train - INFO - ├── Loss: 7.6318 |
| 2025-08-30 18:54:51 - pico-train - INFO - ├── Learning Rate: 1.10e-04 |
| 2025-08-30 18:54:51 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:55:42 - pico-train - INFO - Step 1200 -- Training Metrics |
| 2025-08-30 18:55:42 - pico-train - INFO - ├── Loss: 7.5751 |
| 2025-08-30 18:55:42 - pico-train - INFO - ├── Learning Rate: 1.20e-04 |
| 2025-08-30 18:55:42 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:56:34 - pico-train - INFO - Step 1300 -- Training Metrics |
| 2025-08-30 18:56:34 - pico-train - INFO - ├── Loss: 7.4863 |
| 2025-08-30 18:56:34 - pico-train - INFO - ├── Learning Rate: 1.30e-04 |
| 2025-08-30 18:56:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:57:26 - pico-train - INFO - Step 1400 -- Training Metrics |
| 2025-08-30 18:57:26 - pico-train - INFO - ├── Loss: 7.4114 |
| 2025-08-30 18:57:26 - pico-train - INFO - ├── Learning Rate: 1.40e-04 |
| 2025-08-30 18:57:26 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:58:18 - pico-train - INFO - Step 1500 -- Training Metrics |
| 2025-08-30 18:58:18 - pico-train - INFO - ├── Loss: 7.3511 |
| 2025-08-30 18:58:18 - pico-train - INFO - ├── Learning Rate: 1.50e-04 |
| 2025-08-30 18:58:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 18:59:10 - pico-train - INFO - Step 1600 -- Training Metrics |
| 2025-08-30 18:59:10 - pico-train - INFO - ├── Loss: 7.2875 |
| 2025-08-30 18:59:10 - pico-train - INFO - ├── Learning Rate: 1.60e-04 |
| 2025-08-30 18:59:10 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:00:02 - pico-train - INFO - Step 1700 -- Training Metrics |
| 2025-08-30 19:00:02 - pico-train - INFO - ├── Loss: 7.2048 |
| 2025-08-30 19:00:02 - pico-train - INFO - ├── Learning Rate: 1.70e-04 |
| 2025-08-30 19:00:02 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:00:54 - pico-train - INFO - Step 1800 -- Training Metrics |
| 2025-08-30 19:00:54 - pico-train - INFO - ├── Loss: 7.1454 |
| 2025-08-30 19:00:54 - pico-train - INFO - ├── Learning Rate: 1.80e-04 |
| 2025-08-30 19:00:54 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:01:46 - pico-train - INFO - Step 1900 -- Training Metrics |
| 2025-08-30 19:01:46 - pico-train - INFO - ├── Loss: 7.0684 |
| 2025-08-30 19:01:46 - pico-train - INFO - ├── Learning Rate: 1.90e-04 |
| 2025-08-30 19:01:46 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:02:37 - pico-train - INFO - Step 2000 -- Saving Checkpoint |
| 2025-08-30 19:04:26 - pico-train - INFO - Step 2000 -- Evaluation Results |
| 2025-08-30 19:04:26 - pico-train - INFO - └── paloma: 5.052836452912709e+20 |
| 2025-08-30 19:04:27 - pico-train - INFO - Step 2000 -- Training Metrics |
| 2025-08-30 19:04:27 - pico-train - INFO - ├── Loss: 7.0002 |
| 2025-08-30 19:04:27 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:04:27 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:04:27 - pico-train - INFO - Step 2000 -- Saving Learning Dynamics |
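The learning rates logged through step 2000 climb by 1.00e-05 every 100 steps and peak at 2.00e-04, which is consistent with the configured `lr: 0.0002`, `lr_warmup_steps: 2000`, and `lr_scheduler: cosine`. The sketch below is one plausible form of such a schedule: linear warmup followed by cosine decay to zero at `max_steps`. It is an assumption, not pico-train's actual scheduler code; the function name `lr_at` and the decay-to-zero floor are illustrative choices.

```python
import math


def lr_at(step, peak_lr=2e-4, warmup_steps=2000, max_steps=100_000):
    """Assumed schedule: linear warmup to peak_lr, then cosine decay to 0."""
    if step < warmup_steps:
        # Linear ramp from 0 at step 0 to peak_lr at warmup_steps.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


# Format the way the log does, for side-by-side comparison with logged values.
for step in (0, 100, 1000, 2000):
    print(f"Step {step}: {lr_at(step):.2e}")
```

Comparing the formatted output against the "Learning Rate" lines in the log is a quick way to check whether the assumed schedule matches what the run actually used.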
| 2025-08-30 19:05:20 - pico-train - INFO - Step 2100 -- Training Metrics |
| 2025-08-30 19:05:20 - pico-train - INFO - ├── Loss: 6.9488 |
| 2025-08-30 19:05:20 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:05:20 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:06:11 - pico-train - INFO - Step 2200 -- Training Metrics |
| 2025-08-30 19:06:11 - pico-train - INFO - ├── Loss: 6.8843 |
| 2025-08-30 19:06:11 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:06:11 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:07:03 - pico-train - INFO - Step 2300 -- Training Metrics |
| 2025-08-30 19:07:03 - pico-train - INFO - ├── Loss: 6.7946 |
| 2025-08-30 19:07:03 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:07:03 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:07:55 - pico-train - INFO - Step 2400 -- Training Metrics |
| 2025-08-30 19:07:55 - pico-train - INFO - ├── Loss: 6.7990 |
| 2025-08-30 19:07:55 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:07:55 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:08:48 - pico-train - INFO - Step 2500 -- Training Metrics |
| 2025-08-30 19:08:48 - pico-train - INFO - ├── Loss: 6.7534 |
| 2025-08-30 19:08:48 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:08:48 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:09:40 - pico-train - INFO - Step 2600 -- Training Metrics |
| 2025-08-30 19:09:40 - pico-train - INFO - ├── Loss: 6.7236 |
| 2025-08-30 19:09:40 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:09:40 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:10:31 - pico-train - INFO - Step 2700 -- Training Metrics |
| 2025-08-30 19:10:31 - pico-train - INFO - ├── Loss: 6.6555 |
| 2025-08-30 19:10:31 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:10:31 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:11:23 - pico-train - INFO - Step 2800 -- Training Metrics |
| 2025-08-30 19:11:23 - pico-train - INFO - ├── Loss: 6.6370 |
| 2025-08-30 19:11:23 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:11:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:12:15 - pico-train - INFO - Step 2900 -- Training Metrics |
| 2025-08-30 19:12:15 - pico-train - INFO - ├── Loss: 6.5945 |
| 2025-08-30 19:12:15 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:12:15 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:13:07 - pico-train - INFO - Step 3000 -- Training Metrics |
| 2025-08-30 19:13:07 - pico-train - INFO - ├── Loss: 6.5725 |
| 2025-08-30 19:13:07 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:13:07 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:13:59 - pico-train - INFO - Step 3100 -- Training Metrics |
| 2025-08-30 19:13:59 - pico-train - INFO - ├── Loss: 6.5427 |
| 2025-08-30 19:13:59 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:13:59 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:14:50 - pico-train - INFO - Step 3200 -- Training Metrics |
| 2025-08-30 19:14:50 - pico-train - INFO - ├── Loss: 6.4710 |
| 2025-08-30 19:14:50 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:14:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:15:42 - pico-train - INFO - Step 3300 -- Training Metrics |
| 2025-08-30 19:15:42 - pico-train - INFO - ├── Loss: 6.4428 |
| 2025-08-30 19:15:42 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:15:42 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:16:33 - pico-train - INFO - Step 3400 -- Training Metrics |
| 2025-08-30 19:16:33 - pico-train - INFO - ├── Loss: 6.4280 |
| 2025-08-30 19:16:33 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:16:33 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:17:26 - pico-train - INFO - Step 3500 -- Training Metrics |
| 2025-08-30 19:17:26 - pico-train - INFO - ├── Loss: 6.4210 |
| 2025-08-30 19:17:26 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:17:26 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:18:17 - pico-train - INFO - Step 3600 -- Training Metrics |
| 2025-08-30 19:18:17 - pico-train - INFO - ├── Loss: 6.3990 |
| 2025-08-30 19:18:17 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:18:17 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:19:09 - pico-train - INFO - Step 3700 -- Training Metrics |
| 2025-08-30 19:19:09 - pico-train - INFO - ├── Loss: 6.3625 |
| 2025-08-30 19:19:09 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:19:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:20:01 - pico-train - INFO - Step 3800 -- Training Metrics |
| 2025-08-30 19:20:01 - pico-train - INFO - ├── Loss: 6.3046 |
| 2025-08-30 19:20:01 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:20:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:20:53 - pico-train - INFO - Step 3900 -- Training Metrics |
| 2025-08-30 19:20:53 - pico-train - INFO - ├── Loss: 6.3153 |
| 2025-08-30 19:20:53 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:20:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:21:44 - pico-train - INFO - Step 4000 -- Saving Checkpoint |
| 2025-08-30 19:23:33 - pico-train - INFO - Step 4000 -- Evaluation Results |
| 2025-08-30 19:23:33 - pico-train - INFO - └── paloma: 2.7323051886316706e+23 |
| 2025-08-30 19:23:34 - pico-train - INFO - Step 4000 -- Training Metrics |
| 2025-08-30 19:23:34 - pico-train - INFO - ├── Loss: 6.3194 |
| 2025-08-30 19:23:34 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:23:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:23:34 - pico-train - INFO - Step 4000 -- Saving Learning Dynamics |
| 2025-08-30 19:24:26 - pico-train - INFO - Step 4100 -- Training Metrics |
| 2025-08-30 19:24:26 - pico-train - INFO - ├── Loss: 6.2506 |
| 2025-08-30 19:24:26 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:24:26 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:25:18 - pico-train - INFO - Step 4200 -- Training Metrics |
| 2025-08-30 19:25:18 - pico-train - INFO - ├── Loss: 6.2754 |
| 2025-08-30 19:25:18 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:25:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:26:10 - pico-train - INFO - Step 4300 -- Training Metrics |
| 2025-08-30 19:26:10 - pico-train - INFO - ├── Loss: 6.2454 |
| 2025-08-30 19:26:10 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:26:10 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:27:02 - pico-train - INFO - Step 4400 -- Training Metrics |
| 2025-08-30 19:27:02 - pico-train - INFO - ├── Loss: 6.1827 |
| 2025-08-30 19:27:02 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:27:02 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:27:54 - pico-train - INFO - Step 4500 -- Training Metrics |
| 2025-08-30 19:27:54 - pico-train - INFO - ├── Loss: 6.1813 |
| 2025-08-30 19:27:54 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:27:54 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:28:46 - pico-train - INFO - Step 4600 -- Training Metrics |
| 2025-08-30 19:28:46 - pico-train - INFO - ├── Loss: 6.1643 |
| 2025-08-30 19:28:46 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:28:46 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:29:37 - pico-train - INFO - Step 4700 -- Training Metrics |
| 2025-08-30 19:29:37 - pico-train - INFO - ├── Loss: 6.1493 |
| 2025-08-30 19:29:37 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:29:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:30:29 - pico-train - INFO - Step 4800 -- Training Metrics |
| 2025-08-30 19:30:29 - pico-train - INFO - ├── Loss: 6.1469 |
| 2025-08-30 19:30:29 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:30:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:31:20 - pico-train - INFO - Step 4900 -- Training Metrics |
| 2025-08-30 19:31:20 - pico-train - INFO - ├── Loss: 6.0782 |
| 2025-08-30 19:31:20 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:31:20 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:32:13 - pico-train - INFO - Step 5000 -- Training Metrics |
| 2025-08-30 19:32:13 - pico-train - INFO - ├── Loss: 6.1336 |
| 2025-08-30 19:32:13 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:32:13 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:33:04 - pico-train - INFO - Step 5100 -- Training Metrics |
| 2025-08-30 19:33:04 - pico-train - INFO - ├── Loss: 6.1217 |
| 2025-08-30 19:33:04 - pico-train - INFO - ├── Learning Rate: 2.00e-04 |
| 2025-08-30 19:33:04 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:33:56 - pico-train - INFO - Step 5200 -- Training Metrics |
| 2025-08-30 19:33:56 - pico-train - INFO - ├── Loss: 6.0658 |
| 2025-08-30 19:33:56 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:33:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:34:47 - pico-train - INFO - Step 5300 -- Training Metrics |
| 2025-08-30 19:34:47 - pico-train - INFO - ├── Loss: 6.0400 |
| 2025-08-30 19:34:47 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:34:47 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:35:39 - pico-train - INFO - Step 5400 -- Training Metrics |
| 2025-08-30 19:35:39 - pico-train - INFO - ├── Loss: 6.0474 |
| 2025-08-30 19:35:39 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:35:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:36:31 - pico-train - INFO - Step 5500 -- Training Metrics |
| 2025-08-30 19:36:31 - pico-train - INFO - ├── Loss: 5.9896 |
| 2025-08-30 19:36:31 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:36:31 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:37:23 - pico-train - INFO - Step 5600 -- Training Metrics |
| 2025-08-30 19:37:23 - pico-train - INFO - ├── Loss: 6.0158 |
| 2025-08-30 19:37:23 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:37:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:38:15 - pico-train - INFO - Step 5700 -- Training Metrics |
| 2025-08-30 19:38:15 - pico-train - INFO - ├── Loss: 6.0185 |
| 2025-08-30 19:38:15 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:38:15 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:39:07 - pico-train - INFO - Step 5800 -- Training Metrics |
| 2025-08-30 19:39:07 - pico-train - INFO - ├── Loss: 5.9898 |
| 2025-08-30 19:39:07 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:39:07 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:39:58 - pico-train - INFO - Step 5900 -- Training Metrics |
| 2025-08-30 19:39:58 - pico-train - INFO - ├── Loss: 5.9851 |
| 2025-08-30 19:39:58 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:39:58 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:40:49 - pico-train - INFO - Step 6000 -- Saving Checkpoint |
| 2025-08-30 19:42:40 - pico-train - INFO - Step 6000 -- Evaluation Results |
| 2025-08-30 19:42:40 - pico-train - INFO - └── paloma: 1.6707155203187867e+26 |
| 2025-08-30 19:42:41 - pico-train - INFO - Step 6000 -- Training Metrics |
| 2025-08-30 19:42:41 - pico-train - INFO - ├── Loss: 5.9321 |
| 2025-08-30 19:42:41 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:42:41 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:42:41 - pico-train - INFO - Step 6000 -- Saving Learning Dynamics |
| 2025-08-30 19:43:34 - pico-train - INFO - Step 6100 -- Training Metrics |
| 2025-08-30 19:43:34 - pico-train - INFO - ├── Loss: 5.9322 |
| 2025-08-30 19:43:34 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:43:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:44:26 - pico-train - INFO - Step 6200 -- Training Metrics |
| 2025-08-30 19:44:26 - pico-train - INFO - ├── Loss: 5.9436 |
| 2025-08-30 19:44:26 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:44:26 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:45:17 - pico-train - INFO - Step 6300 -- Training Metrics |
| 2025-08-30 19:45:17 - pico-train - INFO - ├── Loss: 5.8953 |
| 2025-08-30 19:45:17 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:45:17 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:46:09 - pico-train - INFO - Step 6400 -- Training Metrics |
| 2025-08-30 19:46:09 - pico-train - INFO - ├── Loss: 5.8631 |
| 2025-08-30 19:46:09 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:46:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:47:01 - pico-train - INFO - Step 6500 -- Training Metrics |
| 2025-08-30 19:47:01 - pico-train - INFO - ├── Loss: 5.8680 |
| 2025-08-30 19:47:01 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:47:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:47:53 - pico-train - INFO - Step 6600 -- Training Metrics |
| 2025-08-30 19:47:53 - pico-train - INFO - ├── Loss: 5.8913 |
| 2025-08-30 19:47:53 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:47:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:48:45 - pico-train - INFO - Step 6700 -- Training Metrics |
| 2025-08-30 19:48:45 - pico-train - INFO - ├── Loss: 5.9308 |
| 2025-08-30 19:48:45 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:48:45 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:49:37 - pico-train - INFO - Step 6800 -- Training Metrics |
| 2025-08-30 19:49:37 - pico-train - INFO - ├── Loss: 5.8280 |
| 2025-08-30 19:49:37 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:49:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:50:29 - pico-train - INFO - Step 6900 -- Training Metrics |
| 2025-08-30 19:50:29 - pico-train - INFO - ├── Loss: 5.8122 |
| 2025-08-30 19:50:29 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:50:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:51:21 - pico-train - INFO - Step 7000 -- Training Metrics |
| 2025-08-30 19:51:21 - pico-train - INFO - ├── Loss: 5.8374 |
| 2025-08-30 19:51:21 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:51:21 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:52:13 - pico-train - INFO - Step 7100 -- Training Metrics |
| 2025-08-30 19:52:13 - pico-train - INFO - ├── Loss: 5.4755 |
| 2025-08-30 19:52:13 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:52:13 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:53:05 - pico-train - INFO - Step 7200 -- Training Metrics |
| 2025-08-30 19:53:05 - pico-train - INFO - ├── Loss: 5.6391 |
| 2025-08-30 19:53:05 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:53:05 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:53:57 - pico-train - INFO - Step 7300 -- Training Metrics |
| 2025-08-30 19:53:57 - pico-train - INFO - ├── Loss: 5.7914 |
| 2025-08-30 19:53:57 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:53:57 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:54:48 - pico-train - INFO - Step 7400 -- Training Metrics |
| 2025-08-30 19:54:48 - pico-train - INFO - ├── Loss: 5.7551 |
| 2025-08-30 19:54:48 - pico-train - INFO - ├── Learning Rate: 1.99e-04 |
| 2025-08-30 19:54:48 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:55:41 - pico-train - INFO - Step 7500 -- Training Metrics |
| 2025-08-30 19:55:41 - pico-train - INFO - ├── Loss: 5.7208 |
| 2025-08-30 19:55:41 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 19:55:41 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:56:32 - pico-train - INFO - Step 7600 -- Training Metrics |
| 2025-08-30 19:56:32 - pico-train - INFO - ├── Loss: 5.6070 |
| 2025-08-30 19:56:32 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 19:56:32 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:57:24 - pico-train - INFO - Step 7700 -- Training Metrics |
| 2025-08-30 19:57:24 - pico-train - INFO - ├── Loss: 5.3528 |
| 2025-08-30 19:57:24 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 19:57:24 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:58:15 - pico-train - INFO - Step 7800 -- Training Metrics |
| 2025-08-30 19:58:15 - pico-train - INFO - ├── Loss: 5.4300 |
| 2025-08-30 19:58:15 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 19:58:15 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:59:07 - pico-train - INFO - Step 7900 -- Training Metrics |
| 2025-08-30 19:59:07 - pico-train - INFO - ├── Loss: 5.4836 |
| 2025-08-30 19:59:07 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 19:59:07 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 19:59:58 - pico-train - INFO - Step 8000 -- Saving Checkpoint |
| 2025-08-30 20:01:49 - pico-train - INFO - Step 8000 -- Evaluation Results |
| 2025-08-30 20:01:49 - pico-train - INFO - └── paloma: 5.458564756516313e+32 |
| 2025-08-30 20:01:50 - pico-train - INFO - Step 8000 -- Training Metrics |
| 2025-08-30 20:01:50 - pico-train - INFO - ├── Loss: 5.6596 |
| 2025-08-30 20:01:50 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 20:01:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:01:50 - pico-train - INFO - Step 8000 -- Saving Learning Dynamics |
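The logged `paloma` values grow at every evaluation while the training loss falls. The log does not say how the metric is computed; if it is a perplexity (an assumption), it would relate to a mean per-token cross-entropy in nats through the exponential. The helper names below are hypothetical; the sketch only converts between the two scales so that an evaluation value can be compared against the training loss.

```python
import math


def perplexity_from_loss(mean_nll):
    """Perplexity implied by a mean per-token negative log-likelihood (nats)."""
    return math.exp(mean_nll)


def loss_from_perplexity(ppl):
    """Mean per-token negative log-likelihood (nats) implied by a perplexity."""
    return math.log(ppl)


# Values copied from the Step 8000 lines of this log.
train_loss = 5.6596
eval_paloma = 5.458564756516313e+32

implied_train_ppl = perplexity_from_loss(train_loss)
implied_eval_loss = loss_from_perplexity(eval_paloma)
print(f"train loss {train_loss} -> perplexity {implied_train_ppl:.1f}")
print(f"paloma {eval_paloma:.3e} -> implied loss {implied_eval_loss:.2f}")
```

A large gap between the implied evaluation loss and the training loss would point to a mismatch in the evaluation setup, such as tokenization or data, rather than to ordinary generalization error, though this log alone cannot confirm the cause.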
| 2025-08-30 20:02:43 - pico-train - INFO - Step 8100 -- Training Metrics |
| 2025-08-30 20:02:43 - pico-train - INFO - ├── Loss: 5.6682 |
| 2025-08-30 20:02:43 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 20:02:43 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:03:34 - pico-train - INFO - Step 8200 -- Training Metrics |
| 2025-08-30 20:03:34 - pico-train - INFO - ├── Loss: 5.7168 |
| 2025-08-30 20:03:34 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 20:03:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:04:26 - pico-train - INFO - Step 8300 -- Training Metrics |
| 2025-08-30 20:04:26 - pico-train - INFO - ├── Loss: 5.6882 |
| 2025-08-30 20:04:26 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 20:04:26 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:05:18 - pico-train - INFO - Step 8400 -- Training Metrics |
| 2025-08-30 20:05:18 - pico-train - INFO - ├── Loss: 5.6670 |
| 2025-08-30 20:05:18 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 20:05:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:06:11 - pico-train - INFO - Step 8500 -- Training Metrics |
| 2025-08-30 20:06:11 - pico-train - INFO - ├── Loss: 5.6494 |
| 2025-08-30 20:06:11 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 20:06:11 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:07:03 - pico-train - INFO - Step 8600 -- Training Metrics |
| 2025-08-30 20:07:03 - pico-train - INFO - ├── Loss: 5.6230 |
| 2025-08-30 20:07:03 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 20:07:03 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:07:54 - pico-train - INFO - Step 8700 -- Training Metrics |
| 2025-08-30 20:07:54 - pico-train - INFO - ├── Loss: 5.6758 |
| 2025-08-30 20:07:54 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 20:07:54 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:08:46 - pico-train - INFO - Step 8800 -- Training Metrics |
| 2025-08-30 20:08:46 - pico-train - INFO - ├── Loss: 5.2235 |
| 2025-08-30 20:08:46 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 20:08:46 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:09:37 - pico-train - INFO - Step 8900 -- Training Metrics |
| 2025-08-30 20:09:37 - pico-train - INFO - ├── Loss: 5.3350 |
| 2025-08-30 20:09:37 - pico-train - INFO - ├── Learning Rate: 1.98e-04 |
| 2025-08-30 20:09:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:10:30 - pico-train - INFO - Step 9000 -- Training Metrics |
| 2025-08-30 20:10:30 - pico-train - INFO - ├── Loss: 5.5705 |
| 2025-08-30 20:10:30 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:10:30 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:11:21 - pico-train - INFO - Step 9100 -- Training Metrics |
| 2025-08-30 20:11:21 - pico-train - INFO - ├── Loss: 5.6158 |
| 2025-08-30 20:11:21 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:11:21 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:12:13 - pico-train - INFO - Step 9200 -- Training Metrics |
| 2025-08-30 20:12:13 - pico-train - INFO - ├── Loss: 5.6318 |
| 2025-08-30 20:12:13 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:12:13 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:13:04 - pico-train - INFO - Step 9300 -- Training Metrics |
| 2025-08-30 20:13:04 - pico-train - INFO - ├── Loss: 5.5648 |
| 2025-08-30 20:13:04 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:13:04 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:13:56 - pico-train - INFO - Step 9400 -- Training Metrics |
| 2025-08-30 20:13:56 - pico-train - INFO - ├── Loss: 5.5175 |
| 2025-08-30 20:13:56 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:13:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:14:49 - pico-train - INFO - Step 9500 -- Training Metrics |
| 2025-08-30 20:14:49 - pico-train - INFO - ├── Loss: 5.5606 |
| 2025-08-30 20:14:49 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:14:49 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:15:40 - pico-train - INFO - Step 9600 -- Training Metrics |
| 2025-08-30 20:15:40 - pico-train - INFO - ├── Loss: 5.5785 |
| 2025-08-30 20:15:40 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:15:40 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:16:32 - pico-train - INFO - Step 9700 -- Training Metrics |
| 2025-08-30 20:16:32 - pico-train - INFO - ├── Loss: 5.6221 |
| 2025-08-30 20:16:32 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:16:32 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:17:24 - pico-train - INFO - Step 9800 -- Training Metrics |
| 2025-08-30 20:17:24 - pico-train - INFO - ├── Loss: 5.5889 |
| 2025-08-30 20:17:24 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:17:24 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:18:15 - pico-train - INFO - Step 9900 -- Training Metrics |
| 2025-08-30 20:18:15 - pico-train - INFO - ├── Loss: 5.6375 |
| 2025-08-30 20:18:15 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:18:15 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:19:07 - pico-train - INFO - Step 10000 -- Saving Checkpoint |
| 2025-08-30 20:20:57 - pico-train - INFO - Step 10000 -- Evaluation Results |
| 2025-08-30 20:20:57 - pico-train - INFO - └── paloma: 2.30417566298098e+31 |
| 2025-08-30 20:20:58 - pico-train - INFO - Step 10000 -- Training Metrics |
| 2025-08-30 20:20:58 - pico-train - INFO - ├── Loss: 5.5828 |
| 2025-08-30 20:20:58 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:20:58 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:20:58 - pico-train - INFO - Step 10000 -- Saving Learning Dynamics |
| 2025-08-30 20:21:51 - pico-train - INFO - Step 10100 -- Training Metrics |
| 2025-08-30 20:21:51 - pico-train - INFO - ├── Loss: 5.5754 |
| 2025-08-30 20:21:51 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:21:51 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:22:42 - pico-train - INFO - Step 10200 -- Training Metrics |
| 2025-08-30 20:22:42 - pico-train - INFO - ├── Loss: 5.6074 |
| 2025-08-30 20:22:42 - pico-train - INFO - ├── Learning Rate: 1.97e-04 |
| 2025-08-30 20:22:42 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:23:34 - pico-train - INFO - Step 10300 -- Training Metrics |
| 2025-08-30 20:23:34 - pico-train - INFO - ├── Loss: 5.5959 |
| 2025-08-30 20:23:34 - pico-train - INFO - ├── Learning Rate: 1.96e-04 |
| 2025-08-30 20:23:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:24:25 - pico-train - INFO - Step 10400 -- Training Metrics |
| 2025-08-30 20:24:25 - pico-train - INFO - ├── Loss: 5.5846 |
| 2025-08-30 20:24:25 - pico-train - INFO - ├── Learning Rate: 1.96e-04 |
| 2025-08-30 20:24:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:25:18 - pico-train - INFO - Step 10500 -- Training Metrics |
| 2025-08-30 20:25:18 - pico-train - INFO - ├── Loss: 5.5355 |
| 2025-08-30 20:25:18 - pico-train - INFO - ├── Learning Rate: 1.96e-04 |
| 2025-08-30 20:25:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:26:10 - pico-train - INFO - Step 10600 -- Training Metrics |
| 2025-08-30 20:26:10 - pico-train - INFO - ├── Loss: 5.5638 |
| 2025-08-30 20:26:10 - pico-train - INFO - ├── Learning Rate: 1.96e-04 |
| 2025-08-30 20:26:10 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:27:02 - pico-train - INFO - Step 10700 -- Training Metrics |
| 2025-08-30 20:27:02 - pico-train - INFO - ├── Loss: 5.5035 |
| 2025-08-30 20:27:02 - pico-train - INFO - ├── Learning Rate: 1.96e-04 |
| 2025-08-30 20:27:02 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:27:53 - pico-train - INFO - Step 10800 -- Training Metrics |
| 2025-08-30 20:27:53 - pico-train - INFO - ├── Loss: 5.4736 |
| 2025-08-30 20:27:53 - pico-train - INFO - ├── Learning Rate: 1.96e-04 |
| 2025-08-30 20:27:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:28:45 - pico-train - INFO - Step 10900 -- Training Metrics |
| 2025-08-30 20:28:45 - pico-train - INFO - ├── Loss: 5.5359 |
| 2025-08-30 20:28:45 - pico-train - INFO - ├── Learning Rate: 1.96e-04 |
| 2025-08-30 20:28:45 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:29:37 - pico-train - INFO - Step 11000 -- Training Metrics |
| 2025-08-30 20:29:37 - pico-train - INFO - ├── Loss: 5.5173 |
| 2025-08-30 20:29:37 - pico-train - INFO - ├── Learning Rate: 1.96e-04 |
| 2025-08-30 20:29:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:30:29 - pico-train - INFO - Step 11100 -- Training Metrics |
| 2025-08-30 20:30:29 - pico-train - INFO - ├── Loss: 5.4951 |
| 2025-08-30 20:30:29 - pico-train - INFO - ├── Learning Rate: 1.96e-04 |
| 2025-08-30 20:30:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:31:20 - pico-train - INFO - Step 11200 -- Training Metrics |
| 2025-08-30 20:31:20 - pico-train - INFO - ├── Loss: 5.4697 |
| 2025-08-30 20:31:20 - pico-train - INFO - ├── Learning Rate: 1.96e-04 |
| 2025-08-30 20:31:20 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:32:12 - pico-train - INFO - Step 11300 -- Training Metrics |
| 2025-08-30 20:32:12 - pico-train - INFO - ├── Loss: 5.4093 |
| 2025-08-30 20:32:12 - pico-train - INFO - ├── Learning Rate: 1.96e-04 |
| 2025-08-30 20:32:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:33:04 - pico-train - INFO - Step 11400 -- Training Metrics |
| 2025-08-30 20:33:04 - pico-train - INFO - ├── Loss: 5.5173 |
| 2025-08-30 20:33:04 - pico-train - INFO - ├── Learning Rate: 1.95e-04 |
| 2025-08-30 20:33:04 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:33:56 - pico-train - INFO - Step 11500 -- Training Metrics |
| 2025-08-30 20:33:56 - pico-train - INFO - ├── Loss: 5.4804 |
| 2025-08-30 20:33:56 - pico-train - INFO - ├── Learning Rate: 1.95e-04 |
| 2025-08-30 20:33:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:34:48 - pico-train - INFO - Step 11600 -- Training Metrics |
| 2025-08-30 20:34:48 - pico-train - INFO - ├── Loss: 5.4382 |
| 2025-08-30 20:34:48 - pico-train - INFO - ├── Learning Rate: 1.95e-04 |
| 2025-08-30 20:34:48 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:35:39 - pico-train - INFO - Step 11700 -- Training Metrics |
| 2025-08-30 20:35:39 - pico-train - INFO - ├── Loss: 5.4080 |
| 2025-08-30 20:35:39 - pico-train - INFO - ├── Learning Rate: 1.95e-04 |
| 2025-08-30 20:35:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:36:31 - pico-train - INFO - Step 11800 -- Training Metrics |
| 2025-08-30 20:36:31 - pico-train - INFO - ├── Loss: 5.4179 |
| 2025-08-30 20:36:31 - pico-train - INFO - ├── Learning Rate: 1.95e-04 |
| 2025-08-30 20:36:31 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:37:23 - pico-train - INFO - Step 11900 -- Training Metrics |
| 2025-08-30 20:37:23 - pico-train - INFO - ├── Loss: 5.4318 |
| 2025-08-30 20:37:23 - pico-train - INFO - ├── Learning Rate: 1.95e-04 |
| 2025-08-30 20:37:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:38:14 - pico-train - INFO - Step 12000 -- Saving Checkpoint |
| 2025-08-30 20:40:02 - pico-train - INFO - Step 12000 -- Evaluation Results |
| 2025-08-30 20:40:02 - pico-train - INFO - └── paloma: 1.5035041453279693e+33 |
| 2025-08-30 20:40:03 - pico-train - INFO - Step 12000 -- Training Metrics |
| 2025-08-30 20:40:03 - pico-train - INFO - ├── Loss: 5.4911 |
| 2025-08-30 20:40:03 - pico-train - INFO - ├── Learning Rate: 1.95e-04 |
| 2025-08-30 20:40:03 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:40:03 - pico-train - INFO - Step 12000 -- Saving Learning Dynamics |
| 2025-08-30 20:40:56 - pico-train - INFO - Step 12100 -- Training Metrics |
| 2025-08-30 20:40:56 - pico-train - INFO - ├── Loss: 5.4785 |
| 2025-08-30 20:40:56 - pico-train - INFO - ├── Learning Rate: 1.95e-04 |
| 2025-08-30 20:40:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:41:47 - pico-train - INFO - Step 12200 -- Training Metrics |
| 2025-08-30 20:41:47 - pico-train - INFO - ├── Loss: 5.4698 |
| 2025-08-30 20:41:47 - pico-train - INFO - ├── Learning Rate: 1.95e-04 |
| 2025-08-30 20:41:47 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:42:39 - pico-train - INFO - Step 12300 -- Training Metrics |
| 2025-08-30 20:42:39 - pico-train - INFO - ├── Loss: 5.4358 |
| 2025-08-30 20:42:39 - pico-train - INFO - ├── Learning Rate: 1.95e-04 |
| 2025-08-30 20:42:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:43:31 - pico-train - INFO - Step 12400 -- Training Metrics |
| 2025-08-30 20:43:31 - pico-train - INFO - ├── Loss: 5.4874 |
| 2025-08-30 20:43:31 - pico-train - INFO - ├── Learning Rate: 1.94e-04 |
| 2025-08-30 20:43:31 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:44:23 - pico-train - INFO - Step 12500 -- Training Metrics |
| 2025-08-30 20:44:23 - pico-train - INFO - ├── Loss: 5.4446 |
| 2025-08-30 20:44:23 - pico-train - INFO - ├── Learning Rate: 1.94e-04 |
| 2025-08-30 20:44:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:45:15 - pico-train - INFO - Step 12600 -- Training Metrics |
| 2025-08-30 20:45:15 - pico-train - INFO - ├── Loss: 5.4319 |
| 2025-08-30 20:45:15 - pico-train - INFO - ├── Learning Rate: 1.94e-04 |
| 2025-08-30 20:45:15 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:46:07 - pico-train - INFO - Step 12700 -- Training Metrics |
| 2025-08-30 20:46:07 - pico-train - INFO - ├── Loss: 5.4340 |
| 2025-08-30 20:46:07 - pico-train - INFO - ├── Learning Rate: 1.94e-04 |
| 2025-08-30 20:46:07 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:46:59 - pico-train - INFO - Step 12800 -- Training Metrics |
| 2025-08-30 20:46:59 - pico-train - INFO - ├── Loss: 5.3721 |
| 2025-08-30 20:46:59 - pico-train - INFO - ├── Learning Rate: 1.94e-04 |
| 2025-08-30 20:46:59 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:47:50 - pico-train - INFO - Step 12900 -- Training Metrics |
| 2025-08-30 20:47:50 - pico-train - INFO - ├── Loss: 5.4168 |
| 2025-08-30 20:47:50 - pico-train - INFO - ├── Learning Rate: 1.94e-04 |
| 2025-08-30 20:47:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:48:42 - pico-train - INFO - Step 13000 -- Training Metrics |
| 2025-08-30 20:48:42 - pico-train - INFO - ├── Loss: 5.3633 |
| 2025-08-30 20:48:42 - pico-train - INFO - ├── Learning Rate: 1.94e-04 |
| 2025-08-30 20:48:42 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:49:33 - pico-train - INFO - Step 13100 -- Training Metrics |
| 2025-08-30 20:49:33 - pico-train - INFO - ├── Loss: 5.1710 |
| 2025-08-30 20:49:33 - pico-train - INFO - ├── Learning Rate: 1.94e-04 |
| 2025-08-30 20:49:33 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:50:25 - pico-train - INFO - Step 13200 -- Training Metrics |
| 2025-08-30 20:50:25 - pico-train - INFO - ├── Loss: 5.2070 |
| 2025-08-30 20:50:25 - pico-train - INFO - ├── Learning Rate: 1.94e-04 |
| 2025-08-30 20:50:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:51:17 - pico-train - INFO - Step 13300 -- Training Metrics |
| 2025-08-30 20:51:17 - pico-train - INFO - ├── Loss: 5.2383 |
| 2025-08-30 20:51:17 - pico-train - INFO - ├── Learning Rate: 1.94e-04 |
| 2025-08-30 20:51:17 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:52:09 - pico-train - INFO - Step 13400 -- Training Metrics |
| 2025-08-30 20:52:09 - pico-train - INFO - ├── Loss: 4.9534 |
| 2025-08-30 20:52:09 - pico-train - INFO - ├── Learning Rate: 1.93e-04 |
| 2025-08-30 20:52:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:53:01 - pico-train - INFO - Step 13500 -- Training Metrics |
| 2025-08-30 20:53:01 - pico-train - INFO - ├── Loss: 5.3736 |
| 2025-08-30 20:53:01 - pico-train - INFO - ├── Learning Rate: 1.93e-04 |
| 2025-08-30 20:53:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:53:53 - pico-train - INFO - Step 13600 -- Training Metrics |
| 2025-08-30 20:53:53 - pico-train - INFO - ├── Loss: 5.3662 |
| 2025-08-30 20:53:53 - pico-train - INFO - ├── Learning Rate: 1.93e-04 |
| 2025-08-30 20:53:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:54:45 - pico-train - INFO - Step 13700 -- Training Metrics |
| 2025-08-30 20:54:45 - pico-train - INFO - ├── Loss: 5.3711 |
| 2025-08-30 20:54:45 - pico-train - INFO - ├── Learning Rate: 1.93e-04 |
| 2025-08-30 20:54:45 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:55:36 - pico-train - INFO - Step 13800 -- Training Metrics |
| 2025-08-30 20:55:36 - pico-train - INFO - ├── Loss: 5.3487 |
| 2025-08-30 20:55:36 - pico-train - INFO - ├── Learning Rate: 1.93e-04 |
| 2025-08-30 20:55:36 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:56:27 - pico-train - INFO - Step 13900 -- Training Metrics |
| 2025-08-30 20:56:27 - pico-train - INFO - ├── Loss: 5.3742 |
| 2025-08-30 20:56:27 - pico-train - INFO - ├── Learning Rate: 1.93e-04 |
| 2025-08-30 20:56:27 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:57:19 - pico-train - INFO - Step 14000 -- Saving Checkpoint |
| 2025-08-30 20:59:09 - pico-train - INFO - Step 14000 -- Evaluation Results |
| 2025-08-30 20:59:09 - pico-train - INFO - └── paloma: 7.8069653363655655e+34 |
| 2025-08-30 20:59:09 - pico-train - INFO - Step 14000 -- Training Metrics |
| 2025-08-30 20:59:09 - pico-train - INFO - ├── Loss: 5.3518 |
| 2025-08-30 20:59:09 - pico-train - INFO - ├── Learning Rate: 1.93e-04 |
| 2025-08-30 20:59:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 20:59:09 - pico-train - INFO - Step 14000 -- Saving Learning Dynamics |
| 2025-08-30 21:00:02 - pico-train - INFO - Step 14100 -- Training Metrics |
| 2025-08-30 21:00:02 - pico-train - INFO - ├── Loss: 5.0946 |
| 2025-08-30 21:00:02 - pico-train - INFO - ├── Learning Rate: 1.93e-04 |
| 2025-08-30 21:00:02 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:00:54 - pico-train - INFO - Step 14200 -- Training Metrics |
| 2025-08-30 21:00:54 - pico-train - INFO - ├── Loss: 5.3664 |
| 2025-08-30 21:00:54 - pico-train - INFO - ├── Learning Rate: 1.92e-04 |
| 2025-08-30 21:00:54 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:01:45 - pico-train - INFO - Step 14300 -- Training Metrics |
| 2025-08-30 21:01:45 - pico-train - INFO - ├── Loss: 5.1603 |
| 2025-08-30 21:01:45 - pico-train - INFO - ├── Learning Rate: 1.92e-04 |
| 2025-08-30 21:01:45 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:02:37 - pico-train - INFO - Step 14400 -- Training Metrics |
| 2025-08-30 21:02:37 - pico-train - INFO - ├── Loss: 5.3446 |
| 2025-08-30 21:02:37 - pico-train - INFO - ├── Learning Rate: 1.92e-04 |
| 2025-08-30 21:02:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:03:29 - pico-train - INFO - Step 14500 -- Training Metrics |
| 2025-08-30 21:03:29 - pico-train - INFO - ├── Loss: 5.2801 |
| 2025-08-30 21:03:29 - pico-train - INFO - ├── Learning Rate: 1.92e-04 |
| 2025-08-30 21:03:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:04:21 - pico-train - INFO - Step 14600 -- Training Metrics |
| 2025-08-30 21:04:21 - pico-train - INFO - ├── Loss: 5.3449 |
| 2025-08-30 21:04:21 - pico-train - INFO - ├── Learning Rate: 1.92e-04 |
| 2025-08-30 21:04:21 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:05:13 - pico-train - INFO - Step 14700 -- Training Metrics |
| 2025-08-30 21:05:13 - pico-train - INFO - ├── Loss: 5.2675 |
| 2025-08-30 21:05:13 - pico-train - INFO - ├── Learning Rate: 1.92e-04 |
| 2025-08-30 21:05:13 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:06:05 - pico-train - INFO - Step 14800 -- Training Metrics |
| 2025-08-30 21:06:05 - pico-train - INFO - ├── Loss: 5.3139 |
| 2025-08-30 21:06:05 - pico-train - INFO - ├── Learning Rate: 1.92e-04 |
| 2025-08-30 21:06:05 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:06:56 - pico-train - INFO - Step 14900 -- Training Metrics |
| 2025-08-30 21:06:56 - pico-train - INFO - ├── Loss: 5.2704 |
| 2025-08-30 21:06:56 - pico-train - INFO - ├── Learning Rate: 1.92e-04 |
| 2025-08-30 21:06:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:07:49 - pico-train - INFO - Step 15000 -- Training Metrics |
| 2025-08-30 21:07:49 - pico-train - INFO - ├── Loss: 5.3162 |
| 2025-08-30 21:07:49 - pico-train - INFO - ├── Learning Rate: 1.91e-04 |
| 2025-08-30 21:07:49 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:08:40 - pico-train - INFO - Step 15100 -- Training Metrics |
| 2025-08-30 21:08:40 - pico-train - INFO - ├── Loss: 5.3013 |
| 2025-08-30 21:08:40 - pico-train - INFO - ├── Learning Rate: 1.91e-04 |
| 2025-08-30 21:08:40 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:09:32 - pico-train - INFO - Step 15200 -- Training Metrics |
| 2025-08-30 21:09:32 - pico-train - INFO - ├── Loss: 5.3360 |
| 2025-08-30 21:09:32 - pico-train - INFO - ├── Learning Rate: 1.91e-04 |
| 2025-08-30 21:09:32 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:10:26 - pico-train - INFO - Step 15300 -- Training Metrics |
| 2025-08-30 21:10:26 - pico-train - INFO - ├── Loss: 5.2511 |
| 2025-08-30 21:10:26 - pico-train - INFO - ├── Learning Rate: 1.91e-04 |
| 2025-08-30 21:10:26 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:11:17 - pico-train - INFO - Step 15400 -- Training Metrics |
| 2025-08-30 21:11:17 - pico-train - INFO - ├── Loss: 5.3518 |
| 2025-08-30 21:11:17 - pico-train - INFO - ├── Learning Rate: 1.91e-04 |
| 2025-08-30 21:11:17 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:12:09 - pico-train - INFO - Step 15500 -- Training Metrics |
| 2025-08-30 21:12:09 - pico-train - INFO - ├── Loss: 5.3463 |
| 2025-08-30 21:12:09 - pico-train - INFO - ├── Learning Rate: 1.91e-04 |
| 2025-08-30 21:12:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:13:01 - pico-train - INFO - Step 15600 -- Training Metrics |
| 2025-08-30 21:13:01 - pico-train - INFO - ├── Loss: 5.3388 |
| 2025-08-30 21:13:01 - pico-train - INFO - ├── Learning Rate: 1.91e-04 |
| 2025-08-30 21:13:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:13:53 - pico-train - INFO - Step 15700 -- Training Metrics |
| 2025-08-30 21:13:53 - pico-train - INFO - ├── Loss: 5.2737 |
| 2025-08-30 21:13:53 - pico-train - INFO - ├── Learning Rate: 1.91e-04 |
| 2025-08-30 21:13:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:14:45 - pico-train - INFO - Step 15800 -- Training Metrics |
| 2025-08-30 21:14:45 - pico-train - INFO - ├── Loss: 5.3195 |
| 2025-08-30 21:14:45 - pico-train - INFO - ├── Learning Rate: 1.90e-04 |
| 2025-08-30 21:14:45 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:15:37 - pico-train - INFO - Step 15900 -- Training Metrics |
| 2025-08-30 21:15:37 - pico-train - INFO - ├── Loss: 5.2779 |
| 2025-08-30 21:15:37 - pico-train - INFO - ├── Learning Rate: 1.90e-04 |
| 2025-08-30 21:15:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:16:28 - pico-train - INFO - Step 16000 -- Saving Checkpoint |
| 2025-08-30 21:18:17 - pico-train - INFO - Step 16000 -- Evaluation Results |
| 2025-08-30 21:18:17 - pico-train - INFO - └── paloma: inf |
| 2025-08-30 21:18:17 - pico-train - INFO - Step 16000 -- Training Metrics |
| 2025-08-30 21:18:17 - pico-train - INFO - ├── Loss: 5.2974 |
| 2025-08-30 21:18:17 - pico-train - INFO - ├── Learning Rate: 1.90e-04 |
| 2025-08-30 21:18:17 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:18:17 - pico-train - INFO - Step 16000 -- Saving Learning Dynamics |
| 2025-08-30 21:19:09 - pico-train - INFO - Step 16100 -- Training Metrics |
| 2025-08-30 21:19:09 - pico-train - INFO - ├── Loss: 5.2737 |
| 2025-08-30 21:19:09 - pico-train - INFO - ├── Learning Rate: 1.90e-04 |
| 2025-08-30 21:19:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:20:01 - pico-train - INFO - Step 16200 -- Training Metrics |
| 2025-08-30 21:20:01 - pico-train - INFO - ├── Loss: 5.2285 |
| 2025-08-30 21:20:01 - pico-train - INFO - ├── Learning Rate: 1.90e-04 |
| 2025-08-30 21:20:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:20:52 - pico-train - INFO - Step 16300 -- Training Metrics |
| 2025-08-30 21:20:52 - pico-train - INFO - ├── Loss: 5.2327 |
| 2025-08-30 21:20:52 - pico-train - INFO - ├── Learning Rate: 1.90e-04 |
| 2025-08-30 21:20:52 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:21:44 - pico-train - INFO - Step 16400 -- Training Metrics |
| 2025-08-30 21:21:44 - pico-train - INFO - ├── Loss: 5.2827 |
| 2025-08-30 21:21:44 - pico-train - INFO - ├── Learning Rate: 1.90e-04 |
| 2025-08-30 21:21:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:22:37 - pico-train - INFO - Step 16500 -- Training Metrics |
| 2025-08-30 21:22:37 - pico-train - INFO - ├── Loss: 5.2770 |
| 2025-08-30 21:22:37 - pico-train - INFO - ├── Learning Rate: 1.89e-04 |
| 2025-08-30 21:22:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:23:29 - pico-train - INFO - Step 16600 -- Training Metrics |
| 2025-08-30 21:23:29 - pico-train - INFO - ├── Loss: 5.3294 |
| 2025-08-30 21:23:29 - pico-train - INFO - ├── Learning Rate: 1.89e-04 |
| 2025-08-30 21:23:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:24:21 - pico-train - INFO - Step 16700 -- Training Metrics |
| 2025-08-30 21:24:21 - pico-train - INFO - ├── Loss: 5.2354 |
| 2025-08-30 21:24:21 - pico-train - INFO - ├── Learning Rate: 1.89e-04 |
| 2025-08-30 21:24:21 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:25:13 - pico-train - INFO - Step 16800 -- Training Metrics |
| 2025-08-30 21:25:13 - pico-train - INFO - ├── Loss: 5.2687 |
| 2025-08-30 21:25:13 - pico-train - INFO - ├── Learning Rate: 1.89e-04 |
| 2025-08-30 21:25:13 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:26:04 - pico-train - INFO - Step 16900 -- Training Metrics |
| 2025-08-30 21:26:04 - pico-train - INFO - ├── Loss: 5.2713 |
| 2025-08-30 21:26:04 - pico-train - INFO - ├── Learning Rate: 1.89e-04 |
| 2025-08-30 21:26:04 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:26:56 - pico-train - INFO - Step 17000 -- Training Metrics |
| 2025-08-30 21:26:56 - pico-train - INFO - ├── Loss: 5.2079 |
| 2025-08-30 21:26:56 - pico-train - INFO - ├── Learning Rate: 1.89e-04 |
| 2025-08-30 21:26:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:27:48 - pico-train - INFO - Step 17100 -- Training Metrics |
| 2025-08-30 21:27:48 - pico-train - INFO - ├── Loss: 5.2177 |
| 2025-08-30 21:27:48 - pico-train - INFO - ├── Learning Rate: 1.89e-04 |
| 2025-08-30 21:27:48 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:28:39 - pico-train - INFO - Step 17200 -- Training Metrics |
| 2025-08-30 21:28:39 - pico-train - INFO - ├── Loss: 5.2245 |
| 2025-08-30 21:28:39 - pico-train - INFO - ├── Learning Rate: 1.88e-04 |
| 2025-08-30 21:28:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:29:31 - pico-train - INFO - Step 17300 -- Training Metrics |
| 2025-08-30 21:29:31 - pico-train - INFO - ├── Loss: 5.0896 |
| 2025-08-30 21:29:31 - pico-train - INFO - ├── Learning Rate: 1.88e-04 |
| 2025-08-30 21:29:31 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:30:22 - pico-train - INFO - Step 17400 -- Training Metrics |
| 2025-08-30 21:30:22 - pico-train - INFO - ├── Loss: 4.9397 |
| 2025-08-30 21:30:22 - pico-train - INFO - ├── Learning Rate: 1.88e-04 |
| 2025-08-30 21:30:22 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:31:16 - pico-train - INFO - Step 17500 -- Training Metrics |
| 2025-08-30 21:31:16 - pico-train - INFO - ├── Loss: 4.9093 |
| 2025-08-30 21:31:16 - pico-train - INFO - ├── Learning Rate: 1.88e-04 |
| 2025-08-30 21:31:16 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:32:08 - pico-train - INFO - Step 17600 -- Training Metrics |
| 2025-08-30 21:32:08 - pico-train - INFO - ├── Loss: 5.0573 |
| 2025-08-30 21:32:08 - pico-train - INFO - ├── Learning Rate: 1.88e-04 |
| 2025-08-30 21:32:08 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:33:00 - pico-train - INFO - Step 17700 -- Training Metrics |
| 2025-08-30 21:33:00 - pico-train - INFO - ├── Loss: 5.0873 |
| 2025-08-30 21:33:00 - pico-train - INFO - ├── Learning Rate: 1.88e-04 |
| 2025-08-30 21:33:00 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:33:51 - pico-train - INFO - Step 17800 -- Training Metrics |
| 2025-08-30 21:33:51 - pico-train - INFO - ├── Loss: 5.0730 |
| 2025-08-30 21:33:51 - pico-train - INFO - ├── Learning Rate: 1.87e-04 |
| 2025-08-30 21:33:51 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:34:43 - pico-train - INFO - Step 17900 -- Training Metrics |
| 2025-08-30 21:34:43 - pico-train - INFO - ├── Loss: 5.1804 |
| 2025-08-30 21:34:43 - pico-train - INFO - ├── Learning Rate: 1.87e-04 |
| 2025-08-30 21:34:43 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:35:35 - pico-train - INFO - Step 18000 -- Saving Checkpoint |
| 2025-08-30 21:37:23 - pico-train - INFO - Step 18000 -- Evaluation Results |
| 2025-08-30 21:37:23 - pico-train - INFO - └── paloma: inf |
| 2025-08-30 21:37:23 - pico-train - INFO - Step 18000 -- Training Metrics |
| 2025-08-30 21:37:23 - pico-train - INFO - ├── Loss: 5.2105 |
| 2025-08-30 21:37:23 - pico-train - INFO - ├── Learning Rate: 1.87e-04 |
| 2025-08-30 21:37:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:37:23 - pico-train - INFO - Step 18000 -- Saving Learning Dynamics |
| 2025-08-30 21:38:16 - pico-train - INFO - Step 18100 -- Training Metrics |
| 2025-08-30 21:38:16 - pico-train - INFO - ├── Loss: 5.1961 |
| 2025-08-30 21:38:16 - pico-train - INFO - ├── Learning Rate: 1.87e-04 |
| 2025-08-30 21:38:16 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:39:08 - pico-train - INFO - Step 18200 -- Training Metrics |
| 2025-08-30 21:39:08 - pico-train - INFO - ├── Loss: 5.2475 |
| 2025-08-30 21:39:08 - pico-train - INFO - ├── Learning Rate: 1.87e-04 |
| 2025-08-30 21:39:08 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:39:59 - pico-train - INFO - Step 18300 -- Training Metrics |
| 2025-08-30 21:39:59 - pico-train - INFO - ├── Loss: 5.2362 |
| 2025-08-30 21:39:59 - pico-train - INFO - ├── Learning Rate: 1.87e-04 |
| 2025-08-30 21:39:59 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:40:51 - pico-train - INFO - Step 18400 -- Training Metrics |
| 2025-08-30 21:40:51 - pico-train - INFO - ├── Loss: 5.2194 |
| 2025-08-30 21:40:51 - pico-train - INFO - ├── Learning Rate: 1.86e-04 |
| 2025-08-30 21:40:51 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:41:44 - pico-train - INFO - Step 18500 -- Training Metrics |
| 2025-08-30 21:41:44 - pico-train - INFO - ├── Loss: 5.2559 |
| 2025-08-30 21:41:44 - pico-train - INFO - ├── Learning Rate: 1.86e-04 |
| 2025-08-30 21:41:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:42:35 - pico-train - INFO - Step 18600 -- Training Metrics |
| 2025-08-30 21:42:35 - pico-train - INFO - ├── Loss: 5.1074 |
| 2025-08-30 21:42:35 - pico-train - INFO - ├── Learning Rate: 1.86e-04 |
| 2025-08-30 21:42:35 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:43:27 - pico-train - INFO - Step 18700 -- Training Metrics |
| 2025-08-30 21:43:27 - pico-train - INFO - ├── Loss: 5.1509 |
| 2025-08-30 21:43:27 - pico-train - INFO - ├── Learning Rate: 1.86e-04 |
| 2025-08-30 21:43:27 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:44:19 - pico-train - INFO - Step 18800 -- Training Metrics |
| 2025-08-30 21:44:19 - pico-train - INFO - ├── Loss: 5.1142 |
| 2025-08-30 21:44:19 - pico-train - INFO - ├── Learning Rate: 1.86e-04 |
| 2025-08-30 21:44:19 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:45:10 - pico-train - INFO - Step 18900 -- Training Metrics |
| 2025-08-30 21:45:11 - pico-train - INFO - ├── Loss: 5.1417 |
| 2025-08-30 21:45:11 - pico-train - INFO - ├── Learning Rate: 1.86e-04 |
| 2025-08-30 21:45:11 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:46:03 - pico-train - INFO - Step 19000 -- Training Metrics |
| 2025-08-30 21:46:03 - pico-train - INFO - ├── Loss: 5.1488 |
| 2025-08-30 21:46:03 - pico-train - INFO - ├── Learning Rate: 1.86e-04 |
| 2025-08-30 21:46:03 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 21:46:54 - pico-train - INFO - Step 19100 -- ๐ Training Metrics |
| 2025-08-30 21:46:54 - pico-train - INFO - โโโ Loss: 5.1151 |
| 2025-08-30 21:46:54 - pico-train - INFO - โโโ Learning Rate: 1.85e-04 |
| 2025-08-30 21:46:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:47:46 - pico-train - INFO - Step 19200 -- ๐ Training Metrics |
| 2025-08-30 21:47:46 - pico-train - INFO - โโโ Loss: 5.1375 |
| 2025-08-30 21:47:46 - pico-train - INFO - โโโ Learning Rate: 1.85e-04 |
| 2025-08-30 21:47:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:48:38 - pico-train - INFO - Step 19300 -- ๐ Training Metrics |
| 2025-08-30 21:48:38 - pico-train - INFO - โโโ Loss: 5.1469 |
| 2025-08-30 21:48:38 - pico-train - INFO - โโโ Learning Rate: 1.85e-04 |
| 2025-08-30 21:48:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:49:30 - pico-train - INFO - Step 19400 -- ๐ Training Metrics |
| 2025-08-30 21:49:30 - pico-train - INFO - โโโ Loss: 5.1239 |
| 2025-08-30 21:49:30 - pico-train - INFO - โโโ Learning Rate: 1.85e-04 |
| 2025-08-30 21:49:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:50:22 - pico-train - INFO - Step 19500 -- ๐ Training Metrics |
| 2025-08-30 21:50:22 - pico-train - INFO - โโโ Loss: 5.1759 |
| 2025-08-30 21:50:22 - pico-train - INFO - โโโ Learning Rate: 1.85e-04 |
| 2025-08-30 21:50:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:51:14 - pico-train - INFO - Step 19600 -- ๐ Training Metrics |
| 2025-08-30 21:51:14 - pico-train - INFO - โโโ Loss: 5.0507 |
| 2025-08-30 21:51:14 - pico-train - INFO - โโโ Learning Rate: 1.85e-04 |
| 2025-08-30 21:51:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:52:05 - pico-train - INFO - Step 19700 -- ๐ Training Metrics |
| 2025-08-30 21:52:05 - pico-train - INFO - โโโ Loss: 5.0907 |
| 2025-08-30 21:52:05 - pico-train - INFO - โโโ Learning Rate: 1.84e-04 |
| 2025-08-30 21:52:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:52:57 - pico-train - INFO - Step 19800 -- ๐ Training Metrics |
| 2025-08-30 21:52:57 - pico-train - INFO - โโโ Loss: 4.4806 |
| 2025-08-30 21:52:57 - pico-train - INFO - โโโ Learning Rate: 1.84e-04 |
| 2025-08-30 21:52:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:53:49 - pico-train - INFO - Step 19900 -- ๐ Training Metrics |
| 2025-08-30 21:53:49 - pico-train - INFO - โโโ Loss: 4.9627 |
| 2025-08-30 21:53:49 - pico-train - INFO - โโโ Learning Rate: 1.84e-04 |
| 2025-08-30 21:53:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:54:40 - pico-train - INFO - Step 20000 -- 💾 Saving Checkpoint |
| 2025-08-30 21:56:29 - pico-train - INFO - Step 20000 -- ๐ Evaluation Results |
| 2025-08-30 21:56:29 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-30 21:56:30 - pico-train - INFO - Step 20000 -- ๐ Training Metrics |
| 2025-08-30 21:56:30 - pico-train - INFO - โโโ Loss: 4.9833 |
| 2025-08-30 21:56:30 - pico-train - INFO - โโโ Learning Rate: 1.84e-04 |
| 2025-08-30 21:56:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:56:30 - pico-train - INFO - Step 20000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 21:57:22 - pico-train - INFO - Step 20100 -- ๐ Training Metrics |
| 2025-08-30 21:57:22 - pico-train - INFO - โโโ Loss: 4.7830 |
| 2025-08-30 21:57:22 - pico-train - INFO - โโโ Learning Rate: 1.84e-04 |
| 2025-08-30 21:57:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:58:14 - pico-train - INFO - Step 20200 -- ๐ Training Metrics |
| 2025-08-30 21:58:14 - pico-train - INFO - โโโ Loss: 4.9470 |
| 2025-08-30 21:58:14 - pico-train - INFO - โโโ Learning Rate: 1.83e-04 |
| 2025-08-30 21:58:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:59:06 - pico-train - INFO - Step 20300 -- ๐ Training Metrics |
| 2025-08-30 21:59:06 - pico-train - INFO - โโโ Loss: 5.0469 |
| 2025-08-30 21:59:06 - pico-train - INFO - โโโ Learning Rate: 1.83e-04 |
| 2025-08-30 21:59:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 21:59:57 - pico-train - INFO - Step 20400 -- ๐ Training Metrics |
| 2025-08-30 21:59:57 - pico-train - INFO - โโโ Loss: 4.9170 |
| 2025-08-30 21:59:57 - pico-train - INFO - โโโ Learning Rate: 1.83e-04 |
| 2025-08-30 21:59:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:00:49 - pico-train - INFO - Step 20500 -- ๐ Training Metrics |
| 2025-08-30 22:00:49 - pico-train - INFO - โโโ Loss: 5.1611 |
| 2025-08-30 22:00:49 - pico-train - INFO - โโโ Learning Rate: 1.83e-04 |
| 2025-08-30 22:00:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:01:41 - pico-train - INFO - Step 20600 -- ๐ Training Metrics |
| 2025-08-30 22:01:41 - pico-train - INFO - โโโ Loss: 5.1110 |
| 2025-08-30 22:01:41 - pico-train - INFO - โโโ Learning Rate: 1.83e-04 |
| 2025-08-30 22:01:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:02:33 - pico-train - INFO - Step 20700 -- ๐ Training Metrics |
| 2025-08-30 22:02:33 - pico-train - INFO - โโโ Loss: 5.1728 |
| 2025-08-30 22:02:33 - pico-train - INFO - โโโ Learning Rate: 1.83e-04 |
| 2025-08-30 22:02:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:03:24 - pico-train - INFO - Step 20800 -- ๐ Training Metrics |
| 2025-08-30 22:03:24 - pico-train - INFO - โโโ Loss: 5.1312 |
| 2025-08-30 22:03:24 - pico-train - INFO - โโโ Learning Rate: 1.82e-04 |
| 2025-08-30 22:03:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:04:16 - pico-train - INFO - Step 20900 -- ๐ Training Metrics |
| 2025-08-30 22:04:16 - pico-train - INFO - โโโ Loss: 5.1331 |
| 2025-08-30 22:04:16 - pico-train - INFO - โโโ Learning Rate: 1.82e-04 |
| 2025-08-30 22:04:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:05:08 - pico-train - INFO - Step 21000 -- ๐ Training Metrics |
| 2025-08-30 22:05:08 - pico-train - INFO - โโโ Loss: 5.1372 |
| 2025-08-30 22:05:08 - pico-train - INFO - โโโ Learning Rate: 1.82e-04 |
| 2025-08-30 22:05:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:06:00 - pico-train - INFO - Step 21100 -- ๐ Training Metrics |
| 2025-08-30 22:06:00 - pico-train - INFO - โโโ Loss: 5.1923 |
| 2025-08-30 22:06:00 - pico-train - INFO - โโโ Learning Rate: 1.82e-04 |
| 2025-08-30 22:06:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:06:52 - pico-train - INFO - Step 21200 -- ๐ Training Metrics |
| 2025-08-30 22:06:52 - pico-train - INFO - โโโ Loss: 5.0971 |
| 2025-08-30 22:06:52 - pico-train - INFO - โโโ Learning Rate: 1.82e-04 |
| 2025-08-30 22:06:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:07:44 - pico-train - INFO - Step 21300 -- ๐ Training Metrics |
| 2025-08-30 22:07:44 - pico-train - INFO - โโโ Loss: 5.1132 |
| 2025-08-30 22:07:44 - pico-train - INFO - โโโ Learning Rate: 1.81e-04 |
| 2025-08-30 22:07:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:08:35 - pico-train - INFO - Step 21400 -- ๐ Training Metrics |
| 2025-08-30 22:08:35 - pico-train - INFO - โโโ Loss: 5.0486 |
| 2025-08-30 22:08:35 - pico-train - INFO - โโโ Learning Rate: 1.81e-04 |
| 2025-08-30 22:08:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:09:28 - pico-train - INFO - Step 21500 -- ๐ Training Metrics |
| 2025-08-30 22:09:28 - pico-train - INFO - โโโ Loss: 5.0345 |
| 2025-08-30 22:09:28 - pico-train - INFO - โโโ Learning Rate: 1.81e-04 |
| 2025-08-30 22:09:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:10:19 - pico-train - INFO - Step 21600 -- ๐ Training Metrics |
| 2025-08-30 22:10:19 - pico-train - INFO - โโโ Loss: 5.0882 |
| 2025-08-30 22:10:19 - pico-train - INFO - โโโ Learning Rate: 1.81e-04 |
| 2025-08-30 22:10:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:11:11 - pico-train - INFO - Step 21700 -- ๐ Training Metrics |
| 2025-08-30 22:11:11 - pico-train - INFO - โโโ Loss: 5.1284 |
| 2025-08-30 22:11:11 - pico-train - INFO - โโโ Learning Rate: 1.81e-04 |
| 2025-08-30 22:11:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:12:03 - pico-train - INFO - Step 21800 -- ๐ Training Metrics |
| 2025-08-30 22:12:03 - pico-train - INFO - โโโ Loss: 5.0334 |
| 2025-08-30 22:12:03 - pico-train - INFO - โโโ Learning Rate: 1.81e-04 |
| 2025-08-30 22:12:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:12:55 - pico-train - INFO - Step 21900 -- ๐ Training Metrics |
| 2025-08-30 22:12:55 - pico-train - INFO - โโโ Loss: 5.0281 |
| 2025-08-30 22:12:55 - pico-train - INFO - โโโ Learning Rate: 1.80e-04 |
| 2025-08-30 22:12:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:13:46 - pico-train - INFO - Step 22000 -- 💾 Saving Checkpoint |
| 2025-08-30 22:15:34 - pico-train - INFO - Step 22000 -- ๐ Evaluation Results |
| 2025-08-30 22:15:34 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-30 22:15:35 - pico-train - INFO - Step 22000 -- ๐ Training Metrics |
| 2025-08-30 22:15:35 - pico-train - INFO - โโโ Loss: 5.0481 |
| 2025-08-30 22:15:35 - pico-train - INFO - โโโ Learning Rate: 1.80e-04 |
| 2025-08-30 22:15:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:15:35 - pico-train - INFO - Step 22000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 22:16:27 - pico-train - INFO - Step 22100 -- ๐ Training Metrics |
| 2025-08-30 22:16:27 - pico-train - INFO - โโโ Loss: 5.0789 |
| 2025-08-30 22:16:27 - pico-train - INFO - โโโ Learning Rate: 1.80e-04 |
| 2025-08-30 22:16:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:17:19 - pico-train - INFO - Step 22200 -- ๐ Training Metrics |
| 2025-08-30 22:17:19 - pico-train - INFO - โโโ Loss: 5.1574 |
| 2025-08-30 22:17:19 - pico-train - INFO - โโโ Learning Rate: 1.80e-04 |
| 2025-08-30 22:17:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:18:11 - pico-train - INFO - Step 22300 -- ๐ Training Metrics |
| 2025-08-30 22:18:11 - pico-train - INFO - โโโ Loss: 5.0839 |
| 2025-08-30 22:18:11 - pico-train - INFO - โโโ Learning Rate: 1.80e-04 |
| 2025-08-30 22:18:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:19:02 - pico-train - INFO - Step 22400 -- ๐ Training Metrics |
| 2025-08-30 22:19:02 - pico-train - INFO - โโโ Loss: 4.9870 |
| 2025-08-30 22:19:02 - pico-train - INFO - โโโ Learning Rate: 1.79e-04 |
| 2025-08-30 22:19:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:20:06 - pico-train - INFO - Step 22500 -- ๐ Training Metrics |
| 2025-08-30 22:20:06 - pico-train - INFO - โโโ Loss: 5.0445 |
| 2025-08-30 22:20:06 - pico-train - INFO - โโโ Learning Rate: 1.79e-04 |
| 2025-08-30 22:20:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:20:57 - pico-train - INFO - Step 22600 -- ๐ Training Metrics |
| 2025-08-30 22:20:57 - pico-train - INFO - โโโ Loss: 5.0542 |
| 2025-08-30 22:20:57 - pico-train - INFO - โโโ Learning Rate: 1.79e-04 |
| 2025-08-30 22:20:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:21:49 - pico-train - INFO - Step 22700 -- ๐ Training Metrics |
| 2025-08-30 22:21:49 - pico-train - INFO - โโโ Loss: 4.8050 |
| 2025-08-30 22:21:49 - pico-train - INFO - โโโ Learning Rate: 1.79e-04 |
| 2025-08-30 22:21:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:22:41 - pico-train - INFO - Step 22800 -- ๐ Training Metrics |
| 2025-08-30 22:22:41 - pico-train - INFO - โโโ Loss: 4.9966 |
| 2025-08-30 22:22:41 - pico-train - INFO - โโโ Learning Rate: 1.79e-04 |
| 2025-08-30 22:22:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:23:33 - pico-train - INFO - Step 22900 -- ๐ Training Metrics |
| 2025-08-30 22:23:33 - pico-train - INFO - โโโ Loss: 5.0640 |
| 2025-08-30 22:23:33 - pico-train - INFO - โโโ Learning Rate: 1.78e-04 |
| 2025-08-30 22:23:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:24:32 - pico-train - INFO - Step 23000 -- ๐ Training Metrics |
| 2025-08-30 22:24:32 - pico-train - INFO - โโโ Loss: 5.0862 |
| 2025-08-30 22:24:32 - pico-train - INFO - โโโ Learning Rate: 1.78e-04 |
| 2025-08-30 22:24:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:25:24 - pico-train - INFO - Step 23100 -- ๐ Training Metrics |
| 2025-08-30 22:25:24 - pico-train - INFO - โโโ Loss: 5.0783 |
| 2025-08-30 22:25:24 - pico-train - INFO - โโโ Learning Rate: 1.78e-04 |
| 2025-08-30 22:25:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:26:15 - pico-train - INFO - Step 23200 -- ๐ Training Metrics |
| 2025-08-30 22:26:15 - pico-train - INFO - โโโ Loss: 5.0221 |
| 2025-08-30 22:26:15 - pico-train - INFO - โโโ Learning Rate: 1.78e-04 |
| 2025-08-30 22:26:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:27:07 - pico-train - INFO - Step 23300 -- ๐ Training Metrics |
| 2025-08-30 22:27:07 - pico-train - INFO - โโโ Loss: 5.0721 |
| 2025-08-30 22:27:07 - pico-train - INFO - โโโ Learning Rate: 1.78e-04 |
| 2025-08-30 22:27:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:27:59 - pico-train - INFO - Step 23400 -- ๐ Training Metrics |
| 2025-08-30 22:27:59 - pico-train - INFO - โโโ Loss: 5.0740 |
| 2025-08-30 22:27:59 - pico-train - INFO - โโโ Learning Rate: 1.77e-04 |
| 2025-08-30 22:27:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:28:52 - pico-train - INFO - Step 23500 -- ๐ Training Metrics |
| 2025-08-30 22:28:52 - pico-train - INFO - โโโ Loss: 5.0539 |
| 2025-08-30 22:28:52 - pico-train - INFO - โโโ Learning Rate: 1.77e-04 |
| 2025-08-30 22:28:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:29:44 - pico-train - INFO - Step 23600 -- ๐ Training Metrics |
| 2025-08-30 22:29:44 - pico-train - INFO - โโโ Loss: 5.0133 |
| 2025-08-30 22:29:44 - pico-train - INFO - โโโ Learning Rate: 1.77e-04 |
| 2025-08-30 22:29:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:30:36 - pico-train - INFO - Step 23700 -- ๐ Training Metrics |
| 2025-08-30 22:30:36 - pico-train - INFO - โโโ Loss: 4.9849 |
| 2025-08-30 22:30:36 - pico-train - INFO - โโโ Learning Rate: 1.77e-04 |
| 2025-08-30 22:30:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:31:28 - pico-train - INFO - Step 23800 -- ๐ Training Metrics |
| 2025-08-30 22:31:28 - pico-train - INFO - โโโ Loss: 4.8925 |
| 2025-08-30 22:31:28 - pico-train - INFO - โโโ Learning Rate: 1.77e-04 |
| 2025-08-30 22:31:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:32:19 - pico-train - INFO - Step 23900 -- ๐ Training Metrics |
| 2025-08-30 22:32:19 - pico-train - INFO - โโโ Loss: 4.9386 |
| 2025-08-30 22:32:19 - pico-train - INFO - โโโ Learning Rate: 1.76e-04 |
| 2025-08-30 22:32:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:33:11 - pico-train - INFO - Step 24000 -- 💾 Saving Checkpoint |
| 2025-08-30 22:35:00 - pico-train - INFO - Step 24000 -- ๐ Evaluation Results |
| 2025-08-30 22:35:00 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-30 22:35:01 - pico-train - INFO - Step 24000 -- ๐ Training Metrics |
| 2025-08-30 22:35:01 - pico-train - INFO - โโโ Loss: 4.9406 |
| 2025-08-30 22:35:01 - pico-train - INFO - โโโ Learning Rate: 1.76e-04 |
| 2025-08-30 22:35:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:35:01 - pico-train - INFO - Step 24000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 22:35:53 - pico-train - INFO - Step 24100 -- ๐ Training Metrics |
| 2025-08-30 22:35:53 - pico-train - INFO - โโโ Loss: 4.9201 |
| 2025-08-30 22:35:53 - pico-train - INFO - โโโ Learning Rate: 1.76e-04 |
| 2025-08-30 22:35:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:36:45 - pico-train - INFO - Step 24200 -- ๐ Training Metrics |
| 2025-08-30 22:36:45 - pico-train - INFO - โโโ Loss: 5.0127 |
| 2025-08-30 22:36:45 - pico-train - INFO - โโโ Learning Rate: 1.76e-04 |
| 2025-08-30 22:36:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:37:36 - pico-train - INFO - Step 24300 -- ๐ Training Metrics |
| 2025-08-30 22:37:36 - pico-train - INFO - โโโ Loss: 5.0086 |
| 2025-08-30 22:37:36 - pico-train - INFO - โโโ Learning Rate: 1.76e-04 |
| 2025-08-30 22:37:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:38:28 - pico-train - INFO - Step 24400 -- ๐ Training Metrics |
| 2025-08-30 22:38:28 - pico-train - INFO - โโโ Loss: 5.0631 |
| 2025-08-30 22:38:28 - pico-train - INFO - โโโ Learning Rate: 1.75e-04 |
| 2025-08-30 22:38:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:39:21 - pico-train - INFO - Step 24500 -- ๐ Training Metrics |
| 2025-08-30 22:39:21 - pico-train - INFO - โโโ Loss: 5.0557 |
| 2025-08-30 22:39:21 - pico-train - INFO - โโโ Learning Rate: 1.75e-04 |
| 2025-08-30 22:39:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:40:12 - pico-train - INFO - Step 24600 -- ๐ Training Metrics |
| 2025-08-30 22:40:12 - pico-train - INFO - โโโ Loss: 5.1006 |
| 2025-08-30 22:40:12 - pico-train - INFO - โโโ Learning Rate: 1.75e-04 |
| 2025-08-30 22:40:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:41:04 - pico-train - INFO - Step 24700 -- ๐ Training Metrics |
| 2025-08-30 22:41:04 - pico-train - INFO - โโโ Loss: 4.9891 |
| 2025-08-30 22:41:04 - pico-train - INFO - โโโ Learning Rate: 1.75e-04 |
| 2025-08-30 22:41:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:41:56 - pico-train - INFO - Step 24800 -- ๐ Training Metrics |
| 2025-08-30 22:41:56 - pico-train - INFO - โโโ Loss: 5.0367 |
| 2025-08-30 22:41:56 - pico-train - INFO - โโโ Learning Rate: 1.74e-04 |
| 2025-08-30 22:41:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:42:48 - pico-train - INFO - Step 24900 -- ๐ Training Metrics |
| 2025-08-30 22:42:48 - pico-train - INFO - โโโ Loss: 5.1347 |
| 2025-08-30 22:42:48 - pico-train - INFO - โโโ Learning Rate: 1.74e-04 |
| 2025-08-30 22:42:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:43:40 - pico-train - INFO - Step 25000 -- ๐ Training Metrics |
| 2025-08-30 22:43:40 - pico-train - INFO - โโโ Loss: 5.0413 |
| 2025-08-30 22:43:40 - pico-train - INFO - โโโ Learning Rate: 1.74e-04 |
| 2025-08-30 22:43:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:44:32 - pico-train - INFO - Step 25100 -- ๐ Training Metrics |
| 2025-08-30 22:44:32 - pico-train - INFO - โโโ Loss: 5.0933 |
| 2025-08-30 22:44:32 - pico-train - INFO - โโโ Learning Rate: 1.74e-04 |
| 2025-08-30 22:44:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:45:23 - pico-train - INFO - Step 25200 -- ๐ Training Metrics |
| 2025-08-30 22:45:23 - pico-train - INFO - โโโ Loss: 5.0863 |
| 2025-08-30 22:45:23 - pico-train - INFO - โโโ Learning Rate: 1.74e-04 |
| 2025-08-30 22:45:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:46:15 - pico-train - INFO - Step 25300 -- ๐ Training Metrics |
| 2025-08-30 22:46:15 - pico-train - INFO - โโโ Loss: 5.0696 |
| 2025-08-30 22:46:15 - pico-train - INFO - โโโ Learning Rate: 1.73e-04 |
| 2025-08-30 22:46:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:47:07 - pico-train - INFO - Step 25400 -- ๐ Training Metrics |
| 2025-08-30 22:47:07 - pico-train - INFO - โโโ Loss: 5.0548 |
| 2025-08-30 22:47:07 - pico-train - INFO - โโโ Learning Rate: 1.73e-04 |
| 2025-08-30 22:47:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:47:59 - pico-train - INFO - Step 25500 -- ๐ Training Metrics |
| 2025-08-30 22:47:59 - pico-train - INFO - โโโ Loss: 5.0383 |
| 2025-08-30 22:47:59 - pico-train - INFO - โโโ Learning Rate: 1.73e-04 |
| 2025-08-30 22:47:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:48:51 - pico-train - INFO - Step 25600 -- ๐ Training Metrics |
| 2025-08-30 22:48:51 - pico-train - INFO - โโโ Loss: 5.0622 |
| 2025-08-30 22:48:51 - pico-train - INFO - โโโ Learning Rate: 1.73e-04 |
| 2025-08-30 22:48:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:49:43 - pico-train - INFO - Step 25700 -- ๐ Training Metrics |
| 2025-08-30 22:49:43 - pico-train - INFO - โโโ Loss: 5.1004 |
| 2025-08-30 22:49:43 - pico-train - INFO - โโโ Learning Rate: 1.73e-04 |
| 2025-08-30 22:49:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:50:35 - pico-train - INFO - Step 25800 -- ๐ Training Metrics |
| 2025-08-30 22:50:35 - pico-train - INFO - โโโ Loss: 5.0557 |
| 2025-08-30 22:50:35 - pico-train - INFO - โโโ Learning Rate: 1.72e-04 |
| 2025-08-30 22:50:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:51:26 - pico-train - INFO - Step 25900 -- ๐ Training Metrics |
| 2025-08-30 22:51:26 - pico-train - INFO - โโโ Loss: 5.0435 |
| 2025-08-30 22:51:26 - pico-train - INFO - โโโ Learning Rate: 1.72e-04 |
| 2025-08-30 22:51:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:52:18 - pico-train - INFO - Step 26000 -- 💾 Saving Checkpoint |
| 2025-08-30 22:54:07 - pico-train - INFO - Step 26000 -- ๐ Evaluation Results |
| 2025-08-30 22:54:07 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-30 22:54:07 - pico-train - INFO - Step 26000 -- ๐ Training Metrics |
| 2025-08-30 22:54:07 - pico-train - INFO - โโโ Loss: 5.0616 |
| 2025-08-30 22:54:07 - pico-train - INFO - โโโ Learning Rate: 1.72e-04 |
| 2025-08-30 22:54:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:54:07 - pico-train - INFO - Step 26000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 22:54:59 - pico-train - INFO - Step 26100 -- ๐ Training Metrics |
| 2025-08-30 22:54:59 - pico-train - INFO - โโโ Loss: 5.0612 |
| 2025-08-30 22:54:59 - pico-train - INFO - โโโ Learning Rate: 1.72e-04 |
| 2025-08-30 22:54:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:55:51 - pico-train - INFO - Step 26200 -- ๐ Training Metrics |
| 2025-08-30 22:55:51 - pico-train - INFO - โโโ Loss: 4.9843 |
| 2025-08-30 22:55:51 - pico-train - INFO - โโโ Learning Rate: 1.71e-04 |
| 2025-08-30 22:55:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:56:43 - pico-train - INFO - Step 26300 -- ๐ Training Metrics |
| 2025-08-30 22:56:43 - pico-train - INFO - โโโ Loss: 5.0666 |
| 2025-08-30 22:56:43 - pico-train - INFO - โโโ Learning Rate: 1.71e-04 |
| 2025-08-30 22:56:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:57:35 - pico-train - INFO - Step 26400 -- ๐ Training Metrics |
| 2025-08-30 22:57:35 - pico-train - INFO - โโโ Loss: 5.0759 |
| 2025-08-30 22:57:35 - pico-train - INFO - โโโ Learning Rate: 1.71e-04 |
| 2025-08-30 22:57:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:58:27 - pico-train - INFO - Step 26500 -- ๐ Training Metrics |
| 2025-08-30 22:58:27 - pico-train - INFO - โโโ Loss: 5.0495 |
| 2025-08-30 22:58:27 - pico-train - INFO - โโโ Learning Rate: 1.71e-04 |
| 2025-08-30 22:58:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 22:59:19 - pico-train - INFO - Step 26600 -- ๐ Training Metrics |
| 2025-08-30 22:59:19 - pico-train - INFO - โโโ Loss: 5.0772 |
| 2025-08-30 22:59:19 - pico-train - INFO - โโโ Learning Rate: 1.70e-04 |
| 2025-08-30 22:59:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:00:11 - pico-train - INFO - Step 26700 -- ๐ Training Metrics |
| 2025-08-30 23:00:11 - pico-train - INFO - โโโ Loss: 4.9182 |
| 2025-08-30 23:00:11 - pico-train - INFO - โโโ Learning Rate: 1.70e-04 |
| 2025-08-30 23:00:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:01:02 - pico-train - INFO - Step 26800 -- ๐ Training Metrics |
| 2025-08-30 23:01:02 - pico-train - INFO - โโโ Loss: 5.0919 |
| 2025-08-30 23:01:02 - pico-train - INFO - โโโ Learning Rate: 1.70e-04 |
| 2025-08-30 23:01:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:01:54 - pico-train - INFO - Step 26900 -- ๐ Training Metrics |
| 2025-08-30 23:01:54 - pico-train - INFO - โโโ Loss: 5.0713 |
| 2025-08-30 23:01:54 - pico-train - INFO - โโโ Learning Rate: 1.70e-04 |
| 2025-08-30 23:01:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:02:46 - pico-train - INFO - Step 27000 -- ๐ Training Metrics |
| 2025-08-30 23:02:46 - pico-train - INFO - โโโ Loss: 5.0032 |
| 2025-08-30 23:02:46 - pico-train - INFO - โโโ Learning Rate: 1.70e-04 |
| 2025-08-30 23:02:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:03:38 - pico-train - INFO - Step 27100 -- ๐ Training Metrics |
| 2025-08-30 23:03:38 - pico-train - INFO - โโโ Loss: 5.0434 |
| 2025-08-30 23:03:38 - pico-train - INFO - โโโ Learning Rate: 1.69e-04 |
| 2025-08-30 23:03:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:04:30 - pico-train - INFO - Step 27200 -- ๐ Training Metrics |
| 2025-08-30 23:04:30 - pico-train - INFO - โโโ Loss: 5.0721 |
| 2025-08-30 23:04:30 - pico-train - INFO - โโโ Learning Rate: 1.69e-04 |
| 2025-08-30 23:04:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:05:22 - pico-train - INFO - Step 27300 -- ๐ Training Metrics |
| 2025-08-30 23:05:22 - pico-train - INFO - โโโ Loss: 5.0265 |
| 2025-08-30 23:05:22 - pico-train - INFO - โโโ Learning Rate: 1.69e-04 |
| 2025-08-30 23:05:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:06:14 - pico-train - INFO - Step 27400 -- ๐ Training Metrics |
| 2025-08-30 23:06:14 - pico-train - INFO - โโโ Loss: 5.0055 |
| 2025-08-30 23:06:14 - pico-train - INFO - โโโ Learning Rate: 1.69e-04 |
| 2025-08-30 23:06:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:07:06 - pico-train - INFO - Step 27500 -- ๐ Training Metrics |
| 2025-08-30 23:07:06 - pico-train - INFO - โโโ Loss: 4.9898 |
| 2025-08-30 23:07:06 - pico-train - INFO - โโโ Learning Rate: 1.68e-04 |
| 2025-08-30 23:07:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:07:57 - pico-train - INFO - Step 27600 -- ๐ Training Metrics |
| 2025-08-30 23:07:57 - pico-train - INFO - โโโ Loss: 4.8553 |
| 2025-08-30 23:07:57 - pico-train - INFO - โโโ Learning Rate: 1.68e-04 |
| 2025-08-30 23:07:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:08:49 - pico-train - INFO - Step 27700 -- ๐ Training Metrics |
| 2025-08-30 23:08:49 - pico-train - INFO - โโโ Loss: 4.9803 |
| 2025-08-30 23:08:49 - pico-train - INFO - โโโ Learning Rate: 1.68e-04 |
| 2025-08-30 23:08:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:09:41 - pico-train - INFO - Step 27800 -- ๐ Training Metrics |
| 2025-08-30 23:09:41 - pico-train - INFO - โโโ Loss: 5.0268 |
| 2025-08-30 23:09:41 - pico-train - INFO - โโโ Learning Rate: 1.68e-04 |
| 2025-08-30 23:09:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:10:33 - pico-train - INFO - Step 27900 -- ๐ Training Metrics |
| 2025-08-30 23:10:33 - pico-train - INFO - โโโ Loss: 4.9910 |
| 2025-08-30 23:10:33 - pico-train - INFO - โโโ Learning Rate: 1.67e-04 |
| 2025-08-30 23:10:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:11:25 - pico-train - INFO - Step 28000 -- 💾 Saving Checkpoint |
| 2025-08-30 23:13:14 - pico-train - INFO - Step 28000 -- ๐ Evaluation Results |
| 2025-08-30 23:13:14 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-30 23:13:14 - pico-train - INFO - Step 28000 -- ๐ Training Metrics |
| 2025-08-30 23:13:14 - pico-train - INFO - โโโ Loss: 4.9920 |
| 2025-08-30 23:13:14 - pico-train - INFO - โโโ Learning Rate: 1.67e-04 |
| 2025-08-30 23:13:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:13:14 - pico-train - INFO - Step 28000 -- ๐ Saving Learning Dynamics |
| 2025-08-30 23:14:07 - pico-train - INFO - Step 28100 -- ๐ Training Metrics |
| 2025-08-30 23:14:07 - pico-train - INFO - โโโ Loss: 5.0400 |
| 2025-08-30 23:14:07 - pico-train - INFO - โโโ Learning Rate: 1.67e-04 |
| 2025-08-30 23:14:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:14:59 - pico-train - INFO - Step 28200 -- ๐ Training Metrics |
| 2025-08-30 23:14:59 - pico-train - INFO - โโโ Loss: 4.9727 |
| 2025-08-30 23:14:59 - pico-train - INFO - โโโ Learning Rate: 1.67e-04 |
| 2025-08-30 23:14:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:15:51 - pico-train - INFO - Step 28300 -- ๐ Training Metrics |
| 2025-08-30 23:15:51 - pico-train - INFO - โโโ Loss: 5.0073 |
| 2025-08-30 23:15:51 - pico-train - INFO - โโโ Learning Rate: 1.67e-04 |
| 2025-08-30 23:15:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:16:43 - pico-train - INFO - Step 28400 -- ๐ Training Metrics |
| 2025-08-30 23:16:43 - pico-train - INFO - โโโ Loss: 5.0681 |
| 2025-08-30 23:16:43 - pico-train - INFO - โโโ Learning Rate: 1.66e-04 |
| 2025-08-30 23:16:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:17:35 - pico-train - INFO - Step 28500 -- ๐ Training Metrics |
| 2025-08-30 23:17:35 - pico-train - INFO - โโโ Loss: 5.0500 |
| 2025-08-30 23:17:35 - pico-train - INFO - โโโ Learning Rate: 1.66e-04 |
| 2025-08-30 23:17:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:18:27 - pico-train - INFO - Step 28600 -- ๐ Training Metrics |
| 2025-08-30 23:18:27 - pico-train - INFO - โโโ Loss: 5.0144 |
| 2025-08-30 23:18:27 - pico-train - INFO - โโโ Learning Rate: 1.66e-04 |
| 2025-08-30 23:18:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:19:19 - pico-train - INFO - Step 28700 -- ๐ Training Metrics |
| 2025-08-30 23:19:19 - pico-train - INFO - โโโ Loss: 5.0618 |
| 2025-08-30 23:19:19 - pico-train - INFO - โโโ Learning Rate: 1.66e-04 |
| 2025-08-30 23:19:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:20:10 - pico-train - INFO - Step 28800 -- ๐ Training Metrics |
| 2025-08-30 23:20:10 - pico-train - INFO - โโโ Loss: 5.0330 |
| 2025-08-30 23:20:10 - pico-train - INFO - โโโ Learning Rate: 1.65e-04 |
| 2025-08-30 23:20:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:21:02 - pico-train - INFO - Step 28900 -- ๐ Training Metrics |
| 2025-08-30 23:21:02 - pico-train - INFO - โโโ Loss: 4.9992 |
| 2025-08-30 23:21:02 - pico-train - INFO - โโโ Learning Rate: 1.65e-04 |
| 2025-08-30 23:21:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-30 23:21:55 - pico-train - INFO - Step 29000 -- ๐ Training Metrics |
| 2025-08-30 23:21:55 - pico-train - INFO - โโโ Loss: 4.9424 |
| 2025-08-30 23:21:55 - pico-train - INFO - ├── Learning Rate: 1.65e-04 |
| 2025-08-30 23:21:55 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:22:46 - pico-train - INFO - Step 29100 -- Training Metrics |
| 2025-08-30 23:22:46 - pico-train - INFO - ├── Loss: 4.9886 |
| 2025-08-30 23:22:46 - pico-train - INFO - ├── Learning Rate: 1.65e-04 |
| 2025-08-30 23:22:46 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:23:38 - pico-train - INFO - Step 29200 -- Training Metrics |
| 2025-08-30 23:23:38 - pico-train - INFO - ├── Loss: 4.9621 |
| 2025-08-30 23:23:38 - pico-train - INFO - ├── Learning Rate: 1.64e-04 |
| 2025-08-30 23:23:38 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:24:30 - pico-train - INFO - Step 29300 -- Training Metrics |
| 2025-08-30 23:24:30 - pico-train - INFO - ├── Loss: 4.9392 |
| 2025-08-30 23:24:30 - pico-train - INFO - ├── Learning Rate: 1.64e-04 |
| 2025-08-30 23:24:30 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:25:22 - pico-train - INFO - Step 29400 -- Training Metrics |
| 2025-08-30 23:25:22 - pico-train - INFO - ├── Loss: 5.0059 |
| 2025-08-30 23:25:22 - pico-train - INFO - ├── Learning Rate: 1.64e-04 |
| 2025-08-30 23:25:22 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:26:14 - pico-train - INFO - Step 29500 -- Training Metrics |
| 2025-08-30 23:26:14 - pico-train - INFO - ├── Loss: 4.9028 |
| 2025-08-30 23:26:14 - pico-train - INFO - ├── Learning Rate: 1.64e-04 |
| 2025-08-30 23:26:14 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:27:06 - pico-train - INFO - Step 29600 -- Training Metrics |
| 2025-08-30 23:27:06 - pico-train - INFO - ├── Loss: 4.9401 |
| 2025-08-30 23:27:06 - pico-train - INFO - ├── Learning Rate: 1.63e-04 |
| 2025-08-30 23:27:06 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:27:57 - pico-train - INFO - Step 29700 -- Training Metrics |
| 2025-08-30 23:27:57 - pico-train - INFO - ├── Loss: 4.8644 |
| 2025-08-30 23:27:57 - pico-train - INFO - ├── Learning Rate: 1.63e-04 |
| 2025-08-30 23:27:57 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:28:49 - pico-train - INFO - Step 29800 -- Training Metrics |
| 2025-08-30 23:28:49 - pico-train - INFO - ├── Loss: 4.9997 |
| 2025-08-30 23:28:49 - pico-train - INFO - ├── Learning Rate: 1.63e-04 |
| 2025-08-30 23:28:49 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:29:41 - pico-train - INFO - Step 29900 -- Training Metrics |
| 2025-08-30 23:29:41 - pico-train - INFO - ├── Loss: 4.9959 |
| 2025-08-30 23:29:41 - pico-train - INFO - ├── Learning Rate: 1.63e-04 |
| 2025-08-30 23:29:41 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:30:33 - pico-train - INFO - Step 30000 -- Saving Checkpoint |
| 2025-08-30 23:32:32 - pico-train - INFO - Step 30000 -- Evaluation Results |
| 2025-08-30 23:32:32 - pico-train - INFO - └── paloma: inf |
| 2025-08-30 23:32:32 - pico-train - INFO - Step 30000 -- Training Metrics |
| 2025-08-30 23:32:32 - pico-train - INFO - ├── Loss: 4.6933 |
| 2025-08-30 23:32:32 - pico-train - INFO - ├── Learning Rate: 1.62e-04 |
| 2025-08-30 23:32:32 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:32:32 - pico-train - INFO - Step 30000 -- Saving Learning Dynamics |
| 2025-08-30 23:33:25 - pico-train - INFO - Step 30100 -- Training Metrics |
| 2025-08-30 23:33:25 - pico-train - INFO - ├── Loss: 4.0656 |
| 2025-08-30 23:33:25 - pico-train - INFO - ├── Learning Rate: 1.62e-04 |
| 2025-08-30 23:33:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:34:17 - pico-train - INFO - Step 30200 -- Training Metrics |
| 2025-08-30 23:34:17 - pico-train - INFO - ├── Loss: 4.9761 |
| 2025-08-30 23:34:17 - pico-train - INFO - ├── Learning Rate: 1.62e-04 |
| 2025-08-30 23:34:17 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:35:09 - pico-train - INFO - Step 30300 -- Training Metrics |
| 2025-08-30 23:35:09 - pico-train - INFO - ├── Loss: 4.9614 |
| 2025-08-30 23:35:09 - pico-train - INFO - ├── Learning Rate: 1.62e-04 |
| 2025-08-30 23:35:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:36:01 - pico-train - INFO - Step 30400 -- Training Metrics |
| 2025-08-30 23:36:01 - pico-train - INFO - ├── Loss: 4.9599 |
| 2025-08-30 23:36:01 - pico-train - INFO - ├── Learning Rate: 1.61e-04 |
| 2025-08-30 23:36:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:36:53 - pico-train - INFO - Step 30500 -- Training Metrics |
| 2025-08-30 23:36:53 - pico-train - INFO - ├── Loss: 5.0080 |
| 2025-08-30 23:36:53 - pico-train - INFO - ├── Learning Rate: 1.61e-04 |
| 2025-08-30 23:36:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:37:45 - pico-train - INFO - Step 30600 -- Training Metrics |
| 2025-08-30 23:37:45 - pico-train - INFO - ├── Loss: 4.9963 |
| 2025-08-30 23:37:45 - pico-train - INFO - ├── Learning Rate: 1.61e-04 |
| 2025-08-30 23:37:45 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:38:37 - pico-train - INFO - Step 30700 -- Training Metrics |
| 2025-08-30 23:38:37 - pico-train - INFO - ├── Loss: 4.7991 |
| 2025-08-30 23:38:37 - pico-train - INFO - ├── Learning Rate: 1.61e-04 |
| 2025-08-30 23:38:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:39:29 - pico-train - INFO - Step 30800 -- Training Metrics |
| 2025-08-30 23:39:29 - pico-train - INFO - ├── Loss: 4.9226 |
| 2025-08-30 23:39:29 - pico-train - INFO - ├── Learning Rate: 1.60e-04 |
| 2025-08-30 23:39:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:40:20 - pico-train - INFO - Step 30900 -- Training Metrics |
| 2025-08-30 23:40:20 - pico-train - INFO - ├── Loss: 4.9592 |
| 2025-08-30 23:40:20 - pico-train - INFO - ├── Learning Rate: 1.60e-04 |
| 2025-08-30 23:40:20 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:41:13 - pico-train - INFO - Step 31000 -- Training Metrics |
| 2025-08-30 23:41:13 - pico-train - INFO - ├── Loss: 4.9497 |
| 2025-08-30 23:41:13 - pico-train - INFO - ├── Learning Rate: 1.60e-04 |
| 2025-08-30 23:41:13 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:42:04 - pico-train - INFO - Step 31100 -- Training Metrics |
| 2025-08-30 23:42:04 - pico-train - INFO - ├── Loss: 4.9682 |
| 2025-08-30 23:42:04 - pico-train - INFO - ├── Learning Rate: 1.60e-04 |
| 2025-08-30 23:42:04 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:42:57 - pico-train - INFO - Step 31200 -- Training Metrics |
| 2025-08-30 23:42:57 - pico-train - INFO - ├── Loss: 5.0640 |
| 2025-08-30 23:42:57 - pico-train - INFO - ├── Learning Rate: 1.59e-04 |
| 2025-08-30 23:42:57 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:43:48 - pico-train - INFO - Step 31300 -- Training Metrics |
| 2025-08-30 23:43:48 - pico-train - INFO - ├── Loss: 5.0062 |
| 2025-08-30 23:43:48 - pico-train - INFO - ├── Learning Rate: 1.59e-04 |
| 2025-08-30 23:43:48 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:44:40 - pico-train - INFO - Step 31400 -- Training Metrics |
| 2025-08-30 23:44:40 - pico-train - INFO - ├── Loss: 4.9914 |
| 2025-08-30 23:44:40 - pico-train - INFO - ├── Learning Rate: 1.59e-04 |
| 2025-08-30 23:44:40 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:45:33 - pico-train - INFO - Step 31500 -- Training Metrics |
| 2025-08-30 23:45:33 - pico-train - INFO - ├── Loss: 4.9742 |
| 2025-08-30 23:45:33 - pico-train - INFO - ├── Learning Rate: 1.59e-04 |
| 2025-08-30 23:45:33 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:46:25 - pico-train - INFO - Step 31600 -- Training Metrics |
| 2025-08-30 23:46:25 - pico-train - INFO - ├── Loss: 4.9519 |
| 2025-08-30 23:46:25 - pico-train - INFO - ├── Learning Rate: 1.58e-04 |
| 2025-08-30 23:46:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:47:18 - pico-train - INFO - Step 31700 -- Training Metrics |
| 2025-08-30 23:47:18 - pico-train - INFO - ├── Loss: 5.0267 |
| 2025-08-30 23:47:18 - pico-train - INFO - ├── Learning Rate: 1.58e-04 |
| 2025-08-30 23:47:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:48:10 - pico-train - INFO - Step 31800 -- Training Metrics |
| 2025-08-30 23:48:10 - pico-train - INFO - ├── Loss: 4.9511 |
| 2025-08-30 23:48:10 - pico-train - INFO - ├── Learning Rate: 1.58e-04 |
| 2025-08-30 23:48:10 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:49:02 - pico-train - INFO - Step 31900 -- Training Metrics |
| 2025-08-30 23:49:02 - pico-train - INFO - ├── Loss: 4.9648 |
| 2025-08-30 23:49:02 - pico-train - INFO - ├── Learning Rate: 1.57e-04 |
| 2025-08-30 23:49:02 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:49:54 - pico-train - INFO - Step 32000 -- Saving Checkpoint |
| 2025-08-30 23:51:43 - pico-train - INFO - Step 32000 -- Evaluation Results |
| 2025-08-30 23:51:43 - pico-train - INFO - └── paloma: inf |
| 2025-08-30 23:51:43 - pico-train - INFO - Step 32000 -- Training Metrics |
| 2025-08-30 23:51:43 - pico-train - INFO - ├── Loss: 4.9717 |
| 2025-08-30 23:51:43 - pico-train - INFO - ├── Learning Rate: 1.57e-04 |
| 2025-08-30 23:51:43 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:51:43 - pico-train - INFO - Step 32000 -- Saving Learning Dynamics |
| 2025-08-30 23:52:36 - pico-train - INFO - Step 32100 -- Training Metrics |
| 2025-08-30 23:52:36 - pico-train - INFO - ├── Loss: 4.9565 |
| 2025-08-30 23:52:36 - pico-train - INFO - ├── Learning Rate: 1.57e-04 |
| 2025-08-30 23:52:36 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:53:28 - pico-train - INFO - Step 32200 -- Training Metrics |
| 2025-08-30 23:53:28 - pico-train - INFO - ├── Loss: 4.9479 |
| 2025-08-30 23:53:28 - pico-train - INFO - ├── Learning Rate: 1.57e-04 |
| 2025-08-30 23:53:28 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:54:20 - pico-train - INFO - Step 32300 -- Training Metrics |
| 2025-08-30 23:54:20 - pico-train - INFO - ├── Loss: 4.9513 |
| 2025-08-30 23:54:20 - pico-train - INFO - ├── Learning Rate: 1.56e-04 |
| 2025-08-30 23:54:20 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:55:12 - pico-train - INFO - Step 32400 -- Training Metrics |
| 2025-08-30 23:55:12 - pico-train - INFO - ├── Loss: 4.9405 |
| 2025-08-30 23:55:12 - pico-train - INFO - ├── Learning Rate: 1.56e-04 |
| 2025-08-30 23:55:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:56:04 - pico-train - INFO - Step 32500 -- Training Metrics |
| 2025-08-30 23:56:04 - pico-train - INFO - ├── Loss: 4.9576 |
| 2025-08-30 23:56:04 - pico-train - INFO - ├── Learning Rate: 1.56e-04 |
| 2025-08-30 23:56:04 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:56:56 - pico-train - INFO - Step 32600 -- Training Metrics |
| 2025-08-30 23:56:56 - pico-train - INFO - ├── Loss: 4.7705 |
| 2025-08-30 23:56:56 - pico-train - INFO - ├── Learning Rate: 1.56e-04 |
| 2025-08-30 23:56:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:57:48 - pico-train - INFO - Step 32700 -- Training Metrics |
| 2025-08-30 23:57:48 - pico-train - INFO - ├── Loss: 4.8987 |
| 2025-08-30 23:57:48 - pico-train - INFO - ├── Learning Rate: 1.55e-04 |
| 2025-08-30 23:57:48 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:58:40 - pico-train - INFO - Step 32800 -- Training Metrics |
| 2025-08-30 23:58:40 - pico-train - INFO - ├── Loss: 5.0154 |
| 2025-08-30 23:58:40 - pico-train - INFO - ├── Learning Rate: 1.55e-04 |
| 2025-08-30 23:58:40 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-30 23:59:32 - pico-train - INFO - Step 32900 -- Training Metrics |
| 2025-08-30 23:59:32 - pico-train - INFO - ├── Loss: 4.9710 |
| 2025-08-30 23:59:32 - pico-train - INFO - ├── Learning Rate: 1.55e-04 |
| 2025-08-30 23:59:32 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:00:24 - pico-train - INFO - Step 33000 -- Training Metrics |
| 2025-08-31 00:00:24 - pico-train - INFO - ├── Loss: 4.9653 |
| 2025-08-31 00:00:24 - pico-train - INFO - ├── Learning Rate: 1.55e-04 |
| 2025-08-31 00:00:24 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:01:16 - pico-train - INFO - Step 33100 -- Training Metrics |
| 2025-08-31 00:01:16 - pico-train - INFO - ├── Loss: 4.9613 |
| 2025-08-31 00:01:16 - pico-train - INFO - ├── Learning Rate: 1.54e-04 |
| 2025-08-31 00:01:16 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:02:08 - pico-train - INFO - Step 33200 -- Training Metrics |
| 2025-08-31 00:02:08 - pico-train - INFO - ├── Loss: 4.9135 |
| 2025-08-31 00:02:08 - pico-train - INFO - ├── Learning Rate: 1.54e-04 |
| 2025-08-31 00:02:08 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:03:00 - pico-train - INFO - Step 33300 -- Training Metrics |
| 2025-08-31 00:03:00 - pico-train - INFO - ├── Loss: 5.0232 |
| 2025-08-31 00:03:00 - pico-train - INFO - ├── Learning Rate: 1.54e-04 |
| 2025-08-31 00:03:00 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:03:52 - pico-train - INFO - Step 33400 -- Training Metrics |
| 2025-08-31 00:03:52 - pico-train - INFO - ├── Loss: 4.9169 |
| 2025-08-31 00:03:52 - pico-train - INFO - ├── Learning Rate: 1.53e-04 |
| 2025-08-31 00:03:52 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:04:44 - pico-train - INFO - Step 33500 -- Training Metrics |
| 2025-08-31 00:04:44 - pico-train - INFO - ├── Loss: 5.0064 |
| 2025-08-31 00:04:44 - pico-train - INFO - ├── Learning Rate: 1.53e-04 |
| 2025-08-31 00:04:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:05:36 - pico-train - INFO - Step 33600 -- Training Metrics |
| 2025-08-31 00:05:36 - pico-train - INFO - ├── Loss: 4.8591 |
| 2025-08-31 00:05:36 - pico-train - INFO - ├── Learning Rate: 1.53e-04 |
| 2025-08-31 00:05:36 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:06:29 - pico-train - INFO - Step 33700 -- Training Metrics |
| 2025-08-31 00:06:29 - pico-train - INFO - ├── Loss: 4.8708 |
| 2025-08-31 00:06:29 - pico-train - INFO - ├── Learning Rate: 1.53e-04 |
| 2025-08-31 00:06:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:07:21 - pico-train - INFO - Step 33800 -- Training Metrics |
| 2025-08-31 00:07:21 - pico-train - INFO - ├── Loss: 4.9443 |
| 2025-08-31 00:07:21 - pico-train - INFO - ├── Learning Rate: 1.52e-04 |
| 2025-08-31 00:07:21 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:08:12 - pico-train - INFO - Step 33900 -- Training Metrics |
| 2025-08-31 00:08:12 - pico-train - INFO - ├── Loss: 4.8950 |
| 2025-08-31 00:08:12 - pico-train - INFO - ├── Learning Rate: 1.52e-04 |
| 2025-08-31 00:08:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:09:05 - pico-train - INFO - Step 34000 -- Saving Checkpoint |
| 2025-08-31 00:10:53 - pico-train - INFO - Step 34000 -- Evaluation Results |
| 2025-08-31 00:10:53 - pico-train - INFO - └── paloma: inf |
| 2025-08-31 00:10:54 - pico-train - INFO - Step 34000 -- Training Metrics |
| 2025-08-31 00:10:54 - pico-train - INFO - ├── Loss: 4.9473 |
| 2025-08-31 00:10:54 - pico-train - INFO - ├── Learning Rate: 1.52e-04 |
| 2025-08-31 00:10:54 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:10:54 - pico-train - INFO - Step 34000 -- Saving Learning Dynamics |
| 2025-08-31 00:11:47 - pico-train - INFO - Step 34100 -- Training Metrics |
| 2025-08-31 00:11:47 - pico-train - INFO - ├── Loss: 4.9341 |
| 2025-08-31 00:11:47 - pico-train - INFO - ├── Learning Rate: 1.52e-04 |
| 2025-08-31 00:11:47 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:12:39 - pico-train - INFO - Step 34200 -- Training Metrics |
| 2025-08-31 00:12:39 - pico-train - INFO - ├── Loss: 4.9262 |
| 2025-08-31 00:12:39 - pico-train - INFO - ├── Learning Rate: 1.51e-04 |
| 2025-08-31 00:12:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:13:30 - pico-train - INFO - Step 34300 -- Training Metrics |
| 2025-08-31 00:13:30 - pico-train - INFO - ├── Loss: 4.9611 |
| 2025-08-31 00:13:30 - pico-train - INFO - ├── Learning Rate: 1.51e-04 |
| 2025-08-31 00:13:30 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:14:22 - pico-train - INFO - Step 34400 -- Training Metrics |
| 2025-08-31 00:14:22 - pico-train - INFO - ├── Loss: 4.9589 |
| 2025-08-31 00:14:22 - pico-train - INFO - ├── Learning Rate: 1.51e-04 |
| 2025-08-31 00:14:22 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:15:15 - pico-train - INFO - Step 34500 -- Training Metrics |
| 2025-08-31 00:15:15 - pico-train - INFO - ├── Loss: 4.9880 |
| 2025-08-31 00:15:15 - pico-train - INFO - ├── Learning Rate: 1.50e-04 |
| 2025-08-31 00:15:15 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:16:07 - pico-train - INFO - Step 34600 -- Training Metrics |
| 2025-08-31 00:16:07 - pico-train - INFO - ├── Loss: 4.9131 |
| 2025-08-31 00:16:07 - pico-train - INFO - ├── Learning Rate: 1.50e-04 |
| 2025-08-31 00:16:07 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:16:58 - pico-train - INFO - Step 34700 -- Training Metrics |
| 2025-08-31 00:16:58 - pico-train - INFO - ├── Loss: 4.9326 |
| 2025-08-31 00:16:58 - pico-train - INFO - ├── Learning Rate: 1.50e-04 |
| 2025-08-31 00:16:58 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:17:49 - pico-train - INFO - Step 34800 -- Training Metrics |
| 2025-08-31 00:17:49 - pico-train - INFO - ├── Loss: 4.9564 |
| 2025-08-31 00:17:49 - pico-train - INFO - ├── Learning Rate: 1.50e-04 |
| 2025-08-31 00:17:49 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:18:41 - pico-train - INFO - Step 34900 -- Training Metrics |
| 2025-08-31 00:18:41 - pico-train - INFO - ├── Loss: 4.9996 |
| 2025-08-31 00:18:41 - pico-train - INFO - ├── Learning Rate: 1.49e-04 |
| 2025-08-31 00:18:41 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:19:34 - pico-train - INFO - Step 35000 -- Training Metrics |
| 2025-08-31 00:19:34 - pico-train - INFO - ├── Loss: 4.9842 |
| 2025-08-31 00:19:34 - pico-train - INFO - ├── Learning Rate: 1.49e-04 |
| 2025-08-31 00:19:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:20:25 - pico-train - INFO - Step 35100 -- Training Metrics |
| 2025-08-31 00:20:25 - pico-train - INFO - ├── Loss: 4.9394 |
| 2025-08-31 00:20:25 - pico-train - INFO - ├── Learning Rate: 1.49e-04 |
| 2025-08-31 00:20:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:21:17 - pico-train - INFO - Step 35200 -- Training Metrics |
| 2025-08-31 00:21:17 - pico-train - INFO - ├── Loss: 4.9387 |
| 2025-08-31 00:21:17 - pico-train - INFO - ├── Learning Rate: 1.49e-04 |
| 2025-08-31 00:21:17 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:22:09 - pico-train - INFO - Step 35300 -- Training Metrics |
| 2025-08-31 00:22:09 - pico-train - INFO - ├── Loss: 4.8411 |
| 2025-08-31 00:22:09 - pico-train - INFO - ├── Learning Rate: 1.48e-04 |
| 2025-08-31 00:22:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:23:00 - pico-train - INFO - Step 35400 -- Training Metrics |
| 2025-08-31 00:23:00 - pico-train - INFO - ├── Loss: 4.8955 |
| 2025-08-31 00:23:00 - pico-train - INFO - ├── Learning Rate: 1.48e-04 |
| 2025-08-31 00:23:00 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:23:53 - pico-train - INFO - Step 35500 -- Training Metrics |
| 2025-08-31 00:23:53 - pico-train - INFO - ├── Loss: 4.8802 |
| 2025-08-31 00:23:53 - pico-train - INFO - ├── Learning Rate: 1.48e-04 |
| 2025-08-31 00:23:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:24:44 - pico-train - INFO - Step 35600 -- Training Metrics |
| 2025-08-31 00:24:44 - pico-train - INFO - ├── Loss: 4.8399 |
| 2025-08-31 00:24:44 - pico-train - INFO - ├── Learning Rate: 1.47e-04 |
| 2025-08-31 00:24:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:25:36 - pico-train - INFO - Step 35700 -- Training Metrics |
| 2025-08-31 00:25:36 - pico-train - INFO - ├── Loss: 4.9055 |
| 2025-08-31 00:25:36 - pico-train - INFO - ├── Learning Rate: 1.47e-04 |
| 2025-08-31 00:25:36 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:26:27 - pico-train - INFO - Step 35800 -- Training Metrics |
| 2025-08-31 00:26:27 - pico-train - INFO - ├── Loss: 4.8051 |
| 2025-08-31 00:26:27 - pico-train - INFO - ├── Learning Rate: 1.47e-04 |
| 2025-08-31 00:26:27 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:27:19 - pico-train - INFO - Step 35900 -- Training Metrics |
| 2025-08-31 00:27:19 - pico-train - INFO - ├── Loss: 4.8222 |
| 2025-08-31 00:27:19 - pico-train - INFO - ├── Learning Rate: 1.47e-04 |
| 2025-08-31 00:27:19 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:28:11 - pico-train - INFO - Step 36000 -- Saving Checkpoint |
| 2025-08-31 00:29:59 - pico-train - INFO - Step 36000 -- Evaluation Results |
| 2025-08-31 00:29:59 - pico-train - INFO - └── paloma: inf |
| 2025-08-31 00:30:00 - pico-train - INFO - Step 36000 -- Training Metrics |
| 2025-08-31 00:30:00 - pico-train - INFO - ├── Loss: 4.9235 |
| 2025-08-31 00:30:00 - pico-train - INFO - ├── Learning Rate: 1.46e-04 |
| 2025-08-31 00:30:00 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:30:00 - pico-train - INFO - Step 36000 -- Saving Learning Dynamics |
| 2025-08-31 00:30:52 - pico-train - INFO - Step 36100 -- Training Metrics |
| 2025-08-31 00:30:52 - pico-train - INFO - ├── Loss: 4.8898 |
| 2025-08-31 00:30:52 - pico-train - INFO - ├── Learning Rate: 1.46e-04 |
| 2025-08-31 00:30:52 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:31:44 - pico-train - INFO - Step 36200 -- Training Metrics |
| 2025-08-31 00:31:44 - pico-train - INFO - ├── Loss: 4.9274 |
| 2025-08-31 00:31:44 - pico-train - INFO - ├── Learning Rate: 1.46e-04 |
| 2025-08-31 00:31:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:32:35 - pico-train - INFO - Step 36300 -- Training Metrics |
| 2025-08-31 00:32:35 - pico-train - INFO - ├── Loss: 4.8847 |
| 2025-08-31 00:32:35 - pico-train - INFO - ├── Learning Rate: 1.45e-04 |
| 2025-08-31 00:32:35 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:33:27 - pico-train - INFO - Step 36400 -- Training Metrics |
| 2025-08-31 00:33:27 - pico-train - INFO - ├── Loss: 4.8662 |
| 2025-08-31 00:33:27 - pico-train - INFO - ├── Learning Rate: 1.45e-04 |
| 2025-08-31 00:33:27 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:34:19 - pico-train - INFO - Step 36500 -- Training Metrics |
| 2025-08-31 00:34:19 - pico-train - INFO - ├── Loss: 4.9214 |
| 2025-08-31 00:34:19 - pico-train - INFO - ├── Learning Rate: 1.45e-04 |
| 2025-08-31 00:34:19 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:35:11 - pico-train - INFO - Step 36600 -- Training Metrics |
| 2025-08-31 00:35:11 - pico-train - INFO - ├── Loss: 4.8951 |
| 2025-08-31 00:35:11 - pico-train - INFO - ├── Learning Rate: 1.45e-04 |
| 2025-08-31 00:35:11 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:36:03 - pico-train - INFO - Step 36700 -- Training Metrics |
| 2025-08-31 00:36:03 - pico-train - INFO - ├── Loss: 4.9194 |
| 2025-08-31 00:36:03 - pico-train - INFO - ├── Learning Rate: 1.44e-04 |
| 2025-08-31 00:36:03 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:36:54 - pico-train - INFO - Step 36800 -- Training Metrics |
| 2025-08-31 00:36:54 - pico-train - INFO - ├── Loss: 4.8769 |
| 2025-08-31 00:36:54 - pico-train - INFO - ├── Learning Rate: 1.44e-04 |
| 2025-08-31 00:36:54 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:37:46 - pico-train - INFO - Step 36900 -- Training Metrics |
| 2025-08-31 00:37:46 - pico-train - INFO - ├── Loss: 4.9548 |
| 2025-08-31 00:37:46 - pico-train - INFO - ├── Learning Rate: 1.44e-04 |
| 2025-08-31 00:37:46 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:38:39 - pico-train - INFO - Step 37000 -- Training Metrics |
| 2025-08-31 00:38:39 - pico-train - INFO - ├── Loss: 4.9128 |
| 2025-08-31 00:38:39 - pico-train - INFO - ├── Learning Rate: 1.43e-04 |
| 2025-08-31 00:38:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:39:31 - pico-train - INFO - Step 37100 -- Training Metrics |
| 2025-08-31 00:39:31 - pico-train - INFO - ├── Loss: 4.8547 |
| 2025-08-31 00:39:31 - pico-train - INFO - ├── Learning Rate: 1.43e-04 |
| 2025-08-31 00:39:31 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:40:22 - pico-train - INFO - Step 37200 -- Training Metrics |
| 2025-08-31 00:40:22 - pico-train - INFO - ├── Loss: 4.9243 |
| 2025-08-31 00:40:22 - pico-train - INFO - ├── Learning Rate: 1.43e-04 |
| 2025-08-31 00:40:22 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:41:14 - pico-train - INFO - Step 37300 -- Training Metrics |
| 2025-08-31 00:41:14 - pico-train - INFO - ├── Loss: 4.9263 |
| 2025-08-31 00:41:14 - pico-train - INFO - ├── Learning Rate: 1.43e-04 |
| 2025-08-31 00:41:14 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:42:06 - pico-train - INFO - Step 37400 -- Training Metrics |
| 2025-08-31 00:42:06 - pico-train - INFO - ├── Loss: 4.9288 |
| 2025-08-31 00:42:06 - pico-train - INFO - ├── Learning Rate: 1.42e-04 |
| 2025-08-31 00:42:06 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:42:58 - pico-train - INFO - Step 37500 -- Training Metrics |
| 2025-08-31 00:42:58 - pico-train - INFO - ├── Loss: 4.9187 |
| 2025-08-31 00:42:58 - pico-train - INFO - ├── Learning Rate: 1.42e-04 |
| 2025-08-31 00:42:58 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:43:49 - pico-train - INFO - Step 37600 -- Training Metrics |
| 2025-08-31 00:43:49 - pico-train - INFO - ├── Loss: 4.8727 |
| 2025-08-31 00:43:49 - pico-train - INFO - ├── Learning Rate: 1.42e-04 |
| 2025-08-31 00:43:49 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:44:41 - pico-train - INFO - Step 37700 -- Training Metrics |
| 2025-08-31 00:44:41 - pico-train - INFO - ├── Loss: 4.8757 |
| 2025-08-31 00:44:41 - pico-train - INFO - ├── Learning Rate: 1.41e-04 |
| 2025-08-31 00:44:41 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:45:32 - pico-train - INFO - Step 37800 -- Training Metrics |
| 2025-08-31 00:45:32 - pico-train - INFO - ├── Loss: 4.9217 |
| 2025-08-31 00:45:32 - pico-train - INFO - ├── Learning Rate: 1.41e-04 |
| 2025-08-31 00:45:32 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:46:24 - pico-train - INFO - Step 37900 -- Training Metrics |
| 2025-08-31 00:46:24 - pico-train - INFO - ├── Loss: 4.7568 |
| 2025-08-31 00:46:24 - pico-train - INFO - ├── Learning Rate: 1.41e-04 |
| 2025-08-31 00:46:24 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:47:16 - pico-train - INFO - Step 38000 -- Saving Checkpoint |
| 2025-08-31 00:49:04 - pico-train - INFO - Step 38000 -- Evaluation Results |
| 2025-08-31 00:49:04 - pico-train - INFO - └── paloma: inf |
| 2025-08-31 00:49:05 - pico-train - INFO - Step 38000 -- Training Metrics |
| 2025-08-31 00:49:05 - pico-train - INFO - ├── Loss: 4.8588 |
| 2025-08-31 00:49:05 - pico-train - INFO - ├── Learning Rate: 1.40e-04 |
| 2025-08-31 00:49:05 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:49:05 - pico-train - INFO - Step 38000 -- Saving Learning Dynamics |
| 2025-08-31 00:49:57 - pico-train - INFO - Step 38100 -- Training Metrics |
| 2025-08-31 00:49:57 - pico-train - INFO - ├── Loss: 4.8652 |
| 2025-08-31 00:49:57 - pico-train - INFO - ├── Learning Rate: 1.40e-04 |
| 2025-08-31 00:49:57 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:50:48 - pico-train - INFO - Step 38200 -- Training Metrics |
| 2025-08-31 00:50:48 - pico-train - INFO - ├── Loss: 4.9169 |
| 2025-08-31 00:50:48 - pico-train - INFO - ├── Learning Rate: 1.40e-04 |
| 2025-08-31 00:50:48 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:51:40 - pico-train - INFO - Step 38300 -- Training Metrics |
| 2025-08-31 00:51:40 - pico-train - INFO - ├── Loss: 4.8703 |
| 2025-08-31 00:51:40 - pico-train - INFO - ├── Learning Rate: 1.40e-04 |
| 2025-08-31 00:51:40 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:52:32 - pico-train - INFO - Step 38400 -- Training Metrics |
| 2025-08-31 00:52:32 - pico-train - INFO - ├── Loss: 4.8989 |
| 2025-08-31 00:52:32 - pico-train - INFO - ├── Learning Rate: 1.39e-04 |
| 2025-08-31 00:52:32 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:53:24 - pico-train - INFO - Step 38500 -- Training Metrics |
| 2025-08-31 00:53:24 - pico-train - INFO - ├── Loss: 4.8944 |
| 2025-08-31 00:53:24 - pico-train - INFO - ├── Learning Rate: 1.39e-04 |
| 2025-08-31 00:53:24 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:54:16 - pico-train - INFO - Step 38600 -- Training Metrics |
| 2025-08-31 00:54:16 - pico-train - INFO - ├── Loss: 4.9530 |
| 2025-08-31 00:54:16 - pico-train - INFO - ├── Learning Rate: 1.39e-04 |
| 2025-08-31 00:54:16 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:55:07 - pico-train - INFO - Step 38700 -- Training Metrics |
| 2025-08-31 00:55:07 - pico-train - INFO - ├── Loss: 4.9454 |
| 2025-08-31 00:55:07 - pico-train - INFO - ├── Learning Rate: 1.38e-04 |
| 2025-08-31 00:55:07 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:55:59 - pico-train - INFO - Step 38800 -- Training Metrics |
| 2025-08-31 00:55:59 - pico-train - INFO - ├── Loss: 4.9611 |
| 2025-08-31 00:55:59 - pico-train - INFO - ├── Learning Rate: 1.38e-04 |
| 2025-08-31 00:55:59 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:56:50 - pico-train - INFO - Step 38900 -- Training Metrics |
| 2025-08-31 00:56:50 - pico-train - INFO - ├── Loss: 4.8250 |
| 2025-08-31 00:56:50 - pico-train - INFO - ├── Learning Rate: 1.38e-04 |
| 2025-08-31 00:56:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:57:42 - pico-train - INFO - Step 39000 -- Training Metrics |
| 2025-08-31 00:57:42 - pico-train - INFO - ├── Loss: 4.8227 |
| 2025-08-31 00:57:42 - pico-train - INFO - ├── Learning Rate: 1.38e-04 |
| 2025-08-31 00:57:42 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:58:34 - pico-train - INFO - Step 39100 -- Training Metrics |
| 2025-08-31 00:58:34 - pico-train - INFO - ├── Loss: 4.9350 |
| 2025-08-31 00:58:34 - pico-train - INFO - ├── Learning Rate: 1.37e-04 |
| 2025-08-31 00:58:34 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 00:59:26 - pico-train - INFO - Step 39200 -- Training Metrics |
| 2025-08-31 00:59:26 - pico-train - INFO - ├── Loss: 4.9102 |
| 2025-08-31 00:59:26 - pico-train - INFO - ├── Learning Rate: 1.37e-04 |
| 2025-08-31 00:59:26 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:00:17 - pico-train - INFO - Step 39300 -- Training Metrics |
| 2025-08-31 01:00:17 - pico-train - INFO - ├── Loss: 4.7694 |
| 2025-08-31 01:00:17 - pico-train - INFO - ├── Learning Rate: 1.37e-04 |
| 2025-08-31 01:00:17 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:01:08 - pico-train - INFO - Step 39400 -- Training Metrics |
| 2025-08-31 01:01:08 - pico-train - INFO - ├── Loss: 4.9689 |
| 2025-08-31 01:01:08 - pico-train - INFO - ├── Learning Rate: 1.36e-04 |
| 2025-08-31 01:01:08 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:02:01 - pico-train - INFO - Step 39500 -- Training Metrics |
| 2025-08-31 01:02:01 - pico-train - INFO - ├── Loss: 4.9675 |
| 2025-08-31 01:02:01 - pico-train - INFO - ├── Learning Rate: 1.36e-04 |
| 2025-08-31 01:02:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:02:52 - pico-train - INFO - Step 39600 -- Training Metrics |
| 2025-08-31 01:02:52 - pico-train - INFO - ├── Loss: 4.9184 |
| 2025-08-31 01:02:52 - pico-train - INFO - ├── Learning Rate: 1.36e-04 |
| 2025-08-31 01:02:52 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:03:44 - pico-train - INFO - Step 39700 -- Training Metrics |
| 2025-08-31 01:03:44 - pico-train - INFO - ├── Loss: 4.9596 |
| 2025-08-31 01:03:44 - pico-train - INFO - ├── Learning Rate: 1.35e-04 |
| 2025-08-31 01:03:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:04:36 - pico-train - INFO - Step 39800 -- Training Metrics |
| 2025-08-31 01:04:36 - pico-train - INFO - ├── Loss: 4.9584 |
| 2025-08-31 01:04:36 - pico-train - INFO - ├── Learning Rate: 1.35e-04 |
| 2025-08-31 01:04:36 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:05:28 - pico-train - INFO - Step 39900 -- Training Metrics |
| 2025-08-31 01:05:28 - pico-train - INFO - ├── Loss: 4.8758 |
| 2025-08-31 01:05:28 - pico-train - INFO - ├── Learning Rate: 1.35e-04 |
| 2025-08-31 01:05:28 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:06:19 - pico-train - INFO - Step 40000 -- Saving Checkpoint |
| 2025-08-31 01:08:08 - pico-train - INFO - Step 40000 -- Evaluation Results |
| 2025-08-31 01:08:08 - pico-train - INFO - └── paloma: inf |
| 2025-08-31 01:08:09 - pico-train - INFO - Step 40000 -- Training Metrics |
| 2025-08-31 01:08:09 - pico-train - INFO - ├── Loss: 4.8921 |
| 2025-08-31 01:08:09 - pico-train - INFO - ├── Learning Rate: 1.35e-04 |
| 2025-08-31 01:08:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:08:09 - pico-train - INFO - Step 40000 -- Saving Learning Dynamics |
| 2025-08-31 01:09:01 - pico-train - INFO - Step 40100 -- Training Metrics |
| 2025-08-31 01:09:01 - pico-train - INFO - ├── Loss: 4.9123 |
| 2025-08-31 01:09:01 - pico-train - INFO - ├── Learning Rate: 1.34e-04 |
| 2025-08-31 01:09:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:09:52 - pico-train - INFO - Step 40200 -- Training Metrics |
| 2025-08-31 01:09:52 - pico-train - INFO - ├── Loss: 4.9111 |
| 2025-08-31 01:09:52 - pico-train - INFO - ├── Learning Rate: 1.34e-04 |
| 2025-08-31 01:09:52 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:10:44 - pico-train - INFO - Step 40300 -- Training Metrics |
| 2025-08-31 01:10:44 - pico-train - INFO - ├── Loss: 4.8833 |
| 2025-08-31 01:10:44 - pico-train - INFO - ├── Learning Rate: 1.34e-04 |
| 2025-08-31 01:10:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:11:36 - pico-train - INFO - Step 40400 -- Training Metrics |
| 2025-08-31 01:11:36 - pico-train - INFO - ├── Loss: 4.7673 |
| 2025-08-31 01:11:36 - pico-train - INFO - ├── Learning Rate: 1.33e-04 |
| 2025-08-31 01:11:36 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:12:28 - pico-train - INFO - Step 40500 -- Training Metrics |
| 2025-08-31 01:12:28 - pico-train - INFO - ├── Loss: 4.9048 |
| 2025-08-31 01:12:28 - pico-train - INFO - ├── Learning Rate: 1.33e-04 |
| 2025-08-31 01:12:28 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:13:20 - pico-train - INFO - Step 40600 -- Training Metrics |
| 2025-08-31 01:13:20 - pico-train - INFO - ├── Loss: 4.8952 |
| 2025-08-31 01:13:20 - pico-train - INFO - ├── Learning Rate: 1.33e-04 |
| 2025-08-31 01:13:20 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:14:11 - pico-train - INFO - Step 40700 -- Training Metrics |
| 2025-08-31 01:14:11 - pico-train - INFO - ├── Loss: 4.9022 |
| 2025-08-31 01:14:11 - pico-train - INFO - ├── Learning Rate: 1.32e-04 |
| 2025-08-31 01:14:11 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:15:03 - pico-train - INFO - Step 40800 -- Training Metrics |
| 2025-08-31 01:15:03 - pico-train - INFO - ├── Loss: 4.9242 |
| 2025-08-31 01:15:03 - pico-train - INFO - ├── Learning Rate: 1.32e-04 |
| 2025-08-31 01:15:03 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:15:55 - pico-train - INFO - Step 40900 -- Training Metrics |
| 2025-08-31 01:15:55 - pico-train - INFO - ├── Loss: 4.8794 |
| 2025-08-31 01:15:55 - pico-train - INFO - ├── Learning Rate: 1.32e-04 |
| 2025-08-31 01:15:55 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 01:16:47 - pico-train - INFO - Step 41000 -- ๐ Training Metrics |
| 2025-08-31 01:16:47 - pico-train - INFO - โโโ Loss: 4.8925 |
| 2025-08-31 01:16:47 - pico-train - INFO - โโโ Learning Rate: 1.32e-04 |
| 2025-08-31 01:16:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:17:39 - pico-train - INFO - Step 41100 -- ๐ Training Metrics |
| 2025-08-31 01:17:39 - pico-train - INFO - โโโ Loss: 4.8677 |
| 2025-08-31 01:17:39 - pico-train - INFO - โโโ Learning Rate: 1.31e-04 |
| 2025-08-31 01:17:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:18:30 - pico-train - INFO - Step 41200 -- ๐ Training Metrics |
| 2025-08-31 01:18:30 - pico-train - INFO - โโโ Loss: 4.8106 |
| 2025-08-31 01:18:30 - pico-train - INFO - โโโ Learning Rate: 1.31e-04 |
| 2025-08-31 01:18:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:19:23 - pico-train - INFO - Step 41300 -- ๐ Training Metrics |
| 2025-08-31 01:19:23 - pico-train - INFO - โโโ Loss: 4.7647 |
| 2025-08-31 01:19:23 - pico-train - INFO - โโโ Learning Rate: 1.31e-04 |
| 2025-08-31 01:19:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:20:15 - pico-train - INFO - Step 41400 -- ๐ Training Metrics |
| 2025-08-31 01:20:15 - pico-train - INFO - โโโ Loss: 4.8469 |
| 2025-08-31 01:20:15 - pico-train - INFO - โโโ Learning Rate: 1.30e-04 |
| 2025-08-31 01:20:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:21:07 - pico-train - INFO - Step 41500 -- ๐ Training Metrics |
| 2025-08-31 01:21:07 - pico-train - INFO - โโโ Loss: 4.8299 |
| 2025-08-31 01:21:07 - pico-train - INFO - โโโ Learning Rate: 1.30e-04 |
| 2025-08-31 01:21:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:21:59 - pico-train - INFO - Step 41600 -- ๐ Training Metrics |
| 2025-08-31 01:21:59 - pico-train - INFO - โโโ Loss: 4.7820 |
| 2025-08-31 01:21:59 - pico-train - INFO - โโโ Learning Rate: 1.30e-04 |
| 2025-08-31 01:21:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:22:51 - pico-train - INFO - Step 41700 -- ๐ Training Metrics |
| 2025-08-31 01:22:51 - pico-train - INFO - โโโ Loss: 4.7469 |
| 2025-08-31 01:22:51 - pico-train - INFO - โโโ Learning Rate: 1.29e-04 |
| 2025-08-31 01:22:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:23:42 - pico-train - INFO - Step 41800 -- ๐ Training Metrics |
| 2025-08-31 01:23:42 - pico-train - INFO - โโโ Loss: 4.8483 |
| 2025-08-31 01:23:42 - pico-train - INFO - โโโ Learning Rate: 1.29e-04 |
| 2025-08-31 01:23:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:24:34 - pico-train - INFO - Step 41900 -- ๐ Training Metrics |
| 2025-08-31 01:24:34 - pico-train - INFO - โโโ Loss: 4.8249 |
| 2025-08-31 01:24:34 - pico-train - INFO - โโโ Learning Rate: 1.29e-04 |
| 2025-08-31 01:24:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:25:26 - pico-train - INFO - Step 42000 -- ๐พ Saving Checkpoint |
| 2025-08-31 01:27:14 - pico-train - INFO - Step 42000 -- ๐ Evaluation Results |
| 2025-08-31 01:27:14 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 01:27:15 - pico-train - INFO - Step 42000 -- ๐ Training Metrics |
| 2025-08-31 01:27:15 - pico-train - INFO - โโโ Loss: 4.6586 |
| 2025-08-31 01:27:15 - pico-train - INFO - โโโ Learning Rate: 1.28e-04 |
| 2025-08-31 01:27:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:27:15 - pico-train - INFO - Step 42000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 01:28:07 - pico-train - INFO - Step 42100 -- ๐ Training Metrics |
| 2025-08-31 01:28:07 - pico-train - INFO - โโโ Loss: 3.9614 |
| 2025-08-31 01:28:07 - pico-train - INFO - โโโ Learning Rate: 1.28e-04 |
| 2025-08-31 01:28:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:28:59 - pico-train - INFO - Step 42200 -- ๐ Training Metrics |
| 2025-08-31 01:28:59 - pico-train - INFO - โโโ Loss: 4.4052 |
| 2025-08-31 01:28:59 - pico-train - INFO - โโโ Learning Rate: 1.28e-04 |
| 2025-08-31 01:28:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:29:50 - pico-train - INFO - Step 42300 -- ๐ Training Metrics |
| 2025-08-31 01:29:50 - pico-train - INFO - โโโ Loss: 4.1839 |
| 2025-08-31 01:29:50 - pico-train - INFO - โโโ Learning Rate: 1.28e-04 |
| 2025-08-31 01:29:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:30:42 - pico-train - INFO - Step 42400 -- ๐ Training Metrics |
| 2025-08-31 01:30:42 - pico-train - INFO - โโโ Loss: 4.2521 |
| 2025-08-31 01:30:42 - pico-train - INFO - โโโ Learning Rate: 1.27e-04 |
| 2025-08-31 01:30:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:31:35 - pico-train - INFO - Step 42500 -- ๐ Training Metrics |
| 2025-08-31 01:31:35 - pico-train - INFO - โโโ Loss: 4.8687 |
| 2025-08-31 01:31:35 - pico-train - INFO - โโโ Learning Rate: 1.27e-04 |
| 2025-08-31 01:31:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:32:26 - pico-train - INFO - Step 42600 -- ๐ Training Metrics |
| 2025-08-31 01:32:26 - pico-train - INFO - โโโ Loss: 4.8525 |
| 2025-08-31 01:32:26 - pico-train - INFO - โโโ Learning Rate: 1.27e-04 |
| 2025-08-31 01:32:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:33:18 - pico-train - INFO - Step 42700 -- ๐ Training Metrics |
| 2025-08-31 01:33:18 - pico-train - INFO - โโโ Loss: 4.5913 |
| 2025-08-31 01:33:18 - pico-train - INFO - โโโ Learning Rate: 1.26e-04 |
| 2025-08-31 01:33:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:34:09 - pico-train - INFO - Step 42800 -- ๐ Training Metrics |
| 2025-08-31 01:34:09 - pico-train - INFO - โโโ Loss: 3.9309 |
| 2025-08-31 01:34:09 - pico-train - INFO - โโโ Learning Rate: 1.26e-04 |
| 2025-08-31 01:34:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:35:01 - pico-train - INFO - Step 42900 -- ๐ Training Metrics |
| 2025-08-31 01:35:01 - pico-train - INFO - โโโ Loss: 3.6904 |
| 2025-08-31 01:35:01 - pico-train - INFO - โโโ Learning Rate: 1.26e-04 |
| 2025-08-31 01:35:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:35:54 - pico-train - INFO - Step 43000 -- ๐ Training Metrics |
| 2025-08-31 01:35:54 - pico-train - INFO - โโโ Loss: 3.8896 |
| 2025-08-31 01:35:54 - pico-train - INFO - โโโ Learning Rate: 1.25e-04 |
| 2025-08-31 01:35:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:36:45 - pico-train - INFO - Step 43100 -- ๐ Training Metrics |
| 2025-08-31 01:36:45 - pico-train - INFO - โโโ Loss: 3.9036 |
| 2025-08-31 01:36:45 - pico-train - INFO - โโโ Learning Rate: 1.25e-04 |
| 2025-08-31 01:36:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:37:37 - pico-train - INFO - Step 43200 -- ๐ Training Metrics |
| 2025-08-31 01:37:37 - pico-train - INFO - โโโ Loss: 3.7300 |
| 2025-08-31 01:37:37 - pico-train - INFO - โโโ Learning Rate: 1.25e-04 |
| 2025-08-31 01:37:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:38:29 - pico-train - INFO - Step 43300 -- ๐ Training Metrics |
| 2025-08-31 01:38:29 - pico-train - INFO - โโโ Loss: 3.6678 |
| 2025-08-31 01:38:29 - pico-train - INFO - โโโ Learning Rate: 1.24e-04 |
| 2025-08-31 01:38:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:39:21 - pico-train - INFO - Step 43400 -- ๐ Training Metrics |
| 2025-08-31 01:39:21 - pico-train - INFO - โโโ Loss: 2.7509 |
| 2025-08-31 01:39:21 - pico-train - INFO - โโโ Learning Rate: 1.24e-04 |
| 2025-08-31 01:39:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:40:13 - pico-train - INFO - Step 43500 -- ๐ Training Metrics |
| 2025-08-31 01:40:13 - pico-train - INFO - โโโ Loss: 3.2958 |
| 2025-08-31 01:40:13 - pico-train - INFO - โโโ Learning Rate: 1.24e-04 |
| 2025-08-31 01:40:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:41:05 - pico-train - INFO - Step 43600 -- ๐ Training Metrics |
| 2025-08-31 01:41:05 - pico-train - INFO - โโโ Loss: 3.8729 |
| 2025-08-31 01:41:05 - pico-train - INFO - โโโ Learning Rate: 1.24e-04 |
| 2025-08-31 01:41:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:41:58 - pico-train - INFO - Step 43700 -- ๐ Training Metrics |
| 2025-08-31 01:41:58 - pico-train - INFO - โโโ Loss: 3.9046 |
| 2025-08-31 01:41:58 - pico-train - INFO - โโโ Learning Rate: 1.23e-04 |
| 2025-08-31 01:41:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:42:50 - pico-train - INFO - Step 43800 -- ๐ Training Metrics |
| 2025-08-31 01:42:50 - pico-train - INFO - โโโ Loss: 3.7078 |
| 2025-08-31 01:42:50 - pico-train - INFO - โโโ Learning Rate: 1.23e-04 |
| 2025-08-31 01:42:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:43:42 - pico-train - INFO - Step 43900 -- ๐ Training Metrics |
| 2025-08-31 01:43:42 - pico-train - INFO - โโโ Loss: 4.2522 |
| 2025-08-31 01:43:42 - pico-train - INFO - โโโ Learning Rate: 1.23e-04 |
| 2025-08-31 01:43:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:44:34 - pico-train - INFO - Step 44000 -- ๐พ Saving Checkpoint |
| 2025-08-31 01:46:22 - pico-train - INFO - Step 44000 -- ๐ Evaluation Results |
| 2025-08-31 01:46:22 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 01:46:23 - pico-train - INFO - Step 44000 -- ๐ Training Metrics |
| 2025-08-31 01:46:23 - pico-train - INFO - โโโ Loss: 4.9518 |
| 2025-08-31 01:46:23 - pico-train - INFO - โโโ Learning Rate: 1.22e-04 |
| 2025-08-31 01:46:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:46:23 - pico-train - INFO - Step 44000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 01:47:15 - pico-train - INFO - Step 44100 -- ๐ Training Metrics |
| 2025-08-31 01:47:15 - pico-train - INFO - โโโ Loss: 4.9398 |
| 2025-08-31 01:47:15 - pico-train - INFO - โโโ Learning Rate: 1.22e-04 |
| 2025-08-31 01:47:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:48:07 - pico-train - INFO - Step 44200 -- ๐ Training Metrics |
| 2025-08-31 01:48:07 - pico-train - INFO - โโโ Loss: 4.8995 |
| 2025-08-31 01:48:07 - pico-train - INFO - โโโ Learning Rate: 1.22e-04 |
| 2025-08-31 01:48:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:48:59 - pico-train - INFO - Step 44300 -- ๐ Training Metrics |
| 2025-08-31 01:48:59 - pico-train - INFO - โโโ Loss: 4.8545 |
| 2025-08-31 01:48:59 - pico-train - INFO - โโโ Learning Rate: 1.21e-04 |
| 2025-08-31 01:48:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:49:50 - pico-train - INFO - Step 44400 -- ๐ Training Metrics |
| 2025-08-31 01:49:50 - pico-train - INFO - โโโ Loss: 4.3091 |
| 2025-08-31 01:49:50 - pico-train - INFO - โโโ Learning Rate: 1.21e-04 |
| 2025-08-31 01:49:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:50:42 - pico-train - INFO - Step 44500 -- ๐ Training Metrics |
| 2025-08-31 01:50:42 - pico-train - INFO - โโโ Loss: 4.9417 |
| 2025-08-31 01:50:42 - pico-train - INFO - โโโ Learning Rate: 1.21e-04 |
| 2025-08-31 01:50:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:51:34 - pico-train - INFO - Step 44600 -- ๐ Training Metrics |
| 2025-08-31 01:51:34 - pico-train - INFO - โโโ Loss: 4.8857 |
| 2025-08-31 01:51:34 - pico-train - INFO - โโโ Learning Rate: 1.20e-04 |
| 2025-08-31 01:51:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:52:25 - pico-train - INFO - Step 44700 -- ๐ Training Metrics |
| 2025-08-31 01:52:25 - pico-train - INFO - โโโ Loss: 4.8884 |
| 2025-08-31 01:52:25 - pico-train - INFO - โโโ Learning Rate: 1.20e-04 |
| 2025-08-31 01:52:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:53:17 - pico-train - INFO - Step 44800 -- ๐ Training Metrics |
| 2025-08-31 01:53:17 - pico-train - INFO - โโโ Loss: 4.8784 |
| 2025-08-31 01:53:17 - pico-train - INFO - โโโ Learning Rate: 1.20e-04 |
| 2025-08-31 01:53:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:54:08 - pico-train - INFO - Step 44900 -- ๐ Training Metrics |
| 2025-08-31 01:54:08 - pico-train - INFO - โโโ Loss: 4.9107 |
| 2025-08-31 01:54:08 - pico-train - INFO - โโโ Learning Rate: 1.19e-04 |
| 2025-08-31 01:54:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:55:01 - pico-train - INFO - Step 45000 -- ๐ Training Metrics |
| 2025-08-31 01:55:01 - pico-train - INFO - โโโ Loss: 4.9012 |
| 2025-08-31 01:55:01 - pico-train - INFO - โโโ Learning Rate: 1.19e-04 |
| 2025-08-31 01:55:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:55:53 - pico-train - INFO - Step 45100 -- ๐ Training Metrics |
| 2025-08-31 01:55:53 - pico-train - INFO - โโโ Loss: 4.8593 |
| 2025-08-31 01:55:53 - pico-train - INFO - โโโ Learning Rate: 1.19e-04 |
| 2025-08-31 01:55:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:56:45 - pico-train - INFO - Step 45200 -- ๐ Training Metrics |
| 2025-08-31 01:56:45 - pico-train - INFO - โโโ Loss: 4.8602 |
| 2025-08-31 01:56:45 - pico-train - INFO - โโโ Learning Rate: 1.18e-04 |
| 2025-08-31 01:56:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:57:36 - pico-train - INFO - Step 45300 -- ๐ Training Metrics |
| 2025-08-31 01:57:36 - pico-train - INFO - โโโ Loss: 4.8273 |
| 2025-08-31 01:57:36 - pico-train - INFO - โโโ Learning Rate: 1.18e-04 |
| 2025-08-31 01:57:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:58:28 - pico-train - INFO - Step 45400 -- ๐ Training Metrics |
| 2025-08-31 01:58:28 - pico-train - INFO - โโโ Loss: 4.7291 |
| 2025-08-31 01:58:28 - pico-train - INFO - โโโ Learning Rate: 1.18e-04 |
| 2025-08-31 01:58:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 01:59:21 - pico-train - INFO - Step 45500 -- ๐ Training Metrics |
| 2025-08-31 01:59:21 - pico-train - INFO - โโโ Loss: 4.8598 |
| 2025-08-31 01:59:21 - pico-train - INFO - โโโ Learning Rate: 1.18e-04 |
| 2025-08-31 01:59:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:00:12 - pico-train - INFO - Step 45600 -- ๐ Training Metrics |
| 2025-08-31 02:00:12 - pico-train - INFO - โโโ Loss: 4.8709 |
| 2025-08-31 02:00:12 - pico-train - INFO - โโโ Learning Rate: 1.17e-04 |
| 2025-08-31 02:00:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:01:04 - pico-train - INFO - Step 45700 -- ๐ Training Metrics |
| 2025-08-31 02:01:04 - pico-train - INFO - โโโ Loss: 4.8223 |
| 2025-08-31 02:01:04 - pico-train - INFO - โโโ Learning Rate: 1.17e-04 |
| 2025-08-31 02:01:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:01:56 - pico-train - INFO - Step 45800 -- ๐ Training Metrics |
| 2025-08-31 02:01:56 - pico-train - INFO - โโโ Loss: 4.8057 |
| 2025-08-31 02:01:56 - pico-train - INFO - โโโ Learning Rate: 1.17e-04 |
| 2025-08-31 02:01:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:02:47 - pico-train - INFO - Step 45900 -- ๐ Training Metrics |
| 2025-08-31 02:02:47 - pico-train - INFO - โโโ Loss: 4.8073 |
| 2025-08-31 02:02:47 - pico-train - INFO - โโโ Learning Rate: 1.16e-04 |
| 2025-08-31 02:02:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:03:39 - pico-train - INFO - Step 46000 -- ๐พ Saving Checkpoint |
| 2025-08-31 02:05:32 - pico-train - INFO - Step 46000 -- ๐ Evaluation Results |
| 2025-08-31 02:05:32 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 02:05:32 - pico-train - INFO - Step 46000 -- ๐ Training Metrics |
| 2025-08-31 02:05:32 - pico-train - INFO - โโโ Loss: 4.8024 |
| 2025-08-31 02:05:32 - pico-train - INFO - โโโ Learning Rate: 1.16e-04 |
| 2025-08-31 02:05:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:05:32 - pico-train - INFO - Step 46000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 02:06:25 - pico-train - INFO - Step 46100 -- ๐ Training Metrics |
| 2025-08-31 02:06:25 - pico-train - INFO - โโโ Loss: 4.8004 |
| 2025-08-31 02:06:25 - pico-train - INFO - โโโ Learning Rate: 1.16e-04 |
| 2025-08-31 02:06:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:07:16 - pico-train - INFO - Step 46200 -- ๐ Training Metrics |
| 2025-08-31 02:07:16 - pico-train - INFO - โโโ Loss: 4.6862 |
| 2025-08-31 02:07:16 - pico-train - INFO - โโโ Learning Rate: 1.15e-04 |
| 2025-08-31 02:07:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:08:08 - pico-train - INFO - Step 46300 -- ๐ Training Metrics |
| 2025-08-31 02:08:08 - pico-train - INFO - โโโ Loss: 4.8325 |
| 2025-08-31 02:08:08 - pico-train - INFO - โโโ Learning Rate: 1.15e-04 |
| 2025-08-31 02:08:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:08:59 - pico-train - INFO - Step 46400 -- ๐ Training Metrics |
| 2025-08-31 02:08:59 - pico-train - INFO - โโโ Loss: 4.8486 |
| 2025-08-31 02:08:59 - pico-train - INFO - โโโ Learning Rate: 1.15e-04 |
| 2025-08-31 02:08:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:09:51 - pico-train - INFO - Step 46500 -- ๐ Training Metrics |
| 2025-08-31 02:09:51 - pico-train - INFO - โโโ Loss: 4.7909 |
| 2025-08-31 02:09:51 - pico-train - INFO - โโโ Learning Rate: 1.14e-04 |
| 2025-08-31 02:09:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:10:43 - pico-train - INFO - Step 46600 -- ๐ Training Metrics |
| 2025-08-31 02:10:43 - pico-train - INFO - โโโ Loss: 4.7458 |
| 2025-08-31 02:10:43 - pico-train - INFO - โโโ Learning Rate: 1.14e-04 |
| 2025-08-31 02:10:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:11:35 - pico-train - INFO - Step 46700 -- ๐ Training Metrics |
| 2025-08-31 02:11:35 - pico-train - INFO - โโโ Loss: 4.7350 |
| 2025-08-31 02:11:35 - pico-train - INFO - โโโ Learning Rate: 1.14e-04 |
| 2025-08-31 02:11:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:12:27 - pico-train - INFO - Step 46800 -- ๐ Training Metrics |
| 2025-08-31 02:12:27 - pico-train - INFO - โโโ Loss: 4.8766 |
| 2025-08-31 02:12:27 - pico-train - INFO - โโโ Learning Rate: 1.13e-04 |
| 2025-08-31 02:12:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:13:18 - pico-train - INFO - Step 46900 -- ๐ Training Metrics |
| 2025-08-31 02:13:18 - pico-train - INFO - โโโ Loss: 4.8978 |
| 2025-08-31 02:13:18 - pico-train - INFO - โโโ Learning Rate: 1.13e-04 |
| 2025-08-31 02:13:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:14:11 - pico-train - INFO - Step 47000 -- ๐ Training Metrics |
| 2025-08-31 02:14:11 - pico-train - INFO - โโโ Loss: 4.8512 |
| 2025-08-31 02:14:11 - pico-train - INFO - โโโ Learning Rate: 1.13e-04 |
| 2025-08-31 02:14:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:15:02 - pico-train - INFO - Step 47100 -- ๐ Training Metrics |
| 2025-08-31 02:15:02 - pico-train - INFO - โโโ Loss: 4.8459 |
| 2025-08-31 02:15:02 - pico-train - INFO - โโโ Learning Rate: 1.12e-04 |
| 2025-08-31 02:15:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:15:54 - pico-train - INFO - Step 47200 -- ๐ Training Metrics |
| 2025-08-31 02:15:54 - pico-train - INFO - โโโ Loss: 4.8797 |
| 2025-08-31 02:15:54 - pico-train - INFO - โโโ Learning Rate: 1.12e-04 |
| 2025-08-31 02:15:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:16:46 - pico-train - INFO - Step 47300 -- ๐ Training Metrics |
| 2025-08-31 02:16:46 - pico-train - INFO - โโโ Loss: 4.9021 |
| 2025-08-31 02:16:46 - pico-train - INFO - โโโ Learning Rate: 1.12e-04 |
| 2025-08-31 02:16:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:17:38 - pico-train - INFO - Step 47400 -- ๐ Training Metrics |
| 2025-08-31 02:17:38 - pico-train - INFO - โโโ Loss: 4.8815 |
| 2025-08-31 02:17:38 - pico-train - INFO - โโโ Learning Rate: 1.12e-04 |
| 2025-08-31 02:17:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:18:30 - pico-train - INFO - Step 47500 -- ๐ Training Metrics |
| 2025-08-31 02:18:30 - pico-train - INFO - โโโ Loss: 4.8223 |
| 2025-08-31 02:18:30 - pico-train - INFO - โโโ Learning Rate: 1.11e-04 |
| 2025-08-31 02:18:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:19:22 - pico-train - INFO - Step 47600 -- ๐ Training Metrics |
| 2025-08-31 02:19:22 - pico-train - INFO - โโโ Loss: 4.7840 |
| 2025-08-31 02:19:22 - pico-train - INFO - โโโ Learning Rate: 1.11e-04 |
| 2025-08-31 02:19:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:20:14 - pico-train - INFO - Step 47700 -- ๐ Training Metrics |
| 2025-08-31 02:20:14 - pico-train - INFO - โโโ Loss: 4.7924 |
| 2025-08-31 02:20:14 - pico-train - INFO - โโโ Learning Rate: 1.11e-04 |
| 2025-08-31 02:20:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:21:07 - pico-train - INFO - Step 47800 -- ๐ Training Metrics |
| 2025-08-31 02:21:07 - pico-train - INFO - โโโ Loss: 4.8607 |
| 2025-08-31 02:21:07 - pico-train - INFO - โโโ Learning Rate: 1.10e-04 |
| 2025-08-31 02:21:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:21:58 - pico-train - INFO - Step 47900 -- ๐ Training Metrics |
| 2025-08-31 02:21:58 - pico-train - INFO - โโโ Loss: 4.7786 |
| 2025-08-31 02:21:58 - pico-train - INFO - โโโ Learning Rate: 1.10e-04 |
| 2025-08-31 02:21:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:22:50 - pico-train - INFO - Step 48000 -- ๐พ Saving Checkpoint |
| 2025-08-31 02:24:40 - pico-train - INFO - Step 48000 -- ๐ Evaluation Results |
| 2025-08-31 02:24:40 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 02:24:40 - pico-train - INFO - Step 48000 -- ๐ Training Metrics |
| 2025-08-31 02:24:40 - pico-train - INFO - โโโ Loss: 4.7870 |
| 2025-08-31 02:24:40 - pico-train - INFO - โโโ Learning Rate: 1.10e-04 |
| 2025-08-31 02:24:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:24:40 - pico-train - INFO - Step 48000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 02:25:32 - pico-train - INFO - Step 48100 -- ๐ Training Metrics |
| 2025-08-31 02:25:32 - pico-train - INFO - โโโ Loss: 4.8513 |
| 2025-08-31 02:25:32 - pico-train - INFO - โโโ Learning Rate: 1.09e-04 |
| 2025-08-31 02:25:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:26:24 - pico-train - INFO - Step 48200 -- ๐ Training Metrics |
| 2025-08-31 02:26:24 - pico-train - INFO - โโโ Loss: 4.8859 |
| 2025-08-31 02:26:24 - pico-train - INFO - โโโ Learning Rate: 1.09e-04 |
| 2025-08-31 02:26:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:27:17 - pico-train - INFO - Step 48300 -- ๐ Training Metrics |
| 2025-08-31 02:27:17 - pico-train - INFO - โโโ Loss: 4.8814 |
| 2025-08-31 02:27:17 - pico-train - INFO - โโโ Learning Rate: 1.09e-04 |
| 2025-08-31 02:27:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:28:09 - pico-train - INFO - Step 48400 -- ๐ Training Metrics |
| 2025-08-31 02:28:09 - pico-train - INFO - โโโ Loss: 4.8762 |
| 2025-08-31 02:28:09 - pico-train - INFO - โโโ Learning Rate: 1.08e-04 |
| 2025-08-31 02:28:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:29:01 - pico-train - INFO - Step 48500 -- ๐ Training Metrics |
| 2025-08-31 02:29:01 - pico-train - INFO - โโโ Loss: 4.7832 |
| 2025-08-31 02:29:01 - pico-train - INFO - โโโ Learning Rate: 1.08e-04 |
| 2025-08-31 02:29:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:29:52 - pico-train - INFO - Step 48600 -- ๐ Training Metrics |
| 2025-08-31 02:29:52 - pico-train - INFO - โโโ Loss: 4.8735 |
| 2025-08-31 02:29:52 - pico-train - INFO - โโโ Learning Rate: 1.08e-04 |
| 2025-08-31 02:29:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:30:44 - pico-train - INFO - Step 48700 -- ๐ Training Metrics |
| 2025-08-31 02:30:44 - pico-train - INFO - โโโ Loss: 4.8643 |
| 2025-08-31 02:30:44 - pico-train - INFO - โโโ Learning Rate: 1.07e-04 |
| 2025-08-31 02:30:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:31:35 - pico-train - INFO - Step 48800 -- ๐ Training Metrics |
| 2025-08-31 02:31:35 - pico-train - INFO - โโโ Loss: 4.8452 |
| 2025-08-31 02:31:35 - pico-train - INFO - โโโ Learning Rate: 1.07e-04 |
| 2025-08-31 02:31:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:32:27 - pico-train - INFO - Step 48900 -- ๐ Training Metrics |
| 2025-08-31 02:32:27 - pico-train - INFO - โโโ Loss: 4.8968 |
| 2025-08-31 02:32:27 - pico-train - INFO - โโโ Learning Rate: 1.07e-04 |
| 2025-08-31 02:32:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:33:20 - pico-train - INFO - Step 49000 -- ๐ Training Metrics |
| 2025-08-31 02:33:20 - pico-train - INFO - โโโ Loss: 4.8301 |
| 2025-08-31 02:33:20 - pico-train - INFO - โโโ Learning Rate: 1.06e-04 |
| 2025-08-31 02:33:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:34:12 - pico-train - INFO - Step 49100 -- ๐ Training Metrics |
| 2025-08-31 02:34:12 - pico-train - INFO - โโโ Loss: 4.5497 |
| 2025-08-31 02:34:12 - pico-train - INFO - โโโ Learning Rate: 1.06e-04 |
| 2025-08-31 02:34:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:35:03 - pico-train - INFO - Step 49200 -- ๐ Training Metrics |
| 2025-08-31 02:35:03 - pico-train - INFO - โโโ Loss: 4.4657 |
| 2025-08-31 02:35:03 - pico-train - INFO - โโโ Learning Rate: 1.06e-04 |
| 2025-08-31 02:35:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:35:55 - pico-train - INFO - Step 49300 -- ๐ Training Metrics |
| 2025-08-31 02:35:55 - pico-train - INFO - โโโ Loss: 4.8406 |
| 2025-08-31 02:35:55 - pico-train - INFO - โโโ Learning Rate: 1.05e-04 |
| 2025-08-31 02:35:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:36:46 - pico-train - INFO - Step 49400 -- ๐ Training Metrics |
| 2025-08-31 02:36:46 - pico-train - INFO - โโโ Loss: 4.7300 |
| 2025-08-31 02:36:46 - pico-train - INFO - โโโ Learning Rate: 1.05e-04 |
| 2025-08-31 02:36:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:37:39 - pico-train - INFO - Step 49500 -- ๐ Training Metrics |
| 2025-08-31 02:37:39 - pico-train - INFO - โโโ Loss: 4.6906 |
| 2025-08-31 02:37:39 - pico-train - INFO - โโโ Learning Rate: 1.05e-04 |
| 2025-08-31 02:37:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:38:30 - pico-train - INFO - Step 49600 -- ๐ Training Metrics |
| 2025-08-31 02:38:30 - pico-train - INFO - โโโ Loss: 4.7507 |
| 2025-08-31 02:38:30 - pico-train - INFO - โโโ Learning Rate: 1.04e-04 |
| 2025-08-31 02:38:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:39:22 - pico-train - INFO - Step 49700 -- ๐ Training Metrics |
| 2025-08-31 02:39:22 - pico-train - INFO - โโโ Loss: 4.8010 |
| 2025-08-31 02:39:22 - pico-train - INFO - โโโ Learning Rate: 1.04e-04 |
| 2025-08-31 02:39:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:40:13 - pico-train - INFO - Step 49800 -- ๐ Training Metrics |
| 2025-08-31 02:40:13 - pico-train - INFO - โโโ Loss: 4.7701 |
| 2025-08-31 02:40:13 - pico-train - INFO - โโโ Learning Rate: 1.04e-04 |
| 2025-08-31 02:40:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:41:05 - pico-train - INFO - Step 49900 -- ๐ Training Metrics |
| 2025-08-31 02:41:05 - pico-train - INFO - โโโ Loss: 4.7618 |
| 2025-08-31 02:41:05 - pico-train - INFO - โโโ Learning Rate: 1.04e-04 |
| 2025-08-31 02:41:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:41:56 - pico-train - INFO - Step 50000 -- ๐พ Saving Checkpoint |
| 2025-08-31 02:43:45 - pico-train - INFO - Step 50000 -- ๐ Evaluation Results |
| 2025-08-31 02:43:45 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 02:43:45 - pico-train - INFO - Step 50000 -- ๐ Training Metrics |
| 2025-08-31 02:43:45 - pico-train - INFO - โโโ Loss: 4.7503 |
| 2025-08-31 02:43:45 - pico-train - INFO - โโโ Learning Rate: 1.03e-04 |
| 2025-08-31 02:43:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:43:45 - pico-train - INFO - Step 50000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 02:44:38 - pico-train - INFO - Step 50100 -- ๐ Training Metrics |
| 2025-08-31 02:44:38 - pico-train - INFO - โโโ Loss: 4.7940 |
| 2025-08-31 02:44:38 - pico-train - INFO - โโโ Learning Rate: 1.03e-04 |
| 2025-08-31 02:44:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:45:29 - pico-train - INFO - Step 50200 -- ๐ Training Metrics |
| 2025-08-31 02:45:29 - pico-train - INFO - โโโ Loss: 4.7831 |
| 2025-08-31 02:45:29 - pico-train - INFO - โโโ Learning Rate: 1.03e-04 |
| 2025-08-31 02:45:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:46:21 - pico-train - INFO - Step 50300 -- ๐ Training Metrics |
| 2025-08-31 02:46:21 - pico-train - INFO - โโโ Loss: 4.7918 |
| 2025-08-31 02:46:21 - pico-train - INFO - โโโ Learning Rate: 1.02e-04 |
| 2025-08-31 02:46:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:47:13 - pico-train - INFO - Step 50400 -- ๐ Training Metrics |
| 2025-08-31 02:47:13 - pico-train - INFO - โโโ Loss: 4.6934 |
| 2025-08-31 02:47:13 - pico-train - INFO - โโโ Learning Rate: 1.02e-04 |
| 2025-08-31 02:47:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:48:06 - pico-train - INFO - Step 50500 -- ๐ Training Metrics |
| 2025-08-31 02:48:06 - pico-train - INFO - โโโ Loss: 4.8227 |
| 2025-08-31 02:48:06 - pico-train - INFO - โโโ Learning Rate: 1.02e-04 |
| 2025-08-31 02:48:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:48:58 - pico-train - INFO - Step 50600 -- ๐ Training Metrics |
| 2025-08-31 02:48:58 - pico-train - INFO - โโโ Loss: 4.8425 |
| 2025-08-31 02:48:58 - pico-train - INFO - โโโ Learning Rate: 1.01e-04 |
| 2025-08-31 02:48:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:49:49 - pico-train - INFO - Step 50700 -- ๐ Training Metrics |
| 2025-08-31 02:49:49 - pico-train - INFO - โโโ Loss: 4.8425 |
| 2025-08-31 02:49:49 - pico-train - INFO - โโโ Learning Rate: 1.01e-04 |
| 2025-08-31 02:49:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:50:41 - pico-train - INFO - Step 50800 -- ๐ Training Metrics |
| 2025-08-31 02:50:41 - pico-train - INFO - โโโ Loss: 4.8135 |
| 2025-08-31 02:50:41 - pico-train - INFO - โโโ Learning Rate: 1.01e-04 |
| 2025-08-31 02:50:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:51:32 - pico-train - INFO - Step 50900 -- ๐ Training Metrics |
| 2025-08-31 02:51:32 - pico-train - INFO - โโโ Loss: 4.7762 |
| 2025-08-31 02:51:32 - pico-train - INFO - โโโ Learning Rate: 1.00e-04 |
| 2025-08-31 02:51:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:52:24 - pico-train - INFO - Step 51000 -- ๐ Training Metrics |
| 2025-08-31 02:52:24 - pico-train - INFO - โโโ Loss: 4.7735 |
| 2025-08-31 02:52:24 - pico-train - INFO - โโโ Learning Rate: 1.00e-04 |
| 2025-08-31 02:52:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:53:16 - pico-train - INFO - Step 51100 -- ๐ Training Metrics |
| 2025-08-31 02:53:16 - pico-train - INFO - โโโ Loss: 4.7769 |
| 2025-08-31 02:53:16 - pico-train - INFO - โโโ Learning Rate: 9.97e-05 |
| 2025-08-31 02:53:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:54:07 - pico-train - INFO - Step 51200 -- ๐ Training Metrics |
| 2025-08-31 02:54:07 - pico-train - INFO - โโโ Loss: 4.7809 |
| 2025-08-31 02:54:07 - pico-train - INFO - โโโ Learning Rate: 9.94e-05 |
| 2025-08-31 02:54:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:54:59 - pico-train - INFO - Step 51300 -- ๐ Training Metrics |
| 2025-08-31 02:54:59 - pico-train - INFO - โโโ Loss: 4.8016 |
| 2025-08-31 02:54:59 - pico-train - INFO - โโโ Learning Rate: 9.90e-05 |
| 2025-08-31 02:54:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:55:51 - pico-train - INFO - Step 51400 -- ๐ Training Metrics |
| 2025-08-31 02:55:51 - pico-train - INFO - โโโ Loss: 4.7868 |
| 2025-08-31 02:55:51 - pico-train - INFO - โโโ Learning Rate: 9.87e-05 |
| 2025-08-31 02:55:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:56:44 - pico-train - INFO - Step 51500 -- ๐ Training Metrics |
| 2025-08-31 02:56:44 - pico-train - INFO - โโโ Loss: 4.8385 |
| 2025-08-31 02:56:44 - pico-train - INFO - โโโ Learning Rate: 9.84e-05 |
| 2025-08-31 02:56:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:57:35 - pico-train - INFO - Step 51600 -- ๐ Training Metrics |
| 2025-08-31 02:57:35 - pico-train - INFO - โโโ Loss: 4.8252 |
| 2025-08-31 02:57:35 - pico-train - INFO - โโโ Learning Rate: 9.81e-05 |
| 2025-08-31 02:57:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:58:27 - pico-train - INFO - Step 51700 -- ๐ Training Metrics |
| 2025-08-31 02:58:27 - pico-train - INFO - โโโ Loss: 4.8419 |
| 2025-08-31 02:58:27 - pico-train - INFO - โโโ Learning Rate: 9.78e-05 |
| 2025-08-31 02:58:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 02:59:18 - pico-train - INFO - Step 51800 -- ๐ Training Metrics |
| 2025-08-31 02:59:18 - pico-train - INFO - โโโ Loss: 4.7630 |
| 2025-08-31 02:59:18 - pico-train - INFO - โโโ Learning Rate: 9.74e-05 |
| 2025-08-31 02:59:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:00:10 - pico-train - INFO - Step 51900 -- ๐ Training Metrics |
| 2025-08-31 03:00:10 - pico-train - INFO - โโโ Loss: 4.8294 |
| 2025-08-31 03:00:10 - pico-train - INFO - โโโ Learning Rate: 9.71e-05 |
| 2025-08-31 03:00:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:01:02 - pico-train - INFO - Step 52000 -- ๐พ Saving Checkpoint |
| 2025-08-31 03:02:50 - pico-train - INFO - Step 52000 -- ๐ Evaluation Results |
| 2025-08-31 03:02:50 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 03:02:51 - pico-train - INFO - Step 52000 -- ๐ Training Metrics |
| 2025-08-31 03:02:51 - pico-train - INFO - โโโ Loss: 4.8448 |
| 2025-08-31 03:02:51 - pico-train - INFO - โโโ Learning Rate: 9.68e-05 |
| 2025-08-31 03:02:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:02:51 - pico-train - INFO - Step 52000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 03:03:43 - pico-train - INFO - Step 52100 -- ๐ Training Metrics |
| 2025-08-31 03:03:43 - pico-train - INFO - โโโ Loss: 4.7965 |
| 2025-08-31 03:03:43 - pico-train - INFO - โโโ Learning Rate: 9.65e-05 |
| 2025-08-31 03:03:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:04:35 - pico-train - INFO - Step 52200 -- ๐ Training Metrics |
| 2025-08-31 03:04:35 - pico-train - INFO - โโโ Loss: 4.8062 |
| 2025-08-31 03:04:35 - pico-train - INFO - โโโ Learning Rate: 9.62e-05 |
| 2025-08-31 03:04:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:05:27 - pico-train - INFO - Step 52300 -- ๐ Training Metrics |
| 2025-08-31 03:05:27 - pico-train - INFO - โโโ Loss: 4.7954 |
| 2025-08-31 03:05:27 - pico-train - INFO - โโโ Learning Rate: 9.58e-05 |
| 2025-08-31 03:05:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:06:19 - pico-train - INFO - Step 52400 -- ๐ Training Metrics |
| 2025-08-31 03:06:19 - pico-train - INFO - โโโ Loss: 4.8452 |
| 2025-08-31 03:06:19 - pico-train - INFO - โโโ Learning Rate: 9.55e-05 |
| 2025-08-31 03:06:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:07:11 - pico-train - INFO - Step 52500 -- ๐ Training Metrics |
| 2025-08-31 03:07:11 - pico-train - INFO - โโโ Loss: 4.7621 |
| 2025-08-31 03:07:11 - pico-train - INFO - โโโ Learning Rate: 9.52e-05 |
| 2025-08-31 03:07:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:08:03 - pico-train - INFO - Step 52600 -- ๐ Training Metrics |
| 2025-08-31 03:08:03 - pico-train - INFO - โโโ Loss: 4.7518 |
| 2025-08-31 03:08:03 - pico-train - INFO - โโโ Learning Rate: 9.49e-05 |
| 2025-08-31 03:08:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:08:54 - pico-train - INFO - Step 52700 -- ๐ Training Metrics |
| 2025-08-31 03:08:54 - pico-train - INFO - โโโ Loss: 4.7219 |
| 2025-08-31 03:08:54 - pico-train - INFO - โโโ Learning Rate: 9.46e-05 |
| 2025-08-31 03:08:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:09:46 - pico-train - INFO - Step 52800 -- ๐ Training Metrics |
| 2025-08-31 03:09:46 - pico-train - INFO - โโโ Loss: 4.7477 |
| 2025-08-31 03:09:46 - pico-train - INFO - โโโ Learning Rate: 9.42e-05 |
| 2025-08-31 03:09:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:10:37 - pico-train - INFO - Step 52900 -- ๐ Training Metrics |
| 2025-08-31 03:10:37 - pico-train - INFO - โโโ Loss: 4.6478 |
| 2025-08-31 03:10:37 - pico-train - INFO - โโโ Learning Rate: 9.39e-05 |
| 2025-08-31 03:10:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:11:29 - pico-train - INFO - Step 53000 -- ๐ Training Metrics |
| 2025-08-31 03:11:29 - pico-train - INFO - โโโ Loss: 4.0911 |
| 2025-08-31 03:11:29 - pico-train - INFO - โโโ Learning Rate: 9.36e-05 |
| 2025-08-31 03:11:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:12:21 - pico-train - INFO - Step 53100 -- ๐ Training Metrics |
| 2025-08-31 03:12:21 - pico-train - INFO - โโโ Loss: 4.6383 |
| 2025-08-31 03:12:21 - pico-train - INFO - โโโ Learning Rate: 9.33e-05 |
| 2025-08-31 03:12:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:13:13 - pico-train - INFO - Step 53200 -- ๐ Training Metrics |
| 2025-08-31 03:13:13 - pico-train - INFO - โโโ Loss: 4.7310 |
| 2025-08-31 03:13:13 - pico-train - INFO - โโโ Learning Rate: 9.30e-05 |
| 2025-08-31 03:13:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:14:04 - pico-train - INFO - Step 53300 -- ๐ Training Metrics |
| 2025-08-31 03:14:04 - pico-train - INFO - โโโ Loss: 4.7118 |
| 2025-08-31 03:14:04 - pico-train - INFO - โโโ Learning Rate: 9.26e-05 |
| 2025-08-31 03:14:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:14:56 - pico-train - INFO - Step 53400 -- ๐ Training Metrics |
| 2025-08-31 03:14:56 - pico-train - INFO - โโโ Loss: 4.7295 |
| 2025-08-31 03:14:56 - pico-train - INFO - โโโ Learning Rate: 9.23e-05 |
| 2025-08-31 03:14:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:15:48 - pico-train - INFO - Step 53500 -- ๐ Training Metrics |
| 2025-08-31 03:15:48 - pico-train - INFO - โโโ Loss: 4.7496 |
| 2025-08-31 03:15:48 - pico-train - INFO - โโโ Learning Rate: 9.20e-05 |
| 2025-08-31 03:15:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:16:40 - pico-train - INFO - Step 53600 -- ๐ Training Metrics |
| 2025-08-31 03:16:40 - pico-train - INFO - โโโ Loss: 4.8170 |
| 2025-08-31 03:16:40 - pico-train - INFO - โโโ Learning Rate: 9.17e-05 |
| 2025-08-31 03:16:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:17:31 - pico-train - INFO - Step 53700 -- ๐ Training Metrics |
| 2025-08-31 03:17:31 - pico-train - INFO - โโโ Loss: 4.6796 |
| 2025-08-31 03:17:31 - pico-train - INFO - โโโ Learning Rate: 9.14e-05 |
| 2025-08-31 03:17:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:18:24 - pico-train - INFO - Step 53800 -- ๐ Training Metrics |
| 2025-08-31 03:18:24 - pico-train - INFO - โโโ Loss: 4.7291 |
| 2025-08-31 03:18:24 - pico-train - INFO - โโโ Learning Rate: 9.10e-05 |
| 2025-08-31 03:18:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:19:15 - pico-train - INFO - Step 53900 -- ๐ Training Metrics |
| 2025-08-31 03:19:15 - pico-train - INFO - โโโ Loss: 4.7736 |
| 2025-08-31 03:19:15 - pico-train - INFO - โโโ Learning Rate: 9.07e-05 |
| 2025-08-31 03:19:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:20:07 - pico-train - INFO - Step 54000 -- ๐พ Saving Checkpoint |
| 2025-08-31 03:21:55 - pico-train - INFO - Step 54000 -- ๐ Evaluation Results |
| 2025-08-31 03:21:55 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 03:21:56 - pico-train - INFO - Step 54000 -- ๐ Training Metrics |
| 2025-08-31 03:21:56 - pico-train - INFO - โโโ Loss: 4.5813 |
| 2025-08-31 03:21:56 - pico-train - INFO - โโโ Learning Rate: 9.04e-05 |
| 2025-08-31 03:21:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:21:56 - pico-train - INFO - Step 54000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 03:22:48 - pico-train - INFO - Step 54100 -- ๐ Training Metrics |
| 2025-08-31 03:22:48 - pico-train - INFO - โโโ Loss: 4.8269 |
| 2025-08-31 03:22:48 - pico-train - INFO - โโโ Learning Rate: 9.01e-05 |
| 2025-08-31 03:22:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:23:40 - pico-train - INFO - Step 54200 -- ๐ Training Metrics |
| 2025-08-31 03:23:40 - pico-train - INFO - โโโ Loss: 4.8171 |
| 2025-08-31 03:23:40 - pico-train - INFO - โโโ Learning Rate: 8.98e-05 |
| 2025-08-31 03:23:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:24:31 - pico-train - INFO - Step 54300 -- ๐ Training Metrics |
| 2025-08-31 03:24:31 - pico-train - INFO - โโโ Loss: 4.7910 |
| 2025-08-31 03:24:31 - pico-train - INFO - โโโ Learning Rate: 8.94e-05 |
| 2025-08-31 03:24:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:25:23 - pico-train - INFO - Step 54400 -- ๐ Training Metrics |
| 2025-08-31 03:25:23 - pico-train - INFO - โโโ Loss: 4.7473 |
| 2025-08-31 03:25:23 - pico-train - INFO - โโโ Learning Rate: 8.91e-05 |
| 2025-08-31 03:25:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:26:15 - pico-train - INFO - Step 54500 -- ๐ Training Metrics |
| 2025-08-31 03:26:15 - pico-train - INFO - โโโ Loss: 4.6026 |
| 2025-08-31 03:26:15 - pico-train - INFO - โโโ Learning Rate: 8.88e-05 |
| 2025-08-31 03:26:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:27:07 - pico-train - INFO - Step 54600 -- ๐ Training Metrics |
| 2025-08-31 03:27:07 - pico-train - INFO - โโโ Loss: 4.7512 |
| 2025-08-31 03:27:07 - pico-train - INFO - โโโ Learning Rate: 8.85e-05 |
| 2025-08-31 03:27:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:27:59 - pico-train - INFO - Step 54700 -- ๐ Training Metrics |
| 2025-08-31 03:27:59 - pico-train - INFO - โโโ Loss: 4.8267 |
| 2025-08-31 03:27:59 - pico-train - INFO - โโโ Learning Rate: 8.82e-05 |
| 2025-08-31 03:27:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:28:50 - pico-train - INFO - Step 54800 -- ๐ Training Metrics |
| 2025-08-31 03:28:50 - pico-train - INFO - โโโ Loss: 4.8168 |
| 2025-08-31 03:28:50 - pico-train - INFO - โโโ Learning Rate: 8.78e-05 |
| 2025-08-31 03:28:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:29:42 - pico-train - INFO - Step 54900 -- ๐ Training Metrics |
| 2025-08-31 03:29:42 - pico-train - INFO - โโโ Loss: 4.7894 |
| 2025-08-31 03:29:42 - pico-train - INFO - โโโ Learning Rate: 8.75e-05 |
| 2025-08-31 03:29:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:30:34 - pico-train - INFO - Step 55000 -- ๐ Training Metrics |
| 2025-08-31 03:30:34 - pico-train - INFO - โโโ Loss: 4.7688 |
| 2025-08-31 03:30:34 - pico-train - INFO - โโโ Learning Rate: 8.72e-05 |
| 2025-08-31 03:30:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:31:26 - pico-train - INFO - Step 55100 -- ๐ Training Metrics |
| 2025-08-31 03:31:26 - pico-train - INFO - โโโ Loss: 4.7408 |
| 2025-08-31 03:31:26 - pico-train - INFO - โโโ Learning Rate: 8.69e-05 |
| 2025-08-31 03:31:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:32:17 - pico-train - INFO - Step 55200 -- ๐ Training Metrics |
| 2025-08-31 03:32:17 - pico-train - INFO - โโโ Loss: 4.8037 |
| 2025-08-31 03:32:17 - pico-train - INFO - โโโ Learning Rate: 8.66e-05 |
| 2025-08-31 03:32:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:33:09 - pico-train - INFO - Step 55300 -- ๐ Training Metrics |
| 2025-08-31 03:33:09 - pico-train - INFO - โโโ Loss: 4.7878 |
| 2025-08-31 03:33:09 - pico-train - INFO - โโโ Learning Rate: 8.63e-05 |
| 2025-08-31 03:33:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:34:01 - pico-train - INFO - Step 55400 -- ๐ Training Metrics |
| 2025-08-31 03:34:01 - pico-train - INFO - โโโ Loss: 4.7474 |
| 2025-08-31 03:34:01 - pico-train - INFO - โโโ Learning Rate: 8.59e-05 |
| 2025-08-31 03:34:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:34:53 - pico-train - INFO - Step 55500 -- ๐ Training Metrics |
| 2025-08-31 03:34:53 - pico-train - INFO - โโโ Loss: 4.7180 |
| 2025-08-31 03:34:53 - pico-train - INFO - โโโ Learning Rate: 8.56e-05 |
| 2025-08-31 03:34:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:35:45 - pico-train - INFO - Step 55600 -- ๐ Training Metrics |
| 2025-08-31 03:35:45 - pico-train - INFO - โโโ Loss: 4.6669 |
| 2025-08-31 03:35:45 - pico-train - INFO - โโโ Learning Rate: 8.53e-05 |
| 2025-08-31 03:35:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:36:37 - pico-train - INFO - Step 55700 -- ๐ Training Metrics |
| 2025-08-31 03:36:37 - pico-train - INFO - โโโ Loss: 4.7121 |
| 2025-08-31 03:36:37 - pico-train - INFO - โโโ Learning Rate: 8.50e-05 |
| 2025-08-31 03:36:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:37:28 - pico-train - INFO - Step 55800 -- ๐ Training Metrics |
| 2025-08-31 03:37:28 - pico-train - INFO - โโโ Loss: 4.6817 |
| 2025-08-31 03:37:28 - pico-train - INFO - โโโ Learning Rate: 8.47e-05 |
| 2025-08-31 03:37:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:38:20 - pico-train - INFO - Step 55900 -- ๐ Training Metrics |
| 2025-08-31 03:38:20 - pico-train - INFO - โโโ Loss: 4.6449 |
| 2025-08-31 03:38:20 - pico-train - INFO - โโโ Learning Rate: 8.44e-05 |
| 2025-08-31 03:38:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:39:12 - pico-train - INFO - Step 56000 -- ๐พ Saving Checkpoint |
| 2025-08-31 03:41:00 - pico-train - INFO - Step 56000 -- ๐ Evaluation Results |
| 2025-08-31 03:41:00 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 03:41:00 - pico-train - INFO - Step 56000 -- ๐ Training Metrics |
| 2025-08-31 03:41:00 - pico-train - INFO - โโโ Loss: 4.7752 |
| 2025-08-31 03:41:00 - pico-train - INFO - โโโ Learning Rate: 8.40e-05 |
| 2025-08-31 03:41:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:41:00 - pico-train - INFO - Step 56000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 03:41:52 - pico-train - INFO - Step 56100 -- ๐ Training Metrics |
| 2025-08-31 03:41:52 - pico-train - INFO - โโโ Loss: 4.7332 |
| 2025-08-31 03:41:52 - pico-train - INFO - โโโ Learning Rate: 8.37e-05 |
| 2025-08-31 03:41:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:42:44 - pico-train - INFO - Step 56200 -- ๐ Training Metrics |
| 2025-08-31 03:42:44 - pico-train - INFO - โโโ Loss: 4.7442 |
| 2025-08-31 03:42:44 - pico-train - INFO - โโโ Learning Rate: 8.34e-05 |
| 2025-08-31 03:42:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:43:36 - pico-train - INFO - Step 56300 -- ๐ Training Metrics |
| 2025-08-31 03:43:36 - pico-train - INFO - โโโ Loss: 4.7683 |
| 2025-08-31 03:43:36 - pico-train - INFO - โโโ Learning Rate: 8.31e-05 |
| 2025-08-31 03:43:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:44:27 - pico-train - INFO - Step 56400 -- ๐ Training Metrics |
| 2025-08-31 03:44:27 - pico-train - INFO - โโโ Loss: 4.7436 |
| 2025-08-31 03:44:27 - pico-train - INFO - โโโ Learning Rate: 8.28e-05 |
| 2025-08-31 03:44:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:45:20 - pico-train - INFO - Step 56500 -- ๐ Training Metrics |
| 2025-08-31 03:45:20 - pico-train - INFO - โโโ Loss: 4.7703 |
| 2025-08-31 03:45:20 - pico-train - INFO - โโโ Learning Rate: 8.25e-05 |
| 2025-08-31 03:45:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:46:11 - pico-train - INFO - Step 56600 -- ๐ Training Metrics |
| 2025-08-31 03:46:11 - pico-train - INFO - โโโ Loss: 4.6823 |
| 2025-08-31 03:46:11 - pico-train - INFO - โโโ Learning Rate: 8.21e-05 |
| 2025-08-31 03:46:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:47:03 - pico-train - INFO - Step 56700 -- ๐ Training Metrics |
| 2025-08-31 03:47:03 - pico-train - INFO - โโโ Loss: 4.5874 |
| 2025-08-31 03:47:03 - pico-train - INFO - โโโ Learning Rate: 8.18e-05 |
| 2025-08-31 03:47:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:47:54 - pico-train - INFO - Step 56800 -- ๐ Training Metrics |
| 2025-08-31 03:47:54 - pico-train - INFO - โโโ Loss: 4.6526 |
| 2025-08-31 03:47:54 - pico-train - INFO - โโโ Learning Rate: 8.15e-05 |
| 2025-08-31 03:47:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:48:46 - pico-train - INFO - Step 56900 -- ๐ Training Metrics |
| 2025-08-31 03:48:46 - pico-train - INFO - โโโ Loss: 4.7713 |
| 2025-08-31 03:48:46 - pico-train - INFO - โโโ Learning Rate: 8.12e-05 |
| 2025-08-31 03:48:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:49:38 - pico-train - INFO - Step 57000 -- ๐ Training Metrics |
| 2025-08-31 03:49:38 - pico-train - INFO - โโโ Loss: 4.7954 |
| 2025-08-31 03:49:38 - pico-train - INFO - โโโ Learning Rate: 8.09e-05 |
| 2025-08-31 03:49:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:50:30 - pico-train - INFO - Step 57100 -- ๐ Training Metrics |
| 2025-08-31 03:50:30 - pico-train - INFO - โโโ Loss: 4.7338 |
| 2025-08-31 03:50:30 - pico-train - INFO - โโโ Learning Rate: 8.06e-05 |
| 2025-08-31 03:50:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:51:21 - pico-train - INFO - Step 57200 -- ๐ Training Metrics |
| 2025-08-31 03:51:21 - pico-train - INFO - โโโ Loss: 4.5513 |
| 2025-08-31 03:51:21 - pico-train - INFO - โโโ Learning Rate: 8.03e-05 |
| 2025-08-31 03:51:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:52:13 - pico-train - INFO - Step 57300 -- ๐ Training Metrics |
| 2025-08-31 03:52:13 - pico-train - INFO - โโโ Loss: 4.6619 |
| 2025-08-31 03:52:13 - pico-train - INFO - โโโ Learning Rate: 7.99e-05 |
| 2025-08-31 03:52:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:53:05 - pico-train - INFO - Step 57400 -- ๐ Training Metrics |
| 2025-08-31 03:53:05 - pico-train - INFO - โโโ Loss: 4.7497 |
| 2025-08-31 03:53:05 - pico-train - INFO - โโโ Learning Rate: 7.96e-05 |
| 2025-08-31 03:53:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:53:57 - pico-train - INFO - Step 57500 -- ๐ Training Metrics |
| 2025-08-31 03:53:57 - pico-train - INFO - โโโ Loss: 4.7734 |
| 2025-08-31 03:53:57 - pico-train - INFO - โโโ Learning Rate: 7.93e-05 |
| 2025-08-31 03:53:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:54:49 - pico-train - INFO - Step 57600 -- ๐ Training Metrics |
| 2025-08-31 03:54:49 - pico-train - INFO - โโโ Loss: 4.7634 |
| 2025-08-31 03:54:49 - pico-train - INFO - โโโ Learning Rate: 7.90e-05 |
| 2025-08-31 03:54:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:55:40 - pico-train - INFO - Step 57700 -- ๐ Training Metrics |
| 2025-08-31 03:55:40 - pico-train - INFO - โโโ Loss: 4.7525 |
| 2025-08-31 03:55:40 - pico-train - INFO - โโโ Learning Rate: 7.87e-05 |
| 2025-08-31 03:55:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:56:32 - pico-train - INFO - Step 57800 -- ๐ Training Metrics |
| 2025-08-31 03:56:32 - pico-train - INFO - โโโ Loss: 4.7636 |
| 2025-08-31 03:56:32 - pico-train - INFO - โโโ Learning Rate: 7.84e-05 |
| 2025-08-31 03:56:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:57:23 - pico-train - INFO - Step 57900 -- ๐ Training Metrics |
| 2025-08-31 03:57:23 - pico-train - INFO - โโโ Loss: 4.7354 |
| 2025-08-31 03:57:23 - pico-train - INFO - โโโ Learning Rate: 7.81e-05 |
| 2025-08-31 03:57:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 03:58:15 - pico-train - INFO - Step 58000 -- ๐พ Saving Checkpoint |
| 2025-08-31 04:00:04 - pico-train - INFO - Step 58000 -- ๐ Evaluation Results |
| 2025-08-31 04:00:04 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 04:00:05 - pico-train - INFO - Step 58000 -- ๐ Training Metrics |
| 2025-08-31 04:00:05 - pico-train - INFO - โโโ Loss: 4.6258 |
| 2025-08-31 04:00:05 - pico-train - INFO - โโโ Learning Rate: 7.77e-05 |
| 2025-08-31 04:00:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:00:05 - pico-train - INFO - Step 58000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 04:00:57 - pico-train - INFO - Step 58100 -- ๐ Training Metrics |
| 2025-08-31 04:00:57 - pico-train - INFO - โโโ Loss: 4.6990 |
| 2025-08-31 04:00:57 - pico-train - INFO - โโโ Learning Rate: 7.74e-05 |
| 2025-08-31 04:00:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:01:49 - pico-train - INFO - Step 58200 -- ๐ Training Metrics |
| 2025-08-31 04:01:49 - pico-train - INFO - โโโ Loss: 4.6225 |
| 2025-08-31 04:01:49 - pico-train - INFO - โโโ Learning Rate: 7.71e-05 |
| 2025-08-31 04:01:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:02:41 - pico-train - INFO - Step 58300 -- ๐ Training Metrics |
| 2025-08-31 04:02:41 - pico-train - INFO - โโโ Loss: 4.7347 |
| 2025-08-31 04:02:41 - pico-train - INFO - โโโ Learning Rate: 7.68e-05 |
| 2025-08-31 04:02:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:03:33 - pico-train - INFO - Step 58400 -- ๐ Training Metrics |
| 2025-08-31 04:03:33 - pico-train - INFO - โโโ Loss: 4.7528 |
| 2025-08-31 04:03:33 - pico-train - INFO - โโโ Learning Rate: 7.65e-05 |
| 2025-08-31 04:03:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:04:25 - pico-train - INFO - Step 58500 -- ๐ Training Metrics |
| 2025-08-31 04:04:25 - pico-train - INFO - โโโ Loss: 4.7280 |
| 2025-08-31 04:04:25 - pico-train - INFO - โโโ Learning Rate: 7.62e-05 |
| 2025-08-31 04:04:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:05:17 - pico-train - INFO - Step 58600 -- ๐ Training Metrics |
| 2025-08-31 04:05:17 - pico-train - INFO - โโโ Loss: 4.7135 |
| 2025-08-31 04:05:17 - pico-train - INFO - โโโ Learning Rate: 7.59e-05 |
| 2025-08-31 04:05:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:06:09 - pico-train - INFO - Step 58700 -- ๐ Training Metrics |
| 2025-08-31 04:06:09 - pico-train - INFO - โโโ Loss: 4.8126 |
| 2025-08-31 04:06:09 - pico-train - INFO - โโโ Learning Rate: 7.56e-05 |
| 2025-08-31 04:06:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:07:00 - pico-train - INFO - Step 58800 -- ๐ Training Metrics |
| 2025-08-31 04:07:00 - pico-train - INFO - โโโ Loss: 4.7505 |
| 2025-08-31 04:07:00 - pico-train - INFO - โโโ Learning Rate: 7.53e-05 |
| 2025-08-31 04:07:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:07:52 - pico-train - INFO - Step 58900 -- ๐ Training Metrics |
| 2025-08-31 04:07:52 - pico-train - INFO - โโโ Loss: 4.7376 |
| 2025-08-31 04:07:52 - pico-train - INFO - โโโ Learning Rate: 7.49e-05 |
| 2025-08-31 04:07:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:08:44 - pico-train - INFO - Step 59000 -- ๐ Training Metrics |
| 2025-08-31 04:08:44 - pico-train - INFO - โโโ Loss: 4.8413 |
| 2025-08-31 04:08:44 - pico-train - INFO - โโโ Learning Rate: 7.46e-05 |
| 2025-08-31 04:08:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:09:36 - pico-train - INFO - Step 59100 -- ๐ Training Metrics |
| 2025-08-31 04:09:36 - pico-train - INFO - โโโ Loss: 4.7368 |
| 2025-08-31 04:09:36 - pico-train - INFO - โโโ Learning Rate: 7.43e-05 |
| 2025-08-31 04:09:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:10:28 - pico-train - INFO - Step 59200 -- ๐ Training Metrics |
| 2025-08-31 04:10:28 - pico-train - INFO - โโโ Loss: 4.7792 |
| 2025-08-31 04:10:28 - pico-train - INFO - โโโ Learning Rate: 7.40e-05 |
| 2025-08-31 04:10:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:11:19 - pico-train - INFO - Step 59300 -- ๐ Training Metrics |
| 2025-08-31 04:11:19 - pico-train - INFO - โโโ Loss: 4.7737 |
| 2025-08-31 04:11:19 - pico-train - INFO - โโโ Learning Rate: 7.37e-05 |
| 2025-08-31 04:11:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:12:11 - pico-train - INFO - Step 59400 -- ๐ Training Metrics |
| 2025-08-31 04:12:11 - pico-train - INFO - โโโ Loss: 4.5704 |
| 2025-08-31 04:12:11 - pico-train - INFO - โโโ Learning Rate: 7.34e-05 |
| 2025-08-31 04:12:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:13:03 - pico-train - INFO - Step 59500 -- ๐ Training Metrics |
| 2025-08-31 04:13:03 - pico-train - INFO - โโโ Loss: 4.7184 |
| 2025-08-31 04:13:03 - pico-train - INFO - โโโ Learning Rate: 7.31e-05 |
| 2025-08-31 04:13:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:13:54 - pico-train - INFO - Step 59600 -- ๐ Training Metrics |
| 2025-08-31 04:13:54 - pico-train - INFO - โโโ Loss: 4.8264 |
| 2025-08-31 04:13:54 - pico-train - INFO - โโโ Learning Rate: 7.28e-05 |
| 2025-08-31 04:13:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:14:46 - pico-train - INFO - Step 59700 -- ๐ Training Metrics |
| 2025-08-31 04:14:46 - pico-train - INFO - โโโ Loss: 4.7104 |
| 2025-08-31 04:14:46 - pico-train - INFO - โโโ Learning Rate: 7.25e-05 |
| 2025-08-31 04:14:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:15:38 - pico-train - INFO - Step 59800 -- ๐ Training Metrics |
| 2025-08-31 04:15:38 - pico-train - INFO - โโโ Loss: 4.7607 |
| 2025-08-31 04:15:38 - pico-train - INFO - โโโ Learning Rate: 7.22e-05 |
| 2025-08-31 04:15:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:16:30 - pico-train - INFO - Step 59900 -- ๐ Training Metrics |
| 2025-08-31 04:16:30 - pico-train - INFO - โโโ Loss: 4.6433 |
| 2025-08-31 04:16:30 - pico-train - INFO - โโโ Learning Rate: 7.19e-05 |
| 2025-08-31 04:16:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:17:22 - pico-train - INFO - Step 60000 -- ๐พ Saving Checkpoint |
| 2025-08-31 04:19:11 - pico-train - INFO - Step 60000 -- ๐ Evaluation Results |
| 2025-08-31 04:19:11 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 04:19:11 - pico-train - INFO - Step 60000 -- ๐ Training Metrics |
| 2025-08-31 04:19:11 - pico-train - INFO - โโโ Loss: 4.7643 |
| 2025-08-31 04:19:11 - pico-train - INFO - โโโ Learning Rate: 7.15e-05 |
| 2025-08-31 04:19:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:19:11 - pico-train - INFO - Step 60000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 04:20:04 - pico-train - INFO - Step 60100 -- ๐ Training Metrics |
| 2025-08-31 04:20:04 - pico-train - INFO - โโโ Loss: 4.7230 |
| 2025-08-31 04:20:04 - pico-train - INFO - โโโ Learning Rate: 7.12e-05 |
| 2025-08-31 04:20:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:20:55 - pico-train - INFO - Step 60200 -- ๐ Training Metrics |
| 2025-08-31 04:20:55 - pico-train - INFO - โโโ Loss: 4.7552 |
| 2025-08-31 04:20:55 - pico-train - INFO - โโโ Learning Rate: 7.09e-05 |
| 2025-08-31 04:20:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:21:47 - pico-train - INFO - Step 60300 -- ๐ Training Metrics |
| 2025-08-31 04:21:47 - pico-train - INFO - โโโ Loss: 4.7716 |
| 2025-08-31 04:21:47 - pico-train - INFO - โโโ Learning Rate: 7.06e-05 |
| 2025-08-31 04:21:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:22:38 - pico-train - INFO - Step 60400 -- ๐ Training Metrics |
| 2025-08-31 04:22:38 - pico-train - INFO - โโโ Loss: 4.8298 |
| 2025-08-31 04:22:38 - pico-train - INFO - โโโ Learning Rate: 7.03e-05 |
| 2025-08-31 04:22:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:23:31 - pico-train - INFO - Step 60500 -- ๐ Training Metrics |
| 2025-08-31 04:23:31 - pico-train - INFO - โโโ Loss: 4.7631 |
| 2025-08-31 04:23:31 - pico-train - INFO - โโโ Learning Rate: 7.00e-05 |
| 2025-08-31 04:23:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:24:22 - pico-train - INFO - Step 60600 -- ๐ Training Metrics |
| 2025-08-31 04:24:22 - pico-train - INFO - โโโ Loss: 4.7056 |
| 2025-08-31 04:24:22 - pico-train - INFO - โโโ Learning Rate: 6.97e-05 |
| 2025-08-31 04:24:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:25:14 - pico-train - INFO - Step 60700 -- ๐ Training Metrics |
| 2025-08-31 04:25:14 - pico-train - INFO - โโโ Loss: 4.7684 |
| 2025-08-31 04:25:14 - pico-train - INFO - โโโ Learning Rate: 6.94e-05 |
| 2025-08-31 04:25:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:26:06 - pico-train - INFO - Step 60800 -- ๐ Training Metrics |
| 2025-08-31 04:26:06 - pico-train - INFO - โโโ Loss: 4.7622 |
| 2025-08-31 04:26:06 - pico-train - INFO - โโโ Learning Rate: 6.91e-05 |
| 2025-08-31 04:26:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:26:57 - pico-train - INFO - Step 60900 -- ๐ Training Metrics |
| 2025-08-31 04:26:57 - pico-train - INFO - โโโ Loss: 4.6733 |
| 2025-08-31 04:26:57 - pico-train - INFO - โโโ Learning Rate: 6.88e-05 |
| 2025-08-31 04:26:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:27:49 - pico-train - INFO - Step 61000 -- ๐ Training Metrics |
| 2025-08-31 04:27:49 - pico-train - INFO - โโโ Loss: 4.7075 |
| 2025-08-31 04:27:49 - pico-train - INFO - โโโ Learning Rate: 6.85e-05 |
| 2025-08-31 04:27:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:28:41 - pico-train - INFO - Step 61100 -- ๐ Training Metrics |
| 2025-08-31 04:28:41 - pico-train - INFO - โโโ Loss: 4.6605 |
| 2025-08-31 04:28:41 - pico-train - INFO - โโโ Learning Rate: 6.82e-05 |
| 2025-08-31 04:28:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:29:32 - pico-train - INFO - Step 61200 -- ๐ Training Metrics |
| 2025-08-31 04:29:32 - pico-train - INFO - โโโ Loss: 4.7365 |
| 2025-08-31 04:29:32 - pico-train - INFO - โโโ Learning Rate: 6.79e-05 |
| 2025-08-31 04:29:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:30:24 - pico-train - INFO - Step 61300 -- ๐ Training Metrics |
| 2025-08-31 04:30:24 - pico-train - INFO - โโโ Loss: 4.6952 |
| 2025-08-31 04:30:24 - pico-train - INFO - โโโ Learning Rate: 6.76e-05 |
| 2025-08-31 04:30:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:31:15 - pico-train - INFO - Step 61400 -- ๐ Training Metrics |
| 2025-08-31 04:31:15 - pico-train - INFO - โโโ Loss: 4.7439 |
| 2025-08-31 04:31:15 - pico-train - INFO - โโโ Learning Rate: 6.73e-05 |
| 2025-08-31 04:31:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:32:07 - pico-train - INFO - Step 61500 -- ๐ Training Metrics |
| 2025-08-31 04:32:07 - pico-train - INFO - โโโ Loss: 4.7678 |
| 2025-08-31 04:32:07 - pico-train - INFO - โโโ Learning Rate: 6.70e-05 |
| 2025-08-31 04:32:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:32:59 - pico-train - INFO - Step 61600 -- ๐ Training Metrics |
| 2025-08-31 04:32:59 - pico-train - INFO - โโโ Loss: 4.7163 |
| 2025-08-31 04:32:59 - pico-train - INFO - โโโ Learning Rate: 6.67e-05 |
| 2025-08-31 04:32:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:33:50 - pico-train - INFO - Step 61700 -- ๐ Training Metrics |
| 2025-08-31 04:33:50 - pico-train - INFO - โโโ Loss: 4.7610 |
| 2025-08-31 04:33:50 - pico-train - INFO - โโโ Learning Rate: 6.64e-05 |
| 2025-08-31 04:33:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:34:42 - pico-train - INFO - Step 61800 -- ๐ Training Metrics |
| 2025-08-31 04:34:42 - pico-train - INFO - โโโ Loss: 4.7427 |
| 2025-08-31 04:34:42 - pico-train - INFO - โโโ Learning Rate: 6.61e-05 |
| 2025-08-31 04:34:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:35:34 - pico-train - INFO - Step 61900 -- ๐ Training Metrics |
| 2025-08-31 04:35:34 - pico-train - INFO - โโโ Loss: 4.7452 |
| 2025-08-31 04:35:34 - pico-train - INFO - โโโ Learning Rate: 6.58e-05 |
| 2025-08-31 04:35:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:36:26 - pico-train - INFO - Step 62000 -- 💾 Saving Checkpoint |
| 2025-08-31 04:38:14 - pico-train - INFO - Step 62000 -- ๐ Evaluation Results |
| 2025-08-31 04:38:14 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 04:38:15 - pico-train - INFO - Step 62000 -- ๐ Training Metrics |
| 2025-08-31 04:38:15 - pico-train - INFO - โโโ Loss: 4.7605 |
| 2025-08-31 04:38:15 - pico-train - INFO - โโโ Learning Rate: 6.55e-05 |
| 2025-08-31 04:38:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:38:15 - pico-train - INFO - Step 62000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 04:39:07 - pico-train - INFO - Step 62100 -- ๐ Training Metrics |
| 2025-08-31 04:39:07 - pico-train - INFO - โโโ Loss: 4.7000 |
| 2025-08-31 04:39:07 - pico-train - INFO - โโโ Learning Rate: 6.52e-05 |
| 2025-08-31 04:39:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:39:59 - pico-train - INFO - Step 62200 -- ๐ Training Metrics |
| 2025-08-31 04:39:59 - pico-train - INFO - โโโ Loss: 4.7021 |
| 2025-08-31 04:39:59 - pico-train - INFO - โโโ Learning Rate: 6.49e-05 |
| 2025-08-31 04:39:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:40:50 - pico-train - INFO - Step 62300 -- ๐ Training Metrics |
| 2025-08-31 04:40:50 - pico-train - INFO - โโโ Loss: 4.7187 |
| 2025-08-31 04:40:50 - pico-train - INFO - โโโ Learning Rate: 6.46e-05 |
| 2025-08-31 04:40:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:41:42 - pico-train - INFO - Step 62400 -- ๐ Training Metrics |
| 2025-08-31 04:41:42 - pico-train - INFO - โโโ Loss: 4.7618 |
| 2025-08-31 04:41:42 - pico-train - INFO - โโโ Learning Rate: 6.43e-05 |
| 2025-08-31 04:41:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:42:34 - pico-train - INFO - Step 62500 -- ๐ Training Metrics |
| 2025-08-31 04:42:34 - pico-train - INFO - โโโ Loss: 4.6404 |
| 2025-08-31 04:42:34 - pico-train - INFO - โโโ Learning Rate: 6.40e-05 |
| 2025-08-31 04:42:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:43:26 - pico-train - INFO - Step 62600 -- ๐ Training Metrics |
| 2025-08-31 04:43:26 - pico-train - INFO - โโโ Loss: 4.5798 |
| 2025-08-31 04:43:26 - pico-train - INFO - โโโ Learning Rate: 6.37e-05 |
| 2025-08-31 04:43:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:44:18 - pico-train - INFO - Step 62700 -- ๐ Training Metrics |
| 2025-08-31 04:44:18 - pico-train - INFO - โโโ Loss: 4.6632 |
| 2025-08-31 04:44:18 - pico-train - INFO - โโโ Learning Rate: 6.34e-05 |
| 2025-08-31 04:44:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:45:09 - pico-train - INFO - Step 62800 -- ๐ Training Metrics |
| 2025-08-31 04:45:09 - pico-train - INFO - โโโ Loss: 4.5843 |
| 2025-08-31 04:45:09 - pico-train - INFO - โโโ Learning Rate: 6.31e-05 |
| 2025-08-31 04:45:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:46:01 - pico-train - INFO - Step 62900 -- ๐ Training Metrics |
| 2025-08-31 04:46:01 - pico-train - INFO - โโโ Loss: 4.6597 |
| 2025-08-31 04:46:01 - pico-train - INFO - โโโ Learning Rate: 6.28e-05 |
| 2025-08-31 04:46:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:46:53 - pico-train - INFO - Step 63000 -- ๐ Training Metrics |
| 2025-08-31 04:46:53 - pico-train - INFO - โโโ Loss: 4.6418 |
| 2025-08-31 04:46:53 - pico-train - INFO - โโโ Learning Rate: 6.25e-05 |
| 2025-08-31 04:46:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:47:45 - pico-train - INFO - Step 63100 -- ๐ Training Metrics |
| 2025-08-31 04:47:45 - pico-train - INFO - โโโ Loss: 4.7716 |
| 2025-08-31 04:47:45 - pico-train - INFO - โโโ Learning Rate: 6.22e-05 |
| 2025-08-31 04:47:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:48:37 - pico-train - INFO - Step 63200 -- ๐ Training Metrics |
| 2025-08-31 04:48:37 - pico-train - INFO - โโโ Loss: 4.7586 |
| 2025-08-31 04:48:37 - pico-train - INFO - โโโ Learning Rate: 6.19e-05 |
| 2025-08-31 04:48:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:49:28 - pico-train - INFO - Step 63300 -- ๐ Training Metrics |
| 2025-08-31 04:49:28 - pico-train - INFO - โโโ Loss: 4.6901 |
| 2025-08-31 04:49:28 - pico-train - INFO - โโโ Learning Rate: 6.16e-05 |
| 2025-08-31 04:49:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:50:20 - pico-train - INFO - Step 63400 -- ๐ Training Metrics |
| 2025-08-31 04:50:20 - pico-train - INFO - โโโ Loss: 4.6636 |
| 2025-08-31 04:50:20 - pico-train - INFO - โโโ Learning Rate: 6.13e-05 |
| 2025-08-31 04:50:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:51:12 - pico-train - INFO - Step 63500 -- ๐ Training Metrics |
| 2025-08-31 04:51:12 - pico-train - INFO - โโโ Loss: 4.6091 |
| 2025-08-31 04:51:12 - pico-train - INFO - โโโ Learning Rate: 6.10e-05 |
| 2025-08-31 04:51:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:52:03 - pico-train - INFO - Step 63600 -- ๐ Training Metrics |
| 2025-08-31 04:52:03 - pico-train - INFO - โโโ Loss: 4.6391 |
| 2025-08-31 04:52:03 - pico-train - INFO - โโโ Learning Rate: 6.07e-05 |
| 2025-08-31 04:52:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:52:55 - pico-train - INFO - Step 63700 -- ๐ Training Metrics |
| 2025-08-31 04:52:55 - pico-train - INFO - โโโ Loss: 4.6842 |
| 2025-08-31 04:52:55 - pico-train - INFO - โโโ Learning Rate: 6.04e-05 |
| 2025-08-31 04:52:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:53:47 - pico-train - INFO - Step 63800 -- ๐ Training Metrics |
| 2025-08-31 04:53:47 - pico-train - INFO - โโโ Loss: 4.7425 |
| 2025-08-31 04:53:47 - pico-train - INFO - โโโ Learning Rate: 6.01e-05 |
| 2025-08-31 04:53:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:54:38 - pico-train - INFO - Step 63900 -- ๐ Training Metrics |
| 2025-08-31 04:54:38 - pico-train - INFO - โโโ Loss: 4.6382 |
| 2025-08-31 04:54:38 - pico-train - INFO - โโโ Learning Rate: 5.98e-05 |
| 2025-08-31 04:54:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:55:30 - pico-train - INFO - Step 64000 -- 💾 Saving Checkpoint |
| 2025-08-31 04:57:17 - pico-train - INFO - Step 64000 -- ๐ Evaluation Results |
| 2025-08-31 04:57:17 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 04:57:18 - pico-train - INFO - Step 64000 -- ๐ Training Metrics |
| 2025-08-31 04:57:18 - pico-train - INFO - โโโ Loss: 4.7247 |
| 2025-08-31 04:57:18 - pico-train - INFO - โโโ Learning Rate: 5.95e-05 |
| 2025-08-31 04:57:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:57:18 - pico-train - INFO - Step 64000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 04:58:10 - pico-train - INFO - Step 64100 -- ๐ Training Metrics |
| 2025-08-31 04:58:10 - pico-train - INFO - โโโ Loss: 4.7745 |
| 2025-08-31 04:58:10 - pico-train - INFO - โโโ Learning Rate: 5.92e-05 |
| 2025-08-31 04:58:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:59:01 - pico-train - INFO - Step 64200 -- ๐ Training Metrics |
| 2025-08-31 04:59:01 - pico-train - INFO - โโโ Loss: 4.7469 |
| 2025-08-31 04:59:01 - pico-train - INFO - โโโ Learning Rate: 5.89e-05 |
| 2025-08-31 04:59:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 04:59:53 - pico-train - INFO - Step 64300 -- ๐ Training Metrics |
| 2025-08-31 04:59:53 - pico-train - INFO - โโโ Loss: 4.7574 |
| 2025-08-31 04:59:53 - pico-train - INFO - โโโ Learning Rate: 5.86e-05 |
| 2025-08-31 04:59:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:00:44 - pico-train - INFO - Step 64400 -- ๐ Training Metrics |
| 2025-08-31 05:00:44 - pico-train - INFO - โโโ Loss: 4.7948 |
| 2025-08-31 05:00:44 - pico-train - INFO - โโโ Learning Rate: 5.84e-05 |
| 2025-08-31 05:00:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:01:37 - pico-train - INFO - Step 64500 -- ๐ Training Metrics |
| 2025-08-31 05:01:37 - pico-train - INFO - โโโ Loss: 4.7107 |
| 2025-08-31 05:01:37 - pico-train - INFO - โโโ Learning Rate: 5.81e-05 |
| 2025-08-31 05:01:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:02:29 - pico-train - INFO - Step 64600 -- ๐ Training Metrics |
| 2025-08-31 05:02:29 - pico-train - INFO - โโโ Loss: 4.7752 |
| 2025-08-31 05:02:29 - pico-train - INFO - โโโ Learning Rate: 5.78e-05 |
| 2025-08-31 05:02:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:03:21 - pico-train - INFO - Step 64700 -- ๐ Training Metrics |
| 2025-08-31 05:03:21 - pico-train - INFO - โโโ Loss: 4.6828 |
| 2025-08-31 05:03:21 - pico-train - INFO - โโโ Learning Rate: 5.75e-05 |
| 2025-08-31 05:03:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:04:12 - pico-train - INFO - Step 64800 -- ๐ Training Metrics |
| 2025-08-31 05:04:12 - pico-train - INFO - โโโ Loss: 4.7338 |
| 2025-08-31 05:04:12 - pico-train - INFO - โโโ Learning Rate: 5.72e-05 |
| 2025-08-31 05:04:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:05:04 - pico-train - INFO - Step 64900 -- ๐ Training Metrics |
| 2025-08-31 05:05:04 - pico-train - INFO - โโโ Loss: 4.6559 |
| 2025-08-31 05:05:04 - pico-train - INFO - โโโ Learning Rate: 5.69e-05 |
| 2025-08-31 05:05:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:05:56 - pico-train - INFO - Step 65000 -- ๐ Training Metrics |
| 2025-08-31 05:05:56 - pico-train - INFO - โโโ Loss: 4.5495 |
| 2025-08-31 05:05:56 - pico-train - INFO - โโโ Learning Rate: 5.66e-05 |
| 2025-08-31 05:05:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:06:48 - pico-train - INFO - Step 65100 -- ๐ Training Metrics |
| 2025-08-31 05:06:48 - pico-train - INFO - โโโ Loss: 4.6956 |
| 2025-08-31 05:06:48 - pico-train - INFO - โโโ Learning Rate: 5.63e-05 |
| 2025-08-31 05:06:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:07:39 - pico-train - INFO - Step 65200 -- ๐ Training Metrics |
| 2025-08-31 05:07:39 - pico-train - INFO - โโโ Loss: 4.7201 |
| 2025-08-31 05:07:39 - pico-train - INFO - โโโ Learning Rate: 5.60e-05 |
| 2025-08-31 05:07:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:08:31 - pico-train - INFO - Step 65300 -- ๐ Training Metrics |
| 2025-08-31 05:08:31 - pico-train - INFO - โโโ Loss: 4.7585 |
| 2025-08-31 05:08:31 - pico-train - INFO - โโโ Learning Rate: 5.57e-05 |
| 2025-08-31 05:08:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:09:22 - pico-train - INFO - Step 65400 -- ๐ Training Metrics |
| 2025-08-31 05:09:22 - pico-train - INFO - โโโ Loss: 4.7034 |
| 2025-08-31 05:09:22 - pico-train - INFO - โโโ Learning Rate: 5.55e-05 |
| 2025-08-31 05:09:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:10:14 - pico-train - INFO - Step 65500 -- ๐ Training Metrics |
| 2025-08-31 05:10:14 - pico-train - INFO - โโโ Loss: 4.6977 |
| 2025-08-31 05:10:14 - pico-train - INFO - โโโ Learning Rate: 5.52e-05 |
| 2025-08-31 05:10:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:11:06 - pico-train - INFO - Step 65600 -- ๐ Training Metrics |
| 2025-08-31 05:11:06 - pico-train - INFO - โโโ Loss: 4.7311 |
| 2025-08-31 05:11:06 - pico-train - INFO - โโโ Learning Rate: 5.49e-05 |
| 2025-08-31 05:11:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:11:57 - pico-train - INFO - Step 65700 -- ๐ Training Metrics |
| 2025-08-31 05:11:57 - pico-train - INFO - โโโ Loss: 4.7227 |
| 2025-08-31 05:11:57 - pico-train - INFO - โโโ Learning Rate: 5.46e-05 |
| 2025-08-31 05:11:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:12:49 - pico-train - INFO - Step 65800 -- ๐ Training Metrics |
| 2025-08-31 05:12:49 - pico-train - INFO - โโโ Loss: 4.7637 |
| 2025-08-31 05:12:49 - pico-train - INFO - โโโ Learning Rate: 5.43e-05 |
| 2025-08-31 05:12:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:13:40 - pico-train - INFO - Step 65900 -- ๐ Training Metrics |
| 2025-08-31 05:13:40 - pico-train - INFO - โโโ Loss: 4.6328 |
| 2025-08-31 05:13:40 - pico-train - INFO - โโโ Learning Rate: 5.40e-05 |
| 2025-08-31 05:13:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:14:32 - pico-train - INFO - Step 66000 -- 💾 Saving Checkpoint |
| 2025-08-31 05:16:20 - pico-train - INFO - Step 66000 -- ๐ Evaluation Results |
| 2025-08-31 05:16:20 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 05:16:20 - pico-train - INFO - Step 66000 -- ๐ Training Metrics |
| 2025-08-31 05:16:20 - pico-train - INFO - โโโ Loss: 4.6922 |
| 2025-08-31 05:16:20 - pico-train - INFO - โโโ Learning Rate: 5.37e-05 |
| 2025-08-31 05:16:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:16:20 - pico-train - INFO - Step 66000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 05:17:13 - pico-train - INFO - Step 66100 -- ๐ Training Metrics |
| 2025-08-31 05:17:13 - pico-train - INFO - โโโ Loss: 4.6547 |
| 2025-08-31 05:17:13 - pico-train - INFO - โโโ Learning Rate: 5.35e-05 |
| 2025-08-31 05:17:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:18:04 - pico-train - INFO - Step 66200 -- ๐ Training Metrics |
| 2025-08-31 05:18:04 - pico-train - INFO - โโโ Loss: 4.6359 |
| 2025-08-31 05:18:04 - pico-train - INFO - โโโ Learning Rate: 5.32e-05 |
| 2025-08-31 05:18:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:18:56 - pico-train - INFO - Step 66300 -- ๐ Training Metrics |
| 2025-08-31 05:18:56 - pico-train - INFO - โโโ Loss: 4.7213 |
| 2025-08-31 05:18:56 - pico-train - INFO - โโโ Learning Rate: 5.29e-05 |
| 2025-08-31 05:18:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:19:47 - pico-train - INFO - Step 66400 -- ๐ Training Metrics |
| 2025-08-31 05:19:47 - pico-train - INFO - โโโ Loss: 4.7410 |
| 2025-08-31 05:19:47 - pico-train - INFO - โโโ Learning Rate: 5.26e-05 |
| 2025-08-31 05:19:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:20:40 - pico-train - INFO - Step 66500 -- ๐ Training Metrics |
| 2025-08-31 05:20:40 - pico-train - INFO - โโโ Loss: 4.6971 |
| 2025-08-31 05:20:40 - pico-train - INFO - โโโ Learning Rate: 5.23e-05 |
| 2025-08-31 05:20:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:21:31 - pico-train - INFO - Step 66600 -- ๐ Training Metrics |
| 2025-08-31 05:21:31 - pico-train - INFO - โโโ Loss: 4.6825 |
| 2025-08-31 05:21:31 - pico-train - INFO - โโโ Learning Rate: 5.20e-05 |
| 2025-08-31 05:21:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:22:23 - pico-train - INFO - Step 66700 -- ๐ Training Metrics |
| 2025-08-31 05:22:23 - pico-train - INFO - โโโ Loss: 4.7025 |
| 2025-08-31 05:22:23 - pico-train - INFO - โโโ Learning Rate: 5.18e-05 |
| 2025-08-31 05:22:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:23:14 - pico-train - INFO - Step 66800 -- ๐ Training Metrics |
| 2025-08-31 05:23:14 - pico-train - INFO - โโโ Loss: 4.6376 |
| 2025-08-31 05:23:14 - pico-train - INFO - โโโ Learning Rate: 5.15e-05 |
| 2025-08-31 05:23:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:24:06 - pico-train - INFO - Step 66900 -- ๐ Training Metrics |
| 2025-08-31 05:24:06 - pico-train - INFO - โโโ Loss: 4.6858 |
| 2025-08-31 05:24:06 - pico-train - INFO - โโโ Learning Rate: 5.12e-05 |
| 2025-08-31 05:24:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:24:58 - pico-train - INFO - Step 67000 -- ๐ Training Metrics |
| 2025-08-31 05:24:58 - pico-train - INFO - โโโ Loss: 4.6740 |
| 2025-08-31 05:24:58 - pico-train - INFO - โโโ Learning Rate: 5.09e-05 |
| 2025-08-31 05:24:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:25:49 - pico-train - INFO - Step 67100 -- ๐ Training Metrics |
| 2025-08-31 05:25:49 - pico-train - INFO - โโโ Loss: 4.6885 |
| 2025-08-31 05:25:49 - pico-train - INFO - โโโ Learning Rate: 5.06e-05 |
| 2025-08-31 05:25:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:26:41 - pico-train - INFO - Step 67200 -- ๐ Training Metrics |
| 2025-08-31 05:26:41 - pico-train - INFO - โโโ Loss: 4.6838 |
| 2025-08-31 05:26:41 - pico-train - INFO - โโโ Learning Rate: 5.04e-05 |
| 2025-08-31 05:26:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:27:33 - pico-train - INFO - Step 67300 -- ๐ Training Metrics |
| 2025-08-31 05:27:33 - pico-train - INFO - โโโ Loss: 4.7456 |
| 2025-08-31 05:27:33 - pico-train - INFO - โโโ Learning Rate: 5.01e-05 |
| 2025-08-31 05:27:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:28:25 - pico-train - INFO - Step 67400 -- ๐ Training Metrics |
| 2025-08-31 05:28:25 - pico-train - INFO - โโโ Loss: 4.6889 |
| 2025-08-31 05:28:25 - pico-train - INFO - โโโ Learning Rate: 4.98e-05 |
| 2025-08-31 05:28:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:29:17 - pico-train - INFO - Step 67500 -- ๐ Training Metrics |
| 2025-08-31 05:29:17 - pico-train - INFO - โโโ Loss: 4.6822 |
| 2025-08-31 05:29:17 - pico-train - INFO - โโโ Learning Rate: 4.95e-05 |
| 2025-08-31 05:29:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:30:08 - pico-train - INFO - Step 67600 -- ๐ Training Metrics |
| 2025-08-31 05:30:08 - pico-train - INFO - โโโ Loss: 4.7133 |
| 2025-08-31 05:30:08 - pico-train - INFO - โโโ Learning Rate: 4.93e-05 |
| 2025-08-31 05:30:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:31:00 - pico-train - INFO - Step 67700 -- ๐ Training Metrics |
| 2025-08-31 05:31:00 - pico-train - INFO - โโโ Loss: 4.6865 |
| 2025-08-31 05:31:00 - pico-train - INFO - โโโ Learning Rate: 4.90e-05 |
| 2025-08-31 05:31:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:31:51 - pico-train - INFO - Step 67800 -- ๐ Training Metrics |
| 2025-08-31 05:31:51 - pico-train - INFO - โโโ Loss: 4.6382 |
| 2025-08-31 05:31:51 - pico-train - INFO - โโโ Learning Rate: 4.87e-05 |
| 2025-08-31 05:31:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:32:43 - pico-train - INFO - Step 67900 -- ๐ Training Metrics |
| 2025-08-31 05:32:43 - pico-train - INFO - โโโ Loss: 4.6698 |
| 2025-08-31 05:32:43 - pico-train - INFO - โโโ Learning Rate: 4.84e-05 |
| 2025-08-31 05:32:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:33:34 - pico-train - INFO - Step 68000 -- 💾 Saving Checkpoint |
| 2025-08-31 05:35:23 - pico-train - INFO - Step 68000 -- ๐ Evaluation Results |
| 2025-08-31 05:35:23 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 05:35:23 - pico-train - INFO - Step 68000 -- ๐ Training Metrics |
| 2025-08-31 05:35:23 - pico-train - INFO - โโโ Loss: 4.6390 |
| 2025-08-31 05:35:23 - pico-train - INFO - โโโ Learning Rate: 4.82e-05 |
| 2025-08-31 05:35:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:35:23 - pico-train - INFO - Step 68000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 05:36:16 - pico-train - INFO - Step 68100 -- ๐ Training Metrics |
| 2025-08-31 05:36:16 - pico-train - INFO - โโโ Loss: 4.7282 |
| 2025-08-31 05:36:16 - pico-train - INFO - โโโ Learning Rate: 4.79e-05 |
| 2025-08-31 05:36:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:37:08 - pico-train - INFO - Step 68200 -- ๐ Training Metrics |
| 2025-08-31 05:37:08 - pico-train - INFO - โโโ Loss: 4.6266 |
| 2025-08-31 05:37:08 - pico-train - INFO - โโโ Learning Rate: 4.76e-05 |
| 2025-08-31 05:37:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:37:59 - pico-train - INFO - Step 68300 -- ๐ Training Metrics |
| 2025-08-31 05:37:59 - pico-train - INFO - โโโ Loss: 4.6421 |
| 2025-08-31 05:37:59 - pico-train - INFO - โโโ Learning Rate: 4.73e-05 |
| 2025-08-31 05:37:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:38:51 - pico-train - INFO - Step 68400 -- ๐ Training Metrics |
| 2025-08-31 05:38:51 - pico-train - INFO - โโโ Loss: 4.6914 |
| 2025-08-31 05:38:51 - pico-train - INFO - โโโ Learning Rate: 4.71e-05 |
| 2025-08-31 05:38:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:39:43 - pico-train - INFO - Step 68500 -- ๐ Training Metrics |
| 2025-08-31 05:39:43 - pico-train - INFO - โโโ Loss: 4.6940 |
| 2025-08-31 05:39:43 - pico-train - INFO - โโโ Learning Rate: 4.68e-05 |
| 2025-08-31 05:39:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:40:35 - pico-train - INFO - Step 68600 -- ๐ Training Metrics |
| 2025-08-31 05:40:35 - pico-train - INFO - โโโ Loss: 4.7047 |
| 2025-08-31 05:40:35 - pico-train - INFO - โโโ Learning Rate: 4.65e-05 |
| 2025-08-31 05:40:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:41:26 - pico-train - INFO - Step 68700 -- ๐ Training Metrics |
| 2025-08-31 05:41:26 - pico-train - INFO - โโโ Loss: 4.6693 |
| 2025-08-31 05:41:26 - pico-train - INFO - โโโ Learning Rate: 4.63e-05 |
| 2025-08-31 05:41:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:42:17 - pico-train - INFO - Step 68800 -- ๐ Training Metrics |
| 2025-08-31 05:42:17 - pico-train - INFO - โโโ Loss: 4.4842 |
| 2025-08-31 05:42:17 - pico-train - INFO - โโโ Learning Rate: 4.60e-05 |
| 2025-08-31 05:42:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:43:09 - pico-train - INFO - Step 68900 -- ๐ Training Metrics |
| 2025-08-31 05:43:09 - pico-train - INFO - โโโ Loss: 4.6504 |
| 2025-08-31 05:43:09 - pico-train - INFO - โโโ Learning Rate: 4.57e-05 |
| 2025-08-31 05:43:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:44:01 - pico-train - INFO - Step 69000 -- ๐ Training Metrics |
| 2025-08-31 05:44:01 - pico-train - INFO - โโโ Loss: 4.6144 |
| 2025-08-31 05:44:01 - pico-train - INFO - โโโ Learning Rate: 4.54e-05 |
| 2025-08-31 05:44:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:44:53 - pico-train - INFO - Step 69100 -- ๐ Training Metrics |
| 2025-08-31 05:44:53 - pico-train - INFO - โโโ Loss: 4.7140 |
| 2025-08-31 05:44:53 - pico-train - INFO - โโโ Learning Rate: 4.52e-05 |
| 2025-08-31 05:44:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:45:45 - pico-train - INFO - Step 69200 -- ๐ Training Metrics |
| 2025-08-31 05:45:45 - pico-train - INFO - โโโ Loss: 4.6957 |
| 2025-08-31 05:45:45 - pico-train - INFO - โโโ Learning Rate: 4.49e-05 |
| 2025-08-31 05:45:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:46:37 - pico-train - INFO - Step 69300 -- ๐ Training Metrics |
| 2025-08-31 05:46:37 - pico-train - INFO - โโโ Loss: 4.7104 |
| 2025-08-31 05:46:37 - pico-train - INFO - โโโ Learning Rate: 4.46e-05 |
| 2025-08-31 05:46:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:47:28 - pico-train - INFO - Step 69400 -- ๐ Training Metrics |
| 2025-08-31 05:47:28 - pico-train - INFO - โโโ Loss: 4.6734 |
| 2025-08-31 05:47:28 - pico-train - INFO - โโโ Learning Rate: 4.44e-05 |
| 2025-08-31 05:47:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:48:21 - pico-train - INFO - Step 69500 -- ๐ Training Metrics |
| 2025-08-31 05:48:21 - pico-train - INFO - โโโ Loss: 4.6432 |
| 2025-08-31 05:48:21 - pico-train - INFO - โโโ Learning Rate: 4.41e-05 |
| 2025-08-31 05:48:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:49:13 - pico-train - INFO - Step 69600 -- ๐ Training Metrics |
| 2025-08-31 05:49:13 - pico-train - INFO - โโโ Loss: 4.6413 |
| 2025-08-31 05:49:13 - pico-train - INFO - โโโ Learning Rate: 4.38e-05 |
| 2025-08-31 05:49:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:50:04 - pico-train - INFO - Step 69700 -- ๐ Training Metrics |
| 2025-08-31 05:50:04 - pico-train - INFO - โโโ Loss: 4.3247 |
| 2025-08-31 05:50:04 - pico-train - INFO - โโโ Learning Rate: 4.36e-05 |
| 2025-08-31 05:50:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:50:56 - pico-train - INFO - Step 69800 -- ๐ Training Metrics |
| 2025-08-31 05:50:56 - pico-train - INFO - โโโ Loss: 4.6143 |
| 2025-08-31 05:50:56 - pico-train - INFO - โโโ Learning Rate: 4.33e-05 |
| 2025-08-31 05:50:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:51:47 - pico-train - INFO - Step 69900 -- ๐ Training Metrics |
| 2025-08-31 05:51:47 - pico-train - INFO - โโโ Loss: 4.6423 |
| 2025-08-31 05:51:47 - pico-train - INFO - โโโ Learning Rate: 4.31e-05 |
| 2025-08-31 05:51:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:52:39 - pico-train - INFO - Step 70000 -- 💾 Saving Checkpoint |
| 2025-08-31 05:54:28 - pico-train - INFO - Step 70000 -- ๐ Evaluation Results |
| 2025-08-31 05:54:28 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 05:54:29 - pico-train - INFO - Step 70000 -- ๐ Training Metrics |
| 2025-08-31 05:54:29 - pico-train - INFO - โโโ Loss: 4.5899 |
| 2025-08-31 05:54:29 - pico-train - INFO - โโโ Learning Rate: 4.28e-05 |
| 2025-08-31 05:54:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:54:29 - pico-train - INFO - Step 70000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 05:55:21 - pico-train - INFO - Step 70100 -- ๐ Training Metrics |
| 2025-08-31 05:55:21 - pico-train - INFO - โโโ Loss: 4.6870 |
| 2025-08-31 05:55:21 - pico-train - INFO - โโโ Learning Rate: 4.25e-05 |
| 2025-08-31 05:55:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:56:12 - pico-train - INFO - Step 70200 -- ๐ Training Metrics |
| 2025-08-31 05:56:12 - pico-train - INFO - โโโ Loss: 4.7299 |
| 2025-08-31 05:56:12 - pico-train - INFO - โโโ Learning Rate: 4.23e-05 |
| 2025-08-31 05:56:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:57:04 - pico-train - INFO - Step 70300 -- ๐ Training Metrics |
| 2025-08-31 05:57:04 - pico-train - INFO - โโโ Loss: 4.5664 |
| 2025-08-31 05:57:04 - pico-train - INFO - โโโ Learning Rate: 4.20e-05 |
| 2025-08-31 05:57:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:57:56 - pico-train - INFO - Step 70400 -- ๐ Training Metrics |
| 2025-08-31 05:57:56 - pico-train - INFO - โโโ Loss: 4.5943 |
| 2025-08-31 05:57:56 - pico-train - INFO - โโโ Learning Rate: 4.17e-05 |
| 2025-08-31 05:57:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:58:48 - pico-train - INFO - Step 70500 -- ๐ Training Metrics |
| 2025-08-31 05:58:48 - pico-train - INFO - โโโ Loss: 4.6737 |
| 2025-08-31 05:58:48 - pico-train - INFO - โโโ Learning Rate: 4.15e-05 |
| 2025-08-31 05:58:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 05:59:40 - pico-train - INFO - Step 70600 -- ๐ Training Metrics |
| 2025-08-31 05:59:40 - pico-train - INFO - โโโ Loss: 4.6301 |
| 2025-08-31 05:59:40 - pico-train - INFO - โโโ Learning Rate: 4.12e-05 |
| 2025-08-31 05:59:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:00:31 - pico-train - INFO - Step 70700 -- ๐ Training Metrics |
| 2025-08-31 06:00:31 - pico-train - INFO - โโโ Loss: 4.6488 |
| 2025-08-31 06:00:31 - pico-train - INFO - โโโ Learning Rate: 4.10e-05 |
| 2025-08-31 06:00:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:01:23 - pico-train - INFO - Step 70800 -- ๐ Training Metrics |
| 2025-08-31 06:01:23 - pico-train - INFO - โโโ Loss: 4.6588 |
| 2025-08-31 06:01:23 - pico-train - INFO - โโโ Learning Rate: 4.07e-05 |
| 2025-08-31 06:01:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:02:14 - pico-train - INFO - Step 70900 -- ๐ Training Metrics |
| 2025-08-31 06:02:14 - pico-train - INFO - โโโ Loss: 4.6257 |
| 2025-08-31 06:02:14 - pico-train - INFO - โโโ Learning Rate: 4.04e-05 |
| 2025-08-31 06:02:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:03:06 - pico-train - INFO - Step 71000 -- ๐ Training Metrics |
| 2025-08-31 06:03:06 - pico-train - INFO - โโโ Loss: 4.6493 |
| 2025-08-31 06:03:06 - pico-train - INFO - โโโ Learning Rate: 4.02e-05 |
| 2025-08-31 06:03:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:03:58 - pico-train - INFO - Step 71100 -- ๐ Training Metrics |
| 2025-08-31 06:03:58 - pico-train - INFO - โโโ Loss: 4.6038 |
| 2025-08-31 06:03:58 - pico-train - INFO - โโโ Learning Rate: 3.99e-05 |
| 2025-08-31 06:03:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:04:50 - pico-train - INFO - Step 71200 -- ๐ Training Metrics |
| 2025-08-31 06:04:50 - pico-train - INFO - โโโ Loss: 4.6615 |
| 2025-08-31 06:04:50 - pico-train - INFO - โโโ Learning Rate: 3.97e-05 |
| 2025-08-31 06:04:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:05:41 - pico-train - INFO - Step 71300 -- ๐ Training Metrics |
| 2025-08-31 06:05:41 - pico-train - INFO - โโโ Loss: 4.7068 |
| 2025-08-31 06:05:41 - pico-train - INFO - โโโ Learning Rate: 3.94e-05 |
| 2025-08-31 06:05:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:06:32 - pico-train - INFO - Step 71400 -- ๐ Training Metrics |
| 2025-08-31 06:06:32 - pico-train - INFO - โโโ Loss: 4.6622 |
| 2025-08-31 06:06:32 - pico-train - INFO - โโโ Learning Rate: 3.92e-05 |
| 2025-08-31 06:06:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:07:25 - pico-train - INFO - Step 71500 -- ๐ Training Metrics |
| 2025-08-31 06:07:25 - pico-train - INFO - โโโ Loss: 4.7188 |
| 2025-08-31 06:07:25 - pico-train - INFO - โโโ Learning Rate: 3.89e-05 |
| 2025-08-31 06:07:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:08:16 - pico-train - INFO - Step 71600 -- ๐ Training Metrics |
| 2025-08-31 06:08:16 - pico-train - INFO - โโโ Loss: 4.6602 |
| 2025-08-31 06:08:16 - pico-train - INFO - โโโ Learning Rate: 3.87e-05 |
| 2025-08-31 06:08:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:09:08 - pico-train - INFO - Step 71700 -- ๐ Training Metrics |
| 2025-08-31 06:09:08 - pico-train - INFO - โโโ Loss: 4.6068 |
| 2025-08-31 06:09:08 - pico-train - INFO - โโโ Learning Rate: 3.84e-05 |
| 2025-08-31 06:09:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:09:59 - pico-train - INFO - Step 71800 -- ๐ Training Metrics |
| 2025-08-31 06:09:59 - pico-train - INFO - โโโ Loss: 4.7080 |
| 2025-08-31 06:09:59 - pico-train - INFO - โโโ Learning Rate: 3.82e-05 |
| 2025-08-31 06:09:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:10:51 - pico-train - INFO - Step 71900 -- ๐ Training Metrics |
| 2025-08-31 06:10:51 - pico-train - INFO - โโโ Loss: 4.6023 |
| 2025-08-31 06:10:51 - pico-train - INFO - โโโ Learning Rate: 3.79e-05 |
| 2025-08-31 06:10:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:11:42 - pico-train - INFO - Step 72000 -- 💾 Saving Checkpoint |
| 2025-08-31 06:13:30 - pico-train - INFO - Step 72000 -- ๐ Evaluation Results |
| 2025-08-31 06:13:30 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 06:13:31 - pico-train - INFO - Step 72000 -- ๐ Training Metrics |
| 2025-08-31 06:13:31 - pico-train - INFO - โโโ Loss: 4.5242 |
| 2025-08-31 06:13:31 - pico-train - INFO - โโโ Learning Rate: 3.77e-05 |
| 2025-08-31 06:13:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:13:31 - pico-train - INFO - Step 72000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 06:14:23 - pico-train - INFO - Step 72100 -- ๐ Training Metrics |
| 2025-08-31 06:14:23 - pico-train - INFO - โโโ Loss: 4.5232 |
| 2025-08-31 06:14:23 - pico-train - INFO - โโโ Learning Rate: 3.74e-05 |
| 2025-08-31 06:14:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:15:15 - pico-train - INFO - Step 72200 -- ๐ Training Metrics |
| 2025-08-31 06:15:15 - pico-train - INFO - โโโ Loss: 4.3945 |
| 2025-08-31 06:15:15 - pico-train - INFO - โโโ Learning Rate: 3.72e-05 |
| 2025-08-31 06:15:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:16:06 - pico-train - INFO - Step 72300 -- ๐ Training Metrics |
| 2025-08-31 06:16:06 - pico-train - INFO - โโโ Loss: 4.5338 |
| 2025-08-31 06:16:06 - pico-train - INFO - โโโ Learning Rate: 3.69e-05 |
| 2025-08-31 06:16:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:16:58 - pico-train - INFO - Step 72400 -- ๐ Training Metrics |
| 2025-08-31 06:16:58 - pico-train - INFO - โโโ Loss: 4.4615 |
| 2025-08-31 06:16:58 - pico-train - INFO - โโโ Learning Rate: 3.67e-05 |
| 2025-08-31 06:16:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:17:51 - pico-train - INFO - Step 72500 -- ๐ Training Metrics |
| 2025-08-31 06:17:51 - pico-train - INFO - โโโ Loss: 4.6405 |
| 2025-08-31 06:17:51 - pico-train - INFO - โโโ Learning Rate: 3.64e-05 |
| 2025-08-31 06:17:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:18:42 - pico-train - INFO - Step 72600 -- ๐ Training Metrics |
| 2025-08-31 06:18:42 - pico-train - INFO - โโโ Loss: 4.5952 |
| 2025-08-31 06:18:42 - pico-train - INFO - โโโ Learning Rate: 3.62e-05 |
| 2025-08-31 06:18:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:19:34 - pico-train - INFO - Step 72700 -- ๐ Training Metrics |
| 2025-08-31 06:19:34 - pico-train - INFO - โโโ Loss: 4.4701 |
| 2025-08-31 06:19:34 - pico-train - INFO - โโโ Learning Rate: 3.59e-05 |
| 2025-08-31 06:19:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:20:25 - pico-train - INFO - Step 72800 -- ๐ Training Metrics |
| 2025-08-31 06:20:25 - pico-train - INFO - โโโ Loss: 4.5874 |
| 2025-08-31 06:20:25 - pico-train - INFO - โโโ Learning Rate: 3.57e-05 |
| 2025-08-31 06:20:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:21:17 - pico-train - INFO - Step 72900 -- ๐ Training Metrics |
| 2025-08-31 06:21:17 - pico-train - INFO - โโโ Loss: 4.5509 |
| 2025-08-31 06:21:17 - pico-train - INFO - โโโ Learning Rate: 3.54e-05 |
| 2025-08-31 06:21:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:22:09 - pico-train - INFO - Step 73000 -- ๐ Training Metrics |
| 2025-08-31 06:22:09 - pico-train - INFO - โโโ Loss: 4.5398 |
| 2025-08-31 06:22:09 - pico-train - INFO - โโโ Learning Rate: 3.52e-05 |
| 2025-08-31 06:22:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:23:00 - pico-train - INFO - Step 73100 -- ๐ Training Metrics |
| 2025-08-31 06:23:00 - pico-train - INFO - โโโ Loss: 4.3339 |
| 2025-08-31 06:23:00 - pico-train - INFO - โโโ Learning Rate: 3.49e-05 |
| 2025-08-31 06:23:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:23:52 - pico-train - INFO - Step 73200 -- ๐ Training Metrics |
| 2025-08-31 06:23:52 - pico-train - INFO - โโโ Loss: 4.3875 |
| 2025-08-31 06:23:52 - pico-train - INFO - โโโ Learning Rate: 3.47e-05 |
| 2025-08-31 06:23:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:24:43 - pico-train - INFO - Step 73300 -- ๐ Training Metrics |
| 2025-08-31 06:24:43 - pico-train - INFO - โโโ Loss: 4.5318 |
| 2025-08-31 06:24:43 - pico-train - INFO - โโโ Learning Rate: 3.44e-05 |
| 2025-08-31 06:24:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:25:34 - pico-train - INFO - Step 73400 -- ๐ Training Metrics |
| 2025-08-31 06:25:34 - pico-train - INFO - โโโ Loss: 4.5182 |
| 2025-08-31 06:25:34 - pico-train - INFO - โโโ Learning Rate: 3.42e-05 |
| 2025-08-31 06:25:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:26:27 - pico-train - INFO - Step 73500 -- ๐ Training Metrics |
| 2025-08-31 06:26:27 - pico-train - INFO - โโโ Loss: 4.5567 |
| 2025-08-31 06:26:27 - pico-train - INFO - โโโ Learning Rate: 3.40e-05 |
| 2025-08-31 06:26:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:27:19 - pico-train - INFO - Step 73600 -- ๐ Training Metrics |
| 2025-08-31 06:27:19 - pico-train - INFO - โโโ Loss: 4.4407 |
| 2025-08-31 06:27:19 - pico-train - INFO - โโโ Learning Rate: 3.37e-05 |
| 2025-08-31 06:27:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:28:11 - pico-train - INFO - Step 73700 -- ๐ Training Metrics |
| 2025-08-31 06:28:11 - pico-train - INFO - โโโ Loss: 4.5335 |
| 2025-08-31 06:28:11 - pico-train - INFO - โโโ Learning Rate: 3.35e-05 |
| 2025-08-31 06:28:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:29:03 - pico-train - INFO - Step 73800 -- ๐ Training Metrics |
| 2025-08-31 06:29:03 - pico-train - INFO - โโโ Loss: 4.6355 |
| 2025-08-31 06:29:03 - pico-train - INFO - โโโ Learning Rate: 3.32e-05 |
| 2025-08-31 06:29:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:29:54 - pico-train - INFO - Step 73900 -- ๐ Training Metrics |
| 2025-08-31 06:29:54 - pico-train - INFO - โโโ Loss: 4.6095 |
| 2025-08-31 06:29:54 - pico-train - INFO - โโโ Learning Rate: 3.30e-05 |
| 2025-08-31 06:29:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:30:46 - pico-train - INFO - Step 74000 -- 💾 Saving Checkpoint |
| 2025-08-31 06:32:33 - pico-train - INFO - Step 74000 -- ๐ Evaluation Results |
| 2025-08-31 06:32:33 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 06:32:34 - pico-train - INFO - Step 74000 -- ๐ Training Metrics |
| 2025-08-31 06:32:34 - pico-train - INFO - โโโ Loss: 4.5579 |
| 2025-08-31 06:32:34 - pico-train - INFO - โโโ Learning Rate: 3.28e-05 |
| 2025-08-31 06:32:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:32:34 - pico-train - INFO - Step 74000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 06:33:26 - pico-train - INFO - Step 74100 -- ๐ Training Metrics |
| 2025-08-31 06:33:26 - pico-train - INFO - โโโ Loss: 4.5207 |
| 2025-08-31 06:33:26 - pico-train - INFO - โโโ Learning Rate: 3.25e-05 |
| 2025-08-31 06:33:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:34:17 - pico-train - INFO - Step 74200 -- ๐ Training Metrics |
| 2025-08-31 06:34:17 - pico-train - INFO - โโโ Loss: 4.6675 |
| 2025-08-31 06:34:17 - pico-train - INFO - โโโ Learning Rate: 3.23e-05 |
| 2025-08-31 06:34:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:35:09 - pico-train - INFO - Step 74300 -- ๐ Training Metrics |
| 2025-08-31 06:35:09 - pico-train - INFO - โโโ Loss: 4.4126 |
| 2025-08-31 06:35:09 - pico-train - INFO - โโโ Learning Rate: 3.21e-05 |
| 2025-08-31 06:35:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:36:01 - pico-train - INFO - Step 74400 -- ๐ Training Metrics |
| 2025-08-31 06:36:01 - pico-train - INFO - โโโ Loss: 4.7039 |
| 2025-08-31 06:36:01 - pico-train - INFO - โโโ Learning Rate: 3.18e-05 |
| 2025-08-31 06:36:01 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:36:53 - pico-train - INFO - Step 74500 -- ๐ Training Metrics |
| 2025-08-31 06:36:53 - pico-train - INFO - โโโ Loss: 4.6828 |
| 2025-08-31 06:36:53 - pico-train - INFO - โโโ Learning Rate: 3.16e-05 |
| 2025-08-31 06:36:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:37:45 - pico-train - INFO - Step 74600 -- ๐ Training Metrics |
| 2025-08-31 06:37:45 - pico-train - INFO - โโโ Loss: 4.7129 |
| 2025-08-31 06:37:45 - pico-train - INFO - โโโ Learning Rate: 3.14e-05 |
| 2025-08-31 06:37:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:38:36 - pico-train - INFO - Step 74700 -- ๐ Training Metrics |
| 2025-08-31 06:38:36 - pico-train - INFO - โโโ Loss: 4.5640 |
| 2025-08-31 06:38:36 - pico-train - INFO - โโโ Learning Rate: 3.11e-05 |
| 2025-08-31 06:38:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:39:28 - pico-train - INFO - Step 74800 -- ๐ Training Metrics |
| 2025-08-31 06:39:28 - pico-train - INFO - โโโ Loss: 4.6214 |
| 2025-08-31 06:39:28 - pico-train - INFO - โโโ Learning Rate: 3.09e-05 |
| 2025-08-31 06:39:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:40:20 - pico-train - INFO - Step 74900 -- ๐ Training Metrics |
| 2025-08-31 06:40:20 - pico-train - INFO - โโโ Loss: 4.6219 |
| 2025-08-31 06:40:20 - pico-train - INFO - โโโ Learning Rate: 3.07e-05 |
| 2025-08-31 06:40:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:41:12 - pico-train - INFO - Step 75000 -- ๐ Training Metrics |
| 2025-08-31 06:41:12 - pico-train - INFO - โโโ Loss: 4.6650 |
| 2025-08-31 06:41:12 - pico-train - INFO - โโโ Learning Rate: 3.04e-05 |
| 2025-08-31 06:41:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:42:03 - pico-train - INFO - Step 75100 -- ๐ Training Metrics |
| 2025-08-31 06:42:03 - pico-train - INFO - โโโ Loss: 4.6657 |
| 2025-08-31 06:42:03 - pico-train - INFO - โโโ Learning Rate: 3.02e-05 |
| 2025-08-31 06:42:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:42:54 - pico-train - INFO - Step 75200 -- ๐ Training Metrics |
| 2025-08-31 06:42:54 - pico-train - INFO - โโโ Loss: 4.6446 |
| 2025-08-31 06:42:54 - pico-train - INFO - โโโ Learning Rate: 3.00e-05 |
| 2025-08-31 06:42:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:43:46 - pico-train - INFO - Step 75300 -- ๐ Training Metrics |
| 2025-08-31 06:43:46 - pico-train - INFO - โโโ Loss: 4.6650 |
| 2025-08-31 06:43:46 - pico-train - INFO - โโโ Learning Rate: 2.97e-05 |
| 2025-08-31 06:43:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:44:38 - pico-train - INFO - Step 75400 -- ๐ Training Metrics |
| 2025-08-31 06:44:38 - pico-train - INFO - โโโ Loss: 4.7400 |
| 2025-08-31 06:44:38 - pico-train - INFO - โโโ Learning Rate: 2.95e-05 |
| 2025-08-31 06:44:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:45:30 - pico-train - INFO - Step 75500 -- ๐ Training Metrics |
| 2025-08-31 06:45:30 - pico-train - INFO - โโโ Loss: 4.7032 |
| 2025-08-31 06:45:30 - pico-train - INFO - โโโ Learning Rate: 2.93e-05 |
| 2025-08-31 06:45:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:46:21 - pico-train - INFO - Step 75600 -- ๐ Training Metrics |
| 2025-08-31 06:46:21 - pico-train - INFO - โโโ Loss: 4.7058 |
| 2025-08-31 06:46:21 - pico-train - INFO - โโโ Learning Rate: 2.91e-05 |
| 2025-08-31 06:46:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:47:13 - pico-train - INFO - Step 75700 -- ๐ Training Metrics |
| 2025-08-31 06:47:13 - pico-train - INFO - โโโ Loss: 4.6217 |
| 2025-08-31 06:47:13 - pico-train - INFO - โโโ Learning Rate: 2.88e-05 |
| 2025-08-31 06:47:13 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:48:04 - pico-train - INFO - Step 75800 -- ๐ Training Metrics |
| 2025-08-31 06:48:04 - pico-train - INFO - โโโ Loss: 4.3060 |
| 2025-08-31 06:48:04 - pico-train - INFO - โโโ Learning Rate: 2.86e-05 |
| 2025-08-31 06:48:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:48:56 - pico-train - INFO - Step 75900 -- ๐ Training Metrics |
| 2025-08-31 06:48:56 - pico-train - INFO - โโโ Loss: 4.6372 |
| 2025-08-31 06:48:56 - pico-train - INFO - โโโ Learning Rate: 2.84e-05 |
| 2025-08-31 06:48:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:49:47 - pico-train - INFO - Step 76000 -- 💾 Saving Checkpoint |
| 2025-08-31 06:51:36 - pico-train - INFO - Step 76000 -- ๐ Evaluation Results |
| 2025-08-31 06:51:36 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 06:51:36 - pico-train - INFO - Step 76000 -- ๐ Training Metrics |
| 2025-08-31 06:51:36 - pico-train - INFO - โโโ Loss: 4.7263 |
| 2025-08-31 06:51:36 - pico-train - INFO - โโโ Learning Rate: 2.82e-05 |
| 2025-08-31 06:51:36 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:51:36 - pico-train - INFO - Step 76000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 06:52:29 - pico-train - INFO - Step 76100 -- ๐ Training Metrics |
| 2025-08-31 06:52:29 - pico-train - INFO - โโโ Loss: 4.5898 |
| 2025-08-31 06:52:29 - pico-train - INFO - โโโ Learning Rate: 2.79e-05 |
| 2025-08-31 06:52:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:53:20 - pico-train - INFO - Step 76200 -- ๐ Training Metrics |
| 2025-08-31 06:53:20 - pico-train - INFO - โโโ Loss: 4.5457 |
| 2025-08-31 06:53:20 - pico-train - INFO - โโโ Learning Rate: 2.77e-05 |
| 2025-08-31 06:53:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:54:12 - pico-train - INFO - Step 76300 -- ๐ Training Metrics |
| 2025-08-31 06:54:12 - pico-train - INFO - โโโ Loss: 4.6344 |
| 2025-08-31 06:54:12 - pico-train - INFO - โโโ Learning Rate: 2.75e-05 |
| 2025-08-31 06:54:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:55:03 - pico-train - INFO - Step 76400 -- ๐ Training Metrics |
| 2025-08-31 06:55:03 - pico-train - INFO - โโโ Loss: 4.7356 |
| 2025-08-31 06:55:03 - pico-train - INFO - โโโ Learning Rate: 2.73e-05 |
| 2025-08-31 06:55:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:55:55 - pico-train - INFO - Step 76500 -- ๐ Training Metrics |
| 2025-08-31 06:55:55 - pico-train - INFO - โโโ Loss: 4.6593 |
| 2025-08-31 06:55:55 - pico-train - INFO - โโโ Learning Rate: 2.71e-05 |
| 2025-08-31 06:55:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:56:47 - pico-train - INFO - Step 76600 -- ๐ Training Metrics |
| 2025-08-31 06:56:47 - pico-train - INFO - โโโ Loss: 4.7932 |
| 2025-08-31 06:56:47 - pico-train - INFO - โโโ Learning Rate: 2.68e-05 |
| 2025-08-31 06:56:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:57:38 - pico-train - INFO - Step 76700 -- ๐ Training Metrics |
| 2025-08-31 06:57:38 - pico-train - INFO - โโโ Loss: 4.7641 |
| 2025-08-31 06:57:38 - pico-train - INFO - โโโ Learning Rate: 2.66e-05 |
| 2025-08-31 06:57:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:58:30 - pico-train - INFO - Step 76800 -- ๐ Training Metrics |
| 2025-08-31 06:58:30 - pico-train - INFO - โโโ Loss: 4.7706 |
| 2025-08-31 06:58:30 - pico-train - INFO - โโโ Learning Rate: 2.64e-05 |
| 2025-08-31 06:58:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 06:59:21 - pico-train - INFO - Step 76900 -- ๐ Training Metrics |
| 2025-08-31 06:59:21 - pico-train - INFO - โโโ Loss: 4.7679 |
| 2025-08-31 06:59:21 - pico-train - INFO - โโโ Learning Rate: 2.62e-05 |
| 2025-08-31 06:59:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:00:14 - pico-train - INFO - Step 77000 -- ๐ Training Metrics |
| 2025-08-31 07:00:14 - pico-train - INFO - โโโ Loss: 4.7579 |
| 2025-08-31 07:00:14 - pico-train - INFO - โโโ Learning Rate: 2.60e-05 |
| 2025-08-31 07:00:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:01:05 - pico-train - INFO - Step 77100 -- ๐ Training Metrics |
| 2025-08-31 07:01:05 - pico-train - INFO - โโโ Loss: 4.7721 |
| 2025-08-31 07:01:05 - pico-train - INFO - โโโ Learning Rate: 2.58e-05 |
| 2025-08-31 07:01:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:01:57 - pico-train - INFO - Step 77200 -- ๐ Training Metrics |
| 2025-08-31 07:01:57 - pico-train - INFO - โโโ Loss: 4.7586 |
| 2025-08-31 07:01:57 - pico-train - INFO - โโโ Learning Rate: 2.55e-05 |
| 2025-08-31 07:01:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:02:48 - pico-train - INFO - Step 77300 -- ๐ Training Metrics |
| 2025-08-31 07:02:48 - pico-train - INFO - โโโ Loss: 4.7567 |
| 2025-08-31 07:02:48 - pico-train - INFO - โโโ Learning Rate: 2.53e-05 |
| 2025-08-31 07:02:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:03:40 - pico-train - INFO - Step 77400 -- ๐ Training Metrics |
| 2025-08-31 07:03:40 - pico-train - INFO - โโโ Loss: 4.7464 |
| 2025-08-31 07:03:40 - pico-train - INFO - โโโ Learning Rate: 2.51e-05 |
| 2025-08-31 07:03:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:04:32 - pico-train - INFO - Step 77500 -- ๐ Training Metrics |
| 2025-08-31 07:04:32 - pico-train - INFO - โโโ Loss: 4.7402 |
| 2025-08-31 07:04:32 - pico-train - INFO - โโโ Learning Rate: 2.49e-05 |
| 2025-08-31 07:04:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:05:23 - pico-train - INFO - Step 77600 -- ๐ Training Metrics |
| 2025-08-31 07:05:23 - pico-train - INFO - โโโ Loss: 4.8047 |
| 2025-08-31 07:05:23 - pico-train - INFO - โโโ Learning Rate: 2.47e-05 |
| 2025-08-31 07:05:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:06:15 - pico-train - INFO - Step 77700 -- ๐ Training Metrics |
| 2025-08-31 07:06:15 - pico-train - INFO - โโโ Loss: 4.8035 |
| 2025-08-31 07:06:15 - pico-train - INFO - โโโ Learning Rate: 2.45e-05 |
| 2025-08-31 07:06:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:07:06 - pico-train - INFO - Step 77800 -- ๐ Training Metrics |
| 2025-08-31 07:07:06 - pico-train - INFO - โโโ Loss: 4.7325 |
| 2025-08-31 07:07:06 - pico-train - INFO - โโโ Learning Rate: 2.43e-05 |
| 2025-08-31 07:07:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:07:58 - pico-train - INFO - Step 77900 -- ๐ Training Metrics |
| 2025-08-31 07:07:58 - pico-train - INFO - โโโ Loss: 4.6718 |
| 2025-08-31 07:07:58 - pico-train - INFO - โโโ Learning Rate: 2.41e-05 |
| 2025-08-31 07:07:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:08:51 - pico-train - INFO - Step 78000 -- 💾 Saving Checkpoint |
| 2025-08-31 07:10:39 - pico-train - INFO - Step 78000 -- ๐ Evaluation Results |
| 2025-08-31 07:10:39 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 07:10:39 - pico-train - INFO - Step 78000 -- ๐ Training Metrics |
| 2025-08-31 07:10:39 - pico-train - INFO - โโโ Loss: 4.7240 |
| 2025-08-31 07:10:39 - pico-train - INFO - โโโ Learning Rate: 2.39e-05 |
| 2025-08-31 07:10:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:10:39 - pico-train - INFO - Step 78000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 07:11:31 - pico-train - INFO - Step 78100 -- ๐ Training Metrics |
| 2025-08-31 07:11:31 - pico-train - INFO - โโโ Loss: 4.7809 |
| 2025-08-31 07:11:31 - pico-train - INFO - โโโ Learning Rate: 2.36e-05 |
| 2025-08-31 07:11:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:12:23 - pico-train - INFO - Step 78200 -- ๐ Training Metrics |
| 2025-08-31 07:12:23 - pico-train - INFO - โโโ Loss: 4.7888 |
| 2025-08-31 07:12:23 - pico-train - INFO - โโโ Learning Rate: 2.34e-05 |
| 2025-08-31 07:12:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:13:15 - pico-train - INFO - Step 78300 -- ๐ Training Metrics |
| 2025-08-31 07:13:15 - pico-train - INFO - โโโ Loss: 4.7886 |
| 2025-08-31 07:13:15 - pico-train - INFO - โโโ Learning Rate: 2.32e-05 |
| 2025-08-31 07:13:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:14:07 - pico-train - INFO - Step 78400 -- ๐ Training Metrics |
| 2025-08-31 07:14:07 - pico-train - INFO - โโโ Loss: 4.8433 |
| 2025-08-31 07:14:07 - pico-train - INFO - โโโ Learning Rate: 2.30e-05 |
| 2025-08-31 07:14:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:14:59 - pico-train - INFO - Step 78500 -- ๐ Training Metrics |
| 2025-08-31 07:14:59 - pico-train - INFO - โโโ Loss: 4.7275 |
| 2025-08-31 07:14:59 - pico-train - INFO - โโโ Learning Rate: 2.28e-05 |
| 2025-08-31 07:14:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:15:51 - pico-train - INFO - Step 78600 -- ๐ Training Metrics |
| 2025-08-31 07:15:51 - pico-train - INFO - โโโ Loss: 4.7189 |
| 2025-08-31 07:15:51 - pico-train - INFO - โโโ Learning Rate: 2.26e-05 |
| 2025-08-31 07:15:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:16:43 - pico-train - INFO - Step 78700 -- ๐ Training Metrics |
| 2025-08-31 07:16:43 - pico-train - INFO - โโโ Loss: 4.7776 |
| 2025-08-31 07:16:43 - pico-train - INFO - โโโ Learning Rate: 2.24e-05 |
| 2025-08-31 07:16:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:17:35 - pico-train - INFO - Step 78800 -- ๐ Training Metrics |
| 2025-08-31 07:17:35 - pico-train - INFO - โโโ Loss: 4.7769 |
| 2025-08-31 07:17:35 - pico-train - INFO - โโโ Learning Rate: 2.22e-05 |
| 2025-08-31 07:17:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:18:27 - pico-train - INFO - Step 78900 -- ๐ Training Metrics |
| 2025-08-31 07:18:27 - pico-train - INFO - โโโ Loss: 4.7239 |
| 2025-08-31 07:18:27 - pico-train - INFO - โโโ Learning Rate: 2.20e-05 |
| 2025-08-31 07:18:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:19:19 - pico-train - INFO - Step 79000 -- ๐ Training Metrics |
| 2025-08-31 07:19:19 - pico-train - INFO - โโโ Loss: 4.7447 |
| 2025-08-31 07:19:19 - pico-train - INFO - โโโ Learning Rate: 2.18e-05 |
| 2025-08-31 07:19:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:20:11 - pico-train - INFO - Step 79100 -- ๐ Training Metrics |
| 2025-08-31 07:20:11 - pico-train - INFO - โโโ Loss: 4.7424 |
| 2025-08-31 07:20:11 - pico-train - INFO - โโโ Learning Rate: 2.16e-05 |
| 2025-08-31 07:20:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:21:02 - pico-train - INFO - Step 79200 -- ๐ Training Metrics |
| 2025-08-31 07:21:02 - pico-train - INFO - โโโ Loss: 4.7622 |
| 2025-08-31 07:21:02 - pico-train - INFO - โโโ Learning Rate: 2.14e-05 |
| 2025-08-31 07:21:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:21:54 - pico-train - INFO - Step 79300 -- ๐ Training Metrics |
| 2025-08-31 07:21:54 - pico-train - INFO - โโโ Loss: 4.7830 |
| 2025-08-31 07:21:54 - pico-train - INFO - โโโ Learning Rate: 2.12e-05 |
| 2025-08-31 07:21:54 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:22:46 - pico-train - INFO - Step 79400 -- ๐ Training Metrics |
| 2025-08-31 07:22:46 - pico-train - INFO - โโโ Loss: 4.7487 |
| 2025-08-31 07:22:46 - pico-train - INFO - โโโ Learning Rate: 2.10e-05 |
| 2025-08-31 07:22:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:23:38 - pico-train - INFO - Step 79500 -- ๐ Training Metrics |
| 2025-08-31 07:23:38 - pico-train - INFO - โโโ Loss: 4.7489 |
| 2025-08-31 07:23:38 - pico-train - INFO - โโโ Learning Rate: 2.08e-05 |
| 2025-08-31 07:23:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:24:30 - pico-train - INFO - Step 79600 -- ๐ Training Metrics |
| 2025-08-31 07:24:30 - pico-train - INFO - โโโ Loss: 4.7076 |
| 2025-08-31 07:24:30 - pico-train - INFO - โโโ Learning Rate: 2.06e-05 |
| 2025-08-31 07:24:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:25:21 - pico-train - INFO - Step 79700 -- ๐ Training Metrics |
| 2025-08-31 07:25:21 - pico-train - INFO - โโโ Loss: 4.7456 |
| 2025-08-31 07:25:21 - pico-train - INFO - โโโ Learning Rate: 2.04e-05 |
| 2025-08-31 07:25:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:26:14 - pico-train - INFO - Step 79800 -- ๐ Training Metrics |
| 2025-08-31 07:26:14 - pico-train - INFO - โโโ Loss: 4.7621 |
| 2025-08-31 07:26:14 - pico-train - INFO - โโโ Learning Rate: 2.02e-05 |
| 2025-08-31 07:26:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:27:05 - pico-train - INFO - Step 79900 -- ๐ Training Metrics |
| 2025-08-31 07:27:05 - pico-train - INFO - โโโ Loss: 4.8204 |
| 2025-08-31 07:27:05 - pico-train - INFO - โโโ Learning Rate: 2.01e-05 |
| 2025-08-31 07:27:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:27:57 - pico-train - INFO - Step 80000 -- 💾 Saving Checkpoint |
| 2025-08-31 07:29:44 - pico-train - INFO - Step 80000 -- ๐ Evaluation Results |
| 2025-08-31 07:29:44 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 07:29:45 - pico-train - INFO - Step 80000 -- ๐ Training Metrics |
| 2025-08-31 07:29:45 - pico-train - INFO - โโโ Loss: 4.8223 |
| 2025-08-31 07:29:45 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:29:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:29:45 - pico-train - INFO - Step 80000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 07:30:37 - pico-train - INFO - Step 80100 -- ๐ Training Metrics |
| 2025-08-31 07:30:37 - pico-train - INFO - โโโ Loss: 4.7804 |
| 2025-08-31 07:30:37 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:30:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:31:29 - pico-train - INFO - Step 80200 -- ๐ Training Metrics |
| 2025-08-31 07:31:29 - pico-train - INFO - โโโ Loss: 4.7785 |
| 2025-08-31 07:31:29 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:31:29 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:32:20 - pico-train - INFO - Step 80300 -- ๐ Training Metrics |
| 2025-08-31 07:32:20 - pico-train - INFO - โโโ Loss: 4.7587 |
| 2025-08-31 07:32:20 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:32:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:33:12 - pico-train - INFO - Step 80400 -- ๐ Training Metrics |
| 2025-08-31 07:33:12 - pico-train - INFO - โโโ Loss: 4.7610 |
| 2025-08-31 07:33:12 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:33:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:34:04 - pico-train - INFO - Step 80500 -- ๐ Training Metrics |
| 2025-08-31 07:34:04 - pico-train - INFO - โโโ Loss: 4.7088 |
| 2025-08-31 07:34:04 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:34:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:34:56 - pico-train - INFO - Step 80600 -- ๐ Training Metrics |
| 2025-08-31 07:34:56 - pico-train - INFO - โโโ Loss: 4.7338 |
| 2025-08-31 07:34:56 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:34:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:35:47 - pico-train - INFO - Step 80700 -- ๐ Training Metrics |
| 2025-08-31 07:35:47 - pico-train - INFO - โโโ Loss: 4.7664 |
| 2025-08-31 07:35:47 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:35:47 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:36:39 - pico-train - INFO - Step 80800 -- ๐ Training Metrics |
| 2025-08-31 07:36:39 - pico-train - INFO - โโโ Loss: 4.2818 |
| 2025-08-31 07:36:39 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:36:39 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:37:30 - pico-train - INFO - Step 80900 -- ๐ Training Metrics |
| 2025-08-31 07:37:30 - pico-train - INFO - โโโ Loss: 4.6346 |
| 2025-08-31 07:37:30 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:37:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:38:22 - pico-train - INFO - Step 81000 -- ๐ Training Metrics |
| 2025-08-31 07:38:22 - pico-train - INFO - โโโ Loss: 4.7401 |
| 2025-08-31 07:38:22 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:38:22 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:39:14 - pico-train - INFO - Step 81100 -- ๐ Training Metrics |
| 2025-08-31 07:39:14 - pico-train - INFO - โโโ Loss: 4.7133 |
| 2025-08-31 07:39:14 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:39:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:40:05 - pico-train - INFO - Step 81200 -- ๐ Training Metrics |
| 2025-08-31 07:40:05 - pico-train - INFO - โโโ Loss: 4.7280 |
| 2025-08-31 07:40:05 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:40:05 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:40:57 - pico-train - INFO - Step 81300 -- ๐ Training Metrics |
| 2025-08-31 07:40:57 - pico-train - INFO - โโโ Loss: 4.7991 |
| 2025-08-31 07:40:57 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:40:57 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:41:48 - pico-train - INFO - Step 81400 -- ๐ Training Metrics |
| 2025-08-31 07:41:48 - pico-train - INFO - โโโ Loss: 4.6982 |
| 2025-08-31 07:41:48 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:41:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:42:40 - pico-train - INFO - Step 81500 -- ๐ Training Metrics |
| 2025-08-31 07:42:40 - pico-train - INFO - โโโ Loss: 4.7575 |
| 2025-08-31 07:42:40 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:42:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:43:32 - pico-train - INFO - Step 81600 -- ๐ Training Metrics |
| 2025-08-31 07:43:32 - pico-train - INFO - โโโ Loss: 4.7485 |
| 2025-08-31 07:43:32 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:43:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:44:23 - pico-train - INFO - Step 81700 -- ๐ Training Metrics |
| 2025-08-31 07:44:23 - pico-train - INFO - โโโ Loss: 4.7542 |
| 2025-08-31 07:44:23 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:44:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:45:18 - pico-train - INFO - Step 81800 -- ๐ Training Metrics |
| 2025-08-31 07:45:18 - pico-train - INFO - โโโ Loss: 4.7250 |
| 2025-08-31 07:45:18 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:45:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:46:09 - pico-train - INFO - Step 81900 -- ๐ Training Metrics |
| 2025-08-31 07:46:09 - pico-train - INFO - โโโ Loss: 4.7202 |
| 2025-08-31 07:46:09 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:46:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:47:00 - pico-train - INFO - Step 82000 -- 💾 Saving Checkpoint |
| 2025-08-31 07:48:48 - pico-train - INFO - Step 82000 -- ๐ Evaluation Results |
| 2025-08-31 07:48:48 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 07:48:48 - pico-train - INFO - Step 82000 -- ๐ Training Metrics |
| 2025-08-31 07:48:48 - pico-train - INFO - โโโ Loss: 4.7742 |
| 2025-08-31 07:48:48 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:48:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:48:48 - pico-train - INFO - Step 82000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 07:49:40 - pico-train - INFO - Step 82100 -- ๐ Training Metrics |
| 2025-08-31 07:49:40 - pico-train - INFO - โโโ Loss: 4.7275 |
| 2025-08-31 07:49:40 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:49:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:50:32 - pico-train - INFO - Step 82200 -- ๐ Training Metrics |
| 2025-08-31 07:50:32 - pico-train - INFO - โโโ Loss: 4.7428 |
| 2025-08-31 07:50:32 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:50:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:51:23 - pico-train - INFO - Step 82300 -- ๐ Training Metrics |
| 2025-08-31 07:51:23 - pico-train - INFO - โโโ Loss: 4.7513 |
| 2025-08-31 07:51:23 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:51:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:52:15 - pico-train - INFO - Step 82400 -- ๐ Training Metrics |
| 2025-08-31 07:52:15 - pico-train - INFO - โโโ Loss: 4.7617 |
| 2025-08-31 07:52:15 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:52:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:53:07 - pico-train - INFO - Step 82500 -- ๐ Training Metrics |
| 2025-08-31 07:53:07 - pico-train - INFO - โโโ Loss: 4.7632 |
| 2025-08-31 07:53:07 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:53:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:53:58 - pico-train - INFO - Step 82600 -- ๐ Training Metrics |
| 2025-08-31 07:53:58 - pico-train - INFO - โโโ Loss: 4.7351 |
| 2025-08-31 07:53:58 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:53:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:54:50 - pico-train - INFO - Step 82700 -- ๐ Training Metrics |
| 2025-08-31 07:54:50 - pico-train - INFO - โโโ Loss: 4.7281 |
| 2025-08-31 07:54:50 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:54:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:55:41 - pico-train - INFO - Step 82800 -- ๐ Training Metrics |
| 2025-08-31 07:55:41 - pico-train - INFO - โโโ Loss: 4.7582 |
| 2025-08-31 07:55:41 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:55:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:56:32 - pico-train - INFO - Step 82900 -- ๐ Training Metrics |
| 2025-08-31 07:56:32 - pico-train - INFO - โโโ Loss: 4.6581 |
| 2025-08-31 07:56:32 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:56:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:57:24 - pico-train - INFO - Step 83000 -- ๐ Training Metrics |
| 2025-08-31 07:57:24 - pico-train - INFO - โโโ Loss: 4.7060 |
| 2025-08-31 07:57:24 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:57:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:58:16 - pico-train - INFO - Step 83100 -- ๐ Training Metrics |
| 2025-08-31 07:58:16 - pico-train - INFO - โโโ Loss: 4.7217 |
| 2025-08-31 07:58:16 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:58:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:59:07 - pico-train - INFO - Step 83200 -- ๐ Training Metrics |
| 2025-08-31 07:59:07 - pico-train - INFO - โโโ Loss: 4.7407 |
| 2025-08-31 07:59:07 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:59:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 07:59:59 - pico-train - INFO - Step 83300 -- ๐ Training Metrics |
| 2025-08-31 07:59:59 - pico-train - INFO - โโโ Loss: 4.7347 |
| 2025-08-31 07:59:59 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 07:59:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:00:50 - pico-train - INFO - Step 83400 -- ๐ Training Metrics |
| 2025-08-31 08:00:50 - pico-train - INFO - โโโ Loss: 4.7371 |
| 2025-08-31 08:00:50 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:00:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:01:42 - pico-train - INFO - Step 83500 -- ๐ Training Metrics |
| 2025-08-31 08:01:42 - pico-train - INFO - โโโ Loss: 4.7134 |
| 2025-08-31 08:01:42 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:01:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:02:34 - pico-train - INFO - Step 83600 -- ๐ Training Metrics |
| 2025-08-31 08:02:34 - pico-train - INFO - โโโ Loss: 4.7324 |
| 2025-08-31 08:02:34 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:02:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:03:25 - pico-train - INFO - Step 83700 -- ๐ Training Metrics |
| 2025-08-31 08:03:25 - pico-train - INFO - โโโ Loss: 4.6926 |
| 2025-08-31 08:03:25 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:03:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:04:16 - pico-train - INFO - Step 83800 -- ๐ Training Metrics |
| 2025-08-31 08:04:16 - pico-train - INFO - โโโ Loss: 4.7118 |
| 2025-08-31 08:04:16 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:04:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:05:08 - pico-train - INFO - Step 83900 -- ๐ Training Metrics |
| 2025-08-31 08:05:08 - pico-train - INFO - โโโ Loss: 4.7378 |
| 2025-08-31 08:05:08 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:05:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:06:00 - pico-train - INFO - Step 84000 -- 💾 Saving Checkpoint |
| 2025-08-31 08:07:47 - pico-train - INFO - Step 84000 -- ๐ Evaluation Results |
| 2025-08-31 08:07:47 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 08:07:48 - pico-train - INFO - Step 84000 -- ๐ Training Metrics |
| 2025-08-31 08:07:48 - pico-train - INFO - โโโ Loss: 4.7013 |
| 2025-08-31 08:07:48 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:07:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:07:48 - pico-train - INFO - Step 84000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 08:08:40 - pico-train - INFO - Step 84100 -- ๐ Training Metrics |
| 2025-08-31 08:08:40 - pico-train - INFO - โโโ Loss: 4.6884 |
| 2025-08-31 08:08:40 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:08:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:09:32 - pico-train - INFO - Step 84200 -- ๐ Training Metrics |
| 2025-08-31 08:09:32 - pico-train - INFO - โโโ Loss: 4.6894 |
| 2025-08-31 08:09:32 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:09:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:10:23 - pico-train - INFO - Step 84300 -- ๐ Training Metrics |
| 2025-08-31 08:10:23 - pico-train - INFO - โโโ Loss: 4.7470 |
| 2025-08-31 08:10:23 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:10:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:11:14 - pico-train - INFO - Step 84400 -- ๐ Training Metrics |
| 2025-08-31 08:11:14 - pico-train - INFO - โโโ Loss: 4.7089 |
| 2025-08-31 08:11:14 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:11:14 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:12:06 - pico-train - INFO - Step 84500 -- ๐ Training Metrics |
| 2025-08-31 08:12:06 - pico-train - INFO - โโโ Loss: 4.6500 |
| 2025-08-31 08:12:06 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:12:06 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:12:58 - pico-train - INFO - Step 84600 -- ๐ Training Metrics |
| 2025-08-31 08:12:58 - pico-train - INFO - โโโ Loss: 4.6849 |
| 2025-08-31 08:12:58 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:12:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:13:49 - pico-train - INFO - Step 84700 -- ๐ Training Metrics |
| 2025-08-31 08:13:49 - pico-train - INFO - โโโ Loss: 4.6797 |
| 2025-08-31 08:13:49 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:13:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:14:41 - pico-train - INFO - Step 84800 -- ๐ Training Metrics |
| 2025-08-31 08:14:41 - pico-train - INFO - โโโ Loss: 4.7071 |
| 2025-08-31 08:14:41 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:14:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:15:32 - pico-train - INFO - Step 84900 -- ๐ Training Metrics |
| 2025-08-31 08:15:32 - pico-train - INFO - โโโ Loss: 4.6975 |
| 2025-08-31 08:15:32 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:15:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:16:25 - pico-train - INFO - Step 85000 -- ๐ Training Metrics |
| 2025-08-31 08:16:25 - pico-train - INFO - โโโ Loss: 4.7164 |
| 2025-08-31 08:16:25 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:16:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:17:16 - pico-train - INFO - Step 85100 -- ๐ Training Metrics |
| 2025-08-31 08:17:16 - pico-train - INFO - โโโ Loss: 4.6050 |
| 2025-08-31 08:17:16 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:17:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:18:07 - pico-train - INFO - Step 85200 -- ๐ Training Metrics |
| 2025-08-31 08:18:07 - pico-train - INFO - โโโ Loss: 4.6708 |
| 2025-08-31 08:18:07 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:18:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:18:59 - pico-train - INFO - Step 85300 -- ๐ Training Metrics |
| 2025-08-31 08:18:59 - pico-train - INFO - โโโ Loss: 4.6323 |
| 2025-08-31 08:18:59 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:18:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:19:51 - pico-train - INFO - Step 85400 -- ๐ Training Metrics |
| 2025-08-31 08:19:51 - pico-train - INFO - โโโ Loss: 4.5546 |
| 2025-08-31 08:19:51 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:19:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:20:43 - pico-train - INFO - Step 85500 -- ๐ Training Metrics |
| 2025-08-31 08:20:43 - pico-train - INFO - โโโ Loss: 4.7022 |
| 2025-08-31 08:20:43 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:20:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:21:34 - pico-train - INFO - Step 85600 -- ๐ Training Metrics |
| 2025-08-31 08:21:34 - pico-train - INFO - โโโ Loss: 4.6936 |
| 2025-08-31 08:21:34 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:21:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:22:26 - pico-train - INFO - Step 85700 -- ๐ Training Metrics |
| 2025-08-31 08:22:26 - pico-train - INFO - โโโ Loss: 4.7058 |
| 2025-08-31 08:22:26 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:22:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:23:17 - pico-train - INFO - Step 85800 -- ๐ Training Metrics |
| 2025-08-31 08:23:17 - pico-train - INFO - โโโ Loss: 4.7049 |
| 2025-08-31 08:23:17 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:23:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:24:09 - pico-train - INFO - Step 85900 -- ๐ Training Metrics |
| 2025-08-31 08:24:09 - pico-train - INFO - โโโ Loss: 4.6506 |
| 2025-08-31 08:24:09 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:24:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:25:00 - pico-train - INFO - Step 86000 -- 💾 Saving Checkpoint |
| 2025-08-31 08:26:48 - pico-train - INFO - Step 86000 -- ๐ Evaluation Results |
| 2025-08-31 08:26:48 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 08:26:49 - pico-train - INFO - Step 86000 -- ๐ Training Metrics |
| 2025-08-31 08:26:49 - pico-train - INFO - โโโ Loss: 4.6790 |
| 2025-08-31 08:26:49 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:26:49 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:26:49 - pico-train - INFO - Step 86000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 08:27:41 - pico-train - INFO - Step 86100 -- ๐ Training Metrics |
| 2025-08-31 08:27:41 - pico-train - INFO - โโโ Loss: 4.7205 |
| 2025-08-31 08:27:41 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:27:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:28:32 - pico-train - INFO - Step 86200 -- ๐ Training Metrics |
| 2025-08-31 08:28:32 - pico-train - INFO - โโโ Loss: 4.7132 |
| 2025-08-31 08:28:32 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:28:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:29:23 - pico-train - INFO - Step 86300 -- ๐ Training Metrics |
| 2025-08-31 08:29:23 - pico-train - INFO - โโโ Loss: 4.6258 |
| 2025-08-31 08:29:23 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:29:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:30:15 - pico-train - INFO - Step 86400 -- ๐ Training Metrics |
| 2025-08-31 08:30:15 - pico-train - INFO - โโโ Loss: 4.7446 |
| 2025-08-31 08:30:15 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:30:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:31:07 - pico-train - INFO - Step 86500 -- ๐ Training Metrics |
| 2025-08-31 08:31:07 - pico-train - INFO - โโโ Loss: 4.6696 |
| 2025-08-31 08:31:07 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:31:07 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:31:58 - pico-train - INFO - Step 86600 -- ๐ Training Metrics |
| 2025-08-31 08:31:58 - pico-train - INFO - โโโ Loss: 4.7031 |
| 2025-08-31 08:31:58 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:31:58 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:32:50 - pico-train - INFO - Step 86700 -- ๐ Training Metrics |
| 2025-08-31 08:32:50 - pico-train - INFO - โโโ Loss: 4.7437 |
| 2025-08-31 08:32:50 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:32:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:33:41 - pico-train - INFO - Step 86800 -- ๐ Training Metrics |
| 2025-08-31 08:33:41 - pico-train - INFO - โโโ Loss: 4.7354 |
| 2025-08-31 08:33:41 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:33:41 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:34:33 - pico-train - INFO - Step 86900 -- ๐ Training Metrics |
| 2025-08-31 08:34:33 - pico-train - INFO - โโโ Loss: 4.7387 |
| 2025-08-31 08:34:33 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:34:33 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:35:25 - pico-train - INFO - Step 87000 -- ๐ Training Metrics |
| 2025-08-31 08:35:25 - pico-train - INFO - โโโ Loss: 4.7328 |
| 2025-08-31 08:35:25 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:35:25 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:36:16 - pico-train - INFO - Step 87100 -- ๐ Training Metrics |
| 2025-08-31 08:36:16 - pico-train - INFO - โโโ Loss: 4.7011 |
| 2025-08-31 08:36:16 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:36:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:37:08 - pico-train - INFO - Step 87200 -- ๐ Training Metrics |
| 2025-08-31 08:37:08 - pico-train - INFO - โโโ Loss: 4.7231 |
| 2025-08-31 08:37:08 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:37:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:38:00 - pico-train - INFO - Step 87300 -- ๐ Training Metrics |
| 2025-08-31 08:38:00 - pico-train - INFO - โโโ Loss: 4.7254 |
| 2025-08-31 08:38:00 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:38:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:38:52 - pico-train - INFO - Step 87400 -- ๐ Training Metrics |
| 2025-08-31 08:38:52 - pico-train - INFO - โโโ Loss: 4.7838 |
| 2025-08-31 08:38:52 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:38:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:39:44 - pico-train - INFO - Step 87500 -- ๐ Training Metrics |
| 2025-08-31 08:39:44 - pico-train - INFO - โโโ Loss: 4.7237 |
| 2025-08-31 08:39:44 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:39:44 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:40:35 - pico-train - INFO - Step 87600 -- ๐ Training Metrics |
| 2025-08-31 08:40:35 - pico-train - INFO - โโโ Loss: 4.7589 |
| 2025-08-31 08:40:35 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:40:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:41:26 - pico-train - INFO - Step 87700 -- ๐ Training Metrics |
| 2025-08-31 08:41:26 - pico-train - INFO - โโโ Loss: 4.7597 |
| 2025-08-31 08:41:26 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:41:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:42:18 - pico-train - INFO - Step 87800 -- ๐ Training Metrics |
| 2025-08-31 08:42:18 - pico-train - INFO - โโโ Loss: 4.7013 |
| 2025-08-31 08:42:18 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:42:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:43:10 - pico-train - INFO - Step 87900 -- ๐ Training Metrics |
| 2025-08-31 08:43:10 - pico-train - INFO - โโโ Loss: 4.6875 |
| 2025-08-31 08:43:10 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:43:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:44:02 - pico-train - INFO - Step 88000 -- 💾 Saving Checkpoint |
| 2025-08-31 08:45:50 - pico-train - INFO - Step 88000 -- ๐ Evaluation Results |
| 2025-08-31 08:45:50 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 08:45:50 - pico-train - INFO - Step 88000 -- ๐ Training Metrics |
| 2025-08-31 08:45:50 - pico-train - INFO - โโโ Loss: 4.7735 |
| 2025-08-31 08:45:50 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:45:50 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:45:50 - pico-train - INFO - Step 88000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 08:46:42 - pico-train - INFO - Step 88100 -- ๐ Training Metrics |
| 2025-08-31 08:46:42 - pico-train - INFO - โโโ Loss: 4.7450 |
| 2025-08-31 08:46:42 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:46:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:47:34 - pico-train - INFO - Step 88200 -- ๐ Training Metrics |
| 2025-08-31 08:47:34 - pico-train - INFO - โโโ Loss: 4.7223 |
| 2025-08-31 08:47:34 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:47:34 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:48:26 - pico-train - INFO - Step 88300 -- ๐ Training Metrics |
| 2025-08-31 08:48:26 - pico-train - INFO - โโโ Loss: 4.7561 |
| 2025-08-31 08:48:26 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:48:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:49:17 - pico-train - INFO - Step 88400 -- ๐ Training Metrics |
| 2025-08-31 08:49:17 - pico-train - INFO - โโโ Loss: 4.7080 |
| 2025-08-31 08:49:17 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:49:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:50:09 - pico-train - INFO - Step 88500 -- ๐ Training Metrics |
| 2025-08-31 08:50:09 - pico-train - INFO - โโโ Loss: 4.6788 |
| 2025-08-31 08:50:09 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:50:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:51:00 - pico-train - INFO - Step 88600 -- ๐ Training Metrics |
| 2025-08-31 08:51:00 - pico-train - INFO - โโโ Loss: 4.6932 |
| 2025-08-31 08:51:00 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:51:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:51:52 - pico-train - INFO - Step 88700 -- ๐ Training Metrics |
| 2025-08-31 08:51:52 - pico-train - INFO - โโโ Loss: 4.7393 |
| 2025-08-31 08:51:52 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:51:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:52:43 - pico-train - INFO - Step 88800 -- ๐ Training Metrics |
| 2025-08-31 08:52:43 - pico-train - INFO - โโโ Loss: 4.7357 |
| 2025-08-31 08:52:43 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:52:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:53:35 - pico-train - INFO - Step 88900 -- ๐ Training Metrics |
| 2025-08-31 08:53:35 - pico-train - INFO - โโโ Loss: 4.7379 |
| 2025-08-31 08:53:35 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:53:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:54:27 - pico-train - INFO - Step 89000 -- ๐ Training Metrics |
| 2025-08-31 08:54:27 - pico-train - INFO - โโโ Loss: 4.7562 |
| 2025-08-31 08:54:27 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:54:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:55:19 - pico-train - INFO - Step 89100 -- ๐ Training Metrics |
| 2025-08-31 08:55:19 - pico-train - INFO - โโโ Loss: 4.7427 |
| 2025-08-31 08:55:19 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:55:19 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:56:10 - pico-train - INFO - Step 89200 -- ๐ Training Metrics |
| 2025-08-31 08:56:10 - pico-train - INFO - โโโ Loss: 4.7273 |
| 2025-08-31 08:56:10 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:56:10 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:57:02 - pico-train - INFO - Step 89300 -- ๐ Training Metrics |
| 2025-08-31 08:57:02 - pico-train - INFO - โโโ Loss: 4.7312 |
| 2025-08-31 08:57:02 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:57:02 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:57:53 - pico-train - INFO - Step 89400 -- ๐ Training Metrics |
| 2025-08-31 08:57:53 - pico-train - INFO - โโโ Loss: 4.6950 |
| 2025-08-31 08:57:53 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:57:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:58:45 - pico-train - INFO - Step 89500 -- ๐ Training Metrics |
| 2025-08-31 08:58:45 - pico-train - INFO - โโโ Loss: 4.6980 |
| 2025-08-31 08:58:45 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:58:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 08:59:37 - pico-train - INFO - Step 89600 -- ๐ Training Metrics |
| 2025-08-31 08:59:37 - pico-train - INFO - โโโ Loss: 4.6993 |
| 2025-08-31 08:59:37 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 08:59:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:00:28 - pico-train - INFO - Step 89700 -- ๐ Training Metrics |
| 2025-08-31 09:00:28 - pico-train - INFO - โโโ Loss: 4.6600 |
| 2025-08-31 09:00:28 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:00:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:01:20 - pico-train - INFO - Step 89800 -- ๐ Training Metrics |
| 2025-08-31 09:01:20 - pico-train - INFO - โโโ Loss: 4.7641 |
| 2025-08-31 09:01:20 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:01:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:02:11 - pico-train - INFO - Step 89900 -- ๐ Training Metrics |
| 2025-08-31 09:02:11 - pico-train - INFO - โโโ Loss: 4.7180 |
| 2025-08-31 09:02:11 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:02:11 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:03:03 - pico-train - INFO - Step 90000 -- 💾 Saving Checkpoint |
| 2025-08-31 09:04:53 - pico-train - INFO - Step 90000 -- ๐ Evaluation Results |
| 2025-08-31 09:04:53 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 09:04:53 - pico-train - INFO - Step 90000 -- ๐ Training Metrics |
| 2025-08-31 09:04:53 - pico-train - INFO - โโโ Loss: 4.6701 |
| 2025-08-31 09:04:53 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:04:53 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:04:53 - pico-train - INFO - Step 90000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 09:05:45 - pico-train - INFO - Step 90100 -- ๐ Training Metrics |
| 2025-08-31 09:05:45 - pico-train - INFO - โโโ Loss: 4.7339 |
| 2025-08-31 09:05:45 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:05:45 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:06:37 - pico-train - INFO - Step 90200 -- ๐ Training Metrics |
| 2025-08-31 09:06:37 - pico-train - INFO - โโโ Loss: 4.7007 |
| 2025-08-31 09:06:37 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:06:37 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:07:28 - pico-train - INFO - Step 90300 -- ๐ Training Metrics |
| 2025-08-31 09:07:28 - pico-train - INFO - โโโ Loss: 4.7463 |
| 2025-08-31 09:07:28 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:07:28 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:08:20 - pico-train - INFO - Step 90400 -- ๐ Training Metrics |
| 2025-08-31 09:08:20 - pico-train - INFO - โโโ Loss: 4.7707 |
| 2025-08-31 09:08:20 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:08:20 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:09:12 - pico-train - INFO - Step 90500 -- ๐ Training Metrics |
| 2025-08-31 09:09:12 - pico-train - INFO - โโโ Loss: 4.7085 |
| 2025-08-31 09:09:12 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:09:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:10:03 - pico-train - INFO - Step 90600 -- ๐ Training Metrics |
| 2025-08-31 09:10:03 - pico-train - INFO - โโโ Loss: 4.7055 |
| 2025-08-31 09:10:03 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:10:03 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:10:55 - pico-train - INFO - Step 90700 -- ๐ Training Metrics |
| 2025-08-31 09:10:55 - pico-train - INFO - โโโ Loss: 4.7187 |
| 2025-08-31 09:10:55 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:10:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:11:46 - pico-train - INFO - Step 90800 -- ๐ Training Metrics |
| 2025-08-31 09:11:46 - pico-train - INFO - โโโ Loss: 4.7100 |
| 2025-08-31 09:11:46 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:11:46 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:12:38 - pico-train - INFO - Step 90900 -- ๐ Training Metrics |
| 2025-08-31 09:12:38 - pico-train - INFO - โโโ Loss: 4.7254 |
| 2025-08-31 09:12:38 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:12:38 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:13:30 - pico-train - INFO - Step 91000 -- ๐ Training Metrics |
| 2025-08-31 09:13:30 - pico-train - INFO - โโโ Loss: 4.7218 |
| 2025-08-31 09:13:30 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:13:30 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:14:21 - pico-train - INFO - Step 91100 -- ๐ Training Metrics |
| 2025-08-31 09:14:21 - pico-train - INFO - โโโ Loss: 4.7145 |
| 2025-08-31 09:14:21 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:14:21 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:15:12 - pico-train - INFO - Step 91200 -- ๐ Training Metrics |
| 2025-08-31 09:15:12 - pico-train - INFO - โโโ Loss: 4.7425 |
| 2025-08-31 09:15:12 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:15:12 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:16:04 - pico-train - INFO - Step 91300 -- ๐ Training Metrics |
| 2025-08-31 09:16:04 - pico-train - INFO - โโโ Loss: 4.7399 |
| 2025-08-31 09:16:04 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:16:04 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:16:56 - pico-train - INFO - Step 91400 -- ๐ Training Metrics |
| 2025-08-31 09:16:56 - pico-train - INFO - โโโ Loss: 4.7377 |
| 2025-08-31 09:16:56 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:16:56 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:17:48 - pico-train - INFO - Step 91500 -- ๐ Training Metrics |
| 2025-08-31 09:17:48 - pico-train - INFO - โโโ Loss: 4.7343 |
| 2025-08-31 09:17:48 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:17:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:18:40 - pico-train - INFO - Step 91600 -- ๐ Training Metrics |
| 2025-08-31 09:18:40 - pico-train - INFO - โโโ Loss: 4.7149 |
| 2025-08-31 09:18:40 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:18:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:19:31 - pico-train - INFO - Step 91700 -- ๐ Training Metrics |
| 2025-08-31 09:19:31 - pico-train - INFO - โโโ Loss: 4.7143 |
| 2025-08-31 09:19:31 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:19:31 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:20:23 - pico-train - INFO - Step 91800 -- ๐ Training Metrics |
| 2025-08-31 09:20:23 - pico-train - INFO - โโโ Loss: 4.7069 |
| 2025-08-31 09:20:23 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:20:23 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:21:15 - pico-train - INFO - Step 91900 -- ๐ Training Metrics |
| 2025-08-31 09:21:15 - pico-train - INFO - โโโ Loss: 4.6903 |
| 2025-08-31 09:21:15 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:21:15 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:22:06 - pico-train - INFO - Step 92000 -- 💾 Saving Checkpoint |
| 2025-08-31 09:23:55 - pico-train - INFO - Step 92000 -- ๐ Evaluation Results |
| 2025-08-31 09:23:55 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 09:23:55 - pico-train - INFO - Step 92000 -- ๐ Training Metrics |
| 2025-08-31 09:23:55 - pico-train - INFO - โโโ Loss: 4.7150 |
| 2025-08-31 09:23:55 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:23:55 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:23:55 - pico-train - INFO - Step 92000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 09:24:48 - pico-train - INFO - Step 92100 -- ๐ Training Metrics |
| 2025-08-31 09:24:48 - pico-train - INFO - โโโ Loss: 4.7068 |
| 2025-08-31 09:24:48 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:24:48 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:25:40 - pico-train - INFO - Step 92200 -- ๐ Training Metrics |
| 2025-08-31 09:25:40 - pico-train - INFO - โโโ Loss: 4.7260 |
| 2025-08-31 09:25:40 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:25:40 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:26:32 - pico-train - INFO - Step 92300 -- ๐ Training Metrics |
| 2025-08-31 09:26:32 - pico-train - INFO - โโโ Loss: 4.7208 |
| 2025-08-31 09:26:32 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:26:32 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:27:24 - pico-train - INFO - Step 92400 -- ๐ Training Metrics |
| 2025-08-31 09:27:24 - pico-train - INFO - โโโ Loss: 4.6989 |
| 2025-08-31 09:27:24 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:27:24 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:28:16 - pico-train - INFO - Step 92500 -- ๐ Training Metrics |
| 2025-08-31 09:28:16 - pico-train - INFO - โโโ Loss: 4.7195 |
| 2025-08-31 09:28:16 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:28:16 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:29:08 - pico-train - INFO - Step 92600 -- ๐ Training Metrics |
| 2025-08-31 09:29:08 - pico-train - INFO - โโโ Loss: 4.6940 |
| 2025-08-31 09:29:08 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:29:08 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:29:59 - pico-train - INFO - Step 92700 -- ๐ Training Metrics |
| 2025-08-31 09:29:59 - pico-train - INFO - โโโ Loss: 4.7561 |
| 2025-08-31 09:29:59 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:29:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:30:51 - pico-train - INFO - Step 92800 -- ๐ Training Metrics |
| 2025-08-31 09:30:51 - pico-train - INFO - โโโ Loss: 4.6793 |
| 2025-08-31 09:30:51 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:30:51 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:31:42 - pico-train - INFO - Step 92900 -- ๐ Training Metrics |
| 2025-08-31 09:31:42 - pico-train - INFO - โโโ Loss: 4.7148 |
| 2025-08-31 09:31:42 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:31:42 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:32:35 - pico-train - INFO - Step 93000 -- ๐ Training Metrics |
| 2025-08-31 09:32:35 - pico-train - INFO - โโโ Loss: 4.7254 |
| 2025-08-31 09:32:35 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:32:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:33:26 - pico-train - INFO - Step 93100 -- ๐ Training Metrics |
| 2025-08-31 09:33:26 - pico-train - INFO - โโโ Loss: 4.6995 |
| 2025-08-31 09:33:26 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:33:26 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:34:17 - pico-train - INFO - Step 93200 -- ๐ Training Metrics |
| 2025-08-31 09:34:17 - pico-train - INFO - โโโ Loss: 4.7255 |
| 2025-08-31 09:34:17 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:34:17 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:35:09 - pico-train - INFO - Step 93300 -- ๐ Training Metrics |
| 2025-08-31 09:35:09 - pico-train - INFO - โโโ Loss: 4.7238 |
| 2025-08-31 09:35:09 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:35:09 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:36:00 - pico-train - INFO - Step 93400 -- ๐ Training Metrics |
| 2025-08-31 09:36:00 - pico-train - INFO - โโโ Loss: 4.7211 |
| 2025-08-31 09:36:00 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:36:00 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:36:52 - pico-train - INFO - Step 93500 -- ๐ Training Metrics |
| 2025-08-31 09:36:52 - pico-train - INFO - โโโ Loss: 4.6741 |
| 2025-08-31 09:36:52 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:36:52 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:37:43 - pico-train - INFO - Step 93600 -- ๐ Training Metrics |
| 2025-08-31 09:37:43 - pico-train - INFO - โโโ Loss: 4.7281 |
| 2025-08-31 09:37:43 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:37:43 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:38:35 - pico-train - INFO - Step 93700 -- ๐ Training Metrics |
| 2025-08-31 09:38:35 - pico-train - INFO - โโโ Loss: 4.6642 |
| 2025-08-31 09:38:35 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:38:35 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:39:27 - pico-train - INFO - Step 93800 -- ๐ Training Metrics |
| 2025-08-31 09:39:27 - pico-train - INFO - โโโ Loss: 4.7126 |
| 2025-08-31 09:39:27 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:39:27 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:40:18 - pico-train - INFO - Step 93900 -- ๐ Training Metrics |
| 2025-08-31 09:40:18 - pico-train - INFO - โโโ Loss: 4.7402 |
| 2025-08-31 09:40:18 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:40:18 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:41:09 - pico-train - INFO - Step 94000 -- 💾 Saving Checkpoint |
| 2025-08-31 09:42:58 - pico-train - INFO - Step 94000 -- ๐ Evaluation Results |
| 2025-08-31 09:42:58 - pico-train - INFO - โโโ paloma: inf |
| 2025-08-31 09:42:59 - pico-train - INFO - Step 94000 -- ๐ Training Metrics |
| 2025-08-31 09:42:59 - pico-train - INFO - โโโ Loss: 4.7579 |
| 2025-08-31 09:42:59 - pico-train - INFO - โโโ Learning Rate: 2.00e-05 |
| 2025-08-31 09:42:59 - pico-train - INFO - โโโ Inf/NaN count: 0 |
| 2025-08-31 09:42:59 - pico-train - INFO - Step 94000 -- ๐ Saving Learning Dynamics |
| 2025-08-31 09:43:51 - pico-train - INFO - Step 94100 -- Training Metrics |
| 2025-08-31 09:43:51 - pico-train - INFO - ├── Loss: 4.7224 |
| 2025-08-31 09:43:51 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:43:51 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:44:43 - pico-train - INFO - Step 94200 -- Training Metrics |
| 2025-08-31 09:44:43 - pico-train - INFO - ├── Loss: 4.7131 |
| 2025-08-31 09:44:43 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:44:43 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:45:35 - pico-train - INFO - Step 94300 -- Training Metrics |
| 2025-08-31 09:45:35 - pico-train - INFO - ├── Loss: 4.6659 |
| 2025-08-31 09:45:35 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:45:35 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:46:26 - pico-train - INFO - Step 94400 -- Training Metrics |
| 2025-08-31 09:46:26 - pico-train - INFO - ├── Loss: 4.6991 |
| 2025-08-31 09:46:26 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:46:26 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:47:18 - pico-train - INFO - Step 94500 -- Training Metrics |
| 2025-08-31 09:47:18 - pico-train - INFO - ├── Loss: 4.7049 |
| 2025-08-31 09:47:18 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:47:18 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:48:10 - pico-train - INFO - Step 94600 -- Training Metrics |
| 2025-08-31 09:48:10 - pico-train - INFO - ├── Loss: 4.6958 |
| 2025-08-31 09:48:10 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:48:10 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:49:02 - pico-train - INFO - Step 94700 -- Training Metrics |
| 2025-08-31 09:49:02 - pico-train - INFO - ├── Loss: 4.6892 |
| 2025-08-31 09:49:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:49:02 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:49:53 - pico-train - INFO - Step 94800 -- Training Metrics |
| 2025-08-31 09:49:53 - pico-train - INFO - ├── Loss: 4.6371 |
| 2025-08-31 09:49:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:49:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:50:45 - pico-train - INFO - Step 94900 -- Training Metrics |
| 2025-08-31 09:50:45 - pico-train - INFO - ├── Loss: 4.7253 |
| 2025-08-31 09:50:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:50:45 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:51:37 - pico-train - INFO - Step 95000 -- Training Metrics |
| 2025-08-31 09:51:37 - pico-train - INFO - ├── Loss: 4.7112 |
| 2025-08-31 09:51:37 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:51:37 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:52:29 - pico-train - INFO - Step 95100 -- Training Metrics |
| 2025-08-31 09:52:29 - pico-train - INFO - ├── Loss: 4.7578 |
| 2025-08-31 09:52:29 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:52:29 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:53:20 - pico-train - INFO - Step 95200 -- Training Metrics |
| 2025-08-31 09:53:20 - pico-train - INFO - ├── Loss: 4.7148 |
| 2025-08-31 09:53:20 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:53:20 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:54:12 - pico-train - INFO - Step 95300 -- Training Metrics |
| 2025-08-31 09:54:12 - pico-train - INFO - ├── Loss: 4.6893 |
| 2025-08-31 09:54:12 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:54:12 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:55:03 - pico-train - INFO - Step 95400 -- Training Metrics |
| 2025-08-31 09:55:03 - pico-train - INFO - ├── Loss: 4.6751 |
| 2025-08-31 09:55:03 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:55:03 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:55:56 - pico-train - INFO - Step 95500 -- Training Metrics |
| 2025-08-31 09:55:56 - pico-train - INFO - ├── Loss: 4.7003 |
| 2025-08-31 09:55:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:55:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:56:47 - pico-train - INFO - Step 95600 -- Training Metrics |
| 2025-08-31 09:56:47 - pico-train - INFO - ├── Loss: 4.7165 |
| 2025-08-31 09:56:47 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:56:47 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:57:39 - pico-train - INFO - Step 95700 -- Training Metrics |
| 2025-08-31 09:57:39 - pico-train - INFO - ├── Loss: 4.6655 |
| 2025-08-31 09:57:39 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:57:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:58:30 - pico-train - INFO - Step 95800 -- Training Metrics |
| 2025-08-31 09:58:30 - pico-train - INFO - ├── Loss: 4.7263 |
| 2025-08-31 09:58:30 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:58:30 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 09:59:22 - pico-train - INFO - Step 95900 -- Training Metrics |
| 2025-08-31 09:59:22 - pico-train - INFO - ├── Loss: 4.6837 |
| 2025-08-31 09:59:22 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 09:59:22 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:00:13 - pico-train - INFO - Step 96000 -- Saving Checkpoint |
| 2025-08-31 10:02:02 - pico-train - INFO - Step 96000 -- Evaluation Results |
| 2025-08-31 10:02:02 - pico-train - INFO - └── paloma: inf |
| 2025-08-31 10:02:02 - pico-train - INFO - Step 96000 -- Training Metrics |
| 2025-08-31 10:02:02 - pico-train - INFO - ├── Loss: 4.7159 |
| 2025-08-31 10:02:02 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:02:02 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:02:02 - pico-train - INFO - Step 96000 -- Saving Learning Dynamics |
| 2025-08-31 10:02:55 - pico-train - INFO - Step 96100 -- Training Metrics |
| 2025-08-31 10:02:55 - pico-train - INFO - ├── Loss: 4.7156 |
| 2025-08-31 10:02:55 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:02:55 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:03:47 - pico-train - INFO - Step 96200 -- Training Metrics |
| 2025-08-31 10:03:47 - pico-train - INFO - ├── Loss: 4.7599 |
| 2025-08-31 10:03:47 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:03:47 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:04:38 - pico-train - INFO - Step 96300 -- Training Metrics |
| 2025-08-31 10:04:38 - pico-train - INFO - ├── Loss: 4.7298 |
| 2025-08-31 10:04:38 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:04:38 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:05:30 - pico-train - INFO - Step 96400 -- Training Metrics |
| 2025-08-31 10:05:30 - pico-train - INFO - ├── Loss: 4.7523 |
| 2025-08-31 10:05:30 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:05:30 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:06:23 - pico-train - INFO - Step 96500 -- Training Metrics |
| 2025-08-31 10:06:23 - pico-train - INFO - ├── Loss: 4.7078 |
| 2025-08-31 10:06:23 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:06:23 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:07:15 - pico-train - INFO - Step 96600 -- Training Metrics |
| 2025-08-31 10:07:15 - pico-train - INFO - ├── Loss: 4.6924 |
| 2025-08-31 10:07:15 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:07:15 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:08:06 - pico-train - INFO - Step 96700 -- Training Metrics |
| 2025-08-31 10:08:06 - pico-train - INFO - ├── Loss: 4.7354 |
| 2025-08-31 10:08:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:08:06 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:08:58 - pico-train - INFO - Step 96800 -- Training Metrics |
| 2025-08-31 10:08:58 - pico-train - INFO - ├── Loss: 4.7358 |
| 2025-08-31 10:08:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:08:58 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:09:49 - pico-train - INFO - Step 96900 -- Training Metrics |
| 2025-08-31 10:09:49 - pico-train - INFO - ├── Loss: 4.7259 |
| 2025-08-31 10:09:49 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:09:49 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:10:42 - pico-train - INFO - Step 97000 -- Training Metrics |
| 2025-08-31 10:10:42 - pico-train - INFO - ├── Loss: 4.7149 |
| 2025-08-31 10:10:42 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:10:42 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:11:33 - pico-train - INFO - Step 97100 -- Training Metrics |
| 2025-08-31 10:11:33 - pico-train - INFO - ├── Loss: 4.7222 |
| 2025-08-31 10:11:33 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:11:33 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:12:25 - pico-train - INFO - Step 97200 -- Training Metrics |
| 2025-08-31 10:12:25 - pico-train - INFO - ├── Loss: 4.6903 |
| 2025-08-31 10:12:25 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:12:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:13:17 - pico-train - INFO - Step 97300 -- Training Metrics |
| 2025-08-31 10:13:17 - pico-train - INFO - ├── Loss: 4.7370 |
| 2025-08-31 10:13:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:13:17 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:14:09 - pico-train - INFO - Step 97400 -- Training Metrics |
| 2025-08-31 10:14:09 - pico-train - INFO - ├── Loss: 4.7236 |
| 2025-08-31 10:14:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:14:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:15:01 - pico-train - INFO - Step 97500 -- Training Metrics |
| 2025-08-31 10:15:01 - pico-train - INFO - ├── Loss: 4.6993 |
| 2025-08-31 10:15:01 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:15:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:15:53 - pico-train - INFO - Step 97600 -- Training Metrics |
| 2025-08-31 10:15:53 - pico-train - INFO - ├── Loss: 4.7178 |
| 2025-08-31 10:15:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:15:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:16:44 - pico-train - INFO - Step 97700 -- Training Metrics |
| 2025-08-31 10:16:44 - pico-train - INFO - ├── Loss: 4.6855 |
| 2025-08-31 10:16:44 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:16:44 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:17:36 - pico-train - INFO - Step 97800 -- Training Metrics |
| 2025-08-31 10:17:36 - pico-train - INFO - ├── Loss: 4.6959 |
| 2025-08-31 10:17:36 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:17:36 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:18:28 - pico-train - INFO - Step 97900 -- Training Metrics |
| 2025-08-31 10:18:28 - pico-train - INFO - ├── Loss: 4.6574 |
| 2025-08-31 10:18:28 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:18:28 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:19:20 - pico-train - INFO - Step 98000 -- Saving Checkpoint |
| 2025-08-31 10:21:20 - pico-train - INFO - Step 98000 -- Evaluation Results |
| 2025-08-31 10:21:20 - pico-train - INFO - └── paloma: inf |
| 2025-08-31 10:21:21 - pico-train - INFO - Step 98000 -- Training Metrics |
| 2025-08-31 10:21:21 - pico-train - INFO - ├── Loss: 4.6908 |
| 2025-08-31 10:21:21 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:21:21 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:21:21 - pico-train - INFO - Step 98000 -- Saving Learning Dynamics |
| 2025-08-31 10:22:13 - pico-train - INFO - Step 98100 -- Training Metrics |
| 2025-08-31 10:22:13 - pico-train - INFO - ├── Loss: 4.6721 |
| 2025-08-31 10:22:13 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:22:13 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:23:05 - pico-train - INFO - Step 98200 -- Training Metrics |
| 2025-08-31 10:23:05 - pico-train - INFO - ├── Loss: 4.6259 |
| 2025-08-31 10:23:05 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:23:05 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:23:56 - pico-train - INFO - Step 98300 -- Training Metrics |
| 2025-08-31 10:23:56 - pico-train - INFO - ├── Loss: 4.7160 |
| 2025-08-31 10:23:56 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:23:56 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:24:47 - pico-train - INFO - Step 98400 -- Training Metrics |
| 2025-08-31 10:24:47 - pico-train - INFO - ├── Loss: 4.6780 |
| 2025-08-31 10:24:47 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:24:47 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:25:39 - pico-train - INFO - Step 98500 -- Training Metrics |
| 2025-08-31 10:25:39 - pico-train - INFO - ├── Loss: 4.6724 |
| 2025-08-31 10:25:39 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:25:39 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:26:31 - pico-train - INFO - Step 98600 -- Training Metrics |
| 2025-08-31 10:26:31 - pico-train - INFO - ├── Loss: 4.6849 |
| 2025-08-31 10:26:31 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:26:31 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:27:22 - pico-train - INFO - Step 98700 -- Training Metrics |
| 2025-08-31 10:27:22 - pico-train - INFO - ├── Loss: 4.7412 |
| 2025-08-31 10:27:22 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:27:22 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:28:14 - pico-train - INFO - Step 98800 -- Training Metrics |
| 2025-08-31 10:28:14 - pico-train - INFO - ├── Loss: 4.6764 |
| 2025-08-31 10:28:14 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:28:14 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:29:06 - pico-train - INFO - Step 98900 -- Training Metrics |
| 2025-08-31 10:29:06 - pico-train - INFO - ├── Loss: 4.6885 |
| 2025-08-31 10:29:06 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:29:06 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:29:58 - pico-train - INFO - Step 99000 -- Training Metrics |
| 2025-08-31 10:29:58 - pico-train - INFO - ├── Loss: 4.6226 |
| 2025-08-31 10:29:58 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:29:58 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:30:50 - pico-train - INFO - Step 99100 -- Training Metrics |
| 2025-08-31 10:30:50 - pico-train - INFO - ├── Loss: 4.6555 |
| 2025-08-31 10:30:50 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:30:50 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:31:41 - pico-train - INFO - Step 99200 -- Training Metrics |
| 2025-08-31 10:31:41 - pico-train - INFO - ├── Loss: 4.7045 |
| 2025-08-31 10:31:41 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:31:41 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:32:33 - pico-train - INFO - Step 99300 -- Training Metrics |
| 2025-08-31 10:32:33 - pico-train - INFO - ├── Loss: 4.6697 |
| 2025-08-31 10:32:33 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:32:33 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:33:25 - pico-train - INFO - Step 99400 -- Training Metrics |
| 2025-08-31 10:33:25 - pico-train - INFO - ├── Loss: 4.7305 |
| 2025-08-31 10:33:25 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:33:25 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:34:17 - pico-train - INFO - Step 99500 -- Training Metrics |
| 2025-08-31 10:34:17 - pico-train - INFO - ├── Loss: 4.7240 |
| 2025-08-31 10:34:17 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:34:17 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:35:09 - pico-train - INFO - Step 99600 -- Training Metrics |
| 2025-08-31 10:35:09 - pico-train - INFO - ├── Loss: 4.7108 |
| 2025-08-31 10:35:09 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:35:09 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:36:01 - pico-train - INFO - Step 99700 -- Training Metrics |
| 2025-08-31 10:36:01 - pico-train - INFO - ├── Loss: 4.6812 |
| 2025-08-31 10:36:01 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:36:01 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:36:53 - pico-train - INFO - Step 99800 -- Training Metrics |
| 2025-08-31 10:36:53 - pico-train - INFO - ├── Loss: 4.6736 |
| 2025-08-31 10:36:53 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:36:53 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:37:45 - pico-train - INFO - Step 99900 -- Training Metrics |
| 2025-08-31 10:37:45 - pico-train - INFO - ├── Loss: 4.7339 |
| 2025-08-31 10:37:45 - pico-train - INFO - ├── Learning Rate: 2.00e-05 |
| 2025-08-31 10:37:45 - pico-train - INFO - └── Inf/NaN count: 0 |
| 2025-08-31 10:38:36 - pico-train - INFO - Step 100000 -- Saving Checkpoint |
| 2025-08-31 10:40:26 - pico-train - INFO - Step 100000 -- Evaluation Results |
| 2025-08-31 10:40:26 - pico-train - INFO - └── paloma: inf |
| 2025-08-31 10:40:26 - pico-train - INFO - Training complete! Final step: 100000 |