| 2025-08-30 15:43:27 - pico-train - INFO - Step 62000 -- Evaluation Results | |
| 2025-08-30 15:43:27 - pico-train - INFO - └── paloma: inf | |
| 2025-08-30 15:43:28 - pico-train - INFO - ================================================== | |
| 2025-08-30 15:43:28 - pico-train - INFO - ✨ Training Configuration | |
| 2025-08-30 15:43:28 - pico-train - INFO - ================================================== | |
| 2025-08-30 15:43:28 - pico-train - INFO - ╭─────────────────────────────────────────────────────╮ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │ checkpointing:                                      │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   checkpoints_dir: checkpoints                      │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   evaluation:                                       │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     eval_results_dir: eval_results                  │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   fabric_checkpoint_dir: fabric_state               │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   fabric_checkpoint_filename: checkpoint.pt         │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   hf_checkpoint:                                    │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     collection_slug: null                           │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     repo_id: ThomasTheMaker/pico-decoder-tiny       │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   learning_dynamics:                                │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 1                                   │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     eval_data: null                                 │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     layer_suffixes:                                 │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     - attention.v_proj                              │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     - attention.o_proj                              │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     - swiglu.w_2                                    │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     sequence_idx: -1                                │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   learning_dynamics_dir: learning_dynamics          │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   logs_dir: logs                                    │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   run_name: pico-decoder-tiny-dolma10M-v1           │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   runs_dir: runs                                    │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   save_every_n_steps: 2000                          │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   save_to_hf: true                                  │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   training:                                         │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     auto_resume: true                               │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │ data:                                               │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   dataloader:                                       │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 16                                  │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   dataset:                                          │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     name: ThomasTheMaker/pretokenized-dolma-10M     │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   tokenizer:                                        │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     name: allenai/OLMo-7B-0724-hf                   │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     vocab_size: 50304                               │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │ evaluation:                                         │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   metrics:                                          │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   - paloma                                          │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   paloma:                                           │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     batch_size: 1                                   │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     dataset_name: pico-lm/pretokenized-paloma-tinsy │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     dataset_split: val                              │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     max_length: 2048                                │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │ model:                                              │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   activation_hidden_dim: 384                        │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   attention_n_heads: 12                             │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   attention_n_kv_heads: 4                           │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   batch_size: 1024                                  │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   d_model: 96                                       │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   max_seq_len: 2048                                 │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   model_type: pico_decoder                          │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   n_layers: 12                                      │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   norm_eps: 1.0e-06                                 │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   position_emb_theta: 10000.0                       │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   vocab_size: 50304                                 │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │ monitoring:                                         │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   logging:                                          │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     log_every_n_steps: 100                          │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     log_level: INFO                                 │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   save_to_wandb: false                              │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   wandb:                                            │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     entity: boymyc                                  │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     project: pico-decoder-tiny                      │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │ training:                                           │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   fabric:                                           │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     accelerator: cuda                               │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     num_devices: 1                                  │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     num_nodes: 1                                    │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     precision: bf16-mixed                           │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   max_steps: 100000                                 │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │   optimization:                                     │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     gradient_accumulation_steps: 1                  │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     lr: 0.0002                                      │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     lr_scheduler: cosine                            │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     lr_warmup_steps: 2000                           │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │     optimizer: adamw                                │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - │                                                     │ | |
| 2025-08-30 15:43:28 - pico-train - INFO - ╰─────────────────────────────────────────────────────╯ | |
| 2025-08-30 15:43:28 - pico-train - INFO - ================================================== | |
| 2025-08-30 15:43:28 - pico-train - INFO - ⛭ Runtime Summary: | |
| 2025-08-30 15:43:28 - pico-train - INFO - ================================================== | |
| 2025-08-30 15:43:28 - pico-train - INFO - Starting from step: 62000 | |
| 2025-08-30 15:43:28 - pico-train - INFO - Model Setup: | |
| 2025-08-30 15:43:28 - pico-train - INFO - ├─ Total Parameters: 11,282,784 | |
| 2025-08-30 15:43:28 - pico-train - INFO - └─ Trainable Parameters: 11,282,784 | |
| 2025-08-30 15:43:28 - pico-train - INFO - Distributed Setup: | |
| 2025-08-30 15:43:28 - pico-train - INFO - ├─ Number of Devices: 1 | |
| 2025-08-30 15:43:28 - pico-train - INFO - ├─ Device Type: NVIDIA H100 80GB HBM3 | |
| 2025-08-30 15:43:28 - pico-train - INFO - └─ Available Memory: 85.03 GB | |
| 2025-08-30 15:43:28 - pico-train - INFO - Software Setup: | |
| 2025-08-30 15:43:28 - pico-train - INFO - ├─ Python Version: 3.12.3 | |
| 2025-08-30 15:43:28 - pico-train - INFO - ├─ PyTorch Version: 2.8.0+cu128 | |
| 2025-08-30 15:43:28 - pico-train - INFO - ├─ CUDA Version: 12.8 | |
| 2025-08-30 15:43:28 - pico-train - INFO - └─ Operating System: Linux 6.8.0-71-generic | |
| 2025-08-30 15:43:28 - pico-train - INFO - Batch Size Configuration: | |
| 2025-08-30 15:43:28 - pico-train - INFO - ├─ Global Batch Size: 16 | |
| 2025-08-30 15:43:28 - pico-train - INFO - ├─ Per Device Batch Size: 16 | |
| 2025-08-30 15:43:28 - pico-train - INFO - └─ Gradient Accumulation Steps: 1 | |
| 2025-08-30 15:43:28 - pico-train - INFO - ================================================== | |
| 2025-08-30 15:43:29 - pico-train - INFO - Step 62000 -- Training Metrics | |
| 2025-08-30 15:43:29 - pico-train - INFO - ├── Loss: 4.5970 | |
| 2025-08-30 15:43:29 - pico-train - INFO - ├── Learning Rate: 6.55e-05 | |
| 2025-08-30 15:43:29 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:43:29 - pico-train - INFO - Step 62000 -- Saving Learning Dynamics | |
| 2025-08-30 15:44:25 - pico-train - INFO - Step 62100 -- Training Metrics | |
| 2025-08-30 15:44:25 - pico-train - INFO - ├── Loss: 4.8133 | |
| 2025-08-30 15:44:25 - pico-train - INFO - ├── Learning Rate: 6.52e-05 | |
| 2025-08-30 15:44:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:45:17 - pico-train - INFO - Step 62200 -- Training Metrics | |
| 2025-08-30 15:45:17 - pico-train - INFO - ├── Loss: 4.8221 | |
| 2025-08-30 15:45:17 - pico-train - INFO - ├── Learning Rate: 6.49e-05 | |
| 2025-08-30 15:45:17 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:46:09 - pico-train - INFO - Step 62300 -- Training Metrics | |
| 2025-08-30 15:46:09 - pico-train - INFO - ├── Loss: 4.8068 | |
| 2025-08-30 15:46:09 - pico-train - INFO - ├── Learning Rate: 6.46e-05 | |
| 2025-08-30 15:46:09 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:47:01 - pico-train - INFO - Step 62400 -- Training Metrics | |
| 2025-08-30 15:47:01 - pico-train - INFO - ├── Loss: 4.7858 | |
| 2025-08-30 15:47:01 - pico-train - INFO - ├── Learning Rate: 6.43e-05 | |
| 2025-08-30 15:47:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:47:53 - pico-train - INFO - Step 62500 -- Training Metrics | |
| 2025-08-30 15:47:53 - pico-train - INFO - ├── Loss: 4.8460 | |
| 2025-08-30 15:47:53 - pico-train - INFO - ├── Learning Rate: 6.40e-05 | |
| 2025-08-30 15:47:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:48:45 - pico-train - INFO - Step 62600 -- Training Metrics | |
| 2025-08-30 15:48:45 - pico-train - INFO - ├── Loss: 4.8264 | |
| 2025-08-30 15:48:45 - pico-train - INFO - ├── Learning Rate: 6.37e-05 | |
| 2025-08-30 15:48:45 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:49:37 - pico-train - INFO - Step 62700 -- Training Metrics | |
| 2025-08-30 15:49:37 - pico-train - INFO - ├── Loss: 4.8266 | |
| 2025-08-30 15:49:37 - pico-train - INFO - ├── Learning Rate: 6.34e-05 | |
| 2025-08-30 15:49:37 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:50:29 - pico-train - INFO - Step 62800 -- Training Metrics | |
| 2025-08-30 15:50:29 - pico-train - INFO - ├── Loss: 4.8317 | |
| 2025-08-30 15:50:29 - pico-train - INFO - ├── Learning Rate: 6.31e-05 | |
| 2025-08-30 15:50:29 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:51:20 - pico-train - INFO - Step 62900 -- Training Metrics | |
| 2025-08-30 15:51:20 - pico-train - INFO - ├── Loss: 4.8337 | |
| 2025-08-30 15:51:20 - pico-train - INFO - ├── Learning Rate: 6.28e-05 | |
| 2025-08-30 15:51:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:52:12 - pico-train - INFO - Step 63000 -- Training Metrics | |
| 2025-08-30 15:52:12 - pico-train - INFO - ├── Loss: 4.8183 | |
| 2025-08-30 15:52:12 - pico-train - INFO - ├── Learning Rate: 6.25e-05 | |
| 2025-08-30 15:52:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:53:04 - pico-train - INFO - Step 63100 -- Training Metrics | |
| 2025-08-30 15:53:04 - pico-train - INFO - ├── Loss: 4.8177 | |
| 2025-08-30 15:53:04 - pico-train - INFO - ├── Learning Rate: 6.22e-05 | |
| 2025-08-30 15:53:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:53:56 - pico-train - INFO - Step 63200 -- Training Metrics | |
| 2025-08-30 15:53:56 - pico-train - INFO - ├── Loss: 4.8094 | |
| 2025-08-30 15:53:56 - pico-train - INFO - ├── Learning Rate: 6.19e-05 | |
| 2025-08-30 15:53:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:54:48 - pico-train - INFO - Step 63300 -- Training Metrics | |
| 2025-08-30 15:54:48 - pico-train - INFO - ├── Loss: 4.8294 | |
| 2025-08-30 15:54:48 - pico-train - INFO - ├── Learning Rate: 6.16e-05 | |
| 2025-08-30 15:54:48 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:55:40 - pico-train - INFO - Step 63400 -- Training Metrics | |
| 2025-08-30 15:55:40 - pico-train - INFO - ├── Loss: 4.8073 | |
| 2025-08-30 15:55:40 - pico-train - INFO - ├── Learning Rate: 6.13e-05 | |
| 2025-08-30 15:55:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:56:32 - pico-train - INFO - Step 63500 -- Training Metrics | |
| 2025-08-30 15:56:32 - pico-train - INFO - ├── Loss: 4.8364 | |
| 2025-08-30 15:56:32 - pico-train - INFO - ├── Learning Rate: 6.10e-05 | |
| 2025-08-30 15:56:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:57:23 - pico-train - INFO - Step 63600 -- Training Metrics | |
| 2025-08-30 15:57:23 - pico-train - INFO - ├── Loss: 4.8236 | |
| 2025-08-30 15:57:23 - pico-train - INFO - ├── Learning Rate: 6.07e-05 | |
| 2025-08-30 15:57:23 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:58:15 - pico-train - INFO - Step 63700 -- Training Metrics | |
| 2025-08-30 15:58:15 - pico-train - INFO - ├── Loss: 4.8114 | |
| 2025-08-30 15:58:15 - pico-train - INFO - ├── Learning Rate: 6.04e-05 | |
| 2025-08-30 15:58:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:59:07 - pico-train - INFO - Step 63800 -- Training Metrics | |
| 2025-08-30 15:59:07 - pico-train - INFO - ├── Loss: 4.8078 | |
| 2025-08-30 15:59:07 - pico-train - INFO - ├── Learning Rate: 6.01e-05 | |
| 2025-08-30 15:59:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 15:59:59 - pico-train - INFO - Step 63900 -- Training Metrics | |
| 2025-08-30 15:59:59 - pico-train - INFO - ├── Loss: 4.8107 | |
| 2025-08-30 15:59:59 - pico-train - INFO - ├── Learning Rate: 5.98e-05 | |
| 2025-08-30 15:59:59 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:00:50 - pico-train - INFO - Step 64000 -- 💾 Saving Checkpoint | |
| 2025-08-30 16:02:54 - pico-train - INFO - Step 64000 -- Evaluation Results | |
| 2025-08-30 16:02:54 - pico-train - INFO - └── paloma: inf | |
| 2025-08-30 16:02:56 - pico-train - INFO - Step 64000 -- Training Metrics | |
| 2025-08-30 16:02:56 - pico-train - INFO - ├── Loss: 4.8145 | |
| 2025-08-30 16:02:56 - pico-train - INFO - ├── Learning Rate: 5.95e-05 | |
| 2025-08-30 16:02:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:02:56 - pico-train - INFO - Step 64000 -- Saving Learning Dynamics | |
| 2025-08-30 16:03:52 - pico-train - INFO - Step 64100 -- Training Metrics | |
| 2025-08-30 16:03:52 - pico-train - INFO - ├── Loss: 4.8479 | |
| 2025-08-30 16:03:52 - pico-train - INFO - ├── Learning Rate: 5.92e-05 | |
| 2025-08-30 16:03:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:04:44 - pico-train - INFO - Step 64200 -- Training Metrics | |
| 2025-08-30 16:04:44 - pico-train - INFO - ├── Loss: 4.8139 | |
| 2025-08-30 16:04:44 - pico-train - INFO - ├── Learning Rate: 5.89e-05 | |
| 2025-08-30 16:04:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:05:36 - pico-train - INFO - Step 64300 -- Training Metrics | |
| 2025-08-30 16:05:36 - pico-train - INFO - ├── Loss: 4.7867 | |
| 2025-08-30 16:05:36 - pico-train - INFO - ├── Learning Rate: 5.86e-05 | |
| 2025-08-30 16:05:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:06:28 - pico-train - INFO - Step 64400 -- Training Metrics | |
| 2025-08-30 16:06:28 - pico-train - INFO - ├── Loss: 4.8168 | |
| 2025-08-30 16:06:28 - pico-train - INFO - ├── Learning Rate: 5.84e-05 | |
| 2025-08-30 16:06:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:07:20 - pico-train - INFO - Step 64500 -- Training Metrics | |
| 2025-08-30 16:07:20 - pico-train - INFO - ├── Loss: 4.8131 | |
| 2025-08-30 16:07:20 - pico-train - INFO - ├── Learning Rate: 5.81e-05 | |
| 2025-08-30 16:07:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:08:12 - pico-train - INFO - Step 64600 -- Training Metrics | |
| 2025-08-30 16:08:12 - pico-train - INFO - ├── Loss: 4.8285 | |
| 2025-08-30 16:08:12 - pico-train - INFO - ├── Learning Rate: 5.78e-05 | |
| 2025-08-30 16:08:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:09:04 - pico-train - INFO - Step 64700 -- Training Metrics | |
| 2025-08-30 16:09:04 - pico-train - INFO - ├── Loss: 4.8170 | |
| 2025-08-30 16:09:04 - pico-train - INFO - ├── Learning Rate: 5.75e-05 | |
| 2025-08-30 16:09:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:09:56 - pico-train - INFO - Step 64800 -- Training Metrics | |
| 2025-08-30 16:09:56 - pico-train - INFO - ├── Loss: 4.8317 | |
| 2025-08-30 16:09:56 - pico-train - INFO - ├── Learning Rate: 5.72e-05 | |
| 2025-08-30 16:09:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:10:48 - pico-train - INFO - Step 64900 -- Training Metrics | |
| 2025-08-30 16:10:48 - pico-train - INFO - ├── Loss: 4.8368 | |
| 2025-08-30 16:10:48 - pico-train - INFO - ├── Learning Rate: 5.69e-05 | |
| 2025-08-30 16:10:48 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:11:40 - pico-train - INFO - Step 65000 -- Training Metrics | |
| 2025-08-30 16:11:40 - pico-train - INFO - ├── Loss: 4.8129 | |
| 2025-08-30 16:11:40 - pico-train - INFO - ├── Learning Rate: 5.66e-05 | |
| 2025-08-30 16:11:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:12:32 - pico-train - INFO - Step 65100 -- Training Metrics | |
| 2025-08-30 16:12:32 - pico-train - INFO - ├── Loss: 4.8226 | |
| 2025-08-30 16:12:32 - pico-train - INFO - ├── Learning Rate: 5.63e-05 | |
| 2025-08-30 16:12:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:13:24 - pico-train - INFO - Step 65200 -- Training Metrics | |
| 2025-08-30 16:13:24 - pico-train - INFO - ├── Loss: 4.8321 | |
| 2025-08-30 16:13:24 - pico-train - INFO - ├── Learning Rate: 5.60e-05 | |
| 2025-08-30 16:13:24 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:14:16 - pico-train - INFO - Step 65300 -- Training Metrics | |
| 2025-08-30 16:14:16 - pico-train - INFO - ├── Loss: 4.8352 | |
| 2025-08-30 16:14:16 - pico-train - INFO - ├── Learning Rate: 5.57e-05 | |
| 2025-08-30 16:14:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:15:08 - pico-train - INFO - Step 65400 -- Training Metrics | |
| 2025-08-30 16:15:08 - pico-train - INFO - ├── Loss: 4.8119 | |
| 2025-08-30 16:15:08 - pico-train - INFO - ├── Learning Rate: 5.55e-05 | |
| 2025-08-30 16:15:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:16:00 - pico-train - INFO - Step 65500 -- Training Metrics | |
| 2025-08-30 16:16:00 - pico-train - INFO - ├── Loss: 4.7889 | |
| 2025-08-30 16:16:00 - pico-train - INFO - ├── Learning Rate: 5.52e-05 | |
| 2025-08-30 16:16:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:16:52 - pico-train - INFO - Step 65600 -- Training Metrics | |
| 2025-08-30 16:16:52 - pico-train - INFO - ├── Loss: 4.8119 | |
| 2025-08-30 16:16:52 - pico-train - INFO - ├── Learning Rate: 5.49e-05 | |
| 2025-08-30 16:16:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:17:44 - pico-train - INFO - Step 65700 -- Training Metrics | |
| 2025-08-30 16:17:44 - pico-train - INFO - ├── Loss: 4.8193 | |
| 2025-08-30 16:17:44 - pico-train - INFO - ├── Learning Rate: 5.46e-05 | |
| 2025-08-30 16:17:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:18:35 - pico-train - INFO - Step 65800 -- Training Metrics | |
| 2025-08-30 16:18:35 - pico-train - INFO - ├── Loss: 4.8121 | |
| 2025-08-30 16:18:35 - pico-train - INFO - ├── Learning Rate: 5.43e-05 | |
| 2025-08-30 16:18:35 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:19:27 - pico-train - INFO - Step 65900 -- Training Metrics | |
| 2025-08-30 16:19:27 - pico-train - INFO - ├── Loss: 4.8057 | |
| 2025-08-30 16:19:27 - pico-train - INFO - ├── Learning Rate: 5.40e-05 | |
| 2025-08-30 16:19:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:20:19 - pico-train - INFO - Step 66000 -- 💾 Saving Checkpoint | |
| 2025-08-30 16:22:18 - pico-train - INFO - Step 66000 -- Evaluation Results | |
| 2025-08-30 16:22:18 - pico-train - INFO - └── paloma: inf | |
| 2025-08-30 16:22:20 - pico-train - INFO - Step 66000 -- Training Metrics | |
| 2025-08-30 16:22:20 - pico-train - INFO - ├── Loss: 4.8260 | |
| 2025-08-30 16:22:20 - pico-train - INFO - ├── Learning Rate: 5.37e-05 | |
| 2025-08-30 16:22:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:22:20 - pico-train - INFO - Step 66000 -- Saving Learning Dynamics | |
| 2025-08-30 16:23:16 - pico-train - INFO - Step 66100 -- Training Metrics | |
| 2025-08-30 16:23:16 - pico-train - INFO - ├── Loss: 4.8110 | |
| 2025-08-30 16:23:16 - pico-train - INFO - ├── Learning Rate: 5.35e-05 | |
| 2025-08-30 16:23:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:24:09 - pico-train - INFO - Step 66200 -- Training Metrics | |
| 2025-08-30 16:24:09 - pico-train - INFO - ├── Loss: 4.8156 | |
| 2025-08-30 16:24:09 - pico-train - INFO - ├── Learning Rate: 5.32e-05 | |
| 2025-08-30 16:24:09 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:25:02 - pico-train - INFO - Step 66300 -- Training Metrics | |
| 2025-08-30 16:25:02 - pico-train - INFO - ├── Loss: 4.7928 | |
| 2025-08-30 16:25:02 - pico-train - INFO - ├── Learning Rate: 5.29e-05 | |
| 2025-08-30 16:25:02 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:25:55 - pico-train - INFO - Step 66400 -- Training Metrics | |
| 2025-08-30 16:25:55 - pico-train - INFO - ├── Loss: 4.8202 | |
| 2025-08-30 16:25:55 - pico-train - INFO - ├── Learning Rate: 5.26e-05 | |
| 2025-08-30 16:25:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:26:49 - pico-train - INFO - Step 66500 -- Training Metrics | |
| 2025-08-30 16:26:49 - pico-train - INFO - ├── Loss: 4.8117 | |
| 2025-08-30 16:26:49 - pico-train - INFO - ├── Learning Rate: 5.23e-05 | |
| 2025-08-30 16:26:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:27:42 - pico-train - INFO - Step 66600 -- Training Metrics | |
| 2025-08-30 16:27:42 - pico-train - INFO - ├── Loss: 4.8047 | |
| 2025-08-30 16:27:42 - pico-train - INFO - ├── Learning Rate: 5.20e-05 | |
| 2025-08-30 16:27:42 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:28:34 - pico-train - INFO - Step 66700 -- Training Metrics | |
| 2025-08-30 16:28:34 - pico-train - INFO - ├── Loss: 4.7995 | |
| 2025-08-30 16:28:34 - pico-train - INFO - ├── Learning Rate: 5.18e-05 | |
| 2025-08-30 16:28:34 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:29:28 - pico-train - INFO - Step 66800 -- Training Metrics | |
| 2025-08-30 16:29:28 - pico-train - INFO - ├── Loss: 4.8074 | |
| 2025-08-30 16:29:28 - pico-train - INFO - ├── Learning Rate: 5.15e-05 | |
| 2025-08-30 16:29:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:30:21 - pico-train - INFO - Step 66900 -- Training Metrics | |
| 2025-08-30 16:30:21 - pico-train - INFO - ├── Loss: 4.7890 | |
| 2025-08-30 16:30:21 - pico-train - INFO - ├── Learning Rate: 5.12e-05 | |
| 2025-08-30 16:30:21 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:31:14 - pico-train - INFO - Step 67000 -- Training Metrics | |
| 2025-08-30 16:31:14 - pico-train - INFO - ├── Loss: 4.8216 | |
| 2025-08-30 16:31:14 - pico-train - INFO - ├── Learning Rate: 5.09e-05 | |
| 2025-08-30 16:31:14 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:32:07 - pico-train - INFO - Step 67100 -- Training Metrics | |
| 2025-08-30 16:32:07 - pico-train - INFO - ├── Loss: 4.8034 | |
| 2025-08-30 16:32:07 - pico-train - INFO - ├── Learning Rate: 5.06e-05 | |
| 2025-08-30 16:32:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:32:59 - pico-train - INFO - Step 67200 -- Training Metrics | |
| 2025-08-30 16:32:59 - pico-train - INFO - ├── Loss: 4.8062 | |
| 2025-08-30 16:32:59 - pico-train - INFO - ├── Learning Rate: 5.04e-05 | |
| 2025-08-30 16:32:59 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:33:51 - pico-train - INFO - Step 67300 -- Training Metrics | |
| 2025-08-30 16:33:51 - pico-train - INFO - ├── Loss: 4.8106 | |
| 2025-08-30 16:33:51 - pico-train - INFO - ├── Learning Rate: 5.01e-05 | |
| 2025-08-30 16:33:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:34:43 - pico-train - INFO - Step 67400 -- Training Metrics | |
| 2025-08-30 16:34:43 - pico-train - INFO - ├── Loss: 4.8168 | |
| 2025-08-30 16:34:43 - pico-train - INFO - ├── Learning Rate: 4.98e-05 | |
| 2025-08-30 16:34:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:35:36 - pico-train - INFO - Step 67500 -- Training Metrics | |
| 2025-08-30 16:35:36 - pico-train - INFO - ├── Loss: 4.7968 | |
| 2025-08-30 16:35:36 - pico-train - INFO - ├── Learning Rate: 4.95e-05 | |
| 2025-08-30 16:35:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:36:27 - pico-train - INFO - Step 67600 -- Training Metrics | |
| 2025-08-30 16:36:27 - pico-train - INFO - ├── Loss: 4.7905 | |
| 2025-08-30 16:36:27 - pico-train - INFO - ├── Learning Rate: 4.93e-05 | |
| 2025-08-30 16:36:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:37:19 - pico-train - INFO - Step 67700 -- Training Metrics | |
| 2025-08-30 16:37:19 - pico-train - INFO - ├── Loss: 4.8253 | |
| 2025-08-30 16:37:19 - pico-train - INFO - ├── Learning Rate: 4.90e-05 | |
| 2025-08-30 16:37:19 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:38:11 - pico-train - INFO - Step 67800 -- Training Metrics | |
| 2025-08-30 16:38:11 - pico-train - INFO - ├── Loss: 4.7848 | |
| 2025-08-30 16:38:11 - pico-train - INFO - ├── Learning Rate: 4.87e-05 | |
| 2025-08-30 16:38:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:39:03 - pico-train - INFO - Step 67900 -- Training Metrics | |
| 2025-08-30 16:39:03 - pico-train - INFO - ├── Loss: 4.8165 | |
| 2025-08-30 16:39:03 - pico-train - INFO - ├── Learning Rate: 4.84e-05 | |
| 2025-08-30 16:39:03 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:39:55 - pico-train - INFO - Step 68000 -- 💾 Saving Checkpoint | |
| 2025-08-30 16:42:09 - pico-train - INFO - Step 68000 -- Evaluation Results | |
| 2025-08-30 16:42:09 - pico-train - INFO - └── paloma: inf | |
| 2025-08-30 16:42:10 - pico-train - INFO - Step 68000 -- Training Metrics | |
| 2025-08-30 16:42:10 - pico-train - INFO - ├── Loss: 4.8264 | |
| 2025-08-30 16:42:10 - pico-train - INFO - ├── Learning Rate: 4.82e-05 | |
| 2025-08-30 16:42:10 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:42:10 - pico-train - INFO - Step 68000 -- Saving Learning Dynamics | |
| 2025-08-30 16:43:07 - pico-train - INFO - Step 68100 -- Training Metrics | |
| 2025-08-30 16:43:07 - pico-train - INFO - ├── Loss: 4.8363 | |
| 2025-08-30 16:43:07 - pico-train - INFO - ├── Learning Rate: 4.79e-05 | |
| 2025-08-30 16:43:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:43:59 - pico-train - INFO - Step 68200 -- Training Metrics | |
| 2025-08-30 16:43:59 - pico-train - INFO - ├── Loss: 4.7964 | |
| 2025-08-30 16:43:59 - pico-train - INFO - ├── Learning Rate: 4.76e-05 | |
| 2025-08-30 16:43:59 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:44:51 - pico-train - INFO - Step 68300 -- Training Metrics | |
| 2025-08-30 16:44:51 - pico-train - INFO - ├── Loss: 4.7999 | |
| 2025-08-30 16:44:51 - pico-train - INFO - ├── Learning Rate: 4.73e-05 | |
| 2025-08-30 16:44:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:45:43 - pico-train - INFO - Step 68400 -- Training Metrics | |
| 2025-08-30 16:45:43 - pico-train - INFO - ├── Loss: 4.8119 | |
| 2025-08-30 16:45:43 - pico-train - INFO - ├── Learning Rate: 4.71e-05 | |
| 2025-08-30 16:45:43 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:46:35 - pico-train - INFO - Step 68500 -- Training Metrics | |
| 2025-08-30 16:46:35 - pico-train - INFO - ├── Loss: 4.7998 | |
| 2025-08-30 16:46:35 - pico-train - INFO - ├── Learning Rate: 4.68e-05 | |
| 2025-08-30 16:46:35 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:47:27 - pico-train - INFO - Step 68600 -- Training Metrics | |
| 2025-08-30 16:47:27 - pico-train - INFO - ├── Loss: 4.8010 | |
| 2025-08-30 16:47:27 - pico-train - INFO - ├── Learning Rate: 4.65e-05 | |
| 2025-08-30 16:47:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:48:19 - pico-train - INFO - Step 68700 -- Training Metrics | |
| 2025-08-30 16:48:19 - pico-train - INFO - ├── Loss: 4.7986 | |
| 2025-08-30 16:48:19 - pico-train - INFO - ├── Learning Rate: 4.63e-05 | |
| 2025-08-30 16:48:19 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:49:12 - pico-train - INFO - Step 68800 -- Training Metrics | |
| 2025-08-30 16:49:12 - pico-train - INFO - ├── Loss: 4.8133 | |
| 2025-08-30 16:49:12 - pico-train - INFO - ├── Learning Rate: 4.60e-05 | |
| 2025-08-30 16:49:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:50:05 - pico-train - INFO - Step 68900 -- Training Metrics | |
| 2025-08-30 16:50:05 - pico-train - INFO - ├── Loss: 4.7944 | |
| 2025-08-30 16:50:05 - pico-train - INFO - ├── Learning Rate: 4.57e-05 | |
| 2025-08-30 16:50:05 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:50:58 - pico-train - INFO - Step 69000 -- Training Metrics | |
| 2025-08-30 16:50:58 - pico-train - INFO - ├── Loss: 4.8021 | |
| 2025-08-30 16:50:58 - pico-train - INFO - ├── Learning Rate: 4.54e-05 | |
| 2025-08-30 16:50:58 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:51:51 - pico-train - INFO - Step 69100 -- Training Metrics | |
| 2025-08-30 16:51:51 - pico-train - INFO - ├── Loss: 4.7611 | |
| 2025-08-30 16:51:51 - pico-train - INFO - ├── Learning Rate: 4.52e-05 | |
| 2025-08-30 16:51:51 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:52:44 - pico-train - INFO - Step 69200 -- Training Metrics | |
| 2025-08-30 16:52:44 - pico-train - INFO - ├── Loss: 4.7981 | |
| 2025-08-30 16:52:44 - pico-train - INFO - ├── Learning Rate: 4.49e-05 | |
| 2025-08-30 16:52:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:53:38 - pico-train - INFO - Step 69300 -- Training Metrics | |
| 2025-08-30 16:53:38 - pico-train - INFO - ├── Loss: 4.8066 | |
| 2025-08-30 16:53:38 - pico-train - INFO - ├── Learning Rate: 4.46e-05 | |
| 2025-08-30 16:53:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:54:31 - pico-train - INFO - Step 69400 -- Training Metrics | |
| 2025-08-30 16:54:31 - pico-train - INFO - ├── Loss: 4.8053 | |
| 2025-08-30 16:54:31 - pico-train - INFO - ├── Learning Rate: 4.44e-05 | |
| 2025-08-30 16:54:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:55:23 - pico-train - INFO - Step 69500 -- Training Metrics | |
| 2025-08-30 16:55:23 - pico-train - INFO - ├── Loss: 4.7953 | |
| 2025-08-30 16:55:23 - pico-train - INFO - ├── Learning Rate: 4.41e-05 | |
| 2025-08-30 16:55:23 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:56:16 - pico-train - INFO - Step 69600 -- Training Metrics | |
| 2025-08-30 16:56:16 - pico-train - INFO - ├── Loss: 4.8087 | |
| 2025-08-30 16:56:16 - pico-train - INFO - ├── Learning Rate: 4.38e-05 | |
| 2025-08-30 16:56:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:57:10 - pico-train - INFO - Step 69700 -- Training Metrics | |
| 2025-08-30 16:57:10 - pico-train - INFO - ├── Loss: 4.7915 | |
| 2025-08-30 16:57:10 - pico-train - INFO - ├── Learning Rate: 4.36e-05 | |
| 2025-08-30 16:57:10 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:58:03 - pico-train - INFO - Step 69800 -- Training Metrics | |
| 2025-08-30 16:58:03 - pico-train - INFO - ├── Loss: 4.8145 | |
| 2025-08-30 16:58:03 - pico-train - INFO - ├── Learning Rate: 4.33e-05 | |
| 2025-08-30 16:58:03 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:58:56 - pico-train - INFO - Step 69900 -- Training Metrics | |
| 2025-08-30 16:58:56 - pico-train - INFO - ├── Loss: 4.8056 | |
| 2025-08-30 16:58:56 - pico-train - INFO - ├── Learning Rate: 4.31e-05 | |
| 2025-08-30 16:58:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 16:59:48 - pico-train - INFO - Step 70000 -- 💾 Saving Checkpoint | |
| 2025-08-30 17:01:50 - pico-train - INFO - Step 70000 -- 📊 Evaluation Results | |
| 2025-08-30 17:01:50 - pico-train - INFO - └── paloma: inf | |
| 2025-08-30 17:01:52 - pico-train - INFO - Step 70000 -- 🔄 Training Metrics | |
| 2025-08-30 17:01:52 - pico-train - INFO - ├── Loss: 4.7898 | |
| 2025-08-30 17:01:52 - pico-train - INFO - ├── Learning Rate: 4.28e-05 | |
| 2025-08-30 17:01:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:01:52 - pico-train - INFO - Step 70000 -- 🎓 Saving Learning Dynamics | |
| 2025-08-30 17:02:48 - pico-train - INFO - Step 70100 -- 🔄 Training Metrics | |
| 2025-08-30 17:02:48 - pico-train - INFO - ├── Loss: 4.7929 | |
| 2025-08-30 17:02:48 - pico-train - INFO - ├── Learning Rate: 4.25e-05 | |
| 2025-08-30 17:02:48 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:03:40 - pico-train - INFO - Step 70200 -- 🔄 Training Metrics | |
| 2025-08-30 17:03:40 - pico-train - INFO - ├── Loss: 4.8215 | |
| 2025-08-30 17:03:40 - pico-train - INFO - ├── Learning Rate: 4.23e-05 | |
| 2025-08-30 17:03:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:04:32 - pico-train - INFO - Step 70300 -- 🔄 Training Metrics | |
| 2025-08-30 17:04:32 - pico-train - INFO - ├── Loss: 4.8139 | |
| 2025-08-30 17:04:32 - pico-train - INFO - ├── Learning Rate: 4.20e-05 | |
| 2025-08-30 17:04:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:05:24 - pico-train - INFO - Step 70400 -- 🔄 Training Metrics | |
| 2025-08-30 17:05:24 - pico-train - INFO - ├── Loss: 4.7922 | |
| 2025-08-30 17:05:24 - pico-train - INFO - ├── Learning Rate: 4.17e-05 | |
| 2025-08-30 17:05:24 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:06:16 - pico-train - INFO - Step 70500 -- 🔄 Training Metrics | |
| 2025-08-30 17:06:16 - pico-train - INFO - ├── Loss: 4.7923 | |
| 2025-08-30 17:06:16 - pico-train - INFO - ├── Learning Rate: 4.15e-05 | |
| 2025-08-30 17:06:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:07:08 - pico-train - INFO - Step 70600 -- 🔄 Training Metrics | |
| 2025-08-30 17:07:08 - pico-train - INFO - ├── Loss: 4.8075 | |
| 2025-08-30 17:07:08 - pico-train - INFO - ├── Learning Rate: 4.12e-05 | |
| 2025-08-30 17:07:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:08:00 - pico-train - INFO - Step 70700 -- 🔄 Training Metrics | |
| 2025-08-30 17:08:00 - pico-train - INFO - ├── Loss: 4.7833 | |
| 2025-08-30 17:08:00 - pico-train - INFO - ├── Learning Rate: 4.10e-05 | |
| 2025-08-30 17:08:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:08:52 - pico-train - INFO - Step 70800 -- 🔄 Training Metrics | |
| 2025-08-30 17:08:52 - pico-train - INFO - ├── Loss: 4.8036 | |
| 2025-08-30 17:08:52 - pico-train - INFO - ├── Learning Rate: 4.07e-05 | |
| 2025-08-30 17:08:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:09:44 - pico-train - INFO - Step 70900 -- 🔄 Training Metrics | |
| 2025-08-30 17:09:44 - pico-train - INFO - ├── Loss: 4.7910 | |
| 2025-08-30 17:09:44 - pico-train - INFO - ├── Learning Rate: 4.04e-05 | |
| 2025-08-30 17:09:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:10:36 - pico-train - INFO - Step 71000 -- 🔄 Training Metrics | |
| 2025-08-30 17:10:36 - pico-train - INFO - ├── Loss: 4.7723 | |
| 2025-08-30 17:10:36 - pico-train - INFO - ├── Learning Rate: 4.02e-05 | |
| 2025-08-30 17:10:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:11:28 - pico-train - INFO - Step 71100 -- 🔄 Training Metrics | |
| 2025-08-30 17:11:28 - pico-train - INFO - ├── Loss: 4.7768 | |
| 2025-08-30 17:11:28 - pico-train - INFO - ├── Learning Rate: 3.99e-05 | |
| 2025-08-30 17:11:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:12:19 - pico-train - INFO - Step 71200 -- 🔄 Training Metrics | |
| 2025-08-30 17:12:19 - pico-train - INFO - ├── Loss: 4.7984 | |
| 2025-08-30 17:12:19 - pico-train - INFO - ├── Learning Rate: 3.97e-05 | |
| 2025-08-30 17:12:19 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:13:11 - pico-train - INFO - Step 71300 -- 🔄 Training Metrics | |
| 2025-08-30 17:13:11 - pico-train - INFO - ├── Loss: 4.7825 | |
| 2025-08-30 17:13:11 - pico-train - INFO - ├── Learning Rate: 3.94e-05 | |
| 2025-08-30 17:13:11 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:14:03 - pico-train - INFO - Step 71400 -- 🔄 Training Metrics | |
| 2025-08-30 17:14:03 - pico-train - INFO - ├── Loss: 4.8093 | |
| 2025-08-30 17:14:03 - pico-train - INFO - ├── Learning Rate: 3.92e-05 | |
| 2025-08-30 17:14:03 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:14:55 - pico-train - INFO - Step 71500 -- 🔄 Training Metrics | |
| 2025-08-30 17:14:55 - pico-train - INFO - ├── Loss: 4.7903 | |
| 2025-08-30 17:14:55 - pico-train - INFO - ├── Learning Rate: 3.89e-05 | |
| 2025-08-30 17:14:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:15:47 - pico-train - INFO - Step 71600 -- 🔄 Training Metrics | |
| 2025-08-30 17:15:47 - pico-train - INFO - ├── Loss: 4.8269 | |
| 2025-08-30 17:15:47 - pico-train - INFO - ├── Learning Rate: 3.87e-05 | |
| 2025-08-30 17:15:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:16:39 - pico-train - INFO - Step 71700 -- 🔄 Training Metrics | |
| 2025-08-30 17:16:39 - pico-train - INFO - ├── Loss: 4.8135 | |
| 2025-08-30 17:16:39 - pico-train - INFO - ├── Learning Rate: 3.84e-05 | |
| 2025-08-30 17:16:39 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:17:31 - pico-train - INFO - Step 71800 -- 🔄 Training Metrics | |
| 2025-08-30 17:17:31 - pico-train - INFO - ├── Loss: 4.7759 | |
| 2025-08-30 17:17:31 - pico-train - INFO - ├── Learning Rate: 3.82e-05 | |
| 2025-08-30 17:17:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:18:22 - pico-train - INFO - Step 71900 -- 🔄 Training Metrics | |
| 2025-08-30 17:18:22 - pico-train - INFO - ├── Loss: 4.7837 | |
| 2025-08-30 17:18:22 - pico-train - INFO - ├── Learning Rate: 3.79e-05 | |
| 2025-08-30 17:18:22 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:19:15 - pico-train - INFO - Step 72000 -- 💾 Saving Checkpoint | |
| 2025-08-30 17:21:27 - pico-train - INFO - Step 72000 -- 📊 Evaluation Results | |
| 2025-08-30 17:21:27 - pico-train - INFO - └── paloma: inf | |
| 2025-08-30 17:21:28 - pico-train - INFO - Step 72000 -- 🔄 Training Metrics | |
| 2025-08-30 17:21:28 - pico-train - INFO - ├── Loss: 4.8016 | |
| 2025-08-30 17:21:28 - pico-train - INFO - ├── Learning Rate: 3.77e-05 | |
| 2025-08-30 17:21:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:21:28 - pico-train - INFO - Step 72000 -- 🎓 Saving Learning Dynamics | |
| 2025-08-30 17:22:25 - pico-train - INFO - Step 72100 -- 🔄 Training Metrics | |
| 2025-08-30 17:22:25 - pico-train - INFO - ├── Loss: 4.7643 | |
| 2025-08-30 17:22:25 - pico-train - INFO - ├── Learning Rate: 3.74e-05 | |
| 2025-08-30 17:22:25 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:23:16 - pico-train - INFO - Step 72200 -- 🔄 Training Metrics | |
| 2025-08-30 17:23:16 - pico-train - INFO - ├── Loss: 4.7938 | |
| 2025-08-30 17:23:16 - pico-train - INFO - ├── Learning Rate: 3.72e-05 | |
| 2025-08-30 17:23:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:24:08 - pico-train - INFO - Step 72300 -- 🔄 Training Metrics | |
| 2025-08-30 17:24:08 - pico-train - INFO - ├── Loss: 4.7962 | |
| 2025-08-30 17:24:08 - pico-train - INFO - ├── Learning Rate: 3.69e-05 | |
| 2025-08-30 17:24:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:25:00 - pico-train - INFO - Step 72400 -- 🔄 Training Metrics | |
| 2025-08-30 17:25:00 - pico-train - INFO - ├── Loss: 4.8089 | |
| 2025-08-30 17:25:00 - pico-train - INFO - ├── Learning Rate: 3.67e-05 | |
| 2025-08-30 17:25:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:25:52 - pico-train - INFO - Step 72500 -- 🔄 Training Metrics | |
| 2025-08-30 17:25:52 - pico-train - INFO - ├── Loss: 4.8081 | |
| 2025-08-30 17:25:52 - pico-train - INFO - ├── Learning Rate: 3.64e-05 | |
| 2025-08-30 17:25:52 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:26:44 - pico-train - INFO - Step 72600 -- 🔄 Training Metrics | |
| 2025-08-30 17:26:44 - pico-train - INFO - ├── Loss: 4.8095 | |
| 2025-08-30 17:26:44 - pico-train - INFO - ├── Learning Rate: 3.62e-05 | |
| 2025-08-30 17:26:44 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:27:36 - pico-train - INFO - Step 72700 -- 🔄 Training Metrics | |
| 2025-08-30 17:27:36 - pico-train - INFO - ├── Loss: 4.8020 | |
| 2025-08-30 17:27:36 - pico-train - INFO - ├── Learning Rate: 3.59e-05 | |
| 2025-08-30 17:27:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:28:28 - pico-train - INFO - Step 72800 -- 🔄 Training Metrics | |
| 2025-08-30 17:28:28 - pico-train - INFO - ├── Loss: 4.7579 | |
| 2025-08-30 17:28:28 - pico-train - INFO - ├── Learning Rate: 3.57e-05 | |
| 2025-08-30 17:28:28 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:29:20 - pico-train - INFO - Step 72900 -- 🔄 Training Metrics | |
| 2025-08-30 17:29:20 - pico-train - INFO - ├── Loss: 4.7869 | |
| 2025-08-30 17:29:20 - pico-train - INFO - ├── Learning Rate: 3.54e-05 | |
| 2025-08-30 17:29:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:30:12 - pico-train - INFO - Step 73000 -- 🔄 Training Metrics | |
| 2025-08-30 17:30:12 - pico-train - INFO - ├── Loss: 4.7825 | |
| 2025-08-30 17:30:12 - pico-train - INFO - ├── Learning Rate: 3.52e-05 | |
| 2025-08-30 17:30:12 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:31:03 - pico-train - INFO - Step 73100 -- 🔄 Training Metrics | |
| 2025-08-30 17:31:03 - pico-train - INFO - ├── Loss: 4.8111 | |
| 2025-08-30 17:31:03 - pico-train - INFO - ├── Learning Rate: 3.49e-05 | |
| 2025-08-30 17:31:03 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:31:55 - pico-train - INFO - Step 73200 -- 🔄 Training Metrics | |
| 2025-08-30 17:31:55 - pico-train - INFO - ├── Loss: 4.8028 | |
| 2025-08-30 17:31:55 - pico-train - INFO - ├── Learning Rate: 3.47e-05 | |
| 2025-08-30 17:31:55 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:32:47 - pico-train - INFO - Step 73300 -- 🔄 Training Metrics | |
| 2025-08-30 17:32:47 - pico-train - INFO - ├── Loss: 4.8025 | |
| 2025-08-30 17:32:47 - pico-train - INFO - ├── Learning Rate: 3.44e-05 | |
| 2025-08-30 17:32:47 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:33:39 - pico-train - INFO - Step 73400 -- 🔄 Training Metrics | |
| 2025-08-30 17:33:39 - pico-train - INFO - ├── Loss: 4.7917 | |
| 2025-08-30 17:33:39 - pico-train - INFO - ├── Learning Rate: 3.42e-05 | |
| 2025-08-30 17:33:39 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:34:31 - pico-train - INFO - Step 73500 -- 🔄 Training Metrics | |
| 2025-08-30 17:34:31 - pico-train - INFO - ├── Loss: 4.7851 | |
| 2025-08-30 17:34:31 - pico-train - INFO - ├── Learning Rate: 3.40e-05 | |
| 2025-08-30 17:34:31 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:35:23 - pico-train - INFO - Step 73600 -- 🔄 Training Metrics | |
| 2025-08-30 17:35:23 - pico-train - INFO - ├── Loss: 4.7807 | |
| 2025-08-30 17:35:23 - pico-train - INFO - ├── Learning Rate: 3.37e-05 | |
| 2025-08-30 17:35:23 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:36:15 - pico-train - INFO - Step 73700 -- 🔄 Training Metrics | |
| 2025-08-30 17:36:15 - pico-train - INFO - ├── Loss: 4.7741 | |
| 2025-08-30 17:36:15 - pico-train - INFO - ├── Learning Rate: 3.35e-05 | |
| 2025-08-30 17:36:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:37:07 - pico-train - INFO - Step 73800 -- 🔄 Training Metrics | |
| 2025-08-30 17:37:07 - pico-train - INFO - ├── Loss: 4.8076 | |
| 2025-08-30 17:37:07 - pico-train - INFO - ├── Learning Rate: 3.32e-05 | |
| 2025-08-30 17:37:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:37:59 - pico-train - INFO - Step 73900 -- 🔄 Training Metrics | |
| 2025-08-30 17:37:59 - pico-train - INFO - ├── Loss: 4.8119 | |
| 2025-08-30 17:37:59 - pico-train - INFO - ├── Learning Rate: 3.30e-05 | |
| 2025-08-30 17:37:59 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:38:50 - pico-train - INFO - Step 74000 -- 💾 Saving Checkpoint | |
| 2025-08-30 17:40:51 - pico-train - INFO - Step 74000 -- 📊 Evaluation Results | |
| 2025-08-30 17:40:51 - pico-train - INFO - └── paloma: inf | |
| 2025-08-30 17:40:53 - pico-train - INFO - Step 74000 -- 🔄 Training Metrics | |
| 2025-08-30 17:40:53 - pico-train - INFO - ├── Loss: 4.7960 | |
| 2025-08-30 17:40:53 - pico-train - INFO - ├── Learning Rate: 3.28e-05 | |
| 2025-08-30 17:40:53 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:40:53 - pico-train - INFO - Step 74000 -- 🎓 Saving Learning Dynamics | |
| 2025-08-30 17:41:49 - pico-train - INFO - Step 74100 -- 🔄 Training Metrics | |
| 2025-08-30 17:41:49 - pico-train - INFO - ├── Loss: 4.7909 | |
| 2025-08-30 17:41:49 - pico-train - INFO - ├── Learning Rate: 3.25e-05 | |
| 2025-08-30 17:41:49 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:42:42 - pico-train - INFO - Step 74200 -- 🔄 Training Metrics | |
| 2025-08-30 17:42:42 - pico-train - INFO - ├── Loss: 4.7807 | |
| 2025-08-30 17:42:42 - pico-train - INFO - ├── Learning Rate: 3.23e-05 | |
| 2025-08-30 17:42:42 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:43:36 - pico-train - INFO - Step 74300 -- 🔄 Training Metrics | |
| 2025-08-30 17:43:36 - pico-train - INFO - ├── Loss: 4.7711 | |
| 2025-08-30 17:43:36 - pico-train - INFO - ├── Learning Rate: 3.21e-05 | |
| 2025-08-30 17:43:36 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:44:29 - pico-train - INFO - Step 74400 -- 🔄 Training Metrics | |
| 2025-08-30 17:44:29 - pico-train - INFO - ├── Loss: 4.7837 | |
| 2025-08-30 17:44:29 - pico-train - INFO - ├── Learning Rate: 3.18e-05 | |
| 2025-08-30 17:44:29 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:45:21 - pico-train - INFO - Step 74500 -- 🔄 Training Metrics | |
| 2025-08-30 17:45:21 - pico-train - INFO - ├── Loss: 4.7668 | |
| 2025-08-30 17:45:21 - pico-train - INFO - ├── Learning Rate: 3.16e-05 | |
| 2025-08-30 17:45:21 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:46:15 - pico-train - INFO - Step 74600 -- 🔄 Training Metrics | |
| 2025-08-30 17:46:15 - pico-train - INFO - ├── Loss: 4.7985 | |
| 2025-08-30 17:46:15 - pico-train - INFO - ├── Learning Rate: 3.14e-05 | |
| 2025-08-30 17:46:15 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:47:08 - pico-train - INFO - Step 74700 -- 🔄 Training Metrics | |
| 2025-08-30 17:47:08 - pico-train - INFO - ├── Loss: 4.7702 | |
| 2025-08-30 17:47:08 - pico-train - INFO - ├── Learning Rate: 3.11e-05 | |
| 2025-08-30 17:47:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:48:01 - pico-train - INFO - Step 74800 -- 🔄 Training Metrics | |
| 2025-08-30 17:48:01 - pico-train - INFO - ├── Loss: 4.8002 | |
| 2025-08-30 17:48:01 - pico-train - INFO - ├── Learning Rate: 3.09e-05 | |
| 2025-08-30 17:48:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:48:54 - pico-train - INFO - Step 74900 -- 🔄 Training Metrics | |
| 2025-08-30 17:48:54 - pico-train - INFO - ├── Loss: 4.7955 | |
| 2025-08-30 17:48:54 - pico-train - INFO - ├── Learning Rate: 3.07e-05 | |
| 2025-08-30 17:48:54 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:49:48 - pico-train - INFO - Step 75000 -- 🔄 Training Metrics | |
| 2025-08-30 17:49:48 - pico-train - INFO - ├── Loss: 4.8023 | |
| 2025-08-30 17:49:48 - pico-train - INFO - ├── Learning Rate: 3.04e-05 | |
| 2025-08-30 17:49:48 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:50:41 - pico-train - INFO - Step 75100 -- 🔄 Training Metrics | |
| 2025-08-30 17:50:41 - pico-train - INFO - ├── Loss: 4.7842 | |
| 2025-08-30 17:50:41 - pico-train - INFO - ├── Learning Rate: 3.02e-05 | |
| 2025-08-30 17:50:41 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:51:34 - pico-train - INFO - Step 75200 -- 🔄 Training Metrics | |
| 2025-08-30 17:51:34 - pico-train - INFO - ├── Loss: 4.7890 | |
| 2025-08-30 17:51:34 - pico-train - INFO - ├── Learning Rate: 3.00e-05 | |
| 2025-08-30 17:51:34 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:52:27 - pico-train - INFO - Step 75300 -- 🔄 Training Metrics | |
| 2025-08-30 17:52:27 - pico-train - INFO - ├── Loss: 4.8004 | |
| 2025-08-30 17:52:27 - pico-train - INFO - ├── Learning Rate: 2.97e-05 | |
| 2025-08-30 17:52:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:53:20 - pico-train - INFO - Step 75400 -- 🔄 Training Metrics | |
| 2025-08-30 17:53:20 - pico-train - INFO - ├── Loss: 4.7917 | |
| 2025-08-30 17:53:20 - pico-train - INFO - ├── Learning Rate: 2.95e-05 | |
| 2025-08-30 17:53:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:54:13 - pico-train - INFO - Step 75500 -- 🔄 Training Metrics | |
| 2025-08-30 17:54:13 - pico-train - INFO - ├── Loss: 4.7867 | |
| 2025-08-30 17:54:13 - pico-train - INFO - ├── Learning Rate: 2.93e-05 | |
| 2025-08-30 17:54:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:55:07 - pico-train - INFO - Step 75600 -- 🔄 Training Metrics | |
| 2025-08-30 17:55:07 - pico-train - INFO - ├── Loss: 4.7957 | |
| 2025-08-30 17:55:07 - pico-train - INFO - ├── Learning Rate: 2.91e-05 | |
| 2025-08-30 17:55:07 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:56:00 - pico-train - INFO - Step 75700 -- 🔄 Training Metrics | |
| 2025-08-30 17:56:00 - pico-train - INFO - ├── Loss: 4.7840 | |
| 2025-08-30 17:56:00 - pico-train - INFO - ├── Learning Rate: 2.88e-05 | |
| 2025-08-30 17:56:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:56:56 - pico-train - INFO - Step 75800 -- 🔄 Training Metrics | |
| 2025-08-30 17:56:56 - pico-train - INFO - ├── Loss: 4.7990 | |
| 2025-08-30 17:56:56 - pico-train - INFO - ├── Learning Rate: 2.86e-05 | |
| 2025-08-30 17:56:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:57:48 - pico-train - INFO - Step 75900 -- 🔄 Training Metrics | |
| 2025-08-30 17:57:48 - pico-train - INFO - ├── Loss: 4.7904 | |
| 2025-08-30 17:57:48 - pico-train - INFO - ├── Learning Rate: 2.84e-05 | |
| 2025-08-30 17:57:48 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 17:58:41 - pico-train - INFO - Step 76000 -- 💾 Saving Checkpoint | |
| 2025-08-30 18:01:59 - pico-train - INFO - Step 76000 -- 📊 Evaluation Results | |
| 2025-08-30 18:01:59 - pico-train - INFO - └── paloma: inf | |
| 2025-08-30 18:02:00 - pico-train - INFO - Step 76000 -- 🔄 Training Metrics | |
| 2025-08-30 18:02:00 - pico-train - INFO - ├── Loss: 4.7972 | |
| 2025-08-30 18:02:00 - pico-train - INFO - ├── Learning Rate: 2.82e-05 | |
| 2025-08-30 18:02:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:02:00 - pico-train - INFO - Step 76000 -- 🎓 Saving Learning Dynamics | |
| 2025-08-30 18:03:04 - pico-train - INFO - Step 76100 -- 🔄 Training Metrics | |
| 2025-08-30 18:03:04 - pico-train - INFO - ├── Loss: 4.7730 | |
| 2025-08-30 18:03:04 - pico-train - INFO - ├── Learning Rate: 2.79e-05 | |
| 2025-08-30 18:03:04 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:03:56 - pico-train - INFO - Step 76200 -- 🔄 Training Metrics | |
| 2025-08-30 18:03:56 - pico-train - INFO - ├── Loss: 4.7997 | |
| 2025-08-30 18:03:56 - pico-train - INFO - ├── Learning Rate: 2.77e-05 | |
| 2025-08-30 18:03:56 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:04:48 - pico-train - INFO - Step 76300 -- 🔄 Training Metrics | |
| 2025-08-30 18:04:48 - pico-train - INFO - ├── Loss: 4.7843 | |
| 2025-08-30 18:04:48 - pico-train - INFO - ├── Learning Rate: 2.75e-05 | |
| 2025-08-30 18:04:48 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:05:40 - pico-train - INFO - Step 76400 -- 🔄 Training Metrics | |
| 2025-08-30 18:05:40 - pico-train - INFO - ├── Loss: 4.7858 | |
| 2025-08-30 18:05:40 - pico-train - INFO - ├── Learning Rate: 2.73e-05 | |
| 2025-08-30 18:05:40 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:06:32 - pico-train - INFO - Step 76500 -- 🔄 Training Metrics | |
| 2025-08-30 18:06:32 - pico-train - INFO - ├── Loss: 4.8110 | |
| 2025-08-30 18:06:32 - pico-train - INFO - ├── Learning Rate: 2.71e-05 | |
| 2025-08-30 18:06:32 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:07:24 - pico-train - INFO - Step 76600 -- 🔄 Training Metrics | |
| 2025-08-30 18:07:24 - pico-train - INFO - ├── Loss: 4.7834 | |
| 2025-08-30 18:07:24 - pico-train - INFO - ├── Learning Rate: 2.68e-05 | |
| 2025-08-30 18:07:24 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:08:16 - pico-train - INFO - Step 76700 -- 🔄 Training Metrics | |
| 2025-08-30 18:08:16 - pico-train - INFO - ├── Loss: 4.7936 | |
| 2025-08-30 18:08:16 - pico-train - INFO - ├── Learning Rate: 2.66e-05 | |
| 2025-08-30 18:08:16 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:09:08 - pico-train - INFO - Step 76800 -- 🔄 Training Metrics | |
| 2025-08-30 18:09:08 - pico-train - INFO - ├── Loss: 4.7869 | |
| 2025-08-30 18:09:08 - pico-train - INFO - ├── Learning Rate: 2.64e-05 | |
| 2025-08-30 18:09:08 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:10:00 - pico-train - INFO - Step 76900 -- 🔄 Training Metrics | |
| 2025-08-30 18:10:00 - pico-train - INFO - ├── Loss: 4.7979 | |
| 2025-08-30 18:10:00 - pico-train - INFO - ├── Learning Rate: 2.62e-05 | |
| 2025-08-30 18:10:00 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:10:54 - pico-train - INFO - Step 77000 -- 🔄 Training Metrics | |
| 2025-08-30 18:10:54 - pico-train - INFO - ├── Loss: 4.7956 | |
| 2025-08-30 18:10:54 - pico-train - INFO - ├── Learning Rate: 2.60e-05 | |
| 2025-08-30 18:10:54 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:11:46 - pico-train - INFO - Step 77100 -- 🔄 Training Metrics | |
| 2025-08-30 18:11:46 - pico-train - INFO - ├── Loss: 4.7974 | |
| 2025-08-30 18:11:46 - pico-train - INFO - ├── Learning Rate: 2.58e-05 | |
| 2025-08-30 18:11:46 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:12:38 - pico-train - INFO - Step 77200 -- 🔄 Training Metrics | |
| 2025-08-30 18:12:38 - pico-train - INFO - ├── Loss: 4.8074 | |
| 2025-08-30 18:12:38 - pico-train - INFO - ├── Learning Rate: 2.55e-05 | |
| 2025-08-30 18:12:38 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:13:30 - pico-train - INFO - Step 77300 -- 🔄 Training Metrics | |
| 2025-08-30 18:13:30 - pico-train - INFO - ├── Loss: 4.8276 | |
| 2025-08-30 18:13:30 - pico-train - INFO - ├── Learning Rate: 2.53e-05 | |
| 2025-08-30 18:13:30 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:14:27 - pico-train - INFO - Step 77400 -- 🔄 Training Metrics | |
| 2025-08-30 18:14:27 - pico-train - INFO - ├── Loss: 4.7908 | |
| 2025-08-30 18:14:27 - pico-train - INFO - ├── Learning Rate: 2.51e-05 | |
| 2025-08-30 18:14:27 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:15:20 - pico-train - INFO - Step 77500 -- 🔄 Training Metrics | |
| 2025-08-30 18:15:20 - pico-train - INFO - ├── Loss: 4.8142 | |
| 2025-08-30 18:15:20 - pico-train - INFO - ├── Learning Rate: 2.49e-05 | |
| 2025-08-30 18:15:20 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:16:13 - pico-train - INFO - Step 77600 -- 🔄 Training Metrics | |
| 2025-08-30 18:16:13 - pico-train - INFO - ├── Loss: 4.8052 | |
| 2025-08-30 18:16:13 - pico-train - INFO - ├── Learning Rate: 2.47e-05 | |
| 2025-08-30 18:16:13 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:17:06 - pico-train - INFO - Step 77700 -- 🔄 Training Metrics | |
| 2025-08-30 18:17:06 - pico-train - INFO - ├── Loss: 4.7876 | |
| 2025-08-30 18:17:06 - pico-train - INFO - ├── Learning Rate: 2.45e-05 | |
| 2025-08-30 18:17:06 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:18:01 - pico-train - INFO - Step 77800 -- 🔄 Training Metrics | |
| 2025-08-30 18:18:01 - pico-train - INFO - ├── Loss: 4.8011 | |
| 2025-08-30 18:18:01 - pico-train - INFO - ├── Learning Rate: 2.43e-05 | |
| 2025-08-30 18:18:01 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:18:54 - pico-train - INFO - Step 77900 -- 🔄 Training Metrics | |
| 2025-08-30 18:18:54 - pico-train - INFO - ├── Loss: 4.7936 | |
| 2025-08-30 18:18:54 - pico-train - INFO - ├── Learning Rate: 2.41e-05 | |
| 2025-08-30 18:18:54 - pico-train - INFO - └── Inf/NaN count: 0 | |
| 2025-08-30 18:19:47 - pico-train - INFO - Step 78000 -- 💾 Saving Checkpoint | |